Mastering GitLab Parent‑Child Pipelines: Scalable CI for Multi‑Module Maven Monorepos


Advocate for better developer's productivity and experience

Documenting my learnings and sharing it

Background

Recently, I redesigned my team's pipeline running on a multi-module Maven monorepo using GitLab CI. It wasn't that the previous setup was broken, but my team faced a few persistent issues that I hoped to resolve to bring the pipeline into a more robust state. I was the one who did the initial setup as well, but that was done when I was still relatively new to GitLab CI. As I worked on this redesign, I learned quite a few lessons along the way that I hope to share with the "future me."

Objective

I set out to resolve several key pain points:

  • Double Pipelines: Preventing both branch and MR pipelines from triggering for the same commit.

  • Artifact Management: Resolving issues where build artifacts weren't reliably uploaded or shared.

  • Visibility: Fixing cases where test coverage (JaCoCo) wasn't being correctly captured.

Beyond just fixing issues, I also wanted to implement several strategic improvements:

  • Scalability: Creating a more robust setup using !reference tags and reusable job templates.

  • Security: Integrating automated SAST scans.

  • Performance: Reducing overall pipeline runtime through optimized cache strategies and surgical artifact sharing.

Pipeline design

Project Structure

This is a sample setup, but it closely mirrors the one I actually run. The real setup has around 10 to 12 modules.

. (Root Aggregator)
├── .gitlab-ci.yml           # Root Orchestrator
├── .gitlab-ci-base.yml      # Central CI Blueprint
├── pom.xml                  # Root Aggregator POM
├── parent-pom/              # Shared configuration & dependency management
│   └── pom.xml
└── project/                 # Functional module aggregator
    ├── pom.xml
    ├── mmm-security/        # Foundation module
    │   ├── .gitlab-ci.yml   # Child Pipeline
    │   └── pom.xml
    ├── mmm-core/            # Core application module
    │   ├── .gitlab-ci.yml   # Child Pipeline
    │   └── pom.xml
    ├── mmm-search/          # Search module
    │   ├── .gitlab-ci.yml   # Child Pipeline
    │   └── pom.xml
    └── mmm-report/          # Coverage reporter
        └── pom.xml

Pipeline setup

Configure a parent-child pipeline with the following setup:

  • Root Orchestrator (.gitlab-ci.yml)

  • Central Blueprint (.gitlab-ci-base.yml)

  • Independent Child Pipelines (.gitlab-ci.yml housed within each module)

Overview

Best Practices

Root Orchestrator

This is the main controller that defines the workflow rules, the stages available, and when to trigger the downstream child pipelines.

workflow:
  rules:
    - if: $CI_FULL_PIPELINE == "true"                  # Manual override
    - if: $CI_PIPELINE_SOURCE == "parent_pipeline"
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH && $CI_OPEN_MERGE_REQUESTS # Double-pipeline suppression
      when: never
    - if: $CI_COMMIT_BRANCH                             # All branch pushes (including main)

  • CI_FULL_PIPELINE allows me to manually trigger a full pipeline run.

  • CI_COMMIT_BRANCH && CI_OPEN_MERGE_REQUESTS prevents Double Pipelines: a common issue where GitLab triggers both a branch pipeline and an MR pipeline for the same commit.

stages:
  - configuration
  - foundation
  - application
  - test # set up to run GitLab built-in SAST scans
  - report

pre:
  stage: .pre
  script:
    - env

.pre is a built-in stage that always runs first. Printing env there shows every environment variable available to the job, which makes troubleshooting easier.

While env is great for debugging, use it with caution (or remove it before production): it writes every variable's value to the job log, and GitLab only masks variables that have been explicitly configured as masked.
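A narrower alternative, sketched here under my own assumptions, is to print only GitLab-style variables and drop anything whose name looks sensitive. The function name print_safe_env and the name patterns are illustrative and not part of the original pipeline, and a line-based filter like this can be fooled by multi-line values.

```shell
# Hypothetical helper: list CI_* / GITLAB_* variables, skipping names that look sensitive.
print_safe_env() {
  env \
    | grep -E '^(CI|GITLAB)_' \
    | grep -Ev '^[^=]*(TOKEN|PASSWORD|SECRET|KEY|REPOSITORY_URL)[^=]*=' \
    | sort
}

print_safe_env
```

Calling print_safe_env in the pre job instead of env keeps the log useful for troubleshooting while leaving out values such as CI_JOB_TOKEN.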

.trigger-rules:
  rules:
    - if: "$CI_FULL_PIPELINE == 'true'"
      when: always
    # Merge Request: Accuracy check using compare_to
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      changes:
        compare_to: $CI_MERGE_REQUEST_TARGET_BRANCH_NAME
        paths:
          - $MODULE_PATH/**/*
          - parent-pom/**/*
    # Main Branch: Standard change detection
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH && $CI_COMMIT_BEFORE_SHA != "0000000000000000000000000000000000000000"
      changes:
        - $MODULE_PATH/**/*
        - parent-pom/**/*

trigger-core:
  stage: application
  extends: .trigger-rules
  needs:
    - job: trigger-security
      optional: true
    - job: deploy-parent-pom
      optional: true
  variables:
    MODULE_PATH: "project/mmm-core"
    PARENT_SOURCE: $CI_PIPELINE_SOURCE
    PARENT_BRANCH: $CI_COMMIT_BRANCH
  trigger:
    include: project/mmm-core/.gitlab-ci.yml
    strategy: mirror

  • .trigger-rules: Defines reusable rules that decide when a trigger job runs. I used compare_to: $CI_MERGE_REQUEST_TARGET_BRANCH_NAME so that change detection is calculated against the target branch, eliminating the "rebase noise" that often plagues monorepo pipelines.

  • extends: .trigger-rules: By extending this template, the trigger job inherits the centralized change-detection logic, so only file changes under its own module (or parent-pom) trigger a pipeline run. This avoids wasting runner resources on untouched modules.

  • needs: Lists the upstream jobs this job depends on. If those modules changed, this job waits for them to finish before it runs. optional: true is the secret sauce for monorepos: it prevents the pipeline from failing when an upstream job was not created because its module had no changes.

  • variables: This is especially important in parent-child pipelines because it passes context to the child pipeline so that its rules evaluate correctly. Inside the child pipeline, $CI_PIPELINE_SOURCE is parent_pipeline rather than the source that started the parent, so the original value is forwarded as PARENT_SOURCE.

  • trigger:strategy: mirror: Ensures the parent pipeline reflects the downstream (child pipeline) status accurately. See the docs for a more detailed explanation.

This parent-child isolation means a failure in the "Search" module doesn't block the "Core" module's deployment—significantly reducing the "blast radius" of failures in a large monorepo.
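To reason about what .trigger-rules will do for a given change set, the path matching can be approximated locally. This is only a sketch: module_changed is a hypothetical helper, and GitLab's own rules:changes evaluation (glob handling, which commits are compared) may differ in detail.

```shell
# Hypothetical helper: reads changed file paths on stdin and succeeds if any of them
# sits under the given module path or under parent-pom/.
module_changed() {
  module_path="$1"
  while IFS= read -r file; do
    case "$file" in
      "$module_path"/*|parent-pom/*) return 0 ;;
    esac
  done
  return 1
}

# In a real checkout the list could come from git, assuming origin/main is the target:
#   git diff --name-only origin/main...HEAD | module_changed project/mmm-core
printf '%s\n' 'project/mmm-search/pom.xml' 'README.md' \
  | module_changed project/mmm-core && echo "trigger core" || echo "skip core"
```

With the sample input above, only mmm-search and the README changed, so the check for project/mmm-core reports "skip core". A change under parent-pom/ would make every module match.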

Central Blueprint

This is where all the global variables, job-templates, stages, cache strategy, and reusable snippets are declared.

Parameterized Templates (spec:inputs)

I treated our CI templates like "functions" with a defined interface using spec:inputs. This allows the Root Orchestrator to pass specific configurations (like forcing a full pipeline) without relying on fragile global variables.

# In .gitlab-ci-base.yml
spec:
  inputs:
    full_pipeline:
      default: "false"

# Usage in .gitlab-ci.yml
include:
  - local: '.gitlab-ci-base.yml'
    inputs:
      full_pipeline: $CI_FULL_PIPELINE
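
For the input to have any effect, the blueprint also has to consume it. A minimal sketch of how that could look, assuming the spec header is separated from the rest of the file with --- and that the value is exposed through a variable I have named FULL_PIPELINE purely for illustration:

```yaml
# .gitlab-ci-base.yml (sketch; FULL_PIPELINE is a hypothetical variable name)
spec:
  inputs:
    full_pipeline:
      default: "false"
---
# Everything below the separator can interpolate the declared input.
variables:
  FULL_PIPELINE: "$[[ inputs.full_pipeline ]]"
```

Files that include the blueprint without passing inputs, such as the child pipelines, fall back to the default of "false".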

Cache Strategy

cache: 
  key: "maven-$CI_COMMIT_REF_SLUG"
  paths:
    - .m2/repository/
  policy: pull-push

Keying the cache on $CI_COMMIT_REF_SLUG is a common way to define a shared branch-level cache, allowing modules to share internal dependencies. Once the first job has pulled the necessary dependencies, they are cached locally on the runner (or remotely, such as in MinIO in my setup). Subsequent jobs or pipelines on the same branch reuse this cache instead of downloading everything again, saving bandwidth and shaving time off every run.

.deploy-snapshot-template:
  extends: .base-maven-job
  stage: release
  interruptible: false # Ensure deployment finishes once started
  cache:
    key: "maven-$CI_COMMIT_REF_SLUG"
    paths:
      - .m2/repository/
    policy: pull
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    - |
      echo "Deploying SNAPSHOT for $MODULE_PATH to GitLab Maven Registry..."
      mvn $MAVEN_CLI_OPTS deploy -pl $MODULE_PATH -am -DskipTests

It is important to know when to override the cache policy: a job that only reads dependencies should use policy: pull so that it does not needlessly push .m2/repository back to the remote cache.

!reference tag

.coverage-parser:
  script:
    - |
      echo "Extracting coverage percentage for GitLab UI..."
      REPORT_PATH=${JACOCO_XML_PATH:-"$MODULE_PATH/target/site/jacoco/jacoco.xml"}
      echo "DEBUG: Parsing JaCoCo report at: $REPORT_PATH"
      
      if [ -f "$REPORT_PATH" ]; then
        # Use grep -o to extract ONLY the matching tag, then tail -1 to get the aggregate
        # This works correctly even if the entire XML is on a single line.
        LINE_COUNTER=$(grep -o '<counter type="LINE"[^>]*/>' "$REPORT_PATH" | tail -1 || true)

        if [ -n "$LINE_COUNTER" ]; then
          MISSED=$(echo "$LINE_COUNTER" | sed -n 's/.*missed="\([0-9]*\)".*/\1/p')
          COVERED=$(echo "$LINE_COUNTER" | sed -n 's/.*covered="\([0-9]*\)".*/\1/p')
          TOTAL=$((MISSED + COVERED))

          echo "DEBUG: Found LINE counter: missed=$MISSED, covered=$COVERED, total=$TOTAL"

          if [ "$TOTAL" -gt 0 ]; then
            PERCENT=$(awk -v c="$COVERED" -v t="$TOTAL" 'BEGIN {printf "%.2f", (c / t) * 100}')
            echo "Coverage: $PERCENT%"
          else
            echo "Coverage: 0.00%"
          fi
        else
          echo "DEBUG: No LINE counter line found in $REPORT_PATH"
        fi
      fi

# Template for application modules
.application-template:
  extends: .base-maven-job
  stage: build
  variables:
    JACOCO_XML_PATH: "$MODULE_PATH/target/site/jacoco/jacoco.xml"
  script:
    - mvn $MAVEN_CLI_OPTS verify -pl $MODULE_PATH -am
    - !reference [.coverage-parser, script]

The !reference tag is similar to YAML anchors, which let you reuse snippets across jobs. I find !reference more developer-friendly because it lets you pick specific keys (like script) to reuse, and it also works with configuration pulled in through include.

Note that YAML Anchors are a native YAML feature and work outside of GitLab, while the !reference tag is a GitLab-specific feature.
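
To sanity-check the parsing logic of .coverage-parser outside CI, the same grep/sed/awk steps can be wrapped in a function and run against a small sample file. This is a sketch: jacoco_line_coverage is a hypothetical name, and the XML fragment uses made-up numbers rather than a real JaCoCo report.

```shell
# Hypothetical helper mirroring the .coverage-parser steps for a single report file.
jacoco_line_coverage() {
  report_path="$1"
  # The last LINE counter in the file is the report-level aggregate.
  line_counter=$(grep -o '<counter type="LINE"[^>]*/>' "$report_path" | tail -1 || true)
  [ -n "$line_counter" ] || return 1
  missed=$(echo "$line_counter" | sed -n 's/.*missed="\([0-9]*\)".*/\1/p')
  covered=$(echo "$line_counter" | sed -n 's/.*covered="\([0-9]*\)".*/\1/p')
  total=$((missed + covered))
  if [ "$total" -gt 0 ]; then
    awk -v c="$covered" -v t="$total" 'BEGIN {printf "Coverage: %.2f%%\n", (c / t) * 100}'
  else
    echo "Coverage: 0.00%"
  fi
}

# Single-line sample with a package-level counter followed by the aggregate counter.
sample=$(mktemp)
printf '%s' '<report><package><counter type="LINE" missed="5" covered="5"/></package><counter type="LINE" missed="25" covered="75"/></report>' > "$sample"
jacoco_line_coverage "$sample"   # prints "Coverage: 75.00%"
```

The printed line has the same "Coverage: NN.NN%" shape that the job's coverage regex is written to pick up.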

Variables

variables:
  # Performance: Use a project-relative path for caching
  MAVEN_REPO_LOCAL: ".m2/repository"
  SONAR_USER_HOME: ".sonar"  # Defines the location of the analysis task cache
  
  # JVM Tuning for CI (Merged with user preferences)
  MAVEN_OPTS: >-
    -Dhttps.protocols=TLSv1.2
    -Dorg.slf4j.simpleLogger.showDateTime=true
    -Djava.awt.headless=true
    -Dfile.encoding=UTF-8
    -Xmx2048m
    -XX:+TieredCompilation
    -XX:TieredStopAtLevel=1

  # Maven CLI optimization (Merged with user preferences)
  MAVEN_CLI_OPTS: >-
    --batch-mode
    --errors
    --fail-at-end
    --show-version
    --no-transfer-progress
    --threads 1C
    -DinstallAtEnd=true
    -DdeployAtEnd=true
    -s .mvn/settings-ci.xml

  GIT_DEPTH: "0"  # Disables shallow cloning so the full Git history is fetched, required by the analysis task

  # GitLab FastZip and Compression
  FF_USE_FASTZIP: "true"
  ARTIFACT_COMPRESSION_LEVEL: "fast"
  CACHE_COMPRESSION_LEVEL: "fast"

I want to draw attention to MAVEN_CLI_OPTS where -s .mvn/settings-ci.xml is defined. This is the unsung hero of our pipeline: it maps ${env.CI_JOB_TOKEN} to our GitLab Maven repository, allowing seamless, credential-free publishing and dependency resolution within the CI environment.

I also enabled FF_USE_FASTZIP and fast compression. In a monorepo with 10+ modules, the time saved zipping and unzipping artifacts and cache across dozens of jobs adds up to several minutes per pipeline.

Artifacts

.application-template:
  extends: .base-maven-job
  stage: build
  script:
    - mvn $MAVEN_CLI_OPTS verify -pl $MODULE_PATH -am
  artifacts:
    when: always
    paths:
      - "$MODULE_PATH/target/"
    exclude:
      - "$MODULE_PATH/target/*.jar"
    reports:
      junit:
        - "$MODULE_PATH/target/surefire-reports/TEST-*.xml"
        - "$MODULE_PATH/target/failsafe-reports/TEST-*.xml"
      coverage_report:
        coverage_format: jacoco
        path: "$MODULE_PATH/target/site/jacoco/jacoco.xml"
    expire_in: 1 hour
  coverage: '/Coverage: (\d+(?:\.\d+)?)%/'

  • when: Set to always so that artifacts are uploaded even on failure, allowing debugging via HTML reports.

  • paths, exclude: Target specific files for upload. Since fat-jar files can be massive, excluding them prevents 413 Request Entity Too Large errors.

  • reports: Ensures all relevant reports are submitted to provide coverage data in the GitLab Pipeline UI and the MR widget.

Sharing artifacts across jobs ensures that build products (like /classes) can be reused in subsequent steps, such as jib:build. This significantly reduces container build times by skipping redundant source code re-compilation.

Artifacts across parent-child Pipeline

One of the biggest hurdles in Parent-Child pipelines is that child pipelines run in isolated workspaces. Standard needs: artifacts: true cannot pull files from a triggered child pipeline back into the root orchestrator.

To solve this for our aggregated reports, I implemented an ID-based API Collection pattern:

  1. Bridge API: Query the parent pipeline's bridges to find the downstream_pipeline.id.

  2. Jobs API: Query that child pipeline to find the specific build job ID.

  3. Artifact Download: Use a Project Access Token to download the artifact zip directly via the API.

Why a Project Access Token (PAT)? Because GitLab's standard $CI_JOB_TOKEN is restricted for security and often cannot cross the pipeline boundary. A PAT with read_api scope ensures our aggregator has the necessary authority.

# Simplified aggregation logic in root orchestrator
for module in mmm-core mmm-search; do
  # 1. Resolve Child Pipeline ID
  CHILD_PIPELINE_ID=$(curl --silent --header "PRIVATE-TOKEN: ${PAT_TOKEN}" \
    "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/pipelines/${CI_PIPELINE_ID}/bridges" \
    | jq -r ".[] | select(.name==\"trigger-${module#mmm-}\") | .downstream_pipeline.id")

  # 2. Resolve specific Job ID (jobs are listed per pipeline within the project)
  CHILD_JOB_ID=$(curl --silent --header "PRIVATE-TOKEN: ${PAT_TOKEN}" \
    "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/pipelines/${CHILD_PIPELINE_ID}/jobs" \
    | jq -r ".[] | select(.name==\"build-${module#mmm-}\") | .id")

  # 3. Securely Download
  curl --location --header "PRIVATE-TOKEN: ${PAT_TOKEN}" \
    "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/jobs/${CHILD_JOB_ID}/artifacts" \
    --output "${module}.zip"
done

Independent Child Pipelines

With .gitlab-ci-base.yml providing the blueprint for all child modules, the child pipelines are simple to set up and configure.

# project/mmm-core/.gitlab-ci.yml
include:
  - local: '.gitlab-ci-base.yml'

variables:
  MODULE_PATH: "project/mmm-core"

build-core:
  extends: .application-template

deploy-core-snapshot:
  extends: .deploy-snapshot-template
  needs: ["build-core"]

deploy-core-image:
  extends: .deploy-image-template
  needs: ["build-core"]

deploy-manifests-prod:
  extends: .deploy-cluster-template
  needs: ["deploy-core-image"]
  resource_group: core-deploy-prod
  environment:
    name: prod
  variables:
    OVERLAY_PATH: "project/mmm-core/k8s/overlays/prod"
    DEPLOYMENT_NAME: "mmm-core"

Deployment Safety (resource_group)

In a monorepo, multiple modules often deploy to the same namespace. If two pipelines trigger simultaneously, they might both try to run kubectl apply for the same module, leading to race conditions.

To prevent this, I implemented module-specific Resource Groups as a CI-level mutex:

deploy-manifests-prod:
  extends: .deploy-cluster-template
  resource_group: core-deploy-prod
  environment:
    name: prod

By using the naming convention [module]-deploy-[env]:

  • Serialization: Only one pipeline can deploy mmm-core to prod at a time.

  • Concurrency: mmm-core and mmm-search can still deploy to prod simultaneously because they use different locks.

Shift Left Security

GitLab provides comprehensive built-in security scanning templates that you can adopt quickly and easily.

Shift left security means building security testing, compliance checks, and secure coding practices into the earliest phases of the software development life cycle (SDLC).

Source: https://about.gitlab.com/topics/devsecops/shift-left-security/

# .gitlab-ci.yml
include:
  - local: '.gitlab-ci-base.yml'
  - template: Jobs/SAST.gitlab-ci.yml
  - template: Jobs/Secret-Detection.gitlab-ci.yml
  - template: Security/Dependency-Scanning.gitlab-ci.yml

stages:
  - ...
  - test # setup to run GitLab built-in scans
  - report

That's it! You can override individual jobs if you want to, but that is optional, and some settings can be changed through global variables.
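
As a rough illustration of both options, the fragment below adjusts a scanner through a global variable and narrows when one scan job runs. SAST_EXCLUDED_PATHS and the semgrep-sast job name are taken from GitLab's SAST template as I understand it; check them against the template shipped with your GitLab version before relying on this.

```yaml
# Sketch only: tune the included scanners without copying the templates.
variables:
  # Skip build output in addition to the usual test folders.
  SAST_EXCLUDED_PATHS: "spec, test, tests, tmp, target"

# Override a single scan job; here it is limited to MRs and the default branch.
semgrep-sast:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```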

<properties>
	<sonar.version>5.5.0.6356</sonar.version>
	<sonar.projectKey>bwgjoseph:${project.artifactId}</sonar.projectKey>
	<sonar.projectName>bwgjoseph:${project.artifactId}</sonar.projectName>
	<!-- Not necessary for self-hosted -->
	<sonar.organization>bwgjoseph</sonar.organization>
	<sonar.coverage.jacoco.xmlReportPaths>${project.build.directory}/site/jacoco/jacoco.xml</sonar.coverage.jacoco.xmlReportPaths>

	<!-- SonarQube Scanner Properties -->
	<sonar.scanner.skipSystemTruststore>true</sonar.scanner.skipSystemTruststore>
	<sonar.scanner.skipJreProvisioning>true</sonar.scanner.skipJreProvisioning>
	<!-- Java 25 support: Allow Sonar plugins (like IaC) to access restricted native methods -->
	<sonar.scanner.javaOpts>--enable-native-access=ALL-UNNAMED</sonar.scanner.javaOpts>
	<!-- Use CI project dir for scanner home if available, fallback to local .sonar -->
	<sonar.userHome>${env.CI_PROJECT_DIR}/.sonar</sonar.userHome>
</properties>

To ensure independent Quality Gates in SonarQube, I enforced unique sonar.projectKey values for each module (e.g., bwgjoseph:${project.artifactId}). Without this, every module analysis would overwrite the previous one in the Sonar dashboard!

The Extras - Expert Mode

To take a pipeline from "Functional" to "Enterprise-Ready," I added these high-impact features:

  • The Engineering Portal: I used GitLab Pages to host a unified site for aggregated JaCoCo coverage at /coverage and Maven Documentation at /site. I added a "Gateway Index" in CI that dynamically builds a landing page to navigate between them.

  • Human-Friendly Triggers: I leveraged GitLab's variables:options to create a dropdown menu in the UI. Now, anyone on the team can manually trigger a full build without needing to remember CLI flags.

  • The Ghost Aggregator: I introduced a "code-free" module (mmm-report) purely for aggregation. It provides a clean target for our ID-based artifact collection and prevents our functional modules from being cluttered with aggregation logic.

  • Inner-Loop Development: Validate your complex parent-child YAMLs locally using gitlab-ci-local. This shaves hours off the debugging cycle by allowing you to run jobs directly on your machine.
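
For the Human-Friendly Triggers item above, a dropdown for CI_FULL_PIPELINE could be declared roughly like this. It is a sketch rather than the exact configuration from my pipeline, and the description text is made up:

```yaml
# Sketch: expose CI_FULL_PIPELINE as a dropdown when running a pipeline manually.
variables:
  CI_FULL_PIPELINE:
    value: "false"            # pre-selected value, must be one of the options
    options:
      - "false"
      - "true"
    description: "Set to true to build every module regardless of detected changes."
```

With this in the root .gitlab-ci.yml, the manual pipeline form offers the two values to pick from instead of a free-text field.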

Conclusion

At the end of the day, a CI/CD pipeline is only as good as the developer experience it provides. In this redesign, our goal was to make the monorepo feel "small" again—ensuring that a developer working on a single module isn't burdened by the weight of the entire project.

By automating the complex bits—like ID-based artifact collection and cross-pipeline reporting—and providing human-friendly tools like UI-driven triggers and local validation, I’ve built a system that stays out of the way while providing a safety net of "Shift Left" security.

Scaling a monorepo parent-child pipeline isn't just about build commands; it's about orchestration, visibility, and developer experience. Hopefully, these patterns help you (and the future me!) build better pipelines.

Source Code

As usual, the full source code is available on GitHub