
Most engineering teams will confidently tell you they practice CI/CD. They have pipelines. They have green builds. They have deployment workflows. What many of them have actually built, however, is the visual language of CI/CD without the underlying discipline — an automated costume worn over what is, in practice, manual batch integration and fragile, human-gated delivery. The pipelines exist, but they have been quietly hollowed out by a collection of structural habits that undermine their core purpose: providing fast, trustworthy, automated confidence that software is ready to ship.
This problem is more widespread than it might seem. The DORA State of DevOps Report 2023 continues to show a significant gap between elite-performing engineering organizations and the rest of the field. Elite teams deploy multiple times per day, recover from incidents in under an hour, and maintain change failure rates below five percent. Low-performing teams deploy monthly or less, take days or weeks to recover, and experience failure rates that make releases a dreaded event rather than a routine one. The technical practices that separate these two groups are not mysterious — they are well-documented. What is missing is recognition of the anti-patterns that quietly erode those practices from within, even when the pipelines are running and the dashboards are green.
This article names some of the most common structural anti-patterns that corrupt CI/CD pipelines — turning them into slow, untrustworthy, or purely cosmetic processes — and provides evidence-backed remedies for each. The target reader is a mid-to-senior engineer or engineering lead who has inherited or built a pipeline they suspect is lying to them. The goal is not to prescribe a specific toolchain but to identify the underlying dysfunctions that cause otherwise well-intentioned pipelines to fail at their primary job.
Before naming specific anti-patterns, it is worth stepping back and questioning the premise that most teams are doing CI/CD at all. In Martin Fowler's foundational 2006 article "Continuous Integration," the definition is precise: every developer integrates their work into the mainline at least once per day, and each integration is verified by an automated build that completes in under ten minutes. That is the bar. Not weekly merges. Not nightly builds. Not "we have a pipeline that runs when you open a pull request." At least daily integration into a shared mainline, with a fast automated verification.
By that definition, a large share of teams practicing what they call CI are actually practicing something closer to scheduled batch integration. Builds triggered only on pull request merge, or nightly on a cron schedule, are accumulating integration debt between runs. The longer two developers work in isolation, the more divergent their changes become, and the more expensive the eventual merge. Think of it like financial debt: the interest compounds. A merge that would have taken ten minutes if done yesterday might take three hours today, and might require architectural renegotiation if left for a week. The batch integration model optimizes for the comfort of isolation at the cost of the compounding pain of late integration — a classic local-optimization trap.
The DORA 2023 research reinforces this with data. Deployment frequency is the leading predictor of organizational performance in software delivery, and elite teams deploy on-demand, multiple times per day. Nicole Forsgren, Jez Humble, and Gene Kim demonstrated in Accelerate (2018) that high integration frequency is causally linked to lower change failure rates and faster mean time to recovery — not merely correlated with them. The causal direction matters here: it is not that high-performing teams happen to integrate frequently because they have low failure rates. The frequent integration is itself a mechanism that produces lower failure rates by surfacing conflicts and regressions when they are still small and cheap to fix. This reframing sets the stage for everything that follows. The anti-patterns below are all, in some form, mechanisms by which teams accidentally reintroduce batch integration behavior into pipelines that look continuous on the surface.
The fix: Establish a team norm that CI means every developer commits to the main branch — or a short-lived branch with a lifetime under one day — with a passing build before the end of each working day. Make this explicit, not aspirational.
The monolithic pipeline is one of the most common structural failures in CI/CD, and one of the most damaging. It takes the form of a single, sequential pipeline where every step — compilation, unit tests, integration tests, end-to-end tests, security scans, deployment — runs one after another in a single queue. A failure at any point blocks the entire pipeline for every engineer on the team. One flaky integration test, one slow end-to-end suite, one environment hiccup in a downstream service, and fifty people are blocked from shipping anything.
Jez Humble and Dave Farley described the antidote to this in Continuous Delivery (2010) through the concept of the Deployment Pipeline: a series of stages, each designed to provide fast feedback at progressively higher cost. The commit stage — compile, lint, unit tests — should complete in under five minutes and give developers near-immediate signal. The acceptance stage — integration tests, contract tests — runs after the commit stage passes. Performance and security stages run independently, in parallel where possible, triggered by the same artifact. Each stage provides a quality gate appropriate to its cost, and expensive stages do not block cheap ones unnecessarily. The monolithic pipeline collapses all of this into a single queue, destroying the early-feedback principle that makes CI valuable in the first place.
Gene Kim, Jez Humble, Patrick Debois, and John Willis identify pipeline blockage in The DevOps Handbook (2016) as a constraint on flow — a concept borrowed from the Theory of Constraints and the First Way of DevOps. When one failure can block the entire value stream, the pipeline becomes a bottleneck that amplifies rather than absorbs disruption. The DORA 2023 data shows that teams with modular, parallelized pipelines achieve lead times for changes under one hour; monolithic pipelines are a structural predictor of membership in the lower-performing cohorts. This is also an application of Amdahl's Law to software delivery: the sequential portion of a pipeline bounds the maximum throughput improvement achievable through other optimizations.
The fix: Split the pipeline into a fast commit stage covering unit tests, linting, and compilation — targeting under five minutes — and a separate, slower acceptance stage covering integration and end-to-end tests. Run test suites in parallel across shards where possible. Fail fast on the cheapest and most deterministic signals first, and reserve expensive stages for changes that have already passed the cheaper gates.
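The staged, fail-fast structure can be sketched as a small runner. This is an illustrative model, not a real CI configuration: stage and step names are invented, and real steps would shell out to the team's actual build tooling rather than call Python functions.

```python
# A minimal sketch of a staged, fail-fast pipeline runner. Stage and step
# names are illustrative; real steps would invoke the team's build tools.

def run_pipeline(stages):
    """Run stages in order; within a stage, run steps in order.

    Returns (stage_name, step_name) of the first failure, or None if
    everything passed. Later stages never run after a failure, so the
    cheap commit stage shields the expensive acceptance stage.
    """
    for stage_name, steps in stages:
        for step_name, step in steps:
            if not step():
                return (stage_name, step_name)
    return None

# Example wiring: cheap, deterministic checks gate the expensive suites.
pipeline = [
    ("commit", [("lint", lambda: True), ("unit", lambda: True)]),
    ("acceptance", [("integration", lambda: False), ("e2e", lambda: True)]),
]
```

The point of the shape is the ordering guarantee: an expensive end-to-end suite never consumes capacity on a change that could have been rejected by a five-second lint step.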
Perhaps the most insidious anti-pattern is the one where the pipeline is always green and the team has quietly learned not to trust it. Tests pass but do not exercise meaningful behavior. Coverage metrics are reported but gamed by tests that assert nothing useful. Flaky tests are skipped, suppressed, or silently excluded from the required check suite. The build passes every time, and the artifact it produces is not reliably deployable. The pipeline has become confidence theater — the appearance of verification without its substance.
Martin Fowler's CI principles are unambiguous on this point: a passing build must represent a genuine deployable candidate. The purpose of the automated build is not to confirm that the code compiles. It is to confirm that the software behaves correctly. Tests that exercise code paths without verifying behavior are not tests in any meaningful sense — they are structured noise that consumes CI time while providing no signal. Continuous Delivery (Humble and Farley, 2010) draws a sharp distinction between confidence-building stages, which actually gate deployment on verified behavior, and cosmetic stages, which pass regardless of whether the software works. Pipelines with suppressed or permanently skipped tests have drifted from the former into the latter, eroding the team's ability to trust the feedback signal that CI is supposed to provide.
Google's Site Reliability Engineering book defines toil as manual, repetitive work with no lasting value. Treating flaky tests as a permanent fact of life — re-running the pipeline and hoping for green — is exactly this kind of toil. It is work that consumes time, degrades confidence, and produces no lasting improvement. Accelerate (Forsgren et al., 2018) identifies test reliability as a key differentiator between elite and low-performing teams: high performers maintain test suites they trust enough to deploy against; low performers maintain test suites they work around. The gap is not primarily technical. It is a matter of organizational discipline around the treatment of test failures.
The fix: Treat a flaky test as a P1 incident, not a known issue. Enforce a policy that any test skipped or suppressed in CI must have a tracked defect with an assigned owner and a fix deadline. Separate fast, deterministic unit tests from slower integration tests to protect the commit-stage signal from the noise of environmental instability. If a test cannot be made reliable, quarantine it in a separate suite that runs without blocking the build, log its failures automatically, and hold the owning team accountable for resolution within a defined sprint.
A particularly damaging pattern that has become deeply embedded in many Git-centric workflows is the use of long-lived environment branches — typically dev, staging, main, and release — as a proxy for environment promotion. The model works like this: code merged to dev triggers a deployment to the development environment. Code merged to staging triggers a deployment to staging. Code merged to main or release triggers a deployment to production. Each environment branch triggers its own build from its own source tree.
The problem with this model is that it violates what Humble and Farley call one of the most fundamental principles of Continuous Delivery: build your binaries only once. The artifact that passes the commit stage should be the exact artifact promoted through acceptance, performance, and production — with environment-specific differences handled through configuration injection at runtime, not through rebuilding from a different branch. When each environment triggers a separate build, the artifact deployed to staging was compiled from different source code than the one deployed to production. That means the artifact that passed all your tests is not the one being shipped to users. The entire testing pipeline becomes a proxy for a confidence you never actually established.
This is not a theoretical concern. Rebuilding per-environment introduces real, silent risks: differences in dependency versions resolved at build time, differences in compiler flags or build tool behavior, differences in environment variables injected during the build rather than at runtime. These differences make staging an unreliable proxy for production, and they are notoriously difficult to diagnose because the failure mode is not a build error — it is a subtle behavioral difference that only appears in production. Martin Fowler's Feature Toggles pattern (2016) provides the practical mechanism that makes build-once-deploy-many achievable: incomplete or environment-specific features are hidden behind flags rather than held in a branch, allowing a single artifact to traverse all environments safely. The DevOps Handbook (Kim et al., 2016) identifies environment-specific builds as a leading cause of the "works on my machine" failure mode that is expensive to diagnose after the fact.
The fix: Version and store build artifacts — Docker images, JAR files, compiled binaries — in an artifact registry immediately after the commit stage passes. Promote the same immutable artifact through each environment by injecting environment-specific configuration at runtime through environment variables, secrets managers, or config maps. Use feature flags to control feature availability independently of the deployment event. The artifact should never be rebuilt for promotion — only its runtime configuration should change.
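A minimal sketch of the runtime-injection side, assuming environment variables as the delivery mechanism. The variable names are illustrative; the pattern is what matters — required values fail loudly at startup, and feature availability is a runtime flag rather than a build-time decision.

```python
import os

# Sketch of runtime configuration injection: the same immutable artifact
# reads environment-specific settings at process start instead of having
# them baked in at build time. Variable names here are illustrative.
class Config:
    def __init__(self, environ=None):
        environ = os.environ if environ is None else environ
        # Required values fail loudly at startup, so a misconfigured
        # environment is caught at deploy time, not at first request.
        self.database_url = environ["DATABASE_URL"]
        # Optional values fall back to safe defaults.
        self.log_level = environ.get("LOG_LEVEL", "INFO")
        # Feature availability is controlled at runtime, independently of
        # the deployment event.
        self.feature_new_checkout = environ.get("FEATURE_NEW_CHECKOUT", "false") == "true"
```

Because every environment-specific value arrives this way, staging and production run bit-identical artifacts, and the only delta between them is the injected configuration.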
CI/CD pipelines are automated systems, but their value depends entirely on the organizational controls that ensure developers cannot bypass them. A repository without branch protection rules is a repository where the pipeline's guarantees are optional. Any developer can push directly to main, bypassing tests entirely. Any developer can force-push and rewrite history, making the audit trail unreliable. Any developer can merge a pull request without review, circumventing the code quality gate. The pipeline runs — it just runs after the damage is done, or not at all.
A protected branch policy on a production or shared integration branch should enforce a specific set of conditions: required status checks, so that all CI pipeline gates must pass before merge is permitted; required pull request reviews from at least one or two team members; dismissal of stale reviews when new commits are pushed, so that an approval cannot outlive the code it approved; code owner review for changes to critical paths; and a requirement that the branch be up to date with its target before merging, so that the merged result is never a combination of changes that was never actually tested. These rules are not bureaucratic overhead — they are the organizational mechanism that makes the pipeline's guarantees structural rather than voluntary.
Humble and Farley frame this in Continuous Delivery as a question of organizational controls: automation alone is not sufficient to produce reliable software delivery; the human processes around the automation must be engineered with equal care. DORA 2023 data reinforces this directly, showing that elite teams combine automation with organizational controls. Branch protection rules are the enforcement mechanism for trunk-based development discipline and code review rigor — without them, both practices rely on individual developer judgment under deadline pressure, which is a fragile foundation.
The fix: Enforce branch protection rules on all production and shared integration branches. Require all status checks to pass. Require pull request reviews from a minimum of one to two reviewers. Enable stale review dismissal so that new commits invalidate previous approvals. Treat branch protection as non-negotiable repository configuration, not as a preference or a trust-based decision.
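This policy can be kept as code rather than clicked into a settings page. The sketch below builds the JSON payload for GitHub's branch protection endpoint (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`); the status check names and review count are example policy choices, and the authenticated API call itself is omitted.

```python
# Sketch of a branch protection policy as the payload for GitHub's branch
# protection endpoint. Check names are examples; an authenticated client
# would PUT this payload to the repository's protected branch.
def branch_protection_payload(required_checks, reviewers=1):
    return {
        # All listed CI checks must pass, and the branch must be up to
        # date with its target before merging ("strict").
        "required_status_checks": {"strict": True, "contexts": required_checks},
        "enforce_admins": True,  # no bypass for administrators
        "required_pull_request_reviews": {
            "dismiss_stale_reviews": True,       # new commits invalidate approvals
            "require_code_owner_reviews": True,  # critical paths need owner sign-off
            "required_approving_review_count": reviewers,
        },
        "restrictions": None,  # no push restrictions beyond the rules above
    }

payload = branch_protection_payload(["commit-stage", "acceptance-stage"])
```

Expressing the policy this way also makes it auditable and reproducible across repositories, which a hand-configured settings page is not.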
Long-lived branches are integration debt in the most literal sense. Every day a branch remains unmerged, the trunk continues to evolve. Conflicts accumulate. The branch's assumptions about the state of shared code become progressively less accurate. Merging a branch that has been open for two weeks is not twice as hard as merging one open for one week — it is often an order of magnitude harder, because the conflicts are not just additive. They interact with each other in ways that require understanding the intent behind changes made by other developers in the interim.
Paul Hammant's canonical reference on trunk-based development at trunkbaseddevelopment.com defines short-lived feature branches as branches with a lifetime of less than one day before merging to trunk. Branches lasting longer than a few days represent a warning sign; branches lasting weeks are the integration equivalent of a big-batch waterfall release. Accelerate (Forsgren et al., 2018) identifies trunk-based development — defined specifically as fewer than three active branches with branch lifetimes under a day — as one of the strongest engineering predictors of delivery performance among elite teams. This is not a stylistic preference. It is a structural property of high-performing delivery systems.
Beyond the merge complexity, repositories that accumulate stale branches develop a discoverability problem. Which branches are active? Which were merged months ago and never cleaned up? Which represent abandoned experiments? Manual cleanup is unreliable because developers, reasonably, err on the side of caution when deciding whether a branch is safe to delete. The result is a repository cluttered with dozens or hundreds of branches, none of which can be quickly interpreted without context. This cognitive overhead compounds over time, and it is entirely avoidable with automation.
The fix: Enforce automated branch deletion on merge — this is supported natively in GitHub, GitLab, and Bitbucket and takes one configuration toggle to enable. Implement a scheduled pipeline job or repository maintenance bot that flags branches with no activity in fourteen or more days for review. Adopt branch naming conventions that include ticket IDs, which enables automated lifecycle management tied to issue state. Establish a team norm of branch lifetimes under twenty-four hours, and treat any branch older than a few days as a conversation to have in the next standup.
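The scheduled staleness check is a small amount of logic. A sketch, assuming the hosting platform's API supplies each branch's last-commit timestamp; the branch names follow the ticket-ID convention suggested above and are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Sketch of the scheduled stale-branch check: flag branches with no
# activity in `days` days, given last-commit timestamps as a hosting
# platform's API would report them. Branch names are illustrative.
def stale_branches(last_activity, days=14, now=None):
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return sorted(name for name, ts in last_activity.items() if ts < cutoff)

branches = {
    "JIRA-101-checkout-fix": datetime(2024, 1, 2, tzinfo=timezone.utc),
    "JIRA-202-search-spike": datetime(2024, 3, 1, tzinfo=timezone.utc),
}
```

A maintenance bot would run this on a schedule and open a review task (or a deletion PR) for each flagged branch, turning cleanup from a judgment call into a routine.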
Somewhere between the acceptance stage and production, many pipelines include a step where a human being must click "Approve" before the deployment proceeds. The intention is understandable: production deployments are consequential, and having a human in the loop feels like a responsible safeguard. In practice, these gates frequently become what Humble and Farley call pure latency — delays that consume lead time without contributing any additional confidence.
The dynamic plays out in a familiar way. The pipeline has run. All tests pass. The deployment to staging was successful three days ago. The engineer who is supposed to approve the production deployment is in meetings. Or they trust the pipeline and approve without reviewing anything substantive because they do not have the context to make a meaningful judgment. Or they rubber-stamp the approval to unblock the team, which they feel pressure to do quickly. The gate, in each of these scenarios, adds hours or days of queue time while providing no marginal safety improvement over what the automated pipeline already established.
Humble and Farley draw an important distinction between Continuous Delivery — every commit is releasable, and the decision to deploy to production is a business decision made intentionally — and Continuous Deployment, where every passing commit is released automatically. In both models, manual approval gates that are not tied to a specific compliance requirement or a genuine risk review are an anti-pattern. They do not improve quality. They destroy lead time. Google's SRE book describes the ideal of "pushing a button" deployments as achievable only when the pipeline provides automated confidence sufficient that human judgment adds no marginal value. DORA 2023 data shows elite teams achieve change lead times under one hour — manual approval queues are structurally incompatible with that benchmark. The DevOps Handbook (Kim et al., 2016) distinguishes between peer-reviewed changes — pull request review before commit — and post-commit manual gates: the former improves quality; the latter delays delivery without improving it.
The fix: Replace manual gates with automated quality signals: passing test suites, code coverage thresholds, security scan results, and performance benchmarks. If a human must approve a deployment, give them a dashboard of those signals, not a queue notification. Reserve human approval for genuine compliance checkpoints — SOC 2 change management controls, regulated industry sign-offs, or explicitly risk-classified changes — not as a general safety buffer applied to every release. The goal is to make automated signals so trustworthy that the team does not feel they need a human check as a fallback.
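A gate built from automated signals can be as simple as an explicit policy function that either clears the deployment or names every reason it is blocked. The signal names and thresholds below are illustrative policy, not a prescription.

```python
# Sketch of a deployment gate driven by automated quality signals rather
# than a human approval queue. Signal names and thresholds are examples.
def deploy_decision(signals, coverage_floor=0.80):
    """Return (approved, reasons): approved is True only when every
    signal clears its threshold; reasons lists each failing signal."""
    reasons = []
    if not signals["tests_passed"]:
        reasons.append("test suite failing")
    if signals["coverage"] < coverage_floor:
        reasons.append(f"coverage below {coverage_floor:.0%}")
    if signals["critical_vulns"] > 0:
        reasons.append("unresolved critical vulnerabilities")
    if not signals["perf_within_budget"]:
        reasons.append("performance regression")
    return (len(reasons) == 0, reasons)
```

The `reasons` list is the dashboard: when a human does look at a blocked deployment, they see exactly which signal failed instead of an undifferentiated "Approve" button.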
One anti-pattern deserves special attention because it is so commonly rationalized as acceptable. Flaky tests — tests that pass and fail non-deterministically on the same code — are not a minor inconvenience. They are a structural failure of the pipeline's feedback mechanism, and treating them as an ambient condition rather than a defect is one of the fastest ways to erode a team's trust in CI.
The "broken windows" principle from software craftsmanship applies directly here. One tolerated flaky test signals to the team that the pipeline is not a reliable environment. That signal, once internalized, encourages further degradation: tests suppressed rather than fixed, re-run policies that mask systemic failures, and eventually a culture where "re-run the build and see if it passes" is standard operating procedure. Google's Testing Blog published research in 2016 finding that even a one-percent flakiness rate across a large test suite produces meaningful developer friction and erodes trust in the CI signal. Google treats test flakiness as a first-class reliability problem — not because it is philosophically tidy, but because the operational impact of flaky tests compounds at scale. Continuous Delivery (Humble and Farley, 2010) is explicit on this point: automated tests must be deterministic. The same code must produce the same result on every run. A nondeterministic test provides no useful signal about software correctness and should be quarantined immediately.
The DORA research links test reliability to deployment confidence in a way that has direct business consequences. Teams that do not trust their tests deploy less frequently, with more ceremony, and with more anxiety than teams that do. That anxiety is not irrational — it is a rational response to operating under an unreliable feedback system. Improving test reliability is therefore not just a quality-of-life improvement for developers; it is a prerequisite for achieving the deployment frequency and lead time benchmarks that characterize high-performing delivery organizations.
The fix: Implement a flaky test quarantine policy. Any test that fails non-deterministically is moved to a quarantine suite that still runs — and whose failures are still logged and reported — but does not block the main build. The test is automatically assigned to an owner with a fix deadline of no more than one sprint. Flakiness rate should be tracked as a pipeline health metric alongside build duration, test coverage, and deployment frequency. Never silently skip or suppress a flaky test. Suppression hides the symptom; quarantine exposes it and creates accountability for resolution.
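The flakiness-rate metric falls directly out of run history. A sketch, under the assumption that a test counts as flaky on a commit when reruns of identical code produced both a pass and a fail:

```python
from collections import defaultdict

# Sketch of flakiness as a pipeline health metric: a test is flaky on a
# commit when reruns of the same code produced both outcomes; its rate is
# flaky commits divided by commits observed.
def flakiness_rates(runs):
    """runs: iterable of (test_name, commit_sha, passed) tuples."""
    outcomes = defaultdict(set)   # (test, sha) -> {True, False} seen
    commits = defaultdict(set)    # test -> shas observed
    flaky = defaultdict(set)      # test -> shas with both outcomes
    for test, sha, passed in runs:
        outcomes[(test, sha)].add(passed)
        commits[test].add(sha)
        if outcomes[(test, sha)] == {True, False}:
            flaky[test].add(sha)
    return {t: len(flaky[t]) / len(commits[t]) for t in commits}
```

Any test whose rate rises above zero is a quarantine candidate with an owner and a deadline; tracking the number over time shows whether the quarantine policy is actually working.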
Each of the anti-patterns described in this article is, at its root, a degradation of the same thing: the pipeline's value as a trust instrument. The primary output of a well-functioning CI/CD pipeline is not a Docker image or a JAR file. It is a confidence signal — a machine-verified assertion that the software in this artifact, built from this commit, behaves correctly and is safe to deploy. Every anti-pattern in this article degrades that signal: by making it slow, by making it unreliable, by making it bypassable, or by making it technically green while carrying no real confidence about correctness.
The DORA 2023 elite performer profile gives a concrete benchmark: deploy on-demand, multiple times per day; change lead time under one hour; change failure rate under five percent; mean time to restore under one hour. These numbers are not aspirational fantasy. They are the documented operating characteristics of real engineering organizations. What separates elite teams from the rest is not primarily talent or resources — Accelerate (Forsgren et al., 2018) demonstrates that the differentiator is a specific set of technical practices, capabilities that can be built incrementally by any team willing to examine and correct the structural habits that are working against them.
The practical benchmark for a healthy pipeline looks like this: the commit stage runs in under ten minutes and produces a single versioned, immutable artifact stored in a registry. That artifact is promoted through environments by injecting runtime configuration, never by rebuilding. Acceptance and integration tests run in parallel in a separate stage, against the same artifact, in isolated environments that are themselves version-controlled. Branch protection rules enforce that no code reaches the main branch without passing all required checks and receiving a code review. Feature flags enable any commit to traverse all environments safely, regardless of whether the underlying feature is ready for users. And flakiness, when it appears, is treated as a defect with an owner and a deadline — not as a fact of life.
The path to this state does not require a complete pipeline rewrite or a months-long initiative. Most teams can begin by fixing the cheapest and highest-leverage item on this list: enabling automated branch deletion on merge, enforcing branch protection rules, or splitting a monolithic pipeline into a fast commit stage and a slower acceptance stage. The compounding returns on each improvement make it worthwhile to move incrementally. The goal is a pipeline that the team trusts enough to deploy from on a Friday afternoon without fear — because the confidence signal it produces has been earned, not assumed.
Books
Continuous Delivery by Jez Humble and Dave Farley (Addison-Wesley, 2010). https://continuousdelivery.com/
Accelerate: The Science of Lean Software and DevOps by Nicole Forsgren, Jez Humble, and Gene Kim (IT Revolution, 2018). https://itrevolution.com/accelerate-book/
The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, and John Willis (IT Revolution, 2016). https://itrevolution.com/the-devops-handbook/
Site Reliability Engineering by Google (O'Reilly, 2016). https://sre.google/sre-book/table-of-contents/
Articles and References
"Continuous Integration" by Martin Fowler (martinfowler.com, 2006). https://martinfowler.com/articles/continuousIntegration.html
"Feature Toggles (aka Feature Flags)" by Martin Fowler (martinfowler.com, 2016). https://martinfowler.com/articles/feature-toggles.html
"BranchByAbstraction" by Martin Fowler (martinfowler.com, 2014). https://martinfowler.com/bliki/BranchByAbstraction.html
Trunk Based Development by Paul Hammant (trunkbaseddevelopment.com). https://trunkbaseddevelopment.com/
"Flaky Tests at Google and How We Mitigate Them" — Google Testing Blog (2016). https://testing.googleblog.com/2016/05/flaky-tests-at-google-and-how-we.html
DORA Research Program and State of DevOps Report 2023 (dora.dev). https://dora.dev/