Performance Testing in CI/CD

Annual big-bang performance tests find regressions months after the commit that caused them. Pipeline-integrated testing finds them at merge time, while the diff is one screen of code.

The test pyramid, for performance

Tier     | Trigger                 | Duration  | Scope
Smoke    | Every merge             | 2–5 min   | Critical endpoints, modest fixed load, threshold-gated
Baseline | Nightly                 | 30–60 min | Core workload model at steady load, trend-tracked
Full     | Per release / scheduled | Hours     | Complete workload model, stress & soak variants

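The smoke tier is the highest-value, lowest-cost addition most teams can make: a 3-minute k6 job that fails the build when a threshold is breached catches the worst regressions at almost no cost. A typical threshold is p(95)<500, which on k6's request-duration metric means the 95th-percentile latency must stay under 500 ms.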
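k6 evaluates thresholds natively, so the following is not how a k6 job is written. It is a minimal Python sketch of the gate logic behind a p(95)<500 threshold. The function names, the nearest-rank percentile method and the sample handling are assumptions made for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples to evaluate")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def smoke_gate(latencies_ms, p95_limit_ms=500):
    """Return (passed, observed_p95) for request latencies in milliseconds.

    The 500 ms default mirrors the p(95)<500 example; the limit itself
    is whatever the team has agreed on.
    """
    p95 = percentile(latencies_ms, 95)
    return p95 < p95_limit_ms, p95


# Hypothetical samples: 18 fast requests and 2 slow ones.
passed, p95 = smoke_gate([100] * 18 + [900, 900])
print(f"p95={p95} ms, passed={passed}")
# A CI wrapper would exit non-zero when `passed` is False to fail the build.
```

With two slow requests out of twenty, the 95th percentile lands on a slow sample and the gate fails. With only one slow request it would pass, which is why a percentile gate tolerates isolated outliers.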

Budgets as code

Performance budgets live in the repository next to the code they constrain — reviewed, versioned and enforced like any other test. A budget change is a visible, deliberate decision in a pull request, not a silent drift.
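As a rough illustration of a budget stored as data and enforced like a test, the sketch below loads per-endpoint limits from JSON and reports every measurement that exceeds them. The endpoint names, metric names, numbers and the idea of a JSON budget file are all hypothetical; a real project would use whatever format its load tool and pipeline already understand.

```python
import json

# Hypothetical budget content. In a repository this would live in a
# reviewed, versioned file next to the code (file format is an assumption).
BUDGET_JSON = """
{
  "GET /checkout": {"p95_ms": 500, "error_rate": 0.01},
  "GET /search":   {"p95_ms": 300, "error_rate": 0.01}
}
"""


def check_budgets(budgets, measured):
    """Return a list of violation messages; an empty list means the gate passes."""
    violations = []
    for endpoint, limits in budgets.items():
        results = measured.get(endpoint)
        if results is None:
            # A budgeted endpoint with no data is treated as a failure,
            # so a silently dropped test cannot pass the gate.
            violations.append(f"{endpoint}: no measurement")
            continue
        for metric, limit in limits.items():
            value = results.get(metric)
            if value is None:
                violations.append(f"{endpoint}: {metric} not measured")
            elif value > limit:
                violations.append(
                    f"{endpoint}: {metric}={value} exceeds budget {limit}"
                )
    return violations


budgets = json.loads(BUDGET_JSON)
# Hypothetical results from one pipeline run.
measured = {
    "GET /checkout": {"p95_ms": 430, "error_rate": 0.002},
    "GET /search": {"p95_ms": 345, "error_rate": 0.0},
}
violations = check_budgets(budgets, measured)
for violation in violations:
    print(violation)
```

Because the limits are plain data under version control, raising the search budget from 300 to 350 would show up as a one-line diff that a reviewer has to approve.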

Making pipeline results trustworthy

Stable environments: noisy shared environments produce flaky gates that teams learn to ignore. Dedicated (if modest) performance environments, or at minimum consistent container resources, are a precondition.

Relative comparison: nightly tiers compare against a rolling baseline of recent runs rather than absolute targets, flagging statistically significant drift. This tolerates environment differences while still catching regressions.

Trend dashboards: per-build latency and throughput trends make slow degradation visible across weeks, the kind no single gate catches.
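One simple way to approximate the relative comparison and trend ideas above is sketched below, using only the Python standard library. A z-score against recent runs stands in for "statistically significant drift", and a least-squares slope stands in for a dashboard trend line. Both are illustrative choices, and the function names, the limit of 3 standard deviations and the minimum of 5 runs are assumptions rather than recommendations from the text.

```python
from statistics import mean, stdev


def drift_check(baseline_runs, current, z_limit=3.0, min_runs=5):
    """Compare the current run's metric with a rolling baseline of recent runs.

    Returns (regressed, z_score). Only drift toward higher (slower) values
    is flagged, which suits latency-style metrics.
    """
    if len(baseline_runs) < min_runs:
        raise ValueError("not enough baseline runs for a comparison")
    mu = mean(baseline_runs)
    sigma = stdev(baseline_runs)
    if sigma == 0:
        # Perfectly flat baseline: any increase counts as drift.
        regressed = current > mu
        return regressed, (float("inf") if regressed else 0.0)
    z = (current - mu) / sigma
    return z > z_limit, z


def trend_slope(values):
    """Least-squares slope of a metric per build, oldest value first."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two builds to compute a trend")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical nightly p95 latencies in milliseconds.
recent = [200, 205, 198, 202, 201, 199, 203]
print(drift_check(recent, 204))   # within normal variation
print(drift_check(recent, 260))   # far outside the recent spread
print(trend_slope([100, 102, 104, 106]))  # ms gained per build
```

The drift check answers "is tonight unusual compared with the last few nights", while the slope answers "are we creeping upward", which is the slow degradation a single gate tends to miss.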

What pipelines can't replace

CI-scale tests run at reduced load on reduced environments: they catch regressions superbly and predict absolute capacity poorly. Go-live decisions, peak-event readiness and scaling validation still require full-scale engagements against production-parity environments — the two practices complement rather than substitute. We help teams build both.