
Performance Testing in CI/CD

Annual big-bang performance tests find regressions months after the commit that caused them. Pipeline-integrated testing finds them at merge time, while the diff is one screen of code.

The test pyramid, for performance

Tier	Trigger	Duration	Scope
Smoke	Every merge	2–5 min	Critical endpoints, modest fixed load, threshold-gated
Baseline	Nightly	30–60 min	Core workload model at steady load, trend-tracked
Full	Per release / scheduled	Hours	Complete workload model, stress & soak variants

The smoke tier is the highest-value, lowest-cost addition most teams can make: a 3-minute k6 job with a threshold such as p(95)<500 on request duration (95th-percentile latency under 500 ms) fails the build when the threshold is crossed, catching the worst regressions at almost no cost.
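The logic behind such a gate is small enough to sketch in a few lines. The Python below is an illustration only, not how k6 evaluates thresholds internally: it assumes the load tool has already produced a list of request latencies in milliseconds, uses a nearest-rank percentile, and the function names and sample data are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def smoke_gate(latencies_ms, p95_limit_ms=500.0):
    """Return (passed, observed_p95) for a p(95) < limit threshold."""
    observed = percentile(latencies_ms, 95)
    return observed < p95_limit_ms, observed


def exit_code(latencies_ms, p95_limit_ms=500.0):
    """Map the gate result to a process exit code: 0 passes the build, 1 fails it."""
    passed, _ = smoke_gate(latencies_ms, p95_limit_ms)
    return 0 if passed else 1


# Hypothetical sample data; a real job would read the load tool's output instead.
samples = [120.0] * 90 + [300.0] * 6 + [900.0] * 4
passed, p95 = smoke_gate(samples)
print(f"p95={p95:.0f} ms, limit=500 ms, {'PASS' if passed else 'FAIL'}")
```

A CI step would pass the result of exit_code to the shell so that a breached threshold fails the merge, which is the behaviour a k6 threshold provides out of the box.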

Budgets as code

Performance budgets live in the repository next to the code they constrain — reviewed, versioned and enforced like any other test. A budget change is a visible, deliberate decision in a pull request, not a silent drift.
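As a rough sketch of what this can look like, the Python below loads a budget definition and reports every measurement that exceeds it. The file layout, endpoint names, metric keys and the check_budgets helper are all assumptions made for illustration; in practice the JSON would live in a checked-in file and the measurements would come from the test run.

```python
import json

# Hypothetical budget file contents, e.g. a perf-budgets.json checked in next to the service code.
BUDGETS_JSON = """
{
  "GET /checkout": {"p95_ms": 500, "error_rate": 0.01},
  "GET /search":   {"p95_ms": 300, "error_rate": 0.01}
}
"""


def check_budgets(budgets, measured):
    """Return a list of human-readable violations; an empty list means the gate passes."""
    violations = []
    for endpoint, limits in budgets.items():
        results = measured.get(endpoint)
        if results is None:
            # A budgeted endpoint with no data is treated as a failure, not a pass.
            violations.append(f"{endpoint}: no measurement")
            continue
        for metric, limit in limits.items():
            value = results.get(metric)
            if value is None:
                violations.append(f"{endpoint}: {metric} not measured")
            elif value > limit:
                violations.append(f"{endpoint}: {metric}={value} exceeds budget {limit}")
    return violations


budgets = json.loads(BUDGETS_JSON)
# Hypothetical results from one pipeline run.
measured = {
    "GET /checkout": {"p95_ms": 430, "error_rate": 0.0},
    "GET /search": {"p95_ms": 340, "error_rate": 0.002},
}
for line in check_budgets(budgets, measured):
    print(line)
```

Because the limits sit in a reviewed file, relaxing the search budget from 300 to 350 would show up as a one-line diff in a pull request rather than as an unnoticed change in a dashboard setting.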

Making pipeline results trustworthy

Stable environments: noisy shared environments produce flaky gates that teams learn to ignore. Dedicated (if modest) performance environments, or at minimum consistent container resources, are a precondition.

Relative comparison: nightly tiers compare against a rolling baseline of recent runs rather than absolute targets, flagging statistically significant drift. This tolerates environment differences while still catching regressions.

Trend dashboards: per-build latency and throughput trends make slow degradation visible across weeks, the kind no single gate catches.
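One simple way to approximate the relative comparison is a z-score against the last few runs. The sketch below is an assumption-laden stand-in for a proper significance test: the window size, the limit of three standard deviations, the 1% floor on the spread and the drift_check name are illustrative choices, not recommendations from a specific tool.

```python
import statistics


def drift_check(recent_p95_ms, current_p95_ms, window=10, z_limit=3.0):
    """Compare the current run against the mean and spread of the last `window` runs."""
    baseline = recent_p95_ms[-window:]
    if len(baseline) < 3:
        # Too little history to estimate run-to-run noise.
        return {"flagged": False, "reason": "not enough history"}
    mean = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    # Floor the spread at 1% of the mean so an unusually quiet history
    # does not turn trivial changes into alerts.
    spread = max(spread, 0.01 * mean)
    z = (current_p95_ms - mean) / spread
    # Only slowdowns are flagged; a faster run yields a negative z.
    return {"flagged": z > z_limit, "z": round(z, 2), "baseline_mean": round(mean, 1)}


# Hypothetical nightly p95 values in milliseconds, oldest first.
history = [410, 395, 402, 420, 398, 405, 415, 400, 408, 412]
print(drift_check(history, 415))  # within normal run-to-run noise
print(drift_check(history, 470))  # well outside the recent spread
```

Because the comparison is against the environment's own recent behaviour, the same check works on a small CI environment and a larger staging one without retuning absolute targets.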

What pipelines can't replace

CI-scale tests run at reduced load on reduced environments: they catch regressions superbly and predict absolute capacity poorly. Go-live decisions, peak-event readiness and scaling validation still require full-scale engagements against production-parity environments — the two practices complement rather than substitute. We help teams build both.