// Methodology / Phase 4

Execution & Monitoring

Disciplined execution turns test runs into trustworthy data points. Sloppy execution produces numbers that look precise and mean nothing.

Run protocol

Every run follows the same checklist: environment health verified and change-frozen, caches in a documented state (cold or warmed — deliberately chosen, never accidental), monitoring dashboards live, baseline metrics captured, run ID assigned and logged. Each run's configuration — scenario, load level, build version, environment state — is recorded so any result can be tied to exactly what produced it.
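
As a minimal sketch of how the checklist and run configuration might be captured together, the Python below refuses to start a run until every checklist item is confirmed and then emits one log line per run. The names `RunRecord`, `CHECKLIST` and `start_run`, and the field names, are illustrative assumptions rather than part of any specific tool.

```python
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
import json
import uuid

# The checklist items from the run protocol; key names are illustrative.
CHECKLIST = (
    "environment_healthy_and_frozen",
    "cache_state_documented",
    "dashboards_live",
    "baseline_captured",
    "run_id_logged",
)


@dataclass(frozen=True)
class RunRecord:
    scenario: str
    load_level: int          # e.g. concurrent virtual users
    build_version: str
    environment_state: str   # free text or a snapshot reference
    cache_state: str         # "cold" or "warmed", chosen deliberately
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def start_run(record: RunRecord, checks: dict) -> str:
    """Return a JSON log line, or raise if the protocol is not satisfied."""
    if record.cache_state not in ("cold", "warmed"):
        raise ValueError("cache_state must be 'cold' or 'warmed'")
    missing = [item for item in CHECKLIST if not checks.get(item, False)]
    if missing:
        raise RuntimeError("checklist incomplete: " + ", ".join(missing))
    return json.dumps(asdict(record), sort_keys=True)
```

Appending the returned line to a run log gives one record per execution, so any result can later be traced back by `run_id` to the configuration that produced it.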

Baseline first, always

Before any full-load test we establish a single-user or low-volume baseline. It catches broken scripts and environment problems cheaply, and it gives the uncontended response-time floor against which later results are read: if p95 at full load is 480 ms and the baseline is 450 ms, the system barely noticed the load; if the baseline is 60 ms, the eightfold increase points to queueing.
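
The comparison above can be reduced to a ratio of loaded response time to baseline response time. The sketch below works through the two cases from the text; the function names and the 2.0 threshold are assumptions chosen for illustration, not a recommended limit.

```python
def load_ratio(loaded_ms: float, baseline_ms: float) -> float:
    """How many times slower the system is under load than uncontended."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be a positive response time")
    return loaded_ms / baseline_ms


def reading(ratio: float, threshold: float = 2.0) -> str:
    """Rough interpretation; the threshold is an illustrative assumption."""
    return "likely queueing" if ratio >= threshold else "little contention"


# The two cases from the text: the same 480 ms p95 against different floors.
print(round(load_ratio(480, 450), 2), reading(load_ratio(480, 450)))
# 1.07 little contention
print(round(load_ratio(480, 60), 2), reading(load_ratio(480, 60)))
# 8.0 likely queueing
```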

Live monitoring during runs

We watch tests in real time rather than discovering problems in post-analysis: client-side latency and error feeds, server-side saturation signals (CPU, GC time, pool utilisation, queue depths), and load-generator health. Runs that go off the rails — environment interference, data exhaustion, generator saturation — are stopped, diagnosed and rescheduled rather than allowed to produce contaminated data.
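
One way to make the stop decision explicit is a small guard evaluated on each monitoring sample. This is a sketch only: the `Sample` fields and every threshold below are hypothetical and would need to be set per system and per load generator.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    error_rate: float         # fraction of failed requests in the window, 0..1
    generator_cpu: float      # load-generator CPU utilisation, 0..1
    test_data_remaining: int  # unused test-data records


def abort_reasons(sample: Sample,
                  max_error_rate: float = 0.05,
                  max_generator_cpu: float = 0.80,
                  min_data_records: int = 1) -> list:
    """Return the reasons to stop the run; an empty list means carry on."""
    reasons = []
    if sample.error_rate > max_error_rate:
        reasons.append("error rate above limit")
    if sample.generator_cpu > max_generator_cpu:
        reasons.append("load generator saturated")
    if sample.test_data_remaining < min_data_records:
        reasons.append("test data exhausted")
    return reasons
```

Returning the reasons rather than a bare boolean means the run log can record why a run was stopped, which helps the later diagnosis and rescheduling.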

The cardinal rule: one variable at a time

When investigating a finding, each subsequent run changes exactly one thing: a pool size, an index, a heap setting. Change three things and see a 20% improvement, and you've learned almost nothing: one change may have cost you 10% while another gained 30%, and the net figure cannot tell you which. The discipline feels slow but is in fact the fastest route to a tuned system.
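
The rule can be enforced mechanically by comparing each run's recorded configuration with the previous one before the run starts. The sketch below assumes configurations are flat dictionaries; the function names and the example keys are hypothetical.

```python
_MISSING = object()  # sentinel so an added or removed key counts as a change


def changed_keys(previous: dict, current: dict) -> list:
    """Names of settings whose values differ between two run configurations."""
    keys = previous.keys() | current.keys()
    return sorted(
        k for k in keys
        if previous.get(k, _MISSING) != current.get(k, _MISSING)
    )


def require_single_change(previous: dict, current: dict) -> str:
    """Return the one changed setting, or raise if zero or several changed."""
    diff = changed_keys(previous, current)
    if len(diff) != 1:
        raise ValueError(
            "expected exactly one change, found %d: %s" % (len(diff), diff)
        )
    return diff[0]


# Example with made-up settings: only the pool size differs.
before = {"pool_size": 20, "heap_mb": 2048, "index_orders_date": False}
after = {"pool_size": 40, "heap_mb": 2048, "index_orders_date": False}
print(require_single_change(before, after))  # pool_size
```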

Phase outputs

A run log of all executions with configuration and outcome; the complete raw results archive (client-side measurements and server-side metrics, time-aligned); and flagged anomalies for Phase 5 analysis.
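
As a rough illustration of what "time-aligned" can mean in practice, the sketch below groups client-side and server-side samples into shared fixed-width time buckets. The input shapes (epoch seconds paired with a latency or a CPU fraction) and the 10-second bucket width are assumptions made for the example.

```python
from collections import defaultdict


def align(client, server, bucket_s: int = 10) -> dict:
    """Group samples by the start of the time bucket they fall into.

    client: iterable of (epoch_seconds, latency_ms)
    server: iterable of (epoch_seconds, cpu_fraction)
    """
    rows = defaultdict(lambda: {"latencies_ms": [], "cpu": []})
    for ts, latency in client:
        rows[int(ts // bucket_s) * bucket_s]["latencies_ms"].append(latency)
    for ts, cpu in server:
        rows[int(ts // bucket_s) * bucket_s]["cpu"].append(cpu)
    return dict(sorted(rows.items()))
```

With both sides keyed by the same bucket start, a latency spike and a saturation signal from the same interval sit in the same row, which is what Phase 5 analysis needs.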