Scalability Testing
"Just add more servers" is a hypothesis, not a strategy. Scalability testing measures how much capacity each added resource actually buys you — and where adding more stops helping.
What it is
Scalability testing runs the same workload across multiple resource configurations — varying instance counts, instance sizes, or both — and measures how maximum sustainable throughput changes. The output is a scaling curve: capacity as a function of resources.
Linear in theory, sublinear in practice
Perfect scaling (2× nodes → 2× capacity) is rare. Shared resources (databases, caches, message brokers, distributed locks) serialise some fraction of every request, and that serial fraction caps total speed-up (Amdahl's law). Coordination costs also grow with node count. A typical finding: the web tier on its own scales near-linearly to 12 nodes, but the primary database saturates at the load that about 7 web nodes generate, so web-tier spend beyond that point is pure waste.
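For reference, Amdahl's law in its standard form is given below. The symbols are chosen here for illustration: s is the serial fraction of the work, N the node count, and S(N) the speed-up over a single node.

```latex
S(N) = \frac{1}{s + \frac{1 - s}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{s}
```

As a worked example with an assumed s = 0.05: speed-up can never exceed 20×, and 16 nodes deliver only about 9.1×, since 1 / (0.05 + 0.95/16) ≈ 9.14.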
Scaling curve (example, measured):
  Nodes   Throughput    Per-node efficiency
  2        9,800 tps    1.00× (baseline)
  4       19,100 tps    0.97×
  8       34,400 tps    0.88×
  16      41,200 tps    0.53×  ← DB write saturation
Conclusion: scale-out effective to ~10 nodes; beyond that, invest in the database tier.
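The per-node efficiency figures above can be reproduced from the raw throughput numbers. The following is a minimal Python sketch; the function name and the knee estimate are illustrative assumptions, not part of any particular test tool.

```python
# Measured points from the example above: node count -> max sustainable tps.
measured = {2: 9_800, 4: 19_100, 8: 34_400, 16: 41_200}


def per_node_efficiency(curve):
    """Return {nodes: efficiency}, relative to the smallest configuration."""
    base_nodes = min(curve)
    base_rate = curve[base_nodes] / base_nodes  # tps per node at baseline
    return {n: (tps / n) / base_rate for n, tps in sorted(curve.items())}


efficiency = per_node_efficiency(measured)
for nodes, eff in efficiency.items():
    print(f"{nodes:>2} nodes: {measured[nodes]:>6,} tps  {eff:.2f}x")

# Rough knee estimate (an assumption, not a measurement): treat the 16-node
# result as the database ceiling and divide it by the per-node rate still
# achieved at 8 nodes.
knee_nodes = measured[16] / (measured[8] / 8)
print(f"estimated knee: ~{knee_nodes:.1f} nodes")
```

On these numbers the estimate lands at roughly 9.6 nodes, which is consistent with the ~10-node conclusion. A real study would add measurements between 8 and 16 nodes rather than rely on this interpolation.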
Auto-scaling validation
For cloud platforms we additionally test the dynamics of scaling: are the scaling metrics and thresholds right? How long from threshold breach to serving capacity? Does scale-in behave safely under sustained load? Do scaling events themselves cause latency spikes (cold starts, cache repopulation, connection storms against the database)?
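One of these questions, the time from threshold breach to serving capacity, can be computed from a timeline of monitoring samples. The sketch below assumes CPU utilisation is the scaling metric and that the count of serving instances is sampled alongside it. The `Sample` type, the `scale_out_lag` function, and the timeline data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    t: float      # seconds since test start
    cpu: float    # scaling metric, here average CPU utilisation (0-1)
    serving: int  # instances passing health checks and taking traffic


def scale_out_lag(samples: list[Sample], threshold: float) -> Optional[float]:
    """Seconds from first threshold breach to first increase in serving instances."""
    breach = next((s for s in samples if s.cpu > threshold), None)
    if breach is None:
        return None  # threshold never breached
    for s in samples:
        if s.t > breach.t and s.serving > breach.serving:
            return s.t - breach.t
    return None  # breached, but no new capacity arrived during the test


# Hypothetical timeline, invented for illustration only.
timeline = [
    Sample(0, 0.55, 4),
    Sample(30, 0.72, 4),   # first breach of the 0.70 threshold
    Sample(60, 0.81, 4),
    Sample(90, 0.84, 4),
    Sample(120, 0.66, 6),  # new instances now serving
]
lag = scale_out_lag(timeline, threshold=0.70)
print(lag)  # 90 seconds on this invented timeline
```

Counting only instances that are actually serving, rather than instances merely launched, keeps boot time, health-check delay, and warm-up inside the measured lag.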
What you get
A measured scaling curve with per-node efficiency; identification of the first non-scaling bottleneck; cost-per-unit-of-capacity at each configuration so finance and engineering can agree on a target; and tuned auto-scaling policies with evidence behind every threshold.
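As an illustration of the cost-per-unit-of-capacity figure, the sketch below applies a flat hourly node price to the example curve. The price is a made-up placeholder, the function name is hypothetical, and only web-tier node cost is counted; database, network, and licence costs are ignored.

```python
# Throughput figures are from the example curve; the hourly node price is a
# made-up placeholder, not a real cloud rate.
HYPOTHETICAL_NODE_PRICE_PER_HOUR = 0.40
measured = {2: 9_800, 4: 19_100, 8: 34_400, 16: 41_200}


def cost_per_1k_tps(nodes: int, tps: float, node_price: float) -> float:
    """Hourly node spend per 1,000 tps of sustainable capacity."""
    return (nodes * node_price) / (tps / 1_000)


costs = {
    n: cost_per_1k_tps(n, tps, HYPOTHETICAL_NODE_PRICE_PER_HOUR)
    for n, tps in measured.items()
}
for n, c in costs.items():
    print(f"{n:>2} nodes: {c:.3f} per 1k tps per hour")
```

Whatever the real price, the ratio between configurations depends only on the curve: in this example a unit of capacity at 16 nodes costs about 1.9 times what it costs at 2 nodes.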