
Little's Law & Queueing Theory

One short equation relates concurrency, throughput and latency in every stable system under load. Understanding it turns performance testing from empirical poking into engineering.

The law

L = λ × W

L : average number of items in the system (concurrency)
λ : average arrival rate, which equals throughput in a stable system
W : average time each item spends in the system, in the same time unit as λ

It holds for any stable system regardless of arrival pattern or service discipline — web servers, thread pools, supermarket queues. Three immediate practical uses:

1. Converting "concurrent users" to load

Stakeholders specify "5,000 concurrent users"; load tools need arrival rates. With Little's Law, where each user's cycle time is response time plus think time: throughput = concurrency ÷ (response time + think time). 5,000 users with 12 s think time and 0.5 s responses ≈ 400 req/s. Because think time dominates the cycle, getting it wrong by 2× puts the load off by almost 2×. This single conversion error invalidates more tests than any tooling fault.
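
The conversion above can be sketched in a few lines of Python. This is a minimal illustration, not a load-tool API: the function name `arrival_rate` and its parameter names are made up for this example, and it assumes a closed loop in which every user waits for a response, thinks, then sends the next request.

```python
def arrival_rate(concurrent_users: float, response_time_s: float, think_time_s: float) -> float:
    """Little's Law rearranged as lambda = L / W, with W = response time + think time."""
    cycle_time_s = response_time_s + think_time_s
    if cycle_time_s <= 0:
        raise ValueError("response time plus think time must be positive")
    return concurrent_users / cycle_time_s


# Figures from the text: 5,000 users, 0.5 s responses, 12 s think time.
target = arrival_rate(5000, 0.5, 12.0)
print(f"{target:.0f} req/s")  # 400 req/s

# Same users, but think time assumed to be half the real value.
wrong = arrival_rate(5000, 0.5, 6.0)
print(f"{wrong / target:.2f}x the intended load")  # 1.92x the intended load
```

Halving the think time does not quite double the load because the 0.5 s response time is part of the cycle too, but the error is close enough to 2× to invalidate the test.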

2. Sizing pools and limits

A service handling 200 req/s with 50 ms mean database time needs 200 × 0.05 = 10 busy connections on average — so a pool of 15–20 covers bursts, while a pool of 100 adds risk without benefit. The same arithmetic sizes thread pools, queue bounds and worker counts. When measured concurrency exceeds the Little's Law prediction, something is holding items longer than it should — that's a finding.
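
A rough sketch of the same arithmetic in Python follows. The function names and the `headroom` multiplier are hypothetical, introduced only for illustration; the 1.5 to 2.0 range is chosen to reproduce the 15–20 pool from the example above, not derived from any rule.

```python
import math


def mean_busy(throughput_per_s: float, mean_hold_s: float) -> float:
    """Little's Law: average items in use = arrival rate x average hold time."""
    return throughput_per_s * mean_hold_s


def pool_size(throughput_per_s: float, mean_hold_s: float, headroom: float = 2.0) -> int:
    """Average demand scaled by a burst allowance (headroom is an assumed tuning knob)."""
    return math.ceil(mean_busy(throughput_per_s, mean_hold_s) * headroom)


def implied_hold_time(measured_concurrency: float, throughput_per_s: float) -> float:
    """Little's Law rearranged as W = L / lambda, for checking measurements."""
    return measured_concurrency / throughput_per_s


# Figures from the text: 200 req/s, 50 ms mean database time.
print(mean_busy(200, 0.05))        # 10.0 busy connections on average
print(pool_size(200, 0.05, 1.5))   # 15
print(pool_size(200, 0.05, 2.0))   # 20

# Hypothetical measurement: 30 connections busy at the same 200 req/s.
print(f"{implied_hold_time(30, 200) * 1000:.0f} ms")  # 150 ms
```

In the last call, 30 busy connections at 200 req/s imply a 150 ms average hold time, three times the 50 ms database time, so something other than the query itself is keeping connections checked out.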

3. Why latency explodes near saturation

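For a single-server queue with random arrivals and service times (the M/M/1 model), the mean time spent waiting in the queue equals the mean service time multiplied by ρ/(1−ρ), where ρ is utilisation. The consequences are not intuitive: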

utilisation   relative wait
   50%        1.0×
   75%        3.0×
   90%        9.0×
   95%       19.0×
   98%       49.0×

From 90% to 95% utilisation, waits roughly double; from 95% to 98%, they more than double again. This is why systems feel fine at 85% and collapse at 95% — and why "the CPU still has 10% headroom" is not reassurance. It is also why we recommend operating targets around 70–75% on saturable resources and treat the latency "knee" in test results as the practical capacity limit, not the point where errors begin.
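
The table can be reproduced with a short Python sketch. The function name `relative_wait` is made up for this example, and the output is the wait expressed as a multiple of the wait at 50% utilisation, as given by the ρ/(1−ρ) factor quoted above; real resources with several servers or different variability will follow a different curve with the same general shape.

```python
def relative_wait(utilisation: float) -> float:
    """Queueing-delay factor rho / (1 - rho) for a utilisation in [0, 1)."""
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return utilisation / (1.0 - utilisation)


for rho in (0.50, 0.75, 0.90, 0.95, 0.98):
    print(f"{rho:.0%}  {relative_wait(rho):.1f}x")
```

The guard on the input range matters: at 100% utilisation the factor is undefined, which is the formula's way of saying the queue grows without bound.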

The knee in practice

In stepped load tests, plot p95 latency against throughput. The curve is flat, then bends, then climbs near-vertically. Capacity decisions should reference the bend — beyond it, each added request costs disproportionate latency, and the system has no absorption margin for variance, retries or failures.
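
One possible way to locate the bend programmatically is sketched below in Python. It is a heuristic under stated assumptions, not a standard method: the function name `find_knee`, the 2× threshold and the choice of the lowest-load step as the baseline are all illustrative, and the input curve is synthetic (generated from a 1/(1−ρ) shape with a made-up capacity), not measured data.

```python
from typing import Optional, Sequence, Tuple

Step = Tuple[float, float]  # (throughput in req/s, p95 latency in seconds)


def find_knee(steps: Sequence[Step], factor: float = 2.0) -> Optional[Step]:
    """Return the last step whose p95 stays within factor x the lowest-load p95.

    Steps are scanned in order of increasing throughput; the scan stops at the
    first step that exceeds the threshold. Returns None for empty input.
    """
    ordered = sorted(steps)
    if not ordered:
        return None
    baseline = ordered[0][1]  # p95 at the lightest load, taken as the flat region
    knee = ordered[0]
    for step in ordered[1:]:
        if step[1] > factor * baseline:
            break
        knee = step
    return knee


# Synthetic curve for illustration only: assumed capacity and base latency.
CAPACITY = 1000.0    # req/s, hypothetical
BASE_LATENCY = 0.1   # seconds, hypothetical
synthetic_steps = [
    (float(t), BASE_LATENCY / (1.0 - t / CAPACITY)) for t in range(100, 1000, 100)
]

knee = find_knee(synthetic_steps)
if knee is not None:
    print(f"knee at {knee[0]:.0f} req/s, p95 {knee[1]:.2f} s")  # 500 req/s on this curve
```

On this synthetic curve the heuristic reports the 500 req/s step, half the nominal capacity, as the last point before latency leaves the flat region. With real results the threshold needs tuning against the plotted curve, and a visual check of the plot remains the reference.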