| created | daily | tags | origin | similar to | opposite of | leads to |
|---|---|---|---|---|---|---|
| 2025-03-22 | [[2025-03-22, Wk 12 - Sat]] | | | | | |
Little’s Law is part of [[queueing theory]]. It relates mean concurrency, throughput, and latency for any system, as well as all of its subsystems. It states that if we know any two of these variables, we can compute the third. The only requirement is that the system itself is stable. Little's Law can be summarized as:
> [!Summary] Concurrency ($L$) is throughput ($\lambda$) multiplied by latency ($W$).
It contains the following three variables:
- Concurrency ($L$): mean number of units in the system (e.g. 10 active threads)
- Throughput ($\lambda$): mean arrival rate of units (e.g. 100 req/sec)
- Latency ($W$): mean time spent by a unit in the system (e.g. 0.1s or 100ms)
Concurrency can also be thought of as load. Little's Law states that as long as we know any two of these variables, we can compute the third:
- Concurrency = Latency * Throughput (0.1 * 100 = 10)
- Throughput = Concurrency / Latency (10 / 0.1 = 100)
- Latency = Concurrency / Throughput (10 / 100 = 0.1)
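A minimal sketch of these three rearrangements, using the example numbers above; the function and parameter names are illustrative, not from any library:

```python
def concurrency(latency_s: float, throughput_rps: float) -> float:
    """L = lambda * W: mean number of units in the system."""
    return throughput_rps * latency_s

def throughput(in_flight: float, latency_s: float) -> float:
    """lambda = L / W: mean arrival rate, in units per second."""
    return in_flight / latency_s

def latency(in_flight: float, throughput_rps: float) -> float:
    """W = L / lambda: mean time a unit spends in the system."""
    return in_flight / throughput_rps

# 100 req/sec at 100ms each keeps roughly 10 requests in flight, and so on.
print(concurrency(0.1, 100))   # ≈ 10 units in the system
print(throughput(10, 0.1))     # ≈ 100 req/sec
print(latency(10, 100))        # 0.1 seconds
```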
What this teaches us is that if we want to improve:
- Throughput (higher is better): we want to increase concurrency and lower latency.
- Latency (lower is better): we want to lower concurrency and increase throughput.
Concurrency is not something that can be "improved"; instead, it's often the only immediate variable we have to affect both throughput and latency. Little's Law shows us that modifying concurrency comes with a fundamental tradeoff between throughput and latency:
- Increasing concurrency ↑: improves throughput ↑ but makes latency worse ↑.
- Reducing concurrency ↓: improves latency ↓ but makes throughput worse ↓.
Notably, we can only control concurrency if we artificially limit it; if we don't, concurrency will be limited for us by either latency or throughput. That may lead to unexpected latency or throughput issues, which are causes of system instability. It's better to fail in predictable ways that can be handled, rather than unpredictable ways which cannot. This is why, in general, we should always be on the lookout to eliminate unbounded concurrency in our systems.
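One way to make that bound explicit is a concurrency limiter. A minimal sketch using Python's standard `threading.BoundedSemaphore`; the limit of 10 and the `handle_request`/`process` names are illustrative:

```python
import threading
import time

MAX_CONCURRENCY = 10                      # explicit, predictable upper bound on L
limiter = threading.BoundedSemaphore(MAX_CONCURRENCY)

def process(request):
    time.sleep(0.1)                       # stand-in for real work (~100ms latency)
    return "ok"

def handle_request(request):
    # Fail fast instead of letting concurrency grow unbounded.
    if not limiter.acquire(blocking=False):
        raise RuntimeError("overloaded: concurrency limit reached")
    try:
        return process(request)
    finally:
        limiter.release()
```

With $L$ capped at 10 and roughly 100ms of work per request, Little's Law says throughput tops out around $\lambda = L / W = 100$ req/sec; beyond that, requests are rejected predictably instead of latency degrading unpredictably.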
> [!Todo] Read up on the universal scalability law. Watch [[2025-04-17 Scalability Is Quantifiable The Universal Scalability Law USENIX]]
Little's Law is formally defined as: $$ L=\lambda W $$
What this says is that the *mean number of units in the system* is the mean effective arrival rate multiplied by the mean time each unit spends in the system. This holds as long as the units of measurement are consistent across all three variables.
- $L$ (Load): mean number of units in the system (concurrency)
- $\lambda$ (Arrival): mean arrival rate of units (throughput)
- $W$ (Wait): mean time spent by a unit in the system (latency)
> [!info] When talking about distributions, the lowercase $\lambda$ refers to the resulting mean: in probability theory, $\lambda$ represents the density of occurrences within a time interval, as modelled by the [[Poisson distribution]].
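As a quick sanity check of that interpretation, we can simulate Poisson arrivals (exponential inter-arrival times) and confirm the mean count per interval comes out near $\lambda$; a sketch using only the standard library, with made-up numbers:

```python
import random

lam = 100.0                          # arrival rate: 100 requests per second
horizon = 1000                       # simulate 1000 seconds
random.seed(42)

# Simulate Poisson arrivals via exponential inter-arrival times,
# then count how many land in each 1-second window.
per_second = [0] * horizon
t = 0.0
while True:
    t += random.expovariate(lam)
    if t >= horizon:
        break
    per_second[int(t)] += 1

print(sum(per_second) / horizon)     # ≈ 100: the mean count per interval is lambda
```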
Little's law not only applies to the entire system, but also all of its subsystems. In a server context this means Little's Law applies to all of:
- compute regions
- datacenters
- individual clusters
- individual machines
- pods on those machines
- load balancers
- server applications
- worker queues
- thread pools
- message queues
The only constraint is that the system is stable; depending on where you draw the boundary, you can usually find a point of stability.
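For example, the same formula applies at two boundaries at once. A sketch assuming a fleet of identical pods behind a load balancer that spreads traffic evenly; all numbers are made up:

```python
fleet_rps = 5000.0        # throughput at the load balancer boundary
latency_s = 0.05          # mean response time, 50ms, same at every level
pods = 20                 # identical pods, traffic spread evenly

fleet_concurrency = fleet_rps * latency_s          # L for the whole system: 250
pod_concurrency = (fleet_rps / pods) * latency_s   # L for one pod/subsystem: 12.5

print(fleet_concurrency, pod_concurrency)
```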
Say we're working with an HTTP server which continuously handles requests. We can apply Little's Law to describe latency as follows:
- $L$: mean number of concurrent requests being handled
- $\lambda$: mean throughput, measured in requests per second
- $W$: mean response time (latency), measured in seconds
Now say we wanted to know what the mean latency of a request is, but we only know the RPS and the number of concurrent requests. We can invert the formula as follows: $$ W = \frac{L}{\lambda} $$
This states that the mean response time is the mean number of concurrent requests, divided by the mean throughput. Say we are handling 10 requests concurrently most of the time, and we're handling about 50 requests per second. We can slot that into our equation like this: $$ W = \frac{10}{50} $$
Which we can solve for $W = 0.2$ seconds, or a mean latency of 200ms.
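The same inversion as a tiny helper, using the numbers from this example; the function name is illustrative:

```python
def mean_latency_s(in_flight: float, rps: float) -> float:
    """W = L / lambda: recover mean latency from concurrency and throughput."""
    return in_flight / rps

print(mean_latency_s(10, 50))   # 0.2 seconds, i.e. 200ms
```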
[[2025-04-18 Telling Stories About Little's Law]] tells a more complete story about Little's Law. It critiques it for being too simplistic and for assuming that systems are entirely stationary.
> [!Quote] The system was ticking along nicely, then just after midnight a spike of requests arrived from a flash sale. This caused latency to increase because of increased lock contention on the database, which in turn caused 10% of client calls to time out and be retried. A bug in backoff in our client meant that this increased the call rate to 10x the normal for this time of day, further increasing contention. And so on…
Instead it provides the following equations:
$$ W_{n+1} = W(L_n, \lambda_n, t) $$
$$ \lambda_{n+1} = \lambda(L_n, W_n, t) $$
$$ L_{n+1} = \lambda_{n+1} W_{n+1} $$
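A sketch of how that recurrence could be stepped numerically. The concrete shapes of $W(\cdot)$ and $\lambda(\cdot)$ below (latency degrading with load, arrivals spiking for a fixed window) are invented for illustration and are not taken from the article:

```python
BASE_LATENCY_S = 0.05      # latency when the system is idle (assumed)
BASE_RPS = 100.0           # normal arrival rate (assumed)

def latency_fn(load, rps, t):
    # Assumed shape: latency degrades as concurrency builds up.
    return BASE_LATENCY_S * (1 + load / 50)

def arrival_fn(load, latency, t):
    # Assumed shape: a flash-sale spike of arrivals between t=30 and t=40.
    return BASE_RPS * (5.0 if 30 <= t < 40 else 1.0)

load, latency, rps = 5.0, BASE_LATENCY_S, BASE_RPS
for t in range(60):
    latency = latency_fn(load, rps, t)     # W_{n+1} = W(L_n, lambda_n, t)
    rps = arrival_fn(load, latency, t)     # lambda_{n+1} = lambda(L_n, W_n, t)
    load = rps * latency                   # L_{n+1} = lambda_{n+1} * W_{n+1}
    print(t, round(load, 1), round(latency, 3), rps)
```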