
Latency vs Throughput

Understanding the difference between latency and throughput in system design.


Two of the most important performance metrics in system design are latency and throughput. Understanding their relationship is crucial for building high-performance systems.


What is Latency?

Latency is the time it takes for a single request to travel from the client to the server and back. It's measured in time units (milliseconds, seconds).
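
As a rough illustration, you can measure the latency of a single operation with a monotonic clock. A minimal sketch in Python, where `fetch_user` is a hypothetical stand-in for a real network or database call:

```python
import time

def fetch_user(user_id):
    # Hypothetical stand-in for a real network or database call.
    time.sleep(0.02)  # simulate ~20 ms of work
    return {"id": user_id}

start = time.perf_counter()                      # monotonic, high-resolution clock
fetch_user(42)
latency_ms = (time.perf_counter() - start) * 1000
print(f"latency: {latency_ms:.1f} ms")           # ~20 ms
```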

Types of Latency

| Type | Description | Typical Values |
| --- | --- | --- |
| Network Latency | Time for data to travel over the network | 1-100 ms |
| Disk Latency | Time to read/write to storage | 0.1-10 ms (SSD), 5-20 ms (HDD) |
| Memory Latency | Time to access RAM | ~100 ns |
| Processing Latency | Time to compute/process | Varies |

Latency Numbers Every Developer Should Know

L1 cache reference:                    0.5 ns
L2 cache reference:                    7 ns
Main memory reference:                 100 ns
SSD random read:                       150,000 ns (150 μs)
HDD seek:                              10,000,000 ns (10 ms)
Send packet CA → Netherlands → CA:    150,000,000 ns (150 ms)

What is Throughput?

Throughput is the number of operations or amount of data processed per unit of time. It measures the capacity of your system.
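
Measuring throughput is the mirror image: count completed operations over a fixed time window. A minimal sketch, again with a hypothetical `handle_request` stand-in:

```python
import time

def handle_request():
    # Hypothetical stand-in for real work (~1 ms each).
    time.sleep(0.001)

start = time.perf_counter()
completed = 0
while time.perf_counter() - start < 1.0:   # measure over a 1-second window
    handle_request()
    completed += 1
print(f"throughput: {completed} requests/second")
```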

Common Throughput Metrics

  • Requests per second (RPS): API throughput
  • Queries per second (QPS): Database throughput
  • Megabytes per second (MB/s): Data transfer rate
  • Transactions per second (TPS): Payment/order systems

Latency vs Throughput: The Relationship

These two metrics are often at odds:

| Scenario | Latency | Throughput |
| --- | --- | --- |
| Process one request at a time | Low | Low |
| Batch many requests together | High | High |
| Parallel processing | Medium | High |
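
A back-of-the-envelope calculation makes the batching row concrete: batching amortizes a fixed per-dispatch cost, which raises throughput, but every item waits for the whole batch, which raises latency. All numbers below are illustrative assumptions:

```python
fixed_cost_ms = 10.0   # assumed per-dispatch overhead (network, syscall, commit)
per_item_ms = 1.0      # assumed work per item
items = 100

# One at a time: every item pays the fixed cost, but finishes immediately.
latency_single = fixed_cost_ms + per_item_ms         # 11 ms per item
throughput_single = 1000 / latency_single            # ~91 items/second

# One batch of 100: the fixed cost is paid once, but the last item
# waits for the entire batch to complete.
batch_time_ms = fixed_cost_ms + items * per_item_ms  # 110 ms total
latency_batch = batch_time_ms                        # up to 110 ms per item
throughput_batch = items * 1000 / batch_time_ms      # ~909 items/second
```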

The Highway Analogy

Think of a highway:

  • Latency = Time for one car to travel from A to B
  • Throughput = Number of cars passing a point per hour

A wider highway (more lanes) increases throughput but doesn't change latency for individual cars.
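
The analogy can be made quantitative with Little's Law, a standard queueing result: the number of requests in flight equals throughput times latency. With assumed numbers:

```python
latency_s = 0.050      # assume each request takes 50 ms
in_flight = 10         # assume 10 requests in flight at once (the "lanes")

# Little's Law: in_flight = throughput * latency
throughput = in_flight / latency_s   # 200 requests/second
```

Doubling the lanes doubles throughput to 400 requests/second; each individual request still takes 50 ms.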


Optimizing for Latency

Strategies to reduce latency:

  1. Caching: Store frequently accessed data closer to the user (see the sketch after this list)
  2. CDNs: Serve content from geographically closer locations
  3. Connection pooling: Reuse existing connections
  4. Async processing: Don't wait for non-critical operations
  5. Database indexing: Speed up query execution
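
As an example of strategy 1, the simplest cache is memoizing a hot lookup in process memory. A minimal sketch using Python's standard library, with a hypothetical `load_profile` standing in for a slow fetch:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)              # keep up to 1024 recent results in memory
def load_profile(user_id):
    # Hypothetical stand-in for a slow database or network fetch.
    time.sleep(0.05)                  # simulate ~50 ms of latency
    return {"id": user_id}

load_profile(7)   # first call: pays the full ~50 ms
load_profile(7)   # repeat call: served from memory in microseconds
```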

Optimizing for Throughput

Strategies to increase throughput:

  1. Horizontal scaling: Add more servers
  2. Load balancing: Distribute requests efficiently
  3. Batching: Process multiple items together
  4. Compression: Reduce data size for transfer
  5. Parallel processing: Use multiple threads/processes (see the sketch after this list)
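
As a sketch of strategy 5, I/O-bound calls can be overlapped with a thread pool so total throughput scales with the number of workers, while per-call latency stays the same. `call_service` is a hypothetical stand-in:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_service(i):
    # Hypothetical stand-in for an I/O-bound call (e.g., a downstream HTTP request).
    time.sleep(0.05)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(call_service, range(100)))
elapsed = time.perf_counter() - start
# 100 calls of ~50 ms overlap across 10 workers: ~0.5 s total instead of
# ~5 s sequentially; each individual call still takes ~50 ms.
print(f"{len(results)} calls in {elapsed:.2f} s")
```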

Trade-offs in Practice

Example: Database Writes

| Approach | Latency | Throughput |
| --- | --- | --- |
| Write immediately to disk | High (sync) | Low |
| Write to buffer, flush periodically | Low | High |
| Write-ahead logging | Medium | Medium-High |
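
The middle row might look like the following sketch: writes land in an in-memory buffer and hit disk in batches. The threshold and file path are illustrative, and a real system would also flush on a timer and protect the buffer against crashes (which is exactly the gap a write-ahead log closes):

```python
class BufferedWriter:
    """Illustrative sketch: trade per-write disk cost for throughput."""

    def __init__(self, path="data.log", flush_threshold=100):
        self.path = path
        self.buffer = []
        self.flush_threshold = flush_threshold

    def write(self, record):
        self.buffer.append(record)               # fast: append in memory
        if len(self.buffer) >= self.flush_threshold:
            self.flush()                         # occasional slower disk write

    def flush(self):
        with open(self.path, "a") as f:          # one batched write for many records
            f.writelines(r + "\n" for r in self.buffer)
        self.buffer.clear()

w = BufferedWriter()
for i in range(250):
    w.write(f"record-{i}")   # two automatic flushes happen along the way
w.flush()                    # drain the remaining 50 records
```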

Example: API Design

| Approach | Latency | Throughput |
| --- | --- | --- |
| Single item per request | Low | Low |
| Batch API (many items per request) | Higher | Much higher |

Interview Tips

  • Know the numbers: Memorize key latency values (memory vs disk vs network)
  • Identify bottlenecks: Which component has the highest latency?
  • Discuss trade-offs: When would you sacrifice latency for throughput?
  • Consider percentiles: p50, p95, p99 latencies matter more than averages (see the sketch after this list)
  • Think about tail latency: The slowest requests impact user experience
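
Percentiles are easy to compute from raw latency samples with the standard library. A minimal sketch with made-up samples that include a slow tail:

```python
import random
import statistics

# Made-up latency samples (ms): mostly fast, with a slow 1% tail.
samples = [random.gauss(20, 5) for _ in range(990)]
samples += [random.uniform(200, 500) for _ in range(10)]

cuts = statistics.quantiles(samples, n=100)   # 99 percentile cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
mean = statistics.mean(samples)
# The mean hides the tail; p99 exposes the slow 1% that users actually feel.
print(f"mean={mean:.1f} ms  p50={p50:.1f}  p95={p95:.1f}  p99={p99:.1f}")
```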

Summary

| Metric | Measures | Goal | Unit |
| --- | --- | --- | --- |
| Latency | Speed of single operation | Minimize | ms, μs |
| Throughput | System capacity | Maximize | RPS, MB/s |

Both metrics matter; the right balance depends on your specific use case and user expectations.