Latency vs Throughput
Understanding the difference between latency and throughput in system design.
Two of the most important performance metrics in system design are latency and throughput. Understanding their relationship is crucial for building high-performance systems.
What is Latency?
Latency is the time it takes to complete a single operation, such as a request traveling from the client to the server and back. It's measured in time units (nanoseconds, microseconds, milliseconds).
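As a rough illustration, latency can be measured by timing one operation end to end. Here's a minimal Python sketch; `measure_latency` is a hypothetical helper, and the `sleep` stands in for a real network call:

```python
import time

def measure_latency(operation):
    """Time a single operation and return elapsed milliseconds."""
    start = time.perf_counter()
    operation()
    return (time.perf_counter() - start) * 1000

# The sleep is a stand-in for a real request to a server.
latency_ms = measure_latency(lambda: time.sleep(0.05))
print(f"latency: {latency_ms:.1f} ms")
```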
Types of Latency
| Type | Description | Typical Values |
|---|---|---|
| Network Latency | Time for data to travel over the network | 1-100ms |
| Disk Latency | Time to read/write to storage | 0.1-1ms (SSD), 5-20ms (HDD) |
| Memory Latency | Time to access RAM | 100ns |
| Processing Latency | Time to compute/process | Varies |
Latency Numbers Every Developer Should Know
L1 cache reference: 0.5 ns
L2 cache reference: 7 ns
Main memory reference: 100 ns
SSD random read: 150,000 ns (150 μs)
HDD seek: 10,000,000 ns (10 ms)
Send packet CA → Netherlands → CA: 150,000,000 ns (150 ms)
What is Throughput?
Throughput is the number of operations or amount of data processed per unit of time. It measures the capacity of your system.
Common Throughput Metrics
- Requests per second (RPS): API throughput (measured in the sketch after this list)
- Queries per second (QPS): Database throughput
- Megabytes per second (MB/s): Data transfer rate
- Transactions per second (TPS): Payment/order systems
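As a sketch of how a metric like RPS can be derived, the simplest approach is to count completed operations over a fixed time window. The `measure_throughput` helper below is hypothetical, and the lambda is a placeholder workload:

```python
import time

def measure_throughput(operation, duration_s=1.0):
    """Count completed operations over a fixed window; return ops/sec."""
    count = 0
    deadline = time.perf_counter() + duration_s
    while time.perf_counter() < deadline:
        operation()
        count += 1
    return count / duration_s

# Placeholder workload standing in for a real request handler.
rps = measure_throughput(lambda: sum(range(100)))
print(f"throughput: {rps:,.0f} ops/sec")
```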
Latency vs Throughput: The Relationship
These two metrics are often at odds:
| Scenario | Latency | Throughput |
|---|---|---|
| Process one request at a time | Low | Low |
| Batch many requests together | High | High |
| Parallel processing | Medium | High |
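The middle row of this table is worth making concrete. Here's a minimal simulation, with made-up per-call overhead and per-item costs, showing why batching raises throughput while also raising individual latency: the fixed overhead is amortized across the batch, but the first item has to wait for the whole batch to finish.

```python
import time

PER_ITEM_COST_S = 0.0005     # assumed work per item
PER_CALL_OVERHEAD_S = 0.005  # assumed fixed cost per call (e.g., a round trip)

def process(batch):
    """Simulate one call: fixed overhead plus per-item work."""
    time.sleep(PER_CALL_OVERHEAD_S + PER_ITEM_COST_S * len(batch))

items = list(range(100))

# One at a time: every item pays the full overhead, but waits only for itself.
start = time.perf_counter()
for item in items:
    process([item])
one_by_one_s = time.perf_counter() - start

# One batch: overhead is paid once, but the first item waits for all 100.
start = time.perf_counter()
process(items)
batched_s = time.perf_counter() - start

print(f"one-by-one: {len(items) / one_by_one_s:>6.0f} items/sec")
print(f"batched:    {len(items) / batched_s:>6.0f} items/sec")
```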
The Highway Analogy
Think of a highway:
- Latency = Time for one car to travel from A to B
- Throughput = Number of cars passing a point per hour
A wider highway (more lanes) increases throughput but doesn't change latency for individual cars.
Optimizing for Latency
Strategies to reduce latency:
- Caching: Store frequently accessed data closer to the user (sketched after this list)
- CDNs: Serve content from geographically closer locations
- Connection pooling: Reuse existing connections
- Async processing: Don't wait for non-critical operations
- Database indexing: Speed up query execution
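To make the first strategy concrete, here's a minimal caching sketch using Python's built-in `functools.lru_cache`. The `get_user_profile` function is hypothetical, and the `sleep` stands in for a slow database or network lookup:

```python
import functools
import time

@functools.lru_cache(maxsize=1024)
def get_user_profile(user_id):
    time.sleep(0.05)  # stand-in for a database or network lookup
    return {"id": user_id, "name": f"user-{user_id}"}

start = time.perf_counter()
get_user_profile(42)  # cold: pays the full ~50 ms lookup
cold_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
get_user_profile(42)  # warm: served from memory in microseconds
warm_ms = (time.perf_counter() - start) * 1000

print(f"cold: {cold_ms:.1f} ms, warm: {warm_ms:.3f} ms")
```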
Optimizing for Throughput
Strategies to increase throughput:
- Horizontal scaling: Add more servers
- Load balancing: Distribute requests efficiently
- Batching: Process multiple items together
- Compression: Reduce data size for transfer
- Parallel processing: Use multiple threads/processes (sketched after this list)
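Here's a minimal sketch of the last strategy, simulating I/O-bound request handling with a thread pool (`handle_request` and the worker count are assumptions). Note that each request's latency is unchanged; only throughput improves:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i):
    """Simulated I/O-bound request (e.g., waiting on a downstream service)."""
    time.sleep(0.05)
    return i

requests = range(40)

# Sequential: total time is roughly n * per-request latency.
start = time.perf_counter()
for r in requests:
    handle_request(r)
sequential_s = time.perf_counter() - start

# Parallel: per-request latency is unchanged, but many run at once.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(handle_request, requests))
parallel_s = time.perf_counter() - start

print(f"sequential: {len(requests) / sequential_s:.0f} req/s")
print(f"parallel:   {len(requests) / parallel_s:.0f} req/s")
```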
Trade-offs in Practice
Example: Database Writes
| Approach | Latency | Throughput |
|---|---|---|
| Write immediately to disk (synchronous) | High | Low |
| Write to buffer, flush periodically | Low | High |
| Write-ahead logging | Medium | Medium-High |
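A toy sketch of the second row, with a `sleep` standing in for one synchronous disk write (the `BufferedWriter` class and flush threshold are made up for illustration). Writes return immediately and the disk cost is amortized across the batch; the trade-off is that buffered entries are lost if the process crashes before a flush:

```python
import time

class BufferedWriter:
    """Writes return immediately; the slow disk write happens per batch."""

    def __init__(self, flush_every=100):
        self.buffer = []
        self.flush_every = flush_every

    def write(self, record):
        self.buffer.append(record)  # fast in-memory append: low latency
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        time.sleep(0.01)  # stand-in for one synchronous disk write
        self.buffer.clear()

writer = BufferedWriter()
start = time.perf_counter()
for i in range(1000):
    writer.write(i)
writer.flush()  # flush whatever is left in the buffer
elapsed = time.perf_counter() - start
print(f"{1000 / elapsed:,.0f} writes/sec")  # vs ~100/sec if every write slept
```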
Example: API Design
| Approach | Latency | Throughput |
|---|---|---|
| Single item per request | Low | Low |
| Batch API (many items per request) | Higher | Much higher |
Interview Tips
- Know the numbers: Memorize key latency values (memory vs disk vs network)
- Identify bottlenecks: Which component has the highest latency?
- Discuss trade-offs: When would you sacrifice latency for throughput?
- Consider percentiles: p50, p95, p99 latencies matter more than averages (see the sketch after this list)
- Think about tail latency: The slowest requests impact user experience
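To see why percentiles beat averages, here's a small sketch over synthetic latencies (the distribution is made up, and `percentile` is a simple nearest-rank implementation): a healthy-looking mean can hide a slow tail that 1 in 100 users actually hits.

```python
import random

def percentile(samples, p):
    """Nearest-rank percentile: value at or below which p% of samples fall."""
    ordered = sorted(samples)
    index = max(0, int(len(ordered) * p / 100) - 1)
    return ordered[index]

# Synthetic latencies: 95% fast, 5% slow tail (made-up distribution).
latencies_ms = ([random.gauss(20, 5) for _ in range(950)]
                + [random.gauss(200, 50) for _ in range(50)])

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean: {mean:.0f} ms")  # the average looks reasonable...
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p):.0f} ms")  # ...the p99 does not
```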
Summary
| Metric | Measures | Goal | Unit |
|---|---|---|---|
| Latency | Speed of single operation | Minimize | ms, μs |
| Throughput | System capacity | Maximize | RPS, MB/s |
Both metrics matter—the right balance depends on your specific use case and user expectations.