
Latency vs Throughput

Understanding the difference between latency and throughput in system design.


Two of the most important performance metrics in system design are latency and throughput. Understanding their relationship is crucial for building high-performance systems.


What is Latency?

Latency is the time it takes for a single request to travel from the client to the server and back. It's measured in time units (milliseconds, seconds).
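
As a rough illustration, you can measure the latency of a single operation with a monotonic clock. A minimal sketch in Python, where `fetch_user` is a hypothetical stand-in for a real network or database call:

```python
import time

def fetch_user(user_id):
    # Hypothetical stand-in for a real network or database call.
    time.sleep(0.02)  # simulate ~20 ms of work
    return {"id": user_id}

start = time.perf_counter()                      # monotonic, high-resolution clock
fetch_user(42)
latency_ms = (time.perf_counter() - start) * 1000
print(f"latency: {latency_ms:.1f} ms")           # ~20 ms
```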

Types of Latency

| Type | Description | Typical Values |
| --- | --- | --- |
| Network Latency | Time for data to travel over the network | 1-100 ms |
| Disk Latency | Time to read/write to storage | 0.1-10 ms (SSD), 5-20 ms (HDD) |
| Memory Latency | Time to access RAM | ~100 ns |
| Processing Latency | Time to compute/process | Varies |

Latency Numbers Every Developer Should Know

L1 cache reference:                    0.5 ns
L2 cache reference:                    7 ns
Main memory reference:                 100 ns
SSD random read:                       150,000 ns (150 μs)
HDD seek:                              10,000,000 ns (10 ms)
Send packet CA → Netherlands → CA:    150,000,000 ns (150 ms)

What is Throughput?

Throughput is the number of operations or amount of data processed per unit of time. It measures the capacity of your system.
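
Measuring throughput is the mirror image: count completed operations over a fixed time window. A minimal sketch, again with a hypothetical `handle_request` stand-in:

```python
import time

def handle_request():
    # Hypothetical stand-in for real work (~1 ms each).
    time.sleep(0.001)

start = time.perf_counter()
completed = 0
while time.perf_counter() - start < 1.0:   # measure over a 1-second window
    handle_request()
    completed += 1
print(f"throughput: {completed} requests/second")
```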

Common Throughput Metrics

  • Requests per second (RPS): API throughput
  • Queries per second (QPS): Database throughput
  • Megabytes per second (MB/s): Data transfer rate
  • Transactions per second (TPS): Payment/order systems

Latency vs Throughput: The Relationship

These two metrics are often at odds:

| Scenario | Latency | Throughput |
| --- | --- | --- |
| Process one request at a time | Low | Low |
| Batch many requests together | High | High |
| Parallel processing | Medium | High |
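
A back-of-the-envelope calculation makes the batching row concrete: batching amortizes a fixed per-dispatch cost, which raises throughput, but every item waits for the whole batch, which raises latency. All numbers below are illustrative assumptions:

```python
fixed_cost_ms = 10.0   # assumed per-dispatch overhead (network, syscall, commit)
per_item_ms = 1.0      # assumed work per item
items = 100

# One at a time: every item pays the fixed cost, but finishes immediately.
latency_single = fixed_cost_ms + per_item_ms         # 11 ms per item
throughput_single = 1000 / latency_single            # ~91 items/second

# One batch of 100: the fixed cost is paid once, but the last item
# waits for the entire batch to complete.
batch_time_ms = fixed_cost_ms + items * per_item_ms  # 110 ms total
latency_batch = batch_time_ms                        # up to 110 ms per item
throughput_batch = items * 1000 / batch_time_ms      # ~909 items/second
```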

The Highway Analogy

Think of a highway:

  • Latency = Time for one car to travel from A to B
  • Throughput = Number of cars passing a point per hour

A wider highway (more lanes) increases throughput but doesn't change latency for individual cars.
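
The analogy can be made quantitative with Little's Law, a standard queueing result: the number of requests in flight equals throughput times latency. With assumed numbers:

```python
latency_s = 0.050      # assume each request takes 50 ms
in_flight = 10         # assume 10 requests in flight at once (the "lanes")

# Little's Law: in_flight = throughput * latency
throughput = in_flight / latency_s   # 200 requests/second
```

Doubling the lanes doubles throughput to 400 requests/second; each individual request still takes 50 ms.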


Optimizing for Latency

Strategies to reduce latency:

  1. Caching: Store frequently accessed data closer to the user (see the sketch after this list)
  2. CDNs: Serve content from geographically closer locations
  3. Connection pooling: Reuse existing connections
  4. Async processing: Don't wait for non-critical operations
  5. Database indexing: Speed up query execution
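
As an example of strategy 1, the simplest cache is memoizing a hot lookup in process memory. A minimal sketch using Python's standard library, with a hypothetical `load_profile` standing in for a slow fetch:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)              # keep up to 1024 recent results in memory
def load_profile(user_id):
    # Hypothetical stand-in for a slow database or network fetch.
    time.sleep(0.05)                  # simulate ~50 ms of latency
    return {"id": user_id}

load_profile(7)   # first call: pays the full ~50 ms
load_profile(7)   # repeat call: served from memory in microseconds
```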

Optimizing for Throughput

Strategies to increase throughput:

  1. Horizontal scaling: Add more servers
  2. Load balancing: Distribute requests efficiently
  3. Batching: Process multiple items together
  4. Compression: Reduce data size for transfer
  5. Parallel processing: Use multiple threads/processes (see the sketch after this list)
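
As a sketch of strategy 5, I/O-bound calls can be overlapped with a thread pool so total throughput scales with the number of workers, while per-call latency stays the same. `call_service` is a hypothetical stand-in:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_service(i):
    # Hypothetical stand-in for an I/O-bound call (e.g., a downstream HTTP request).
    time.sleep(0.05)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(call_service, range(100)))
elapsed = time.perf_counter() - start
# 100 calls of ~50 ms overlap across 10 workers: ~0.5 s total instead of
# ~5 s sequentially; each individual call still takes ~50 ms.
print(f"{len(results)} calls in {elapsed:.2f} s")
```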

Trade-offs in Practice

Example: Database Writes

| Approach | Latency | Throughput |
| --- | --- | --- |
| Write immediately to disk | High (sync) | Low |
| Write to buffer, flush periodically | Low | High |
| Write-ahead logging | Medium | Medium-High |
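
The middle row might look like the following sketch: writes land in an in-memory buffer and hit disk in batches. The threshold and file path are illustrative, and a real system would also flush on a timer and protect the buffer against crashes (which is exactly the gap a write-ahead log closes):

```python
class BufferedWriter:
    """Illustrative sketch: trade per-write disk cost for throughput."""

    def __init__(self, path="data.log", flush_threshold=100):
        self.path = path
        self.buffer = []
        self.flush_threshold = flush_threshold

    def write(self, record):
        self.buffer.append(record)               # fast: append in memory
        if len(self.buffer) >= self.flush_threshold:
            self.flush()                         # occasional slower disk write

    def flush(self):
        with open(self.path, "a") as f:          # one batched write for many records
            f.writelines(r + "\n" for r in self.buffer)
        self.buffer.clear()

w = BufferedWriter()
for i in range(250):
    w.write(f"record-{i}")   # two automatic flushes happen along the way
w.flush()                    # drain the remaining 50 records
```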

Example: API Design

| Approach | Latency | Throughput |
| --- | --- | --- |
| Single item per request | Low | Low |
| Batch API (many items per request) | Higher | Much higher |

Interview Tips

  • Know the numbers: Memorize key latency values (memory vs disk vs network)
  • Identify bottlenecks: Which component has the highest latency?
  • Discuss trade-offs: When would you sacrifice latency for throughput?
  • Consider percentiles: p50, p95, p99 latencies matter more than averages (see the sketch after this list)
  • Think about tail latency: The slowest requests impact user experience
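
Percentiles are easy to compute from raw latency samples with the standard library. A minimal sketch with made-up samples that include a slow tail:

```python
import random
import statistics

# Made-up latency samples (ms): mostly fast, with a slow 1% tail.
samples = [random.gauss(20, 5) for _ in range(990)]
samples += [random.uniform(200, 500) for _ in range(10)]

cuts = statistics.quantiles(samples, n=100)   # 99 percentile cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
mean = statistics.mean(samples)
# The mean hides the tail; p99 exposes the slow 1% that users actually feel.
print(f"mean={mean:.1f} ms  p50={p50:.1f}  p95={p95:.1f}  p99={p99:.1f}")
```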

Summary

| Metric | Measures | Goal | Unit |
| --- | --- | --- | --- |
| Latency | Speed of single operation | Minimize | ms, μs |
| Throughput | System capacity | Maximize | RPS, MB/s |

Both metrics matter; the right balance depends on your specific use case and user expectations.