Question 1

What is a latency budget in software engineering?

Accepted Answer

A latency budget is the maximum time allocated to each component in a request's lifecycle so that the total end-to-end response time meets your SLA. For example, if your API must respond in 500 ms, you might allocate 50 ms to the network, 100 ms to the database, and 30 ms to auth, which allocates 180 ms and leaves the remaining 320 ms as headroom. It forces explicit ownership of latency across teams.
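The arithmetic in that example can be checked with a few lines of Python. This is only a sketch: the component names and values come from the example above, while the names `SLA_MS`, `allocations_ms`, and `remaining_budget` are made up for illustration.

```python
# Allocation from the example above; all values are in milliseconds.
SLA_MS = 500
allocations_ms = {"network": 50, "database": 100, "auth": 30}


def remaining_budget(sla_ms, allocations):
    """Return the time left after subtracting every component allocation."""
    return sla_ms - sum(allocations.values())


allocated = sum(allocations_ms.values())
remaining = remaining_budget(SLA_MS, allocations_ms)
print(f"allocated={allocated} ms, remaining={remaining} ms")
# prints: allocated=180 ms, remaining=320 ms
```

A negative result from `remaining_budget` would mean the components are already over-committed against the SLA.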

Question 2

What is a good latency target for a web API?

Accepted Answer

Targets vary by use case; there is no single industry standard. Interactive web APIs typically target a p99 between 200 and 500 ms. Payment flows and real-time communication often target under 100 ms. LLM-backed APIs commonly accept 1–5 seconds. The right target depends on your users' expectations: A/B test your specific product to find where latency meaningfully affects conversion.

Question 3

How much headroom should I leave in a latency budget?

Accepted Answer

A 10–20% unallocated buffer is a common rule of thumb. It absorbs tail-latency spikes, GC pauses, cold starts, and unexpected downstream degradation. For critical paths with strict SLAs, lean toward 20%. For internal tooling with softer targets, 10% is acceptable. If you have less than 5% headroom, your SLA is at significant risk.
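As a rough sketch, the rule of thumb can be turned into a small check. The 5%, 10%, and 20% thresholds come from the answer above; the function names, the sample allocation, and the label strings are assumptions made for illustration. In particular, the answer does not say what to call the 5–10% range, so "below rule of thumb" is a guess.

```python
def headroom_pct(sla_ms, allocations):
    """Unallocated share of the SLA, as a percentage."""
    unallocated = sla_ms - sum(allocations.values())
    return 100.0 * unallocated / sla_ms


def assess(pct):
    """Map a headroom percentage to a label (label wording is an assumption)."""
    if pct < 5:
        return "SLA at significant risk"
    if pct < 10:
        return "below rule of thumb"
    if pct <= 20:
        return "within rule of thumb"
    return "above rule of thumb"


# Hypothetical 200 ms budget split across three made-up components.
example = {"gateway": 20, "service_a": 90, "service_b": 60}
pct = headroom_pct(200, example)
print(f"{pct:.1f}% headroom: {assess(pct)}")
# prints: 15.0% headroom: within rule of thumb
```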

Question 4

Should I budget for p50, p95, or p99 latency?

Accepted Answer

Budget for p99 or higher on SLA-critical paths. The median (p50) hides the tail: if one request in a hundred blows your budget, the user behind that request experiences a failure. A p99 target means 99% of requests complete at or below that latency. For very high-throughput systems (millions of requests per day), p99.9 (often written p999) may be worth tracking too.
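To see how the median can hide the tail, here is a minimal sketch that computes percentiles with the nearest-rank method. The sample data is invented, and `percentile` is a hypothetical helper; monitoring tools may use interpolating definitions that give slightly different numbers.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Invented latencies in ms: 98 fast requests and 2 slow ones.
latencies_ms = [40] * 98 + [120, 900]

print("p50 :", percentile(latencies_ms, 50))    # 40
print("p99 :", percentile(latencies_ms, 99))    # 120
print("p100:", percentile(latencies_ms, 100))   # 900
```

In this sample the median looks healthy at 40 ms, while the p99 is three times higher and the single worst request is far beyond it.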

Question 5

How do I measure each component's latency contribution?

Accepted Answer

Use distributed tracing. OpenTelemetry is the de facto standard, and its traces can be exported to backends such as Jaeger, Tempo, and Datadog APM. Instrument each service boundary with a span; the trace waterfall view then shows the measured duration of each component. For database queries, enable slow query logging and use EXPLAIN ANALYZE. Once you have p99 measurements per component, plug them directly into this calculator.
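The idea of one span per component can be illustrated without any tracing backend. The sketch below is a standard-library stand-in, not OpenTelemetry's API: `span`, `durations_ms`, and `handle_request` are hypothetical names, and the `time.sleep` calls merely simulate work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Collected durations per component name, in milliseconds.
durations_ms = defaultdict(list)


@contextmanager
def span(name):
    """Time the enclosed block and record its duration under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        durations_ms[name].append(elapsed_ms)


def handle_request():
    """Simulated request with two components (sleep stands in for real work)."""
    with span("auth"):
        time.sleep(0.001)
    with span("database"):
        time.sleep(0.002)


for _ in range(5):
    handle_request()

for name, values in durations_ms.items():
    print(f"{name}: n={len(values)}, max={max(values):.2f} ms")
```

A real tracing setup adds what this sketch lacks: propagation of trace context across services, so that spans from different processes line up in one waterfall.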

Latency Budget Calculator

How to Calculate a Latency Budget

Formula

Example Latency Budget Calculations

Example 1 — Standard REST API with database (500 ms SLA)

Example 2 — Microservices fan-out (200 ms SLA)

Example 3 — LLM-backed API (3000 ms SLA)

Tips for Managing Your Latency Budget

Notes

Frequently Asked Questions