Hash Collision Probability Calculator
Enter the number of items and your hash function's bit size to instantly calculate the probability of at least one collision. Covers CRC32, SHA-256, and any fixed-width hash.
Last updated: April 2026
This calculator is designed for real-world usage based on typical engineering scenarios and publicly available documentation.
A hash collision probability calculator tells you how likely it is that two items in your dataset will share the same hash value. That risk grows faster than most engineers expect. The math is the same as the birthday paradox: in a group of just 23 people, there is a greater-than-even chance that two share a birthday. Hash functions face the same statistical pressure.
This tool is useful for anyone picking a hash function for a cache key, a distributed shard key, a URL fingerprint, or a deduplication lookup. Assuming the hash spreads values uniformly, the collision probability depends on just two things: the number of items you're hashing and the output width of your hash function in bits. Plugging those in gives you a close estimate of the probability before you commit to a design.
Developers often underestimate how quickly a 32-bit hash exhausts its collision budget. CRC32 reaches a 50% collision probability at around 77,000 items, well within the scale of a busy web service's session store or URL cache. A 64-bit hash pushes that threshold to roughly 5 billion items, and SHA-256 (256-bit) is astronomically safe for any practical dataset.
Use this calculator alongside your load and capacity planning. If your hash function turns out to be too narrow, pair this result with the compression ratio or payload size calculators to model the cost of widening your keys.
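The sketch below is a quick way to check the 77,000-item figure. It is a hypothetical helper that assumes random 16-byte keys. It hashes the keys with Python's built-in `zlib.crc32` until two different keys share a value. Real keys such as sequential IDs or URLs are structured, and because CRC32 is a linear code they may not behave like random inputs. Treat this as an illustration of the birthday bound, not a benchmark.

```python
import random
import zlib
from typing import Optional


def first_crc32_collision(seed: int = 0, key_len: int = 16,
                          limit: int = 1_000_000) -> Optional[int]:
    """Hash random keys with CRC32 until two distinct keys share a value.

    Returns how many keys had been hashed when the first collision appeared,
    or None if no collision occurred within `limit` keys.
    """
    rng = random.Random(seed)         # fixed seed so each run is repeatable
    seen = {}                         # CRC32 value -> first key that produced it
    for count in range(1, limit + 1):
        key = rng.randbytes(key_len)  # random key bytes (Python 3.9+)
        h = zlib.crc32(key)
        other = seen.get(h)
        if other is not None and other != key:
            return count              # two different keys, same 32-bit hash
        seen[h] = key
    return None


if __name__ == "__main__":
    for seed in range(5):
        n = first_crc32_collision(seed)
        print(f"seed {seed}: first CRC32 collision after {n} keys")
```

With random keys, the first collision typically appears somewhere between the tens of thousands and a couple hundred thousand keys. The median is near 77,000.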
How to Calculate Hash Collision Probability
1. Select your hash function's output size in bits, for example 32 for CRC32, 64 for FNV-64 or xxHash64, 128 for MurmurHash3 (its 128-bit variant), or 256 for SHA-256.
2. Enter the number of items you plan to hash: rows in a table, cache keys, session IDs, or URLs.
3. The calculator computes the hash space H = 2^bits, the total count of distinct hash values your function can produce.
4. It applies the standard birthday-problem approximation: P = 1 − e^(−n(n−1) ÷ (2H)).
5. The result is the probability that at least one pair of items collides, i.e., two different inputs produce an identical hash output.
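The steps above map onto a few lines of code. This is a minimal Python sketch, not the calculator's actual source, and the function name `collision_probability` is hypothetical. It assumes uniform hashing and uses `math.expm1`, because computing 1 − e^(−x) directly rounds to exactly zero for the tiny exponents that 128- and 256-bit hashes produce.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Probability that at least one pair among n hashed items collides.

    Uses the birthday approximation P = 1 - e^(-n(n-1) / (2H)), H = 2**bits,
    assuming every hash value is equally likely.
    """
    if n < 2:
        return 0.0                      # zero or one item cannot collide
    h = 2.0 ** bits                     # hash space size H
    exponent = n * (n - 1) / (2.0 * h)  # expected number of colliding pairs
    # -expm1(-x) equals 1 - e^(-x) but stays accurate when x is tiny;
    # 1 - math.exp(-4e-54) would round to exactly 0.0 in double precision.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for n, bits in [(77_163, 32), (1_000_000, 64), (10**12, 256)]:
        print(f"n={n:,} bits={bits}: P = {collision_probability(n, bits):.3g}")
```

The three cases in the `__main__` block are the worked examples on this page. They come out at roughly 0.5, 2.71 × 10^−8, and 4.32 × 10^−54.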
Formula
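P(collision) ≈ 1 − e^(−n(n−1) / (2H))
where:
n = number of items hashed
H = hash space size, H = 2^bits
bits = hash function output width (e.g. 32, 64, 128, 256)
e = Euler's number (≈ 2.71828)
The exponential form approximates the exact birthday probability, 1 − ∏(1 − k/H) for k = 0, 1, …, n−1, and is very accurate when n is much smaller than H.
Small-probability approximation (valid when n² is much smaller than 2H, i.e. while P itself is small): P ≈ n² / (2H)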
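The sketch below is a sanity check on the formula. It compares the exact birthday product 1 − ∏(1 − k/H) with the exponential form and with the n²/(2H) shortcut. The function names are illustrative, not part of the calculator. The exact version sums `log1p` terms instead of multiplying thousands of factors directly.

```python
import math


def exact_probability(n: int, h: int) -> float:
    """Exact birthday probability: 1 - prod_{k=0}^{n-1} (1 - k/h)."""
    if n > h:
        return 1.0  # pigeonhole: more items than hash values forces a collision
    # log of the no-collision probability, accumulated term by term
    log_no_collision = sum(math.log1p(-k / h) for k in range(n))
    return -math.expm1(log_no_collision)


def exponential_approx(n: int, h: int) -> float:
    """P ~ 1 - e^(-n(n-1)/(2h)); accurate when n is much smaller than h."""
    return -math.expm1(-n * (n - 1) / (2 * h))


def quadratic_approx(n: int, h: int) -> float:
    """P ~ n^2/(2h); only valid while the probability itself is small."""
    return n * n / (2 * h)


if __name__ == "__main__":
    # (23, 365) is the classic birthday case; the others use a 32-bit hash space.
    for n, h in [(23, 365), (9_300, 2**32), (77_163, 2**32)]:
        print(f"n={n:,} H={h:,}: exact={exact_probability(n, h):.4f} "
              f"exp={exponential_approx(n, h):.4f} "
              f"quad={quadratic_approx(n, h):.4f}")
```

At 9,300 items in a 32-bit space, all three agree near 1%. At 77,163 items the exact and exponential values are both about 0.5, while n²/(2H) overshoots to about 0.69. That is why the shortcut should be used only for small probabilities.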
Example Hash Collision Probability Calculations
Example 1: CRC32 (32-bit) at the 50% collision point
n = 77,163 items
H = 2^32 = 4,294,967,296
exponent = 77,163 × 77,162 ÷ (2 × 4,294,967,296)
= 5,954,051,406 ÷ 8,589,934,592
≈ 0.6931
P = 1 − e^(−0.6931) ≈ 50.0%
→ With just 77,000 items, a CRC32-keyed cache has a coin-flip chance of a collision.
Example 2: FNV-64 (64-bit) with 1 million items
n = 1,000,000 items
H = 2^64 ≈ 1.8447 × 10^19
exponent = n(n−1) ÷ (2H) ≈ (1 × 10^6)² ÷ (2 × 1.8447 × 10^19)
= (1 × 10^12) ÷ (3.689 × 10^19)
≈ 2.71 × 10^−8
P = 1 − e^(−2.71 × 10^−8) ≈ 2.71 × 10^−8 ≈ 0.0000027%, about 1 in 37 million (for small x, 1 − e^(−x) ≈ x)
→ A 64-bit hash with a million keys is effectively collision-free in practice.
Example 3: SHA-256 (256-bit) with 1 trillion items
n = 1 × 10^12 items
H = 2^256 ≈ 1.158 × 10^77
exponent ≈ (1 × 10^12)² ÷ (2 × 1.158 × 10^77)
= (1 × 10^24) ÷ (2.316 × 10^77)
≈ 4.32 × 10^−54
P ≈ 4.32 × 10^−54 (astronomically close to zero)
→ Accidental SHA-256 collisions are negligible for all practical dataset sizes, including planetary-scale systems.
Tips for Choosing a Hash Function and Managing Collision Risk
- Use a 64-bit hash (xxHash64, SipHash, FNV-64) for most in-memory hash-table and cache-key use cases. Collision probability is about 1 in 37 million at a million items, but it climbs to roughly 2.7% at a billion items, so go wider if you expect datasets that large.
- Avoid CRC32 as a uniqueness key. It saturates quickly: collision probability hits 1% at ~9,300 items and 50% at ~77,000 items. CRC32 is a checksum, not a unique fingerprint.
- For deduplication or content addressing (storing files, blobs, or commits), use SHA-256 or BLAKE3. Their 256-bit output makes accidental collisions negligible even at petabyte scale.
- The birthday paradox bites harder than intuition suggests. The 50% collision threshold for a b-bit hash is roughly √(2 × ln 2 × 2^b) ≈ 1.18 × 2^(b/2). That is on the order of the square root of the hash space, far fewer items than 2^b.
- In hash tables that store and compare the full key (chaining or open addressing), collisions slow lookups but don't lose data. In caches or sets that use the hash itself as the unique key, a collision silently returns the wrong value or treats two distinct items as one. Treat collision probability as a correctness risk, not just a performance risk.
- If you're hashing user-controlled input (e.g. HTTP request routing, map keys in a service), use a keyed hash like SipHash to prevent hash-flooding DoS attacks. A low accidental collision rate alone is not sufficient.
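The item-count thresholds quoted in these tips come from inverting the collision formula for n. Below is a minimal sketch that assumes uniform hashing and uses n² in place of n(n−1). The helper name `items_for_probability` is hypothetical.

```python
import math


def items_for_probability(p: float, bits: int) -> float:
    """Approximate item count n at which collision probability reaches p.

    Inverts P ~ 1 - e^(-n^2 / (2H)) with H = 2**bits:
    n ~ sqrt(2H * ln(1 / (1 - p))).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    h = 2.0 ** bits
    # -log1p(-p) == ln(1 / (1 - p)), computed accurately for small p
    return math.sqrt(2.0 * h * -math.log1p(-p))


if __name__ == "__main__":
    print(f"CRC32, 1%:   ~{items_for_probability(0.01, 32):,.0f} items")
    print(f"CRC32, 50%:  ~{items_for_probability(0.50, 32):,.0f} items")
    print(f"64-bit, 50%: ~{items_for_probability(0.50, 64):.3g} items")
```

`items_for_probability(0.5, 32)` comes out near 77,163 and `items_for_probability(0.01, 32)` near 9,290, matching the CRC32 figures above. For a 64-bit hash, the 50% point is about 5.06 × 10^9 items.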
Notes
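- Results are estimates that assume the hash function spreads inputs uniformly across its output space. Real hash functions and highly structured inputs can deviate from this.
- Always validate against your production environment.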