The minimum number of nodes in a distributed system that must agree on an operation for it to be considered successful, ensuring consistency despite failures
TL;DR
A quorum is the minimum number of nodes required to perform an operation in a distributed system. Typically quorum = (N/2) + 1, where N is the total number of replicas, ensuring a majority agreement while tolerating minority failures. This fundamental technique enables tunable consistency in systems like Cassandra, DynamoDB, and distributed consensus protocols.
Visual Overview
QUORUM READS & WRITES (N=5 replicas, Quorum=3)
┌──────────────────────────────────────────────────────┐
│ Write Operation: SET key="value" with W=3            │
│                                                      │
│ Client sends write to 5 replicas:                    │
│  ┌────┐  ┌────┐  ┌────┐  ┌────┐  ┌────┐              │
│  │ R1 │  │ R2 │  │ R3 │  │ R4 │  │ R5 │              │
│  └────┘  └────┘  └────┘  └────┘  └────┘              │
│    ✓       ✓       ✓       ✗       ✗                 │
│  Success Success Success Timeout Failed              │
│                                                      │
│ W=3 acknowledgments received ✓                       │
│ Write is SUCCESSFUL (quorum reached)                 │
│                                                      │
│ Read Operation: GET key with R=3                     │
│  ┌────┐  ┌────┐  ┌────┐  ┌────┐  ┌────┐              │
│  │ R1 │  │ R2 │  │ R3 │  │ R4 │  │ R5 │              │
│  └────┘  └────┘  └────┘  └────┘  └────┘              │
│   v=42    v=42    v=42   (down)  (down)              │
│    ✓       ✓       ✓                                 │
│                                                      │
│ R=3 responses received ✓                             │
│ Return value=42 (majority agrees)                    │
│                                                      │
│ Guarantees:                                          │
│ - If R + W > N: Reads see latest write               │
│ - If R=3, W=3, N=5: 3+3 > 5 ✓ (strong consistency)   │
│ - Tolerate failures: Up to N-W for writes            │
│                      Up to N-R for reads             │
└──────────────────────────────────────────────────────┘
QUORUM INTERSECTION (Why R+W>N guarantees consistency)
┌────────────────────────────────────────────────┐
│ N=5 replicas, W=3, R=3                         │
│                                                │
│ Write quorum (3 nodes):                        │
│ [R1] [R2] [R3]  R4   R5                        │
│   ✓    ✓    ✓                                  │
│                                                │
│ Read quorum (3 nodes):                         │
│  R1  [R2] [R3] [R4]  R5                        │
│        ✓    ✓    ✓                             │
│                                                │
│ Overlap: R2, R3 (at least one node in both)    │
│    ↓                                           │
│ Read MUST see the latest write!                │
│ (because at least one read replica has latest) │
│                                                │
│ Mathematical proof:                            │
│   R + W > N                                    │
│   3 + 3 > 5 ✓                                  │
│   Overlap size: R + W - N = 3 + 3 - 5 = 1      │
│   (At least 1 node must be in both quorums)    │
└────────────────────────────────────────────────┘
EVENTUAL CONSISTENCY (R+W ≤ N)
┌────────────────────────────────────────────────┐
│ N=5 replicas, W=1, R=1 (no overlap guarantee)  │
│                                                │
│ Write quorum (1 node):                         │
│ [R1]  R2   R3   R4   R5                        │
│   ✓                                            │
│                                                │
│ Read quorum (1 node):                          │
│  R1   R2   R3  [R4]  R5                        │
│                  ✓                             │
│                                                │
│ Overlap: NONE! (read might miss the write)     │
│    ↓                                           │
│ Eventual consistency only                      │
│ (replicas sync eventually via anti-entropy)    │
│                                                │
│ Trade-off:                                     │
│ + Faster (W=1, R=1 vs W=3, R=3)                │
│ + Higher availability (tolerate more failures) │
│ - Might read stale data                        │
└────────────────────────────────────────────────┘
Core Explanation
What is a Quorum?
A quorum is the minimum number of nodes in a distributed system that must agree on an operation for it to be considered successful. The classic quorum formula is:
Quorum = floor(N/2) + 1
Where N = total number of replicas
Examples:
N=3 → Quorum = 2 (majority)
N=5 → Quorum = 3 (majority)
N=7 → Quorum = 4 (majority)
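
These values are easy to sanity-check in code. A one-line Python helper (the function name is ours, for illustration) computes the majority quorum:

def majority_quorum(n: int) -> int:
    """Smallest number of nodes that forms a majority of n replicas."""
    return n // 2 + 1  # floor(N/2) + 1

for n in (3, 5, 7):
    print(f"N={n} -> Quorum = {majority_quorum(n)}")  # 2, 3, 4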
Why Majority?
A majority quorum ensures that any two quorums must overlap by at least one node, preventing split-brain scenarios and ensuring consistency.
Quorum Variants
1. Read Quorum (R) & Write Quorum (W)
Tunable quorums allow trading off consistency vs availability:
Configuration Examples (N=5):
Strong Consistency:
R=3, W=3 (R+W > N)
- Reads always see latest write
- Requires 3 nodes for any operation
- Availability: Tolerate 2 failures
Eventual Consistency:
R=1, W=1 (R+W ≤ N)
- Reads may see stale data
- Fastest operations
- Availability: Tolerate 4 failures (only need 1 node)
Write-Heavy Optimization:
R=4, W=2 (R+W > N)
- Fast writes (only 2 acks needed)
- Slower reads (need 4 responses)
- Good for write-heavy workloads
Read-Heavy Optimization:
R=1, W=5 (R+W > N)
- Fast reads (only 1 response needed)
- Slower writes (all nodes must ack)
- Good for read-heavy workloads
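
The trade-offs above can be computed mechanically. A small sketch (function name ours) classifies any (N, R, W) configuration and reports its fault tolerance, reproducing the four examples:

def describe_config(n: int, r: int, w: int) -> str:
    """Classify an (N, R, W) quorum configuration."""
    mode = "strong (R+W > N)" if r + w > n else "eventual (R+W <= N)"
    return (f"N={n}, R={r}, W={w}: {mode}; "
            f"writes tolerate {n - w} failures, reads tolerate {n - r}")

for r, w in [(3, 3), (1, 1), (4, 2), (1, 5)]:
    print(describe_config(5, r, w))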
2. Strict Quorum vs Sloppy Quorum
STRICT QUORUM:
┌───────────────────────────────────────┐
│ N=3 replicas: [A, B, C]               │
│                                       │
│ Write must go to A, B, or C only      │
│ If 2 nodes down → write FAILS ✗       │
│                                       │
│ Guarantees: Consistent membership     │
│ Trade-off: Lower availability         │
└───────────────────────────────────────┘
SLOPPY QUORUM (Hinted Handoff):
┌───────────────────────────────────────┐
│ Preferred nodes: [A, B, C]            │
│ Fallback nodes:  [D, E]               │
│                                       │
│ If A, B down:                         │
│   Write to C, D, E (sloppy quorum) ✓  │
│   Hint: "This belongs to A"           │
│                                       │
│ When A recovers:                      │
│   D and E send hinted data to A       │
│                                       │
│ Guarantees: High availability         │
│ Trade-off: Temporary inconsistency    │
│                                       │
│ Used by: Cassandra, Riak, DynamoDB    │
└───────────────────────────────────────┘
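
To make the hinted-handoff flow concrete, here is a deliberately simplified Python model (node names, the hint queue, and function names are all illustrative, not any system's real API): a write that cannot reach a preferred node lands on a fallback node along with a hint naming the intended owner, and hints are replayed when the owner recovers.

preferred, fallback = ["A", "B", "C"], ["D", "E"]
down = {"A", "B"}                     # preferred nodes currently unreachable
hints: list[tuple[str, str]] = []     # (intended_owner, value) queued on fallbacks

def sloppy_write(value: str, w: int = 3) -> list[str]:
    """Accept a write on w nodes, using fallbacks for down owners."""
    targets = [n for n in preferred if n not in down]
    for owner in (n for n in preferred if n in down):
        if len(targets) >= w:
            break
        stand_in = fallback[len(hints) % len(fallback)]  # pick a fallback node
        hints.append((owner, value))  # hint: "this belongs to <owner>"
        targets.append(stand_in)
    return targets                    # nodes that accepted the write

def on_recovery(node: str):
    """Replay hinted writes to a recovered owner."""
    for owner, value in [h for h in hints if h[0] == node]:
        print(f"handing {value!r} back to {owner}")
        hints.remove((owner, value))

print(sloppy_write("v1"))  # e.g. ['C', 'D', 'E'], with hints queued for A and B
on_recovery("A")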
3. Quorum with Versioning
Handle concurrent writes with version vectors:
Scenario: Concurrent writes during network partition
┌──────────────────────────────────────────────┐
│ Partition A writes: value="X" version=1      │
│ Partition B writes: value="Y" version=1      │
│                                              │
│ When partition heals, quorum read finds:     │
│ - 2 nodes with value="X" version=1           │
│ - 3 nodes with value="Y" version=1           │
│                                              │
│ Resolution options:                          │
│ 1. Last-Write-Wins: Use timestamp            │
│ 2. Conflict detection: Return both to app    │
│ 3. Vector clocks: Track causality            │
└──────────────────────────────────────────────┘
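
Option 1 is easy to show concretely. A minimal last-write-wins sketch (values and timestamps are illustrative) keeps whichever version carries the newest timestamp; it is simple, but note that it silently discards the losing concurrent write:

# Last-write-wins: keep the version with the newest timestamp.
# Known weakness: the concurrent write "X" is silently lost.
replies = [("X", 1700000000.123), ("Y", 1700000000.456), ("Y", 1700000000.456)]
value, ts = max(replies, key=lambda reply: reply[1])
print(value)  # "Y" wins; "X" is discarded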
Why R + W > N Ensures Consistency
Mathematical Proof:
Given:
- N = total replicas
- R = read quorum size
- W = write quorum size
If R + W > N, then:
- Write touches W nodes
- Read touches R nodes
- Overlap = R + W - N > 0
Example: N=5, R=3, W=3
- Write touches 3 nodes (any 3 of 5)
- Read touches 3 nodes (any 3 of 5)
- Overlap = 3 + 3 - 5 = 1 node minimum
Result: Read quorum MUST include at least one node
from the write quorum
→ Read will see the latest write ✓
Visual Proof:
Sample write/read quorum combinations (N=5, W=3, R=3):

Write Quorum   Read Quorum   Overlap
[1,2,3]        [1,2,3]       3 nodes
[1,2,3]        [1,2,4]       2 nodes
[1,2,3]        [1,4,5]       1 node ✓ (minimum)
[1,2,3]        [3,4,5]       1 node ✓
[1,2,3]        [2,4,5]       1 node ✓

Every combination has at least 1 node of overlap!
→ Consistency guaranteed
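
The claim is also easy to machine-check. This short script (ours) enumerates every write/read quorum pair for N=5, W=3, R=3 and confirms the minimum overlap matches the formula:

from itertools import combinations

N, W, R = 5, 3, 3
nodes = range(1, N + 1)
min_overlap = min(
    len(set(w) & set(r))                 # size of quorum intersection
    for w in combinations(nodes, W)
    for r in combinations(nodes, R)
)
print(min_overlap)               # 1
assert min_overlap == R + W - N  # matches Overlap = R + W - N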
Failure Tolerance
Quorum system can tolerate failures:
For N replicas with quorum Q:
- Max failures = N - Q
- Quorum Q = floor(N/2) + 1
Examples:
N=3, Q=2 → Tolerate 1 failure
N=5, Q=3 → Tolerate 2 failures
N=7, Q=4 → Tolerate 3 failures
For operations to succeed:
- Writes: Need W nodes alive
- Reads: Need R nodes alive
With W=3, R=3, N=5:
- Can tolerate 2 node failures (need 3 alive)
- If 3 nodes fail → system unavailable
With W=2, R=2, N=5 (eventual consistency):
- Can tolerate 3 node failures (need 2 alive)
- Higher availability, but no consistency guarantee
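
The same arithmetic works as a quick availability check. Given how many nodes are currently down, this sketch (function name ours) reports whether reads and writes can still assemble a quorum:

def availability(n: int, r: int, w: int, failed: int) -> dict:
    """Can reads/writes still reach quorum with `failed` nodes down?"""
    alive = n - failed
    return {"reads": alive >= r, "writes": alive >= w}

print(availability(5, 3, 3, failed=2))  # {'reads': True, 'writes': True}
print(availability(5, 3, 3, failed=3))  # {'reads': False, 'writes': False}
print(availability(5, 2, 2, failed=3))  # {'reads': True, 'writes': True}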
Real Systems Using Quorums
| System    | Quorum Model    | Default Configuration            | Tunable? | Use Case                   |
|-----------|-----------------|----------------------------------|----------|----------------------------|
| Cassandra | R/W quorums     | ONE (LOCAL_QUORUM common)        | Yes      | Multi-datacenter database  |
| DynamoDB  | R/W quorums     | Eventually consistent (R=1)      | Yes      | Key-value store            |
| Riak      | R/W quorums     | N=3, R=2, W=2                    | Yes      | Distributed database       |
| Raft      | Majority quorum | floor(N/2)+1, fixed              | No       | Consensus protocol         |
| Paxos     | Majority quorum | floor(N/2)+1, fixed              | No       | Consensus protocol         |
| ZooKeeper | Majority quorum | floor(N/2)+1, fixed              | No       | Coordination service       |
Case Study: Cassandra Consistency Levels
Cassandra Quorum Consistency Levels:
ONE:
- W=1, R=1
- Fastest, lowest consistency
- Use: Logging, metrics
QUORUM:
- W=floor(N/2)+1, R=floor(N/2)+1
- Strong consistency (R+W > N)
- Use: Critical user data
ALL:
- W=N, R=N
- Strongest consistency, lowest availability
- Use: Very critical operations
LOCAL_QUORUM:
- Quorum within local datacenter only
- Use: Multi-datacenter with low latency
Configuration Example:
CREATE KEYSPACE my_keyspace
WITH replication = {
'class': 'NetworkTopologyStrategy',
'datacenter1': 3, // N=3 replicas in DC1
'datacenter2': 3 // N=3 replicas in DC2
};
// In cqlsh, consistency is set per session rather than per statement
// (USING CONSISTENCY was removed from CQL; drivers set it per statement)
CONSISTENCY QUORUM;

// Write with quorum
INSERT INTO users (id, name) VALUES (1, 'Alice');

// Read with quorum
SELECT * FROM users WHERE id=1;
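
In application code, the consistency level is set per statement through the driver. A sketch using the DataStax Python driver (assumes a reachable cluster and the schema above; install with pip install cassandra-driver):

from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')

# Write with quorum consistency
insert = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(insert, (1, 'Alice'))

# Read with quorum consistency
select = SimpleStatement(
    "SELECT * FROM users WHERE id=%s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(select, (1,)).one()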
Case Study: DynamoDB Quorum
DynamoDB Configuration (N=3 replicas across AZs):
Eventually Consistent Read (default):
- R=1 (read from any replica)
- Fastest, cheapest
- May return stale data (<1s lag typically)
Strongly Consistent Read:
- R=2 (quorum read, R+W > N where W=2)
- Always returns latest data
- 2x cost of eventually consistent
Write:
- W=2 (write to quorum)
- Acknowledged when 2/3 replicas confirm
- Third replica updated asynchronously
Failure Handling:
- If 1 replica down: Quorum still works (2/3 available)
- If 2 replicas down: Writes fail, reads might fail
- Automatic recovery: Failed replica catches up via anti-entropy
API Usage:
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();
// Eventually consistent read (R=1)
await dynamodb.get({
TableName: 'Users',
Key: { userId: 123 },
ConsistentRead: false // R=1, fast, might be stale
}).promise();
// Strongly consistent read (R=2, quorum)
await dynamodb.get({
TableName: 'Users',
Key: { userId: 123 },
ConsistentRead: true // R=2, slower, always latest
}).promise();
When to Use Quorums
✅ Perfect Use Cases
Multi-Region Databases
Scenario: Global e-commerce platform
Requirement: Data replicated across 5 regions, need consistency
Configuration: N=5, R=3, W=3
Benefit: Tolerate 2 region failures while maintaining consistency
High Availability with Consistency
Scenario: User authentication system
Requirement: Must be available during failures, but need consistent reads
Configuration: N=3, R=2, W=2
Benefit: Tolerate 1 node failure, read-your-writes consistency
Tunable Consistency
Scenario: Social media application
Requirement: Different consistency for different data
Configuration:
- User posts: R=1, W=1 (eventual consistency OK)
- User credentials: R=3, W=3 (strong consistency required)
Benefit: Optimize each data type independently
❌ When NOT to Use (or Use Carefully)
Single Datacenter with Low Latency Requirements
Problem: Quorum reads/writes add latency (wait for multiple nodes)
Alternative: Leader-based replication with async followers
Example: Real-time gaming leaderboard
Strict Serializable Isolation Needed
Problem: Quorums provide per-key consistency at best, not serializable multi-object transactions
Alternative: Distributed transactions with 2PC or consensus
Example: Bank transfers requiring ACID transactions
Very Small Clusters
Problem: N=1 or N=2 cannot form meaningful quorums
Alternative: N=3 minimum for quorum-based systems
Reason: N=2 cannot tolerate any failures with majority quorum
Interview Application
Common Interview Question
Q: "Design a distributed key-value store that can tolerate node failures while ensuring reads return the latest written value. How would you use quorums?"
Strong Answer:
"I'd design a quorum-based system with tunable consistency:
System Architecture:
- Replication: N=5 replicas across 5 servers (or availability zones)
- Quorum Configuration: R=3, W=3 (strong consistency)
- Partitioning: Consistent hashing for key distribution
Why R=3, W=3:
- R + W = 6 > N = 5, ensuring quorum overlap
- Any read quorum (3 nodes) MUST intersect with any write quorum (3 nodes)
- Guarantees: Reads always see latest write
- Fault tolerance: Tolerate 2 node failures (need 3 alive)
Write Flow:
- Client sends write(key, value) to coordinator
- Coordinator sends to all 5 replicas
- Wait for W=3 acknowledgments
- Return success to client
- Remaining 2 replicas update asynchronously
Read Flow:
- Client sends read(key) to coordinator
- Coordinator sends to all 5 replicas
- Wait for R=3 responses
- Compare versions (using vector clocks or timestamps)
- Return latest version to client
- Repair stale replicas in background (read repair)
Handling Conflicts:
- Use vector clocks to detect concurrent writes
- If conflict detected: Return all versions to client (like DynamoDB)
- Client resolves conflict (e.g., merge shopping carts)
Optimization for Reads:
- For read-heavy workload: R=1, W=5
- Faster reads (wait for 1 response)
- Slower writes (all nodes must ack)
Optimization for Writes:
- For write-heavy workload: R=4, W=2
- Faster writes (wait for 2 acks)
- Slower reads (wait for 4 responses)
Trade-offs:
- Latency: Quorum operations slower than single-node (wait for multiple responses)
- Consistency: Strong with R+W>N, eventual with R+W≤N
- Availability: Can tolerate N-Q failures where Q is quorum size
Real-World Example: This design is similar to Amazon DynamoDB and Apache Cassandra"
Follow-up: What if network partitions the cluster?
"With quorum-based systems during network partition:
Scenario: 5 nodes split into groups: [3 nodes] and [2 nodes]
Majority Partition (3 nodes):
- Can form quorum (W=3, R=3) ✓
- Accepts reads and writes
- Remains available
Minority Partition (2 nodes):
- Cannot form quorum ✗
- Rejects writes (can't get W=3)
- Rejects reads (can't get R=3)
- Sacrifices availability for consistency
Alternative: Sloppy Quorum (Cassandra-style):
- Minority partition uses hinted handoff
- Writes to fallback nodes with hints
- Higher availability, temporary inconsistency
- When partition heals, hints are replayed
This demonstrates the CAP theorem trade-off:
- Strict quorum: Consistent, Partition-tolerant (CP)
- Sloppy quorum: Available, Partition-tolerant (AP)"
Code Example
Quorum-Based Read/Write Implementation
import time
import random
from typing import List, Dict, Any, Tuple
from dataclasses import dataclass
from collections import Counter
@dataclass
class VersionedValue:
"""Value with version for conflict detection"""
value: Any
version: int
timestamp: float
class QuorumStore:
"""
Simple quorum-based distributed key-value store
"""
def __init__(self, num_replicas: int = 5, read_quorum: int = 3,
write_quorum: int = 3):
self.num_replicas = num_replicas
self.read_quorum = read_quorum
self.write_quorum = write_quorum
# Simulate replicas (in production, these are separate servers)
self.replicas: List[Dict[str, VersionedValue]] = [
{} for _ in range(num_replicas)
]
# Simulate replica availability (for fault injection)
self.replica_available = [True] * num_replicas
# Validate quorum configuration
if read_quorum + write_quorum <= num_replicas:
print("WARNING: R+W β€ N, eventual consistency only!")
else:
print(f"R+W > N ({read_quorum}+{write_quorum} > {num_replicas}), "
f"strong consistency guaranteed")
def write(self, key: str, value: Any) -> bool:
"""
Write to quorum of replicas
Returns True if write quorum achieved
"""
print(f"\n=== WRITE: key={key}, value={value} ===")
# Create versioned value
versioned_value = VersionedValue(
value=value,
version=int(time.time() * 1000), # Use timestamp as version
timestamp=time.time()
)
# Send write to all replicas
acks = 0
successful_replicas = []
for i, replica in enumerate(self.replicas):
if self.replica_available[i]:
# Simulate network delay
time.sleep(random.uniform(0.001, 0.01))
# Write to replica
replica[key] = versioned_value
acks += 1
successful_replicas.append(i)
print(f" Replica {i}: ACK (total: {acks}/{self.write_quorum})")
# Check if write quorum achieved
if acks >= self.write_quorum:
print(f"β Write quorum achieved ({acks} >= {self.write_quorum})")
# Continue writing to remaining replicas asynchronously
# (in production, this would be background task)
return True
else:
print(f" Replica {i}: UNAVAILABLE")
print(f"β Write quorum NOT achieved ({acks} < {self.write_quorum})")
return False
def read(self, key: str) -> Tuple[Any, bool]:
"""
Read from quorum of replicas
Returns (value, success) tuple
"""
print(f"\n=== READ: key={key} ===")
# Send read to all replicas
responses: List[VersionedValue] = []
responding_replicas = []
for i, replica in enumerate(self.replicas):
if self.replica_available[i]:
# Simulate network delay
time.sleep(random.uniform(0.001, 0.01))
if key in replica:
responses.append(replica[key])
responding_replicas.append(i)
print(f" Replica {i}: value={replica[key].value}, "
f"version={replica[key].version}")
else:
print(f" Replica {i}: key not found")
# Check if read quorum achieved
if len(responses) >= self.read_quorum:
print(f"β Read quorum achieved "
f"({len(responses)} >= {self.read_quorum})")
break
else:
print(f" Replica {i}: UNAVAILABLE")
if len(responses) < self.read_quorum:
print(f"β Read quorum NOT achieved "
f"({len(responses)} < {self.read_quorum})")
return None, False
# Return value with highest version (latest write)
latest = max(responses, key=lambda v: v.version)
print(f"β Returning latest value: {latest.value} "
f"(version {latest.version})")
# Read repair: Update stale replicas in background
self._read_repair(key, latest, responding_replicas)
return latest.value, True
def _read_repair(self, key: str, latest: VersionedValue,
responding_replicas: List[int]):
"""
Update stale replicas with latest value (background operation)
"""
stale_replicas = []
for i in responding_replicas:
if self.replicas[i].get(key, None) != latest:
stale_replicas.append(i)
if stale_replicas:
print(f" Read repair: Updating stale replicas {stale_replicas}")
for i in stale_replicas:
self.replicas[i][key] = latest
def simulate_failure(self, replica_id: int):
"""Simulate replica failure"""
self.replica_available[replica_id] = False
print(f"\n[FAILURE] Replica {replica_id} is now UNAVAILABLE")
def simulate_recovery(self, replica_id: int):
"""Simulate replica recovery"""
self.replica_available[replica_id] = True
print(f"\n[RECOVERY] Replica {replica_id} is now AVAILABLE")
# Usage Examples
if __name__ == '__main__':
# Example 1: Strong consistency (R+W > N)
print("=" * 60)
print("Example 1: Strong Consistency (N=5, R=3, W=3)")
print("=" * 60)
store = QuorumStore(num_replicas=5, read_quorum=3, write_quorum=3)
# Write
store.write("user:123:name", "Alice")
# Read (should see the write)
value, success = store.read("user:123:name")
assert value == "Alice"
# Example 2: Fault tolerance
print("\n" + "=" * 60)
print("Example 2: Fault Tolerance (2 replicas fail)")
print("=" * 60)
store.simulate_failure(3)
store.simulate_failure(4)
# Write should still succeed (need 3, have 3)
store.write("user:456:name", "Bob")
# Read should still succeed
value, success = store.read("user:456:name")
assert value == "Bob"
# Example 3: Quorum failure (too many nodes down)
print("\n" + "=" * 60)
print("Example 3: Quorum Failure (3 replicas fail)")
print("=" * 60)
store.simulate_failure(2) # Now 3 replicas down
# Write should fail (need 3, have 2)
success = store.write("user:789:name", "Charlie")
assert not success
    # Example 4: Eventual consistency (R+W ≤ N)
print("\n" + "=" * 60)
print("Example 4: Eventual Consistency (N=5, R=1, W=1)")
print("=" * 60)
store_eventual = QuorumStore(num_replicas=5, read_quorum=1, write_quorum=1)
# Write to 1 replica only
store_eventual.write("counter", 42)
# Read might return stale data (if reads from different replica)
# In this simulation, read repair will fix it eventually
value, success = store_eventual.read("counter")
Quorum with Conflict Detection
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple
@dataclass
class VectorClock:
"""Vector clock for detecting concurrent writes"""
clocks: Dict[int, int] # replica_id -> counter
def increment(self, replica_id: int):
"""Increment counter for this replica"""
self.clocks[replica_id] = self.clocks.get(replica_id, 0) + 1
def merge(self, other: 'VectorClock'):
"""Merge with another vector clock (take max of each)"""
for replica_id, count in other.clocks.items():
self.clocks[replica_id] = max(
self.clocks.get(replica_id, 0),
count
)
def is_concurrent(self, other: 'VectorClock') -> bool:
"""Check if two writes are concurrent (neither causally before)"""
self_before = False
other_before = False
all_replicas = set(self.clocks.keys()) | set(other.clocks.keys())
for replica_id in all_replicas:
self_count = self.clocks.get(replica_id, 0)
other_count = other.clocks.get(replica_id, 0)
if self_count < other_count:
other_before = True
elif self_count > other_count:
self_before = True
# Concurrent if both are partially before each other
return self_before and other_before
@dataclass
class VersionedValueWithVector:
"""Value with vector clock for conflict detection"""
value: Any
vector_clock: VectorClock
class QuorumStoreWithConflicts(QuorumStore):
"""Quorum store that detects and handles concurrent writes"""
def read_with_conflicts(self, key: str) -> Tuple[List[Any], bool]:
"""
Read from quorum, detecting conflicts
Returns (list of values, success)
- Single value if no conflict
- Multiple values if concurrent writes detected
"""
print(f"\n=== READ WITH CONFLICT DETECTION: key={key} ===")
responses: List[VersionedValueWithVector] = []
for i, replica in enumerate(self.replicas):
if self.replica_available[i] and key in replica:
responses.append(replica[key])
if len(responses) >= self.read_quorum:
break
if len(responses) < self.read_quorum:
return [], False
# Detect conflicts using vector clocks
        conflicts = []
for i, v1 in enumerate(responses):
is_conflict = False
for j, v2 in enumerate(responses):
if i != j and v1.vector_clock.is_concurrent(v2.vector_clock):
is_conflict = True
break
if is_conflict:
conflicts.append(v1.value)
if conflicts:
print(f"β CONFLICT DETECTED: {len(conflicts)} concurrent writes")
return conflicts, True
else:
# No conflict, return latest value
latest = max(responses,
key=lambda v: sum(v.vector_clock.clocks.values()))
return [latest.value], True
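
A quick usage sketch for the VectorClock above (the clock values are illustrative): two writes that each advance a different replica's counter are flagged as concurrent, while a write that has already seen the other is not:

# Demonstrate concurrency detection with the VectorClock above.
a = VectorClock(clocks={1: 1})   # write accepted by replica 1
b = VectorClock(clocks={2: 1})   # independent write accepted by replica 2
print(a.is_concurrent(b))        # True: neither happened-before the other

c = VectorClock(clocks={1: 1})
c.merge(a)                       # c has seen a...
c.increment(2)                   # ...then advances on replica 2
print(a.is_concurrent(c))        # False: a happened-before c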
Related Content
Prerequisites:
- Distributed Systems Basics - Foundation concepts
Related Concepts:
- Consensus - Related agreement protocols
- Leader-Follower Replication - Alternative replication model
- Eventual Consistency - When R+W ≤ N
Used In Systems:
- Cassandra: Tunable quorum consistency
- DynamoDB: Read/write quorums with strong vs eventual consistency
- Riak: Quorum-based with sloppy quorums
Explained In Detail:
- Distributed Systems Deep Dive - Quorum implementation details
Quick Self-Check
- Can explain quorum in 60 seconds?
- Understand why R+W>N ensures consistency?
- Know how to calculate fault tolerance from quorum size?
- Can explain difference between strict and sloppy quorums?
- Understand trade-offs between different R/W configurations?
- Can design a quorum system for given requirements?