Quorum

The minimum number of nodes in a distributed system that must agree on an operation for it to be considered successful, ensuring consistency despite failures

TL;DR

A quorum is the minimum number of nodes required to perform an operation in a distributed system. Typically quorum = (N/2) + 1, where N is the total number of replicas, ensuring a majority agreement while tolerating minority failures. This fundamental technique enables tunable consistency in systems like Cassandra, DynamoDB, and distributed consensus protocols.

Visual Overview

(Diagram: quorum reads and writes across replicas)

Core Explanation

What is a Quorum?

A quorum is the minimum number of nodes in a distributed system that must agree on an operation for it to be considered successful. The classic quorum formula is:

quorum = floor(N/2) + 1

where N is the total number of replicas. For example, N = 5 gives a quorum of 3; so does N = 4, since a majority of 4 nodes is 3.
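A minimal sketch of the formula in Python (the helper name majority_quorum is illustrative):

def majority_quorum(n: int) -> int:
    """Smallest number of nodes that forms a majority of n replicas."""
    return n // 2 + 1

for n in (3, 4, 5, 7):
    q = majority_quorum(n)
    print(f"N={n}: quorum={q}, tolerates {n - q} failure(s)")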

Why Majority?

A majority quorum ensures that any two quorums must overlap by at least one node, preventing split-brain scenarios and ensuring consistency.

Quorum Variants

1. Read Quorum (R) & Write Quorum (W)

R is the number of replicas that must answer a read and W the number that must acknowledge a write. Choosing R and W independently makes consistency tunable: R + W > N guarantees reads see the latest write, while smaller values trade freshness for lower latency and higher availability.

(Diagram: tunable quorum configuration)
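A small, system-agnostic sketch that classifies a few (N, R, W) choices:

def describe(n: int, r: int, w: int) -> str:
    """Summarize the trade-off implied by an (N, R, W) configuration."""
    consistency = "strong (R+W > N)" if r + w > n else "eventual (R+W <= N)"
    bias = ("read-optimized" if r < w
            else "write-optimized" if w < r
            else "balanced")
    return f"N={n} R={r} W={w}: {consistency}, {bias}"

for cfg in [(5, 3, 3), (5, 1, 5), (5, 4, 2), (5, 1, 1)]:
    print(describe(*cfg))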

2. Strict Quorum vs Sloppy Quorum

A strict quorum counts only the key's designated home replicas; if too few of them respond, the operation fails. A sloppy quorum (Dynamo-style, used by systems such as Riak) lets a write land on substitute nodes together with a hint naming the intended home replica; when the home replica recovers, the hint is handed off and replayed. Sloppy quorums keep the system writable during failures at the cost of temporary inconsistency.

(Diagram: strict vs sloppy quorum)
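A minimal sketch of sloppy-quorum writes with hinted handoff, assuming a fixed preference list per key and always-available substitute nodes (all names here are illustrative):

from collections import defaultdict

class SloppyQuorumWriter:
    """Toy sloppy quorum: write to substitutes when home replicas are down, replay later."""
    def __init__(self, nodes, write_quorum):
        self.nodes = nodes                      # all nodes in the ring
        self.write_quorum = write_quorum
        self.up = {n: True for n in nodes}      # node availability
        self.data = {n: {} for n in nodes}      # node -> {key: value}
        self.hints = defaultdict(list)          # holder -> [(home, key, value)]

    def write(self, key, value, preference_list):
        acks = 0
        substitutes = iter(n for n in self.nodes if n not in preference_list)
        for home in preference_list:
            if self.up[home]:
                self.data[home][key] = value
            else:
                # Home replica is down: store on a substitute along with a hint.
                holder = next(substitutes)
                self.data[holder][key] = value
                self.hints[holder].append((home, key, value))
            acks += 1
            if acks >= self.write_quorum:
                return True
        return False

    def handoff(self, holder):
        """Replay hinted writes to their home replicas once they are back."""
        for home, key, value in self.hints.pop(holder, []):
            if self.up[home]:
                self.data[home][key] = value

writer = SloppyQuorumWriter(nodes=["A", "B", "C", "D", "E"], write_quorum=3)
writer.up["C"] = False                                       # one home replica fails
writer.write("cart:42", "2x coffee", preference_list=["A", "B", "C"])
writer.up["C"] = True
writer.handoff("D")                                          # "D" held the hint for "C"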

3. Quorum with Versioning

Each stored value carries a version: a timestamp, a counter, or a vector clock. When replicas in a read quorum disagree, the coordinator returns the highest version, or surfaces causally concurrent versions as a conflict for the client to resolve (see the conflict-detection code near the end of this page).

(Diagram: quorum with version vectors)

Why R + W > N Ensures Consistency

Mathematical Proof:

Let S_W be the set of replicas that acknowledged the most recent write (|S_W| = W) and S_R the set that answered a read (|S_R| = R). Both are subsets of the same N replicas, so by inclusion-exclusion:

|S_R ∩ S_W| ≥ |S_R| + |S_W| − N = R + W − N ≥ 1

At least one replica in every read quorum therefore holds the latest write, and returning the response with the highest version surfaces it.

Visual Proof:

(Diagram: quorum overlap examples. With N = 5 and R = W = 3, any 3 nodes answering a read must share at least one node with the 3 that acknowledged the write.)
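The overlap property can also be checked exhaustively for a small cluster; a quick sketch:

from itertools import combinations

def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every possible R-node read set intersects every W-node write set."""
    nodes = range(n)
    return all(set(read_set) & set(write_set)
               for read_set in combinations(nodes, r)
               for write_set in combinations(nodes, w))

print(quorums_always_overlap(5, 3, 3))  # True:  R + W = 6 > 5
print(quorums_always_overlap(5, 2, 3))  # False: R + W = 5, disjoint quorums exist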

Failure Tolerance

Reads stay available while at least R replicas are reachable and writes while at least W are reachable, so a configuration tolerates N - max(R, W) simultaneous failures. For example, N = 5 with R = W = 3 keeps serving through 2 failed replicas; a third failure blocks both reads and writes.

(Diagram: quorum failure tolerance)
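The same arithmetic as a tiny helper (illustrative only):

def failures_tolerated(n: int, r: int, w: int) -> int:
    """Failures the cluster can absorb while read and write quorums stay reachable."""
    return n - max(r, w)

print(failures_tolerated(5, 3, 3))  # 2
print(failures_tolerated(5, 1, 5))  # 0: with W = N, any failure blocks writes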

Real Systems Using Quorums

| System    | Quorum Model    | Default Configuration       | Tunable? | Use Case                   |
|-----------|-----------------|-----------------------------|----------|----------------------------|
| Cassandra | R/W quorums     | LOCAL_QUORUM                | Yes      | Multi-datacenter database  |
| DynamoDB  | R/W quorums     | Eventually consistent (R=1) | Yes      | Key-value store            |
| Riak      | R/W quorums     | R=2, W=2, N=3               | Yes      | Distributed database       |
| Raft      | Majority quorum | (N/2)+1, fixed              | No       | Consensus protocol         |
| Paxos     | Majority quorum | (N/2)+1, fixed              | No       | Consensus protocol         |
| ZooKeeper | Majority quorum | (N/2)+1, fixed              | No       | Coordination service       |

Case Study: Cassandra Consistency Levels

Every Cassandra read and write names a consistency level: ONE, TWO, THREE, QUORUM, LOCAL_QUORUM (a quorum within the coordinator's datacenter), EACH_QUORUM, or ALL. With a replication factor of 3, QUORUM means 2 replicas, so QUORUM writes combined with QUORUM reads satisfy R + W > N.

(Diagram: Cassandra quorum consistency levels)
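A short sketch with the DataStax Python driver; the contact point, keyspace, and table are placeholders:

from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("shop")   # placeholder contact point / keyspace

# Write at LOCAL_QUORUM: a majority of replicas in the local datacenter must ack.
insert = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
session.execute(insert, (123, "Alice"))

# Read at LOCAL_QUORUM as well, so R + W > N holds within the datacenter.
select = SimpleStatement(
    "SELECT name FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
row = session.execute(select, (123,)).one()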

Case Study: DynamoDB Quorum

The original Dynamo design replicated each key to N = 3 nodes with R = 2 and W = 2. The managed DynamoDB service hides these knobs: reads are eventually consistent by default, and a caller can request a strongly consistent read per operation at higher read-capacity cost.

(Diagram: DynamoDB quorum configuration)
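A hedged sketch with boto3; the table and key names are placeholders:

import boto3

dynamodb = boto3.client("dynamodb")

# Default GetItem is eventually consistent and may return a slightly stale item.
eventual = dynamodb.get_item(
    TableName="users",
    Key={"user_id": {"S": "123"}},
)

# ConsistentRead=True asks for a strongly consistent read of the same item.
strong = dynamodb.get_item(
    TableName="users",
    Key={"user_id": {"S": "123"}},
    ConsistentRead=True,
)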

When to Use Quorums

✓ Perfect Use Cases

Multi-Region Databases

Replicas spread across regions let each region serve quorum reads and writes locally (e.g., Cassandra's LOCAL_QUORUM) while replication keeps copies in other regions converging in the background, so the data survives a regional outage.

High Availability with Consistency

A majority quorum keeps the system both consistent and available through minority failures: with N = 5 and R = W = 3, any two replicas can be down and reads still see the latest write.

Tunable Consistency

The same cluster can favor reads or writes per table or per request, for example R = 1 with W = N for read-heavy data, or a lower W with a higher R for write-heavy data, as the interview answer below works through.

✕ When NOT to Use (or Use Carefully)

Single Datacenter with Low Latency Requirements

Every quorum operation waits on several replicas, so tail latency is set by the slowest quorum member; if all traffic lives in one datacenter and very low read latency matters more than replica-level consistency, a primary/replica setup or a single local copy may be a better fit.

Strict Serializable Isolation Needed

Read/write quorums order individual keys, not multi-key transactions; if you need serializable (or strictly serializable) transactions across keys, use a consensus- or transaction-based system rather than relying on quorums alone.

Very Small Clusters

With N = 2 the majority quorum is 2, so a single failure blocks every operation; quorums only start paying off at N = 3 and above.

Interview Application

Common Interview Question

Q: “Design a distributed key-value store that can tolerate node failures while ensuring reads return the latest written value. How would you use quorums?”

Strong Answer:

“I’d design a quorum-based system with tunable consistency:

System Architecture:

  • Replication: N=5 replicas across 5 servers (or availability zones)
  • Quorum Configuration: R=3, W=3 (strong consistency)
  • Partitioning: Consistent hashing for key distribution

Why R=3, W=3:

  • R + W = 6 > N = 5, ensuring quorum overlap
  • Any read quorum (3 nodes) MUST intersect with any write quorum (3 nodes)
  • Guarantees: Reads always see latest write
  • Fault tolerance: Tolerate 2 node failures (need 3 alive)

Write Flow:

  1. Client sends write(key, value) to coordinator
  2. Coordinator sends to all 5 replicas
  3. Wait for W=3 acknowledgments
  4. Return success to client
  5. Remaining 2 replicas update asynchronously

Read Flow:

  1. Client sends read(key) to coordinator
  2. Coordinator sends to all 5 replicas
  3. Wait for R=3 responses
  4. Compare versions (using vector clocks or timestamps)
  5. Return latest version to client
  6. Repair stale replicas in background (read repair)

Handling Conflicts:

  • Use vector clocks to detect concurrent writes
  • If conflict detected: Return all versions to client (like DynamoDB)
  • Client resolves conflict (e.g., merge shopping carts)

Optimization for Reads:

  • For read-heavy workload: R=1, W=5
  • Faster reads (wait for 1 response)
  • Slower writes (all nodes must ack)

Optimization for Writes:

  • For write-heavy workload: R=4, W=2
  • Faster writes (wait for 2 acks)
  • Slower reads (wait for 4 responses)

Trade-offs:

  • Latency: Quorum operations slower than single-node (wait for multiple responses)
  • Consistency: Strong with R+W>N, eventual with R+W≤N
  • Availability: Can tolerate N - max(R, W) failures before reads or writes start failing

Real-World Example: This design is similar to Amazon DynamoDB and Apache Cassandra”

Follow-up: What if a network partition splits the cluster?

“With quorum-based systems during network partition:

Scenario: 5 nodes split into groups: [3 nodes] and [2 nodes]

Majority Partition (3 nodes):

  • Can form quorum (W=3, R=3) ✓
  • Accepts reads and writes
  • Remains available

Minority Partition (2 nodes):

  • Cannot form quorum ✗
  • Rejects writes (can’t get W=3)
  • Rejects reads (can’t get R=3)
  • Sacrifices availability for consistency

Alternative: Sloppy Quorum (Cassandra-style):

  • Minority partition uses hinted handoff
  • Writes to fallback nodes with hints
  • Higher availability, temporary inconsistency
  • When partition heals, hints are replayed

This demonstrates the CAP theorem trade-off:

  • Strict quorum: Consistent, Partition-tolerant (CP)
  • Sloppy quorum: Available, Partition-tolerant (AP)”

Code Example

Quorum-Based Read/Write Implementation

import time
import random
from typing import List, Dict, Any, Tuple
from dataclasses import dataclass

@dataclass
class VersionedValue:
    """Value with version for conflict detection"""
    value: Any
    version: int
    timestamp: float

class QuorumStore:
    """
    Simple quorum-based distributed key-value store
    """
    def __init__(self, num_replicas: int = 5, read_quorum: int = 3,
                 write_quorum: int = 3):
        self.num_replicas = num_replicas
        self.read_quorum = read_quorum
        self.write_quorum = write_quorum

        # Simulate replicas (in production, these are separate servers)
        self.replicas: List[Dict[str, VersionedValue]] = [
            {} for _ in range(num_replicas)
        ]

        # Simulate replica availability (for fault injection)
        self.replica_available = [True] * num_replicas

        # Validate quorum configuration
        if read_quorum + write_quorum <= num_replicas:
            print("WARNING: R+W ≤ N, eventual consistency only!")
        else:
            print(f"R+W > N ({read_quorum}+{write_quorum} > {num_replicas}), "
                  f"strong consistency guaranteed")

    def write(self, key: str, value: Any) -> bool:
        """
        Write to quorum of replicas
        Returns True if write quorum achieved
        """
        print(f"\n=== WRITE: key={key}, value={value} ===")

        # Create versioned value
        versioned_value = VersionedValue(
            value=value,
            version=int(time.time() * 1000),  # Use timestamp as version
            timestamp=time.time()
        )

        # Send write to all replicas
        acks = 0
        successful_replicas = []

        for i, replica in enumerate(self.replicas):
            if self.replica_available[i]:
                # Simulate network delay
                time.sleep(random.uniform(0.001, 0.01))

                # Write to replica
                replica[key] = versioned_value
                acks += 1
                successful_replicas.append(i)

                print(f"  Replica {i}: ACK (total: {acks}/{self.write_quorum})")

                # Check if write quorum achieved
                if acks >= self.write_quorum:
                    print(f"✓ Write quorum achieved ({acks} >= {self.write_quorum})")

                    # Continue writing to remaining replicas asynchronously
                    # (in production, this would be background task)
                    return True
            else:
                print(f"  Replica {i}: UNAVAILABLE")

        print(f"✗ Write quorum NOT achieved ({acks} < {self.write_quorum})")
        return False

    def read(self, key: str) -> Tuple[Any, bool]:
        """
        Read from quorum of replicas
        Returns (value, success) tuple
        """
        print(f"\n=== READ: key={key} ===")

        # Send read to all replicas
        responses: List[VersionedValue] = []
        responding_replicas = []

        for i, replica in enumerate(self.replicas):
            if self.replica_available[i]:
                # Simulate network delay
                time.sleep(random.uniform(0.001, 0.01))

                if key in replica:
                    responses.append(replica[key])
                    responding_replicas.append(i)
                    print(f"  Replica {i}: value={replica[key].value}, "
                          f"version={replica[key].version}")
                else:
                    print(f"  Replica {i}: key not found")

                # Check if read quorum achieved
                if len(responses) >= self.read_quorum:
                    print(f"✓ Read quorum achieved "
                          f"({len(responses)} >= {self.read_quorum})")
                    break
            else:
                print(f"  Replica {i}: UNAVAILABLE")

        if len(responses) < self.read_quorum:
            print(f"✗ Read quorum NOT achieved "
                  f"({len(responses)} < {self.read_quorum})")
            return None, False

        # Return value with highest version (latest write)
        latest = max(responses, key=lambda v: v.version)
        print(f"→ Returning latest value: {latest.value} "
              f"(version {latest.version})")

        # Read repair: Update stale replicas in background
        self._read_repair(key, latest, responding_replicas)

        return latest.value, True

    def _read_repair(self, key: str, latest: VersionedValue,
                    responding_replicas: List[int]):
        """
        Update stale replicas with latest value (background operation)
        """
        stale_replicas = []
        for i in responding_replicas:
            if self.replicas[i].get(key, None) != latest:
                stale_replicas.append(i)

        if stale_replicas:
            print(f"  Read repair: Updating stale replicas {stale_replicas}")
            for i in stale_replicas:
                self.replicas[i][key] = latest

    def simulate_failure(self, replica_id: int):
        """Simulate replica failure"""
        self.replica_available[replica_id] = False
        print(f"\n[FAILURE] Replica {replica_id} is now UNAVAILABLE")

    def simulate_recovery(self, replica_id: int):
        """Simulate replica recovery"""
        self.replica_available[replica_id] = True
        print(f"\n[RECOVERY] Replica {replica_id} is now AVAILABLE")

# Usage Examples
if __name__ == '__main__':
    # Example 1: Strong consistency (R+W > N)
    print("=" * 60)
    print("Example 1: Strong Consistency (N=5, R=3, W=3)")
    print("=" * 60)

    store = QuorumStore(num_replicas=5, read_quorum=3, write_quorum=3)

    # Write
    store.write("user:123:name", "Alice")

    # Read (should see the write)
    value, success = store.read("user:123:name")
    assert value == "Alice"

    # Example 2: Fault tolerance
    print("\n" + "=" * 60)
    print("Example 2: Fault Tolerance (2 replicas fail)")
    print("=" * 60)

    store.simulate_failure(3)
    store.simulate_failure(4)

    # Write should still succeed (need 3, have 3)
    store.write("user:456:name", "Bob")

    # Read should still succeed
    value, success = store.read("user:456:name")
    assert value == "Bob"

    # Example 3: Quorum failure (too many nodes down)
    print("\n" + "=" * 60)
    print("Example 3: Quorum Failure (3 replicas fail)")
    print("=" * 60)

    store.simulate_failure(2)  # Now 3 replicas down

    # Write should fail (need 3, have 2)
    success = store.write("user:789:name", "Charlie")
    assert not success

    # Example 4: Eventual consistency (R+W ≤ N)
    print("\n" + "=" * 60)
    print("Example 4: Eventual Consistency (N=5, R=1, W=1)")
    print("=" * 60)

    store_eventual = QuorumStore(num_replicas=5, read_quorum=1, write_quorum=1)

    # Write to 1 replica only
    store_eventual.write("counter", 42)

    # Read might return stale data (if reads from different replica)
    # In this simulation, read repair will fix it eventually
    value, success = store_eventual.read("counter")

Quorum with Conflict Detection

# Continues the previous example: reuses dataclass, Any, List, Tuple and the
# QuorumStore class defined above.
from typing import Dict

@dataclass
class VectorClock:
    """Vector clock for detecting concurrent writes"""
    clocks: Dict[int, int]  # replica_id -> counter

    def increment(self, replica_id: int):
        """Increment counter for this replica"""
        self.clocks[replica_id] = self.clocks.get(replica_id, 0) + 1

    def merge(self, other: 'VectorClock'):
        """Merge with another vector clock (take max of each)"""
        for replica_id, count in other.clocks.items():
            self.clocks[replica_id] = max(
                self.clocks.get(replica_id, 0),
                count
            )

    def is_concurrent(self, other: 'VectorClock') -> bool:
        """Check if two writes are concurrent (neither causally before)"""
        self_before = False
        other_before = False

        all_replicas = set(self.clocks.keys()) | set(other.clocks.keys())

        for replica_id in all_replicas:
            self_count = self.clocks.get(replica_id, 0)
            other_count = other.clocks.get(replica_id, 0)

            if self_count < other_count:
                other_before = True
            elif self_count > other_count:
                self_before = True

        # Concurrent if both are partially before each other
        return self_before and other_before

@dataclass
class VersionedValueWithVector:
    """Value with vector clock for conflict detection"""
    value: Any
    vector_clock: VectorClock

class QuorumStoreWithConflicts(QuorumStore):
    """Quorum store that detects and handles concurrent writes.

    Assumes replicas hold VersionedValueWithVector entries, i.e. a write path
    (not shown here) that attaches and increments vector clocks.
    """

    def read_with_conflicts(self, key: str) -> Tuple[List[Any], bool]:
        """
        Read from quorum, detecting conflicts
        Returns (list of values, success)
        - Single value if no conflict
        - Multiple values if concurrent writes detected
        """
        print(f"\n=== READ WITH CONFLICT DETECTION: key={key} ===")

        responses: List[VersionedValueWithVector] = []

        for i, replica in enumerate(self.replicas):
            if self.replica_available[i] and key in replica:
                responses.append(replica[key])

                if len(responses) >= self.read_quorum:
                    break

        if len(responses) < self.read_quorum:
            return [], False

        # Detect conflicts using vector clocks
        conflicts = []

        for i, v1 in enumerate(responses):
            is_conflict = False
            for j, v2 in enumerate(responses):
                if i != j and v1.vector_clock.is_concurrent(v2.vector_clock):
                    is_conflict = True
                    break

            if is_conflict:
                conflicts.append(v1.value)

        if conflicts:
            print(f"⚠ CONFLICT DETECTED: {len(conflicts)} concurrent writes")
            return conflicts, True
        else:
            # No conflict: the responses are causally ordered, so the clock that
            # dominates also has the largest component sum; use that as the latest.
            latest = max(responses,
                        key=lambda v: sum(v.vector_clock.clocks.values()))
            return [latest.value], True


Used In Systems:

  • Cassandra: Tunable quorum consistency
  • DynamoDB: Read/write quorums with strong vs eventual consistency
  • Riak: Quorum-based with sloppy quorums

Explained In Detail:

  • Distributed Systems Deep Dive - Quorum implementation details

Quick Self-Check

  • Can explain quorum in 60 seconds?
  • Understand why R+W>N ensures consistency?
  • Know how to calculate fault tolerance from quorum size?
  • Can explain difference between strict and sloppy quorums?
  • Understand trade-offs between different R/W configurations?
  • Can design a quorum system for given requirements?

Interview Notes

  • Interview relevance: 70% of distributed systems interviews
  • Production impact: powers systems such as Cassandra and DynamoDB
  • Performance: tunable consistency per read and write
  • Scalability: fault tolerance