Immutability

6 min | Beginner | Patterns

At a glance:
- Interview relevance: ~55% of system design interviews
- Production impact: powers Kafka, Git, and blockchain
- Performance: lock-free appends, no race conditions
- Scalability: simplified replication

TL;DR

Immutability means data cannot be changed after creation. In distributed systems, immutable data structures eliminate entire classes of concurrency bugs, enable caching without invalidation, simplify replication, and power systems like Kafka, Git, and event sourcing architectures.

Visual Overview

MUTABLE DATA (Traditional Approach)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Database Record: User Balance                     β”‚
β”‚                                                    β”‚
β”‚  T0: balance = $100                               β”‚
β”‚  T1: UPDATE balance = $80  (withdraw $20)         β”‚
β”‚  T2: UPDATE balance = $130 (deposit $50)          β”‚
β”‚                                                    β”‚
β”‚  Current State: balance = $130                    β”‚
β”‚  History: LOST βœ•                                  β”‚
β”‚                                                    β”‚
β”‚  Problems:                                         β”‚
β”‚  - Race conditions (concurrent updates)           β”‚
β”‚  - No audit trail                                 β”‚
β”‚  - Cache invalidation needed                      β”‚
β”‚  - Difficult to debug past states                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

IMMUTABLE DATA (Append-Only Approach)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Event Log: User Transactions                      β”‚
β”‚                                                    β”‚
β”‚  Event 1: {type: "DEPOSIT",  amount: 100, time: T0}β”‚
β”‚  Event 2: {type: "WITHDRAW", amount: 20,  time: T1}β”‚
β”‚  Event 3: {type: "DEPOSIT",  amount: 50,  time: T2}β”‚
β”‚                                                    β”‚
β”‚  Current State: SUM(events) = $130                β”‚
β”‚  History: PRESERVED βœ“                              β”‚
β”‚                                                    β”‚
β”‚  Benefits:                                         β”‚
β”‚  βœ“ No race conditions (only appends)              β”‚
β”‚  βœ“ Complete audit trail                           β”‚
β”‚  βœ“ Cache forever (never invalidated)              β”‚
β”‚  βœ“ Time-travel debugging (replay to any point)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

CONCURRENCY COMPARISON:

Mutable (Requires Locking):
Thread A: READ balance=100 β†’ UPDATE balance=80  ↓
Thread B: READ balance=100 β†’ UPDATE balance=150 ↓
Result: Lost update! (one transaction overwrites the other)

Immutable (Lock-Free):
Thread A: APPEND {withdraw: 20, id: 1}  ← No conflict
Thread B: APPEND {deposit: 50,  id: 2}  ← No conflict
Result: Both transactions preserved βœ“

Core Explanation

What is Immutability?

Immutability is a design principle where data structures cannot be modified after creation. Instead of updating existing data, you create new versions.

Programming Example:

// MUTABLE (traditional)
let user = { name: "Alice", age: 30 };
user.age = 31; // Original data modified βœ•

// IMMUTABLE (functional)
const user = { name: "Alice", age: 30 };
const updatedUser = { ...user, age: 31 }; // New object created βœ“
// Original 'user' unchanged

In distributed systems, immutability typically means:

  1. Append-only writes: New records added, existing records never modified
  2. Versioned data: Each change creates a new version
  3. Event logs: Store changes as immutable events

Why Immutability Matters in Distributed Systems

1. Eliminates Concurrency Bugs

PROBLEM WITH MUTABLE DATA:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Two servers updating same record           β”‚
β”‚                                             β”‚
β”‚  Server A: UPDATE inventory SET qty=9       β”‚
β”‚  Server B: UPDATE inventory SET qty=8       β”‚
β”‚                                             β”‚
β”‚  Race condition:                            β”‚
β”‚  - Who wins? (last write wins = data loss)  β”‚
β”‚  - Need distributed locks (slow)            β”‚
β”‚  - Need MVCC or optimistic locking          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

SOLUTION WITH IMMUTABLE DATA:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Two servers appending events               β”‚
β”‚                                             β”‚
β”‚  Server A: APPEND {sold: 1, timestamp: T1}  β”‚
β”‚  Server B: APPEND {sold: 2, timestamp: T2}  β”‚
β”‚                                             β”‚
β”‚  No race condition:                         β”‚
β”‚  - Both events preserved                    β”‚
β”‚  - No locks needed (append-only)            β”‚
β”‚  - Total sold = 3 (computed from events)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

2. Enables Aggressive Caching

MUTABLE DATA:
- Cache user profile
- User updates profile
- Must invalidate cache (cache invalidation is hard!)
- Cache miss on next read

IMMUTABLE DATA:
- Cache user profile version 5
- User updates profile β†’ creates version 6
- Version 5 cache still valid (never expires)
- New requests use version 6 (different cache key)

Result: Cache can live forever, no invalidation needed
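The version-keyed caching idea above can be sketched in a few lines. This is an illustrative in-memory sketch, not a real cache API; names like `cacheKey` and `getProfile` are invented for the example:

```javascript
// Cache entries keyed by (id, version) never need invalidation:
// a given version's content can never change.
const cache = new Map();

function cacheKey(userId, version) {
  return `user:${userId}:v${version}`;
}

function getProfile(userId, version, loadFromDb) {
  const key = cacheKey(userId, version);
  if (!cache.has(key)) {
    // Safe to cache forever: this version is immutable
    cache.set(key, loadFromDb(userId, version));
  }
  return cache.get(key);
}

// An update creates version 6; version 5's cache entry stays valid, untouched
const v5 = getProfile(123, 5, () => ({ name: "Alice" }));
const v6 = getProfile(123, 6, () => ({ name: "Alice A" }));
```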

3. Simplifies Replication

MUTABLE REPLICATION:
Primary: UPDATE user SET name='Alice' WHERE id=123
Replica: Must apply same UPDATE

Problems:
- What if replica is behind? (out-of-order updates)
- What if UPDATE fails on replica? (inconsistency)
- How to handle conflicts? (complex merge logic)

IMMUTABLE REPLICATION:
Primary: APPEND event {id: 123, name: 'Alice', version: 5}
Replica: APPEND same event

Benefits:
- Events can be replayed in order
- Idempotent (appending same event twice is safe)
- No merge conflicts (deterministic ordering)
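The idempotency claim can be made concrete with a small sketch of a replica that deduplicates by event ID (the `Replica` class and field names are illustrative, not any real replication protocol):

```javascript
// A replica that applies appended events idempotently:
// re-delivering the same event is a safe no-op.
class Replica {
  constructor() {
    this.log = [];
    this.seen = new Set(); // IDs of events already applied
  }

  apply(event) {
    if (this.seen.has(event.id)) return false; // Duplicate delivery: ignore
    this.seen.add(event.id);
    this.log.push(event);
    return true;
  }
}

const replica = new Replica();
const event = { id: "evt-1", userId: 123, name: "Alice", version: 5 };
replica.apply(event); // Applied
replica.apply(event); // Retried delivery: safely ignored
```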

4. Time-Travel Debugging

MUTABLE: Only current state exists
- Bug in production?
- Cannot see what state was at T-1 hour

IMMUTABLE: Complete history exists
- Bug in production?
- Replay events to T-1 hour
- See exact state at any point in time
- Example: "What was user's cart at 3pm yesterday?"

Append-Only Logs

The most common form of immutability in distributed systems:

KAFKA TOPIC (Append-Only Log)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Partition 0: User Events                    β”‚
β”‚                                              β”‚
β”‚  Offset 0: {user: 1, action: "login"}       β”‚
β”‚  Offset 1: {user: 1, action: "view_product"}β”‚
β”‚  Offset 2: {user: 2, action: "login"}       β”‚
β”‚  Offset 3: {user: 1, action: "purchase"}    β”‚
β”‚                                              β”‚
β”‚  Properties:                                 β”‚
β”‚  - Only appends allowed (no updates/deletes) β”‚
β”‚  - Each message has immutable offset        β”‚
β”‚  - Consumers can replay from any offset     β”‚
β”‚  - Old messages deleted by time (retention) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Benefits:
βœ“ High throughput (sequential disk writes)
βœ“ Multiple consumers can read same data
βœ“ Replay events for recovery or new consumers
βœ“ Audit trail preserved
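The log semantics above can be sketched with a minimal in-memory class. This is not Kafka's actual API, just an illustration of offsets and replay:

```javascript
// Minimal append-only log: messages get immutable offsets,
// and consumers can read from any offset.
class AppendOnlyLog {
  constructor() {
    this.messages = [];
  }

  append(message) {
    this.messages.push(message);
    return this.messages.length - 1; // Offset of the new message
  }

  // Reading never mutates anything, so any number of consumers can share the log
  readFrom(offset) {
    return this.messages.slice(offset);
  }
}

const log = new AppendOnlyLog();
log.append({ user: 1, action: "login" });        // offset 0
log.append({ user: 1, action: "view_product" }); // offset 1
log.append({ user: 2, action: "login" });        // offset 2

// A new consumer replays the whole history; a caught-up one reads only the tail
const replayed = log.readFrom(0);
const latest = log.readFrom(2);
```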

Versioned Data

Alternative approach: Keep multiple versions of data

DATABASE WITH VERSIONING (e.g., DynamoDB)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  User Profile Versions                     β”‚
β”‚                                            β”‚
β”‚  Version 1: {name: "Alice", age: 30}       β”‚
β”‚  Version 2: {name: "Alice", age: 31}       β”‚
β”‚  Version 3: {name: "Alice A", age: 31}     β”‚
β”‚                                            β”‚
β”‚  Current: Version 3                        β”‚
β”‚  History: Versions 1-2 preserved           β”‚
β”‚                                            β”‚
β”‚  Implementation:                           β”‚
β”‚  - Each write creates new version          β”‚
β”‚  - Version ID/timestamp tracks changes     β”‚
β”‚  - Old versions kept or garbage collected  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Examples:
- DynamoDB: Version numbers
- PostgreSQL: MVCC (Multi-Version Concurrency Control)
- Git: Commit hashes
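Version numbers also enable optimistic concurrency control: a write succeeds only if the caller saw the latest version. The sketch below is illustrative (a plain `Map`, not a real database client), in the spirit of DynamoDB conditional writes:

```javascript
// Optimistic concurrency: each write must name the version it read.
// A stale writer is rejected instead of silently overwriting.
const store = new Map(); // key -> { value, version }

function write(key, value, expectedVersion) {
  const current = store.get(key);
  const currentVersion = current ? current.version : 0;
  if (currentVersion !== expectedVersion) {
    return { ok: false, version: currentVersion }; // Someone else wrote first
  }
  store.set(key, { value, version: currentVersion + 1 });
  return { ok: true, version: currentVersion + 1 };
}

write("user:123", { name: "Alice" }, 0);             // ok, creates version 1
const stale = write("user:123", { name: "Bob" }, 0); // rejected: version is now 1
```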

Real Systems Using Immutability

| System         | Immutability Model | Use Case          | Benefits                                 |
|----------------|--------------------|-------------------|------------------------------------------|
| Kafka          | Append-only log    | Message streaming | Replay, fault tolerance, high throughput |
| Git            | Immutable commits  | Version control   | Complete history, branching, rollback    |
| Blockchain     | Immutable ledger   | Cryptocurrency    | Tamper-proof, audit trail                |
| Event Sourcing | Event log          | CQRS systems      | Audit trail, time-travel, replay         |
| S3             | Write-once objects | Object storage    | Cache forever, versioning                |
| Datomic        | Immutable facts    | Database          | Query past states, time-travel           |

Case Study: Kafka Log Immutability

KAFKA DESIGN DECISIONS:

1. Messages are immutable after write
   - Producer writes message β†’ never changed
   - Consumers cannot modify messages
   - Only deletion: Time-based retention (e.g., delete after 7 days)

2. Sequential writes to disk
   - Append-only = sequential I/O (fast!)
   - Spinning disks: sequential I/O reaches hundreds of MB/s, while random I/O is orders of magnitude slower
   - Result: Kafka throughput in millions of msgs/sec

3. Zero-copy reads
   - Messages immutable β†’ cache in OS page cache
   - Send directly from page cache to network (zero-copy)
   - No serialization/deserialization overhead

4. Replayability
   - Consumer can reset offset and replay
   - Used for: Recovery, new consumers, backfilling data
   - Example: "Process last 24 hours of events again"

5. Log compaction (for keyed data)
   - Keep latest value per key
   - Delete old versions (garbage collection)
   - Still immutable: Never UPDATE, only APPEND + COMPACT
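The compaction step can be sketched as a pure function over the log. This is a simplified illustration, not Kafka's actual compaction implementation:

```javascript
// Log compaction: keep only the latest event per key.
// The result is still an append-only log, just with old versions dropped.
function compact(log) {
  const latest = new Map();
  for (const event of log) {
    latest.set(event.key, event); // A later event for the same key wins
  }
  return [...latest.values()];
}

const log = [
  { key: "user:1", value: "Alice" },
  { key: "user:2", value: "Bob" },
  { key: "user:1", value: "Alice A" }, // Newer value for user:1
];
const compacted = compact(log);
// compacted retains one entry per key: user:1 -> "Alice A", user:2 -> "Bob"
```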

Case Study: Git Commits

GIT COMMIT IMMUTABILITY:

Commit ID: SHA-1 hash of (content + metadata)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Commit abc123:                         β”‚
β”‚  - Parent: def456                      β”‚
β”‚  - Tree: Files snapshot                β”‚
β”‚  - Author: Alice                       β”‚
β”‚  - Message: "Add feature X"            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Properties:
- Changing any field β†’ different hash β†’ different commit
- Cannot modify history without changing hash
- Result: Tamper-proof, verifiable history

Benefits:
βœ“ Branching: Create alternate histories (branches)
βœ“ Merging: Combine histories deterministically
βœ“ Rollback: Revert to any commit
βœ“ Distributed: Clone full history to any machine

When to Use Immutability

βœ“ Perfect Use Cases

Event Sourcing Architectures

Scenario: Banking system
Requirement: Complete audit trail for compliance
Solution: Store all transactions as immutable events
Benefit: Can audit any account at any point in time

Message Streaming

Scenario: Real-time analytics pipeline
Requirement: Multiple consumers, replayability
Solution: Kafka append-only log
Benefit: New analytics jobs can process historical data

Caching & CDN

Scenario: Static assets (images, JS, CSS)
Solution: Immutable URLs with content hash
Example: bundle.abc123.js (hash in filename)
Benefit: Cache forever with HTTP Cache-Control: immutable

Version Control

Scenario: Collaborative document editing
Solution: Store every edit as immutable version
Benefit: Undo, redo, view history, branch documents
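A sketch of how immutable versions give undo/redo almost for free (the `VersionedDoc` class and its snapshot-per-edit design are illustrative, not a real editor's data model):

```javascript
// Every edit appends a new snapshot; undo/redo just move a cursor
// over the immutable version history.
class VersionedDoc {
  constructor(initial) {
    this.versions = [initial];
    this.cursor = 0; // Index of the current version
  }

  edit(newContent) {
    this.versions = this.versions.slice(0, this.cursor + 1); // Drop redo branch
    this.versions.push(newContent);
    this.cursor++;
  }

  undo() { if (this.cursor > 0) this.cursor--; return this.current(); }
  redo() { if (this.cursor < this.versions.length - 1) this.cursor++; return this.current(); }
  current() { return this.versions[this.cursor]; }
}

const doc = new VersionedDoc("hello");
doc.edit("hello world");
doc.undo(); // back to "hello"
doc.redo(); // forward to "hello world"
```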

βœ• When NOT to Use (or Use Carefully)

Storage-Constrained Systems

Problem: Immutable data accumulates forever
Example: 1 billion events/day = massive storage cost
Solution: Log compaction, retention policies, snapshots

GDPR Right to Delete

Problem: Cannot truly delete immutable data
Example: User requests account deletion (GDPR)
Solution: Tombstone records, encryption with key deletion

Real-Time Updates with Small Changes

Problem: Appending full document for small change is wasteful
Example: Updating single field in 1MB document
Solution: Hybrid approach (mutable state plus a write-ahead log for durability)

Interview Application

Common Interview Question

Q: β€œWhy does Kafka use immutable logs instead of a traditional database?”

Strong Answer:

β€œKafka uses immutable append-only logs for several key reasons:

1. Performance:

  • Sequential disk writes are 6x faster than random writes (600 MB/s vs 100 MB/s)
  • Append-only allows optimizing for sequential I/O
  • Result: Kafka achieves millions of messages/second throughput

2. Replayability:

  • Immutable messages can be read multiple times
  • Consumers can reset offset and replay historical data
  • Use cases: Recovery from consumer failures, backfilling data for new analytics

3. Simplifies Replication:

  • Replicas just copy log segments
  • No complex merge logic (events never change)
  • Idempotent replication (copying same event twice is safe)

4. Multiple Consumers:

  • Same log can be consumed by multiple independent consumers
  • Each consumer tracks own offset
  • Example: Real-time analytics + batch processing on same stream

5. Durability:

  • Once written to log, message is never lost
  • Replicas have identical copies (deterministic)
  • Contrast with message queues that delete on consumption

Trade-offs:

  • Storage cost: Must retain logs (mitigated by log compaction + retention)
  • Cannot update: If message has error, must append correction event
  • But benefits far outweigh costs for streaming use cases”

Code Example

Immutable Event Sourcing Pattern

// MUTABLE APPROACH (traditional)
class BankAccount {
  constructor() {
    this.balance = 0; // Mutable state
  }

  deposit(amount) {
    this.balance += amount; // In-place update βœ•
    // History lost!
  }

  withdraw(amount) {
    this.balance -= amount; // In-place update βœ•
  }
}

// Problem: No audit trail, race conditions on concurrent updates

// IMMUTABLE APPROACH (event sourcing)
let nextEventId = 0;
const generateId = () => ++nextEventId; // Simple ID generator for this example

class BankAccountEventSourced {
  constructor() {
    this.events = []; // Append-only event log (existing events never modified)
  }

  // Commands: Append events (never modify existing)
  deposit(amount) {
    const event = {
      type: "DEPOSIT",
      amount: amount,
      timestamp: Date.now(),
      id: generateId(),
    };
    this.events.push(event); // Append-only βœ“
    return event;
  }

  withdraw(amount) {
    const event = {
      type: "WITHDRAW",
      amount: amount,
      timestamp: Date.now(),
      id: generateId(),
    };
    this.events.push(event); // Append-only βœ“
    return event;
  }

  // Query: Compute current state from events
  getBalance() {
    return this.events.reduce((balance, event) => {
      if (event.type === "DEPOSIT") return balance + event.amount;
      if (event.type === "WITHDRAW") return balance - event.amount;
      return balance;
    }, 0);
  }

  // Time-travel: Get balance at any point in history
  getBalanceAt(timestamp) {
    return this.events
      .filter(e => e.timestamp <= timestamp)
      .reduce((balance, event) => {
        if (event.type === "DEPOSIT") return balance + event.amount;
        if (event.type === "WITHDRAW") return balance - event.amount;
        return balance;
      }, 0);
  }

  // Audit: Get complete transaction history
  getAuditLog() {
    return this.events.map(e => ({
      type: e.type,
      amount: e.amount,
      timestamp: new Date(e.timestamp).toISOString(),
    }));
  }
}

// Usage
const account = new BankAccountEventSourced();
account.deposit(100);
account.withdraw(20);
account.deposit(50);

console.log(account.getBalance()); // 130
console.log(account.getBalanceAt(Date.now() - 1000)); // Balance 1 second ago
console.log(account.getAuditLog()); // Complete history

Immutable Cache Keys (Versioned Assets)

// MUTABLE (cache invalidation problem)
<script src="/bundle.js"></script>
// Updated bundle.js β†’ Must invalidate CDN cache (complex!)

// IMMUTABLE (cache forever)
<script src="/bundle.abc123.js"></script>
// Updated bundle β†’ New hash β†’ New URL β†’ Old cache unaffected βœ“

// Implementation
const crypto = require('crypto');
const fs = require('fs');

function generateImmutableAssetURL(filePath) {
  const content = fs.readFileSync(filePath);
  const hash = crypto.createHash('sha256')
    .update(content)
    .digest('hex')
    .substring(0, 8);

  const extension = filePath.split('.').pop();
  const basename = filePath.replace(`.${extension}`, '');

  // Immutable URL: content hash in filename
  const immutableURL = `${basename}.${hash}.${extension}`;

  // HTTP headers for immutable cache
  // Cache-Control: public, max-age=31536000, immutable
  // Result: Browser never revalidates (cache forever)

  return immutableURL;
}

// Example
generateImmutableAssetURL('bundle.js');  // bundle.abc12345.js
// Change one byte β†’ Different hash β†’ Different URL β†’ New cache entry

Prerequisites: None - foundational concept

Related Concepts:

Used In Systems:

  • Kafka: Message streaming with immutable logs
  • Git: Version control with immutable commits
  • Blockchain: Immutable distributed ledger

Explained In Detail:

  • Kafka Deep Dive - Immutable log architecture in depth

Quick Self-Check

  • Can explain immutability in 60 seconds?
  • Understand difference between mutable and immutable data?
  • Know 3 benefits of immutability in distributed systems?
  • Can explain how Kafka uses immutability for performance?
  • Understand trade-offs (storage cost, GDPR)?
  • Can implement simple event sourcing pattern?