What Is a Model?
Foundation vocabulary for machine learning: parameters, weights, logits, training vs inference, and why neural networks work
Atomic knowledge units covering machine learning and distributed systems fundamentals. Each concept is interview-ready with real-world examples, production insights, and visual explanations.
Machine learning and AI engineering fundamentals
Foundation for softmax, cross-entropy, temperature scaling, and sampling in AI systems
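A minimal sketch of softmax with temperature scaling, assuming NumPy; the logits and temperature values are illustrative:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution.

    Dividing by temperature before the exponential flattens (T > 1)
    or sharpens (T < 1) the distribution used for sampling.
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()            # subtract max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

logits = [2.0, 1.0, 0.1]
print(softmax(logits))                 # sharper: most mass on the first logit
print(softmax(logits, temperature=2))  # flatter: probabilities move closer together
```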
Geometric intuitions for vectors, cosine similarity, dot products, and matrix multiplication in AI
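A small illustration of dot product versus cosine similarity, assuming plain NumPy vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: the dot product of the
    normalized vectors, ignoring their magnitudes."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction, twice the magnitude

print(a @ b)                    # dot product grows with magnitude: 28.0
print(cosine_similarity(a, b))  # direction only: 1.0
```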
How neural networks learn: gradients, chain rule, vanishing gradients, and residual connections
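A back-of-the-envelope illustration of the chain rule and why gradients can vanish through a deep stack of sigmoid layers; the depths and input value are purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Chain rule: the gradient through a stack of layers is the product of the
# local derivatives. Sigmoid's derivative is at most 0.25, so the product
# shrinks geometrically with depth.
x = 0.5
for depth in (1, 10, 50):
    local = sigmoid(x) * (1.0 - sigmoid(x))   # derivative of sigmoid at x
    print(depth, local ** depth)              # ~0.235, ~5e-7, ~3e-32
```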
Reference for cross-entropy, MSE, perplexity, and contrastive loss in training and evaluation
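A small worked example of cross-entropy and perplexity for a single next-token prediction, assuming NumPy; the probabilities are illustrative:

```python
import numpy as np

def cross_entropy(probs, target_index):
    """Negative log-probability the model assigned to the correct token."""
    return -np.log(probs[target_index])

probs = np.array([0.7, 0.2, 0.1])   # model's distribution over a 3-token vocabulary
loss = cross_entropy(probs, target_index=0)
print(loss)                          # ~0.357 nats
print(np.exp(loss))                  # perplexity ~1.43: "as confused as" picking among ~1.4 options
```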
ReLU, GELU, SwiGLU, softmax, and sigmoid: what they do and when to use them
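A compact sketch of a few activation functions, assuming the common tanh approximation of GELU:

```python
import numpy as np

def relu(x):
    # Zero out negatives, pass positives through unchanged.
    return np.maximum(0.0, x)

def gelu(x):
    # Widely used tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def sigmoid(x):
    # Squash any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), gelu(x), sigmoid(x), sep="\n")
```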
Classification and retrieval metrics: precision, recall, F1, perplexity, MRR, and NDCG
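A minimal worked example of precision, recall, and F1 on hypothetical binary labels:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute binary-classification metrics from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # (0.667, 0.667, 0.667)
```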
LayerNorm, BatchNorm, RMSNorm: what they do, when to use them, and Pre-Norm vs Post-Norm
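A stripped-down sketch of LayerNorm versus RMSNorm over the last dimension, assuming NumPy and omitting the learnable scale and shift parameters:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    # Rescale by root-mean-square only; no mean subtraction, cheaper than LayerNorm.
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms

x = np.array([[1.0, 2.0, 3.0, 4.0]])
print(layer_norm(x))
print(rms_norm(x))
```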
Dropout, weight decay, early stopping, and label smoothing to prevent overfitting
SGD, Adam, AdamW, learning rate schedules, warmup, and gradient clipping for training
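A minimal sketch of a linear-warmup-then-cosine-decay learning-rate schedule; all hyperparameter values below are illustrative:

```python
import math

def lr_at_step(step, max_lr=3e-4, warmup_steps=100, total_steps=1000, min_lr=3e-5):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

for step in (0, 50, 100, 500, 999):
    print(step, f"{lr_at_step(step):.2e}")
```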
GPU memory, precision formats, quantization (INT4/INT8), and practical GPU selection for LLMs
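Back-of-the-envelope weight-memory arithmetic for a hypothetical 7B-parameter model at different precisions; this covers weights only, and the KV cache and activations add more on top:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

params = 7e9  # a hypothetical 7B-parameter model
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_memory_gb(params, bits):.1f} GB")
# FP16: ~14.0 GB, INT8: ~7.0 GB, INT4: ~3.5 GB
```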
Session, short-term, long-term, and episodic memory for AI agents and chatbots
Design patterns and architectures
Design principle where data structures cannot be modified after creation, simplifying distributed systems by eliminating update conflicts and race conditions
Automatic switching to a backup system or replica when the primary fails, ensuring service continuity with minimal downtime
Periodically saving processing state to enable recovery from failures without reprocessing all data from the beginning
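A toy checkpointing sketch: the loop periodically persists how far it has gotten, so a restart resumes from the last checkpoint instead of reprocessing everything. The file name and checkpoint interval are illustrative:

```python
import json, os

CHECKPOINT = "progress.json"

def load_checkpoint():
    # Resume from the last saved position, or start from the beginning.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["next_index"]
    return 0

def save_checkpoint(next_index):
    with open(CHECKPOINT, "w") as f:
        json.dump({"next_index": next_index}, f)

records = [f"record-{i}" for i in range(100)]
start = load_checkpoint()
for i in range(start, len(records)):
    _ = records[i].upper()           # stand-in for the real processing work
    if (i + 1) % 10 == 0:            # checkpoint every 10 records
        save_checkpoint(i + 1)
save_checkpoint(len(records))
```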
The minimum number of nodes in a distributed system that must agree on an operation for it to be considered successful, ensuring consistency despite failures
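A tiny illustration of the usual quorum rule: with N replicas, a write quorum W and a read quorum R are guaranteed to overlap whenever W + R > N. The replica counts below are illustrative:

```python
def quorums_overlap(n, w, r):
    """True if any write quorum and any read quorum share at least one node,
    which guarantees a read sees the latest acknowledged write."""
    return w + r > n

print(quorums_overlap(n=3, w=2, r=2))  # True: classic majority quorums
print(quorums_overlap(n=3, w=1, r=1))  # False: a read may miss the latest write
```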
Operations that produce the same result when applied multiple times, critical for reliable distributed systems with retries and duplicate message handling
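A minimal sketch of making a retried operation idempotent with a client-supplied idempotency key; the in-memory dict stands in for a durable store, and the names are illustrative:

```python
processed = {}  # idempotency key -> result (a durable store in practice)

def apply_payment(key, account, amount, balances):
    """Apply a payment at most once per idempotency key; retries return the cached result."""
    if key in processed:
        return processed[key]          # duplicate delivery: no double charge
    balances[account] = balances.get(account, 0) - amount
    processed[key] = balances[account]
    return balances[account]

balances = {"alice": 100}
print(apply_payment("req-42", "alice", 30, balances))  # 70
print(apply_payment("req-42", "alice", 30, balances))  # still 70 after a retry
```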
Distributing incoming requests across multiple servers to optimize resource utilization, minimize latency, and prevent any single server from becoming a bottleneck
A consistency model where updates eventually propagate to all replicas, prioritizing availability over immediate consistency in distributed systems
How distributed systems achieve fault tolerance and high availability by replicating data from a leader node to multiple follower nodes
How distributed systems copy data across multiple nodes to achieve high availability, fault tolerance, and geographic distribution—and the fundamental trade-offs involved
An architectural pattern that stores all changes to application state as a sequence of events, enabling complete audit trails and time-travel capabilities
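A minimal event-sourcing sketch: the append-only list of events is the source of truth, and current state is derived by replaying them. The event names are illustrative:

```python
events = []  # append-only event log: the source of truth

def deposit(amount):
    events.append({"type": "Deposited", "amount": amount})

def withdraw(amount):
    events.append({"type": "Withdrew", "amount": amount})

def current_balance():
    """Derive current state by replaying every event from the beginning."""
    balance = 0
    for e in events:
        balance += e["amount"] if e["type"] == "Deposited" else -e["amount"]
    return balance

deposit(100)
withdraw(30)
deposit(5)
print(current_balance())  # 75, and the full history is still in `events`
```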
An architectural pattern that separates read and write operations into distinct models, optimizing each for its specific use case
How distributed systems agree on a single value or state across multiple nodes, enabling coordination despite failures and network partitions
Event streaming and communication
Mechanisms by which message producers receive confirmation that their messages were successfully persisted, enabling reliability tradeoffs between latency and durability
How message producers batch records to achieve high throughput by amortizing network overhead and maximizing sequential I/O
How distributed systems divide data into partitions for parallel processing, ordering guarantees, and horizontal scalability
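A sketch of key-based partition assignment: hashing the record key picks the partition, so all records with the same key land on the same partition and keep their relative order. A stable SHA-256 hash is assumed here purely for illustration; production systems typically use their own hash functions:

```python
import hashlib

def partition_for(key, num_partitions):
    """Map a record key to a partition with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

for key in ("user-1", "user-2", "user-1"):
    print(key, "->", partition_for(key, num_partitions=6))
# "user-1" always maps to the same partition, preserving per-key ordering
```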
How distributed messaging systems track consumer progress through partitions using offsets, enabling fault tolerance, exactly-once processing, and replay capabilities
How multiple consumers coordinate to process partitions in parallel with fault tolerance, automatic rebalancing, and exactly-once guarantees
How distributed messaging systems guarantee each message is processed exactly once, eliminating duplicates while ensuring atomicity across multiple operations
Data persistence and retrieval
A technique where changes are written to a durable log before being applied to the database, enabling crash recovery and replication in database systems
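A toy write-ahead-log sketch: each change is appended and flushed to the log before the in-memory state is touched, so a crash can be recovered by replaying the log. The file format and names are illustrative:

```python
import json, os

class TinyKV:
    """Toy key-value store: log first, then apply; replay on startup."""

    def __init__(self, wal_path="wal.jsonl"):
        self.wal_path = wal_path
        self.data = {}
        if os.path.exists(wal_path):             # crash recovery: replay the log
            with open(wal_path) as f:
                for line in f:
                    entry = json.loads(line)
                    self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        with open(self.wal_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())                 # durable before we apply the change
        self.data[key] = value

store = TinyKV()
store.put("answer", 42)
print(store.data)
```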
How distributed systems use append-only logs for durable, ordered, and high-throughput data storage with time-travel and replay capabilities
How databases horizontally partition data across multiple servers for scalability, using partition keys to distribute and route data efficiently
The four guarantees that database transactions provide: Atomicity, Consistency, Isolation, and Durability—and how they enable reliable data operations
Explore our in-depth technical series that connect these concepts into production-grade knowledge.
Explore Deep Dives