Consumer groups let multiple consumer instances process a topic's partitions in parallel. Each partition is assigned to exactly one consumer within a group, which gives parallelism while preserving message ordering within each partition. Automatic rebalancing handles failures and scaling.
Visual Overview
Consumer Group Architecture
CONSUMER GROUP ARCHITECTURE:
Topic: user-events (4 partitions)
┌──────────┬──────────┬──────────┬──────────┐
│  Part 0  │  Part 1  │  Part 2  │  Part 3  │
└──────────┴──────────┴──────────┴──────────┘
     │          │          │          │
     │          │          │          │
     ▼          ▼          ▼          ▼
┌──────────┬──────────┬──────────┬──────────┐
│ Consumer │ Consumer │ Consumer │ Consumer │
│    A     │    B     │    C     │    D     │
└──────────┴──────────┴──────────┴──────────┘
Group: "analytics-processors"
KEY GUARANTEES:
├── Each partition assigned to exactly ONE consumer in group
├── Each consumer can handle multiple partitions
├── Automatic rebalancing on member changes
└── Fault tolerance through coordinator failover
REBALANCING SCENARIOS:
1. Consumer joins: 4 partitions, 4 → 5 consumers (rebalance; the extra consumer stays idle)
2. Consumer crashes: 4 partitions, 4 → 3 consumers (rebalance; one survivor takes two partitions)
3. Partition added: New partition needs assignment (rebalance)
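The three scenarios above can be sketched with a toy assignment function. This is a minimal illustration, not Kafka's actual assignor: the function name `assign_round_robin` and the consumer names A to E are assumptions made for the example, and real assignors run inside the group protocol rather than as a standalone function.

```python
def assign_round_robin(partitions, consumers):
    """Give every partition exactly one owner, cycling through sorted members."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    if not members:
        return assignment
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Steady state: four consumers, one partition each.
steady = assign_round_robin(partitions, ["A", "B", "C", "D"])

# Scenario 1: a fifth consumer joins; with only 4 partitions it receives nothing.
joined = assign_round_robin(partitions, ["A", "B", "C", "D", "E"])

# Scenario 2: consumer D crashes; its partition is handed to a survivor.
crashed = assign_round_robin(partitions, ["A", "B", "C"])

# Scenario 3: a fifth partition is added and needs an owner.
expanded = assign_round_robin(partitions + [4], ["A", "B", "C", "D"])

for label, result in [("steady", steady), ("joined", joined),
                      ("crashed", crashed), ("expanded", expanded)]:
    print(label, result)
```

In every case each partition appears in exactly one consumer's list, which is the first guarantee listed above.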
Core Explanation
What is a Consumer Group?
A consumer group is a logical collection of consumer instances that work together to consume messages from a topic. The group provides:
Load distribution: Partitions spread across consumers
Topic: user-events (4 partitions)
│
├───► Group: "analytics" (processes all events)
│       Consumer A: [P0, P1]
│       Consumer B: [P2, P3]
│
└───► Group: "fraud-detection" (also processes all events)
        Consumer X: [P0, P1, P2, P3]
Each group independently consumes ALL messages.
Groups do NOT affect each other.
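Group independence comes from each group tracking its own committed offset into the same log. The sketch below models that with a single in-memory partition. The `TopicLog` class and its methods are hypothetical names invented for this illustration, not part of any Kafka client.

```python
class TopicLog:
    """Toy single-partition log with one committed offset per consumer group."""

    def __init__(self):
        self.messages = []
        self.committed = {}  # group id -> next offset to read

    def append(self, message):
        self.messages.append(message)

    def poll(self, group_id, max_records=10):
        # Each group reads from its own position; other groups are untouched.
        start = self.committed.get(group_id, 0)
        batch = self.messages[start:start + max_records]
        self.committed[group_id] = start + len(batch)
        return batch


log = TopicLog()
for event in ["login", "click", "purchase"]:
    log.append(event)

analytics_batch = log.poll("analytics", max_records=2)  # first two events
fraud_batch = log.poll("fraud-detection")               # all three events
analytics_rest = log.poll("analytics")                  # resumes at the third event

print(analytics_batch, fraud_batch, analytics_rest)
```

Reading in one group never advances the other group's offset, so both groups eventually see every message.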
Use Case - Multiple Processing Pipelines:
Topic: "user-actions"
Group 1: "real-time-analytics"
  → Processes events for live dashboards
Group 2: "ml-feature-pipeline"
  → Extracts features for ML models
Group 3: "audit-logger"
  → Archives events for compliance
All three groups consume the SAME messages independently.
✕ Rebalancing overhead on frequent consumer changes
Real Systems Using This
Kafka (Apache)
Implementation: Each group is managed by a group coordinator, the broker that leads the __consumer_offsets partition the group ID hashes to
Scale: Thousands of consumer groups processing trillions of messages
Typical Setup: 10-50 consumers per group for high-throughput topics
Amazon Kinesis
Implementation: Kinesis Client Library (KCL) provides similar consumer group semantics
Scale: Auto-scaling consumer groups based on shard count
Typical Setup: 1 worker per shard, auto-scaling with shard splits/merges
Apache Pulsar
Implementation: Shared subscription model (similar to consumer groups)
Scale: Automatic load rebalancing without stop-the-world pauses
Typical Setup: Dynamic consumer scaling with minimal disruption
When to Use Consumer Groups
✓ Perfect Use Cases
High-Throughput Event Processing
Scenario: Processing 1M events/sec from user activity stream
Solution: Consumer group with 100 consumers (10K events/sec each) on a topic with at least 100 partitions
Result: Linear scaling, automatic fault tolerance
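The sizing arithmetic in this scenario can be written out as a small helper. This is a back-of-the-envelope sketch: the function name `size_consumer_group` is an assumption for illustration, and it presumes load is spread evenly across partitions, which real traffic often is not.

```python
import math


def size_consumer_group(total_events_per_sec, per_consumer_events_per_sec):
    """Estimate consumers needed and the minimum partition count to keep them busy."""
    consumers = math.ceil(total_events_per_sec / per_consumer_events_per_sec)
    # Each active consumer needs at least one partition of its own.
    return {"consumers": consumers, "min_partitions": consumers}


plan = size_consumer_group(1_000_000, 10_000)
print(plan)
```

For the figures above this gives 100 consumers, which in turn requires a topic with at least 100 partitions.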
Parallel Data Pipeline
Scenario: Real-time ETL from Kafka to data warehouse
Solution: Consumer group on a topic whose partition count matches the number of available cores
Result: Maximize parallelism while maintaining ordering per partition
Multiple Processing Pipelines
Scenario: Same events need processing by analytics, ML, and audit systems
Solution: Three separate consumer groups on same topic
Result: Independent processing without interfering with each other
✕ When NOT to Use
Need Broadcast to All Consumers
Problem: Every consumer must receive ALL messages
Issue: Consumer groups distribute messages (each gets subset)
Alternative: Give each consumer its own group ID (one group per consumer) or use a pub-sub pattern
Very Low Latency Requirements
Problem: Sub-millisecond latency critical
Issue: Rebalancing causes a temporary processing pause
Alternative: Single consumer or fixed partition assignment
More Consumers than Partitions Long-Term
Problem: Want to run 100 consumers with only 10 partitions
Issue: 90 consumers will be idle, wasting resources
Alternative: Increase partition count or reduce consumers
Interview Application
Common Interview Question 1
Q: “You have a topic with 10 partitions. If you deploy 15 consumers in the same consumer group, what happens?”
Strong Answer:
“Only 10 consumers will be active, one per partition. The remaining 5 consumers will sit idle, since each partition can be assigned to only one consumer in a group. This is inefficient. To use all 15 consumers, I’d either increase the partition count to 15 or more, or split the workload across multiple topics. If further scaling is anticipated, I’d over-provision partitions upfront: Kafka lets you add partitions to an existing topic but never remove them, and adding partitions changes which partition a given key maps to, which breaks per-key ordering for keyed messages.”
Why this is good:
Shows understanding of partition assignment constraint
Identifies the inefficiency
Provides multiple solutions
Considers future scaling
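The active/idle arithmetic behind this answer can be checked with a short helper. The function name `group_utilization` is a hypothetical one chosen for this sketch, and it assumes partitions are spread as evenly as possible across the active consumers.

```python
def group_utilization(partition_count, consumer_count):
    """Return how many consumers do work, how many idle, and the heaviest load."""
    active = min(partition_count, consumer_count)
    idle = consumer_count - active
    if active == 0:
        return {"active": 0, "idle": idle, "max_partitions_per_consumer": 0}
    base, extra = divmod(partition_count, active)
    # If partitions do not divide evenly, some consumers carry one more.
    heaviest = base + (1 if extra else 0)
    return {"active": active, "idle": idle, "max_partitions_per_consumer": heaviest}


interview_case = group_utilization(partition_count=10, consumer_count=15)
under_provisioned = group_utilization(partition_count=10, consumer_count=4)

print(interview_case)
print(under_provisioned)
```

With 10 partitions and 15 consumers, 5 consumers are idle. With only 4 consumers, none are idle but two of them own 3 partitions each.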
Common Interview Question 2
Q: “What happens during a consumer group rebalance? How does it affect processing?”
Strong Answer:
“Rebalancing occurs when consumers join, leave, or crash. The process:
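1. Coordinator detects the change (heartbeat timeout or explicit notification)
2. Coordinator signals REBALANCE_IN_PROGRESS to all group members in their heartbeat responses
3. Consumers stop processing and commit their offsets
4. All consumers re-join the group
5. New partition assignments are calculated using the configured strategy (in Kafka’s classic protocol the group leader, one of the consumers, computes them and the coordinator distributes them)
6. Consumers receive new assignments and resume processing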
Impact: Processing pauses for ~500ms to several seconds. In production, we minimize rebalances by:
Using static membership (Kafka 2.3+) to avoid rebalances on restarts
Tuning session.timeout.ms and heartbeat.interval.ms
Using sticky assignor to minimize partition movement
Graceful shutdowns with proper leave group notifications”
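The tuning options named in this answer map onto standard Kafka consumer configuration properties. The dictionary below is an illustrative sketch: the property names are real, but every value, including the instance ID, is an assumed example rather than a recommendation. The value format for partition.assignment.strategy depends on the client: the Java client takes assignor class names, while librdkafka-based clients take short names such as cooperative-sticky.

```python
consumer_config = {
    "group.id": "analytics-processors",
    # A stable per-instance ID enables static membership (hypothetical value).
    "group.instance.id": "analytics-processor-1",
    # How long the coordinator waits without heartbeats before evicting a member.
    "session.timeout.ms": 45000,
    # How often the consumer sends heartbeats to the coordinator.
    "heartbeat.interval.ms": 3000,
    # Short name as used by librdkafka-based clients (assumption about the client).
    "partition.assignment.strategy": "cooperative-sticky",
}


def heartbeat_within_guideline(config):
    """Check the usual guideline: heartbeat interval at most 1/3 of the session timeout."""
    return config["heartbeat.interval.ms"] * 3 <= config["session.timeout.ms"]


config_ok = heartbeat_within_guideline(consumer_config)
print(config_ok)
```

A longer session timeout tolerates brief pauses without triggering a rebalance, at the cost of slower detection of consumers that have genuinely failed.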
Why this is good:
Detailed step-by-step understanding
Quantifies the impact
Shows production awareness
Provides optimization strategies
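To see why a sticky strategy reduces rebalance cost, the sketch below compares a from-scratch reassignment with a simplified sticky-style one after a consumer leaves. This is not Kafka's StickyAssignor algorithm: the function names are hypothetical, and the sticky version handles only member departures, not joins.

```python
def reassign_from_scratch(partitions, members):
    """Round-robin over sorted members, ignoring who owned what before."""
    members = sorted(members)
    return {p: members[i % len(members)] for i, p in enumerate(sorted(partitions))}


def reassign_sticky(previous, members):
    """Keep surviving owners; give orphaned partitions to the least-loaded member."""
    members = sorted(members)
    owner = {p: m for p, m in previous.items() if m in members}
    load = {m: 0 for m in members}
    for m in owner.values():
        load[m] += 1
    for p in sorted(previous):
        if p not in owner:
            target = min(members, key=lambda m: (load[m], m))
            owner[p] = target
            load[target] += 1
    return owner


def count_moved(previous, current):
    """Number of partitions whose owner changed."""
    return sum(1 for p in previous if previous[p] != current[p])


before = reassign_from_scratch(range(6), ["A", "B", "C"])
survivors = ["A", "C"]  # consumer B has left the group

scratch_moves = count_moved(before, reassign_from_scratch(range(6), survivors))
sticky_moves = count_moved(before, reassign_sticky(before, survivors))

print(scratch_moves, sticky_moves)
```

In this example only the two partitions that B owned change hands under the sticky approach, while the from-scratch approach also shuffles partitions between the surviving consumers.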
Red Flags to Avoid
✕ Confusing consumer groups with partition replicas
✕ Claiming you can assign same partition to multiple consumers in one group
✕ Not knowing about rebalancing and its impact
✕ Forgetting that consumers beyond the partition count sit idle
Quick Self-Check
Before moving on, can you:
Explain consumer groups in 60 seconds?
Draw a diagram showing partition-to-consumer assignment?
Explain what triggers a rebalance?
Calculate optimal consumer count given partition count?