How multiple consumers coordinate to process partitions in parallel with fault tolerance and automatic rebalancing
- Appears in ~90% of messaging interviews
- Powers systems at LinkedIn, Uber, Netflix
- Billions of messages processed daily
- Hundreds of parallel workers
TL;DR
Consumer groups enable multiple consumer instances to work together to process partitions from a topic in parallel. Each partition is assigned to exactly one consumer within a group, providing parallel processing while maintaining ordering guarantees. Automatic rebalancing handles failures and scaling.
Visual Overview
CONSUMER GROUP ARCHITECTURE:
Topic: user-events (4 partitions)
┌─────────┬─────────┬─────────┬─────────┐
│ Part 0  │ Part 1  │ Part 2  │ Part 3  │
└─────────┴─────────┴─────────┴─────────┘
     │         │         │         │
     ▼         ▼         ▼         ▼
┌─────────┬─────────┬─────────┬─────────┐
│Consumer │Consumer │Consumer │Consumer │
│    A    │    B    │    C    │    D    │
└─────────┴─────────┴─────────┴─────────┘
Group: "analytics-processors"
KEY GUARANTEES:
├── Each partition assigned to exactly ONE consumer in the group
├── Each consumer can handle multiple partitions
├── Automatic rebalancing on membership changes
└── Fault tolerance through coordinator failover
REBALANCING SCENARIOS:
1. Consumer joins: 4 partitions, 5 consumers (rebalance)
2. Consumer crashes: 4 partitions, 3 consumers (rebalance)
3. Partition added: new partition needs assignment (rebalance)
Core Explanation
What is a Consumer Group?
A consumer group is a logical collection of consumer instances that work together to consume messages from a topic. The group provides:
- Load distribution: Partitions spread across consumers
- Fault tolerance: Failed consumers automatically replaced
- Scaling: Add/remove consumers dynamically
- Coordination: Group coordinator manages partition assignments
Partition Assignment Guarantee
The Golden Rule:
Each partition is assigned to exactly one consumer within a consumer group at any given time.
VALID ASSIGNMENT (4 partitions, 3 consumers):
Consumer A: [Partition 0, Partition 1]
Consumer B: [Partition 2]
Consumer C: [Partition 3]
✅ Each partition assigned exactly once
INVALID ASSIGNMENT:
Consumer A: [Partition 0]
Consumer B: [Partition 0] ❌ Partition 0 assigned twice!
This guarantee ensures:
- No duplicate processing within a group
- Ordering maintained per partition
- Clear ownership of each partition
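The golden rule above can be checked mechanically. Here is a minimal sketch in plain Java (the class and method names are illustrative, not part of any Kafka API): an assignment is valid only if every partition appears in exactly one consumer's list.

```java
import java.util.*;

public class AssignmentCheck {
    // Returns true iff every partition appears in exactly one consumer's list.
    static boolean isValid(Map<String, List<Integer>> assignment, int numPartitions) {
        Set<Integer> seen = new HashSet<>();
        for (List<Integer> partitions : assignment.values()) {
            for (int p : partitions) {
                if (!seen.add(p)) return false; // same partition owned twice
            }
        }
        return seen.size() == numPartitions;   // every partition has an owner
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> valid = Map.of(
                "A", List.of(0, 1), "B", List.of(2), "C", List.of(3));
        Map<String, List<Integer>> invalid = Map.of(
                "A", List.of(0), "B", List.of(0)); // partition 0 owned twice

        System.out.println(isValid(valid, 4));   // true
        System.out.println(isValid(invalid, 2)); // false
    }
}
```

The same check is effectively what the group coordinator enforces when it hands out assignments.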
How Partition Assignment Works
Assignment Strategies:
// 1. RANGE STRATEGY (default)
// Assigns consecutive partitions to consumers
Topic: user-events (6 partitions)
Consumer A: [0, 1]
Consumer B: [2, 3]
Consumer C: [4, 5]
// Pro: Simple, predictable
// Con: Uneven if partition count doesn't divide evenly
// 2. ROUND-ROBIN STRATEGY
// Distributes partitions one-by-one in round-robin
Topic: user-events (6 partitions)
Consumer A: [0, 3]
Consumer B: [1, 4]
Consumer C: [2, 5]
// Pro: Even distribution
// Con: Less predictable, more partition movement on rebalance
// 3. STICKY STRATEGY
// Minimizes partition movement during rebalance
// Keeps existing assignments when possible
// Pro: Reduces rebalancing overhead
// Con: Slightly more complex
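The range and round-robin distributions above can be reproduced in a few lines of plain Java. This is a toy sketch of the assignment math for a single topic, not Kafka's actual RangeAssignor/RoundRobinAssignor classes:

```java
import java.util.*;

public class Assignors {
    // Range: contiguous chunks; the first (partitions % consumers) members
    // get one extra partition when the count doesn't divide evenly.
    static Map<String, List<Integer>> range(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        int base = numPartitions / consumers.size();
        int extra = numPartitions % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            List<Integer> owned = new ArrayList<>();
            int count = base + (i < extra ? 1 : 0);
            for (int j = 0; j < count; j++) owned.add(next++);
            out.put(consumers.get(i), owned);
        }
        return out;
    }

    // Round-robin: deal partitions out one at a time.
    static Map<String, List<Integer>> roundRobin(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        consumers.forEach(c -> out.put(c, new ArrayList<>()));
        for (int p = 0; p < numPartitions; p++) {
            out.get(consumers.get(p % consumers.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> members = List.of("A", "B", "C");
        System.out.println(range(members, 6));      // {A=[0, 1], B=[2, 3], C=[4, 5]}
        System.out.println(roundRobin(members, 6)); // {A=[0, 3], B=[1, 4], C=[2, 5]}
    }
}
```

The outputs match the 6-partition examples above; the sticky strategy adds statefulness (remembering the previous assignment) on top of this kind of math.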
Configuration:
Properties props = new Properties();
props.put(ConsumerConfig.GROUP_ID_CONFIG, "analytics-processors");
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
"org.apache.kafka.clients.consumer.RangeAssignor");
Group Coordinator and Rebalancing
Coordinator Selection:
Group ID: "analytics-processors"
        ↓
hash(groupId) % num_partitions(__consumer_offsets)
        ↓
Partition 23 in __consumer_offsets
        ↓
Broker 2 (leader of partition 23)
        ↓
Broker 2 becomes Group Coordinator
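The lookup above is just a stable hash. A sketch in plain Java, assuming the default of 50 partitions for __consumer_offsets (the bit-mask is my way of keeping the hash non-negative; it mirrors, but is not, Kafka's internal implementation):

```java
public class CoordinatorLookup {
    // Map a group id to a partition of __consumer_offsets. The broker that
    // leads this partition acts as the group's coordinator.
    static int offsetsPartitionFor(String groupId, int numOffsetsPartitions) {
        return (groupId.hashCode() & 0x7fffffff) % numOffsetsPartitions;
    }

    public static void main(String[] args) {
        int partition = offsetsPartitionFor("analytics-processors", 50);
        System.out.println("Group metadata lives in __consumer_offsets partition " + partition);
    }
}
```

Because the mapping is deterministic, every broker and every consumer can independently compute which broker to ask for coordination.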
Rebalancing Protocol (Simplified):
REBALANCING FLOW:
1. TRIGGER EVENT
   ├── Consumer joins group
   ├── Consumer leaves/crashes
   ├── Consumer heartbeat timeout
   └── Partition count changes
2. COORDINATOR INITIATES REBALANCE
   ├── Sends REBALANCE_IN_PROGRESS to all consumers
   └── Consumers stop processing, commit offsets
3. JOIN GROUP PHASE
   ├── All consumers re-join the group
   ├── Send their supported partition assignment strategies
   └── Coordinator collects member info
4. ASSIGNMENT PHASE
   ├── Coordinator runs the assignment strategy
   ├── Calculates new partition assignments
   └── Sends assignments to consumers
5. RESUME PROCESSING
   └── Consumers start consuming from their new assignments
TOTAL REBALANCE TIME: ~500ms to several seconds
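To see why the sticky strategy reduces this overhead, here is a toy reassignment in plain Java. It is an illustration of the idea, not Kafka's StickyAssignor: surviving consumers keep what they already own, and only orphaned partitions move.

```java
import java.util.*;

public class StickyRebalance {
    // Keep each surviving consumer's partitions, then hand any orphaned
    // partitions (owner left, or newly created) to the least-loaded member.
    static Map<String, List<Integer>> rebalance(Map<String, List<Integer>> previous,
                                                List<String> liveMembers,
                                                int numPartitions) {
        Map<String, List<Integer>> next = new LinkedHashMap<>();
        Set<Integer> owned = new HashSet<>();
        for (String m : liveMembers) {
            List<Integer> keep = new ArrayList<>(previous.getOrDefault(m, List.of()));
            owned.addAll(keep);
            next.put(m, keep);
        }
        for (int p = 0; p < numPartitions; p++) {
            if (!owned.contains(p)) {
                String least = liveMembers.get(0);
                for (String m : liveMembers)
                    if (next.get(m).size() < next.get(least).size()) least = m;
                next.get(least).add(p);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> before = Map.of(
                "A", List.of(0, 1), "B", List.of(2), "C", List.of(3));
        // Consumer C crashes: only partition 3 moves; A and B keep their work.
        System.out.println(rebalance(before, List.of("A", "B"), 4)); // {A=[0, 1], B=[2, 3]}
    }
}
```

With range or round-robin, the same crash could reshuffle every partition; here only partition 3 changes hands, so only one consumer has to rebuild state.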
Scaling Patterns
Under-Subscribed (Fewer Consumers than Partitions):
4 Partitions, 2 Consumers:
Consumer A: [P0, P1]
Consumer B: [P2, P3]
Throughput: 2x (2 parallel consumers)
Utilization: 100% (all consumers busy)
Fully-Subscribed (Equal Consumers and Partitions):
4 Partitions, 4 Consumers:
Consumer A: [P0]
Consumer B: [P1]
Consumer C: [P2]
Consumer D: [P3]
Throughput: 4x (4 parallel consumers)
Utilization: 100% (optimal)
Over-Subscribed (More Consumers than Partitions):
4 Partitions, 6 Consumers:
Consumer A: [P0]
Consumer B: [P1]
Consumer C: [P2]
Consumer D: [P3]
Consumer E: []  ⚠️ IDLE
Consumer F: []  ⚠️ IDLE
Throughput: 4x (limited by partitions)
Utilization: 67% (2 consumers wasted)
❌ Cannot scale beyond partition count!
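The scaling arithmetic in the three scenarios above reduces to a min() and a ratio; a small sketch:

```java
public class ScalingMath {
    // The number of active consumers can never exceed the partition count.
    static int activeConsumers(int partitions, int consumers) {
        return Math.min(partitions, consumers);
    }

    // Fraction of deployed consumers doing useful work.
    static double utilization(int partitions, int consumers) {
        return (double) activeConsumers(partitions, consumers) / consumers;
    }

    public static void main(String[] args) {
        System.out.printf("4 partitions, 2 consumers: utilization %.0f%%%n",
                utilization(4, 2) * 100);  // 100%
        System.out.printf("4 partitions, 6 consumers: utilization %.0f%%%n",
                utilization(4, 6) * 100);  // 67%
    }
}
```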
Multiple Consumer Groups
Independent Processing:
Topic: user-events (4 partitions)
    │
    ├──▶ Group: "analytics" (processes all events)
    │      Consumer A: [P0, P1]
    │      Consumer B: [P2, P3]
    │
    └──▶ Group: "fraud-detection" (also processes all events)
           Consumer X: [P0, P1, P2, P3]
Each group independently consumes ALL messages.
Groups do NOT affect each other.
Use Case - Multiple Processing Pipelines:
Topic: "user-actions"
Group 1: "real-time-analytics"
β Processes events for live dashboards
Group 2: "ml-feature-pipeline"
β Extracts features for ML models
Group 3: "audit-logger"
β Archives events for compliance
All three groups consume the SAME messages independently.
Tradeoffs
Advantages:
- ✅ Horizontal scalability (add more consumers)
- ✅ Automatic fault tolerance (consumer failures handled)
- ✅ Load balancing across consumers
- ✅ Multiple independent processing pipelines (multiple groups)
Disadvantages:
- ❌ Rebalancing causes a processing pause (stop-the-world)
- ❌ Cannot scale beyond partition count
- ❌ Partition assignment may be uneven
- ❌ Rebalancing overhead on frequent consumer changes
Real Systems Using This
Kafka (Apache)
- Implementation: Group coordinator determined by leadership of the group's __consumer_offsets partition
- Scale: Thousands of consumer groups processing trillions of messages
- Typical Setup: 10-50 consumers per group for high-throughput topics
Amazon Kinesis
- Implementation: Kinesis Client Library (KCL) provides similar consumer group semantics
- Scale: Auto-scaling consumer groups based on shard count
- Typical Setup: 1 worker per shard, auto-scaling with shard splits/merges
Apache Pulsar
- Implementation: Shared subscription model (similar to consumer groups)
- Scale: Automatic load rebalancing without stop-the-world pauses
- Typical Setup: Dynamic consumer scaling with minimal disruption
When to Use Consumer Groups
✅ Perfect Use Cases
High-Throughput Event Processing
Scenario: Processing 1M events/sec from user activity stream
Solution: Consumer group with 100 consumers (10K events/sec each)
Result: Linear scaling, automatic fault tolerance
Parallel Data Pipeline
Scenario: Real-time ETL from Kafka to data warehouse
Solution: Consumer group with partitions = number of available cores
Result: Maximize parallelism while maintaining ordering per partition
Multiple Processing Pipelines
Scenario: Same events need processing by analytics, ML, and audit systems
Solution: Three separate consumer groups on same topic
Result: Independent processing without interfering with each other
❌ When NOT to Use
Need Broadcast to All Consumers
Problem: Every consumer must receive ALL messages
Issue: Consumer groups distribute messages (each gets subset)
Alternative: Use separate consumer groups or pub-sub pattern
Very Low Latency Requirements
Problem: Sub-millisecond latency critical
Issue: Rebalancing causes temporary processing pause
Alternative: Single consumer or fixed partition assignment
More Consumers than Partitions Long-Term
Problem: Want to run 100 consumers with only 10 partitions
Issue: 90 consumers will be idle, wasting resources
Alternative: Increase partition count or reduce consumers
Interview Application
Common Interview Question 1
Q: "You have a topic with 10 partitions. If you deploy 15 consumers in the same consumer group, what happens?"
Strong Answer:
"Only 10 consumers will be active, one per partition. The remaining 5 consumers will sit idle, since each partition can only be assigned to one consumer in a group. This is inefficient. To utilize all 15 consumers, I'd either increase the partition count to 15+ or split the workload across multiple topics. If further scaling is anticipated, I'd over-provision partitions upfront: partitions can be added to a topic later but never removed, and adding them changes the key-to-partition mapping for keyed messages."
Why this is good:
- Shows understanding of partition assignment constraint
- Identifies the inefficiency
- Provides multiple solutions
- Considers future scaling
Common Interview Question 2
Q: "What happens during a consumer group rebalance? How does it affect processing?"
Strong Answer:
"Rebalancing occurs when consumers join, leave, or crash. The process:
- Coordinator detects the change (heartbeat timeout or explicit notification)
- Sends REBALANCE_IN_PROGRESS to all group members
- Consumers stop processing and commit their offsets
- All consumers re-join the group
- Coordinator calculates new partition assignments using the configured strategy
- Consumers receive new assignments and resume processing
Impact: Processing pauses for ~500ms to several seconds. In production, we minimize rebalances by:
- Using static membership (Kafka 2.3+) to avoid rebalances on restarts
- Tuning session.timeout.ms and heartbeat.interval.ms
- Using sticky assignor to minimize partition movement
- Graceful shutdowns with proper leave-group notifications"
Why this is good:
- Detailed step-by-step understanding
- Quantifies the impact
- Shows production awareness
- Provides optimization strategies
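The tuning knobs named in that answer can be collected into a config fragment. The property keys are real Kafka consumer configs; the values and the instance id below are illustrative, not recommendations:

```java
import java.util.Properties;

public class RebalanceTuning {
    static Properties tunedConsumerProps() {
        Properties props = new Properties();
        // Static membership (Kafka 2.3+): a stable instance id lets a
        // restarted consumer rejoin without triggering a full rebalance.
        props.put("group.instance.id", "analytics-worker-1");
        // Give members more slack before the coordinator declares them dead.
        props.put("session.timeout.ms", "45000");
        props.put("heartbeat.interval.ms", "15000");
        // Cooperative sticky assignor (Kafka 2.4+) moves only the partitions
        // that must move, avoiding a stop-the-world revocation.
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(tunedConsumerProps());
    }
}
```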
Red Flags to Avoid
- ❌ Confusing consumer groups with partition replicas
- ❌ Claiming the same partition can be assigned to multiple consumers in one group
- ❌ Not knowing about rebalancing and its impact
- ❌ Forgetting that consumers beyond the partition count sit idle
Quick Self-Check
Before moving on, can you:
- Explain consumer groups in 60 seconds?
- Draw a diagram showing partition-to-consumer assignment?
- Explain what triggers a rebalance?
- Calculate optimal consumer count given partition count?
- Identify when to use multiple consumer groups?
- Explain the partition assignment guarantee?
Related Content
Prerequisites
- Topic Partitioning - Understanding partitions is essential for consumer groups
Related Concepts
- Offset Management - How consumers track their position
- Load Balancing - Distribution strategies
- Sharding - Similar concept for databases
Used In Systems
- Real-Time Analytics Pipeline - Consumer groups for parallel processing
- Event-Driven Microservices - Multiple consumer groups per service
Explained In Detail
- Kafka Architecture - Consumer Groups & Rebalancing section (30 minutes)
- Deep dive into rebalancing protocols, partition assignment strategies, and coordinator mechanics
Next Recommended: Offset Management - Learn how consumers track their position in partitions