Distributing incoming requests across multiple servers to optimize resource utilization, minimize latency, and prevent any single server from becoming a bottleneck
Load balancing distributes network traffic or computational workload across multiple servers using algorithms like round-robin, least-connections, or consistent hashing to prevent any single server from being overwhelmed. Essential for scalability, high availability, and optimized resource utilization in systems like AWS ELB, nginx, and HAProxy.
Visual Overview
Load Balancing Overview
WITHOUT LOAD BALANCING (Single Server)
┌────────────────────────────────────────────────┐
│ All traffic → Single Server                    │
│                                                │
│ 100 req/s → ┌──────────┐                       │
│             │  Server  │                       │
│             │ Overload!│                       │
│             └──────────┘                       │
│                                                │
│ Problems:                                      │
│ - Single point of failure ✕                    │
│ - Limited capacity ✕                           │
│ - High latency under load ✕                    │
│ - No redundancy ✕                              │
└────────────────────────────────────────────────┘

WITH LOAD BALANCING (Distributed)
┌────────────────────────────────────────────────┐
│              Load Balancer                     │
│             ┌─────────────┐                    │
│ 100 req/s → │   Nginx/    │                    │
│             │    ELB/     │                    │
│             │   HAProxy   │                    │
│             └─────────────┘                    │
│                    ↓                           │
│           ┌────────┼────────┐                  │
│           ↓        ↓        ↓                  │
│      ┌───────┐ ┌───────┐ ┌───────┐             │
│      │Server1│ │Server2│ │Server3│             │
│      │33 req/│ │33 req/│ │33 req/│             │
│      │   s   │ │   s   │ │   s   │             │
│      └───────┘ └───────┘ └───────┘             │
│                                                │
│ Benefits:                                      │
│ ✓ High availability (failover)                 │
│ ✓ Horizontal scalability (add servers)         │
│ ✓ Better resource utilization                  │
│ ✓ Health checks & auto-routing                 │
└────────────────────────────────────────────────┘

LOAD BALANCING ALGORITHMS COMPARISON
┌────────────────────────────────────────────────┐
│ Round Robin (sequential distribution):         │
│   Request 1 → Server 1                         │
│   Request 2 → Server 2                         │
│   Request 3 → Server 3                         │
│   Request 4 → Server 1 (cycle repeats)         │
│                                                │
│ Least Connections (dynamic balancing):         │
│   Server 1: 5 active connections               │
│   Server 2: 3 active connections ✓ (chosen)    │
│   Server 3: 8 active connections               │
│   → Route to server with fewest connections    │
│                                                │
│ Consistent Hashing (sticky routing):           │
│   hash(user_id) % num_servers                  │
│   User 123 → Server 2 (always same server)     │
│   User 456 → Server 1 (always same server)     │
│   → Same client always routes to same server   │
└────────────────────────────────────────────────┘

LAYER 4 VS LAYER 7 LOAD BALANCING
┌────────────────────────────────────────────────┐
│ Layer 4 (Transport Layer - TCP/UDP):           │
│   ┌──────────────┐                             │
│   │ Client       │                             │
│   │ 1.2.3.4:5678 │                             │
│   └──────────────┘                             │
│          ↓                                     │
│   Load balancer sees: IP + Port                │
│   Routes based on: TCP connection              │
│   Cannot see: HTTP headers, URLs, cookies      │
│          ↓                                     │
│   Backend server receives original connection  │
│                                                │
│   + Faster (no HTTP parsing)                   │
│   + Lower latency (~1-2ms)                     │
│   - Limited routing logic                      │
│                                                │
│ Layer 7 (Application Layer - HTTP):            │
│   ┌────────────────┐                           │
│   │ Client         │                           │
│   │ GET /api/users │                           │
│   │ Cookie: xyz    │                           │
│   └────────────────┘                           │
│          ↓                                     │
│   Load balancer sees: Full HTTP request        │
│   Routes based on: URL, headers, cookies       │
│          ↓                                     │
│   /api/users → Backend Pool A                  │
│   /static/*  → Backend Pool B (CDN)            │
│                                                │
│   + Advanced routing (path, host, cookie)      │
│   + SSL termination                            │
│   - Slower (HTTP parsing, ~5-10ms)             │
└────────────────────────────────────────────────┘
Core Explanation
What is Load Balancing?
Load balancing is the process of distributing incoming requests across multiple backend servers to:
Optimize resource utilization: No server is overloaded while others are idle
Maximize throughput: Handle more requests by adding servers
Minimize latency: Route to least-loaded or nearest server
Ensure high availability: Route around failed servers
Load Balancing Algorithms
1. Round Robin
Round Robin Algorithm
Simple sequential distribution:
Incoming requests: Backend servers:
Request 1 ──────────→ Server 1
Request 2 ──────────→ Server 2
Request 3 ──────────→ Server 3
Request 4 ──────────→ Server 1 (cycle repeats)
Pros:
✓ Simple implementation
✓ Even distribution (if all requests equal)
✓ Stateless (no tracking needed)
Cons:
✗ Doesn't account for server capacity
✗ Doesn't account for request complexity
✗ Long-running requests can overload one server
Use case: Stateless microservices with uniform requests
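A minimal round-robin selector takes only a few lines. The sketch below (server addresses are placeholders) uses Python's `itertools.cycle` to rotate through the pool:

```python
from itertools import cycle

# Hypothetical backend pool; addresses are placeholders
servers = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
pool = cycle(servers)  # endlessly repeats the sequence

def next_server() -> str:
    """Return the next backend in strict rotation."""
    return next(pool)

for i in range(4):
    print(f"Request {i + 1} -> {next_server()}")
# Request 4 lands on 10.0.0.1:8080 again: the cycle repeats
```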
2. Weighted Round Robin
Weighted Round Robin Algorithm
Distribute based on server capacity:
Backend servers with weights:
Server 1 (weight=5): More powerful
Server 2 (weight=3): Medium capacity
Server 3 (weight=2): Less powerful
Distribution pattern:
5 requests → Server 1
3 requests → Server 2
2 requests → Server 3
(repeat)
Pros:
✓ Accounts for heterogeneous server capacity
✓ Efficient resource utilization
Cons:
✗ Still doesn't account for dynamic load
✗ Requires manual weight configuration
Use case: Mixed hardware (different CPU/RAM capacities)
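The naive implementation repeats each server `weight` times in a list, which sends bursts of consecutive requests to the heaviest server. nginx instead uses a "smooth" weighted round robin that interleaves picks. A minimal sketch of that idea (an illustration, not nginx's actual source):

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    weight: int
    current: int = 0  # running score adjusted on every pick

def smooth_weighted_rr(servers: list) -> Server:
    """One selection step of nginx-style smooth weighted round robin."""
    total = sum(s.weight for s in servers)
    for s in servers:
        s.current += s.weight          # every server gains its weight
    best = max(servers, key=lambda s: s.current)
    best.current -= total              # the winner pays the total back
    return best

pool = [Server("s1", 5), Server("s2", 3), Server("s3", 2)]
picks = [smooth_weighted_rr(pool).name for _ in range(10)]
print(picks)  # s1 appears 5x, s2 3x, s3 2x, interleaved rather than bursty
```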
3. Least Connections
Least Connections Algorithm
Route to server with fewest active connections:
Real-time server state:
Server 1: 25 active connections
Server 2: 15 active connections ✓ (chosen)
Server 3: 30 active connections
New request → Server 2 (fewest connections)
Pros:
✓ Dynamic load balancing
✓ Accounts for long-running connections
✓ Better for variable request durations
Cons:
✗ Requires tracking connection state
✗ More complex implementation
Use case: HTTP/1.1 keep-alive, websockets, long-polling
4. Weighted Least Connections
Weighted Least Connections Algorithm
Combines least connections with server weights:
Formula: connections / weight
Server 1: 20 connections, weight=5 → score = 4.0 ✓ (tied lowest)
Server 2: 12 connections, weight=3 → score = 4.0 ✓ (tied lowest)
Server 3: 10 connections, weight=2 → score = 5.0
Route to Server 1 or Server 2 (lowest score wins)
Pros:
✓ Best of both worlds (capacity + dynamic load)
Use case: Production systems with mixed hardware
5. Least Response Time
Least Response Time Algorithm
Route to server with fastest response time:
Recent response times (moving average):
Server 1: 50ms average
Server 2: 30ms average ✓ (chosen)
Server 3: 100ms average
Pros:
✓ Optimizes user experience
✓ Automatically adapts to server performance
✓ Accounts for network latency
Cons:
✗ Requires active health checks
✗ Can amplify cascading failures (traffic herds onto whichever server currently looks fastest)
Use case: Geo-distributed deployments
6. IP Hash (Consistent Hashing)
IP Hash (Consistent Hashing)
Hash client IP to deterministically select server:
hash(client_ip) % num_servers
Client 1.2.3.4 → hash % 3 = 1 → Server 1 (always)
Client 5.6.7.8 → hash % 3 = 2 → Server 2 (always)
Client 9.10.11.12 → hash % 3 = 0 → Server 3 (always)
Pros:
✓ Session persistence (same client → same server)
✓ Useful for caching (server caches client data)
✓ No shared session storage needed
Cons:
✗ Uneven distribution if client IPs clustered
✗ Server addition/removal disrupts assignments: simple modulo hashing remaps most clients when the server count changes; true consistent hashing (a hash ring) keeps most assignments stable (see the sketch below)
Use case: Stateful applications with server-side sessions
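Because `hash % num_servers` remaps most clients whenever a server joins or leaves, production systems use a hash ring instead: servers are hashed onto a circle (with virtual nodes for balance), and each key routes to the first server clockwise from it. A minimal illustrative ring; the virtual-node count and the choice of MD5 are arbitrary here:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, servers: list, vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, server) pairs
        for server in servers:
            self.add(server)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server: str) -> None:
        # Place vnodes copies of the server around the ring
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def get(self, key: str) -> str:
        # First virtual node clockwise from the key's hash (wrap around at the end)
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["server1", "server2", "server3"])
print(ring.get("1.2.3.4"))  # the same client IP always maps to the same server
ring.add("server4")         # only ~1/4 of keys move, vs. most keys with % N
```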
7. Least Bandwidth
Least Bandwidth Algorithm
Route to server currently serving least bandwidth:
Server 1: 500 Mbps
Server 2: 300 Mbps ✓ (chosen)
Server 3: 700 Mbps
Use case: Video streaming, large file downloads
Layer 4 vs Layer 7 Load Balancing
Layer 4 (Transport Layer)
Layer 4 Load Balancing
OSI Layer: Transport (TCP/UDP)
┌────────────────────────────────────────┐
│ What it sees:                          │
│ - Source IP + Port                     │
│ - Destination IP + Port                │
│ - TCP/UDP protocol                     │
│                                        │
│ What it CAN'T see:                     │
│ - HTTP headers                         │
│ - URLs, query parameters               │
│ - Cookies                              │
│ - Request body                         │
│                                        │
│ Routing decisions based on:            │
│ - IP address                           │
│ - Port number                          │
│ - Protocol (TCP vs UDP)                │
└────────────────────────────────────────┘

Example: AWS Network Load Balancer (NLB)
Pros:
✓ Very fast (< 1ms latency)
✓ High throughput (millions of requests/sec)
✓ Low CPU usage
✓ Supports any TCP/UDP protocol
✓ Preserves client IP (pass-through)
Cons:
✗ No content-based routing
✗ No SSL termination
✗ Limited health checks
Use case: TCP-based services, ultra-low latency requirements
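To make the pass-through nature of Layer 4 concrete, here is a toy TCP proxy in Python: it picks a backend per connection and shovels bytes in both directions without ever parsing HTTP. Addresses and the listen port are placeholders, and real proxies handle shutdown and errors far more carefully:

```python
import socket
import threading
from itertools import cycle

BACKENDS = cycle([("10.0.0.1", 8080), ("10.0.0.2", 8080)])  # hypothetical pool

def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until either side closes."""
    try:
        while (data := src.recv(4096)):
            dst.sendall(data)
    except OSError:
        pass
    finally:
        src.close()
        dst.close()

def handle(client: socket.socket) -> None:
    backend = socket.create_connection(next(BACKENDS))
    # The proxy never inspects the payload: it only sees IPs, ports, and bytes
    threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
    threading.Thread(target=pipe, args=(backend, client), daemon=True).start()

listener = socket.socket()
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 9000))
listener.listen()
while True:
    conn, _ = listener.accept()
    handle(conn)
```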
Layer 7 (Application Layer)
Layer 7 Load Balancing
OSI Layer: Application (HTTP/HTTPS)
┌────────────────────────────────────────┐
│ What it sees:                          │
│ - Full HTTP request                    │
│ - Headers (User-Agent, Host, etc.)     │
│ - URL path and query parameters        │
│ - Cookies                              │
│ - Request body                         │
│                                        │
│ Routing decisions based on:            │
│ - URL path: /api/* → API servers       │
│ - Host header: api.example.com         │
│ - Cookie: user_id=123                  │
│ - HTTP method: POST vs GET             │
│ - Custom headers                       │
└────────────────────────────────────────┘

Example: AWS Application Load Balancer (ALB), nginx
Pros:
✓ Content-based routing (path, host, headers)
✓ SSL/TLS termination (decrypt at LB)
✓ Advanced health checks (HTTP status codes)
✓ Request/response manipulation
✓ Web Application Firewall (WAF) integration
Cons:
✗ Slower (5-10ms latency due to HTTP parsing)
✗ Higher CPU usage
✗ More complex configuration
Use case: HTTP microservices, API gateways, web applications
Health Checks & Failover
Health Checks & Failover
Health Check Mechanisms:
1. Active Health Checks:
┌────────────────────────────────────────┐
│ Load Balancer → Backend Server         │
│ GET /health every 10 seconds           │
│        ↓                               │
│ Server responds: 200 OK ✓              │
│        or                              │
│ Server timeout/error → Mark unhealthy  │
└────────────────────────────────────────┘

Configuration:
- Interval: 10s (how often to check)
- Timeout: 5s (max wait for response)
- Unhealthy threshold: 3 (failures before marking down)
- Healthy threshold: 2 (successes before marking up)
2. Passive Health Checks:
┌────────────────────────────────────────┐
│ Monitor real traffic:                  │
│ Server returns 5xx errors → Unhealthy  │
│ Server timeout            → Unhealthy  │
│ Server 2xx responses      → Healthy    │
└────────────────────────────────────────┘

Failover Flow:
┌────────────────────────────────────────┐
│ 1. Server 2 fails health check         │
│ 2. Load balancer marks Server 2 DOWN   │
│ 3. New requests → Server 1 & 3 only    │
│ 4. Server 2 recovers                   │
│ 5. Passes health checks (2x)           │
│ 6. Load balancer marks Server 2 UP     │
│ 7. Resume sending traffic to Server 2  │
└────────────────────────────────────────┘
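The thresholds above translate into a small state machine per backend: consecutive failures must reach the unhealthy threshold before traffic stops, and consecutive successes must reach the healthy threshold before it resumes. A sketch of that bookkeeping (class and parameter names are illustrative):

```python
class HealthTracker:
    """Threshold-based health state for one backend server."""

    def __init__(self, unhealthy_threshold: int = 3, healthy_threshold: int = 2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.failures = 0    # consecutive failed checks
        self.successes = 0   # consecutive passed checks
        self.healthy = True

    def record(self, check_passed: bool) -> None:
        if check_passed:
            self.failures = 0
            self.successes += 1
            if not self.healthy and self.successes >= self.healthy_threshold:
                self.healthy = True   # back in rotation
        else:
            self.successes = 0
            self.failures += 1
            if self.healthy and self.failures >= self.unhealthy_threshold:
                self.healthy = False  # stop routing traffic here

t = HealthTracker()
for result in [False, False, False, True, True]:
    t.record(result)
    print(t.healthy)  # True, True, False (3rd failure), False, True (2nd success)
```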
Session Persistence (Sticky Sessions)
Problem: User session stored on specific server
Without sticky sessions:
┌──────────────────────────────────────────┐
│ Request 1: Login    → Server 1 (session) │
│ Request 2: Get data → Server 2 ✗         │
│ (Server 2 doesn't have session)          │
└──────────────────────────────────────────┘

Solution 1: Cookie-based sticky sessions:
┌──────────────────────────────────────────┐
│ Request 1: Login → Server 1              │
│ Response: Set-Cookie: server=1           │
│ Request 2: Cookie: server=1 → Server 1   │
│ (LB reads cookie, routes to Server 1)    │
└──────────────────────────────────────────┘

Solution 2: IP hash sticky sessions:
┌──────────────────────────────────────────┐
│ hash(client_ip) always → same server     │
│ Client 1.2.3.4 → Server 1 (always)       │
└──────────────────────────────────────────┘

Solution 3: Session replication (better):
┌──────────────────────────────────────────┐
│ Store sessions in Redis/Memcached        │
│ Any server can access session            │
│ No sticky sessions needed ✓              │
└──────────────────────────────────────────┘
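Solution 3 is usually the cleanest: with sessions in a shared store, the load balancer is free to use any algorithm. A sketch using redis-py (assumes a reachable Redis instance; the key prefix and TTL are arbitrary choices):

```python
import json
from typing import Optional

import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def save_session(session_id: str, data: dict, ttl_seconds: int = 3600) -> None:
    """Any backend server can write the session to the shared store."""
    r.setex(f"session:{session_id}", ttl_seconds, json.dumps(data))

def load_session(session_id: str) -> Optional[dict]:
    """Any other server can read it, so no sticky routing is needed."""
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None

save_session("abc123", {"user_id": 42, "cart": ["sku-1"]})
print(load_session("abc123"))  # works no matter which server handles the request
```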
Real Systems Using Load Balancing
| System | Type | Algorithms | Key Features | Use Case |
|---|---|---|---|---|
| AWS ELB (ALB) | Layer 7 | Round robin, least outstanding requests | Content-based routing, SSL termination | HTTP microservices |
| AWS NLB | Layer 4 | Flow hash | Ultra-low latency, static IP | TCP services, high throughput |
| nginx | Layer 7 | Round robin, least_conn, ip_hash | Open source, highly configurable | Web servers, API gateway |
| HAProxy | Layer 4/7 | Weighted RR, least_conn, consistent hash | High performance, advanced ACLs | Enterprise load balancing |
| Envoy | Layer 7 | Weighted RR, least_request, ring_hash | Service mesh, observability | Kubernetes, microservices |
| Cloudflare | Layer 7 | Geo-routing, weighted pools | DDoS protection, CDN | Global load balancing |
Case Study: AWS Application Load Balancer
AWS ALB Architecture:
┌──────────────────────────────────────────────┐
│ Internet                                     │
│    ↓                                         │
│ ALB (multi-AZ for high availability)         │
│ ├─ Availability Zone 1                       │
│ └─ Availability Zone 2                       │
│    ↓                                         │
│ Target Groups:                               │
│ ├─ API Servers (port 3000)                   │
│ │   └─ /api/* → API target group             │
│ ├─ Web Servers (port 80)                     │
│ │   └─ /* → Web target group                 │
│ └─ Admin Servers (port 8080)                 │
│     └─ /admin/* → Admin target group         │
└──────────────────────────────────────────────┘

Routing Rules:
1. Path-based: /api/* → API servers
2. Host-based: admin.example.com → Admin servers
3. Header-based: X-API-Version: v2 → V2 servers
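Rules like these can be created with the AWS SDK. A hedged sketch using boto3's `elbv2` client; the ARNs are placeholders and the elided `...` segments must be real values from your account:

```python
import boto3  # assumes AWS credentials are configured

elbv2 = boto3.client("elbv2", region_name="us-east-1")

# Path-based rule: send /api/* to the API target group
elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/my-alb/...",
    Priority=10,
    Conditions=[{"Field": "path-pattern", "Values": ["/api/*"]}],
    Actions=[{"Type": "forward",
              "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/api/..."}],
)

# Host-based rule: send admin.example.com to the admin target group
elbv2.create_rule(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/my-alb/...",
    Priority=20,
    Conditions=[{"Field": "host-header", "Values": ["admin.example.com"]}],
    Actions=[{"Type": "forward",
              "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/admin/..."}],
)
```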
Health Checks:
- Protocol: HTTP
- Path: /health
- Interval: 30s
- Timeout: 5s
- Healthy threshold: 5
- Unhealthy threshold: 2
Features:
✓ SSL/TLS termination (offload from servers)
✓ WebSocket support
✓ HTTP/2 support
✓ Integration with Auto Scaling
✓ CloudWatch metrics
Case Study: nginx Load Balancer
```nginx
# nginx.conf - Load Balancer Configuration

# Define upstream backend servers
upstream backend {
    # Load balancing algorithm
    least_conn;  # Use least connections

    # Backend servers with weights
    server backend1.example.com:8080 weight=5;
    server backend2.example.com:8080 weight=3;
    server backend3.example.com:8080 weight=2;

    # Server with max connections limit
    server backend4.example.com:8080 max_conns=100;

    # Backup server (used only if others fail)
    server backup.example.com:8080 backup;

    # Connection reuse: keep up to 32 idle connections to upstreams
    keepalive 32;
}

# API servers upstream
upstream api_servers {
    # Session persistence: hash on client IP
    ip_hash;
    server api1.example.com:3000;
    server api2.example.com:3000;
    server api3.example.com:3000;
}

server {
    listen 80;
    server_name example.com;

    # Health check endpoint
    location /health {
        access_log off;
        return 200 "healthy\n";
    }

    # Route /api/* to API servers
    location /api/ {
        proxy_pass http://api_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;

        # Retry logic
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 3;
    }

    # Route all other traffic to backend
    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # Static files (no load balancing needed)
    location /static/ {
        root /var/www;
        expires 1d;
    }
}

# SSL/TLS configuration
server {
    listen 443 ssl http2;
    server_name example.com;

    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;

    # SSL termination (decrypt here, forward HTTP to backend)
    location / {
        proxy_pass http://backend;
    }
}
```
When to Use Load Balancing
✓ Perfect Use Cases
High Traffic Web Applications
Scenario: E-commerce site with millions of users
Requirements: Handle 100,000 requests/second
Solution: Layer 7 ALB with 50 backend servers
Benefit: Horizontal scalability, failover, health checks
Microservices Architecture
Scenario: 100+ microservices communicating
Solution: Service mesh (Envoy/Linkerd) with load balancing per service
Benefit: Automatic service discovery, circuit breaking, observability
Global Applications (Geo-Load Balancing)
Scenario: Users worldwide accessing application
Solution: DNS-based load balancing (Route53, Cloudflare)
Route: US users → US region, EU users → EU region
Benefit: Low latency, disaster recovery
Database Read Replicas
Scenario: Read-heavy application with MySQL replicas
Solution: Load balancer distributing reads across 5 replicas
Algorithm: Least connections (account for query duration)
Benefit: Scale read throughput
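A common lightweight implementation is read/write splitting in the application's data layer. The sketch below routes by statement type; endpoints are placeholders, and random choice stands in for the least-connections policy named above:

```python
import random

# Hypothetical endpoints; in practice these come from config or service discovery
PRIMARY = "primary.db.example.com:3306"
REPLICAS = [
    "replica1.db.example.com:3306",
    "replica2.db.example.com:3306",
    "replica3.db.example.com:3306",
]

def route_query(sql: str) -> str:
    """Send writes to the primary; spread reads across replicas."""
    is_write = sql.lstrip().split()[0].upper() in ("INSERT", "UPDATE", "DELETE")
    return PRIMARY if is_write else random.choice(REPLICAS)

print(route_query("SELECT * FROM users"))          # one of the replicas
print(route_query("UPDATE users SET name = 'x'"))  # always the primary
```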
✕ When NOT to Use (or Use Carefully)
Single Server Deployment
Problem: Adds complexity and latency for no benefit
Alternative: Direct connection to server
Example: Development environment, small apps
Monitoring & Operations
Alerts: Health check failures, high latency, 5xx errors
Dashboard: Real-time traffic distribution per server
Scaling:
Auto Scaling Group: Add servers when CPU > 70%
Load balancer auto-registers new instances
Graceful shutdown: Drain connections before removing server
Trade-offs:
Layer 7 LB adds 5-10ms latency vs Layer 4 (~1ms)
But enables advanced routing and SSL termination
For ultra-low latency, use Layer 4 or client-side LB
Code Example
Simple Round Robin Load Balancer
```python
import requests
import time
from typing import List
from dataclasses import dataclass
import threading


@dataclass
class BackendServer:
    """Represents a backend server"""
    host: str
    port: int
    weight: int = 1
    healthy: bool = True
    active_connections: int = 0


class LoadBalancer:
    """
    Simple load balancer implementing multiple algorithms
    """

    def __init__(self, servers: List[BackendServer]):
        self.servers = servers
        self.current_index = 0  # For round robin
        self.lock = threading.Lock()

        # Start health check thread
        self.health_check_thread = threading.Thread(
            target=self._health_check_loop, daemon=True
        )
        self.health_check_thread.start()

    def round_robin(self) -> BackendServer:
        """Simple round robin algorithm"""
        with self.lock:
            # Filter healthy servers
            healthy_servers = [s for s in self.servers if s.healthy]
            if not healthy_servers:
                raise Exception("No healthy servers available")

            # Get next server in round-robin fashion
            server = healthy_servers[self.current_index % len(healthy_servers)]
            self.current_index += 1
            return server

    def weighted_round_robin(self) -> BackendServer:
        """Weighted round robin based on server capacity"""
        with self.lock:
            healthy_servers = [s for s in self.servers if s.healthy]
            if not healthy_servers:
                raise Exception("No healthy servers available")

            # Build weighted list (repeat servers based on weight)
            weighted_list = []
            for server in healthy_servers:
                weighted_list.extend([server] * server.weight)

            # Round robin through weighted list
            server = weighted_list[self.current_index % len(weighted_list)]
            self.current_index += 1
            return server

    def least_connections(self) -> BackendServer:
        """Route to server with fewest active connections"""
        with self.lock:
            healthy_servers = [s for s in self.servers if s.healthy]
            if not healthy_servers:
                raise Exception("No healthy servers available")

            # Find server with minimum connections
            return min(healthy_servers, key=lambda s: s.active_connections)

    def weighted_least_connections(self) -> BackendServer:
        """Weighted least connections (connections / weight)"""
        with self.lock:
            healthy_servers = [s for s in self.servers if s.healthy]
            if not healthy_servers:
                raise Exception("No healthy servers available")

            # Find server with minimum connections/weight ratio
            return min(healthy_servers,
                       key=lambda s: s.active_connections / s.weight)

    def ip_hash(self, client_ip: str) -> BackendServer:
        """Deterministic (modulo) hashing based on client IP"""
        with self.lock:
            healthy_servers = [s for s in self.servers if s.healthy]
            if not healthy_servers:
                raise Exception("No healthy servers available")

            # Hash client IP to select server
            hash_value = hash(client_ip)
            server_index = hash_value % len(healthy_servers)
            return healthy_servers[server_index]

    def forward_request(self, request_path: str,
                        algorithm: str = 'round_robin',
                        client_ip: str = None) -> dict:
        """
        Forward request to backend server using specified algorithm
        """
        # Select server based on algorithm
        if algorithm == 'round_robin':
            server = self.round_robin()
        elif algorithm == 'weighted_round_robin':
            server = self.weighted_round_robin()
        elif algorithm == 'least_connections':
            server = self.least_connections()
        elif algorithm == 'weighted_least_connections':
            server = self.weighted_least_connections()
        elif algorithm == 'ip_hash':
            if not client_ip:
                raise ValueError("client_ip required for ip_hash algorithm")
            server = self.ip_hash(client_ip)
        else:
            raise ValueError(f"Unknown algorithm: {algorithm}")

        print(f"Routing to {server.host}:{server.port} "
              f"(connections: {server.active_connections})")

        # Increment connection count
        with self.lock:
            server.active_connections += 1

        try:
            # Forward request to backend
            url = f"http://{server.host}:{server.port}{request_path}"
            response = requests.get(url, timeout=5)
            return {
                'status': response.status_code,
                'body': response.text,
                'server': f"{server.host}:{server.port}"
            }
        except requests.RequestException as e:
            print(f"Error forwarding to {server.host}:{server.port}: {e}")
            # Mark server as unhealthy on error
            with self.lock:
                server.healthy = False
            raise
        finally:
            # Decrement connection count
            with self.lock:
                server.active_connections -= 1

    def _health_check_loop(self):
        """Background thread to perform health checks"""
        while True:
            time.sleep(10)  # Check every 10 seconds
            for server in self.servers:
                healthy = self._check_health(server)
                with self.lock:
                    if healthy and not server.healthy:
                        print(f"✓ Server {server.host}:{server.port} is now HEALTHY")
                        server.healthy = True
                    elif not healthy and server.healthy:
                        print(f"✗ Server {server.host}:{server.port} is now UNHEALTHY")
                        server.healthy = False

    def _check_health(self, server: BackendServer) -> bool:
        """Check if server is healthy"""
        try:
            url = f"http://{server.host}:{server.port}/health"
            response = requests.get(url, timeout=5)
            return response.status_code == 200
        except requests.RequestException:
            return False

    def get_status(self) -> dict:
        """Get load balancer status"""
        with self.lock:
            return {
                'total_servers': len(self.servers),
                'healthy_servers': sum(1 for s in self.servers if s.healthy),
                'servers': [
                    {
                        'host': s.host,
                        'port': s.port,
                        'healthy': s.healthy,
                        'active_connections': s.active_connections,
                        'weight': s.weight
                    }
                    for s in self.servers
                ]
            }


# Usage Example
if __name__ == '__main__':
    # Create backend servers
    servers = [
        BackendServer('server1.example.com', 8080, weight=5),
        BackendServer('server2.example.com', 8080, weight=3),
        BackendServer('server3.example.com', 8080, weight=2),
    ]

    lb = LoadBalancer(servers)

    # Test different algorithms
    print("=== Round Robin ===")
    for i in range(5):
        try:
            result = lb.forward_request('/api/users', algorithm='round_robin')
            print(f"Request {i+1} → {result['server']}")
        except Exception as e:
            print(f"Request {i+1} failed: {e}")

    print("\n=== Least Connections ===")
    for i in range(5):
        try:
            result = lb.forward_request('/api/users', algorithm='least_connections')
            print(f"Request {i+1} → {result['server']}")
        except Exception as e:
            print(f"Request {i+1} failed: {e}")

    print("\n=== IP Hash (Sticky Sessions) ===")
    client_ips = ['1.2.3.4', '5.6.7.8', '1.2.3.4', '5.6.7.8']
    for ip in client_ips:
        try:
            result = lb.forward_request('/api/users', algorithm='ip_hash',
                                        client_ip=ip)
            print(f"Client {ip} → {result['server']}")
        except Exception as e:
            print(f"Request from {ip} failed: {e}")

    # Get status
    print("\n=== Load Balancer Status ===")
    import json
    print(json.dumps(lb.get_status(), indent=2))
```