A load balancer distributes incoming traffic across multiple backend servers. The basics are simple; the variations and trade-offs are where engineers spend their time. L4 vs L7. TCP vs HTTP. Sticky sessions vs stateless. Health checks. Connection pooling.
This post walks through the practical differences between load balancer types and what each is good for in 2026.
What a Load Balancer Does
The core job: accept incoming connections, decide which backend should serve them, and forward.
A simple architecture:
Internet → Load Balancer → [Backend 1, Backend 2, Backend 3]
The LB:
- Distributes traffic by some algorithm (round-robin, least-connections, hash-based).
- Checks backend health so it doesn’t send traffic to dead servers.
- Provides a single entry point to clients (one IP, one DNS name).
- Often handles TLS termination so backends only deal with plaintext HTTP.
Layer 4 vs Layer 7
The biggest distinction:
Layer 4 (TCP/UDP)
The LB looks at IP and port; doesn’t understand the application protocol. Forwards bytes.
- Faster. Less processing per packet.
- Works for any TCP/UDP protocol (HTTP, SSH, databases, custom protocols).
- No application-aware features — no URL routing, no header inspection.
- Examples: AWS NLB (Network Load Balancer), Azure Load Balancer, GCP TCP Load Balancer, HAProxy in tcp mode, nginx stream module.
Layer 7 (HTTP/HTTPS)
The LB parses HTTP. Sees URLs, headers, methods.
- Application-aware — can route by hostname, path, header.
- Can rewrite requests/responses — add headers, modify URLs.
- Slower per request (parsing overhead), though the difference is negligible for most apps.
- HTTP-specific features like sticky sessions via cookies, content compression, WAF integration.
- Examples: AWS ALB (Application Load Balancer), Cloudflare, nginx, HAProxy in http mode, GCP HTTPS Load Balancer.
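Host and path routing are what set L7 apart. Below is a minimal Python sketch of how an L7 LB might choose a backend pool. The `Route` type, the pool names, and the tie-breaking rule (longest path prefix wins, then an exact host beats the `*` wildcard) are all illustrative assumptions, not any specific product's behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Route:
    host: str          # exact hostname to match, or "*" for any host
    path_prefix: str   # request path must start with this prefix
    pool: str          # name of the backend pool to forward to (hypothetical)


def pick_pool(routes: List[Route], host: str, path: str, default: str) -> str:
    """Return the pool of the most specific matching route."""
    host = host.split(":", 1)[0].lower()  # drop any port from the Host header
    best, best_score = None, None
    for r in routes:
        if r.host not in ("*", host) or not path.startswith(r.path_prefix):
            continue
        # Longer prefixes are more specific; an exact host beats the wildcard.
        score = (len(r.path_prefix), r.host != "*")
        if best_score is None or score > best_score:
            best, best_score = r, score
    return best.pool if best else default
```

A plain `startswith` check means a `/api` prefix would also match `/apiary`; real configs usually match on path segments.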
For most modern web apps, L7 is the default. L4 is for non-HTTP traffic or very high-throughput specialized workloads.
When to Use Each
L7 (application load balancers)
- HTTP/HTTPS traffic.
- Multiple services on the same hostname (path routing).
- Multiple hostnames on the same LB.
- Need WAF, rate limiting, or other HTTP-aware features.
- Want to terminate TLS at the LB.
L4 (network load balancers)
- Non-HTTP protocols (SSH, MySQL, custom UDP).
- Need to preserve client IP without HTTP headers.
- Extreme throughput requirements.
- Multi-protocol traffic.
- Static IPs required (for whitelisting by clients).
You can also stack: L4 in front for raw traffic distribution, then L7 behind for HTTP routing. Many cloud architectures do this.
Load Balancing Algorithms
How the LB picks which backend gets the next request:
Round-robin
Cycle through backends in order. Simple. Good when backends are equivalent and requests have similar cost.
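In code, round-robin is just a cycle. Here is a minimal Python sketch; the backend names are placeholders, and it ignores health status and thread safety, both of which a real LB has to handle.

```python
import itertools


class RoundRobin:
    """Hand out backends in a fixed, repeating order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)
```

For example, `RoundRobin(["a", "b", "c"])` yields `a, b, c, a, b, ...` regardless of how busy each backend is.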
Least connections
Send to the backend with the fewest active connections. Better for long-lived connections of varying cost.
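A minimal Python sketch of least-connections, assuming the LB calls `acquire` when it opens a connection to a backend and `release` when that connection closes (the method names are illustrative):

```python
class LeastConnections:
    """Track active connections per backend and pick the least busy one."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def acquire(self):
        # min() returns the first backend with the lowest count, so ties go to insertion order.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1
```

Unlike round-robin, a backend stuck with a few long-lived connections naturally receives fewer new ones.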
Weighted (any)
Different weights per backend. Useful when backends have different capacity (some are bigger machines).
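One common way to apply weights is smooth weighted round-robin, the scheme nginx uses for weighted upstreams. It interleaves picks instead of sending a burst of consecutive requests to the heaviest backend. A minimal Python sketch, with hypothetical backend names and weights:

```python
from typing import Dict


class SmoothWeightedRoundRobin:
    """Pick backends in proportion to their weights, spread out evenly."""

    def __init__(self, weights: Dict[str, int]):
        self.weights = dict(weights)
        self.current = {b: 0 for b in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        # Every backend earns its weight; the richest one is chosen and pays back the total.
        for b, w in self.weights.items():
            self.current[b] += w
        best = max(self.current, key=self.current.get)
        self.current[best] -= self.total
        return best
```

With weights `{"big": 5, "small1": 1, "small2": 1}`, seven picks return `big` five times and each small backend once, with the small ones interleaved rather than bunched at the end.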
IP hash
Hash the client’s IP to a backend. Same client always hits the same backend. Useful for session persistence without cookies.
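A minimal Python sketch of IP hashing. It uses a stable hash (SHA-256) because Python's built-in `hash()` of a string is randomized per process, which would break stickiness across LB restarts. The simple modulo is an assumption for brevity: adding or removing a backend remaps most clients, which is why many LBs use consistent hashing instead.

```python
import hashlib
from typing import List


def ip_hash(client_ip: str, backends: List[str]) -> str:
    """Map a client IP to a backend deterministically."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```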
Random
Truly random selection. Sometimes used to avoid herding, where many LB instances with the same view of load all send new requests to the same "least loaded" backend at once.
Power-of-two-choices
Pick two backends at random; send to the one with fewer connections. Achieves near-optimal load distribution with less coordination than full least-connections.
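A minimal Python sketch of power-of-two-choices, assuming `active` maps each backend to its current connection count (the dict shape is an illustrative assumption):

```python
import random


def power_of_two_choices(active, rng=None):
    """Sample two distinct backends; return the one with fewer active connections."""
    rng = rng if rng is not None else random
    if len(active) == 1:
        return next(iter(active))
    a, b = rng.sample(list(active), 2)
    return a if active[a] <= active[b] else b
```

A useful property: the most loaded backend is never chosen (unless it ties), yet each decision only compares two counters instead of scanning the whole pool.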
For most workloads, round-robin or least-connections is fine. The fancy algorithms matter at higher scale.
Sticky Sessions
“Sticky sessions” mean once a client starts using a backend, subsequent requests go to the same backend.
Why
- The backend has session state in memory.
- The backend has a warm local cache for that client’s data.
- Connection-based protocols (WebSockets, long-lived TCP).
How L7 LBs do it
Cookie-based: LB sets a cookie identifying which backend the client should use. On subsequent requests, the cookie tells the LB where to send.
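A minimal Python sketch of cookie-based stickiness. The cookie name `lb_backend` and the plain backend ID stored in it are assumptions; real LBs typically use their own cookie names and opaque or encoded values. Because the cookie comes from the client, it is only honored if it names a backend that is currently healthy.

```python
from http.cookies import SimpleCookie
from typing import Callable, List, Optional, Tuple

COOKIE_NAME = "lb_backend"  # hypothetical cookie name


def route_sticky(cookie_header: str, healthy: List[str],
                 pick_new: Callable[[], str]) -> Tuple[str, Optional[str]]:
    """Return (backend, Set-Cookie value to send, or None if the existing pin is valid)."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    pinned = jar[COOKIE_NAME].value if COOKIE_NAME in jar else None
    if pinned in healthy:
        return pinned, None               # honor the existing pin
    backend = pick_new()                  # no pin, or the pinned backend is gone
    out = SimpleCookie()
    out[COOKIE_NAME] = backend
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["httponly"] = True
    return backend, out[COOKIE_NAME].OutputString()
```

`pick_new` stands in for whatever normal algorithm (round-robin, least-connections) the LB uses for unpinned clients.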
How L4 LBs do it
IP hash: client IP determines backend. Same client → same backend. Limitation: many users behind a corporate NAT all go to one backend.
Don’t use sticky sessions if you don’t need them
Sticky sessions reduce load distribution. They also cause “thundering herd” when a sticky backend fails — all its sessions hit other backends at once.
Modern stateless backend designs (sessions in Redis, JWTs) avoid the need for sticky sessions. Default to no sticky sessions unless your design requires them.
Health Checks
The LB needs to know if a backend is healthy. Configurable checks:
TCP check
Open a TCP connection. Success = healthy. Fastest; weakest signal.
HTTP check
Make an HTTP request to a specific path; check response code. Standard for L7 LBs.
Application check
Custom endpoint (/health) that runs internal checks (DB connection, cache reachability, etc.). Reports JSON or specific status codes.
Frequency and thresholds
- Interval: how often to check. 5-15 seconds typical.
- Healthy threshold: consecutive passes to mark healthy. 2-3 typical.
- Unhealthy threshold: consecutive fails to mark unhealthy. 2-3 typical.
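The two thresholds form a small state machine. A minimal Python sketch, assuming the LB calls `record` once per check result (class and parameter names are illustrative):

```python
class HealthTracker:
    """Flip health state only after N consecutive results that disagree with it."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3, healthy=True):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, passed: bool) -> bool:
        if passed == self.healthy:
            self._streak = 0  # result agrees with current state; reset the streak
        else:
            self._streak += 1
            needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
            if self._streak >= needed:
                self.healthy = passed
                self._streak = 0
        return self.healthy
```

With an unhealthy threshold of 3, a single passing check in the middle of a run of failures resets the count, so one slow response never takes a backend out of rotation.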
Critical detail
Your health check endpoint must not be expensive. If checks fail under load, you mark backends unhealthy at exactly the time you need them. Make /health cheap and isolate it from main application failures.
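One way to keep /health cheap is to cache the result of the expensive dependency checks, so probes from several LB nodes trigger that work at most once per TTL. A minimal Python sketch; `check`, the 10-second TTL, and the injectable `clock` are assumptions for illustration. A production version might refresh the result in a background task so no probe ever waits on it.

```python
import time


class CachedHealth:
    """Serve /health from a cached result instead of re-running checks on every probe."""

    def __init__(self, check, ttl_seconds=10.0, clock=time.monotonic):
        self._check = check          # expensive dependency check (hypothetical)
        self._ttl = ttl_seconds
        self._clock = clock
        self._result = False
        self._checked_at = None

    def status(self):
        now = self._clock()
        if self._checked_at is None or now - self._checked_at >= self._ttl:
            try:
                self._result = bool(self._check())
            except Exception:
                self._result = False     # a crashing check counts as unhealthy
            self._checked_at = now
        return (200, "ok") if self._result else (503, "unhealthy")
```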
TLS Termination
Where does HTTPS get decrypted?
At the LB (most common)
Backends see plain HTTP. LB handles certs, TLS negotiation, modern protocols.
- Easier cert management (one cert in one place).
- Backends focus on application logic.
- LB-to-backend usually plain HTTP over private network.
End-to-end
LB passes encrypted bytes through; backends terminate TLS.
- Required for some compliance regimes.
- Useful when you don’t trust the LB-to-backend network.
- More cert management overhead.
SSL passthrough (L4 specific)
L4 LB doesn’t decrypt; it just forwards encrypted bytes. It can still route by hostname using SNI, which the client sends unencrypted in the TLS ClientHello.
For most setups, TLS at the LB is the right answer.
Client IP and X-Forwarded-For
When the LB forwards to the backend, the backend sees the LB’s IP, not the user’s. To get the real IP:
- L7 LB: adds an X-Forwarded-For header with the original client IP.
- L4 LB: may preserve the client IP via the PROXY protocol (a header sent at the start of the TCP connection) or a “client IP preservation” mode.
- AWS specifics: X-Forwarded-For is the standard; the ALB also sets X-Forwarded-Proto and X-Forwarded-Port.
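For the PROXY protocol, version 1 is a single text line such as `PROXY TCP4 <src ip> <dst ip> <src port> <dst port>\r\n` (version 2 is a binary format and is not covered here). A minimal Python sketch of parsing a v1 line; the backend should only accept this header on connections from its own LB, otherwise any client can claim any address.

```python
import ipaddress
from typing import NamedTuple, Optional


class ProxyHeader(NamedTuple):
    family: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def parse_proxy_v1(line: bytes) -> Optional[ProxyHeader]:
    """Parse a PROXY protocol v1 line; return None for 'PROXY UNKNOWN'."""
    if not line.startswith(b"PROXY ") or not line.endswith(b"\r\n") or len(line) > 107:
        raise ValueError("not a PROXY protocol v1 header")
    parts = line[:-2].decode("ascii").split(" ")
    if parts[1] == "UNKNOWN":
        return None  # sender has no client address; fall back to the socket peer
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed PROXY header")
    _, family, src_ip, dst_ip, src_port, dst_port = parts
    ipaddress.ip_address(src_ip)  # raise ValueError on a malformed address
    ipaddress.ip_address(dst_ip)
    return ProxyHeader(family, src_ip, dst_ip, int(src_port), int(dst_port))
```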
For trust patterns, see X-Forwarded-For header.
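Clients can send their own X-Forwarded-For header, so only the entries appended by proxies you control are trustworthy. A common approach walks the list from right to left and returns the first address that is not one of your trusted proxies. A minimal Python sketch; the trusted CIDR ranges are assumptions you would replace with your LB's actual addresses.

```python
import ipaddress
from typing import Iterable, Optional


def client_ip(xff_header: Optional[str], peer_ip: str, trusted_cidrs: Iterable[str]) -> str:
    """Return the rightmost address in the forwarding chain that is not a trusted proxy."""
    trusted = [ipaddress.ip_network(c) for c in trusted_cidrs]

    def is_trusted(addr: str) -> bool:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False  # garbage is never a trusted proxy
        return any(ip in net for net in trusted)

    hops = [h.strip() for h in (xff_header or "").split(",") if h.strip()]
    chain = hops + [peer_ip]  # the TCP peer is the final hop
    for addr in reversed(chain):
        if not is_trusted(addr):
            return addr
    return chain[0]  # every hop is trusted; the leftmost is the best guess
```

In the spoofing case, a client sends `X-Forwarded-For: 1.2.3.4`, the LB appends the real address, and the function ignores the forged leftmost entry.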
Connection Pooling
The LB maintains pools of connections to backends. Why it matters:
Without pooling
Every client request opens a new TCP connection to a backend. Lots of TCP overhead.
With pooling
LB keeps connections to backends open and reuses them across many client requests. Faster, less overhead.
Configurable: max connections per backend, keepalive duration, etc. Modern LBs do this well by default.
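A minimal Python sketch of the reuse idea, assuming `connect` is some callable that opens a backend connection with a `close()` method (hypothetical here). Real pools also enforce a total connection cap, idle timeouts, and keepalive checks, which are omitted.

```python
from collections import deque


class ConnectionPool:
    """Reuse idle backend connections instead of opening one per request."""

    def __init__(self, connect, max_idle=4):
        self._connect = connect   # opens a new backend connection (assumed)
        self._idle = deque()
        self._max_idle = max_idle
        self.opened = 0           # how many real connections were created

    def acquire(self):
        if self._idle:
            return self._idle.pop()   # reuse the most recently returned connection
        self.opened += 1
        return self._connect()

    def release(self, conn, reusable=True):
        if reusable and len(self._idle) < self._max_idle:
            self._idle.append(conn)
        else:
            conn.close()              # broken, or pool already holds enough idle ones
```

Serving ten sequential requests through this pool opens a single backend connection instead of ten.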
Common LB Products in 2026
Cloud-managed
- AWS ALB / NLB — Standard for AWS workloads.
- GCP Load Balancing — Global, with an anycast frontend.
- Azure Load Balancer / Application Gateway — Azure-native.
- Cloudflare Load Balancing — At-the-edge, global.
Self-hosted
- nginx — Industry workhorse.
- HAProxy — Stable, performant, popular for L7 and L4.
- Envoy — Modern, service-mesh-friendly.
- Traefik — Container-friendly, auto-discovery.
Service mesh
- Istio, Linkerd, Consul — LB at the service-to-service level inside clusters.
For new deployments in 2026, the choice usually starts with “are we on a managed cloud LB or do we need self-hosted?” Managed wins by default; self-hosted wins for specific control needs.
Geographic Load Balancing
For globally distributed services, you want users routed to the nearest backend region. Several mechanisms:
Anycast at the edge
One IP advertised from many POPs. See anycast vs unicast. Used by Cloudflare, GCP Global Load Balancing, AWS Global Accelerator.
GeoDNS
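DNS returns different A records depending on where the query appears to come from. Simpler than anycast but less precise, since the DNS server usually sees the resolver’s location rather than the user’s.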
Application-layer routing
LB inspects the request, looks up geolocation of the source IP, routes to the appropriate regional backend.
The Ip2Geo API can be called from your LB layer for fine-grained geographic routing.
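Once the source IP has been geolocated, picking a region can be as simple as choosing the smallest great-circle distance. A minimal Python sketch; the region names and coordinates are hypothetical, and the geolocation lookup itself is assumed to have already produced the client's latitude and longitude.

```python
import math

REGIONS = {  # hypothetical regions with approximate (lat, lon)
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-southeast": (1.35, 103.8),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_region(client_latlon, regions=REGIONS):
    return min(regions, key=lambda r: haversine_km(client_latlon, regions[r]))
```

Distance is only a proxy for latency; a production router might also weigh region health and capacity.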
Health Check Failure Modes
A few classic ways health checks go wrong:
Cascading failure
LB marks slow backends unhealthy under load. All traffic hits remaining backends. They get overloaded. LB marks them unhealthy. Nothing healthy left.
Mitigation: panic mode (if too many backends are marked unhealthy, ignore health checks and route to all of them).
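A minimal Python sketch of panic mode (Envoy calls this a panic threshold). The 50% threshold below is an illustrative assumption; the point is that below some fraction of healthy backends, spreading load over every backend beats concentrating it on the few survivors.

```python
from typing import Dict, List


def eligible_backends(health: Dict[str, bool], panic_threshold: float = 0.5) -> List[str]:
    """Return the backends to route to, ignoring health checks when too few pass."""
    healthy = [b for b, ok in health.items() if ok]
    if len(healthy) < panic_threshold * len(health):
        return list(health)  # panic: spread load across every backend
    return healthy
```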
Flapping
Backend bounces between healthy and unhealthy. Connections constantly redistributed. Disruptive.
Mitigation: hysteresis (higher threshold to go healthy than unhealthy).
Healthy-but-broken
Health check passes; application is broken in a specific way. Health checks should exercise the same paths real traffic does.
TL;DR
- L4 load balancers operate on TCP/UDP. Fast, protocol-agnostic.
- L7 load balancers parse HTTP. Routing by URL, header, host. The default for web traffic.
- Algorithms: round-robin, least-connections, IP hash, weighted, power-of-two.
- Sticky sessions: avoid if possible; modern stateless designs don’t need them.
- Health checks: must be cheap; should exercise real code paths.
- TLS termination at LB is the standard pattern.
- X-Forwarded-For carries client IP through L7 LBs.
- Geographic LBs use anycast, GeoDNS, or application-layer routing.
Load balancing is one of those infrastructure areas where 2026’s defaults are excellent. Cloud-managed LBs handle most of what you used to configure manually. The remaining decisions — L4 vs L7, sticky vs stateless, health check design — are mostly straightforward once you internalize the trade-offs. For the related reverse-proxy pattern, see reverse proxy explained; for the geographic-routing layer, anycast vs unicast.