Deep Dive - WebSockets, SSE and Long Polling
The entire internet was originally built on a simple premise: the client asks and the server answers. This is the Request-Response model of HTTP. It is stateless and short-lived.
But modern applications often break this rule. Chat apps, Uber location updates, and stock tickers all require the server to push data to the client the moment it happens, without waiting to be asked.
This requirement fundamentally breaks the standard web architecture. It forces us to move from stateless, short-lived connections to stateful, persistent connections. This shift introduces massive complexity at the Load Balancer and Kernel levels.
This post will explore the three main ways to achieve real-time communication and the deep technical trade-offs of each.
Long Polling
Before HTML5 gave us proper tools, engineers had to trick the browser into real-time behavior. This technique is called Long Polling.
In standard polling, the client asks “Is there new data?” every second. This burns bandwidth and CPU because 99% of the time the answer is “No”.
In Long Polling, the client sends a request but the server does not answer immediately.
1. The client sends a request: GET /messages.
2. The server receives the request and holds the connection open. It does not block a thread (which would kill the server); instead it suspends the request context in the Event Loop or uses a Promise to park the connection in memory.
3. Once a new message actually arrives (e.g. 30 seconds later), the server finally sends the HTTP response with the data.
4. The client receives the data and immediately sends a new request to start the cycle again.
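The parking step is easier to see in code. Here is a minimal, self-contained sketch of the idea using plain asyncio rather than a real HTTP framework (LongPollBroker, long_poll, and publish are hypothetical names): the handler suspends on an Event instead of blocking a thread, and wakes when a message arrives.

```python
import asyncio

class LongPollBroker:
    """Parks pending requests on an asyncio.Event instead of blocking threads."""

    def __init__(self):
        self.messages = []
        self.new_message = asyncio.Event()

    async def long_poll(self, timeout=30.0):
        """Simulates GET /messages: park until a message exists, then respond."""
        if not self.messages:
            try:
                await asyncio.wait_for(self.new_message.wait(), timeout)
            except asyncio.TimeoutError:
                return []  # timed out empty; the client simply re-polls
        return list(self.messages)

    def publish(self, msg):
        self.messages.append(msg)
        self.new_message.set()  # wake every parked request

async def main():
    broker = LongPollBroker()
    pending = asyncio.create_task(broker.long_poll())  # request is parked
    await asyncio.sleep(0.1)  # ...30 "seconds" later, a message arrives
    broker.publish({"id": 1, "text": "hello"})
    return await pending

print(asyncio.run(main()))  # -> [{'id': 1, 'text': 'hello'}]
```

In a real server the Event would be replaced by the framework's request-suspension mechanism, but the shape is the same: no thread is held while the connection waits.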
The Race Condition Gap
There is a fatal flaw in this design known as the Message Loss Gap.
Between the moment the server sends the response (Step 3) and the moment the client sends the new request (Step 4), there is a small window in which the client is technically not connected.
If a new message arrives during this millisecond gap, it might be lost. To fix this, engineers must implement Message Offsets or cursors so the client can say “I last saw message 100, give me everything after that” when it reconnects.
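The cursor fix can be sketched in a few lines (messages_after and the backlog shape are illustrative, not from any particular library): the server keeps an ordered backlog and filters it by the client's last acknowledged id.

```python
def messages_after(backlog, cursor):
    """Return every message with an id greater than the client's cursor.

    `backlog` is an ordered list of {"id": int, ...} dicts; `cursor` is the
    last id the client acknowledged (0 if it has seen nothing yet).
    """
    return [m for m in backlog if m["id"] > cursor]

backlog = [{"id": 99}, {"id": 100}, {"id": 101}, {"id": 102}]
# Client reconnects saying "I last saw message 100":
print(messages_after(backlog, 100))  # -> [{'id': 101}, {'id': 102}]
```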
Server-Sent Events (SSE)
HTML5 introduced a standardized way to keep a connection open forever. This is Server Sent Events (SSE).
SSE allows the server to push data to the client over a single long-lived HTTP connection.
The Protocol and Wire Format
The client sends a standard HTTP request but sets the Accept header to text/event-stream.
The server responds with a 200 OK and keeps the socket open. The wire format is strict text.
id: 101
event: price_update
data: {"ticker": "AAPL", "price": 150.22}

id: 102
...

The Recovery Mechanism (Last Event ID)
SSE has a built-in resilience feature that Long Polling lacks. If the network drops, the browser automatically attempts to reconnect. Crucially, the browser reads the last id it received (e.g. 101) and sends a special HTTP header, Last-Event-ID: 101, in the reconnection request.
The server sees this header, checks its backlog, and immediately replays any messages the client missed (e.g. 102 and 103). This makes SSE extremely robust for feeds.
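Both the wire format and the replay logic are simple enough to sketch (format_sse and replay_since are hypothetical helper names): each event is serialized as id/event/data lines terminated by a blank line, and on reconnect the server re-sends everything newer than the Last-Event-ID header.

```python
import json

def format_sse(event_id, event, data):
    """Serialize one event in the text/event-stream wire format."""
    return f"id: {event_id}\nevent: {event}\ndata: {json.dumps(data)}\n\n"

def replay_since(backlog, last_event_id):
    """On reconnect, re-send everything newer than the Last-Event-ID header."""
    return "".join(
        format_sse(i, e, d) for i, e, d in backlog if i > last_event_id
    )

backlog = [
    (101, "price_update", {"ticker": "AAPL", "price": 150.22}),
    (102, "price_update", {"ticker": "AAPL", "price": 150.31}),
    (103, "price_update", {"ticker": "AAPL", "price": 150.05}),
]
# Browser reconnects with "Last-Event-ID: 101"; server replays 102 and 103.
print(replay_since(backlog, 101))
```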
WebSockets
When you need true real-time interaction, like a multiplayer game or a collaborative Google Doc, SSE is not enough. You need WebSockets.
WebSockets allow for full duplex communication. Both the client and server can send binary data back and forth at any time over a single persistent TCP connection.
The Handshake Math
A WebSocket starts its life as a standard HTTP request, but the upgrade involves a cryptographic challenge to prove the server actually understands WebSockets.
Client Request - Contains Connection: Upgrade and a random key, Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==.
Server Logic - The server takes this key, appends a specific Magic String UUID (258EAFA5-E914-47DA-95CA-C5AB0DC85B11), hashes the result with SHA-1, and Base64 encodes it.
Server Response - Sends the hash back in Sec-WebSocket-Accept.
This handshake ensures that a simple HTTP server does not accidentally get confused by a WebSocket client.
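The server-side math is short enough to show in full. This sketch uses Python's standard hashlib and base64 modules; the sample key and expected accept value are the well-known ones from the WebSocket specification (RFC 6455).

```python
import base64
import hashlib

MAGIC = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455

def websocket_accept(sec_websocket_key):
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    digest = hashlib.sha1((sec_websocket_key + MAGIC).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
# -> s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```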
The Frame Anatomy and Masking
Once upgraded, data is sent in lightweight Frames, not packets.
Fin Bit - Indicates if this is the last frame of the message.
Opcode - Defines the payload type (Text, Binary, Ping, Pong, or Close).
Masking Key - This is critical. All frames sent from Client to Server must be masked (XORed) with a random 4-byte key.
Why Mask? - This prevents Proxy Cache Poisoning. If a client sends malicious bytes that look like a valid HTTP response, an intermediary proxy might cache it and serve it to other users. Masking ensures the bytes on the wire look like random noise to any proxy in the middle.
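Masking itself is a single XOR pass over the payload. A minimal sketch (the key and payload here are the example values from RFC 6455); because XOR is its own inverse, the same function unmasks on the server:

```python
def mask(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with the repeating 4-byte masking key."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))

key = b"\x37\xfa\x21\x3d"            # random 4-byte key chosen by the client
wire_bytes = mask(b"Hello", key)     # bytes on the wire look like noise
assert mask(wire_bytes, key) == b"Hello"  # server unmasks with the same key
print(wire_bytes.hex())              # -> 7f9f4d5158
```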
The Load Balancing Nightmare
Moving from Stateless HTTP to Stateful WebSockets creates a massive headache for infrastructure engineers.
The Sticky Session Problem
In HTTP, it does not matter which server handles request #1 and request #2. In WebSockets, the connection is physically pinned to one specific server (e.g. Server A).
If Server A holds the socket for a chat room, Server B knows nothing about it.
Solution - You need a backend Pub/Sub mechanism (like Redis) so Server B can publish the message and Server A can pick it up and push it down the WebSocket.
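Redis itself is out of scope here, but the pattern can be sketched with an in-process stand-in (Broker is a hypothetical class, not a Redis client): Server A subscribes to the room's channel because it holds the socket; Server B, which received the HTTP write, just publishes.

```python
from collections import defaultdict

class Broker:
    """In-process stand-in for Redis Pub/Sub (illustration only)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)

broker = Broker()
delivered = []

# Server A holds the WebSocket for room 42, so it subscribes to the channel.
broker.subscribe("room:42", lambda msg: delivered.append(msg))

# Server B receives the HTTP POST but does not hold the socket; it publishes.
broker.publish("room:42", {"from": "alice", "text": "hi"})

print(delivered)  # what Server A would push down its pinned WebSocket
```

With real Redis the callback would instead write the message to the locally held WebSocket connection.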
The File Descriptor Limit
In a standard web server, a request finishes in around 100ms, so one server can handle millions of requests per hour because it recycles the connections.
With WebSockets a connection stays open for hours.
The Constraint - Every open connection consumes a File Descriptor in Linux.
The Limit - By default, Linux limits open files to 1024 per process. You must tune the kernel (fs.file-max and ulimit) to allow hundreds of thousands of open descriptors.
The Ephemeral Port Limit - A TCP connection is defined by a 4-tuple: (SrcIP, SrcPort, DstIP, DstPort). The server listens on one port (443). The limit actually bites on the Client side or the Load Balancer side, which runs out of source ports (65535) to connect to the backend. To scale beyond roughly 65k concurrent connections from a single IP, you need Virtual Interfaces or multiple backend IPs to create unique tuples.
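In concrete terms, the tuning usually looks something like the following shell fragment (the numeric values are illustrative, not recommendations):

```shell
# Per-process open-file limit (often defaults to 1024):
ulimit -n            # show the current limit
ulimit -n 200000     # raise it for the current shell/service

# System-wide ceiling on open file descriptors:
sysctl -w fs.file-max=2000000

# Widen the ephemeral port range used for outbound connections:
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
```

For a long-running service, the same limits would normally be set persistently via /etc/security/limits.conf (or the systemd unit's LimitNOFILE) and /etc/sysctl.conf rather than interactively.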
Conclusion
Do not reach for WebSockets just because they are new. They introduce the complexity of stateful scaling, masking logic, and specialized kernel tuning.
Use Long Polling - if you need to support very old browsers or have very low message volume.
Use SSE - if you only need to push updates to the user (Stocks, News Feeds) and want built-in reconnection logic.
Use WebSockets - if you need low-latency bidirectional communication (Games, Chat, Trading) and are willing to manage the infrastructure complexity.
Real time web is not magic. It is just managing a very long very thin wire between two computers while trying to keep the routers from cutting it.


