System Design - How X (Formerly Twitter) Notifies Millions in Seconds (Case Study)
The design discussed below is a high-level speculation based on common industry best practices for massive scale social platforms, known technical challenges (like the fanout problem), and public domain information. The actual implementation and technologies used by X (Twitter) may differ significantly.
We have covered the fundamentals: how systems are broken into microservices, how data is sharded, and how we must choose between Consistency and Availability.
Now it is time to apply these concepts to a massive real-world problem: Twitter Notifications.
Imagine a global figure like a sports star or a world leader posts a single tweet. That one write operation must result in a notification being instantly delivered to over 100 million users worldwide. This is one of the biggest scaling challenges in the industry.
It is a perfect case study because it exposes the core trade-offs every designer must face: how to handle massive scale while remaining fast and reliable.
The Read Write Ratio
The first step in any system design is capacity planning and identifying the Read:Write Ratio (R:W).
In Twitter’s case the R:W ratio is extremely skewed.
The Write - A single user writes one tweet or post. This is a very small number of writes.
The Read - That one tweet is read by everyone who follows that user. If a star has 100 million followers, that is 100 million reads that need to happen immediately after the single write.
The ratio is massive: perhaps 1 write to 100 million reads. This tells the designer exactly where the complexity and the budget must go: scaling the reads.
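A quick back-of-envelope calculation makes the skew concrete. The numbers below are illustrative assumptions, not X's real traffic figures:

```python
# Illustrative capacity estimate -- all figures are assumptions for the sketch.
followers = 100_000_000        # followers of one celebrity account
writes = 1                     # the single tweet
reads = followers              # every follower's timeline must reflect it

ratio = reads / writes
print(f"read:write ratio = {ratio:,.0f}:1")

# If the fanout must complete within 5 seconds (an assumed deadline),
# the required delivery throughput is enormous:
deadline_s = 5
ops_per_second = followers / deadline_s
print(f"required fanout throughput = {ops_per_second:,.0f} ops/s")
```

Even with generous deadlines, a single hot tweet demands tens of millions of timeline writes per second, which is why the rest of the design revolves around the read path.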
The Fanout
When a user posts a tweet, the system has to perform a Fanout: the process of spreading that one post to all relevant users’ feeds. There are two main ways to solve the fanout problem.
Fanout on Read (Pull Model)
When the star posts the tweet, the system only saves the post to a single location called the Tweet Service. When a follower opens their app, their feed service asks: “Give me all the new tweets from the 10,000 people I follow.” The follower actively pulls the data.
The benefit of this method is that it is extremely efficient for writes: the system does almost no work when the star posts a tweet.
But it is slow when the follower opens their app. The system has to query thousands of timelines and then sort and merge all those results in real time. This causes high latency, which is unacceptable for a service at Twitter's scale.
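A minimal sketch of the pull model, using plain Python data structures in place of a real Tweet Service (the authors, timestamps, and tweet texts here are hypothetical):

```python
import heapq

# Tweets are stored once, keyed by author, newest first as (timestamp, text).
tweets_by_author = {
    "star":   [(103, "big news"), (101, "hello")],
    "friend": [(102, "lunch?")],
}
following = ["star", "friend"]

def build_timeline(following, limit=10):
    # k-way merge of every followed author's feed, done at READ time.
    # reverse=True keeps the newest-first ordering of the inputs.
    merged = heapq.merge(*(tweets_by_author[a] for a in following), reverse=True)
    return [text for _, text in list(merged)[:limit]]

print(build_timeline(following))
```

With 10,000 followed accounts instead of two, this merge (plus the network round trips to fetch each author's tweets) is exactly the work that makes the pull model too slow at read time.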
Fanout on Write (Push Model)
When the star posts the tweet, the system immediately identifies all 100 million followers and pushes the new tweet directly into their individual inboxes or timelines. When a follower opens their app, their feed is already ready.
This is efficient and fast for reads: when the follower opens their app, the feed loads instantly because the work was done ahead of time. Low latency is achieved.
But this requires massive compute power and speed to instantly process 100 million operations for a single tweet.
Twitter’s Choice - Twitter uses a hybrid model, prioritizing Fanout on Write for most users because the speed of the feed is non-negotiable, while falling back to Fanout on Read for accounts with enormous follower counts, where pushing into every inbox would be too expensive.
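One way such a hybrid could work, sketched with in-memory dictionaries. The 10,000-follower threshold is an assumption for illustration, not X's real cutoff:

```python
# Hybrid fanout sketch -- threshold and storage are illustrative assumptions.
CELEBRITY_THRESHOLD = 10_000

timelines = {}        # follower_id -> precomputed "inbox" of tweet ids (push)
celebrity_posts = {}  # author_id -> tweet ids, merged in at read time (pull)

def post_tweet(author_id, tweet_id, followers):
    if len(followers) >= CELEBRITY_THRESHOLD:
        # Pull model for celebrities: store once, fetch on read.
        celebrity_posts.setdefault(author_id, []).append(tweet_id)
    else:
        # Push model for everyone else: write into each follower's timeline.
        for f in followers:
            timelines.setdefault(f, []).append(tweet_id)
```

The trade-off is visible in the code: a normal account costs one write per follower at post time, while a celebrity account costs one write total but pushes the merge work onto every read.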
The Tech Stack
To make the heavy Fanout on Write architecture work the system must rely on specialized components.
The Message Queue (Kafka)
The single tweet write event is immediately placed onto a Message Queue like Kafka.
Decoupling
The queue decouples the action of posting from the action of delivering. The star gets a response back immediately saying “Your tweet is posted” even while the fanout process is still running in the background. This ensures low Latency for the star.
Scalability
The queue handles the massive burst of traffic by allowing the system to consume the messages as quickly as its Worker Services can handle them, preventing an overload.
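The decoupling idea can be sketched with Python's standard-library `queue` and a background thread standing in for a Kafka consumer. This is a toy model of the pattern, not production Kafka code:

```python
import queue
import threading

fanout_queue = queue.Queue()   # stand-in for a Kafka topic
delivered = []                 # stand-in for followers' timelines

def post_tweet(tweet_id):
    fanout_queue.put(tweet_id)       # enqueue and return -- no waiting
    return "Your tweet is posted"    # the author sees this immediately

def fanout_worker():
    # Stand-in for a Kafka consumer: drains the queue in the background.
    while True:
        tweet_id = fanout_queue.get()
        if tweet_id is None:         # sentinel: shut the worker down
            break
        delivered.append(tweet_id)   # real workers would push to timelines

worker = threading.Thread(target=fanout_worker)
worker.start()
print(post_tweet(42))                # returns before delivery finishes
fanout_queue.put(None)
worker.join()
```

The key property is that `post_tweet` returns as soon as the message is enqueued; delivery happens asynchronously, so the author never waits on the fanout.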
The Service Workers
A massive pool of dedicated Worker Services reads the tweet from the queue. Their only job is to calculate who the 100 million followers are and push the tweet into their timelines.
The timeline data for all 100 million users cannot fit on one server, so users are sharded, usually by User ID. Each worker is responsible for fanning out the tweet to its assigned shards.
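A sketch of deterministic shard assignment by User ID. The shard count and the hash choice are illustrative assumptions:

```python
import hashlib

NUM_SHARDS = 4  # assumption -- real deployments use far more shards

def shard_for(user_id: str) -> int:
    # A stable hash (MD5 here) so every worker maps a user the same way.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def partition_followers(followers):
    # Group followers by shard; each worker fans out only its own bucket.
    buckets = {s: [] for s in range(NUM_SHARDS)}
    for f in followers:
        buckets[shard_for(f)].append(f)
    return buckets

buckets = partition_followers([f"user{i}" for i in range(8)])
```

Because the mapping is a pure function of the User ID, any worker can compute it independently and the 100 million follower writes divide cleanly across the fleet.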
The Caching Layer (Redis)
The actual timeline data for 99% of users is not stored in the slow permanent database. It is stored in a blazing fast Key Value Cache like Redis or Memcached.
Architecture
Each follower’s timeline is its own key. When the worker fans out the tweet it simply adds the new tweet ID to the list stored under the follower’s Timeline Key.
Speed
This ensures that when the follower opens the app, the feed service only has to hit the cache, which can deliver the data in under 5 milliseconds, making the timeline load virtually instant.
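A sketch of the timeline cache, using a plain dictionary as a stand-in for Redis; the comments note the roughly equivalent Redis list commands. The 800-entry cap is an assumption to show why cached timelines stay bounded:

```python
MAX_TIMELINE = 800   # assumed per-user cap to bound memory

cache = {}           # stand-in for Redis

def push_to_timeline(follower_id, tweet_id):
    key = f"timeline:{follower_id}"
    timeline = cache.setdefault(key, [])
    timeline.insert(0, tweet_id)     # like LPUSH: newest tweet goes first
    del timeline[MAX_TIMELINE:]      # like LTRIM: drop the oldest entries

def read_timeline(follower_id, count=20):
    # Like LRANGE key 0 count-1: a single cheap lookup, no database query.
    return cache.get(f"timeline:{follower_id}", [])[:count]

push_to_timeline("bob", 101)
push_to_timeline("bob", 102)
print(read_timeline("bob"))   # newest first
```

Reading a feed is one key lookup; all the expensive merge work happened at write time, during the fanout.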
Consistency vs Speed
To achieve this incredible speed the system designers had to consciously sacrifice Strong Consistency and instead choose Eventual Consistency.
The AP Choice - Twitter is an AP system, prioritizing Availability and Partition Tolerance over 100% real-time Consistency.
The Business Acceptance
If a sports star tweets at 10:00:00 AM and one follower in Germany sees the notification at 10:00:05 AM this temporary inconsistency is 100% acceptable to the business. Users prefer seeing most of the feed instantly rather than waiting an extra 5 seconds for the system to ensure 100% perfect ordering.
The Benefit
By accepting eventual consistency, the system can use powerful, high-speed AP databases like Cassandra or fast caches like Redis, which are built to survive network partitions and maximize uptime.
Conclusion
The Twitter Notification system is a masterclass in large-scale design. It shows that great design is not about having the best servers; it is about making smart trade-offs.
The Read:Write Ratio dictates a Fanout on Write approach.
The Latency Constraint dictates using Asynchronous Queues and Caches to push the hard work out of the user’s view.
The Scale Constraint dictates using Sharding and Worker Services to divide the labor.
The Consistency Trade-Off dictates choosing an AP architecture to keep the feed fast and always available.
We have seen how sharding, caching, and asynchronous design work together to handle extreme scale.


