System Design - Event Driven and Asynchronous Design
Synchronous design works perfectly for simple systems but it often fails at massive scale.
So far we have mostly discussed what engineers call Synchronous Communication. This is the traditional method where Service A calls Service B and Service A must wait for Service B to respond before it can do anything else. Think of a phone call where you cannot hang up until the other person answers your question.
Synchronous design works well for simple systems but it often fails at massive scale. If Service B is slow then Service A becomes slow. If Service B crashes then Service A can fail too, because its threads and connections pile up waiting for responses that will never come until timeouts fire or resources run out. This creates a chain reaction known as a Cascading Failure.
A widely used remedy for both cascading failures and high latency is to adopt an Event Driven Architecture, which relies on Asynchronous Communication. This is a powerful shift that allows systems like Twitter and Hotstar to decouple their components.
This post will define synchronous versus asynchronous communication and introduce the central component of this architecture: the Message Queue. We will explain why decoupling your microservices is one of the most effective ways to achieve both reliability and massive scale.
Synchronous vs Asynchronous
The difference between these two patterns is defined by whether the requesting service waits for the response.
Synchronous (Blocking)
In a synchronous world the client (Service A) makes a direct API call to the server (Service B). Service A is blocked or frozen until Service B finishes its work and sends a response.
If Service B takes 10 seconds to process an image then Service A also takes at least 10 seconds. During those 10 seconds the thread or connection in Service A that is handling the request does nothing but wait. This causes high Latency and makes the system vulnerable. If Service B is struggling then every service that calls it will also start struggling.
Asynchronous (Non Blocking)
Think of this like a text message or an email. Service A sends a request and immediately continues to its next task without waiting for a reply. The response or result will arrive later through a separate mechanism.
Because Service A is not waiting it can handle 1000 other tasks in the time Service B takes to complete the first one. This maximizes resource utilization and dramatically lowers the effective Latency for the user. The user gets a confirmation that the request was received and the heavy work happens in the background.
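The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `process_image` is a hypothetical stand-in for Service B (the slow work is simulated with a short sleep), and a local thread pool plays the role of the background worker. A real system would hand the work to a separate service rather than a thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process_image(image_id):
    """Hypothetical stand-in for Service B; the sleep simulates slow work."""
    time.sleep(0.2)
    return f"thumbnail-for-{image_id}"

def handle_upload_sync(image_id):
    # Blocking: the caller cannot respond until Service B has finished.
    return process_image(image_id)

# Assumption: a small thread pool stands in for the background worker.
background = ThreadPoolExecutor(max_workers=4)

def handle_upload_async(image_id):
    # Non-blocking: hand the work off and acknowledge immediately.
    future = background.submit(process_image, image_id)
    return "accepted", future

start = time.perf_counter()
sync_result = handle_upload_sync("cat.png")
sync_elapsed = time.perf_counter() - start

start = time.perf_counter()
status, future = handle_upload_async("dog.png")
async_elapsed = time.perf_counter() - start

# The result arrives later through a separate mechanism (here, a future).
async_result = future.result()
background.shutdown()

print(f"sync waited {sync_elapsed:.3f}s, async acknowledged in {async_elapsed:.3f}s")
```

The synchronous handler spends the full processing time before it can answer, while the asynchronous handler returns its acknowledgement almost immediately and the thumbnail shows up afterwards.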
The Message Queue
The key technology that enables asynchronous design is the Message Queue, also known as a Message Broker or Event Bus. Common examples include Apache Kafka, RabbitMQ, and Amazon SQS.
A Message Queue is essentially a highly durable storage buffer that sits between two services. It allows them to communicate without ever having to talk to each other directly.
Core Components
Producer
This is the service that creates an Event or a message and writes it to the queue. An example is a "user uploaded photo" event. Once the message is written the Producer is done and moves on to the next user.
Consumer
This is the service that reads the message from the queue and performs some work. For example an image processing service that creates thumbnails or a safety service that scans for viruses.
Queue or Topic
The durable storage that holds the message until a Consumer is ready to read it.
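The three components can be sketched with Python's standard `queue` and `threading` modules. This is a toy model, not how a real broker works: `queue.Queue` lives in memory inside one process, so it is not durable, and the event shape and the `None` sentinel used to stop the Consumer are assumptions made for the example.

```python
import queue
import threading

# The Queue: holds messages until the Consumer is ready to read them.
photo_events = queue.Queue()
thumbnails = []

def producer(photo_ids):
    # The Producer writes one event per upload and moves on immediately.
    for photo_id in photo_ids:
        photo_events.put({"type": "photo_uploaded", "photo_id": photo_id})
    photo_events.put(None)  # assumed sentinel meaning "no more events"

def consumer():
    # The Consumer reads events one at a time and does the heavy work.
    while True:
        event = photo_events.get()
        if event is None:
            break
        thumbnails.append(f"thumb-{event['photo_id']}")

worker = threading.Thread(target=consumer)
worker.start()
producer(["a", "b", "c"])
worker.join()

print(thumbnails)
```

Note that the Producer never calls the Consumer. Both sides only know about `photo_events`.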
How It Decouples Services
The Queue provides three critical forms of decoupling.
Time Decoupling
The Producer and the Consumer do not have to be running at the same time. If the Image Processing Service is down for maintenance the Producer can continue writing messages to the Queue. When the Consumer comes back online it simply processes the backlog. This greatly improves Reliability.
Rate Decoupling
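This is a lifesaver during traffic spikes. If the Producer sends 5000 messages per second but the Consumer can only process 500 per second the Queue acts as a buffer. The Queue protects the slow Consumer from being overwhelmed. This is called Load Leveling. The buffer only helps if the spike is temporary or more Consumers are added, because the backlog keeps growing for as long as production outpaces consumption.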
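A quick back-of-the-envelope calculation shows what the buffer is absorbing. The 5000 and 500 messages per second come from the example above. The 60 second spike length and the 100 messages per second of normal traffic afterwards are assumptions picked only to make the arithmetic concrete.

```python
import math

def backlog_after(seconds, produce_rate, consume_rate, backlog=0):
    """Queue depth after `seconds`, stepping one second at a time."""
    for _ in range(seconds):
        backlog = max(0, backlog + produce_rate - consume_rate)
    return backlog

def seconds_to_drain(backlog, produce_rate, consume_rate):
    """Time needed to clear the backlog once consumption outpaces production."""
    if consume_rate <= produce_rate:
        return math.inf  # the backlog never shrinks
    return math.ceil(backlog / (consume_rate - produce_rate))

# Assumed 60 second spike at the rates from the text.
spike_backlog = backlog_after(60, produce_rate=5000, consume_rate=500)

# Assumed normal traffic of 100 messages per second after the spike.
recovery = seconds_to_drain(spike_backlog, produce_rate=100, consume_rate=500)

print(spike_backlog, recovery)  # 270000 675
```

Under these assumptions a one minute spike leaves 270000 messages waiting and the Consumer needs a little over 11 minutes to catch up, which is why queue depth is usually worth monitoring.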
Logical Decoupling
The Producer does not need to know where the Consumer is or how many Consumers there are. It only knows the name of the Queue. You can add 10 new services like a logging service or an analytics service without ever changing a single line of code in the original Producer service.
Popular Event Driven Patterns
Message Queues enable two primary patterns which define how information is shared across your architecture.
Publish Subscribe (Pub Sub)
This is the most common pattern where a Producer sends an event to a Topic. Any number of interested Consumers can subscribe to that topic to get a copy of the message.
Think of a news channel. A service publishes a message like a new order being placed. The message is immediately sent to every service that has subscribed to the order events topic. The inventory service gets a copy to deduct stock. The billing service gets a copy to process payment. The notification service gets a copy to send an email.
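The order example can be sketched with a toy in-memory broker. The `Broker` class, the topic name `order-events`, and the event fields are all hypothetical. This version also calls the subscribers directly inside `publish`, whereas a real broker would deliver each copy asynchronously and durably.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory Pub Sub broker used only for illustration."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Every subscriber receives its own copy of the event.
        for handler in self._subscribers.get(topic, []):
            handler(dict(event))

stock = {"sku-1": 10}
charges = []
emails = []

def inventory_service(event):
    stock[event["sku"]] -= event["qty"]

def billing_service(event):
    charges.append((event["order_id"], event["total"]))

def notification_service(event):
    emails.append(f"Order {event['order_id']} confirmed")

broker = Broker()
for handler in (inventory_service, billing_service, notification_service):
    broker.subscribe("order-events", handler)

# The publisher only knows the topic name, not who is listening.
broker.publish(
    "order-events",
    {"order_id": "order-1", "sku": "sku-1", "qty": 2, "total": 40.0},
)

print(stock, charges, emails)
```

Adding a fourth subscriber, such as an analytics service, takes one more `subscribe` call and no change to the publishing code.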
Event Streaming with Kafka
Event streaming platforms like Apache Kafka provide a persistent and ordered log of events, kept for as long as the configured retention policy allows.
In this model every event is written to a topic and given a sequence number, which Kafka calls an offset. Ordering is guaranteed within a partition of the topic. Consumers can read from any retained point in the stream and even re read past events. The event is not deleted once it is read. This is crucial for Auditing and State Reconstruction. If a service crashes you can replay the event stream from a historical point in time to rebuild its state, provided the events it needs are still retained.
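State reconstruction by replay can be sketched with an append only list. The `EventLog` class below is a hypothetical in-memory model with a single partition. It is not the Kafka client API, and the account events are invented for the example.

```python
class EventLog:
    """Toy append only log; the list index plays the role of the offset."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset assigned to this event

    def read_from(self, offset=0):
        # Reading never deletes anything, so any consumer can re read.
        for position in range(offset, len(self._events)):
            yield position, self._events[position]

def rebuild_balance(log, from_offset=0, starting_balance=0):
    """Recompute an account balance by replaying events in order."""
    balance = starting_balance
    for _, event in log.read_from(from_offset):
        if event["type"] == "deposited":
            balance += event["amount"]
        elif event["type"] == "withdrawn":
            balance -= event["amount"]
    return balance

log = EventLog()
offsets = [
    log.append({"type": "deposited", "amount": 100}),
    log.append({"type": "withdrawn", "amount": 30}),
    log.append({"type": "deposited", "amount": 5}),
]

# A service that lost its state replays the whole stream.
full_replay = rebuild_balance(log)

# A service with a snapshot taken after offset 0 replays only the rest.
partial_replay = rebuild_balance(log, from_offset=1, starting_balance=100)

print(offsets, full_replay, partial_replay)  # [0, 1, 2] 75 75
```

Replaying from a snapshot plus the later offsets gives the same answer as replaying everything, which is what makes periodic snapshots a practical optimization.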
Challenges of Asynchronous Design
While this architecture solves many problems it introduces new complexity that the system designer must manage carefully.
Eventual Consistency
As we discussed in our post on the CAP Theorem, asynchronous systems typically behave like AP systems that prioritize Availability over instant Consistency.
When Service A writes a message to the Queue and Service B reads it there is a short time delay. During this delay Service A’s data store is updated but Service B’s is not yet updated. The system accepts that the two services will eventually converge on the same data. The designer must ensure the business can tolerate this short window of data being slightly out of sync.
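The window of inconsistency can be made visible with two dictionaries and a queue. Everything here is a hypothetical in-memory model: the store names, the key, and the explicit `service_b_consume_all` step that stands in for Service B catching up.

```python
from collections import deque

pending = deque()      # stands in for the Message Queue
service_a_store = {}
service_b_store = {}

def service_a_update(key, value):
    # Service A commits locally, then announces the change.
    service_a_store[key] = value
    pending.append((key, value))

def service_b_consume_all():
    # Service B applies changes whenever it gets around to reading them.
    while pending:
        key, value = pending.popleft()
        service_b_store[key] = value

service_a_update("user-1:email", "new@example.com")

# Inside the window: Service A has the new value, Service B does not.
stale_read = service_b_store.get("user-1:email")

service_b_consume_all()

# After the window: both stores agree.
converged = service_a_store == service_b_store

print(stale_read, converged)  # None True
```

Any read of Service B between the update and the consume step sees old data, so that is the period the business has to tolerate.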
Distributed Transactions and the Saga Pattern
In a traditional synchronous system if a payment fails you can simply roll back the entire database transaction. In an event driven system a payment is no longer one single action.
If an order event triggers five separate services asynchronously and the first four succeed but the fifth one fails you cannot use a simple database rollback. A common answer is the Saga Pattern. Each service that already succeeded must undo its own work in response to a Compensating Event. For example if the Shipping Service fails it sends a refund payment event to the Billing Service.
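The compensation logic can be sketched as a list of steps, each paired with the action that undoes it. This is a simplified in-process version with hypothetical step names and only three steps. In a real event driven system each action and each compensation would be an event handled by a different service.

```python
log = []

def reserve_stock():
    log.append("stock_reserved")

def release_stock():
    log.append("stock_released")

def charge_payment():
    log.append("payment_charged")

def refund_payment():
    log.append("payment_refunded")

def ship_order():
    # Simulated failure in the last step.
    raise RuntimeError("no courier available")

def cancel_shipment():
    log.append("shipment_cancelled")

def run_saga(steps):
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True

succeeded = run_saga([
    (reserve_stock, release_stock),
    (charge_payment, refund_payment),
    (ship_order, cancel_shipment),
])

print(succeeded, log)
```

The failed step is never compensated because it never completed. Only the steps before it are undone, newest first, so the payment is refunded before the stock is released.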
Debugging and Tracing
In a synchronous system you can trace a transaction by following a single chain of direct calls. In an event driven system one event might trigger 10 other events which then trigger 10 more.
Tracing a single user action through this complex network is very difficult. The usual answer is Distributed Tracing. This involves attaching a unique ID, often called a correlation ID or trace ID, to the first event and passing this ID along with every subsequent event. This allows a monitoring system to reconstruct the entire flow and pinpoint the exact source of an error.
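Passing the ID along can be sketched as follows. The field names `correlation_id` and `caused_by` and the helper `new_event` are assumptions made for this example and do not follow any particular tracing standard.

```python
import uuid

def new_event(event_type, payload, parent=None):
    """Create an event that inherits the correlation ID of its parent, if any."""
    return {
        "event_id": str(uuid.uuid4()),
        # The first event mints the ID; every later event copies it.
        "correlation_id": parent["correlation_id"] if parent else str(uuid.uuid4()),
        "caused_by": parent["event_id"] if parent else None,
        "type": event_type,
        "payload": payload,
    }

def trace(events, correlation_id):
    """Return the event types that belong to one user action, in log order."""
    return [e["type"] for e in events if e["correlation_id"] == correlation_id]

order = new_event("order_placed", {"order_id": "order-1"})
payment = new_event("payment_charged", {"order_id": "order-1"}, parent=order)
email = new_event("email_sent", {"order_id": "order-1"}, parent=payment)
unrelated = new_event("order_placed", {"order_id": "order-2"})

all_events = [order, unrelated, payment, email]
flow = trace(all_events, order["correlation_id"])

print(flow)  # ['order_placed', 'payment_charged', 'email_sent']
```

The `caused_by` field records which event triggered which, so a monitoring tool could rebuild the full tree rather than a flat list.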
The Next Step
The move from synchronous to asynchronous design using Message Queues is a key step toward a massive and resilient microservice architecture. It provides the essential layer of Decoupling that allows services to fail independently and absorb massive bursts of traffic without crashing.
You now understand the complete foundation of system design. We have covered requirements, reliability, data consistency, capacity, and communication.


