Transactional Outbox Pattern 📬

Implementing the Outbox Pattern for Reliable Microservices Communication

Parvesh Saini · 6 min read

Introduction

In a microservices architecture, each service often needs to do more than just update its own database. It also needs to inform other services within the organization about these updates. But there's a challenge: making sure that the database update and the notification happen reliably together. If something goes wrong, you might end up in a situation where the database is updated but other services don't get the memo, or vice versa.

In this blog post, I want to discuss a possible solution to this problem, commonly known as the Transactional Outbox Pattern.

The Problem

Let’s say you have a function that adds a new employee to your database and then sends out an event to notify other services about this new employee.

async function createEmployee(data) {
    // Step 1: Save the employee to the database
    await prisma.employee.create({ data });

    // Step 2: Publish an event to notify other services
    // (if this fails, the database write above has already committed)
    await eventPublisher.publish('employee.created', data);
}

This seems pretty straightforward, right? But what if the first step (saving to the database) works, but the second step (sending the event) fails due to a network issue? In that case, the employee would be added to the database, but the other services wouldn’t know about it, leading to inconsistent data across your system.

On the other hand, if the event is sent out successfully but the database update fails, other services might start acting on an employee that doesn't exist, which is another potential problem.

The Solution: Outbox Approach

The Transactional Outbox Pattern resolves this atomicity issue by writing the domain event to an outbox table within the same transaction that modifies the database entities. Thanks to the atomicity property of ACID transactions, both the event and the database change are committed together or not at all. Afterward, a background process retrieves these events from the outbox table and publishes them to a message broker, enabling multiple consumer services to access and process the domain events.

While the Outbox Pattern is powerful, there are a few important things to consider before you dive into implementation:

1. Order of Domain Events

Imagine you have an employee record that's created, updated, and then deleted. It's crucial that other services receive these events in the exact order they happened: first the creation, then the update, and finally the deletion. If, for example, the deletion event reaches a service before the creation event, it would cause confusion.

This is where Apache Kafka comes into play. Kafka is designed to handle high-throughput data streams while preserving the order of events, which is crucial for applications that depend on sequence integrity. One caveat: Kafka guarantees ordering only within a single partition, so events about the same entity (for example, the same employee) should be published with the same partition key. Kafka's log-based architecture also allows events to be replayed, enabling consumers to reprocess data from any point in the stream, which provides robust support for scenarios requiring fault tolerance and data recovery.
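As a rough sketch of how that keying might look with kafkajs (the publishEmployeeEvent helper, the employeeId field, and the client id are illustrative assumptions, not code from this post):

const { Kafka } = require('kafkajs');

const producer = new Kafka({
    clientId: 'outbox-publisher', // illustrative name
    brokers: ['kafka:9092']
}).producer();

async function publishEmployeeEvent(event) {
    await producer.connect(); // no-op if already connected

    await producer.send({
        topic: event.eventType,
        // messages that share a key always land on the same partition,
        // so all events for one employee stay in order
        messages: [{
            key: String(event.employeeId), // assumed field: the aggregate's id
            value: JSON.stringify(event.payload)
        }]
    });
}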

2. Approach for Fetching Outbox Events

There are two main ways for the background process to fetch events from the outbox table:

  • Polling the Outbox Table: The background process can regularly check (or "poll") the outbox table for new events. This approach is straightforward but can be inefficient: it consumes resources even when there are no new events. Plus, if you need real-time event delivery, polling might introduce delays. A minimal polling loop is sketched right after this list.

  • Tailing the Transaction Log: Another approach is to tail the transaction log, a detailed record of all database changes. The process continuously reads changes directly from the database's transaction log as they happen. This method avoids the inefficiency of polling and can offer near real-time event delivery. However, it's more complex and usually requires a database-specific solution (change data capture tools such as Debezium take this approach).
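To make the polling variant concrete, here is a minimal sketch: a timer that periodically invokes a fetch-and-publish step. The processOutboxEvents function is fleshed out in the Implementation section below; the interval value is an arbitrary assumption.

const POLL_INTERVAL_MS = 1000; // trade-off: lower latency vs. more idle queries

function startOutboxPolling(processOutboxEvents) {
    setInterval(async () => {
        try {
            await processOutboxEvents(); // fetch pending rows and publish them
        } catch (err) {
            // a failed run is simply retried on the next tick
            console.error('outbox polling run failed:', err);
        }
    }, POLL_INTERVAL_MS);
}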

3. Handling Duplicate Events

The background process can sometimes introduce duplicate events. For example, if the process is polling the database, it might successfully send an event to Kafka but crash before removing it from the outbox table. When it restarts, it could send the same event again, leading to duplicates.

Similarly, with transaction log tailing, the process might crash after sending an event but before updating its position in the log. When it restarts, it might re-read the same part of the log and send the event again.

To deal with duplicate events, one common solution is an idempotency key: a unique identifier attached to each event. Consumers keep track of the keys they have already handled, so even if the same event is delivered multiple times, it only takes effect once and duplicates don't cause problems downstream.
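As a sketch of what that guard might look like on the consumer side, assuming each event carries a unique eventId and the consumer keeps a processedEvent table (both names are illustrative, not part of the schema used later in this post):

async function handleEventOnce(event) {
    // have we already handled this event id?
    const seen = await prisma.processedEvent.findUnique({
        where: { eventId: event.eventId }
    });
    if (seen) return; // duplicate delivery: safely ignore

    await applyBusinessLogic(event); // hypothetical service-specific handler

    // record the key so a redelivery becomes a no-op; ideally this write
    // and the business-logic write share one database transaction
    await prisma.processedEvent.create({
        data: { eventId: event.eventId }
    });
}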


Implementation

The core idea is to use a transaction to guarantee that the database update and the recording of the event are treated as a single, atomic operation; the actual publishing to the broker then happens asynchronously. Let's break down the process:

  • Transactional Outbox Entry: When an important change happens in the database (like creating, updating, or deleting a record), an entry is made in a special "outbox" table. This outbox table stores events that need to be published to a message broker like Apache Kafka. Both the database update and the outbox entry are part of the same transaction. This ensures that if one operation fails, the entire transaction is rolled back, preventing any inconsistencies.

      async function createEmployee(data) {
          // run both writes in one interactive transaction: either both
          // commit or both roll back together
          await prisma.$transaction(async (tx) => {
              // step 1: save the employee to the database
              await tx.employee.create({ data });

              // step 2: save the event to the outbox table
              await tx.outbox.create({
                  data: {
                      eventType: 'employee.created',
                      payload: JSON.stringify(data),
                      status: 'PENDING'
                  },
              });
          });
      }
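For reference, here is a plausible Prisma model for this outbox table, inferred from the fields used above (the createdAt column is an added assumption that lets the processor publish in commit order):

      model Outbox {
          id        Int      @id @default(autoincrement())
          eventType String
          payload   String
          status    String   @default("PENDING")
          createdAt DateTime @default(now())
      }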
    
  • Background Processor: A background process or service regularly checks the outbox table for new events. Instead of fetching one event at a time, this processor can work in batches, which improves performance by handling multiple events per database round-trip. After fetching the events, the processor publishes them to Kafka and marks them as processed in the outbox table. A crash between publishing and marking can still produce a duplicate, which is exactly the case the idempotency key discussed earlier covers.

      const { Kafka } = require('kafkajs');

      // producer setup (client id is illustrative)
      const kafkaProducer = new Kafka({
          clientId: 'outbox-processor',
          brokers: ['kafka:9092']
      }).producer();

      async function processOutboxEvents() {
          await kafkaProducer.connect(); // no-op if already connected

          // fetch unprocessed events in batches (e.g., 10 at a time),
          // oldest first so publish order matches commit order
          const events = await prisma.outbox.findMany({
              where: { status: 'PENDING' },
              orderBy: { id: 'asc' },
              take: 10
          });

          for (const event of events) {
              try {
                  // publish the event to Kafka; a message key (see the
                  // ordering note above) would pin related events to one partition
                  await kafkaProducer.send({
                      topic: event.eventType,
                      messages: [{ value: event.payload }]
                  });

                  // mark the event as processed
                  await prisma.outbox.update({
                      where: { id: event.id },
                      data: { status: 'PROCESSED' }
                  });
              } catch (error) {
                  console.error('Error processing event:', event.id, error);
                  // stop the batch so later events aren't published ahead of
                  // this one; the next run retries from here
                  break;
              }
          }
      }
    
  • Event Consumer: Once the events are in the Kafka queue, consumers can read these events and process them as needed. These consumers might update their own data stores, trigger additional workflows, or notify users. The consumer's job is to ensure that the events are handled appropriately and consistently, based on the business logic.

      const { Kafka } = require('kafkajs');

      const kafkaConsumer = new Kafka({
          clientId: 'my-app',
          brokers: ['kafka:9092']
      }).consumer({ groupId: 'employee-service' });

      async function consumeEvents() {
          await kafkaConsumer.connect();
          await kafkaConsumer.subscribe({ topic: 'employee.created', fromBeginning: true });

          await kafkaConsumer.run({
              eachMessage: async ({ topic, partition, message }) => {
                  // parse the event once, then handle it, e.g., update another
                  // service or database based on your business logic
                  const eventData = JSON.parse(message.value.toString());
                  console.log(eventData);
              },
          });
      }

      consumeEvents().catch(console.error);
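One thing to keep in mind: Kafka delivers messages to consumers with at-least-once semantics by default, so eachMessage can run more than once for the same record (for example, after a consumer-group rebalance). The handler is therefore a natural place to apply the idempotency-key check described earlier.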
    

It's important to note that this system is based on eventual consistency. This means that while updates and events may not be synchronized instantly across all services, they will eventually reach a consistent state.


Conclusion

The Outbox Pattern is a reliable approach for ensuring consistency between your database and event-driven systems. While it introduces some complexity and potential performance concerns, these trade-offs are often worth it for the robustness it provides in a microservices architecture.

Thank you so much for tuning in! If you have any doubts or suggestions, feel free to ping me on LinkedIn. Your engagement is greatly appreciated. Happy coding :)

