The Outbox Architectural Pattern | NTT DATA

Fri, 13 October 2023

Unlocking Reliability and Scalability: The Outbox Architectural Pattern

In microservices and distributed systems, we often encounter scenarios where we need to write to a database or interact with a slow external service and then notify others about this event. This situation presents unique challenges in maintaining data consistency and reliability.

Maintaining data consistency and reliability is crucial for several reasons:

1. User Experience

In modern applications, users expect a seamless and reliable experience. Inconsistent or unreliable data can lead to user frustration and loss of trust. For example, in an e-commerce system, if a customer's order goes through multiple microservices and the data becomes inconsistent, it can result in billing errors or order discrepancies, leading to dissatisfied customers.

2. Data Integrity

In distributed systems, data can be distributed across multiple databases or services. Ensuring data consistency guarantees that data is accurate and valid across all parts of the system. Data integrity is critical for making informed business decisions and maintaining the quality of the application's data.

3. Compliance and Legal Requirements

In certain industries, such as finance and healthcare, strict compliance and legal requirements mandate data consistency and reliability. Failure to meet these requirements can result in severe legal and financial consequences.

4. Business Logic

Many microservices rely on consistent data to execute their business logic correctly. For instance, a pricing microservice in an e-commerce system needs accurate product and inventory data to calculate prices. Inconsistent data can lead to incorrect pricing and financial losses.

5. Fault Tolerance

Distributed systems are prone to failures, such as network outages, server crashes, or database failures. Ensuring data consistency helps in maintaining system resilience and fault tolerance. It allows for graceful handling of failures without compromising data integrity.

To address data consistency and reliability we can use the 'Outbox Architectural Pattern': the service writes its business data and a record of the event to be published in the same transaction, and a separate process publishes that event afterwards.

The Challenge: High-Concurrency Booking System

The challenge is a high-concurrency booking system in which multiple users make bookings almost simultaneously. We need to ensure that these bookings are processed reliably, even when database operations are slow.

The Outbox Architectural Pattern

Here are the advantages of the Outbox Architectural Pattern:

(See Image1 below)

  1. Immediate User Feedback. Users call the BookingAPI and receive an immediate acknowledgment in the form of an OperationalOutcome, without waiting for downstream processing.
  2. The Outbox Table. The BookingAPI stores each booking event in an "Outbox Table" with status "PENDING", without delay.
  3. Reliability. Once an event is recorded in the Outbox Table it is durable, so it is not lost even if a later step, such as publishing to the Message Broker, fails.
  4. Scalability. The "Scheduler" component, which is decoupled from the insertion mechanism, periodically retrieves the "READY" events from the Outbox Table and publishes them to the Message Broker.

Image1 - Outbox architecture web sequence diagram

Implementation Details

Let’s now look at an implementation of this architecture, built for one of NTT DATA’s clients on Azure Cloud (Image2):

Image2 - Outbox architecture implementation in Azure

1. Cosmos DB as the Data Store:

  • Cosmos DB uses a single Collection that holds two types of items:
    • booking: the actual booking, containing the booking reference and passenger data
    • outbox: the Outbox entry, holding information about status changes of the booking
  • The two items have the same partitionKey, so they are stored in the same logical partition.
  • The Outbox item can carry the full booking data or, to avoid duplicating data, just the booking ID.
  • Because both items live in the same logical partition, they can be written together in a single transaction with Atomicity, Consistency, Isolation, and Durability (ACID) guarantees.

2. App Service for the BookingAPI:

  • The BookingAPI is a Java Spring Boot application hosted on Azure App Service under an App Service Plan.
  • The Azure Cosmos DB Java SDK is added as a Maven dependency (groupId com.azure):

    <artifactId>azure-cosmos</artifactId>

  • The BookingAPI creates the two Data Transfer Objects (DTOs) and sets the Outbox object to PENDING status.

  The BookingAPI code obtains the Cosmos DB container and creates the Booking and Outbox items in the Collection.
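A minimal sketch of the item-creation step, in plain Java and without any Cosmos DB SDK calls, might look like the following. The field names, the `type` discriminator, the `BookingItems` helper, and the choice of the booking reference as partition key are all assumptions made for illustration; they are not taken from the client implementation.

```java
import java.time.Instant;
import java.util.UUID;

/** Booking item: booking reference and passenger data (field names are illustrative). */
record Booking(String id, String partitionKey, String type,
               String bookingReference, String passengerName) {}

/** Outbox item: records a status change of a booking, to be published later. */
record OutboxItem(String id, String partitionKey, String type,
                  String bookingId, String status, Instant createdAt) {}

/** Hypothetical helper that builds both items for one booking request. */
final class BookingItems {
    /** The pair of items produced for one booking request. */
    record Pair(Booking booking, OutboxItem outbox) {}

    static Pair create(String bookingReference, String passengerName) {
        // Assumption: the booking reference is used as the partition key,
        // so both items land in the same logical partition.
        String partitionKey = bookingReference;
        Booking booking = new Booking(UUID.randomUUID().toString(), partitionKey,
                "booking", bookingReference, passengerName);
        // The outbox item stores only the booking ID to avoid repeating the booking data.
        OutboxItem outbox = new OutboxItem(UUID.randomUUID().toString(), partitionKey,
                "outbox", booking.id(), "PENDING", Instant.now());
        return new Pair(booking, outbox);
    }
}
```

Both items would then be persisted together; keeping the partition key identical is what makes a single transactional write possible.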

     

     

3. Function Logic for the Scheduler:

  • The Scheduler is simply an Azure Function App with a TimerTrigger.
  • When it is triggered, it retrieves the events (from the outbox) in READY status.
  • The important part here is that this operation is decoupled from the creation of the bookings and outbox entries.
  • The final step for the Function App is to publish these events to the Azure Event Hub.
  • Example code for the operations may be found in the code snippet below.
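The polling logic described above can be sketched in plain Java as follows. This is an illustration only: the `OutboxPoller` class, the in-memory list standing in for the outbox items, the `Consumer` standing in for the Event Hub client, and the `PUBLISHED` status are hypothetical and do not come from the original implementation.

```java
import java.util.List;
import java.util.function.Consumer;

final class OutboxPoller {
    /** Simplified outbox event; status is mutable so the poller can mark it. */
    static final class OutboxEvent {
        final String id;
        final String payload;
        String status;

        OutboxEvent(String id, String payload, String status) {
            this.id = id;
            this.payload = payload;
            this.status = status;
        }
    }

    private final List<OutboxEvent> outbox;          // stand-in for the outbox items in the data store
    private final Consumer<OutboxEvent> publisher;   // stand-in for the message broker client

    OutboxPoller(List<OutboxEvent> outbox, Consumer<OutboxEvent> publisher) {
        this.outbox = outbox;
        this.publisher = publisher;
    }

    /** One timer tick: publish every READY event and return how many were published. */
    int runOnce() {
        int published = 0;
        for (OutboxEvent event : outbox) {
            if (!"READY".equals(event.status)) {
                continue; // only READY events are picked up
            }
            try {
                publisher.accept(event);
                event.status = "PUBLISHED"; // hypothetical status, so the event is not sent again
                published++;
            } catch (RuntimeException e) {
                // Leave the event READY so that the next tick retries it.
            }
        }
        return published;
    }
}
```

In a real Function App, `runOnce` would be called from the timer-triggered method, and the status change would be written back to the data store rather than to an in-memory object.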

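4. Data Ingestion with Event Hubs:

  • The Retrieve Bookings Function App sends data to Event Hubs.
  • It instantiates an EventHubProducerAsyncClient (Azure SDK for Java):

    import com.azure.messaging.eventhubs.EventHubClientBuilder;
    import com.azure.messaging.eventhubs.EventHubProducerAsyncClient;
    // The single-argument connectionString(...) expects the Event Hub name (EntityPath) inside the string.
    EventHubProducerAsyncClient producer = new EventHubClientBuilder()
        .connectionString("connection-string-for-booking-event-hub")
        .buildAsyncProducerClient();

  • It then creates a batch of events and sends them.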
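The batching step can be illustrated without the SDK. In the Event Hubs SDK the batch object itself enforces the size limit by rejecting an event that does not fit; the sketch below reproduces that "fill, close, start a new batch" loop in plain Java. The `EventBatcher` class and the byte limit passed to it are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class EventBatcher {
    /** Splits payloads into batches whose total UTF-8 size does not exceed maxBatchBytes. */
    static List<List<String>> toBatches(List<String> payloads, int maxBatchBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String payload : payloads) {
            int size = payload.getBytes(StandardCharsets.UTF_8).length;
            if (size > maxBatchBytes) {
                throw new IllegalArgumentException("event larger than the maximum batch size");
            }
            if (currentBytes + size > maxBatchBytes) {
                batches.add(current);        // current batch is full: close it
                current = new ArrayList<>(); // and start a new one
                currentBytes = 0;
            }
            current.add(payload);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);            // do not forget the last, partially filled batch
        }
        return batches;
    }
}
```

Each resulting batch would then be handed to the producer in one send call, which keeps the number of round trips to the broker low.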
     

5. Transaction Handling:

  • The Booking item and the Outbox item are created or updated within the same transaction to ensure data consistency.
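The all-or-nothing behaviour can be sketched with an in-memory stand-in for the container. This is not the Azure Cosmos DB SDK: `InMemoryContainer`, `Item`, and `executeBatch` are hypothetical names, used only to show that a batch scoped to one partition key either stores every item or none of them, which is the guarantee a Cosmos DB transactional batch provides within a single logical partition.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for a container; NOT the Azure Cosmos DB SDK. */
final class InMemoryContainer {
    record Item(String id, String partitionKey, String type) {}

    private final Map<String, Item> itemsById = new HashMap<>();

    /** Writes all items or none. Every item must belong to the given partition. */
    synchronized void executeBatch(String partitionKey, List<Item> items) {
        Map<String, Item> staged = new HashMap<>();
        for (Item item : items) {
            if (!partitionKey.equals(item.partitionKey())) {
                throw new IllegalArgumentException(
                        "item " + item.id() + " is outside partition " + partitionKey);
            }
            if (itemsById.containsKey(item.id()) || staged.containsKey(item.id())) {
                throw new IllegalStateException("duplicate id " + item.id());
            }
            staged.put(item.id(), item);
        }
        // Commit only after every item has been validated, so a failure leaves no partial write.
        itemsById.putAll(staged);
    }

    synchronized int size() {
        return itemsById.size();
    }
}
```

If the outbox item is rejected, the booking item is not stored either, so there is never a booking without its event or an event without its booking.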

6. Configuration and Scalability:

  • App Service can be scaled out horizontally very quickly and also benefits from auto-scaling options.
  • The Function App uses the serverless approach, which means that Azure scales it automatically according to demand.

7. Monitoring and Error Handling:

  • The health of the Azure Function App, App Service, and Cosmos DB is monitored with built-in Azure Monitoring, which provides graphs and metrics without any configuration. Monitoring would include:
    • Detecting HTTP status codes ≥ 500 and < 600 and creating alerts
    • Detecting high CPU usage and creating an alert

Pros of the Outbox Pattern

  • Atomicity. The booking and its outbox entry are persisted together, which ensures data consistency and reliability.
  • Improved User Experience. Immediate acknowledgments keep users informed.
  • Reliable Processing. The Outbox table update and the booking table update take place in the same transaction. If either of the two fails, the transaction is rolled back, so no notification is sent for a booking that was not saved.
  • Scalability. The system efficiently handles high concurrency.

Cons of the Outbox Pattern

  • Complexity. Introducing the pattern requires thoughtful design and implementation.
  • Potential Duplicate Messages. The same event may be published more than once, for example when publishing succeeds but the status update that follows fails. This is why the consuming service must be idempotent.
  • Order of Messages. The order of processed messages may differ from the order of ClientApp calls. Systems consuming these events should be designed to handle out-of-order messages gracefully to maintain data consistency.
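One way a consumer can cope with both duplicates and out-of-order delivery is sketched below. It assumes that every event carries a unique event ID and a per-booking version number that increases with each status change; neither field is described in the implementation above, and `BookingStatusConsumer` is a hypothetical class.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class BookingStatusConsumer {
    /** Assumed event shape: unique eventId plus a per-booking, increasing version. */
    record Event(String eventId, String bookingId, long version, String status) {}

    private final Set<String> seenEventIds = new HashSet<>();
    private final Map<String, Long> latestVersion = new HashMap<>();
    private final Map<String, String> currentStatus = new HashMap<>();

    /** Returns true if the event changed state, false if it was a duplicate or stale. */
    boolean handle(Event event) {
        if (!seenEventIds.add(event.eventId())) {
            return false; // duplicate delivery: already processed
        }
        long known = latestVersion.getOrDefault(event.bookingId(), -1L);
        if (event.version() <= known) {
            return false; // arrived late: a newer status has already been applied
        }
        latestVersion.put(event.bookingId(), event.version());
        currentStatus.put(event.bookingId(), event.status());
        return true;
    }

    String statusOf(String bookingId) {
        return currentStatus.get(bookingId);
    }
}
```

In production the set of seen IDs and the version map would live in durable storage rather than in memory, so that the consumer stays idempotent across restarts.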

In Conclusion

The Outbox Pattern, when implemented using Azure services, provides a powerful solution to the complex challenge of maintaining data consistency and reliability in microservices and distributed systems. It's a strategic investment in user satisfaction, compliance, and long-term success.

