Maintaining Data Consistency Across Microservices

When microservice applications are built as a set of modular components, they are easier to understand, simpler to test and easier to maintain over the life of the application. They enable organizations to be more agile and to shorten the time it takes to get working enhancements into production.

On the downside, this architecture brings a number of challenges: each service has its own database, so whenever a business transaction spans multiple services you need a mechanism to ensure data consistency across them. For example, if you are building a web store application where customers have a credit limit, the application must ensure that a new order will not exceed the customer’s credit limit. Since orders and customers are in different databases, the application cannot simply use a local ACID transaction.

In this article, let’s look at the available options to implement data consistency across microservices.

Compensating Transactions Pattern

In this approach, each business transaction that spans multiple services is implemented as a sequence of local transactions. Each local transaction updates its own database and publishes a message or event to trigger the next local transaction. If a local transaction fails because it violates a business rule, a series of compensating transactions is executed to undo the changes made by the preceding local transactions.

There are two ways to do it:

  • For each local transaction, publish domain events that trigger local transactions in the other services. For example, if you’re building a store application that uses this approach, the flow would look like this (see the first sketch after this list):
  1. The Order Service creates an Order in a pending state and publishes an ORDER_CREATED event
  2. The Customer Service receives the ORDER_CREATED event and attempts to reserve credit for that Order. It publishes either a CREDIT_RESERVED event or a CREDIT_LIMIT_EXHAUSTED event.
  3. The Order Service receives the event and changes the state of the order to either approved or canceled
  • Use an orchestrator service that tells the participants which local transactions to execute. For the same store application, the flow would be (see the second sketch after this list):
  1. The Order Service creates an Order in a pending state and creates a CreateOrderOrchestration
  2. The CreateOrderOrchestration sends a ReserveCredit command to the Customer Service
  3. The Customer Service attempts to reserve credit for that Order and sends back a reply
  4. The CreateOrderOrchestration receives the reply and sends either an ApproveOrder or RejectOrder command to the Order Service
  5. The Order Service changes the state of the order to either approved or canceled
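
To make the choreography (event-driven) variant concrete, here is a minimal, self-contained Python sketch. The in-memory EventBus, the service classes and the method names are illustrative assumptions rather than part of any particular framework; in a real system the bus would be a message broker and each service would own its own database.

```python
# Minimal sketch of the choreography (event-driven) variant described above.
# The in-memory EventBus and the service/event names are illustrative only.

from collections import defaultdict


class EventBus:
    """Tiny synchronous pub/sub bus standing in for a real message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self.handlers[event_type]:
            handler(payload)


class OrderService:
    def __init__(self, bus):
        self.bus = bus
        self.orders = {}
        bus.subscribe("CREDIT_RESERVED", self.approve)
        bus.subscribe("CREDIT_LIMIT_EXHAUSTED", self.cancel)

    def create_order(self, order_id, customer_id, amount):
        # Local transaction 1: create the order in a pending state, then publish.
        self.orders[order_id] = {"customer": customer_id, "amount": amount, "state": "PENDING"}
        self.bus.publish("ORDER_CREATED",
                         {"order_id": order_id, "customer_id": customer_id, "amount": amount})

    def approve(self, event):
        self.orders[event["order_id"]]["state"] = "APPROVED"

    def cancel(self, event):
        self.orders[event["order_id"]]["state"] = "CANCELED"


class CustomerService:
    def __init__(self, bus, credit_limits):
        self.bus = bus
        self.available_credit = dict(credit_limits)
        bus.subscribe("ORDER_CREATED", self.reserve_credit)

    def reserve_credit(self, event):
        # Local transaction 2: reserve credit if possible and publish the outcome.
        customer, amount = event["customer_id"], event["amount"]
        if self.available_credit.get(customer, 0) >= amount:
            self.available_credit[customer] -= amount
            self.bus.publish("CREDIT_RESERVED", {"order_id": event["order_id"]})
        else:
            self.bus.publish("CREDIT_LIMIT_EXHAUSTED", {"order_id": event["order_id"]})


bus = EventBus()
orders = OrderService(bus)
customers = CustomerService(bus, {"alice": 100})
orders.create_order("o1", "alice", 80)  # approved: 80 <= 100
orders.create_order("o2", "alice", 50)  # canceled: only 20 credit remains
print(orders.orders)
```

Because each service reacts only to events, there is no central coordinator; the trade-off is that the overall workflow is implicit and spread across the services.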
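
For comparison, here is an equally simplified sketch of the orchestration variant under the same assumptions. The CreateOrderOrchestration class, command names and replies are hypothetical; in practice the orchestrator would send commands and receive replies over asynchronous messaging rather than direct method calls.

```python
# Minimal sketch of the orchestration variant described above.
# Direct method calls stand in for asynchronous command/reply messaging.


class CustomerService:
    def __init__(self, credit_limits):
        self.available_credit = dict(credit_limits)

    def reserve_credit(self, customer_id, amount):
        # Local transaction: reserve credit and reply with the outcome.
        if self.available_credit.get(customer_id, 0) >= amount:
            self.available_credit[customer_id] -= amount
            return "CREDIT_RESERVED"
        return "CREDIT_LIMIT_EXHAUSTED"


class OrderService:
    def __init__(self):
        self.orders = {}

    def create_order(self, order_id, customer_id, amount):
        self.orders[order_id] = {"customer": customer_id, "amount": amount, "state": "PENDING"}

    def approve_order(self, order_id):
        self.orders[order_id]["state"] = "APPROVED"

    def reject_order(self, order_id):
        self.orders[order_id]["state"] = "CANCELED"


class CreateOrderOrchestration:
    """Tells each participant which local transaction to execute next."""

    def __init__(self, order_service, customer_service):
        self.order_service = order_service
        self.customer_service = customer_service

    def run(self, order_id, customer_id, amount):
        self.order_service.create_order(order_id, customer_id, amount)     # step 1
        reply = self.customer_service.reserve_credit(customer_id, amount)  # steps 2-3
        if reply == "CREDIT_RESERVED":                                     # step 4
            self.order_service.approve_order(order_id)                     # step 5
        else:
            self.order_service.reject_order(order_id)


orders = OrderService()
customers = CustomerService({"alice": 100})
saga = CreateOrderOrchestration(orders, customers)
saga.run("o1", "alice", 80)
saga.run("o2", "alice", 50)
print(orders.orders)  # o1 approved, o2 canceled
```

The workflow is now explicit in one place, at the cost of an extra component that must itself be kept highly available.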

Issues and Considerations

  • Compensation logic is not easily generalized. A compensating transaction is application-specific; it relies on the application having sufficient information to be able to undo the effects of each step in a failed operation.
  • All the steps in a compensating transaction should be defined as idempotent commands. This enables the steps to be repeated even if the compensating transaction itself fails.
  • It can be difficult to determine when a step that implements eventual consistency has failed; a step might not fail immediately, but could instead block or time out. It may be necessary to implement some form of time-out mechanism.
  • A compensating transaction does not necessarily return the data in the system to the state it was in at the start of the original operation. Instead, it compensates for the work performed by the steps that completed successfully before the operation failed.
  • The order of the steps in the compensating transaction does not necessarily have to be the mirror opposite of the steps in the original operation. For example, one data store may be more sensitive to inconsistencies than another, and so the steps in the compensating transaction that undo the changes to this store should occur first.
  • Plan to use retry logic. For example, if a step in an operation that implements eventual consistency fails, try handling the failure as a transient exception and repeat the step. A sketch of an idempotent, retried compensating step follows this list.
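
As a concrete illustration of the last three points, the following sketch shows a compensating step written as an idempotent command and wrapped in simple retry logic. The release_credit operation, TransientError and the bookkeeping set are hypothetical names used only for this example.

```python
# Sketch of an idempotent compensating step with retry on transient failure.
# release_credit, TransientError and the bookkeeping set are illustrative only.

import time


class TransientError(Exception):
    """Stands in for a timeout or other retriable failure."""


class CreditCompensation:
    def __init__(self, customer_service):
        self.customer_service = customer_service
        self.completed = set()  # remembers which compensations have already run

    def release_credit(self, order_id, customer_id, amount):
        # Idempotent: running the same compensation twice has no additional effect.
        if order_id in self.completed:
            return
        self.customer_service.available_credit[customer_id] += amount
        self.completed.add(order_id)


def run_with_retry(step, attempts=3, delay_seconds=1.0):
    """Run a compensating step, treating each failure as transient and retrying."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == attempts:
                raise                  # give up and escalate, e.g. alert an operator
            time.sleep(delay_seconds)  # back off before the next attempt


# Usage: run_with_retry(lambda: compensation.release_credit("o2", "alice", 50))
```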

In the next section, we look at eBay’s new protocol to address the above issues.

GRIT: A Protocol for Distributed Transactions

eBay has developed the GRIT protocol for achieving globally consistent distributed transactions across microservices with multiple underlying databases.

The following diagram illustrates the GRIT protocol in a microservices application with two databases:

Figure 1: GRIT Protocol

To understand the protocol better, let’s look at the key components that make up globally consistent distributed transactions.

Key Components

  1. Global Transaction Manager (GTM): It coordinates global transactions across multiple databases. There can be one or more GTMs.
  2. Global Transaction Log (GTL): It represents the transaction request queue for a GTM. The order of transaction requests in a GTL determines the relative order among global transactions.
  3. Database Transaction Manager (DBTM): The transaction manager at each database realm. It performs conflict checking and resolution; that is, the local commit decision is made here.
  4. Database Transaction Log (DBTL): The transaction log at each database realm that logs logically committed transactions that relate to this database (including single database transactions and multi-database transactions). The order of transactions in a DBTL determines the order of the whole database system, including the global order dictated by the GTM. A log sequence number (LSN) is assigned to each log entry.
  5. LogPlayer: This component sends log entries, in sequence, to the backend storage servers for them to apply the updates. Each DB server applies log entries of logically committed transactions in order. A simplified sketch of the DBTL and LogPlayer follows this list.
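
The sketch below models only the DBTL and LogPlayer roles, to show the split between logically committing a transaction (appending its write-set to the log) and physically applying it (replaying the log against the backend store). GRIT is an internal eBay protocol rather than a public library, so these classes are assumptions made purely for illustration.

```python
# Simplified model of the DBTL and LogPlayer roles described above.
# GRIT is an eBay protocol, not a public library; these classes are illustrative.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogEntry:
    lsn: int                   # log sequence number assigned when the entry is appended
    tx_id: str
    write_set: Dict[str, str]  # key -> new value to apply


class DatabaseTransactionLog:
    """DBTL: append-only log of logically committed transactions for one database."""

    def __init__(self):
        self.entries: List[LogEntry] = []

    def append(self, tx_id: str, write_set: Dict[str, str]) -> LogEntry:
        entry = LogEntry(lsn=len(self.entries) + 1, tx_id=tx_id, write_set=write_set)
        self.entries.append(entry)  # the transaction is now logically committed
        return entry


class LogPlayer:
    """Sends logged entries to the backend store and applies them in LSN order."""

    def __init__(self, dbtl: DatabaseTransactionLog, store: Dict[str, str]):
        self.dbtl = dbtl
        self.store = store
        self.applied_lsn = 0

    def play(self):
        for entry in self.dbtl.entries[self.applied_lsn:]:
            self.store.update(entry.write_set)  # physical apply happens here
            self.applied_lsn = entry.lsn


# Appending marks the logical commit; play() later performs the physical apply.
dbtl = DatabaseTransactionLog()
store: Dict[str, str] = {}
dbtl.append("tx-1", {"order:o1": "APPROVED"})
LogPlayer(dbtl, store).play()
print(store)  # {'order:o1': 'APPROVED'}
```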

Now that we understand the components, let’s look at the steps of a distributed transaction.

Steps for a Distributed Transaction

The following diagram shows the main steps of a distributed transaction:

Figure 2: Steps for GRIT distributed transaction

According to GRIT, a distributed transaction goes through three phases:

  1. Optimistic execution (steps 1-4): As the application is executing the business logic via microservices, the database services capture the read-set and write-set of the transaction. No actual data modification occurs in this phase.
  2. Logical commit (steps 5-11): Once the application requests the transaction commit, the read-set and write-set captured at each database service are submitted to its DBTM. The DBTM uses the read-set and write-set for conflict checking to reach a local commit decision. The GTM makes the global commit decision after collecting all the local decisions of the DBTMs for the transaction. A transaction is logically committed once its write-sets are persisted in the log stores (DBTLs) of the databases involved. This involves minimal coordination between the GTM and the DBTMs. A sketch of this decision flow follows this list.
  3. Physical apply (steps 12-13): The log players asynchronously send DBTL entries to backend storage servers. The data modification occurs at this phase.
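
The sketch below illustrates how the logical-commit decision in phase 2 could work, using a version-per-key optimistic conflict check. This checking scheme is an assumption chosen for clarity, not a description of GRIT's actual implementation.

```python
# Sketch of the commit decision in phase 2 using optimistic conflict checking.
# The version-per-key scheme is an assumption for illustration, not GRIT's actual code.

from typing import Dict, List


class DBTM:
    """Database Transaction Manager: decides one database's part of the transaction."""

    def __init__(self):
        self.versions: Dict[str, int] = {}  # key -> currently committed version

    def local_decision(self, read_set: Dict[str, int]) -> bool:
        # Conflict check: every key must still be at the version the transaction
        # observed during its optimistic execution phase.
        return all(self.versions.get(key, 0) == version for key, version in read_set.items())

    def logically_commit(self, write_set: Dict[str, str]):
        # Bump versions; in the full protocol the write-set would also be
        # appended to this database's DBTL at this point.
        for key in write_set:
            self.versions[key] = self.versions.get(key, 0) + 1


class GTM:
    """Global Transaction Manager: combines the local decisions into a global one."""

    def global_decision(self, local_decisions: List[bool]) -> bool:
        return all(local_decisions)


# Usage: two databases participate in one global transaction.
orders_dbtm, customers_dbtm = DBTM(), DBTM()
read_sets = [{"order:o1": 0}, {"customer:alice": 0}]  # versions captured during execution
decisions = [orders_dbtm.local_decision(read_sets[0]),
             customers_dbtm.local_decision(read_sets[1])]
if GTM().global_decision(decisions):
    orders_dbtm.logically_commit({"order:o1": "APPROVED"})
    customers_dbtm.logically_commit({"customer:alice": "credit-80"})
```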

Issues and Considerations

  • GRIT is able to achieve consistently high throughput and serializable distributed transactions for applications invoking microservices, with minimal coordination.
  • GRIT fits well for transactions with few conflicts, and it provides a critical capability for applications that would otherwise need complex mechanisms to achieve consistent transactions across microservices with multiple underlying databases.

Conclusion

There is no single approach that fits all scenarios. Cloud applications typically use data that is dispersed across data stores, and managing and maintaining data consistency in this environment can become a critical aspect of the system, particularly with respect to the concurrency and availability issues that can arise. You often need to trade strong consistency for availability, which means you may need to design some aspects of your applications around the notion of eventual consistency and accept that the data your applications use might not be completely consistent all of the time.

Karthikeyan Shanmugam

Karthikeyan Shanmugam (Karthik) is an experienced solutions professional with 20+ years of experience across various industries. He is also a contributing author on various technology platforms.
