While working on a recent project at my company, I ran into some minor data management problems. Drawing on past experience, I have recorded the data design patterns for microservices here.

Services in a microservice architecture are loosely coupled and can be developed, deployed, and scaled independently. Each microservice requires different types of data and storage, and because of this each microservice has its own database.

1. Database for each service

Each microservice has its own database and can freely choose how to manage the data.

1.1 Benefits of having a database per service

Loose coupling - each service can focus on its own business domain.
Free choice of database type - for example, an RDBMS such as MySQL, a wide-column database such as Cassandra, a document database such as MongoDB, a key-value store such as Redis, or a graph database such as Neo4j.

Do I need to use a different database server for each service? This is not a hard requirement. Let's see what we can do.

1.2 If you are using an RDBMS, you have the following options:

Private tables - Each service owns a set of tables that can only be accessed by that service.
Dedicated database schema - Each service has a private database schema.
Dedicated database server - Each service has its own database server.

1.3 The challenge of having a database for every service

Queries that need to join data from multiple databases - the following patterns can overcome this challenge:

Event Sourcing
API composition
Command Query Responsibility Segregation (CQRS)

Transactions across multiple databases - To solve this problem, we can use the Saga pattern.

2. Event Sourcing

With event sourcing, the state of a business entity is tracked by a series of state-changing events. Whenever the state of a business entity changes, a new event is added to the event list. Since saving an event is a single operation, it is atomic in nature. By replaying events, the application reconstructs the current state of the entity.

Applications save events in an event store, which is a database of events. Events are added to and retrieved from the store through its API. The event store also acts as a message broker: services can subscribe to events through its API, and when a service saves an event in the event store, the event is delivered to all interested subscribers. When an entity has a large number of events, the application can periodically save a snapshot of the entity's current state to optimize loading. To reconstruct the current state, the application looks up the most recent snapshot and replays only the events that have occurred since that snapshot. This reduces the number of events to replay.
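The replay-and-snapshot mechanism described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real event store: the `Event`, `EventStore`, and `apply_event` names, the bank-account entity, and the "deposited"/"withdrawn" event types are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    entity_id: str
    kind: str    # hypothetical event types: "deposited" or "withdrawn"
    amount: int


def apply_event(balance, event):
    """Apply one state-changing event to the current balance."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


@dataclass
class EventStore:
    """In-memory stand-in for an event store with subscribers and snapshots."""
    events: dict = field(default_factory=dict)      # entity_id -> list of events
    snapshots: dict = field(default_factory=dict)   # entity_id -> (version, balance)
    subscribers: list = field(default_factory=list)

    def append(self, event):
        # Saving an event is a single append; subscribers are then notified.
        self.events.setdefault(event.entity_id, []).append(event)
        for callback in self.subscribers:
            callback(event)

    def load(self, entity_id):
        # Start from the latest snapshot (if any) and replay only newer events.
        version, balance = self.snapshots.get(entity_id, (0, 0))
        for event in self.events.get(entity_id, [])[version:]:
            balance = apply_event(balance, event)
        return balance

    def save_snapshot(self, entity_id):
        # Record how many events the snapshot covers, plus the state at that point.
        version = len(self.events.get(entity_id, []))
        self.snapshots[entity_id] = (version, self.load(entity_id))
```

After a snapshot is saved, `load` skips every event the snapshot already covers, which is the optimization the paragraph describes.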

2.1 Benefits of Event Sourcing

It solves one of the key challenges of event-driven architecture: reliably publishing an event whenever state changes.
Object-relational impedance mismatch problems are avoided by persisting events instead of domain objects.
Provides 100% reliable audit logs for entities.
Allows execution of temporal queries that determine the state of an entity at any point in time.
Business logic based on event sourcing consists of loosely coupled entities that exchange events, which makes migrating from a monolithic application to a microservice architecture much easier.
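The temporal-query benefit listed above follows directly from keeping the full event history: replaying events only up to a chosen moment yields the state at that moment. The sketch below assumes a hypothetical order entity whose events are stored as (timestamp, new status) pairs in chronological order; `ORDER_EVENTS` and `status_at` are illustrative names.

```python
from datetime import datetime

# Hypothetical event log for one order, sorted by timestamp.
ORDER_EVENTS = [
    (datetime(2024, 1, 1, 9, 0), "CREATED"),
    (datetime(2024, 1, 1, 9, 5), "PAID"),
    (datetime(2024, 1, 2, 14, 0), "SHIPPED"),
]


def status_at(events, as_of):
    """Replay events up to and including `as_of`; return None if none apply."""
    status = None
    for timestamp, new_status in events:
        if timestamp > as_of:
            break  # events are sorted, so everything after this is in the future
        status = new_status
    return status
```

For example, `status_at(ORDER_EVENTS, datetime(2024, 1, 1, 12, 0))` replays the first two events only.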

2.2 Disadvantages of Event Sourcing

It has a learning curve, and it is still a relatively immature approach.
Querying the event store is difficult, because a typical query has to reconstruct entity state, which can lead to inefficient and complex queries. Applications therefore typically use Command Query Responsibility Segregation (CQRS) to implement queries, which in turn means they must deal with eventually consistent data.

3. API Composition

You can use the API composition pattern to implement query operations that retrieve data from multiple services. In this pattern, a query operation is implemented by invoking the services that own the data and then combining the results.
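As a rough illustration, a composer can call each owning service and merge the responses in memory. In the sketch below the two services are replaced by plain functions over in-memory dictionaries; `fetch_customer`, `fetch_orders`, `compose_customer_orders`, and the sample data are assumptions for the example, and a real composer would make network calls instead.

```python
# Hypothetical data owned by two separate services.
CUSTOMERS = {"c1": {"name": "Alice"}}
ORDERS = [
    {"order_id": "o1", "customer_id": "c1", "total": 40},
    {"order_id": "o2", "customer_id": "c1", "total": 25},
    {"order_id": "o3", "customer_id": "c2", "total": 10},
]


def fetch_customer(customer_id):
    """Stand-in for a call to the customer service."""
    return CUSTOMERS[customer_id]


def fetch_orders(customer_id):
    """Stand-in for a call to the order service."""
    return [o for o in ORDERS if o["customer_id"] == customer_id]


def compose_customer_orders(customer_id):
    """API composer: query each owning service, then join the results in memory."""
    customer = fetch_customer(customer_id)
    orders = fetch_orders(customer_id)
    return {
        "customer_id": customer_id,
        "name": customer["name"],
        "order_ids": [o["order_id"] for o in orders],
        "total_spent": sum(o["total"] for o in orders),
    }
```

The join happens inside the composer, which is why this approach gets expensive when the services return large result sets.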

3.1 Benefits of API composition

A convenient way to query data in a microservice architecture.

3.2 Disadvantages of API composition

Some queries result in inefficient in-memory joins of large datasets.

4. Command Query Responsibility Segregation (CQRS)

An RDBMS is commonly used as the transactional system of record, alongside a text search database such as Elasticsearch or Solr for text search queries. Some applications keep the two databases in sync by writing to both at the same time. Others periodically copy data from the RDBMS to the text search engine. Applications built this way get the best of both: the transactional properties of the RDBMS and the query capabilities of the text search database. CQRS generalizes this architecture.

Microservice architectures face three common challenges when implementing queries.

Using the API composition pattern to retrieve data scattered across multiple services can result in costly and inefficient in-memory joins.
The data is stored in a format or database that cannot efficiently support the queries required by the service owning the data.
Separation of concerns means that the service that owns the data should not be responsible for implementing query operations.

All three problems can be solved by using the CQRS pattern.

The main goal of CQRS is segregation, or separation of concerns. Therefore, the persistent data model is divided into two parts: the command side and the query side.

Create, update, and delete operations are implemented by command-side modules and data models. Queries are implemented by query-side modules and data models. By subscribing to events published by the command side, the query side keeps its data model in sync with the command side.
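The split between the two sides can be sketched as follows. This is a simplified, single-process illustration: `EventBus`, `OrderCommandService`, and `CustomerSpendView` are hypothetical names, and the bus delivers events synchronously, whereas a real deployment would usually deliver them asynchronously (which is where eventual consistency comes from).

```python
class EventBus:
    """Minimal synchronous publish/subscribe channel."""
    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def publish(self, event):
        for handler in self._handlers:
            handler(event)


class OrderCommandService:
    """Command side: handles create/cancel and publishes an event per change."""
    def __init__(self, bus):
        self._bus = bus
        self._orders = {}

    def create_order(self, order_id, customer_id, total):
        self._orders[order_id] = {"customer_id": customer_id, "total": total, "status": "OPEN"}
        self._bus.publish({"type": "OrderCreated", "customer_id": customer_id, "total": total})

    def cancel_order(self, order_id):
        order = self._orders[order_id]
        if order["status"] == "CANCELLED":
            return  # already cancelled, publish nothing
        order["status"] = "CANCELLED"
        self._bus.publish({"type": "OrderCancelled", "customer_id": order["customer_id"],
                           "total": order["total"]})


class CustomerSpendView:
    """Query side: a denormalized view kept up to date from command-side events."""
    def __init__(self, bus):
        self._totals = {}
        bus.subscribe(self.handle)

    def handle(self, event):
        customer_id = event["customer_id"]
        current = self._totals.get(customer_id, 0)
        if event["type"] == "OrderCreated":
            self._totals[customer_id] = current + event["total"]
        elif event["type"] == "OrderCancelled":
            self._totals[customer_id] = current - event["total"]

    def total_for(self, customer_id):
        return self._totals.get(customer_id, 0)
```

Queries such as `total_for` never touch the command-side data model; they read only the view, which is shaped for exactly that query.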

4.1 Benefits of CQRS

Efficient queries - If you implement queries using the API composition pattern, you may end up with costly, inefficient in-memory joins of large datasets. For these queries, it is more efficient to use CQRS views that pre-join data from two or more services.
Efficient support for many types of queries - It is often difficult to support all queries with a single persistent data model. In CQRS, one or more views are defined, each implementing a specific query efficiently, which removes the limitation of a single data store.
Querying in applications based on Event Sourcing - CQRS also overcomes an important limitation of Event Sourcing. The event store only supports queries based on primary keys. The CQRS pattern addresses this limitation by defining one or more views of the aggregates, which are kept up to date by subscribing to the stream of events published by the event-sourced aggregates.
Improved separation of concerns - The domain model and its persistent data model no longer have to handle both commands and queries. CQRS separates the command and query sides of the service into separate code modules and database schemas.

4.2 Disadvantages of CQRS

More complex architecture —in order to update and query views, developers need to write query-side services. Applications may use different types of databases, adding complexity for developers and DevOps.
Handling replication lag - There is a delay between publishing an event on the command side and processing the event by the query side and updating the view.

5. Saga pattern

Using sagas, you can maintain data consistency in a microservice architecture without using distributed transactions. You define a saga for each command that updates data across multiple services. A saga is a sequence of local transactions. Each local transaction updates data within a single service using ordinary ACID transactions.

Sagas use compensating transactions to roll back changes. Suppose the nth transaction of a saga fails. The first (n-1) transactions have already committed and must be undone. As a result, a total of (n-1) compensating transactions are executed, in reverse order, to roll back the changes.
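The rollback rule can be illustrated with a small runner. In this sketch, each saga step is assumed to be a pair of zero-argument callables, a local transaction and its compensating transaction, and a step signals failure by raising an exception; `run_saga` is a hypothetical helper, not part of any library.

```python
def run_saga(steps):
    """Run local transactions in order; on failure, compensate in reverse order.

    `steps` is a list of (action, compensation) pairs of zero-argument callables.
    Returns True if every step succeeded, False if the saga was rolled back.
    """
    compensations = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Step n failed: undo the n-1 completed steps, most recent first.
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensation)
    return True
```

If the third of three steps raises, the compensations for the second and then the first step are executed, and the failed step itself needs no compensation because its local transaction never committed.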

5.1 Saga coordination

A saga needs logic to coordinate its steps. Once a saga has been started by a system command, the coordination logic must select the first saga participant and instruct it to execute a local transaction. Once that transaction is complete, the coordination logic selects and invokes the next saga participant. This process continues until the saga is complete. If a local transaction fails, the saga must execute compensating transactions in reverse order.

5.2 There are two ways to build the coordination logic of a saga:

Choreography: decision making and sequencing are distributed among the participants in the saga. They communicate primarily by exchanging events.
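Event-based coordination of this kind, usually called choreography, can be sketched with two participants that react to each other's events. Everything here is illustrative: the `EventBus` class, the order and customer services, the event names (`OrderCreated`, `CreditReserved`, `CreditRejected`), and the credit-check rule are assumptions for the example, and delivery is synchronous for simplicity.

```python
from collections import defaultdict


class EventBus:
    """Minimal synchronous bus with per-event-type subscriptions."""
    def __init__(self):
        self._handlers = defaultdict(list)
        self.log = []  # event types in publication order, for inspection

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(payload)


def wire_customer_service(bus, credit):
    """Customer service: reacts to OrderCreated by reserving credit or rejecting."""
    def on_order_created(payload):
        customer_id = payload["customer_id"]
        if credit.get(customer_id, 0) >= payload["total"]:
            credit[customer_id] -= payload["total"]
            bus.publish("CreditReserved", payload)
        else:
            bus.publish("CreditRejected", payload)
    bus.subscribe("OrderCreated", on_order_created)


def wire_order_service(bus, orders):
    """Order service: reacts to the customer service's events."""
    def on_credit_reserved(payload):
        orders[payload["order_id"]] = "APPROVED"

    def on_credit_rejected(payload):
        orders[payload["order_id"]] = "REJECTED"  # compensating action
    bus.subscribe("CreditReserved", on_credit_reserved)
    bus.subscribe("CreditRejected", on_credit_rejected)


def create_order(bus, orders, order_id, customer_id, total):
    """First local transaction of the saga: create a pending order, publish an event."""
    orders[order_id] = "PENDING"
    bus.publish("OrderCreated", {"order_id": order_id, "customer_id": customer_id, "total": total})
```

No component holds the whole sequence: each participant only knows which events it listens for and which it publishes.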

5.2.1 Advantages of choreography-based and orchestration-based sagas

Simplicity (choreography) - Services publish events when business objects are created, updated, or deleted.
Simple dependencies (orchestration) - The orchestrator depends on the participants but not the other way around, so no circular dependencies are introduced.
Loose coupling (orchestration) - Each service implements an API called by the orchestrator, so it does not need to be aware of events published by saga participants.
Simplified business logic (orchestration) - The saga coordination logic is localized in the saga orchestrator. Domain objects are unaware of the sagas they are involved in.

5.2.2 Disadvantages of choreography-based sagas

Harder to understand - Choreography distributes the implementation of a saga among the services, so no single place defines it, and developers may need to understand every service to follow how the saga works.
Circular dependencies between services - Saga participants subscribe to each other's events, which often creates circular dependencies.
Risk of tight coupling - Each participant in a saga must subscribe to all events that affect it.

Orchestration: the coordination logic of a saga is centralized in a saga orchestrator class. During a saga, the orchestrator sends command messages to the participants, telling them what actions they should perform.
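A centralized coordinator of this kind might look like the sketch below. It is only an illustration under stated assumptions: `Participant` is a hypothetical stand-in for a remote service that receives command messages and reports success or failure, and the command names (`create_pending_order`, `reserve_credit`, and so on) are invented for the example.

```python
class Participant:
    """Hypothetical saga participant that records the commands it receives."""
    def __init__(self, name, failing_commands=()):
        self.name = name
        self.failing = set(failing_commands)
        self.received = []

    def handle(self, command):
        # Return True if the local transaction succeeded, False otherwise.
        self.received.append(command)
        return command not in self.failing


class CreateOrderSagaOrchestrator:
    """Holds the whole saga definition and tells each participant what to do."""
    def __init__(self, order_service, customer_service):
        # (participant, command, compensating command or None)
        self.steps = [
            (order_service, "create_pending_order", "reject_order"),
            (customer_service, "reserve_credit", "release_credit"),
            (order_service, "approve_order", None),
        ]

    def run(self):
        completed = []
        for participant, command, compensation in self.steps:
            if not participant.handle(command):
                # Send compensating commands for completed steps, newest first.
                for done_participant, undo in reversed(completed):
                    done_participant.handle(undo)
                return "ROLLED_BACK"
            if compensation is not None:
                completed.append((participant, compensation))
        return "COMPLETED"
```

The participants never reference each other; the step list inside the orchestrator is the single place where the order of the saga is defined.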

Data Design Patterns for Microservice Architecture