Data Grid Architecture Patterns

Enterprise data grids are changing the way businesses manage data. What are the architectural patterns underneath them?

Enterprise data grids (labeled "data mesh" in the figures below) are emerging as a compelling way to manage data within an enterprise. They bring "product thinking" to enterprise data management while enabling new levels of agility and data governance. They create a "self-service" capability with near real-time data synchronization, laying the foundation for the real-time digital enterprise.
Alas, no single product will give you a data grid. Instead, an enterprise data grid is composed of many common components (see the next section, Data Grid Architecture Review).
The key to success is understanding how these components interact. In this article, I use architectural patterns to describe these interactions.

Data Grid Architecture Review


An enterprise data grid is made up of many components. Data products are the main building blocks of a data grid; they contain operational, analytical and/or engagement data that is synchronized across the organization through the grid. APIs provide access to the data in a data product. To support federated governance, each data product contains an audit log that records data changes and a catalog of the data it manages.
An enterprise data grid has many data products. Data products subscribe to each other's data, so that when one data product changes its data, the change is communicated to the others through change data capture and the event streaming backbone.
Finally, an enterprise data catalog (an aggregation of all data product catalogs and data changes) makes it easy for any user or developer to find, use, and manage any data across the enterprise, and provides the basis for understanding data lineage across the enterprise.

Figure 1, Enterprise Data Mesh Architecture
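The building blocks described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the `DataProduct` class, its method names, and the direct subscriber callbacks are all hypothetical. In a real data grid the change notification would travel through change data capture and the event streaming backbone rather than an in-process callback.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DataProduct:
    """Hypothetical data product: data, an access API, an audit log, a catalog."""
    name: str
    records: dict[str, dict[str, Any]] = field(default_factory=dict)
    audit_log: list[dict[str, Any]] = field(default_factory=list)
    subscribers: list[Callable[[dict[str, Any]], None]] = field(default_factory=list)

    def get(self, key: str) -> Any:
        """API read: return the record stored under key, or None."""
        return self.records.get(key)

    def put(self, key: str, value: dict[str, Any]) -> None:
        """API write: store the record, log the change, notify subscribers."""
        change = {"product": self.name, "key": key,
                  "before": self.records.get(key), "after": value}
        self.records[key] = value
        self.audit_log.append(change)
        for notify in self.subscribers:
            notify(change)

    def catalog(self) -> dict[str, Any]:
        """Describe the data this product manages (its metadata)."""
        fields = sorted({name for rec in self.records.values() for name in rec})
        return {"product": self.name, "fields": fields,
                "record_count": len(self.records)}


customers = DataProduct("customers")
orders = DataProduct("orders")

# The orders product subscribes to customer changes and keeps a local copy.
customers.subscribers.append(
    lambda change: orders.put("customer:" + change["key"], change["after"])
)
customers.put("c1", {"name": "Ada"})
```

After the last line, `orders` holds its own copy of the customer record, `customers.audit_log` has one entry, and `customers.catalog()` reports a single field called `name`.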

We will describe the following architectural patterns in this article:

  • Change Data Capture (CDC)

  • Event Streaming Backbone 

  • Data Product Catalog

  • Enterprise Data Product Catalog

  • Immutable Change / Audit Log

Data Grid Patterns: Change Data Capture 

Today, it is difficult to deliver data securely, reliably, and consistently across service and application boundaries. There are two common ways to address this challenge. The first is to synchronize updates across multiple databases using a protocol such as "two-phase commit" (2PC), but this approach is often complex and costly, and is usually reserved for situations where keeping multiple data sources in sync is absolutely critical.
The second is to update the primary database immediately and update the secondary database later, outside the scope of the original transaction. Problems arise when the gap between the two updates is longer than expected.
Change data capture (CDC) is the foundational component that enterprise data grids use to address this challenge. CDC works by capturing and publishing entries from the database's transaction log and, most importantly, it does so unobtrusively, outside the original transaction. This means CDC transparently captures changes to operational (or analytical) data without affecting the original application or transaction flow.

Figure 2, Data Mesh Pattern: Change Data Capture

But what does CDC do with a captured "event"? In an enterprise data grid, it publishes the event to the Event Streaming Backbone (the next pattern) for distribution across the enterprise.
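To make the mechanism concrete, here is a minimal sketch of log-based change capture. It assumes a toy in-memory `Database` whose commit path appends to a transaction log, and a hypothetical `ChangeDataCapture` reader that tails that log by log sequence number (`lsn`). Real CDC tools read the database engine's own log format; none of the names below come from a real product.

```python
from typing import Any, Callable


class Database:
    """Toy database: every commit also appends an entry to a transaction log."""

    def __init__(self) -> None:
        self.rows: dict[str, Any] = {}
        self.transaction_log: list[dict[str, Any]] = []

    def commit(self, table: str, key: str, value: Any) -> None:
        # The application only talks to the database; it knows nothing of CDC.
        self.rows[f"{table}/{key}"] = value
        self.transaction_log.append(
            {"lsn": len(self.transaction_log) + 1,
             "table": table, "key": key, "value": value}
        )


class ChangeDataCapture:
    """Reads new log entries outside the original transaction and publishes them."""

    def __init__(self, db: Database, publish: Callable[[dict[str, Any]], None]) -> None:
        self.db = db
        self.publish = publish
        self.last_lsn = 0  # position of the last log entry already published

    def poll(self) -> int:
        """Publish every entry newer than last_lsn; return how many were sent."""
        new_entries = [e for e in self.db.transaction_log if e["lsn"] > self.last_lsn]
        for entry in new_entries:
            self.publish({"type": "row_changed", **entry})
            self.last_lsn = entry["lsn"]
        return len(new_entries)


published: list[dict[str, Any]] = []
db = Database()
cdc = ChangeDataCapture(db, published.append)

db.commit("customers", "c1", {"name": "Ada"})
db.commit("customers", "c1", {"name": "Ada Lovelace"})

first_poll = cdc.poll()   # picks up both committed changes
second_poll = cdc.poll()  # nothing new since the previous poll
```

The point of the design is that `commit` never calls the CDC reader. Capture happens in a separate step, so the original transaction flow is unaffected, and the stored `last_lsn` keeps the reader from publishing the same change twice.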


Data Grid Patterns: Event Streaming Backbone


The Event Streaming Backbone distributes events across an enterprise data grid. Events usually come from applications, APIs, and, in our case, CDC. Most importantly, any published event can be consumed securely, reliably, and in near real time by any subscribing entity.

Figure 3, Data Mesh Pattern: Event Streaming Backbone

There are several core managed entities in the Event Streaming Backbone:

  • Events, defined by a JSON schema, are distributed across an enterprise data grid.

  • Topics are used to queue and distribute events throughout the enterprise. An enterprise data grid uses well-known topics that, much like queues, allow many entities to publish and consume events.

  • Producers publish events to topics. Producers in an enterprise data grid can be APIs, applications, or CDC.

  • Consumers consume events from topics. A consumer in an enterprise data grid can be any entity or application that subscribes to a topic and is notified when an event is available for processing.

  • Event stream processors can process events one at a time or aggregate them over time windows, enabling sophisticated analytics in an enterprise data grid.

  • The broker manages the above components to ensure secure and reliable communication of events across the enterprise data grid.
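The entities listed above can be sketched with a small in-memory broker. This is an illustration under stated assumptions, not a real streaming platform: the `Broker` class, the `count_per_window` helper, the topic name, and the `ts` (seconds) field on each event are all hypothetical, and the sketch skips the schema validation, persistence, and security that a production broker provides.

```python
from collections import defaultdict
from typing import Any, Callable

Event = dict[str, Any]


class Broker:
    """Toy broker: keeps events per topic and notifies that topic's consumers."""

    def __init__(self) -> None:
        self.topics: dict[str, list[Event]] = defaultdict(list)
        self.consumers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        """Register a consumer for a topic."""
        self.consumers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        """Called by producers: retain the event, then deliver it to consumers."""
        self.topics[topic].append(event)
        for handler in self.consumers[topic]:
            handler(event)


def count_per_window(events: list[Event], window_seconds: int) -> dict[int, int]:
    """Stream processing sketch: count events in fixed (tumbling) time windows."""
    counts: dict[int, int] = defaultdict(int)
    for event in events:
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[window_start] += 1
    return dict(counts)


broker = Broker()
seen_by_billing: list[Event] = []
seen_by_analytics: list[Event] = []
broker.subscribe("customer-changes", seen_by_billing.append)
broker.subscribe("customer-changes", seen_by_analytics.append)

# A producer (for example CDC) publishes three events at t = 0 s, 10 s and 65 s.
for ts in (0, 10, 65):
    broker.publish("customer-changes", {"ts": ts, "key": "c1"})

windows = count_per_window(broker.topics["customer-changes"], 60)
# windows == {0: 2, 60: 1}
```

Both consumers receive all three events, which is what distinguishes a topic from a point-to-point queue. The windowed count shows the kind of aggregation an event stream processor performs.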


Data Grid Patterns: Data Product Catalog


Data, they say, is the new gold, and mining it will bring great insight and wealth. But in most enterprises today, data is spread across many groups in the organization. Sales owns the customer data, distribution owns the supply chain data, and finance owns the transactions and accounts.
Unfortunately, this makes it very difficult to find the data and, once found, even more difficult to bring it together to make comprehensive business decisions. The result: slow, costly, and uninformed decision-making.
The Data Product Catalog (DPC) contains information about the data ("metadata") of a data product. This information makes it easy for any authorized person or application to find, view, and use data products in the enterprise data grid. The DPC offers several benefits:

  • Ease of management by enabling local ownership and accountability.

  • Ease of change and evolution by allowing localization and faster decision making.

  • Ease of use, by letting any (authorized) entity find, view and use data on its own (i.e. "self-service").

Figure 4, Data Mesh Pattern: Data Product Catalog
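As a rough illustration of what a DPC might hold, the sketch below models a catalog as metadata about one data product, with a role check standing in for "authorized" access. The class names, fields, and the role-based check are assumptions made for this example; the article does not prescribe a catalog format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldInfo:
    """Metadata about one field of a data product."""
    name: str
    type: str
    description: str


@dataclass
class DataProductCatalog:
    """Hypothetical DPC: metadata for one data product, owned locally."""
    product: str
    owner: str
    fields: list[FieldInfo]
    authorized_roles: set[str] = field(default_factory=set)

    def _check(self, role: str) -> None:
        if role not in self.authorized_roles:
            raise PermissionError(f"role {role!r} may not view {self.product!r}")

    def describe(self, role: str) -> dict:
        """View: summarize the product for an authorized caller."""
        self._check(role)
        return {"product": self.product, "owner": self.owner,
                "fields": [f.name for f in self.fields]}

    def find_fields(self, role: str, keyword: str) -> list[str]:
        """Find: names of fields whose name or description mentions keyword."""
        self._check(role)
        kw = keyword.lower()
        return [f.name for f in self.fields
                if kw in f.name.lower() or kw in f.description.lower()]


customer_catalog = DataProductCatalog(
    product="customers",
    owner="sales",
    fields=[
        FieldInfo("customer_id", "string", "Unique customer identifier"),
        FieldInfo("email", "string", "Primary contact address"),
    ],
    authorized_roles={"analyst"},
)

matches = customer_catalog.find_fields("analyst", "contact")
# matches == ["email"]
```

Keeping the owner and the access rules inside the catalog itself reflects the local ownership and accountability described above: the team that owns the data product also decides who can discover it.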

Data Grid Patterns: Enterprise Data Product Catalog


The Enterprise Data Product Catalog (EDPC) is a repository that aggregates metadata from all local Data Product Catalogs (DPCs). It stores information and statistics (metadata) about all data maintained in an enterprise data grid, making that data easy to find, view, use and manage:

  • Data scientists use the EDPC to find data across the enterprise that can be used to train models.

  • Business users use the EDPC to find the information they need to make business decisions.

  • Developers use the EDPC to understand the data structures required by their applications.

  • Governance professionals use the EDPC to understand and monitor data across the enterprise, enabling federated computational governance within the enterprise data grid.

Figure 5, Data Mesh Pattern: Enterprise Data Product Catalog
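The aggregation step can be sketched as follows. The `EnterpriseDataProductCatalog` class, its method names, and the dictionary shape of a local catalog entry are hypothetical; the example only shows the idea of collecting local catalogs into one searchable index.

```python
from collections import defaultdict


class EnterpriseDataProductCatalog:
    """Hypothetical EDPC: aggregates local catalogs into one searchable index."""

    def __init__(self) -> None:
        self.entries: dict[str, dict] = {}
        # Maps a lower-cased field name to the products that contain it.
        self.index: dict[str, set[str]] = defaultdict(set)

    def register(self, local_catalog: dict) -> None:
        """Add or refresh the metadata published by one data product."""
        product = local_catalog["product"]
        # Drop stale index entries so a refreshed catalog replaces the old one.
        for products in self.index.values():
            products.discard(product)
        self.entries[product] = local_catalog
        for field_name in local_catalog["fields"]:
            self.index[field_name.lower()].add(product)

    def find_products(self, field_name: str) -> list[str]:
        """Return the names of all data products that expose the given field."""
        return sorted(self.index.get(field_name.lower(), set()))


edpc = EnterpriseDataProductCatalog()
edpc.register({"product": "customers", "owner": "sales",
               "fields": ["customer_id", "email"]})
edpc.register({"product": "orders", "owner": "distribution",
               "fields": ["order_id", "customer_id"]})

shared = edpc.find_products("customer_id")
# shared == ["customers", "orders"]
```

A data scientist looking for training data, or a developer checking a data structure, would query this one index instead of asking each owning group separately.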

Data Grid Patterns: Immutable Change/Audit Log


Understanding data lineage (defined as the aggregated list of changes that data has undergone) is critical for governance and regulatory purposes. Why is this important? Consider a common scenario today: AI/ML is now a must-have capability for businesses, and data scientists use complex models to support and make critical business decisions.
However, in many industries, notably healthcare and financial services, the practical viability of these models depends on the ability to meet regulatory requirements for repeatability and traceability. Unfortunately, most enterprises cannot track data lineage in the way auditors or regulators require.
An immutable change/audit log addresses this need by preserving historical data changes in the enterprise data grid for future auditing and governance purposes. Each data product's local change/audit log is automatically updated whenever its data changes. These logs are then propagated to the Enterprise Data Product Catalog (EDPC) to consolidate the history of data changes across the enterprise.
In other words, the EDPC contains data lineage for all elements in the enterprise data grid. The EDPC uses this data to provide a searchable index of metadata, which explicitly includes references to the immutable change/audit log of each data product, so that data lineage can be easily found and confirmed.

Figure 6, Data Mesh Pattern: Immutable Change/Audit Log
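The article does not say how immutability is enforced. One common technique, used here purely as an assumption, is hash chaining: each log entry stores a hash of its own content plus the previous entry's hash, so any later edit becomes detectable. The `AuditLog` class and its methods are hypothetical.

```python
import hashlib
import json


class AuditLog:
    """Append-only, hash-chained change log (a sketch, not a product feature)."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    @staticmethod
    def _digest(prev_hash: str, change: dict) -> str:
        payload = json.dumps({"prev": prev_hash, "change": change}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, change: dict) -> str:
        """Record a change, chaining it to the previous entry's hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else ""
        entry = {"change": change, "prev": prev_hash,
                 "hash": self._digest(prev_hash, change)}
        self._entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute the chain; False means an entry was altered or reordered."""
        prev_hash = ""
        for entry in self._entries:
            expected = self._digest(prev_hash, entry["change"])
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True

    def lineage(self, key: str) -> list[dict]:
        """All recorded changes for one data element, oldest first."""
        return [e["change"] for e in self._entries if e["change"].get("key") == key]


log = AuditLog()
log.append({"key": "c1", "field": "email", "after": "ada@example.com"})
log.append({"key": "c1", "field": "email", "after": "ada@example.org"})

intact = log.verify()                 # chain is consistent
history = log.lineage("c1")           # both changes, in order

# Simulate tampering with a historical entry; verification now fails.
log._entries[0]["change"]["after"] = "tampered"
tamper_detected = not log.verify()
```

The `lineage` method is the part auditors care about: it returns the ordered history of one data element, and `verify` gives some assurance that the history has not been rewritten since it was recorded.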

Concluding Thoughts


The enterprise data grid is becoming the foundational enabler of the real-time digital enterprise. Architectural patterns provide an established way to describe the interactions within a data grid. While no tool delivers a data grid out of the box, the first step in building one for your organization is to understand the underlying patterns that enable it.
I hope this article has given you the insight you need to start your own enterprise data grid!

This article: https://architect.pub/data-mesh-architecture-patterns
Origin blog.csdn.net/jiagoushipro/article/details/131346164