Ditch the database and use Kafka instead!

Databases have long served as systems of record, earning the trust of most companies to store and manage critical data in a reliable and durable manner.

But times are changing. Emerging trends are reshaping how data is stored and managed, pushing some technology decision makers to reconsider what innovations exist in data storage. Perhaps the relational database is beginning to become obsolete.

This article looks at an unconventional approach to the system of record: why organizations need to think differently about data storage, the benefits of using Kafka as a system of record, and some ideas for implementing it. I hope it inspires you.

1. Replacing the relational database with Kafka

KOR Financial is a financial services startup. Why did it choose Kafka over a relational database to store its data? Andreas, the company's chief technology officer, previously worked at Pivotal Software and VMware, where he led the application transformation architecture practice globally. What lies behind his decision?

Start with the results. With the Kafka-based solution, KOR Financial can "cost-effectively and safely store tens or even hundreds of petabytes of data, and keep it for decades." As Andreas put it, "this approach not only gives the data architecture great flexibility and scalability, it also enables lean and agile operations."

 

2. Breaking the mold: databases are not designed for scale

Times have changed! In the era of digital transformation, data-driven decision-making requires enterprises to have a modern, flexible data architecture. To realize such an architecture, the key to success is whether the data store is powerful, reliable, and flexible.

Over the past two decades we have seen the rise of big data, distributed systems, cloud computing, and real-time data processing, yet the traditional database has become a bottleneck, unable to keep up with the volume and velocity of data being generated every second.

The first reason is that databases were not designed for this kind of scale. Their inherently rigid structure hinders the flexibility that enterprise data architectures require.

KOR Financial operates financial and trade repositories for global corporations, along with complementary modular services, so its data-processing demands are punishing. Its data-streaming-first approach is what sets it apart from competitors. Its stated goal: "to revolutionize the way derivatives markets and global regulators think about trade reporting, data management and compliance."

Putting Kafka at the core of the architecture is a qualitative shift in thinking, because the architecture captures events rather than just state. "Storing data in Kafka instead of a database, and using it as the system of record, makes it possible to track all of these events, process them, and create materialized views of the data based on current or future use cases."
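A minimal sketch of what "capturing events rather than state" can look like with the plain Kafka Java client. The topic name `trade-commits`, the broker address, and the JSON payload are illustrative assumptions, not KOR Financial's actual setup:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class TradeEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // assumption: replace with your cluster
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each record is an immutable event (a "commit" to a trade), not the trade's current state.
            // Keying by trade ID keeps all events for one trade ordered on the same partition.
            String tradeId = "trade-42";
            String event = "{\"tradeId\":\"trade-42\",\"type\":\"AMENDMENT\",\"notional\":1000000}";
            producer.send(new ProducerRecord<>("trade-commits", tradeId, event));
        }
    }
}
```

The current state of any trade is then derived downstream by folding over these events, rather than being overwritten in place.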

Other trade repositories and brokerage service providers often rely on databases such as Oracle Exadata for their storage needs, which can be expensive and bring data management challenges of their own. SQL queries are available, but managing very large SQL databases and keeping the data within them consistent is hard.

Being in the global mandatory trade reporting business means serving multiple jurisdictions, each with its own data models and interpretations. Unified management becomes increasingly complex when all of that data is consolidated into a single schema or model. Schema evolution is challenging when the data is materialized only as the state of one specific schema version, with no historical view behind it, which further exacerbates the data management dilemma.

In addition, traditional databases scale poorly when dealing with very large amounts of data. By contrast, Confluent Cloud's unlimited storage for Kafka lets users keep as much data in Kafka as they want, for as long as they need, paying only for the storage they actually use.

While the number of partitions is a consideration, the amount of data that can be put into Confluent Cloud is unlimited: storage grows automatically as needed, with unlimited retention time.

This completely abstracts away how the data is stored under the hood and provides a cost-effective way to keep all of it. Better still, it lets enterprises scale their operations without restriction and interpret events in whatever representation they want, with a high degree of freedom.
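Retention is a per-topic setting in Kafka: setting `retention.ms` and `retention.bytes` to `-1` disables time- and size-based deletion, so the log is kept indefinitely (on Confluent Cloud, Infinite Storage tiers older segments to object storage). A sketch with the Java AdminClient; the broker address, topic name, and partition/replication counts are placeholders:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateInfiniteRetentionTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption: replace with your cluster

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("trade-commits", 12, (short) 3)
                    // -1 disables time- and size-based deletion, so events are retained forever
                    .configs(Map.of("retention.ms", "-1", "retention.bytes", "-1"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```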

3. Kafka brings data back to life: replay events, replay data

One of the significant advantages of using Kafka as a system of record is its ability to replay data, a native capability that traditional databases lack. In financial scenarios, this fits naturally with the preference for storing events as well as state, which is critical for accurately computing the state of a trade.

"We receive a bunch of deltas (increments), which we call commits or messages, which contribute to the state of the trade at a given point in time. Each incoming message or event modifies the transaction and changes its current state .If any error occurs during our stream processing logic, it may result in incorrect status output."

If that information were stored directly as a fixed representation in a traditional database, the events that led to that state would be lost. If the interpretation of those events turned out to be wrong, the context that produced it could never be revisited.

However, by preserving the historical order of events in an immutable and append-only log, Kafka provides the ability to replay those events.
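Replay is ordinary consumer behavior in Kafka: any consumer can seek back to the start of the log (or to a timestamp) and reprocess history with corrected logic. A minimal sketch, assuming the hypothetical `trade-commits` topic from above and a throwaway consumer group:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ReplayTradeEvents {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");                // assumption: replace with your cluster
        props.put("group.id", "replay-" + System.currentTimeMillis());   // fresh group, no committed offsets
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("trade-commits"));
            consumer.poll(Duration.ofSeconds(1));               // join the group and receive partition assignments
            consumer.seekToBeginning(consumer.assignment());    // rewind every assigned partition to offset 0

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Re-interpret each historical event, e.g. with corrected stream-processing logic.
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```

Because the replay runs under its own consumer group, it does not disturb the offsets of the production workloads reading the same topic.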

Given the regulatory requirements of the business, everything must be stored immutably: all data must be captured and retained exactly as it was first received. Most databases (including SQL databases) allow records to be modified, whereas Kafka by design prohibits any changes to its append-only log.

Using Kafka as the system of record, with unlimited storage, means being able to go back in time, analyze how things unfolded, reinterpret changes, manage point-in-time historical corrections, and create alternative representations, all without impacting current operational workloads.

This flexibility offers significant advantages, especially when operating in highly regulated markets where timely and efficient correction of errors is critical.

 

4. Flexibility conquers all 

Using Kafka as the system of record brings significant flexibility to the data architecture. Views can be built for each specific use case, using a dedicated database or technology that precisely fits those needs and that reads from the Kafka topics containing the source events.

Take customer data management as an example. A graph database suited to that use case can be adopted without building the entire system around it, because it is just a view, a projection, fed from Kafka.

This approach lets different databases be used for different use cases without designating any of them as the system of record; they act only as representations of the data, which preserves flexibility. The alternative is to load everything into a single database, data lake, or data warehouse, which is rigid and makes it hard to transform the data into a representation optimized for a specific use case.
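One way to build such a projection is Kafka Streams: read the event topic, fold the events per key into a state, and either query the resulting materialized view directly or write it out to whichever database suits the use case. The topic names and the string-concatenation "state" below are illustrative placeholders, not a real domain model:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class TradeStateProjection {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "trade-state-projection");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumption: replace with your cluster

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("trade-commits", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               // Fold the stream of commits per trade into a current-state view;
               // a real projection would apply domain logic instead of appending strings.
               .aggregate(() -> "",
                          (tradeId, event, state) -> state + event,
                          Materialized.with(Serdes.String(), Serdes.String()))
               .toStream()
               .to("trade-current-state", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```

The downstream `trade-current-state` topic (or the state store behind it) is just a derived representation; it can be rebuilt at any time by replaying the source events.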

From a startup's perspective, this flexibility also helps avoid premature lock-in to a particular technology direction. Founded in 2021, KOR follows the architectural best practice of deferring decisions until the last responsible moment, committing to a specific technology only when it is genuinely necessary and compliant. The result is a technology landscape that can adapt and evolve as business needs evolve, keeping the door open to future scale and flexibility.

In addition to flexibility, using the Schema Registry ensures data consistency, so developers know where data comes from and which schemas are associated with it. Confluent Cloud also allows explicit evolution policies to be set through the Schema Registry. By contrast, if you dump all your data into a data lake, managing all the different versions, schemas, and representations of that data becomes much harder.
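Evolution policies are set per subject in Schema Registry. A hedged sketch using its REST API (`PUT /config/{subject}`) from plain Java; the registry URL and subject name are placeholders, and Confluent Cloud additionally requires an API key for authentication:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SetCompatibilityPolicy {
    public static void main(String[] args) throws Exception {
        // Require that every new schema version can still read data written with the previous one.
        String body = "{\"compatibility\": \"BACKWARD\"}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8081/config/trade-commits-value"))  // assumption: local registry
                .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```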

5. Behind the switching technology: event-driven thinking

Abandoning the database and adopting Kafka as the system of record for storing data seems to be a very fresh approach.

Not every company is ready to accept this approach. Andreas believes it requires cultivating an "event-driven" culture, and that this shift in thinking must extend to how applications are built with stream processing; otherwise, compatibility mismatches will arise.

The point is to help team members internalize that they are dealing with immutable data: once something has been written, they cannot simply go in and change it.

Andreas also suggests that implementing a Kafka-centric architecture should start with one team that understands the importance of stream processing and of events as the source of truth. By demonstrating its strengths, that team can act as an ambassador to other teams, encouraging the adoption of events as the ultimate truth and of stream-processed state as the derived representation.

6. In closing: can Kafka replace the database?

As early as 2017, Jay Kreps, the co-founder of Apache Kafka and Confluent, clearly stated that "data can be stored in Apache Kafka".

Moreover, data can be stored in Kafka for as long as you want. The New York Times is a well-known example of using Kafka to store data permanently: Kafka holds every article the Times has ever published, replacing their earlier API-based approach.

So can Kafka replace the database? Realistically, no, not across the board. Although this article points out several shortcomings of traditional databases, such as not being designed for scale, the approach described here mainly suits strongly real-time scenarios such as finance.

However, the underlying idea of breaking with traditional database thinking and redesigning the architecture from the ground up is worth reflecting on and learning from.

7. A related aside: low-code platforms

Over the past 10 years, as even traditional enterprises began digitizing at scale, we have found that building internal tools involves endlessly re-creating the same pages, screens, and components. This reinventing of the wheel wastes a great deal of engineers' time.

In response, low-code platforms turn recurring scenarios and processes into reusable visual components, APIs, and database interfaces, avoiding this repeated wheel-building and greatly improving programmer productivity.

One such tool worth knowing is the JNPF rapid development platform. It is built on a SpringBoot microservice architecture, supports SpringCloud mode to improve the platform's extensibility, and aims to cover rapid system development, flexible extension, seamless integration, and high-performance applications. It uses a front-end/back-end separation model, so front-end and back-end developers can work on different parts in parallel. Official website: https://www.jnpfsoft.com/?csdn

If you haven't looked into low-code yet, it is quick to try out and learn.
