Apache Cassandra and Apache Ignite: Affinity Collocation and Distributed SQL

In the previous article, I reviewed and summarized the shortcomings of the query-driven (denormalized) data modeling methodology used in Cassandra. It turns out that you cannot build efficient applications with this methodology unless you know all of your queries in advance. As a result, the application architecture becomes more complex and harder to maintain, and it introduces a lot of data duplication.

These drawbacks are often dismissed with the argument that "if you want scalability, speed, and high availability, you have to be prepared to store multiple copies of your data and give up SQL and strong consistency." That argument may have been valid ten years ago, but it no longer holds today!

To back this up, let's turn to another Apache Software Foundation project, Apache Ignite. In this article, I will explain how the application architecture looks when it is built on Ignite, and then evaluate its maintenance cost.

The sample application is the same as before: it tracks the cars produced by all vendors and the production output of each individual vendor. If you have read the first article, you already know the relational model, which consists of three tables: Vendor, Car, and Production.

So can we simply create these three tables with Ignite's CREATE TABLE command and start running the SQL-driven application? Not quite. That works if the application never needs to join data stored in different tables. But as discussed in the previous article, the application must support two queries that require joins:

  1. Q1: Get all the car models produced by a given vendor within a specific time frame.
  2. Q2: Get the production output of a specific car model for a given vendor.

With Cassandra, we created a separate table for each query to avoid joins. Do we have to go through the same process with Ignite? Not at all. Ignite fully supports non-collocated joins, so once the three tables are created, no additional work is required. However, non-collocated joins are less efficient and slower than collocated joins. So let's first take a closer look at affinity collocation and then see how Ignite applies this concept.

A data model based on collocated joins

Affinity collocation is a powerful concept in Ignite (and in other distributed databases such as Google Spanner and MemSQL) for storing related data on the same cluster node. So which data is related? In the context of relational databases, this is very simple: identify a parent-child relationship between business entities, specify an affinity key in the CREATE TABLE statement, and leave the rest to Ignite!

In the vendor and car application, it is reasonable to treat the vendor as the parent entity and the car as the child entity. With this configuration, all the cars produced by a given vendor are stored on the same node, as shown in the following figure:

As shown in the figure, the cars produced by Toyota are stored on node 1, while the cars produced by Ford are stored on node 2. This is affinity collocation: each car is stored on the node where its vendor resides.

To achieve this data distribution, define the Vendor table as follows:

CREATE TABLE Vendor (
    id INT PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
);

Vendor records are spread across the cluster: Ignite uses the primary key column (id) to calculate the node on which each vendor record resides. Next comes the Car table:

CREATE TABLE Car (
    id INT,
    vendor_id INT,
    model VARCHAR,
    year INT,
    price float,
    PRIMARY KEY(id, vendor_id)
) WITH "affinityKey=vendor_id";

The Car table has an affinityKey parameter set to the vendor_id column. It tells Ignite to store each car on the cluster node that holds the vendor with the matching vendor_id.
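To make the placement rule concrete, here is a minimal Python sketch of affinity-based placement. It is a simplified model, not Ignite's implementation. Ignite's default affinity function (RendezvousAffinityFunction) maps keys to partitions (1024 by default) and partitions to nodes using rendezvous hashing. The modulo and SHA-256 hashes below, the node names, and the sample rows are assumptions made for illustration. The key point is that Vendor rows are placed by their id, while Car rows are placed by vendor_id rather than by their own id, so both land on the same node.

```python
import hashlib

# Hypothetical 3-node cluster; the names are illustrative only.
NODES = ["node1", "node2", "node3"]
# Partition count used by Ignite's default affinity function.
NUM_PARTITIONS = 1024


def partition_of(affinity_key: int) -> int:
    """Map an affinity key to a partition (simplified; not Ignite's exact hash)."""
    return affinity_key % NUM_PARTITIONS


def _weight(node: str, partition: int) -> int:
    """Deterministic pseudo-random weight for a (node, partition) pair."""
    digest = hashlib.sha256(f"{node}:{partition}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def node_of_partition(partition: int) -> str:
    """Rendezvous (highest random weight) hashing: the top-weighted node owns the partition."""
    return max(NODES, key=lambda node: _weight(node, partition))


def node_for(affinity_key: int) -> str:
    return node_of_partition(partition_of(affinity_key))


# Hypothetical sample rows mirroring the Vendor and Car tables.
vendors = [{"id": 1, "name": "Toyota"}, {"id": 2, "name": "Ford Motor"}]
cars = [
    {"id": 100, "vendor_id": 1, "model": "Corolla"},
    {"id": 101, "vendor_id": 2, "model": "Explorer"},
    {"id": 102, "vendor_id": 2, "model": "Mustang"},
]

# Vendor rows are placed by their primary key (id)...
vendor_node = {v["id"]: node_for(v["id"]) for v in vendors}
# ...while Car rows are placed by their affinity key (vendor_id), not by Car.id.
car_node = {c["id"]: node_for(c["vendor_id"]) for c in cars}

for c in cars:
    print(c["model"], "->", car_node[c["id"]], "(vendor on", vendor_node[c["vendor_id"]] + ")")
```

Because a car's node is computed from the same key value as its vendor's node, every car ends up next to its vendor regardless of how many nodes the cluster has.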

Repeat the same process for the Production table so that its records are also stored on the node that holds the corresponding vendor_id:

CREATE TABLE Production (
    id INT,
    car_id INT,
    vendor_id INT,
    country VARCHAR,
    total INT,
    year INT,
    PRIMARY KEY(id, car_id, vendor_id)
) WITH "affinityKey=vendor_id";

That completes the data model. The next step is to move on to the application code and write the required queries.

SQL queries with joins

Ignite clusters can be queried with familiar SQL, including distributed joins and secondary indexes. Ignite supports two types of joins: collocated and non-collocated. If the joined tables are collocated, each node already holds all the data it needs locally, so a collocated join avoids moving data between nodes (which a join would otherwise require). This makes it the most efficient and fastest kind of join in a distributed database. If some tables cannot be collocated but still need to be joined, a non-collocated join is the fallback. This type of join is slower because it has to move data between cluster nodes while the join executes.
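The difference between the two join types can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Ignite executes joins: placement is a hypothetical key-modulo-node-count rule, and each node runs a plain hash join over only its local rows. When Car rows are placed by vendor_id, these purely local joins already return the complete result. When Car rows are placed by their own id, some cars sit on a different node than their vendor, the local joins miss them, and a non-collocated join would have to ship those rows across the network.

```python
from collections import defaultdict

NODES = ["node1", "node2", "node3"]  # hypothetical cluster


def node_for(key: int) -> str:
    # Simplified placement: key modulo node count (stand-in for Ignite's affinity function).
    return NODES[key % len(NODES)]


# Hypothetical sample data.
vendors = [{"id": 1, "name": "Toyota"}, {"id": 2, "name": "Ford Motor"}, {"id": 3, "name": "Honda"}]
cars = [
    {"id": 10, "vendor_id": 1, "model": "Corolla"},
    {"id": 11, "vendor_id": 2, "model": "Explorer"},
    {"id": 12, "vendor_id": 2, "model": "Mustang"},
    {"id": 13, "vendor_id": 3, "model": "Civic"},
]


def distribute(rows, key):
    """Group rows by the node chosen for key(row)."""
    per_node = defaultdict(list)
    for row in rows:
        per_node[node_for(key(row))].append(row)
    return per_node


def local_join(vendor_rows, car_rows):
    """Hash join executed on a single node using only its local data."""
    by_id = {v["id"]: v for v in vendor_rows}
    return [(by_id[c["vendor_id"]]["name"], c["model"])
            for c in car_rows if c["vendor_id"] in by_id]


def join_without_data_movement(car_key):
    """Join locally on every node and merge the results.

    Also returns how many cars are stored on a different node than their vendor;
    those are the rows a non-collocated join would have to move.
    """
    vendor_parts = distribute(vendors, key=lambda v: v["id"])
    car_parts = distribute(cars, key=car_key)
    results = []
    for node in NODES:
        results.extend(local_join(vendor_parts[node], car_parts[node]))
    misplaced = sum(1 for c in cars if node_for(car_key(c)) != node_for(c["vendor_id"]))
    return sorted(results), misplaced


# Collocated: cars placed by vendor_id, so all 4 rows are found and nothing must move.
print(join_without_data_movement(lambda c: c["vendor_id"]))
# Not collocated: cars placed by their own id, so 2 rows are missed and would need to move.
print(join_without_data_movement(lambda c: c["id"]))
```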

Since the Vendor, Car, and Production tables are already collocated, the next step is to take advantage of collocated joins and write the SQL query for Q1:

SELECT c.model, p.country, p.total, p.year FROM Vendor as v
JOIN Production as p ON v.id = p.vendor_id
JOIN Car as c ON c.id = p.car_id
WHERE v.name = 'Ford Motor' and p.year >= 2017
ORDER BY p.year;

Can it be made faster? Of course. The following statements define secondary indexes on the Vendor.name and Production.year columns:

CREATE INDEX vendor_name_id ON Vendor (name);
CREATE INDEX prod_year_id ON Production (year);

The query for Q2 doesn't require any extra work either:

SELECT p.country, p.total, p.year FROM Vendor as v
JOIN Production as p ON v.id = p.vendor_id
JOIN Car as c ON c.id = p.car_id
WHERE v.name = 'Ford Motor' and c.model = 'Explorer';
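To check the semantics of Q1 and Q2 without a cluster, here is a minimal Python sketch that runs the same schema, indexes, and queries against an in-memory SQLite database. The assumptions are as follows. SQLite stands in for a single Ignite node, so the Ignite-specific WITH "affinityKey=vendor_id" clause is dropped and nothing here exercises distribution or collocation. The sample rows are hypothetical. The vendor name, year, and model literals are passed as query parameters instead of being inlined.

```python
import sqlite3

# Same tables and indexes as above, minus Ignite's WITH "affinityKey=..." clause.
DDL = """
CREATE TABLE Vendor (id INT PRIMARY KEY, name VARCHAR, address VARCHAR);
CREATE TABLE Car (id INT, vendor_id INT, model VARCHAR, year INT, price FLOAT,
                  PRIMARY KEY (id, vendor_id));
CREATE TABLE Production (id INT, car_id INT, vendor_id INT, country VARCHAR,
                         total INT, year INT, PRIMARY KEY (id, car_id, vendor_id));
CREATE INDEX vendor_name_id ON Vendor (name);
CREATE INDEX prod_year_id ON Production (year);
"""

Q1 = """
SELECT c.model, p.country, p.total, p.year FROM Vendor AS v
JOIN Production AS p ON v.id = p.vendor_id
JOIN Car AS c ON c.id = p.car_id
WHERE v.name = ? AND p.year >= ?
ORDER BY p.year
"""

Q2 = """
SELECT p.country, p.total, p.year FROM Vendor AS v
JOIN Production AS p ON v.id = p.vendor_id
JOIN Car AS c ON c.id = p.car_id
WHERE v.name = ? AND c.model = ?
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Hypothetical sample data, for illustration only.
conn.executemany("INSERT INTO Vendor VALUES (?, ?, ?)",
                 [(1, "Toyota", "addr-1"), (2, "Ford Motor", "addr-2")])
conn.executemany("INSERT INTO Car VALUES (?, ?, ?, ?, ?)",
                 [(10, 1, "Corolla", 2018, 20000.0),
                  (11, 2, "Explorer", 2017, 33000.0),
                  (12, 2, "Mustang", 2016, 26000.0)])
conn.executemany("INSERT INTO Production VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 10, 1, "Japan", 1000, 2018),
                  (2, 11, 2, "USA", 500, 2017),
                  (3, 11, 2, "USA", 700, 2018),
                  (4, 12, 2, "USA", 400, 2016),
                  (5, 12, 2, "Mexico", 300, 2019)])

q1_rows = conn.execute(Q1, ("Ford Motor", 2017)).fetchall()
q2_rows = conn.execute(Q2, ("Ford Motor", "Explorer")).fetchall()
print(q1_rows)
print(q2_rows)
conn.close()
```

Q2 has no ORDER BY, so the order of its rows is not guaranteed. Add an ORDER BY clause if the application depends on it.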

Now, if your boss asks for a new feature, you can quickly write a new SQL query to deliver it. Done! For comparison, look at how the Cassandra-based architecture had to be extended to support Q2.

Architecture Simplification: Mission Accomplished!

Compared with Cassandra's query-driven model, Ignite's data model based on affinity collocation has the following advantages:

  • The application's data layer is modeled with the familiar relational model, which is easy to maintain;
  • Data is accessed with standard SQL syntax;
  • Affinity collocation lets joins run without moving data between nodes, one of the key benefits of a modern distributed database.

A simpler software architecture is not the only benefit of using Ignite instead of Cassandra. Upcoming articles will cover strong consistency and the performance of in-memory storage.

This article is translated from Denis Magda's blog.
