Apache Cassandra and Apache Ignite: Enhance Apache Cassandra with Ignite

Apache Cassandra is one of the leading open source distributed NoSQL disk databases. It is a key part of the infrastructure at many companies, such as Netflix, eBay and Expedia. It is popular because it is fast, scales linearly to thousands of nodes, and offers first-class cross-datacenter replication.

Apache Ignite is a memory-centric distributed database, caching and processing platform that can process transactions, analytics and streaming workloads at memory-level speeds for petabytes of data, supporting JCache, SQL99, ACID transactions, and machine learning.

Apache Cassandra is a classic solution in its domain and, like any specialized solution, its strengths are built on a few compromises. A typical one is the limitation imposed by disk storage, although Cassandra includes many optimizations to mitigate it.

To give an example of these trade-offs: without ACID and SQL support, transactions and analytics cannot be performed at will unless the data has been adapted for them in advance. These compromises cause logical difficulties for users and can lead to incorrect use of the product or even to a negative experience. They can also lead to data being split between different types of storage, which fragments the infrastructure and complicates the application's data logic.

But can Cassandra users run it together with Apache Ignite? The premise is to keep the existing Cassandra system and work around its limitations. The answer is yes: Ignite can be deployed as an in-memory layer on top of Cassandra, and this article describes how to do it.

1. Limitations of Cassandra

First, let's briefly go over the main limitations, which are the ones we are trying to solve:

  1. Bandwidth and response time are limited by the characteristics of the disk or SSD;
  2. The data structure is optimized for sequential reads and writes rather than for traditional relational operations. Data cannot be normalized and joined efficiently, and operations such as GROUP BY and ORDER BY are strictly limited;
  3. As a consequence of the second point, SQL is not supported, and its counterpart CQL offers limited functionality;
  4. ACID transactions are not supported.

Cassandra works well for the purposes it was designed for, but if these problems can be solved, its capabilities are significantly extended. Put a person and a horse together and you get a rider, which is something quite different from a person or a horse alone.

So how can these limitations be circumvented?

The traditional method is to split the data: part of it is stored in Cassandra, and the parts whose requirements Cassandra cannot meet are stored in other systems.

The downside of this approach is increased complexity (and potentially lower speed and quality) and higher maintenance costs. Instead of using a single system for data storage, the application has to combine data from different sources. Moreover, degradation of any one of those systems can have serious negative consequences, forcing infrastructure teams to look after several systems at once.
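As a rough illustration of what "combining data from different sources" means for application code, the following sketch joins records from two stores by hand. Everything in it is hypothetical: two in-memory maps stand in for the two storage systems, and the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: two maps stand in for two separate storage systems.
class SplitStorageJoin {
    // Joins goods (held in one system) to category names (held in another) in application code.
    static List<String> joinGoodsWithCategories(Map<Long, String> categoryNamesById,
                                                Map<String, Long> categoryIdByGoodName) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Long> good : categoryIdByGoodName.entrySet()) {
            String category = categoryNamesById.get(good.getValue());
            // The application itself must decide what to do when the two systems disagree.
            if (category == null) {
                category = "<missing in other system>";
            }
            rows.add(good.getKey() + " -> " + category);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Long, String> categories = new LinkedHashMap<>();
        categories.put(2L, "Refrigerators");

        Map<String, Long> goods = new LinkedHashMap<>();
        goods.put("Fridge Foobar", 2L);
        goods.put("Appliance Habr#", 3L); // category 3 was never written to the second system

        joinGoodsWithCategories(categories, goods).forEach(System.out::println);
    }
}
```

Even in this tiny case the join logic, the ordering and the handling of missing rows all live in the application rather than in the database.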

2. Apache Ignite as a memory layer

Another way is to put another system on top of Cassandra and divide the responsibilities between the two. In this arrangement, Ignite can provide the following capabilities:

  1. Removes the performance limits imposed by disk: Ignite can work with data entirely in memory, one of the fastest and increasingly affordable kinds of storage;
  2. Full support for standard SQL99, including joins, GROUP BY, ORDER BY and DML: data can be normalized and analyzed more easily, and together with in-memory performance this opens up HTAP, that is, real-time analytics on production data;
  3. Support for the JDBC and ODBC standards: easy integration with existing tools such as Tableau and with frameworks such as Hibernate or Spring Data;
  4. Support for ACID transactions: indispensable when consistency is required;
  5. Distributed computing, streaming data processing and machine learning: many new business scenarios can be implemented quickly using what Ignite provides.

The Ignite cluster loads the data to be queried from Cassandra. With write-through enabled, all data changes are written back to Cassandra. Once Ignite holds the data, you can freely use SQL, run transactions, and enjoy in-memory speed.

In addition, the data can also be analyzed in real-time using visualization tools such as Tableau.
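The read-through and write-through behaviour described above can be sketched in plain Java. This is a conceptual model only, not Ignite's implementation: two maps stand in for the Ignite cache and the Cassandra table, and the class name and the storeReads counter are invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch only: plain maps stand in for the Ignite cache and the Cassandra table.
class ReadWriteThroughCache {
    private final Map<Long, String> memory = new HashMap<>(); // plays the role of Ignite
    private final Map<Long, String> backingStore;             // plays the role of Cassandra
    private int storeReads = 0;                                // counts trips to the backing store

    ReadWriteThroughCache(Map<Long, String> backingStore) {
        this.backingStore = backingStore;
    }

    // Read-through: on a miss, fetch from the backing store and keep the value in memory.
    String get(long key) {
        String value = memory.get(key);
        if (value == null) {
            storeReads++;
            value = backingStore.get(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    // Write-through: update memory and propagate the same change to the backing store.
    void put(long key, String value) {
        memory.put(key, value);
        backingStore.put(key, value);
    }

    int storeReads() {
        return storeReads;
    }

    public static void main(String[] args) {
        Map<Long, String> cassandra = new HashMap<>();
        cassandra.put(1L, "Appliances");

        ReadWriteThroughCache cache = new ReadWriteThroughCache(cassandra);
        System.out.println(cache.get(1L));     // loaded from the store on first access
        System.out.println(cache.get(1L));     // served from memory
        cache.put(2L, "Refrigerators");
        System.out.println(cassandra.get(2L)); // the write reached the store
    }
}
```

In the real integration these two behaviours are switched on per cache with the readThrough and writeThrough properties shown in the configuration section.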

3. Configuration

Next, we walk through a simple example of Ignite and Cassandra integration to illustrate how they work together and what features are available.

First, we create the necessary tables in Cassandra and insert some data. Then we initialize a Java project, write some DTO classes, and finally show the core part: configuring Ignite to work with Cassandra.

The example uses macOS, Cassandra 3.10 and Ignite 2.3. On Linux, the commands are similar.

3.1. Cassandra tables and data

First, put the Cassandra distribution in the ~/Downloads folder, then go into that folder and unzip it:

$ cd ~/Downloads
$ tar xzvf apache-cassandra-3.10-bin.tar.gz
$ cd apache-cassandra-3.10

Start Cassandra with the default configuration, which is enough for testing:

$ bin/cassandra

Next, use Cassandra's interactive terminal to create the data structures for testing. A plain id is used here as the primary key. In Cassandra the choice of primary key is very important because it determines how data can be retrieved later, but this example is simplified:

$ cd ~/Downloads/apache-cassandra-3.10
$ bin/cqlsh
CREATE KEYSPACE IgniteTest WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

USE IgniteTest;

CREATE TABLE catalog_category (id bigint primary key, parentId bigint, name text, description text);
CREATE TABLE catalog_good (id bigint primary key, categoryId bigint, name text, description text, price bigint, oldPrice bigint);

INSERT INTO catalog_category (id, parentId, name, description) VALUES (1, NULL, 'Appliances', 'Appliances for households!');
INSERT INTO catalog_category (id, parentId, name, description) VALUES (2, 1, 'Refrigerators', 'The best fridges we have!');
INSERT INTO catalog_category (id, parentId, name, description) VALUES (3, 1, 'Washing machines', 'Engineered for exceptional usage!');

INSERT INTO catalog_good (id, categoryId, name, description, price, oldPrice) VALUES (1, 2, 'Fridge Buzzword', 'Best fridge of 2027!', 1000, NULL);
INSERT INTO catalog_good (id, categoryId, name, description, price, oldPrice) VALUES (2, 2, 'Fridge Foobar', 'The cheapest offer!', 300, 900);
INSERT INTO catalog_good (id, categoryId, name, description, price, oldPrice) VALUES (3, 2, 'Fridge Barbaz', 'Premium fridge in your kitchen!', 500000, 300000);
INSERT INTO catalog_good (id, categoryId, name, description, price, oldPrice) VALUES (4, 3, 'Appliance Habr#', 'Washes, squeezes, dries!', 10000, NULL);

Check that the data was saved correctly:

cqlsh:ignitetest> SELECT * FROM catalog_category;

 id | description                       | name             | parentId
----+-----------------------------------+------------------+----------
  1 | Appliances for households!        | Appliances       |     null
  2 | The best fridges we have!         | Refrigerators    |        1
  3 | Engineered for exceptional usage! | Washing machines |        1

(3 rows)
cqlsh:ignitetest> SELECT * FROM catalog_good;

 id | categoryId | description                     | name            | oldPrice | price
----+------------+---------------------------------+-----------------+----------+--------
  1 |          2 | Best fridge of 2027!            | Fridge Buzzword |     null |   1000
  2 |          2 | The cheapest offer!             | Fridge Foobar   |      900 |    300
  4 |          3 | Washes, squeezes, dries!        | Appliance Habr# |     null |  10000
  3 |          2 | Premium fridge in your kitchen! | Fridge Barbaz   |   300000 | 500000

(4 rows)

3.2. Initialize the Java project

There are two ways to use Ignite. One is to download the release package from the official website, add the jar files to the classpath and configure Ignite using XML. The other is to use Ignite as a Maven dependency of the Java project. This article uses the second method.

Create a new project with Maven and add the following libraries:

  • ignite-cassandra-store: for Cassandra integration;
  • ignite-spring: for configuring Ignite using Spring XML files.

Both libraries depend on ignite-core, which contains the core functionality of Ignite:

<dependencies>
    <dependency>
        <groupId>org.apache.ignite</groupId>
        <artifactId>ignite-spring</artifactId>
        <version>2.3.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.ignite</groupId>
        <artifactId>ignite-cassandra-store</artifactId>
        <version>2.3.0</version>
    </dependency>
</dependencies>

Next, create the DTO classes that map to Cassandra's tables:

// CatalogCategory.java
package com.gridgain.test.model;

import org.apache.ignite.cache.query.annotations.QuerySqlField;

public class CatalogCategory {
    @QuerySqlField private long id;
    @QuerySqlField private Long parentId; // nullable: the root category has no parent
    @QuerySqlField private String name;
    @QuerySqlField private String description;

    // public getters and setters
}

// CatalogGood.java
package com.gridgain.test.model;

import org.apache.ignite.cache.query.annotations.QuerySqlField;

public class CatalogGood {
    @QuerySqlField private long id;
    @QuerySqlField private long categoryId;
    @QuerySqlField private String name;
    @QuerySqlField private String description;
    @QuerySqlField private long price;
    @QuerySqlField private Long oldPrice; // nullable: some goods have no old price

    // public getters and setters
}

The fields are annotated with @QuerySqlField so that they can be queried through Ignite SQL. A field without this annotation cannot be selected or filtered on through SQL.

Further fine-tuning is possible, such as defining indexes and full-text search, but this is beyond the scope of this article. For more information on configuring Ignite SQL, see the corresponding documentation.

3.3. Configuring Ignite

In the src/main/resources directory, create a configuration file named apacheignite-cassandra.xml. The complete configuration is shown below; the key parts are explained afterwards:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean class="org.apache.ignite.cache.store.cassandra.datasource.DataSource" name="cassandra">
        <property name="contactPoints" value="127.0.0.1"/>
    </bean>

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="cacheConfiguration">
            <list>
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="CatalogCategory"/>
                    <property name="readThrough" value="true"/>
                    <property name="writeThrough" value="true"/>
                    <property name="sqlSchema" value="catalog_category"/>
                    <property name="indexedTypes">
                        <list>
                            <value type="java.lang.Class">java.lang.Long</value>
                            <value type="java.lang.Class">com.gridgain.test.model.CatalogCategory</value>
                        </list>
                    </property>
                    <property name="cacheStoreFactory">
                        <bean class="org.apache.ignite.cache.store.cassandra.CassandraCacheStoreFactory">
                            <property name="dataSource" ref="cassandra"/>
                            <property name="persistenceSettings">
                                <bean class="org.apache.ignite.cache.store.cassandra.persistence.KeyValuePersistenceSettings">
                                    <constructor-arg type="java.lang.String"><value><![CDATA[
                                        <persistence keyspace="IgniteTest" table="catalog_category">
                                            <keyPersistence class="java.lang.Long" strategy="PRIMITIVE" column="id"/>
                                            <valuePersistence class="com.gridgain.test.model.CatalogCategory" strategy="POJO"/>
                                        </persistence>]]></value></constructor-arg>
                                </bean>
                            </property>
                        </bean>
                    </property>
                </bean>

                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="CatalogGood"/>
                    <property name="readThrough" value="true"/>
                    <property name="writeThrough" value="true"/>
                    <property name="sqlSchema" value="catalog_good"/>
                    <property name="indexedTypes">
                        <list>
                            <value type="java.lang.Class">java.lang.Long</value>
                            <value type="java.lang.Class">com.gridgain.test.model.CatalogGood</value>
                        </list>
                    </property>
                    <property name="cacheStoreFactory">
                        <bean class="org.apache.ignite.cache.store.cassandra.CassandraCacheStoreFactory">
                            <property name="dataSource" ref="cassandra"/>
                            <property name="persistenceSettings">
                                <bean class="org.apache.ignite.cache.store.cassandra.persistence.KeyValuePersistenceSettings">
                                    <constructor-arg type="java.lang.String"><value><![CDATA[
                                        <persistence keyspace="IgniteTest" table="catalog_good">
                                            <keyPersistence class="java.lang.Long" strategy="PRIMITIVE" column="id"/>
                                            <valuePersistence class="com.gridgain.test.model.CatalogGood" strategy="POJO"/>
                                        </persistence>]]></value></constructor-arg>
                                </bean>
                            </property>
                        </bean>
                    </property>
                </bean>
            </list>
        </property>
    </bean>

</beans>

The above configuration can be divided into two parts. The first defines a data source for connecting to Cassandra, and the second configures Ignite itself. The first part is relatively simple:

<bean class="org.apache.ignite.cache.store.cassandra.datasource.DataSource" name="cassandra">
    <property name="contactPoints" value="127.0.0.1"/>
</bean>

The data source is defined by the IP address of the Cassandra node to connect to.

The next step is to configure Ignite. In this case there are only minor differences from the default configuration: only the cacheConfiguration property is overridden, and it contains a list of Ignite caches mapped to Cassandra tables:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="cacheConfiguration">
        <list>
            ...
        </list>
    </property>
</bean>

The first cache maps to Cassandra's catalog_category table:

<bean class="org.apache.ignite.configuration.CacheConfiguration">
	<property name="name" value="CatalogCategory"/>
	...
</bean>

Each cache has read-through and write-through modes enabled. For example, when a write operation is performed on Ignite, Ignite automatically sends the update to Cassandra. Next, the catalog_category schema to be used in Ignite is specified:

<property name="readThrough" value="true"/>
<property name="writeThrough" value="true"/>
<property name="sqlSchema" value="catalog_category"/>
<property name="indexedTypes">
	<list>
		<value type="java.lang.Class">java.lang.Long</value>
		<value type="java.lang.Class">com.gridgain.test.model.CatalogCategory</value>
	</list>
</property>

Finally, the connection to Cassandra is set up. It has two main parts: the first points to the previously created data source, and the second associates the Ignite cache with the Cassandra table.

Ideally, persistenceSettings would point to an external XML file containing the mapping configuration, but for simplicity this XML is embedded directly in the Spring configuration file as a CDATA section:

<property name="cacheStoreFactory">
	<bean class="org.apache.ignite.cache.store.cassandra.CassandraCacheStoreFactory">
		<property name="dataSource" ref="cassandra"/>
		<property name="persistenceSettings">
			<bean class="org.apache.ignite.cache.store.cassandra.persistence.KeyValuePersistenceSettings">
				<constructor-arg type="java.lang.String"><value><![CDATA[
					<persistence keyspace="IgniteTest" table="catalog_category">
						<keyPersistence class="java.lang.Long" strategy="PRIMITIVE" column="id"/>
						<valuePersistence class="com.gridgain.test.model.CatalogCategory" strategy="POJO"/>
					</persistence>]]></value></constructor-arg>
			</bean>
		</property>
	</bean>
</property>

The mapping configuration looks very straightforward:

<persistence keyspace="IgniteTest" table="catalog_category">
    <keyPersistence class="java.lang.Long" strategy="PRIMITIVE" column="id"/>
    <valuePersistence class="com.gridgain.test.model.CatalogCategory" strategy="POJO"/>
</persistence>

At the top level (the persistence tag), the keyspace (IgniteTest in this case) and the table to associate with it (catalog_category) are declared. Next, the key of the Ignite cache is declared as Long, a primitive type that corresponds to the id column of the Cassandra table. The value is the CatalogCategory class, whose fields are mapped to the columns of the Cassandra table by reflection (the strategy is POJO).

Further details of the mapping are beyond the scope of this article; refer to the relevant documentation.

The second cache configuration, for the product data, is largely the same.
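To make the idea of a reflection-based POJO mapping more concrete, here is a minimal sketch in plain Java. It is not Ignite's actual implementation; the class PojoColumnSketch and the method columnNames are hypothetical. It only shows how column names can be derived from the fields of a class such as the CatalogCategory DTO from section 3.2. Names are lower-cased because Cassandra folds unquoted identifiers to lower case.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Conceptual sketch: derive column names from a class's fields via reflection.
class PojoColumnSketch {
    // Stand-in for the CatalogCategory DTO (annotations and accessors omitted).
    static class CatalogCategory {
        private long id;
        private Long parentId;
        private String name;
        private String description;
    }

    // Returns the lower-cased names of all instance fields, sorted alphabetically.
    static List<String> columnNames(Class<?> type) {
        // A sorted set is used because reflection does not guarantee any field order.
        TreeSet<String> columns = new TreeSet<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            // Skip fields that do not represent persistent state.
            if (field.isSynthetic() || Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                continue;
            }
            columns.add(field.getName().toLowerCase(Locale.ROOT));
        }
        return new ArrayList<>(columns);
    }

    public static void main(String[] args) {
        System.out.println(columnNames(CatalogCategory.class));
    }
}
```

For the stand-in class above this yields description, id, name and parentid, which match the columns of the catalog_category table.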

3.4. Start

Use the following class to start the node:

package com.gridgain.test;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class Starter {
    public static void main(String... args) throws Exception {
        final Ignite ignite = Ignition.start("apacheignite-cassandra.xml");

        ignite.cache("CatalogCategory").loadCache(null);
        ignite.cache("CatalogGood").loadCache(null);
    }
}

Here the Ignition.start(...) method starts an Ignite node, and the ignite.cache(...).loadCache(null) method preloads data from Cassandra into Ignite.

3.5. SQL

Once the Ignite cluster is started and connected to Cassandra, Ignite SQL queries can be executed using any client that supports JDBC or ODBC. This example uses SquirrelSQL. First, add the Ignite JDBC driver to the tool. Then establish a connection using a URL of the form jdbc:ignite://localhost/CatalogGood, where localhost is the address of a node in the Ignite cluster and CatalogGood is the cache that requests go to by default. Finally, several SQL queries can be executed:

SELECT cg.name goodName, cg.price goodPrice, cc.name category, pcc.name parentCategory
FROM catalog_category.CatalogCategory cc
  JOIN catalog_category.CatalogCategory pcc
  ON cc.parentId = pcc.id
  JOIN catalog_good.CatalogGood cg
  ON cg.categoryId = cc.id;

goodName         goodPrice  category          parentCategory
Fridge Buzzword  1000       Refrigerators     Appliances
Fridge Foobar    300        Refrigerators     Appliances
Fridge Barbaz    500000     Refrigerators     Appliances
Appliance Habr#  10000      Washing machines  Appliances

SELECT cc.name, AVG(cg.price) avgPrice
FROM catalog_category.CatalogCategory cc
  JOIN catalog_good.CatalogGood cg
  ON cg.categoryId = cc.id
WHERE cg.price <= 100000
GROUP BY cc.id;

name              avgPrice
Refrigerators     650
Washing machines  10000
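The same kind of query can also be issued from Java code over JDBC instead of a GUI client. The following is a minimal sketch under a few assumptions: the Ignite JDBC driver is on the classpath and registered with DriverManager, a node is reachable at the given host, and the caches have been loaded as in section 3.4. The class and method names are hypothetical; only the URL form and the SQL text come from the article.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch only: assumes the Ignite JDBC driver is available and a node is running.
class IgniteJdbcSketch {
    // The average-price query from this section, as a single SQL string.
    static final String AVG_PRICE_SQL =
        "SELECT cc.name, AVG(cg.price) avgPrice "
        + "FROM catalog_category.CatalogCategory cc "
        + "JOIN catalog_good.CatalogGood cg ON cg.categoryId = cc.id "
        + "WHERE cg.price <= 100000 "
        + "GROUP BY cc.id";

    // Builds a URL of the form used in this article: jdbc:ignite://<host>/<cache>.
    static String jdbcUrl(String host, String cacheName) {
        return "jdbc:ignite://" + host + "/" + cacheName;
    }

    // Runs the query and prints one line per category.
    static void printAveragePrices(String url) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(AVG_PRICE_SQL)) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " " + rs.getObject(2));
            }
        }
    }
}
```

A call such as IgniteJdbcSketch.printAveragePrices(IgniteJdbcSketch.jdbcUrl("localhost", "CatalogGood")) would then run the query against the local node.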

4. Conclusion

This simple example shows how introducing Ignite adds SQL functionality with in-memory performance to an existing Cassandra system. If you are running into the limitations of Cassandra described in this article, it is worth considering Ignite as a way to address them.

This article is translated from the blog of Artem Schitow, Business Architect at GridGain.
