Cassandra (1): Online installation of the NoSQL database

Apache Cassandra is an open source distributed NoSQL database system. It was originally developed at Facebook to store relatively simple data such as inbox messages. It combines the data model of Google BigTable with the fully distributed architecture of Amazon Dynamo. Facebook released Cassandra as open source in 2008. Since then, thanks to its good scalability, Cassandra has been adopted by well-known Web 2.0 sites such as Digg and Twitter, and it has become a popular solution for distributed structured data storage.

Highly available and scalable distributed database

1 Introduction to Cassandra

1.1 Core structure

(1) Node
Where data is stored. The node is the basic infrastructure component of Cassandra.
(2) Data center
A collection of related nodes. A data center can be physical or virtual. Different workloads should use separate data centers, whether physical or virtual. Replication is configured per data center. Using separate data centers prevents Cassandra transactions from being affected by other workloads and keeps requests close to each other to reduce latency. Depending on the replication factor, data can be written to multiple data centers. A data center must not span physical locations.
(3) Cluster
A cluster contains one or more data centers. It can span physical locations.
(4) Commit log
For durability, every write is recorded in the commit log first, before it is applied anywhere else (write-ahead logging). Once all of its data has been flushed to SSTables, a commit log segment can be archived, deleted, or recycled.
(5) SSTable (Sorted String Table)
An SSTable is an immutable data file to which Cassandra periodically writes memtables. SSTables are append-only, are stored on disk sequentially, and are maintained separately for each Cassandra table.
(6) CQL table
A collection of ordered columns fetched by table row. A table consists of columns and has a primary key.
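
The relationship between the commit log, the memtable and SSTables can be sketched in a few lines of Python. This is only a toy model to illustrate the order of operations described above, not how Cassandra is actually implemented; the class name TinyTable and the flush_threshold parameter are made up for this example.

```python
class TinyTable:
    """Toy model of one table's write path: commit log -> memtable -> SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write (durability)
        self.memtable = {}     # in-memory buffer holding the most recent writes
        self.sstables = []     # immutable, key-sorted snapshots standing in for disk files
        self.flush_threshold = flush_threshold  # hypothetical knob for this sketch

    def write(self, key, value):
        # Record the write in the commit log first, then apply it to the memtable.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified afterwards.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Everything in the log is now in an SSTable, so the log can be recycled.
        self.commit_log = []

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


table = TinyTable(flush_threshold=2)
table.write("b", 1)
table.write("a", 2)   # second write reaches the threshold and triggers a flush
table.write("a", 3)   # newer value stays in the memtable until the next flush
print(table.sstables, table.memtable, table.read("a"))
```

Because SSTables are never rewritten in place, the newer value for "a" lives alongside the older one until the real system merges files in the background (compaction, which this sketch leaves out).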

1.2 Core components

(1) Gossip
A peer-to-peer communication protocol used to discover and share location and state information about the other nodes in a Cassandra cluster. Each node also persists gossip information locally so that it can be used immediately when the node restarts.
(2) Partitioner
The partitioner determines which node receives the first replica of a piece of data and how the other replicas are distributed across the remaining nodes in the cluster. Each row is uniquely identified by a primary key. The primary key may be identical to the partition key, or it may also contain additional clustering columns. A partitioner is a hash function that derives a token from the partition key of a row. The partitioner uses the token value to determine which nodes in the cluster receive a replica of the row. Murmur3Partitioner is the default partitioner for new Cassandra clusters and is the right choice in almost all cases.
(3) Replication factor
The total number of replicas across the cluster. A replication factor of 1 means that there is only one copy of each row, on a single node. A replication factor of 2 means that there are two copies of each row, each on a different node. All replicas are equally important; there is no master replica. The replication factor is defined per data center. In general, the replication factor should be greater than 1 but should not exceed the number of nodes in the cluster.
(4) Replica placement strategy
Cassandra stores replicas of data on multiple nodes to ensure reliability and fault tolerance. The replication strategy determines which nodes the replicas are placed on. The first replica is simply the first copy of the data; it is not special in any way. NetworkTopologyStrategy is strongly recommended, because it makes it easy to expand to multiple data centers later. When creating a keyspace, you must define the replica placement strategy and the number of replicas required.
(5) Snitch
A snitch groups machines into data centers and racks (the topology), and the replication strategy uses this grouping to place replicas. A snitch must be configured when a cluster is created. All snitches use a dynamic snitch layer that monitors performance and chooses the best replica for reads. It is enabled by default and is recommended for most deployments. The dynamic snitch thresholds are configured per node in the cassandra.yaml configuration file.
(6) cassandra.yaml
The main configuration file. It sets the cluster initialization properties, table caching parameters, tuning and resource utilization properties, timeout settings, client connections, backups, and security.
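
How the partitioner and the replication factor work together can be illustrated with a small token ring. The sketch below makes several simplifying assumptions: MD5 stands in for Murmur3 (only because it is in Python's standard library), each node owns a single hand-picked token, and replicas are simply the next nodes clockwise on the ring, which resembles SimpleStrategy rather than the recommended NetworkTopologyStrategy (racks and data centers are ignored). The names TokenRing, token_for and node1 to node4 are invented for this example.

```python
import bisect
import hashlib


def token_for(partition_key):
    # Stand-in hash function. Cassandra's default partitioner is Murmur3Partitioner;
    # MD5 is used here only because it is available in the standard library.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # token in the range 0 .. 2**64 - 1


class TokenRing:
    def __init__(self, node_tokens):
        # node_tokens: hypothetical mapping of node name -> the token that node owns
        self.ring = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]

    def replicas(self, partition_key, replication_factor):
        if replication_factor > len(self.ring):
            raise ValueError("replication factor exceeds the number of nodes")
        # First replica: the node owning the first token >= the key's token,
        # wrapping around to the start of the ring if necessary.
        start = bisect.bisect_left(self.tokens, token_for(partition_key)) % len(self.ring)
        # Remaining replicas: the following nodes clockwise on the ring.
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(replication_factor)]


ring = TokenRing({
    "node1": 0,
    "node2": 2**62,
    "node3": 2**63,
    "node4": 3 * 2**62,
})
print(ring.replicas("user:42", 3))  # three distinct nodes, adjacent on the ring
```

Asking for more replicas than there are nodes raises an error in this sketch, which mirrors the advice above that the replication factor should not exceed the number of nodes in the cluster.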

2 Online installation

(1) Add the Cassandra repository to the yum sources
#cd /etc/yum.repos.d/
#vi cassandra.repo

[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS

(2) Installation
#yum -y install cassandra
If the download is slow, download the RPM package manually from the mirror below and install it locally.
https://mirrors.tuna.tsinghua.edu.cn/apache/cassandra/redhat/311x/
#yum install -y cassandra-3.11.9-1.noarch.rpm
After the installation completed, Cassandra was not registered as a system service. The machine had to be restarted for the registration to take effect.

3 Offline installation

Download address: http://cassandra.apache.org/
#wget https://mirrors.tuna.tsinghua.edu.cn/apache/cassandra/3.11.9/apache-cassandra-3.11.9-bin.tar.gz

4 Use

#systemctl start cassandra
Cassandra provides a REPL tool called cqlsh, an interactive command-line tool written in Python that makes it easy to create keyspaces and tables and to run CRUD operations.
#cqlsh
cqlsh> quit    (exit cqlsh)
cqlsh> help    (list the available commands)
For more detailed usage of Cassandra, see the following site.
https://www.w3cschool.cn/cassandra/
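
As a rough illustration of the keyspace, table and CRUD operations that cqlsh is used for, the Python sketch below only assembles CQL statements as text; it does not connect to Cassandra. The keyspace demo, the table users and the helper create_keyspace_cql are hypothetical, and the data center name datacenter1 is an assumption about a default single-node installation; check the output of nodetool status for the real name.

```python
def create_keyspace_cql(keyspace, replicas_per_datacenter):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for datacenter, count in replicas_per_datacenter.items():
        # One entry per data center: its name and its replication factor.
        parts.append("'%s': %d" % (datacenter, int(count)))
    options = ", ".join(parts)
    return ("CREATE KEYSPACE IF NOT EXISTS " + keyspace
            + " WITH replication = {" + options + "};")


statements = [
    # 'datacenter1' is assumed here; replace it with your data center name.
    create_keyspace_cql("demo", {"datacenter1": 1}),
    "CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text);",
    "INSERT INTO demo.users (id, name) VALUES (1, 'alice');",   # create
    "SELECT id, name FROM demo.users WHERE id = 1;",            # read
    "UPDATE demo.users SET name = 'bob' WHERE id = 1;",         # update
    "DELETE FROM demo.users WHERE id = 1;",                     # delete
]

print("\n".join(statements))
```

The printed statements can be pasted into the cqlsh prompt one by one, or saved to a file and run with cqlsh -f.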

Origin blog.csdn.net/qq_20466211/article/details/112291509