Link tracing with Jaeger using Cassandra

cassandra architecture

Reprinted from:  Cassandra Internal Architecture - Golden Fish - Blog Garden

Cassandra is an open-source, distributed, masterless (no central node), elastically scalable, highly available, fault-tolerant, tunably consistent, column-oriented NoSQL database.

Cassandra cluster (Cluster)

  • Cluster
    • Data center(s)
      • Rack(s)
        • Server(s)
          • Node (more accurately, a vnode)

  • Node: a running Cassandra instance
  • Rack: a collection of nodes
  • DataCenter: a collection of racks
  • Cluster: the full set of nodes that together own a complete token ring

 Coordinator

When a client connects to a node and issues a read or write request, that node acts as a bridge between the client application and the cluster. It is called the coordinator, and based on the cluster configuration it determines which node(s) in the ring own the data for the request.

The node that a CQL connection is made to is the coordinator node.

  • Any node in the cluster may become the coordinator
  • Each client request may be coordinated by a different node
  • The replication factor is managed by the coordinator (replication factor: how many nodes each piece of data is replicated to)
  • The coordinator applies the consistency level (consistency level: how many nodes in the cluster must respond to a read or write request)
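
As a small illustration of the point that any node can coordinate a request, the sketch below connects cqlsh to two different nodes of the same cluster and runs the same query; the host names are placeholders.

# Any node can act as coordinator: point cqlsh at different nodes
# of the same cluster (node1/node2 are placeholder host names).
cqlsh node1.example.com 9042 -e "SELECT cluster_name, data_center, rack FROM system.local;"
cqlsh node2.example.com 9042 -e "SELECT cluster_name, data_center, rack FROM system.local;"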

Partitioner

The partitioner determines how data is distributed within the cluster. In Cassandra, each row of a table is identified by a unique primary key; the partitioner is essentially a hash function that computes a token from the primary key. Cassandra places the row on nodes in the cluster based on this token value.

Cassandra provides three different partitioners

  • Murmur3Partitioner (default) - based on the MurmurHash hash value to evenly distribute data in the cluster 
  • RandomPartitioner - Evenly distributes data in the cluster based on MD5 hash value
  • ByteOrderedPartitioner - maintains an ordered (lexical, by the bytes of the key) distribution of data
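
A minimal sketch of how the partitioner's output can be inspected from cqlsh with the built-in token() function; the keyspace, table, and column names here are hypothetical, and nodetool shows the token ranges each node owns.

# Token computed by the (default Murmur3) partitioner for each primary key
# (my_ks.users and user_id are hypothetical names).
cqlsh node1.example.com 9042 -e "SELECT user_id, token(user_id) FROM my_ks.users LIMIT 5;"

# Token ranges owned by each node in the ring
nodetool ring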

Virtual node (Vnode)

Each virtual node corresponds to a token value; each token determines a position in the ring and the contiguous range of data hash values that the node is responsible for. Together these contiguous token ranges form a closed ring.

Without virtual nodes, the number of tokens in the ring equals the number of machines in the cluster. For example, with 6 nodes in total there are 6 tokens. With a replication factor of 3, each record must exist on three nodes in the cluster. The simple scheme is: compute the hash of the row key and find which token range on the ring it falls into; the first copy of the data goes to that node, and the remaining two copies go to the next two nodes after it on the token ring.

In the figure, A, B, C, D, E and F are key ranges, and the actual values form the hash ring space. For example, the interval from 0 to 2^32 is divided into 10 parts, each segment being 1/10 of 2^32. Node 1 containing A, F and E means that data whose keys fall into ranges A, F or E will be stored on node 1, and so on.

Without virtual nodes, a token must be calculated and assigned to each node in the cluster by hand. Each token determines the node's position in the ring and the contiguous range of data hash values that node is responsible for. In the upper part of the figure, each node is assigned a single token representing one position in the ring; each node stores the rows whose keys hash into the single contiguous range of hash values that the node owns. Each node also holds copies of rows from other nodes.

Using virtual nodes gives each node multiple, smaller, non-contiguous hash value ranges. In the lower part of the figure, the nodes in the cluster use virtual nodes; the virtual tokens are assigned randomly and are not contiguous. The storage location of a row is still determined by the hash of its row key, but it falls into one of these smaller partition ranges.

Benefits of using virtual nodes

  • No need to calculate and allocate tokens for each node
  • No need to rebalance the cluster load after adding or removing nodes
  • Failed nodes can be rebuilt faster
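
The number of virtual nodes per machine is controlled by num_tokens in cassandra.yaml; the sketch below simply checks that setting and the resulting token ownership (the config path assumes a default package install).

# How many virtual nodes (tokens) this node owns
# (path assumes a default package install of Cassandra)
grep -E '^num_tokens' /etc/cassandra/cassandra.yaml

# Per-node token count and effective ownership across the ring
nodetool status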

Replication

There are currently two replication strategies available:

  • SimpleStrategy: Only for a single data center. The first replica is placed on the node determined by the partitioner, and the remaining replicas are placed on the following nodes clockwise around the ring.

  • NetworkTopologyStrategy: Can be used for more complex multi-data centers. You can specify how many replicas are stored in each data center.

The replication strategy is specified when creating the keyspace, such as

CREATE KEYSPACE Excelsior WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
CREATE KEYSPACE Excalibur WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 };

Here the data center names dc1 and dc2 must match the names configured in the snitch. The topology strategy above stores 3 replicas in dc1 and 2 replicas in dc2.
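
To double-check that the data center names used in the keyspace definition match what the snitch reports, something like the following can be used (keyspace name as in the example above):

# Data center and rack names as seen by the snitch
nodetool status

# Replication settings actually stored for the keyspace
cqlsh node1.example.com 9042 -e "DESCRIBE KEYSPACE Excalibur;"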

gossip

The Gossip protocol is the internal communication technique that nodes in a cluster use to talk to each other. Gossip is an efficient, lightweight and reliable broadcast protocol for exchanging data between nodes; it is a decentralized, fault-tolerant, peer-to-peer communication protocol. Cassandra uses gossiping for peer discovery and metadata propagation.

In practice, the gossip protocol shows up as the seeds list in the configuration file. Note that all nodes in the same cluster should use the same seed list; otherwise the cluster may split, i.e. two clusters can appear. The seed nodes are generally started first so that the other nodes in the cluster are discovered as early as possible. Each node exchanges information with other nodes, and because of the randomness and probability involved, eventually every node in the cluster is reached; at the same time every node keeps track of all the other nodes in the cluster. As a result, whichever node you connect to, you can learn about all the other nodes. For example, when cql connects to one node of the cluster, it can obtain the status of every node. In other words, every node's view of the cluster membership state should be consistent.

The gossip process runs once per second and exchanges information with up to three other nodes, so that all nodes quickly learn about the rest of the cluster. Because the process is fully decentralized, there is no node coordinating the gossip of the others: each node independently selects one to three nodes to gossip with. It always prefers live peers in the cluster, and probabilistically also selects seed nodes and unreachable nodes.

A gossip exchange is similar to TCP's three-way handshake. With a conventional broadcast protocol there is only one message per round and the information spreads gradually through the cluster, whereas a gossip exchange consists of three messages, which adds a certain degree of anti-entropy. This allows the two nodes exchanging data to converge quickly.

First, the system needs a few seed nodes configured, say A and B. Each participating node maintains the state of all nodes as node -> (Key, Value, Version), where a larger version number means newer data. A node P can only directly update its own state; the state it holds locally for other nodes can only be updated indirectly through the gossip protocol.

The general process is as follows,

SYN: Node A randomly selects some nodes. It may choose to send only a digest, i.e. without the values, to keep the message from getting too large.

ACK: When node B receives the message, it merges it with its local state. The merge compares versions, and the larger version wins because its data is newer. For example, node A sends C(key, valuea, 2) to node B, while node B holds C(key, valueb, 3) locally; since B's version is newer, the merged data is B's local copy, which is then sent back to node A.

ACK2: Node A receives the ACK message and applies it to its local state.
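
The per-node state that gossip propagates (generation, heartbeat, status, schema version and so on) can be inspected with nodetool; a minimal sketch:

# Dump the application state this node has learned about every peer via gossip
nodetool gossipinfo

# Schema versions agreed on across the cluster (also propagated via gossip)
nodetool describecluster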

Consistency Level

Consistency refers to whether all replicas of a row of data are up to date and in sync. Cassandra extends the concept of eventual consistency with tunable consistency: for each read or write operation, the client that initiates the request can use the consistency level parameter to specify how much consistency that request requires.

Write Consistency

  • ANY: The write has succeeded on at least one node. Even if all replica nodes are down, the write can still return success after a hinted handoff event is recorded; in that case the data cannot be read until the downed replica nodes recover. Usage: minimal latency and write requests that never fail; provides the lowest consistency and highest availability of all levels.
  • ALL: The write must reach the commit log and memtable of all replica nodes for the row. Usage: provides the highest consistency and lowest availability of all levels.
  • EACH_QUORUM: The write must reach the commit log and memtable of a quorum of replica nodes in each data center. Usage: strictly guarantees the same level of consistency across a multi-data-center cluster; for example, use it if you want the write to fail whenever a data center is down or a quorum of its replica nodes cannot acknowledge the write.
  • LOCAL_ONE: The write has succeeded on at least one replica node in the local data center. Usage: in multi-data-center setups where at least one local replica should be written but no cross-data-center communication is wanted.
  • LOCAL_QUORUM: The write has succeeded on a quorum of replica nodes in the local data center. Usage: avoids cross-data-center communication and guarantees consistency within the local data center; cannot be used with SimpleStrategy.
  • LOCAL_SERIAL: A quorum of replica nodes in the local data center have conditionally accepted the write. Usage: used by lightweight transactions to achieve linearizable consistency and avoid unconditional updates.
  • ONE: The write has succeeded on at least one replica node. Usage: meets the needs of most users; the replica node closest to the coordinator is generally written first.

Notice:

Even if consistency level ONE or LOCAL_QUORUM is specified, the write is still sent to all replica nodes, including replicas in other data centers. The consistency level only determines how many replica nodes must acknowledge a successful write before the client is told that the request succeeded.
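
For quick experiments with these levels, cqlsh can set the consistency level for the current session; a small sketch (the keyspace and table are hypothetical):

# Set the session consistency level in cqlsh and issue a write
# (my_ks.users is a hypothetical table).
cqlsh node1.example.com 9042 <<'EOF'
CONSISTENCY LOCAL_QUORUM;
INSERT INTO my_ks.users (user_id, name) VALUES (uuid(), 'test');
EOF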

Read Consistency

  • ALL: Queries all replica nodes and returns the record with the latest timestamp among all replicas. If any replica node does not respond, the read fails. Usage: provides the highest consistency and lowest availability of all levels.
  • EACH_QUORUM: Queries a quorum of replica nodes in each data center and returns the record with the latest timestamp. Usage: same as LOCAL_QUORUM.
  • LOCAL_SERIAL: Same as SERIAL, but restricted to the local data center. Usage: same as SERIAL.
  • LOCAL_QUORUM: Queries a quorum of replica nodes in the local data center and returns the record with the latest timestamp. Usage: avoids cross-data-center communication; fails when used with SimpleStrategy.
  • LOCAL_ONE: Returns data from the replica node closest to the coordinator in the local data center. Usage: same as this level for write operations.
  • ONE: Returns the result from the closest replica as determined by the snitch. By default, read repair is triggered in the background to bring the other replicas up to date. Usage: provides the highest availability, but the result returned is not necessarily the most recent.
  • QUORUM: Queries a quorum of nodes across all data centers and returns the record with the latest timestamp after merging the results. Usage: guarantees strong consistency, although reads may fail if a quorum is unavailable.
  • SERIAL: Allows reading current (including uncommitted) data; if an uncommitted lightweight transaction is found during the read, it is committed first. Usage: lightweight transactions.
  • TWO: Returns the most recent data from the two closest replicas. Usage: similar to ONE.
  • THREE: Returns the most recent data from the three closest replicas. Usage: similar to TWO.

About the QUORUM level

The QUORUM level ensures that data is written to the required quorum number of nodes. A quorum is computed with the following formula, rounded down to a whole number:

(sum_of_replication_factors / 2) + 1

sum_of_replication_factors refers to the sum of all replication_factor settings for each data center.

For example, if the replication factor of a single data center is 3, the quorum value is 2 - indicating that the cluster can tolerate up to 1 node down. If the replication factor is 6, the quorum value is 4 - indicating that the cluster can tolerate up to 2 nodes down. If there are two data centers, the replication factor of each data center is 3, and the quorum value is 4 - indicating that the cluster can tolerate up to 2 nodes down. If there are 5 data centers, the replication factor of each data center is 3, and the quorum value is 8.

To guarantee strong read/write consistency, the following condition must hold:

(nodes_written + nodes_read) > replication_factor

For example, if the application both reads and writes at the QUORUM level and the replication factor is 3, then every write is acknowledged by at least 2 nodes and every read queries at least 2 nodes. The number of nodes read plus the number of nodes written (4) is larger than the replication factor (3), so consistency is guaranteed.
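
A tiny sketch of the two formulas above in shell arithmetic; integer division already rounds down, which matches how the quorum is computed:

#!/bin/bash
# Quorum: (sum of replication factors) / 2 + 1, rounded down
sum_rf=6                        # e.g. two data centers with RF=3 each
quorum=$(( sum_rf / 2 + 1 ))    # -> 4
echo "quorum = $quorum"

# Strong consistency check: nodes_written + nodes_read > replication_factor
rf=3; nodes_written=2; nodes_read=2   # QUORUM reads and writes with RF=3
if (( nodes_written + nodes_read > rf )); then
    echo "reads are guaranteed to see the latest write"
fi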

docker installation

#!/bin/bash

# Automated Docker installation script

apt-get update
# Install prerequisite packages
apt-get -y install ca-certificates curl gnupg lsb-release
# Add the repository GPG key
curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
# Add the package repository
add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
# Install Docker
apt-get -y install docker-ce docker-ce-cli containerd.io
# Install required tools
apt-get -y install apt-transport-https ca-certificates curl software-properties-common

# Create the docker group
groupadd docker

# Write /etc/docker/daemon.json (create it if it does not exist)
content='{
	"registry-mirrors": [
		"https://hub-mirror.c.163.com"
	]
}'
filedir="/etc/docker"
filename="/etc/docker/daemon.json"

if [[ ! -d $filedir ]]; then
	mkdir -p $filedir
fi

if [[ ! -e $filename ]]; then
	echo "creating $filename"
	touch $filename
	echo "$content" > $filename   # quote to preserve the newlines in the JSON
else
	echo "$filename already exists"
fi
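
Two follow-up steps the script above does not perform, sketched here under the assumption of a systemd-based Ubuntu host: restart Docker so the daemon.json mirror configuration takes effect, and add a user to the docker group.

# Reload and restart Docker so the registry-mirrors setting takes effect
systemctl daemon-reload
systemctl restart docker

# Allow a user to run docker without sudo (takes effect after re-login);
# replace <your-user> with the login that should use Docker
usermod -aG docker <your-user>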

cassandra deployment

It is easier to deploy with Docker, since the image already contains the Java environment and Python.

root@xxxxxxxxx:~/cassandra# cat deploy.sh 
#!/bin/bash

# Get the IP address of this host's eth0 interface
ip=$(ifconfig eth0 | grep 'inet ' | awk -F' ' '{print $2}')

docker run  \
	-e CASSANDRA_SEEDS=10.0.25.55 \
	-e CASSANDRA_BROADCAST_ADDRESS=$ip \
	-e CASSANDRA_DC=DC1 \
	-e CASSANDRA_RACK=RAC1 \
	-e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch \
	-p 7000:7000 -p 9042:9042 -p 7199:7199 \
	-v /var/lib/cassandra:/var/lib/cassandra \
	-v /var/log/cassandra:/var/log/cassandra \
	cassandra:4.0.4

Port 7000 is used for inter-node cluster communication (port 7001 if SSL is enabled).
Port 9042 is used for client connections over the native protocol.
Port 7199 is used for JMX.
Port 9160 was used by the now-obsolete Thrift interface.

-v: bind-mounts a volume, in the form host path : path inside the container
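
Once the container is running, the cluster state can be verified from inside it; a minimal sketch (the container ID comes from docker ps):

# Find the Cassandra container and check that the node has joined the ring
docker ps --filter ancestor=cassandra:4.0.4
docker exec -it <container_id> nodetool status    # replace <container_id> with the real ID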

Modify the Jaeger yaml file

root@iZwz9cb01zyiaqwmuvtd04Z:~# cat prod-with-kafka.yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: prod
spec:
  strategy: streaming
  collector:
    options:
      kafka:
        producer:
          topic: jaeger-test
          brokers: xxxxx:9092,xxxxx:9092,xxxxx:9092
          batch-size: 5120000
          batch-linger: 1s
          batch-max-messages: 20
#           authentication
#           plaintext:
#             username:
#             password:
  ingester:
    options:
      kafka:
        consumer:
          topic: jaeger-test
          brokers: xxxxx:9092,xxxx:9092,xxxxx:9092
      ingester:
        deadlockInterval: 0
  agent:
    resources:
      limits:
        cpu: "0.1"
        memory: 128Mi
      requests:
        cpu: '0.01'
        memory: 64Mi
        #  storage:
        #    type: elasticsearch
        #    options:
        #      es:
        #        server-urls: http://es-cn-xxxxxxx.elasticsearch.aliyuncs.com:9200
        #        username: elastic
        #        password: Super111$
  storage:
    type: cassandra
    options:
      cassandra:
        servers: xxxxxx
        port: 9042
        username: cassandra
        password: cassandra
        keyspace: jaeger_v1_test
    cassandraCreateSchema:
      datacenter: "DC1"
      mode: "test"

References

1. Running docker run -e SPAN_STORAGE_TYPE=cassandra jaegertracing/jaeger-collector:1.34 --help prints the options that can be used in the Jaeger yaml configuration

2. Operator for Kubernetes — Jaeger documentation (the official Jaeger documentation)

Origin blog.csdn.net/Horsdy123/article/details/124883890