The most important thing to know in Cassandra data modeling: the primary key

Relational data modeling can also start from the primary key, but an effective RDBMS data model is mostly about foreign key relationships and relational constraints between tables. Because Cassandra does not support JOINs, creating a data model is structurally much less complex. The trade-off for that simplicity in Apache Cassandra is that you must understand your queries and data access patterns in advance.

1. Simple primary key:

Example: student_id is the primary key of the person table.

create table person (student_id int primary key, fname text, lname text, 
                     dateofbirth timestamp, email text, phone text );

2. Composite key

  • C1: the primary key has only a partition key (C1) and no clustering key.
  • (C1, C2): column C1 is the partition key and column C2 is the clustering key.
  • (C1, C2, C3, ...): column C1 is the partition key; columns C2, C3, ... form the clustering key.
  • (C1, (C2, C3, ...)): same as the previous form, i.e. column C1 is the partition key and columns C2, C3, ... form the clustering key; the inner parentheses only group the clustering columns notationally.
  • ((C1, C2, ...), (C3, C4, ...)): columns C1, C2, ... form the partition key; columns C3, C4, ... form the clustering key.

It is important to note that when the compound key is written as (C1, C2, C3), the first column, C1, becomes the partition key and the remaining columns become the clustering key. To get a composite partition key, the partition key columns must be wrapped in their own parentheses, for example: ((C1, C2), C3, C4). In this case, C1 and C2 together form the partition key, while C3 and C4 form the clustering key.
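The parenthesization rule can be sketched in a few lines of Python. The helper name split_primary_key is hypothetical and it is not a CQL parser; it only covers the forms listed above and shows which columns end up in the partition key and which in the clustering key.

```python
def split_primary_key(spec: str):
    """Split a PRIMARY KEY column spec into (partition_key, clustering_key).

    Hypothetical helper for illustration only. It handles specs such as
    "C1", "(C1, C2, C3)" and "((C1, C2), C3, C4)".
    """
    spec = spec.strip()
    # Drop the outer parentheses of the PRIMARY KEY (...) clause, if present.
    if spec.startswith("(") and spec.endswith(")"):
        spec = spec[1:-1].strip()
    if spec.startswith("("):
        # Composite partition key: everything inside the inner parentheses.
        close = spec.index(")")
        partition = [c.strip() for c in spec[1:close].split(",")]
        rest = spec[close + 1:]
    else:
        # Single-column partition key: only the first column.
        first, _, rest = spec.partition(",")
        partition = [first.strip()]
    # Whatever remains is the clustering key; stray grouping parentheses
    # around clustering columns are ignored.
    clustering = [c.strip(" ()") for c in rest.split(",") if c.strip(" ()")]
    return partition, clustering


print(split_primary_key("(C1, C2, C3)"))
print(split_primary_key("((C1, C2), C3, C4)"))
```

The first call treats only C1 as the partition key; the second treats C1 and C2 together as the partition key because of the extra parentheses.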

1. Partition key

The purpose of the partition key is to identify the partition, and therefore the node in the cluster, that stores a row. When data is read from or written to the cluster, a function called the partitioner computes a hash value of the partition key. This hash value determines the node that contains the row or partition.

For example, rows whose partition key hash falls in the range 1000 to 1234 may reside on node A, and rows whose hash falls in the range 1235 to 2000 may reside on node B, as shown in the figure. A row whose partition key hashes to 1233 is therefore stored on node A.
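As a rough illustration of this lookup, the Python sketch below maps a hash value to a node using the two ranges from the example. The names toy_token and node_for_token are made up for this sketch, and the CRC32-based hash is only a stand-in: a real cluster uses its configured partitioner and a much larger token space.

```python
import bisect
import zlib

# Inclusive upper bound of each token range and the node that owns it.
# The values mirror the example in the text and are purely illustrative.
RANGE_START = 1000
RANGE_ENDS = [1234, 2000]
NODES = ["A", "B"]


def toy_token(partition_key) -> int:
    """Stand-in partitioner: hash the key into the range 1000..2000."""
    digest = zlib.crc32(str(partition_key).encode("utf-8"))
    return RANGE_START + digest % (RANGE_ENDS[-1] - RANGE_START + 1)


def node_for_token(token: int) -> str:
    """Return the node whose token range contains the given hash value."""
    if not RANGE_START <= token <= RANGE_ENDS[-1]:
        raise ValueError(f"token {token} is outside the example ring")
    # bisect_left finds the first range whose upper bound is >= token.
    return NODES[bisect.bisect_left(RANGE_ENDS, token)]


print(node_for_token(1233))                 # the value used in the text
print(node_for_token(toy_token("user-42"))) # some key, hashed first
```

Every row with the same partition key hashes to the same token, so all of its rows land on the same node.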

2. Clustering key

The purpose of the clustering key is to store the rows of a partition in sorted order. The data is sorted on the columns included in the clustering key. This arrangement makes retrieving data by clustering key efficient.
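To see why sorted storage makes retrieval by clustering key cheap, here is a minimal Python sketch of a single partition. The Partition class and its method names are hypothetical and say nothing about Cassandra's actual storage format; the point is only that a range lookup on sorted keys becomes two binary searches plus a contiguous slice.

```python
import bisect


class Partition:
    """Toy model of one partition whose rows stay sorted by clustering key."""

    def __init__(self):
        self.keys = []   # clustering keys, always kept in ascending order
        self.rows = {}   # clustering key -> row

    def insert(self, clustering_key, row):
        # A new key is placed at its sorted position; an existing key
        # simply has its row replaced.
        if clustering_key not in self.rows:
            bisect.insort(self.keys, clustering_key)
        self.rows[clustering_key] = row

    def slice(self, low, high):
        """Rows with low <= clustering key <= high, in clustering order."""
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return [self.rows[k] for k in self.keys[start:end]]


p = Partition()
p.insert("2020-03-01", "final")
p.insert("2020-01-15", "quiz")
p.insert("2020-02-10", "midterm")
print(p.slice("2020-01-01", "2020-02-28"))
```

The rows were inserted out of order, yet the slice comes back ordered by clustering key without any sort at read time.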

Example 1

CREATE TABLE user_videos ( userid uuid, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (userid, added_date, videoid) ) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
Partition key: userid; clustering order: added_date DESC, videoid ASC


SELECT * FROM user_videos WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89 LIMIT 10;

Example 2

create table marks(stuid int,exam_date timestamp,marks float, exam_name text, 
                   primary key (stuid,exam_date));

Partition key: stuid; the clustering key exam_date is sorted in ascending order by default.

Returning to the user_videos table from Example 1:
SELECT * FROM user_videos WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89 LIMIT 10;
This query asks for "the last 10 videos uploaded by a user". Simply adding the CLUSTERING ORDER BY clause to the table definition makes it a very fast and efficient query.
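The effect of that clustering order on the LIMIT 10 query can be imitated in Python. This is only a sketch: the function latest_videos and the sample rows are invented for illustration, and it sorts at read time for brevity, whereas Cassandra keeps the partition in clustering order as rows are written.

```python
from datetime import datetime, timedelta
from uuid import UUID


def latest_videos(rows, userid, limit=10):
    """Imitate: SELECT * FROM user_videos WHERE userid = ? LIMIT ?"""
    # Restrict to a single partition (one userid).
    partition = [r for r in rows if r["userid"] == userid]
    # Two stable sorts give added_date DESC, then videoid ASC for ties.
    partition.sort(key=lambda r: r["videoid"])
    partition.sort(key=lambda r: r["added_date"], reverse=True)
    return partition[:limit]


user = UUID("522b1fe2-2e36-4cef-a667-cd4237d08b89")
start = datetime(2020, 1, 1)
# Made-up sample data: 12 videos uploaded on consecutive days.
rows = [
    {"userid": user, "added_date": start + timedelta(days=i),
     "videoid": UUID(int=i), "name": f"video {i}"}
    for i in range(12)
]

for row in latest_videos(rows, user):
    print(row["added_date"].date(), row["name"])
```

Because the newest rows sit at the front of the partition, taking the first 10 is enough; no ORDER BY is needed in the query itself.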

This may look like premature optimization, but the use cases that this feature enables are very compelling.

Conclusion

The complexity trade-off with Apache Cassandra is that you must understand your queries and data access patterns in advance. (This is a reflection of an anti-pattern.)

Reference article

https://dzone.com/articles/cassandra-data-modeling-primary-clustering-partiti
https://www.datastax.com/blog/2016/02/most-important-thing-know-cassandra-data-modeling-primary-key

Origin www.cnblogs.com/victor2302/p/12173424.html