Getting Started with DynamoDB and Some Notes

What is DynamoDB

Amazon DynamoDB is a fully managed, distributed NoSQL database service that offers seamless scaling and reliable, predictable performance.

DynamoDB Basic Concepts

Composition of DynamoDB

  • Table: A "table" stores DynamoDB data and is similar to a table in a relational database.
  • Items: A table holds zero or more items; an item is similar to a "row" in a relational database. DynamoDB places no limit on the number of items in a table.
  • Attributes: An item is made up of one or more attributes, which are similar to "fields" (columns) in a relational database. Apart from the primary key, tables in DynamoDB are schema-free, so items in the same table can have different attributes, sizes, and data types.

DynamoDB primary key

A primary key uniquely identifies each item in a DynamoDB table and must be specified when the table is created. DynamoDB supports two types of primary key:

  • Partition key only: the primary key consists of a single partition key attribute, which both uniquely identifies an item and determines which partition it is stored in. DynamoDB applies an internal hash function to the partition key value and uses the result to decide which partition stores the item.
  • Partition key + sort key: a sort key is added to the partition key, and the two attributes together form a composite primary key. As before, the partition key determines which partition an item is stored in, while the sort key determines the order of items sharing the same partition key within that partition.
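For concreteness, here is a minimal sketch of the parameters you would pass to boto3's `create_table` to define a table with a composite primary key. The table and attribute names are hypothetical.

```python
# A minimal parameter set for creating a DynamoDB table with a composite
# primary key (partition key + sort key). Table and attribute names are
# made up for illustration; in practice you would pass this dict to
# boto3: boto3.client("dynamodb").create_table(**params)
params = {
    "TableName": "Music",
    "KeySchema": [
        {"AttributeName": "Artist", "KeyType": "HASH"},      # partition key
        {"AttributeName": "SongTitle", "KeyType": "RANGE"},  # sort key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "Artist", "AttributeType": "S"},   # S = string
        {"AttributeName": "SongTitle", "AttributeType": "S"},
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}
```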

DynamoDB Streams

DynamoDB Streams can capture data-modification events on a table; records are written to the stream in the order in which the events occurred. Stream records have a lifetime of 24 hours from the moment they are written, after which they are automatically deleted from the stream.

The following events write a record to the stream:

  • When a new item is added to the table, the stream captures an image of the entire item, including all of its attributes.
  • When an item is updated, the stream captures the item's attributes before and/or after the modification (depending on the stream view type).
  • When an item is deleted, the stream captures an image of the item as it was before deletion.
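The three event types above surface in stream records roughly as follows. This sketch assumes the `NEW_AND_OLD_IMAGES` stream view type, and all attribute names and values are made-up examples.

```python
# Illustrative DynamoDB stream records for the three event types,
# assuming StreamViewType = NEW_AND_OLD_IMAGES (values are invented).
insert_record = {
    "eventName": "INSERT",
    "dynamodb": {  # a new item: only the new image exists
        "NewImage": {"id": {"S": "42"}, "name": {"S": "alice"}},
    },
}
modify_record = {
    "eventName": "MODIFY",
    "dynamodb": {  # an update: images before and after the change
        "OldImage": {"id": {"S": "42"}, "name": {"S": "alice"}},
        "NewImage": {"id": {"S": "42"}, "name": {"S": "bob"}},
    },
}
remove_record = {
    "eventName": "REMOVE",
    "dynamodb": {  # a delete: only the pre-deletion image exists
        "OldImage": {"id": {"S": "42"}, "name": {"S": "bob"}},
    },
}
```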

DynamoDB Secondary Index

DynamoDB provides local secondary indexes and global secondary indexes, and there are certain differences between the two.

|  | Local secondary index | Global secondary index |
| --- | --- | --- |
| Concept | "Local" means the scope of the index is limited to items that share the same partition key value; queries on the index can only read within a single partition. | "Global" means a query on the index can span all data in all partitions of the base table. |
| Consistency | Supports strong consistency | Supports only eventual consistency |
| Limit | At most 5 local secondary indexes per table | At most 5 global secondary indexes per table |
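As a sketch of how the two index types are declared at table-creation time (for a hypothetical table whose primary key is `UserId` + `OrderId`; index and attribute names are made up): a local secondary index must reuse the table's partition key, while a global secondary index may use any attribute and, in provisioned mode, carries its own throughput settings.

```python
# Hypothetical index definitions for boto3 create_table. A local
# secondary index reuses the base table's partition key (UserId here);
# a global secondary index may key on any attribute.
local_secondary_indexes = [{
    "IndexName": "ByOrderDate",
    "KeySchema": [
        {"AttributeName": "UserId", "KeyType": "HASH"},     # same partition key
        {"AttributeName": "OrderDate", "KeyType": "RANGE"}, # alternate sort key
    ],
    "Projection": {"ProjectionType": "ALL"},
}]
global_secondary_indexes = [{
    "IndexName": "ByStatus",
    "KeySchema": [
        {"AttributeName": "Status", "KeyType": "HASH"},     # any attribute
    ],
    "Projection": {"ProjectionType": "KEYS_ONLY"},
    # a GSI has its own provisioned throughput, separate from the table's
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}]
```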

DynamoDB read and write limits

When creating a table, you must specify its read and write throughput. In production, if the actual traffic exceeds the throughput currently provisioned for DynamoDB, requests are throttled; once the client's retries are exhausted (assuming the DynamoDB client is configured with a retry mechanism), an exception is thrown.

  • Read capacity units: read throughput depends on the size of the item and on whether eventual or strong consistency is required.
    One read capacity unit = two eventually consistent reads per second, for items up to 4 KB
    One read capacity unit = one strongly consistent read per second, for an item up to 4 KB
    Reading an item larger than 4 KB consumes additional read capacity units.
  • Write capacity units: write throughput depends on the size of the item being written.
    One write capacity unit = one write per second, for an item up to 1 KB
    Writing an item larger than 1 KB consumes additional write capacity units.
  • Other limits: DynamoDB caps item size at 400 KB; an item may not exceed this limit. Note also that larger items consume proportionally more capacity units per read or write.

Once these limits are exceeded, requests are throttled. An application cannot always recover from throttling automatically, so set the provisioned throughput and design item sizes sensibly.
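The per-request consumption rules above can be sketched as follows (a minimal illustration, assuming item sizes are given in KB):

```python
import math

def read_units_consumed(item_size_kb: float, strongly_consistent: bool = True) -> float:
    """Read capacity units consumed by a single read of one item."""
    units = math.ceil(item_size_kb / 4)       # one unit per 4 KB block
    return units if strongly_consistent else units / 2  # eventual = half

def write_units_consumed(item_size_kb: float) -> int:
    """Write capacity units consumed by a single write of one item."""
    return math.ceil(item_size_kb / 1)        # one unit per 1 KB block

print(read_units_consumed(3))                             # strong 3 KB read -> 1
print(read_units_consumed(3, strongly_consistent=False))  # eventual -> 0.5
print(write_units_consumed(2.5))                          # 2.5 KB write -> 3
```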

DynamoDB throughput provisioning value calculation

Throughput can be specified when the table is created, and for a live table it can also be adjusted dynamically through the AWS Management Console or the provided API; the API approach is the more flexible of the two. Note: DynamoDB allows only 4 downward adjustments per table per day, but unlimited upward adjustments.

These provisioned values can be calculated. If you can estimate your business's read and write volume, use the following formulas:

  • Strongly consistent read capacity = round up (item size / 4 KB) * estimated reads per second.
    For example: with strongly consistent reads, an item size of 3 KB, and an expected 80 reads per second:
    3 KB / 4 KB = 0.75, rounded up = 1
    1 * 80 = 80 read capacity units
  • Eventually consistent read capacity: computed the same way as strongly consistent read capacity, then the result is divided by 2 (one read capacity unit supports two eventually consistent reads per second).
  • Write capacity = round up (item size / 1 KB) * estimated writes per second.
    For example: with an item size of 512 bytes and an expected 100 writes per second:
    512 bytes / 1 KB = 0.5, rounded up = 1
    1 * 100 = 100 write capacity units
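These formulas translate directly into code. The helpers below reproduce the worked examples; note that an eventually consistent read needs only half the strongly consistent capacity, since one read capacity unit covers two eventually consistent reads per second.

```python
import math

def strong_read_capacity(item_size_kb: float, reads_per_sec: int) -> int:
    """Provisioned RCUs for strongly consistent reads."""
    return math.ceil(item_size_kb / 4) * reads_per_sec

def eventual_read_capacity(item_size_kb: float, reads_per_sec: int) -> int:
    """Provisioned RCUs for eventually consistent reads (half of strong)."""
    return math.ceil(strong_read_capacity(item_size_kb, reads_per_sec) / 2)

def write_capacity(item_size_kb: float, writes_per_sec: int) -> int:
    """Provisioned WCUs: one unit per 1 KB block, per write per second."""
    return math.ceil(item_size_kb / 1) * writes_per_sec

print(strong_read_capacity(3, 80))    # 3 KB items, 80 reads/s    -> 80
print(eventual_read_capacity(3, 80))  # same load, eventual reads -> 40
print(write_capacity(0.5, 100))       # 512 B items, 100 writes/s -> 100
```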

DynamoDB Partitioning

Calculation of the initial number of partitions

When storing data, DynamoDB divides a table's items across multiple partitions, which are backed by SSDs. Which partition an item lands in is determined primarily by its partition key value.

Partition management is handled entirely by the DynamoDB service, including the initial number of partitions for a table and partition splitting.

First, note that a single DynamoDB partition can store about 10 GB of data and supports up to 3,000 read capacity units and 1,000 write capacity units.

When a table is created, the number of partitions is initialized according to the provisioned read and write throughput. The formula is as follows:

Initial number of partitions = round up ( (provisioned read throughput / 3000) + (provisioned write throughput / 1000) )

For example: with 5,000 provisioned reads and 2,000 provisioned writes, the formula gives (5000 / 3000) + (2000 / 1000) = 3.6667, rounded up = 4 partitions.

Each partition can then support 5000 / 4 = 1250 reads and 2000 / 4 = 500 writes.
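The initial-partition formula and the resulting per-partition throughput can be checked with a few lines:

```python
import math

def initial_partitions(read_capacity: int, write_capacity: int) -> int:
    # each partition supports up to 3000 read units and 1000 write units
    return math.ceil(read_capacity / 3000 + write_capacity / 1000)

parts = initial_partitions(5000, 2000)  # ceil(1.6667 + 2) = ceil(3.6667)
print(parts)          # 4
print(5000 // parts)  # 1250 reads per partition
print(2000 // parts)  # 500 writes per partition
```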

Partition split

DynamoDB scales dynamically while still guaranteeing throughput, and both properties rest on its partition design.

Partition split method

Since DynamoDB alone is responsible for partition management, the timing and method of partition splitting are decided by DynamoDB. DynamoDB automatically splits existing partitions when necessary to provide more partitions and support higher throughput. Let's first look at how a split works:

  1. In step 1 of the figure, DynamoDB allocates two new partitions.
  2. In step 2 of the figure, DynamoDB distributes the original partition's data evenly across the two new partitions.
  3. In step 3 of the figure, DynamoDB stops routing data to the original partition.

Partition split trigger condition

As mentioned above, a partition in DynamoDB can hold about 10 GB of data and supports up to 3,000 read capacity units and 1,000 write capacity units.

Partition splits are therefore triggered by a partition's actual storage size and its read/write load:

  • Increased provisioned throughput: if the current number of partitions cannot satisfy the newly provisioned throughput, DynamoDB doubles the current number of partitions.

    The figure shows a table initially allocated 4 partitions (5,000 provisioned reads and 2,000 provisioned writes; applying the formula, (5000 / 3000) + (2000 / 1000) = 3.6667, rounded up = 4 partitions), each with 1250 read units and 500 write units. When the read capacity is raised from 5,000 to 8,000, 4 partitions can no longer satisfy the load, so DynamoDB doubles the partition count to 4 * 2 = 8, leaving each partition with 1000 read units and 250 write units.

  • Increased storage requirements: if the data in a partition exceeds the 10 GB limit, DynamoDB splits that partition into two, distributing its data evenly between the two new partitions. There are many reasons a partition can exceed 10 GB; the most common is that partition key values do not hash evenly, skewing data toward a particular partition.

    As shown in the figure, once the partition in the red box fills up with data, DynamoDB splits it into two new partitions. Before the split, the total capacity limit is 8 partitions * 10 GB = 80 GB; after the split it is 9 partitions * 10 GB = 90 GB.
    Note: the two partitions produced by the split share only the read and write throughput of the original partition. For example, with 5,000 provisioned reads and 2,000 provisioned writes, DynamoDB creates 4 partitions, each with
    5000 / 4 = 1250 read capacity
    2000 / 4 = 500 write capacity
    Suppose one of those partitions is about to reach 10 GB; DynamoDB splits it into two, giving the table 5 partitions in total. The three untouched partitions keep their 1250 read / 500 write capacity, while each of the two new partitions gets 1250 / 2 = 625 reads and 500 / 2 = 250 writes.
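The arithmetic in the two split scenarios above can be verified directly:

```python
# Worked examples of the two split triggers (numbers from the text).

# 1) Increased provisioned throughput: reads raised from 5000 to 8000.
#    4 partitions can no longer satisfy the load, so DynamoDB doubles them.
partitions = 4 * 2
print(8000 / partitions, 2000 / partitions)  # per-partition read/write

# 2) Increased storage: one full partition is split into two children
#    that share only the ORIGINAL partition's throughput.
read_per_part, write_per_part = 5000 / 4, 2000 / 4
child_read, child_write = read_per_part / 2, write_per_part / 2
print(read_per_part, write_per_part)  # untouched partitions keep this
print(child_read, child_write)        # each split child gets half
```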

 

Reference: https://docs.aws.amazon.com/zh_cn/amazondynamodb/latest/developerguide/Introduction.html
