Use DynamoDB and S3 with gzip compression to maximize player data storage

Foreword

In some traditional game architectures, MySQL is used to store player save data, and database and table sharding is used to spread the storage and performance load of a single database and table so that more players can be supported. As the volume of data grows, the varchar type can no longer hold a single field of game data, and switching to a blob field is the lowest-cost change for this architecture. As a result, some games were initially designed to store player data such as quests and items in Blob fields.

The Blob field has a bug in MySQL 5.6 / 5.7 (MySQL Bugs: #96466) that may crash the database cluster and cause data loss. Even in MySQL 8.0, due to design limitations of the storage engine itself, frequent updates to a single table larger than 20GB will limit database performance, and the problem becomes more pronounced as the table grows.

As the game's business grows explosively, a traditional relational database requires application changes for database and table sharding, along with a certain amount of maintenance downtime. After the expansion, shrinking the cluster during the game's sunset period again requires application changes, which undoubtedly creates a lot of extra work for the development and operations teams.

DynamoDB is a very good fit for this scenario. At any stage of the business it can scale automatically with zero downtime, and all of this is completely transparent to the application layer. In day-to-day operations, capacity can also be scaled up and down with the business load, further reducing cost.


Overview

This article describes how, given DynamoDB's limit that each item must be smaller than 400KB, to store as much data as possible per item in a game scenario, and how to make the most of extended storage when that limit is exceeded. It focuses on using DynamoDB + S3 to hold the large data attributes in a player's save, and on avoiding reads of a stale S3 object while new data is still being written to S3. Gzip compression is used throughout to reduce data size, lower IO overhead, and improve performance.

Architecture diagram

Hands-on coding

Goals

  1. All data is compressed with gzip before it is saved and decompressed with gzip after it is read.
  2. Storage adapts between S3 and DynamoDB's binary field: if the compressed user data is larger than a specified threshold, it is written to S3; otherwise it is stored directly in a field of the database item.
  3. When a DynamoDB item is read, the decompressed field is parsed; if the string starts with s3://, the data is fetched from S3.
  4. A read-lock field guards against reading while the item is being written to S3. Before an item is written to S3, read_lock is set to True, and after the S3 write succeeds it is set back to False. When a reader finds read_lock set to True, it waits for an interval and retries, up to a specified number of retries. Once the retries run out, the reader assumes the writer may never finish for some reason and sets read_lock back to False.

Step 1: Initialize environment parameters

from time import sleep
import boto3
import gzip
import random
import json
import hashlib
import logging

# Threshold for writing to S3: data larger than this is written to S3,
# otherwise it is stored inside the database item. Default: 350KB
UPLOAD_TO_S3_THRESHOLD_BYTES = 358400
# Target S3 bucket for user data
USER_DATA_BUCKET = 'linyesh-user-data'
# Maximum number of read retries when the item is read-locked; once exceeded, the lock is cleared automatically
S3_READ_LOCK_RETRY_TIMES = 10
# Interval between read retries while the item is read-locked
S3_READ_RETRY_INTERVAL = 0.2

dynamodb = boto3.resource('dynamodb')
s3 = boto3.client('s3')
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

Parameter Description

  • UPLOAD_TO_S3_THRESHOLD_BYTES: the maximum length, in bytes, of data stored in the field. Because DynamoDB limits the size of a single item (Item) to 400KB, besides the largest field in the save we must reserve some space for the item's other fields so that the whole Item stays under 400KB.
  • USER_DATA_BUCKET: the S3 bucket used to store player field data whose compressed size exceeds the threshold. It must be created in advance; for the specific steps, see: Create a bucket
  • S3_READ_LOCK_RETRY_TIMES: limits the number of read retries while the player's save on S3 is still being written. When the item is read-locked, the reader waits for an interval and then tries again.
  • S3_READ_RETRY_INTERVAL: the interval between read retries while the item is read-locked, in seconds.

Note: in theory, S3_READ_LOCK_RETRY_TIMES multiplied by S3_READ_RETRY_INTERVAL must not be less than the maximum time it takes to upload a save to S3, so in practice you should tune these two parameters to the likely size of your saves. Otherwise there is a high probability of dirty reads of the save.
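As a quick sanity check with the default values above (a minimal sketch; estimated_max_upload_seconds is an assumed figure that you would measure for your own archive sizes and network):

# the reader waits at most 10 * 0.2s = 2s before it clears the lock,
# so S3 uploads must reliably finish within that window
max_read_wait_seconds = S3_READ_LOCK_RETRY_TIMES * S3_READ_RETRY_INTERVAL  # 2.0s with the defaults
estimated_max_upload_seconds = 1.5  # assumption: measure this for your own saves
assert max_read_wait_seconds >= estimated_max_upload_seconds, 'increase the retry window'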

Step 2: Create a DynamoDB table

def create_tables():
    """
    Create the players table
    :return:
    """
    response = dynamodb.create_table(
        TableName='players',
        KeySchema=[
            {
                'AttributeName': 'username',
                'KeyType': 'HASH'
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'username',
                'AttributeType': 'S'
            }
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 5,
            'WriteCapacityUnits': 5
        }
    )

    # Wait until the table exists.
    response.wait_until_exists()

    # Print out some data about the table.
    logger.debug(response.item_count)
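Note that create_table fails if the table already exists, so when re-running the script it helps to guard the call. A minimal sketch, using the boto3 resource created in Step 1:

try:
    create_tables()
except dynamodb.meta.client.exceptions.ResourceInUseException:
    # the 'players' table already exists, so skip creation
    logger.info('table players already exists')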

Step 3: Write auxiliary logic

Exponential backoff function

def run_with_backoff(function, retries=5, **function_parameters):
    base_backoff = 0.1  # base 100ms backoff
    max_backoff = 10  # sleep for maximum 10 seconds
    tries = 0
    while True:
        try:
            return function(function_parameters)
        except (ConnectionError, TimeoutError):
            if tries >= retries:
                raise
            backoff = min(max_backoff, base_backoff * (pow(2, tries) + random.random()))
            logger.debug(f"sleeping for {backoff:.2f}s")
            sleep(backoff)
            tries += 1
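Note the calling convention: the keyword arguments are collected into the function_parameters dict and passed to the wrapped function as a single positional argument, which is why upload_content_to_s3 below accepts a single obj_param dict. A minimal usage sketch (upload_content_to_s3 is defined later in this step):

# retries the upload on ConnectionError / TimeoutError, sleeping with exponential backoff plus jitter
s3_path = run_with_backoff(upload_content_to_s3, retries=5, key='player1', content_bytes=gzip.compress(b'{}'))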

S3 path check function

def is_s3_path(content):
    return content.startswith('s3://')

S3 file fetching

def get_s3_object(key):
    response = s3.get_object(Bucket=USER_DATA_BUCKET, Key=s3_key_generator(key))
    return response['Body']

Check whether the size exceeds the threshold

def check_threshold(current_size):
    return current_size > UPLOAD_TO_S3_THRESHOLD_BYTES

S3 Key Generation Function

This function distributes player saves across different prefixes under the S3 bucket (the prefix is derived from an MD5 hash of the key), which helps improve S3 IO performance.

def s3_key_generator(key):  
    s3_prefix = hashlib.md5((key).encode('utf-8')).hexdigest()[:8]  
    return s3_prefix + '/' + key 
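For example (the eight-character prefix shown is only illustrative; the actual value is the first eight hex characters of the MD5 of the key):

print(s3_key_generator('player123'))  # e.g. 'ab12cd34/player123' - the prefix varies per key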

File upload to S3

def upload_content_to_s3(obj_param):  
    s3_key = s3_key_generator(obj_param['key'])  
    try:  
        response = s3.put_object(  
            Body=obj_param['content_bytes'],  
            Bucket=USER_DATA_BUCKET,  
            Key=s3_key)  
        return "s3://%s/%s" % (USER_DATA_BUCKET, s3_key)  
    except Exception as e:  
        logger.error(e)  
        raise e  

Step 4: Write the main logic

Write a single item to the DynamoDB table

def put_item(load_data):  
    gzip_data = gzip.compress(load_data)  # compress the data
    logger.debug('compressed size %.2fKB, original size %.2fKB, compression ratio %.2f%%' % (
        len(gzip_data) / 1024.0,
        len(load_data) / 1024.0,
        100.0 * len(gzip_data) / len(load_data)))
  
    table = dynamodb.Table('players')  
    player_username = 'player' + str(random.randint(1, 1000))  
    if check_threshold(len(gzip_data)):  
        try:  
            # set the read lock so readers wait until the S3 write finishes
            table.update_item(  
                Key={  
                    'username': player_username,  
                },  
                UpdateExpression="set read_lock = :read_lock",  
                ExpressionAttributeValues={  
                    ':read_lock': True,  
                },  
            )  
  
            # write the data to S3
            s3_path = run_with_backoff(upload_content_to_s3, key=player_username, content_bytes=gzip_data)  
            # release the read lock and store the S3 path of the data
            response = table.put_item(  
                Item={  
                    'username': player_username,  
                    'read_lock': False,  
                    'inventory': gzip.compress(s3_path.encode(encoding='utf-8', errors='strict')),  
                }  
            )  
            logger.debug('large record uploaded to S3, path: %s' % s3_path)
        except Exception as e:  
            logger.debug('failed to save record')
            logger.error(e)  
    else:  
        response = table.put_item(  
            Item={  
                'username': player_username,  
                'inventory': gzip_data,  
            }  
        )  
        logger.debug('record saved, username=%s' % player_username)
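For reference, the stored item ends up in one of two shapes (a sketch of the logical content; in both cases the inventory attribute holds gzip-compressed bytes):

# small save: the compressed JSON itself is stored inline
#   {'username': 'player123', 'inventory': <gzip(json)>}
# large save: only a compressed s3:// pointer is stored inline, the payload lives on S3
#   {'username': 'player123', 'read_lock': False,
#    'inventory': <gzip('s3://linyesh-user-data/<md5-prefix>/player123')>}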

Read a player record in the database

def get_player_profile(uid):  
    """ 
    Read a player record
    :param uid: player id
    :return: 
    """  
    table = dynamodb.Table('players')  
    player_name = 'player' + str(uid)  
  
    retry_count = 0  
    while True:  
        response = table.get_item(  
            Key={  
                'username': player_name,  
            }  
        )  
  
        if 'Item' not in response:  
            logger.error('Not Found')  
            return {}  
  
        item = response['Item']  
        # check the read lock; if the item is locked, wait for the configured interval and re-read the record
        if 'read_lock' in item and item['read_lock']:  
            retry_count += 1  
            logger.info('retry attempt %d' % retry_count)
            # if the record still cannot be read after the retries run out, clear the read lock and read it anyway
            if retry_count < S3_READ_LOCK_RETRY_TIMES:  
                sleep(S3_READ_RETRY_INTERVAL)  
                continue  
            else:  
                table.update_item(  
                    Key={  
                        'username': player_name,  
                    },  
                    UpdateExpression="set read_lock = :read_lock",  
                    ExpressionAttributeValues={  
                        ':read_lock': False,  
                    },  
                )  
  
        inventory_bin = gzip.decompress(item['inventory'].value)  # decompress the data
        inventory_str = inventory_bin.decode("utf-8")  
        if is_s3_path(inventory_str):  
            player_data = gzip.decompress(get_s3_object(player_name).read())  
            inventory_json = json.loads(player_data)  
        else:  
            inventory_json = json.loads(inventory_str)  
  
        user_profile = {**response['Item'], **{'inventory': inventory_json}}  
        return user_profile  

Finally, write the test logic

Prepare several JSON files of different sizes and observe how the data written to the database changes.

if __name__ == '__main__':  
    path_example = 'small.json'  
    # path_example = '500kb.json'  
    # path_example = '2MB.json'  
    with open(path_example, 'r') as load_f:  
        load_str = json.dumps(json.load(load_f))  
        test_data = load_str.encode(encoding='utf-8', errors='strict')  
    put_item(test_data)  
  
    # player_profile = get_player_profile(238)  
    # logger.info(player_profile)  

To test the read lock, you can manually set read_lock to True on a single item in the database and then observe how the read logic behaves.
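A minimal way to set the lock by hand (a sketch that assumes the players table exists; player238 is a placeholder username, substitute one that exists in your table):

table = dynamodb.Table('players')
table.update_item(
    Key={'username': 'player238'},  # assumption: replace with a username that exists in your table
    UpdateExpression='set read_lock = :lock',
    ExpressionAttributeValues={':lock': True},
)
# now call get_player_profile(238) and watch the retry log messages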

Summary

In this test, JSON data compressed with gzip to roughly 25% of its original size, so in theory a single item (Item) can hold a data attribute of up to about 1.6MB of raw JSON. Even the occasional record that still exceeds 400KB after compression can be stored on S3, with DynamoDB holding only the item's metadata and the S3 path of the large field.
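The arithmetic behind that estimate, assuming the roughly 25% ratio observed in this test also holds for your data:

raw_capacity_kb = 400 / 0.25  # ≈ 1600KB, i.e. about 1.6MB of raw JSON per item, before reserving space for other attributes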

gzip adds some extra CPU and IO overhead, but this overhead falls mainly on the game servers, while the IO overhead on the database is reduced.

In most scenarios, player data rarely exceeds 400KB even without compression. In that case, it is recommended to benchmark both options, with and without compression enabled, and decide which one suits your game better.

Limitations

For games that need highly concurrent writes to a single player's save, the design above does not cover concurrent writes once the data has been moved to S3. If you need that scenario, some adjustments to the application logic or architecture are required, for example along the lines of the sketch below.
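Purely as a sketch (not part of the design above), one such adjustment is to acquire read_lock atomically with a conditional write, so that two writers cannot both believe they hold the lock:

def try_acquire_write_lock(table, username):
    # succeeds only if no other writer currently holds the lock
    try:
        table.update_item(
            Key={'username': username},
            UpdateExpression='set read_lock = :locked',
            ConditionExpression='attribute_not_exists(read_lock) OR read_lock = :unlocked',
            ExpressionAttributeValues={':locked': True, ':unlocked': False},
        )
        return True
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        return False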

About the author

forestry

Amazon solutions architect, responsible for consulting on and designing Amazon-based cloud computing solutions. With more than 14 years of development experience, he has built apps with tens of millions of users and contributed to many open source projects on GitHub. He has rich hands-on experience in gaming, IoT, smart city, automotive, e-commerce, and other fields.

Article source: https://dev.amazoncloud.cn/column/article/630a281576658473a321ffeb?sc_medium=regulartraffic&sc_campaign=crossplatform&sc_channel=CSDN 
