Solutions for pymysql memory problems when reading large amounts of data

Background: the table currently holds only about 5 GB (and it keeps growing), but one of its fields (referred to below as the detail field) stores roughly 2 MB per row (not always 2 MB; some rows store nothing, but it averages out to about 2 MB). The field holds a JSON array containing N records, like this:

[{"A": "A", "B": "B", "C": "C", "D": "D"}...]

If I split this field out into separate tables, it might take quite a few of them; and if I store the array as multiple rows instead, that runs into the guideline from Alibaba's "Java Development Manual" that a single table should not exceed five million rows, so that is not recommended either. I would appreciate any advice on this.
Back to the topic. At first the data was stored in two tables: one holding the basic information (table A), and another (table B) holding the association key and the detail field. That seemed fine, but now the two tables need to be combined into one for BI to process. The plan: copy table A directly, add the detail column, then traverse table B and update the detail column through the association key. However, when the SELECT reads too much data at once, the process simply hangs. So I went looking online for ways to handle large data sets with pymysql; the solutions I found are described below.
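First, for orientation, a minimal sketch of the setup described above. The name of the basic-information table (table A) and the column type of detail are not given in the original, so sec_report_original_data and longtext below are assumptions; the other names come from the examples further down, and MYSQL_HOST etc. are configuration constants as in those examples.

import pymysql

db = pymysql.connections.Connection(host=MYSQL_HOST,
                                    port=MYSQL_PORT,
                                    user=MYSQL_USER,
                                    password=MYSQL_PASSWORD,
                                    db=MYSQL_DB,
                                    charset='utf8mb4',)

with db.cursor() as cursor:
    # copy table A (assumed name: sec_report_original_data) into the merged table
    cursor.execute("create table sec_report_original_data_intact "
                   "as select * from sec_report_original_data")
    # add the detail column, to be filled in from table B by the scripts below
    cursor.execute("alter table sec_report_original_data_intact "
                   "add column detail longtext")
db.commit()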

1. Read the data in batches with LIMIT:

import pymysql

up_db = pymysql.connections.Connection(host=MYSQL_HOST,
                                       port=MYSQL_PORT,
                                       user=MYSQL_USER,
                                       password=MYSQL_PASSWORD,
                                       db=MYSQL_DB,
                                       charset='utf8mb4',)

count = 0
while True:
    # if count == 2:
    #     break

    # read the next batch of 2 rows starting at offset `count`
    select_sql = "select sec_report_id,detail from sec_report_original_data_detail limit %s,2" % (count)
    up_cursor = up_db.cursor()
    up_cursor.execute(select_sql)
    result = up_cursor.fetchall()
    for data in result:
        sec_report_id = data[0]
        detail = data[1]
        # escape the JSON payload before interpolating it into the UPDATE statement
        update_sql = "update `sec_report_original_data_intact` set detail = '%s' where `sec_report_id` = '%s' " % (
            up_db.escape_string(detail), sec_report_id)
        print(update_sql)
        res = up_cursor.execute(update_sql)
        if res:
            print(res)
            up_db.commit()
            print(f'{sec_report_id} updated successfully')

    count += 2

This solves the problem, though I only tested it on a few rows (I ended up using the second approach). Note that I did not write a termination condition here; if you want to use this, add one yourself.
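For reference, a minimal sketch of the same batching loop with a termination condition added: the loop stops once the batched SELECT returns no rows. It also passes values as query parameters instead of formatting them into the SQL string, which is an alternative to the escape_string approach used above; the table and column names are taken from the example.

import pymysql

up_db = pymysql.connections.Connection(host=MYSQL_HOST,
                                       port=MYSQL_PORT,
                                       user=MYSQL_USER,
                                       password=MYSQL_PASSWORD,
                                       db=MYSQL_DB,
                                       charset='utf8mb4',)
up_cursor = up_db.cursor()

batch_size = 2
offset = 0
while True:
    # read the next batch; pymysql fills in the %s placeholders safely
    up_cursor.execute(
        "select sec_report_id, detail from sec_report_original_data_detail limit %s, %s",
        (offset, batch_size))
    rows = up_cursor.fetchall()
    if not rows:
        break  # no rows left at this offset: every batch has been processed
    for sec_report_id, detail in rows:
        up_cursor.execute(
            "update `sec_report_original_data_intact` set detail = %s where `sec_report_id` = %s",
            (detail, sec_report_id))
    up_db.commit()
    offset += batch_size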

2. Use pymysql's SSCursor, an unbuffered cursor

Using pymysql.cursors.SSCursor in place of the default cursor makes pymysql read records from the database one at a time, so memory does not get exhausted and the process does not hang. There are some caveats, though:

  • The cursor object must read all rows before its connection can handle any other SQL. If you need to run SQL in parallel, open a separate connection (as the example below does for the UPDATEs).
  • All rows must be read in one pass, and the processing after each fetch has to be fast, no more than 60 seconds, otherwise MySQL will drop the connection (I have not run into this myself; if you do, feel free to discuss — a possible mitigation is sketched after the example below).
import pymysql

# read connection: uses the unbuffered SSCursor so rows are streamed one by one
db = pymysql.connections.Connection(host=MYSQL_HOST,
                                    port=MYSQL_PORT,
                                    user=MYSQL_USER,
                                    password=MYSQL_PASSWORD,
                                    db=MYSQL_DB,
                                    charset='utf8mb4',
                                    cursorclass=pymysql.cursors.SSCursor)

# write connection: a second connection is needed because the streaming cursor
# must finish reading all rows before its connection can run any other SQL
up_db = pymysql.connections.Connection(host=MYSQL_HOST,
                                       port=MYSQL_PORT,
                                       user=MYSQL_USER,
                                       password=MYSQL_PASSWORD,
                                       db=MYSQL_DB,
                                       charset='utf8mb4',)

up_cursor = up_db.cursor()
cursor = db.cursor()
select_sql = "select sec_report_id,detail from sec_report_original_data_detail"
cursor.execute(select_sql)
result = cursor.fetchone()

try:
    while result is not None:
        sec_report_id = result[0]
        detail = result[1]
        update_sql = "update `sec_report_original_data_intact` set detail = '%s' where `sec_report_id` = '%s'" % (db.escape_string(detail), sec_report_id)
        res = up_cursor.execute(update_sql)

        if res:
            print(res)
            up_db.commit()
            print(f'{sec_report_id} updated successfully')
        result = cursor.fetchone()
except Exception as e:
    print(e)
finally:
    up_cursor.close()
    up_db.close()
    cursor.close()
    db.close()
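Regarding the second caveat above: the 60-second limit is usually attributed to MySQL's net_write_timeout, which defaults to 60 seconds and applies while the server is streaming an unbuffered result set to the client. Below is a minimal sketch of raising it for the session that runs the streaming SELECT; whether this is the right knob depends on your server configuration, so treat it as an assumption rather than a confirmed fix.

import pymysql

db = pymysql.connections.Connection(host=MYSQL_HOST,
                                    port=MYSQL_PORT,
                                    user=MYSQL_USER,
                                    password=MYSQL_PASSWORD,
                                    db=MYSQL_DB,
                                    charset='utf8mb4',
                                    cursorclass=pymysql.cursors.SSCursor)
cursor = db.cursor()

# raise the server-side write timeout for this session only, so that slower
# per-row processing is less likely to get the streaming connection dropped
cursor.execute("set session net_write_timeout = 600")

cursor.execute("select sec_report_id, detail from sec_report_original_data_detail")
for sec_report_id, detail in cursor:
    pass  # process each row here; per-row work should still stay reasonably fast

cursor.close()
db.close()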

This solves the problem of reading a huge amount of data in one go. I still have not found a particularly good way to store the data in the detail field, though; if anyone understands this better, I would be happy to discuss it.

Origin www.cnblogs.com/mangM/p/11899498.html