Using pymysql to interact with MySQL from Python: connecting, creating a database, building tables, inserting data

☆☆ This document was written by hand, and most of the source code is commented.
As the saying goes, "the hand that gives roses keeps their fragrance": if this document helps you, a like for the author would be much appreciated.

0 Background and configuration

0.1 Background

While working on a project, I chose Python because of its many available libraries and its high development efficiency. In this project, Python and MySQL interact in the following ways:

  1. Raw data collected by web crawlers, statistics gathering, documents obtained through various channels, and so on has to be stored in the raw database. Python can convert data stored in Excel or other fixed formats into XML, and the Java backend platform then writes the XML-formatted data to MySQL.
  2. The raw data is read from MySQL, processed and cleaned, and then written straight back to MySQL. This involves connecting to the database, creating a database, creating tables, and writing data.

The author has already used Python to write fixed-format data to XML. Readers can create their own XML in whatever format their projects and needs call for.

This article deals with the second point: using SQL statements from Python to interact with MySQL.

0.2 Environment

Local environment:

  • System: win 9
  • Python version: python3.6.4
  • MySQL Version: 5.7.23
  • Navicat Premium Version: 12.0

Third-party packages:

  • pymysql

Installation:

pip install pymysql

1 Interacting with MySQL

1.1 Connecting and getting a cursor

Whether you are writing to MySQL or exporting from it, you first need a connection to the database and a cursor to execute statements. You can connect using the host IP, user name, password, and database name:

import pymysql

conn = pymysql.connect(host='', user='', password='', charset='')
cursor = conn.cursor()

In practice, you may need to connect to several databases, so it is convenient to keep the host, user, password, and so on in a dictionary:

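def get_sql_conn(DB):
    """
    Connect using a settings dictionary. One advantage: when several databases
    or tables are needed, one dictionary per target keeps them easy to tell apart.
    :return: the MySQL connection and a cursor
    """
    try:
        conn = pymysql.connect(host=DB["host"], user=DB["user"], password=DB["password"], db=DB["db"],
                               charset=DB["charset"])
        cursor = conn.cursor()
        print("Connected to MySQL.")
    except Exception as e:
        # Fall back to a connection without a default database,
        # for example when DB["db"] has not been created yet.
        print("Connecting with database '{}' failed: {}".format(DB["db"], e))
        conn = pymysql.connect(host=DB["host"], user=DB["user"], password=DB["password"], charset=DB["charset"])
        cursor = conn.cursor()
    return conn, cursor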
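
To illustrate the dictionary approach, here is a minimal sketch with two settings dictionaries for the same server. All values (host, user, password, database names) are placeholders, and connect_kwargs and open_connection are hypothetical helper names that do not appear in the original code. The sketch passes the database name as `database`, which PyMySQL documents as the parameter name (`db` is an older alias).

```python
# Hypothetical settings; every value below is a placeholder, not from the article.
DB_RAW = {"host": "127.0.0.1", "user": "root", "password": "secret",
          "db": "raw_data", "charset": "utf8mb4"}
DB_CLEAN = dict(DB_RAW, db="clean_data")  # same server, different database


def connect_kwargs(settings, with_db=True):
    """Turn a settings dict into keyword arguments for pymysql.connect()."""
    kwargs = {
        "host": settings["host"],
        "user": settings["user"],
        "password": settings["password"],
        "charset": settings["charset"],
    }
    if with_db:
        # `database` is the documented parameter name; `db` is an older alias.
        kwargs["database"] = settings["db"]
    return kwargs


def open_connection(settings, with_db=True):
    """Open a connection and return (conn, cursor); needs a reachable server."""
    import pymysql  # imported here so the helpers above work without it
    conn = pymysql.connect(**connect_kwargs(settings, with_db))
    return conn, conn.cursor()


print(connect_kwargs(DB_CLEAN)["database"])  # clean_data
```

Calling open_connection(DB_RAW) and open_connection(DB_CLEAN) would then give one connection per database, provided a MySQL server is actually reachable with those credentials.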

1.2 Creating a database

To create your own database in MySQL, first look at the SQL statement that creates a database:

sql = 'CREATE DATABASE %s' % (dbname)

To send the SQL command to MySQL, execute it with the cursor obtained from the database connection:

cursor.execute(sql)
# conn.commit()

Of course, the database you want to create may already exist. The function below creates it when possible and otherwise simply switches to the existing one:

def create_sql_db(conn, cursor, dbname):
    """
    :param conn: connection
    :param cursor: cursor
    :param dbname: name of the database to create, str
    :return: None; prints the outcome
    """
    try:
        sql = 'CREATE DATABASE %s' % (dbname)
        cursor.execute(sql)
        print('Created database:\t{}.'.format(dbname))
        sql = 'USE %s' % (dbname)
        cursor.execute(sql)
        print('Using database:\t{}.'.format(dbname))
    except Exception as e:
        print("Database:\t'{}' already exists.\t{}".format(dbname, e))
        sql = 'USE %s' % (dbname)
        cursor.execute(sql)
        print('Using database:\t{}.'.format(dbname))
    finally:
        # conn.commit()
        print('\n\n')
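
As an alternative to catching the exception, MySQL also accepts CREATE DATABASE IF NOT EXISTS, which does nothing when the database is already there. The sketch below only builds the two statements; quote_identifier and create_db_statements are hypothetical helpers, and the restriction to letters, digits, and underscores is an assumption made here so that the name can be safely interpolated into SQL.

```python
import re


def quote_identifier(name):
    """Backtick-quote a MySQL identifier after a conservative sanity check."""
    # Assumption: names are limited to letters, digits and underscores here.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError("unexpected characters in identifier: %r" % name)
    return "`" + name + "`"


def create_db_statements(dbname):
    """Return the statements that create the database if needed and select it."""
    quoted = quote_identifier(dbname)
    return ["CREATE DATABASE IF NOT EXISTS " + quoted, "USE " + quoted]


for statement in create_db_statements("test_db"):
    print(statement)  # with a live cursor: cursor.execute(statement)
```

With a live connection, each returned statement would be passed to cursor.execute() in order.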

1.3 Creating a table

Creating a table follows the same basic idea as creating a database. The basic SQL statement for building a table is:

sql = '''CREATE TABLE tablename (
    colname1 INT(num) ... ,
    colname2 VARCHAR(num) ... ,
    colname3 FLOAT(num) ... ,
    colname4 DATETIME(0) ...
)'''

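Note: the number after DATETIME is not a character length but the fractional-seconds precision, which MySQL 5.7 allows to be 0 to 6; 0 is used here.
Execute it the same way as before:

cursor.execute(sql)
# conn.commit()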

However, building a table is more troublesome than creating a database, because every column has to be described. For example, if you have 100,000 rows of data with 5 columns and want to build the table automatically, you need to:

  1. Get the names of the 5 columns.
  2. For each column, starting from the first, find the most frequent data type and the maximum length across the 100,000 rows (to reduce wasted storage).
  3. Combine them as column name + type + length + any extra column constraints.
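
Step 3 can be sketched as follows. This is only an illustration: build_create_table is a hypothetical helper, and the table name, column names, types, and lengths are made-up values standing in for the results of steps 1 and 2. DATETIME columns are emitted without a length here.

```python
def build_create_table(tablename, colnames, coltypes, collens, col_limit=""):
    """Compose a CREATE TABLE statement from parallel lists of column info."""
    col_descs = []
    for name, ctype, length in zip(colnames, coltypes, collens):
        if ctype.upper() == "DATETIME":
            desc = "%s %s" % (name, ctype)  # no length for DATETIME
        else:
            desc = "%s %s(%d)" % (name, ctype, length)
        col_descs.append(desc + col_limit)
    return "CREATE TABLE %s (%s)" % (tablename, ", ".join(col_descs))


# Made-up column information for a 5-column table.
names = ["id", "name", "city", "score", "created"]
types = ["INT", "VARCHAR", "VARCHAR", "FLOAT", "DATETIME"]
lengths = [11, 30, 12, 8, 0]
print(build_create_table("demo", names, types, lengths))
```

The col_limit argument is appended to every column description, so passing " NOT NULL" would mark all columns as required.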

So let's look at how to find the most frequent type and the maximum length:

  • Most frequent type:
     count the occurrences of each type with the built-in isinstance(i, type). For example, to count the values of type int:

    count_int = 0
    for i in data_list:
        if isinstance(i, int):
            count_int += 1

  • Maximum length:
     In UTF-8, characters in the range '\u4e00' <= s <= '\u9fff' are Chinese characters and occupy three bytes each; English characters occupy one byte each, and other characters are also counted as one byte here. The length of a string is determined accordingly:

    if '\u4e00' <= s <= '\u9fff':  # Chinese character range, three bytes each in UTF-8
        count_zh += 1

For the details of finding the most frequent data type and the maximum length in a large list (on the order of 100,000 items), see the full code download.

Complete code for creating a table:
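
A compact way to get both values for one column is sketched below. It is not the author's implementation: most_common_type and max_utf8_length are hypothetical helpers, the sample column is made up, and instead of testing character ranges the sketch encodes each value to UTF-8 and counts the bytes directly. Both helpers assume a non-empty list.

```python
from collections import Counter


def most_common_type(values):
    """Return the name of the type that occurs most often in values."""
    counts = Counter(type(v).__name__ for v in values)
    return counts.most_common(1)[0][0]


def max_utf8_length(values):
    """Return the largest UTF-8 byte length among the values' string forms."""
    return max(len(str(v).encode("utf-8")) for v in values)


column = ["北京", "Shanghai", 42, "广州市"]
print(most_common_type(column))  # str
print(max_utf8_length(column))   # 9
```

Here "广州市" is the longest value: three Chinese characters at three bytes each give 9 bytes, more than the 8 bytes of "Shanghai".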

def create_sql_tb(conn, cursor, tablename, dataframe, col_limit=''):
    """
    :param conn: connection
    :param cursor: cursor
    :param tablename: name of the table to create
    :param dataframe: data frame whose columns determine the column names and lengths
    :param col_limit: extra constraints appended to every column
    :return: None; prints the outcome
    """
    # find_dfm_cols_most is not shown in this article (see the full code); it returns
    # the most frequent type and the maximum length of each column.
    list_Type, list_Len = find_dfm_cols_most(dataframe)
    print(list_Type, list_Len)
    list_colname = list(dataframe.columns)  # the new table uses the data frame's column names
    list_col_desc = []  # column descriptions in SQL
    for i in range(len(list_colname)):
        print('Creating column:\t{} '.format(list_colname[i]))
        # col_limit = input('Please enter the limits of the column:\t')
        col_desc = list_colname[i] + ' ' + list_Type[i] + '(' + str(list_Len[i]) + ')' + col_limit
        list_col_desc.append(col_desc)
    sql = 'CREATE TABLE %s' % (tablename) + '(' + ','.join(list_col_desc) + ')'  # the SQL statement to execute
    sql = sql.replace('datetime(0)', 'datetime')
    try:
        # print(sql)
        cursor.execute(sql)
    except Exception as e:
        print('Could not create table {}, it probably already exists: {}'.format(tablename, e))
        select = input('Drop and recreate the table? (Y/N):')
        if select in ('Y', 'y'):  # "select == 'Y' or 'y'" would always be true
            sql1 = 'drop table if exists {}'.format(tablename)  # drop the table
            cursor.execute(sql1)
            cursor.execute(sql)  # sql still holds the CREATE TABLE statement
            print('Rebuild table:\t{}'.format(tablename))
        else:
            print('Opt out.')
    finally:
        conn.commit()

1.4 Inserting data

As with creating a database and building tables, inserting data also requires a connection to the MySQL database and a cursor. The SQL statement for inserting is:

INSERT INTO activity_rank_logs (`colname1`,`colname2`,...) VALUES (58, 'str1', ...),
(69, 'str2', ...),
....
(85, 'strn', ...); # insert several rows in one statement

Note: you can use ''.join(list) to build the statement string:

list_columns = list(dataframe.columns)  # column names
insertSql = "INSERT INTO " + tablename + " (`" + "`,`".join(list_columns) + "`) VALUES "  # first half of the SQL statement

According to the author's tests, inserting 5,000 rows at a time is the most efficient. So if there are more than 5,000 rows, they are inserted in batches of 5,000; if there are 5,000 rows or fewer, they are inserted in a single statement:

def save_sql_data(conn, cursor, tablename, dataframe):
    """
    Insert 5000 rows per statement to speed up storage:
    insert into tb (cols) values (),(),...()
    :param conn: connection
    :param cursor: cursor
    :param tablename: name of the target table
    :param dataframe: data frame
    :return: None; prints the status
    """
    list_columns = list(dataframe.columns)  # column names
    insertSql = "INSERT INTO " + tablename + " (`" + "`,`".join(list_columns) + "`) VALUES "  # first half of the SQL
    valueStrings = []  # value strings
    total_len = len(dataframe.values)  # values is an array [[r1],[r2],...[rn]]; n is the number of rows
    count_extra = 0  # number of full batches inserted so far
    count_insert = 5000  # rows per insert
    count_mod = total_len % count_insert  # remainder, e.g. 5 % 3 = 2: rows left after the full batches
    count_div = total_len // count_insert  # floor division, e.g. 5 // 3 = 1: number of full batches
    print('Total rows: ', total_len)
    # Note: values are quoted as plain strings without escaping, so a value
    # that contains a single quote will break the statement.
    if total_len > count_insert:  # 1. the data frame has more than 5000 rows
        for row in dataframe.values:  # each row
            values = list(row)  # convert to a list
            values = map(str, values)  # convert every element to a string
            valueString = "('" + "','".join(values) + "')"  # join each row into ('e1','e2',...)
            valueStrings.append(valueString)  # valueStrings holds at most 5000 rows
            if len(valueStrings) == count_insert:
                insertSql_copy = insertSql
                insertSql_copy += ",".join(valueStrings)  # build the SQL for one batch of 5000 rows
                valueStrings = []  # reset the value strings and continue
                count_extra += 1  # one more full batch done
                try:
                    # print(insertSql_copy)
                    cursor.execute(insertSql_copy)  # execute
                    print('Insert success!')  # one batch inserted; 120,000 rows: 24 batches, 1 s; 12,000 rows: 3 batches, 0.08 s
                except Exception as e:
                    print(e, '\n')  # execution error
            elif count_extra == count_div and len(valueStrings) == count_mod:  # the leftover rows, fewer than 5000
                insertSql_copy = insertSql
                insertSql_copy += ",".join(valueStrings)
                try:
                    # print(insertSql_copy)
                    cursor.execute(insertSql_copy)  # execute
                    print('Insert success!')
                except Exception as e:
                    print(e, '\n')  # execution error
    elif total_len <= count_insert:  # 2. the data frame has 5000 rows or fewer
        for row in dataframe.values:
            values = list(row)
            values = map(str, values)
            valueString = "('" + "','".join(values) + "')"
            valueStrings.append(valueString)
        insertSql_copy = insertSql
        insertSql_copy += ",".join(valueStrings)
        try:
            print(insertSql_copy)
            cursor.execute(insertSql_copy)
            print('Insert success!')
        except Exception as e:
            print(e, '\n')
    conn.commit()
    print('Full batches inserted:', count_extra, 'rows in the last batch:', len(valueStrings))
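
The same batching idea can also be written with %s placeholders and cursor.executemany(), so that the driver escapes the values instead of the code quoting them by hand. The sketch below is an alternative under stated assumptions, not the author's method: chunked and insert_in_batches are hypothetical helpers, rows is assumed to be a list of tuples or lists, and the table and column names are assumed to be trusted.

```python
def chunked(rows, size):
    """Yield consecutive slices of rows with at most size items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_in_batches(conn, cursor, tablename, columns, rows, batch_size=5000):
    """Insert rows in batches using %s placeholders; returns the batch count."""
    col_sql = ", ".join("`%s`" % c for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO `%s` (%s) VALUES (%s)" % (tablename, col_sql, placeholders)
    batches = 0
    for batch in chunked(rows, batch_size):
        cursor.executemany(sql, batch)  # the driver escapes each value
        batches += 1
    conn.commit()
    return batches


# Batch sizes for 12 rows with a batch size of 5.
print([len(b) for b in chunked(list(range(12)), 5)])  # [5, 5, 2]
```

For a pandas data frame, the rows could be obtained with dataframe.values.tolist() and the columns with list(dataframe.columns). Whether this is faster or slower than the hand-built statement above was not measured here.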

Origin blog.csdn.net/qq_40260867/article/details/86440685