Incorrect String Value for encoded characters - python insert in to my sql

personalt :

I am parsing out a report form amazon, splits up lines into fields and then creates a mysql upload. Data I believe is originally iso-8859-1. Data uploads fine to mysql unless it has some special characters in it like an Ä or ®. If that happens I get an error like pymysql.err.InternalError: (1366, "Incorrect string value: '\\xAE Kids...' for column 'item-name' at row 74") & TypeError: can only concatenate str (not "bytearray") to str. I can hack around it by doing a replacing the bytes but I dont want to build a giant list plus I really want to store the proper values. I tried changing my mysql character sets and collations but that didnt seem to fix. I feel like the fix is a simple fix but i am already been trying things for a few hours.

report_as_dict = report.parsed
report_as_dict = report_as_dict.replace(b' \r\n', b'\r\n')  # remove black space at end

 multi_line_rebuild=list()
    for line in line_split[1:]:
        field_split = line.split(b'\t')
        logger.debug('Field Split : %s', field_split)
        field_split = [x.replace(b'\x92', b'') for x in field_split]  # removes single quotes
        field_split = [x.replace(b'\xA0', b'') for x in field_split]  # removes (
        field_split = [x.replace(b'\xAE', b'') for x in field_split]  # removes @
        field_split = [x.replace(b'\xCD', b'l') for x in field_split]  # replaces l with ' with l
        field_split = [x.replace(b'\xE4', b'a') for x in field_split]  # replaces a with two dots with a

        multi_line_rebuild.append(field_split)

 ....


 run_query_with_warnings(query_string, field_split=multi_line_rebuild)

Function

def run_query_with_warnings(warn_type, query_string, **kargs):

db = MySQLdb.connect(host=cred.host, user=cred.user, password=cred.password, db=cred.db, port=cred.port)
cursor = db.cursor()
cursor.executemany(query_string, kargs['field_split'])
snakecharmerb :

The code is trying to write text encoded as ISO-8859-1 to tables set to expect UTF-8.

There are two solutions:

  • Set the charset argument on the connection to latin1 (this is the same as ISO-8859-1) and let the connection handle re-encoding the bytes to UTF-8

    db = MySQLdb.connect(host=cred.host, user=cred.user, password=cred.password, 
                         db=cred.db, port=cred.port, charset='latin1')
    
  • decode the encoded bytes to str and let the connection perform the encoding.

    report_as_dict = report.parsed.decode('ISO-8859-1')
    

If the code is doing nothing apart from writing the bytes directly to the database then the first option is fine; if the bytes are undergoing further manipulations then decoding to str will keep things simple.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=17894&siteId=1