File storage

1 format()

str.format() replaces the {} placeholders in str with the arguments passed to format().

This is useful when a large number of files need to be stored separately, because the file names can be generated with format().

>>> "{} {}".format("Hello", "World")  # no positions given, default order
'Hello World'
>>> "{0} {1}".format("Hello", "World")  # positions specified
'Hello World'
>>> "{1} {0} {1}".format("Hello", "World")  # positions specified
'World Hello World'
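As a minimal sketch of the use case above, format() can build numbered file names in a loop so that each piece of data goes to its own file. The function name write_parts and the part_{}.txt naming pattern are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_parts(pieces, folder):
    """Write each string in pieces to its own numbered file; return the paths."""
    paths = []
    for i, text in enumerate(pieces, start=1):
        # format() fills the {} placeholder with the piece number
        filename = "part_{}.txt".format(i)
        path = os.path.join(folder, filename)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Use a temporary folder so the demo leaves nothing behind
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_parts(["first", "second", "third"], tmp):
            print(os.path.basename(p))
```

Run as a script, this prints part_1.txt, part_2.txt and part_3.txt, one per line.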

Reference: https://blog.csdn.net/zhchs2012/article/details/84328742

2 Storage location

A file saved with a relative path goes into the current working directory by default (normally the directory the program is run from). A path such as ./newfilewalk/... stores it in the newfilewalk folder under the current directory instead.

If the newfilewalk folder does not exist, an error is raised, so create the folder in advance.

data_xls.to_csv('.\\chengji\\化学.csv', encoding='utf-8')
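A minimal sketch of creating the folder first, so that to_csv() does not fail on a missing directory. The helper name save_csv is hypothetical; it assumes df is any pandas DataFrame such as data_xls above.

```python
import os

import pandas as pd


def save_csv(df, folder, filename):
    """Save df as a UTF-8 CSV inside folder, creating the folder first if needed."""
    # exist_ok=True: no error if the folder already exists;
    # missing parent folders are created as well
    os.makedirs(folder, exist_ok=True)
    # os.path.join picks the right separator instead of hard-coding '\\'
    path = os.path.join(folder, filename)
    df.to_csv(path, encoding='utf-8')
    return path
```

For example, save_csv(data_xls, os.path.join('.', 'chengji'), '化学.csv') would create ./chengji if it is missing and then write the file into it.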

3 index=False

If a file will later be read back in by another program, pass index=False when saving it with to_csv(). The index column is then not stored; otherwise the number of columns easily fails to match what the later program expects, which causes errors.
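A small sketch of why this matters: with the default index=True, the row index is written as an extra unnamed column, and reading the file back yields one more column than the original DataFrame. An in-memory buffer stands in for a real file here; roundtrip_columns is a hypothetical helper name.

```python
import io

import pandas as pd


def roundtrip_columns(df, index):
    """Write df to an in-memory CSV and return the column names read back."""
    buf = io.StringIO()
    df.to_csv(buf, index=index)
    buf.seek(0)  # rewind before reading what was just written
    return list(pd.read_csv(buf).columns)


df = pd.DataFrame({'uid': [1, 2], 'appid': ['a', 'b']})
print(roundtrip_columns(df, index=True))   # ['Unnamed: 0', 'uid', 'appid']
print(roundtrip_columns(df, index=False))  # ['uid', 'appid']
```

The extra 'Unnamed: 0' column is the stored index; with index=False the columns read back match the original.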

4 to_csv() function

Parameter mode: to append later data to an existing file, use mode='a+'. In that case be sure to also pass header=None (or header=False); otherwise the column names are written into the file again on every append, which easily breaks later processing, and such an error can then only be tracked down by bisection.

 

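import pandas as pd

# Read the large CSV lazily so it can be processed chunk by chunk
app_actived_train = pd.read_csv('./labelencoder_file/app_actived_train.csv', iterator=True)

# Write to a separate file: appending to the file that is still being read would
# feed the new rows back into the reader. This output name is hypothetical.
out_path = './labelencoder_file/app_actived_train_split.csv'

pieceID = 1
loop = True
while loop:
    try:
        df = app_actived_train.get_chunk(100000)  # 100,000 rows per chunk
        df.columns = ['index', 'uid', 'appid']
        df1 = pd.DataFrame(df['uid'])
        # Split '#'-separated appids into one row each, keeping the original row index
        df = df['appid'].str.split('#', expand=True).stack().reset_index(level=1, drop=True).rename('appid')
        df = pd.DataFrame({'index': df.index, 'appid': df.values})
        # Attach each uid back to its split appid rows
        df = pd.merge(df1, df, left_index=True, right_on='index', how='left')
        # Append without the index column and without repeating the column names
        df.to_csv(out_path, mode='a+', index=False, header=None)
        print(pieceID * 100000)
        pieceID += 1
        del df, df1
    except StopIteration:
        loop = False
        print('app_actived_train process finished!')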
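For comparison, a sketch of the same row splitting using chunksize (read_csv then returns an iterable of DataFrames, so no manual StopIteration handling is needed) and DataFrame.explode (pandas 0.25+) in place of the stack/merge steps. This swaps in a different technique from the code above. It assumes the input has columns named uid and appid with '#'-separated app ids; split_appids and the paths are hypothetical.

```python
import pandas as pd


def split_appids(in_path, out_path, chunksize=100000):
    """Rewrite a CSV so each '#'-separated appid gets its own (uid, appid) row."""
    first = True
    # dtype=str keeps appid as text even when a chunk holds only plain numbers
    for chunk in pd.read_csv(in_path, chunksize=chunksize, dtype={'appid': str}):
        chunk = chunk[['uid', 'appid']].copy()
        # '10#11' -> ['10', '11']
        chunk['appid'] = chunk['appid'].str.split('#')
        # One output row per list element, uid repeated
        exploded = chunk.explode('appid')
        # First chunk overwrites and writes the header; later chunks append without it
        exploded.to_csv(out_path, mode='w' if first else 'a',
                        index=False, header=first)
        first = False
```

Writing the header only with the first chunk keeps a single header line in the output, and mode='w' on the first chunk means a rerun does not append to stale output.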

 


Source: www.cnblogs.com/xxswkl/p/10993325.html