Python multi-process demo

Overview

This afternoon I needed to do some simple data processing and wrote a script for it by hand, but it ran far too slowly, so I changed it to use multiple processes.

The program involves both computation and file reads and writes. Since most of the work is computation, I used multiple processes (the task is CPU-bound).
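As a rough illustration of why a process pool suits CPU-bound work, here is a minimal sketch that spreads a toy computation over several worker processes. The names `count_primes` and `run_parallel` and the worker count are made up for this example and are not part of the script below.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """CPU-bound toy task: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_parallel(limits, max_workers=4):
    """Run count_primes on each limit in a separate process, keeping input order."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    # Each limit is handled by a different worker process.
    print(run_parallel([10_000, 20_000, 30_000]))
```

Each worker is a separate Python process with its own interpreter, so the tasks can run on several CPU cores at once instead of taking turns under a single interpreter lock.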

Code

import pandas as pd
from pathlib import Path
from concurrent.futures import ProcessPoolExecutor

parse_path = '/data1/v-gazh/CRSP/dsf_full_fields/parse'
# The source directory holds about 33,000 CSV files, so processing them serially is very slow.
source_path = '/data1/v-gazh/CRSP/dsf_full_fields/2th_split'


def parseData():
    source_path_list = list(Path(source_path).glob('*.csv'))
    # Run func over every file in a pool of 20 worker processes.
    with ProcessPoolExecutor(max_workers=20) as multi_process:
        # Consume the results so that exceptions raised in the workers are not silently lost.
        list(multi_process.map(func, source_path_list))


def func(p):
    source_p = str(p)
    # The output is written under the 'parse' directory with the same file name.
    parse_p = str(p).replace('2th_split', 'parse')
    df = pd.read_csv(source_p)
    df['DATE'] = pd.to_datetime(df['DATE'].astype(str)).dt.date
    df.sort_values(['DATE'], inplace=True)
    # A negative or missing PRC is flagged with is_close = 0, then the sign is dropped with abs().
    df['is_close'] = df['PRC'].map(lambda x: 0 if x < 0 or pd.isna(x) else 1)
    df['PRC'] = df['PRC'].abs()
    df.rename(columns={'CFACPR': 'factor'}, inplace=True)
    df['adj_low'] = df['BIDLO'] * df['factor']
    df['adj_high'] = df['ASKHI'] * df['factor']
    df['adj_close'] = df['PRC'] * df['factor']
    df['adj_open'] = df['OPENPRC'] * df['factor']
    df['adj_volume'] = df['VOL'] / df['factor']
    # Day-over-day fractional change of the adjusted close.
    df['change'] = df['adj_close'].diff(1) / df['adj_close'].shift(1)
    df.to_csv(parse_p, index=False)


# The guard keeps worker processes from calling parseData() again when they import this module.
if __name__ == '__main__':
    parseData()
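With tens of thousands of input files, one malformed CSV can raise inside a worker, and an exception from `executor.map` only surfaces when its result is read. The sketch below shows one possible way to record failures per file with `submit` and `as_completed` so the rest of the batch still finishes. `worker` is a stand-in for the per-file function and `process_all` is a hypothetical helper; neither appears in the script above.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def worker(path: str) -> str:
    """Stand-in for the per-file function: fails on purpose for paths containing 'bad'."""
    if "bad" in path:
        raise ValueError(f"cannot parse {path}")
    return path


def process_all(paths, max_workers=4):
    """Submit one task per path and return (succeeded, failed) without stopping early."""
    succeeded, failed = [], {}
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(worker, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                succeeded.append(future.result())
            except Exception as exc:  # raised inside the worker, re-raised by result()
                failed[path] = repr(exc)
    return sorted(succeeded), failed


if __name__ == "__main__":
    done, errors = process_all(["a.csv", "bad.csv", "c.csv"])
    print(done)
    print(errors)
```

The failed paths can then be retried or inspected one at a time, without rerunning the files that already succeeded.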

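To make the price handling easier to follow, here is a small sketch that applies the same steps to a hand-made table. The numbers are invented for illustration. Reading a negative `PRC` as a bid/ask average rather than a traded close follows the usual CRSP convention, which is an assumption about the author's data.

```python
import pandas as pd

# Toy rows: the values are made up; only the column names come from the script.
df = pd.DataFrame({
    "PRC": [10.0, -11.0, None, 12.0],
    "CFACPR": [2.0, 2.0, 1.0, 1.0],
})

# 1 when PRC is a non-negative, non-missing close, otherwise 0.
df["is_close"] = df["PRC"].map(lambda x: 0 if x < 0 or pd.isna(x) else 1)

# Drop the sign, then scale by the adjustment factor as the script does.
df["PRC"] = df["PRC"].abs()
df["adj_close"] = df["PRC"] * df["CFACPR"]

# Fractional change relative to the previous row; NaN wherever either row is missing.
df["change"] = df["adj_close"].diff(1) / df["adj_close"].shift(1)

print(df)
```

Because the flag is computed before `abs()` is applied, the information that a price was not a traded close survives even after the sign is removed.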
 


Origin www.cnblogs.com/bigtreei/p/12011435.html