Overview
This afternoon I needed to do some simple data processing, so I hand-wrote a script for it. It turned out to be far too slow, so I changed it to use multiple processes.
The program does both computation and file reads and writes. Since most of the work is computation (CPU-bound), I went with multiprocessing.
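To make the choice above concrete, here is a minimal sketch of fanning a CPU-bound function out over a process pool. The names `cpu_heavy` and `run_all` are made up for illustration and are not part of the script below. The executor class is a parameter only so the same code can be tried with threads, which tend to suit I/O-bound work better.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_heavy(n):
    """Stand-in for a computation-heavy task (hypothetical)."""
    return sum(i * i for i in range(n))


def run_all(inputs, executor_cls=ProcessPoolExecutor, max_workers=4):
    """Apply cpu_heavy to every input and return results in input order.

    ProcessPoolExecutor runs each task in a separate process, so CPU-bound
    work is not serialized by the GIL. Passing ThreadPoolExecutor instead
    keeps everything in one process.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(cpu_heavy, inputs))
```

When using the default process pool, call `run_all(...)` from under an `if __name__ == '__main__':` guard, because worker processes may re-import the main module on some platforms.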
Code
import pandas as pd
from pathlib import Path
from concurrent.futures import ProcessPoolExecutor

parse_path = '/DATAl/V-gazh/CRSP/dsf_full_fields/parse'
source_path = '/DATAl/V-gazh/CRSP/dsf_full_fields/2th_split'
# The directory holds about 33,000 CSV files; processing them serially is far too slow.


def parseData():
    source_path_list = list(Path(source_path).glob('*.csv'))
    # Spread the files over 20 worker processes.
    with ProcessPoolExecutor(max_workers=20) as multi_process:
        # Consume the iterator so that exceptions raised in workers surface here.
        multi_results = list(multi_process.map(func, source_path_list))


def func(p):
    source_p = str(p)
    # Write the output to the sibling "parse" directory under the same file name.
    parse_p = str(p).replace('2th_split', 'parse')
    df = pd.read_csv(source_p)
    df['DATE'] = pd.to_datetime(df['DATE'].astype(str)).dt.date
    df.sort_values(['DATE'], inplace=True)
    # Handle negative closing prices (take abs) and add a status flag:
    # 0 if the price is negative or missing, 1 otherwise.
    df['is_close'] = df['PRC'].map(lambda x: 0 if x < 0 or pd.isna(x) else 1)
    df['PRC'] = df['PRC'].abs()
    df.rename(columns={'CFACPR': 'factor'}, inplace=True)
    df['adj_low'] = df['BIDLO'] * df['factor']
    df['adj_high'] = df['ASKHI'] * df['factor']
    df['adj_close'] = df['PRC'] * df['factor']
    df['adj_open'] = df['OPENPRC'] * df['factor']
    df['adj_volume'] = df['VOL'] / df['factor']
    # calc change: day-over-day relative change of the adjusted close
    df['change'] = df['adj_close'].diff(1) / df['adj_close'].shift(1)
    # tt = pd.DataFrame({'A': [1, 2, 3, 4, 6], 'B': [4, 5, 6, 8, 1]})
    df.to_csv(parse_p, index=False)


if __name__ == '__main__':
    parseData()
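With tens of thousands of input files, a single malformed CSV should neither go unnoticed nor abort the whole run. The following is a minimal sketch of one way to collect per-file failures using `submit` and `as_completed`. It is not what the script above does. `process_all` and its `worker` argument are hypothetical names, and `worker` would be a function like `func` above.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def process_all(paths, worker, executor_cls=ProcessPoolExecutor, max_workers=20):
    """Run worker(path) for every path and report which ones failed.

    Returns (done, failed): done is a list of paths that finished without
    error, failed maps each failing path to the exception it raised.
    """
    done, failed = [], {}
    with executor_cls(max_workers=max_workers) as pool:
        # Remember which path each future belongs to.
        futures = {pool.submit(worker, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                fut.result()  # re-raises any exception from the worker
                done.append(path)
            except Exception as exc:
                failed[path] = exc
    return done, failed
```

A caller could then log or retry the paths in `failed` after the pool has finished. As with any process pool, the call belongs under an `if __name__ == '__main__':` guard.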