K line data extraction
Starting from the original data set, generate a new table that meets the following requirements:
1. For each minute, take the first, last, maximum and minimum values of the close data.
2. For each minute, compute the growth of the vol data (the last vol value in that minute minus the first).
3. Combine this information into a new table
(field names: ['time','open','close','high','low','vol']).
import pandas as pd
import time
start=time.time()
df=pd.read_csv('data.csv')
df=df.drop('id',axis=1) #delete the id column
df1=pd.DataFrame(columns=['time','open','close','high','low','vol'])#Create a new target data table
for t, group in df.groupby('time'): #Group by time; t is the group key, group holds that group's rows
    new_df=pd.DataFrame({ #Create a one-row temporary table holding the required data
        'time':[t], #Take each group's time as the new table time
        'open':[group['close'].iloc[0]], #Take the first close data of each group as the new table open data
        'close':[group['close'].iloc[-1]], #Take the last close data of each group as the new table close data
        'high':[group['close'].max()], #Take the maximum of each group's close data as the new table high data
        'low':[group['close'].min()], #Take the minimum of each group's close data as the new table low data
        'vol':[group['vol'].iloc[-1] - group['vol'].iloc[0]], #Subtract the first vol value of each group from the last for the new table vol data
    })
    df1=pd.concat([new_df,df1],axis=0) #Vertically merge the data into the target data table
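df2=df1.sort_values('time') #Sort by the values of the time column
df2.reset_index(inplace=True, drop=True) #Reset the row index
print(df2) #Print the target data table
stop=time.time() #Check the elapsed time
print('Total time elapsed: {} seconds'.format(stop-start))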
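The loop above calls pd.concat once per group, which can become slow when there are many distinct time values. As a possible alternative, the same table can be sketched with a single groupby aggregation. This is only an illustration, not part of the original script: the function name build_kline and the sample rows are made up for the example, and it assumes the rows within each group are already in chronological order.

```python
import pandas as pd

def build_kline(ticks: pd.DataFrame) -> pd.DataFrame:
    """Aggregate rows into one open/close/high/low/vol row per 'time' value.

    build_kline is a hypothetical helper name; it assumes the frame has
    'time', 'close' and 'vol' columns, as in the original data set.
    """
    return (
        ticks.groupby('time', sort=True)  # sort=True orders the result by time
        .agg(
            open=('close', 'first'),   # first close of the group
            close=('close', 'last'),   # last close of the group
            high=('close', 'max'),     # largest close of the group
            low=('close', 'min'),      # smallest close of the group
            vol=('vol', lambda s: s.iloc[-1] - s.iloc[0]),  # last vol minus first vol
        )
        .reset_index()  # turn the 'time' group key back into a regular column
    )

# Made-up sample rows standing in for data.csv
sample = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'time': ['09:30', '09:30', '09:30', '09:31', '09:31'],
    'close': [10.0, 10.5, 10.2, 10.3, 10.1],
    'vol': [100, 150, 210, 260, 300],
})

kline = build_kline(sample.drop('id', axis=1))
print(kline)
```

Because the grouping, sorting and index reset all happen inside one call chain, no temporary tables or repeated concatenation are needed; whether this is actually faster on a given data.csv is worth timing with the same time.time() approach used above.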