Processing CSV files with pandas: a worked example

1. Requirements

For a CSV table like the one in the example below, I need to:
(1) concatenate the business department name, the date, and the stock code into a single key;
(2) for all rows that share the same key, add up their purchase amounts, where each purchase amount is first multiplied by the sign of its transaction number. The total is the purchase amount for that key.

For example, given the row: xx company, 20190731, 1, stock 1, 4000, C20201010, xxxx
The result I want is: xx company20190731C20201010,4000
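As a minimal sketch of that example, the single row can be built in memory and the two requirements applied to it. The column names are the ones used in the code in the next section; the values are the placeholder values from the example, not real data.

```python
import numpy as np
import pandas as pd

# One row holding the placeholder values from the example above.
row = pd.DataFrame({
    "营业部名称": ["xx company"],
    "日期": [20190731],
    "买卖序号": [1],
    "股票名": ["stock 1"],
    "买入金额": [4000],
    "股票代码": ["C20201010"],
})

# Requirement (1): name + date + stock code, with the date cast to str first.
key = row["营业部名称"] + row["日期"].astype(str) + row["股票代码"]

# Requirement (2): purchase amount multiplied by the sign of the transaction number.
signed_amount = np.sign(row["买卖序号"]) * row["买入金额"]

print(key.iloc[0], signed_amount.iloc[0])
```

With a negative transaction number the same amount would count as -4000 toward the total for that key.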


2. The code

(1) The file is GBK-encoded, so the encoding must be specified when reading and writing it.
(2) The date column is read as an int type, so it must be converted to a string before it can be concatenated.
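Both points can be checked on a throwaway file before touching the real data. The sketch below is only an illustration: the file name `gbk_sample.csv` and the row in it are made up, and the file is written to a temporary directory.

```python
import os
import tempfile
import pandas as pd

# Made-up row standing in for the real file.
sample = pd.DataFrame({
    "营业部名称": ["xx company"],
    "日期": [20190731],
    "股票代码": ["C20201010"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gbk_sample.csv")  # hypothetical file name
    sample.to_csv(path, encoding="gbk", index=False)

    # (1) The encoding passed to read_csv has to match the file.
    data = pd.read_csv(path, encoding="gbk")

    # Relying on the UTF-8 default fails on these GBK bytes.
    try:
        pd.read_csv(path)
        default_failed = False
    except UnicodeDecodeError:
        default_failed = True

# (2) A column of pure digits is inferred as an integer dtype ...
date_is_int = pd.api.types.is_integer_dtype(data["日期"])
# ... so it is cast to str before being joined with text columns.
data["日期"] = data["日期"].astype(str)
date_is_str = isinstance(data["日期"].iloc[0], str)

print(default_failed, date_is_int, date_is_str)
```

`astype(str)` is used here as the common idiom; for an integer column it produces the same strings as the `.apply(str)` call in the code that follows.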

import pandas as pd
import numpy as np

# Read the data
filename = "test.csv"
# Read the CSV file, specifying the encoding that matches the file
data = pd.read_csv(filename, encoding='gbk')
# Convert every cell to a string
# data = data.applymap(str)
# Convert the date column to strings
data['日期'] = data['日期'].apply(str)

# print(data.loc[0,'营业部名称'])
# print(data.loc[0,'日期'])
# print(data.loc[0,'股票代码'])
# print(data.loc[0,'买卖序号'])
# print(data.loc[0,'买入金额'])

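# Concatenate: department name + date + stock code
data['name_date_code'] = data['营业部名称'] + data['日期'] + data['股票代码']
# Multiply the purchase amount by the sign of the transaction number ("买卖序号")
# np.sign returns the sign (-1, 0, or 1) of each transaction number
data['buy'] = np.sign(data['买卖序号']) * data['买入金额']
# Drop the source columns that are no longer needed
data = data.drop(['营业部名称', '日期', '买卖序号', '股票名', '买入金额', '股票代码', 'data_stock'], axis=1)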

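# Sum the amounts of all rows that share the same name_date_code
buy_sum = data.groupby('name_date_code')['buy'].sum()
# Map the summed amount back onto each row of data, filling missing values with 0
data['buy_sum'] = data.loc[:, 'name_date_code'].map(buy_sum).fillna(0)
# Drop the per-row amount so that only two columns remain
data = data.drop(['buy'], axis=1)
# Remove duplicate rows
data = data.drop_duplicates()
# Write the result, again specifying the encoding
data.to_csv("YYBD_result.csv", encoding='gbk',index=False)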
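For comparison, the map-then-drop_duplicates steps above can also be written as a single groupby that returns one row per key. This is a sketch on made-up rows rather than the original file, and it is an alternative formulation, not the code used in this post.

```python
import numpy as np
import pandas as pd

# Made-up rows: two trades share a key, one has a negative transaction number.
data = pd.DataFrame({
    "营业部名称": ["A", "A", "B"],
    "日期": ["20190731", "20190731", "20190801"],
    "买卖序号": [1, -2, 3],
    "买入金额": [4000, 1500, 700],
    "股票代码": ["C1", "C1", "C2"],
})

data["name_date_code"] = data["营业部名称"] + data["日期"] + data["股票代码"]
data["buy"] = np.sign(data["买卖序号"]) * data["买入金额"]

# One row per key, with the signed amounts summed.
result = (
    data.groupby("name_date_code", as_index=False)["buy"]
    .sum()
    .rename(columns={"buy": "buy_sum"})
)
print(result)
```

One difference to keep in mind: groupby sorts the keys by default, whereas drop_duplicates keeps them in order of first appearance. Passing `sort=False` to groupby preserves the original order.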

3. Summary

(1) Encoding: pandas defaults to UTF-8, so the encoding only needs to be specified when the file uses something else, such as GBK.

(2) Reading a single row with pandas:

# data.iloc[0] returns the entire first row
print(data.iloc[0])

(3) pandas is very capable at data processing: string concatenation, type conversion, and removing duplicate rows are all convenient.
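To expand on point (2): `iloc` selects by position, while `loc` (used in the commented-out print statements earlier) selects by index label. The two only coincide when the index is the default 0, 1, 2, and so on. The small frame below is made up to show the difference.

```python
import pandas as pd

df = pd.DataFrame(
    {"name_date_code": ["k1", "k2"], "buy_sum": [2500, 700]},
    index=[10, 20],  # deliberately not the default 0..n-1 index
)

first_by_position = df.iloc[0]        # first row, whatever its label is
first_by_label = df.loc[10]           # row whose index label is 10
single_cell = df.loc[10, "buy_sum"]   # one value: row label, column name

print(first_by_position)
print(single_cell)
```

After `drop_duplicates()` the index keeps its original labels, so `iloc[0]` is the safer way to get the first remaining row.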


Origin blog.csdn.net/nanhuaibeian/article/details/108996071