1. Con base en el flujo de transacciones, calcule los siguientes indicadores de cada objeto de transacción,
el número total de transacciones de ingresos, el total de meses de ingresos, el número máximo de transacciones de ingresos mensuales y la fecha del número máximo de transacciones de ingresos mensuales (año , mes, formato de día)
import pandas as pd
path = r"C:\Users\xxx\Desktop\科技数据.xlsx"
df = pd.read_excel(path) #读取excel数据
df['_COL0']= pd.to_datetime(df['_COL0']) #日期数据格式转换
res = df[df['_COL4 '] == '收入'] #查询收入数据
res.reset_index(inplace=True, drop=True)
res.info()
res.head() #查看数据
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2134 entries, 0 to 2133
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 2134 non-null object
1 _COL0 2134 non-null datetime64[ns]
2 x 2134 non-null object
3 _COL4 2134 non-null object
4 _col5 2134 non-null float64
5 交易对方 2133 non-null object
dtypes: datetime64[ns](1), float64(1), object(4)
memory usage: 100.2+ KB
identificación | _COL0 | X | _COL4 | _col5 | contraparte | |
---|---|---|---|---|---|---|
0 | 1000050001202305060010\n153222358 | 2023-05-06 18:10:45 | transferir | ingreso | 676.0 | N° 303, N° 7, Carril 3 |
1 | 1000050001202305060717\n985751052 | 2023-05-06 18:10:10 | transferir | ingreso | 1120.0 | Bei Wei 401 |
2 | 1000050001202305061319\n409873668 | 2023-05-06 17:32:31 | transferir | ingreso | 1100.0 | Baihuadong Beiwei\n501 |
3 | 1000050001202305061412\n834865884 | 2023-05-06 17:31:41 | transferir | ingreso | 2002.0 | Tienda No. 5, Carril 1\n Tienda Tongquanzhi 15\n323538188 |
4 | 1000050001202305060217\n908459280 | 2023-05-06 17:26:32 | transferir | ingreso | 887.0 | No. 401, carril 5\n(inquilino) |
# 计算每个交易对象的指标
def max_count_date(x):
idx = x.groupby(x['_COL0'].dt.month)['_col5'].transform('count').idxmax()
return res.iloc[idx,1].strftime('%Y%m%d')
grouped = res.groupby('交易对方')
summary = grouped.agg(总收入笔数=('交易对方', 'count'), 总收入月数=('_COL0',lambda x: x.dt.month.nunique()) )
max_monthly_income = grouped.apply(lambda x: x['_col5'].groupby(x['_COL0'].dt.month).count().max())
summary['最大月收入笔数'] = max_monthly_income.astype('int')
summary['最大月收入笔数所在日期'] = grouped.apply(max_count_date)
summary #查看指标结果
Ingresos totales | Ingresos totales en meses | Número máximo de transacciones de ingresos mensuales | La fecha del ingreso mensual máximo. | |
---|---|---|---|---|
contraparte | ||||
.san | 1 | 1 | 1 | 20230126 |
/ | 20 | 6 | 5 | 20230214 |
166858006\n45 | 1 | 1 | 1 | 20220619 |
No. 30, 201 (\nMovido) | 1 | 1 | 1 | 20220802 |
N° 30 202 | 3 | 3 | 1 | 20230506 |
... | ... | ... | ... | ... |
Longjiang 802 Barbacoa\nParrilla | 6 | 5 | 2 | 20230119 |
Longjiang 804 (la tarifa de electricidad cambió a 1 yuan) | 1 | 1 | 1 | 20220702 |
Industria Longjiang 901\nPropietario | 10 | 10 | 1 | 20230405 |
ꦿsellado 訫ꦿbloqueo\n爱꧔ꦿএ᭄ | 1 | 1 | 1 | 20230412 |
꧁꫞꯭ único\n二꫞꧂ | 2 | 2 | 1 | 20221103 |
502 filas × 4 columnas
2. Haga coincidir los indicadores calculados anteriormente con la tabla de datos original, haga coincidir por objeto de transacción (nombre de la persona) y agregue las columnas de indicadores anteriores a la tabla original.
# 将指标匹配到原始数据表中
df = df.merge(summary, on='交易对方', how='left')
df[df.duplicated('交易对方')].sort_values('交易对方') #查看指标是否正确
identificación | _COL0 | X | _COL4 | _col5 | contraparte | Ingresos totales | Ingresos totales en meses | Número máximo de transacciones de ingresos mensuales | La fecha del ingreso mensual máximo. | |
---|---|---|---|---|---|---|---|---|---|---|
164 | 1800008005230419011159\n1100710500 | 2023-04-19 15:55:11 | Transferir fuera del Pase de cambio-\nPara cambiar | otro | 6000.0 | / | 20.0 | 6.0 | 5.0 | 20230214 |
1806 | 4200001705202212059573\n010442 | 2022-12-05 16:34:49 | Transferir a Cambio-\nDesde Cambio | otro | 7714.0 | / | 20.0 | 6.0 | 5.0 | 20230214 |
1777 | 4200001709202212080999\n467748 | 2022-12-08 12:05:07 | Transferir a Cambio-\nDesde Cambio | otro | 9806.0 | / | 20.0 | 6.0 | 5.0 | 20230214 |
1516 | 4200001715202301033827\n069012 | 2023-01-03 22:01:40 | Transferir a Cambio-\nDesde Cambio | otro | 193,5 | / | 20.0 | 6.0 | 5.0 | 20230214 |
800 | 4200001788202303023467\n241839 | 2023-03-02 18:29:55 | Transferir a Cambio-\nDesde Cambio | otro | 10321.5 | / | 20.0 | 6.0 | 5.0 | 20230214 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
638 | 1000050001202303070411\n163291909 | 2023-03-07 15:31:25 | transferir | ingreso | 332.0 | Industria Longjiang 901\nPropietario | 10.0 | 10.0 | 1.0 | 20230405 |
2329 | 1000050001202210051317\n077385423 | 2022-10-05 17:41:53 | transferir | ingreso | 591.0 | Industria Longjiang 901\nPropietario | 10.0 | 10.0 | 1.0 | 20230405 |
1752 | 1000050001202212081215\n944574957 | 2022-12-08 15:34:45 | transferir | ingreso | 338.0 | Industria Longjiang 901\nPropietario | 10.0 | 10.0 | 1.0 | 20230405 |
2145 | 1000050001202211050215\n131777661 | 2022-11-05 14:03:06 | transferir | ingreso | 388.0 | Industria Longjiang 901\nPropietario | 10.0 | 10.0 | 1.0 | 20230405 |
2224 | 1000039901000210306181\n885746052 | 2022-10-30 15:31:23 | Sobre rojo de WeChat | ingreso | 88.0 | ꧁꫞꯭ único\n二꫞꧂ | 2.0 | 2.0 | 1.0 | 20221103 |
2597 filas × 10 columnas
3. Defina los primeros dos meses de un objeto de transacción con un número de mes de transacción mayor o igual a 2 como 1 y 2, y no defina un objeto de transacción con un número de mes de transacción menor que 2. Y coincide con la tabla original, es decir, se agrega una nueva columna, first_twoM, a la tabla original.
df['first_twoM'] = np.nan
def get_first_two_m(x):
months = x.dt.month.drop_duplicates()
if len(months) >=2:
first_t = months[0:2].values
arr1 = x.dt.month.values == first_t[0]
idx1 = np.where(arr1 == True)
arr2 = x.dt.month.values == first_t[1]
arr2 = np.where(arr2, 2, np.nan)
arr2[idx1] = 1
return arr2
df.sort_values(['交易对方', '_COL0'], inplace=True)
df['first_twoM'] = df.groupby('交易对方')['_COL0'].transform(get_first_two_m)
df
identificación | _COL0 | X | _COL4 | _col5 | contraparte | Ingresos totales | Ingresos totales en meses | Número máximo de transacciones de ingresos mensuales | La fecha del ingreso mensual máximo. | primero_dosM | ¿Los meses de negociación son consecutivos? | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1293 | 1000039801000301266118\n805364803 | 2023-01-26 19:03:36 | Sobre rojo de WeChat | ingreso | 30.0 | .san | 1.0 | 1.0 | 1.0 | 20230126 | NaN | NaN |
2584 | 1000039801202209056110\n700265037 | 2022-09-06 14:35:30 | 微信红包-退\n款 | 收入 | 1.0 | / | 20.0 | 6.0 | 5.0 | 20230214 | 1.0 | 0.0 |
2154 | 1000050001202211040915\n379319646 | 2022-11-04 21:08:44 | 转账-退款 | 收入 | 2250.0 | / | 20.0 | 6.0 | 5.0 | 20230214 | 2.0 | 1.0 |
2036 | 4200001605202211089277\n950418 | 2022-11-08 00:48:37 | 转入零钱通-\n来自零钱 | 其他 | 1.0 | / | 20.0 | 6.0 | 5.0 | 20230214 | 2.0 | 1.0 |
2034 | 4200001627202211082029\n141916 | 2022-11-08 00:50:31 | 转入零钱通-\n来自零钱 | 其他 | 12500.0 | / | 20.0 | 6.0 | 5.0 | 20230214 | 2.0 | 1.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
199 | 1000039801000304126238\n646974024 | 2023-04-12 22:45:58 | 微信红包 | 收入 | 100.0 | ꦿ封訫ꦿ鎖\n愛꧔ꦿএ᭄ | 1.0 | 1.0 | 1.0 | 20230412 | NaN | NaN |
2224 | 1000039901000210306181\n885746052 | 2022-10-30 15:31:23 | 微信红包 | 收入 | 88.0 | ꧁꫞꯭独一无\n二꫞꧂ | 2.0 | 2.0 | 1.0 | 20221103 | 1.0 | 1.0 |
2196 | 1000039801000211036179\n418892051 | 2022-11-03 13:01:38 | 微信红包 | 收入 | 2.0 | ꧁꫞꯭独一无\n二꫞꧂ | 2.0 | 2.0 | 1.0 | 20221103 | 2.0 | 1.0 |
2045 | 1000050001221107000531\n19264996526500 | 2022-11-07 16:57:13 | 转账 | 支出 | 220.0 | (违约退租\n)30号302 | NaN | NaN | NaN | NaN | NaN | NaN |
1674 | 1000107101202212160157\n5756505506 | 2022-12-16 19:11:46 | 二维码收款 | 收入 | 254.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3283 rows × 12 columns
4、根据交易日期,将连续的月份定义为1,不连续定义为0。例如一个人交易月份
202201、202202、202205、202206 、202208、202210、202211、202302
def transaction_date(x):
x.sort_values(inplace=True)
month = x.dt.month.drop_duplicates()
if len(month) <=1 :
return #0 只有一个月份数的可以在这里设置返回零
dif1 = abs(month.diff(periods=-1))
dif1 = dict(zip(month.values,dif1.values))
dif2 = abs(month.diff(periods=1))
dif2 =dict(zip(month.values,dif2.values))
transaction_month = set([i for i,j in dif2.items() if j in [1,11]] + [i for i,j in dif1.items() if j in [1,11]])
return x.dt.month.isin(transaction_month).astype('int')
df['交易月份是否连续'] = df.groupby('交易对方')['_COL0'].transform(transaction_date)
df
id | _COL0 | x | _COL4 | _col5 | 交易对方 | 总收入笔数 | 总收入月数 | 最大月收入笔数 | 最大月收入笔数所在日期 | first_twoM | 交易月份是否连续 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1293 | 1000039801000301266118\n805364803 | 2023-01-26 19:03:36 | 微信红包 | 收入 | 30.0 | .San | 1.0 | 1.0 | 1.0 | 20230126 | NaN | NaN |
2584 | 1000039801202209056110\n700265037 | 2022-09-06 14:35:30 | 微信红包-退\n款 | 收入 | 1.0 | / | 20.0 | 6.0 | 5.0 | 20230214 | 1.0 | 0.0 |
2154 | 1000050001202211040915\n379319646 | 2022-11-04 21:08:44 | 转账-退款 | 收入 | 2250.0 | / | 20.0 | 6.0 | 5.0 | 20230214 | 2.0 | 1.0 |
2036 | 4200001605202211089277\n950418 | 2022-11-08 00:48:37 | 转入零钱通-\n来自零钱 | 其他 | 1.0 | / | 20.0 | 6.0 | 5.0 | 20230214 | 2.0 | 1.0 |
2034 | 4200001627202211082029\n141916 | 2022-11-08 00:50:31 | 转入零钱通-\n来自零钱 | 其他 | 12500.0 | / | 20.0 | 6.0 | 5.0 | 20230214 | 2.0 | 1.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
199 | 1000039801000304126238\n646974024 | 2023-04-12 22:45:58 | 微信红包 | 收入 | 100.0 | ꦿ封訫ꦿ鎖\n愛꧔ꦿএ᭄ | 1.0 | 1.0 | 1.0 | 20230412 | NaN | NaN |
2224 | 1000039901000210306181\n885746052 | 2022-10-30 15:31:23 | 微信红包 | 收入 | 88.0 | ꧁꫞꯭独一无\n二꫞꧂ | 2.0 | 2.0 | 1.0 | 20221103 | 1.0 | 1.0 |
2196 | 1000039801000211036179\n418892051 | 2022-11-03 13:01:38 | 微信红包 | 收入 | 2.0 | ꧁꫞꯭独一无\n二꫞꧂ | 2.0 | 2.0 | 1.0 | 20221103 | 2.0 | 1.0 |
2045 | 1000050001221107000531\n19264996526500 | 2022-11-07 16:57:13 | 转账 | 支出 | 220.0 | (违约退租\n)30号302 | NaN | NaN | NaN | NaN | NaN | NaN |
1674 | 1000107101202212160157\n5756505506 | 2022-12-16 19:11:46 | 二维码收款 | 收入 | 254.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3283 rows × 12 columns
- 自定义excel表格式导出
class writer():
def __init__(self, df: list, sheet_name: list, key: list, path: str):
self.df = df #dataframe数据
self.sheet_name = sheet_name #sheet名称
self.key = key #需要上色字段
self.path = path #excel保存地址
def save(self):
path = "C:/Users/15028/Desktop/{}{}(导出时间{}).xlsx".format(pd.Timestamp.now().strftime("%y%m%d"),self.path,pd.Timestamp.now().strftime("%H时%M分%S秒"))
writer = pd.ExcelWriter(path,engine='xlsxwriter') #创建pandas.ExcelWriter实例,赋值给writer
for j in range(len(self.sheet_name)): #遍历sheet名称列表
#创建sheet表并写入dataframe数据
self.df[j].to_excel(writer, sheet_name=self.sheet_name[j], index=False #不写入索引index=False
,freeze_panes=(1,2)) #设置固定1、2列(固定列无法左右移动)
workbook = writer.book #工作蒲格式方法
worksheet = writer.sheets[self.sheet_name[j]] #读取sheet表
# 计算每列的合适字符宽度,放到元组中
widths = (self.df[j].astype(str).applymap(lambda x: 40 if len(x)>61 else len(x)+9 if len(x)<9 else len(x)).agg(max).values)
#设置每列宽度大小
[worksheet.set_column(i, i, width) for i, width in enumerate(widths)] #计算的宽度,设置列宽
color = ['#03A89E','#00C78C','#FFFFCD','#FFC0CB','#808A87','#FFE384','#ED9121','#40E0D0','#FFFFCD']*5 #颜色列表
(worksheet.set_column(i,i,widths[i],workbook.add_format({
'fg_color': color[i-11],
'valign': 'vcenter',# 垂直对齐方式
'font_size': 10, #字体大小
'border': 4, #单元格边框宽度
'align': 'left' # 水平对齐方式
})) for
i,x in enumerate(self.df[j].columns) if self.df[j].columns[i] in self.key[0])
#设置首行宽度
worksheet.set_row(0,20,workbook.add_format({
'fg_color':'#40E0D0'#背景颜色
,'bold': True,#字体加粗
'valign': 'vcenter',# 垂直对齐方式
'font_size': 16, #字体大小
'border': 10, #单元格边框宽度
'align': 'left' # 水平对齐方式
}))
# worksheet.set_column("A:B",None,None,{'hidden':1}) #A至B列,隐藏
writer.close()
writer1 = writer(df=[df],path='交易数据',key=[df.columns[-6:]],sheet_name=['交易表'])
writer1.save()