Converting a table to and from three-column format
Suppose we have a table in a format similar to the following:
| | north | sky | superior | Heavy | stone | open | bear | Tang | Qin |
---|---|---|---|---|---|---|---|---|---|
north | 0 | 75 | 33 | 45 | 166 | 67 | 52 | 69 | 3 |
sky | 75 | 0 | 46 | 21 | 214 | 70 | 64 | 221 | 70 |
superior | 33 | 46 | 0 | 55 | 31 | 2 | 0 | 2 | 1 |
Heavy | 45 | 21 | 55 | 0 | 29 | 0 | 0 | 0 | 0 |
stone | 166 | 214 | 31 | 29 | 0 | 8 | 8 | 9 | 5 |
open | 67 | 70 | 2 | 0 | 8 | 0 | 0 | 5 | 0 |
bear | 52 | 64 | 0 | 0 | 8 | 0 | 0 | 12 | 0 |
Tang | 69 | 221 | 2 | 0 | 9 | 5 | 12 | 0 | 19 |
Qin | 3 | 70 | 1 | 0 | 5 | 0 | 0 | 19 | 0 |
We want to convert it into a three-column form:
source | target | value |
---|---|---|
north | sky | 75 |
north | superior | 33 |
north | Heavy | 45 |
north | stone | 166 |
So how do we do it?
Use the pandas library to read the Excel file and convert it into three columns:
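import pandas as pd
# Read the Excel file; the first column holds the row labels
df = pd.read_excel('网络.xlsx', index_col=0)
# Mask zero distances as NaN, then drop rows and columns that are entirely NaN
df = df[df != 0].dropna(how='all').dropna(axis=1, how='all')
# Convert the table to three-column format
triplets = []
for row in df.index:
    for col in df.columns:
        if not pd.isna(df.loc[row, col]):
            triplets.append([row, col, df.loc[row, col]])
# Save the three-column data to an Excel file
# (column names: 起始城市 = start city, 目标城市 = destination city, 距离值 = distance value)
triplets_df = pd.DataFrame(triplets, columns=['起始城市', '目标城市', '距离值'])
triplets_df.to_excel('network.xlsx', index=False)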
This code reads an Excel file named "网络.xlsx" and converts it to three-column format. It drops entries whose distance value is 0 and appends each remaining pair of cities, together with its distance value, to a list as a separate row. Finally, it converts the list into a new pandas DataFrame and saves it to an Excel file named "network.xlsx". Note that we set index=False to avoid writing the DataFrame index to the Excel file.
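The nested loop can also be written without explicit iteration. The following is only a minimal sketch: it assumes the matrix is already loaded into a DataFrame (here a small in-memory sample built from the first three cities of the table, so no Excel file is needed). The function name matrix_to_triplets and the column names source, target and value are illustrative choices, not part of the original code.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the Excel sheet (first three cities only).
cities = ["north", "sky", "superior"]
matrix = pd.DataFrame(
    [[0, 75, 33],
     [75, 0, 46],
     [33, 46, 0]],
    index=cities,
    columns=cities,
)


def matrix_to_triplets(df):
    """Return a (source, target, value) DataFrame, skipping zero cells."""
    masked = df.where(df != 0)           # zeros become NaN
    stacked = masked.stack().dropna()    # long format; drop NaN explicitly
    stacked = stacked.rename_axis(["source", "target"])
    return stacked.reset_index(name="value")


triplets = matrix_to_triplets(matrix)
print(triplets)
```

Because the matrix is symmetric, each pair appears twice (north/sky and sky/north), just as it does with the loop version.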
How do we do the reverse and convert the three-column format back into a symmetric matrix?
Using Python and the pandas library, you can convert the three-column format in an Excel file back into table format.
Here is some sample code:
import pandas as pd
# Read the three-column data
df = pd.read_excel('三列式.xlsx')
# Convert the three-column data to table format
pivot_table = df.pivot_table(index='起始城市', columns='目标城市', values='距离值')
# Save the table as an Excel file
pivot_table.to_excel('网络.xlsx')
This code reads an Excel file named "三列式.xlsx" and converts it to table format. It uses the start city (起始城市) and destination city (目标城市) as the index and column labels, and the distance value (距离值) as the cell value. Finally, it saves the table to an Excel file named "网络.xlsx".
Note that if the three-column data contains duplicate city pairs, pivot_table has to combine the duplicate values into a single cell. By default, pivot_table uses the mean to combine duplicates, but you can pass the aggfunc parameter to specify another aggregation function, such as min, max or sum.
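To make the effect of aggfunc concrete, here is a small sketch on made-up data in which the pair north/sky appears twice. The values and the column names source, target and value are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical three-column data with one duplicated pair (north, sky).
triplets = pd.DataFrame({
    "source": ["north", "north", "north", "sky"],
    "target": ["sky", "sky", "stone", "stone"],
    "value": [75, 85, 166, 214],
})

# Default behaviour: duplicates are averaged.
by_mean = triplets.pivot_table(index="source", columns="target", values="value")

# Explicit aggregation functions.
by_min = triplets.pivot_table(index="source", columns="target",
                              values="value", aggfunc="min")
by_sum = triplets.pivot_table(index="source", columns="target",
                              values="value", aggfunc="sum")

print(by_mean.loc["north", "sky"])  # mean of 75 and 85
print(by_min.loc["north", "sky"])   # smaller of the two
print(by_sum.loc["north", "sky"])   # their total
```

For distance data, min or max is often easier to interpret than the mean, but which one is appropriate depends on why the duplicates exist.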
However, the generated table contains many null values. How do we fill those null values with 0? This can be done directly in Excel, but how do we do it in code?
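# When converting the three-column data to table format, missing values can be filled with 0 using fillna().
import pandas as pd
# Read the three-column data
df = pd.read_excel('三列式.xlsx')
# Convert the three-column data to table format and fill missing values with 0
pivot_table = df.pivot_table(index='起始城市', columns='目标城市', values='距离值').fillna(0)
# Save the table as an Excel file
pivot_table.to_excel('网络.xlsx')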
This code reads an Excel file named "三列式.xlsx" and converts it to table format. It uses the start city and destination city as the index and column labels, and the distance value as the cell value. It then uses the fillna() method to fill the missing values with 0. Finally, it saves the table to an Excel file named "网络.xlsx".
Note that if some city pairs are missing from the three-column data, pivot_table leaves the corresponding cells as NaN. Therefore, before saving the table to the Excel file, use fillna(0) to replace NaN with 0 so that the output contains no empty cells.
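pivot_table also accepts a fill_value parameter, which fills missing cells during the pivot itself. The sketch below combines it with one possible way to obtain a square, symmetric matrix when each pair is listed in only one direction. The sample data, the column names source, target and value, and the choice of an element-wise maximum (which assumes non-negative distances) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical three-column data: each pair listed once, in one direction only.
triplets = pd.DataFrame({
    "source": ["north", "north", "sky"],
    "target": ["sky", "superior", "superior"],
    "value": [75, 33, 46],
})

# fill_value=0 replaces missing pairs with 0 while pivoting.
table = triplets.pivot_table(index="source", columns="target",
                             values="value", fill_value=0)

# Use the same labels on both axes so the table is square.
labels = sorted(set(triplets["source"]) | set(triplets["target"]))
square = table.reindex(index=labels, columns=labels, fill_value=0)

# Mirror the entries across the diagonal with an element-wise maximum
# (assumes non-negative distances, so 0 never wins over a real value).
values = square.to_numpy()
symmetric = pd.DataFrame(np.maximum(values, values.T),
                         index=labels, columns=labels)
print(symmetric)
```

If the three-column data already lists every pair in both directions, the mirroring step leaves the table unchanged.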
If you need the data and code, follow me on WeChat (WX): Jdaystudy