Simple Excel data processing with Python (splitting merged cells, merging sheets by header, adding columns, and filling in content)

Introduction: A large amount of data is stored in an Excel workbook. This post uses Python to preprocess that workbook as required, producing tidy data that is easy to work with.
Keywords: Python, Excel, openpyxl, pandas

Problem description: temperature data sheet
An Excel file stores meteorological data for different regions in multiple sheets. Every sheet stores the same kinds of data under the same headers, and some sheets contain merged cells.
Requirements:

  1. Merge the sheets by matching column headers
  2. In the merged sheet, add a new first column and fill each of its cells with the name of the sheet the row came from
  3. Split all merged cells in the table and fill every resulting cell with the content of the original merged cell

Detailed code explanation:
Put the test Excel file and the Excel file to be processed in the same folder as the scripts, so that the file name alone works as the path and the code stays short. You can write the processed data to a new blank file or back into the original file.
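If no test file is at hand, a small one can be generated. The following is only a sketch: the function name `build_sample_workbook`, the sheet names `RegionA` and `RegionB`, the headers, and the values are all made up for illustration and are not taken from the original data.

```python
from openpyxl import Workbook


def build_sample_workbook():
    """Return a small in-memory workbook with one merged range per sheet."""
    wb = Workbook()
    wb.remove(wb.active)  # drop the default empty sheet

    # Hypothetical sheet names and temperatures, for testing only
    samples = {
        "RegionA": [1.5, 2.0, 3.1],
        "RegionB": [10.0, 11.2, 12.4],
    }
    for name, temps in samples.items():
        ws = wb.create_sheet(title=name)
        ws.append(["month", "day", "temperature"])
        for day, temp in enumerate(temps, start=1):
            # Only the first data row carries the month; the merge covers the rest
            ws.append(["Jan" if day == 1 else None, day, temp])
        ws.merge_cells("A2:A4")  # "Jan" spans the three data rows
    return wb


sample = build_sample_workbook()
print(sample.sheetnames)
```

Calling `build_sample_workbook().save("sample_weatherdata.xlsx")` would write the file next to the scripts; the file name here is also just an example.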

main.py
Functions:
1. Insert a new first column into each sheet and fill it with the sheet name
2. Split the merged cells in every sheet and fill each resulting cell with the original value

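import openpyxl

wb = openpyxl.load_workbook('weatherdata.xlsx')

# Split merged cells first: openpyxl's insert_cols() does not shift
# merged ranges, so unmerging after the insert would target the wrong cells.
for worksheet in wb.worksheets:
    # Collect the ranges up front, because unmerge_cells() changes
    # worksheet.merged_cells while we work through it.
    cr = []
    for m_area in worksheet.merged_cells.ranges:
        r1, r2, c1, c2 = m_area.min_row, m_area.max_row, m_area.min_col, m_area.max_col
        if r2 - r1 > 0:  # only ranges that span more than one row
            cr.append((r1, r2, c1, c2))
            print('Matched merged range: %s' % m_area.coord)
    for r1, r2, c1, c2 in cr:
        # A merged range keeps its value in the top-left cell
        value = worksheet.cell(r1, c1).value
        worksheet.unmerge_cells(start_row=r1, end_row=r2, start_column=c1, end_column=c2)
        for row in range(r1, r2 + 1):
            for col in range(c1, c2 + 1):
                worksheet.cell(row=row, column=col).value = value

# Insert a new first column holding the sheet name
for worksheet in wb.worksheets:
    worksheet.insert_cols(idx=1)
    worksheet.cell(1, 1).value = "省份"  # header text, meaning "province"
    for i in range(2, worksheet.max_row + 1):
        worksheet.cell(i, 1).value = worksheet.title

# This overwrites the input file; save under another name to keep the original
wb.save('weatherdata.xlsx')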

The cell-splitting part is adapted from this blogger's code:
https://blog.csdn.net/weixin_44788825/article/details/104526131
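To check the split-and-fill step without touching the real file, the same idea can be wrapped in a function and run on an in-memory workbook. This is a minimal sketch, not the referenced blogger's code: the helper name `unmerge_and_fill` and the sample values are made up, and unlike the script above it handles every merged range, including ones that span only columns.

```python
from openpyxl import Workbook


def unmerge_and_fill(ws):
    """Unmerge every merged range in ws and copy its top-left value into each cell."""
    # Copy the ranges first: unmerge_cells() mutates ws.merged_cells.
    for merged in list(ws.merged_cells.ranges):
        value = ws.cell(row=merged.min_row, column=merged.min_col).value
        ws.unmerge_cells(range_string=merged.coord)
        for row in ws.iter_rows(min_row=merged.min_row, max_row=merged.max_row,
                                min_col=merged.min_col, max_col=merged.max_col):
            for cell in row:
                cell.value = value


# Tiny in-memory example with made-up values
wb = Workbook()
ws = wb.active
ws.append(["month", "day"])
ws.append(["Jan", 1])
ws.append([None, 2])
ws.merge_cells("A2:A3")

unmerge_and_fill(ws)
print([cell.value for cell in ws["A"]])
```

After the call, every cell that was part of the merged range should hold the value that the merged cell displayed.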

Step2.py
Merges multiple sheets into one sheet by header and saves the result to output.xlsx

import pandas as pd

# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}
sheets = pd.read_excel('weatherdata.xlsx', sheet_name=None)

# pd.concat aligns columns by header name, so sheets that share
# the same headers are stacked row by row
merged = pd.concat(list(sheets.values()), ignore_index=True)

# index=False leaves out the pandas index column, so nothing has to be
# deleted from output.xlsx afterwards
merged.to_excel('output.xlsx', index=False)
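As an alternative to adding the sheet-name column with openpyxl, the column can also be added at the pandas stage while merging. The sketch below is not the author's method: the function name `merge_sheets`, the default column label `sheet`, and the sample frames are assumptions for illustration. It takes the same kind of dict that `pd.read_excel(..., sheet_name=None)` returns.

```python
import pandas as pd


def merge_sheets(sheets, label="sheet"):
    """Stack the frames in `sheets`, tagging each row with its sheet name."""
    frames = []
    for name, df in sheets.items():
        tagged = df.copy()             # leave the caller's frame untouched
        tagged.insert(0, label, name)  # new first column holding the sheet name
        frames.append(tagged)
    # Columns are aligned by header name, not by position
    return pd.concat(frames, ignore_index=True)


# Made-up data; the second frame lists its columns in a different order
sheets = {
    "RegionA": pd.DataFrame({"day": [1, 2], "temperature": [1.5, 2.0]}),
    "RegionB": pd.DataFrame({"temperature": [10.0], "day": [1]}),
}
merged = merge_sheets(sheets)
print(merged)
```

Because the alignment is by header, a sheet whose columns are in a different order still lands in the right columns of the merged result.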

Note: importing openpyxl and pandas may fail. Even after installing them with pip, PyCharm may still report that they are not installed, usually because the project uses a different interpreter from the one pip installed into. In that case, install them directly in PyCharm by following its prompts.
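One way to see what is going on is to run a short check inside the same environment that runs the scripts. This is a sketch using only the standard library; the helper name `missing_packages` is made up for illustration.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names that cannot be imported by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The interpreter actually running this script (compare it with the one pip used)
print("Interpreter:", sys.executable)
print("Missing:", missing_packages(["openpyxl", "pandas"]))
```

If a package shows up as missing, running `python -m pip install openpyxl pandas` with that same interpreter installs it where the script can find it.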

Origin blog.csdn.net/qq_45742383/article/details/124647922