In modern life it is hard to avoid dealing with Excel sheets. Excel is easy to use, but when a table holds a large amount of data and we need to copy and paste some of it (such as ID numbers) from other statistical forms, the work becomes tiring; after all, we are not machines and cannot do boring, repetitive operations for long. Imagine a scenario: we have a table with thousands of rows to fill in, and we need to enter the ID number that corresponds to each name. We have made similar tables before, in which some of the same people's names already have complete ID numbers. We then have to look up the names one by one, find each ID number, and copy it into the current table.
As I repeated these operations day after day, I kept wishing for an automated tool that would do them for me and free me from this inhuman torture. In the end I thought of Python, because with it I do not have to attend to the small internal details of the language and can focus on solving the problem.
Install openpyxl from the command line with pip install openpyxl or easy_install openpyxl.
Working with openpyxl can be divided into four steps. The first step is to create a new workbook or load an existing one into memory, using Workbook and load_workbook respectively:
from openpyxl import load_workbook
from openpyxl import Workbook
# Load an existing workbook
wb1 = load_workbook('lalala.xlsx')
"""
When the source table holds a large amount of data, we can load it in
openpyxl's read_only mode. The benefit is that the whole table is not
loaded into memory.
"""
wb1 = load_workbook(filename='lalala.xlsx', read_only=True)
# Create a new workbook
wb2 = Workbook()
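To see read_only mode in action without an existing file, here is a minimal sketch. It first writes a small stand-in file to a temporary directory (the two sample rows are invented for illustration) and then reads it back in read_only mode, assuming a recent openpyxl release where iter_rows accepts values_only.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lalala.xlsx")

    # Build a small stand-in source file (sample data, not from the article).
    wb = Workbook()
    ws = wb.active
    ws.append(["name", "ID number"])
    ws.append(["Alice", "ID-001"])
    wb.save(path)

    # read_only=True reads rows lazily instead of loading the whole sheet.
    wb_ro = load_workbook(filename=path, read_only=True)
    rows = [tuple(row) for row in wb_ro.active.iter_rows(values_only=True)]
    # A read-only workbook keeps the file open until close() is called.
    wb_ro.close()
```

Calling close() matters mostly in long-running scripts or on Windows, where an open handle can block later writes to the same file.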
The second step is to operate on the sheets of the Excel workbook. A workbook created with Workbook() has one active sheet whose default name is Sheet; this can be verified in the Python interactive command line.
# Access the active sheet (wb is a workbook created or loaded in the first step)
ws = wb.active
# Set the sheet title
ws.title = "range names"
# Create a sheet titled "Pi"
ws = wb.create_sheet(title="Pi")
# Get the sheet titled "Sheet1"
ws = wb['Sheet1']
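The default sheet name can also be checked in a short script rather than an interactive session. This is a small sketch on a fresh in-memory workbook; nothing is read from or written to disk.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
default_title = ws.title          # name of the sheet that Workbook() creates

ws.title = "range names"          # rename the active sheet
pi_sheet = wb.create_sheet(title="Pi")

names = wb.sheetnames             # sheet titles in workbook order
same_sheet = wb["Pi"]             # look a sheet up by its title
```

Looking up a title that does not exist, such as wb['Sheet1'] on this fresh workbook, raises a KeyError, so checking wb.sheetnames first is a cheap safeguard.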
The third step is to operate on the cells in a sheet. Note that the location of a cell is determined by its column and its row: a cell in column A and the third row can be accessed through ws['A3']. A cell also has row and column attributes; the data types of cell.row and cell.column are as shown in the figure.
Pay special attention when loading a workbook in read_only mode: cell.row and cell.column are both int objects. cell.column records the column's offset counted from the first column; it is not the capital letter, such as "A", that names the column in the workbook. In recent openpyxl releases cell.column is an integer for normally loaded workbooks as well, and the letter is available as cell.column_letter.
# Get the first row; the data type is tuple
row = ws[1]
# Get column A; the data type is tuple
column = ws['A']
# Set a value for F5
ws['F5'] = 'sfs'
# Set the cell value
ws['F5'].value = 'hello'
# Get the row number of a cell
m = ws['F5'].row
# Get the column number of a cell
n = ws['F5'].column
# Get the cells of a specific range, such as F5 to F30; the data type is tuple
k = ws['F5':'F30']
# Get the cells of a specific range, such as F5 to G30; the data type is tuple
j = ws['F5':'G30']
# Get the maximum number of rows in the sheet
row_count = ws.max_row
# Get the maximum number of columns in the sheet
column_count = ws.max_column
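The access patterns listed here can be tried on a small in-memory sheet. The sketch below fills a 3 x 2 block with invented sample values and assumes a recent openpyxl release, where cell.column is an integer and cell.column_letter gives the letter.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Fill A1:B3 with sample values (invented for illustration).
for r in range(1, 4):
    ws.cell(row=r, column=1, value=f"name{r}")
    ws.cell(row=r, column=2, value=r * 10)

first_row = ws[1]            # tuple of the cells in row 1
column_a = ws["A"]           # tuple of the cells in column A
block = ws["A1":"B3"]        # tuple of rows, each row a tuple of cells

cell = ws["B2"]
position = (cell.row, cell.column, cell.column_letter)
size = (ws.max_row, ws.max_column)
```

A range such as ws['A1':'B3'] is always a tuple of row tuples, which is why a single-column range still needs an index like row[0] to reach the cell.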
The last step is to save the changes. Note that if the workbook you want to save is open in other software (Microsoft Office or WPS) at the time, the save operation will raise an error.
wb1.save('empty_book.xlsx')
wb2.save(filename='other_book.xlsx')
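One way to handle a save that fails because the file is open elsewhere is to catch the error and report it. The sketch below assumes the failure surfaces as PermissionError, which is what Windows typically raises for a locked file; try_save is a hypothetical helper name, not part of openpyxl.

```python
import os
import tempfile

from openpyxl import Workbook


def try_save(wb, path):
    """Save the workbook; return False if the file cannot be written."""
    try:
        wb.save(path)
    except PermissionError:
        # Assumption: a file locked by another program raises PermissionError.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "other_book.xlsx")
    wb = Workbook()
    wb.active["A1"] = "hello"
    saved = try_save(wb, path)
    file_exists = os.path.exists(path)
```

When try_save returns False, the script can ask the user to close the file and retry instead of losing the work done in memory.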
Implementing the requirement
Create a new file named get_info_from_excel.py and edit it with your favorite text editor. First import the load_workbook function from the openpyxl library; load_workbook loads an Excel workbook that already exists.
from openpyxl import load_workbook
Our aim is to extract information from the source Excel table and bulk copy it into the target Excel table, so we first define some variables.
# Source file name
source_file_name = 'lalala.xlsx'
# Target file name
target_file_name = 'lelele.xlsx'
# Sheet of the source table to extract information from
source_sheet_name = 'Sheet2'
# Sheet of the target table to bulk copy information into
target_sheet_name = 'Sheet2'
# Row number of the header row in the source table
source_header_row = 3
# Row number of the header row in the target table
target_header_row = 2
# Column of the source table to match on, named as in the source header row
source_cell_condition = 'name'
# Column of the target table to match on, named as in the target header row
target_cell_condition = 'name'
# Column of the source table to extract information from
source_cell_filled = 'ID number'
# Column of the target table to copy information into
target_cell_filling = 'identity No.'
Load the source table and the target table into memory so that the next steps can work on both.
# When the source table holds a large amount of data, we can load it in openpyxl's read_only mode; the benefit is that the whole table is not loaded into memory
# wb_r = load_workbook(source_file_name)
wb_r = load_workbook(filename=source_file_name, read_only=True)
wb_w = load_workbook(target_file_name)
Using the sheet names and header row numbers already defined, get the sheets and header rows of the source table and the target table:
ws_r = wb_r[source_sheet_name]
ws_w = wb_w[target_sheet_name]
header_row_r = ws_r[source_header_row]
header_row_w = ws_w[target_header_row]
Process the header row of the source table to get the information we want:
"""
When openpyxl loads a workbook in read_only mode, the cells obtained are not
ordinary cells. Testing shows that cell.column is the integer offset of the
column, so here we define a function that converts the integer into the real
Excel column name, such as "A", "BB", etc.
"""
def readOnly_offsetColunmNumber_toRealColumn(number):
    column = ''
    # Base-26 conversion without a zero digit: 1 -> A, 26 -> Z, 27 -> AA
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        column = chr(remainder + ord('A')) + column
    return column
# Initialize two variables: the condition column of the source table and the column to copy from
source_condition_column = ''
source_filled_column = ''
"""
Loop over the header row of the source table to get the position of the
condition column and of the column to be copied. The condition column is
traversed later, up to the maximum row, in the nested loop.
"""
for cell in header_row_r:
    if cell.value == source_cell_condition:
        source_condition_column = readOnly_offsetColunmNumber_toRealColumn(cell.column)
    elif cell.value == source_cell_filled:
        source_filled_column = readOnly_offsetColunmNumber_toRealColumn(cell.column)
Process the header row of the target table to get the information we want:
# Initialize two variables: the condition column of the target table and the column to paste into
target_condition_column = ''
target_filling_column = ''
"""
Loop over the header row of the target table to get the position of the
condition column and of the column to be pasted into. The condition column is
traversed later, up to the maximum row, in the nested loop.
"""
for cell_j in header_row_w:
    # cell.column is an integer in recent openpyxl releases; column_letter gives the letter
    if cell_j.value == target_cell_condition:
        target_condition_column = cell_j.column_letter
    elif cell_j.value == target_cell_filling:
        target_filling_column = cell_j.column_letter
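openpyxl also ships its own helpers for moving between column numbers and column letters, which can stand in for a hand-written conversion. The sketch below assumes a recent openpyxl release where cell.column is an integer; header_columns is a hypothetical helper name, and the header texts are the ones used in this script.

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string, get_column_letter


def header_columns(ws, header_row):
    """Map each non-empty header text in the given row to its column letter."""
    columns = {}
    for cell in ws[header_row]:
        if cell.value is not None:
            columns[cell.value] = get_column_letter(cell.column)
    return columns


wb = Workbook()
ws = wb.active
ws["A2"] = "name"
ws["C2"] = "identity No."

letters = header_columns(ws, 2)

# The two helpers are inverses of each other.
round_trip = column_index_from_string(get_column_letter(54))
```

Collecting every header in one pass means a missing column shows up as a missing dictionary key rather than as an empty string that fails later when a cell address is built.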
Now that we have all the required information, it is time to actually paste the data.
"""
Loop over the condition column of the target table and, inside it, loop over
the condition column of the source table. Whenever a cell in the target
condition column has the same value as a cell in the source condition column,
copy the value from the same row of the source column to be copied into the
same row of the target column to be pasted into.
"""
for cell_m in ws_w[target_condition_column + str(target_header_row + 1):target_condition_column + str(ws_w.max_row)]:
    for cell_n in ws_r[source_condition_column + str(source_header_row + 1):source_condition_column + str(ws_r.max_row)]:
        # Skip empty cells so that two blank names are never treated as a match
        if cell_m[0].value is not None and cell_m[0].value == cell_n[0].value:
            ws_w[target_filling_column + str(cell_m[0].row)].value = ws_r[source_filled_column + str(cell_n[0].row)].value
Finally, save the target workbook.
wb_w.save(target_file_name)
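The matching step in this script compares every target row with every source row, which gets slow with thousands of rows. As an alternative sketch (not the method used in this article), the source columns can be read once into a dictionary and each target row then filled with a single lookup. The helper names header_values, build_lookup, fill_column and write_rows are hypothetical, the sample names and ID values are invented, and the code assumes a recent openpyxl release where iter_rows accepts values_only. It runs on in-memory workbooks so it does not touch lalala.xlsx or lelele.xlsx.

```python
from openpyxl import Workbook


def header_values(ws, header_row):
    """Return the header row as a tuple of plain values."""
    return next(ws.iter_rows(min_row=header_row, max_row=header_row, values_only=True))


def build_lookup(ws, header_row, key_header, value_header):
    """Read the source sheet once and map key column -> value column."""
    headers = header_values(ws, header_row)
    key_idx = headers.index(key_header)
    value_idx = headers.index(value_header)
    lookup = {}
    for row in ws.iter_rows(min_row=header_row + 1, values_only=True):
        if row[key_idx] is not None:
            lookup[row[key_idx]] = row[value_idx]  # a later duplicate overwrites an earlier one
    return lookup


def fill_column(ws, header_row, key_header, fill_header, lookup):
    """Fill the target column for every row whose key is in the lookup."""
    headers = header_values(ws, header_row)
    key_col = headers.index(key_header) + 1    # openpyxl columns are 1-based
    fill_col = headers.index(fill_header) + 1
    filled = 0
    for row_number in range(header_row + 1, ws.max_row + 1):
        key = ws.cell(row=row_number, column=key_col).value
        if key in lookup:
            ws.cell(row=row_number, column=fill_col).value = lookup[key]
            filled += 1
    return filled


def write_rows(ws, first_row, rows):
    """Write sample rows starting at first_row, column A."""
    for r, row in enumerate(rows, start=first_row):
        for c, value in enumerate(row, start=1):
            ws.cell(row=r, column=c, value=value)


# Source sheet with its header on row 3, as configured in this script.
source_wb = Workbook()
source_ws = source_wb.active
write_rows(source_ws, 3, [("name", "ID number"), ("Alice", "ID-001"), ("Bob", "ID-002")])

# Target sheet with its header on row 2 and the ID column still empty.
target_wb = Workbook()
target_ws = target_wb.active
write_rows(target_ws, 2, [("name", "identity No."), ("Bob", None), ("Carol", None), ("Alice", None)])

lookup = build_lookup(source_ws, 3, "name", "ID number")
filled = fill_column(target_ws, 2, "name", "identity No.", lookup)
result = [target_ws.cell(row=r, column=2).value for r in range(3, 6)]
```

Like the nested loops, this keeps the last matching source row when a name appears more than once; names with no match, such as Carol here, are left blank.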
This article is reproduced from https://www.py.cn/toutiao/11131.html