Whether you are a data analyst, financial specialist, or researcher, Excel is likely one of the essential tools in your daily work. Python can make Excel data processing considerably faster and more flexible. Let's walk through some commonly used Excel operations that can make your workflow more convenient and efficient.
1. Install the required libraries
Use `pip` on the command line to install the `pandas` and `openpyxl` libraries. `pandas` handles data processing, while `openpyxl` reads and writes `.xlsx` files (pandas also uses it as its default engine for `.xlsx` files).
pip install pandas openpyxl
2. Import libraries
Import `pandas` and the `openpyxl` classes used below in your Python script.
import pandas as pd
from openpyxl import Workbook, load_workbook
3. Read Excel file
Use the `pd.read_excel()` function to read data from an Excel file. It returns a DataFrame containing the data from the file (by default, from the first worksheet).
data = pd.read_excel('filename.xlsx')
Replace `filename.xlsx` with the path to your Excel file.
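As a minimal sketch of the `sheet_name` parameter, the example below first writes a small workbook to a temporary folder and then reads it back. The file name `sample.xlsx`, the sheet `Sales`, and its columns are hypothetical. `sheet_name` accepts a sheet name or a 0-based position, and `sheet_name=None` returns every sheet as a dictionary.

```python
import os
import tempfile

import pandas as pd

# Build a small sample workbook so the example is self-contained.
# "sample.xlsx", "Sales", "Region" and "Units" are hypothetical names.
path = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
pd.DataFrame({"Region": ["North", "South"], "Units": [12, 7]}).to_excel(
    path, sheet_name="Sales", index=False
)

# Select a worksheet by name (a 0-based position such as 0 also works).
data = pd.read_excel(path, sheet_name="Sales")
print(data)

# sheet_name=None loads every sheet into a dict of {sheet name: DataFrame}.
all_sheets = pd.read_excel(path, sheet_name=None)
print(list(all_sheets))  # ['Sales']
```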
4. Write Excel file
Use the DataFrame's `to_excel()` method to write its data to the specified Excel file.
data.to_excel('new_filename.xlsx', index=False)
`index=False` prevents the DataFrame's row index from being written as an extra first column.
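To put several DataFrames into one file, each on its own sheet, a `pd.ExcelWriter` can be passed to `to_excel()` in place of a file name. A sketch, assuming hypothetical `summary` and `details` DataFrames and a temporary output file `report.xlsx`:

```python
import os
import tempfile

import pandas as pd

# Hypothetical DataFrames standing in for real results.
summary = pd.DataFrame({"Metric": ["Total"], "Value": [19]})
details = pd.DataFrame({"Region": ["North", "South"], "Units": [12, 7]})

path = os.path.join(tempfile.mkdtemp(), "report.xlsx")

# One writer, one file, one sheet per to_excel() call.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    summary.to_excel(writer, sheet_name="Summary", index=False)
    details.to_excel(writer, sheet_name="Details", index=False)

print(pd.ExcelFile(path).sheet_names)  # ['Summary', 'Details']
```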
5. Create a worksheet
Use the workbook's `create_sheet()` method to add a new worksheet.
workbook = Workbook()
worksheet = workbook.create_sheet('Sheet1')
In this example, we create a new sheet named 'Sheet1'. Note that a new `Workbook()` already contains one default sheet named 'Sheet'.
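The following sketch shows how sheets are ordered and how the default sheet can be removed before saving. The sheet names `Sheet1` and `Report` and the output file `sheets.xlsx` are hypothetical. The optional `index` argument of `create_sheet()` controls where the new sheet is inserted.

```python
import os
import tempfile

from openpyxl import Workbook

workbook = Workbook()
print(workbook.sheetnames)  # ['Sheet'] (created automatically)

workbook.create_sheet("Sheet1")      # appended at the end
workbook.create_sheet("Report", 0)   # index=0 inserts it at the front
print(workbook.sheetnames)  # ['Report', 'Sheet', 'Sheet1']

# Remove the unused default sheet.
workbook.remove(workbook["Sheet"])
print(workbook.sheetnames)  # ['Report', 'Sheet1']

path = os.path.join(tempfile.mkdtemp(), "sheets.xlsx")
workbook.save(path)
```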
6. Access worksheets
Use the `active` attribute, or index the workbook with a sheet name, to access an existing sheet.
worksheet = workbook.active
# or
worksheet = workbook['Sheet1']
The `active` attribute returns the currently active sheet, while `workbook['Sheet1']` returns the sheet with the specified name. The older `get_sheet_by_name()` method is deprecated in openpyxl.
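Indexing the workbook by name (`workbook['Sheet1']`) is openpyxl's recommended replacement for `get_sheet_by_name()`, and it raises `KeyError` when the name does not exist. A minimal sketch, with hypothetical sheet names, that checks `sheetnames` first and then loops over every sheet:

```python
from openpyxl import Workbook

workbook = Workbook()
workbook.create_sheet("Sheet1")

# Fetch a sheet by name.
worksheet = workbook["Sheet1"]

# Check membership first to avoid a KeyError for a sheet that may not exist.
missing = workbook["Missing"] if "Missing" in workbook.sheetnames else None

# In a new workbook the active sheet is the first one.
print(workbook.active.title)  # Sheet

# Visit every sheet in order.
for ws in workbook.worksheets:
    print(ws.title)
```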
7. Read cell data
Use the `cell()` method with a row number and a column number to get the value of a specific cell. Both numbers start at 1.
cell_value = worksheet.cell(row=1, column=1).value
In this example, we read the cell data in the first row and first column.
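The sketch below builds a small file (the name `cells.xlsx` and its names and scores are invented), reopens it with `load_workbook()`, and reads values three ways: by row and column number, by A1-style address, and in bulk with `iter_rows(values_only=True)`.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Create a small file to read back.
path = os.path.join(tempfile.mkdtemp(), "cells.xlsx")
wb = Workbook()
ws = wb.active
ws.append(["Name", "Score"])  # append() writes after the last used row
ws.append(["Ana", 90])
ws.append(["Ben", 85])
wb.save(path)

# Open the existing file; row and column numbers start at 1.
worksheet = load_workbook(path).active
print(worksheet.cell(row=2, column=1).value)  # Ana
print(worksheet["B3"].value)  # 85

# values_only=True yields plain tuples instead of Cell objects.
rows = list(worksheet.iter_rows(min_row=2, values_only=True))
print(rows)  # [('Ana', 90), ('Ben', 85)]
```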
8. Write cell data
The `cell()` method can also write a value to a specific cell; again, provide the row and column numbers.
worksheet.cell(row=1, column=1, value='Hello')
In this example, the string 'Hello' is written to the cell in the first row and first column (A1). The change stays in memory until the workbook is saved with `workbook.save()`.
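A minimal sketch, assuming a hypothetical output file `hello.xlsx` in a temporary folder, that writes two cells, saves the workbook, and reloads it to confirm the values were stored:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "hello.xlsx")

workbook = Workbook()
worksheet = workbook.active
worksheet.cell(row=1, column=1, value="Hello")
worksheet["B1"] = 42  # equivalent A1-style assignment
workbook.save(path)   # nothing is written to disk before this call

reloaded = load_workbook(path).active
print(reloaded["A1"].value, reloaded["B1"].value)  # Hello 42
```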
9. Get the number of rows and columns
Use the DataFrame's `shape` attribute to get the number of rows and columns in the data.
num_rows = data.shape[0]
num_cols = data.shape[1]
The `shape` attribute returns a tuple containing the number of rows and columns.
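A short sketch, using invented data, that unpacks `shape` directly and shows the rough openpyxl equivalent, `max_row` and `max_column`. openpyxl counts a header row as a row, whereas pandas turns it into column names, so the two counts can differ by one for the same sheet.

```python
import pandas as pd
from openpyxl import Workbook

# Hypothetical data standing in for the result of pd.read_excel().
data = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
num_rows, num_cols = data.shape  # unpack the (rows, columns) tuple
print(num_rows, num_cols)  # 3 2

# openpyxl reports the extent of the used cell range, header included.
worksheet = Workbook().active
worksheet.append(["A", "B"])
worksheet.append([1, 4])
print(worksheet.max_row, worksheet.max_column)  # 2 2
```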
10. Filter data
Filter rows with a Boolean condition, for example, keeping only the rows where a column's value exceeds a threshold.
filtered_data = data[data['Column'] > 10]
In this example, we keep the rows whose 'Column' value is greater than 10.
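Several conditions can be combined in one filter. In this sketch the `Region` column and its values are hypothetical. Boolean masks use `&` and `|` rather than `and`/`or`, and each condition needs its own parentheses; `query()` expresses the same filter as a string.

```python
import pandas as pd

data = pd.DataFrame(
    {"Column": [5, 12, 20, 8], "Region": ["North", "South", "North", "South"]}
)

# Both conditions must hold.
north_large = data[(data["Column"] > 10) & (data["Region"] == "North")]
print(north_large)

# isin() keeps rows whose value appears in a list.
south = data[data["Region"].isin(["South"])]

# The same filter as north_large, written with query().
same = data.query("Column > 10 and Region == 'North'")
```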
11. Sort data
Use the `sort_values()` method to sort the data by one or more columns.
sorted_data = data.sort_values(by='Column')
In this example, we sort the data in ascending order by the column 'Column'.
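A sketch, with invented data, of descending order and multi-column sorting. Passing a list to `ascending` sets the direction for each sort key separately.

```python
import pandas as pd

data = pd.DataFrame(
    {"Region": ["South", "North", "South", "North"], "Column": [7, 12, 3, 5]}
)

# Largest values first.
desc = data.sort_values(by="Column", ascending=False)
print(desc["Column"].tolist())  # [12, 7, 5, 3]

# Region A to Z, then Column from high to low within each region.
multi = data.sort_values(by=["Region", "Column"], ascending=[True, False]).reset_index(drop=True)
print(multi)
```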
12. Add a new row
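Use the `pd.concat()` function to add new rows to a DataFrame. The older `DataFrame.append()` method was deprecated in pandas 1.4 and removed in pandas 2.0.
new_data = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
data = pd.concat([data, new_data], ignore_index=True)
In this example, we add a new row with values for columns 'A', 'B' and 'C'; `ignore_index=True` renumbers the index of the combined result.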
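For a single row, label-based assignment with `loc` is another option, assuming the DataFrame keeps the default 0..n-1 integer index (so `len(data)` is an unused label). For many rows, building a DataFrame and concatenating once is usually simpler. A sketch with invented values:

```python
import pandas as pd

data = pd.DataFrame({"A": [10], "B": [20], "C": [30]})

# With the default integer index, len(data) is the next free label,
# so this appends one row in place.
data.loc[len(data)] = [1, 2, 3]

# Several rows at once (DataFrame.append no longer exists in pandas 2.x).
more = pd.DataFrame({"A": [4, 7], "B": [5, 8], "C": [6, 9]})
data = pd.concat([data, more], ignore_index=True)
print(data)
```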
13. Delete row or column
Use the `drop()` method to remove specific rows or columns.
data = data.drop(index=0)  # drop the row labeled 0 (the first row under the default index)
data = data.drop(columns=['Column1', 'Column2'])  # drop the specified columns
In this example, we delete the row with index label 0 and the columns named 'Column1' and 'Column2'.
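`drop(index=...)` removes rows by label, not by position, so `index=0` means "the first row" only when the DataFrame has the default integer index. This sketch uses a hypothetical non-default index to show the difference, and `errors="ignore"` to skip names that do not exist:

```python
import pandas as pd

# Non-default index: label 0 is the second row here.
data = pd.DataFrame(
    {"Column1": [1, 2, 3], "Column2": [4, 5, 6], "Keep": [7, 8, 9]},
    index=[10, 0, 5],
)

# Drop by position: look up the label of the first row, then drop it.
first_dropped = data.drop(index=data.index[0])
print(first_dropped.index.tolist())  # [0, 5]

# Drop by label 0, which is not the first row in this DataFrame.
label_dropped = data.drop(index=0)
print(label_dropped.index.tolist())  # [10, 5]

# Skip missing column names instead of raising KeyError.
trimmed = data.drop(columns=["Column1", "Missing"], errors="ignore")
print(trimmed.columns.tolist())  # ['Column2', 'Keep']
```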
14. Compute summary statistics
Use the `describe()` method to compute basic statistics for the numeric columns: count, mean, standard deviation, minimum, quartiles, and maximum.
summary_stats = data.describe()
In this example, `summary_stats` is a DataFrame holding these statistics for each numeric column.
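When a DataFrame has numeric columns, `describe()` summarizes only those by default. A sketch with invented data showing `include="all"` for text columns and `agg()` for choosing specific statistics:

```python
import pandas as pd

data = pd.DataFrame({"Region": ["North", "South", "North"], "Units": [12, 7, 5]})

# Numeric columns only: count, mean, std, min, 25%, 50%, 75%, max.
numeric_stats = data.describe()
print(numeric_stats)

# Text columns get count, unique, top and freq.
all_stats = data.describe(include="all")

# Pick exactly the statistics you need for one column.
units_stats = data["Units"].agg(["sum", "mean", "max"])
print(units_stats)
```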
These are common operations for working with Excel in Python. Depending on your needs, you can combine one or more of them to process and manipulate Excel files. I hope this helps!
Summary
From reading and writing Excel files, creating and accessing worksheets, and reading and writing cell data, to filtering, sorting, and computing summary statistics, these operations cover the key steps of a typical data processing workflow. Processing Excel with Python not only saves time but also makes data processing more flexible and customizable.
Keep in mind that this is just the tip of the iceberg. Python offers other libraries for working with Excel, such as xlrd (reads legacy .xls files), xlwt (writes .xls files), and xlsxwriter (writes .xlsx files). Applying these skills and tools flexibly to your actual needs can greatly improve both the efficiency and the quality of your data processing.