pandas: common uses for data selection

When working with pandas, you often need to compute statistics on a specific row, a specific column, or the subset of the data that meets some condition.
This post summarizes the common ways to select data in pandas, including loc, iloc, and a few other methods.
First, read the data:

import pandas as pd

df = pd.read_excel('zpxx.xlsx')

1. Obtain elements, indexes, and column names

You can use the DataFrame attributes values, index, and columns to get the elements, the row index, and the column names, respectively.

print('Elements:\n', df.values)  # returns a 2-D NumPy array

print('Index:\n', df.index)  # returns the row index; use list() to convert it to a list

print('Column names:\n', df.columns)  # returns the column names; use list() to convert them to a list
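
Since zpxx.xlsx is not available here, the following is a minimal sketch on a small made-up DataFrame (the column names "keyword" and "city" are assumptions for illustration). It shows what type each of the three attributes returns:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for zpxx.xlsx; the columns are invented for illustration.
df = pd.DataFrame({
    "keyword": ["python", "java", "sql"],
    "city": ["Beijing", "Shanghai", "Shenzhen"],
})

values = df.values              # 2-D NumPy array, not a nested Python list
index_list = list(df.index)     # default RangeIndex converted to a plain list
column_list = list(df.columns)  # column labels converted to a plain list

print(type(values).__name__, values.shape)
print(index_list)
print(column_list)
```

If a nested Python list is really what you need, values.tolist() converts the array.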

2. Row selection

(1) head() and tail() methods

The head() and tail() methods of DataFrame return a run of consecutive rows from the start or the end of the data. By default they return the first or last 5 rows; pass a number to choose how many rows to return.
Default behavior:

print('First 5 rows (default):\n', df.head())
print('Last 5 rows (default):\n', df.tail())

Specify the number of rows to view:

print('First 3 rows:\n', df.head(3))

View the first few rows of a single column:

print('First 3 rows of the 关键词 (keyword) column:\n', df['关键词'].head(3))
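
The calls above can be tried on a small made-up DataFrame (the columns "keyword" and "salary" are assumptions, not fields of the original file). The sketch also shows that a negative argument to head() drops rows from the end instead:

```python
import pandas as pd

# Hypothetical 8-row frame so that the default of 5 rows is visible.
df = pd.DataFrame({"keyword": list("abcdefgh"), "salary": list(range(8))})

first_default = df.head()          # first 5 rows
last_two = df.tail(2)              # last 2 rows
all_but_last_three = df.head(-3)   # negative n: every row except the last 3
col_head = df["keyword"].head(3)   # head() works the same way on a single column

print(first_default)
print(last_two)
print(all_but_last_three)
print(col_head)
```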

(2) Slicing method

Format: df[m:n], where m and n are row positions. The range is left-closed and right-open, so row m is included and row n is not.

print('Rows 2 through 6:\n', df[1:6])
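
A minimal sketch of the half-open behavior, using a made-up single-column DataFrame (the column name is an assumption). It also shows that df[m:n] picks the same rows as the explicit positional form df.iloc[m:n], and that a slice past the end simply returns no rows:

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..7.
df = pd.DataFrame({"keyword": list("abcdefgh")})

sliced = df[1:6]             # positions 1, 2, 3, 4, 5; position 6 is excluded
same_rows = df.iloc[1:6]     # explicit positional equivalent
out_of_range = df[10:12]     # no error, just an empty DataFrame

print(sliced)
print(len(out_of_range))
```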

3. Column selection
(1) Dictionary-style access by key
Select one column: df['column name']
Select multiple columns: df[['column name 1', 'column name 2', 'column name 3']]

Select a column:

print('Select the 采集时间 (collection time) column:\n', df['采集时间'])

Select multiple columns:

print('Select multiple columns:\n', df[['关键词', '采集时间']])
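
One detail worth seeing in code is the return type: a single key gives a Series, while a list of keys gives a DataFrame, even when the list has only one name. A minimal sketch on made-up data (all column names here are assumptions):

```python
import pandas as pd

# Hypothetical frame; the columns are invented for illustration.
df = pd.DataFrame({
    "keyword": ["python", "java"],
    "collected_at": ["2022-04-01", "2022-04-02"],
    "city": ["Beijing", "Shanghai"],
})

one_col = df["collected_at"]                 # Series
one_col_frame = df[["collected_at"]]         # single-column DataFrame
two_cols = df[["keyword", "collected_at"]]   # DataFrame in the requested column order

print(type(one_col).__name__, type(one_col_frame).__name__)
print(two_cols)
```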

(2) Attribute access
Usage: df.column_name
This is best avoided, because a column name can easily be confused with, or shadowed by, a built-in DataFrame attribute or method name.

print('Select the 采集时间 (collection time) column:\n', df.采集时间)
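
To make the risk concrete, here is a small sketch with made-up columns (the names "count", "job title", and "city" are assumptions). A column called count collides with the DataFrame.count method, and a name containing a space cannot be written as an attribute at all:

```python
import pandas as pd

# Hypothetical frame: "count" clashes with a DataFrame method,
# and "job title" is not a valid Python attribute name.
df = pd.DataFrame({
    "count": [3, 5],
    "job title": ["analyst", "engineer"],
    "city": ["Beijing", "Shanghai"],
})

attr_ok = df.city            # works: no attribute named "city" exists on DataFrame
attr_clash = df.count        # the bound method DataFrame.count, not the column
by_key = df["count"]         # bracket access always returns the column
has_space = df["job title"]  # only bracket access can reach this column

print(callable(attr_clash))
print(by_key.tolist())
```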

4. Row and column selection with loc and iloc
(1) loc usage
Syntax: df.loc[row label or condition, column label]
loc selects by index label, so you must pass labels (or a boolean condition) rather than positions. The row indexer is required; to select every row, pass ":" in its place.
The first form passes both a row indexer and a column indexer:

print('Select the whole 采集时间 (collection time) column:\n', df.loc[:, '采集时间'])  # loc usage

print('Select 采集时间 (collection time) for the first 5 rows:\n', df.loc[:4, '采集时间'])  # loc usage

Note: When the row indexer is a slice, both endpoints are included. The ":4" above selects row labels 0 through 4, inclusive.

print('Select 采集时间 (collection time) for row 3:\n', df.loc[2, '采集时间'])  # loc usage
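
The closed-interval rule is easier to see when the row labels are not integers. The following sketch uses made-up data with hypothetical string labels "r1" to "r4" (labels and column names are assumptions):

```python
import pandas as pd

# Hypothetical frame with string row labels instead of the default 0..n-1.
df = pd.DataFrame(
    {
        "keyword": ["python", "java", "sql", "go"],
        "city": ["Beijing", "Shanghai", "Shenzhen", "Hangzhou"],
    },
    index=["r1", "r2", "r3", "r4"],
)

closed = df.loc["r1":"r3", "keyword"]  # label slice: r1, r2 and r3 are all included
single = df.loc["r3", "keyword"]       # one row label + one column label: a scalar
all_rows = df.loc[:, "city"]           # ":" selects every row

print(closed)
print(single)
```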

The second form passes only row labels:
Note: As before, a slice of row labels includes both endpoints.

print('Select the first row:\n', df.loc[0])

print('Select rows 1 and 4:\n', df.loc[[0, 3]])  # labels 0 and 3 are the 1st and 4th rows

print('Select the first 3 rows:\n', df.loc[0:2])

The third form passes a condition:

print('Rows where 学历 (education) is 本科 (bachelor):\n', df.loc[df['学历'] == '本科', ['学历', '所在地']])
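
Conditions can also be combined. The sketch below, on made-up data (the columns "education", "city", and "salary" are assumptions), uses & for "and", | for "or", and isin() for membership in a set of values. Each comparison needs its own parentheses because & and | bind more tightly than == and >:

```python
import pandas as pd

# Hypothetical recruitment-style data, invented for illustration.
df = pd.DataFrame({
    "education": ["bachelor", "master", "bachelor", "college"],
    "city": ["Beijing", "Beijing", "Shanghai", "Beijing"],
    "salary": [12, 20, 15, 8],
})

# Both conditions must hold; keep only two columns.
both = df.loc[
    (df["education"] == "bachelor") & (df["city"] == "Beijing"),
    ["education", "city"],
]

# Either condition may hold; keep all columns.
either = df.loc[(df["salary"] > 14) | (df["city"] == "Shanghai")]

# Value is one of several allowed values; keep a single column.
in_set = df.loc[df["education"].isin(["bachelor", "master"]), "salary"]

print(both)
print(either)
print(in_set)
```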

(2) iloc usage
Syntax: df.iloc[row position, column position]
Unlike loc, iloc selects data by position, so its indexers are integer positions rather than labels, for example df.iloc[1], df.iloc[1, 2], df.iloc[:4, 3], and df.iloc[1, [1, 2, 5]].

print('First 4 rows of the 关键词 (keyword) column:\n', df.iloc[:4, 0])  # iloc usage

Note: ":4" here means row positions [0, 4): counting starts at 0 and the range is left-closed and right-open. The "0" is the column position, since 关键词 (keyword) is the first column.
Overall, loc is more flexible to use and makes the code more readable.
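
With the default index 0, 1, 2, ... labels and positions coincide, which hides the difference between the two indexers. A minimal sketch on made-up data with hypothetical row labels 10, 20, 30 separates them:

```python
import pandas as pd

# Hypothetical frame whose integer labels differ from the row positions.
df = pd.DataFrame(
    {
        "keyword": ["python", "java", "sql"],
        "city": ["Beijing", "Shanghai", "Shenzhen"],
    },
    index=[10, 20, 30],
)

by_position = df.iloc[0, 0]        # first row, first column
by_label = df.loc[10, "keyword"]   # row labeled 10, column labeled "keyword"
first_two = df.iloc[:2, 0]         # positions 0 and 1; position 2 is excluded
picked = df.iloc[1, [0, 1]]        # second row, first and second columns

# There is no row labeled 0, so loc raises KeyError where iloc succeeds.
try:
    df.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(by_position, by_label, label_zero_exists)
```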

5. ix data selection

The ix indexer accepted both index labels and index positions.
Syntax: df.ix[row label or position or condition, column label or position]
Note: When labels and positions overlap, ix tried the label first by default.
ix was deprecated in pandas 0.20.0 and removed in pandas 1.0.0; use loc and iloc instead.
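
Code that relied on ix to mix labels and positions can usually be rewritten by converting one side to match the other. This is a sketch of one possible approach, not the only one, on made-up data (the row labels and column names are assumptions):

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {
        "keyword": ["python", "java", "sql"],
        "city": ["Beijing", "Shanghai", "Shenzhen"],
    },
    index=["r1", "r2", "r3"],
)

# Row by label, column by position: turn the column position into a label.
a = df.loc["r2", df.columns[1]]

# Row by position, column by label: turn the column label into a position.
b = df.iloc[1, df.columns.get_loc("city")]

# Alternatively, turn the row position into a label and stay with loc.
c = df.loc[df.index[1], "city"]

print(a, b, c)
```

All three expressions read the same cell, so the choice is mostly a matter of which form keeps the surrounding code clearer.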
The above are common uses of pandas data selection.


Origin blog.csdn.net/LHJCSDNYL/article/details/124391206