Quick introduction to pandas for Python big data (2)

3. Row and column labels and position numbers of a DataFrame

3.1 DataFrame row labels and column labels

1) A DataFrame has row labels (the index, shown down the left side when it is displayed) and column labels (the header across the top).

2) Get the row labels of the DataFrame

# Get the row labels of the DataFrame
china.index

3) Get the column labels of the DataFrame

# Get the column labels of the DataFrame
china.columns

4) Set the row labels of the DataFrame

# Note: set_index does not modify the original DataFrame; it returns a new DataFrame
china_df = china.set_index('year')
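A minimal sketch of the note above: set_index returns a new DataFrame and leaves the original untouched. The three-row china frame below is a hypothetical stand-in with made-up values, not the real gapminder data.

```python
import pandas as pd

# Hypothetical stand-in for the `china` DataFrame (values are illustrative only).
china = pd.DataFrame({
    "country": ["China", "China", "China"],
    "year": [1952, 1957, 1962],
    "pop": [556, 637, 666],
})

china_df = china.set_index("year")  # returns a new DataFrame

# The original still has its default labels and still owns the 'year' column.
original_labels = list(china.index)
original_columns = list(china.columns)

# The new frame uses the years as row labels; 'year' is no longer a column.
new_labels = list(china_df.index)
new_columns = list(china_df.columns)

print(original_labels, original_columns)
print(new_labels, new_columns)
```

set_index also accepts inplace=True to change the DataFrame itself, but assigning the returned frame to a new name keeps the original available.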

3.2 Row position number and column position number of DataFrame

In addition to row and column labels, a DataFrame also has row and column position numbers.

Row position numbers: from top to bottom, the first row is numbered 0, the second row is numbered 1, ..., and the nth row is numbered n-1.

Column position numbers: from left to right, the first column is numbered 0, the second column is numbered 1, ..., and the nth column is numbered n-1.

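Note: If no row labels are set when a DataFrame is created, pandas assigns the labels 0, 1, 2, ..., so the row labels and the row position numbers coincide. Once the index is changed (for example with set_index), they are generally different.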
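The difference between labels and positions can be sketched as follows. The frame and its values are made up for illustration; Index.get_loc is used here to look up the position that belongs to a label.

```python
import pandas as pd

# Illustrative frame; the values are made up for the sketch.
df = pd.DataFrame({"year": [1952, 1957, 1962], "pop": [556, 637, 666]})

# With the default index, labels and positions are both 0, 1, 2.
default_labels = list(df.index)

# After set_index the labels are years, but the positions are still 0, 1, 2.
by_year = df.set_index("year")
position_of_1957 = by_year.index.get_loc(1957)    # row label -> row position
position_of_pop = by_year.columns.get_loc("pop")  # column label -> column position

print(default_labels, position_of_1957, position_of_pop)
```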

4. Getting data from specified rows and columns of a DataFrame

The following examples all operate on the gapminder.tsv data set loaded earlier. Note that they use china_df, in which the year column has been set as the row label.

4.1 Using loc to get data by row and column labels

Basic format:

| Syntax | Description |
| --- | --- |
| df.loc[[row_label1, ...], [col_label1, ...]] | Get the specified columns of the specified rows by row and column labels. Result: DataFrame |
| df.loc[[row_label1, ...]] | Get all columns of the specified rows by row label. Result: DataFrame |
| df.loc[:, [col_label1, ...]] | Get the specified columns of all rows by column label. Result: DataFrame |
| df.loc[row_label] | 1) If the label matches one row, the result is a Series. 2) If the label matches several rows, the result is a DataFrame |
| df.loc[[row_label]] | The result is a DataFrame, whether it has one row or several |
| df.loc[[row_label], col_label] | 1) If the result has one column, it is a Series with the row labels as its index labels. 2) If the result has several columns, it is a DataFrame |
| df.loc[row_label, [col_label]] | 1) If the result has one row, it is a Series with the column labels as its index labels. 2) If the result has several rows, it is a DataFrame |
| df.loc[row_label, col_label] | 1) One row and one column: a single value. 2) Several rows and one column: a Series with the row labels as its index labels. 3) One row and several columns: a Series with the column labels as its index labels. 4) Several rows and several columns: a DataFrame |
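The result types in the table can be checked with a short sketch. The china_df built here is a hypothetical stand-in with made-up values, and the duplicated 1957 row is added only to show the case where one label matches several rows.

```python
import pandas as pd

# Hypothetical stand-in for china_df (illustrative values, 'year' as the index).
china_df = pd.DataFrame(
    {
        "country": ["China", "China", "China"],
        "lifeExp": [44.0, 50.5, 44.5],
        "pop": [556, 637, 666],
    },
    index=pd.Index([1952, 1957, 1962], name="year"),
)

row = china_df.loc[1957]                        # scalar label -> Series
row_frame = china_df.loc[[1957]]                # list of labels -> DataFrame
by_row_label = china_df.loc[[1957], "lifeExp"]  # Series indexed by row label
by_col_label = china_df.loc[1957, ["lifeExp"]]  # Series indexed by column label
value = china_df.loc[1957, "lifeExp"]           # single value

# With a duplicated row label, a scalar label matches several rows.
dup = pd.concat([china_df, china_df.loc[[1957]]])
dup_rows = dup.loc[1957]                        # DataFrame with two rows

for name, obj in [("row", row), ("row_frame", row_frame), ("dup_rows", dup_rows)]:
    print(name, type(obj).__name__)
```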

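Examples:

Example 1: Get the country, pop, and gdpPercap columns of the rows labeled 1952, 1962, and 1972
Example 2: Get all columns of the rows labeled 1952, 1962, and 1972
Example 3: Get the country, pop, and gdpPercap columns of all rows
Example 4: Get all columns of the row labeled 1957
Example 5: Get the lifeExp column of the row labeled 1957

Implementation: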

1) Example 1: Get the country, pop, and gdpPercap columns of the rows labeled 1952, 1962, and 1972

# Example 1: rows labeled 1952, 1962, 1972; columns country, pop, gdpPercap
china_df.loc[[1952, 1962, 1972], ['country', 'pop', 'gdpPercap']]

2) Example 2: Get all columns of the rows labeled 1952, 1962, and 1972

# Example 2: rows labeled 1952, 1962, 1972; all columns
china_df.loc[[1952, 1962, 1972]]

3) Example 3: Get the country, pop, and gdpPercap columns of all rows

# Example 3: all rows; columns country, pop, gdpPercap
china_df.loc[:, ['country', 'pop', 'gdpPercap']]

4) Example 4: Get all columns of the row labeled 1957

# Example 4: a scalar row label returns the row as a Series
china_df.loc[1957]

# Example 4: a list with one row label returns a one-row DataFrame
china_df.loc[[1957]]

5) Example 5: Get the lifeExp column of the row labeled 1957

# Example 5: returns a Series indexed by the row label
china_df.loc[[1957], 'lifeExp']
# or: returns a Series indexed by the column label
china_df.loc[1957, ['lifeExp']]
# or: returns a single value
china_df.loc[1957, 'lifeExp']

4.2 Using iloc to get data by row and column positions

Basic format:

| Syntax | Description |
| --- | --- |
| df.iloc[[row_pos1, ...], [col_pos1, ...]] | Get the specified columns of the specified rows by row and column positions. Result: DataFrame |
| df.iloc[[row_pos1, ...]] | Get all columns of the specified rows by row position. Result: DataFrame |
| df.iloc[:, [col_pos1, ...]] | Get the specified columns of all rows by column position. Result: DataFrame |
| df.iloc[row_pos] | One row. Result: Series |
| df.iloc[[row_pos]] | One row. Result: DataFrame |
| df.iloc[[row_pos], col_pos] | One row and one column. Result: Series, with the row label as its index label |
| df.iloc[row_pos, [col_pos]] | One row and one column. Result: Series, with the column label as its index label |
| df.iloc[row_pos, col_pos] | One row and one column. Result: a single value |
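The same checks can be sketched for iloc. Again china_df is a hypothetical stand-in with made-up values; its columns are ordered so that position 2 is lifeExp, as in the examples of this section. Selections made by position still carry the row and column labels in their result.

```python
import pandas as pd

# Hypothetical stand-in for china_df (illustrative values, 'year' as the index).
china_df = pd.DataFrame(
    {
        "country": ["China", "China", "China"],
        "continent": ["Asia", "Asia", "Asia"],
        "lifeExp": [44.0, 50.5, 44.5],
        "pop": [556, 637, 666],
    },
    index=pd.Index([1952, 1957, 1962], name="year"),
)

row = china_df.iloc[1]                # scalar position -> Series
row_frame = china_df.iloc[[1]]        # list of positions -> DataFrame
by_row_label = china_df.iloc[[1], 2]  # Series indexed by the row label (1957)
by_col_label = china_df.iloc[1, [2]]  # Series indexed by the column label (lifeExp)
value = china_df.iloc[1, 2]           # single value

print(type(row).__name__, type(row_frame).__name__)
print(list(by_row_label.index), list(by_col_label.index), value)
```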

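Examples:

Example 1: Get the columns at positions 0, 1, and 2 of the rows at positions 0, 2, and 4
Example 2: Get all columns of the rows at positions 0, 2, and 4
Example 3: Get the columns at positions 0, 1, and 2 of all rows
Example 4: Get all columns of the row at position 1
Example 5: Get the column at position 2 of the row at position 1

Implementation: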

1) Example 1: Get the columns at positions 0, 1, and 2 of the rows at positions 0, 2, and 4

# Example 1: rows at positions 0, 2, 4; columns at positions 0, 1, 2
china_df.iloc[[0, 2, 4], [0, 1, 2]]

2) Example 2: Get all columns of the rows at positions 0, 2, and 4

# Example 2: rows at positions 0, 2, 4; all columns
china_df.iloc[[0, 2, 4]]

3) Example 3: Get the columns at positions 0, 1, and 2 of all rows

# Example 3: all rows; columns at positions 0, 1, 2
china_df.iloc[:, [0, 1, 2]]

4) Example 4: Get all columns of the row at position 1

# Example 4: a scalar row position returns the row as a Series
china_df.iloc[1]

# Example 4: a list with one row position returns a one-row DataFrame
china_df.iloc[[1]]

5) Example 5: Get the column at position 2 of the row at position 1

# Example 5: returns a Series indexed by the row label
china_df.iloc[[1], 2]
# or: returns a Series indexed by the column label
china_df.iloc[1, [2]]
# or: returns a single value
china_df.iloc[1, 2]

4.3 Slicing with loc and iloc

Basic format:

| Syntax | Description |
| --- | --- |
| df.loc[start_row_label:end_row_label, start_col_label:end_col_label] | Get the rows and columns in the given label ranges. Both the start labels and the end labels are included. |
| df.iloc[start_row_pos:end_row_pos, start_col_pos:end_col_pos] | Get the rows and columns in the given position ranges. The start positions are included; the end positions are not. |
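The inclusive and exclusive end points can be compared directly. This is a sketch on a hypothetical four-row stand-in for china_df with made-up values; both slices are meant to select the first three rows and the first three columns.

```python
import pandas as pd

# Hypothetical stand-in for china_df (illustrative values, 'year' as the index).
china_df = pd.DataFrame(
    {
        "country": ["China"] * 4,
        "continent": ["Asia"] * 4,
        "lifeExp": [44.0, 50.5, 44.5, 58.4],
        "pop": [556, 637, 666, 755],
    },
    index=pd.Index([1952, 1957, 1962, 1967], name="year"),
)

# Label slice: the end labels 1962 and 'lifeExp' are included.
by_label = china_df.loc[1952:1962, "country":"lifeExp"]

# Position slice: the end positions 3 and 3 are excluded.
by_position = china_df.iloc[0:3, 0:3]

same_selection = by_label.equals(by_position)
print(by_label.shape, by_position.shape, same_selection)
```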

Example:

Example 1: Get the first three columns of the first three rows of china_df, once with loc and once with iloc

Implementation:

1) Example 1: Get the first three columns of the first three rows of china_df, once with loc and once with iloc

# Example 1: with loc, the end labels 1962 and 'lifeExp' are included
china_df.loc[1952:1962, 'country':'lifeExp']
# or: with iloc, the end positions are excluded
china_df.iloc[0:3, 0:3]

4.4 Using [] to get data from specified rows and columns

Basic format:

| Syntax | Description |
| --- | --- |
| df[['col_label1', 'col_label2', ...]] | Get the specified columns of all rows by column label. Result: DataFrame |
| df['col_label'] | Get the specified column of all rows by column label. 1) If the result has one column, it is a Series with the row labels as its index labels. 2) If the result has several columns, it is a DataFrame |
| df[['col_label']] | Get the specified column of all rows by column label. Result: DataFrame |
| df[start_row_pos:end_row_pos] | Get all columns of the rows in the given position range; the end position is not included. A step can be added: df[start:end:step] |
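A short sketch of the [] forms above, on a hypothetical six-row stand-in for china_df with made-up values. A string or list of strings selects columns, while a slice of integers selects rows by position.

```python
import pandas as pd

# Hypothetical stand-in for china_df (illustrative values, 'year' as the index).
china_df = pd.DataFrame(
    {
        "country": ["China"] * 6,
        "pop": [556, 637, 666, 755, 862, 943],
        "gdpPercap": [400.4, 576.0, 487.7, 612.7, 676.9, 741.2],
    },
    index=pd.Index([1952, 1957, 1962, 1967, 1972, 1977], name="year"),
)

cols_frame = china_df[["country", "pop"]]  # list of column labels -> DataFrame
pop_series = china_df["pop"]               # single column label -> Series
pop_frame = china_df[["pop"]]              # one-element list -> DataFrame

first_three = china_df[0:3]                # rows at positions 0, 1, 2
every_other = china_df[0:6:2]              # rows at positions 0, 2, 4

print(type(pop_series).__name__, type(pop_frame).__name__)
print(list(first_three.index), list(every_other.index))
```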

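Examples:

Example 1: Get the country, pop, and gdpPercap columns of all rows
Example 2: Get the pop column of all rows
Example 3: Get the first three rows
Example 4: Starting from the first row, take every second row, 3 rows in total

Implementation: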

1) Example 1: Get the country, pop, and gdpPercap columns of all rows

# Example 1: all rows; columns country, pop, gdpPercap
china_df[['country', 'pop', 'gdpPercap']]

2) Example 2: Get the pop column of all rows

# Example 2: a single column label returns a Series
china_df['pop']

# Example 2: a list with one column label returns a one-column DataFrame
china_df[['pop']]

3) Example 3: Get the first three rows

# Example 3: rows at positions 0, 1, 2
china_df[0:3]

4) Example 4: Starting from the first row, take every second row, 3 rows in total

# Example 4: rows at positions 0, 2, 4
china_df[0:6:2]

Summary

  • Understand the DataFrame and Series data structures
  • Load csv and tsv data sets
  • Distinguish between the row and column labels and the row and column position numbers of a DataFrame
  • Get data from specified rows and columns of a DataFrame
    • loc
    • iloc
    • Slicing with loc and iloc
    • []

Origin blog.csdn.net/xianyu120/article/details/133300081