Pandas DataFrame basics

1.1 Introduction

Pandas is an open-source Python library for data analysis. It provides two core data structures: DataFrame and Series.

A DataFrame represents an entire spreadsheet or rectangular table of data.

A Series represents a single column. Each column of a DataFrame is a Series.
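
The relationship between the two structures can be checked directly. The sketch below uses a small hand-made table; the column names and values are invented for illustration and do not come from any real data set.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "year": [2000, 2000, 2001],
    "age": [30.5, 41.0, 28.2],
})

# The whole table is a DataFrame
print(type(df))

# Selecting one column by name yields a Series
age = df["age"]
print(type(age))
```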

1.2 Load the data set

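Import the library

# pd is the conventional alias for pandas
import pandas as pd

Example

# Read a tab-separated file into a DataFrame
df = pd.read_csv(r'../file_name', sep='\t')
# Show the first five rows
print(df.head())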
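
To try the same call without a file on disk, tab-separated text can be read from memory. This is a minimal sketch: the text in tsv_text and the name df_tsv are made up for the example, and io.StringIO stands in for the file path.

```python
import io
import pandas as pd

# Hypothetical tab-separated text standing in for a file on disk
tsv_text = "country\tyear\tage\nA\t2000\t30.5\nB\t2001\t41.0\n"

# sep="\t" tells read_csv that the columns are separated by tabs
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# head() shows the first five rows (here there are only two)
print(df_tsv.head())
```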

Commonly used attributes and functions:
type(df)    # Built-in function; returns the data type of a variable
df.shape    # Get the number of rows and columns
df.columns  # Get the column names
df.dtypes   # Get the data type of each column
df.info()   # Get a more detailed summary of the data
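
The attributes listed above can be tried on a small table. The data here is invented for illustration; only the pandas calls themselves come from the list.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "year": [2000, 2000, 2001],
    "age": [30.5, 41.0, 28.2],
})

print(type(df))     # the class of the object
print(df.shape)     # a (rows, columns) tuple
print(df.columns)   # the column labels
print(df.dtypes)    # one data type per column
df.info()           # prints a summary: index, columns, non-null counts, memory
```

Note that shape, columns and dtypes are attributes (no parentheses), while info() is a method.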

1.3 View columns, rows and cells

1.3.1 Column subsets

# Get a single column (returns a Series)
column_1 = df['column_name']
# Get multiple columns (returns a DataFrame); note the double square brackets
columns_subset = df[['column_1', 'column_2', 'column_3']]
# Use head() and tail() to view the selected columns
column_1.head()
column_1.tail()
columns_subset.head()
columns_subset.tail()
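
A runnable version of the column selection, as a sketch on invented data (the column names country, year, age and gdp are assumptions for the example):

```python
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "year": [2000, 2000, 2001, 2001],
    "age": [30.5, 41.0, 28.2, 35.9],
    "gdp": [1.2, 3.4, 0.8, 2.1],
})

# A single column name in brackets returns a Series
country = df["country"]

# A list of column names returns a DataFrame
subset = df[["country", "age"]]

print(country.head())
print(subset.tail())
```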

1.3.2 Row subsets

# There are two methods: loc and iloc
# loc gets a row subset based on the index label (row name, timestamp, etc.). In the following examples the row label is equal to the row number.
# iloc gets a row subset based on the integer position of the row (row number)

# Get the first row; counting starts from 0
df.loc[0]
# Get the last row; returns a Series
df_row_index = df.shape[0] - 1
df.loc[df_row_index]
# tail() also returns the last row, but as a DataFrame
df.tail(n=1)
# loc cannot take a label that does not exist in the index, such as -1; a KeyError is raised.

# loc selects multiple rows
# Select the 2nd, 12th, 112th, and 1112th rows
df.loc[[1, 11, 111, 1111]]
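
The loc behaviour described above can be sketched on a small table with the default integer index, so that labels coincide with row numbers. The data and variable names are invented for illustration.

```python
import pandas as pd

# Hypothetical data with the default index 0, 1, 2, 3
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "age": [30.5, 41.0, 28.2, 35.9],
})

first = df.loc[0]                   # first row, returned as a Series
last = df.loc[df.shape[0] - 1]      # last row, returned as a Series
last_df = df.tail(n=1)              # last row, returned as a DataFrame
several = df.loc[[0, 2]]            # a list of labels returns a DataFrame

# -1 is not a label in this index, so loc raises KeyError
try:
    df.loc[-1]
    missing_label_error = False
except KeyError:
    missing_label_error = True

print(missing_label_error)
```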

# iloc accepts both positive and negative positions for a single row
# Get the 2nd row
df.iloc[1]
# Get the 100th row
df.iloc[99]
# Get the last row
df.iloc[-1]
# Get multiple rows (pass a list of positions)
df.iloc[[1, 11, 111, 1111]]
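
The difference between labels and positions is easiest to see when the index is not 0, 1, 2, and so on. In this sketch the index labels 10, 20, 30, 40 are chosen arbitrarily for illustration.

```python
import pandas as pd

# Hypothetical data whose index labels differ from the row positions
df = pd.DataFrame(
    {"country": ["A", "B", "C", "D"], "age": [30.5, 41.0, 28.2, 35.9]},
    index=[10, 20, 30, 40],
)

second = df.iloc[1]          # second row by position
last = df.iloc[-1]           # negative positions count from the end
several = df.iloc[[0, 2]]    # a list of positions returns a DataFrame

# The same first row, reached by label and by position
by_label = df.loc[10]
by_position = df.iloc[0]

print(several)
```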

1.3.3 Mixed acquisition of row and column subsets

The general syntax for loc and iloc uses square brackets with a comma. The left side of the comma specifies the rows to select and the right side specifies the columns, that is, df.loc[[row], [column]] and df.iloc[[row], [column]].

# Keep in mind the difference between loc (labels) and iloc (integer positions)
df.loc['row name', 'column name']
df.loc[['row name 1', 'row name 2'], ['column name 1', 'column name 2']]
df.iloc[[row number], [column number]]
df.iloc[[row number 1, row number 2], [column number 1, column number 2]]
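
As a concrete sketch of the row-and-column syntax, the example below selects the same block once by label and once by position. The row labels r1, r2, r3 and the column names are invented for illustration.

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame(
    {
        "country": ["A", "B", "C"],
        "year": [2000, 2000, 2001],
        "age": [30.5, 41.0, 28.2],
    },
    index=["r1", "r2", "r3"],
)

# A single cell: one row label, one column label
cell = df.loc["r1", "age"]

# Rows r1 and r3, columns country and age, selected by label
block = df.loc[["r1", "r3"], ["country", "age"]]

# The same block selected by integer position
same_block = df.iloc[[0, 2], [0, 2]]

print(block)
```

iloc takes integers for both rows and columns; passing column names to iloc raises an error.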

1.4 Grouping and aggregation calculations

1.4.1 Grouping method

# groupby() function
df.groupby('grouping column')['displayed column'].aggregation_function()
Example: df.groupby('year')['age'].mean() displays the average age for each year
# Multi-condition grouping
Example: df.groupby(['year','continent'])[['age','gdp']].mean() displays the average age and average GDP of each continent in each year
The flattening function reset_index() turns the grouped result into a tidy flat table, but it loses the hierarchical index.
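
The grouping calls above can be tried on a small table. This is a sketch on invented numbers; the columns year, continent, age and gdp follow the names used in the examples, but the values are made up.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame({
    "year": [2000, 2000, 2000, 2001, 2001],
    "continent": ["Asia", "Asia", "Europe", "Asia", "Europe"],
    "age": [30.0, 40.0, 50.0, 32.0, 52.0],
    "gdp": [1.0, 3.0, 5.0, 2.0, 6.0],
})

# Average age for each year (result is a Series indexed by year)
mean_age = df.groupby("year")["age"].mean()

# Average age and gdp for each (year, continent) pair;
# the result has a two-level hierarchical index
multi = df.groupby(["year", "continent"])[["age", "gdp"]].mean()

# reset_index() moves the index levels back into ordinary columns
flat = multi.reset_index()

print(mean_age)
print(multi)
print(flat)
```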

1.4.2 Group frequency calculation
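
One possible illustration of group frequency calculation, assuming it refers to counting values within groups: nunique() counts distinct values per group and value_counts() counts how often each value occurs. The data and column names below are invented for the example.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Asia", "Europe"],
    "country": ["A", "B", "C", "A", "C"],
})

# Number of distinct countries in each continent
unique_countries = df.groupby("continent")["country"].nunique()

# Number of rows for each continent
row_counts = df["continent"].value_counts()

print(unique_countries)
print(row_counts)
```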

1.5 Basic Drawing

Garbled characters in a plot appear because no default font for displaying Chinese has been set.

Origin blog.csdn.net/date3_3_1kbaicai/article/details/134401188