[Turn] pandas quick start

table of Contents

pandas series and dataFrame

pandas select data

pandas setting value

Pandas handles missing data NaN

pandas import and export

pandas data merge

pandas combined with plot drawing

END

pandas series and dataFrame

The relationship between pandas and numpy: numpy is a list, pandas is a dictionary, and pandas is built on numpy.

The form of Series: the index is on the left and the value is on the right. If no index is specified for the data, an integer index from 0 to N-1 (N is the length) is automatically created.

DataFrame is a tabular data structure, each column can be a different value type, both row index and column index.

Take the "indicator 1" column of df: df['indicator 1']

Create a set of data without a given row label and column label: pd.DataFrame(np.arange(12).reshape((3,4)))

Create data with special treatment for each column of data:

View data type: df.dtypes

View the serial number of the index (row): df.index

View the name of each data (column): df.columns

View all values of df: df.values

View the summary of the data (count, average, maximum value, etc.): df.describe()

Flip data: df.T

Sort the index of the data : df.sort_index(axis=1, ascending=False)

Sort the values of the data : df.sort_values(by='A')

pandas select data

Select a column A: df['A'] or df.A

Choose from 0 to 2 more columns: df[0:2]

Select 1 row according to the label : df.loc[0] [Select the row with index 0]

Select all rows (: means all rows): df.loc[:,['A','B']] [Select columns A and B of all rows]

df.loc[3,['A','B']] [Select two columns A and B in row 3]

Select data according to location (index) : df.iloc[3,1] [data in row 3 and column 1]

df.iloc[2:3,0:3] [Data in rows 2 to 3, columns 0 to 3]

df.iloc[[0,3],0:3] [Data in rows 0 and 3, columns 0 to 3]

Filter by judgment: df[df.A==2] [Select the row whose column A is 2]

pandas setting value

Use index: df.iloc[2,2] = 1111 [Modify the second row, the second column]

Use label: df.loc['20190401','B'] = 2222 [Modify row '20190401', column'B']

According to the condition: df.B[df.A>4] = 0 [If column A is greater than 4, change it to 0]

By row or column: df['F'] = np.nan [Add a new column F, and set the value to NaN]

               df_original['title_keywords'] = ['' for _ in range(content_num)]

Add data: df['G'] = pd.Series([1,2,3,4,5,6], index=pd.date_range('20190401',periods=6))

Pandas handles missing data NaN

Directly remove rows or columns with NaN (pd.dropna()):

Replace the value of NaN with other values (pd.fillna()): df.fillna(value=0) [Replace all with 0]

Determine whether each value is missing data: df.isnull()

Check whether there is NaN in the entire data table, and return True if it exists: np.any(df.isnull()) == True

pandas import and export

Formats that pandas can read and access: csv, excel, json, html, pickle, etc. [ official document ]

Read csv: data = pd.read_csv('student.csv')

Access data into pickle: data.to_pickle('student.pickle')

pandas data merge

Concat vertical merge: res = pd.concat([df1, df2, df3], axis=0) [It can be found that the index has not changed]

Ignore_index (reset index) of concat parameter: res = pd.concat([df1, df2, df3], axis=0, ignore_index=True )

The join of the concat parameter (combination method, default join='outer'): res = pd.concat([df1, df2], axis=0, join='outer') [Vertical merge according to column, with the same column The top and bottom are merged together, and the other independent columns form their own columns, and the original positions without values are filled with NaN. 】

The join of the concat parameter (the merge method is changed to join='inner'): res = pd.concat([df1, df2], axis=0, join='inner') [Only the same columns are merged together, the others Will be abandoned]

The join_axes of the concat parameter: [join according to the row index of df1, the following figure 2 removes join_axes]

Append has only vertical merging, not horizontal merging:

df1.append(df2, ignore_index=True) [Merge df2 under df1, reset index]

df1.append([df2, df3], ignore_index=True) [Merge multiple dfs, merge df2 and df3 under df1, reset index]

df1.append(s1, ignore_index=True) [Merge series, merge s1 to df1, reset index]

Merge is used for two groups of data with key column [Sample data is as follows]

Combine key1 and key2 columns and print out four results ['left','right','outer','inner']:

The indicator=True of the merge parameter will put the merged record in a new column

The indicator=str of the merge parameter, custom column name

根据index合并：res = pd.merge(left, right, left_index=True, right_index=True, how='outer')

The suffixes of the merge parameter solves the problem of overlapping (overlapping) [the name K0 has a 1-year-old male, a 4-year-old female, and a 5-year-old female]