pandas data analysis reading notes (1)

Series object: how to create a Series object.
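
A minimal sketch of the common ways to build a Series. The values and labels below are made-up sample data.

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0..n-1
s1 = pd.Series([4, 7, -5, 3])

# From a list with explicit index labels
s2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# From a dict: the keys become the index labels (insertion order is kept)
s3 = pd.Series({"Ohio": 35000, "Texas": 71000, "Utah": 5000})

print(s2["a"])
print(list(s3.index))
```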

DataFrame object: how to create a DataFrame.
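
One common way to create a DataFrame is from a dict of equal-length lists. This is a sketch with hypothetical sample data; the columns= and index= arguments are optional.

```python
import pandas as pd

# Each dict key becomes a column; each list holds that column's values
data = {
    "state": ["Ohio", "Ohio", "Nevada", "Nevada"],
    "year": [2000, 2001, 2001, 2002],
    "pop": [1.5, 1.7, 2.4, 2.9],
}
df = pd.DataFrame(data)

# Optionally fix the column order and supply row labels
df2 = pd.DataFrame(data, columns=["year", "state", "pop"],
                   index=["one", "two", "three", "four"])
print(df2)
```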

df.head() and df.tail() return the first and last rows of df (5 by default); df.loc[] gets data by index label

del df['eastern'] deletes a column. del is a Python statement rather than a function, and it modifies df in place.
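
A small sketch of head(), tail() and del on a made-up DataFrame; the "eastern" column here is hypothetical sample data matching the name used above.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["Ohio", "Nevada", "Utah", "Texas", "Maine", "Iowa"],
    "eastern": [True, False, False, False, True, False],
})

first_rows = df.head()   # first 5 rows by default
last_rows = df.tail(2)   # last 2 rows

# del removes the column from df itself; nothing is returned
del df["eastern"]
print(df.columns.tolist())
```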

df.T, transposes the DataFrame (rows become columns and columns become rows)

df.values, returns the data of the DataFrame as a two-dimensional NumPy array

df.index, returns the row index of the DataFrame
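
A quick illustration of the three attributes above on a tiny hypothetical DataFrame. Note that they are accessed without parentheses.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

t = df.T          # transposed: the old columns 'a', 'b' become the row index
arr = df.values   # underlying data as a 2-D NumPy array
idx = df.index    # the row labels

print(t)
print(arr)
print(list(idx))
```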

df.reindex(), creates a new object conformed to a new index; labels that were not in the original index get missing values (NaN)
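
A sketch of reindex() with sample data, showing the NaN filling for new labels, the fill_value option, and reindexing columns. The original objects are left unchanged.

```python
import pandas as pd

s = pd.Series([4.5, 7.2, -5.3], index=["d", "b", "a"])

# Rearranges to the new labels; "e" did not exist, so it becomes NaN
s2 = s.reindex(["a", "b", "d", "e"])

# fill_value supplies a default for new labels instead of NaN
s3 = s.reindex(["a", "b", "d", "e"], fill_value=0)

# For a DataFrame, columns can be reindexed with the columns= keyword
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df2 = df.reindex(columns=["y", "x", "z"])
```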

df.drop(), deletes one or more rows or columns. The default is axis=0, which deletes rows; with axis=1 (or, equivalently, axis='columns') it deletes columns. By default it returns a new object with the labels removed and leaves the original data unchanged. When inplace=True is passed, the original object is modified in place and None is returned.
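
The axis and inplace behaviour described above, sketched on a hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"one": [0, 4, 8], "two": [1, 5, 9], "three": [2, 6, 10]},
                  index=["Ohio", "Colorado", "Utah"])

rows_dropped = df.drop("Ohio")                  # axis=0 (default): drop a row
cols_dropped = df.drop("two", axis=1)           # drop a column
cols_dropped2 = df.drop("two", axis="columns")  # same as axis=1

# Up to here df is unchanged; inplace=True modifies df and returns None
result = df.drop("Utah", inplace=True)
print(result)
print(list(df.index))
```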

df.loc[], gets data by index label

df.iloc[], gets data by integer position
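
A side-by-side sketch of loc[] and iloc[] on sample data. Both are indexers used with square brackets, and label slices include their end point while integer slices do not.

```python
import pandas as pd

df = pd.DataFrame({"one": [0, 4, 8], "two": [1, 5, 9]},
                  index=["Ohio", "Colorado", "Utah"])

by_label = df.loc["Colorado", "two"]  # label-based: row "Colorado", column "two"
by_position = df.iloc[1, 1]           # position-based: 2nd row, 2nd column

label_slice = df.loc["Ohio":"Colorado"]  # end label included: 2 rows
pos_slice = df.iloc[0:1]                 # end position excluded: 1 row
```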

np.abs(df), NumPy functions (ufuncs) can be applied directly to pandas objects and work element-wise
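
A minimal example with made-up numbers: the ufunc is applied element by element, and the index and columns are preserved, so the result is still a DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"b": [-1.5, 2.0], "d": [0.5, -3.0]}, index=["x", "y"])

abs_df = np.abs(df)  # element-wise absolute value, labels kept
print(abs_df)
```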

df.apply(f), applies function f to each column of df; pass axis=1 to apply it to each row instead

df.applymap(f), applies function f to every element of df (renamed to df.map(f) in pandas 2.1)
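
A sketch contrasting the two with sample data. The element-wise call picks DataFrame.map when it exists (pandas 2.1+) and falls back to applymap on older versions; that version check is an assumption of this example, not something from the notes.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, -2.5, 3.25], "b": [4.0, 0.5, -1.0]})

# apply: f receives one whole column (a Series) at a time
col_range = df.apply(lambda col: col.max() - col.min())

# apply with axis=1: f receives one row at a time
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

# element-wise: f receives each scalar value
elementwise = df.map if hasattr(df, "map") else df.applymap
formatted = elementwise(lambda x: f"{x:.2f}")
```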

df.sort_index(axis=0, ascending=True), sorts by the index labels along one axis. The default, axis=0, sorts the row index; axis=1 sorts the column index. ascending defaults to True (ascending order); ascending=False sorts in descending order.
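
A short example of sorting each axis by its labels, using a hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 3], [4, 5, 6, 7]],
                  index=["three", "one"], columns=["d", "a", "b", "c"])

by_rows = df.sort_index()                              # row labels sorted
by_cols = df.sort_index(axis=1)                        # column labels sorted
by_cols_desc = df.sort_index(axis=1, ascending=False)  # column labels, descending
```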

df.sort_values(), sorts by values. Missing values are placed at the end by default. The by parameter names the column(s) whose values determine the order; to sort by several columns, pass a list of column names.
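
A sketch with sample data showing NaN placement and sorting by one column versus a list of columns:

```python
import numpy as np
import pandas as pd

s = pd.Series([4, np.nan, 7, np.nan, -3, 2])
sorted_s = s.sort_values()  # NaN values go last by default

df = pd.DataFrame({"b": [4, 7, -3, 2], "a": [0, 1, 0, 1]})
by_b = df.sort_values(by="b")                # one column
by_a_then_b = df.sort_values(by=["a", "b"])  # several columns: pass a list
```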

df.index.is_unique, an attribute (not a method) that tells whether the index values are unique
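
Why uniqueness matters, sketched on a Series with a duplicated label (sample data): selecting a repeated label returns a Series instead of a single value.

```python
import pandas as pd

s = pd.Series(range(5), index=["a", "a", "b", "b", "c"])
print(s.index.is_unique)  # "a" and "b" repeat

# With a duplicated label, selection returns a Series rather than a scalar
print(s["a"])
```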

df.sum(axis=0), returns a Series containing the sum of each column. The default is axis=0 (one sum per column); with axis=1 the sum is taken across each row instead.

df.mean(axis=0, skipna=True), returns a Series containing the mean of each column; skipna=True (the default) excludes missing values.
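
A sketch of the axis and skipna parameters on a hypothetical DataFrame containing missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=["a", "b", "c", "d"], columns=["one", "two"])

col_sums = df.sum()         # axis=0: one value per column, NaN skipped
row_sums = df.sum(axis=1)   # axis=1: one value per row; an all-NaN row sums to 0.0
row_means = df.mean(axis=1, skipna=False)  # any NaN in a row makes its mean NaN
```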

df.idxmax(), returns the index label of the maximum value in each column

df.idxmin(), returns the index label of the minimum value in each column

df.cumsum(), the cumulative sum of each column
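
The three methods above on a small made-up DataFrame. idxmax()/idxmin() return labels, not positions.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.4, 7.1, 0.75], "two": [-1.0, -4.5, -1.3]},
                  index=["a", "b", "c"])

mx = df.idxmax()  # label of each column's maximum
mn = df.idxmin()  # label of each column's minimum
cs = df.cumsum()  # running total down each column
print(mx, mn, cs, sep="\n")
```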

df.describe(), generates several summary statistics at once: for numeric columns, count, mean, standard deviation, minimum, quartiles and maximum.
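
A minimal example on numeric sample data. For non-numeric columns, describe() instead reports count, unique, top and freq.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0, 4.0], "two": [10, 20, 30, 40]})
summary = df.describe()
print(summary)
```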

There are also: count() (number of non-NA values), max(), min(), argmax()/argmin() (integer position of the maximum/minimum value), quantile() (computes a sample quantile, with q between 0 and 1), sum, mean, median, mad (mean absolute deviation; removed in pandas 2.0), var, std, skew, kurt (kurtosis), cumsum, cummin/cummax (cumulative minimum and maximum), cumprod (cumulative product), and pct_change (percentage change, e.g. for computing stock returns).
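
A sketch of a few of these on a hypothetical series of daily closing prices, including the stock-return use of pct_change. The prices are invented for illustration.

```python
import pandas as pd

# Hypothetical daily closing prices
prices = pd.Series([100.0, 110.0, 99.0, 108.9])

returns = prices.pct_change()     # (p_t - p_{t-1}) / p_{t-1}; first value is NaN
median = prices.quantile(0.5)     # q must lie between 0 and 1
pos_of_max = prices.argmax()      # integer position of the maximum, not a label
growth = (1 + returns.fillna(0)).cumprod()  # cumulative growth factor
print(returns)
```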

Ser1.corr(Ser2), computes the correlation coefficient of the overlapping, non-NA values of two Series, aligned by index
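
A sketch of the alignment rule with sample data: only labels present in both Series with non-NA values on both sides (here a, b and d) enter the calculation, and on those points s2 is exactly 2 * s1.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])
s2 = pd.Series([2.0, 4.0, np.nan, 8.0, 10.0], index=["a", "b", "c", "d", "e"])

# "c" is NaN in s2 and "e" is missing from s1, so both are ignored
r = s1.corr(s2)  # Pearson correlation by default
```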

df.corr(), returns the correlation matrix of the DataFrame's columns

df.cov(), returns the covariance matrix of the DataFrame's columns

df.corrwith(), when a Series is passed in, computes the correlation between each column of the DataFrame and that Series. When a DataFrame is passed in, columns are matched by name and the correlation is computed for each matching pair.
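
A sketch of corr(), cov() and corrwith() on made-up columns chosen so that the expected correlations are easy to check (y rises exactly with x, z falls exactly as x rises):

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],  # y = 2x
    "z": [4.0, 3.0, 2.0, 1.0],  # z falls as x rises
})

corr_matrix = df.corr()        # pairwise Pearson correlations between columns
cov_matrix = df.cov()          # pairwise sample covariances
with_x = df.corrwith(df["x"])  # each column against the Series df["x"]

# Passing a DataFrame: columns are matched by name
pair = df[["x", "y"]].corrwith(df[["x", "y"]] * 2)
```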

Ser.unique(), returns an array of the unique values in the Series, in order of appearance

Ser.value_counts(), computes the frequency of each value in a Series, sorted in descending order of frequency

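pd.value_counts(Ser.values), value_counts is also available as a top-level pandas function (deprecated since pandas 2.1, where pd.Series(values).value_counts() is the suggested replacement)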
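
A sketch of unique() and value_counts() with sample data. For the top-level form it uses pd.Series(values).value_counts(), which works on both old and new pandas versions.

```python
import pandas as pd

s = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

uniques = s.unique()       # order of first appearance, not sorted
counts = s.value_counts()  # frequencies, most frequent first

# Version-independent equivalent of pd.value_counts(s.values)
counts2 = pd.Series(s.values).value_counts()
```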

Ser.isin(), determines whether each value in the Series is contained in a list, returning a boolean Series
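
A typical use with sample data: the boolean Series that isin() returns can be used directly as a filter.

```python
import pandas as pd

s = pd.Series(["c", "a", "d", "a", "b"])
mask = s.isin(["b", "c"])  # True where the value is in the list
subset = s[mask]           # keep only those values
```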

Origin: blog.csdn.net/u012724887/article/details/107025181