Summary of common pandas operations

A record of common pandas operations for everyday use.

df.value_counts() counts how many times each distinct value (category) occurs

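value_counts(values, sort=True, ascending=False, normalize=False, bins=None, dropna=True)

values: the data to count (omitted in the method form, e.g. df['col'].value_counts())
sort=True: whether to sort the result by count; sorted by default
ascending=False: sort order; descending by default
normalize=False: whether to return relative frequencies (proportions) instead of raw counts; False by default
bins=None: group numeric values into intervals (bins) instead of counting each distinct value; disabled by default
dropna=True: whether to exclude missing values (NaN) from the counts; excluded by default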
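The parameters above can be tried on a small Series. This is only a sketch: the sample values and the variable names (s, ages, counts, shares, with_nan, binned) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, not from the original post.
s = pd.Series(["a", "b", "a", None, "a", "b", "c"])

counts = s.value_counts()                 # sorted by count, descending; NaN excluded
shares = s.value_counts(normalize=True)   # proportions that sum to 1
with_nan = s.value_counts(dropna=False)   # missing values get their own entry

# bins only makes sense for numeric data: values are grouped into
# equal-width intervals before counting.
ages = pd.Series([3, 17, 25, 41, 58, 60])
binned = ages.value_counts(bins=3, sort=False)

print(counts)
print(binned)
```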

df.head(n) returns the first n rows; if n is omitted, 5 rows are returned by default

df.tail(n) returns the last n rows; if n is omitted, 5 rows are returned by default
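A quick check of the default and an explicit n, using a made-up ten-row table (the column name x is an assumption for illustration):

```python
import pandas as pd

# Hypothetical ten-row table.
df = pd.DataFrame({"x": range(10)})

first_default = df.head()   # no argument: first 5 rows
last_three = df.tail(3)     # last 3 rows

print(first_default)
print(last_three)
```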

df.info() prints basic information about the table, including how many non-null values each column has

df.describe() shows summary statistics for the numeric columns: count, mean, standard deviation, min, max, quartiles
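A small sketch of both calls on made-up data. Since info() prints its report instead of returning it, the example captures the text through the buf parameter; the column names are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical table with one numeric and one text column.
df = pd.DataFrame({"x": [1.0, 2.0, None, 4.0], "label": ["a", "b", "c", None]})

# info() writes to stdout by default; pass a buffer to keep the text.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()

# describe() summarizes only the numeric columns by default.
stats = df.describe()

print(info_text)
print(stats)
```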

df.dropna(axis=0) drops rows that contain missing values; with axis=1 it drops columns instead

df.isnull() returns a boolean table of the same shape, with True marking each missing value

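# df['column_name'].isnull().sum(axis=0)  # count missing values in one column
# df.isnull().sum(axis=0) # count missing values in every column, one result per column
# df.isnull().sum().sum() # count missing values in every column, returned as a single total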
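The counting idioms and dropna can be checked together on a tiny table. The data and variable names here are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical table with three missing values in total.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

one_column = df["a"].isnull().sum()   # missing values in column "a"
per_column = df.isnull().sum()        # one count per column
total = df.isnull().sum().sum()       # a single total for the whole table

rows_kept = df.dropna(axis=0)         # only rows without any missing value
cols_kept = df.dropna(axis=1)         # only columns without any missing value

print(per_column)
print(rows_kept)
```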

df.query("x1<10 & x2<20") filters rows with a condition written as a string; here it keeps the rows where x1 < 10 and x2 < 20
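A sketch of the same filter on made-up data, next to the equivalent boolean-mask form. The values and the limit variable are assumptions for illustration; inside a query string, @name refers to a Python variable.

```python
import pandas as pd

# Hypothetical data using the column names from the query above.
df = pd.DataFrame({"x1": [5, 12, 8, 3], "x2": [25, 10, 15, 30]})

filtered = df.query("x1 < 10 & x2 < 20")

# The same filter written with boolean masks.
same = df[(df["x1"] < 10) & (df["x2"] < 20)]

# A local variable can be referenced with the @ prefix.
limit = 10
with_var = df.query("x1 < @limit and x2 < 20")

print(filtered)
```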

pd.to_datetime(arg, errors='raise', utc=None, format=None, unit=None) converts the given data to datetime values, parsing strings according to format when one is specified
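A few typical calls, as a sketch on made-up inputs: an explicit format, errors='coerce' to turn unparseable entries into NaT instead of raising, and unit='s' for Unix timestamps in seconds.

```python
import pandas as pd

# Strings parsed with an explicit format.
dates = pd.to_datetime(["2022-02-19", "2022-03-01"], format="%Y-%m-%d")

# errors="coerce" replaces values that cannot be parsed with NaT.
coerced = pd.to_datetime(pd.Series(["2022-02-19", "not a date"]), errors="coerce")

# unit="s" interprets a number as seconds since the Unix epoch.
from_epoch = pd.to_datetime(1645228800, unit="s")

print(dates)
print(coerced)
print(from_epoch)
```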

df.columns returns the table's column names

data.interpolate() fills missing values by linear interpolation (the default method)

data.interpolate(method='spline', order=3) fills missing values with a cubic spline; this method requires SciPy
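A minimal sketch of both calls on a made-up Series with two gaps. The spline call is wrapped in a try block because it depends on SciPy being installed, which is an assumption about the environment.

```python
import numpy as np
import pandas as pd

# Hypothetical series with gaps at positions 2 and 5.
data = pd.Series([0.0, 1.0, np.nan, 9.0, 16.0, np.nan, 36.0])

# Default method: each gap is filled on a straight line between its neighbours.
linear = data.interpolate()

# Cubic spline; pandas delegates this to SciPy.
try:
    spline = data.interpolate(method="spline", order=3)
except ImportError:
    spline = None  # SciPy is not installed

print(linear)
print(spline)
```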

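# Handling missing values
# new_data = data.dropna()                       # 1. drop rows that contain any missing value
# new_data = data.dropna(subset=['C1','Chla'])   # 2. drop rows with missing values in the specified columns
# new_data = data.dropna(thresh=15)              # 3. keep only rows with at least 15 non-missing values (drops rows with many gaps, i.e. more than n-15 missing out of n columns)
# data = data.fillna(method='ffill')             # ffill: forward fill; bfill: backward fill (newer pandas versions prefer data.ffill() / data.bfill())
# data['C1'] = data['C1'].fillna(data['C1'].mean())  # fill with the mean; .median() or .mode()[0] can be used instead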
subset = df[::rate] downsamples the rows at a constant step: it keeps every rate-th row, starting from the first, with rate chosen by the user
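For example, with a made-up ten-row table and rate = 3 (both are assumptions for illustration), the slice keeps rows 0, 3, 6 and 9. The explicit positional form df.iloc[::rate] gives the same result.

```python
import pandas as pd

# Hypothetical ten-row table.
df = pd.DataFrame({"t": range(10)})

rate = 3
subset = df[::rate]            # every 3rd row, starting at the first
subset_iloc = df.iloc[::rate]  # same selection, written positionally

print(subset)
```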

Filtering on multiple conditions

In [44]: df[(df['B'] > 0) & (df['3'] > 0)]
Out[44]:
   A         B         3         4         5         6
C  2  0.716184  0.318086 -0.540593 -2.408134  0.526977
In [45]: df.loc[:, (df.loc['C'] > 0) | (df.loc['5'] > 0)]
Out[45]:
   A         B         3         6
1  0 -1.160599 -1.236521  1.090875
2  1 -0.285893  0.918261 -0.007042
C  2  0.716184  0.318086  0.526977
D  3  0.718741 -1.420655 -1.608134
5  4 -0.541419 -1.195019  1.078212
6  5  0.454283 -0.763443  0.182736

Filter rows by column values

In [38]: df[df['B'] > 0]
Out[38]:
   A         B         3         4         5         6
C  2  0.716184  0.318086 -0.540593 -2.408134  0.526977
D  3  0.718741 -1.420655 -0.182436  0.333909 -1.608134
6  5  0.454283 -0.763443  0.065712 -0.336119  0.182736

Filter columns by row values

In [39]: df.loc[:, df.loc['C'] > 0]
Out[39]:
   A         B         3         6
1  0 -1.160599 -1.236521  1.090875
2  1 -0.285893  0.918261 -0.007042
C  2  0.716184  0.318086  0.526977
D  3  0.718741 -1.420655 -1.608134
5  4 -0.541419 -1.195019  1.078212
6  5  0.454283 -0.763443  0.182736
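The transcripts above do not show how df was built, so the sketch below uses a small made-up table with similar string labels to reproduce the three patterns: a row mask, a column mask, and masks combined with & and |. Each condition needs its own parentheses because & and | bind more tightly than comparisons.

```python
import pandas as pd

# Hypothetical table; labels are strings, as in the transcripts above.
df = pd.DataFrame(
    {"A": [0, 1, 2], "B": [-1.0, 0.5, 0.7], "3": [0.2, -0.3, 0.4]},
    index=["1", "C", "D"],
)

rows = df[df["B"] > 0]                      # rows where column B is positive
cols = df.loc[:, df.loc["C"] > 0]           # columns where row C is positive
both = df[(df["B"] > 0) & (df["3"] > 0)]    # both conditions must hold
either = df.loc[:, (df.loc["C"] > 0) | (df.loc["1"] > 0)]  # either condition

print(rows)
print(cols)
print(both)
print(either)
```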

Origin blog.csdn.net/qq_44391957/article/details/123019619