Foreword
When you receive a batch of data, the first step is to preprocess it: some values are empty or unwanted, and they need to be deleted or modified. The deletions and modifications below use this example data:
>>> df
out[]:
salary age gender
0 10000 23 男
1 15000 34 女
2 23000 21 男
3 0 20 女
4 28500 0 男
5 35000 37 男
First, data processing
1. The df.replace() method: use 1 to represent "男" (male) and 0 to represent "女" (female). replace() returns a new DataFrame, so assign the result back to df:
>>> df = df.replace(["男", "女"], [1, 0])
>>> df
out[]:
salary age gender
0 10000 23 1
1 15000 34 0
2 23000 21 1
3 0 20 0
4 28500 0 1
5 35000 37 1
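If only the gender column should be recoded, one option is Series.map with an explicit dictionary. The sketch below is not the method used above; it assumes the example table is rebuilt by hand as shown, and the dictionary `codes` is a name chosen here for illustration.

```python
import pandas as pd

# Hand-built copy of the example table (an assumption for this sketch).
df = pd.DataFrame({
    "salary": [10000, 15000, 23000, 0, 28500, 35000],
    "age": [23, 34, 21, 20, 0, 37],
    "gender": ["男", "女", "男", "女", "男", "男"],
})

# Hypothetical mapping: 1 for male, 0 for female.
codes = {"男": 1, "女": 0}

# map() touches only this column; values missing from the dict would become NaN.
df["gender"] = df["gender"].map(codes)
print(df["gender"].tolist())  # [1, 0, 1, 0, 1, 1]
```

Unlike df.replace() on the whole frame, this cannot accidentally change matching values in other columns.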
2. Use the df.loc indexer to drop the rows in which the 'salary' or 'age' column is 0:
>>> df = df.loc[~((df['salary'] == 0) | (df['age'] == 0))]
>>> df
out[]:
salary age gender
0 10000 23 1
1 15000 34 0
2 23000 21 1
5 35000 37 1
Because a product is zero exactly when one of its factors is zero, you can also use:
df = df.loc[df['salary'] * df['age'] != 0]
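The filter can also be written as a positive condition, which some readers find easier to follow. The sketch below assumes the already-encoded example table and shows that boolean filtering keeps the original row labels; reset_index(drop=True) is an optional extra step if consecutive labels are wanted.

```python
import pandas as pd

# Hand-built copy of the encoded example table (an assumption for this sketch).
df = pd.DataFrame({
    "salary": [10000, 15000, 23000, 0, 28500, 35000],
    "age": [23, 34, 21, 20, 0, 37],
    "gender": [1, 0, 1, 0, 1, 1],
})

# Keep rows where neither 'salary' nor 'age' is 0.
mask = (df["salary"] != 0) & (df["age"] != 0)
filtered = df.loc[mask]
print(filtered.index.tolist())    # [0, 1, 2, 5]  (original labels survive)

# Optional: renumber the remaining rows from 0.
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```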
Second, methods for deleting rows
1. Delete rows in which every value is 0
The code is as follows:
>>> df.loc[~(df==0).all(axis=1)]
An equivalent form that looks more symmetrical:
>>> df.loc[(df!=0).any(axis=1)]
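The two expressions are equivalent by De Morgan's law: "not every value is 0" is the same as "at least one value is not 0". A small check on made-up data (the frame below is an assumption for illustration only):

```python
import pandas as pd

# Rows 0 and 2 are entirely zero; rows 1 and 3 are not.
df = pd.DataFrame({"a": [0, 1, 0, 4], "b": [0, 0, 0, 5], "c": [0, 2, 0, 6]})

all_zero = (df == 0).all(axis=1)          # True where every value in the row is 0
keep_a = df.loc[~all_zero]                # "not all zero"
keep_b = df.loc[(df != 0).any(axis=1)]    # "any non-zero"

print(all_zero.tolist())      # [True, False, True, False]
print(keep_a.equals(keep_b))  # True
```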
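Use the dropna method to delete. Indexing with df != 0 turns every 0 into NaN, and dropna() defaults to how='any', so this variant removes every row that contains at least one 0, not only the rows that are entirely 0:
>>> new_df = df[df != 0].dropna()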
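The effect of the how parameter of dropna is easiest to see side by side. This is a sketch on made-up data, not the article's table:

```python
import pandas as pd

# Row 0 and row 2 are all zero, row 1 has one zero, row 3 has none.
df = pd.DataFrame({"a": [0, 1, 0, 4], "b": [0, 0, 0, 5]})

masked = df[df != 0]  # zeros become NaN (integer columns turn into floats)

any_zero_dropped = masked.dropna()           # how='any' is the default
all_zero_dropped = masked.dropna(how="all")  # drop only rows that are entirely NaN

print(any_zero_dropped.index.tolist())  # [3]
print(all_zero_dropped.index.tolist())  # [1, 3]
```

With how='all', partially zero rows survive but their zeros remain NaN until they are filled back in.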
2. Replace zeros with NaN, drop the rows that are entirely NaN, then replace NaN with zero again.
The code is as follows:
import numpy as np
df = df.replace(0, np.nan)         # replace 0 with NaN
df = df.dropna(how='all', axis=0)  # drop rows that are entirely NaN
df = df.replace(np.nan, 0)         # turn the remaining NaN back into 0
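The three steps can also be chained. The sketch below uses fillna(0) for the last step, which is a substitution for replace(np.nan, 0) rather than the code above, and runs on made-up data. Two side effects are worth knowing: integer columns come back as floats, and any NaN that was already in the data is turned into 0 as well.

```python
import numpy as np
import pandas as pd

# Row 0 is all zero; rows 1 and 2 each contain a single zero.
df = pd.DataFrame({"a": [0, 1, 0], "b": [0, 0, 3]})

cleaned = (
    df.replace(0, np.nan)  # 0 -> NaN
      .dropna(how="all")   # drop rows that are entirely NaN
      .fillna(0)           # NaN -> 0 again
)

print(cleaned.index.tolist())   # [1, 2]
print(cleaned.values.tolist())  # [[1.0, 0.0], [0.0, 3.0]]
```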
3. Delete rows that have a 0 in a given column
The code is as follows:
>>> df = df[df['salary'] != 0]
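The same idea extends to several columns at once. In this sketch the list `cols` is a hypothetical choice of columns to check, and the data is made up; columns outside the list (here 'gender') may still contain 0.

```python
import pandas as pd

df = pd.DataFrame({
    "salary": [10000, 0, 28500],
    "age": [23, 20, 0],
    "gender": [1, 0, 1],
})

cols = ["salary", "age"]  # hypothetical: only these columns are checked for 0

# Keep a row only if none of the checked columns is 0.
result = df[(df[cols] != 0).all(axis=1)]
print(result.index.tolist())  # [0]
```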
4. Use a lambda function to delete rows
The code is as follows:
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3),
                  index=['one', 'two', 'three', 'four', 'five'],
                  columns=list('abc'))
df.loc[['one', 'three']] = 0  # set rows 'one' and 'three' to 0
print(df)
print(df.loc[~df.apply(lambda row: (row==0).all(), axis=1)])
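The first print shows all five rows, with rows 'one' and 'three' set to 0; the second shows only rows 'two', 'four' and 'five'.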
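The row-wise apply with a lambda gives the same result as the vectorized expression from method 1, which avoids calling a Python function once per row. A sketch that checks this on the same data:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3),
                  index=["one", "two", "three", "four", "five"],
                  columns=list("abc"))
df.loc[["one", "three"]] = 0  # set rows 'one' and 'three' to 0

# Row-wise lambda versus the vectorized form.
via_apply = df.loc[~df.apply(lambda row: (row == 0).all(), axis=1)]
via_vector = df.loc[~(df == 0).all(axis=1)]

print(via_apply.equals(via_vector))  # True
print(via_vector.index.tolist())     # ['two', 'four', 'five']
```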
To drop every row that has a 0 in any column:
new_df = df[df != 0].dropna()
new_df
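For the DataFrame built in method 4, the result contains only rows 'two', 'four' and 'five', because rows 'one' and 'three' contain zeros.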