Python data analysis: deleting rows that contain 0


Foreword

After obtaining a data set, the first step is to preprocess it: some values are empty or unwanted, so the affected rows need to be deleted or the values modified. This example walks through such deletions and modifications on the following DataFrame:
 

>>> df
out[]:
   salary   age   gender
0   10000    23     男
1   15000    34     女
2   23000    21     男
3     0      20     女
4   28500     0     男
5   35000    37     男
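
To follow along, the sample frame can be rebuilt by hand. This is only a sketch: the values are copied from the table above, but the way the original frame was created is not shown in this post, so the constructor call is an assumption.

```python
import pandas as pd

# Values copied from the table above; the construction itself is assumed.
df = pd.DataFrame({
    "salary": [10000, 15000, 23000, 0, 28500, 35000],
    "age": [23, 34, 21, 20, 0, 37],
    "gender": ["男", "女", "男", "女", "男", "男"],
})
print(df)
```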

1. Data processing

1. df.replace() method: use 1 to represent "male" (男) and 0 to represent "female" (女).

>>> df = df.replace(["男", "女"], [1, 0])
>>> df
out[]:
   salary   age   gender
0   10000    23     1
1   15000    34     0
2   23000    21     1
3     0      20     0
4   28500     0     1
5   35000    37     1
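
One detail worth checking is that replace() returns a new object and leaves the original untouched unless the result is assigned back. The sketch below uses a small stand-in frame (an assumption, not the full example data) and also shows Series.map() with an explicit dictionary as a possible alternative for a single column.

```python
import pandas as pd

# Small stand-in frame with only the gender column (illustrative data).
df = pd.DataFrame({"gender": ["男", "女", "男"]})

# replace() returns a new DataFrame; df keeps its strings until reassigned.
encoded = df.replace(["男", "女"], [1, 0])
print(df["gender"].tolist())
print(encoded["gender"].tolist())

# Alternative for one column: map() with an explicit dictionary.
mapped = df["gender"].map({"男": 1, "女": 0})
print(mapped.tolist())
```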

2. Use the DataFrame.loc indexer with a boolean mask to keep only the rows where neither the salary nor the age column is 0:

>>> df = df.loc[~((df['salary'] == 0) | (df['age'] == 0))]
>>> df
out[]:
   salary   age   gender
0   10000    23     1
1   15000    34     0
2   23000    21     1
5   35000    37     1

You can also use the product of the two columns, which is 0 whenever either value is 0:

df = df.loc[df['salary'] * df['age'] != 0]
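
As a quick check, the two filters can be compared on a small stand-in frame (the data here is illustrative). The sketch also shows that filtering keeps the original index labels; reset_index(drop=True) is one way to renumber them if that is wanted.

```python
import pandas as pd

df = pd.DataFrame({"salary": [10000, 0, 28500, 35000],
                   "age": [23, 20, 0, 37]})

# Filter 1: explicit boolean mask; filter 2: product of the two columns.
by_mask = df.loc[~((df["salary"] == 0) | (df["age"] == 0))]
by_product = df.loc[df["salary"] * df["age"] != 0]

print(by_mask.equals(by_product))   # both filters keep the same rows
print(by_mask.index.tolist())       # original labels survive the filter

# Optional: renumber the remaining rows from 0.
renumbered = by_mask.reset_index(drop=True)
print(renumbered.index.tolist())
```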

2. Methods for deleting rows

1. Delete the rows in which every value is 0

The code is as follows:

>>> df.loc[~(df==0).all(axis=1)]

Equivalently, and somewhat more readably, keep the rows that have at least one non-zero value:

>>> df.loc[(df!=0).any(axis=1)]

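You can also mask the zeros as NaN and call dropna(). Note that dropna() defaults to how='any', so this removes every row that contains at least one 0, not only the rows that are entirely 0:

>>> new_df = df[df != 0].dropna()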
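
The snippets above do not all remove the same rows, which is easy to miss. A minimal sketch on an illustrative frame with one all-zero row and one partly-zero row:

```python
import pandas as pd

# Row 1 is entirely 0; row 2 has a single 0 (illustrative data).
df = pd.DataFrame({"a": [1, 0, 0, 4], "b": [2, 0, 3, 5]})

# Removes only rows in which every value is 0.
all_zero_removed = df.loc[(df != 0).any(axis=1)]

# Masks zeros as NaN, then drops rows containing any NaN.
any_zero_removed = df[df != 0].dropna()

print(all_zero_removed.index.tolist())
print(any_zero_removed.index.tolist())
print(any_zero_removed.dtypes)   # masking with NaN turns the columns into floats
```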

2. Replace 0 with NaN, drop the rows in which every value is NaN, then replace the remaining NaN values with 0.

The code is as follows:

import numpy as np
df = df.replace(0, np.nan)  # replace 0 with NaN
df = df.dropna(how='all', axis=0)  # drop the rows in which every value is NaN
df = df.replace(np.nan, 0)  # turn the remaining NaN values back into 0
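
Because NaN is a float, this round trip turns integer columns into floats. The sketch below chains the same three steps on an illustrative frame, using fillna(0) in place of the final replace(), and then casts back to the original dtypes. The cast at the end is an addition that is not part of the original recipe.

```python
import numpy as np
import pandas as pd

# Row 1 is entirely 0; row 2 has a single 0 (illustrative data).
df = pd.DataFrame({"a": [1, 0, 0], "b": [2, 0, 3]})

cleaned = (
    df.replace(0, np.nan)            # 0 -> NaN (columns become float)
      .dropna(how="all", axis=0)     # drop rows in which every value is NaN
      .fillna(0)                     # remaining NaN -> 0
      .astype(df.dtypes.to_dict())   # restore the original integer dtypes
)
print(cleaned)
```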

3. Delete the rows that have a value of 0 in a given column

The code is as follows:

>>> df = df[df['salary'] != 0]
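
The same single-column filter can be written in a few ways. The sketch below, on illustrative data, compares boolean indexing with DataFrame.query() and shows one way to extend the check to several columns at once; the column list `cols` is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"salary": [10000, 0, 28500], "age": [23, 20, 0]})

# Boolean indexing and query() select the same rows.
by_index = df[df["salary"] != 0]
by_query = df.query("salary != 0")
print(by_index.equals(by_query))

# Require every listed column to be non-zero.
cols = ["salary", "age"]
both_nonzero = df[(df[cols] != 0).all(axis=1)]
print(both_nonzero)
```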

4. Use a lambda function to delete rows

The code is as follows:
 

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randn(5,3),
                  index=['one', 'two', 'three', 'four', 'five'],
                  columns=list('abc'))

df.loc[['one', 'three']] = 0  # set the first and third rows to 0

print(df)
print(df.loc[~df.apply(lambda row: (row==0).all(), axis=1)])

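The first print shows the whole frame, with rows 'one' and 'three' set to 0; the second shows only rows 'two', 'four', and 'five', because the all-zero rows have been removed.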
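
The row-wise lambda gives the same result as the vectorized mask from method 1, which avoids calling a Python function once per row. A minimal sketch that checks the two against each other on the same frame:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3),
                  index=["one", "two", "three", "four", "five"],
                  columns=list("abc"))
df.loc[["one", "three"]] = 0  # zero out two rows

# Row-by-row check with apply() and a lambda.
via_apply = df.loc[~df.apply(lambda row: (row == 0).all(), axis=1)]

# Same check expressed as a vectorized mask.
vectorized = df.loc[~(df == 0).all(axis=1)]

print(via_apply.equals(vectorized))
print(via_apply.index.tolist())
```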
 

To drop every row that contains a 0 in any column:

new_df = df[df.loc[:]!=0].dropna()
new_df

The result keeps only rows 'two', 'four', and 'five', the rows that contain no 0.
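
For comparison, here is a small sketch of how the choice of axis changes what is removed: dropping the rows that contain a 0 versus dropping the columns that contain a 0. The frame is illustrative, and the column-wise variant is not taken from the code above.

```python
import pandas as pd

# A single 0 sits in row 1, column "b" (illustrative data).
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 0, 6], "c": [7, 8, 9]})

# Keep rows whose values are all non-zero (checks across columns).
rows_kept = df.loc[(df != 0).all(axis=1)]

# Keep columns whose values are all non-zero (checks down the rows).
cols_kept = df.loc[:, (df != 0).all(axis=0)]

print(rows_kept)
print(cols_kept)
```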

Origin blog.csdn.net/ex_6450/article/details/126867123