Data cleaning with numpy and pandas


Cleaning null values with numpy and pandas

There are two ways to clean null values:

  1. Locate the rows that contain null values with a boolean mask and delete them
  2. Use dropna to delete the rows or columns that have missing values directly

Methods used to detect and handle null values:

isnull: checks whether each value is null; returns True where the value is null and False otherwise

notnull: checks whether each value is not null; returns False where the value is null and True otherwise

any: used together with isnull; returns True if at least one value along the given axis is True

dropna: deletes the rows or columns that contain null values; axis=0 deletes rows and axis=1 deletes columns. This does not contradict any and all: there, axis=1 collapses the columns and therefore yields one result per row

fillna: fills null values; method='ffill' fills forward from the previous valid value, and method='bfill' fills backward from the next valid value
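
The examples in this post do not exercise fillna, so here is a minimal sketch of the filling behaviour described above. The frame `df_fill` and its values are hypothetical, chosen only for illustration. It uses `DataFrame.ffill()` and `DataFrame.bfill()`, which fill in the same directions as `fillna(method='ffill')` and `fillna(method='bfill')`; recent pandas releases deprecate the `method` argument of fillna, so the dedicated methods are likely the safer spelling.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps (values chosen only for illustration)
df_fill = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

filled_const = df_fill.fillna(0)  # replace every NaN with a constant
filled_fwd = df_fill.ffill()      # forward fill: copy the previous valid value downward
filled_bwd = df_fill.bfill()      # backward fill: copy the next valid value upward

print(filled_fwd)
print(filled_bwd)
```

Note that a forward fill cannot fill a gap in the first row, and a backward fill cannot fill a gap in the last row, because there is no neighbouring value to copy from.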

Examples:

Method one:

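import numpy as np
from pandas import DataFrame, Series
# Create a 7-row, 5-column DataFrame of random integers in the range [0, 100),
# stored as float so the columns can hold NaN without an implicit dtype change
df = DataFrame(data=np.random.randint(0, 100, size=(7, 5)).astype(float))
# Set three null values (positions are 0-based)
df.iloc[3, 4] = None    # row 3, column 4 becomes null
df.iloc[2, 2] = np.nan  # row 2, column 2 becomes NaN
df.iloc[5, 3] = None    # row 5, column 3 becomes null
df   # pandas automatically converts None to NaN
# Two ways to clean null values
# Way one: delete the rows that contain null values
# using isnull, notnull, any, all
df.isnull()   # True where a value is null, False otherwise
df.isnull().all(axis=1)   # axis=1 checks across the columns (one result per row); axis=0 gives one result per column
# all returns True only if every value is True; a single False makes it False
# any returns True if at least one value is True
df.isnull().any(axis=1)   # True for each row that contains at least one null
# Use the boolean Series as a row indexer: rows marked True are kept
df.loc[df.isnull().any(axis=1)]   # the rows that contain null values
drop_index = df.loc[df.isnull().any(axis=1)].index  # index labels of the rows that contain nulls
df.drop(labels=drop_index, axis=0)  # delete those rows (returns a new DataFrame; df is unchanged)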
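
The axis arguments in the code above are easy to mix up, so here is a small sketch that makes the two directions visible. The frame `df_axis` and its values are hypothetical. With any, axis=1 produces one boolean per row and axis=0 produces one boolean per column; with drop, axis=0 means the labels refer to rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only row 1 of column "x" is null
df_axis = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

mask = df_axis.isnull()
rows_with_null = mask.any(axis=1)  # one boolean per row
cols_with_null = mask.any(axis=0)  # one boolean per column

# Row labels whose mask value is True, then drop those rows
drop_index = rows_with_null[rows_with_null].index
dropped_rows = df_axis.drop(labels=drop_index, axis=0)

print(rows_with_null)
print(cols_with_null)
print(dropped_rows)
```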

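Method two:

df.notnull().all(axis=1)    # notnull is True where a value is not null; all(axis=1) is True only for rows with no null values
# Use the boolean Series as a row indexer
df.loc[df.notnull().all(axis=1)]
# keeps only the rows that contain no null values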

Example of the second way:

# Way two: dropna deletes the rows or columns that have missing values directly
df.dropna(axis=0)   # axis=0 deletes rows, axis=1 deletes columns
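
As a quick check that dropna does the same job as the boolean-mask approach, the sketch below compares the two on a hypothetical frame `df_na`. It also shows the `how`, `thresh` and `subset` parameters of dropna, which the post does not cover; they are included only to illustrate how the deletion rule can be loosened.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 has one null, row 2 is entirely null
df_na = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, np.nan],
})

any_dropped = df_na.dropna(axis=0)               # default how='any': drop rows with at least one null
all_dropped = df_na.dropna(axis=0, how="all")    # drop only rows where every value is null
thresh_dropped = df_na.dropna(thresh=2)          # keep rows with at least 2 non-null values
subset_dropped = df_na.dropna(subset=["b"])      # only look at column "b" when deciding

# Boolean-mask equivalent of the default dropna
mask_filtered = df_na.loc[df_na.notnull().all(axis=1)]

print(any_dropped.equals(mask_filtered))
```

dropna returns a new DataFrame and leaves `df_na` unchanged, so the result has to be assigned if it is needed later.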

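drop_duplicates(keep=False)

Removes duplicate rows; with keep=False, every row that has a duplicate is deleted

keep='first'

Keeps the first occurrence of each duplicated row and deletes the others (this is the default)

keep='last'

Keeps the last occurrence of each duplicated row and deletes the others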
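
A minimal sketch of the three keep options, assuming a hypothetical frame `df_dup` in which rows 0 and 2 are identical and rows 1 and 4 are identical. The surviving index labels show which occurrence each option keeps.

```python
import pandas as pd

# Hypothetical frame: rows 0/2 are duplicates, rows 1/4 are duplicates, row 3 is unique
df_dup = pd.DataFrame({
    "k": ["a", "b", "a", "c", "b"],
    "v": [1, 2, 1, 3, 2],
})

keep_first = df_dup.drop_duplicates(keep="first")  # keep the first occurrence of each duplicate
keep_last = df_dup.drop_duplicates(keep="last")    # keep the last occurrence of each duplicate
keep_none = df_dup.drop_duplicates(keep=False)     # drop every row that has a duplicate

print(keep_first.index.tolist())
print(keep_last.index.tolist())
print(keep_none.index.tolist())
```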

Origin www.cnblogs.com/g15009428458/p/12638646.html