Python 11 - pandas notes 03

Data preprocessing with pandas

 

- Dirty data

  • Null handling
  • Duplicate value handling
  • Outliers
  • Data type conversion

- Structural problems

  • Index settings

*****************************************************************************************************************

  • Null handling

                * View: df.isnull()

                * Remove: df.dropna() by default deletes every row that contains a null value. To delete a row only when all of its values are null, use df.dropna(how="all").

                * Fill: df.fillna() replaces null values. df.fillna({"Gender": "M", "age": "30"}) fills several columns with different values.
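
The three null-handling calls above can be tried on a small frame. This is a minimal sketch: the sample data, the extra "city" column, and the variable names are made up for illustration, and the fill value for "age" is the number 30 rather than the string "30" because the sample column is numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "Gender" and "age" mirror the note above.
df = pd.DataFrame({
    "Gender": ["F", np.nan, "M", np.nan],
    "age": [25.0, np.nan, 41.0, np.nan],
    "city": ["Beijing", "Shanghai", np.nan, np.nan],
})

# View: boolean mask of missing values, plus a per-column count.
null_mask = df.isnull()
null_counts = null_mask.sum()

# Remove: the default drops every row with at least one null;
# how="all" drops only rows in which every value is null.
dropped_any = df.dropna()
dropped_all = df.dropna(how="all")

# Fill: a dict gives each listed column its own fill value.
# Columns not listed in the dict ("city") are left untouched.
filled = df.fillna({"Gender": "M", "age": 30})

print(null_counts)
print(dropped_any)
print(dropped_all)
print(filled)
```

None of these calls modifies df in place; each returns a new object, so the result has to be assigned to a name to be kept.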

 

  • Duplicate value handling

                * Remove: df.drop_duplicates() by default keeps the first occurrence of each duplicated row; keep="last" keeps the last occurrence; keep=False deletes all duplicates.
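
The effect of the keep parameter is easiest to see by comparing which row labels survive. A small sketch, assuming a made-up frame with two duplicated rows (the column names are hypothetical):

```python
import pandas as pd

# Hypothetical sample data: rows 1/2 and rows 3/4 are exact duplicates.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 3],
    "amount": [10, 20, 20, 30, 30],
})

# Default (keep="first"): the first occurrence of each row is kept.
keep_first = df.drop_duplicates()

# keep="last": the last occurrence of each row is kept.
keep_last = df.drop_duplicates(keep="last")

# keep=False: every row that has a duplicate is removed.
keep_none = df.drop_duplicates(keep=False)

print(keep_first.index.tolist())
print(keep_last.index.tolist())
print(keep_none.index.tolist())
```

The original row labels are preserved in each result, which makes it possible to tell which copy was retained.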

 

  • Outlier detection and treatment

                * Detection: outliers are values that are too high or too low compared with normal data (values beyond a specified normal range, points beyond the upper and lower edges of a box plot, or values that deviate from the mean by more than 3σ under a normal distribution).

                * Processing: delete them, fill them, or study them separately as special values. In Python this is done with filtering, replace(), etc.
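
Two of the detection rules above, the box plot edges and the 3σ rule, can be sketched as boolean masks. The data and variable names are made up, the 1.5 × IQR fence is the conventional box plot definition rather than something stated in these notes, and replacing the outlier with the median is only one possible fill strategy.

```python
import pandas as pd

# Hypothetical data: 20 ordinary values and one extreme value.
s = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0] * 4 + [100.0], name="value")

# Box plot rule (assumed 1.5 * IQR fences): flag points outside
# [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
iqr_outliers = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)

# 3-sigma rule: flag points more than 3 standard deviations from the mean.
z = (s - s.mean()) / s.std()
sigma_outliers = z.abs() > 3

# Processing option 1: delete the outliers by filtering with the mask.
deleted = s[~iqr_outliers]

# Processing option 2: fill the outliers, here with the median.
filled = s.mask(iqr_outliers, s.median())

print(s[iqr_outliers])
print(s[sigma_outliers])
print(len(deleted), filled.iloc[-1])
```

On small samples the 3σ rule can miss outliers that the box plot rule catches, because a single extreme value inflates the standard deviation it is measured against.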

 

  • Data type conversion

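                * View: df.dtypes shows the data type of every column (dtype is an attribute, not a method; use df["col"].dtype for a single column)

                * Convert: .astype("float64") converts the data type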
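
A short sketch of viewing and converting types, assuming a made-up frame in which prices were read in as strings (column and variable names are hypothetical):

```python
import pandas as pd

# Hypothetical sample data: "price" holds numbers stored as text.
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "price": ["9.5", "12", "3.25"],
})

# View: dtypes is an attribute of the frame, dtype of a single column.
dtypes_before = df.dtypes
price_dtype_before = df["price"].dtype

# Convert: astype returns a new Series, so assign it back to the column.
df["price"] = df["price"].astype("float64")

print(dtypes_before)
print(df.dtypes)
print(df["price"].sum())
```

astype raises an error if any value cannot be parsed as the target type, so it is worth checking the column contents first.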


 

  • Index Settings

                * Add index: df.index = [1, 2, 3, 4, 5]

                * Set index: df.set_index("Order Number") uses the "Order Number" column as the new index. For a hierarchical index, pass a list of two or more column names to set_index().

                * Rename index: df.rename(index={1: "one", 2: "two", 3: "three"}, columns={"Order Number": "New Order ID"})

                * Reset index: for a hierarchical index, df.reset_index() converts all index levels into columns by default

                                      df.reset_index(level=0) converts index level 0 into a column; df.reset_index(level=1) converts index level 1 into a column
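
The index operations above can be run end to end on a small frame. This is a sketch under assumptions: only the "Order Number" column name comes from the notes, while the other columns, the values, and the variable names are made up.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Order Number": ["A1", "A2", "A3"],
    "Customer": ["Li", "Wang", "Zhao"],
    "Amount": [100, 250, 80],
})

# Add index: assign a list with one label per row.
df.index = [1, 2, 3]

# Set index: one column gives a flat index,
# a list of columns gives a hierarchical (multi-level) index.
by_order = df.set_index("Order Number")
multi = df.set_index(["Order Number", "Customer"])

# Rename: dicts map old row labels and old column names to new ones.
renamed = df.rename(
    index={1: "one", 2: "two", 3: "three"},
    columns={"Order Number": "New Order ID"},
)

# Reset index: by default every index level becomes a column again;
# level=0 or level=1 moves only that level back into the columns.
flat = multi.reset_index()
level0 = multi.reset_index(level=0)
level1 = multi.reset_index(level=1)

print(by_order)
print(renamed)
print(flat.columns.tolist())
print(level0.index.name, level0.columns.tolist())
print(level1.index.name, level1.columns.tolist())
```

set_index, rename, and reset_index all return new frames here, so df itself keeps the [1, 2, 3] index throughout.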


Origin blog.csdn.net/xiuxiuxiu666/article/details/104317098