Pandas Common Operations Quick Reference

A personal collection of common operations gathered online, with notes added along the way.
Environment: Anaconda, Python 3.7


In this quick reference guide, we use the following abbreviations:

df: an arbitrary Pandas DataFrame object

s: an arbitrary Pandas Series object

The examples also assume the following imports:

import pandas as pd

import numpy as np

Import Data

  • pd.read_csv(filename): import data from a CSV file
  • pd.read_table(filename): import data from a delimited text file (tab-separated by default)
  • pd.read_excel(filename): import data from an Excel file
  • pd.read_sql(query, connection_object): import data from a SQL table/database
  • pd.read_json(json_string): import data from a JSON-formatted string
  • pd.read_html(url): parse a URL, HTML file, or string and extract the tables it contains
  • pd.read_clipboard(): read the clipboard contents and pass them to read_table()
  • pd.DataFrame(dict): build a DataFrame from a dict, where keys are the column names and values are the data
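A minimal sketch tying a few of these together — building a DataFrame from a dict and round-tripping it through an in-memory CSV buffer (the column names and data here are made up for illustration):

```python
import io

import pandas as pd

# Build a DataFrame from a dict: keys become column names, values the data.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25]})

# Write to an in-memory CSV buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

# Read it back; read_csv accepts a filename or any file-like object.
df2 = pd.read_csv(buf)
```

The same pattern works with a real filename in place of the buffer.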

Export Data

  • df.to_csv(filename): export data to a CSV file
  • df.to_excel(filename): export data to an Excel file
  • df.to_sql(table_name, connection_object): export data to a SQL table
  • df.to_json(filename): export data to a text file in JSON format
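A small round-trip sketch for the JSON path (the frame contents are made up; with no argument, to_json returns the JSON string instead of writing a file):

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

# to_json accepts a filename; called with no argument it returns the string.
json_str = df.to_json()

# read_json takes a path or file-like object, so wrap the string in StringIO.
df2 = pd.read_json(io.StringIO(json_str))
```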

Create a test object

  • pd.DataFrame(np.random.rand(20, 5)): create a DataFrame of 20 rows and 5 columns of random numbers
  • pd.Series(my_list): create a Series object from the iterable my_list
  • df.index = pd.date_range('1900/1/30', periods=df.shape[0]): add a date index
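The three bullets above can be combined into one small, reproducible test object (the seed is added here so the "random" data is deterministic):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fix the seed so the random test object is reproducible
df = pd.DataFrame(np.random.rand(20, 5))  # 20 rows x 5 columns of random numbers
s = pd.Series([10, 20, 30])               # a Series from a plain list

# Replace the default integer index with a date index of matching length.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])
```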

Inspect Data

  • df.head(n): view the first n rows of a DataFrame
  • df.tail(n): view the last n rows of a DataFrame
  • df.shape: view the number of rows and columns (an attribute, not a method — calling it with parentheses raises an error)
  • df.info(): view the index, data types, and memory information
  • df.columns: view the column names
  • df.index: view the index
  • df.describe(): view summary statistics for numeric columns (count, mean, std, min, max, quartiles)
  • s.value_counts(dropna=False): view the unique values of a Series and their counts
  • df.apply(pd.Series.value_counts): view the unique values and counts for every column of a DataFrame
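A quick sketch of the inspection calls on a tiny made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2, 3], "y": ["a", "b", "b", "b"]})

first_two = df.head(2)   # first 2 rows
last_one = df.tail(1)    # last row
shape = df.shape         # attribute access: no parentheses
counts = df["y"].value_counts(dropna=False)  # unique values and their counts
```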

Data Selection

  • df[col]: select a column by name, returned as a Series
  • df[[col1, col2]]: select multiple columns, returned as a DataFrame
  • s.iloc[0]: select data by integer position (supports indexing and slicing)
  • s.loc['index_one']: select data by index label
  • df.iloc[0, :]: return the first row (the colon means "from start to end"; a slice length can also be specified)
  • df.iloc[0, 0]: return the first element of the first column
  • df.iloc[:, 0]: return the first column
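The position-based (.iloc) and label-based (.loc) selectors side by side, on an invented frame with string row labels:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["r1", "r2", "r3"])

col = df["a"]                 # one column as a Series
sub = df[["a", "b"]]          # several columns as a DataFrame
first_row = df.iloc[0, :]     # first row, by integer position
cell = df.iloc[0, 0]          # first element of the first column
by_label = df.loc["r2", "b"]  # .loc selects by index label, not position
```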

Data Cleansing

  • df.columns = ['a', 'b', 'c']: rename the columns
  • df.isnull().any(): check a DataFrame for null values; returns a Boolean value per column
  • df.notnull().any(): check a DataFrame for non-null values; returns a Boolean value per column
  • df[df.notnull().all(axis=1)]: keep only the rows with no null values (equivalent to df.dropna())
  • df[df[col].notnull()]: keep only the rows where column col is not null
  • df.dropna(): drop all rows that contain null values
  • df.dropna(axis=1): drop all columns that contain null values
  • df.dropna(axis=1, thresh=n): drop all columns with fewer than n non-null values
  • df.fillna(x): replace all null values in a DataFrame with x
  • s.astype(float): change the data type of a Series to float
  • s.replace(1, 'one'): replace all values equal to 1 with 'one' (note: in testing, replacing values in an int column can change the whole column's dtype)
  • s.replace([1, 3], ['one', 'three']): replace 1 with 'one' and 3 with 'three'
  • df.rename(columns=lambda x: x + 1): rename columns in bulk
  • df.rename(columns={'old_name': 'new_name'}): selectively rename a column
  • df.set_index('column_one'): change the index column
  • df.rename(index=lambda x: x + 1): rename the index in bulk
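The cleansing calls above, sketched on a small frame with deliberately missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

has_null = df.isnull().any()        # per-column: does it contain nulls?
non_null_a = df[df["a"].notnull()]  # keep rows where column "a" is set
filled = df.fillna(0)               # replace every null with 0
dropped = df.dropna()               # drop every row containing a null
renamed = df.rename(columns={"a": "alpha"})  # selectively rename a column
```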

Data processing: Filter, Sort and GroupBy

  • df[df[col] > 0.5]: select the rows where the values in column col are greater than 0.5
  • df.sort_values(col1): sort the data by column col1, in ascending order by default
  • df.sort_values(col2, ascending=False): sort the data by column col2 in descending order
  • df.sort_values([col1, col2], ascending=[True, False]): sort by column col1 in ascending order, then by column col2 in descending order
  • df.groupby(col): return a GroupBy object grouped by column col (printing it shows only the object, not the data)
  • df.groupby([col1, col2]): return a GroupBy object grouped by multiple columns
  • df.groupby(col1)[col2]: select column col2 after grouping by col1; chain an aggregation such as .mean() to get values (otherwise a lazy GroupBy object is returned)
  • df.pivot_table(index=col1, values=[col2, col3], aggfunc=max): create a pivot table grouped by col1 that computes the maximum of col2 and col3
    • customer_data.pivot_table(index='REFER', values='Age', aggfunc=[max, min]): maximum and minimum age for each channel
  • df.groupby(col1).agg(np.mean): return the mean of all columns, grouped by column col1
    • Often used to show a per-group average, e.g. the average age within each channel (rather than the max/min over the whole dataset)
  • data.apply(np.mean): apply the function np.mean to each column of a DataFrame
  • data.apply(np.max, axis=1): apply the function np.max to each row of a DataFrame
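A sketch of filtering, sorting, grouping, and pivoting on an invented channel/age frame (mirroring the customer_data example above):

```python
import pandas as pd

df = pd.DataFrame({
    "channel": ["A", "A", "B", "B"],
    "age": [20, 30, 40, 50],
})

# Per-group mean: group by channel, select the age column, then aggregate.
mean_age = df.groupby("channel")["age"].mean()

# Pivot table computing several aggregates per group at once.
pivot = df.pivot_table(index="channel", values="age", aggfunc=["max", "min"])

# Filtering and sorting.
adults = df[df["age"] > 25]
by_age_desc = df.sort_values("age", ascending=False)
```

Note that `pivot` has a two-level column index here, e.g. `("max", "age")`.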

Data processing: Add a new column

  • Add a new column computed from existing columns:
    • frame['test'] = frame.apply(lambda x: function(x.city, x.year), axis = 1)
    • function is a user-defined function written beforehand
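A runnable version of the bullet above; the frame contents and the helper make_label are hypothetical stand-ins for the user-defined function:

```python
import pandas as pd

frame = pd.DataFrame({"city": ["NY", "LA"], "year": [2001, 2002]})

def make_label(city, year):
    # Hypothetical user-defined function combining two existing columns.
    return f"{city}-{year}"

# axis=1 passes each row to the lambda, so x.city and x.year are row fields.
frame["test"] = frame.apply(lambda x: make_label(x.city, x.year), axis=1)
```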

Data Merge

  • df1.append(df2): append the rows of df2 to the end of df1 (deprecated in recent pandas; use pd.concat([df1, df2]))
  • pd.concat([df1, df2], axis=1): append the columns of df2 to the end of df1
  • df1.join(df2, on=col1, how='inner'): perform a SQL-style join on the columns of df1 and df2
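A sketch of the three merge styles on two invented frames; merge is used for the column-on-column join (join itself matches against the other frame's index by default):

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
df2 = pd.DataFrame({"key": ["b", "c"], "y": [3, 4]})

rows = pd.concat([df1, df2])          # stack rows (append-style)
cols = pd.concat([df1, df2], axis=1)  # place the frames side by side
inner = df1.merge(df2, on="key", how="inner")  # SQL-style inner join on a column
```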

Statistics

    • df.describe(): view summary statistics for the numeric columns
    • df.mean(): return the mean of each column
    • df.corr(): return the correlation coefficients between columns
    • df.count(): return the number of non-null values in each column
    • df.max(): return the maximum of each column
    • df.min(): return the minimum value of each column
    • df.median(): return the median of each column
    • df.std(): return the standard deviation of each column
      • A larger value indicates more dispersed data
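The statistics calls on a tiny made-up frame whose columns are perfectly correlated, so the expected results are easy to verify by hand:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0]})

means = df.mean()      # per-column mean
maxima = df.max()      # per-column maximum
medians = df.median()  # per-column median
counts = df.count()    # non-null values per column
corr = df.corr()       # correlation coefficients between columns
std = df.std()         # standard deviation: larger means more dispersed
```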


Origin www.cnblogs.com/valorchang/p/11387005.html