Common pandas Functions

Import Data

  • pd.read_csv(filename): import data from a CSV file
  • pd.read_table(filename): import data from a delimited text file (tab-separated by default)
  • pd.read_excel(filename): import data from an Excel file
  • pd.read_sql(query, connection_object): import data from a SQL table or database
  • pd.read_json(json_string): import data from a JSON-formatted string
  • pd.read_html(url): parse a URL, HTML file, or string and extract the tables it contains
  • pd.read_clipboard(): read the contents of the clipboard and pass them to read_table()
  • pd.DataFrame(dict): build a DataFrame from a dictionary; each key is a column name and each value is that column's data
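
The import functions above can be tried without any files on disk. The following is a minimal sketch that reads from in-memory strings; the sample data and variable names are made up for illustration, and the Excel, SQL, HTML, and clipboard readers are left out because they need extra packages or external resources.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
csv_text = "name,score\nalice,90\nbob,85\n"

# read_csv accepts a path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table is the same reader with a tab as the default separator.
tsv_text = csv_text.replace(",", "\t")
df_tsv = pd.read_table(io.StringIO(tsv_text))

# Wrapping the JSON text in StringIO works across pandas versions.
json_text = '[{"name": "alice", "score": 90}, {"name": "bob", "score": 85}]'
df_json = pd.read_json(io.StringIO(json_text))

# Dictionary keys become column names, values become the column data.
df_dict = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

print(df_csv)
print(df_dict)
```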

Export Data

  • df.to_csv(filename): export data to a CSV file
  • df.to_excel(filename): export data to an Excel file
  • df.to_sql(table_name, connection_object): export data to a SQL table
  • df.to_json(filename): export data to a text file in JSON format
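
As a rough illustration of the export functions, the sketch below writes to strings and to an in-memory SQLite database instead of real files. The table name "scores" and the sample data are assumptions made for the example; to_excel is omitted because it needs an additional Excel writer package.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

# With no path given, to_csv and to_json return the text instead of writing a file.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")

# to_sql accepts a sqlite3 connection; "scores" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)

# Read the table back to confirm the round trip.
restored = pd.read_sql("SELECT * FROM scores", conn)
conn.close()

print(csv_text)
print(json_text)
print(restored)
```

Passing a filename as the first argument to to_csv or to_json writes the same text to disk.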

Create Test Objects

  • pd.DataFrame(np.random.rand(20, 5)): create a DataFrame of random numbers with 20 rows and 5 columns
  • pd.Series(my_list): create a Series from the iterable my_list
  • df.index = pd.date_range('1900/1/30', periods=df.shape[0]): add a date index
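
Put together, the three calls above look roughly like this. The contents of my_list are placeholder values chosen for the example.

```python
import numpy as np
import pandas as pd

# 20 rows x 5 columns of uniform random numbers in [0, 1).
df = pd.DataFrame(np.random.rand(20, 5))

# A Series built from a plain Python list (placeholder values).
my_list = [10, 20, 30]
s = pd.Series(my_list)

# Replace the default integer index with one date per row,
# starting at 1900-01-30 and advancing one day at a time.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])

print(df.head())
print(s)
```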

View and Inspect Data

Data Selection

  • df[col]: select a column by name and return it as a Series
  • df[[col1, col2]]: return multiple columns as a DataFrame
  • s.iloc[0]: select data by position
  • s.loc['index_one']: select data by index label
  • df.iloc[0, :]: return the first row
  • df.iloc[0, 0]: return the first element of the first column
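
The difference between label-based and position-based selection is easiest to see on a small frame. This is a sketch with invented column names and index labels, not data from any real source.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0], "c": ["x", "y", "z"]},
    index=["index_one", "index_two", "index_three"],
)

col = df["a"]          # a single column name returns a Series
sub = df[["a", "b"]]   # a list of column names returns a DataFrame

s = df["b"]
by_position = s.iloc[0]          # first element, by integer position
by_label = s.loc["index_one"]    # the same element, by index label

first_row = df.iloc[0, :]    # first row, returned as a Series
first_cell = df.iloc[0, 0]   # first row of the first column

print(first_row)
print(by_position, by_label, first_cell)
```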

Data Cleansing

  • df.columns = ['a', 'b', 'c']: rename the columns
  • pd.isnull(df): check the DataFrame for null values and return a Boolean mask
  • pd.notnull(df): check the DataFrame for non-null values and return a Boolean mask
  • df.dropna(): drop all rows that contain null values
  • df.dropna(axis=1): drop all columns that contain null values
  • df.dropna(axis=1, thresh=n): drop all columns that have fewer than n non-null values
  • df.fillna(x): replace all null values in the DataFrame with x
  • s.astype(float): convert the data type of the Series to float
  • s.replace(1, 'one'): replace every value equal to 1 with 'one'
  • s.replace([1, 3], ['one', 'three']): replace 1 with 'one' and 3 with 'three'
  • df.rename(columns=lambda x: x + 1): rename all columns in bulk
  • df.rename(columns={'old_name': 'new_name'}): rename selected columns
  • df.set_index('column_one'): use column_one as the index
  • df.rename(index=lambda x: x + 1): rename all index labels in bulk
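
A short sketch of the cleaning calls on a small frame with a few missing values. The data and the name "alpha" are invented for the example. Note that none of these methods modify df in place; each returns a new object.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0], "c": [np.nan, np.nan, 9.0]}
)

null_mask = pd.isnull(df)                   # True wherever a value is missing
rows_kept = df.dropna()                     # keeps rows with no nulls at all
cols_kept = df.dropna(axis=1)               # keeps columns with no nulls at all
thresh_kept = df.dropna(axis=1, thresh=2)   # keeps columns with >= 2 non-null values
filled = df.fillna(0)                       # nulls replaced by 0

s = pd.Series([1, 2, 3])
as_float = s.astype(float)
replaced = s.replace([1, 3], ["one", "three"])

renamed = df.rename(columns={"a": "alpha"})  # rename one selected column
indexed = df.set_index("b")                  # column "b" becomes the index

print(thresh_kept)
print(replaced)
```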

Data processing: Filter, Sort and GroupBy

  • df[df[col] > 0.5]: select the rows where column col is greater than 0.5
  • df.sort_values(col1): sort the data by column col1, in ascending order by default
  • df.sort_values(col2, ascending=False): sort the data by column col2 in descending order
  • df.sort_values([col1, col2], ascending=[True, False]): sort by col1 in ascending order, then by col2 in descending order
  • df.groupby(col): return a GroupBy object grouped by column col
  • df.groupby([col1, col2]): return a GroupBy object grouped by multiple columns
  • df.groupby(col1)[col2].mean(): return the mean of column col2, grouped by column col1
  • df.pivot_table(index=col1, values=[col2, col3], aggfunc=max): create a pivot table grouped by col1 that computes the maximum of col2 and col3
  • df.groupby(col1).agg(np.mean): return the mean of every column, grouped by column col1
  • df.apply(np.mean): apply the function np.mean to each column of the DataFrame
  • df.apply(np.max, axis=1): apply the function np.max to each row of the DataFrame
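
The filtering, sorting, and grouping calls can be sketched as follows. The column names "team", "score", and "bonus" are invented for the example. The sketch passes the aggregation names as strings ("mean", "max") rather than np.mean or the built-in max, which is an assumption about style: recent pandas versions accept both forms but may warn about the callable form.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["x", "y", "x", "y"],
        "score": [0.25, 0.5, 0.75, 1.0],
        "bonus": [1, 2, 3, 4],
    }
)

# Boolean filter: keep rows whose score exceeds 0.5.
high = df[df["score"] > 0.5]

# Sort by team ascending, then by score descending within each team.
ordered = df.sort_values(["team", "score"], ascending=[True, False])

# Mean of one column per group, and mean of every column per group.
mean_score = df.groupby("team")["score"].mean()
all_means = df.groupby("team").agg("mean")

# Pivot table: maximum of score and bonus for each team.
pivot = df.pivot_table(index="team", values=["score", "bonus"], aggfunc="max")

# apply works column-wise by default and row-wise with axis=1;
# only the numeric columns are passed so that np.mean is well defined.
numeric = df[["score", "bonus"]]
col_means = numeric.apply(np.mean)
row_max = numeric.apply(np.max, axis=1)

print(ordered)
print(pivot)
```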

Data Merge

  • df1.append(df2): append the rows of df2 to the end of df1 (removed in pandas 2.0; use pd.concat([df1, df2]) instead)
  • pd.concat([df1, df2], axis=1): append the columns of df2 to the end of df1
  • df1.join(df2, on=col1, how='inner'): perform a SQL-style join between the columns of df1 and the columns of df2
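
A minimal sketch of the three ways of combining frames, using small invented frames. It uses pd.concat for stacking rows, since DataFrame.append is no longer available in pandas 2.0 and later. In the join, the "key" column of df1 is matched against the index of df2.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "left": [1, 2, 3]})
df2 = pd.DataFrame({"right": [10, 30]}, index=["a", "c"])
df3 = pd.DataFrame({"extra": [7, 8, 9]})

# Stack rows: the rows of the second frame follow those of the first.
stacked = pd.concat([df1, df1], ignore_index=True)

# Place columns side by side, aligned on the row index.
side_by_side = pd.concat([df1, df3], axis=1)

# SQL-style inner join: df1["key"] is matched against df2's index,
# so only keys present in both frames survive.
joined = df1.join(df2, on="key", how="inner")

print(stacked)
print(side_by_side)
print(joined)
```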

Statistics

  • df.describe(): show summary statistics for the numeric columns
  • df.mean(): return the mean of each column
  • df.corr(): return the correlation coefficients between columns
  • df.count(): return the number of non-null values in each column
  • df.max(): return the maximum value of each column
  • df.min(): return the minimum value of each column
  • df.median(): return the median of each column
  • df.std(): return the standard deviation of each column
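
The statistics methods all skip null values by default, which the following sketch makes visible with one missing entry. The data is invented so that column "b" is exactly twice column "a".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 6.0, np.nan]})

summary = df.describe()   # count, mean, std, min, quartiles, max per column
means = df.mean()         # the NaN in "b" is ignored
counts = df.count()       # non-null values per column
corr = df.corr()          # pairwise correlation between columns
maxima = df.max()
minima = df.min()
medians = df.median()
stds = df.std()           # sample standard deviation (ddof=1)

print(summary)
print(corr)
```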

Origin blog.51cto.com/13132323/2447986