Abbreviations and key package imports
Abbreviations:
df: an arbitrary pandas DataFrame object
s: an arbitrary pandas Series object
Import the packages:
import pandas as pd
import numpy as np
Import data
pd.read_csv(filename): import data from a CSV file
pd.read_table(filename): import data from a delimited text file (e.g. TSV)
pd.read_excel(filename): import data from an Excel file
pd.read_sql(query, connection_object): import data from a SQL table/database
pd.read_json(json_string): import data from a JSON-formatted string
pd.read_html(url): parse a URL, HTML file, or string and extract its tables into a list of DataFrames
pd.read_clipboard(): grab the clipboard contents and pass them to read_table()
pd.DataFrame(dict): import data from a dict; keys become the column names, values the data
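A minimal sketch of two of the import routes above (the CSV text and column names are invented for illustration). `read_csv` accepts a path or any file-like object, so `io.StringIO` stands in for a file on disk:

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice you would pass a filename path.
csv_text = "name,score\nalice,90\nbob,85\n"

# pd.read_csv(filename): works with a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# pd.DataFrame(dict): keys become column names, values become the data.
df2 = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

print(df.shape)        # (2, 2)
print(df.equals(df2))  # True: both routes build the same frame
```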
Export data
df.to_csv(filename): export data to a CSV file
df.to_excel(filename): export data to an Excel file
df.to_sql(table_name, connection_object): export data to a SQL table
df.to_json(filename): export data to a JSON-formatted text file
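A quick round-trip sketch for the export side (the data is invented for the example). Calling `to_csv()` with no path returns the CSV text as a string, which makes the behavior easy to check without writing a file:

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# index=False omits the row index so the file round-trips cleanly.
csv_text = df.to_csv(index=False)

# Read it back and confirm nothing was lost.
df2 = pd.read_csv(io.StringIO(csv_text))
print(df.equals(df2))  # True
```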
Create test objects
pd.DataFrame(np.random.rand(20, 5)): create a DataFrame of 20 rows of random numbers in 5 columns
pd.Series(my_list): create a Series from an iterable my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]): add a date index
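Putting the three lines above together (the list values are arbitrary):

```python
import numpy as np
import pandas as pd

# 20 rows x 5 columns of random numbers.
df = pd.DataFrame(np.random.rand(20, 5))

# A Series from an ordinary Python list.
my_list = [10, 20, 30]
s = pd.Series(my_list)

# Replace the default RangeIndex with consecutive dates.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])

print(df.shape)            # (20, 5)
print(df.index[0].date())  # 1900-01-30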
Viewing and inspecting data
df.head(n): view the first n rows of the DataFrame
df.tail(n): view the last n rows of the DataFrame
df.shape: the number of rows and columns (an attribute, not a method)
df.info(): view the index, data types, and memory usage
df.describe(): summary statistics for the numeric columns
s.value_counts(dropna=False): view the unique values of a Series and their counts
df.apply(pd.Series.value_counts): view the unique values and counts for every column of a DataFrame
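A small inspection example on invented data; note that `shape` is an attribute, so it takes no parentheses:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 1, 2, 2]})

print(len(df.head(2)))  # 2: the first two rows
print(df.shape)         # (4, 2) -- attribute, no parentheses

# Unique values of column y and how often each occurs.
counts = df["y"].value_counts(dropna=False)
print(counts.loc[1])    # 2: the value 1 appears twice
```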
Data selection
df[col]: select the column named col and return it as a Series
df[[col1, col2]]: select multiple columns and return them as a DataFrame
s.iloc[0]: select by position
s.loc['index_one']: select by index label
df.iloc[0, :]: return the first row
df.iloc[0, 0]: return the first element of the first column
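Label-based (`loc`) versus position-based (`iloc`) selection on a toy frame (the index labels r1/r2 are made up):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

print(df["a"].tolist())      # single column as a Series -> [1, 2]
print(df[["a", "b"]].shape)  # list of columns -> DataFrame, (2, 2)
print(df.iloc[0, 0])         # by position: first row, first column -> 1
print(df.loc["r2", "b"])     # by label -> 4
```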
Data cleaning
df.columns = ['a', 'b', 'c']: rename the columns
pd.isnull(obj): check for null values and return a Boolean array
pd.notnull(obj): check for non-null values and return a Boolean array
df.dropna(): drop all rows that contain null values
df.dropna(axis=1): drop all columns that contain null values
df.dropna(axis=1, thresh=n): drop all columns with fewer than n non-null values
df.duplicated(): flag duplicated rows
df.drop_duplicates(): drop duplicated rows, optionally restricted to particular columns
df.fillna(x): replace all null values in the DataFrame with x
s.astype(float): change the data type of the Series to float
s.replace(1, 'one'): replace all values equal to 1 with 'one'
s.replace([1, 3], ['one', 'three']): replace 1 with 'one' and 3 with 'three'
df.rename(columns=lambda x: x + 1): rename the columns in bulk
df.rename(columns={'old_name': 'new_name'}): rename selected columns
df.set_index('column_one'): use column_one as the index
df.rename(index=lambda x: x + 1): rename the index in bulk
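The null-handling and replace calls above, sketched on a frame with a few NaNs (the values are arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": [np.nan, np.nan, 6.0]})

print(df.isnull().sum().tolist())  # nulls per column -> [1, 2]

filled = df.fillna(0)   # every NaN replaced by 0
dropped = df.dropna()   # only rows with no NaN survive
print(len(dropped))     # 1

s = pd.Series([1, 2, 3]).replace([1, 3], ["one", "three"])
print(s.tolist())       # ['one', 2, 'three']
```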
Data processing: filter, sort, and group by
df[df[col] > 0.5]: select the rows where the value in column col is greater than 0.5
df.sort_values(col1): sort by column col1, ascending by default
df.sort_values(col2, ascending=False): sort by column col2 in descending order
df.sort_values([col1, col2], ascending=[True, False]): sort by col1 ascending, then by col2 descending
df.groupby(col): return a GroupBy object grouped by column col
df.groupby([col1, col2]): return a GroupBy object grouped by multiple columns
df.groupby(col1)[col2].mean(): return the mean of column col2, grouped by the values in col1
df.pivot_table(index=col1, values=[col2, col3], aggfunc=max): create a pivot table grouped by col1 that computes the maximum of col2 and col3
df.groupby(col1).agg(np.mean): return the mean of every column, grouped by col1
df.apply(np.mean): apply the function np.mean to each column of the DataFrame
df.apply(np.max, axis=1): apply the function np.max to each row of the DataFrame
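Filtering, sorting, and grouping on a three-row example (column names and values invented):

```python
import pandas as pd

df = pd.DataFrame({"group": ["a", "a", "b"],
                   "value": [1.0, 3.0, 10.0]})

big = df[df["value"] > 0.5]                     # boolean row filter
by_group = df.groupby("group")["value"].mean()  # mean of value per group

print(by_group["a"])  # 2.0
print(df.sort_values("value", ascending=False)["value"].tolist())
# [10.0, 3.0, 1.0]
```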
Combining data
pd.concat([df1, df2]): append the rows of df2 to the end of df1 (df1.append(df2) did the same but was removed in pandas 2.0)
pd.concat([df1, df2], axis=1): append the columns of df2 to the end of df1
df1.join(df2, on=col1, how='inner'): SQL-style join of the columns of df1 with the columns of df2
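Row-wise versus column-wise concatenation (the frames are invented; `pd.concat` is the modern replacement for the removed `df1.append`):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})

# Rows of df2 appended under df1 (ignore_index renumbers the rows).
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked["a"].tolist())  # [1, 2, 3, 4]

# axis=1 places the frames side by side as extra columns.
side = pd.concat([df1, df2.rename(columns={"a": "b"})], axis=1)
print(side.shape)             # (2, 2)
```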
Statistics
df.describe(): view summary statistics for the numeric columns
df.mean(): return the mean of every column
df.corr(): return the correlation coefficients between the columns
df.count(): return the number of non-null values in each column
df.max(): return the maximum of each column
df.min(): return the minimum of each column
df.median(): return the median of each column
df.std(): return the standard deviation of each column
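The column statistics above applied to a single numeric column (values arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

print(df["x"].mean())    # 2.5
print(df["x"].median())  # 2.5
print(df["x"].std())     # sample standard deviation (ddof=1)
print(df["x"].count())   # 4 non-null values
```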