Python pandas API documentation

Abbreviations and key package imports

Abbreviations:

df: an arbitrary pandas DataFrame object
s: an arbitrary pandas Series object

Import the packages:

import pandas as pd 
import numpy as np

Import data
pd.read_csv(filename): import data from a CSV file
pd.read_table(filename): import data from a delimited text file (tab-separated by default)
pd.read_excel(filename): import data from an Excel file
pd.read_sql(query, connection_object): import data from a SQL table/database
pd.read_json(json_string): import data from a JSON-formatted string
pd.read_html(url): parse a URL, HTML file, or string and extract its tables
pd.read_clipboard(): get the clipboard contents and pass them to read_table()
pd.DataFrame(dict): build a DataFrame from a dict; keys become column names, values the data
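
The import helpers above can be sketched with an in-memory buffer standing in for a filename (the column names and values here are made up for illustration):

```python
import io
import pandas as pd

# pd.read_csv(filename): an io.StringIO buffer stands in for a file path here.
csv_text = "name,score\nalice,90\nbob,85\n"
df = pd.read_csv(io.StringIO(csv_text))

# pd.DataFrame(dict): keys become column names, values become the data.
df2 = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})
```

Both constructions produce the same two-column DataFrame.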

Export data
df.to_csv(filename): export data to a CSV file
df.to_excel(filename): export data to an Excel file
df.to_sql(table_name, connection_object): export data to a SQL table
df.to_json(filename): export data to a text file in JSON format
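
A minimal sketch of the export calls, again using an in-memory buffer instead of a real file path (the sample data is made up):

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# df.to_csv(filename): a StringIO buffer stands in for a file path.
buf = io.StringIO()
df.to_csv(buf, index=False)  # index=False omits the row-index column
csv_out = buf.getvalue()

# df.to_json(): with no path argument it returns the JSON string directly.
json_out = df.to_json()
```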

Create test objects
pd.DataFrame(np.random.rand(20, 5)): create a DataFrame of random numbers with 20 rows and 5 columns
pd.Series(my_list): create a Series object from the iterable my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]): add a date index
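
The three creation calls above, put together (the list contents are arbitrary):

```python
import numpy as np
import pandas as pd

# 20 rows x 5 columns of random floats in [0, 1).
df = pd.DataFrame(np.random.rand(20, 5))

# A Series from any iterable.
s = pd.Series([10, 20, 30])

# Replace the default integer index with a daily date index,
# one entry per row, starting 1900-01-30.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])
```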

View and inspect data
df.head(n): view the first n rows of a DataFrame
df.tail(n): view the last n rows of a DataFrame
df.shape: the number of rows and columns (an attribute, not a method)
df.info(): view the index, data types, and memory usage
df.describe(): view summary statistics for the numeric columns
s.value_counts(dropna=False): view the unique values of a Series and their counts
df.apply(pd.Series.value_counts): view the unique values and counts for every column of a DataFrame
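
A small sketch of the inspection calls on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 1, 2, 2, 2], "y": [5, 4, 3, 2, 1]})

head2 = df.head(2)    # first 2 rows
tail2 = df.tail(2)    # last 2 rows
shape = df.shape      # (rows, columns) -- note: an attribute, no parentheses

# Unique values of column x and how often each occurs.
counts = df["x"].value_counts(dropna=False)
```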

Data selection
df[col]: select the column named col and return it as a Series
df[[col1, col2]]: return several columns as a DataFrame
s.iloc[0]: select data by position
s.loc['index_one']: select data by index label
df.iloc[0, :]: return the first row
df.iloc[0, 0]: return the first element of the first column
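
The selection idioms above in one sketch (column names and index labels are made up):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["r1", "r2", "r3"])

col_a = df["a"]               # single column as a Series
sub = df[["a", "b"]]          # several columns as a DataFrame
first_row = df.iloc[0, :]     # first row, by position
elem = df.iloc[0, 0]          # first element of the first column
by_label = df.loc["r2", "a"]  # .loc selects by index label, not position
```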

Data cleaning
df.columns = ['a', 'b', 'c']: rename the columns
pd.isnull(obj): check for null values and return a Boolean array
pd.notnull(obj): check for non-null values and return a Boolean array
df.dropna(): drop all rows that contain null values
df.dropna(axis=1): drop all columns that contain null values
df.dropna(axis=1, thresh=n): drop all columns with fewer than n non-null values
df.duplicated(): flag duplicate rows
df.drop_duplicates(): drop duplicate rows, optionally considering only particular columns
df.fillna(x): replace all null values in the DataFrame with x
s.astype(float): change the data type of a Series to float
s.replace(1, 'one'): replace all values equal to 1 with 'one'
s.replace([1, 3], ['one', 'three']): replace 1 with 'one' and 3 with 'three'
df.rename(columns=lambda x: x + 1): rename columns in bulk
df.rename(columns={'old_name': 'new_name'}): rename selected columns
df.set_index('column_one'): change the index to column_one
df.rename(index=lambda x: x + 1): rename the index in bulk
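
A few of the cleaning operations in action, on a small made-up frame with nulls:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

filled = df.fillna(0)        # replace every null with 0
no_null_rows = df.dropna()   # keep only rows with no nulls

df.columns = ["x", "y"]                      # rename all columns at once
renamed = df.rename(columns={"x": "col_x"})  # rename selected columns

s = pd.Series([1, 3, 1])
s2 = s.replace([1, 3], ["one", "three"])  # 1 -> 'one', 3 -> 'three'
```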

Data processing: filter, sort, and group by
df[df[col] > 0.5]: select rows where the value in col is greater than 0.5
df.sort_values(col1): sort the data by col1, ascending by default
df.sort_values(col2, ascending=False): sort the data by col2 in descending order
df.sort_values([col1, col2], ascending=[True, False]): sort by col1 ascending, then by col2 descending
df.groupby(col): return a GroupBy object grouped by the column col
df.groupby([col1, col2]): return a GroupBy object grouped by several columns
df.groupby(col1)[col2].mean(): return the mean of col2, grouped by col1
df.pivot_table(index=col1, values=[col2, col3], aggfunc=max): create a pivot table grouped by col1 with the maximum of col2 and col3
df.groupby(col1).agg(np.mean): return the mean of every column, grouped by col1
df.apply(np.mean): apply np.mean to each column of the DataFrame
df.apply(np.max, axis=1): apply np.max to each row of the DataFrame
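
Filtering, sorting, grouping, and pivoting in one sketch (the column names "group" and "val" are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "val":   [1.0, 3.0, 5.0, 7.0],
})

over_2 = df[df["val"] > 2]                       # rows where val > 2
by_val = df.sort_values("val", ascending=False)  # sort descending by val
means = df.groupby("group")["val"].mean()        # per-group mean of val
pivot = df.pivot_table(index="group", values="val", aggfunc="max")
```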

Combine data
df1.append(df2): append the rows of df2 to the end of df1 (deprecated; use pd.concat([df1, df2]))
pd.concat([df1, df2], axis=1): append the columns of df2 to the end of df1
df1.join(df2, on=col1, how='inner'): SQL-style join on the columns of df1 and df2
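
A sketch of the combining operations; note that `merge` is used here as the SQL-style join (it is the usual idiom when joining on a column rather than on the index), and the key/column names are made up:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["k1", "k2"], "x": [1, 2]})
df2 = pd.DataFrame({"key": ["k2", "k3"], "y": [3, 4]})

rows = pd.concat([df1, df2])          # stack rows (df1.append is deprecated)
cols = pd.concat([df1, df2], axis=1)  # place columns side by side
joined = df1.merge(df2, on="key", how="inner")  # SQL-style inner join on key
```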

Statistics
df.describe(): view summary statistics for the numeric columns
df.mean(): return the mean of every column
df.corr(): return the pairwise correlation between columns
df.count(): return the number of non-null values in each column
df.max(): return the maximum of each column
df.min(): return the minimum of each column
df.median(): return the median of each column
df.std(): return the standard deviation of each column
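
The column-wise statistics, sketched on a tiny frame where column b is exactly twice column a (so their correlation is 1):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0]})

means = df.mean()      # mean of each column
medians = df.median()  # median of each column
maxima = df.max()      # maximum of each column
counts = df.count()    # number of non-null values per column
corr = df.corr()       # pairwise correlation matrix between columns
```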

Origin www.cnblogs.com/feiqixia/p/11241925.html