pandas common functions

Import Data

pd.read_csv(filename): import data from a CSV file
pd.read_table(filename): import data from a delimited text file
pd.read_excel(filename): import data from an Excel file
pd.read_sql(query, connection_object): import data from a SQL table or database
pd.read_json(json_string): import data from a JSON-formatted string
pd.read_html(url): parse a URL, HTML file, or string and extract the tables it contains
pd.read_clipboard(): get the contents of the clipboard and pass them to read_table()
pd.DataFrame(dict): import data from a dictionary; each key is a column name and each value is that column's data
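
A minimal sketch of three of the import calls above. The CSV and JSON text are made-up sample data held in memory, so nothing is read from disk. Recent pandas versions expect read_json to receive a path or file-like object, so the string is wrapped in io.StringIO here.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "name,score\nann,90\nbob,85\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON text; wrapped in StringIO so read_json treats it as a file.
json_text = '[{"name": "ann", "score": 90}, {"name": "bob", "score": 85}]'
df_json = pd.read_json(io.StringIO(json_text))

# Keys become column names, values become the column data.
df_dict = pd.DataFrame({"name": ["ann", "bob"], "score": [90, 85]})

print(df_csv)
print(df_json)
print(df_dict)
```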

pd.DataFrame(np.random.rand(20, 5)): create a DataFrame of random numbers with 20 rows and 5 columns
pd.Series(my_list): create a Series from the iterable my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]): add a date index
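
The three creation calls above can be combined as in the sketch below. The list contents are arbitrary sample values; the date range uses the default daily frequency.

```python
import numpy as np
import pandas as pd

# 20 rows x 5 columns of uniform random numbers in [0, 1).
df = pd.DataFrame(np.random.rand(20, 5))

# A Series built from an ordinary Python list (sample values).
my_list = [1, 2, 3]
s = pd.Series(my_list)

# Replace the default integer index with one date per row.
df.index = pd.date_range('1900/1/30', periods=df.shape[0])

print(df.head())
print(s)
```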

df.head(n): view the first n rows of the DataFrame
df.tail(n): view the last n rows of the DataFrame
df.shape: view the number of rows and columns (an attribute, not a method)
df.info(): view the index, data types, and memory usage
df.describe(): view summary statistics for the numeric columns
s.value_counts(dropna=False): view the unique values of a Series and their counts, including nulls
df.apply(pd.Series.value_counts): view the unique values and their counts for every column of the DataFrame
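
A short sketch of the inspection calls above on a small made-up DataFrame. Both columns are numeric so that the per-column value counts share one sortable index.

```python
import numpy as np
import pandas as pd

# Sample data with one missing value in column "a".
df = pd.DataFrame({"a": [1, 2, 2, np.nan], "b": [2, 2, 3, 3]})

print(df.head(2))     # first two rows
print(df.tail(2))     # last two rows
print(df.shape)       # (rows, columns); note there are no parentheses
df.info()             # index, dtypes, memory usage (prints directly)
print(df.describe())  # summary statistics for numeric columns

# Counts for one Series, keeping NaN as its own entry.
counts = df["a"].value_counts(dropna=False)

# Counts for every column at once; values absent from a column show as NaN.
per_column = df.apply(pd.Series.value_counts)
print(counts)
print(per_column)
```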

df.columns = ['a', 'b', 'c']: rename the columns
pd.isnull(df): check the DataFrame for null values and return a Boolean mask of the same shape
pd.notnull(df): check the DataFrame for non-null values and return a Boolean mask of the same shape
df.dropna(): drop all rows that contain null values
df.dropna(axis=1): drop all columns that contain null values
df.dropna(axis=1, thresh=n): drop all columns that have fewer than n non-null values
df.fillna(x): replace all null values in the DataFrame with x
s.astype(float): change the data type of the Series to float
s.replace(1, 'one'): replace every value equal to 1 with 'one'
s.replace([1, 3], ['one', 'three']): replace 1 with 'one' and 3 with 'three'
df.rename(columns=lambda x: x + 1): rename all columns in bulk by applying a function to each label
df.rename(columns={'old_name': 'new_name'}): rename selected columns
df.set_index('column_one'): use column_one as the index
df.rename(index=lambda x: x + 1): rename all index labels in bulk
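
The sketch below runs several of the cleaning calls above on made-up data, mainly to show how dropna differs by axis and thresh. Each call returns a new object; df itself is left unchanged.

```python
import numpy as np
import pandas as pd

# Sample data: "a" has 2 non-null values, "b" has 1, "c" has 3.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

mask = pd.isnull(df)                       # Boolean mask, same shape as df
rows_kept = df.dropna()                    # keeps only rows with no nulls
cols_kept = df.dropna(axis=1)              # keeps only columns with no nulls
cols_thresh = df.dropna(axis=1, thresh=2)  # keeps columns with >= 2 non-null values
filled = df.fillna(0)                      # nulls replaced by 0

s = pd.Series([1, 3, 5])
as_float = s.astype(float)
replaced = s.replace([1, 3], ['one', 'three'])

renamed = df.rename(columns={'a': 'alpha'})  # rename one selected column
indexed = df.set_index('c')                  # column "c" becomes the index

print(cols_thresh)
print(replaced)
```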

df[df[col] > 0.5]: select the rows where column col is greater than 0.5
df.sort_values(col1): sort the data by column col1, ascending by default
df.sort_values(col2, ascending=False): sort the data by column col2 in descending order
df.sort_values([col1, col2], ascending=[True, False]): sort the data by col1 ascending, then by col2 descending
df.groupby(col): return a GroupBy object grouped by column col
df.groupby([col1, col2]): return a GroupBy object grouped by multiple columns
df.groupby(col1)[col2].mean(): return the mean of column col2 after grouping by column col1
df.pivot_table(index=col1, values=[col2, col3], aggfunc='max'): create a pivot table grouped by col1 that computes the maximum of col2 and col3
df.groupby(col1).agg(np.mean): return the mean of every column after grouping by column col1
df.apply(np.mean): apply the function np.mean to each column of the DataFrame
df.apply(np.max, axis=1): apply the function np.max to each row of the DataFrame
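
A minimal sketch of filtering, sorting, grouping, and pivoting on a made-up table. The column names are hypothetical, and small lambdas stand in for np.mean and np.max in apply; they compute the same column-wise and row-wise results.

```python
import pandas as pd

# Sample data; "team", "score", and "games" are illustrative names.
df = pd.DataFrame({
    "team": ["x", "x", "y", "y"],
    "score": [0.2, 0.9, 0.6, 0.4],
    "games": [3, 1, 2, 4],
})

high = df[df["score"] > 0.5]                         # Boolean filter on one column
by_score = df.sort_values("score", ascending=False)  # single-key descending sort
by_team_then_score = df.sort_values(
    ["team", "score"], ascending=[True, False]       # team ascending, score descending
)

mean_score = df.groupby("team")["score"].mean()      # one mean per team
pivot = df.pivot_table(index="team", values=["score", "games"], aggfunc="max")

numeric = df[["score", "games"]]
col_means = numeric.apply(lambda col: col.mean())        # one value per column
row_max = numeric.apply(lambda row: row.max(), axis=1)   # one value per row

print(mean_score)
print(pivot)
```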

df1.append(df2): add the rows of df2 to the end of df1 (DataFrame.append was removed in pandas 2.0; use pd.concat([df1, df2]) instead)
pd.concat([df1, df2], axis=1): add the columns of df2 to the end of df1
df1.join(df2, on=col1, how='inner'): perform a SQL-style join between column col1 of df1 and the index of df2
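
The sketch below shows row-wise and column-wise concatenation plus an inner join, using made-up frames. It uses pd.concat for the row-wise case, which works on both old and current pandas versions. Note that join matches the "key" column of df1 against the index of df2.

```python
import pandas as pd

# Sample frames; all names and values are illustrative.
df1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"right_val": [10, 20]}, index=["a", "b"])
df3 = pd.DataFrame({"extra": [7, 8, 9]})

# Rows of df1 followed by rows of df1 again, with a fresh 0..n-1 index.
stacked = pd.concat([df1, df1], ignore_index=True)

# Columns of df3 placed to the right of df1, aligned on the index.
side_by_side = pd.concat([df1, df3], axis=1)

# Inner join: only keys present in both df1["key"] and df2's index survive.
joined = df1.join(df2, on="key", how="inner")

print(stacked)
print(side_by_side)
print(joined)
```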