pandas
df: any pandas DataFrame object (abbreviation)
s: any pandas Series object (abbreviation)
We also need the following imports: import pandas as pd and import numpy as np
Import Data
pd.read_csv(filename): Import data from a CSV file
pd.read_table(filename): Import data from a delimited text file
pd.read_sql(query, connection_object): Import data from a SQL table or database
pd.DataFrame(dict): Import data from a dict object; keys are column names, values are the data
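A minimal sketch of the import calls above, assuming small in-memory inputs: io.StringIO stands in for a file on disk, and an in-memory sqlite3 database (an illustrative choice, not part of the list) plays the role of connection_object. The table name scores and the sample data are hypothetical.

```python
import io
import sqlite3

import pandas as pd

# read_csv accepts a path or any file-like object; StringIO stands in for a file here.
csv_text = "name,score\nalice,90\nbob,85\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table uses a tab separator by default (sep="\t").
tsv_text = "name\tscore\nalice\t90\nbob\t85\n"
df_tsv = pd.read_table(io.StringIO(tsv_text))

# read_sql runs a query through a connection; sqlite3 is used for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [("alice", 90), ("bob", 85)])
df_sql = pd.read_sql("SELECT name, score FROM scores", conn)

# DataFrame(dict): keys become column names, values become column data.
df_dict = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})
```

All four calls produce the same two-column table here, which makes it easy to compare sources.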
Export Data
df.to_csv(filename): Export data to a CSV file
df.to_excel(filename): Export data to an Excel file
df.to_sql(table_name, connection_object): Export data to a SQL table
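A minimal sketch of the export calls, assuming an in-memory sqlite3 connection as connection_object and hypothetical sample data. to_excel is only described in a comment, because it needs an Excel writer engine such as openpyxl to be installed.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

# With no path argument, to_csv returns the CSV text; index=False omits the row index.
csv_text = df.to_csv(index=False)

# to_sql creates the table and writes the rows; sqlite3 is used for illustration.
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)
rows = conn.execute("SELECT name, score FROM scores").fetchall()

# df.to_excel("scores.xlsx") follows the same pattern but requires an
# Excel engine (for example openpyxl), so it is not called in this sketch.
```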
Create Test Objects
pd.DataFrame(np.random.rand(20, 5)): Create a DataFrame of random numbers with 20 rows and 5 columns
pd.Series(my_list): Create a Series object from the iterable my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]): Add a date index
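A short sketch that builds the test objects listed above; my_list is a hypothetical placeholder list. pd.date_range uses a daily frequency by default, so the index runs one day per row from the start date.

```python
import numpy as np
import pandas as pd

# 20 rows x 5 columns of uniform random numbers in [0, 1).
df = pd.DataFrame(np.random.rand(20, 5))

# A Series built from any iterable; my_list is a placeholder.
my_list = [10, 20, 30]
s = pd.Series(my_list)

# One date per row (df.shape[0] == 20), starting at 1900-01-30, daily frequency.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])
```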
View and Inspect Data
df.head(n): View the first n rows of a DataFrame
df.tail(n): View the last n rows of a DataFrame
df.shape: View the number of rows and columns (an attribute, not a method)
df.info(): View the index, data types and memory usage
df.describe(): View summary statistics for numeric columns
s.value_counts(dropna=False): View the unique values of a Series and their counts
df.apply(pd.Series.value_counts): View the unique values and their counts for every column of a DataFrame
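A small sketch of the inspection calls, assuming a hypothetical all-numeric DataFrame (keeping both columns numeric avoids mixing labels of different types when value_counts results are combined by apply).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 3], "b": [1.0, 1.0, np.nan, 4.0]})

first_two = df.head(2)        # first 2 rows
last_two = df.tail(2)         # last 2 rows
n_rows, n_cols = df.shape     # shape is an attribute: (rows, columns)
summary = df.describe()       # count, mean, std, min, quartiles, max per numeric column
df.info()                     # prints index, dtypes, non-null counts and memory usage

# dropna=False counts NaN as its own value.
counts_b = df["b"].value_counts(dropna=False)

# Counts per column; values missing from a column show up as NaN.
per_column = df.apply(pd.Series.value_counts)
```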
Data Selection
df[col]: Select a column by name and return it as a Series
df[[col1, col2]]: Return multiple columns as a DataFrame
s.iloc[0]: Select data by position
s.loc['index_one']: Select data by index label
df.iloc[0, :]: Return the first row
df.iloc[0, 0]: Return the first element of the first column
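A sketch of the selection patterns on a small hypothetical DataFrame whose column names and index labels (col1, index_one, and so on) mirror the placeholders used in the list.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]},
    index=["index_one", "index_two", "index_three"],
)

s = df["col1"]                  # single column name -> Series
sub = df[["col1", "col2"]]      # list of column names -> DataFrame
by_position = s.iloc[0]         # first element, by position
by_label = s.loc["index_one"]   # element with index label "index_one"
first_row = df.iloc[0, :]       # first row, returned as a Series
first_cell = df.iloc[0, 0]      # first row, first column
```

iloc always works with integer positions, while loc works with index labels, so the two only coincide when the index is the default 0..n-1 range.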
Data Cleaning
df.columns = ['a', 'b', 'c']: Rename all columns
pd.isnull(df): Check a DataFrame object for null values and return a Boolean array
pd.notnull(df): Check a DataFrame object for non-null values and return a Boolean array
df.dropna(): Drop every row that contains a null value
df.dropna(axis=1): Drop every column that contains a null value
df.dropna(axis=1, thresh=n): Drop every column that has fewer than n non-null values
df.fillna(x): Replace every null value in the DataFrame with x
s.astype(float): Convert the data type of a Series to float
s.replace(1, 'one'): Replace every value equal to 1 with 'one'
s.replace([1, 3], ['one', 'three']): Replace 1 with 'one' and 3 with 'three'
df.rename(columns=lambda x: x + 1): Rename all columns by applying a function to each name (x + 1 assumes numeric column names)
df.rename(columns={'old_name': 'new_name'}): Rename selected columns
df.set_index('column_one'): Use the column 'column_one' as the index
df.rename(index=lambda x: x + 1): Rename all index labels by applying a function
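A sketch of the cleaning calls on a hypothetical DataFrame with a few missing values. Each call returns a new object and leaves df unchanged, so the results are stored in separate variables. str.upper is used as an extra example of function-based renaming for string column names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

mask = pd.isnull(df)                        # True where a value is missing
rows_complete = df.dropna()                 # only row 2 has no NaN
cols_complete = df.dropna(axis=1)           # only column "c" has no NaN
cols_thresh = df.dropna(axis=1, thresh=2)   # keep columns with >= 2 non-null values
filled = df.fillna(0)                       # NaN -> 0

s = pd.Series([1, 2, 3])
as_float = s.astype(float)
replaced = s.replace([1, 3], ["one", "three"])

renamed = df.rename(columns={"a": "alpha"})  # rename one column
upper = df.rename(columns=str.upper)         # apply a function to every column name
numbered = df.rename(index=lambda x: x + 1)  # index 0, 1, 2 -> 1, 2, 3
indexed = df.set_index("c")                  # column "c" becomes the index
```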
Statistics
df.describe(): View summary statistics for numeric columns
df.mean(): Return the mean of each column
df.corr(): Return the correlation coefficients between columns
df.count(): Return the number of non-null values in each column
df.max(): Return the maximum of each column
df.min(): Return the minimum of each column
df.median(): Return the median of each column
df.std(): Return the standard deviation of each column
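A sketch of the statistics calls on a hypothetical all-numeric DataFrame. By default these methods skip NaN values, corr computes pairwise Pearson correlation, and std is the sample standard deviation (ddof=1).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],   # exactly 2 * x
    "z": [4.0, 3.0, np.nan, 1.0],
})

means = df.mean()      # NaN skipped: mean of z is (4 + 3 + 1) / 3
corr = df.corr()       # 3 x 3 matrix of pairwise Pearson correlations
counts = df.count()    # non-null values per column
maxima = df.max()
minima = df.min()
medians = df.median()
stds = df.std()        # sample standard deviation (ddof=1)
```

Because y is an exact multiple of x, their correlation is 1; on a DataFrame that also has text columns, passing numeric_only=True restricts these methods to the numeric columns.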