Common pandas functions
2019-11-06 01:10:38
Import Data
- pd.read_csv(filename): import data from a CSV file
- pd.read_table(filename): import data from a delimited text file (tab-separated by default)
- pd.read_excel(filename): import data from an Excel file
- pd.read_sql(query, connection_object): import data from a SQL table or database
- pd.read_json(json_string): import data from a JSON-formatted string
- pd.read_html(url): parse a URL, HTML file, or HTML string and extract its tables as a list of DataFrames
- pd.read_clipboard(): take the contents of your clipboard and pass them to read_csv()
- pd.DataFrame(dict): import data from a dictionary; keys become column names and values become the column data
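A minimal sketch of several of the readers above, assuming small in-memory sample data (the `name`/`score` columns and their values are made up for illustration) so that nothing touches the disk. `read_excel` and `read_html` are left out because they need optional packages such as openpyxl or lxml.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data held in memory, so no files are needed
csv_text = "name,score\nalice,90\nbob,85\n"
tsv_text = "name\tscore\nalice\t90\nbob\t85\n"
json_text = '[{"name": "alice", "score": 90}, {"name": "bob", "score": 85}]'

# read_csv / read_table accept any file-like object, not only a filename
df_csv = pd.read_csv(io.StringIO(csv_text))
df_tsv = pd.read_table(io.StringIO(tsv_text))  # tab is the default separator

# Recent pandas versions prefer literal JSON wrapped in a StringIO
df_json = pd.read_json(io.StringIO(json_text))

# read_sql against an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [("alice", 90), ("bob", 85)])
df_sql = pd.read_sql("SELECT name, score FROM scores", conn)
conn.close()

# DataFrame from a dict: keys become column names, values become column data
df_dict = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

for frame in (df_csv, df_tsv, df_json, df_sql, df_dict):
    print(frame.shape, list(frame.columns))
```

All five sources end up as the same two-column table, which is a quick way to check that a reader was configured correctly.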
Export Data
- df.to_csv(filename): export data to a CSV file
- df.to_excel(filename): export data to an Excel file
- df.to_sql(table_name, connection_object): export data to a SQL table
- df.to_json(filename): export data to a text file in JSON format
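The writers can be tried without creating files: when no path is given, `to_csv` and `to_json` return the text instead. The sketch below assumes a made-up two-row DataFrame and uses an in-memory SQLite database for `to_sql`; `to_excel` is omitted because it needs an optional engine such as openpyxl.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

# With no path argument, to_csv and to_json return a string
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")

# to_sql writes a table; plain sqlite3 connections are accepted
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)
rows = conn.execute("SELECT name, score FROM scores").fetchall()
conn.close()

# Read the CSV text back to confirm the round trip preserved the data
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(json_text)
print(rows)
```

Passing `index=False` keeps the default integer index out of the output, which is usually what you want when the index carries no meaning.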
Create Test Objects
- pd.DataFrame(np.random.rand(20, 5)): create a DataFrame of random numbers with 20 rows and 5 columns
- pd.Series(my_list): create a Series from the iterable my_list
- df.index = pd.date_range('1900/1/30', periods=df.shape[0]): add a date index
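Putting the three lines above together gives a small test frame with a daily date index. `my_list` is a hypothetical list chosen only for illustration.

```python
import numpy as np
import pandas as pd

# 20 rows x 5 columns of uniform random numbers in [0, 1)
df = pd.DataFrame(np.random.rand(20, 5))

# Hypothetical iterable turned into a Series
my_list = [10, 20, 30]
s = pd.Series(my_list)

# One consecutive day per row, starting at 1900-01-30
df.index = pd.date_range("1900/1/30", periods=df.shape[0])

print(df.shape)
print(df.index[0], df.index[-1])
```

Using `df.shape[0]` for `periods` ties the length of the date range to the number of rows, so the assignment cannot fail with a length mismatch.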
View and Inspect Data
Data Selection
- df[col]: select a column by name and return it as a Series
- df[[col1, col2]]: select multiple columns and return them as a DataFrame
- s.iloc[0]: select data by position
- s.loc['index_one']: select data by index label
- df.iloc[0, :]: return the first row
- df.iloc[0, 0]: return the first element of the first column
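The difference between selecting by position (`iloc`) and by label (`loc`) is easiest to see when the labels are not in positional order. This sketch assumes a small made-up frame whose index labels include the hypothetical `'index_one'` from the list above.

```python
import pandas as pd

# Hypothetical data; 'index_one' is deliberately the SECOND row
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["index_zero", "index_one", "index_two"],
)

s = df["a"]           # a single column name returns a Series
sub = df[["a", "c"]]  # a list of column names returns a DataFrame

by_position = s.iloc[0]        # first element, whatever its label
by_label = s.loc["index_one"]  # element whose label is 'index_one'
first_row = df.iloc[0, :]      # first row as a Series
first_cell = df.iloc[0, 0]     # first element of the first column

print(by_position, by_label)
print(first_row.tolist(), first_cell)
```

Here `s.iloc[0]` and `s.loc['index_one']` return different values, which is the whole point of having two accessors.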
Data Cleaning
- df.columns = ['a', 'b', 'c']: rename all columns
- pd.isnull(df): check a DataFrame for null values and return a Boolean mask
- pd.notnull(df): check a DataFrame for non-null values and return a Boolean mask
- df.dropna(): drop all rows that contain null values
- df.dropna(axis=1): drop all columns that contain null values
- df.dropna(axis=1, thresh=n): drop all columns that have fewer than n non-null values
- df.fillna(x): replace all null values in the DataFrame with x
- s.astype(float): convert the data type of a Series to float
- s.replace(1, 'one'): replace every value equal to 1 with 'one'
- s.replace([1, 3], ['one', 'three']): replace 1 with 'one' and 3 with 'three'
- df.rename(columns=lambda x: x + 1): rename columns in bulk (this particular lambda only works with numeric column labels)
- df.rename(columns={'old_name': 'new_name'}): rename selected columns
- df.set_index('column_one'): use column_one as the index
- df.rename(index=lambda x: x + 1): rename index labels in bulk
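A minimal walk-through of the cleaning calls on a tiny made-up frame with a few missing values. The column names and values are assumptions chosen so that each `dropna` variant keeps something different.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in two of the three columns
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": [np.nan, np.nan, 6.0],
    "z": [1.0, 2.0, 3.0],
})

df.columns = ["a", "b", "c"]  # rename every column at once
mask = pd.isnull(df)          # True where a value is missing

rows_kept = df.dropna()                     # rows with no NaN at all
cols_kept = df.dropna(axis=1)               # columns with no NaN at all
thresh_kept = df.dropna(axis=1, thresh=2)   # columns with >= 2 non-null values
filled = df.fillna(0)                       # NaN replaced by 0

s = pd.Series([1, 2, 3])
as_float = s.astype(float)
replaced = s.replace([1, 3], ["one", "three"])

renamed = df.rename(columns={"a": "alpha"})  # only column 'a' changes
indexed = df.set_index("c")                  # column 'c' becomes the index
shifted = df.rename(index=lambda x: x + 1)   # fine here: the default index is integer

print(mask)
print(thresh_kept)
```

Note that `thresh` counts non-null values to keep, not nulls to tolerate: column `b` has only one non-null value, so it is dropped.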
Data Processing: Filter, Sort, and GroupBy
- df[df[col] > 0.5]: select the rows where column col is greater than 0.5
- df.sort_values(col1): sort by column col1, in ascending order by default
- df.sort_values(col2, ascending=False): sort by column col2 in descending order
- df.sort_values([col1, col2], ascending=[True, False]): sort by col1 in ascending order, then by col2 in descending order
- df.groupby(col): return a GroupBy object grouped by column col
- df.groupby([col1, col2]): return a GroupBy object grouped by multiple columns
- df.groupby(col1)[col2].mean(): return the mean of column col2 for each group of col1
- df.pivot_table(index=col1, values=[col2, col3], aggfunc='max'): create a pivot table grouped by col1 that computes the maximum of col2 and col3
- df.groupby(col1).agg('mean'): return the mean of all columns grouped by col1
- df.apply(np.mean): apply np.mean to each column of the DataFrame
- df.apply(np.max, axis=1): apply np.max to each row of the DataFrame
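A sketch of filtering, sorting, and grouping on a small made-up frame. The `team`, `x`, and `y` columns stand in for the generic `col`, `col1`, `col2`, and `col3` above and are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'team' plays the role of col1, 'x'/'y' of col2/col3
df = pd.DataFrame({
    "team": ["a", "b", "a", "b"],
    "x": [0.2, 0.9, 0.7, 0.4],
    "y": [3, 1, 2, 5],
})

high = df[df["x"] > 0.5]                        # Boolean row filter
by_x = df.sort_values("x")                      # ascending by default
by_team_y = df.sort_values(["team", "y"], ascending=[True, False])

mean_y = df.groupby("team")["y"].mean()         # mean of y within each team
team_means = df.groupby("team").agg("mean")     # mean of every other column
pivot = df.pivot_table(index="team", values=["x", "y"], aggfunc="max")

col_means = df[["x", "y"]].apply(np.mean)           # one value per column
row_max = df[["x", "y"]].apply(np.max, axis=1)      # one value per row

print(high)
print(pivot)
```

`apply` is restricted to the numeric columns here because `np.mean` and `np.max` are not meaningful for the string column `team`.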
Merging Data
- pd.concat([df1, df2]): append the rows of df2 to the end of df1 (older code uses df1.append(df2), which was removed in pandas 2.0)
- pd.concat([df1, df2], axis=1): append the columns of df2 to the end of df1
- df1.join(df2, on=col1, how='inner'): perform a SQL-style inner join that matches column col1 of df1 against the index of df2
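A sketch of the three combining operations on made-up frames. The `key`, `v1`, `v2`, and `v3` names are hypothetical; the important detail is that `join(..., on=...)` matches a column of the left frame against the index of the right frame.

```python
import pandas as pd

# Hypothetical frames
df1 = pd.DataFrame({"key": ["k1", "k2", "k3"], "v1": [1, 2, 3]})
df_more_rows = pd.DataFrame({"key": ["k4"], "v1": [4]})
df_more_cols = pd.DataFrame({"v3": [7, 8, 9]})
df2 = pd.DataFrame({"v2": [10, 20]}, index=["k1", "k3"])  # keyed by its index

# Stack rows; ignore_index renumbers the result 0..n-1
stacked = pd.concat([df1, df_more_rows], ignore_index=True)

# Place columns side by side, aligned on the row index
side_by_side = pd.concat([df1, df_more_cols], axis=1)

# Inner join: keep only rows whose 'key' appears in df2's index
joined = df1.join(df2, on="key", how="inner")

print(stacked)
print(side_by_side)
print(joined)
```

For joins on columns of both frames, `pd.merge` is usually the more direct tool; `join` is convenient when the right frame is already indexed by the key.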
Statistics
- df.describe(): show summary statistics for each numeric column
- df.mean(): return the mean of each column
- df.corr(): return the pairwise correlation coefficients between columns
- df.count(): return the number of non-null values in each column
- df.max(): return the maximum of each column
- df.min(): return the minimum of each column
- df.median(): return the median of each column
- df.std(): return the standard deviation of each column
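The statistics methods all skip missing values by default, which the sketch below makes visible with one `None` in column `b`. The data are made up for illustration.

```python
import pandas as pd

# Hypothetical data; column b has one missing value
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 6.0, None]})

summary = df.describe()  # count, mean, std, min, quartiles, max per column
counts = df.count()      # non-null values per column: a -> 4, b -> 3
means = df.mean()
medians = df.median()
stds = df.std()          # sample standard deviation (ddof=1)
corr = df.corr()         # uses only rows where both columns are present

print(summary)
print(corr)
```

Because the missing row is excluded pairwise, `corr()` compares `a = [1, 2, 3]` with `b = [2, 4, 6]`, which are perfectly correlated.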
Source: blog.51cto.com/13132323/2447986