pandas document

pandas

df: shorthand for any pandas DataFrame object; s: shorthand for any pandas Series object
The entries below also assume the following import: import pandas as pd

Import Data

pd.read_csv(filename): import data from a CSV file

pd.read_table(filename): import data from a delimited text file (tab-separated by default)

pd.read_sql(query, connection_object): import data from a SQL table or database

pd.DataFrame(dict): build a DataFrame from a dictionary; each key is a column name and each value is that column's data
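A minimal sketch of the import calls above. The CSV and tab-separated contents are made-up sample data, and io.StringIO stands in for a real file so the snippet runs without anything on disk.

```python
import io

import pandas as pd

# Hypothetical sample data; read_csv accepts a path or any file-like object.
csv_text = "name,score\nann,90\nbob,85\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table splits on tabs unless another separator is given.
tsv_text = "name\tscore\nann\t90\nbob\t85\n"
df_tsv = pd.read_table(io.StringIO(tsv_text))

# Dictionary keys become column names, values become the column data.
df_dict = pd.DataFrame({"name": ["ann", "bob"], "score": [90, 85]})

print(df_csv)
print(df_tsv)
print(df_dict)
```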

Export data:

df.to_csv(filename): export data to a CSV file

df.to_excel(filename): export data to an Excel file

df.to_sql(table_name, connection_object): export data to a SQL table
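A small round-trip sketch of the export calls, under a few assumptions: the table name "scores" and the sample data are invented, an in-memory SQLite database stands in for a real one, and a string buffer replaces the output file. to_excel is left out because it needs an Excel engine such as openpyxl installed.

```python
import io
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["ann", "bob"], "score": [90, 85]})

# Write CSV into a buffer instead of a file; index=False drops the row labels.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()

# Write to an in-memory SQLite table, then read it back with read_sql.
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)
df_back = pd.read_sql("SELECT * FROM scores", conn)
conn.close()

print(csv_text)
print(df_back)
```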

Create a test object

pd.DataFrame(np.random.rand(20, 5)): create a DataFrame of random numbers with 20 rows and 5 columns (requires import numpy as np)

pd.Series(my_list): create a Series from the iterable my_list

df.index = pd.date_range('1900/1/30', periods=df.shape[0]): add a date index
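The three calls above fit together roughly as follows. my_list is a placeholder list chosen for illustration.

```python
import numpy as np
import pandas as pd

# 20 rows x 5 columns of uniform random numbers in [0, 1).
df = pd.DataFrame(np.random.rand(20, 5))

# Any iterable works; my_list is a made-up example.
my_list = [10, 20, 30]
s = pd.Series(my_list)

# One date per row, starting at 1900-01-30 with the default daily frequency.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])

print(df.head())
print(s)
```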

Inspect data:

df.head(n): view the first n rows of the DataFrame

df.tail(n): view the last n rows of the DataFrame

df.shape: view the number of rows and columns (an attribute, so no parentheses)

df.info(): view the index, data types and memory usage

df.describe(): view summary statistics for the numeric columns

s.value_counts(dropna=False): view the unique values of a Series and their counts, including nulls

df.apply(pd.Series.value_counts): view the unique values and their counts for each column of the DataFrame
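A short sketch of the inspection calls on a tiny made-up DataFrame. Both columns are numeric so that the per-column counts share a sortable index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, np.nan], "b": [2, 2, 2, 1]})

print(df.head(2))      # first two rows
print(df.tail(2))      # last two rows
print(df.shape)        # (4, 2); shape is an attribute, not a method
df.info()              # index, dtypes and memory usage
print(df.describe())   # count, mean, std, min, quartiles, max

# dropna=False keeps the null value as its own entry in the counts.
counts = df["a"].value_counts(dropna=False)

# One column of counts per original column, indexed by the values seen.
per_column = df.apply(pd.Series.value_counts)

print(counts)
print(per_column)
```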

Data selection:

df[col]: select a column by name and return it as a Series

df[[col1, col2]]: select several columns and return them as a DataFrame

s.iloc[0]: select data by position

s.loc['index_one']: select data by index label

df.iloc[0, :]: return the first row

df.iloc[0, 0]: return the first element of the first column
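The selection calls can be tried on a small example like this one. The column names and the other index labels are illustrative; only 'index_one' comes from the entries above.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]},
    index=["index_one", "index_two", "index_three"],
)

s = df["col1"]                 # single column -> Series
sub = df[["col1", "col2"]]     # list of columns -> DataFrame

first_by_pos = s.iloc[0]       # position-based -> 1
by_label = s.loc["index_one"]  # label-based -> 1

first_row = df.iloc[0, :]      # first row as a Series
first_cell = df.iloc[0, 0]     # first row, first column -> 1

print(type(s).__name__, type(sub).__name__)
print(first_row)
```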

Data cleansing:

df.columns = ['a', 'b', 'c']: rename the columns

pd.isnull(df): check the DataFrame for null values and return a Boolean array

pd.notnull(df): check the DataFrame for non-null values and return a Boolean array

df.dropna(): drop all rows that contain null values

df.dropna(axis=1): drop all columns that contain null values

df.dropna(axis=1, thresh=n): drop all columns that have fewer than n non-null values

df.fillna(x): replace all null values in the DataFrame with x

s.astype(float): change the data type of the Series to float

s.replace(1, 'one'): replace every value equal to 1 with 'one'

s.replace([1, 3], ['one', 'three']): replace 1 with 'one' and 3 with 'three'

df.rename(columns=lambda x: x + 1): rename all columns in bulk (this lambda assumes numeric column names)

df.rename(columns={'old_name': 'new_name'}): rename selected columns only

df.set_index('column_one'): use column_one as the index

df.rename(index=lambda x: x + 1): rename all index labels in bulk
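A sketch of the cleaning calls on a made-up DataFrame with integer column names, so that the lambda-based rename has numbers to add to.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    0: [1.0, np.nan, 3.0],
    1: [np.nan, np.nan, 6.0],
    2: [7.0, 8.0, 9.0],
})

mask = pd.isnull(df)                        # True where a value is missing
rows_kept = df.dropna()                     # only the last row has no nulls
cols_kept = df.dropna(axis=1)               # only column 2 has no nulls
cols_thresh = df.dropna(axis=1, thresh=2)   # keep columns with >= 2 non-nulls
filled = df.fillna(0)

s = pd.Series([1, 2, 3])
as_float = s.astype(float)
replaced = s.replace([1, 3], ["one", "three"])

shifted = df.rename(columns=lambda x: x + 1)   # columns 0, 1, 2 -> 1, 2, 3
named = df.rename(columns={0: "a"})            # rename one column only

df.columns = ["a", "b", "c"]                   # rename every column at once
indexed = df.set_index("a")                    # column a becomes the index

print(cols_thresh)
print(replaced)
print(indexed)
```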

Statistics

df.describe(): view summary statistics for the numeric columns

df.mean(): return the mean of each column

df.corr(): return the pairwise correlation coefficients between columns

df.count(): return the number of non-null values in each column

df.max(): return the maximum of each column

df.min(): return the minimum of each column

df.median(): return the median of each column

df.std(): return the standard deviation of each column
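The statistics calls above, sketched on a small made-up DataFrame. Column z contains one missing value to show that count() skips nulls.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, None],
})

means = df.mean()       # per-column mean; nulls are skipped
corr = df.corr()        # pairwise correlation between columns
counts = df.count()     # non-null values per column
maxima = df.max()
minima = df.min()
medians = df.median()
stds = df.std()         # sample standard deviation (ddof=1)

print(df.describe())
print(corr)
```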


Origin blog.csdn.net/qq_43569821/article/details/104791679