Pandas data analysis reading notes (2)

pd.isnull(df): returns True for null values and False for non-null values

The most commonly used data loading functions are read_csv() and read_table()

pd.read_table(sep=): a csv file can also be read with read_table, but you need to set the separator sep (e.g. sep=',')

pd.read_csv(header=None, names=...)

header parameter: when the file has no header row, set this parameter to None

names parameter: specify the column names to use when reading the file

index_col parameter: specify a column (by name or by number) to use as the DataFrame index; passing in a list of columns creates a hierarchical (MultiIndex) index

skiprows: skip lines of the file, e.g. skiprows=[0, 2, 3] skips the first, third, and fourth lines

parse_dates: parse data as dates; the default is False. If True, pandas tries to parse the index; pass a list of column names or numbers to parse those columns as dates

nrows: the number of rows to read from the beginning of the file

skipfooter (formerly skip_footer): the number of lines to ignore, counting from the end of the file

chunksize: read the file piece by piece; this parameter specifies the chunk size, and read_csv then returns an iterator that yields one chunk of that size at a time
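The read_csv parameters above can be sketched on toy data; here io.StringIO stands in for a file on disk, and the column names and values are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n"

# header=None + names: treat every line as data and supply the column names.
df = pd.read_csv(io.StringIO("1,2,3\n4,5,6\n"), header=None, names=["x", "y", "z"])
print(df.columns.tolist())   # ['x', 'y', 'z']

# index_col: use column 'a' as the DataFrame index.
df = pd.read_csv(io.StringIO(csv_text), index_col="a")
print(df.index.tolist())     # [1, 4, 7]

# skiprows + nrows: skip line 0 (here the header) and read only 2 rows.
df = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=[0], nrows=2)
print(len(df))               # 2

# chunksize: returns an iterator; each chunk has at most 2 rows.
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)
sizes = [len(chunk) for chunk in chunks]
print(sizes)                 # [2, 1]
```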

 

df.to_csv(): this function is used to write data out to a csv file.

The parameters are as follows:

na_rep: the string used to represent missing values in the output (by default they are written as empty strings)

index: whether to write the index

header: whether to write the column names

columns: pass in a list of column names to select which columns are written and in what order
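A minimal sketch of these to_csv parameters, writing to an in-memory buffer instead of a real file (the frame and its column names are invented for the example):

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [np.nan, 4], "c": ["x", "y"]})

buf = io.StringIO()
# na_rep writes missing values as 'NULL'; index=False drops the index column;
# columns selects the output columns and their order.
df.to_csv(buf, na_rep="NULL", index=False, columns=["c", "a", "b"])
print(buf.getvalue())
# c,a,b
# x,1,NULL
# y,2,4.0
```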

 

The json library

json.loads(): convert a JSON string into Python objects

json.dumps(): convert Python objects into a JSON string
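The two conversions in one short round trip (the dictionary contents are made up for illustration):

```python
import json

# json.loads: JSON string -> Python objects (dict/list/str/int/...).
obj = json.loads('{"name": "pandas", "tags": ["data", "analysis"]}')
print(obj["tags"])   # ['data', 'analysis']

# json.dumps: Python object -> JSON string (note Python True becomes JSON true).
s = json.dumps({"rows": 3, "ok": True})
print(s)             # {"rows": 3, "ok": true}
```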

 

pd.read_json(): read a json file into a DataFrame

df.to_json(): output the data as json
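A sketch of the round trip between a DataFrame and JSON; the frame is toy data, and io.StringIO stands in for a file on disk:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# to_json: serialize the frame (the default orient is 'columns').
j = df.to_json()
print(j)   # {"a":{"0":1,"1":2},"b":{"0":3,"1":4}}

# read_json: parse it back into a DataFrame.
df2 = pd.read_json(io.StringIO(j))
print(df2.equals(df))
```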

 

pd.read_excel(sheet_name=): read an excel file; the sheet_name parameter selects which sheet to read (by name or by 0-based position)

frame.to_excel(sheet_name=): write the data to an excel file
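A minimal write-then-read sketch, assuming an Excel engine such as openpyxl is installed; the file path and sheet name are made up for the example:

```python
import os
import tempfile

import pandas as pd

# Hypothetical file path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# to_excel: write the frame to a sheet named 'notes'.
df.to_excel(path, sheet_name="notes", index=False)

# read_excel: read that sheet back by name (sheet_name also accepts a 0-based position).
df2 = pd.read_excel(path, sheet_name="notes")
print(df2.equals(df))   # True
```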

 

df.isnull(): returns a DataFrame containing only True and False; wherever a value of df is missing, the corresponding entry is True

df.notnull(): the negation of isnull
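A quick sketch of both methods on a toy frame with some missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})

# isnull: True exactly where a value is missing.
print(df.isnull().values.tolist())    # [[False, True], [True, False]]

# notnull: the element-wise negation of isnull.
print(df.notnull().values.tolist())   # [[True, False], [False, True]]

# The module-level pd.isnull(df) gives the same result as df.isnull().
print(pd.isnull(df).equals(df.isnull()))   # True
```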

df.dropna(axis=0, how=): drop missing values. For a DataFrame object, this drops rows or columns containing missing values: the default is to drop any row with a missing value, axis=1 drops columns instead, and how='all' drops only those rows (or columns) that are entirely missing
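The three dropna variants side by side, on a toy frame built for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan],
                   "b": [2.0, np.nan, 3.0]})

# Default: drop any row containing a missing value.
print(len(df.dropna()))            # 1

# how='all': drop only rows where every value is missing.
print(len(df.dropna(how="all")))   # 2

# axis=1: drop columns with missing values instead of rows.
print(df.dropna(axis=1).columns.tolist())   # []
```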

df.fillna(value=, method=, axis=0, inplace=False, limit=): fill in missing data. value can be a scalar, or a dictionary that fills different values for different columns (a column mean, for example). method can be 'ffill' (forward fill) or 'bfill' (backward fill). Passing inplace=True modifies the existing object in place instead of returning a new one, and limit caps the number of consecutive fills.
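A sketch of the fill variants on toy data; note that newer pandas versions spell forward fill as df.ffill() (equivalent to the older df.fillna(method='ffill')):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": [np.nan, 5.0, np.nan]})

# Fill with a constant.
print(df.fillna(0).loc[1, "a"])   # 0.0

# Fill per column with a dict: column 'a' gets 0, column 'b' gets its mean.
filled = df.fillna({"a": 0, "b": df["b"].mean()})
print(filled.loc[0, "b"])         # 5.0

# Forward fill: propagate the last valid value down; bfill() fills backward.
print(df.ffill()["a"].tolist())   # [1.0, 1.0, 3.0]
```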

df.duplicated(): returns a boolean Series indicating whether each row is a duplicate of an earlier row

df.drop_duplicates(): drop duplicate rows; to deduplicate based on one or more specific columns, pass in the column name(s)
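Both methods on a toy frame, including column-based deduplication via the subset parameter:

```python
import pandas as pd

df = pd.DataFrame({"k": ["x", "x", "y"], "v": [1, 1, 2]})

# duplicated: True for rows that repeat an earlier row.
print(df.duplicated().tolist())   # [False, True, False]

# drop_duplicates: keep only the first occurrence of each row.
print(len(df.drop_duplicates()))  # 2

# Pass column names via subset to deduplicate on those columns only.
df2 = pd.DataFrame({"k": ["x", "x"], "v": [1, 2]})
print(len(df2.drop_duplicates(subset=["k"])))   # 1
```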


Origin blog.csdn.net/u012724887/article/details/107035545