pd.isnull(df), returns True where a value is null and False where it is not
The most commonly used data loading functions are read_csv() and read_table()
pd.read_table(sep=), a csv file can also be read with read_table, but you need to set the separator sep (e.g. sep=',')
pd.read_csv(header=None, names=...)
header parameter, when the file has no header row, set this parameter to None
names parameter, specifies the column names to use when reading the file
index_col parameter, specifies a column (by name or number) to use as the DataFrame index; passing a list creates a hierarchical index (MultiIndex)
skiprows, skips lines of the file, e.g. skiprows=[0, 2, 3] skips the first, third, and fourth lines
parse_dates, parses data as dates; the default is False; if True, pandas tries to parse the index, and a list of column names or numbers parses those columns
nrows, the number of rows to read
skipfooter, the number of lines to ignore, counting from the end of the file
chunksize, reads the file in chunks; this parameter specifies the chunk size and makes read_csv return an iterator that yields one chunk at a time
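A minimal sketch of several of these read_csv parameters in action; the CSV data is invented for illustration, and io.StringIO stands in for a file path:

```python
import io
import pandas as pd

# A small CSV with no header row (invented data for illustration).
raw = io.StringIO("1,alice,2021-01-05\n"
                  "2,bob,2021-02-10\n"
                  "3,carol,2021-03-15\n")

# header=None says there is no header row; names supplies the column names;
# index_col makes the "id" column the index; parse_dates parses that column.
df = pd.read_csv(raw,
                 header=None,
                 names=["id", "name", "joined"],
                 index_col="id",
                 parse_dates=["joined"])

print(df)
print(df.dtypes)
```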
df.to_csv(), this function writes data to a csv file.
The parameters are as follows:
na_rep, the string used to represent missing values in the output
index, whether to write the index
header, whether to write the column names
columns, pass in a list of column names to select the columns and the order in which they are written
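A short sketch of to_csv with these parameters, on invented data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [None, "x"], "c": [3.0, 4.0]})

# na_rep writes missing values as "NULL"; index=False drops the row index;
# columns selects which columns are written and in what order.
csv_text = df.to_csv(na_rep="NULL", index=False, columns=["c", "b"])
print(csv_text)
```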
json library
json.loads(), converts a JSON string into Python objects
json.dumps(), converts Python objects into a JSON string
pd.read_json(), reads a json file into a DataFrame
df.to_json(), writes the data out as json
pd.read_excel(sheet_name=), reads an excel file; the sheet_name parameter selects which sheet to read
df.to_excel(sheet_name=), stores the data in an excel file
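A quick sketch of the json round trip; the JSON strings are invented for illustration, and io.StringIO again stands in for a file:

```python
import io
import json
import pandas as pd

# Standard library: JSON text <-> Python objects.
text = '{"name": "alice", "scores": [90, 85]}'
obj = json.loads(text)   # JSON string -> Python dict
back = json.dumps(obj)   # Python dict -> JSON string

# pandas: JSON straight into a DataFrame, and back out again.
df = pd.read_json(io.StringIO('[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'))
print(df)
print(df.to_json(orient="records"))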
df.isnull(), returns a DataFrame of the same shape containing only True and False, with True wherever the corresponding value is missing
df.notnull(), the opposite of isnull
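A tiny sketch of isnull and notnull on invented data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})

print(df.isnull())   # True where a value is missing
print(df.notnull())  # element-wise negation of isnull
```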
df.dropna(axis=0, how=), drops missing values; on a DataFrame it removes whole rows or columns: by default it drops every row that contains a missing value, axis=1 drops columns instead, and how='all' drops only rows (or columns) that are entirely empty
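A sketch of the dropna variants just described, on invented data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, np.nan, 3.0],
                   [np.nan, np.nan, np.nan],
                   [4.0, 5.0, 6.0]])

print(df.dropna())           # drops any row containing a NaN -> only the last row survives
print(df.dropna(how="all"))  # drops only rows that are entirely NaN -> the middle row goes
print(df.dropna(axis=1))     # drops any column containing a NaN -> here, every column goes
```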
df.fillna(value=, method=, axis=0, inplace=False, limit=), fills missing data; method can be 'ffill' (forward fill) or 'bfill' (backward fill); you can also pass a dictionary to fill different columns with different values, set inplace=True to modify the existing object in place, or pass a computed value such as the column mean
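A sketch of these fillna variants on invented data; note that fillna(method='ffill') is deprecated in recent pandas, so the forward-fill line uses df.ffill() instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 2.0, np.nan]})

print(df.fillna(0))                  # fill every NaN with a constant
print(df.fillna({"a": 0, "b": -1}))  # per-column fill values via a dict
print(df.fillna(df.mean()))          # fill each column with its own mean
print(df.ffill())                    # forward fill (replaces fillna(method='ffill'))
```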
df.duplicated(), returns a boolean Series indicating whether each row duplicates an earlier row
df.drop_duplicates(), drops duplicate rows; to deduplicate based on one or more specific columns, pass the column name(s) via the subset parameter
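A sketch of duplicated and drop_duplicates on invented data:

```python
import pandas as pd

df = pd.DataFrame({"k": ["a", "a", "b"], "v": [1, 1, 2]})

print(df.duplicated())                   # second row fully duplicates the first
print(df.drop_duplicates())              # drops fully duplicated rows
print(df.drop_duplicates(subset=["k"]))  # deduplicate based on column 'k' only
```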