A collection of the most commonly used Pandas functions and methods

Pandas is the core third-party library for data analysis and processing in Python. Its main data structure, the DataFrame, is a two-dimensional labeled table similar to an Excel sheet, and the library wraps many practical functions and methods that make common operations on data sets easy.

Members of our technical group took the initiative to compile the commonly used functions and methods in Pandas, and we are sharing the result here for your convenience.


Reading and writing

  • read_csv: Read CSV files

  • to_csv: Export CSV file

  • read_excel: Read Excel files

  • to_excel: Export Excel file

  • read_json: Read JSON files

  • to_json: Export JSON file

  • read_html: Read HTML table data in web pages

  • to_html: Export web page HTML table

  • read_clipboard: Read clipboard data

  • to_clipboard: Export data to clipboard

  • to_latex: Export data to LaTeX format

  • read_sas: Read SAS format data (a statistical analysis software data format)

  • read_spss: Read SPSS format data (a statistical analysis software data format)

  • read_stata: Read Stata format data (a statistical analysis software data format)

  • read_sql: Run a SQL query or read a database table (requires a database connection) and return the result as a DataFrame

  • to_sql: Write a DataFrame to a database table
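The read and write functions above come in pairs, so a round trip is a quick way to see how they fit together. The following is a minimal sketch, assuming a small made-up table and in-memory buffers (and an in-memory SQLite database) instead of real files; the table name `scores` and the column names are hypothetical.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data used only for illustration.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "score": [90, 75, 82]})

# CSV round trip through an in-memory buffer instead of a file on disk.
csv_text = df.to_csv(index=False)
df_csv = pd.read_csv(io.StringIO(csv_text))

# JSON round trip; orient="records" writes one JSON object per row.
json_text = df.to_json(orient="records")
df_json = pd.read_json(io.StringIO(json_text))

# SQL round trip using an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)
df_sql = pd.read_sql(
    "SELECT name, score FROM scores WHERE score > 80 ORDER BY name", conn
)
conn.close()
```

Passing `index=False` on export keeps the row index out of the output, which is usually what you want unless the index carries meaning.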

Joining, merging, and reshaping

  • merge: Join two DataFrames on specified key columns, similar to JOIN in SQL

  • concat: Concatenate multiple DataFrames along an axis, similar to UNION ALL in SQL

  • pivot: Reshape the table according to the specified rows and columns

  • pivot_table: pivot table, similar to the pivot table in excel

  • cut: Divide a set of values into discrete intervals using fixed bin edges, suitable for classifying values

  • qcut: Similar to cut, but the bins are based on quantiles, so each bin holds roughly the same number of values

  • crosstab: Create a crosstab that calculates frequencies between two or more factors

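  • join: Join DataFrames on their index (by default)

  • stack: Pivot the column labels into the innermost level of the row index, returning a Series or DataFrame with a hierarchical index

  • unstack: Pivot a level of the hierarchical row index back into columns

  • append: Append one or more rows to the end of a DataFrame (deprecated in pandas 1.4 and removed in 2.0; use concat instead)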
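A few of the functions in this list are easier to tell apart side by side. The sketch below assumes two small hypothetical tables (`orders` and `users`) and shows a left join, a row-wise concat, a pivot table, the difference between `cut` and `qcut`, and a stack/unstack round trip.

```python
import pandas as pd

# Hypothetical tables used only for illustration.
orders = pd.DataFrame({"user_id": [1, 2, 2, 3], "amount": [10, 20, 30, 40]})
users = pd.DataFrame({"user_id": [1, 2, 4], "city": ["Oslo", "Rome", "Lima"]})

# merge: SQL-style join on a key column; how="left" keeps every order.
joined = orders.merge(users, on="user_id", how="left")

# concat: stack frames vertically; ignore_index renumbers the rows.
extra = pd.DataFrame({"user_id": [5], "amount": [50]})
all_orders = pd.concat([orders, extra], ignore_index=True)

# pivot_table: total amount per city, like an Excel pivot table.
per_city = joined.pivot_table(index="city", values="amount", aggfunc="sum")

# cut uses fixed bin edges; qcut uses quantiles so bins hold similar counts.
fixed_bins = pd.cut(all_orders["amount"], bins=[0, 25, 50], labels=["low", "high"])
quantile_bins = pd.qcut(all_orders["amount"], q=2, labels=["lower", "upper"])

# stack moves column labels into the row index; unstack reverses it.
stacked = per_city.stack()
restored = stacked.unstack()
```

Order 4 belongs to a user that is missing from `users`, so its `city` is NaN after the left join and it does not appear in the pivot table.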

Grouping, aggregation, transformation, and filtering

  • groupby: Group data according to a specified column or multiple columns

  • agg: Apply one or more aggregation functions (built-in or custom) to each group

  • transform: Apply a transformation function to each grouping, returning a result with the same shape as the original data

  • rank: Calculate the ranking of elements in each group

  • filter: Keep or drop entire groups based on a group-level condition

  • sum: Calculates the sum of groups

  • mean: Calculate the average of the groups

  • median: Calculate the median of each group

  • min and max: Calculate the minimum and maximum values of each group

  • count: Count the number of non-NA values in each group

  • size: Calculate the number of rows in each group, including NA values

  • std and var: calculate the standard deviation and variance of the grouping

  • describe: Generate descriptive statistical summary of groupings

  • first and last: get the first and last element in the group

  • nunique: Count the number of unique values in each group

  • cumsum, cummin, cummax, cumprod: Calculate the cumulative sum, cumulative minimum, cumulative maximum, and cumulative product within each group
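The difference between agg, transform, and filter is mostly about the shape of the result. Here is a minimal sketch, assuming a hypothetical `sales` table with a `team` and an `amount` column.

```python
import pandas as pd

# Hypothetical sales data used only for illustration.
sales = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b", "b"],
        "amount": [10, 30, 5, 15, 40],
    }
)
grouped = sales.groupby("team")["amount"]

# agg: several aggregations at once, returned as one row per group.
summary = grouped.agg(["sum", "mean", "count"])

# transform: the result has the same length as the input, so it can be
# assigned back as a new column (here, each row's share of its team total).
sales["share"] = sales["amount"] / grouped.transform("sum")

# rank and cumsum are also computed within each group.
sales["rank_in_team"] = grouped.rank(ascending=False)
sales["running_total"] = grouped.cumsum()

# filter: keep only the rows of groups that satisfy a group-level condition.
big_teams = sales.groupby("team").filter(lambda g: g["amount"].sum() > 50)
```

`summary` has two rows (one per team), the transformed columns have five rows like the input, and `big_teams` keeps only the three rows of team "b", whose total is above 50.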

Data cleaning

  • dropna: drop rows or columns containing missing values

  • fillna: fill or replace missing values

  • interpolate: interpolate missing values

  • duplicated: mark duplicate rows

  • drop_duplicates: delete duplicate rows

  • str.strip: remove whitespace characters at both ends of the string

  • str.lower and str.upper: Convert strings to lowercase or uppercase

  • str.replace: replace a substring or regular expression pattern in each string

  • astype: Convert the data type of a column to the specified type

  • sort_values: Sort the data frame according to the specified columns

  • rename: Rename columns or rows

  • drop: delete the specified column or row
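Cleaning steps are usually chained. The following sketch assumes a small, deliberately messy table (the column names and values are made up) and applies several of the methods above in sequence.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data used only for illustration.
raw = pd.DataFrame(
    {
        "Name": [" Ann ", "bob", "bob", "Cid", None],
        "age": [31.0, 45.0, 45.0, np.nan, 28.0],
    }
)

# duplicated: boolean mask of rows that repeat an earlier row.
n_dupes = int(raw.duplicated().sum())

clean = (
    raw.dropna(subset=["Name"])  # drop rows with no name
    .assign(Name=lambda d: d["Name"].str.strip().str.lower())  # normalize text
    .drop_duplicates()  # remove repeated rows
    .rename(columns={"Name": "name"})
)

# fillna: replace the missing age with the median, then cast to integers.
clean["age"] = clean["age"].fillna(clean["age"].median()).astype(int)

# sort_values: order by age and renumber the rows.
clean = clean.sort_values("age", ignore_index=True)
```

Filling before casting matters here: a float column that still contains NaN cannot be converted with `astype(int)`.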

Data visualization

  • pandas.DataFrame.plot.area: Draw a stacked area chart

  • pandas.DataFrame.plot.bar: Draw a vertical bar chart

  • pandas.DataFrame.plot.barh: Draw horizontal bar chart

  • pandas.DataFrame.plot.box: draw box plot

  • pandas.DataFrame.plot.density: Plot kernel density estimate

  • pandas.DataFrame.plot.hexbin: draw hexagonal bin plot

  • pandas.DataFrame.plot.hist: plot histogram

  • pandas.DataFrame.plot.line: draw line graph

  • pandas.DataFrame.plot.pie: draw pie chart

  • pandas.DataFrame.plot.scatter: Draw a scatter plot

  • pandas.plotting.andrews_curves: Plot Andrews curves for visualizing multivariate data

  • pandas.plotting.autocorrelation_plot: Plot the autocorrelation of a time series

  • pandas.plotting.bootstrap_plot: Draw a bootstrap plot, used to evaluate the uncertainty of statistics such as the mean, median, and midrange

  • pandas.plotting.lag_plot: Draw a lag plot, used to detect patterns, trends, and seasonality in time series data

  • pandas.plotting.parallel_coordinates: Draw parallel coordinates plots to show the relationship between samples in a data set with multiple features

  • pandas.plotting.scatter_matrix: draw scatter matrix plot

  • pandas.plotting.table: Draw a DataFrame or Series as a matplotlib table
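The plot methods above are thin wrappers around matplotlib, so matplotlib must be installed. The sketch below assumes a hypothetical table of monthly figures and uses the non-interactive "Agg" backend so it runs without a display; the image is written to an in-memory buffer rather than a file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly figures used only for illustration.
df = pd.DataFrame(
    {"visits": [120, 150, 90], "signups": [12, 18, 7]},
    index=["Jan", "Feb", "Mar"],
)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Each plot method draws on the matplotlib Axes passed through ax=.
df.plot.line(ax=axes[0], title="line")  # one line per column
df.plot.bar(ax=axes[1], title="bar")  # one bar per row and column
df["visits"].plot.hist(ax=axes[2], bins=3, title="hist")  # value distribution

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Note the difference between `plot.bar`, which draws one bar per value, and `plot.hist`, which bins the values and draws their frequencies.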

Dates and times

  • to_datetime: Convert input to Datetime type

  • date_range: Generate date range

  • to_timedelta: Convert input to Timedelta type

  • timedelta_range: generate time interval range

  • shift: move data along the timeline

  • resample: Resample a time series

  • asfreq: Convert time series to specified frequency

  • cut: Divide continuous data into discrete bins

  • period_range: generate period range

  • infer_freq: infer frequency of time series

  • tz_localize: Attach a time zone to time-zone-naive data

  • tz_convert: Convert time-zone-aware data to another time zone

  • dt: Accessor for the datetime properties of a Series

  • day_name, month_name: Get the weekday name and month name of a date

  • total_seconds: Return the total duration of a time interval in seconds

  • rolling: Rolling (moving) window calculations

  • expanding: Expanding window calculations

  • at_time, between_time: Select rows at a specific time of day, or between two times of day

  • truncate: Truncate a time series before and/or after given index values
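Most of the time series functions above operate on a Series or DataFrame with a DatetimeIndex. Here is a minimal sketch, assuming a hypothetical series of 48 hourly readings starting on 2024-01-01.

```python
import pandas as pd

# Hypothetical hourly readings used only for illustration.
index = pd.date_range("2024-01-01", periods=48, freq="h")
readings = pd.Series(range(48), index=index)

# resample: aggregate the hourly series into daily sums.
daily = readings.resample("D").sum()

# rolling: mean over a moving window of 3 observations.
smooth = readings.rolling(window=3).mean()

# shift: move values one step later, e.g. to compute hour-over-hour change.
change = readings - readings.shift(1)

# between_time: keep only rows whose time of day falls in a range.
office_hours = readings.between_time("09:00", "17:00")

# to_datetime plus the .dt accessor for component access on a Series.
dates = pd.to_datetime(pd.Series(["2024-01-01", "2024-02-15"]))
weekdays = dates.dt.day_name()

# tz_localize attaches a time zone; tz_convert changes it.
utc = readings.tz_localize("UTC")
tokyo = utc.tz_convert("Asia/Tokyo")

# total_seconds: length of the covered interval as a plain number.
span_seconds = (index[-1] - index[0]).total_seconds()
```

The first two values of `smooth` and the first value of `change` are NaN, because a full window (or a previous value) is not available yet.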


Origin: blog.csdn.net/qq_34160248/article/details/134916668