Pandas library for Python data analysis

1. Introduction to Pandas

Pandas is a data analysis package for Python. It was originally developed at AQR Capital Management starting in April 2008 and was released as open source at the end of 2009. It is now developed and maintained by the PyData development team, which focuses on Python data packages, and it is part of the PyData project. Because pandas began as a financial data analysis tool, it provides good support for time series analysis. The name pandas is derived from "panel data" and "Python data analysis". Panel data is an economics term for multidimensional (cube-like) data sets. Older versions of pandas provided a Panel type for such data, but it has since been removed.

pandas is built on top of NumPy and was created to solve data analysis tasks. It integrates a number of libraries and standard data models and provides the tools needed to manipulate large datasets efficiently. Its many functions and methods let us process data quickly and easily. As you'll soon discover, it is one of the things that makes Python a powerful and efficient data analysis environment.


2. Installing the pandas library

pandas is a third-party library that must be installed separately before you can use it. Installing it with pip is recommended:

pip install pandas

By convention, pandas is imported like this:

import pandas as pd

Abbreviating pandas as pd has become an almost universal convention, so whenever you see pd in Python code, you can assume it refers to pandas.
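As a quick sanity check (a minimal sketch, not taken from the original article), importing pandas under its conventional alias and printing its version string confirms that the installation worked:

```python
import pandas as pd

# If this import succeeds, pandas is installed; __version__ reports which release
print(pd.__version__)
```

The exact version printed depends on what pip installed on your machine.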

3. Pandas data structures

  • Series:
    A one-dimensional labeled array, similar to a one-dimensional NumPy array and also close to Python's built-in list. A Series can hold values of different types: strings, booleans, numbers, and so on can all be stored in the same Series.

  • Time-Series:
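    A Series whose index consists of timestamps. This is not a separate class, just an ordinary Series with a time-based index such as a DatetimeIndex.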

  • DataFrame:
    A two-dimensional tabular data structure; much of its functionality resembles R's data.frame. A DataFrame can be thought of as a container of Series that share a common index.

  • Panel:
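    A three-dimensional array that can be thought of as a container of DataFrames. Panel was deprecated in pandas 0.20 and removed in 0.25; the pandas documentation recommends a DataFrame with a MultiIndex (or the xarray package) for such data instead.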

  • Panel4D:
    A four-dimensional data container analogous to Panel. It has been removed from pandas.

  • PanelND:
    A factory for creating N-dimensional named containers like Panel4D. It has also been removed from pandas.
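A short sketch of the Series and time-series ideas above, assuming a recent pandas release; the labels, dates, and values are made up for illustration. The Panel family is not shown because it no longer exists in current pandas.

```python
import pandas as pd

# A Series pairs each value with an index label; mixed types are stored with dtype "object"
mixed = pd.Series([1, "two", True, 4.5], index=["a", "b", "c", "d"])
print(mixed.dtype)   # object
print(mixed["b"])    # two

# A "time series" is an ordinary Series whose index holds timestamps
dates = pd.date_range("2022-01-01", periods=4, freq="D")
prices = pd.Series([10.0, 10.5, 10.2, 10.8], index=dates)
print(type(prices.index).__name__)   # DatetimeIndex

# Timestamp labels allow date-based selection; label slices include both endpoints
print(prices["2022-01-02":"2022-01-03"])
```

Because the time-series case is just a Series with a DatetimeIndex, everything that works on a plain Series also works on it.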

4. Using the Series and DataFrame data structures

To use pandas, you first need to be familiar with its two main data structures: Series (one-dimensional data) and DataFrame (two-dimensional data). Together they cover most typical use cases.
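A minimal sketch of how the two structures relate, with hypothetical column names and data: a DataFrame built from Series that share an index, and selections that hand back Series again.

```python
import pandas as pd

# Two Series that share the same index labels
height = pd.Series([1.70, 1.82, 1.65], index=["ann", "bob", "cid"])
weight = pd.Series([60, 80, 55], index=["ann", "bob", "cid"])

# A dict of Series becomes a DataFrame: each Series is one column, aligned on the index
df = pd.DataFrame({"height": height, "weight": weight})
print(df)

# Selecting a single column gives back a Series
col = df["weight"]
print(type(col).__name__)   # Series

# Selecting a single row by label with .loc also returns a Series (indexed by column names)
row = df.loc["bob"]
print(row["height"])        # 1.82
```

Note that the row Series mixes a float column and an integer column, so pandas upcasts its values to a common float dtype.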

For more detailed examples of working with Series (one-dimensional data) and DataFrame (two-dimensional data), see the following article:

Usage of Series and DataFrame

5. Other useful websites

Pandas official website: https://pandas.pydata.org/

Pandas Chinese website: https://www.pypandas.cn/

Pandas github: https://github.com/pandas-dev/pandas


Origin blog.csdn.net/hubing_hust/article/details/128407077