Foreword
1. Introduction to pandas
pandas is a core data analysis library for Python, providing fast, flexible, and expressive data structures designed to make working with relational and labeled data simple and intuitive.
pandas aims to be the fundamental high-level building block for practical, real-world data analysis in Python. Its broader, long-term goal is to become the most powerful and flexible open source data analysis tool available in any language, and after years of sustained effort it is getting closer and closer to that goal.
When it comes to data analysis with Python, pandas is practically a household name.
Put simply, pandas is the Excel of the Python programming world.
Official pandas website: access can be slow without a VPN.
Chinese-language pandas website: loads normally and is friendlier for Chinese readers.
2. Advantages of pandas
Why has pandas become the go-to tool and core support library for data analysis in Python?
I think the answer can be found in the following points.
2.1 Powerful data structure support
The main data structures of pandas are Series (one-dimensional data) and DataFrame (two-dimensional data), which are sufficient to handle most typical use cases in finance, statistics, social science, engineering, and other fields.
For R users, DataFrame provides everything that R's data.frame provides and more. pandas is built on top of NumPy and is designed to integrate well with other third-party scientific computing libraries.
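As a minimal sketch of the two structures described above, the snippet below builds one Series and one DataFrame and passes them through NumPy. The sample values and the names `prices` and `df` are made up for illustration only.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array (hypothetical sample values).
prices = pd.Series([10.5, 11.0, 9.8], index=["mon", "tue", "wed"], name="price")

# A DataFrame is a two-dimensional table; each column can have its own type.
df = pd.DataFrame(
    {"city": ["Beijing", "Shanghai", "Shenzhen"], "sales": [120, 95, 143]}
)

# Because pandas is built on NumPy, NumPy functions work on pandas objects.
log_prices = np.log(prices)            # result is still a Series, labels kept
sales_array = df["sales"].to_numpy()   # plain NumPy array of the column values

print(prices.ndim, df.ndim)
print(df.shape)
```

Note that `np.log` keeps the index labels of `prices`, which is the kind of NumPy integration the paragraph above refers to.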
2.2 Advantages
- 1. Missing data: handles missing values, represented as NaN, in both floating-point and non-floating-point data;
- 2. Size mutability: columns can be inserted into or deleted from DataFrame and other multidimensional objects;
- 3. Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the labels can be ignored and Series and DataFrame will align the data automatically during computation;
- 4. Powerful and flexible group by functionality: split-apply-combine operations on data sets, for both aggregating and transforming data;
- 5. Easy conversion: ragged, differently indexed data in other Python and NumPy data structures can easily be converted into DataFrame objects;
- 6. Intelligent label-based operations: slicing, fancy indexing, and subsetting of large data sets;
- 7. Hierarchical axis labels: a single tick can carry multiple labels;
- 8. Mature IO tools: read data from text files (CSV and other delimited files), Excel files, databases, and other sources, and save/load data in the ultra-fast HDF5 format;
- 9. Time series: date range generation, frequency conversion, moving window statistics, date shifting, and other time series functions.
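Several of the points in the list above can be tried in a few lines. The following is only a rough sketch with made-up data: all variable names and values are hypothetical, and the IO step uses an in-memory CSV buffer instead of HDF5, since HDF5 support depends on an optional extra package.

```python
import io

import numpy as np
import pandas as pd

# Missing data: NaN marks a missing value and is skipped by mean() by default.
s = pd.Series([1.0, np.nan, 3.0])
mean_without_nan = s.mean()
filled = s.fillna(0.0)

# Size mutability: add a column, then drop it again.
df = pd.DataFrame({"team": ["a", "b", "a", "b"], "score": [1, 2, 3, 4]})
df["bonus"] = df["score"] * 10
df = df.drop(columns="bonus")

# Automatic alignment: values are matched by label, not by position.
left = pd.Series([1, 2, 3], index=["x", "y", "z"])
right = pd.Series([10, 20, 30], index=["y", "z", "w"])
aligned_sum = left + right  # labels found on only one side become NaN

# Group by: split rows by team, apply sum, combine the results.
totals = df.groupby("team")["score"].sum()

# Label-based slicing: with .loc, both end labels are included.
first_two = left.loc["x":"y"]

# Hierarchical labels: each entry is addressed by a (team, round) pair.
levels = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["team", "round"])
multi = pd.Series([1, 2, 3, 4], index=levels)
team_a = multi.loc["a"]

# IO tools: write CSV text to an in-memory buffer and read it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

# Time series: date range, moving window mean, frequency conversion, shifting.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series(range(6), index=dates)
rolling_mean = ts.rolling(window=3).mean()
two_day_totals = ts.resample("2D").sum()
shifted = ts.shift(1)

print(aligned_sum)
print(totals)
print(two_day_totals)
```

Each step maps onto one list item, so the blocks can be run and modified independently while reading the list.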
3. pandas learning route
Start with Series, then move on to DataFrame.
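One way to see why this order makes sense: a DataFrame can be assembled from Series, and selecting a column or a row from it gives a Series back. A small sketch with hypothetical data (the names `height`, `weight`, and the values are invented for illustration):

```python
import pandas as pd

# Two Series that share the same index labels (hypothetical sample data).
height = pd.Series([172, 165], index=["ann", "bob"], name="height")
weight = pd.Series([68, 59], index=["ann", "bob"], name="weight")

# A DataFrame built from a dict of Series: each Series becomes one column.
df = pd.DataFrame({"height": height, "weight": weight})

column = df["height"]  # selecting a column returns a Series
row = df.loc["ann"]    # selecting a row returns a Series indexed by column name

print(type(column).__name__, type(row).__name__)
```

Once Series indexing and arithmetic feel familiar, most of that knowledge carries over to DataFrame columns directly.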
Epilogue
Learning pandas is bound to involve plenty of difficulties. It reminds me of when I first learned the Java framework Spring and felt I could barely keep going. Practicing while you learn and refusing to procrastinate is a good way to stay motivated.
Related reading
Article | Link |
---|---|
Previous issue | [Data Analysis - Basic Introduction to NumPy⑥] - NumPy Case Consolidation and Strengthening |
Next issue | [Data Analysis - Basic Introduction to Pandas②] - Pandas Data Structure: Series |