Five Common Python Data Analysis Libraries

Foreword

Python is a popular tool for data analysis, and many libraries are commonly used for the job. This article focuses on five of them: NumPy, Pandas, SciPy, StatsModels, and Matplotlib.

NumPy

NumPy is a very commonly used data analysis library; more precisely, it is a math library, and Pandas, covered below, also depends on it. So why NumPy, and what are its advantages?

  1. Many built-in mathematical operations: if you are reading this article, chances are you work in artificial intelligence, machine learning, or data analysis. These jobs are not just about stacking up code logic; they involve a lot of applied mathematics and often call for matrix calculations, basic linear algebra, random simulation, and Fourier transforms. NumPy implements these operations carefully, so you don't need to write out a Fourier expansion by hand.
  2. Fast: for example, to multiply matrices a and b you can write a * b (element-wise product) or a @ b (matrix product) directly, and it will run faster than a loop you write by hand because, as you may have guessed, it runs precompiled C code and uses better caching strategies. With an advantage like that, there is no reason not to use it.
  3. Concise code: the matrix multiplication above is easier to read than a loop, and less code means fewer bugs.
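The three advantages above can be illustrated with a minimal sketch. The matrices are random, the 2x2 system and the 4-sample signal are made-up inputs, and matmul_loop is a hypothetical helper written only to compare against the @ operator; it is not part of NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)  # random simulation: reproducible random matrices
a = rng.random((3, 3))
b = rng.random((3, 3))

elementwise = a * b  # element-wise (Hadamard) product
matmul = a @ b       # true matrix product


def matmul_loop(x, y):
    """Hypothetical naive triple loop, shown only to compare with x @ y."""
    n, k = x.shape
    k2, m = y.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i, j] += x[i, p] * y[p, j]
    return out


# Built-in linear algebra: solve 3x + y = 9, x + 2y = 8 (solution x = 2, y = 3)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
rhs = np.array([9.0, 8.0])
solution = np.linalg.solve(A, rhs)

# Built-in discrete Fourier transform of a tiny made-up signal
spectrum = np.fft.fft(np.array([1.0, 0.0, -1.0, 0.0]))
```

The loop and a @ b give the same numbers, but the one-line operator is shorter and runs in compiled code, which is the point of advantages 2 and 3.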

For data analysis, as the saying goes, life is short, so use Python; and once you get to know NumPy well, you will understand Python data analysis more deeply.

Pandas

Pandas (the Python Data Analysis Library) is a library for data analysis and modeling. It provides many data models and borrows the strengths of other libraries; as mentioned above, Pandas depends on NumPy, so it is recommended to learn NumPy first and then Pandas. Pandas offers many functions and methods for data processing and is particularly strong with large datasets. Because it was created in a financial setting, it also has distinct advantages for time-series analysis and economics.
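As a rough illustration of the time-series and data-processing features mentioned above, here is a minimal sketch. The prices and sector values are made-up numbers, and the 3-day window and 2-day buckets are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

# Made-up daily closing prices indexed by date
dates = pd.date_range("2024-01-01", periods=6, freq="D")
prices = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0, 14.0], index=dates, name="close")

# Pandas keeps the values in a NumPy array underneath
values = prices.to_numpy()

# Time-series helpers: a 3-day moving average and the max of each 2-day bucket
ma3 = prices.rolling(window=3).mean()
two_day_max = prices.resample("2D").max()

# A small table with a grouping column, summed per group
df = pd.DataFrame({"sector": ["tech", "tech", "bank", "bank"], "value": [1, 2, 3, 4]})
by_sector = df.groupby("sector")["value"].sum()
```

The first two entries of ma3 are NaN because a 3-day window is not yet full; from the third day on, each value is the mean of the last three prices.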

Matplotlib

Matplotlib is a 2D plotting library. Faced with a pile of data or a huge table, finding the data's characteristics by eye is hard, but once the data is turned into a chart it looks completely different. A few lines of code can draw accurate histograms, bar charts, error bar charts, or scatter plots; for data analysis it is an indispensable tool.
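A minimal sketch of the four chart types named above, drawn on one figure. The data is synthetic (random normal samples and made-up bar heights), and the non-interactive "Agg" backend is an assumption so the example runs without a display; in a notebook or desktop session you would normally call plt.show() instead of saving to a buffer.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)  # synthetic sample, not real data

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].hist(data, bins=20)
axes[0, 0].set_title("Histogram")

axes[0, 1].bar(["A", "B", "C"], [3, 7, 5])
axes[0, 1].set_title("Bar chart")

x = np.arange(5)
axes[1, 0].errorbar(x, x ** 2, yerr=0.5 * x + 1, fmt="o")
axes[1, 0].set_title("Error bars")

axes[1, 1].scatter(data[:-1], data[1:], s=5)
axes[1, 1].set_title("Scatter")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")  # render the figure to an in-memory PNG
```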

These are the three pillars of Python data analysis. (Of course, some people would name other libraries or tools, and I don't deny that, but I, along with others, still consider these three the pillars of Python data analysis.) The following sections cover two more libraries that are just as worth knowing and learning.

SciPy

Many people also count SciPy as a pillar, which shows how important it is. SciPy also depends on NumPy. It is a toolkit for scientific computing that contains many higher-level abstractions and physical models, as well as integration, interpolation, and signal processing.
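A minimal sketch touching the three areas named above: numerical integration, interpolation, and signal processing. The test function, sample points, and sine wave are arbitrary choices for illustration, and these particular SciPy functions are just one option per area.

```python
import numpy as np
from scipy import integrate, interpolate, signal

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# Interpolation: a cubic spline through five samples of x**2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
spline = interpolate.CubicSpline(x, x ** 2)
mid_value = float(spline(2.5))  # should be very close to 2.5**2 = 6.25

# Signal processing: find the local maxima of a sampled sine wave over two periods
t = np.linspace(0.0, 4.0 * np.pi, 400)
peaks, _ = signal.find_peaks(np.sin(t))
```

Because CubicSpline's default "not-a-knot" boundary condition reproduces low-degree polynomials, the spline recovers x**2 essentially exactly between the samples.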

StatsModels

StatsModels contains many statistical models: linear models, generalized linear models, analysis of variance, time series (Pandas can also handle these, and StatsModels in turn depends on Pandas), and linear mixed models, which gives it distinct advantages for statistics.

Postscript

I have to say, the Python data analysis libraries are closely connected: their official websites often link to and recommend one another, and they really do stick together. Some of their features overlap slightly, but with different emphases. That is inevitable, because certain basic data analysis operations are the same everywhere, and no library can leave those basic functions out. For us, this means we need some knowledge of each and should pick a different library depending on the job or task.
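One small example of overlapping features with different emphases: both NumPy and Pandas can compute a mean, but they treat missing values differently by default. The list of values here is made up for illustration.

```python
import numpy as np
import pandas as pd

values = [1.0, 2.0, np.nan, 5.0]

np_mean = np.mean(np.array(values))  # NumPy propagates NaN, so the result is nan
np_nanmean = np.nanmean(values)      # NumPy's NaN-aware variant: (1 + 2 + 5) / 3
pd_mean = pd.Series(values).mean()   # Pandas skips NaN by default (skipna=True)
```

Neither behavior is wrong; NumPy favors strict numerical semantics, while Pandas is tuned for messy real-world tables where missing data is common.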

As for how to learn them, my advice is to start with the documentation of the three pillars (NumPy, Pandas, and Matplotlib), learn the basic usage, and then master specific APIs gradually as you go.


Origin www.cnblogs.com/renyuzhuo/p/12222578.html