Data Analysis Study Notes (8) -- Introduction to numpy, pandas, scipy

numpy

  • Introduction

NumPy (Numeric Python) is a Python package. It is a library consisting of multidimensional array objects and a collection of routines for working with arrays.

Numeric, the predecessor of NumPy, was developed by Jim Hugunin. Another package, Numarray, was also developed and offered some additional features. In 2005, Travis Oliphant created the NumPy package by integrating the features of Numarray into Numeric. This open source project has many contributors.

NumPy is commonly used together with SciPy (Scientific Python) and Matplotlib (a plotting library).

  • Features

a. Arithmetic and logical operations on arrays
b. Fourier transforms and routines for shape manipulation
c. Operations related to linear algebra: NumPy has built-in functions for linear algebra and random number generation
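A minimal sketch touching each feature group listed above. It uses only standard NumPy calls; the array values are arbitrary illustration data, and `default_rng` assumes NumPy 1.17 or newer.

```python
import numpy as np

# a. Arithmetic and logical operations work element-wise on whole arrays
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
total = a + b                                  # [11, 22, 33, 44]
doubled = a * 2                                # [2, 4, 6, 8]
mask = np.logical_and(a > 1, a < 4)            # [False, True, True, False]

# b. Shape manipulation and Fourier transforms
m = np.arange(6).reshape(2, 3)                 # [[0, 1, 2], [3, 4, 5]]
m_t = m.T                                      # transpose, shape (3, 2)
spectrum = np.fft.fft([1.0, 0.0, -1.0, 0.0])   # approximately [0, 2, 0, 2]

# c. Linear algebra and random number generation
A = np.array([[3.0, 1.0], [1.0, 2.0]])
rhs = np.array([9.0, 8.0])
x = np.linalg.solve(A, rhs)                    # solves A @ x = rhs, giving [2.0, 3.0]
rng = np.random.default_rng(seed=0)            # seeded generator for reproducible draws
samples = rng.normal(loc=0.0, scale=1.0, size=5)
```

None of these operations needs an explicit Python loop, which is the main reason NumPy code is both shorter and faster than equivalent list-based code.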


pandas

  • Introduction

pandas (the Python Data Analysis Library) is a NumPy-based tool created to solve data analysis tasks. pandas incorporates a number of libraries and some standard data models, providing the tools needed to manipulate large datasets efficiently. It offers many functions and methods that let us process data quickly and easily.
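As a rough illustration of that kind of quick data processing, here is a minimal sketch that loads a small table, filters it, aggregates it, and sorts it. The CSV contents and column names are hypothetical sample data, read from an in-memory string so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical sample data; real data would usually come from a file or database
csv_text = """region,product,units
north,apple,10
south,apple,7
north,pear,4
south,pear,12
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only rows where more than 5 units were sold (boolean-mask filtering)
big_orders = df[df["units"] > 5]

# Total units per region (split-apply-combine with groupby)
units_by_region = df.groupby("region")["units"].sum()

# Sort rows by a column, largest first
ranked = df.sort_values("units", ascending=False)
print(units_by_region)
```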

pandas is used in a wide range of academic and commercial fields, such as finance, economics, statistics, and analytics.

  • Development

pandas is a data analysis package for Python. It was originally developed by AQR Capital Management in April 2008 and was open sourced at the end of 2009. It is currently developed and maintained by the PyData development team, which focuses on Python data packages, and is part of the PyData project. Because pandas was originally developed as a financial data analysis tool, it provides good support for time series analysis. The name pandas is derived from "panel data" and "Python data analysis". Panel data is an economics term for multidimensional data sets, and pandas also provided a corresponding Panel data type.

  • Data structures

Series: A one-dimensional array, similar to a one-dimensional NumPy array. Both are also very similar to Python's basic List data structure. The difference is that the elements of a List can be of different data types, whereas a NumPy array or a Series stores all of its elements with a single data type, which uses memory more efficiently and improves operational efficiency. A Series additionally carries an index of labels for its elements.

Time-Series: A time-indexed Series.

DataFrame: A two-dimensional tabular data structure. Much of its functionality resembles R's data.frame. A DataFrame can be understood as a container of Series.

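Panel: A three-dimensional array, which can be understood as a container of DataFrames. Panel was deprecated in pandas 0.20 and removed in 0.25; newer code typically represents such data with a MultiIndex DataFrame.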

The above content comes from Baidu Encyclopedia: pandas
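A minimal sketch of the Series, time series, and DataFrame structures described above, using standard pandas constructors. The labels, dates, and values are made-up illustration data.

```python
import pandas as pd

# Series: one dtype for every value, plus an index of labels
s = pd.Series([1.5, 2.0, 3.5], index=["a", "b", "c"])
print(s.dtype)      # float64
print(s["b"])       # 2.0

# Mixing types (as a Python list allows) forces the generic "object" dtype,
# which gives up the memory and speed advantages of a uniform dtype
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)  # object

# Time series: a Series whose index holds timestamps
dates = pd.date_range("2024-01-01", periods=4, freq="D")
ts = pd.Series([10, 12, 9, 15], index=dates)
two_day_totals = ts.resample("2D").sum()   # sums per 2-day bin: 22 and 24

# DataFrame: a table whose columns are Series sharing one index
df = pd.DataFrame({"price": [1.5, 2.0, 3.5], "qty": [3, 1, 2]}, index=["a", "b", "c"])
price_col = df["price"]          # selecting one column returns a Series
print(type(price_col).__name__)  # Series
```

The `mixed` example shows the practical limit of the "same data type" rule: pandas will accept mixed values, but only by storing them as generic Python objects.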


scipy

  • Introduction

SciPy is a core scientific computing package built on NumPy. It works with NumPy arrays and provides many user-friendly and efficient numerical routines for tasks such as interpolation, integration, optimization, and image processing.
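A minimal sketch with one call from each area named above. `CubicSpline`, `quad`, `minimize_scalar`, and `ndimage.label` are just one representative choice per area (SciPy offers several alternatives for each), and the inputs are toy problems with known answers so the results are easy to check.

```python
import numpy as np
from scipy import integrate, interpolate, ndimage, optimize

# Interpolation: fit a cubic spline through sample points of y = x**2
x = np.array([0.0, 1.0, 2.0, 3.0])
y = x ** 2
spline = interpolate.CubicSpline(x, y)
estimate = float(spline(1.5))    # close to 1.5**2 = 2.25

# Integration: numerically integrate sin(t) from 0 to pi (exact answer: 2)
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Optimization: minimize (t - 3)**2 + 1 (exact minimizer: t = 3)
result = optimize.minimize_scalar(lambda t: (t - 3) ** 2 + 1)

# Image processing: count connected regions of nonzero pixels in a tiny binary image
image = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
])
labeled, num_regions = ndimage.label(image)    # num_regions == 2
```

Every function takes and returns NumPy arrays or plain floats, which is what "works with NumPy arrays" means in practice.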


in short

NumPy: N-dimensional array container, matrices

pandas: tabular data container

SciPy: scientific computing function library

Matplotlib: plotting
