Python data analysis toolkits: what to learn and how to install them [beginner]

 

Introduction
The text and images in this article come from the internet and are intended only for learning and exchange, not for any commercial purpose. Copyright belongs to the original author; if you have any concerns, please contact us so that we can address them.

First, let's take a look

Mac version

Install the packages below in order, according to what you need. If you have not studied data analysis yet, I suggest learning the basics of Python and web scraping first.
python3 -m pip install numpy

python3 -m pip install --upgrade pip

 

python3 -m pip install pandas

python3 -m pip install wordcloud

python3 -m pip install matplotlib

python3 -m pip install scipy

python3 -m pip install -U scikit-learn
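Once the commands above have run, it can help to confirm what actually got installed. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `report` are made up for this illustration, and the package list simply mirrors the pip commands in this article.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as they are passed to pip in this article.
PACKAGES = ["numpy", "pandas", "wordcloud", "matplotlib", "scipy", "scikit-learn"]


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(packages=PACKAGES):
    """Map each distribution name to its version (None when not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, ver in report().items():
        print(f"{name:15s} {ver or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed again with the matching `python3 -m pip install` command.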

Matplotlib 
Matplotlib is a visualization module for Python. It makes it easy to produce line charts, pie charts, bar charts, and other professional-quality graphics. If this is hard to follow, you probably need to learn the basics first.


With Matplotlib you can customize any aspect of the chart you produce. It supports the different GUI backends on all operating systems and can export figures to common vector and raster formats such as PDF, SVG, JPG, PNG, BMP, and GIF (which formats are available depends on the backend in use). By plotting data, we can turn dry numbers into charts that people take in easily.
Matplotlib is a Python package built on NumPy. It provides command-style tools for plotting data and is mainly used to draw statistical graphics.
Matplotlib exposes a range of properties for customizing its default settings. Nearly every default can be controlled: figure size, dots per inch (DPI), line width, color and style, subplots, axes, grid properties, and text and font properties.
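As an illustration of these settings, here is a short sketch that overrides a few defaults through `rcParams`, draws a line chart and a bar chart as two subplots, and saves the figure in several formats. The data, file names, and chosen values are made up for the example, and the non-interactive `Agg` backend is an assumption so that the script runs without a GUI.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no GUI needed
import matplotlib.pyplot as plt
import numpy as np

# Override some defaults globally: figure size, DPI, line width, grid.
plt.rcParams.update({
    "figure.figsize": (6, 4),
    "figure.dpi": 100,
    "lines.linewidth": 2.0,
    "axes.grid": True,
})

x = np.linspace(0, 2 * np.pi, 100)

# One figure with two subplots side by side.
fig, (ax_line, ax_bar) = plt.subplots(1, 2)

# Line chart with an explicit color and line style.
ax_line.plot(x, np.sin(x), color="tab:blue", linestyle="--", label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")
ax_line.legend()

# Bar chart with made-up category values.
ax_bar.bar(["a", "b", "c"], [3, 7, 5], color="tab:orange")
ax_bar.set_title("Bar chart")

fig.tight_layout()

# Export the same figure to one raster and two vector formats.
out_dir = tempfile.mkdtemp()
saved = []
for ext in ("png", "svg", "pdf"):
    path = os.path.join(out_dir, f"demo.{ext}")
    fig.savefig(path)
    saved.append(path)
plt.close(fig)
```

Settings changed through `rcParams` apply to every figure created afterwards; per-plot arguments such as `color` and `linestyle` override them for a single call.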

 

NumPy 
NumPy provides two basic objects: ndarray and ufunc. An ndarray is a multidimensional array that stores elements of a single data type; a ufunc is a function that operates on arrays element by element. NumPy's features:

  • N-dimensional arrays: a fast, memory-efficient multidimensional array that supports vectorized math operations.
  • Standard mathematical operations can be applied to an entire array without writing loops.
  • It is easy to pass data to external libraries written in lower-level languages (C/C++), and just as easy for those libraries to return data as NumPy arrays.

NumPy does not offer high-level data analysis features, but working with it gives you a deeper understanding of NumPy arrays and array-oriented computing.
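The points above about ndarray, ufunc, and loop-free arithmetic can be sketched as follows. The array values are arbitrary, and the explicit loop is included only to show that it gives the same result as the vectorized expression.

```python
import numpy as np

# An ndarray stores elements of one data type in an N-dimensional grid.
a = np.arange(6, dtype=np.float64).reshape(2, 3)  # rows: [0, 1, 2] and [3, 4, 5]

# Arithmetic applies to the whole array at once; no Python loop is needed.
scaled = a * 2 + 1

# np.sqrt and np.add are ufuncs: they operate element by element.
roots = np.sqrt(a)
total = np.add(a, scaled)

# The same computation as `scaled`, written with explicit loops for comparison.
looped = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j] * 2 + 1

print(a.dtype, a.shape)
print(np.array_equal(scaled, looped))
```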

 

Pandas 

Pandas is a data analysis package for Python. It was originally developed as a tool for financial data analysis, so it provides good support for time series analysis. 
Pandas was created to solve data analysis tasks. It incorporates a large number of libraries and some standard data models, and provides the tools needed to work efficiently with large data sets. It offers many functions and methods for processing data quickly and conveniently, along with high-level data structures and tools that make data analysis fast and simple. It is built on NumPy and makes NumPy-based applications easier to write.

  • Data structures with labeled axes that support automatic or explicit data alignment. This prevents common errors caused by misaligned data and by working with data from different sources that is indexed differently.
  • Missing data is easier to handle with Pandas.
  • Merging and other relational operations found in popular databases (for example, SQL-based databases).

Pandas is the best tool for cleaning and organizing data.
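The three features listed above (label alignment, missing-data handling, and database-style merging) can be illustrated with a small sketch. All labels and values here are invented for the example.

```python
import pandas as pd

# Two Series from "different sources", indexed by different labels.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Addition aligns on the index labels; a label present in only one
# Series yields NaN (missing) instead of a silently wrong pairing.
aligned_sum = s1 + s2

# Missing data: count it, fill it, or drop it.
n_missing = int(aligned_sum.isna().sum())
filled = s1.add(s2, fill_value=0)  # treat an absent label as 0
dropped = aligned_sum.dropna()

# A SQL-style inner join of two tables on a shared key column.
left = pd.DataFrame({"key": ["b", "c", "d"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "c"], "y": [4, 5, 6]})
joined = left.merge(right, on="key", how="inner")

print(aligned_sum)
print(joined)
```

Here `how="inner"` keeps only keys present in both tables, which corresponds to an INNER JOIN in SQL.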

 

Scikit-Learn 
Scikit-Learn is a Python machine learning module released under the open-source BSD license. 
Scikit-Learn depends on NumPy and SciPy (Matplotlib is needed only for its plotting features). Its main functionality falls into six areas: classification, regression, clustering, dimensionality reduction, model selection, and data preprocessing. 
Scikit-Learn ships with some classic data sets, such as the iris and digits data sets for classification and the Boston house prices data set for regression analysis (the Boston data set was removed in Scikit-Learn 1.2). Each data set is a dictionary-like structure: the data is stored in the .data member and the output labels in the .target member. Built on SciPy, Scikit-Learn provides a set of commonly used machine learning algorithms through a unified interface, which makes it easier to apply popular algorithms to a data set. 
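A minimal sketch of the bundled iris data set and the unified fit/score interface might look like this. The choice of a k-nearest-neighbors classifier, the 70/30 split, and the random seed are assumptions made for the example, not something the article prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()               # dictionary-like object
X, y = iris.data, iris.target    # features in .data, labels in .target
print(X.shape, y.shape)          # (150, 4) (150,)

# Hold out 30% of the samples for evaluation (assumed split for this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Every estimator follows the same pattern: construct, fit, then predict/score.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Swapping in another classifier only changes the line that constructs `clf`; the `fit` and `score` calls stay the same.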
Beyond Scikit-Learn there are other libraries, such as NLTK for natural language processing, Scrapy for crawling web site data, Pattern for web mining, and Theano for deep learning.

SciPy 
SciPy is a convenient, easy-to-use Python package designed for science and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, and ordinary differential equation solvers. SciPy depends on NumPy and offers many user-friendly and efficient numerical routines, such as numerical integration and optimization.
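As a small illustration of the numerical integration and optimization routines mentioned above, the sketch below integrates sin(x) over [0, π] and minimizes a simple quadratic. Both functions are chosen only because their exact answers are known.

```python
import math

from scipy import integrate, optimize

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2.
area, abs_err = integrate.quad(math.sin, 0.0, math.pi)


# Optimization: f(x) = (x - 3)^2 + 1 has its minimum value 1 at x = 3.
def f(x):
    return (x - 3.0) ** 2 + 1.0


result = optimize.minimize_scalar(f)

print(f"integral = {area:.6f} (estimated error {abs_err:.1e})")
print(f"minimum at x = {result.x:.6f}, f(x) = {result.fun:.6f}")
```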

 

Python has NumPy, a numerical toolkit as powerful as Matlab; Matplotlib, a plotting toolkit; and SciPy, a scientific computing toolkit. 
Python can process data directly, and Pandas lets you manipulate data almost as you would with SQL. Matplotlib visualizes data and results so that you can understand the data quickly. Scikit-Learn provides support for machine learning algorithms, and Theano provides a deep learning framework (which can also use GPU acceleration).

 


Origin www.cnblogs.com/chengxuyuanaa/p/11985453.html