Day98 Data Analysis (Part 1)

I. Getting to know data analysis

1. What is data analysis

In the 21st century, data is among the most valuable resources: whoever masters data holds the lifeblood of the era. Data analysis is the set of tools that lets us put this data to work.

 

2. What data analysis can do

A few simple examples of what data analysis can do:

1. Taobao can recommend products based on a user's purchase history, search history, and even what the user posts on social media.
2. Investors can decide when to buy and sell based on the relevant stock data.
3. Toutiao (Today's Headlines) may apply data analysis algorithms to rank the items in its news feed.
4. iQIYI can provide users with personalized movie recommendations.

Data analysis is not limited to recommendation systems like those above. In the pharmaceutical industry, for example, it can be used to predict which compounds are likely to make more effective drugs.

Data analysis is therefore essential to the future of virtually every company. Society now generates enormous amounts of data, and with solid data analysis skills we will be well equipped for a wide range of jobs.

 

3. The process of data analysis

1. Define the requirements

2. Prepare the data
Preparing the data takes three steps:
(1) Collect the data
Gather data through various channels and load it into a Jupyter Notebook.
(2) Assess the data
The main task here is to find out whether the data has quality or structural problems.
(3) Clean the data
Modify, replace, or delete values as needed so that the data is of high quality and well structured.

3. Analyze the data
Use NumPy, pandas, and other tools to analyze the data.

4. Present the results
Use matplotlib to visualize the results.

PS: when the data volume is very large, use big data tools such as Hadoop and Spark.
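The collect, assess, clean, and analyze steps of this process can be sketched in a few lines of pandas. This is a minimal illustration, not a complete workflow: the inline CSV, its `city`/`sales` columns, and the helper `run_pipeline` are hypothetical stand-ins for data you would actually collect, and the display step (which would use matplotlib) is left out.

```python
import io

import pandas as pd

# (1) Collect: a small hypothetical CSV stands in for data gathered from real channels
RAW_CSV = """city,sales
Beijing,120
Shanghai,
Beijing,120
Shenzhen,95
Shanghai,130
"""


def run_pipeline(raw_csv: str) -> pd.Series:
    """Hypothetical helper: load, assess, clean, and summarize the sample data."""
    df = pd.read_csv(io.StringIO(raw_csv))

    # (2) Assess: count missing sales values and exact duplicate rows
    n_missing = int(df["sales"].isna().sum())
    n_duplicates = int(df.duplicated().sum())
    print(f"missing sales values: {n_missing}, duplicate rows: {n_duplicates}")

    # (3) Clean: drop duplicate rows, then rows without a sales figure
    df = df.drop_duplicates().dropna(subset=["sales"])

    # Analyze: total sales per city
    return df.groupby("city")["sales"].sum()


totals = run_pipeline(RAW_CSV)
print(totals)
```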

 

4. Introduction to commonly used libraries

NumPy

NumPy (short for Numerical Python) is the foundational package for numerical computing in Python. It provides the data structures, algorithms, and interfaces needed by most numerical applications in Python, including:

ndarray, a fast and efficient multidimensional array object
functions for element-wise computation on arrays and mathematical operations between arrays
tools for reading and writing array-based datasets to disk
linear algebra operations, Fourier transforms, and random number generation
tools for integrating code written in C, C++, and Fortran with Python
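A short sketch that touches most of the NumPy features listed above. The array values are arbitrary examples, and an in-memory `io.BytesIO` buffer stands in for a file on disk:

```python
import io

import numpy as np

# ndarray: a fast, multidimensional array with a single element type
data = np.array([[1.5, -0.1, 3.0],
                 [0.0, -3.0, 6.5]])
print(data.shape, data.dtype)  # (2, 3) float64

# Element-wise math on whole arrays, no explicit Python loop
scaled = data * 10
doubled = data + data

# Reading/writing arrays with np.save/np.load (BytesIO stands in for a file)
buf = io.BytesIO()
np.save(buf, data)
buf.seek(0)
restored = np.load(buf)

# Linear algebra: matrix product of data (2x3) with its transpose (3x2)
gram = data @ data.T

# Random numbers from a seeded generator, so results are reproducible
rng = np.random.default_rng(seed=0)
noise = rng.standard_normal(size=data.shape)

# Discrete Fourier transform of a short signal
spectrum = np.fft.fft([1.0, 0.0, -1.0, 0.0])
```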

 

Pandas

pandas will be the main tool we use for data analysis.
Its data structures and data-processing tools are designed to make data cleaning and analysis in Python fast and convenient.
pandas is often used together with other numerical tools, and it adopts much of NumPy's array-based computing style.
The biggest difference between pandas and NumPy is that pandas is designed for tabular or heterogeneous data, whereas NumPy is best suited to homogeneous numerical array data.
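The difference between pandas and NumPy shows up as soon as a table mixes types. In this sketch the column names and values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Tabular, heterogeneous data: each column keeps its own type
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],  # text
    "age": [25, 32, 47],                # integers
    "score": [88.5, 92.0, 79.5],        # floats
})
print(df.dtypes)

# NumPy-style vectorized operations work column by column
df["score_pct"] = df["score"] / 100

# Boolean filtering selects rows while keeping every column's type intact
older = df[df["age"] > 30]

# A NumPy array, by contrast, holds one dtype for every element:
# mixing ints and floats upcasts everything to float64
arr = np.array([[25, 88.5], [32, 92.0]])
print(arr.dtype)  # float64
```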

 

matplotlib

matplotlib is the most popular Python library for producing charts and other data visualizations.
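A minimal matplotlib sketch that draws two line plots and saves them to a PNG file. The `Agg` backend and the output location are choices made for this example so that it runs without a display. In a Jupyter notebook you would normally leave out the backend line and let the figure render inline.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A basic line chart")
ax.legend()

# Save to the system temp directory (an example location) and release the figure
out_path = os.path.join(tempfile.gettempdir(), "line_chart.png")
fig.savefig(out_path, dpi=100)
plt.close(fig)
print("saved to", out_path)
```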

 

SciPy

SciPy is a collection of packages for scientific computing that addresses a range of standard problem domains. It provides powerful scientific computing capabilities such as matrix analysis, signal processing, and mathematical analysis.
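A few SciPy submodules in action. This sketch picks `scipy.integrate`, `scipy.linalg`, and `scipy.optimize` as examples, and the functions and matrix are arbitrary:

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Numerical integration: integral of sin(x) from 0 to pi (exact value: 2)
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Linear algebra: solve the linear system A @ v = b
A = np.array([[3.0, 2.0],
              [1.0, 2.0]])
b = np.array([2.0, 0.0])
solution = linalg.solve(A, b)  # exact solution: [1.0, -0.5]

# Optimization: minimize f(t) = (t - 3)^2 + 1, whose minimum is at t = 3
res = optimize.minimize_scalar(lambda t: (t - 3) ** 2 + 1)

print(area, solution, res.x)
```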

 

IPython and Jupyter Notebook

IPython is an enhanced Python interpreter. Jupyter Notebook is a web-based code notebook that also originated in the IPython project.

 

II. Installing and using IPython

Installation

pip3 install ipython

IPython lets you write Python code interactively in the terminal. Unlike the native python3 interpreter, it highlights syntax, which makes code easier to write.

It also offers Tab completion for package names, attributes, and so on, giving hints that make it friendlier to use than the native interpreter.

 

Using IPython

From now on, we can use IPython to quickly test small pieces of code in the terminal.

 

III. Installing and using Jupyter Notebook

There are two ways to install and start it.

Method 1: command line

Installation:
    pip3 install jupyter
Start:
    Change to the directory you want to work in, then run: jupyter notebook

With this method, the other data analysis packages must be installed manually.

Method 2: using the Anaconda distribution

Pros: it bundles the basic data analysis packages, roughly 200 scientific computing packages in total.

First go to the official website, https://www.anaconda.com/, and download the version for your system.

During installation, leave the options highlighted in red unchecked and accept the defaults for everything else.
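Whichever installation method you choose, the standard library's `importlib.util.find_spec` offers a quick way to see which of the packages mentioned in this post are importable. The package list and the `check_packages` helper below are assumptions for illustration. Note that these are import names (for example `IPython`, and `notebook` for Jupyter Notebook), which can differ from pip package names.

```python
import importlib.util

# Import names of packages discussed in this post (an illustrative list)
PACKAGES = ["numpy", "pandas", "matplotlib", "scipy", "IPython", "notebook"]


def check_packages(names):
    """Hypothetical helper: map each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_packages(PACKAGES).items():
        print(f"{name:12s} {'installed' if ok else 'missing'}")
```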

1. Basic Anaconda features

 

2. Jupyter Notebook editor features

 

Create a new Python 3 notebook

Shortcuts

Cell is green: edit mode
Cell is blue: command mode (press Esc to enter it)
 1. Run the current cell and select the cell below: Shift + Enter
 2. Run the current cell: Ctrl + Enter
 3. Insert a new cell above the current cell: Esc, then A
 4. Insert a new cell below the current cell: Esc, then B
 5. Delete the current cell: Esc, then D, D
 6. Switch a cell to Markdown: Esc, then M (Esc, then Y switches it back to code)

Note that a notebook is not a .py file; its file extension is .ipynb.
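An `.ipynb` file is a JSON document that stores the notebook's cells (and their outputs) rather than plain Python source. The sketch below builds a deliberately minimal notebook by hand and pulls out its code cells with the standard `json` module. The `code_cells` helper is hypothetical, and real notebooks saved by Jupyter carry more metadata. For actual conversion, `jupyter nbconvert --to script` exports a notebook as a `.py` file.

```python
import json

# A deliberately minimal notebook in the nbformat version 4 JSON layout.
# Real notebooks written by Jupyter also store kernel info, outputs, etc.
NOTEBOOK_TEXT = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["import numpy as np\n", "np.arange(3)"]},
    ],
})


def code_cells(notebook_text):
    """Hypothetical helper: return the source of every code cell as one string."""
    nb = json.loads(notebook_text)
    # "source" may be a list of lines or a single string; "".join handles both
    return ["".join(cell["source"]) for cell in nb["cells"]
            if cell["cell_type"] == "code"]


print(code_cells(NOTEBOOK_TEXT))
```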

 

 


Source: www.cnblogs.com/sxchen/p/11973032.html