Can't get your data analysis environment set up? Look here!

Hello, I am Yuechuang.

It is not easy to learn a programming language by yourself. From my own experience, you need to overcome several obstacles, in mindset, stamina, and ways of thinking, before you finally become a good programmer. So to master a language, you need to feel it with your heart and appreciate the tacit understanding between you and "her".

Unfortunately, too many friends do not fall halfway; they never even start. Yes, they have not so much as touched the delicate hand of Princess Python.

Which version of Python should I download?

How to configure the Python environment?

Which Python IDEs are the best to use? And so on. Our challenge is that there are too many choices. Standing at the gate of the orchard, we gaze at the watermelons and grapes, full of ambition yet not daring to step forward.

If you have not configured a Python environment yet, congratulations: follow this article and you will end up with a programming platform built on Python 3.7, with Jupyter as the main programming environment. This article will also take you through installing the common data analysis and visualization tools. One-stop shopping, no worries.

If you have already installed Python 3.7 or Anaconda, you can still follow along. The tools are not the point; the ability to configure an environment is.

But if you installed Python from an unknown source rather than the official website (for example, a forum download of unknown origin), I strongly recommend uninstalling it (Python is uninstalled like ordinary software, from the Control Panel). After all, "sharpening the axe does not delay the woodcutting": some unofficial builds are incomplete, and no one can predict what bugs you will run into.

This article uses 64-bit Windows 10 as an example to demonstrate how to build a local Python data analysis environment.

1. Install Python environment

1.1 Python software download

As a fine young person of the 21st century, you always want the genuine article. A pirated copy?! That is a leftover from the previous generation, especially when the genuine version is free. It is recommended to download from the official Python website https://www.python.org/ . The site may load slowly for some friends. Don't worry: unless you have a network problem, the website is accessible.

image description

After entering the official website, just follow the prompts in the picture: select the Windows tab to reach the release details page for Python on Windows (Releases Page). Version 3.7 or later is recommended. Python 2.7 stops receiving all updates in 2020 and, like the once-brilliant Windows XP, will slowly become the dust of history, so this column does not recommend installing Python 2.7.

When selecting a Python version, make sure it matches your PC. If your system is 64-bit, download the x86-64 version; otherwise select the regular version directly. Remember to keep the bitness of Python consistent with your computer, or you will be crying later over a pit you dug yourself.

image description

For those using macOS, please select the Mac OS X tab to download. The steps are basically the same.

1.2 Python software installation

The Python installation process is very user-friendly. Just double-click the installer, set the installation path, remember to check "Add Python 3.7 to PATH", and then click Next until the installation is complete.

image description

In the picture above, "Add Python 3.7 to PATH" means that the installer adds the installation path to the system environment variables during installation. This item must be checked!

We have now completed the initial Python 3.7 installation. Type python at the cmd command line to go straight into the Python environment:

C:\Users\Administrator>python
Python 3.7.4 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
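
Once the interpreter starts, the version and bitness discussed in section 1.1 can also be checked from inside Python. The following is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original article.

```python
import platform
import struct
import sys

# Bits of the running interpreter: size of a C pointer in bytes, times 8.
python_bits = struct.calcsize("P") * 8

print("Python version :", platform.python_version())
print("Interpreter    :", python_bits, "bit")
print("Machine        :", platform.machine())  # e.g. AMD64 on 64-bit Windows

# This article assumes Python 3.7 or later.
meets_requirement = sys.version_info >= (3, 7)
print("3.7 or later   :", meets_requirement)
```

If the interpreter reports 32 bit on a 64-bit machine, the regular installer was probably chosen instead of the x86-64 one.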

Next, enter our first line of Python and start our data analysis journey:

>>> print("Life is short, I do data analysis in python!")
Life is short, I do data analysis in python!

If Python is the princess, how could she go without a maid? We often say that Python is used in a wide range of fields, such as data analysis, web crawlers, network programming, artificial intelligence, and operations and maintenance. That is because Python has a great many third-party libraries, which greatly enrich the Python ecosystem and make Python capable of almost anything. So how do we install these third-party libraries?

This is where pip comes in; it is the little expert in this area. Intuitively, pip is a small tool tailor-made for managing Python's third-party libraries, playing the role of the maid. Let's try it out: enter the following command in the cmd window to view pip's installation path and version:

C:\Users\Administrator>pip --version
pip 19.1 from d:\users\lemeng\appdata\local\programs\python\python37\lib\site-packages\pip-19.1-py3.7.4.egg\pip (python 3.7)

If the system prompts that pip is not the latest version, run the following command:

C:\Users\Administrator>python -m pip install --upgrade pip

Is pip really that simple? More on that later.

2. Jupyter Notebook, the most suitable programming tool for data analysis

To be precise, Jupyter Notebook is not only suitable for data analysis, but also very suitable for beginners practicing Python.

Jupyter Notebook is a Python editor whose key feature is its "question and answer" style of interaction. It also lays out your notes, your program, and the result of each command neatly together.

This column uses Jupyter Notebook throughout the Python data analysis process. In practice, you enter a statement and it returns the result of executing it. The program's memory is held after execution: variables are not destroyed and memory is not released until the program is closed.
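
As a rough illustration of this behavior, the sketch below marks two notebook cells with comments. The data is hypothetical; the point is only that a variable defined in one cell is still available when a later cell runs.

```python
# Cell 1: load some numbers (hypothetical sample data)
scores = [88, 92, 79]

# Cell 2: executed later as a separate cell; `scores` is still in memory
average = sum(scores) / len(scores)
print(round(average, 2))
```

This is what makes it possible to inspect the data first and decide on the next step afterwards, without re-running everything from the top.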

This feature is particularly important in data analysis, where every approach depends on the characteristics of the data itself. Especially for medium and large data tables, only once you are familiar with the data's characteristics can you decide the next step of the analysis. This is also the biggest difference between data programming and network programming.

Let's install Jupyter Notebook first, and then experience its convenience through an example.

We recommend a one-step installation with pip. Enter the following command in the cmd window:

C:\Users\Administrator>pip install jupyter

Here is also the recommended way to switch pip to a mirror source, using Windows 10 as an example:

  1. First, make file extensions visible: in File Explorer, View -> check the file name extensions option

  2. Open a File Explorer window (shortcut: Win + E) and enter in the address bar: %APPDATA%

    image-20200803172555191

  3. In the folder that opens, create a pip folder, then inside the pip folder create a file named pip.ini

  4. Enter the content:

    [global]
    index-url = http://mirrors.aliyun.com/pypi/simple/
    [install]
    trusted-host=mirrors.aliyun.com
    

  5. That completes the source change.

  6. Supplement:

    Starting from pip 10.0.0, there is a config subcommand for changing the configuration, so you no longer need to care where the configuration file lives on different operating systems.

    For details, see the discussion: Create a command to make it easy to access the configuration file · Issue #1736 · pypa/pip

    Practical example:

    # Aliyun source
    pip config set global.index-url http://mirrors.aliyun.com/pypi/simple/
    
    # Douban source
    pip config set global.index-url https://pypi.douban.com/simple
    
    # Alibaba Cloud http://mirrors.aliyun.com/pypi/simple/
    # University of Science and Technology https://pypi.mirrors.ustc.edu.cn/simple/
    # Douban http://pypi.douban.com/simple/
    # Tsinghua University https://pypi.tuna.tsinghua.edu.cn/simple/
    # University of Science and Technology of China http://pypi.mirrors.ustc.edu.cn/simple/
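
The pip.ini from step 4 can also be produced with a short script instead of by hand. This is only a sketch using the standard-library configparser module: it renders the file content as text, and the lines that would actually write to %APPDATA%\pip\pip.ini are left as comments because that location is an assumption taken from steps 2 and 3.

```python
import configparser
import io

# Build the same settings as the pip.ini shown in step 4.
config = configparser.ConfigParser()
config["global"] = {"index-url": "http://mirrors.aliyun.com/pypi/simple/"}
config["install"] = {"trusted-host": "mirrors.aliyun.com"}

# Render to text first so the result can be inspected before saving.
buffer = io.StringIO()
config.write(buffer)
pip_ini_text = buffer.getvalue()
print(pip_ini_text)

# To actually save it on Windows (assumed location: %APPDATA%\pip\pip.ini):
# import os, pathlib
# target = pathlib.Path(os.environ["APPDATA"]) / "pip" / "pip.ini"
# target.parent.mkdir(exist_ok=True)
# target.write_text(pip_ini_text, encoding="utf-8")
```

Printing the text first makes it easy to compare against the hand-written version before anything on disk is touched.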
    

After the installation reports success, it is recommended to set your usual Python working directory, so that new and saved files are placed there by default. Continue in the cmd window:

C:\Users\Administrator>jupyter notebook --generate-config
Writing default config to C:\Users\Administrator\.jupyter\jupyter_notebook_config.py

Following the prompt from the previous step, open the jupyter_notebook_config.py file for editing, find c.NotebookApp.notebook_dir, uncomment the line (remove the leading #), and modify it as shown in the figure below (fill in the path according to your own situation, and take care to avoid Chinese characters in the path, or you will be digging a pit for yourself). Then save and close.
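
Since the figure is not reproduced here, the following sketch shows what the edited line might look like. The helper name notebook_dir_setting and the folder D:/python_study are hypothetical, and the setting name assumes the classic Notebook versions this article targets. The helper also rejects non-ASCII paths, in line with the advice above.

```python
def notebook_dir_setting(directory):
    """Return the line to put in jupyter_notebook_config.py for `directory`."""
    # str.isascii() exists from Python 3.7; it rejects paths with Chinese characters.
    if not directory.isascii():
        raise ValueError("Use an ASCII-only path: " + directory)
    return "c.NotebookApp.notebook_dir = " + repr(directory)

# "D:/python_study" is a hypothetical folder; replace it with your own.
print(notebook_dir_setting("D:/python_study"))
```

Forward slashes work in Windows paths here and avoid having to escape backslashes in the config file.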

image description

So far, the Python environment has been configured. Let's take a look at it.

In the cmd window, enter jupyter notebook to start the environment:

C:\Users\Administrator>jupyter notebook

From the New menu, create a new Python 3 notebook, then enter the following in the new page:

for i in range(10):
    print(i, end=",")

The jupyter notebook interface and the result look like this:

image description

The standard Jupyter toolbar has save, cut, copy, paste, run, and stop buttons. Hover the mouse over a button and a tooltip appears. On the whole it is quite user-friendly, and I believe getting started should pose no difficulty.

Friends, please note: here and in the following sections of this column, unless otherwise specified, the Python demonstration programs are run in Jupyter Notebook.

3. Three artifacts of data analysis

  • Numpy, a basic module for scientific computing

Simply put, Numpy provides an N-dimensional array container. With Numpy you can easily transform and compute on arrays, far more efficiently than with Python's built-in nested lists; not only is runtime efficiency high, development efficiency is high too. Many of the data analysis tools that follow are built on Numpy, so anyone who wants to do data analysis well must install and understand Numpy.

It is recommended to install Numpy with pip. There are normally two methods: online installation and offline installation. If your network connection is acceptable, enter in the cmd window:
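
To make the comparison with nested lists concrete, here is a minimal sketch (it assumes Numpy is already installed, as described below; the sample data is made up). It doubles every element of a small table, first with plain lists and then with a Numpy array.

```python
import numpy as np

# A small 2 x 3 table as a nested Python list (hypothetical sample data)
nested = [[1, 2, 3], [4, 5, 6]]

# Doubling every element with plain lists needs a nested comprehension
doubled_list = [[value * 2 for value in row] for row in nested]

# With a Numpy array the same operation is one expression
array = np.array(nested)
doubled_array = array * 2

print(doubled_list)
print(doubled_array.tolist())
print(array.shape, array.T.shape)  # transposing swaps rows and columns
```

The array version applies the operation to the whole container at once, which is where both the runtime and the development efficiency come from.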

C:\Users\Administrator>pip install numpy

pip will automatically search for a Numpy version matching your Python version and install it.

If your network is poor, the connection to the overseas site may be unstable and the download is very likely to fail. Here I recommend the Alibaba Cloud mirror site http://mirrors.aliyun.com/pypi/simple/ . Friends can go to the Alibaba Cloud mirror, download the corresponding package to a local location, for example the C: drive, and install it locally. The installation command needs to include the local path of the file, as follows:

C:\Users\Administrator>pip install c:/numpy-1.17.2-cp37-cp37m-win_amd64.whl

Here is how to find the version that suits you: cp37 means it is for Python 3.7, and win_amd64 means it is for the 64-bit Windows platform. A whl file is essentially a compressed archive containing py files and compiled pyd files, ready for easy installation.

In later parts of this column, some other libraries will also need to be installed with pip. Friends can try offline installation from the Alibaba Cloud mirror site in the same way. (You can also change the source, as described earlier.)

In essence, online installation goes to the site, finds the appropriate whl file, and installs it.

After installation is complete, we try importing the package. If the import succeeds, Numpy was installed correctly. In Jupyter Notebook, enter and execute the following:
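
The naming rule can be checked mechanically. The sketch below splits a wheel file name into its dash-separated tags, following the standard wheel layout name-version[-build]-python tag-abi tag-platform tag; the function name parse_wheel_name is illustrative only.

```python
import sys

def parse_wheel_name(filename):
    """Split a wheel file name into its dash-separated tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a .whl file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }

tags = parse_wheel_name("numpy-1.17.2-cp37-cp37m-win_amd64.whl")
print(tags)

# The tag a CPython interpreter needs, e.g. cp37 for Python 3.7
my_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
print("This interpreter needs:", my_tag)
```

If the python tag of the downloaded file differs from the tag printed for your interpreter, pip will refuse to install that file.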

# Import numpy
import numpy as np

  • Pandas, a tool born for data analysis

To put it bluntly, Pandas is really a table container that also provides a lot of slick operations, which can meet all kinds of everyday "plug-in" needs.

Everyone has used Excel. Excel offers many operations, such as filtering, functions, sorting, pivot tables, charting, copying, and so on.

But in the era of big data, Excel has many limitations: automation depends on VBA, and only to a limited degree; a single Excel sheet has limited capacity, and beyond 100,000 rows it struggles and runs inefficiently; it is not compatible with other tools; its statistical functions are limited; it cannot be customized...

For Pandas, these are not a problem.

Pandas incorporates a large number of libraries and some standard data models, and provides the tools needed to manipulate large data sets efficiently. Relying on Python syntax, it easily supports functional and object-oriented programming, connects easily to various databases, and lets you customize all kinds of functions to the characteristics of the data set. Pandas is also a foundation for data mining and artificial intelligence.
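
For readers coming from Excel, the sketch below shows how filtering, sorting, and a pivot table look in Pandas. It assumes Pandas is already installed (see the next paragraph), and the sales table is made-up sample data.

```python
import pandas as pd

# Hypothetical sales records, the kind of table you might keep in Excel
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "product": ["A", "A", "B", "B"],
    "amount": [120, 80, 200, 50],
})

# Filter: rows with amount of at least 100
large = sales[sales["amount"] >= 100]

# Sort: highest amount first
ranked = sales.sort_values("amount", ascending=False)

# Pivot: total amount per region and product
pivot = sales.pivot_table(index="region", columns="product",
                          values="amount", aggfunc="sum")

print(large)
print(ranked)
print(pivot)
```

Each step is a single expression, and the same code works unchanged whether the table has four rows or several hundred thousand.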

The installation of Pandas is the same as Numpy. Note that Pandas is built on top of Numpy, so Numpy comes first and Pandas second. (With online installation, pip installs Numpy automatically as a dependency if it is missing; with offline whl files you need to keep this order yourself.)

C:\Users\Administrator>pip install pandas

If the network is slow, offline installation from Alibaba Cloud is recommended; see the Numpy section for the process. (Or change the source.)

After the installation is complete, we try importing the package. If the import succeeds, everything went well.

# Import pandas
import pandas as pd

  • Matplotlib, a must-learn visualization tool for beginners

matplotlib is a plotting tool built on Numpy that can easily draw publication-quality figures, with results far better than Excel. Developers can generate a plot with just a few lines of code. In general it can draw line charts, scatter plots, bar charts, pie charts, histograms, subplots, and more.

The installation process is also very simple. Just like Numpy, type in the command line:

C:\Users\Administrator>pip install matplotlib 

If the network is slow, offline installation from Alibaba Cloud is recommended; see the Numpy section for the process.

After installation is complete, let's look at a demo and see what sparks fly when Jupyter and a visualization tool come together. Enter the following program:

# An IPython magic command (it works in Jupyter/IPython, not in plain Python) that displays figures inside Jupyter Notebook
%matplotlib inline
# Imports; these aliases are the established convention
import matplotlib.pyplot as plt
import numpy as np

# Generate 100 evenly spaced values over the interval 0 to 2π
x = np.linspace(0, 2*np.pi, 100)
# Compute sin for each x and assign the result to y
y = np.sin(x)
# Plot
plt.plot(x, y)

image-20200803201039003

The biggest feature of plotting with Matplotlib is that it relies on Python: cleaning the data, interacting with the data, and interacting with a graphical interface are all convenient and can be done in one stop. In the following courses, I will show you in detail how to use this set of tools to do some cool things.

4. Pyecharts, a national goddess-level visualization tool

Before that, let's first talk about Echarts.

Echarts is an open source visualization library implemented in JavaScript. Since its release it has quickly won praise from friends: cool effects, efficient, interactive, highly customizable, and so on. I won't list all the compliments one by one.

For friends learning Python, the only headache is that it requires some knowledge of JavaScript, which seems a little unfriendly, especially for newcomers.

But this is no problem at all for those who follow this column. Here we recommend a handy tool, Pyecharts: its syntax is pure Python, and its results are fully on par with Echarts.

Let's take a look at how to install Pyecharts. Use pip at the cmd command line to perform the installation:

C:\Users\Administrator>pip install pyecharts -U

Note that Pyecharts has two major versions, v0.5.x and v1.x, and the two are not compatible. v1.x fully embraces type hints, is more OOP (Object Oriented Programming) in expression, and is more flexible to write. v0.5.x is more primitive and very close to a scripting language.

In the spirit of keeping up with the times (newer means easier to use), we recommend that friends use the latest version. In fact, in my various tests v0.5.x had some compatibility problems with the notebook, while v1.x was more stable. To spare friends the trouble that version differences can cause, the examples in this column are written against version 1.8.1.

After the installation is complete, let's take a look at the effect first. Enter the following Python program:

# Import the plotting tools
from pyecharts import options as opts
from pyecharts.charts import Bar

attr = ["Shirt", "Sweater", "Chiffon shirt", "Trousers", "High heels", "Socks"]
v1 = [5, 20, 36, 10, 75, 90]
v2 = [10, 25, 8, 60, 20, 80]
bar = (
        Bar()
        .add_xaxis(attr)
        .add_yaxis("Merchant A", v1)
        .add_yaxis("Merchant B", v2)
        .set_global_opts(title_opts=opts.TitleOpts(title="Bar - basic example", subtitle="I am the subtitle"))
    )
# Render the chart in the notebook
bar.render_notebook()

image-20200803201951148

If the output above appears, everything is fine. (Version 1.9 has not been released yet!)
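
To confirm in one go that everything installed in this article can be imported, a small check like the following may help. It is a sketch using only the standard library; the list uses notebook as the import name for Jupyter Notebook, which is an assumption about what pip install jupyter provides.

```python
import importlib.util

# Import names of the packages installed in this article
packages = ["notebook", "numpy", "pandas", "matplotlib", "pyecharts"]

status = {}
for name in packages:
    # find_spec returns None when a top-level package cannot be found
    status[name] = importlib.util.find_spec(name) is not None

for name, installed in status.items():
    print("{:<12}{}".format(name, "OK" if installed else "missing"))
```

Anything reported as missing can be installed with pip in the same way as the other packages above.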

5. Summary

At this point, the entire Python-based data analysis environment has been built. In general, the significance of building this environment lies in:

  • Provides a data cleaning platform: you can easily observe patterns in the data and complete the statistics and analysis of the corresponding indicators;
  • Provides a visualization platform: a shift from traditional charting to automated, batch, and interactive visualization;
  • Expands the sources of data analysis. Python's capabilities show fully here. With this platform, your data sources are no longer limited to Excel spreadsheets; as you gradually light up your skill points, you can freely pull data from various databases, online forms, and all kinds of text files;
  • Makes your skills more comprehensive. The data you can manipulate is no longer limited to numbers; text, pictures, and more all become objects you can work with. Operations become finer-grained, efficiency improves greatly, and the amount of data you can handle grows quickly from the thousands to the millions. Larger volumes depend on better hardware and some modeling ability, but it is certain that Python data analysis skills will not go out of date;
  • Most importantly, it provides a ladder for skill upgrades and career advancement. With this platform, you can focus on becoming a data analyst, transition to big data engineer, or move up to data mining engineer, or even data scientist or algorithm expert.

So friends, what are you waiting for? Hurry up and join the study.

Origin blog.csdn.net/qq_33254766/article/details/109290993