Hello, I am Yuechuang.
It is not easy to learn a programming language by yourself. From my own experience, you need to overcome several obstacles, from mentality, to physical strength, to thinking, in order to become a good programmer in the end. Therefore, to master a language, you need to feel it with your heart, and to appreciate the tacit understanding between you and "her".
Unfortunately, too many friends fail not because they gave up halfway, but because they never started. Yes, they never even got to touch Princess Python's delicate hand.
Which version of Python should I download?
How do I configure the Python environment?
Which Python IDEs are the best to use?
And so on. Our problem is that there are too many choices. Standing at the gate of the orchard at harvest time, we gaze at the watermelons and grapes, full of ambition, yet dare not step forward.
If you have not yet configured a Python environment, then congratulations: follow this article and you will end up with a programming platform built on Python 3.7, with Jupyter as the main programming environment. This article will also walk you through installing the common tools for data analysis and visualization. One-stop shopping, no worries.
If you have already installed a Python 3.7 environment or an Anaconda environment, you can still follow along. The tools are not the point; the ability to configure an environment is.
But if your Python was not installed from the official website and its origin is unknown (for example, downloaded from some forum), I strongly recommend that you uninstall it and start over (Python is uninstalled the same way as ordinary software, from the Control Panel). After all, "sharpening the axe does not delay the woodcutting": some of those builds are incomplete, and no one can predict what bugs you will run into.
This article uses a 64-bit Windows 10 system as an example to demonstrate how to build a local Python data analysis environment.
1. Install Python environment
1.1 Python software download
As fine young people of the 21st century, we always want the genuine article. Pirated copies? Those are a relic of the previous generation, especially when the genuine version is free. I recommend downloading Python from the official website, https://www.python.org/ . The site may load slowly for some friends. Don't worry: unless there is a network problem, it can be reached.
After entering the official website, follow the prompts shown in the figure and select the Windows tab to open the page listing the Python releases for the Windows platform (the Releases page). Version 3.7 or later is recommended. Python 2.7 stops receiving all updates in 2020 and, like the once-glorious Windows XP, will slowly fade into the dust of history, so this column does not recommend installing Python 2.7.
When selecting a Python version, make sure it matches your own computer: if your system is 64-bit, choose the x86-64 build; otherwise choose the regular build. Remember to keep the bitness of Python consistent with that of your computer, otherwise you will later be shedding tears in a pit you dug yourself.
For those using macOS, select the Mac OS X tab to download. The steps are basically the same.
1.2 Python software installation
The Python installer is very user-friendly. Double-click the installer, set the installation path, remember to check "Add Python 3.7 to PATH", and then click Next until the installation is complete.
The "Add Python 3.7 to PATH" option shown in the figure above means that the installer automatically adds the installation path to the system environment variables during installation. This item must be checked!
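As a quick sanity check after installation, a small sketch like the following can confirm that the installer really put Python on PATH. It uses only the standard library; the helper name `find_on_path` is made up for this illustration.

```python
import shutil


def find_on_path(program):
    """Return the full path of `program` if it is found on PATH, else None."""
    return shutil.which(program)


# Look up the two commands this article relies on
for name in ("python", "pip"):
    location = find_on_path(name)
    print(name, "->", location if location else "not found on PATH")
```

If either command is reported as not found, the PATH checkbox was probably left unchecked. The script can be run from IDLE, which does not depend on PATH.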
We have now completed the initial Python 3.7 installation. Type python at the cmd command line to enter the Python environment directly:
C:\Users\Administrator>python
Python 3.7.4 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
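The banner already shows the version and the bitness of the interpreter. The same two facts can also be read from code, which is handy when checking someone else's machine. A minimal sketch using only the standard library; the helper names `interpreter_bits` and `is_supported` are invented for this example:

```python
import struct
import sys


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


def is_supported(version_info=sys.version_info, minimum=(3, 7)):
    """True if the given version is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= minimum


print("Python version:", sys.version.split()[0])
print("Interpreter bits:", interpreter_bits())
print("3.7 or later:", is_supported())
```

The bitness reported here is that of the Python build, which is why it should match the installer chosen earlier.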
Next, enter our first Python statement and begin our data analysis journey:
>>> print("Life is short, I do data analysis in python!")
Life is short, I do data analysis in python!
If Python is the princess, how could she go without maidservants? We often say that Python is widely used in data analysis, web crawling, network programming, artificial intelligence, operations and maintenance, and other fields. That is because Python has a great many third-party libraries, which greatly enrich the Python ecosystem and make Python capable of almost anything. So how do we install these third-party libraries?
Yes, the PIP tool is the little expert in this area. Intuitively, PIP is a small tool tailored to manage Python's third-party libraries; it plays the role of the maidservant. Let's try it out: in the cmd window, enter the following command to view PIP's installation path and version:
C:\Users\Administrator>pip --version
pip 19.1 from d:\users\lemeng\appdata\local\programs\python\python37\lib\site-packages\pip-19.1-py3.7.4.egg\pip (python 3.7)
If the system prompts that PIP is not the latest version, please continue to execute the following commands:
C:\Users\Administrator>python -m pip install --upgrade pip
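The upgrade command goes through `python -m pip` rather than calling `pip` directly, which guarantees that PIP runs against the interpreter you just installed. When the same step has to be driven from a script, one possible sketch is the following; the helper names `pip_command` and `run_pip` are assumptions made for this example, not part of PIP itself.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a command line that runs pip with the current interpreter."""
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args):
    """Run pip and return its combined text output (raises if pip fails)."""
    result = subprocess.run(
        pip_command(*args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


# Example: print(run_pip("--version"))
```

Calling `run_pip("install", "--upgrade", "pip")` would then be the scripted counterpart of the command above.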
Is PIP really that simple? We will reveal more later.
2. Jupyter Notebook, the most suitable programming tool for data analysis
To be precise, Jupyter Notebook is not only suitable for data analysis, but also very suitable for beginners practicing Python.
Jupyter Notebook is a Python editor whose defining feature is its "question and answer" style. It also lays out your notes, your programs, and the results of each step in order.
This column focuses on how Jupyter Notebook is used in the Python data analysis process. In practice, you enter a statement and it returns the result of executing it. After execution, the program's state is kept in memory; the variables are destroyed and the memory released only when the program is closed.
This feature is particularly important in the field of data analysis. In the process of data analysis, all routines are based on the characteristics of the data itself. Especially for medium and large data tables, only when you are familiar with the characteristics of the data can you analyze the data in the next step. This is also the biggest difference between data programming and network programming.
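The keep-state behaviour described above can be pictured as two notebook cells run one after the other. This is only an illustrative sketch; the variable names and the numbers are invented.

```python
# Cell 1: define some data; the kernel keeps it in memory after the cell runs
scores = [88, 92, 79, 95]

# Cell 2: executed later, but `scores` is still available
average = sum(scores) / len(scores)
print(average)
```

Because `scores` survives between cells, you can inspect the data first and decide on the next step afterwards, without reloading anything.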
Let's install Jupyter Notebook first, and then experience its convenience through an example.
We recommend a one-step installation with the PIP tool. Enter the following command in the cmd window:
C:\Users\Administrator>pip install jupyter
Here I also recommend switching PIP to a mirror source, using Windows 10 as an example:
First, make file extensions visible in File Explorer: View -> check "File name extensions".
Next, open a File Explorer window (shortcut: Win + E) and enter %APPDATA% in the address bar.
Then create a pip folder there, go into the pip folder, and create a file named pip.ini.
Enter the following content:
[global]
index-url = http://mirrors.aliyun.com/pypi/simple/
[install]
trusted-host=mirrors.aliyun.com
That completes the source change.
Supplement:
Starting from pip 10.0.0, there is a config subcommand that can be used to change the configuration, so there is no need to care about where the configuration file lives on different operating systems.
See the discussion: Create a command to make it easy to access the configuration file · Issue #1736 · pypa/pip
Practical example:
# Aliyun mirror
pip config set global.index-url http://mirrors.aliyun.com/pypi/simple/
# Douban mirror
pip config set global.index-url https://pypi.douban.com/simple
# Aliyun
http://mirrors.aliyun.com/pypi/simple/
# USTC
https://pypi.mirrors.ustc.edu.cn/simple/
# Douban
http://pypi.douban.com/simple/
# Tsinghua University
https://pypi.tuna.tsinghua.edu.cn/simple/
# University of Science and Technology of China
http://pypi.mirrors.ustc.edu.cn/simple/
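For anyone who prefers not to create pip.ini by hand, the file can also be generated with the standard library's configparser. The following is a minimal sketch under the assumption that the target is the per-user %APPDATA%\pip folder described above; the function name `write_pip_config` is hypothetical.

```python
import configparser
import os


def write_pip_config(directory, index_url, trusted_host):
    """Write a pip.ini with the given mirror into `directory`; return its path."""
    config = configparser.ConfigParser()
    config["global"] = {"index-url": index_url}
    config["install"] = {"trusted-host": trusted_host}

    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "pip.ini")
    with open(path, "w", encoding="utf-8") as handle:
        config.write(handle)
    return path


# Example for Windows (uncomment to write %APPDATA%\pip\pip.ini):
# write_pip_config(
#     os.path.join(os.environ["APPDATA"], "pip"),
#     "http://mirrors.aliyun.com/pypi/simple/",
#     "mirrors.aliyun.com",
# )
```

The call is left commented out so that nothing is overwritten by accident; the resulting file has the same two sections as the hand-written version.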
After the installation reports success, I recommend setting the folder we usually use for studying Python as the default path, so that new and saved files are placed there by default. Continue in the cmd window:
C:\Users\Administrator>jupyter notebook --generate-config
Writing default config to C:\Users\Administrator\.jupyter\jupyter_notebook_config.py
Following the prompt from the previous step, edit the jupyter_notebook_config.py file, find c.NotebookApp.notebook_dir, remove the comment marker in front of it, and complete the modification as shown in the figure below (fill in the path according to your own situation, and take care to avoid Chinese characters in the path so you do not dig a pit for yourself). Then save and close.
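Before writing a folder into the config file, it can help to check that the folder exists and that its path contains only ASCII characters. A rough sketch follows; the function name `check_notebook_dir` and the folder `D:\python_study` are placeholders invented for this example.

```python
import os


def check_notebook_dir(path):
    """Return a list of problems with `path` as a notebook folder (empty if none)."""
    problems = []
    try:
        path.encode("ascii")
    except UnicodeEncodeError:
        problems.append("path contains non-ASCII characters")
    if not os.path.isdir(path):
        problems.append("folder does not exist")
    return problems


# Hypothetical folder; replace it with your own study directory
candidate = r"D:\python_study"
print(check_notebook_dir(candidate) or "looks fine")

# The matching line in jupyter_notebook_config.py would then be:
# c.NotebookApp.notebook_dir = r"D:\python_study"
```

The raw string prefix `r` keeps the backslashes in a Windows path from being read as escape sequences.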
So far, the Python environment has been configured. Let's take a look at it.
In the cmd window, enter jupyter notebook to start the environment:
C:\Users\Administrator>jupyter notebook
From the New menu, create a new Python 3 notebook, and then enter the following on the new page:
for i in range(10):
    print(i, end=",")
The Jupyter Notebook interface and the result look like this:
The standard toolbar in Jupyter has Save, Cut, Copy, Paste, Run, and Stop buttons, and hovering the mouse over a button shows a tooltip. On the whole it is very user-friendly, and I believe getting started should not be difficult.
Please note, friends: here and in the following sections of this column, unless otherwise specified, the Python demonstration programs are run in Jupyter Notebook.
3. Three artifacts of data analysis
- Numpy, a basic module for scientific computing
Simply put, Numpy provides an N-dimensional array container. With Numpy you can easily transform and compute on arrays, far more efficiently than with Python's built-in nested lists; not only is the runtime efficiency high, the development efficiency is very high too. Many of the later data analysis tools are built on Numpy, so anyone who wants to do data analysis must install and understand Numpy.
I recommend installing Numpy with PIP. There are normally two methods: online installation and offline installation. If your network connection is acceptable, enter the following in the cmd window:
C:\Users\Administrator>pip install numpy
PIP will automatically search for a matching Numpy version based on the Python version and install it.
If your network is poor, the download is very likely to fail because the connection to the overseas site is unstable. Here I recommend the Alibaba Cloud mirror site http://mirrors.aliyun.com/pypi/simple/ . Friends can open the Alibaba Cloud mirror, download the matching package to a local location, for example the C: drive, and install it locally. The local path of the file must be included in the installation command, as follows:
C:\Users\Administrator>pip install c:/numpy-1.17.2-cp37-cp37m-win_amd64.whl
Here is how to find a version that suits you. cp37 means suitable for Python3.7, win_amd64 means suitable for windows 64bit platform. The file in whl format is essentially a compressed package, which contains py files and compiled pyd files for easy installation.
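The naming rule just described can be made concrete with a few lines of Python that split a wheel file name into its tags. This is a simplified sketch based on the standard name-version-python-abi-platform layout of wheel file names; `parse_wheel_name` is a name invented for this example, and the interpreter tag at the end assumes CPython.

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel file name into its dash-separated tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when an optional build tag is present
        raise ValueError("unexpected wheel name: " + filename)
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


info = parse_wheel_name("numpy-1.17.2-cp37-cp37m-win_amd64.whl")
print(info)

# Tag of the interpreter running this script (assumes CPython)
current_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
print("This interpreter:", current_tag, "| wheel targets:", info["python_tag"])
```

If the two tags on the last line differ, the downloaded file was built for a different Python version and PIP will refuse to install it.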
Later in this column there will be other libraries that need to be installed with PIP. Friends can try offline installation from the Alibaba Cloud mirror site; the routine is the same. (You can also change the source instead.)
In essence, online installation is to go to the site to search for the appropriate whl file for installation.
After the installation is complete, we try importing the package. If the import succeeds, the Numpy installation went well. In Jupyter Notebook, enter and execute the following:
# Import numpy
import numpy as np
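To give a feel for why arrays are more convenient than nested lists, here is a small comparison. It is only a sketch with made-up numbers, and it assumes Numpy was installed as described above.

```python
import numpy as np

# A plain nested list: doubling every element needs a loop or comprehension
nested = [[1, 2, 3], [4, 5, 6]]
doubled_list = [[value * 2 for value in row] for row in nested]

# The same data as an N-dimensional array: one expression, no explicit loop
array = np.array(nested)
doubled_array = array * 2

print(array.shape)        # (rows, columns)
print(doubled_array)
print(array.sum(axis=0))  # column sums
```

The array version expresses the whole operation at once, which is both shorter to write and faster to run on large data.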
- Pandas, a tool born to solve data analysis
To put it bluntly, Pandas is really a table container that offers many elegant operations and can satisfy all kinds of everyday "plug-in" needs.
Everyone has used Excel, and Excel offers many operations, such as filtering, functions, sorting, pivot tables, charting, copying, and so on.
But in the era of big data, Excel has many limitations: automation can only rely on VBA, and the degree of automation is limited; the capacity of a single Excel sheet is limited, and beyond 100,000 rows it struggles and runs inefficiently; it is not compatible with other tools; its statistical functions are limited and cannot be customized ...
For Pandas, none of these is a problem.
Pandas incorporates a large number of libraries and some standard data models, and provides the tools needed to manipulate large data sets efficiently. Relying on Python syntax, it easily supports both functional and object-oriented programming, it connects easily to all kinds of databases, and every kind of function can be customized to the characteristics of the data set. Pandas is also a foundation for data mining and artificial intelligence.
The installation of Pandas is the same as Numpy. But it should be noted that Pandas is encapsulated based on Numpy, so the installation order is Numpy first and Pandas after. Do not change the order.
C:\Users\Administrator>pip install pandas
If the network is slow, I recommend offline installation from the Alibaba Cloud mirror (see the Numpy section for the process), or changing the source.
After the installation is complete, we try importing the package. If the import succeeds, everything went well.
# Import pandas
import pandas as pd
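The Excel-style operations mentioned earlier (filtering, sorting, pivot-style summaries) each take about one line in Pandas. The sketch below uses a tiny table of invented sales records purely for illustration, and assumes Pandas was installed as described above.

```python
import pandas as pd

# Hypothetical sales records, invented for this example
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East", "South"],
        "product": ["A", "A", "B", "B", "B"],
        "amount": [120, 80, 200, 50, 130],
    }
)

# Filter: keep rows with an amount above 100
large = sales[sales["amount"] > 100]

# Sort: largest amount first
ranked = sales.sort_values("amount", ascending=False)

# Pivot-style summary: total amount per region
totals = sales.groupby("region")["amount"].sum()

print(large)
print(ranked.head(3))
print(totals)
```

The same three lines work unchanged whether the table has five rows or several hundred thousand.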
- Matplotlib, a must-learn visualization tool for beginners
Matplotlib is a plotting tool built on Numpy. It can easily draw publication-quality graphics, and the results are far better than Excel's. Developers can generate a plot with just a few lines of code. In general it can draw line charts, scatter plots, bar charts, pie charts, histograms, subplots, and more.
The installation process is also very simple. Just like Numpy, type in the command line:
C:\Users\Administrator>pip install matplotlib
If the network is slow, I recommend offline installation from the Alibaba Cloud mirror. See the Numpy section for the process.
After the installation is complete, let's look at a demo to see what sparks fly when Jupyter and a visualization tool come together. Enter the following program:
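# An IPython magic command that makes figures display inline in Jupyter Notebook
%matplotlib inline
# Imports; the aliases are a fixed convention
import matplotlib.pyplot as plt
import numpy as np
# Generate 100 evenly spaced values over the interval 0 to 2π
x = np.linspace(0, 2*np.pi, 100)
# Compute sin for each x and assign it to y
y = np.sin(x)
# Plot
plt.plot(x, y)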
The biggest feature of Matplotlib drawing is that it relies on Python. It is very convenient to clean the data, interact with the data, and interact with the graphical interface. It can be done in one stop. In the following courses, I will give you a detailed introduction on how to use this set of tools to do some cool things.
4. Pyecharts, a national goddess-level visualization tool
Before that, let's first talk about Echarts.
Echarts is an open source visualization library implemented in JavaScript. Since its release it has quickly won praise from friends: cool effects, efficiency, interactivity, a high degree of customization, and more; the compliments are too many to list one by one.
For friends who learn Python, the only headache is the need to have a certain knowledge of JavaScript, especially for newcomers, this seems a little unfriendly.
But this is not a problem at all for those who have finished studying this column. Here I recommend a useful tool, Pyecharts: its syntax is pure Python, and its effects are fully on par with Echarts.
Let's take a look at how to install Pyecharts. Use PIP from the cmd command line to perform the installation:
C:\Users\Administrator>pip install pyecharts -U
Note that Pyecharts comes in two versions, v0.5.x and v1.x, and the two are not compatible. The v1.x syntax fully embraces type hints, the style is more OOP (Object Oriented Programming), and the writing is more flexible. v0.5.x is more primitive, very close to a scripting style.
In keeping with the idea of moving with the times (newer is easier to use), I recommend that friends use the latest version. In fact, in my various tests v0.5.x showed some compatibility problems with the notebook, while v1.x was more stable. To spare friends the trouble that version differences can cause, the cases in this column are written against version 1.8.1.
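Since the two version lines are incompatible, it is worth confirming which one PIP actually installed before running any examples. A small sketch follows; the helper `major_version` is invented for this illustration, and the check relies on the `__version__` attribute that the package exposes.

```python
def major_version(version):
    """Return the leading number of a version string such as '1.8.1'."""
    return int(version.split(".")[0])


try:
    import pyecharts
except ImportError:
    print("pyecharts is not installed")
else:
    installed = pyecharts.__version__
    if major_version(installed) >= 1:
        print("pyecharts", installed, "is a v1.x or later release")
    else:
        print("pyecharts", installed, "is a v0.5.x release; consider upgrading")
```

If the second message appears, rerunning the install command with the -U flag shown above should bring the package up to date.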
After the installation is complete, let's take a look at the effect first, enter the following Python program:
# Import the plotting tools
from pyecharts import options as opts
from pyecharts.charts import Bar
attr = ["Shirts", "Cardigans", "Chiffon tops", "Trousers", "High heels", "Socks"]
v1 = [5, 20, 36, 10, 75, 90]
v2 = [10, 25, 8, 60, 20, 80]
bar = (
    Bar()
    .add_xaxis(attr)
    .add_yaxis("Merchant A", v1)
    .add_yaxis("Merchant B", v2)
    .set_global_opts(title_opts=opts.TitleOpts(title="Bar - basic example", subtitle="I am the subtitle"))
)
# Render the chart in the notebook
bar.render_notebook()
If the prompt shown above appears, there is nothing wrong: version 1.9 has not been released yet!
5. Summary
At this point, the entire Python-based data analysis environment has been built. In general, the significance of building this environment lies in:
- It provides a data cleaning platform on which you can easily observe the patterns in your data and complete the statistics and analysis of the corresponding indicators;
- It provides a visualization platform, shifting from traditional charting methods to automated, batch, and interactive visualization;
- It broadens the sources of data for analysis. Python's capabilities are fully demonstrated here. With this platform, your data sources are no longer limited to Excel spreadsheets. As you gradually unlock skill points, you can freely pull data from various databases, online forms, and all kinds of text files;
- Your skills become more comprehensive. The data you can manipulate is no longer limited to numbers: text, pictures, and more become objects you can work with; operations become finer-grained and efficiency greatly improves; and the volume of data you can handle rises quickly from the thousands to the millions. Still larger volumes depend on better hardware and some modeling ability, but it is certain that Python data analysis skills will not go out of date;
- Most importantly, it provides a ladder for skills upgrading and career advancement. Using this platform, you can focus on becoming a data analyst, you can transform into a big data engineer, and you can be promoted to become a data mining engineer, even a data scientist, and an algorithm expert.
So, friends, what are you waiting for? Hurry up and start learning.