Python data analysis: NumPy, Matplotlib, Pandas


Using Jupyter
Open Jupyter Notebook from the Anaconda PowerShell Prompt:
1. Enter: cd C:\Users\asus\Desktop\iPython
2. Enter: jupyter notebook

Python data analysis

Outline

Basic concepts and environment

matplotlib

Drawing

numpy

Processing numeric arrays

pandas

Handling data types such as numeric arrays, strings, time series, lists, dictionaries, etc.

0. Overview

1. Why study data analysis

  1. There is job-market demand for it
  2. It is the foundation of Python data science
  3. It is the foundation of machine learning courses
  4. It makes it easy to draw intuitive insights and conclusions from a mass of data, for your own use or for others

2. What is data analysis

Data analysis means applying appropriate methods to a large body of collected data so that people can make informed judgments and take appropriate action.

Data analysis process:
ask questions → prepare data → analyze data → draw conclusions → present the results (visualization, text, or report)

3. Installing the environment

Create an environment: conda create --name python3 python=3
Switch environments (Windows): activate python3
Official website: www.anaconda.com/download/
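
Once the environment is active, a quick check can confirm that the three libraries covered here are importable. This is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and is not part of conda or Anaconda.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The three libraries this article covers.
REQUIRED = ["numpy", "matplotlib", "pandas"]

print("Python version:", sys.version.split()[0])
missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as not installed, installing it into the active environment (for example with conda install) should resolve it.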

4. Getting to know Jupyter Notebook

One, matplotlib

Why learn matplotlib:

  1. It lets you visualize data and present it more intuitively
  2. It makes the data more objective and persuasive

1. What is matplotlib

matplotlib: the most popular low-level plotting library for Python, used mainly for data visualization and charting. Its name comes from MATLAB, whose plotting interface it was modeled on.

2. matplotlib basics

1. Basic plotting with matplotlib


axis: refers to the x-axis or the y-axis

Basic points:
Each red point is a coordinate, and connecting the 5 points in order forms a line chart.

Example: suppose the temperature (℃) measured every two hours over one day (range(2,26,2)) is [15,13,14.5,17,20,25,26,26,24,22,18,15]

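from matplotlib import pyplot as plt  # import pyplot

# x-axis positions of the data; any iterable works
x = range(2, 26, 2)
# y-axis positions of the data; any iterable works
y = [15, 13, 14.5, 17, 20, 25, 26, 26, 24, 22, 18, 15]
# Together, x and y give the coordinates to plot:
# (2, 15), (4, 13), (6, 14.5), ...

# Pass x and y to plot() to draw the line chart
plt.plot(x, y)
# Display the figure when the program runs
plt.show()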

(This runs both with the plain Python interpreter and in PyCharm.)

Remaining issues:

  1. Setting the image size (for a large, high-resolution image)
  2. Saving the figure to disk
  3. Adding descriptive information, such as what the x-axis, the y-axis, and the figure as a whole represent
  4. Adjusting the spacing of the x or y ticks
  5. Line style (color, transparency, and so on)
  6. Marking special points (for example, showing where the highest and lowest points are)
  7. Adding a watermark to the image (to deter copying)
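
Items 3, 5, 6 and 7 can be sketched with standard matplotlib calls (set_xlabel, annotate, fig.text). This is only one possible way to do it, not the course's own solution; the label texts, colors, watermark string and output file name are all assumptions chosen for illustration, and the Agg backend is selected so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

x = list(range(2, 26, 2))
y = [15, 13, 14.5, 17, 20, 25, 26, 26, 24, 22, 18, 15]

fig, ax = plt.subplots(figsize=(10, 4), dpi=80)

# Item 5: line style (color, dashed line, width, transparency).
ax.plot(x, y, color="orange", linestyle="--", linewidth=2, alpha=0.8)

# Item 3: descriptive information for the axes and the figure.
ax.set_xlabel("Hour of day")
ax.set_ylabel("Temperature (°C)")
ax.set_title("Temperature every two hours")

# Item 6: mark the first highest and the first lowest point.
i_max = y.index(max(y))
i_min = y.index(min(y))
ax.annotate("max", xy=(x[i_max], y[i_max]), xytext=(x[i_max], y[i_max] + 1),
            arrowprops={"arrowstyle": "->"})
ax.annotate("min", xy=(x[i_min], y[i_min]), xytext=(x[i_min], y[i_min] - 1.5),
            arrowprops={"arrowstyle": "->"})

# Item 7: a faint text watermark across the middle of the figure.
fig.text(0.5, 0.5, "sample watermark", fontsize=30, color="gray",
         alpha=0.3, ha="center", va="center", rotation=30)

# Hypothetical output file name.
fig.savefig("./t_styled.png")
```

To view the figure interactively instead, drop the matplotlib.use("Agg") line and call plt.show().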

2. Basic plotting with matplotlib and adjusting the x-axis ticks

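from matplotlib import pyplot as plt  # import pyplot

# x-axis positions of the data; any iterable works
x = range(2, 26, 2)

# y-axis positions of the data; any iterable works
y = [15, 13, 14.5, 17, 20, 25, 26, 26, 24, 22, 18, 15]
# Together, x and y give the coordinates to plot:
# (2, 15), (4, 13), (6, 14.5), ...

# Set the figure size (width and height in inches) and resolution
plt.figure(figsize=(20, 8), dpi=80)

# Plot: pass x and y to plot() to draw the line chart
plt.plot(x, y)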

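# Set the x-axis ticks

# plt.xticks(x)   # step of 2

# plt.xticks(range(2,25))   # step of 1

# _xtick_labels = [i/2 for i in range(2,49)]
# plt.xticks(_xtick_labels)   # step of 0.5

# plt.xticks(range(25,50))   # ticks outside the data range (2 to 24) only stretch the axis

# Take every third value of the 0.5-step list: one tick every 1.5 hours
_xtick_labels = [i/2 for i in range(2,49)]
plt.xticks(_xtick_labels[::3])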

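# Set the y-axis ticks

# plt.yticks(y)

# range() needs integers; this works because min(y) and max(y) are both integers here
plt.yticks(range(min(y),max(y)+1))

# Save the figure (call savefig before show)
plt.savefig("./t1.png")

# Display the figure when the program runs
plt.show()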
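
To see what each tick variant in the code above actually contains, the lists can be printed without drawing anything. This small sketch uses only plain Python; the variable names are chosen here for illustration.

```python
# Candidate x-axis tick lists for data recorded at hours 2, 4, ..., 24.
x = range(2, 26, 2)

every_two_hours = list(x)                         # plt.xticks(x)
every_hour = list(range(2, 25))                   # plt.xticks(range(2, 25))
every_half_hour = [i / 2 for i in range(2, 49)]   # 0.5-hour steps
every_ninety_minutes = every_half_hour[::3]       # every third 0.5-hour tick

for name, ticks in [
    ("every two hours", every_two_hours),
    ("every hour", every_hour),
    ("every half hour", every_half_hour),
    ("every ninety minutes", every_ninety_minutes),
]:
    print(f"{name}: {len(ticks)} ticks, from {ticks[0]} to {ticks[-1]}")
```

Note that the half-hour list starts at 1.0, slightly before the first data point at hour 2, so the axis extends a little to the left of the data.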

3. Plotting the temperature from 10 o'clock to 12 o'clock with matplotlib

Case
[1] If the list a holds the temperature for every minute from 10 o'clock to 12 o'clock, how can we draw a line chart to see how the temperature changes minute by minute?
a = [random.randint(20,35) for i in range(120)]

from matplotlib import pyplot as plt
import random

x = range(0,120)
y = [random.randint(20,35) for i in range(120)]

# Set the figure size
plt.figure(figsize=(20,8),dpi=80)

plt.plot(x,y)

plt.show()
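
Because y is generated with random.randint, every run of the code above draws a different curve. If a reproducible figure is wanted, one option is to draw the values from a seeded generator. This is a minimal sketch; the helper name minute_temperatures and the seed value 42 are assumptions made for illustration.

```python
import random


def minute_temperatures(seed, minutes=120, low=20, high=35):
    """Return one random integer temperature per minute, reproducible for a given seed."""
    rng = random.Random(seed)  # private generator, leaves the global random state alone
    return [rng.randint(low, high) for _ in range(minutes)]


y = minute_temperatures(seed=42)
print(len(y), min(y), max(y))
```

The resulting list can be passed to plt.plot(x, y) exactly like the unseeded one.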

4. Displaying Chinese in matplotlib

from matplotlib import pyplot as plt
import random

x = range(0,120)
y = [random.randint(20,35) for i in range(120)]

# Set the figure size
plt.figure(figsize=(20,8),dpi=80)

plt.plot(x,y)

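# Adjust the x-axis ticks
# _x = x
# _xtick_labels = ["hello,{}".format(i) for i in _x]
# plt.xticks(x,_xtick_labels)

# Chinese tick labels: "10点{}分" reads "10 o'clock, {} minutes"
_xtick_labels = ["10点{}分".format(i) for i in range(60)]
_xtick_labels += ["11点{}分".format(i) for i in range(60)]

# Apply the same step to both lists so each tick position pairs with one label;
# the two lists must have the same length
plt.xticks(list(x)[::3],_xtick_labels[::3])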

plt.show()
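
The code above passes Chinese labels but never selects a font that contains Chinese glyphs, so on many systems the labels show up as empty boxes. One common approach is to put a CJK-capable font at the front of rcParams["font.sans-serif"]. The sketch below assumes at least one of the listed fonts is installed; the font names are typical candidates rather than a guaranteed list, and the output file name is hypothetical.

```python
import random

import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

# Candidate fonts with Chinese glyphs (assumptions; availability depends on the OS).
CJK_FONT_CANDIDATES = ["SimHei", "Microsoft YaHei", "Arial Unicode MS",
                       "Noto Sans CJK SC", "WenQuanYi Micro Hei"]

# matplotlib tries the fonts in order and uses the first one it finds.
plt.rcParams["font.sans-serif"] = CJK_FONT_CANDIDATES + list(plt.rcParams["font.sans-serif"])
# Draw the minus sign as an ASCII hyphen, which CJK fonts generally include.
plt.rcParams["axes.unicode_minus"] = False

x = range(0, 120)
y = [random.randint(20, 35) for _ in x]
labels = ["10点{}分".format(i) for i in range(60)]
labels += ["11点{}分".format(i) for i in range(60)]

fig, ax = plt.subplots(figsize=(20, 8), dpi=80)
ax.plot(x, y)

# Same step for positions and labels, so both lists have the same length.
ax.set_xticks(list(x)[::3])
ax.set_xticklabels(labels[::3], rotation=45)

fig.savefig("./t_chinese.png")
```

If none of the candidate fonts is installed, matplotlib falls back to its default font and warns about missing glyphs; installing one of them or adding a locally available font name to the list should fix that.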

5. Setting figure information in matplotlib

6. Drawing multiple plots with matplotlib, the differences between plot types, and a summary

3. Scatter plots, bar charts, and histograms in matplotlib

1. Drawing scatter plots with matplotlib

2. Drawing bar charts with matplotlib

3. Drawing multiple bar charts with matplotlib

4. Drawing histograms with matplotlib

4. More plotting tools

Two, NumPy learning

1. What is NumPy

1. NumPy array creation

2. NumPy array calculation

2. NumPy basics

1. Reading local data with NumPy

2. NumPy indexing and slicing

3. More NumPy indexing methods

3. Commonly used NumPy statistical methods

1. Data splicing

1. Random methods in NumPy

1. Common NumPy statistical methods and NaN

1. Filling NaN values in NumPy and a YouTube data exercise

Three, Pandas learning

1. Understanding of Pandas series

2. Pandas reads external data

3. Creating a Pandas DataFrame

4. Descriptive information for a Pandas DataFrame

3. Pandas dataframe index

3. Pandas bool index and processing of missing data

3. Movie number histogram

3. Common statistical methods of Pandas

3. The case of string discretization

3. Merging data

3. Grouping and aggregating data

3. Working with data indexes

3. Grouping and aggregation exercises and summary

3. Pandas time series

3. Case

3. PM2.5 case

3. Douban TV case

Origin blog.csdn.net/qq_43210525/article/details/107335321