Use Matplotlib to draw various charts
Matplotlib part
This part of the Python visualization series mainly introduces Matplotlib.
Pyecharts and Seaborn may be covered systematically in a future article.
Matplotlib installation
Method 1: Install from the command line with pip: pip install matplotlib on Windows, or pip3 install matplotlib on macOS.
Method 2: Use the Anaconda environment, which already includes Matplotlib.
first drawing
import matplotlib.pyplot as plt
x=[0,1,2,3,4]
y=[0,1,2,3,4]
plt.plot(x,y)
These two lists correspond to the points (x,y) = (0,0), (1,1), (2,2), (3,3), (4,4).
However, plt.plot(x,y) only issues the drawing command. To display the figure, you need to add a plt.show() statement.
import matplotlib
import matplotlib.pyplot as plt
x=[0,1,2,3,4]
y=[0,1,2,3,4]
plt.plot(x,y)
plt.show()
The result is as follows:
Title and Axis Names
Title: plt.title('title text')
x-axis label: plt.xlabel('x-axis name')
y-axis label: plt.ylabel('y-axis name')
Note:
The plt used here is the alias declared by import matplotlib.pyplot as plt or by from matplotlib import pyplot as plt, so one of these imports must appear before plt is used.
If you import with from matplotlib import pyplot instead, the name has to be written out in full, for example pyplot.xlabel('x-axis name').
Writing the code in a Jupyter notebook is recommended, because the interactive notebook displays figures without plt.show(). Here is a demo:
import matplotlib.pyplot as plt
x=[-1,1,2,3,4]
y=[-1,1,2,3,4]
plt.xlabel('x-axis data')
plt.ylabel('y-axis data')
plt.title('Example 1')
plt.plot(x,y)
Add more detail to the line chart
marker: data point marker
from matplotlib import pyplot as plt
x=[-1,1,2,3,4]
y=[-1,1,2,3,4]
plt.xlabel('x-axis data')
plt.ylabel('y-axis data')
plt.title('Example 1')
plt.plot(x,y)
When drawing line charts at work, you often need to highlight the individual data points. This is done with the marker parameter:
plt.plot(x,y,marker='.')
Further parameters can be added to the same plot statement:
Use the markersize parameter to adjust the point size: plt.plot(x,y,marker='.',markersize=10)
Use the color parameter to adjust the color of the line and its points: plt.plot(x,y,marker='.',color='red'). The color can also be given as a HEX code, such as plt.plot(x,y,marker='.',color='#2614e8')
Use the linewidth parameter to adjust the line thickness: plt.plot(x,y,marker='.',linewidth=3)
Use the markeredgecolor parameter to adjust the color of the point border: plt.plot(x,y,marker='.',markeredgecolor='blue')
Use the linestyle parameter to adjust the line style: plt.plot(x,y,marker='.',linestyle='dashed')
Overall effect:
plt.plot(x,y,marker='.',markersize=10,color='red',linewidth=3,markeredgecolor='blue')
draw multiple polylines
from matplotlib import pyplot as plt
dev_x = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
dev_y = [38496, 42000, 46752, 49320, 53200, 56000, 62316, 64928, 67317, 68748, 73752]
py_dev_y = [45372, 48876, 53850, 57287, 63016,65998, 70003, 70000, 71496, 75370, 83640]
plt.plot(dev_x,dev_y)
plt.plot(dev_x,py_dev_y)
Two line charts can be drawn on the same figure with two plot statements.
To make it clear which line corresponds to which data, give each line a legend entry with the label parameter: plt.plot(x_data, y_data, label='name')
The code above then becomes:
from matplotlib import pyplot as plt
dev_x = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
dev_y = [38496, 42000, 46752, 49320, 53200, 56000, 62316, 64928, 67317, 68748, 73752]
py_dev_y = [45372, 48876, 53850, 57287, 63016,65998, 70003, 70000, 71496, 75370, 83640]
plt.plot(dev_x,dev_y,label='All developers')
plt.plot(dev_x,py_dev_y,label='Python developers')
plt.legend()
Note: To display the legend after using the label parameter, you need to add a plt.legend() statement. Since I am writing in a Jupyter notebook, plt.show() can be omitted. If you are not using an interactive notebook, plt.show() needs to be added at the end of the program to display the chart.
Add the third piece of data here, and then use the marker to optimize the chart:
from matplotlib import pyplot as plt
dev_x = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
dev_y = [38496, 42000, 46752, 49320, 53200, 56000, 62316, 64928, 67317, 68748, 73752]
py_dev_y = [45372, 48876, 53850, 57287, 63016,65998, 70003, 70000, 71496, 75370, 83640]
js_dev_y = [37810, 43515, 46823, 49293, 53437,56373, 62375, 66674, 68745, 68746, 74583]
plt.plot(dev_x,dev_y,'r--',label='All developers')
plt.plot(dev_x,py_dev_y,'b^--',label='Python developers')
plt.plot(dev_x,js_dev_y,'go--',label='JS developers')
plt.legend()
plt.title('Income by age for developers of different languages')
plt.xlabel('Age')
plt.ylabel('Income')
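A shorthand is used here, the format string (fmt):
plt.plot(dev_x,dev_y,fmt,label='All developers')
# fmt = '[marker][line][color]'; each part is optional, and other orders such as '[color][marker][line]' are accepted when they are unambiguous
# 'go--' means color green, marker='o', linestyle='dashed'; linewidth and markersize are not part of fmt and must be passed as keyword arguments
For details, refer to the official documentation for your own Matplotlib version, for example the plot parameters of matplotlib.pyplot in version 3.3.2.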
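To make the shorthand concrete, the sketch below draws the same data once with a format string and once with keyword arguments, then inspects the resulting line objects. The sample data, the variable names, and the headless Agg backend are assumptions added so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

fig, ax = plt.subplots()

# Shorthand: green color, circle markers, dashed line
(short_line,) = ax.plot(x, y, 'go--')

# The same marker and line style written out as keyword arguments
(long_line,) = ax.plot(x, y, color='green', marker='o', linestyle='dashed')

# Both calls end up with the same marker and line style on the line object
fmt_props = (short_line.get_marker(), short_line.get_linestyle())
kw_props = (long_line.get_marker(), long_line.get_linestyle())
print(fmt_props, kw_props)
```

The keyword form is longer, but it is the only way to set options such as linewidth or markersize, which the format string cannot express.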
Turn on the grid function
To read values off the chart more easily, enable the grid with the plt.grid() function:
from matplotlib import pyplot as plt
dev_x = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
dev_y = [38496, 42000, 46752, 49320, 53200, 56000, 62316, 64928, 67317, 68748, 73752]
py_dev_y = [45372, 48876, 53850, 57287, 63016,65998, 70003, 70000, 71496, 75370, 83640]
js_dev_y = [37810, 43515, 46823, 49293, 53437,56373, 62375, 66674, 68745, 68746, 74583]
plt.plot(dev_x,dev_y,'r--',label='All developers')
plt.plot(dev_x,py_dev_y,'b^--',label='Python developers')
plt.plot(dev_x,js_dev_y,'go--',label='JS developers')
plt.legend()
plt.title('Income by age for developers of different languages')
plt.xlabel('Age')
plt.ylabel('Income')
plt.grid()
Beautify charts with style files
First, check which styles are available: print(plt.style.available)
['Solarize_Light2', '_classic_test_patch', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark', 'seaborn-dark-palette', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'tableau-colorblind10']
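Now use a style for comparison:
plt.style.use('tableau-colorblind10')  # select the style before plotting so that it applies to the whole chart
plt.rcParams['font.sans-serif'] = ['SimHei']  # font with Chinese glyphs; only needed when the labels contain Chinese text
plt.plot(dev_x,dev_y,'r--',label='All developers')
plt.plot(dev_x,py_dev_y,'b^--',label='Python developers')
plt.plot(dev_x,js_dev_y,'go--',label='JS developers')
plt.legend()
plt.title('Income by age for developers of different languages')
plt.xlabel('Age')
plt.ylabel('Income')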
You can also use the hand-drawn comic style with plt.xkcd(). Note, however, that the fonts used by plt.xkcd() have no Chinese glyphs, so it is only suitable for charts with English text.
from matplotlib import pyplot as plt
dev_x = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
dev_y = [38496, 42000, 46752, 49320, 53200, 56000, 62316, 64928, 67317, 68748, 73752]
py_dev_y = [45372, 48876, 53850, 57287, 63016,65998, 70003, 70000, 71496, 75370, 83640]
js_dev_y = [37810, 43515, 46823, 49293, 53437,56373, 62375, 66674, 68745, 68746, 74583]
plt.xkcd()
plt.plot(dev_x,dev_y,'r--',label='All')
plt.plot(dev_x,py_dev_y,'b^--',label='python')
plt.plot(dev_x,js_dev_y,'go--',label='Js')
plt.grid()
plt.legend()
plt.title('Title')
plt.xlabel('Age')
plt.ylabel('Income')
plt.show()
line chart with shade
Import data using Pandas
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv')
The data structure is shown in the figure:
Draw a line chart
plt.plot(data['Age'],data['All_Devs'],label='All')
plt.plot(data['Age'],data['Python'],label='Python')
plt.legend()
add shadows
The shaded area is drawn with plt.fill_between():
plt.fill_between(data['Age'],data['Python'])
You will find that this makes the line chart hard to read. The transparency can be adjusted with alpha=0.2:
plt.fill_between(data['Age'],data['Python'],alpha=0.2)
set threshold
Set the threshold to 60000 with overall_mid=60000, so that only the area between the Python line and this value is filled:
overall_mid=60000
plt.fill_between(data['Age'],data['Python'],overall_mid,alpha=0.2)
Conditional statement to filter shadow position
plt.fill_between(data['Age'],data['Python'],overall_mid,where=(data['Python'] > overall_mid),alpha = 0.2)
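The edge of the filled area looks awkward here, because the fill stops at the nearest data point instead of where the line crosses the threshold. Setting interpolate=True extends the fill to the actual crossing point: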
plt.fill_between(data['Age'],data['Python'],overall_mid,where=(data['Python'] > overall_mid),interpolate=True,alpha = 0.2)
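As a rough illustration of the crossing point that interpolate=True fills up to, the sketch below computes it by hand with linear interpolation and then draws the same kind of conditional fill. The five data points, the variable names, and the Agg backend are assumptions made for the example; they are not taken from data.csv.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data standing in for the Age and Python columns
ages = np.array([25, 26, 27, 28, 29])
python_pay = np.array([50000, 55000, 58000, 63000, 66000])
threshold = 60000

# Locate the last point at or below the threshold and the first one above it
below = np.flatnonzero(python_pay <= threshold)[-1]
above = below + 1

# Linear interpolation between those two points gives the crossing age
fraction = (threshold - python_pay[below]) / (python_pay[above] - python_pay[below])
crossing_age = ages[below] + fraction * (ages[above] - ages[below])
print(crossing_age)

fig, ax = plt.subplots()
ax.plot(ages, python_pay, label='Python')
ax.axhline(threshold, color='gray', linewidth=1)
# Fill only where the line is above the threshold, up to the crossing point
region = ax.fill_between(ages, python_pay, threshold,
                         where=(python_pay > threshold),
                         interpolate=True, alpha=0.2)
```

Without interpolate=True the fill in this example would begin at age 28, the first data point above the threshold, and leave a visible gap next to the crossing.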
add more details
Use color='color name' to control the color of the filled area, and label to add a legend entry.
plt.fill_between(data['Age'],data['Python'],data['All_Devs'],where=(data['Python'] > data['All_Devs']),interpolate=True,alpha = 0.2,label='Python > All')
plt.fill_between(data['Age'],data['Python'],data['All_Devs'],where=(data['Python'] <= data['All_Devs']),interpolate=True,alpha = 0.2,color='red',label='Python <= All')
bar chart
Read data using pandas
Use pandas to import data from csv files:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.xkcd()
data = pd.read_csv('data.csv')
data.head()
The data structure is shown in the figure:
Count the specific languages in the LanguagesWorkedWith column:
from collections import Counter
language_responses=data['LanguagesWorkedWith']
cnt = Counter()
for l in language_responses:
    cnt.update(l.split(';'))  # each response is a ';'-separated list of languages
Take the 15 most common languages with cnt.most_common(15):
lang=[]
popularity=[]
for c in cnt.most_common(15):
    lang.append(c[0])        # language name
    popularity.append(c[1])  # number of responses
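The counting step can be tried on a tiny example without the CSV file. The three responses below are made-up stand-ins for the LanguagesWorkedWith column, and the variable names are only illustrative.

```python
from collections import Counter

# Hypothetical survey answers in the same ';'-separated format
responses = ['Python;SQL', 'JavaScript;HTML/CSS;SQL', 'Python']

cnt = Counter()
for response in responses:
    # split one answer into languages and add one count for each
    cnt.update(response.split(';'))

# most_common(n) returns (item, count) pairs, highest count first
top_two = cnt.most_common(2)
print(top_two)

# Unpack the pairs into two parallel lists for plotting
lang = [name for name, _ in top_two]
popularity = [count for _, count in top_two]
```

If the real column contains missing values, they would need to be dropped first (for example with dropna()), since split only works on strings.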
Draw a bar chart of the extracted data
Draw a bar chart: plt.bar(x,y)
plt.bar(lang,popularity)
plt.title('Top 15 Languages')
plt.xlabel('Language')
plt.ylabel('Popularity')
The x-axis labels overlap and cannot be read in full. Here are three solutions:
Solution 1: Enlarge the chart: plt.figure(figsize=(10,8))
Solution 2: Rotate the text on the x-axis by 60 degrees: plt.xticks(rotation=60)
Solution 3: Swap the x and y axes: plt.barh(lang,popularity)
plt.barh draws the first item at the bottom, so if you want the bars to run from largest at the top to smallest at the bottom, reverse the data first.
lang.reverse()
popularity.reverse()
stacked column chart
Import Data
minutes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
player1 = [1, 2, 3, 3, 4, 4, 4, 4, 5]
player2 = [1, 1, 1, 1, 2, 2, 2, 3, 4]
player3 = [1, 5, 6, 2, 2, 2, 3, 3, 3]
Draw a simple stacked graph
plt.bar(minutes, player1)
plt.bar(minutes, player2)
plt.bar(minutes, player3)
Obviously there is a problem here: the bars are drawn on top of each other, so some data is hidden. One fix is to offset each series along the x-axis with an index, which places the bars side by side.
index_x = np.arange(len(minutes))
w= 0.15
plt.bar(index_x-w,player1,width=w)
plt.bar(index_x,player2,width=w)
plt.bar(index_x+w,player3,width=w)
This approach requires setting the bar width yourself. A simpler way to show the stacked totals is stackplot, which draws a stacked area chart.
plt.stackplot(minutes, player1, player2, player3)
Enrich the details:
labels=['class1','class2','class3']
colors = ['Blue','Red','Green']
plt.stackplot(minutes,player1,player2,player3,labels=labels,colors=colors)
plt.legend()
Displaying the labels requires plt.legend(), and the position of the legend can be changed so that it does not overlap the chart content: plt.legend(loc=(x, y))
plt.legend(loc=(0.1,0.8))
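If stacked bars are wanted rather than side-by-side bars or a stacked area, one option not used in the text above is the bottom parameter of bar, which sets where each bar starts. The sketch below reuses the minutes and player lists from this section; the Agg backend and the helper names bottom2 and bottom3 are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt

minutes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
player1 = [1, 2, 3, 3, 4, 4, 4, 4, 5]
player2 = [1, 1, 1, 1, 2, 2, 2, 3, 4]
player3 = [1, 5, 6, 2, 2, 2, 3, 3, 3]

# Each series starts where the series below it end
bottom2 = player1
bottom3 = [a + b for a, b in zip(player1, player2)]

fig, ax = plt.subplots()
ax.bar(minutes, player1, label='player1')
ax.bar(minutes, player2, bottom=bottom2, label='player2')
bars3 = ax.bar(minutes, player3, bottom=bottom3, label='player3')
ax.legend()
```

The running totals have to be computed by hand here, which is the bookkeeping that stackplot does automatically.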
More ways to use
ages = [18, 19, 21, 25, 26, 26, 30, 32, 38, 45, 55]
If this set of data is drawn as a bar chart, almost no value repeats, so nearly every column has the same height. Here you can group the values into bins instead: plt.hist(data, bins=number_of_bins)
plt.hist(ages,bins=4)
In this way, the data range is divided evenly into four intervals: 18-27.25, 27.25-36.5, 36.5-45.75, and 45.75-55.
The intervals are easier to see when the bars are outlined with edgecolor='color':
plt.hist(ages,bins=4,edgecolor='black')
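The four intervals quoted above can be checked by hand. The sketch below recomputes equal-width bin edges and the count in each bin in plain Python; it is only an illustration of the idea and makes no claim about how Matplotlib implements the binning internally.

```python
ages = [18, 19, 21, 25, 26, 26, 30, 32, 38, 45, 55]
n_bins = 4

# Equal-width bins between the smallest and largest value
lo, hi = min(ages), max(ages)
width = (hi - lo) / n_bins
edges = [lo + i * width for i in range(n_bins + 1)]
print(edges)   # [18.0, 27.25, 36.5, 45.75, 55.0]

# Count the values per bin; the last bin also includes its right edge
counts = [0] * n_bins
for age in ages:
    idx = min(int((age - lo) / width), n_bins - 1)
    counts[idx] += 1
print(counts)  # [6, 2, 2, 1]
```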
Of course, the bin edges can also be entered manually. Values outside the given edges (here 18 and 19) are not counted:
bins=[20,30,40,50,60]
plt.hist(ages,bins,edgecolor='black')
Practical case
Import data from pandas:
data=pd.read_csv('data.csv')
data.head()
Break it down into five groups:
plt.hist(data.Age,bins=5,edgecolor='black')
Custom grouping:
bins=[10,20,30,40,50,60,70,80,90,100]
plt.hist(data.Age,bins,edgecolor='black')
Since the counts on the y-axis span a very wide range, a logarithmic y-axis can be used here: log=True
bins=[10,20,30,40,50,60,70,80,90,100]
plt.hist(data.Age,bins,edgecolor='black',log=True)
Here you can clearly see the difference between the two graphs. With the logarithmic scale you can see that the 80-90 age group is smaller than the 90-100 group, whereas on the linear scale the bars in the 80-100 range are too short to compare.
Add a median auxiliary line
Vertical auxiliary line: plt.axvline(x_position)
median_age=data.Age.median()  # use the median so that the value matches the variable name and the label
plt.axvline(median_age,color='red',label='Median')
plt.legend()
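The mean and the median are different statistics, so the label on the auxiliary line should match whichever one is computed. A small check with the standard library, using the ages list from the earlier histogram example as assumed sample data:

```python
import statistics

ages = [18, 19, 21, 25, 26, 26, 30, 32, 38, 45, 55]

mean_age = statistics.mean(ages)      # arithmetic average, pulled up by the oldest values
median_age = statistics.median(ages)  # middle value of the sorted data

print(round(mean_age, 2), median_age)
```

For skewed data such as ages in a survey, the two lines can sit noticeably apart, so it is worth choosing one deliberately.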
pie chart
Draw the first pie chart
Enter the data first:
import matplotlib.pyplot as plt
list1 =['JavaScript','HTML/CSS','SQL','Python','Java']
list2 = [59219,55466,47544,36443,35917]
Generate a pie chart with plt.pie(values, labels=names):
plt.pie(list2,labels=list1)
add explosion effect
Explosion effect parameter: explode=explo
explo = [0,0,0,0.1,0]
# pull out the 4th item
plt.pie(list2,labels=list1,explode=explo)
explo = [0.1,0,0,0.1,0]
# pull out the 1st and 4th items
plt.pie(list2,labels=list1,explode=explo)
add shadow
Shadow parameters:shadow=True
explo = [0.1,0,0,0,0]
plt.pie(list2,labels=list1,explode=explo,shadow=True)
Modify the position of the first block
Starting angle parameter: startangle=0. The first block starts at this angle, measured counterclockwise from the positive x-axis.
When startangle=90, the first block is in the upper left corner:
When startangle=180, the first block is in the lower left corner:
When startangle=270, the first block is in the lower right corner:
show percentage
Percentage parameter: autopct='%1.2f%%'
%1.2f here means 2 digits of precision after the decimal point, and %% means to display the percent sign (the first percent sign is the conversion character)
explo = [0.1,0,0,0,0]
plt.pie(list2,labels=list1,explode=explo,shadow=True,startangle=0,autopct='%1.2f%%')
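The autopct string is an ordinary Python % format string that receives each wedge's percentage. The sketch below applies the same format by hand to the values in list2, which shows the text to expect on the chart; the variable names are illustrative.

```python
values = [59219, 55466, 47544, 36443, 35917]
total = sum(values)

fmt = '%1.2f%%'  # two decimals, followed by a literal percent sign

# Percentage of the whole for each value, formatted like the wedge labels
percent_labels = [fmt % (100 * v / total) for v in values]
print(percent_labels[0])

# The same format applied to a round number
quarter = fmt % 25.0
print(quarter)  # 25.00%
```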
change image border
Boundary control parameters: wedgeprops={'edgecolor':'black'}
This way of writing means that the boundary color is outlined in black.
explo = [0.1,0,0,0,0]
plt.pie(list2,labels=list1,explode=explo,shadow=True,startangle=0,autopct='%1.2f%%',wedgeprops={
'edgecolor':'black'})
add title
The title is set the same way as for other charts: plt.title('title')
To keep the title and labels from being cut off, for example on small screens, you can add plt.tight_layout(), which adjusts the padding automatically.
explo = [0.1,0,0,0,0]
plt.pie(list2,labels=list1,explode=explo,shadow=True,startangle=0,autopct='%1.2f%%',wedgeprops={
'edgecolor':'black'})
plt.title('Share of the most popular languages')
plt.tight_layout()
Drawing a pie chart from imported data
Import data using Pandas
import pandas as pd
import numpy as np
fifa = pd.read_csv('fifa_data.csv')
fifa.head()
Filter out the number of players who prefer to play with left or right foot:
left = fifa.loc[fifa['Preferred Foot']=='Left'].count()[0]
right = fifa.loc[fifa['Preferred Foot']=='Right'].count()[0]
draw pie chart
plt.pie([left,right])
Then start to beautify:
labels = ['Left','Right']
explo=[0.1,0]
plt.pie([left,right],labels=labels,explode=explo,shadow=True,startangle=0,autopct='%1.2f%%',wedgeprops={
'edgecolor':'black'})
Plotting Weight data with strings
Let’s look at the data first:
The values contain the string 'lbs', so a pie chart obviously cannot be drawn from them directly. Here are two ideas:
1. .strip('lbs')
2. .replace('lbs','')
Idea 1 in detail:
def func1(d1):
    # only strings are converted; other values (such as missing data) return None
    if type(d1)==str:
        return int(d1.strip('lbs'))
fifa['Weight2']=fifa.Weight.apply(func1)
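The second idea, replace, can be sketched the same way. The helper name parse_weight and the sample values are hypothetical; the point is only to compare the two string methods on values shaped like the Weight column.

```python
def parse_weight(value):
    """Convert a string such as '159lbs' to the integer 159; return None otherwise."""
    if isinstance(value, str):
        # replace removes the exact substring 'lbs'
        return int(value.replace('lbs', ''))
    return None  # e.g. missing values

samples = ['159lbs', '183lbs', None]
parsed = [parse_weight(v) for v in samples]
print(parsed)  # [159, 183, None]

# strip removes any of the characters 'l', 'b', 's' from both ends,
# which gives the same result here because digits are not in that set
stripped = '159lbs'.strip('lbs')
print(stripped)  # 159
```

replace states the intent more directly, while strip would also remove those letters from the front of a value, which is harmless for purely numeric weights.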
Classify different Weight
class1=fifa.loc[fifa.Weight2 < 125].count()[0]
class2 = fifa.loc[(fifa.Weight2 >= 125) & (fifa.Weight2 < 150)].count()[0]
class3 = fifa.loc[(fifa.Weight2 >= 150) & (fifa.Weight2 < 175)].count()[0]
class4 = fifa.loc[(fifa.Weight2 >= 175) & (fifa.Weight2 < 200)].count()[0]
class5 = fifa.loc[fifa.Weight2 >= 200].count()[0]  # >= so that a weight of exactly 200 is not dropped
Put the data into a list: list= [class1,class2,class3,class4,class5]
Draw a pie chart on the processed data
labels = ['< 125 ','125-150','150-175','175-200', '>= 200']
explo=[0.4,0.2,0,0,0.4]
plt.pie(list,labels=labels,explode=explo,shadow=True,startangle=0,autopct='%1.2f%%',wedgeprops={
'edgecolor':'black'})
The smallest slices are too small to see clearly, so enlarge the canvas:
plt.figure(figsize=(8,5),dpi = 100)
Then use pctdistance=0.8 to control how far the percentage text sits from the center of the pie:
plt.pie(list,labels=labels,explode=explo,pctdistance=0.8,shadow=True,startangle=0,autopct='%1.2f%%',wedgeprops={
'edgecolor':'black'})
Scatterplot
Draw a scatter plot with plt.scatter(x_data, y_data):
plt.scatter(x,y,s=100,color='red',edgecolor='black',alpha=0.8)
# s is the size of the points
plt.grid()
You can also color each point by a numeric value, which groups the points visually:
x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]
colors = [447, 445, 449, 447, 445, 447, 442, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5]
plt.scatter(x,y,s=100,c=colors,edgecolor='black',alpha=0.8)
plt.grid()
If the colors are not intuitive or are confusing, you can add a colorbar that shows what they mean:
x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]
colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5]
plt.scatter(x,y,s=100,c=colors,edgecolor='black',alpha=0.8)
plt.grid()
cbar = plt.colorbar()
cbar.set_label('Label')
Import data from Pandas
First look at the data structure:
df = pd.read_csv('2019-05-31-data.csv')
df.head()
Draw a scatterplot:plt.scatter(df.view_count,df.likes)
Optimization of details
plt.figure(figsize=(10,6))
plt.scatter(df.view_count,df.likes,c='red',edgecolors='black',linewidths=1,alpha=0.9)
plt.xscale('log')
# the data points are bunched together, so logarithmic axes show them more clearly
plt.yscale('log')
Next, add df.ratio to the scatter plot as the point color:
plt.figure(figsize=(10,6))
plt.scatter(df.view_count,df.likes,c=df.ratio,edgecolors='black',linewidths=1,alpha=0.9)
plt.xscale('log')
plt.yscale('log')
cbar = plt.colorbar()
cbar.set_label('Like&Dislike')
Time Series Data Processing
Traditional String Performance Effects
import matplotlib.pyplot as plt
from datetime import datetime,timedelta
# this is time series data, so datetime is needed
x = ['2019-5-24','2019-5-25','2019-5-26','2019-5-27','2019-5-28','2019-5-29','2019-5-30','2019-6-30']
y = [0,1,3,4,6,5,7,3]
plt.plot(x,y)
Drawn with plt.plot in this way, the dates on the x-axis look cluttered, because the values passed to plt.plot are plain str objects and are treated as labels. Strings also cause a more important problem: 2019-5-30 and 2019-6-30 are actually a month apart, yet they appear as two adjacent, evenly spaced points on the line chart.
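The spacing problem is easy to confirm once the strings are parsed into datetime objects. The sketch below uses only the standard library, with the last three dates from the list above; the variable names are illustrative.

```python
from datetime import datetime

labels = ['2019-5-29', '2019-5-30', '2019-6-30']

# Parse the strings into real dates
dates = [datetime.strptime(s, '%Y-%m-%d') for s in labels]

# Distance in days between each pair of neighbouring dates
gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
print(gaps)  # [1, 31]
```

As strings, the three labels would be drawn one step apart each; as dates, the second gap is 31 times larger than the first.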
Use plt.plot_date method
x = [
datetime(2019,5,24),
datetime(2019,5,25),
datetime(2019,5,26),
datetime(2019,5,27),
datetime(2019,5,28),
datetime(2019,5,29),
datetime(2019,5,30),
datetime(2019,6,30),
]
y = [0,1,3,4,6,5,7,3]
plt.plot_date(x,y)
At first glance there seems to be no difference, but the x-axis data is now of type datetime rather than str. Now connect the points with a line.
plt.style.use('seaborn')
plt.plot_date(x,y,linestyle='solid')
However, as the amount of data grows, the x-axis labels still become crowded, for example:
x2 = [
datetime(2019,5,24),
datetime(2019,5,25),
datetime(2019,5,26),
datetime(2019,5,27),
datetime(2019,5,28),
datetime(2019,5,29),
datetime(2019,5,30),
datetime(2019,6,24),
datetime(2019,6,25),
datetime(2019,6,26),
datetime(2019,6,27),
datetime(2019,6,28),
datetime(2019,6,29),
datetime(2019,6,30),
]
y2 = [0,1,3,4,6,5,7,0,1,3,4,6,5,7]
plt.plot_date(x2,y2,linestyle='solid')
The one-month gap is now shown correctly on the line chart, but the x-axis labels are clearly overcrowded. Let's look at solutions:
Fixing the crowded x-axis labels
plt.plot_date(x2,y2,linestyle='solid')
plt.gcf().autofmt_xdate()
# gcf gets the current figure; gca gets the current axes
# plt.gcf().autofmt_xdate() automatically rotates and aligns the x-axis date labels
Of course, you can also set the date format yourself:
from matplotlib import dates as mpl_dates
plt.plot_date(x2,y2,linestyle='solid')
plt.gcf().autofmt_xdate()
date_format=mpl_dates.DateFormatter('%b,%d %Y')
# format: abbreviated month, day of month, four-digit year
plt.gca().xaxis.set_major_formatter(date_format)
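DateFormatter takes a strftime-style format string, so a format can be previewed on a single date before it is applied to the axis. A minimal sketch with the standard library, assuming the default English (C) locale for month names:

```python
from datetime import datetime

sample = datetime(2019, 5, 24)

# %b = abbreviated month name, %d = zero-padded day, %Y = four-digit year
formatted = sample.strftime('%b,%d %Y')
print(formatted)  # May,24 2019

# Another common choice: year-month-day
iso_like = sample.strftime('%Y-%m-%d')
print(iso_like)  # 2019-05-24
```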
Using Pandas to import financial data analysis
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime,timedelta
from matplotlib import dates as mpl_dates
df = pd.read_csv('data.csv')
df.head()
Note that the time here is not necessarily in datetime format, you need to check it
df.info()
Sure enough, it is in string format and needs to be adjusted to datetime:
df.Date = pd.to_datetime(df.Date)
df.info()
Sort the time series to see if there is any problem
df.sort_values('Date',inplace=True)
df.head()
Next, start drawing the trend chart:
plt.plot_date(df.Date,df.Close,linestyle='solid')
plt.gcf().autofmt_xdate()
Enrich with more details:
plt.plot_date(df.Date,df.Close, linestyle='solid')
plt.gcf().autofmt_xdate()
date_format = mpl_dates.DateFormatter('%b,%d %Y')
plt.gca().xaxis.set_major_formatter(date_format)
plt.title('Bitcoin Price')
plt.xlabel('Date')
plt.ylabel('Price USD')
real-time data processing
traditional drawing
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
x = [0,1,2,3,4,5]
y = [0,1,2,3,4,5]
plt.plot(x,y)
This type of data can be drawn in this way, but if it is real-time data, such as stocks, sensor feedback data, etc., how should it be processed?
Use iterator to set a real-time data
import random
from itertools import count
index = count()
x1=[]
y1=[]
def animate(i):
    x1.append(next(index))  # next(index) is a counter: 0, 1, 2, ...
    y1.append(random.randint(0,50))
    plt.plot(x1,y1)
for i in range(50):
    animate(i)
Let the program run automatically
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import HTML
from itertools import count
import random
from matplotlib.animation import FuncAnimation
plt.style.use('fivethirtyeight')
index = count()
x1=[]
y1=[]
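def animate(i):
    x1.append(next(index))  # next(index) is a counter: 0, 1, 2, ...
    y1.append(random.randint(0,50))
    plt.cla()  # clear the axes so that each frame redraws one line instead of stacking new lines in changing colors
    plt.plot(x1,y1)
ani = FuncAnimation(plt.gcf(),animate,interval=1000)
# plt.gcf() gets the current figure
# animate is the function called for each frame
# interval=1000: 1000 milliseconds (1 second) between frames
HTML(ani.to_jshtml())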
Get real-time data and save it to a file and then load it into the notebook
The data in the example above comes from the random and count functions. What if the data comes from an external interface or is collected in real time?
Design an external file to obtain data in real time
import csv
import random
import time
x_value = 0
y1 = 1000
y2 = 1000
fieldname=["x","y1","y2"]
with open('data.txt','w',newline='') as csvfile:
    csv_w = csv.DictWriter(csvfile,fieldnames=fieldname)
    csv_w.writeheader()  # write the header row
while True:
    with open('data.txt','a',newline='') as csvfile:
        csv_w = csv.DictWriter(csvfile,fieldnames=fieldname)
        info = {
            "x" : x_value,
            "y1" : y1 ,
            "y2" : y2 ,
        }
        x_value += 1
        y1 = y1 + random.randint(-6,10)
        y2 = y2 + random.randint(-4,5)
        csv_w.writerow(info)
    time.sleep(1)  # wait 1 s before writing the next row
As long as this program is running, a set of real-time data will be generated every 1s and stored in the data.txt file.
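To check the file format without leaving an endless loop running, the writer can be tried in a bounded form. The sketch below writes five rows to a temporary file and reads them back with csv.DictReader; the temporary path, the row count, and the variable names are assumptions made for the example.

```python
import csv
import os
import random
import tempfile

fieldnames = ['x', 'y1', 'y2']
path = os.path.join(tempfile.mkdtemp(), 'data.txt')  # assumption: throwaway location

# Create the file and write the header once
with open(path, 'w', newline='') as f:
    csv.DictWriter(f, fieldnames=fieldnames).writeheader()

y1 = y2 = 1000
for x_value in range(5):  # bounded stand-in for "while True"
    # Reopen in append mode for every row, as the generator script does
    with open(path, 'a', newline='') as f:
        csv.DictWriter(f, fieldnames=fieldnames).writerow(
            {'x': x_value, 'y1': y1, 'y2': y2})
    y1 += random.randint(-6, 10)
    y2 += random.randint(-4, 5)

# Read everything back; DictReader returns the values as strings
with open(path, newline='') as f:
    rows = list(csv.DictReader(f))
print(len(rows), rows[0])
```

Reopening the file for each row means every completed row is on disk before the next one starts, which is what lets a separate reader pick up new data while the writer is still running.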
Next, read the file data to draw a continuously changing real-time data graph:
import pandas as pd
import matplotlib.pyplot as plt
from itertools import count
import random
from matplotlib.animation import FuncAnimation
plt.style.use('fivethirtyeight')
def animate2(i):
    dfa = pd.read_csv('data.txt')  # reread the file on every frame to pick up new rows
    x = dfa.x
    y1 = dfa.y1
    y2 = dfa.y2
    plt.cla()
    plt.plot(x,y1,label='Stock1')
    plt.plot(x,y2,label='Stock2')
    plt.legend()
ani2 = FuncAnimation(plt.gcf(),animate2,interval=1000)
# plt.gcf() gets the current figure
# animate2 is the function called for each frame
# interval=1000: 1000 milliseconds (1 second) between frames
plt.show()
Running this in PyCharm is recommended; Jupyter Notebook can only display 100 data points.
Chart Multiplot
In some cases several charts (a, b, c) need to be drawn in a single fig, which requires multiple subplots.
Traditional method of drawing
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn')
data = pd.read_csv('data.csv')
data.head()
Extract the data and plot it:
ages = data.Age
all_dev = data.All_Devs
py= data.Python
js = data.JavaScript
plt.plot(ages,all_dev,label='All')
plt.plot(ages,py,label='Python')
plt.plot(ages,js,label='JS')
plt.legend()
plt.xlabel('Age')
plt.ylabel('Sal')
So how can the three series in this chart be drawn as separate charts within one figure?
Enable multiple charts
fig,ax=plt.subplots(nrows=2,ncols=1)
# one fig containing small charts arranged in 2 rows and 1 column
In order to better identify these two pictures, they are named ax1 and ax2 respectively:
fig,(ax1,ax2)=plt.subplots(nrows=2,ncols=1)
Import data into multiple charts
To draw into a particular chart, replace plt with the name of that chart's axes object:
fig,(ax1,ax2)=plt.subplots(nrows=2,ncols=1)
ax1.plot(ages,all_dev,label='All')
ax2.plot(ages,py,label='Python')
ax2.plot(ages,js,label='JS')
ax1.legend()
ax2.legend()
Similarly, if there are three pictures:
fig,(ax1,ax2,ax3)=plt.subplots(nrows=3,ncols=1)
ax1.plot(ages,all_dev,label='All')
ax2.plot(ages,py,label='Python',color='g')
ax3.plot(ages,js,label='JS',color='r')
ax1.legend()
ax2.legend()
ax3.legend()
shared x-axis
Since the x-axis data of the three graphs are the same, the x-axis can be shared to make the graph look more concise:sharex=True
fig,(ax1,ax2,ax3)=plt.subplots(nrows=3,ncols=1,sharex=True)
ax1.plot(ages,all_dev,label='All')
ax2.plot(ages,py,label='Python',color='g')
ax3.plot(ages,js,label='JS',color='r')
ax1.legend()
ax2.legend()
ax3.legend()
ax3.set_xlabel('Age')
shared y-axis
fig , (ax1,ax2,ax3) = plt.subplots(nrows=1,ncols=3,sharey=True)
ax1.plot(ages,all_dev,label='All')
ax2.plot(ages,py,label='Python',color='g')
ax3.plot(ages,js,label='JS',color='r')
ax1.legend()
ax2.legend()
ax3.legend()
ax3.set_xlabel('Age')
ax1.set_ylabel('Salary')
dynamic loading
If you do not know how many rows and columns you need before drawing the chart, fixing nrows and ncols up front is obviously not practical. In that case you can add the subplots dynamically.
fig = plt.figure()
ax1 = fig.add_subplot(311)
# 311 means 3 rows, 1 column, position 1
ax2 = fig.add_subplot(312)
# 312 means 3 rows, 1 column, position 2
ax3 = fig.add_subplot(313)
# 313 means 3 rows, 1 column, position 3
ax1.plot(ages,all_dev,label='All')
ax2.plot(ages,py,label='Python',color='g')
ax3.plot(ages,js,label='JS',color='r')
ax1.legend()
ax2.legend()
ax3.legend()
ax3.set_xlabel('Age')
Change a parameter:
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(223)
You can continue to change according to your needs:
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(212)
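The three-digit argument is shorthand for three separate integers (rows, columns, position), and different grids can be mixed within one figure. The sketch below builds the last layout and checks that the bottom chart spans the full width; the Agg backend and the variable names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt

fig = plt.figure()
top_left = fig.add_subplot(221)       # 2x2 grid, position 1
top_right = fig.add_subplot(2, 2, 2)  # same grid, written as three integers
bottom = fig.add_subplot(212)         # 2x1 grid, position 2: the whole bottom row

n_axes = len(fig.axes)
# The bottom chart is laid out on a one-column grid, so it is wider
bottom_is_wider = bottom.get_position().width > top_left.get_position().width
print(n_axes, bottom_is_wider)
```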
Grid mode for drawing more complex layouts
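ax1 = plt.subplot2grid((6,1),(0,0),rowspan=2,colspan=1)
# set up a layout with 6 rows and 1 column
# ax1 starts at row 0, column 0 and spans 2 rows and 1 column
ax2 = plt.subplot2grid((6,1),(2,0),rowspan=2,colspan=1)
# ax2 starts at row 2, column 0 and spans 2 rows and 1 column
ax3 = plt.subplot2grid((6,1),(4,0),rowspan=2,colspan=1)
# ax3 starts at row 4, column 0 and spans 2 rows and 1 column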
ax1.plot(ages,all_dev,label='All')
ax2.plot(ages,py,label='Python',color='g')
ax3.plot(ages,js,label='JS',color='r')
ax1.legend()
ax2.legend()
ax3.legend()
ax3.set_xlabel('Age')
Continue to design more custom distributions:
ax1 = plt.subplot2grid((6,1),(0,0),rowspan=1,colspan=1)
ax2 = plt.subplot2grid((6,1),(1,0),rowspan=3,colspan=1)
ax3 = plt.subplot2grid((6,1),(4,0),rowspan=2,colspan=1)
Reproduce the layout of the earlier add_subplot example, with two charts on top and one across the bottom:
ax1 = plt.subplot2grid((4,2),(0,0),rowspan=2,colspan=1)
ax2 = plt.subplot2grid((4,2),(0,1),rowspan=2,colspan=1)
ax3 = plt.subplot2grid((4,2),(2,0),rowspan=2,colspan=2)