Use Plotly to draw various charts
- Plotly section
- Installing Plotly
- Use Plotly to draw the first graph
- Attempt to plot large amounts of data
- Scatterplot
- Pie chart
- Draw using custom data (PUBG game data)
- 2D density map
- 3D scatter plot
- Online plotting
- Real-time financial data plotting
- Use heatmap to draw a heat map
- Use scatter to draw a scatter plot
- Use scatter_matrix to draw a scatter matrix
- Use scatter_geo to draw geographic scatter plots
- Use choropleth function to draw map information
- Draw geographic areas using geojson functions
- Use choroplethmapbox to draw beautiful maps
- Folium draws (x,y) positioning map
- Use of dynamic data graphs
Plotly section
Installing Plotly
Plotly is a relatively new library. It is not included in the default Anaconda environment and has to be installed separately.
In PyCharm or Anaconda, search for plotly in the package manager and click Install.
From a terminal: pip3 install plotly
From the Windows command line: pip install plotly
In an Anaconda environment: conda install plotly
If the download is slow, you can use the Tsinghua mirror: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple plotly
Check if the installation is successful
import plotly
from plotly import __version__
print(__version__)
If a version number is printed, the installation succeeded. I am using version 4.14.3.
Use Plotly to draw the first graph
Use the offline mode: plotly.offline
The easiest way to draw is iplot([data]).
The biggest difference between plot and iplot is whether a new web page is created to display the chart: plot opens the chart in a new page, while iplot shows it inside the notebook. This is demonstrated later.
First, look at the iplot mode.
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
dic1 = {
'x':[1,2,3,4],
'y':[1,3,5,8]
}
iplot([dic1])
# Try plot([dic1]) to see the difference
Compared with a graph drawn by matplotlib, a graph drawn by Plotly differs in two obvious ways:
1. Hovering over the chart displays the specific data values (interactivity);
2. The toolbar in the upper right corner offers many tools for inspecting the figure in detail (extensibility).
Attempt to plot large amounts of data
Use go.Scatter to store data
import plotly.graph_objects as go
import numpy as np
import random
x = np.random.randn(30)
y = np.random.randn(30)
go.Scatter(x=x,y=y)
When there is a lot of data, it is usually stored in a go.Scatter object first and then passed to the plotting call.
Call go.Scatter data to draw a scatter plot
iplot([go.Scatter(x=x,y=y)])
# iplot([data]): note that the data goes inside square brackets
By default the points are joined by lines, which looks very messy. We only need a scatter plot, so set the mode: mode='markers'
iplot([go.Scatter(x=x,y=y,mode='markers')])
Scatterplot
Standard way of writing a go object
import plotly
import plotly.graph_objs as go
import numpy as np
from plotly.offline import download_plotlyjs , init_notebook_mode,plot ,iplot
n =1000
x = np.random.randn(n)
y = np.random.randn(n)
trace = go.Scatter(x=x, y=y, mode='markers', marker=dict(color='red',size=3,opacity=0.5))
data=[trace]
iplot(data)
The standard way of writing a go.Scatter statement:
Step 1: generate the data.
Step 2: put the data into a go object. Assign the result of go.Scatter() to a variable (trace in this example) and adjust the details with marker=dict() inside go.Scatter, where color, size and opacity set the color, size and transparency of the points.
Step 3: create a variable, data, to store the go object. data is a list, so it can hold more than one go object.
Step 4: draw with iplot(data).
Pie chart
groups=['Meals','Bills','Entertainment','Other']
amount=[1000,500,1500,300]
colors=['#d32c58','#f9b1ee','#b7f9b1','#b1f5f9']
trace=go.Pie(labels=groups, values=amount)
data=[trace]
iplot(data)
Enrich the details:
trace=go.Pie(labels=groups, values=amount, hoverinfo='label+percent', textinfo='value',
textfont=dict(size=25), marker=dict(colors=colors,line=dict(color='#000000',width=3)))
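# hoverinfo='label+percent': show the label and the percentage on hover
# textinfo='value': display the value as text on each slice
# textfont=dict(size=25): font size 25
# marker=dict(colors=colors,line=dict(color='#000000',width=3)): use the colors in colors, with black outlines of width 3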
data=[trace]
iplot(data)
You can also display only the slices you need (for example, by clicking a legend entry to hide a slice), and the percentages are recalculated.
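The recalculation can be sketched in plain Python. This is only an illustration of the arithmetic, not Plotly's own code: the helper name percentages is hypothetical, and the four groups are written with English labels.

```python
groups = ["Meals", "Bills", "Entertainment", "Other"]
amount = [1000, 500, 1500, 300]


def percentages(labels, values, hidden=()):
    """Return {label: percent} computed over the slices that are not hidden."""
    visible = [(label, value) for label, value in zip(labels, values)
               if label not in hidden]
    total = sum(value for _, value in visible)
    return {label: round(100 * value / total, 1) for label, value in visible}


# All four slices share a total of 3300
print(percentages(groups, amount))

# Hiding one slice shrinks the total to 1800, so every share grows
print(percentages(groups, amount, hidden={"Entertainment"}))
```

The remaining slices keep their values; only the denominator changes, which is why the displayed percentages shift.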
Draw using custom data (PUBG game data)
Data Sources
Website: kaggle.com
Use Pandas together with Plotly
Use Pandas to read csv file data:
from plotly.offline import download_plotlyjs , init_notebook_mode,plot ,iplot
import plotly.graph_objs as go
import pandas as pd
pubg = pd.read_csv("PUBG.csv")
pubg.head()
Visualization requires the data to be of numeric type; you can check the column types with info():
pubg.info()
View fields:
pubg.columns
Process the data
df_pubg = pubg.apply(pd.to_numeric,errors = 'ignore')
# Convert all columns to numeric types; columns that cannot be converted are left unchanged
df_new_pubg = df_pubg.head(100)
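As far as I know, recent pandas releases deprecate errors='ignore' in pd.to_numeric, so the conversion above may emit a warning or fail depending on the installed version. A minimal sketch of an equivalent column-by-column conversion follows. The helper name to_numeric_where_possible and the two-row sample frame are hypothetical and stand in for the PUBG data.

```python
import pandas as pd


def to_numeric_where_possible(frame):
    """Convert each column to a numeric dtype, leaving unparseable columns as-is."""
    out = frame.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # Column holds non-numeric text (e.g. player names): keep it unchanged
            pass
    return out


# Hypothetical sample with one text column and one numeric column stored as text
sample = pd.DataFrame({"player_name": ["a", "b"], "solo_Wins": ["3", "7"]})
converted = to_numeric_where_possible(sample)
print(converted.dtypes)
```

On the real file this would be called as to_numeric_where_possible(pubg) in place of the apply call.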
Draw a scatterplot
trace = go.Scatter(x = df_new_pubg.solo_RoundsPlayed ,y = df_new_pubg.solo_Wins , name = 'Rounds Won' ,mode='markers')
layout = go.Layout(title =" PUBG win vs round played " ,plot_bgcolor='rgb(230,230,230)' ,showlegend=True)
# Configure the output
# plot_bgcolor: background color of the plot
# showlegend=True: show the legend
fig = go.Figure(data=[trace] , layout=layout)
# Combine trace and layout into one figure
iplot(fig)
Draw a grouped bar chart from two sets of data
trace1 = go.Bar(x=df_new_pubg.player_name, y=df_new_pubg.solo_RoundsPlayed, name='Rounds Play')
trace2 = go.Bar(x=df_new_pubg.player_name, y=df_new_pubg.solo_Wins, name='Wins')
layout = go.Layout(barmode='group')
fig = go.Figure(data=[trace1,trace2] , layout=layout)
iplot(fig)
2D density map
The data is the PUBG data file from Data Visualization Analysis 2.3.
from plotly.offline import download_plotlyjs , init_notebook_mode,plot ,iplot
import plotly.graph_objs as go
import pandas as pd
pubg = pd.read_csv("PUBG.csv")
df_pubg = pubg.apply(pd.to_numeric,errors = 'ignore')
df_new_pubg = df_pubg.head(100)
import plotly.figure_factory as ff
A 2D chart requires two sets of data:
x = df_new_pubg.solo_Wins
y = df_new_pubg.solo_TimeSurvived
Setting parameters:
colorscale = ['#7A4579','#D56073','rgb(236,158,105)',(1,1,0.2),(0.98,0.98,0.98)]
First draw without any extra parameters to see the default result:
fig=ff.create_2d_density(x,y)
iplot(fig ,filename='histgram_subplot')
Improve the colors with a custom color scale:
fig = ff.create_2d_density(x,y , colorscale= colorscale)
The colors of the density map and the histograms are now inconsistent, so adjust the histogram color as well:
fig = ff.create_2d_density(x,y , colorscale= colorscale ,hist_color='rgb(255,237,222)' , point_size= 5)
3D scatter plot
A 3D plot requires three sets of data:
x = df_new_pubg.solo_Wins
y = df_new_pubg.solo_TimeSurvived
z = df_new_pubg.solo_RoundsPlayed
Use a go statement to assign the trace:
trace1 = go.Scatter3d(
x=x,
y=y,
z=z,
mode='markers'
)
data=[trace1]
fig=go.Figure(data=data)
iplot(fig)
Optimize the marker parameters:
trace1 = go.Scatter3d(
    x=x,
    y=y,
    z=z,
    mode='markers',
    marker=dict(
        size=12,
        color=z,
        colorscale='Viridis',  # use the Viridis color scale
        opacity=0.8,
        showscale=True  # show the color bar
    )
)
data = [trace1]  # rebuild data so the new marker settings are used
The lighter the color, the more rounds were played.
Add a layout:
layout = go.Layout(margin=dict(
l=0,
r=0,
t=0,
b=0
))
fig = go.Figure(data=data , layout=layout)
iplot(fig,filename='3d')
Online plotting
Interactive visualization on the web is one of Plotly's most powerful features.
The steps are:
1. Click Sign Up on the official Plotly website to register an account.
2. After logging in, click Settings.
3. Find the API section.
4. Generate a temporary password (the API key).
5. Install the chart_studio library.
6. Start uploading figures.
import chart_studio
import chart_studio.plotly as py
chart_studio.tools.set_credentials_file(username='D_Ddd0701',api_key='ZDBddR6QXiKshV9xdMwu')
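# Enter the username registered on the website and the generated API key
init_notebook_mode(connected=True)
# Connect the notebook to the online service
# Copy the earlier code here and draw with py.iplot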
fig = go.Figure(data=data , layout=layout)
py.iplot(fig,filename='3d')
Click the EDIT button in the lower right corner to open the web page, where the figure has already been uploaded; click Save to store it.
After confirming the content, click Save again.
The figure is now stored in your personal Files.
Click Viewer to browse it, or click Editor to edit it.
Similarly, you can open figures made by others to learn how they were built.
Real-time financial data plotting
This case uses Apple and Tesla stocks as examples.
Import data using Pandas
from plotly.offline import download_plotlyjs , init_notebook_mode,plot ,iplot
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('APPL.csv')
df.head()
You can see that the data includes the date, opening and closing prices, volume and other fields.
Draw a chart
Use a go statement to store the data:
trace1=go.Scatter(
x=df['Date'],
y=df['AAPL.Close']
)
iplot([trace1])
Use the same method to draw Tesla. Here we plot the highest and lowest prices for Tesla stock.
df2 = pd.read_csv('Tesla.csv')
trace_a = go.Scatter(
x = df2.date,
y = df2.high,
name = "Tesla High",
line = dict(color = '#17BECF'),
opacity =0.8
)
trace_b = go.Scatter(
x = df2.date,
y = df2.low,
name = "Tesla Low",
line = dict(color = '#7f7f7f'),
opacity =0.8
)
data=[trace_a, trace_b]
iplot(data)
Make some adjustments to the layout:
layout = dict(title = "Tesla stock High vs Low")
fig = dict(data = data,layout = layout)
iplot(fig)
Then add a close price line:
trace_c = go.Scatter(
x = df2.date,
y = df2.close,
name = "Tesla Close",
line = dict(color = '#7f7f7f'),
opacity =0.8
)
data =[trace_a,trace_b,trace_c]
Introducing financial features - range slider
import plotly.express as px
fig = px.line(df2 , x='date',y='close') # use the df2 data; the x-axis is date and the y-axis is close
fig.update_xaxes(rangeslider_visible=True) # show the range slider
fig.show()
Look at Apple's:
fig = px.line(df, x='Date', y='AAPL.High', title='Time Series with Rangeslider')
fig.update_xaxes(rangeslider_visible=True)
fig.show()
Introducing financial features - range selector buttons (1 day, 5 days and so on)
fig = px.line(df, x='Date', y='AAPL.High', title='Time Series with Rangeslider')
fig.update_xaxes(rangeslider_visible=True,
rangeselector = dict(
buttons=list([
dict(count=1,label="1d",step="day",stepmode="backward"),
dict(count=5,label="5d",step="day",stepmode="backward"),
dict(count=1,label="1m",step="month",stepmode="backward"),
dict(count=3,label="3m",step="month",stepmode="backward"),
dict(count=6,label="6m",step="month",stepmode="backward"),
dict(count=1,label="1y",step="year",stepmode="backward"),
dict(step="all") # reset to the full range
])
)
)
fig.show()
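The button list is repetitive, so it can also be generated from a short table. This is a minimal sketch in plain Python; the helper name make_range_buttons and the specs table are hypothetical conveniences, and the dictionaries it produces have the same keys as the ones written out by hand above.

```python
def make_range_buttons(specs):
    """Build rangeselector buttons from (count, label, step) tuples."""
    buttons = [
        dict(count=count, label=label, step=step, stepmode="backward")
        for count, label, step in specs
    ]
    # Final button without a count: show the full date range again
    buttons.append(dict(step="all"))
    return buttons


specs = [
    (1, "1d", "day"),
    (5, "5d", "day"),
    (1, "1m", "month"),
    (3, "3m", "month"),
    (6, "6m", "month"),
    (1, "1y", "year"),
]
rangeselector = dict(buttons=make_range_buttons(specs))
print(len(rangeselector["buttons"]))
```

The result can then be passed as fig.update_xaxes(rangeslider_visible=True, rangeselector=rangeselector).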
Introducing financial features - candlestick charts
Candlestick template:
go.Candlestick(
x=dates,
open=opening_prices,
high=highest_prices,
low=lowest_prices,
close=closing_prices
)
fig = go.Figure(data=[go.Candlestick(
x=df['Date'],
open=df['AAPL.Open'],
high=df['AAPL.High'],
low=df['AAPL.Low'],
close=df['AAPL.Close']
)
]
)
fig.update_xaxes(rangeslider_visible=True,
rangeselector = dict(
buttons=list([
dict(count=1,label="1d",step="day",stepmode="backward"),
dict(count=5,label="5d",step="day",stepmode="backward"),
dict(count=1,label="1m",step="month",stepmode="backward"),
dict(count=3,label="3m",step="month",stepmode="backward"),
dict(count=6,label="6m",step="month",stepmode="backward"),
dict(count=1,label="1y",step="year",stepmode="backward"),
dict(step="all")
])
)
)
fig.show()
Introducing financial features - indicators
import cufflinks as cf
cf.set_config_file(offline=True,world_readable=True) # offline=True: work offline from Python
Generate candlestick (OHLC) data with the built-in cufflinks data generator:
df= cf.datagen.ohlc()
df.head()
qf=cf.QuantFig(df) # wrap the data in df as a financial figure
Use qf.iplot() to draw the candlestick chart:
qf.iplot()
Add the MACD indicator
Add the MACD indicator with qf.add_macd():
qf.add_macd()
qf.iplot()
Add the RSI indicator
Add the RSI indicator with qf.add_rsi().
Choose the values to suit your own needs:
qf.add_rsi(6,80) # period of 6 days, trigger value 80
qf.iplot()
Add Bollinger Bands
Add Bollinger Bands with qf.add_bollinger_bands():
qf.add_bollinger_bands()
qf.iplot()
You can also click an entry in the upper right corner to turn an indicator off.
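To show what the Bollinger Bands indicator represents, here is a minimal sketch of the textbook calculation in plain Python: a moving average plus and minus k standard deviations. The function name bollinger_bands, the window of 5 and the sample prices are hypothetical, and cufflinks may use different defaults or a different standard deviation convention internally.

```python
from statistics import mean, pstdev


def bollinger_bands(closes, window=20, k=2.0):
    """Return (lower, middle, upper) tuples for each full window of closing prices."""
    bands = []
    for end in range(window, len(closes) + 1):
        chunk = closes[end - window:end]
        middle = mean(chunk)            # simple moving average
        spread = k * pstdev(chunk)      # k population standard deviations
        bands.append((middle - spread, middle, middle + spread))
    return bands


# Hypothetical closing prices
closes = [1, 2, 3, 4, 5, 6]
bands = bollinger_bands(closes, window=5, k=2.0)
for lower, middle, upper in bands:
    print(round(lower, 3), round(middle, 3), round(upper, 3))
```

A price that moves outside the band is unusually far from its recent average, which is what the shaded channel on the chart is meant to highlight.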
Use heatmap to draw a heat map
Import class library:
import pandas as pd
import numpy as np
import chart_studio.plotly as py
import seaborn as sns
import plotly.express as px
# Embed all matplotlib figures inline in the notebook page
%matplotlib inline
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
Import data using Pandas
flights = sns.load_dataset("flights")
flights.head()
This uses a dataset bundled with seaborn, and loading it may fail with an IncompleteRead error. Solution:
1. Download the csv file manually.
2. Put the downloaded copy of flights.csv into the seaborn-data folder.
View the data types: flights.info()
Draw a heat map
A heat map requires three pieces of data.
fig = px.density_heatmap(flights, x ='year' , y ='month' , z= 'passengers')
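The grid behind the heat map can be sketched with a pandas pivot table. As far as I understand the Plotly Express defaults, density_heatmap sums z within each x/y cell when z is given; the snippet below mirrors that with aggfunc="sum". The four-row sample frame is illustrative and only imitates the shape of the flights data.

```python
import pandas as pd

# Illustrative rows in the same shape as the flights dataset
sample = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# One row per month, one column per year, cell value = summed passengers
grid = sample.pivot_table(
    index="month", columns="year", values="passengers", aggfunc="sum"
)
print(grid)
```

Each cell of grid corresponds to one colored tile in the heat map.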
Change the colors with color_continuous_scale='<colorscale name>'.
The value must be one of the following named colorscales:
['aggrnyl', 'agsunset', 'algae', 'amp', 'armyrose', 'balance',
'blackbody', 'bluered', 'blues', 'blugrn', 'bluyl', 'brbg',
'brwnyl', 'bugn', 'bupu', 'burg', 'burgyl', 'cividis', 'curl',
'darkmint', 'deep', 'delta', 'dense', 'earth', 'edge', 'electric',
'emrld', 'fall', 'geyser', 'gnbu', 'gray', 'greens', 'greys',
'haline', 'hot', 'hsv', 'ice', 'icefire', 'inferno', 'jet',
'magenta', 'magma', 'matter', 'mint', 'mrybm', 'mygbm', 'oranges',
'orrd', 'oryel', 'oxy', 'peach', 'phase', 'picnic', 'pinkyl',
'piyg', 'plasma', 'plotly3', 'portland', 'prgn', 'pubu', 'pubugn',
'puor', 'purd', 'purp', 'purples', 'purpor', 'rainbow', 'rdbu',
'rdgy', 'rdpu', 'rdylbu', 'rdylgn', 'redor', 'reds', 'solar',
'spectral', 'speed', 'sunset', 'sunsetdark', 'teal', 'tealgrn',
'tealrose', 'tempo', 'temps', 'thermal', 'tropic', 'turbid',
'turbo', 'twilight', 'viridis', 'ylgn', 'ylgnbu', 'ylorbr',
'ylorrd']
Try viridis:
fig = px.density_heatmap(flights, x ='year' , y ='month' , z= 'passengers' , color_continuous_scale='viridis')
Add marginal histograms
To show the totals along the x and y axes, use marginal_x="histogram" and marginal_y="histogram":
fig = px.density_heatmap(flights, x ='year' , y ='month' , z= 'passengers' ,marginal_x="histogram" ,marginal_y="histogram")
Reflect the heat map with a 3D line chart
fig = px.line_3d(flights , x ='year' , y ='month' , z= 'passengers' ,color='year') # color='year' draws each year's data in a different color
This is consistent with what the heat map shows: the number of passengers increases year by year, and the July values are generally the largest.
If you look at only a few years of data, the pattern of change is even easier to see.
Use scatter to draw a scatter plot
fig = px.scatter_3d(flights , x ='year' , y ='month' , z= 'passengers' ,color='year')
This is similar to the result shown in the line chart.
Use scatter_matrix to draw a scatter matrix
Sometimes we want to display the information in a 3D graph using 2D graphs. The 3D graph in the previous case involves three variables x, y and z. To display it in 2D, the pairwise relationships xy, xz and yz have to be shown, which takes three plots. The scatter_matrix function does this.
fig = px.scatter_matrix(flights,color="month")
Next, we use the classic machine learning Iris data set for another visual analysis.
The Iris data contains four measurements for each of three species of iris. We want to separate the three species using the data set.
df = px.data.iris()
df.head()
Use the scatter matrix just now to analyze:
fig = px.scatter_matrix(df,color="species")
It can be seen that plotting petal_length against petal_width spreads the species apart relatively well.
So draw a scatterplot using just these two features:
fig = px.scatter(df , x ='petal_length' , y='petal_width' , color='species' )
This plot does not give enough information, so we add another distinguishing channel: size='petal_length'
fig = px.scatter(df , x ='petal_length' , y='petal_width' , color='species', size='petal_length')
You will find that the setosa points are relatively small and the virginica points relatively large. This plot separates the blue group cleanly, but the red and green groups overlap, so separating them is harder. So try 3D.
fig = px.scatter_3d(df , x ='petal_length' , y='petal_width' ,z='sepal_width' , color='species' ,size='petal_length')
The 3D plot suggests that there is a surface in space that can separate the three species.
Use scatter_geo to draw geographic scatter plots
Use px's built-in dataset
Use the built-in 2007 GDP data:
df = px.data.gapminder().query("year == 2007")
df.head()
Mapping geographic information
fig = px.scatter_geo(df,locations="iso_alpha")
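# locations='iso_alpha': match each row to a country by its ISO code
To make the display more recognizable, add other parameters:
# color="continent": a different color for each continent
# hover_name="lifeExp": show the lifeExp value from the dataset on hover
# size='pop': scale the marker size by the pop column
# projection='orthographic': use the orthographic (globe) projection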
fig = px.scatter_geo(df,locations="iso_alpha",color="continent",hover_name="lifeExp",size='pop',projection='orthographic')
The available projections are:
One of the following enumeration values:
['equirectangular', 'mercator', 'orthographic', 'natural earth',
'kavrayskiy7', 'miller', 'robinson', 'eckert4',
'azimuthal equal area', 'azimuthal equidistant',
'conic equal area', 'conic conformal', 'conic equidistant',
'gnomonic', 'stereographic', 'mollweide', 'hammer',
'transverse mercator', 'albers usa', 'winkel tripel',
'aitoff', 'sinusoidal']
Use choropleth function to draw map information
import pandas as pd
import numpy as np
import plotly
import plotly.graph_objects as go
import chart_studio.plotly as py
import plotly.express as px
Drawing a map requires location data: either GPS information (latitude and longitude coordinates) or, if that is missing, supplementary geojson information.
For example, "Chengdu" in a csv file does not identify Chengdu on the map; it is just a character string. For the program to recognize it as Chengdu on the map, the data must include the extent of Chengdu (its span of latitude and longitude, and so on).
Here is a specific example:
To define a polygon, we delineate it with 5 points, which together form the geographic information of the polygon. All area information can be described in this way.
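The five-point polygon can be sketched as a GeoJSON feature in plain Python. In GeoJSON a polygon ring is closed, so a four-cornered area is written with five positions, the last repeating the first, and each position is given as [longitude, latitude]. The corner coordinates, the id example_area and the name property below are illustrative assumptions, not values from the original data.

```python
import json

# Illustrative corners of a small rectangle, as (longitude, latitude)
corners = [
    (104.063, 30.655),
    (104.069, 30.655),
    (104.069, 30.660),
    (104.063, 30.660),
]

# Close the ring by repeating the first corner: 4 corners -> 5 points
ring = [list(corner) for corner in corners] + [list(corners[0])]

feature = {
    "type": "Feature",
    "id": "example_area",                      # hypothetical id
    "properties": {"name": "Example area"},    # hypothetical name
    "geometry": {"type": "Polygon", "coordinates": [ring]},
}
collection = {"type": "FeatureCollection", "features": [feature]}

text = json.dumps(collection, indent=2)
print(text)
```

A file with this structure is what the plotting functions later match against the location column of the csv data.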
Draw geographic areas using geojson
"Enclose" an area on the official geojson website
To obtain geojson data, you need to outline the area on the official geojson website.
Find the place you want (take Tianfu Square in the center of Chengdu as an example) and draw a frame around it. The framed area is converted to geojson-format data on the right; click Save to save it in geojson format.
"Enclose" an area in a Jupyter notebook with the folium library
First install the third-party library folium; it is installed the same way as other third-party libraries.
from folium.plugins import Draw
import folium
m = folium.Map()
draw = Draw(export=True,filename="tianfu_square.geojson")
draw.add_to(m)
Click Export to save the geojson file locally.
Use choroplethmapbox to draw beautiful maps
A dataset has been prepared here:
geo = pd.read_csv("Geography.csv")
There are only two ways to locate something on the map: 1. give explicit (x, y) coordinates; 2. use a geojson file. Giving coordinates for every city is clearly too laborious, so the second method is used here.
Someone has published a geojson file of city-level regions across the country online, and we will load it directly.
import json
with open('china_geojson.json') as file:
    china = json.load(file)
Draw the picture:
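# geojson: the geojson data
# locations: the ids that match the features in the geojson
# z: the values to color by
fig = go.Figure(go.Choroplethmapbox(geojson=china,locations=geo.Regions,z=geo.followerPercentage
,colorscale='Cividis'))
# Drawing the trace alone shows nothing; the map must be configured with fig.update_layout()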
fig.update_layout(mapbox_style="carto-positron",mapbox_zoom=3,mapbox_center = {
"lat" : 35.9 ,"lon" : 104.2})
fig.update_layout(margin={
"t":0,"b":0,"l":0,"r":0})
fig
This map looks fine, but the place names in Geography.csv are all in English, and we would prefer to display them in Chinese:
geo2 = pd.read_csv('Geography2.csv')
geo2.head(5)
Here Beijing and Shanghai from the previous data have been changed to Chinese; run it again:
fig = go.Figure(go.Choroplethmapbox(geojson=china,locations=geo2.Regions,z=geo2.followerPercentage
,colorscale='Cividis'))
fig.update_layout(mapbox_style="carto-positron",mapbox_zoom=3,mapbox_center = {
"lat" : 35.9 ,"lon" : 104.2})
fig.update_layout(margin={
"t":0,"b":0,"l":0,"r":0})
fig
Neither Beijing nor Shanghai is displayed.
This is because the ids in our geojson file are also in English. Once Beijing and Shanghai are changed to Chinese in the data, they no longer match the geojson, so they cannot be displayed.
If the ids in the geojson are changed to Chinese as well, they are displayed correctly.
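A quick way to catch this kind of mismatch before plotting is to compare the location names with the feature ids. The sketch below is plain Python; the helper name unmatched_locations and the two-feature sample are hypothetical, and it assumes the ids sit in a top-level "id" field of each feature, which may not be how china_geojson.json is organized.

```python
def unmatched_locations(geojson, locations, id_key="id"):
    """Return the locations that have no feature with a matching id."""
    known_ids = {feature.get(id_key) for feature in geojson["features"]}
    return [loc for loc in locations if loc not in known_ids]


# Hypothetical miniature geojson with English ids
china_sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "Beijing", "properties": {}, "geometry": None},
        {"type": "Feature", "id": "Shanghai", "properties": {}, "geometry": None},
    ],
}

# English names match the ids, so nothing is reported
print(unmatched_locations(china_sample, ["Beijing", "Shanghai"]))

# Chinese names have no matching id, so both are reported
print(unmatched_locations(china_sample, ["北京", "上海"]))
```

Any name returned by the helper would be left blank on the map.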
Folium draws (x,y) positioning map
We want to draw a map with markers positioned by coordinates.
First import a dataset:
df = pd.read_csv('geo_pandas.txt')
We take the first 100 rows of the dataset.
limit=100
df = df.iloc[:limit,:]
We can achieve positioning with folium:
lat = 37.77
long = -122.42
m2 = folium.Map(location=[lat,long],zoom_start=12)
If you want to plot the anchor points:
# Create a feature group
incidents = folium.map.FeatureGroup()
# Pair up latitude and longitude
for lat, long in zip(df.Y, df.X):
    incidents.add_child(
        folium.CircleMarker(  # draw a small circle
            [lat, long],
            radius=5
        )
    )
m2.add_child(incidents)
It can be further beautified:
for lat, long in zip(df.Y, df.X):
    incidents.add_child(
        folium.CircleMarker(  # draw a small circle
            [lat, long],
            radius=5,
            fill=True,  # fill the circle
            fill_color='blue',  # blue fill
            color='yellow',  # yellow outline
            fill_opacity=0.6  # opacity of the fill
        )
    )
m2.add_child(incidents)
We would also like to have label markers (pins) like the demo.
for lat, long in zip(df.Y, df.X):
    incidents.add_child(
        folium.CircleMarker(  # draw a small circle
            [lat, long],
            radius=5,
            fill=True,  # fill the circle
            fill_color='blue',  # blue fill
            color='yellow',  # yellow outline
            fill_opacity=0.6  # opacity of the fill
        )
    )
lat1 = list(df.Y)
long1 = list(df.X)
label1 = list(df.Category)
# Add a marker with a popup label for each point
for lat_i, long_i, label_i in zip(lat1, long1, label1):
    folium.Marker([lat_i, long_i], popup=label_i).add_to(m2)
m2.add_child(incidents)
Here we display only the first 100 rows of data. If all 150,000 records were displayed, the map would be densely packed and hard to read. So what if we want to display all of them? We can use clustering.
from folium.plugins import MarkerCluster
# Create a new map
m3 = folium.Map(location=[lat,long],zoom_start=12)
marker_cluster = MarkerCluster().add_to(m3)
lat1 = list(df.Y)
long1 = list(df.X)
label1 = list(df.Category)
for lat_i, long_i, label_i in zip(lat1, long1, label1):
    # Add to the marker cluster here, not directly to m3
    folium.Marker([lat_i, long_i], popup=label_i).add_to(marker_cluster)
As you zoom in and out with the scroll wheel, the map switches between individual markers and clustered counts.
Use of dynamic data graphs
Use a normal scatterplot
First import various class libraries
import pandas as pd
import numpy as np
import chart_studio.plotly as py
import cufflinks as cf
import seaborn as sns
import plotly.express as px
%matplotlib inline
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
Import px's own dataset:
df_cnt = px.data.gapminder()
df_cnt.head()
Draw a scatterplot to see the correlation between gdpPercap and lifeExp:
px.scatter(df_cnt,x='gdpPercap' , y = "lifeExp" )
The data points are stacked together, and it is impossible to tell the countries apart, so use colors to distinguish them:
px.scatter(df_cnt,x='gdpPercap' , y = "lifeExp" ,color='continent')
It is still not easy to distinguish them, so use a method that has been used before: a logarithmic x-axis.
px.scatter(df_cnt,x='gdpPercap' , y = "lifeExp" ,color='continent' ,log_x =True)
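What log_x=True changes can be sketched with a few numbers. On a logarithmic axis a point is positioned by the logarithm of its value, so equal ratios become equal distances and the crowded low-GDP region is stretched out. The three sample values below are hypothetical.

```python
import math

# Hypothetical GDP per capita values, each 10 times the previous one
gdp = [300, 3000, 30000]

# Position of each value along a base-10 logarithmic axis
positions = [math.log10(value) for value in gdp]

# Distance between neighbouring points on the linear and on the log axis
linear_gaps = [b - a for a, b in zip(gdp, gdp[1:])]
log_gaps = [b - a for a, b in zip(positions, positions[1:])]

print(linear_gaps)
print([round(gap, 3) for gap in log_gaps])
```

On the linear axis the second gap is ten times the first, while on the log axis both gaps are the same size.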
We also want to hover over a point to see the specific country:
px.scatter(df_cnt,x='gdpPercap' , y = "lifeExp" ,color='continent' ,log_x =True , hover_name="country")
This plot is still quite messy because it contains decades of data. It would be better to display each year separately, which calls for a dynamic (animated) plot.
Use dynamic scatterplots
px.scatter(df_cnt,x='gdpPercap' , y = "lifeExp" ,color='continent' ,log_x =True , hover_name="country",
animation_frame="year")
# animation_frame="year": play the data year by year
Drag the slider below or click the play button to play the data of each year. As time advances, though, the points move outside the plot area, because the axis ranges are not fixed.
Fix both the x-axis and y-axis ranges at once:
px.scatter(df_cnt,x='gdpPercap' , y = "lifeExp" ,color='continent' ,
log_x =True , hover_name="country",animation_frame="year",range_x=[100,100000],
range_y=[25,90])
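Fixed ranges such as range_y=[25,90] can also be derived from the data instead of typed by hand. This is a minimal sketch for a linear axis: take the minimum and maximum over all years and add a small margin. The helper name padded_range, the 5% margin and the sample values are hypothetical; for the logarithmic x-axis a multiplicative margin would be more suitable.

```python
def padded_range(values, pad=0.05):
    """Return [low, high] covering all values with a margin on both sides."""
    low, high = min(values), max(values)
    margin = (high - low) * pad
    return [low - margin, high + margin]


# Hypothetical life expectancy values taken across all years
life_exp = [28.8, 43.1, 60.2, 82.6]

range_y = padded_range(life_exp)
print([round(bound, 2) for bound in range_y])
```

With the real data, the input would be the whole lifeExp column, for example padded_range(df_cnt['lifeExp']).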
There is still a problem: the relative size of each country is not clear. For example, how do the purple points compare with the other colors? So set the size parameter and use population to distinguish them.
px.scatter(df_cnt,x='gdpPercap' , y = "lifeExp" ,color='continent' ,
log_x =True , hover_name="country",animation_frame="year",range_x=[100,100000],
range_y=[25,90],size='pop',size_max=60)
Use a normal bar chart
px.bar(df_cnt,x='continent' , y='pop')
Hovering over a bar does not show which country it represents, so add hover_name="country":
px.bar(df_cnt,x='continent' , y='pop',hover_name="country")
Use a dynamic bar chart
px.bar(df_cnt,x='continent' , y='pop' , hover_name='country' ,color='continent' ,
animation_frame='year')
This also runs into the y-axis range problem, so set the y-axis range:
px.bar(df_cnt,x='continent' , y='pop' , hover_name='country' ,color='continent' ,
animation_frame='year',range_y=[0,4000000000],animation_group='country')
# animation_group='country' is similar to GROUP BY in MySQL: the rows are grouped by country
Dynamic density map
fig = px.density_contour(df_cnt, x="gdpPercap", y="lifeExp", color="continent", marginal_y="histogram",
animation_frame='year', animation_group='country', range_y=[25,100])
Dynamic heat map
fig = px.density_heatmap(df_cnt, x="gdpPercap", y="lifeExp", marginal_y="histogram",
animation_frame='year', animation_group='country', range_y=[25,100])
Dynamic geographic information map
gapminder = px.data.gapminder()
px.choropleth(gapminder,
locations="iso_alpha",
color="lifeExp",
hover_name="country",
animation_frame="year",
color_continuous_scale='Plasma',
height=600
)
Then use a map to explore the trend of the crime rate by US state:
df = pd.read_csv('CrimeStatebyState_1960-2014.csv')
df.head()
px.choropleth(df,
locations = 'State_code',
color="Murder_per100000", # color by the number of murders per 100,000 people
animation_frame="Year",
color_continuous_scale="oranges",
locationmode='USA-states', # use the built-in US state boundaries
scope="usa",
range_color=(0, 20),
title='Crime by State',
height=600
)