Python visualization powerhouse: Altair


Today I'd like to introduce Altair, a Python library whose syntax is somewhat similar to R's ggplot2.

It also handles Chinese text very well. Let's take a simple scatter plot as an example.

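Installation and setup:

pip install altair
pip install vega-datasets  # note: "-" not "_"; we will use the sample datasets it provides

import altair as alt
from vega_datasets import data

cars = data.cars()  # load the cars dataset as a pandas DataFrame
cars                # in a Jupyter notebook, this displays the table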

alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    shape='Origin'
).interactive()

Running this renders the chart. Click the "..." (three dots) menu next to it to save it as an image in several formats.


As you can see, the syntax is very simple:

  • cars is our data, a pandas DataFrame

  • mark_point() draws a scatter plot

  • x='Horsepower' and y='Miles_per_Gallon' set the fields for the x-axis and y-axis, respectively

  • color='Origin' colors the points by place of origin, much like an aesthetic mapping in ggplot

  • shape='Origin' also maps the point shape to the place of origin

  • interactive() makes the chart interactive: you can zoom with the mouse wheel and pan by dragging


1. Basic charts

(1). Bar chart

The syntax is simple:

import altair as alt
import pandas as pd

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

alt.Chart(source).mark_bar().encode(
    x='a',
    y='b',
    color="a"
)


1. Highlight one bar and give all the other bars the same color:

import altair as alt
import pandas as pd

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

alt.Chart(source).mark_bar().encode(
    x='a:O',
    y='b:Q',
    color=alt.condition(
        alt.datum.a == "A",  # the condition: field a equals "A"; to adapt it, change only the field name (a) and the value ("A")
        alt.value("red"),    # color used when the condition is true
        alt.value("yellow")  # color used for all other bars
    )
).properties(width=600, height=400)  # chart width and height in pixels


2. Horizontal bars with text labels

Flipping the chart simply means swapping the x and y encodings. The text layer reuses the bar chart's encodings and adds a label to each bar:

import altair as alt
import pandas as pd

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

bars = alt.Chart(source).mark_bar().encode(
    x='b:Q',
    y='a:O',
    color="a")
text = bars.mark_text(
    align='right',  # one of 'left', 'center', 'right'
    baseline='middle',
    dx=10  # shift each label 10 px to the right
).encode(
    text='a'  # the field whose values are drawn as labels
)
bars + text


3. Add a reference line at the mean

import altair as alt
import pandas as pd

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

bars = alt.Chart(source).mark_bar().encode(
    x='a',
    y='b',
    color="a")

rule = alt.Chart(source).mark_rule(color='red').encode(
    y='mean(b)',
)
(bars+rule).properties(width=600,height=400)

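The red rule is drawn at y = mean(b), an aggregate that Altair computes over all rows. As a quick check of where the line should land, here is the same average computed with pandas:

```python
import pandas as pd

b = pd.Series([28, 55, 43, 91, 81, 53, 19, 87, 52])

# Same aggregate the rule encodes with y='mean(b)': 509 / 9
mean_b = b.mean()
print(round(mean_b, 2))  # 56.56
```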

4. Combined chart: bar chart + line chart

First define a base chart that fixes the shared x-axis, then draw a bar layer and a line layer from it:

import altair as alt
import pandas as pd

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})
base = alt.Chart(source).encode(x='a:O')  # shared x-axis

bar = base.mark_bar().encode(y='b:Q')

line = base.mark_line(color='red').encode(
    y='b:Q'
)

(bar + line).properties(width=600)


(2). Heat map

import altair as alt
import numpy as np
import pandas as pd

# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x ** 2 + y ** 2

# Convert this grid to columnar data expected by Altair
source = pd.DataFrame({'x': x.ravel(),
                       'y': y.ravel(),
                       'z': z.ravel()})

alt.Chart(source).mark_rect().encode(
    x='x:O',
    y='y:O',
    color='z:Q'
)

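mark_rect() needs one row per cell, which is why the grid is flattened with ravel(). If your data starts out as a wide table (one row per y value, one column per x value), pandas melt() produces the same long layout. A sketch that rebuilds the heatmap data this way:

```python
import numpy as np
import pandas as pd

x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x ** 2 + y ** 2

# Wide layout: one row per y value, one column per x value
wide = pd.DataFrame(z, index=range(-5, 5), columns=range(-5, 5))
wide.index.name = "y"

# Long layout: one row per (x, y) cell, as mark_rect() expects
long = wide.reset_index().melt(id_vars="y", var_name="x", value_name="z")
print(long.shape)  # (100, 3)
```

The resulting long DataFrame can be passed to alt.Chart(...) with the same x='x:O', y='y:O', color='z:Q' encodings.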

(3). Histogram

A histogram counts how many values fall into each range (bin).

Here is an example using the cars data from the beginning:

import altair as alt
from vega_datasets import data

cars = data.cars()
alt.Chart(cars).mark_bar().encode(
    alt.X("Displacement", bin=True),
    y='count()',
    color="Origin"
)


(4). Line chart

Line charts can be used to plot function curves, for example $y = \sin(x/5)$:

import altair as alt
import numpy as np
import pandas as pd

x = np.arange(100)
source = pd.DataFrame({
    'x': x,
    'f(x)': np.sin(x / 5)
})

alt.Chart(source).mark_line().encode(
    x='x',
    y='f(x)'
)


(5). Scatter plot with tooltips

A tooltip shows information about a data point, such as its values, when you hover the mouse over it.

For example, the code below sets tooltip, so hovering over a point shows its name, origin, horsepower, and miles per gallon:

import altair as alt
from vega_datasets import data

source = data.cars()

alt.Chart(source).mark_circle(size=60).encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    tooltip=['Name', 'Origin', 'Horsepower', 'Miles_per_Gallon']
).interactive()


(6). Stacked area chart

In the following code, x is the year and y is the net electricity generation, broken down (by color) into the different energy sources:

import altair as alt
from vega_datasets import data

source = data.iowa_electricity()

alt.Chart(source).mark_area().encode(
    x="year:T",
    y="net_generation:Q",
    color="source:N"
)


(7). Donut chart

import pandas as pd
import altair as alt

source = pd.DataFrame({"category": [1, 2, 3, 4, 5, 6], "value": [4, 6, 10, 3, 7, 8]})

alt.Chart(source).mark_arc(innerRadius=50).encode(
    theta=alt.Theta(field="value", type="quantitative"),
    color=alt.Color(field="category", type="nominal"),
)


2. Advanced charts

(1). Line charts

1. Line chart with a 95% confidence interval band

## With a confidence interval band
import altair as alt
from vega_datasets import data

source = data.cars()

line = alt.Chart(source).mark_line().encode(
    x='Year',
    y='mean(Miles_per_Gallon)'
)

band = alt.Chart(source).mark_errorband(extent='ci').encode(
    x='Year',
    y=alt.Y('Miles_per_Gallon', title='Miles/Gallon'),
)

band + line

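With extent='ci', the band shows a 95% confidence interval of the mean, which Vega-Lite estimates by bootstrapping. The NumPy sketch below illustrates the idea (a percentile bootstrap) on made-up numbers; it is not Vega-Lite's implementation, so its bounds will not match the chart exactly. Other extents accepted by mark_errorband are 'stderr', 'stdev', and 'iqr'.

```python
import numpy as np

rng = np.random.default_rng(0)
mpg = np.array([18.0, 15.0, 16.0, 17.0, 14.0, 24.0, 22.0, 21.0, 27.0, 26.0])  # made-up sample

# Resample with replacement many times and record each resample's mean
boot_means = np.array([
    rng.choice(mpg, size=mpg.size, replace=True).mean() for _ in range(2000)
])

# The middle 95% of the bootstrap means gives the interval
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={mpg.mean():.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```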

2. Line chart with point markers

# line chart with point markers
import altair as alt
import numpy as np
import pandas as pd

x = np.arange(100)
source = pd.DataFrame({
    'x': x,
    'f(x)': np.sin(x / 5)
})

alt.Chart(source).mark_line(
    point=alt.OverlayMarkDef(color="red")
).encode(
    x='x',
    y='f(x)'
)


3. Line whose width varies along its length (trail mark)

# the line width follows the wheat value
import altair as alt
from vega_datasets import data

source = data.wheat()

alt.Chart(source).mark_trail().encode(
    x='year:T',
    y='wheat:Q',
    size='wheat:Q'
)


(2). Normalized stacked area chart

Compared with the ordinary stacked area chart, stack="normalize" rescales each year's total to 100%, so the areas fill the whole height of the chart:

import altair as alt
from vega_datasets import data

source = data.iowa_electricity()

alt.Chart(source).mark_area().encode(
    x="year:T",
    y=alt.Y("net_generation:Q", stack="normalize"),
    color="source:N"
)
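stack="normalize" divides every value by the total at its x position, so each year's areas add up to 100%. The pandas sketch below reproduces that arithmetic on made-up numbers laid out like the iowa_electricity data (columns year, source, net_generation).

```python
import pandas as pd

# Made-up values in the same long format as data.iowa_electricity()
df = pd.DataFrame({
    "year": [2001, 2001, 2001, 2002, 2002, 2002],
    "source": ["Fossil Fuels", "Nuclear Energy", "Renewables"] * 2,
    "net_generation": [35000, 3800, 1400, 35200, 4100, 1800],
})

# Share of each source within its year, which is what the normalized stack plots
totals = df.groupby("year")["net_generation"].transform("sum")
df["share"] = df["net_generation"] / totals
print(df)
```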

(3). Pie charts

A pie chart with a gap (a "Pac-Man" chart):

import numpy as np
import altair as alt

alt.Chart().mark_arc(color="gold").encode(
    theta=alt.datum((5 / 8) * np.pi, scale=None),
    theta2=alt.datum((19 / 8) * np.pi),
    radius=alt.datum(100, scale=None),
)

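Because theta uses scale=None, the datum values are raw angles in radians, measured clockwise from the 12 o'clock position. A quick calculation shows the arc covers 315° and leaves a 45° gap, which forms the "mouth":

```python
import math

start = (5 / 8) * math.pi  # theta
end = (19 / 8) * math.pi   # theta2

span_deg = math.degrees(end - start)  # angle covered by the arc
gap_deg = 360 - span_deg              # remaining gap
print(round(span_deg, 1), round(gap_deg, 1))
```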

1. Pie chart

import pandas as pd
import altair as alt

source = pd.DataFrame({"category": [1, 2, 3, 4, 5, 6], "value": [4, 6, 10, 3, 7, 8]})

alt.Chart(source).mark_arc().encode(
    theta=alt.Theta(field="value", type="quantitative"),
    color=alt.Color(field="category", type="nominal"),
)

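A pie chart stacks the theta values, so each slice's angle is its share of the total times 360°. For the data above the total is 38, so the largest slice (value 10) spans about 94.7°:

```python
values = {1: 4, 2: 6, 3: 10, 4: 3, 5: 7, 6: 8}  # category -> value, from the example
total = sum(values.values())

# Angle of each slice in degrees
angles = {cat: v / total * 360 for cat, v in values.items()}
for cat, deg in angles.items():
    print(cat, round(deg, 1))
```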

2. Radial pie chart

import pandas as pd
import altair as alt

source = pd.DataFrame({"values": [12, 23, 47, 6, 52, 19]})

base = alt.Chart(source).encode(
    theta=alt.Theta("values:Q", stack=True),
    radius=alt.Radius("values", scale=alt.Scale(type="sqrt", zero=True, rangeMin=20)),
    color="values:N",
)

c1 = base.mark_arc(innerRadius=20, stroke="#fff")

c2 = base.mark_text(radiusOffset=10).encode(text="values:Q")

c1 + c2


(4). Advanced scatter plots

1. Scatter plot with error bars

import altair as alt
import pandas as pd
import numpy as np

# generate some data points with uncertainties
np.random.seed(0)
x = [1, 2, 3, 4, 5]
y = np.random.normal(10, 0.5, size=len(x))
yerr = 0.2

# set up data frame
source = pd.DataFrame({"x": x, "y": y, "yerr": yerr})

# the base chart
base = alt.Chart(source).transform_calculate(
    ymin="datum.y-datum.yerr",
    ymax="datum.y+datum.yerr"
)

# generate the points
points = base.mark_point(
    filled=True,
    size=50,
    color='black'
).encode(
    x=alt.X('x', scale=alt.Scale(domain=(0, 6))),
    y=alt.Y('y', scale=alt.Scale(zero=False))
)

# generate the error bars
errorbars = base.mark_errorbar().encode(
    x="x",
    y="ymin:Q",
    y2="ymax:Q"
)

points + errorbars

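transform_calculate evaluates the ymin and ymax expressions inside the chart. For reference, this is the same computation done in pandas, which you could also run before handing the data to Altair:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
x = [1, 2, 3, 4, 5]
y = np.random.normal(10, 0.5, size=len(x))
yerr = 0.2

source = pd.DataFrame({"x": x, "y": y, "yerr": yerr})

# Same as ymin="datum.y-datum.yerr" and ymax="datum.y+datum.yerr"
source = source.assign(
    ymin=source["y"] - source["yerr"],
    ymax=source["y"] + source["yerr"],
)
print(source)
```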

2. Scatter plot with text labels

# scatter plot with text labels
import altair as alt
import pandas as pd

source = pd.DataFrame({
    'x': [1, 3, 5, 7, 9],
    'y': [1, 3, 5, 7, 9],
    'label': ['我', '是', '你', '爸', '爸']
})

points = alt.Chart(source).mark_point().encode(
    x='x:Q',
    y='y:Q'
)

text = points.mark_text(
    align='left',
    baseline='middle',
    dx=7
).encode(
    text='label'
)

points + text


(5). World map

import altair as alt
from vega_datasets import data

# Data generators for the background
sphere = alt.sphere()
graticule = alt.graticule()

# Source of land data
source = alt.topo_feature(data.world_110m.url, 'countries')

# Layering and configuring the components
alt.layer(
    alt.Chart(sphere).mark_geoshape(fill='lightblue'),
    alt.Chart(graticule).mark_geoshape(stroke='white', strokeWidth=0.5),
    alt.Chart(source).mark_geoshape(fill='ForestGreen', stroke='black')
).project(
    'naturalEarth1'
).properties(width=600, height=400).configure_view(stroke=None)


3. Saving charts

Charts can be saved as SVG, PNG, HTML, PDF, or JSON:

import altair as alt
from vega_datasets import data

chart = alt.Chart(data.cars.url).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
)

chart.save('chart.json')
chart.save('chart.html')
chart.save('chart.png')
chart.save('chart.svg')
chart.save('chart.pdf')	

To scale up a saved image (for example, to double its resolution), pass scale_factor:

chart.save('chart.png', scale_factor=2.0)

4. Configuring chart properties

For example, add a title to the chart:

# world map
import altair as alt
from vega_datasets import data

# Data generators for the background
sphere = alt.sphere()
graticule = alt.graticule()

# Source of land data
source = alt.topo_feature(data.world_110m.url, 'countries')

# Layering and configuring the components
alt.layer(
    alt.Chart(sphere).mark_geoshape(fill='lightblue'),
    alt.Chart(graticule).mark_geoshape(stroke='white', strokeWidth=0.5),
    alt.Chart(source).mark_geoshape(fill='ForestGreen', stroke='black')
).project(
    'naturalEarth1'
).properties(width=600, height=400, title="世界地图").configure_view(stroke=None)  # "世界地图" means "World Map"


Beyond the title, Altair's top-level configuration properties are listed below. They can be passed as keyword arguments to chart.configure(...) or set through the matching configure_*() methods, such as configure_axis() or configure_legend().

Property | Type | Description
arc | RectConfig | Arc-specific config
area | AreaConfig | Area-specific config
aria | boolean | A flag indicating whether ARIA default attributes should be included for marks and guides (SVG output only). If false, the "aria-hidden" attribute is set for all guides, removing them from the ARIA accessibility tree, and Vega-Lite will not generate default descriptions for marks. Default value: true.
autosize | anyOf(AutosizeType, AutoSizeParams) | How the visualization size should be determined. If a string, it should be one of "pad", "fit", or "none". Object values can additionally specify parameters for content sizing and automatic resizing. Default value: pad
axis | AxisConfig | Axis configuration, which determines default properties for all x and y axes. For the full list of axis configuration options, see the axis documentation.
axisBand | AxisConfig | Config for axes with "band" scales.
axisBottom | AxisConfig | Config for the x-axis along the bottom edge of the chart.
axisDiscrete | AxisConfig | Config for axes with "point" or "band" scales.
axisLeft | AxisConfig | Config for the y-axis along the left edge of the chart.
axisPoint | AxisConfig | Config for axes with "point" scales.
axisQuantitative | AxisConfig | Config for quantitative axes.
axisRight | AxisConfig | Config for the y-axis along the right edge of the chart.
axisTemporal | AxisConfig | Config for temporal axes.
axisTop | AxisConfig | Config for the x-axis along the top edge of the chart.
axisX | AxisConfig | X-axis-specific config.
axisXBand | AxisConfig | Config for x-axes with "band" scales.
axisXDiscrete | AxisConfig | Config for x-axes with "point" or "band" scales.
axisXPoint | AxisConfig | Config for x-axes with "point" scales.
axisXQuantitative | AxisConfig | Config for quantitative x-axes.
axisXTemporal | AxisConfig | Config for temporal x-axes.
axisY | AxisConfig | Y-axis-specific config.
axisYBand | AxisConfig | Config for y-axes with "band" scales.
axisYDiscrete | AxisConfig | Config for y-axes with "point" or "band" scales.
axisYPoint | AxisConfig | Config for y-axes with "point" scales.
axisYQuantitative | AxisConfig | Config for quantitative y-axes.
axisYTemporal | AxisConfig | Config for temporal y-axes.
background | anyOf(Color, ExprRef) | CSS color to use as the background of the entire view. Default value: "white"
bar | BarConfig | Bar-specific config
boxplot | BoxPlotConfig | Box plot config
circle | MarkConfig | Circle-specific config
concat | CompositionConfig | Default configuration for all concatenation and repeat view composition operators (concat, hconcat, vconcat, and repeat)
countTitle | string | Default axis and legend title for count fields. Default value: 'Count of Records'
customFormatTypes | boolean | Allow the formatType property for text marks and guides to accept a custom formatter function registered as a Vega expression.
errorband | ErrorBandConfig | Error band config
errorbar | ErrorBarConfig | Error bar config
facet | CompositionConfig | Default configuration for the facet view composition operator
fieldTitle | ['verbal', 'functional', 'plain'] | Defines how Vega-Lite generates titles for fields. There are three styles: "verbal" (default) displays the function in a verbal style (e.g., "Sum of field", "Year-month of date", "field (binned)"); "functional" displays the function using parentheses and capitalized text (e.g., "SUM(field)", "YEARMONTH(date)", "BIN(field)"); "plain" displays only the field name without functions (e.g., "field", "date", "field").
font | string | Default font for all text marks, titles, and labels.
geoshape | MarkConfig | Geoshape-specific config
header | HeaderConfig | Header configuration, which determines default properties for all headers. For the full list of header configuration options, see the header documentation.
headerColumn | HeaderConfig | Header configuration, which determines default properties for column headers. For the full list of header configuration options, see the header documentation.
headerFacet | HeaderConfig | Header configuration, which determines default properties for non-row/column facet headers. For the full list of header configuration options, see the header documentation.
headerRow | HeaderConfig | Header configuration, which determines default properties for row headers. For the full list of header configuration options, see the header documentation.
image | RectConfig | Image-specific config
legend | LegendConfig | Legend configuration, which determines default properties for all legends. For the full list of legend configuration options, see the legend documentation.
line | LineConfig | Line-specific config
lineBreak | anyOf(string, ExprRef) | A delimiter, such as a newline character, upon which to break text strings into multiple lines. This property provides a global default for text marks, which is overridden by mark or style config settings and by the lineBreak mark encoding channel. If signal-valued, either string or regular expression (regexp) values are valid.
mark | MarkConfig | Mark config
numberFormat | string | D3 number format for guide labels and text marks, for example "s" for SI units. Uses D3's number format pattern.
padding | anyOf(Padding, ExprRef) | The default visualization padding, in pixels, from the edge of the visualization canvas to the data rectangle. If a number, specifies padding for all sides. If an object, the value should have the format {"left": 5, "top": 5, "right": 5, "bottom": 5} to specify padding for each side of the visualization. Default value: 5
params | array(Parameter) | Dynamic variables that parameterize a visualization.
point | MarkConfig | Point-specific config
projection | ProjectionConfig | Projection configuration, which determines default properties for all projections. For the full list of projection configuration options, see the projection documentation.
range | RangeConfig | An object hash that defines default range arrays or schemes for use with scales. For the full list of scale range configuration options, see the scale documentation.
rect | RectConfig | Rect-specific config
rule | MarkConfig | Rule-specific config
scale | ScaleConfig | Scale configuration, which determines default properties for all scales. For the full list of scale configuration options, see the scale documentation.
selection | SelectionConfig | An object hash for defining default properties for each type of selection.
square | MarkConfig | Square-specific config
style | StyleConfigIndex | An object hash that defines key-value mappings to determine default properties for marks with a given style. The keys are style names; the values must be valid mark configuration objects.
text | MarkConfig | Text-specific config
tick | TickConfig | Tick-specific config
timeFormat | string | Default time format for raw time values (without time units) in text marks, legend labels, and header labels. Default value: "%b %d, %Y". Note: axes determine the format for each label automatically, so this config does not affect axes.
title | TitleConfig | Title configuration, which determines default properties for all titles. For the full list of title configuration options, see the title documentation.
trail | LineConfig | Trail-specific config
view | ViewConfig | Default properties for single view plots.

Advantages and disadvantages

Advantages: simple syntax, good support for Chinese text, and a style very similar to ggplot2 in R.

Disadvantage: the generated chart cannot be copied directly as an image; it has to be saved to a file first, which is less convenient than matplotlib.


Reference: for more examples, see the Altair example gallery: https://altair-viz.github.io/gallery/index.html


Source: blog.csdn.net/qq_54423921/article/details/128319485