[Data Analysis and Visualization] Draw beautiful tables based on matplotlib and plottable library

plottable is a Python library for drawing beautifully customized graphical tables in matplotlib. The official warehouse address of plottable is: plottable . This article mainly refers to its official documentation. The official documentation address of plottable is: plottable-doc . The plottable installation command is as follows:

pip install plottable

See all the code in this article: Python-Study-Notes

# jupyter notebook环境去除warning
import warnings
warnings.filterwarnings("ignore")
import plottable
# 打印plottable版本
print('plottable version:', plottable.__version__)
# 打印matplotlib版本
import matplotlib as plt
print('matplotlib version:', plt.__version__)
plottable version: 0.1.5
matplotlib version: 3.5.3

1 Instructions for use

1.1 Basic use

The following code shows a simple graphic table drawing example, plottable provides the Table class to create and display graphic tables.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from plottable import Table

# 生成一个包含随机数据的表格
d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# 基于pandas表格数据创建和展示图形表格
tab = Table(d)

# 保存图片
plt.savefig("table.jpg", dpi=300,bbox_inches='tight')
plt.show()

png

For the Table class of plottable, its construction parameters are introduced as follows:

  • df: pd.DataFrame, the DataFrame object to display as a table
  • ax: mpl.axes.Axes, the axis object for drawing the table, the default is None
  • index_col: str, index column name in DataFrame. Default is None
  • columns: List[str], which columns are used for plotting. Use all columns for None
  • column_definitions: List[ColumnDefinition], the style definition class that needs to set the style column, the default is None
  • textprops: Dict[str, Any], a dictionary of text properties, the default is an empty dictionary
  • cell_kw: Dict[str, Any], a dictionary of cell properties, the default is an empty dictionary
  • col_label_cell_kw: Dict[str, Any], a dictionary of column label cell attributes, the default is an empty dictionary
  • col_label_divider: bool, whether to draw a divider below the column labels, the default is True.
  • footer_divider: bool, whether to draw a divider below the table, the default is False.
  • row_dividers: bool, whether to display row dividers, the default is True
  • row_divider_kw: Dict[str, Any], a dictionary of row divider attributes, the default is an empty dictionary
  • col_label_divider_kw: Dict[str, Any], a dictionary of column label divider attributes, the default is an empty dictionary
  • footer_divider_kw: Dict[str, Any], dictionary of footer divider attributes, default is empty dictionary
  • column_border_kw: Dict[str, Any], a dictionary of column border attributes, the default is an empty dictionary
  • even_row_color: str | Tuple, fill color of cells in even rows, default is None
  • odd_row_color: str | Tuple, fill color of odd row cells, default is None

Among these parameters, the parameters that control the table drawing effect are as follows:

  • column_definitions: column style customization
  • textprops: Text-like style customization
  • cell_kw: style customization of table cells
  • Styles for other setting parameters

Among these parameters, the most important parameter is column_definitions, because column_definitions can control almost all drawing effects. Next, this article mainly introduces the use of column_definitions in detail.

1.2 Column style customization

Plottable provides the ColumnDefinition class (alias ColDef) to customize the style of a single column of the graphical table. The construction parameters of the ColumnDefinition class are as follows:

  • name: str, the column name to set the drawing effect
  • title: str = None, plot title to override column names
  • width: float = 1, the width of the column, by default the width of each column is the width of the axis / the total number of columns
  • textprops: Dict[str, Any] = field(default_factory=dict), the text properties provided to each text cell
  • formatter: Callable = None, the callable function used to format the text appearance
  • cmap: Callable = None, a callable function that returns the color based on the value of the cell
  • text_cmap: Callable = None, a callable function that returns the color according to the value of the cell
  • group: str = None, set the grouped column label that will be displayed above the column label for each group
  • plot_fn: Callable = None, a callable function that takes the value of a cell as input and creates a subplot on each cell and plots on it to pass additional arguments
    to
  • plot_kw: Dict[str, Any] = field(default_factory=dict), additional keyword arguments provided to plot_fn
  • border: str | List = None, draw vertical border lines, can be "left" / "l", "right" / "r" or "both"

By setting the column_definitions parameter of the Table class through the ColumnDefinition class, the effect of different table column styles can be achieved. If it is the drawing effect of multiple columns at the same time, you need to use the [ColumnDefinition,ColumnDefinition] list form. Some usage examples are as follows

Set column headings and column widths

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from plottable import ColumnDefinition, ColDef, Table

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# name表示设置哪个列的样式
tab = Table(d, column_definitions=[ColumnDefinition(name="A", title="Title A"),
                                   ColumnDefinition(name="D", width=2)])

plt.show()

png

Set text properties and text formatting for columns

from plottable.formatters import decimal_to_percent

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# index列的文字居中,加粗
# 列A的文本数值改为百分制
tab = Table(d, column_definitions=[ColumnDefinition(name="index", textprops={
    
    "ha": "center", "weight": "bold"}),
                                   ColumnDefinition(name="A", formatter=decimal_to_percent)])

plt.show()

png

Set column cell background color and font color

from plottable.cmap import normed_cmap
import matplotlib.cm

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# cmap设置单元格背景色
tab = Table(d, column_definitions=[ColumnDefinition(name="A", cmap=matplotlib.cm.tab20, text_cmap=matplotlib.cm.Reds),
                                   ColumnDefinition(name="B", cmap=matplotlib.cm.tab20b),
                                   ColumnDefinition(name="C", text_cmap=matplotlib.cm.tab20c)])

plt.show()

png

Set the group name of the column

from plottable.cmap import normed_cmap
import matplotlib.cm

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# 将列B和列C视为同一组,该组命名为group_name
tab = Table(d, column_definitions=[ColumnDefinition(name="B", group="group_name"), 
                                   ColumnDefinition(name="C", group="group_name")])

plt.show()

png

Set column borders

from plottable.cmap import normed_cmap
import matplotlib.cm

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# 将列B和列C视为同一组,该组命名为group_name
tab = Table(d, column_definitions=[ColumnDefinition(name="A", border="l"), 
                                   ColumnDefinition(name="C",  border="both")])

plt.show()

png

Use of calling functions

The plot_fn and plot_kw parameters of the ColumnDefinition class provide a custom function to realize the drawing of the table effect. Among them, plot_fn represents the function to be called, and plot_kw represents the input parameter of the function to be called. In addition, some effect functions are preset in plotable.plots, and we can refer to these effect functions to define our own drawing functions. The preset effect functions are as follows:

from pathlib import Path
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap
from plottable import ColumnDefinition, Table
# 调用预置绘图函数
from plottable.plots import image,monochrome_image,circled_image,bar,percentile_bars,percentile_stars,progress_donut

cmap = matplotlib.cm.tab20
# 柱状图绘制
fig, ax = plt.subplots(figsize=(1, 1))
# 0.7表示数值,lw边框线宽
b = bar(ax, 0.7, plot_bg_bar=True, cmap=cmap, annotate=True, lw=2, height=0.35)
plt.show()

png

# 星星百分比图
fig, ax = plt.subplots(figsize=(2, 1))
stars = percentile_stars(ax, 85, background_color="#f0f0f0")

png

# 圆环图
fig, ax = plt.subplots(figsize=(1, 1))
donut = progress_donut(ax, 73, textprops={
    
    "fontsize": 14})
plt.show()

png

For the functions to be called, you can view the parameter meanings of these functions through the help function.

help(progress_donut)
Help on function progress_donut in module plottable.plots:

progress_donut(ax: matplotlib.axes._axes.Axes, val: float, radius: float = 0.45, color: str = None, background_color: str = None, width: float = 0.05, is_pct: bool = False, textprops: Dict[str, Any] = {}, formatter: Callable = None, **kwargs) -> List[matplotlib.patches.Wedge]
    Plots a Progress Donut on the axes.
    
    Args:
        ax (matplotlib.axes.Axes): Axes
        val (float): value
        radius (float, optional):
            radius of the progress donut. Defaults to 0.45.
        color (str, optional):
            color of the progress donut. Defaults to None.
        background_color (str, optional):
            background_color of the progress donut where the value is not reached. Defaults to None.
        width (float, optional):
            width of the donut wedge. Defaults to 0.05.
        is_pct (bool, optional):
            whether the value is given not as a decimal, but as a value between 0 and 100.
            Defaults to False.
        textprops (Dict[str, Any], optional):
            textprops passed to ax.text. Defaults to {}.
        formatter (Callable, optional):
            a string formatter.
            Can either be a string format, ie "{:2f}" for 2 decimal places.
            Or a Callable that is applied to the value. Defaults to None.
    
    Returns:
        List[matplotlib.patches.Wedge]

By setting the custom drawing function and function input parameters through the plot_fn and plot_kw parameters, different drawing effects can be displayed, as follows:

from plottable.cmap import normed_cmap
import matplotlib.cm

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# plot_fn和plot_kw
tab = Table(d, textprops={
    
    "ha": "center"},
            column_definitions=[ColumnDefinition(name="B", plot_fn=bar,plot_kw={
    
    'plot_bg_bar':True,'cmap':cmap, 
                                'annotate':True, 'height':0.5}),
                                ColumnDefinition(name="D", plot_fn=progress_donut,plot_kw={
    
    'is_pct':True,})])

plt.show()

png

custom text format

Plottable provides the following three custom functions to represent different text formats:

  • decimal_to_percent: Convert numeric data to percentage
  • tickcross: format values ​​as ✔ or ✖
  • signed_integer: add positive and negative signs

We can set the text format through the formatter of ColumnDefinition, as follows:

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from plottable import ColumnDefinition, Table
from plottable.formatters import decimal_to_percent,tickcross,signed_integer

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
tab = Table(d, column_definitions=[ColumnDefinition(name="A", formatter=decimal_to_percent),
                                   ColumnDefinition(name="C", formatter=tickcross),
                                   ColumnDefinition(name="D", formatter=signed_integer)])

plt.show()

png

In addition, you can also customize the function to set the text format, as follows:

def setformat(x):
    # 使用format格式函数
    return "{:.2e}".format(x)

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
tab = Table(d, textprops={
    
    "ha": "center"},column_definitions=[ColumnDefinition(name="B", formatter=setformat),
                                   ColumnDefinition(name="D", formatter=lambda x: round(x, 2))])

plt.show()

png

1.3 Row and column customization

Access row and column cells

Plottable provides a method to directly access a certain row and a certain column of a Table instance, as follows:

from plottable.cmap import normed_cmap
import matplotlib.cm

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# 实例化Table对象
tab = Table(d)

png

# 根据列名,提取整列
tab.columns['A']
Column(cells=[TextCell(xy=(1, 0), content=0.0, row_idx=0, col_idx=1), TextCell(xy=(1, 1), content=0.09, row_idx=1, col_idx=1), TextCell(xy=(1, 2), content=0.95, row_idx=2, col_idx=1), TextCell(xy=(1, 3), content=0.08, row_idx=3, col_idx=1), TextCell(xy=(1, 4), content=0.92, row_idx=4, col_idx=1)], index=1)
# 读取某列第1行的内容
tab.columns['B'].cells[1].content
0.04
# 根据行索引,提取整行
tab.rows[1]
Row(cells=[TextCell(xy=(0, 1), content=1, row_idx=1, col_idx=0), TextCell(xy=(1, 1), content=0.09, row_idx=1, col_idx=1), TextCell(xy=(2, 1), content=0.04, row_idx=1, col_idx=2), TextCell(xy=(3, 1), content=0.42, row_idx=1, col_idx=3), TextCell(xy=(4, 1), content=0.64, row_idx=1, col_idx=4), TextCell(xy=(5, 1), content=0.26, row_idx=1, col_idx=5)], index=1)
# 提取表头列名
tab.col_label_row
Row(cells=[TextCell(xy=(0, -1), content=index, row_idx=-1, col_idx=0), TextCell(xy=(1, -1), content=A, row_idx=-1, col_idx=1), TextCell(xy=(2, -1), content=B, row_idx=-1, col_idx=2), TextCell(xy=(3, -1), content=C, row_idx=-1, col_idx=3), TextCell(xy=(4, -1), content=D, row_idx=-1, col_idx=4), TextCell(xy=(5, -1), content=E, row_idx=-1, col_idx=5)], index=-1)

Set cell style

In the above example, we can see that plottable directly accesses the row and column objects of the table, so we can directly change the drawing effect or text effect by setting the drawing properties of these objects. The supported properties are as follows:

  • cell properties
    • set_alpha: Set the transparency of the cell.
    • set_color: Set the color of the cell.
    • set_edgecolor: Set the color of the cell edge.
    • set_facecolor: Set the color inside the cell.
    • set_fill: Set whether the cell is filled.
    • set_hatch: Set the fill pattern of the cell.
    • set_linestyle: Set the style of the cell edge line.
    • set_linewidth: Set the width of the cell edge line.
  • font properties
    • set_fontcolor: Set the color of the font.
    • set_fontfamily: Set the font family.
    • set_fontsize: Set the font size.
    • set_ha: Set the horizontal alignment of the text.
    • set_ma: Set the vertical alignment of the text.

The sample code is as follows:

from plottable.cmap import normed_cmap
import matplotlib.cm

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)
fig, ax = plt.subplots(figsize=(6, 5))
# 实例化Table对象
tab = Table(d)
# 设置行号为1的行的背景颜色
tab.rows[1].set_facecolor("grey")
# 设置B列的字体颜色
tab.columns['B'].set_fontcolor("red")
Column(cells=[TextCell(xy=(2, 0), content=0.38, row_idx=0, col_idx=2), TextCell(xy=(2, 1), content=0.69, row_idx=1, col_idx=2), TextCell(xy=(2, 2), content=0.15, row_idx=2, col_idx=2), TextCell(xy=(2, 3), content=0.74, row_idx=3, col_idx=2), TextCell(xy=(2, 4), content=0.41, row_idx=4, col_idx=2)], index=2)

png

2 Drawing example

2.1 Multi-line style setting

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from plottable import Table

d = pd.DataFrame(np.random.random((5, 5)), columns=["A", "B", "C", "D", "E"]).round(2)

fig, ax = plt.subplots(figsize=(6, 3))

# row_dividers显示行的分割线
# odd_row_color奇数行颜色
# even_row_color偶数行颜色
tab = Table(d, row_dividers=False, odd_row_color="#f0f0f0", even_row_color="#e0f6ff")

plt.show()

fig.savefig("table.jpg",dpi=300,bbox_inches='tight')

png

2.2 Custom cell effect

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap

from plottable import ColumnDefinition, Table
from plottable.formatters import decimal_to_percent
from plottable.plots import bar, percentile_bars, percentile_stars, progress_donut

# 自定义颜色
cmap = LinearSegmentedColormap.from_list(
    name="BuYl", colors=["#01a6ff", "#eafedb", "#fffdbb", "#ffc834"], N=256
)

fig, ax = plt.subplots(figsize=(6, 6))

d = pd.DataFrame(np.random.random((5, 4)), columns=["A", "B", "C", "D"]).round(2)

tab = Table(
    d,
    cell_kw={
    
    
        "linewidth": 0,
        "edgecolor": "k",
    },
    textprops={
    
    "ha": "center"},
    column_definitions=[
        ColumnDefinition("index", textprops={
    
    "ha": "left"}),
        ColumnDefinition("A", plot_fn=percentile_bars, plot_kw={
    
    "is_pct": True}),
        ColumnDefinition(
            "B", width=1.5, plot_fn=percentile_stars, plot_kw={
    
    "is_pct": True}
        ),
        ColumnDefinition(
            "C",
            plot_fn=progress_donut,
            plot_kw={
    
    
                "is_pct": True,
                "formatter": "{:.0%}"
                },
            ),
        ColumnDefinition(
            "D",
            width=1.25,
            plot_fn=bar,
            plot_kw={
    
    
                "cmap": cmap,
                "plot_bg_bar": True,
                "annotate": True,
                "height": 0.5,
                "lw": 0.5,
                "formatter": decimal_to_percent,
            },
        ),
    ],
)

plt.show()

png

2.3 Heatmap

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap
# ColDef是ColumnDefinition的别名
from plottable import ColDef, Table

# 自定义颜色
cmap = LinearSegmentedColormap.from_list(
    name="bugw", colors=["#ffffff", "#f2fbd2", "#c9ecb4", "#93d3ab", "#35b0ab"], N=256
)
# 创建数据
cities = [
    "TORONTO",
    "VANCOUVER",
    "HALIFAX",
    "CALGARY",
    "OTTAWA",
    "MONTREAL",
    "WINNIPEG",
    "EDMONTON",
    "LONDON",
    "ST. JONES",
]
months = [
    "JAN",
    "FEB",
    "MAR",
    "APR",
    "MAY",
    "JUN",
    "JUL",
    "AUG",
    "SEP",
    "OCT",
    "NOV",
    "DEC",
]

data = np.random.random((10, 12)) + np.abs(np.arange(12) - 5.5)
data = (1 - data / (np.max(data)))
data.shape
(10, 12)
# 绘图
d = pd.DataFrame(data, columns=months, index=cities).round(2)
fig, ax = plt.subplots(figsize=(14, 5))

# 自定义各列的绘图效果
column_definitions = [
    ColDef(name, cmap=cmap, formatter=lambda x: "") for name in months
] + [ColDef("index", title="", width=1.5, textprops={
    
    "ha": "right"})]

tab = Table(
    d,
    column_definitions=column_definitions,
    row_dividers=False,
    col_label_divider=False,
    textprops={
    
    "ha": "center", "fontname": "Roboto"},
    # 设置各个单元格的效果
    cell_kw={
    
    
        "edgecolor": "black",
        "linewidth": 0,
    },
)


# 设置列标题文字和背景颜色
tab.col_label_row.set_facecolor("white")
tab.col_label_row.set_fontcolor("black")
# 设置行标题文字和背景颜色
tab.columns["index"].set_facecolor("black")
tab.columns["index"].set_fontcolor("white")
tab.columns["index"].set_linewidth(0)

plt.show()

png

2.4 Women's World Cup Prediction Data Display

step1 prepare data

Download example data, all example data in plottable-example_notebooks .

# 下载数据集
# !wget https://raw.githubusercontent.com/znstrider/plottable/master/docs/example_notebooks/data/wwc_forecasts.csv
from pathlib import Path

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap

from plottable import ColumnDefinition, Table
from plottable.cmap import normed_cmap
from plottable.formatters import decimal_to_percent
from plottable.plots import circled_image # image
cols = [
    "team",
    "points",
    "group",
    "spi",
    "global_o",
    "global_d",
    "group_1",
    "group_2",
    "group_3",
    "make_round_of_16",
    "make_quarters",
    "make_semis",
    "make_final",
    "win_league",
]

# 读取数据
df = pd.read_csv(
    "data/wwc_forecasts.csv",
    usecols=cols,
)

# 展示数据
df.head()
team group spi global_o global_d group_1 group_2 group_3 make_round_of_16 make_quarters make_semis make_final win_league points
0 USA F 98.32748 5.52561 0.58179 0.82956 0.17044 0.00000 1.0 0.78079 0.47307 0.35076 0.23618 6
1 France A 96.29671 4.31375 0.52137 0.99483 0.00515 0.00002 1.0 0.78367 0.42052 0.30038 0.19428 6
2 Germany B 93.76549 3.96791 0.67818 0.98483 0.01517 0.00000 1.0 0.89280 0.48039 0.27710 0.12256 6
3 Canada E 93.51599 3.67537 0.56980 0.38830 0.61170 0.00000 1.0 0.59192 0.36140 0.20157 0.09031 6
4 England D 91.92311 3.51560 0.63717 0.70570 0.29430 0.00000 1.0 0.68510 0.43053 0.16465 0.08003 6

In addition, we need to prepare the national flag picture corresponding to each country, and this data is also under plottable-example_notebooks .

# 读取图片路径
flag_paths = list(Path("data/country_flags").glob("*.png"))
country_to_flagpath = {
    
    p.stem: p for p in flag_paths}

step2 data processing

This step needs to merge the data and convert it into a data structure usable by plottable.

# 重置列名
colnames = [
    "Team",
    "Points",
    "Group",
    "SPI",
    "OFF",
    "DEF",
    "1st Place",
    "2nd Place",
    "3rd Place",
    "Make Rd Of 16",
    "Make Quarters",
    "Make Semis",
    "Make Finals",
    "Win World Cup",
]

col_to_name = dict(zip(cols, colnames))
col_to_name
{'team': 'Team',
 'points': 'Points',
 'group': 'Group',
 'spi': 'SPI',
 'global_o': 'OFF',
 'global_d': 'DEF',
 'group_1': '1st Place',
 'group_2': '2nd Place',
 'group_3': '3rd Place',
 'make_round_of_16': 'Make Rd Of 16',
 'make_quarters': 'Make Quarters',
 'make_semis': 'Make Semis',
 'make_final': 'Make Finals',
 'win_league': 'Win World Cup'}
df[["spi", "global_o", "global_d"]] = df[["spi", "global_o", "global_d"]].round(1)

df = df.rename(col_to_name, axis=1)
# 删除Points列
df = df.drop("Points", axis=1)
# 插入列
df.insert(0, "Flag", df["Team"].apply(lambda x: country_to_flagpath.get(x)))
df = df.set_index("Team")
df.head()
Flag Group SPI OFF DEF 1st Place 2nd Place 3rd Place Make Rd Of 16 Make Quarters Make Semis Make Finals Win World Cup
Team
USA data/country_flags/USA.png F 98.3 5.5 0.6 0.82956 0.17044 0.00000 1.0 0.78079 0.47307 0.35076 0.23618
France data/country_flags/France.png A 96.3 4.3 0.5 0.99483 0.00515 0.00002 1.0 0.78367 0.42052 0.30038 0.19428
Germany data/country_flags/Germany.png B 93.8 4.0 0.7 0.98483 0.01517 0.00000 1.0 0.89280 0.48039 0.27710 0.12256
Canada data/country_flags/Canada.png E 93.5 3.7 0.6 0.38830 0.61170 0.00000 1.0 0.59192 0.36140 0.20157 0.09031
England data/country_flags/England.png D 91.9 3.5 0.6 0.70570 0.29430 0.00000 1.0 0.68510 0.43053 0.16465 0.08003

step3 drawing

# 设置颜色
cmap = LinearSegmentedColormap.from_list(
    name="bugw", colors=["#ffffff", "#f2fbd2", "#c9ecb4", "#93d3ab", "#35b0ab"], N=256
)
team_rating_cols = ["SPI", "OFF", "DEF"]
group_stage_cols = ["1st Place", "2nd Place", "3rd Place"]
knockout_stage_cols = list(df.columns[-5:])

# 单独设置每一列的绘制参数
col_defs = (
    # 绘制第一部分效果
    [
        ColumnDefinition(
            name="Flag",
            title="",
            textprops={
    
    "ha": "center"},
            width=0.5,
            # 设置自定义效果展示函数
            plot_fn=circled_image,
        ),
        ColumnDefinition(
            name="Team",
            textprops={
    
    "ha": "left", "weight": "bold"},
            width=1.5,
        ),
        ColumnDefinition(
            name="Group",
            textprops={
    
    "ha": "center"},
            width=0.75,
        ),
        ColumnDefinition(
            name="SPI",
            group="Team Rating",
            textprops={
    
    "ha": "center"},
            width=0.75,
        ),
        ColumnDefinition(
            name="OFF",
            width=0.75,
            textprops={
    
    
                "ha": "center",
                # 设置填充方式
                "bbox": {
    
    "boxstyle": "circle", "pad": 0.35},
            },
            cmap=normed_cmap(df["OFF"], cmap=matplotlib.cm.PiYG, num_stds=2.5),
            group="Team Rating",
        ),
        ColumnDefinition(
            name="DEF",
            width=0.75,
            textprops={
    
    
                "ha": "center",
                "bbox": {
    
    "boxstyle": "circle", "pad": 0.35},
            },
            cmap=normed_cmap(df["DEF"], cmap=matplotlib.cm.PiYG_r, num_stds=2.5),
            group="Team Rating",
        ),
    ]
    # 绘制第二部分效果
    + [
        ColumnDefinition(
            name=group_stage_cols[0],
            title=group_stage_cols[0].replace(" ", "\n", 1),
            formatter=decimal_to_percent,
            group="Group Stage Chances",
            # 设置边框
            border="left",
        )
    ]
    + [
        ColumnDefinition(
            name=col,
            title=col.replace(" ", "\n", 1),
            formatter=decimal_to_percent,
            group="Group Stage Chances",
        )
        for col in group_stage_cols[1:]
    ]
    # 绘制第三部分效果
    + [
        ColumnDefinition(
            name=knockout_stage_cols[0],
            title=knockout_stage_cols[0].replace(" ", "\n", 1),
            formatter=decimal_to_percent,
            cmap=cmap,
            group="Knockout Stage Chances",
            border="left",
        )
    ]
    + [
        ColumnDefinition(
            name=col,
            title=col.replace(" ", "\n", 1),
            formatter=decimal_to_percent,
            cmap=cmap,
            group="Knockout Stage Chances",
        )
        for col in knockout_stage_cols[1:]
    ]
)
# 绘图
fig, ax = plt.subplots(figsize=(18, 18))

table = Table(
    df,
    column_definitions=col_defs,
    row_dividers=True,
    footer_divider=True,
    ax=ax,
    textprops={
    
    "fontsize": 14},
    row_divider_kw={
    
    "linewidth": 1, "linestyle": (0, (1, 5))},
    col_label_divider_kw={
    
    "linewidth": 1, "linestyle": "-"},
    column_border_kw={
    
    "linewidth": 1, "linestyle": "-"},
).autoset_fontcolors(colnames=["OFF", "DEF"])


plt.show()
# 保存图片
fig.savefig("wwc_table.jpg", facecolor=ax.get_facecolor(), dpi=300,bbox_inches='tight')

png

2.5 Bundesliga standings display

step1 prepare data

from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from plottable import ColDef, Table
from plottable.plots import image
# 下载联赛数据
# !wget https://projects.fivethirtyeight.com/soccer-api/club/spi_matches.csv
# !wget https://projects.fivethirtyeight.com/soccer-api/club/spi_matches_latest.csv
# 数据地址
FIVETHIRTYEIGHT_URLS = {
    
    
    "SPI_MATCHES": "https://projects.fivethirtyeight.com/soccer-api/club/spi_matches.csv",
    "SPI_MATCHES_LATEST": "https://projects.fivethirtyeight.com/soccer-api/club/spi_matches_latest.csv",
}

# 读取数据
# df = pd.read_csv(FIVETHIRTYEIGHT_URLS["SPI_MATCHES_LATEST"])
df = pd.read_csv("data/spi_matches_latest.csv")
df.head()
season date league_id league team1 team2 sleep1 spi2 prob1 prob2 ... importance1 importance2 score1 score2 xg1 xg2 nsxg1 nsxg2 adj_score1 adj_score2
0 2019 2019-03-01 1979 Chinese Super League Shandong Luneng Guizhou Renhe 48.22 37.83 0.5755 0.1740 ... 45.9 22.1 1.0 0.0 1.39 0.26 2.05 0.54 1.05 0.00
1 2019 2019-03-01 1979 Chinese Super League Shanghai Greenland Shanghai SIPG 39.81 60.08 0.2387 0.5203 ... 25.6 63.4 0.0 4.0 0.57 2.76 0.80 1.50 0.00 3.26
2 2019 2019-03-01 1979 Chinese Super League Guangzhou Evergrande Tianjin Quanujian 65.59 39.99 0.7832 0.0673 ... 77.1 28.8 3.0 0.0 0.49 0.45 1.05 0.75 3.15 0.00
3 2019 2019-03-01 1979 Chinese Super League Wuhan Zall Beijing Guoan 32.25 54.82 0.2276 0.5226 ... 35.8 58.9 0.0 1.0 1.12 0.97 1.51 0.94 0.00 1.05
4 2019 2019-03-01 1979 Chinese Super League Chongqing Lifan Guangzhou RF 38.24 40.45 0.4403 0.2932 ... 26.2 21.3 2.0 2.0 2.77 3.17 1.05 2.08 2.10 2.10

5 rows × 23 columns

# 筛选德甲联赛数据,并删除为空数据
bl = df.loc[df.league == "German Bundesliga"].dropna()
bl.head()
season date league_id league team1 team2 sleep1 spi2 prob1 prob2 ... importance1 importance2 score1 score2 xg1 xg2 nsxg1 nsxg2 adj_score1 adj_score2
497 2022 2022-08-05 1845 German Bundesliga Eintracht Frankfurt Bayern Munich 68.47 91.75 0.1350 0.6796 ... 32.6 71.9 1.0 6.0 0.83 4.50 0.65 2.72 1.05 5.96
514 2022 2022-08-06 1845 German Bundesliga VfL Bochum Mainz 60.73 68.88 0.3568 0.3629 ... 33.5 34.5 1.0 2.0 1.00 1.62 0.96 0.86 1.05 2.10
515 2022 2022-08-06 1845 German Bundesliga Borussia Monchengladbach TSG Hoffenheim 69.38 66.77 0.4872 0.2742 ... 40.2 33.3 3.0 1.0 1.86 0.10 2.51 0.31 2.36 1.05
516 2022 2022-08-06 1845 German Bundesliga VfL Wolfsburg Werder Bremen 68.18 59.82 0.5319 0.2014 ... 30.2 33.3 2.0 2.0 0.81 0.97 1.07 1.25 2.10 2.10
517 2022 2022-08-06 1845 German Bundesliga 1. FC Union Berlin Hertha Berlin 69.98 59.70 0.5479 0.1860 ... 34.9 33.0 3.0 1.0 1.25 0.40 1.66 0.36 3.15 1.05

5 rows × 23 columns

step2 数据处理

# 统计得分
def add_points(df: pd.DataFrame) -> pd.DataFrame:
    # 三元表达式
    # df["score1"] > df["score2"],则返回3
    # np.where(df["score1"] == df["score2"],则返回1
    # 否则为0
    df["pts_home"] = np.where(
        df["score1"] > df["score2"], 3, np.where(df["score1"] == df["score2"], 1, 0)
    )
    df["pts_away"] = np.where(
        df["score1"] < df["score2"], 3, np.where(df["score1"] == df["score2"], 1, 0)
    )
    
    return df

# 统计得分数据
bl = add_points(bl)
# 总得分、总进球数、总助攻数和总黄牌数

# 以下代码先分别统计team1和team2的得分数据,然后将两组数据相加
perform = (
    bl.groupby("team1")[[
        "pts_home",
        "score1",
        "score2",
        "xg1",
        "xg2",
    ]]
    .sum()
    .set_axis(
        [
            "pts",
            "gf",
            "ga",
            "xgf",
            "xga",
        ],
        axis=1,
    )
    .add(
        bl.groupby("team2")[[
            "pts_away",
            "score2",
            "score1",
            "xg2",
            "xg1",
        ]]
        .sum()
        .set_axis(
            [
                "pts",
                "gf",
                "ga",
                "xgf",
                "xga",
            ],
            axis=1,
        )
    )
)

# 由于python和pandas版本问题,如果上面的代码出问题,则使用下面代码
# t1= bl.groupby("team1")[["pts_home","score1","score2","xg1","xg2", ]]
# t1 = t1.sum()
# t1.set_axis( ["pts","gf","ga","xgf","xga", ], axis=1,)
# t2 = bl.groupby("team1")[["pts_home","score1","score2","xg1","xg2", ]]
# t2 = t2.sum()
# t2.set_axis( ["pts","gf","ga","xgf","xga", ], axis=1,)
# perform = (t1.add(t2))

perform.shape
(18, 5)
# 汇总得分表现数据
perform.index.name = "team"

perform["gd"] = perform["gf"] - perform["ga"]

perform = perform[
    [
        "pts",
        "gd",
        "gf",
        "ga",
        "xgf",
        "xga",
    ]
]

perform["games"] = bl.groupby("team1").size().add(bl.groupby("team2").size())
perform.head()
pts gd gf ga xgf xga games
team
1. FC Union Berlin 62 13.0 51.0 38.0 35.93 43.06 34
Bayer Leverkusen 50 8.0 57.0 49.0 53.62 48.20 34
Bayern Munich 71 54.0 92.0 38.0 84.93 40.12 34
Borussia Dortmund 71 39.0 83.0 44.0 75.96 42.69 34
Borussia Monchengladbach 43 -3.0 52.0 55.0 53.05 58.88 34
# 统计各队的胜负数据
def get_wins_draws_losses(games: pd.DataFrame) -> pd.DataFrame:
    return (
        games.rename({
    
    "pts_home": "pts", "team1": "team"}, axis=1)
        .groupby("team")["pts"]
        .value_counts()
        .add(
            games.rename({
    
    "pts_away": "pts", "team2": "team"}, axis=1)
            .groupby("team")["pts"]
            .value_counts(),
            fill_value=0,
        )
        .astype(int)
        .rename("count")
        .reset_index(level=1)
        .pivot(columns="pts", values="count")
        .rename({
    
    0: "L", 1: "D", 3: "W"}, axis=1)[["W", "D", "L"]]
    )

wins_draws_losses = get_wins_draws_losses(bl)
wins_draws_losses.head()
pts W D L
team
1. FC Union Berlin 18 8 8
Bayer Leverkusen 14 8 12
Bayern Munich 21 8 5
Borussia Dortmund 22 5 7
Borussia Monchengladbach 11 10 13
# 合并得分和胜负数据
perform = pd.concat([perform, wins_draws_losses], axis=1)
perform.head()
pts gd gf ga xgf xga games W D L
team
1. FC Union Berlin 62 13.0 51.0 38.0 35.93 43.06 34 18 8 8
Bayer Leverkusen 50 8.0 57.0 49.0 53.62 48.20 34 14 8 12
Bayern Munich 71 54.0 92.0 38.0 84.93 40.12 34 21 8 5
Borussia Dortmund 71 39.0 83.0 44.0 75.96 42.69 34 22 5 7
Borussia Monchengladbach 43 -3.0 52.0 55.0 53.05 58.88 34 11 10 13

step3 映射队标图片

队标图片地址为:plottable-example_notebooks

# 创建队名和队标的索引数据
club_logo_path = Path("data/bundesliga_crests_22_23")
club_logo_files = list(club_logo_path.glob("*.png"))
club_logos_paths = {
    
    f.stem: f for f in club_logo_files}
perform = perform.reset_index()

# 添加新列
perform.insert(0, "crest", perform["team"])
perform["crest"] = perform["crest"].replace(club_logos_paths)

# 数据排序
perform = perform.sort_values(by=["pts", "gd", "gf"], ascending=False)

for colname in ["gd", "gf", "ga"]:
    perform[colname] = perform[colname].astype("int32")

perform["goal_difference"] = perform["gf"].astype(str) + ":" + perform["ga"].astype(str)

# 添加排名
perform["rank"] = list(range(1, 19))

perform.head()
crest team pts gd gf ga xgf xga games W D L goal_difference rank
2 data/bundesliga_crests_22_23/Bayern Munich.png Bayern Munich 71 54 92 38 84.93 40.12 34 21 8 5 92:38 1
3 data/bundesliga_crests_22_23/Borussia Dortmund... Borussia Dortmund 71 39 83 44 75.96 42.69 34 22 5 7 83:44 2
10 data/bundesliga_crests_22_23/RB Leipzig.png RB Leipzig 66 23 64 41 67.01 37.48 34 20 6 8 64:41 3
0 data/bundesliga_crests_22_23/1. FC Union Berli... 1. FC Union Berlin 62 13 51 38 35.93 43.06 34 18 8 8 51:38 4
11 data/bundesliga_crests_22_23/SC Freiburg.png SC Freiburg 59 7 51 44 53.11 52.25 34 17 8 9 51:44 5

step4 Set drawing data

# 设置颜色
row_colors = {
    
    
    "top4": "#2d3636",
    "top6": "#516362",
    "playoffs": "#8d9386",
    "relegation": "#c8ab8d",
    "even": "#627979",
    "odd": "#68817e",
}

bg_color = row_colors["odd"]
text_color = "#e0e8df"
# 确定绘图列
table_cols = ["crest", "team", "games", "W", "D", "L", "goal_difference", "gd", "pts"]
perform[table_cols].head()
crest team games W D L goal_difference gd pts
2 data/bundesliga_crests_22_23/Bayern Munich.png Bayern Munich 34 21 8 5 92:38 54 71
3 data/bundesliga_crests_22_23/Borussia Dortmund... Borussia Dortmund 34 22 5 7 83:44 39 71
10 data/bundesliga_crests_22_23/RB Leipzig.png RB Leipzig 34 20 6 8 64:41 23 66
0 data/bundesliga_crests_22_23/1. FC Union Berli... 1. FC Union Berlin 34 18 8 8 51:38 13 62
11 data/bundesliga_crests_22_23/SC Freiburg.png SC Freiburg 34 17 8 9 51:44 7 59
# 定义各列绘图效果
table_col_defs = [
    ColDef("rank", width=0.5, title=""),
    ColDef("crest", width=0.35, plot_fn=image, title=""),
    ColDef("team", width=2.5, title="", textprops={
    
    "ha": "left"}),
    ColDef("games", width=0.5, title="Games"),
    ColDef("W", width=0.5),
    ColDef("D", width=0.5),
    ColDef("L", width=0.5),
    ColDef("goal_difference", title="Goals"),
    ColDef("gd", width=0.5, title="", formatter="{:+}"),
    ColDef("pts", border="left", title="Points"),
]

step5 drawing

fig, ax = plt.subplots(figsize=(14, 12))

plt.rcParams["text.color"] = text_color
# 设置绘图字体
# plt.rcParams["font.family"] = "Roboto"

# 设置背景颜色
fig.set_facecolor(bg_color)
ax.set_facecolor(bg_color)

table = Table(
    perform,
    column_definitions=table_col_defs,
    row_dividers=True,
    col_label_divider=False,
    footer_divider=True,
    index_col="rank",
    columns=table_cols,
    even_row_color=row_colors["even"],
    footer_divider_kw={
    
    "color": bg_color, "lw": 2},
    row_divider_kw={
    
    "color": bg_color, "lw": 2},
    column_border_kw={
    
    "color": bg_color, "lw": 2},
    # 如果设置字体需要添加"fontname": "Roboto"
    textprops={
    
    "fontsize": 16, "ha": "center"},
)


# 设置不同行的颜色
for idx in [0, 1, 2, 3]:
    table.rows[idx].set_facecolor(row_colors["top4"])
    
for idx in [4, 5]:
    table.rows[idx].set_facecolor(row_colors["top6"])
    
table.rows[15].set_facecolor(row_colors["playoffs"])

for idx in [16, 17]:
    table.rows[idx].set_facecolor(row_colors["relegation"])
    table.rows[idx].set_fontcolor(row_colors["top4"])


fig.savefig(
    "bohndesliga_table_recreation.png",
    facecolor=fig.get_facecolor(),
    bbox_inches='tight',
    dpi=300,
)

png

3 Reference

Guess you like

Origin blog.csdn.net/LuohenYJ/article/details/131644942