Pandas time series-specific operations in Altair

arjan-hada :

Is it possible to perform groupby operations on datetime objects in Altair using the transform_aggregate function? I am trying to replicate some of the time series plots from the "Example: Visualizing Seattle Bicycle Counts" section of Jake VDP's book: https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html

Does transform_aggregate allow time-series specific operations like resample?

jakevdp :

Altair has built-in time groupings using the TimeUnit transform, which can be used either via an explicit transform, or via encoding shorthands.

Here is an example of reproducing one of the charts from that section of the book – note that the Vega-Lite renderer becomes slow when data grows to tens of thousands of entries, so I use altair_data_server to serve the data and limit the chart to the first year:

# Load the data
# !curl -o FremontBridge.csv https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD

import pandas as pd
data = pd.read_csv('FremontBridge.csv', parse_dates=['Date'])
data.columns = ['Date', 'Total', 'East', 'West']
df = data.iloc[:24 * 365]  # limit to first year of data

# Draw the chart
import altair as alt
alt.data_transformers.enable('data_server')  # handle larger datasets

alt.Chart(df).mark_line().transform_fold(
    ['Total', 'East', 'West'],
).encode(
    x='hours(Date):T',
    y='sum(value):Q',
    color='key:N'
)
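For comparison with the pandas code in the book, here is a rough sketch of what the fold plus `hours(Date)` plus `sum(value)` combination computes. It uses a small synthetic DataFrame rather than the bridge data (the counts are made up), and the names `long_df` and `by_hour` are only illustrative. Note that `hours(Date)` groups by hour of day, so the closest pandas analogue is a groupby on `dt.hour` rather than `resample`:

```python
import pandas as pd

# Synthetic stand-in for the bridge data: two days of hourly counts (made-up values)
dates = pd.date_range('2020-01-01', periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'Date': dates, 'East': 1, 'West': 2})
df['Total'] = df['East'] + df['West']

# Wide to long, mirroring transform_fold (which emits 'key' and 'value' columns)
long_df = df.melt(id_vars='Date', var_name='key', value_name='value')

# Group by hour of day and series, then sum: the analogue of
# x='hours(Date):T', y='sum(value):Q', color='key:N'
by_hour = (
    long_df.groupby([long_df['Date'].dt.hour, 'key'])['value']
    .sum()
    .unstack('key')
)
print(by_hour.head())
```

The result has one row per hour of day (0 to 23) and one column per series, which matches the three lines drawn in the chart.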


This timeUnit grammar is quite flexible, and allows you to split and group by multiple date attributes in a single chart; for example, here's the trend faceted by day of the week:

alt.Chart(df).transform_fold(
    ['Total', 'East', 'West']
).mark_line().encode(
    x='hours(Date):T',
    y='sum(value):Q',
    color='key:N',
    facet=alt.Facet('day(Date):O', columns=4)
).properties(width=200, height=150)

