Back to the Future: Analyzing Time Series Data Using Markov Transition Matrices

1. Description

        In this article, we examine how reconstructing time series data using Markov transition matrices can yield interesting descriptive insights as well as elegant methods for forecasting, backtracking, and convergence analysis. Travel back and forth in time - like Doc's modified DeLorean time machine from the sci-fi classic Back to the Future.

        Note: All equations and graph images in the following sections were created by the authors of this article.

2. Basic building blocks

        Let E define the set of k unique events that make up the time series data. For example, a time series might consist of three basic and distinct events that represent the types of path trajectories observed when plotting the data across discrete time steps: downward, sideways, and upward (down, flat, and up for short). Let S define a sequence of length n (representing discrete time steps) consisting of the events defined in E, representing some or all of the data. For example, the sequence [up, down, up, flat, up] represents five time steps of the data.

        A Markov transition matrix M of dimension k × k can now be defined such that each element M(i, j) describes the probability of observing event E(j) in time step t+1, given event E(i) in time step t of a given time series. In other words, M(i, j) represents the conditional probability of transitioning between two events in successive time steps. In a graph-theoretic sense, events E(i) and E(j) can be thought of as nodes connected by a directed edge E(i) → E(j) whenever E(i) is followed by E(j) in the time series data; the Markov transition matrix M then essentially represents the normalized version of the adjacency matrix (or co-occurrence matrix) of the events described by the nodes in the graph.
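        In symbols (a compact restatement of the definition above, assuming the row-normalized convention described in the next section, where A is the event co-occurrence matrix):

M(i, j) = P( E(j) at time step t+1 | E(i) at time step t ) = A(i, j) / Σₖ A(i, k)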

Next, let's see what we can do with these basic building blocks.

3. Practical application of the transition matrix: a simple example

        Suppose we have the following raw time series data covering 11 consecutive time steps: [1, 2, -2, -1, 0, 0, 2, 2, 1, 2, 3]. Using the simplified view of path trajectories described above, we can transform the data into the following sequence of 10 events describing transitions between adjacent time steps: [up, down, up, up, flat, up, flat, down, up, up].

We can now construct the following adjacency matrix A to capture coincident patterns in the event sequence:

      up    flat  down
up    2     2     1
flat  1     0     1
down  2     0     0

        The element A(i, j) represents the number of times in our sequence of events that event i at some time step t is followed by event j at time step t+1; i and j are the row and column indices, respectively. Note that rows represent the events in the order up, flat, down from top to bottom, and columns represent the same order from left to right. For example, the top-left element of A indicates that an up event is followed by another up event twice in the sequence. The middle-right element of A indicates that a flat event is followed by a down event once, and so on.

We can normalize the matrix A either row-wise or column-wise to generate the transition matrix. If we use row-based normalization, then element M(i, j) describes the probability of seeing event E(j) in time step t+1, given event E(i) in time step t. The probabilities in each row should therefore sum to 1. In our example, the row-normalized matrix looks like this:

      up    flat  down
up    0.4   0.4   0.2
flat  0.5   0.0   0.5
down  1.0   0.0   0.0

        Likewise, if we use column-based normalization, then element M(i, j) describes the probability of having seen event E(i) in time step t-1, given event E(j) in time step t. The probabilities in each column should now sum to 1. In our example, the column-normalized matrix looks like this:

      up    flat  down
up    0.4   1.0   0.5
flat  0.2   0.0   0.5
down  0.4   0.0   0.0

        Note that the conditional probabilities obtained via row normalization (looking forward in time) are generally not the same as those obtained via column normalization (looking backward in time).

4. Example in Python code

        To try out these concepts, here's some basic Python code to capture what's happening in the example above.

        Make sure you have the Pandas package installed first:

pip install pandas==0.25.2

Then run the following code:

import pandas as pd

# Define helper functions
def get_transition_tuples(ls):
    ''' Converts a time series into a list of transition tuples
    '''
    return [(ls[i-1], ls[i]) for i in range(1, len(ls))]

def get_transition_event(tup):
    ''' Converts a tuple into a discrete transition event
    '''
    transition_event = 'flat'
    if tup[0] < tup[1]:
        transition_event = 'up'
    if tup[0] > tup[1]:
        transition_event = 'down'
    return transition_event

# Generate raw time series data
ls_raw_time_series = [1, 2, -2, -1, 0, 0, 2, 2, 1, 2, 3]

# Derive single-step state transition tuples
ls_transitions = get_transition_tuples(ls_raw_time_series)

# Convert raw time series data into discrete events
ls_events = [get_transition_event(tup) for tup in ls_transitions]
ls_event_transitions = get_transition_tuples(ls_events)

# Create an index (list) of unique event types
ls_index = ['up', 'flat', 'down']

# Initialize Markov transition matrix with zeros
df = pd.DataFrame(0, index=ls_index, columns=ls_index)

# Derive transition matrix (or co-occurrence matrix)
for i, j in ls_event_transitions:
    df.loc[i, j] += 1  # Increment count at row i (event at t), column j (event at t+1)

''' Derive row-normalized transition matrix:
- Elements are normalized by row sum (fill NAs/NaNs with 0s)
- df.sum(axis=1) sums up each row, df.div(..., axis=0) then divides each column element
'''
df_rnorm = df.div(df.sum(axis=1), axis=0).fillna(0.00)

''' Derive column-normalized transition matrix:
- Elements are normalized by column sum (fill NAs/NaNs with 0s)
- df.sum(axis=0) sums up each col, df.div(..., axis=1) then divides each row element
'''
df_cnorm = df.div(df.sum(axis=0), axis=1).fillna(0.00)

This should produce the following transition matrices:

>>> df  # Transition matrix with raw event co-occurrences
      up    flat  down
up    2     2     1
flat  1     0     1
down  2     0     0
>>> df_rnorm  # Row-normalized transition matrix
      up    flat  down
up    0.4   0.4   0.2
flat  0.5   0.0   0.5
down  1.0   0.0   0.0
>>> df_cnorm  # Column-normalized transition matrix
      up    flat  down
up    0.4   1.0   0.5
flat  0.2   0.0   0.5
down  0.4   0.0   0.0
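        As a quick sanity check when adapting this code to other datasets (not part of the original walkthrough), we can assert that every row of the row-normalized matrix and every column of the column-normalized matrix sums to 1:

# Sanity check: normalized probabilities sum to 1
# (holds here because every event type occurs at least once in the sample)
assert (df_rnorm.sum(axis=1).round(6) == 1.0).all()
assert (df_cnorm.sum(axis=0).round(6) == 1.0).all()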

        A neat way to visualize transition matrices is to render them as directed, weighted graphs using a graph visualization package like Graphviz or NetworkX.

        We'll be using Graphviz here, so you'll need to install the package to follow along:

pip install graphviz==0.13.2

        It's worth browsing the short and sweet official installation guide to make sure you've got the package set up correctly; Windows users in particular may need to go through some additional installation steps.

        After setting up Graphviz, create some helper functions for plotting:

from graphviz import Digraph

# Define functions to visualize transition matrices as graphs

def get_df_edgelist(df, ls_index):
    ''' Derive an edge list with weight values
    '''
    edgelist = []
    for i in ls_index:
        for j in ls_index:
            edgelist.append([i, j, df.loc[i, j]])
    return pd.DataFrame(edgelist, columns=['src', 'dst', 'weight'])

def edgelist_to_digraph(df_edgelist):
    ''' Convert an edge list into a weighted directed graph
    '''
    g = Digraph(format='jpeg')
    g.attr(rankdir='LR', size='30')
    g.attr('node', shape='circle')
    nodelist = []
    for _, row in df_edgelist.iterrows():
        node1, node2, weight = [str(item) for item in row]
        if node1 not in nodelist:
            g.node(node1, **{'width': '1', 'height': '1'})
            nodelist.append(node1)
        if node2 not in nodelist:
            g.node(node2, **{'width': '1', 'height': '1'})
            nodelist.append(node2)
        g.edge(node1, node2, label=weight)
    return g

def render_graph(fname, df, ls_index):
    ''' Renders a visual graph and saves it to disk
    '''
    df_edgelist = get_df_edgelist(df, ls_index)
    g = edgelist_to_digraph(df_edgelist)
    g.render(fname, view=True)

Now you can generate each transition matrix. By default, the output graphics will be stored in your working directory.

# Generate graph of transition matrix (raw co-occurrences)
render_graph('adjmat', df, ls_index)

# Generate graph of row-normalized transition matrix
render_graph('transmat_rnorm', df_rnorm, ls_index)

# Generate graph of column-normalized transition matrix
render_graph('transmat_cnorm', df_cnorm, ls_index)

        Original co-occurrence:

        Row normalized transition probabilities:

        Column normalized transition probabilities:

5. Practical application

5.1 Descriptive insights

        The first and most obvious thing we can do with the transition matrix is to gain descriptive insights simply by examining the matrix and its visual graph representation. For example, from the output of the example in the previous section, we can glean high-level insights like these:

  • Of the 9 possible event transitions, 3 never occurred in our sample (flat → flat, down → down, and down → flat). A low probability of consecutive flat events may indicate volatility in the system the time series data is tracking.
  • The up event is the only event type with a non-zero probability (0.4) of occurring consecutively. In fact, this transition probability is among the highest in our data, possibly indicating a self-reinforcing effect in the system underlying the data.
  • In our case, row-based and column-based normalization produce different matrices, although there is some overlap. This tells us that our time series is inherently asymmetric in time; that is, we see somewhat different patterns depending on whether we look backward or forward from a given reference point.

5.2 Forecasting and backtracking

        By chaining together copies of the transition matrix, we can generate probabilities for events occurring forward and backward in time; we can call this forecasting and backtracking, respectively. A central assumption here is that "history doesn't matter": no matter which time step T we take as a reference point, we assume that the transition matrix gives the relevant event probabilities for T+1 (if the rows are normalized) and for T-1 (if the columns are normalized). This memorylessness is precisely the Markov property, and it means we can use the transition matrix to forecast and backtrack from arbitrary time steps. In particular, we can use the row-normalized transition matrix for forecasting and the column-normalized transition matrix for backtracking.

        Taking the matrix computed in the example above, suppose we observe an up event at time step t = 25 and wish to predict which event is most likely to occur at time step t = 27. By inspecting the first row of the row-normalized transition matrix, we directly see that at the next time step t = 26, the probabilities of observing up, flat, and down events are 0.4, 0.4, and 0.2, respectively. To derive the corresponding event probabilities at time step t = 27 (i.e., two time steps from our reference point), we need to multiply the transition matrix by itself, as follows:

M² = M × M =

      up    flat  down
up    0.56  0.16  0.28
flat  0.70  0.20  0.10
down  0.40  0.40  0.20

        Notice how the event probabilities change relative to our reference time step. For example, given an up event at t = 25, the probability of observing another up event is 0.4 at t = 26 (one step into the future), increasing to 0.56 at t = 27 (two steps into the future). At the same time, the probability of observing a flat event at t = 26 is also 0.4, but it decreases to 0.16 at t = 27. Crucially, this matrix multiplication approach supports both forecasting and backtracking. In general, to forecast or backtrack event probabilities n time steps away, we can raise the row-normalized or column-normalized transition matrix to the nth power, respectively.
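        To reproduce this with code, here is a minimal sketch that builds on the DataFrames from section 4; forecast_matrix is a helper defined here for illustration, not a library function:

import numpy as np

def forecast_matrix(df_norm, n):
    ''' Raises a normalized transition matrix to the n-th power
    '''
    mat_n = np.linalg.matrix_power(df_norm.values, n)
    return pd.DataFrame(mat_n, index=df_norm.index, columns=df_norm.columns)

# Two-step forecast, given an up event at the reference time step
print(forecast_matrix(df_rnorm, 2).loc['up'])  # up 0.56, flat 0.16, down 0.28

# Backtracking works analogously with the column-normalized matrix:
# column 'up' of the result gives P(event at t-2 | up at t)
print(forecast_matrix(df_cnorm, 2)['up'])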

        Transition matrices can also be used to forecast the raw underlying time series data. Let us assume that an up or down event corresponds to a single unit of change in the time series. Now suppose the time series moves up from 1 to 2 at t = 25 (an up event), and we wish to predict the trajectory of the time series at t = 26 and t = 27. Following an up event, up and flat events have the highest probability (0.4 each) at t = 26. Therefore, we can predict that at t = 26 the time series is likely to be [1, 2, 3] or [1, 2, 2], each of which in turn yields two likeliest possibilities at t = 27: [1, 2, 3] leads to [1, 2, 3, 4] or [1, 2, 3, 3] (each with probability 0.4, as before), and [1, 2, 2] leads to [1, 2, 2, 3] or [1, 2, 2, 1] (each with probability 0.5). A sketch of this path expansion in code follows below. In general, we expect that the larger and richer the dataset used to generate the transition matrix, the more variance is captured in the underlying chains of events, and thus the higher the stepwise prediction accuracy.
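        Here, forecast_paths is a hypothetical helper (it reuses get_transition_event from section 4 and assumes up/down events correspond to a change of +/-1 unit):

def forecast_paths(ls_ts, df_norm, n_steps):
    ''' Expands a raw time series n_steps ahead, returning
        (path, probability) candidates with non-zero probability
    '''
    step = {'up': 1, 'flat': 0, 'down': -1}
    last_event = get_transition_event((ls_ts[-2], ls_ts[-1]))
    paths = [(list(ls_ts), last_event, 1.0)]
    for _ in range(n_steps):
        expanded = []
        for ts, ev, p in paths:
            for ev_next, p_next in df_norm.loc[ev].items():
                if p * p_next > 0:
                    expanded.append((ts + [ts[-1] + step[ev_next]], ev_next, p * p_next))
        paths = expanded
    return [(ts, round(p, 4)) for ts, _, p in paths]

# Candidate trajectories two steps after [..., 1, 2]
print(forecast_paths([1, 2], df_rnorm, 2))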

        Multiplicative chains of transition matrices lead to increasingly complex but fully decomposable combinations of raw event transition probabilities. This decomposability can help us gain a deeper understanding of the interdependence of events that make up time series data (or stochastic processes).

5.3 Convergence analysis

        The concept of chaining transition matrices together naturally leads to an interesting question: do the probabilities of the transition matrix M converge? Specifically, is there a stable transition matrix M* such that MM* = M*? If so, then lim(n → ∞) Mⁿ = M*, i.e., we expect the Markov process represented by the chain of matrix multiplications Mⁿ to converge to a steady state M* at some point; in this case, the process is convergent and therefore stable. Assuming our transition matrix is row-normalized, the element M*(i, j) gives us the stable long-run probability of event i being followed by event j. If no stable matrix M* can be found, however, the process is non-convergent and unstable.

        Using the running example from the previous sections, we can briefly outline how to analyze the convergence of a Markov process.

        First, we assume a stable transition matrix M* such that MM* = M*, with M row-normalized. Since we know what M looks like, we can write out the matrix multiplication with the elements of M* as unknowns:

( 0.4  0.4  0.2 )   ( a  b  c )   ( a  b  c )
( 0.5  0.0  0.5 ) × ( d  e  f ) = ( d  e  f )
( 1.0  0.0  0.0 )   ( g  h  i )   ( g  h  i )

        Then we have the following system of linear equations (one block per column of M*, plus the row-sum constraints a + b + c = 1, d + e + f = 1, and g + h + i = 1):

0.4a + 0.4d + 0.2g = a
0.5a + 0.5g = d
a = g

0.4b + 0.4e + 0.2h = b
0.5b + 0.5h = e
b = h

0.4c + 0.4f + 0.2i = c
0.5c + 0.5i = f
c = i

        If there exists a solution to this system of equations (which we can check using methods such as Gaussian elimination), then we can also derive a converged and stable transition matrix.
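        Instead of solving the system symbolically, we can also approximate M* numerically by raising M to successive powers until the result stabilizes. Here is a minimal sketch; find_steady_state is our own helper, not a library function:

import numpy as np

def find_steady_state(df_norm, tol=1e-9, max_iter=1000):
    ''' Returns (M*, n) once M^n changes by less than tol, else (None, max_iter)
    '''
    m = df_norm.values.astype(float)
    m_n = m.copy()
    for n in range(2, max_iter + 1):
        m_next = m_n @ m
        if abs(m_next - m_n).max() < tol:
            return pd.DataFrame(m_next, index=df_norm.index, columns=df_norm.columns), n
        m_n = m_next
    return None, max_iter

df_steady, n_steps = find_steady_state(df_rnorm)
print(df_steady)

For our row-normalized example matrix, every row of the result converges to approximately (0.556, 0.222, 0.222), i.e., (5/9, 2/9, 2/9), which agrees with solving the system of equations above directly.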

6. Wrapping up

        Once you get the hang of it, reconstructing time series data using Markov transition matrices can be a useful part of your data science toolkit. Just as you would typically visualize time series data using line charts to better understand overall trends, transition matrices provide a complementary representation of the data that is highly condensed but versatile in its use cases. Visualized as directed graphs, transition matrices can already be used to gain high-level descriptive insights. Embedded into larger workflows, they can form the basis for more sophisticated forecasting and backtracking methods. Also, while the simple examples we ran through above treated the transition matrix as a static entity, we can derive different matrices for different time intervals; this is especially useful when analyzing time series that show a clear trend reversal, reflected in a prominent U-shaped or elbow pattern. Obviously, there are several possible extensions to the ideas discussed above, so go ahead and try them out - they might come in handy in your next data science project.

Reference: Time Series Data and Markov Transition Matrices | Towards Data Science

 
