[Data Mining] Time Series Tutorial [2]

2.4 Example: Particulate Matter Concentration

In this chapter, we will use air pollution data from the US Environmental Protection Agency as a running example. The dataset consists of daily measurements of particulate matter with an aerodynamic diameter less than or equal to 2.5 \(\mu\)m (PM2.5), recorded in \(\mu\)g/m\(^3\), for the years 2017 and 2018.

        We will specifically focus on data from two specific monitors, one in Fresno, CA, and the other in St. Louis, MO.

        Here's what the data looks like plotted over time.

        First, try to describe these time series, given the (very little) we know about them:

  • They are daily time series of air pollution levels in two US cities in 2017-2018.

  • The overall average for the Fresno, CA series appears to be higher than the St. Louis, Missouri series.

  • The Fresno series appears to be "spikier" (more volatile) than the St. Louis series.

  • In Fresno, there appears to be a seasonal trend: levels rise steadily from January to July, then become more variable and generally decline afterward; overall, they are higher in summer and lower in winter.

  • The St. Louis series appears fairly stable year-round; there are no strong upward or downward trends, and it does not appear to exhibit strong seasonality.

        Of course, we have the data, so we can try to test some of these claims.

Here are the overall mean and variance values for each city.

# A tibble: 2 x 3
  city           mean variance
  <chr>         <dbl>    <dbl>
1 Fresno, CA    10.6      74.5
2 St. Louis, MO  9.07     17.0
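
For reference, here is a minimal sketch of how such a summary might be computed with dplyr. The data frame pm below is simulated stand-in data with assumed column names (city, date, pm25); it is not the actual EPA data used to produce the output above.

library(dplyr)

# Simulated stand-in for the EPA data (assumed columns: city, date, pm25)
set.seed(1)
pm <- data.frame(
  city = rep(c("Fresno, CA", "St. Louis, MO"), each = 365),
  date = rep(seq(as.Date("2017-01-01"), by = "day", length.out = 365), 2),
  pm25 = c(rgamma(365, shape = 1.5, scale = 7),   # spikier, higher-mean series
           rgamma(365, shape = 4.8, scale = 1.9)) # steadier, lower-mean series
)

# Overall mean and variance for each city
pm %>%
  group_by(city) %>%
  summarize(mean = mean(pm25, na.rm = TRUE),
            variance = var(pm25, na.rm = TRUE))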

We can use quarterly averages to examine seasonal trends.

# A tibble: 8 x 4
# Groups:   city [2]
  city          season  mean    sd
  <chr>          <int> <dbl> <dbl>
1 Fresno, CA         1  6.81  7.69
2 Fresno, CA         2  7.73  3.09
3 Fresno, CA         3 15.1  10.1 
4 Fresno, CA         4 12.5   9.06
5 St. Louis, MO      1  9.16  4.25
6 St. Louis, MO      2  8.47  3.56
7 St. Louis, MO      3 10.3   4.19
8 St. Louis, MO      4  8.52  4.26

From here we can see that the Fresno average grows through the third quarter and then declines slightly. The St. Louis average actually dips slightly in the second quarter before picking up in the third. Note that the sd column shows the standard deviation of the data, not the standard deviation of the mean.
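
Here is a sketch of the quarterly summary, continuing with the simulated pm data frame from the earlier sketch; lubridate's quarter() maps each date to its calendar quarter (1-4), which plays the role of the season column.

library(dplyr)
library(lubridate)

# Quarterly mean and standard deviation for each city
pm %>%
  mutate(season = quarter(date)) %>%
  group_by(city, season) %>%
  summarize(mean = mean(pm25, na.rm = TRUE),
            sd = sd(pm25, na.rm = TRUE),
            .groups = "drop_last")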

Another (arguably better) way to represent the table above is in terms of each city's overall mean and the deviations of the seasonal means from it.

# A tibble: 8 x 4
# Groups:   city [2]
  city          season overall     dev
  <chr>          <int>   <dbl>   <dbl>
1 Fresno, CA         1   10.6  -3.76  
2 Fresno, CA         2   10.6  -2.84  
3 Fresno, CA         3   10.6   4.53  
4 Fresno, CA         4   10.6   1.94  
5 St. Louis, MO      1    9.07  0.0912
6 St. Louis, MO      2    9.07 -0.599 
7 St. Louis, MO      3    9.07  1.18  
8 St. Louis, MO      4    9.07 -0.546 

Here, it's clear which seasons are "below average" and which are above average.
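
One way to produce that representation, again using the simulated pm data from the earlier sketches, is to compute each city's overall mean and then the deviation of each seasonal mean from it.

library(dplyr)
library(lubridate)

pm %>%
  mutate(season = quarter(date)) %>%
  group_by(city) %>%
  mutate(overall = mean(pm25, na.rm = TRUE)) %>%   # city-wide mean, repeated per row
  group_by(city, season) %>%
  summarize(overall = first(overall),              # carry the city-wide mean along
            dev = mean(pm25, na.rm = TRUE) - overall,
            .groups = "drop_last")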

So far we have characterized the above data in terms of:

  • Linear trend (increase and decrease) over time

  • Seasonality (an annual cycle) over time

  • Overall level over time (average)

  • Variability over time ("spikiness")

These four characteristics may seem simple and basic, but they are key building blocks for understanding the structure of many time series.

2.5 Trend-season-residual decomposition

A commonly used exploratory tool is to decompose the time series into

  1. smooth long-term trend

  2. seasonal changes

  3. residual change

The main benefit of looking at long-term trends and seasonal changes is that they are highly interpretable and very general.

Examining a time series for trend, seasonality, and residual variation is the start of many critical tasks in time series analysis and is sometimes referred to as timescale analysis.
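
As a quick illustration of the idea (not of the PM2.5 data themselves), base R's stl() performs a loess-based trend-seasonal-residual decomposition; here it is applied to the built-in monthly co2 series.

# Decompose the built-in monthly co2 series into seasonal, trend,
# and remainder components using loess smoothing
fit <- stl(co2, s.window = "periodic")
head(fit$time.series)   # columns: seasonal, trend, remainder
plot(fit)               # one panel per component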

2.6 Example: Filtering Endowment Spending Rules

University endowments typically face two conflicting goals. On the one hand, they are expected to last forever in order to support future generations of students. But, on the other hand, they are expected to provide support to students in school. The former goal suggests focusing on riskier investments that promise higher long-term returns, while the latter goal suggests focusing on conservative investments that provide steady income (but may be outpaced by inflation in the long run).

A typical university endowment has a target spending rate, which is roughly the percentage of the endowment's total value that is withdrawn each year and transferred to the university's operating budget. The target rate is usually between 4% and 6% of the endowment's value. The two extreme strategies are

  1. Spend exactly the target percentage of the endowment each year (for example, 4%). This strategy hits the target rate exactly, but it exposes the university's operating budget to potentially wild year-to-year swings in the stock and bond markets. Such a spending strategy would make budget planning difficult, since the university cannot reasonably cut and raise staff salaries every year.

  2. Spend a fixed amount each year (adjusted for inflation), regardless of the endowment's value. This strategy brings stability to spending plans but completely ignores gains or losses in the endowment's market value. If the endowment grows, the university may miss opportunities to invest in new and exciting areas. If the endowment declines (as it did sharply during the 2008-2009 U.S. recession), the university may overspend and damage the endowment's long-term health.

One strategy employed by some universities (most notably the Yale Endowment) is the complementary filter, sometimes called exponential smoothing. The idea is to take a weighted average of current and past observations and to update it as new data come in.

Let \(y_1, y_2, \dots, y_{t-1}\) be the annual market values of the endowment through year \(t-1\), and let \(x_1, x_2, \dots, x_{t-1}\) be the annual amounts spent by the university from the endowment through year \(t-1\). The goal is to determine how much of the endowment to spend at time \(t\), i.e. what is our estimate of \(x_t\)?

Complementary filtering essentially takes a weighted average between the two extreme rules. Under the fixed spending rule, our forecast for year \(t\) is \(x_t^{t-1} = (1+\alpha)\,x_{t-1}\), where \(\alpha\) is the inflation rate; for an inflation rate of 2%, \(\alpha = 0.02\). How should we update this estimate once we observe the endowment's market value \(y_t\) (in addition to \(y_{t-1}, y_{t-2}, \dots\))? If \(\beta\) is the target spending rate (for example, \(\beta = 0.04\) for a 4% spending rule), then a strict spending rule would have the university spend \(\beta y_t\) in year \(t\).

The complementary filter works as follows. Given a tuning parameter \(\lambda \in (0, 1)\), set

\[ x_t = x_t^{t-1} + (1 - \lambda)\,(\beta y_t - x_t^{t-1}). \]

This approach incorporates new data \(y_t\) as they become available, but also pulls the estimate toward the stability of the fixed spending rule. Note that if \(\lambda = 0\) we recover the strict spending rule, and if \(\lambda = 1\) we recover the fixed spending rule. Thus, the complementary filter encompasses both extreme spending rules as well as many rules in between.
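
Here is a minimal sketch of the complementary filter recursion in R. The function name, the simulated market values, and the starting value are illustrative assumptions; the parameter defaults (\(\alpha = 0.02\), \(\beta = 0.04\), \(\lambda = 0.8\)) follow the values mentioned in the text.

# Complementary-filter spending rule:
#   x_t = x_t^{t-1} + (1 - lambda) * (beta * y_t - x_t^{t-1}),
#   where x_t^{t-1} = (1 + alpha) * x_{t-1} is the fixed-rule forecast
complementary_filter <- function(y, x1, alpha = 0.02, beta = 0.04, lambda = 0.8) {
  x <- numeric(length(y))
  x[1] <- x1                              # initial spending level
  for (t in 2:length(y)) {
    pred <- (1 + alpha) * x[t - 1]        # fixed-rule (inflation-adjusted) forecast
    x[t] <- pred + (1 - lambda) * (beta * y[t] - pred)
  }
  x
}

# Hypothetical endowment market values (in dollars), not the Yale figures
set.seed(2)
y <- 20e9 * cumprod(1 + rnorm(18, mean = 0.06, sd = 0.12))
round(complementary_filter(y, x1 = 0.04 * y[1]) / 1e6)   # annual spending, $ millions

Setting lambda = 0 in this sketch reproduces the strict rule and lambda = 1 the fixed rule, matching the discussion above.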

The chart below shows how much the Yale Endowment would have spent had it adhered to a (hypothetical) strict spending rule of 4% of market value per year. Endowment market value data are taken from the Yale Endowment Annual Report.

While there appears to be a general increase over the 18-year period, there was a sharp drop in spending between 2008 and 2009, during a deep recession. That drop amounts to roughly $260 million in annual spending, a huge year-to-year change even for a large university like Yale. On the other hand, from 2006 to 2007 there was a substantial increase of roughly $180 million. While such an increase in spending may seem beneficial, it is difficult to budget responsibly for so large an increase in a single year.

The chart above shows spending under a strict 4% spending rule ("Strict"), a fixed inflation-linked spending rule ("Fixed"), and a complementary filter with \(\lambda = 0.8\) ("Filter"). The filtered rule underspends relative to the endowment's annual market value, but it tracks market value much more closely than the fixed rule does, while removing most of its year-to-year variation.

The use of a complementary filter here is sometimes referred to as "sensor fusion" in engineering applications. The general idea is that we wish to "fuse" two kinds of measurements. One comes from the fixed spending rule, which is very smooth and predictable but does not adapt to market conditions. The other comes from the strict rule, which tracks market value exactly but is very noisy from year to year. The strict spending rule is, in a sense, less biased (because it targets the spending rate directly) but noisier; the fixed spending rule is largely noiseless but becomes increasingly biased over time.

In general, complementary filters let us combine the unbiasedness of noisy measurements with the smoothness of biased measurements. A generalization of this idea, the Kalman filter, is described in a later section. Another benefit of the complementary filter is that it is very simple and requires nothing more than multiplication and addition. This hardly matters when the calculations can be done on a powerful computer, but it is very important in embedded applications, where algorithms often run on very small or low-capacity computers (such as an Arduino or Raspberry Pi).


Origin blog.csdn.net/gongdiwudu/article/details/131469949