《Pro Oracle SQL》Chapter 8 -- 8.5 Lead and Lag

Lead and Lag         (page 249)
    Lag  and  lead  functions provide inter-row referencing ability. Lag  provides the ability to access prior row in the result set. The  lead function allows access to later row in the result set.
    Lag和lead 函数提供了行间引用的能力。Lag提供访问结果集中(当前行)之前行的能力。lead函数则允许访问结果集中(当前行)之后行。

    In retail industry, same-store sales is a metric calculated to measure an outlet’s performance, usually
sales data compared to the same quarter last year.
With normalized data model, this metric calculation
would require accessing another row as the Sale column values for current and prior years are stored in
different rows. Using the powerful inter-row referencing ability of lead and lag  functions, this metric can
be calculated with ease.
    在零售业,同店销售额是一计算出来的度量标准,用于度量一商店的(销售)能力,通常和上一年同季的数据相比较。 使用规范的数据模型,该标准的计算需要访问其他行,因为当前年和之前年的Sale列的值存储于不同的行。使用lead和lag函数的强大的行间引用能力,能够轻松的计算这个标准。
    Another example is percentage increase or decrease calculations requiring access to the prior or
following row. This calculation can be optimally written using  lead and lag functions, too.
    另一个例子是百分比增加或减少的计算,需要访问之前或之后行。使用lead和lag函数也能优化计算。
Syntax and Ordering     句法和排序
    As discussed earlier, data in analytic SQL is partitioned on a partitioning column. Fetching a prior row is
a position-dependant operation, and the order of the rows in a data partition is important in
maintaining logical consistency.
Within a data partition, rows are sorted with an order by clause to
control the position of a row in the result set. Syntax for the lag function is:
     正如之前讨论的,在分析SQL中的数据按分区列分区。取一之前的行是依赖位置的操作,且行在数据分区中的顺序对保持逻辑一致而言是重要的。 在一数据分区内,行依照order by子句排序来控制行在结果集中的位置。lag函数的句法是:
    lag (expression, offset, default ) over (partition-clause order-by-clause)
 
    Lead and lag functions do not support windowing clause. Only partition-by and order by clauses
are supported with these two functions.
    Lead 和 lag函数都不支持windowing(开窗)子句。这两个函数只支持partition-by 和order by子句。

Example 1: Returning a Value from Prior Row     例子1:从之前行返回值
    Let’s say that you need to fetch the sales quantity for the current week and prior week in the same row. Your requirement indicates an inter-row reference, and this in turn necessitates a need for a self-join in a non-analytic SQL statement. However, the lag function provides this inter-row reference without a self-join. 
    我们说你需要取出当前周和同一行之前周的销售量。你需要指定一行间引用,在非分析语句中必然 需要使用自连接。然而,lag函数提供行间引用而不要自连接。
    Listing 8-5 uses lag(sale,1,sale)  to retrieve the Sale column value from one row prior in the result
set. The clause  order by year, week specifies the column sort order in each data partition. Since the
rows are ordered by the columns Year and Week, the function lag(sale,1,sale) is retrieving the sale
column value from the prior row, which is the Sale column value from the prior week (assuming no gaps
in the week Column). For example, refer to the row where Year=1998 and week=3. For that row, the lag
function is retrieving the Sale column value from the prior row where Year=1998 and week=2. Notice
that analytic function does not specify the partitioning column in the clause  lag(sale,1,sale). It is
implicitly referring to the current partition.

    列表8-5使用lag(sale,1,sale)从结果集中的前一行检索Sale列值。子句order by year, week 指定在每个数据分区中的列排序顺序。由于行按照Year 和 Week排序,函数lag(sale,1,sale)检索之前行的sale列值,也就是之前周的Sale列值(假设在week列中没有间隙)。例如,对于 Year=1998 和week=3的那行而言,lag函数检索的就是之前的Year=1998 和week=2行的Sale列值。注意分析函数没有指定在子句 lag(sale,1,sale)中的分区列。它隐式的引用当前行。
Listing 8-5. Lag Function
col product format A30
col country format A10
col region format A10
col year format 9999
col week format 99
col sale format 999999.99
col receipts format 999999.99
set lines 120 pages 100
1 select  year, week,sale, 
2    lag(sale,1,sale) over(
3          partition by product, country, region 
4          order by year, week
5     ) prior_wk_sales
6  from sales_fact
7  where country in ('Australia')  and product ='Xtend Memory'
8  order by product, country,year, week
9 /
 
YEAR WEEK       SALE PRIOR_WK_SALES
---- ---- ---------- --------------
1998    1      58.15          58.15
1998    2      29.39          58.15
1998    3      29.49          29.39
...
1998   52      86.38          58.32
1999    1      53.52          86.38
1999    3      94.60          53.52
 
    The third argument in the lag  function specifies a default value and it is optional. If the analytic
function refers to a non-existent row, then a null is returned. That’s the default behavior, which you can
modify by specifying some other return value in the third argument.
For example, consider the row with
Year=1998 and Week=1. That is the first row in its data partition. In that row’s case, the lag function is
accessing a non-existing prior row.  Because the third argument to lag is Sale, the lag  function will return
the current row’s Sale value when the referenced row does not exist.
    lag函数的第三个参数指定一默认值且它是可选的。如果分析函数引用了不存在的行,将会返回NULL值。这是默认的行为,你能通过在第三个参数设置其他的返回值修改它。 例如,考虑行Year=1998 和Week=1。这是它的数据分区的第一行。对那行而言,lag函数访问一不存在的之前行。因为lag的第三个参数是Sale,当引用的行不存在则lag 函数将返回当前行的Sale值。

Understanding that Offset is in Rows    理解行间的偏移
    It is possible to access any row within a data partition by specifying a different offset. In Listing 8-6, the lag function is using an offset of 10 to access prior tenth row. Output also shows that at row with
Year=2001 and Week=52, the lag function is accessing the tenth prior row in the result set, which is for
the week=40. Notice that  Lag (sale,10,sale)  is  not accessing the week=42 by subtracting 10 from the
current week column value of 52; rather, this clause is accessing tenth prior row in the partition. In this
case, the tenth prior row is the row with a Week column value equal to 40. 
    通过指定不同的偏移值就可能访问数据分区中的任意行。在列表8-6中,lag函数使用偏移值10访问前第10行。输出对于Year=2001 和 Week=52的那一行,lag函数访问的是结果集中的前第十行,也就是week=40。注意Lag (sale,10,sale)  没有访问week=42,简单的通过week列值52减去10得到;而是,这条子句访问的分区中的前第10行。这样,前第10行就是Week列值=40的那行。
    This issue is tricky , as usually data gaps are not detected in the development environment. But in
the production environment, this problem manifests itself as a bug. If there are gaps in the data, as in
this example, you have a few options: populate dummy values for the missing rows or use the model
clause discussed in Chapter 9.        

    这个问题有些诡异 ,因为通常数据的间断在开发环境中是不会检测到的。但是在生产环境中,这个问题证明它自身就是一个bug。如果数据的间断,如本例中那样,你有几种选择:对缺失的行填充占位值或使用第九章中讨论的model子句。

Listing 8-6. Lag Function with Offset of 10
1 select  year, week,sale, 
2    lag(sale,10,sale) over(
3          partition by product, country, region 
4          order by year, week
5     ) prior_wk_sales_10
6  from sales_fact
7  where country in ('Australia')  and product ='Xtend Memory'
8  order by product, country,year, week
9 /
 
 YEAR WEEK       SALE PRIOR_WK_SALES_10
----- ---- ---------- -----------------
2001   38     139.00            139.28
2001   39     115.57             94.48
2001   40      45.18            116.85
2001   41      67.19            162.91
...
2001   49      45.26             93.16
2001   50      23.14               139
2001   51     114.82            115.57
2001   52      23.14             45.18

Example 2: Returning a Value from an Upcoming Row     访问一之后行的值
    The  lead function is similar to the  lag function, except that the  lead function accesses later rows in the ordered result set. For example, in the Listing 8-7, the clause  lead(sale, 1,sale) is accessing a later row in the ordered result set. 
    lead函数类似于lag函数,只不过lead函数访问的是排序后的结果集的(当前行的)之后的行。 例如,在列表8-7中,子句lead(sale,1,sale)访问排序结果集中(当前行)的下一行。
Listing 8-7. Lead Function 
  1   select  year, week,sale,
  2      lead(sale, 1,sale) over(
  3            partition by product, country, region
  4            order by year, week
  5       ) prior_wk_sales
  6    from sales_fact
  7    where country in ('Australia')  and product ='Xtend Memory'
  8*   order by product, country,year, week
 
YEAR WEEK       SALE PRIOR_WK_SALES
---- ---- ---------- --------------
2000   31      44.78         134.11
2000   33     134.11         178.52
2000   34     178.52          78.82
2000   35      78.82         118.41
...

    The partition-by clause can be used to specify different partition boundaries and the order-by clause can be used to alter the sorting order within a partition. With effective choice of partitioning and order by columns, any row in a result set can be accessed.
    partition-by子句用于指定不同的分区边界而order-by子句用于改变在分区内的排序顺序。通过有效的选择分区和排序列,在结果集中的任何行均可访问到。

猜你喜欢

转载自caohong286.iteye.com/blog/1412143