閱讀(2.2k) 書簽贊(0) 我要糾錯

OceanBase 分析函數(shù)

2021-06-09 16:01 更新

簡介

分析函數(shù)（某些數(shù)據(jù)庫下也叫做窗口函數(shù)）與聚合函數(shù)類似，計算總是基于一組行的集合，不同的是，聚合函數(shù)一組只能返回一行，而分析函數(shù)每組可以返回多行，組內(nèi)每一行都是基于窗口的邏輯計算的結(jié)果。分析函數(shù)可以顯著優(yōu)化需要 self-join 的查詢。

分析函數(shù)語法

“窗口”也稱為 FRAME，OceanBase 數(shù)據(jù)庫同時支持 ROWS 與 RANGE 兩種 FRAME 語義，前者是基于物理行偏移的窗口，后者則是基于邏輯值偏移的窗口。

分析函數(shù)語法如下：

analytic_function:
  analytic_function([ arguments ]) OVER (analytic_clause)

analytic_clause:
  [ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]

query_partition_clause:
  PARTITION BY { expr[, expr ]... | ( expr[, expr ]... ) }

order_by_clause:
  ORDER [ SIBLINGS ] BY{ expr | position | c_alias } [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ] [, { expr | position | c_alias } [ ASC | DESC ][ NULLS FIRST | NULLS LAST ]]...

windowing_clause:
  { ROWS | RANGE } { BETWEEN { UNBOUNDED PRECEDING | CURRENT ROW | value_expr {
  PRECEDING | FOLLOWING } } AND{ UNBOUNDED FOLLOWING | CURRENT ROW | value_expr { 
  PRECEDING | FOLLOWING } } | { UNBOUNDED PRECEDING | CURRENT ROW| value_expr 
  PRECEDING}}

SUM/MIN/MAX/COUNT/AVG

聲明

SUM 的語法為：SUM([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]

MIN 的語法為：MIN([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]

MAX 的語法為：MAX([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]

COUNT 的語法為：COUNT({ * | [ DISTINCT | ALL ] expr }) [ OVER (analytic_clause) ]

AVG 的語法為：AVG([ DISTINCT | ALL ] expr) [ OVER(analytic_clause) ]

說明

以上分析函數(shù)都有對應(yīng)的聚合函數(shù)，其中，SUM 返回 expr 的和，MIN/MAX 返回 expr 的最小值/最大值，COUNT 返回窗口中查詢的行數(shù)，AVG 返回 expr 的平均值。

對于 COUNT 函數(shù)，如果指定了 expr，即返回 expr 不為 NULL 的統(tǒng)計個數(shù)，如果指定 COUNT(*) 返回所有行的統(tǒng)計數(shù)目。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.17 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.03 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.01 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient>select last_name, sum(salary) over(partition by job_id) totol_s, min(salary) over(partition by job_id) min_s, max(salary) over(partition by job_id) max_s, count(*) over(partition by job_id) count_s from exployees;
+-----------+---------+-------+-------+---------+
| last_name | totol_s | min_s | max_s | count_s |
+-----------+---------+-------+-------+---------+
| jim       |    2000 |  2000 |  2000 |       1 |
| mike      |   36000 | 11000 | 13000 |       3 |
| lily      |   36000 | 11000 | 13000 |       3 |
| tom       |   36000 | 11000 | 13000 |       3 |
+-----------+---------+-------+-------+---------+
4 rows in set (0.01 sec)

NTH_VALUE/FIRST_VALUE/LAST_VALUE

聲明

NTH_VALUE 的語法為：NTH_VALUE (measure_expr, n) [ FROM { FIRST | LAST } ] [ { RESPECT | IGNORE } NULLS ] OVER (analytic_clause)

FIRST_VALUE 的語法為：FIRST_VALUE { (expr) [ {RESPECT | IGNORE} NULLS ] | (expr [ {RESPECT | IGNORE} NULLS ])} OVER (analytic_clause)

LAST_VALUE 的語法為：LAST_VALUE { (expr) [ {RESPECT | IGNORE} NULLS ] | (expr [ {RESPECT | IGNORE} NULLS ])} OVER (analytic_clause)

說明

NTH_VALUE 函數(shù)表示第幾個值，方向由 [ FROM { FIRST | LAST } ] 確定，默認為 FROM FIRST，含有是否忽略 NULL 值的標志。其窗口為統(tǒng)一的 analytic_clause。這里 n 應(yīng)該是正數(shù)，如果 n 是 NULL，函數(shù)將返回錯誤；如果 n 大于窗口內(nèi)所有的行數(shù)，此函數(shù)將返回 NULL。

FIRST_VALUE 和 LAST_VALUE 表示從第一個開始計數(shù)或者是從最后一個開始計數(shù)。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.01 sec)

obclient> select last_name, first_value(salary) over(partition by job_id) totol_s, last_value(salary) over(partition by job_id) min_s, max(salary) over(partition by job_id) max_s from exployees;
+-----------+---------+-------+-------+
| last_name | totol_s | min_s | max_s |
+-----------+---------+-------+-------+
| jim       |    2000 |  2000 |  2000 |
| mike      |   12000 | 11000 | 13000 |
| lily      |   12000 | 11000 | 13000 |
| tom       |   12000 | 11000 | 13000 |
+-----------+---------+-------+-------+
4 rows in set (0.01 sec)

LEAD/LAG

聲明

LEAD 的語法為：LEAD { ( value_expr [, offset [, default]]) [ { RESPECT | IGNORE } NULLS ] | ( value_expr [ { RESPECT | IGNORE } NULLS ] [, offset [, default]] )} OVER ([ query_partition_clause ] order_by_clause)

LAG 的語法為：LAG { ( value_expr [, offset [, default]]) [ { RESPECT | IGNORE } NULLS ] | ( value_expr [ { RESPECT | IGNORE } NULLS ] [, offset [, default]] )} OVER ([ query_partition_clause ] order_by_clause)

說明

LEAD 和 LAG 含義為可以在一次查詢中取出當前行的同一個字段的前面或后面第 N 行的數(shù)據(jù)，這種操作可以使用相同表的自連接來實現(xiàn)，但 LEAD/LAG 窗口函數(shù)有更高的效率。

其中，value_expr 是要做比對的字段，offset 是 value_expr 的偏移量，default 參數(shù)的默認值為 NULL，即如果在 LEAD/LAG 沒有顯示的設(shè)置 default 值的情況下，返回值為 NULL。例如：對 LAG 來說，當前行為 4，offset 值為 6，這時候所要找的數(shù)據(jù)就是第 -2 行，不存在此行即返回 default 的值。

[ { RESPECT | IGNORE } NULLS ] 的語法為是否考慮 NULL 值，默認為 RESPECT，考慮 NULL 值。

注意 LEAD/LAG 兩個函數(shù)后必須有 order_by_clause，數(shù)據(jù)應(yīng)該在一個列上排序之后才能有前多少行后多少行的概念。query_partition_clause 是可選的，如果沒有 query_partition_clause，就是全局的數(shù)據(jù)。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.01 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> select last_name, lead(salary) over(order by salary) lead, lag(salary) over(order by salary) lag from exployees;
+-----------+-------+-------+
| last_name | lead  | lag   |
+-----------+-------+-------+
| jim       | 11000 |  NULL |
| tom       | 12000 |  2000 |
| mike      | 13000 | 11000 |
| lily      |  NULL | 12000 |
+-----------+-------+-------+
4 rows in set (0.01 sec)

STDDEV/VARIANCE/STDDEV_SAMP/STDDEV_POP

聲明

VARIANCE 的語法為：VARIANCE([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]

STDDEV 的語法為：STDDEV([ DISTINCT | ALL ] expr) [ OVER (analytic_clause) ]

STDDEV_SAMP 的語法為：STDDEV_SAMP(expr) [ OVER (analytic_clause) ]

STDDEV_POP 的語法為：STDDEV_POP(expr) [ OVER (analytic_clause) ]

說明

VARIANCE 返回的是 expr 的方差，expr 可能是數(shù)值類型或者可以轉(zhuǎn)換成數(shù)值類型的類型，方差的類型和輸入的值的類型相同。

STDDEV 返回的是 expr 的標準差，參數(shù)類型方面和 VARIANCE 的相同。

STDDEV_SAMP 返回的是樣本標準差。

STDDEV_POP 返回的是總體標準差。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.01 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> select last_name, stddev(salary) over(order by salary) std, variance(salary) over(order by salary) var, stddev_pop(salary) over() std_pop, stddev_samp(salary) over() from exployees;
+-----------+-------------------+--------------------+-------------------+----------------------------+
| last_name | std               | var                | std_pop           | stddev_samp(salary) over() |
+-----------+-------------------+--------------------+-------------------+----------------------------+
| jim       |                 0 |                  0 | 4387.482193696061 |          5066.228051190222 |
| tom       |              4500 |           20250000 | 4387.482193696061 |          5066.228051190222 |
| mike      | 4496.912521077347 | 20222222.222222224 | 4387.482193696061 |          5066.228051190222 |
| lily      | 4387.482193696061 |           19250000 | 4387.482193696061 |          5066.228051190222 |
+-----------+-------------------+--------------------+-------------------+----------------------------+
4 rows in set (0.00 sec)

NTILE

聲明

NTILE(expr) OVER ([ query_partition_clause ] order_by_clause)

說明

NTILE 函數(shù)將分區(qū)中已經(jīng)排序的行劃分為大小盡可能相同的指定數(shù)量的分組，并返回給每行組號。expr 如果是 NULL，則返回 NULL。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.01 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> select last_name, ntile(10) over(partition by job_id order by salary) ntl from exployees;
+-----------+------+
| last_name | ntl  |
+-----------+------+
| jim       |    1 |
| tom       |    1 |
| mike      |    2 |
| lily      |    3 |
+-----------+------+
4 rows in set (0.01 sec)

ROW_NUMBER

聲明

ROW_NUMBER( ) OVER ([ query_partition_clause ] order_by_clause)

說明

ROW_NUMBER 函數(shù)按照 order_by_clause 子句中指定的行的順序，為每一行分配一個編號。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.08 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.01 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> select last_name, row_number() over(partition by job_id order by salary) ntl from exployees;
+-----------+------+
| last_name | ntl  |
+-----------+------+
| jim       |    1 |
| tom       |    1 |
| mike      |    2 |
| lily      |    3 |
+-----------+------+
4 rows in set (0.00 sec)

RANK/DENSE_RANK/PERCENT_RANK

聲明

RANK 的語法為：RANK( ) OVER ([ query_partition_clause ] order_by_clause)

DENSE_RANK 的語法為：DENSE_RANK( ) OVER([ query_partition_clause ] order_by_clause)

PERCENT_RANK 的語法為：PERCENT_RANK( ) OVER ([ query_partition_clause ] order_by_clause)

說明

RANK 計算每一行數(shù)據(jù)在某列上的排序，該列由 order_by_clause 中的列決定。例如，按照 salary 排序可以看出員工的收入排名。

DENSE_RANK 的語義基本和 RANK 函數(shù)相同，但是 RANK 的排序中間會有‘跳過’，但是 DENSE_RANK 中不會有。

PERCENT_RANK 的語義基本和 RANK 函數(shù)相同，但是 PERCENT_RANK 排序的結(jié)果是百分比，計算的是給定行的百分比。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.10 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> select last_name, rank() over(partition by job_id order by salary) rank, dense_rank() over(partition by job_id order by salary) dense_rank, percent_rank() over(partition by job_id order by salary) percent_rank from exployees;
+-----------+------+------------+----------------------------------+
| last_name | rank | dense_rank | percent_rank                     |
+-----------+------+------------+----------------------------------+
| jim       |    1 |          1 | 0.000000000000000000000000000000 |
| tom       |    1 |          1 | 0.000000000000000000000000000000 |
| mike      |    2 |          2 | 0.500000000000000000000000000000 |
| lily      |    3 |          3 | 1.000000000000000000000000000000 |
+-----------+------+------------+----------------------------------+
4 rows in set (0.01 sec)

CUME_DIST

聲明

CUME_DIST() OVER ([ query_partition_clause ] order_by_clause)

說明

該函數(shù)計算一個值的分布，返回值為大于 0 小于等于 1 的值。作為一個分析函數(shù)，CUME_DIST 在升序情況下計算比當前行的特定列小的數(shù)據(jù)的占比。例如如下例子中，按 job_id 分組并在薪水排序的情況下，每行數(shù)據(jù)在窗口內(nèi)的排序列上的占比。

例子

obclient> create table exployees(last_name char(10), salary decimal, job_id char(32));
Query OK, 0 rows affected (0.10 sec)

obclient> insert into exployees values('jim', 2000, 'cleaner');
Query OK, 1 row affected (0.11 sec)

obclient> insert into exployees values('mike', 12000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('lily', 13000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> insert into exployees values('tom', 11000, 'engineering');
Query OK, 1 row affected (0.00 sec)

obclient> select last_name, cume_dist() over(partition by job_id order by salary) cume_dist from exployees;
+-----------+----------------------------------+
| last_name | cume_dist                        |
+-----------+----------------------------------+
| jim       | 1.000000000000000000000000000000 |
| tom       | 0.333333333333333333333333333333 |
| mike      | 0.666666666666666666666666666667 |
| lily      | 1.000000000000000000000000000000 |
+-----------+----------------------------------+
4 rows in set (0.01 sec)

以上內(nèi)容是否對您有幫助：

← OceanBase 聚合函數(shù)

OceanBase 信息函數(shù) →

寫筆記

我要補充