Using SQL Window Functions
The SELECT SQL keyword can be combined with the OVER keyword to apply window functions into your data.
Using SELECT with OVER
The OVER clause can be used to define a window within a query result set. A window function then computes values for each row in the window. The OVER clause can be used to easily calculate cumulative sums, moving averages and other calculations in an efficient manner.
The syntax is:
SELECT window_function OVER (
[ PARTITION BY value_expression [ , ... n ] ]
[ ORDER BY order_by_expression [ COLLATE collation ] [ ASC | DESC ] [ , ... n ] ]
[ { ROWS | RANGE } <window frame extent> ]
) [ AS Alias] [ , ... n ]
And <window frame extent> is defined as:
<window frame extent> ::= { <window frame preceding> | <window frame between> }
<window frame between> ::= BETWEEN <window frame bound> AND <window frame bound>
<window frame bound> ::= { <window frame preceding> | <window frame following> }
<window frame preceding> ::=
{
UNBOUNDED PRECEDING
| <unsigned integer> PRECEDING
| CURRENT ROW
}
<window frame following> ::=
{
UNBOUNDED FOLLOWING
| <unsigned integer> FOLLOWING
| CURRENT ROW
}
Let’s now examine the arguments of the OVER clause more closely:
- PARTITION BY divides the query result sets into partitions. The window function is applied to each partition separately and computation restarts for each partition.
- ORDER BY defines the logical order of the rows within each partition of the result set and so determines the logical order in which the window function calculation is performed.
- ROWS / RANGE limits the rows within the partition by specifying start and end points within the partition with respect to the current row by either physical (ROWS) or logical (RANGE) association.
The window function can be either an aggregate function, a ranking function or an analytic function.
When OVER is specified with no arguments, the window function is then applied to the entire dataset. For example, the following query calculates the sum of all the units in stock for the entire Products table:
Example:
SELECT
CategoryID,
ProductName,
SUM(UnitsInStock) OVER() AS TotalUnits
FROM
Products
The TotalUnits value is the same for each row because the entire table is considered as a single partition, which means every row computes the same result.
PARTITION BY
Specifies the columns by which the dataset is partitioned. Expressions used in PARTITION BY can only reference columns from the FROM clause and cannot use expressions or aliases from the SELECT list. The following example uses PARTITION BY to split the result set in partitions before computing the window function:
Example:
SELECT
CategoryID,
ProductName,
SUM(UnitsInStock) OVER(PARTITION BY CategoryID) AS TotalUnits
FROM
Products
ORDER BY
ProductName
In this case since the data is partitioned by CategoryID, each row with the same CategoryID has the same result, but different partitions have different results.
ORDER BY
Defines the logical order of the rows within each partition of the dataset. If ROWS / RANGE is not specified then the default ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is used as a default window frame by the window functions that can accept optional ROWS / RANGE specification. When a sort direction is not specified, ascending order (ASC) is applied.
The following example calculates the running total of Freight by CustomerID:
Example:
SELECT
CustomerID,
OrderDate,
Freight,
SUM(Freight) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS RunningTotal
FROM
Orders
WHERE
CustomerID IS NOT NULL
Since ROWS / RANGE is not specified, the default ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is utilized. This means that for each row, the frame that the window function uses to compute the result starts at the beginning of the partition (UNBOUNDED PRECEDING) and stops at the CURRENT ROW.
ROWS / RANGE
Further limits the window frame over which the window function is computed by specifying start and end points within the partition. This is accomplished by specifying a range of rows with respect to the current row either by logical (RANGE) or physical (ROWS) association.
The ROWS clause limits the rows within a partition by a fixed number of rows preceding and/or following the current row, whereas the RANGE clause logically limits the rows within a partition by specifying a range of values with respect to the value in the current row. Preceding and following rows are defined based on the ordering in the ORDER BY clause and all rows with the same values in the ORDER BY expression as the current row will be considered in the same frame (when using RANGE).
The possible values that can be used to specify a window frame are:
- UNBOUNDED PRECEDING: Specifies that the window frame starts at the first row of the partition. UNBOUNDED PRECEDING can only be used as a window starting point.
- <unsigned integer> PRECEDING: Specifies that the window frame starts at the fixed offset defined by <unsigned integer> before the current row. This specification is not allowed for RANGE.
- CURRENT ROW: Specifies that the window frame starts or ends at the current row, or at the current value when used with RANGE. CURRENT ROW can be used both as a starting and ending point.
- UNBOUNDED FOLLOWING: Specifies that the window frame ends at the last row of the partition. UNBOUNDED FOLLOWING can only be used as a window ending point.
- <unsigned integer> FOLLOWING: Specifies that the window frame ends at the fixed offset defined by <unsigned integer> after the current row. This specification is not allowed for RANGE.
To understand the difference between ROWS and RANGE, consider the following query, which sums the unit price for each product in an order:
Example:
SELECT
o.OrderDate,
o.OrderID,
SUM(UnitPrice) OVER (PARTITION BY o.OrderDate ORDER BY o.OrderID ROWS CURRENT ROW),
SUM(UnitPrice) OVER (PARTITION BY o.OrderDate ORDER BY o.OrderID RANGE CURRENT ROW)
FROM
Orders AS o
INNER JOIN
OrderDetails AS od
ON
o.OrderID = od.OrderID
WHERE
CustomerID IS NOT NULL
The window frame starts at the beginning of the partition (when omitted it defaults to UNBOUNDED PRECEDING) and, when using ROWS, the function is computed until the current physical row. For RANGE, however, all rows with the same values in the ORDER BY expression as the current row will be considered in the same frame and so computed at the same time because they are logically associated with the current physical row.