I've found DataCamp, whose cheat sheets comprise a large portion of the above links, to provide very clear explanations and am currently enjoying their Python Data Science courses. Steve Testa. 2 years ago. Options. DataCamp offers interactive R, Python, Sheets, SQL and shell courses. All on topics in data science, statistics and machine learning. Learn from a team of expert teachers in the comfort of your browser with video lessons and fun coding challenges and projects. SQL CHEAT SHEET SQL GROUP BY The GROUP BY clause is usually used with an aggregate function (COUNT, SUM, AVG, MIN, MAX). It groups the rows by a given column value (specified after GROUP BY) then calculates the aggregate for each group and returns that to the screen. SELECT column1, COUNT(column2) FROM tablename GROUP BY column1.
Download this 2-page SQL Window Functions Cheat Sheet in PDF or PNG format, print it out, and stick to your desk.
The SQL Window Functions Cheat Sheet provides you with the syntax of window functions, a list of window functions, and examples. You can download this cheat sheet as follows:
Window Functions
Window functions compute their result based on a sliding window frame, a set of rows that are somehow related to the current row.
Aggregate Functions vs. Window Functions
Unlike aggregate functions, window functions do not collapse rows.
Syntax
Named Window Definition
PARTITION BY
, ORDER BY
, and window frame definition are all optional.
PARTITION BY
PARTITION BY
divides rows into multiple groups, called partitions, to which the window function is applied.
Default Partition: With no PARTITION BY
clause, the entire result set is the partition.
ORDER BY
ORDER BY specifies the order of rows in each partition to which the window function is applied.
Default ORDER BY: With no ORDER BY
clause, the order of rows within each partition is arbitrary.
Window Frame
A window frame is a set of rows that are somehow related to the current row. The window frame is evaluated separately within each partition.
The bounds can be any of the five options:
UNBOUNDED PRECEDINGi
n PRECEDING
CURRENT ROW
n FOLLOWING
UNBOUNDED FOLLOWING
The lower_bound
must be BEFORE the upper_bound
.
As of 2020, GROUPS
is only supported in PostgreSQL 11 and up.
Abbreviations
Abbreviation | Meaning |
---|---|
UNBOUNDED PRECEDING | BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
n PRECEDING | BETWEEN n PRECEDING AND CURRENT ROW |
CURRENT ROW | BETWEEN CURRENT ROW AND CURRENT ROW |
n FOLLOWING | BETWEEN AND CURRENT ROW AND n FOLLOWING |
UNBOUNDED FOLLOWING | BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING |
Default Window Frame
- If
ORDER BY
is specified, then the frame isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. - Without
ORDER BY
, the frame specification isROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
.
Logical Order of Operations in SQL
FROM
,JOIN
WHERE
GROUP BY
- aggregate functions
HAVING
- window functions
SELECT
DISTINCT
UNION
/INTERSECT
/EXCEPT
ORDER BY
OFFSET
LIMIT
/FETCH
/TOP
You can use window functions in SELECT
and ORDER BY
. However, you can't put window functions anywhere in the FROM
, WHERE
, GROUP BY
, or HAVING
clauses.
- Ranking Functions
row_number()
rank()
dense_rank()
- Distribution Functions
percent_rank()
cume_dist()
- Analytic Functions
lead()
lag()
ntile()
first_value()
last_value()
nth_value()
- Aggregate Functions
avg()
count()
max()
min()
sum()
Ranking Functions
row_number()
- unique number for each row within partition, with different numbers for tied valuesrank()
- ranking within partition, with gaps and same ranking for tied valuesdense_rank()
- ranking within partition, with no gaps and same ranking for tied values
ORDER BY and Window Frame:rank()
and dense_rank()
require ORDER BY
, but row_number()
does not require ORDER BY
. Ranking functions do not accept window frame definition (ROWS
, RANGE
, GROUPS
).
Distribution Functions
percent_rank()
- the percentile ranking number of a row—a value in [0, 1] interval: (rank-1) / (total number of rows - 1)cume_dist()
- the cumulative distribution of a value within a group of values, i.e., the number of rows with values less than or equal to the current row’s value divided by the total number of rows; a value in (0, 1] interval
ORDER BY and Window Frame: Distribution functions require ORDER BY
. They do not accept window frame definition (ROWS
, RANGE
, GROUPS
).
Analytic Functions
lead(expr, offset, default)
- the value for the row offset rows after the current; offset and default are optional; default values: offset = 1, default =NULL
lag(expr, offset, default)
- the value for the row offset rows before the current; offset and default are optional; default values: offset = 1, default =NULL
ntile(n)
- divide rows within a partition as equally as possible into n groups, and assign each row its group number.
ORDER BY and Window Frame:ntile()
, lead()
, and lag()
require an ORDER BY
. They do not accept window frame definition (ROWS
, RANGE
, GROUPS
).
first_value(expr)
- the value for the first row within the window framelast_value(expr)
- the value for the last row within the window frame
Note: You usually want to use RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
with last_value()
. With the default window frame for ORDER BY
, RANGE UNBOUNDED PRECEDING
, last_value()
returns the value for the current row.
nth_value(expr, n)
- the value for the n-th row within the window frame; n must be an integer
ORDER BY and Window Frame:first_value()
, last_value()
, and nth_value()
do not require an ORDER BY
. They accept window frame definition (ROWS
, RANGE
, GROUPS
).
Aggregate Functions
avg(expr)
- average value for rows within the window framecount(expr)
- count of values for rows within the window framemax(expr)
- maximum value within the window framemin(expr)
- minimum value within the window framesum(expr)
- sum of values within the window frame
ORDER BY and Window Frame: Aggregate functions do not require an ORDER BY
. They accept window frame definition (ROWS
, RANGE
, GROUPS
).