SELECT
Purpose
Use this statement to retrieve data from tables or views.
Prerequisites
-
You need either the system privilege
USE ANY SCHEMA
or the object privilegeUSAGE
on the target schema, or the schema must be owned by you or one of your assigned roles. - You need either the system privilege
SELECT ANY TABLE
or theSELECT
object privilege on tables or views which are referenced in theSELECT
list, or the tables or views must be owned by you or one of your assigned roles. - When accessing views, the owner of the view must have
SELECT
privileges on the objects that are referenced in the view. - If you use a subimport, you need the appropriate rights similar to the IMPORT statement.
Syntax
subquery::=
with_clause::=
select_list::=
from_item::=
connect_by_clause::=
preferring_clause::=
preference_term::=
group_by_clause::=
cube_rollup_clause::=
grouping_sets_clause::=
grouping_expression_list::=
order_by_clause::=
limit_clause::=
join_clause::=
inner_outer_clause::=
subimport::=
values_table::=
value_range_table::=
Usage notes
-
You can calculate scalar expressions by omitting the
FROM
clause. For example:SELECT 'abc'
. -
By using the
WITH
clause you can define common table expressions (CTEs). These are temporary views that are valid only during the execution of a subquery. -
By using
DISTINCT
, identical rows will be eliminated. If you use the keywordALL
(default), all rows will be present in the result table. -
Source tables and views are defined in the
FROM
clause. Through thevalues_table
, you can easily define static tables. For example, by(VALUES (1, TRUE), (2, FALSE), (3, NULL)) AS t(i, b)
a table with two columns and three rows is specified. -
The
SELECT
list defines the columns of the result table. If*
is used, then all columns will be listed.t.*
can be used to list all columns of the table, the viewt
, or the object with the table aliast
. -
In complex expressions within the
SELECT
list, the usage of column aliases can be very useful. You can only reference such aliases directly inside theORDER BY
clause and theQUALIFY
filter condition. However, you can also reference those aliases indirectly through the keywordLOCAL
within the other clauses (WHERE, GROUP BY, HAVING
), and even in theSELECT
list.
An example of indirect referencing:SELECT ABS(x) AS x FROM t WHERE local.x>10
.
Moreover, column aliases define the column names of the result table. -
The
WHERE
clause can be used to restrict the result by certain filter conditions. -
Equality join conditions among tables can be specified within the
WHERE
clause by using the = operator. If you want to define an outer condition, add(+)
after the outer-expression. -
The
CONNECT BY
clause can be used to define hierarchical conditions. It is evaluated before theWHERE
conditions - except the join conditions on tables which are referenced in theCONNECT BY
clause.
The following elements are relevant for theCONNECT BY
clause: -
The
PREFERRING
clause defines a Skyline preference term. -
The
GROUP BY
clause defines groups of rows which will be aggregated in the result table. Inside theSELECT
list, you can use aggregate functions. Using a numerical valuex (position)
results in aggregating the result table by the x-th column. For example,GROUP BY
1
is equivalent toGROUP BY
the first column in theSELECT
list. IfGROUP BY
is used, allSELECT
list elements have to be aggregated except those which define the grouping keys. -
CUBE
,ROLLUP
, andGROUPING SETS
are extensions of theGROUP BY
clause for calculating superaggregates. Those are hierarchical aggregation levels like, for example, partial sums for days, months and years and can be computed within one singleGROUP BY
statement instead of using aUNION
of several temporary results. -
You can distinguish between regular result rows (normal aggregation on the deepest level) and superaggregate rows. The total of all arguments results in the normal aggregation, the subsets result in the corresponding superaggregates. You can discern the result row types by using the function GROUPING[_ID].
-
By using the
HAVING
clause you can restrict the number of groups. -
QUALIFY
can be used to restrict the result by certain filter conditions in a way similar toWHERE
orHAVING
clauses, but its filter condition may contain or reference Analytic Functions. -
If the condition does not contain an analytic function, it is preferable to use
WHERE
orHAVING
clauses instead ofQUALIFY
, because the filter forQUALIFY
is evaluated later. -
The result table can be sorted by specifying the
ORDER BY
clause. Using a numerical value x (position) results in sorting the result table by the x-th column, for example,ORDER BY
1
.Options of the
ORDER BY
clause:ASC
(Default) means ascending,DESC
means descending sorting.- The
NULLS LAST
(Default) andNULLS FIRST
option can be used to determine whetherNULL
values are sorted at the end or the beginning.
String data is sorted by its binary representation.
-
The number of result rows can be restricted by defining the
LIMIT
clause. The optional offset can only be used in combination withORDER BY
.LIMIT
is not allowed within correlated subqueries.LIMIT
withOFFSET
is not allowed in aggregatedSELECT
statements withoutGROUP BY
. -
By using the
subimport
clause, you can integrate the import of external data sources directly in your query.-
For information about the usage of external data sources and their options, refer to the description of the IMPORT statement in Section Manipulation of database (DML) section. It is highly recommended to explicitly specify the target column types (see example below). Otherwise, the column names and data types are chosen in a generic way. For importing files, these types are mandatory.
-
Local files cannot be imported directly within queries.
-
By creating a view, external data sources can be integrated with Exasol as a sort of external tables.
-
Local filter conditions on such imports are not propagated to the source databases. However, you can achieve that by using the
STATEMENT
option.
-
-
SELECT
statements can be combined using the Table Operators - UNION [ALL], INTERSECT, MINUS. -
The
SELECT
queries can be directly returned from the query cache in case the syntactically equivalent query was already executed before. For more information, refer to the command ALTER SYSTEM. -
The arguments
min_value
andmax_value
in theVALUES BETWEEN
clause must be constant expressions that evaluate to integers. The argumentstep_value
must be a constant expression that evaluates to a positive integer.
The value range table has a single column with the nameRANGE_VALUE
and an integer type (DECIMAL(9,0)
,DECIMAL(18,0)
, orDECIMAL(36,0)
as appropriate). The column contains all the values betweenmin_value
andmax_value
combined if no step is given. If a step is given, the column skips values and produces the arithmetic seriesmin_value + step_value * n
with the last value less than or equal tomax_value
. See also Example 7.VALUES BETWEEN
is significantly faster than usingCONNECT BY
to generate a range of values. -
The
WINDOW
clause allows the definition of named window specifications (i.e. the content ofOVER
clauses) for analytic functions. For more information, refer to Analytic Functions.
Elements | Meaning |
---|---|
START WITH
|
By this condition you can specify the set of root nodes in the graph. |
condition(s)
|
You can define several conditions. A hierarchical connection between father and son rows can be defined by the keyword PRIOR (for example, PRIOR employee_id = manager_id ). If you don't specify such a PRIOR condition, the cross product will be computed. Thus, the statement SELECT LEVEL FROM dual CONNECT BY LEVEL<=100 results in a table with 100 rows, because 100 cross products were calculated for the table dual. |
NOCYCLE
|
If you specify this option, the query also returns results if there exists a cycle. In this case the expansion will be terminated when a cycle is detected. |
The following functions and operators can be used in the SELECT
list and the WHERE
conditions to qualify the results of a hierarchical query.
Functions and operators | Meaning |
---|---|
SYS_CONNECT_BY_PATH (expr,
char)
|
Returns a string containing the full path from the root node to the current
node, containing the values for expr and separated by char . |
LEVEL
|
Returns the hierarchy level of a row, that is, 1 for the root node, 2 for its direct sons, and so on. IMPORTANT: Cycle detection is disabled if |
PRIOR
|
This operator references the row’s parent row, which allows you to define a parent-child condition. You can also use this operator to access the values
of the parent row. Hence, the following two
|
CONNECT_BY_ROOT
|
Instead of a row’s value, the corresponding value of the root node is used. For example, |
CONNECT_BY_ISLEAF
|
This expression returns 1 if a row is a leaf in the tree (the row has no
children), otherwise 0 . |
CONNECT_BY_ISCYCLE
|
Returns whether the current row causes a cycle. In the path, such a row will occur exactly twice. This expression can only be used in combination with the NOTE: This behavior is different to Oracle’s implementation, where the parent node is returned instead of the connection point of the cycle. |
The function LEVEL
and the operator PRIOR
can also be used within the CONNECT
BY
clause.
Grouping | Meaning |
---|---|
CUBE
|
Calculates the aggregation levels for all possible combinations of the arguments (2n combinations). Example: By |
ROLLUP
|
Calculates the aggregation levels for the first n, n-1, n-2, ... 0 arguments (overall n+1 combinations). The last level corresponds to the grand total. Example: By This expression is equivalent to |
GROUPING SETS
|
Calculates the aggregation levels for the specified combinations. CUBE and ROLLUP are special forms of GROUPING SETS and simplify the notation. |
()
|
Is similar to GROUPING SETS () and aggregates the whole table as one single group. |
If multiple hierarchical groupings are specified, separated by a comma, then the result is the set of all combinations of partial groupings (cross product). For example, the expression ROLLUP(a,b),ROLLUP(x,y)
results in 9 combinations. Starting with the subsets (a,b), (a), () and (x,y), (x), () you get the following combinations: (a,b,x,y), (a,b,x), (a,b), (a,x,y), (a,x), (a), (x,y), (x),().
Examples
The examples relate to the following two tables (customers and sales):
Customers table
C-ID | NAME |
---|---|
1 | smith |
2 | jackson |
Sales table
S-ID | C_ID | PRICE | STORE |
---|---|---|---|
1 | 1 | 199.99 | MUNICH |
2 | 1 | 9.99 | TOKYO |
3 | 2 | 50.00 | MUNICH |
4 | 1 | 643.59 | TOKYO |
5 | 2 | 2.99 | MUNICH |
6 | 2 | 1516.78 | NEW YORK |
Example 1
STORE | VOLUME |
---|---|
TOKYO | 653.58 |
NEW YORK | 1516.78 |
MUNICH | 252.98 |
Example 2
NAME | VOLUME |
---|---|
jackson | 1569.77 |
smith | 853.57 |
Example 3
WITH tmp_view AS (SELECT name, price, store FROM customers, sales
WHERE customers.c_id=sales.c_id)
SELECT sum(price) AS volume, name, store FROM tmp_view
GROUP BY GROUPING SETS (name,store,());
VOLUME | NAME | STORE |
---|---|---|
1569.77 | jackson | |
853.57 | smith | |
653.58 | TOKYO | |
252.98 | MUNICH | |
1516.78 | NEW YORK | |
2423.34 |
Example 4
SELECT last_name, employee_id id, manager_id mgr_id,
CONNECT_BY_ISLEAF leaf, LEVEL,
LPAD(' ', 2*LEVEL-1)||SYS_CONNECT_BY_PATH(last_name, '/') "PATH"
FROM employees
CONNECT BY PRIOR employee_id = manager_id
START WITH last_name = 'Clark'
ORDER BY employee_id;
LAST_NAME | ID | MGR_ID | LEAF | LEVEL | PATH |
---|---|---|---|---|---|
Clark | 10 | 9 | 0 | 1 | /Clark |
Sandler | 11 | 10 | 1 | 2 | /Clark/Sandler |
Smith | 12 | 10 | 0 | 2 | /Clark/Smith |
Jackson | 13 | 10 | 0 | 2 | /Clark/Jackson |
Taylor | 14 | 10 | 1 | 2 | /Clark/Taylor |
Brown | 15 | 12 | 1 | 3 | /Clark/Smith/Brown |
Jones | 16 | 12 | 1 | 3 | /Clark/Smith/Jones |
Popp | 17 | 12 | 1 | 3 | /Clark/Smith/Popp |
Williams | 18 | 13 | 1 | 3 | /Clark/Jackson/Williams |
Johnson | 19 | 13 | 1 | 3 | /Clark/Jackson/Johnson |
Example 5
-- Get the largest sale per customer
SELECT S_ID, C_ID, PRICE, ROW_NUMBER() OVER (PARTITION BY C_ID ORDER BY PRICE DESC) NUM FROM SALES QUALIFY NUM = 1;
S-ID | C_ID | PRICE | NUM |
---|---|---|---|
4 | 1 | 643.59 | 1 |
6 | 2 | 1516.78 | 1 |
Example 6
-- It is also possible to specify the analytic function call directly in the qualify expression
SELECT S_ID, C_ID, PRICE FROM SALES QUALIFY ROW_NUMBER() OVER (PARTITION BY C_ID ORDER BY PRICE DESC) = 1;
S-ID | C_ID | PRICE |
---|---|---|
4 | 1 | 643.59 |
6 | 2 | 1516.78 |
Example 7
RANGE_VALUE |
---|
1 |
5 |
9 |
13 |
Example 8
SELECT v,
DATE'2020-10-26' + v * INTERVAL'7'DAY AS late_2020_mondays,
5 * v AS five_times_table
FROM VALUES BETWEEN 1 AND 9 AS v(v);
V | LATE_2020_MONDAYS | FIVE_TIMES_TABLE |
---|---|---|
1 | 2020-11-02 | 5 |
2 |
2020-11-09 | 10 |
3 | 2020-11-16 | 15 |
4 | 2020-11-23 | 20 |
5 | 2020-11-30 | 25 |
6 | 2020-12-07 | 30 |
7 | 2020-12-14 | 35 |
8 | 2020-12-21 | 40 |
9 | 2020-12-28 | 45 |