Basic language elements

Comments in SQL

Comments are primarily used to make SQL scripts clear and maintainable, and can be added anywhere in an SQL script.

Line comments begin with the characters -- (two hyphens), indicating that the remaining part of the current line is a comment.

Block comments are delimited by /* and */, and can span across several lines. All characters between the delimiters are ignored. Characters after the closing delimiter will be interpreted as code.

Examples:
Copy
-- This is a comment

SELECT * FROM dual;

-- Display the number of rows of table dual
SELECT count(*) FROM dual;

SELECT avg(salary) -- Average salary
FROM staff         -- From table staff
WHERE age>50;      -- Limitation to over 50 years old

/* Any number of comment rows can be
inserted between the delimiters
*/
SELECT * FROM dual;

SELECT /*This section is not evaluated!*/ * FROM dual;

SQL identifier

Each database object has a unique name. Within a SQL statement, object names are referenced as SQL identifiers. Database objects include schemas, tables, views, columns, functions, users, and roles. In Exasol, SQL identifiers can have a maximum length of 128 characters. Identifiers can be regular identifiers (unquoted) or delimited identifiers (quoted).

Regular identifiers

Regular identifiers are unquoted.

As defined in the SQL standard, Exasol only allows a subset of the Unicode character set in regular identifiers. The first character of an SQL identifier is more restricted than the following characters.

First character:
All letters of Unicode classes Lu (uppercase letters), LI (lowercase letters), Lt (title case letters), Lm (modifier letters), Lo (other letters), and Nl (letter numbers).
Following characters:
All classes allowed for the first character, plus all letters of Unicode classes Mn (non-spacing marks), Mc (spacing combination marks), Nd (decimal numbers), Pc (connectors punctuations), Cf (formatting codes), and the Unicode character U+00B7 (middle dot).

For details about the characters included in the different Unicode classes, refer to the Unicode standard.

When using the simple ASCII character set, a regular identifier may start with letters of the set {a-z, A-Z} and may further contain letters of set {a-z, A-Z, 0-9,_}.

A reserved word cannot be used as a regular identifier. For more information, see Reserved words.

The database converts regular identifiers to UPPERCASE when storing them in the database.

Examples:
Identifier Name in the database
ABC ABC
aBc ABC
a123 A123

Delimited identifiers

Delimited identifiers are quoted, which here means that they are delimited by double quotation marks ". All characters except a dot (.) are allowed in a delimited identifier. In the case of users and roles you are allowed to use a dot, for example, in email addresses. This is because users and roles do not conflict with schema-qualified names. For more details, see Schema qualified names.

To use double quotation marks within a name you must include two successive double quotation marks. For example, "ab""c" indicates the name ab"c.

Except in users and roles, delimited identifiers are case-sensitive when stored in the database.

Examples:
Identifier Name in the database
"ABC" ABC
"abc" abc
"_X_" _X_
"ab""c" ab"c

Comparison of identifiers

Exasol preserves case when storing identifiers, but the system value SQL_IDENTIFIER_COMPARISON controls how identifiers are compared.

If SQL_IDENTIFIER_COMPARISON is set to CASE SENSITIVE then comparison is case sensitive, which means that the identifiers "Exasol" and "EXASOL" are different. This is the behavior defined by the SQL Standard and the default behavior in Exasol.

If SQL_IDENTIFIER_COMPARISON is set to IGNORE CASE then comparison ignores case, which means that "Exasol" and "EXASOL" match. If more than one object matches (because they differ only in case and were created when the system value was CASE SENSITIVE) then it will raise the error the matching objects … are ambiguous because of case-insensitive comparison.

In statements where a name with a different case matched and the result depends on the name chosen, Exasol will use the most recent name. For example:

  • For CREATE OR REPLACE statements, the name used in the statement will replace that of the original object. For example, if there is a view called "V" and the user does CREATE OR REPLACE VIEW "v" AS ...; then the new view will be called "v".

  • When a select list element is a column reference, the name in the result set will be that of the column reference, not the name of the original column in the FROM clause. For example, SELECT "One" FROM VALUES (1) AS "ONE"; will give the result column "One".

SQL_IDENTIFIER_COMPARISON does not affect the following object types:

Passwords and consumer group names

These remain case-sensitive regardless of the system value.

User and role names

These remain case-insensitive regardless of the system value.

Considerations when changing the comparison method

Changing the value of SQL_IDENTIFIER_COMPARISON at a later time may cause named objects such as views and scripts to stop working. To avoid this, observe the following recommendations:

  • When the value is set to CASE SENSITIVE, do not create objects whose names differ only in case.

  • When the value is set to IGNORE CASE, always use the exact name of the objects (considering the case).

Initial value

Example

New value

Result

CASE SENSITIVE

A user creates tables "A" and "a" and a view V that selects from "A".

IGNORE CASE

Selecting from V now gives the error the matching objects "A" and "a" are ambiguous because of case-insensitive comparison

IGNORE CASE

A user creates a table "A" and a view V that selects from the table using the name "a" (relying on case-insensitive comparison).

CASE SENSITIVE

Selecting from V now gives the error object "a" not found.

Examples:
Copy
alter system set SQL_IDENTIFIER_COMPARISON = 'CASE SENSITIVE';
create table "MyTable" (C int);

-- Observed: Success
select * from "MyTable";

-- Observed: Error 'object "mytable" not found'
select * from "mytable";

alter system set SQL_IDENTIFIER_COMPARISON = 'IGNORE CASE';

-- Observed: Success
select * from "MyTable";
-- Observed: Success
select * from "mytable";

alter system set SQL_IDENTIFIER_COMPARISON = 'CASE SENSITIVE';
create table "MYTABLE" (C int);

-- Observed: Success
select * from "MyTable";

alter system set SQL_IDENTIFIER_COMPARISON = 'IGNORE CASE';

-- Observed: Error 'the matching objects "MyTable" and "MYTABLE" are ambiguous because of case-insensitive comparison'
select * from "MyTable";

Schema qualified names

A schema qualified name consists of the schema name and the object name, separated by a dot. Using schema qualified names enables access to objects that are not located in the current schema. If the schema name is omitted, the current schema is implied.

Examples:
Copy
SELECT my_column FROM my_schema.my_table;
SELECT "my_column" FROM "my_schema"."my_table";
SELECT my_schema.my_function(my_column) from my_schema.my_table;

Reserved words

Reserved words such as the keyword SELECT cannot be used as regular identifiers. If you want to create a table with the name SELECT, the name must be delimited with double quotation marks. Note that the name will then be case-sensitive, which means that SELECT as a table name is different from a table name such as Select or seLect.

A list of the reserved words in Exasol can be found in the EXA_SQL_KEYWORDS system table.

Regular expressions

Regular expressions, often shortened as regex or regexp, can be used to filter or replace strings with certain patterns. Exasol supports the Perl Compatible Regular Expressions (PCRE2) dialect, which has a powerful syntax and exceeds the functionality of dialects like POSIX Basic Regular Expressions (POSIX BRE) or POSIX Extended Regular Expressions (POSIX ERE).

The following describes the basic functionality of regular expressions in Exasol. For more information about the PCRE2 dialect, refer to https://www.pcre.org.

Regular expressions can be used in the following functions and predicates:

The compiled code for PCRE2 by default uses an offset of 2 bytes stored in big-endian order. These are used, for example, to link from the start of a subpattern to its alternatives and its end. The use of 2 bytes per offset limits the size of the compiled pattern to 64K.

Examples:

In the description of the corresponding scalar functions and predicates, you can find examples of regular expressions within SQL statements. The following examples demonstrate the general expressions: 

Regular expression Meaning
3[47][0-9]{13} American Express credit card number (15 digits and starting with 34 or 37)
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} IP addresses like 192.168.0.1
[+-]?[0-9]*\.?[0-9]+([+-]?[eE][0-9]+)? Floating point numbers like -1.23E-6 or 40
(19|20)\d\d-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01]) Date values with format YYYY-MM-DD, for example, 2010-12-01 (restricted to 1900-2099)
(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4} Email addresses like jane.doe@example.com
select regexp_substr(',match1,,match2', '(?<=,)[^,]*', 1,2); Empty strings are not valid. For example, the result in this example would be match2.

The expressions in these examples are simplified to provide clarity. Expressions used in a production environment, for example, an expression for identifying valid email addresses, can be much more complex.

Pattern elements

You can use the following elements in regular expressions:

Element Meaning in a regular expression
General elements
. Matches any single character except a newline - if modifier (?s) is set, newlines are also matched
^ Beginning of a string - if modifier (?m) is set, the beginning of a line is also matched (after a newline)
$ End of a string - if modifier (?m) is set, the end of a line is also matched (before a newline)
(?#) Comment which is not evaluated. For example: (?# This is a comment)
Quantifiers and operators
{}

Minimum/maximum quantifier. For example: {1,5} for 1 to 5 occurrences, {3} for exactly three occurrences, or {,7} for a maximum of 7 occurrences.

{}? Similar to {}, but non-greedy (the minimal number of occurrences is used).
? 0 or 1 quantifier (at most 1 time) - similar to {0,1}
* 0 or more quantifier (any) - similar to {0,}
*? Similar to *, but non-greedy (the minimal number of occurrences is used).
+ 1 or more quantifier (at least 1 time) - similar to {1,}.
+? Similar to +, but non-greedy (the minimal number of occurrences is used)
|

Alternative operator - for example, a|b|c means a, b or c.

The first matching alternative is used, which means that the order of the alternatives can be relevant.

()

Defines a subpattern for grouping. For example, (abc)+ for at least 1 occurence of the word abc).

Captures are defined by the brackets (see below).

(?(?=pattern1)pattern2| pattern3)

Depending on whether the pattern in the first bracket matches, the following pattern is evaluated.

Example:

(?(?=[0-9])[0-9]*|[a-z]*) searches only for letters a to z if the first character is not a digit.

Character classes via []
[] Definition of a character class. For example: [abc] for a, b, or c
\ Escape character within a character class. For example: [a\\] for a or \
- Defines a range. For example: [0-9] for digits.
^ Negation of a character class. For example: [^0-9] for non-digits.
Usage of backslash(\)
\ Escape character to be able to use special characters. For example: \+ for the character +
\R Any newline character (newline or carriage return)
\n Newline
\r Carriage return
\f Form feed
\t Tab
\e Escape
\d Any digit (0-9)
\D Any non-digit
\w Any word character (alphanumeric characters and _)
\W Any non-word character
\s Any whitespace character
\S Any character except whitespace characters
Modifiers (set/unset options beginning from the current position)
(?i) Case insensitive
(?m) ^ and $ also consider newlines
(?s) . (dot) also matches newlines
(?x) Ignore whitespace characters completely
(?U) Non-greedy search (try to find the shortest possible match)
(?X)

Any backslash followed by a letter that has no special meaning throws an error.

For example, \a is not allowed. Without this modifier, \a is evaluated as a.

(?ims) Set multiple modifiers. Here, i, m, and s are set.
(?i-ms) Use - to unset one or more modifiers. Here, i is set, while m and s are unset.

(*CR)

(*LF)

(*CRLF)

Specifies which newline characters are considered by the special characters ^, $ and .(dot).

(*CR) = Only carriage return (\r) matches as newline character.
(*LF) = Only newline (\n) matches as newline character.
(*CRLF) = Both alternatives match as newline characters.

Example:

The pattern (*LF)(?ms)^abc.def$ matches the string abc\ndef.

Captures

\1

\2

...

\9

Back reference from the first capture up to the ninth capture.

Captures are defined by subpatterns, which are delimited by brackets. The captures are numbered by the opening brackets, from left to right. If you want to use more than 9 captures, you have to name a capture.

Example:

The pattern (a(b|c))([0-9]*) applied on the string ab123 results in the following captures:

\1 -> ab \2 -> b
\3 -> 123

\0 References the whole string.
(?P<name>) You can use this notation to name a capture and then reference the capture using \g<name>. This is particularly useful when you have a large number of captures or brackets.

Important: name must start with a non-digit.

Named captures do not affect the ascending numeration.

Example:

The pattern (?P<all>([a-zA-Z]*)(?P<digits>[0-9]*)) applied on the string user123 results in the following captures:

\1 and \g<all> -> user123 \2 -> user \3 and \g<digits> -> 123

(?:)

Brackets are not counted for captures.

Example:

The pattern (a(?:b|c))([0-9])* applied on the string ab123 results in the following captures:

\1 -> ab \2 -> 123

(?i:)

To simplify the syntax, you can set/unset modifiers between ? and :

Example:

(?i:hello)world will result in only the sub-pattern hello being case insensitive:

Copy
SELECT 
    col1, 
    col1 regexp_like '(?i:hello)world' as match_flag
FROM 
    values 'helloworld', 'Helloworld', 'HellOworld', 'helloWorld', 'helloWORLD' as t1(col1);
col1 match_flag
helloworld true
Helloworld true
HellOworld true
helloWorld false
helloWORLD false