Basic Language Elements

Comments in SQL

Comments are primarily used to facilitate the clarity and maintainability of SQL scripts. They can be added at the discretion of the user anywhere in the SQL script.

Comments in Exasol can be of two types: 

  • Line comments begin with the characters -- (two hyphens), indicating that the remaining part of the current line is a comment.
  • Block comments are delimited by the characters /* and */, and can span across several lines. All the characters between the delimiters are ignored.

Examples

-- This is a comment

SELECT * FROM dual;

-- Display the number of rows of table dual
SELECT count(*) FROM dual;

SELECT avg(salary) -- Average salary
FROM staff         -- From table staff
WHERE age>50;      -- Limitation to over 50 years old

/* Any number of comment rows can be
inserted between the delimiters
*/
SELECT * FROM dual;

SELECT /*This section is not evaluated!*/ * FROM dual;

SQL identifier

Each database object has a unique name. Within a SQL statement, object names are referenced as SQL identifiers. Database objects include schemas, tables, views, columns, functions, users, and roles. In Exasol, SQL identifiers can have a maximum length of 128 characters. Identifiers can be regular identifiers (unquoted) or delimited identifiers (quoted).

Regular Identifiers

Regular identifiers are unquoted.

As defined in the SQL standard, Exasol only allows a subset of the Unicode character set in regular identifiers. The first character of an SQL identifier is more restricted than the following ones:

  • First character: All letters of Unicode classes Lu (uppercase letters), LI (lowercase letters), Lt (title case letters), Lm (modifier letters), Lo (other letters), and Nl (letter numbers).
  • Following characters: All classes allowed for the first character, plus all letters of Unicode classes Mn (non-spacing marks), Mc (spacing combination marks), Nd (decimal numbers), Pc (connectors punctuations), Cf (formatting codes), and the Unicode character U+00B7 (middle dot).

For details about the characters included in each of the specified Unicode classes, see the Unicode standard.

When using the simple ASCII character set, a regular identifier may start with letters of the set {a-z, A-Z} and may further contain letters of set {a-z, A-Z, 0-9,_}.

A reserved word cannot be used as a regular identifier. For more information, see Reserved words.

Regular identifiers are not case-sensitive but are always stored in UPPERCASE letters in the database.

Examples

Identifier Name in the database
ABC ABC
aBc ABC
a123 A123

Delimited identifiers

Delimited identifiers are quoted, which here means that they are enclosed in (delimited by) double quotation marks. All characters except a dot (.) are allowed in a delimited identifier. In the case of users and roles you are allowed to use a dot, for example, in email addresses. This is because users and roles do not conflict with schema-qualified names. For more details, see Schema qualified names.

To use double quotation marks within a name you must include two successive double quotation marks. For example, "ab""c" indicates the name ab"c.

Except in users and roles, delimited identifiers are case-sensitive when stored in the database.

Examples
Identifier Name in the database
"ABC" ABC
"abc" abc
"_X_" _X_
"ab""c" ab"c

Schema qualified names

A schema qualified name consists of the schema name and the object name, separated by a dot. Using schema qualified names enables access to objects that are not located in the current schema. If the schema name is omitted, the current schema is implied.

Examples

SELECT my_column FROM my_schema.my_table;
SELECT "my_column" FROM "my_schema"."my_table";
SELECT my_schema.my_function(my_column) from my_schema.my_table;

Reserved words

Reserved words such as the keyword SELECT cannot be used as regular identifiers. If you want to create a table with the name SELECT, the name must be delimited with double quotation marks. Note that the name will then be case-sensitive, which means that SELECT as a table name is different from a table name such as Select or seLect.

A list of the reserved words in Exasol can be found in the EXA_SQL_KEYWORDS system table.

Regular expressions

Regular expressions, often shortened as regex or regexp, can be used to filter or replace strings with certain patterns. Exasol supports the Perl Compatible Regular Expressions (PCRE) dialect, which has a powerful syntax and exceeds the functionality of dialects like POSIX Basic Regular Expressions (POSIX BRE) or POSIX Extended Regular Expressions (POSIX ERE).

The following describes the basic functionality of regular expressions in Exasol. Detailed information about the PCRE dialect can be found on www.pcre.org.

Regular expressions can be used in the following functions and predicates:

The compiled code for PCRE by default uses an offset of 2 bytes stored in big-endian order. These are used, for example, to link from the start of a subpattern to its alternatives and its end. The use of 2 bytes per offset limits the size of the compiled pattern to 64K.

Examples

In the description of the corresponding scalar functions and predicates, you can find examples of regular expressions within SQL statements. The following examples demonstrate the general expressions: 

Regular Expressions Meaning
3[47][0-9]{13} American Express credit card number (15 digits and starting with 34 or 37)
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} IP addresses like 192.168.0.1
[+-]?[0-9]*\.?[0-9]+([+-]?[eE][0-9]+)? Floating point numbers like -1.23E-6 or 40
(19|20)\d\d-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01]) Date values with format YYYY-MM-DD, for example, 2010-12-01 (restricted to 1900-2099)
(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4} Email addresses like jane.doe@example.com
select regexp_substr(',match1,,match2', '(?<=,)[^,]*', 1,2); Empty strings are not valid. For example, the result in this example would be match2.

The expressions in these examples are simplified to provide clarity. Expressions used in a production environment, for example, an expression for identifying valid email addresses, are much more complex.

Pattern Elements

You can use the following elements in regular expressions:

Element Meaning within the regular expression
General Elements
. Matches any single character except a newline - if modifier (?s) is set, then newlines are also matched
^ Begin of a string - if modifier (?m) is set, the begin of a line is also matched (after a newline)
$ End of a string - if modifier (?m) is set, the end of a line is also matched (before a newline)
(?#) Comment which is not evaluated (for example: (?# This is a comment))
Quantifiers and Operators
{} Minimum/Maximum quantifier (for example, {1,5} for 1-5 occurrences or {3} for exactly three occurrences or {,7} for a maximum of 7 occurrences)
{}? Similar to {}, but non-greedy (i.e. the minimal number of occurrences is used)
? 0 or 1 quantifier (at most 1 time) - similar to {0,1}
* 0 or more quantifier (any) - similar to {0,}
*? Similar to *, but non-greedy (i.e. the minimal number of occurrences is used)
+ 1 or more quantifier (at least 1 time) - similar to {1,}
+? Similar to +, but non-greedy (i.e. the minimal number of occurrences is used)
| Alternative operator (for example, a|b|c for 'a', 'b' or 'c'). Please consider that the first matching alternative is used. Hence the order of the alternatives can be relevant
() Definition of a subpattern (for grouping, e.g. (abc)+ for at least 1 time the word 'abc'). Furthermore, captures are defined by the brackets (see below)
(?(?=pattern1)pattern2| pattern3) Depending whether the pattern in the first bracket matches, the following pattern is evaluated (e.g. (?(?=[0-9])[0-9]*|[a-z]*) Searches only for letters from a-z, if the first character is no digit).
Characters Classes via []
[] Definition of a character class (for example: [abc] for 'a', 'b', or 'c')
\ Escape character within a character class (e.g. [a\\] for 'a' or '\')
- Defines a range (for example: [0-9] for digits)
^ Negation of a character class (for example: [^0-9] for non-digits)
Usage of Backslash(\)
\ Escape character to be able to use special characters (for example: \+ for character '+')
\R Any newline character (newline or carriage return)
\n Newline
\r Carriage return
\f Form feed
\t Tab
\e Escape
\d Any digit (0-9)
\D Any non-digit
\w Any word character (alphanumeric characters and _)
\W Any non-word character
\s Any whitespace character
\S Any character except whitespace characters
Modifiers (set/unset options beginning from the current position)
(?i) Case insensitive
(?m) ^ and $ also consider newlines
(?s) . (period) also matches newlines
(?x) Ignore whitespace characters completely
(?U) Non-greedy search (tried to find the shortest possible match)
(?X) Any backslash that is followed by a letter that no special meaning cause an error . For example, \a is not allowed. Without this modifier, \a is evaluated as 'a'.
(?ims) Switch on multiple modifiers (i, m, and s are switched on)
(?i-ms) Switch off modifiers (i is switched on, m and s are switched off)

(*CR)

(*LF)

(*CRLF)

By the use of these three modifiers, you can specify which new line characters are considered by the special characters ^, $ and .(period).

(*CR) - Only carriage return (\r) matches as newline character
(*LF) - Only newline (\n) matches as newline character
(*CRLF) - Both alternatives match as newline characters

Example: pattern (*LF)(?ms)^abc.def$ matches the string 'abc\ndef'.

Captures

\1

\2

...

\9

Back-Reference from the first up to the ninth capture.

Captures are defined by subpatterns which are delimited by brackets. They are numbered by the opening brackets, from left to right. If you want to use more than 9 captures, you have to name a capture.

For Example:
pattern (a(b|c))([0-9]*) applied on the string 'ab123' results in the following captures:
\1      ab
\2      b
\3      123

\0 This special capture always references the whole string.
(?P<name>) With this notation, you can name a capture which should be used for clarity if you want to use many captures or brackets. Afterward, you can reference the capture via \g<name>.

Important: name must start with a non-digit. Named captures do not affect the ascending numeration.

For example:
pattern (?P<all>([a-zA-Z]*)(?P<digits>[0-9]*)) applied on the string 'user123' results in the following captures:
\1 and \g<all>          users123
\2                      user
\3 and \g<digits>       123

(?:)

Such brackets are not counted for captures.

For example:
pattern (a(?:b|c))([0-9])* applied on the string 'ab123' results in the following captures:
\1   ab
\2    123

(?i:)

To simplify the syntax, you can set/unset modifiers between ? and :

For example:
(?i:hello)world will result in only the sub-pattern ‘hello’ being case insensitive:

SELECT 
    col1, 
    col1 regexp_like '(?i:hello)world' as match_flag
FROM 
    values 'helloworld', 'Helloworld', 'HellOworld', 'helloWorld', 'helloWORLD' as t1(col1);
col1 match_flag
helloworld true
Helloworld true
HellOworld true
helloWorld false
helloWORLD false