Comments in SQL

Comments are primarily used to facilitate the clarity and maintainability of SQL scripts. They can be added at the discretion of the user anywhere in the SQL script.

Comments in Exasol can be of two types: 

  • Line comments begin with the character '--' and indicate that the remaining part of the current line is a comment.
  • Block comments are indicated by the characters '/*' and '*/', and can span across several lines. All the characters between the delimiters are ignored.
Examples
-- This is a comment

SELECT * FROM dual;

-- Display the number of rows of table dual
SELECT count(*) FROM dual;

SELECT avg(salary) -- Average salary
FROM staff         -- From table staff
WHERE age>50;      -- Limitation to over 50 years old

/* Any number of comment rows can be
inserted between the delimiters
*/
SELECT * FROM dual;

SELECT /*This section is not evaluated!*/ * FROM dual;

SQL Identifier

Each database object has a unique name. Within an SQL statement, these names are referenced as SQL identifiers. These database objects include schemas, tables, views, columns, functions, users, and roles. In Exasol, SQL identifiers can have a maximum length of 128 characters. One differentiates between regular identifiers and identifiers in quotation marks.

Regular Identifiers

Regular identifiers are stated without quotation marks. As defined in the SQL standard, Exasol only allows a subset of the unicode characters. The first letter is more restricted than the following ones. The detailed list of character of the specified classes are defined in the unicode standard.

  • First Letter: All letters of unicode classes Lu (upper-case letters), LI (lower-case letters), Lt (title-case letters), Lm (modifier letters), Lo (other letters), and Nl(letter numbers).
  • Following Letters: The classes of the first letter (aforementioned) and additionally all letters of unicode classes Mn (non-spacing marks), Mc (spacing combination marks), Nd (decimal numbers), Pc (connectors punctuations), Cf (formatting codes), and the unicode character U+00B7 (middle dot).

Regarding the simple ASCII character set, these rules denote that a regular identifier may start with letters of the set {a-z, A-Z} and may further contain letters of set {a-z, A-Z, 0-9,_}.

A further restriction is that a reserved word cannot be used as regular identifiers.

If you want to use characters which are prohibited for regular identifiers, you can use Delimited Identifiers. For example, if you want to use the word table as identifier, you have to quote it ("table"), since it is a reserved word.

Regular identifiers are always stored in upper case. They are not case-sensitive. Therefore, in the below example, the two identifiers ABC and abc are equal.

Regular identifiers are stored in upper case.

Examples
Identifier Name in the DB
ABC ABC
aBc ABC
a123 A123

Delimited Identifiers

These identifiers are names enclosed in double quotation marks. Any characters can be contained within the quotation marks except the dot ('.'). In the case of users and roles, you are allowed to use the dot ('.'), for example, in email addresses. This is possible because users and roles do not conflict with the schema-qualified objects. For more information, refer to the Schema-Qualified Names section.

When you want to use double quotation marks in the name, you must include two double quotation marks next to one another. For example, "ab""c" indicates the name ab"c.

Note: Except in users and roles, delimited identifiers in quotation marks are always case-sensitive when stored in the database.

Examples
Identifier Name in the DB
"ABC" ABC
"abc" abc
"_X_" _X_
"ab""c" ab"c

Schema-Qualified Names

Database objects, when called via schema-qualified names, the names are separated with a period. This enables access to schema objects which are not located in the current schema.

Examples
SELECT my_column FROM my_schema.my_table;
SELECT "my_column" FROM "my_schema"."my_table";
SELECT my_schema.my_function(my_column) from my_schema.my_table;

Reserved Words

There are a series of reserved words that cannot be used as regular identifiers. For example, the keyword SELECT. It is a reserved word. If you need to create a table with the name SELECT, this is only possible if the name is within double quotation marks. Therefore, "SELECT" as a table name is different from a table name such as "Select" or "seLect".

A list of reserved words in Exasol can be found in the EXA_SQL_KEYWORDS system table.

Regular Expressions

Regular expressions can be used to filter or replace strings with certain patterns. Exasol supports the Perl Compatible Regular Expressions (PCREClosed Perl Compatible Regular Expressions) dialect, which has a powerful syntax and exceeds the functionality of dialects like POSIXClosed Portable Operating System Interface [for Unix] Basic Regular Expressions (POSIX BRE) or POSIX Extend Regular Expressions (POSIX ERE). This chapter describes the basic functionality of regular expressions in Exasol. Detailed information about the PCRE dialect can be found on www.pcre.org.

Regular expressions can be used in the following functions and predicates:

The compiled code for PCRE uses an offset of 2-bytes by default (always stored in big-endian order). These are used, for example, to link from the start of a subpattern to its alternatives and its end. The use of 2 bytes per offset limits the size of the compiled pattern to 64K.

Examples

In the description of the corresponding scalar functions and predicates, you can find examples of regular expressions within SQL statements. The following examples demonstrates the general expressions: 

Regular Expressions Meaning
3[47][0-9]{13} American Express credit card number (15 digits and starting with 34 or 37)
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} IP addresses like 192.168.0.1
[+-]?[0-9]*\.?[0-9]+([+-]?[eE][0-9]+)? Floating point numbers like -1.23E-6 or 40
(19|20)\d\d-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01]) Date values with format YYY-MM-DD,for example, 2010-12-01 (restricted to 1900-2099)
(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4} Email addresses like hello.world@yahoo.com

Note: These example expressions are simplified to provide clarity. The productive expressions, for example, identifying valid email addresses are much more complex.

Pattern Elements

You can use the following elements in regular expressions:

Element Meaning within the regular expression
General Elements
. Matches any single character except a newline - if modifier (?s) is set, then newlines are also matched
^ Begin of a string - if modifier (?m) is set, the begin of a line is also matched (after a newline)
$ End of a string - if modifier (?m) is set, the end of a line is also matched (before a newline)
(?#) Comment which is not evaluated (for example: (?# This is a comment))
Quantifiers and Operators
{} Minimum/Maximum quantifier (for example, {1,5} for 1-5 occurrences or {3} for exactly three occurrences or {,7} for at maximum 7 occurrences)
{}? Similar to {}, but non-greedy (i.e. the minimal number of occurrences is used)
? 0 or 1 quantifier (at most 1 time) - similar to {0,1}
* 0 or more quantifier (any) - similar to {0,}
*? Similar to *, but non-greedy (i.e. the minimal number of occurrences is used)
+ 1 or more quantifier (at least 1 time) - similar to {1,}
+? Similar to +, but non-greedy (i.e. the minimal number of occurrences is used)
| Alternative operator (for example, a|b|c for 'a', 'b' or 'c'). Please consider that the first matching alternative is used. Hence the order of the alternatives can be relevant
() Definition of a subpattern (for grouping, e.g. (abc)+ for at least 1 time the word 'abc'). Furthermore, captures are defined by the brackets (see below)
(?(?=pattern1)pattern2| pattern3) Depending whether the pattern in the first bracket matches, the following pattern is evaluated (e.g. (?(?=[0-9])[0-9]*|[a-z]*) Searches only for letters from a-z, if the first character is no digit).
Characters Classes via []
[] Definition of a character class (for example: [abc] for 'a', 'b', or 'c')
\ Escape character within a character class (e.g. [a\\] for 'a' or '\')
- Defines a range (for example: [0-9] for digits)
^ Negation of a character class (for example: [^0-9] for non-digits)
Usage of Backslash(\)
\ Escape character to be able to use special characters (for example: \+ for character '+')
\R Any newline character (newline or carriage return)
\n Newline
\r Carriage return
\f Form feed
\t Tab
\e Escape
\d Any digit (0-9)
\D Any non-digit
\w Any word character (alphanumeric characters and _)
\W Any non-word character
\s Any white space character
\S Any character except a white space character
Modifiers (set/unset options beginning from the current position)
(?i) Case insensitive
(?m) ^ and $ also consider newlines
(?s) . (period) also matches newlines
(?x) Ignore white space characters completely
(?U) Non-greedy search (tried to find the shortest possible match)
(?X) Any backslash that is followed by a letter that no special meaning cause an error . For example, \a is not allowed. Without this modifier, \a is evaluated as 'a'.
(?ims) Switch on multiple modifiers (i, m, and s are switched on)
(?i-ms) Switch off modifiers (i is switched on, m and s are switched off)

(*CR)

(*LF)

(*CRLF)

By the use of these three modifiers, you can specify which new line characters are considered by the special characters ^, $ and .(period).

(*CR) - Only carriage return (\r) matches as newline character
(*LF) - Only newline (\n) matches as newline character
(*CRLF) - Both alternatives match as newline characters

Example: pattern (*LF)(?ms)^abc.def$ matches the string 'abc\ndef'.

Captures

\1

\2

...

\9

Back-Reference from the first up to the ninth capture.

Captures are defined by subpatterns which are delimited by brackets. They are numbered by the opening brackets, from left to right. If you want to use more than 9 captures, you have to name a capture.

For Example:
pattern (a(b|c))([0-9]*) applied on the string 'ab123' results in the following captures:
\1     ab
\2      b
\3      123

\0 This special capture always references the whole string.
(?P<name>) With this notation, you can name a capture which should be used for clarity if you want to use many captures or brackets. Afterward, you can reference the capture via \g<name>.

Important: name must start with a non-digit. Named captures doesn't affect the ascending numeration.

For example:
pattern (?P<all>([a-zA-Z]*)(?P<digits>[0-9]*)) applied on the string 'user123' results in the following captures:
\1 and \g<all >        users123
\2                      user
\3 and \g<digits>       123

(?:)

Such brackets are not counted for captures.

For example:
pattern (a(?:b|c))([0-9])* applied on the string 'ab123' results in the following captures:
\1   ab
\2    123

(?i:) To simplify the syntax, you can set/unset modifiers between ? and :