SQL Preprocessor
This section explains how to use the SQL preprocessor in Exasol.
Exasol provides an SQL preprocessor that can preprocess all executed SQL statements. Using the preprocessor, unsupported
SQL constructs can be transformed into existing SQL features (see examples below). Additionally, you
can introduce syntactic sugar
by replacing simple constructs with more complex elements.
By default, the SQL preprocessor is deactivated. Using the statements ALTER SESSION and ALTER SYSTEM, you can define a script for a session or for the whole system that is responsible for the preprocessing of all SQL commands. Before an SQL statement is passed to the actual database compiler, the preprocessor does a text transformation. Within the script, you can get and set the original text and manipulate it by using the auxiliary library (sqlparsing). For details of scripting language, see Scripting.
Be cautious before activating SQL preprocessing for a whole system, since the SQL statements may stop working if preprocessing has been incorrectly implemented.
You can deactivate the preprocessing using the statements ALTER SESSION and ALTER SYSTEM (by setting the parameter SQL_PREPROCESSOR_SCRIPT to the NULL value), as these statements are deliberately excluded from the preprocessing.
For data security reasons, all statements that include passwords are excluded (CREATE USER, ALTER USER, CREATE CONNECTION, ALTER CONNECTION, IMPORT, EXPORT if the IDENTIFIED BY clause was specified).
In the auditing table EXA_DBA_AUDIT_SQL, a separate entry for the execution of the preprocessor script is added (EXECUTE SCRIPT and the original text within a comment). The executed transformed SQL statement is listed in another entry.
Library sqlparsing
Exasol provides a library with parsing capabilities for SQL text manipulation.
- Split into tokens
- tokenize()
- Identification of token types
- iswhitespace()
- iscomment()
- iswhitespaceorcomment()
- isidentifier()
- iskeyword()
- isstringliteral()
- isnumericliteral()
- isany()
- Normalizing a string
- normalize()
- Finding token sequences
- find()
- Access to the SQL text
- getsqltext()
- setsqltext()
Details
sqlparsing.tokenize(sqlstring)
Splits an input string into an array of strings which correspond to the tokens recognized by the database compiler. If you concatenate these tokens, you will get the original input string (including upper/lowercase, line breaks, whitespaces, and so on). Therefore, the equation table.concat(tokens)==sqlstring is valid.
The possible tokens are as follows:
- Valid SQL identifiers, for example, test.tab or "foo"."bar"
- Keywords, for example, SELECT
- String literals, for example, 'abcdef'
- Numerical literals without sign, for example, 123e4
- Connected whitespaces corresponding to the SQL standard
- Comments corresponding to the SQL standard (line and block comments)
- Multi-character tokens, for example, ':=', '::', '||', '->', '>>', '<<', '>=', '<=', '<>', '!=', '^='
- Single-character tokens, for example, '+', '-', '/', '*', '~'
--/
CREATE SCRIPT example(sql_text) AS
local tokens = sqlparsing.tokenize(sql_text)
for i=1,#tokens do
print(tokens[i])
end
/
EXECUTE SCRIPT example('SELECT dummy FROM dual') WITH OUTPUT;
sqlparsing.iscomment(tokenstring)
Returns whether the input string is a comment token.
sqlparsing.iswhitespace(tokenstring)
Returns whether the input string is a whitespace token.
sqlparsing.iswhitespaceorcomment(tokenstring)
Returns whether the input string is a whitespace or comment token. This function can be useful for function find(), because it filters all irrelevant tokens corresponding to the SQL standard.
sqlparsing.isidentifier(tokenstring)
Returns whether the input string is an identifier token.
sqlparsing.iskeyword(tokenstring)
Returns whether the input string is a SQL keyword token (for example, SELECT, FROM, TABLE). The functions isidentifier() and iskeyword() return both true for non-reserved keywords. Therefore, you can identify non-reserved keywords.
sqlparsing.isstringliteral(tokenstring)
Returns whether the input string is a string literal token.
sqlparsing.isnumericliteral(tokenstring)
Returns whether the input string is a numeric literal token.
sqlparsing.isany(tokenstring)
Always returns true. This can be useful if you want to find any first relevant token (as match function within the method find()).
sqlparsing.normalize(tokenstring)
Returns a normalized string for similar representations (upper/lowercase identifiers), on the basis of the following rules:
- Regular identifiers are transformed into uppercase letters, for example, dual -> DUAL
- Keywords are transformed in uppercase letters, for example, From -> FROM
- Whitespace-Token of any size are replaced by a single whitespace
- In numerical literals, an optional lowercase "e" is replaced by "E", for example, 1.2e34 -> 1.2E34
sqlparsing.find(tokenlist, startTokenNr, searchForward, searchSameLevel, ignoreFunction, match1, [match2, ... matchN])
Searches in the token list, starting from positions startTokenNr, forward or backward (searchForward), and optionally only within the current level of brackets (searchSameLevel), for the directly successive sequence of tokens which are matched by parameters match1, ... matchN. In that search process, all tokens that match by function ignoreFunction will be not considered by the match functions. Please note that only the first occurrence is returned. In case of multiple occurrences, find can be called multiple times with a start position which indicates where the search begins.
If the searched token sequence is found, then an array of size N is returned (in case of N match elements) whose X-th entry contains the position of the token within the token list which was matched by matchX. If the token sequence was not found, the function returns nil.
The details of the parameters are as follows:
Paramenter | Description |
---|---|
tokenlist |
List of tokens which is produced by the function tokenize. |
startTokenNr |
Number of the first token to be considered for the search. |
searchForward |
Defines whether the search should be applied forward (true) or backward (false). This affects only the direction by that the search process is moving across the list, but the match functions always search forward. That means that if you search the token sequence KEYWORD, IDENTIFIER, within the token list 'select' 'abc' 'from' 'dual', and start from position 3, then 'from' 'dual' will be matched and not 'from' 'abc', even when searching backward. If you start your search at position 2, then the backward search will return 'select' 'abc', and the forward search will return 'from' 'dual'. |
searchSameLevel |
Defines whether the search should be limited to the current level of brackets (true) or also beyond (false). This applies only to the match of the first token of the sequence. Subsequent tokens can also be located in more inner bracket levels. That means that the search of the token sequence '=' '(' 'SELECT' is also possible if it is constrained to the current level, although the 'SELECT' is located in the next inner bracket level. The option searchSameLevel is especially useful for finding the corresponding closing bracket, for example, of a subquery. Example: Search the closing bracket within the token sequence 'SELECT' 't1.x' '+' '(' 'SELECT' 'min' '(' 'y' ')' 'FROM' 't2' ')' 'FROM' 't1' which corresponds to the bracket at position 4: sqlparsing.find(tokens, 4, true, true, sqlparsing. iswhitespaceorcomment, ')'). |
ignoreFunction |
A function of type function(string)->bool is expected. The tokens for which the function ignoreFunction returns true will be ignored by the match functions. That means, you can specify tokens types which may occur within the sequence without breaking the match. In many cases, the function iswhitespaceorcomment is useful for that purpose. |
match1...matchN |
By match1..matchN, the searched token sequence is specified. These parameters should either be functions of type function(tokenstring)->bool or simple strings. A token sequence is searched where the first token matches match1, the second matches match2, and so on, while tokens in between are ignored if function ignoreFunction returns true. If a parameter is a string, then the comparison normalize(tokenstring)== normalize(matchstring) is applied. |
sqlparsing.getsqltext()
Returns the current SQL statement text.
This function is only available within the main SQL Preprocessor script.
sqlparsing.setsqltext(string)
Sets the SQL statement text to a new value which will be passed to the database compiler for execution.
This function is only available within the main SQL Preprocessor script.
Best Practice
For better management of the SQL Preprocessor, you should consider the followings:
- Extensively test a preprocessor script in your own session before activating it throughout the system.
- Implement the SQL processing by using separate auxiliary scripts and integrated in one main script which is just a wrapper to hand over the SQL text (for example. sqlparsing.setsqltext(myscript.preprocess(sqlparsing.getsqltext()))). This is because the functions getsqltext() and setsqltext() are only available within the preprocessing and not in normal script executions. By the separation you can test the processing on several test SQL constructs (on your own daily SQL history stored within a table) before activating the main script as preprocessor script.
- Ensure that all necessary privileges are granted to execute the preprocessor script. It is recommended that you start a test with a user without special rights. Otherwise, certain user groups could be blocked from executing any SQL statements.
- Make the preprocessing as simple as possible. Especially, query() and pquery() should only be used in exceptional cases if you activate the preprocessing globally, because all SQL queries will be decelerated, and a parallel access on similar tables increases the risk of transaction conflicts.
Examples
This section shows some examples for Preprocessor scripts that explain the functionality.
IF() function
In this example, the IF() function (currently not supported in Exasol) is transformed into an equivalent CASE WHEN expression.
--/
CREATE SCRIPT sql_preprocessing.transformIf() AS
function processIf(sqltext)
while (true) do
local tokens = sqlparsing.tokenize(sqltext)
local ifStart = sqlparsing.find(tokens,
1,
true,
false,
sqlparsing.iswhitespaceorcomment,
'IF',
'(' )
if (ifStart==nil) then
break;
end
local ifEnd = sqlparsing.find(tokens,
ifStart[2],
true,
true,
sqlparsing.iswhitespaceorcomment,
')')
if (ifEnd==nil) then
error("if statement not ended properly")
break;
end
local commas1 = sqlparsing.find(tokens,
ifStart[2]+1,
true,
true,
sqlparsing.iswhitespaceorcomment,
',' )
if (commas1==nil) then
error("invalid if function")
break;
end
local commas2 = sqlparsing.find(tokens,
commas1[1]+1,
true,
true,
sqlparsing.iswhitespaceorcomment,
',' )
if (commas2==nil) then
error("invalid if function")
break;
end
local ifParam1=table.concat(tokens, '', ifStart[2]+1, commas1[1]-1)
local ifParam2=table.concat(tokens, '', commas1[1]+1, commas2[1]-1)
local ifParam3=table.concat(tokens, '', commas2[1]+1, ifEnd[1]-1)
local caseStmt='CASE WHEN ('..ifParam1..') != 0 \
THEN ('..ifParam2..') \
ELSE ('..ifParam3..') END '
sqltext=table.concat(tokens, '',1,
ifStart[1]-1)..caseStmt..table.concat(tokens,
'',
ifEnd[1]+1)
end
return sqltext
end
/
--/
CREATE SCRIPT sql_preprocessing.preprocessIf() AS
import( 'sql_preprocessing.transformIf', 'transformIf' )
sqlparsing.setsqltext(
transformIf.processIf(sqlparsing.getsqltext()))
/
SELECT IF( 3+4 > 5, 6, 7 ) from dual;
ALTER SESSION SET sql_preprocessor_script=
sql_preprocessing.preprocessIf;
SELECT IF( 3+4 > 5, 6, 7 ) AS col1 FROM dual;
ls command
In this example, Unix command ls is transferred to the database. This command returns either the list of all objects within a schema or the list of all schemas if no schema is opened. Additionally, you can apply filters (case insensitive) for example, ls '%name%' to display all objects whose name contains the text 'name'.
--/
CREATE SCRIPT sql_preprocessing.addunixcommands() AS
function processLS(input, tokens, commandPos)
local result = query("SELECT CURRENT_SCHEMA")
local current_schema = result[1][1]
local returnText = ""
local searchCol = ""
if (current_schema==null) then
returnText = "SELECT schema_name FROM exa_schemas WHERE true"
searchCol = "schema_name"
elseif (current_schema=='SYS' or current_schema=='EXA_STATISTICS') then
returnText = "SELECT object_name, object_type FROM exa_syscat \
WHERE schema_name='"..current_schema.."'"
searchCol = "object_name"
else
returnText = "SELECT object_name, object_type FROM exa_all_objects \
WHERE root_type='SCHEMA' \
AND root_name='"..current_schema.."'"
searchCol = "object_name"
end
local addFilters = {}
local lastValid = commandPos
local foundPos = sqlparsing.find(tokens,
lastValid+1,
true,
false,
sqlparsing.iswhitespaceorcomment,
sqlparsing.isany)
while (not(foundPos==nil) )
do
local foundToken = tokens[foundPos[1]]
if (sqlparsing.isstringliteral(foundToken)) then
addFilters[#addFilters+1] = "UPPER("..searchCol..") \
LIKE UPPER("..foundToken .. ")"
elseif (not (sqlparsing.normalize(foundToken) == ';')) then
error("only string literals allowed as arguments for ls,\
but found '"..foundToken.."'")
end
lastValid = foundPos[1]
foundPos = sqlparsing.find(tokens,
lastValid+1,
true,
false,
sqlparsing.iswhitespaceorcomment,
sqlparsing.isany)
end
if ( #addFilters > 0 ) then
local filterText = table.concat(addFilters, " OR ")
return returnText.." AND ("..filterText..")".." ORDER BY "..searchCol
else
return returnText.." ORDER BY "..searchCol
end
end
function processUnixCommands(input)
local tokens = sqlparsing.tokenize(input)
local findResult = sqlparsing.find(tokens,
1,
true,
false,
sqlparsing.iswhitespaceorcomment,
sqlparsing.isany)
if (findResult==nil) then
return input
end
local command = tokens[findResult[1]]
if (sqlparsing.normalize( command )=='LS') then
return processLS(input, tokens, findResult[1])
end
return input;
end
/
--/
CREATE SCRIPT sql_preprocessing.preprocessWithUnixTools() AS
import( 'sql_preprocessing.addunixcommands', 'unixCommands' )
sqlparsing.setsqltext(
unixCommands.processUnixCommands(sqlparsing.getsqltext()));
/
ALTER SESSION SET sql_preprocessor_script=
sql_preprocessing.preprocessWithUnixTools;
OPEN SCHEMA sql_preprocessing;
LS '%unix%';
CLOSE SCHEMA;
LS;
ANY/ALL
ANY and ALL SQL constructs are currently not supported by Exasol. However, using the following script, you can add this functionality.
--/
CREATE SCRIPT sql_preprocessing.transformAnyAll() AS
function rebuildAnyAll(inputsqltext)
local sqltext = inputsqltext;
local tokens = sqlparsing.tokenize(sqltext);
local found = true;
local searchStart = 1;
-- search for sequence >|>=|<|<= ANY|ALL ( SELECT
repeat
local foundPositions =
sqlparsing.find(tokens,
searchStart,
true,
false,
sqlparsing.iswhitespaceorcomment,
function (token)
return (token=='<' or token=='<=' or token=='>'
or token=='>=' or token=='!=' or token=='<>'
or token=='=');
end, -- match <|<=|>|>=|=|!=|<>
function ( token )
local normToken = sqlparsing.normalize(token);
return (normToken=='ANY' or normToken=='SOME'
or normToken=='ALL');
end, -- match ANY|ALL
'(', -- match (
'SELECT' -- match SELECT
);
if (foundPositions==nil) then
found = false;
break;
end
local operatorPos = foundPositions[1];
local anyAllPos = foundPositions[2];
local openBracketPos = foundPositions[3];
searchStart = anyAllPos + 1
foundPositions = sqlparsing.find(tokens,
openBracketPos,
true,
true,
sqlparsing.iswhitespaceorcomment,
')');
if (foundPositions ~= nil) then
local closeBracketPos = foundPositions[1]
local operatorToken = tokens[operatorPos];
local anyOrAll = sqlparsing.normalize(tokens[anyAllPos]);
if (operatorToken=='<' or operatorToken=='<='
or operatorToken=='>' or operatorToken=='>=') then
-- now we have <|<=|>|>= ANY|ALL (SELECT <something> FROM
-- rebuild to <|<=|>|>= (SELECT MIN|MAX(<something>) FROM
local setfunction = 'MIN';
if ( ((anyOrAll=='ANY' or anyOrAll=='SOME') and
(operatorToken=='<' or operatorToken=='<=')
) or
(anyOrAll=='ALL' and (operatorToken=='>'
or operatorToken=='>=')
)
) then
setfunction = 'MAX';
end
tokens[anyAllPos] = '';
tokens[openBracketPos] =
'(SELECT ' .. setfunction .. '(anytab.anycol) FROM (';
tokens[closeBracketPos] = ') as anytab(anycol) )';
elseif (operatorToken=='=' and anyOrAll=='ALL') then
-- special rebuild for = ALL
-- rebuild to=(SELECT CASE WHEN COUNT(DISTINCT <something>)==1
-- THEN FIRST_VALUE(<something>) ELSE NULL END FROM
tokens[anyAllPos] = '';
tokens[openBracketPos] =
'(SELECT CASE WHEN COUNT(DISTINCT anytab.anycol) = 1 \
THEN FIRST_VALUE(anytab.anycol) ELSE NULL END FROM (';
tokens[closeBracketPos] = ') as anytab(anycol) )';
elseif ((operatorToken=='!=' or operatorToken=='<>')
and anyOrAll=='ALL') then
-- special rebuild for != ALL
-- rebuild to NOT IN
tokens[operatorPos] = ' NOT IN '
tokens[anyAllPos] = ''
elseif (operatorToken=='!=' and
(anyOrAll=='ANY' or anyOrAll=='SOME')) then
--special rebuild for != ANY, rebuild to
-- CASE WHEN (SELECT COUNT(DISTINCT <something>) FROM ...) == 1
-- THEN operand != (SELECT FIRST_VALUE(<something>) FROM ...)
-- ELSE operand IS NOT NULL END
--note: This case would normally require to determine the operand
-- which requires full understanding of a value expression
-- in SQL standard which is nearly impossible in
-- preprocessing (and very susceptible to errors)
-- so we evaluate the
-- SELECT COUNT(DISTINCT <something) FROM ...) == 1 here and
-- insert the correct expression
--
-- first preprocess the inner query
local queryText = table.concat(tokens,
'',
openBracketPos,
closeBracketPos)
queryText = rebuildAnyAll( queryText )
-- since the subquery was already processed we can continue
-- searching *after* the SELECT
searchStart = closeBracketPos + 1
local distinctQueryText='SELECT COUNT(DISTINCT anytab.anycol) \
FROM '..queryText..' AS anytab(anycol)'
local success, result = pquery(distinctQueryText)
if (success) then
if (result[1][1] == 1) then
tokens[anyAllPos] ='(SELECT FIRST_VALUE(anytab.anycol) \
FROM '..queryText..' AS anytab(anycol))'
else
tokens[operatorPos] = ' IS NOT NULL '
tokens[anyAllPos] = ''
end
-- clear all tokens of the SELECT
for curTokenNr=openBracketPos,closeBracketPos do
tokens[curTokenNr] = ''
end
end
end
end
until found == false;
return table.concat(tokens);
end
/
--/
CREATE SCRIPT sql_preprocessing.preprocessAnyAll AS
import('sql_preprocessing.transformAnyAll', 'anyallparser');
sqlparsing.setsqltext(
anyallparser.rebuildAnyAll(sqlparsing.getsqltext()))
/
CREATE TABLE t1 (i INT);
INSERT INTO t1 VALUES 1,2,3,4,5,6,7,8,9,10;
CREATE TABLE t2 (j INT);
INSERT INTO t2 VALUES 5,6,7;
SELECT i FROM t1 WHERE i < ALL(SELECT j FROM t2);
Error: [0A000] Feature not supported: comparison with quantifier ALL
ALTER SESSION SET sql_preprocessor_script=
sql_preprocessing.preprocessAnyAll;
SELECT i FROM t1 WHERE i < ALL(SELECT j FROM t2);