SQL Preprocessor
This article explains how to use the SQL preprocessor in Exasol.
Introduction
Exasol provides a SQL preprocessor that can preprocess all executed SQL statements. By using the preprocessor, unsupported
SQL constructs can be transformed into existing SQL features (see examples below). Additionally, you
can introduce syntactic sugar
by replacing simple constructs with more complex elements.
In Exasol 2025.1.5 and later, preprocessing can be implemented using SQL preprocessing scripts written in the same languages that support virtual schema adapter scripts, which currently includes Python 3 and Java, or using scripting programs written in Lua. Earlier versions of Exasol only support preprocessing with Lua scripting programs.
When using scripting programs written in Lua, you can get and set the original text and manipulate it within the script by using the auxiliary library sqlparsing. To learn more about the Lua scripting language, see Scripting.
The SQL preprocessor is deactivated by default. By using the statements ALTER SESSION and ALTER SYSTEM you can define a script for a session or for the whole system that is responsible for the preprocessing of all SQL commands. Before an SQL statement is passed to the actual database compiler, the preprocessor does a text transformation.
Make sure that preprocessing is correctly implemented before activating it for a whole system, since incorrect implementation of preprocessing may cause SQL statements to stop working.
You can deactivate preprocessing using the statements ALTER SESSION and ALTER SYSTEM (by setting the parameter SQL_PREPROCESSOR_SCRIPT to the NULL value), since these statements are deliberately
excluded from the preprocessing.
For data security reasons, all statements that include
passwords are excluded from preprocessing: CREATE USER, ALTER USER, CREATE CONNECTION, ALTER CONNECTION, IMPORT,
and EXPORT if the IDENTIFIED BY clause was specified.
In the auditing table EXA_DBA_AUDIT_SQL, a separate entry for the execution of the preprocessor script is
added (EXECUTE SCRIPT and the original text within a comment). The executed transformed SQL statement is
listed in another entry.
Preprocessing scripts
In Exasol 2025.1.5 and later, preprocessing can be implemented using SQL preprocessing scripts written in the languages that support virtual schema adapter scripts, which currently includes Python 3 and Java. Earlier versions of Exasol only support preprocessing with Lua scripting programs.
A preprocessor script is defined using the syntax CREATE <language> PREPROCESSOR SCRIPT <name> AS .... For Lua, the existing syntax CREATE LUA SCRIPT is also supported for backwards compatibility.
For languages other than LUA, the preprocessor script must implement a callback function. The callback functions will receive the original SQL statement as a string parameter and must return the preprocessed SQL statement. For more details, see Preprocessor script callback function (Java) and Preprocessor script callback (Python).
| Language | Preprocessor script API |
|---|---|
| Java | String adapterCall(final ExaMetadata metadata, final String sqlStatement)
|
| Python 3 | adapter_call(sql_statement)
|
| Lua | sqlparsing.getsqltext(), sqlparsing.setsqltext(string), and utility functions in sqlparsing
|
Use the preprocessor scripts in the same way as Lua scripting programs:
-
Set the session or system variable
SQL_PREPROCESSOR_SCRIPTto the name of the preprocessor script to enable the preprocessing. -
To stop the preprocessing, set the session or system variable
SQL_PREPROCESSOR_SCRIPTtoNULL. -
To drop a preprocessor script, use the existing
DROP SCRIPT <name>statement. -
The required privileges are the same as for Lua scripting programs.
Differences between preprocessing scripts and Lua scripting programs:
-
In the system tables EXA_ALL_SCRIPTS, EXA_DBA_SCRIPTS, and EXA_USER_SCRIPTS, scripts created through the new syntax are listed with the new script type PREPROCESSOR.
-
EXECUTE SCRIPT cannot be used to call preprocessor scripts. For easier development, you can move the implementation to a different script type and import it into the preprocessor script.
SQL parsing library (Lua)
For scripting programs written in Lua, Exasol provides a library with the following parsing capabilities for SQL text manipulation:
| Split into tokens |
|
| Identification of token types |
|
| Normalizing a string |
|
| Finding token sequences |
|
| Access to the SQL text |
|
Details
-
sqlparsing.tokenize(sqlstring) -
Splits an input string into an array of strings which correspond to the tokens recognized by the database compiler.
If you concatenate these tokens, you will get the original input string (including upper/lowercase, line breaks, whitespaces, and so on). Therefore, the equation
table.concat(tokens)==sqlstringis valid.The possible tokens are as follows:
- Valid SQL identifiers, for example,
test.tabor"foo"."bar" - Keywords, for example,
SELECT - String literals, for example,
'abcdef' - Numerical literals without sign, for example,
123e4 - Connected whitespaces corresponding to the SQL standard
- Comments corresponding to the SQL standard (line and block comments)
- Multi-character tokens, for example,
':=','::','||','->','>>','<<','>=','<=','<>','!=','^=' - Single-character tokens, for example,
'+','-','/','*','~'
Copy--/
CREATE SCRIPT example(sql_text) AS
local tokens = sqlparsing.tokenize(sql_text)
for i=1,#tokens do
print(tokens[i])
end
/
EXECUTE SCRIPT example('SELECT dummy FROM dual') WITH OUTPUT; - Valid SQL identifiers, for example,
-
sqlparsing.iscomment(tokenstring) -
Returns whether the input string is a comment token.
-
sqlparsing.iswhitespace(tokenstring) -
Returns whether the input string is a whitespace token.
-
sqlparsing.iswhitespaceorcomment(tokenstring) -
Returns whether the input string is a whitespace or comment token.
This function can be useful for function
find(), because it filters all irrelevant tokens corresponding to the SQL standard. -
sqlparsing.isidentifier(tokenstring) -
Returns whether the input string is an identifier token.
-
sqlparsing.iskeyword(tokenstring) -
Returns whether the input string is a SQL keyword token (for example,
SELECT,FROM,TABLE).The functions
isidentifier()andiskeyword()return both true for non-reserved keywords. Therefore, you can identify non-reserved keywords. -
sqlparsing.isstringliteral(tokenstring) -
Returns whether the input string is a string literal token.
-
sqlparsing.isnumericliteral(tokenstring) -
Returns whether the input string is a numeric literal token.
-
sqlparsing.isany(tokenstring) -
Always returns true.
This can be useful if you want to find any first relevant token (as match function within the method
find()). -
sqlparsing.normalize(tokenstring) -
Returns a normalized string for similar representations (upper/lowercase identifiers), on the basis of the following rules:
- Regular identifiers are transformed into uppercase letters, for example,
dual->DUAL - Keywords are transformed in uppercase letters, for example,
From->FROM - Whitespace-Token of any size are replaced by a single whitespace
- In numerical literals, an optional lowercase "e" is replaced by "E", for example,
1.2e34->1.2E34
- Regular identifiers are transformed into uppercase letters, for example,
-
sqlparsing.find(tokenlist, startTokenNr, searchForward, searchSameLevel, ignoreFunction, match1, [match2, ... matchN]) -
Searches in the token list for a directly successive sequence of tokens matched by parameters
match1, ... matchN, starting from positionsstartTokenNr, forward or backward (searchForward), and optionally only within the current level of brackets (searchSameLevel). Tokens that match by the functionignoreFunctionwill be ignored in the search.Only the first occurrence is returned. In case of multiple occurrences,
findcan be called multiple times with a start position that indicates where the search begins.If the searched token sequence is found, then an array of size N is returned (in case of N match elements) whose X-th entry contains the position of the token within the token list which was matched by
matchX. If the token sequence was not found, the function returnsnil.Explanation of
sqlparsing.findparameters:Parameter Description tokenlistList of tokens which is produced by the function
tokenize.startTokenNrNumber of the first token to be considered for the search.
searchForwardDefines whether the search should be applied forward (
true) or backward (false).This affects only the direction by which the search process moves across the list. The match functions always searches forward. This means that if you search the token sequence KEYWORD, IDENTIFIER, within the token list 'select' 'abc' 'from' 'dual', and start from position 3, then 'from' 'dual' will be matched and not 'from' 'abc', even when searching backward. If you start your search at position 2, then the backward search will return 'select' 'abc', and the forward search will return 'from' 'dual'.
searchSameLevelDefines whether the search should be limited to the current level of brackets (
true) or also beyond (false).This option applies only to the match of the first token of the sequence. Subsequent tokens can also be located in more inner bracket levels. That means that the search of the token sequence '=' '(' 'SELECT' is also possible if it is constrained to the current level, although the 'SELECT' is located in the next inner bracket level. This is especially useful for finding the corresponding closing bracket, for example of a subquery.
Example: Search the closing bracket within the token sequence 'SELECT' 't1.x' '+' '(' 'SELECT' 'min' '(' 'y' ')' 'FROM' 't2' ')' 'FROM' 't1' which corresponds to the bracket at position 4:
sqlparsing.find(tokens, 4, true, true, sqlparsing.iswhitespaceorcomment, ')').ignoreFunctionA function of type function(string)->bool is expected. The tokens for which the function
ignoreFunctionreturns true will be ignored by the match functions.This means that you can specify token types that may occur within the sequence without breaking the match. In many cases, the function
iswhitespaceorcommentis useful for that purpose.match1...matchNSpecifies the searched token sequence.
These parameters should either be functions of type function(tokenstring)->bool or simple strings. A token sequence is searched where the first token matches
match1, the second matchesmatch2, and so on, while tokens in between are ignored if functionignoreFunctionreturns true. If a parameter is a string, then the comparisonnormalize(tokenstring)== normalize(matchstring)is applied. -
sqlparsing.getsqltext() -
Returns the current SQL statement text.
This function is only available within the main SQL Preprocessor script.
-
sqlparsing.setsqltext(string) -
Sets the SQL statement text to a new value which will be passed to the database compiler for execution.
This function is only available within the main SQL Preprocessor script.
Best practices
For better management of the SQL preprocessor, consider the following best practices:
-
Extensively test a preprocessor script in your own session before activating it throughout the system, since errors in the preprocessing script may cause SQL statements to stop working.
-
Implement SQL processing by using separate auxiliary scripts that are integrated in one main script, which then acts as a wrapper that hands over the SQL text. For example:
sqlparsing.setsqltext(myscript.preprocess(sqlparsing.getsqltext())).The functionsgetsqltext()andsetsqltext()are only available within the preprocessing, and not in normal script executions. By separating preprocessing into several scripts, you can test the processing on several test SQL constructs (on your own daily SQL history, stored within a table) before activating the main script as a preprocessor script. -
Make sure that all necessary privileges are granted to execute the preprocessor script. We recommend that you start a test with a user without special rights. Otherwise, certain user groups could be blocked from executing any SQL statements.
-
Make the preprocessing as simple as possible. Especially,
query()andpquery()should only be used in exceptional cases if you activate preprocessing globally, since all SQL queries will be decelerated and a parallel access on similar tables increases the risk of transaction conflicts.
Examples - Preprocessor scripts
Python example using SQLGlot
Exasol does not support the non-standard syntax "SELECT TOP N ... ORDER BY ..." used by some other databases, including SQL Server. However, the SQLGlot library which is included in the PYTHON3 script language container can rewrite this syntax into a form that Exasol does support.
-- Setup: Schema
ALTER SESSION SET SQL_PREPROCESSOR_SCRIPT = NULL;
CREATE SCHEMA sql_preprocessor;
OPEN SCHEMA sql_preprocessor;
CREATE TABLE example_users(name VARCHAR(100));
INSERT INTO example_users VALUES 'A', 'B', 'C', 'D', 'E', 'F';
-- Setup: Create a preprocessor script to rewrite TSQL queries into Exasol syntax.
--/
CREATE PYTHON3 PREPROCESSOR SCRIPT sql_preprocessor.python_script AS
import sqlglot
def adapter_call(sql_statement):
translated = sqlglot.transpile(sql_statement, read="tsql", write="exasol")
return translated[0]
/
-- Observed: Syntax error
SELECT TOP 5 * FROM example_users ORDER BY name;
-- Enable the preprocessor script.
ALTER SESSION SET SQL_PREPROCESSOR_SCRIPT = sql_preprocessor.python_script;
-- Observed: Success
SELECT TOP 5 * FROM example_users ORDER BY name;
Simple Java example
--/
CREATE JAVA PREPROCESSOR SCRIPT sql_preprocessor.java_preprocessor AS
class JAVA_PREPROCESSOR {
public static String adapterCall(final ExaMetadata metadata, final String sqlStatement) throws Exception {
// Implement the preprocessing here.
return sqlStatement;
}
}
/
Simple Lua example using the new syntax
--/
CREATE LUA PREPROCESSOR SCRIPT sql_preprocessor.lua_preprocessor AS
local statement = sqlparsing.getsqltext()
-- Implement the preprocessing here.
sqlparsing.setsqltext(statement)
/
Examples - Scripting programs
IF() function
In this example, the IF() function (currently not supported in Exasol) is transformed into an equivalent CASE WHEN
expression.
-- Setup: Schema
CREATE SCHEMA sql_preprocessing;
OPEN SCHEMA sql_preprocessing;
--/
CREATE SCRIPT sql_preprocessing.transformIf() AS
function processIf(sqltext)
while (true) do
local tokens = sqlparsing.tokenize(sqltext)
local ifStart = sqlparsing.find(tokens,
1,
true,
false,
sqlparsing.iswhitespaceorcomment,
'IF',
'(' )
if (ifStart==nil) then
break;
end
local ifEnd = sqlparsing.find(tokens,
ifStart[2],
true,
true,
sqlparsing.iswhitespaceorcomment,
')')
if (ifEnd==nil) then
error("if statement not ended properly")
break;
end
local commas1 = sqlparsing.find(tokens,
ifStart[2]+1,
true,
true,
sqlparsing.iswhitespaceorcomment,
',' )
if (commas1==nil) then
error("invalid if function")
break;
end
local commas2 = sqlparsing.find(tokens,
commas1[1]+1,
true,
true,
sqlparsing.iswhitespaceorcomment,
',' )
if (commas2==nil) then
error("invalid if function")
break;
end
local ifParam1=table.concat(tokens, '', ifStart[2]+1, commas1[1]-1)
local ifParam2=table.concat(tokens, '', commas1[1]+1, commas2[1]-1)
local ifParam3=table.concat(tokens, '', commas2[1]+1, ifEnd[1]-1)
local caseStmt='CASE WHEN ('..ifParam1..') != 0 \
THEN ('..ifParam2..') \
ELSE ('..ifParam3..') END '
sqltext=table.concat(tokens, '',1,
ifStart[1]-1)..caseStmt..table.concat(tokens,
'',
ifEnd[1]+1)
end
return sqltext
end
/
--/
CREATE SCRIPT sql_preprocessing.preprocessIf() AS
import( 'sql_preprocessing.transformIf', 'transformIf' )
sqlparsing.setsqltext(
transformIf.processIf(sqlparsing.getsqltext()))
/
-- syntax error, since preprocessor not activated
SELECT IF( 3+4 > 5, 6, 7 ) from dual;
ALTER SESSION SET sql_preprocessor_script=
sql_preprocessing.preprocessIf;
SELECT IF( 3+4 > 5, 6, 7 ) AS col1 FROM dual;
ls command
In this example, Unix command ls is transferred to the database. This command returns either
the list of all objects within a schema or the list of all schemas if no schema is opened. Additionally, you can apply
filters (case insensitive) for example, ls '%name%' to display all objects whose name contains the text 'name'.
--/
CREATE SCRIPT sql_preprocessing.addunixcommands() AS
function processLS(input, tokens, commandPos)
local result = query("SELECT CURRENT_SCHEMA")
local current_schema = result[1][1]
local returnText = ""
local searchCol = ""
if (current_schema==null) then
returnText = "SELECT schema_name FROM exa_schemas WHERE true"
searchCol = "schema_name"
elseif (current_schema=='SYS' or current_schema=='EXA_STATISTICS') then
returnText = "SELECT object_name, object_type FROM exa_syscat \
WHERE schema_name='"..current_schema.."'"
searchCol = "object_name"
else
returnText = "SELECT object_name, object_type FROM exa_all_objects \
WHERE root_type='SCHEMA' \
AND root_name='"..current_schema.."'"
searchCol = "object_name"
end
local addFilters = {}
local lastValid = commandPos
local foundPos = sqlparsing.find(tokens,
lastValid+1,
true,
false,
sqlparsing.iswhitespaceorcomment,
sqlparsing.isany)
while (not(foundPos==nil) )
do
local foundToken = tokens[foundPos[1]]
if (sqlparsing.isstringliteral(foundToken)) then
addFilters[#addFilters+1] = "UPPER("..searchCol..") \
LIKE UPPER("..foundToken .. ")"
elseif (not (sqlparsing.normalize(foundToken) == ';')) then
error("only string literals allowed as arguments for ls,\
but found '"..foundToken.."'")
end
lastValid = foundPos[1]
foundPos = sqlparsing.find(tokens,
lastValid+1,
true,
false,
sqlparsing.iswhitespaceorcomment,
sqlparsing.isany)
end
if ( #addFilters > 0 ) then
local filterText = table.concat(addFilters, " OR ")
return returnText.." AND ("..filterText..")".." ORDER BY "..searchCol
else
return returnText.." ORDER BY "..searchCol
end
end
function processUnixCommands(input)
local tokens = sqlparsing.tokenize(input)
local findResult = sqlparsing.find(tokens,
1,
true,
false,
sqlparsing.iswhitespaceorcomment,
sqlparsing.isany)
if (findResult==nil) then
return input
end
local command = tokens[findResult[1]]
if (sqlparsing.normalize( command )=='LS') then
return processLS(input, tokens, findResult[1])
end
return input;
end
/
--/
CREATE SCRIPT sql_preprocessing.preprocessWithUnixTools() AS
import( 'sql_preprocessing.addunixcommands', 'unixCommands' )
sqlparsing.setsqltext(
unixCommands.processUnixCommands(sqlparsing.getsqltext()));
/
ALTER SESSION SET sql_preprocessor_script=
sql_preprocessing.preprocessWithUnixTools;
OPEN SCHEMA sql_preprocessing;
LS '%unix%';
CLOSE SCHEMA;
LS;
ANY/ALL
ANY and ALL SQL constructs are currently not supported by Exasol. However, using the following script, you can add this functionality.
--/
CREATE SCRIPT sql_preprocessing.transformAnyAll() AS
function rebuildAnyAll(inputsqltext)
local sqltext = inputsqltext;
local tokens = sqlparsing.tokenize(sqltext);
local found = true;
local searchStart = 1;
-- search for sequence >|>=|<|<= ANY|ALL ( SELECT
repeat
local foundPositions =
sqlparsing.find(tokens,
searchStart,
true,
false,
sqlparsing.iswhitespaceorcomment,
function (token)
return (token=='<' or token=='<=' or token=='>'
or token=='>=' or token=='!=' or token=='<>'
or token=='=');
end, -- match <|<=|>|>=|=|!=|<>
function ( token )
local normToken = sqlparsing.normalize(token);
return (normToken=='ANY' or normToken=='SOME'
or normToken=='ALL');
end, -- match ANY|ALL
'(', -- match (
'SELECT' -- match SELECT
);
if (foundPositions==nil) then
found = false;
break;
end
local operatorPos = foundPositions[1];
local anyAllPos = foundPositions[2];
local openBracketPos = foundPositions[3];
searchStart = anyAllPos + 1
foundPositions = sqlparsing.find(tokens,
openBracketPos,
true,
true,
sqlparsing.iswhitespaceorcomment,
')');
if (foundPositions ~= nil) then
local closeBracketPos = foundPositions[1]
local operatorToken = tokens[operatorPos];
local anyOrAll = sqlparsing.normalize(tokens[anyAllPos]);
if (operatorToken=='<' or operatorToken=='<='
or operatorToken=='>' or operatorToken=='>=') then
-- now we have <|<=|>|>= ANY|ALL (SELECT <something> FROM
-- rebuild to <|<=|>|>= (SELECT MIN|MAX(<something>) FROM
local setfunction = 'MIN';
if ( ((anyOrAll=='ANY' or anyOrAll=='SOME') and
(operatorToken=='<' or operatorToken=='<=')
) or
(anyOrAll=='ALL' and (operatorToken=='>'
or operatorToken=='>=')
)
) then
setfunction = 'MAX';
end
tokens[anyAllPos] = '';
tokens[openBracketPos] =
'(SELECT ' .. setfunction .. '(anytab.anycol) FROM (';
tokens[closeBracketPos] = ') as anytab(anycol) )';
elseif (operatorToken=='=' and anyOrAll=='ALL') then
-- special rebuild for = ALL
-- rebuild to=(SELECT CASE WHEN COUNT(DISTINCT <something>)==1
-- THEN FIRST_VALUE(<something>) ELSE NULL END FROM
tokens[anyAllPos] = '';
tokens[openBracketPos] =
'(SELECT CASE WHEN COUNT(DISTINCT anytab.anycol) = 1 \
THEN FIRST_VALUE(anytab.anycol) ELSE NULL END FROM (';
tokens[closeBracketPos] = ') as anytab(anycol) )';
elseif ((operatorToken=='!=' or operatorToken=='<>')
and anyOrAll=='ALL') then
-- special rebuild for != ALL
-- rebuild to NOT IN
tokens[operatorPos] = ' NOT IN '
tokens[anyAllPos] = ''
elseif (operatorToken=='!=' and
(anyOrAll=='ANY' or anyOrAll=='SOME')) then
--special rebuild for != ANY, rebuild to
-- CASE WHEN (SELECT COUNT(DISTINCT <something>) FROM ...) == 1
-- THEN operand != (SELECT FIRST_VALUE(<something>) FROM ...)
-- ELSE operand IS NOT NULL END
--note: This case would normally require to determine the operand
-- which requires full understanding of a value expression
-- in SQL standard which is nearly impossible in
-- preprocessing (and very susceptible to errors)
-- so we evaluate the
-- SELECT COUNT(DISTINCT <something) FROM ...) == 1 here and
-- insert the correct expression
--
-- first preprocess the inner query
local queryText = table.concat(tokens,
'',
openBracketPos,
closeBracketPos)
queryText = rebuildAnyAll( queryText )
-- since the subquery was already processed we can continue
-- searching *after* the SELECT
searchStart = closeBracketPos + 1
local distinctQueryText='SELECT COUNT(DISTINCT anytab.anycol) \
FROM '..queryText..' AS anytab(anycol)'
local success, result = pquery(distinctQueryText)
if (success) then
if (result[1][1] == 1) then
tokens[anyAllPos] ='(SELECT FIRST_VALUE(anytab.anycol) \
FROM '..queryText..' AS anytab(anycol))'
else
tokens[operatorPos] = ' IS NOT NULL '
tokens[anyAllPos] = ''
end
-- clear all tokens of the SELECT
for curTokenNr=openBracketPos,closeBracketPos do
tokens[curTokenNr] = ''
end
end
end
end
until found == false;
return table.concat(tokens);
end
/
--/
CREATE SCRIPT sql_preprocessing.preprocessAnyAll AS
import('sql_preprocessing.transformAnyAll', 'anyallparser');
sqlparsing.setsqltext(
anyallparser.rebuildAnyAll(sqlparsing.getsqltext()))
/
CREATE TABLE t1 (i INT);
INSERT INTO t1 VALUES 1,2,3,4,5,6,7,8,9,10;
CREATE TABLE t2 (j INT);
INSERT INTO t2 VALUES 5,6,7;
SELECT i FROM t1 WHERE i < ALL(SELECT j FROM t2);
-- Error: [0A000] Feature not supported: comparison with quantifier ALL
ALTER SESSION SET sql_preprocessor_script=
sql_preprocessing.preprocessAnyAll;
SELECT i FROM t1 WHERE i < ALL(SELECT j FROM t2);