SQL Preprocessor

This article explains how to use the SQL preprocessor in Exasol.

Introduction

Exasol provides a SQL preprocessor that can preprocess all executed SQL statements. By using the preprocessor, unsupported SQL constructs can be transformed into existing SQL features (see examples below). Additionally, you can introduce syntactic sugar by replacing simple constructs with more complex elements.

In Exasol 2025.1.5 and later, preprocessing can be implemented using SQL preprocessing scripts written in the same languages that support virtual schema adapter scripts, which currently includes Python 3 and Java, or using scripting programs written in Lua. Earlier versions of Exasol only support preprocessing with Lua scripting programs.

When using scripting programs written in Lua, you can get and set the original text and manipulate it within the script by using the auxiliary library sqlparsing. To learn more about the Lua scripting language, see Scripting.

The SQL preprocessor is deactivated by default. By using the statements ALTER SESSION and ALTER SYSTEM you can define a script for a session or for the whole system that is responsible for the preprocessing of all SQL commands. Before an SQL statement is passed to the actual database compiler, the preprocessor does a text transformation.

Make sure that preprocessing is correctly implemented before activating it for a whole system, since incorrect implementation of preprocessing may cause SQL statements to stop working.

You can deactivate preprocessing using the statements ALTER SESSION and ALTER SYSTEM (by setting the parameter SQL_PREPROCESSOR_SCRIPT to the NULL value), since these statements are deliberately excluded from the preprocessing.

For data security reasons, all statements that include passwords are excluded from preprocessing: CREATE USER, ALTER USER, CREATE CONNECTION, ALTER CONNECTION, IMPORT, and EXPORT if the IDENTIFIED BY clause was specified.

In the auditing table EXA_DBA_AUDIT_SQL, a separate entry for the execution of the preprocessor script is added (EXECUTE SCRIPT and the original text within a comment). The executed transformed SQL statement is listed in another entry.

Preprocessing scripts

In Exasol 2025.1.5 and later, preprocessing can be implemented using SQL preprocessing scripts written in the languages that support virtual schema adapter scripts, which currently includes Python 3 and Java. Earlier versions of Exasol only support preprocessing with Lua scripting programs.

A preprocessor script is defined using the syntax CREATE <language> PREPROCESSOR SCRIPT <name> AS .... For Lua, the existing syntax CREATE LUA SCRIPT is also supported for backwards compatibility.

For languages other than LUA, the preprocessor script must implement a callback function. The callback functions will receive the original SQL statement as a string parameter and must return the preprocessed SQL statement. For more details, see Preprocessor script callback function (Java) and Preprocessor script callback (Python).

Language Preprocessor script API
Java String adapterCall(final ExaMetadata metadata, final String sqlStatement)
Python 3 adapter_call(sql_statement)
Lua sqlparsing.getsqltext(), sqlparsing.setsqltext(string), and utility functions in sqlparsing

Use the preprocessor scripts in the same way as Lua scripting programs:

  • Set the session or system variable SQL_PREPROCESSOR_SCRIPT to the name of the preprocessor script to enable the preprocessing.

  • To stop the preprocessing, set the session or system variable SQL_PREPROCESSOR_SCRIPT to NULL.

  • To drop a preprocessor script, use the existing DROP SCRIPT <name> statement.

  • The required privileges are the same as for Lua scripting programs.

Differences between preprocessing scripts and Lua scripting programs:

  • In the system tables EXA_ALL_SCRIPTS, EXA_DBA_SCRIPTS, and EXA_USER_SCRIPTS, scripts created through the new syntax are listed with the new script type PREPROCESSOR.

  • EXECUTE SCRIPT cannot be used to call preprocessor scripts. For easier development, you can move the implementation to a different script type and import it into the preprocessor script.

SQL parsing library (Lua)

For scripting programs written in Lua, Exasol provides a library with the following parsing capabilities for SQL text manipulation:

Split into tokens

tokenize()

Identification of token types

iswhitespace()

iscomment()

iswhitespaceorcomment()

isidentifier()

iskeyword()

isstringliteral()

isnumericliteral()

isany()

Normalizing a string

normalize()

Finding token sequences

find()

Access to the SQL text

getsqltext()

setsqltext()

Details

sqlparsing.tokenize(sqlstring)

Splits an input string into an array of strings which correspond to the tokens recognized by the database compiler.

If you concatenate these tokens, you will get the original input string (including upper/lowercase, line breaks, whitespaces, and so on). Therefore, the equation table.concat(tokens)==sqlstring is valid.

The possible tokens are as follows:

  • Valid SQL identifiers, for example, test.tab or "foo"."bar"
  • Keywords, for example, SELECT
  • String literals, for example, 'abcdef'
  • Numerical literals without sign, for example, 123e4
  • Connected whitespaces corresponding to the SQL standard
  • Comments corresponding to the SQL standard (line and block comments)
  • Multi-character tokens, for example, ':=', '::', '||', '->', '>>', '<<', '>=', '<=', '<>', '!=', '^='
  • Single-character tokens, for example, '+', '-', '/', '*', '~'
Copy
--/
CREATE SCRIPT example(sql_text) AS
    local tokens = sqlparsing.tokenize(sql_text)
    for i=1,#tokens do
        print(tokens[i])  
    end
/
 
EXECUTE SCRIPT example('SELECT dummy FROM dual') WITH OUTPUT;

sqlparsing.iscomment(tokenstring)

Returns whether the input string is a comment token.

sqlparsing.iswhitespace(tokenstring)

Returns whether the input string is a whitespace token.

sqlparsing.iswhitespaceorcomment(tokenstring)

Returns whether the input string is a whitespace or comment token.

This function can be useful for function find(), because it filters all irrelevant tokens corresponding to the SQL standard.

sqlparsing.isidentifier(tokenstring)

Returns whether the input string is an identifier token.

sqlparsing.iskeyword(tokenstring)

Returns whether the input string is a SQL keyword token (for example, SELECT, FROM, TABLE).

The functions isidentifier() and iskeyword() return both true for non-reserved keywords. Therefore, you can identify non-reserved keywords.

sqlparsing.isstringliteral(tokenstring)

Returns whether the input string is a string literal token.

sqlparsing.isnumericliteral(tokenstring)

Returns whether the input string is a numeric literal token.

sqlparsing.isany(tokenstring)

Always returns true.

This can be useful if you want to find any first relevant token (as match function within the method find()).

sqlparsing.normalize(tokenstring)

Returns a normalized string for similar representations (upper/lowercase identifiers), on the basis of the following rules:

  • Regular identifiers are transformed into uppercase letters, for example, dual -> DUAL
  • Keywords are transformed in uppercase letters, for example, From -> FROM
  • Whitespace-Token of any size are replaced by a single whitespace
  • In numerical literals, an optional lowercase "e" is replaced by "E", for example, 1.2e34 -> 1.2E34

sqlparsing.find(tokenlist, startTokenNr, searchForward, searchSameLevel, ignoreFunction, match1, [match2, ... matchN])

Searches in the token list for a directly successive sequence of tokens matched by parameters match1, ... matchN, starting from positions startTokenNr, forward or backward (searchForward), and optionally only within the current level of brackets (searchSameLevel). Tokens that match by the function ignoreFunction will be ignored in the search.

Only the first occurrence is returned. In case of multiple occurrences, find can be called multiple times with a start position that indicates where the search begins.

If the searched token sequence is found, then an array of size N is returned (in case of N match elements) whose X-th entry contains the position of the token within the token list which was matched by matchX. If the token sequence was not found, the function returns nil.

Explanation of sqlparsing.find parameters:

Parameter Description

tokenlist

List of tokens which is produced by the function tokenize.

startTokenNr

Number of the first token to be considered for the search.

searchForward

Defines whether the search should be applied forward (true) or backward (false).

This affects only the direction by which the search process moves across the list. The match functions always searches forward. This means that if you search the token sequence KEYWORD, IDENTIFIER, within the token list 'select' 'abc' 'from' 'dual', and start from position 3, then 'from' 'dual' will be matched and not 'from' 'abc', even when searching backward. If you start your search at position 2, then the backward search will return 'select' 'abc', and the forward search will return 'from' 'dual'.

searchSameLevel

Defines whether the search should be limited to the current level of brackets (true) or also beyond (false).

This option applies only to the match of the first token of the sequence. Subsequent tokens can also be located in more inner bracket levels. That means that the search of the token sequence '=' '(' 'SELECT' is also possible if it is constrained to the current level, although the 'SELECT' is located in the next inner bracket level. This is especially useful for finding the corresponding closing bracket, for example of a subquery.

Example: Search the closing bracket within the token sequence 'SELECT' 't1.x' '+' '(' 'SELECT' 'min' '(' 'y' ')' 'FROM' 't2' ')' 'FROM' 't1' which corresponds to the bracket at position 4:

sqlparsing.find(tokens, 4, true, true, sqlparsing.iswhitespaceorcomment, ')').

ignoreFunction

A function of type function(string)->bool is expected. The tokens for which the function ignoreFunction returns true will be ignored by the match functions.

This means that you can specify token types that may occur within the sequence without breaking the match. In many cases, the function iswhitespaceorcomment is useful for that purpose.

match1...matchN

Specifies the searched token sequence.

These parameters should either be functions of type function(tokenstring)->bool or simple strings. A token sequence is searched where the first token matches match1, the second matches match2, and so on, while tokens in between are ignored if function ignoreFunction returns true. If a parameter is a string, then the comparison normalize(tokenstring)== normalize(matchstring) is applied.

sqlparsing.getsqltext()

Returns the current SQL statement text.

This function is only available within the main SQL Preprocessor script.

sqlparsing.setsqltext(string)

Sets the SQL statement text to a new value which will be passed to the database compiler for execution.

This function is only available within the main SQL Preprocessor script.

Best practices

For better management of the SQL preprocessor, consider the following best practices:

  • Extensively test a preprocessor script in your own session before activating it throughout the system, since errors in the preprocessing script may cause SQL statements to stop working.

  • Implement SQL processing by using separate auxiliary scripts that are integrated in one main script, which then acts as a wrapper that hands over the SQL text. For example: sqlparsing.setsqltext(myscript.preprocess(sqlparsing.getsqltext())).The functions getsqltext() and setsqltext() are only available within the preprocessing, and not in normal script executions. By separating preprocessing into several scripts, you can test the processing on several test SQL constructs (on your own daily SQL history, stored within a table) before activating the main script as a preprocessor script.

  • Make sure that all necessary privileges are granted to execute the preprocessor script. We recommend that you start a test with a user without special rights. Otherwise, certain user groups could be blocked from executing any SQL statements.

  • Make the preprocessing as simple as possible. Especially, query() and pquery() should only be used in exceptional cases if you activate preprocessing globally, since all SQL queries will be decelerated and a parallel access on similar tables increases the risk of transaction conflicts.

Examples - Preprocessor scripts

Python example using SQLGlot

Exasol does not support the non-standard syntax "SELECT TOP N ... ORDER BY ..." used by some other databases, including SQL Server. However, the SQLGlot library which is included in the PYTHON3 script language container can rewrite this syntax into a form that Exasol does support.

Copy
-- Setup: Schema
ALTER SESSION SET SQL_PREPROCESSOR_SCRIPT = NULL;
CREATE SCHEMA sql_preprocessor;
OPEN SCHEMA sql_preprocessor;
CREATE TABLE example_users(name VARCHAR(100));
INSERT INTO example_users VALUES 'A', 'B', 'C', 'D', 'E', 'F';

-- Setup: Create a preprocessor script to rewrite TSQL queries into Exasol syntax.
--/
CREATE PYTHON3 PREPROCESSOR SCRIPT sql_preprocessor.python_script AS    
import sqlglot
def adapter_call(sql_statement):
    translated = sqlglot.transpile(sql_statement, read="tsql", write="exasol")
    return translated[0]
/

-- Observed: Syntax error
SELECT TOP 5 * FROM example_users ORDER BY name;

-- Enable the preprocessor script.
ALTER SESSION SET SQL_PREPROCESSOR_SCRIPT = sql_preprocessor.python_script;

-- Observed: Success
SELECT TOP 5 * FROM example_users ORDER BY name;
Simple Java example
Copy
--/
CREATE JAVA PREPROCESSOR SCRIPT sql_preprocessor.java_preprocessor AS
class JAVA_PREPROCESSOR {
    public static String adapterCall(final ExaMetadata metadata, final String sqlStatement) throws Exception {
        // Implement the preprocessing here.
        return sqlStatement;
    }
}
/
Simple Lua example using the new syntax
Copy
--/
CREATE LUA PREPROCESSOR SCRIPT sql_preprocessor.lua_preprocessor AS
    local statement = sqlparsing.getsqltext()
    -- Implement the preprocessing here.
    sqlparsing.setsqltext(statement)
/

Examples - Scripting programs

IF() function

In this example, the IF() function (currently not supported in Exasol) is transformed into an equivalent CASE WHEN expression.

Copy
-- Setup: Schema
CREATE SCHEMA sql_preprocessing;
OPEN SCHEMA sql_preprocessing;

--/
CREATE SCRIPT sql_preprocessing.transformIf() AS
   function processIf(sqltext)
    while (true) do
    local tokens = sqlparsing.tokenize(sqltext)
    local ifStart = sqlparsing.find(tokens,
                    1,
                    true,
                    false,
                    sqlparsing.iswhitespaceorcomment,
                    'IF',
                    '(' )
    if (ifStart==nil) then
      break;
    end
    local ifEnd = sqlparsing.find(tokens,
                ifStart[2],
                true,
                true,
                sqlparsing.iswhitespaceorcomment,
                ')')
    if (ifEnd==nil) then
      error("if statement not ended properly")
      break;
    end
    local commas1 = sqlparsing.find(tokens,
                    ifStart[2]+1,
                    true,
                    true,
                    sqlparsing.iswhitespaceorcomment,
                    ',' )
    if (commas1==nil) then
      error("invalid if function")
      break;
    end
    local commas2 = sqlparsing.find(tokens,
                    commas1[1]+1,
                    true,
                    true,
                    sqlparsing.iswhitespaceorcomment,
                    ',' )
    if (commas2==nil) then
      error("invalid if function")
      break;
    end
    local ifParam1=table.concat(tokens, '', ifStart[2]+1, commas1[1]-1)
    local ifParam2=table.concat(tokens, '', commas1[1]+1, commas2[1]-1)
    local ifParam3=table.concat(tokens, '', commas2[1]+1, ifEnd[1]-1)
    local caseStmt='CASE WHEN ('..ifParam1..') != 0 \
        THEN ('..ifParam2..') \
        ELSE ('..ifParam3..') END '
    sqltext=table.concat(tokens, '',1,
        ifStart[1]-1)..caseStmt..table.concat(tokens,
                    '',
                    ifEnd[1]+1)
    end
    return sqltext
end
/ 
Copy
--/
CREATE SCRIPT sql_preprocessing.preprocessIf() AS
   import( 'sql_preprocessing.transformIf', 'transformIf' )
   sqlparsing.setsqltext(
    transformIf.processIf(sqlparsing.getsqltext()))
/

-- syntax error, since preprocessor not activated
SELECT IF( 3+4 > 5, 6, 7 ) from dual;
Copy
ALTER SESSION SET sql_preprocessor_script=
           sql_preprocessing.preprocessIf;
SELECT IF( 3+4 > 5, 6, 7 ) AS col1 FROM dual;
ls command

In this example, Unix command ls is transferred to the database. This command returns either the list of all objects within a schema or the list of all schemas if no schema is opened. Additionally, you can apply filters (case insensitive) for example, ls '%name%' to display all objects whose name contains the text 'name'.

Copy
--/
CREATE SCRIPT sql_preprocessing.addunixcommands() AS
  function processLS(input, tokens, commandPos)
    local result = query("SELECT CURRENT_SCHEMA")
    local current_schema = result[1][1]
    local returnText = ""
    local searchCol = ""
    if (current_schema==null) then
       returnText = "SELECT schema_name FROM exa_schemas WHERE true"
       searchCol = "schema_name"
    elseif (current_schema=='SYS' or current_schema=='EXA_STATISTICS') then
       returnText = "SELECT object_name, object_type FROM exa_syscat \
             WHERE schema_name='"..current_schema.."'"
       searchCol = "object_name"
    else
       returnText = "SELECT object_name, object_type FROM exa_all_objects \
             WHERE root_type='SCHEMA' \
                  AND root_name='"..current_schema.."'"
       searchCol = "object_name"
    end
    local addFilters = {}
    local lastValid = commandPos
    local foundPos = sqlparsing.find(tokens,
                    lastValid+1,
                    true,
                    false,
                    sqlparsing.iswhitespaceorcomment,
                    sqlparsing.isany)
    while (not(foundPos==nil) )
    do
       local foundToken = tokens[foundPos[1]]
       if (sqlparsing.isstringliteral(foundToken)) then
        addFilters[#addFilters+1] = "UPPER("..searchCol..") \
                          LIKE UPPER("..foundToken .. ")"
       elseif (not (sqlparsing.normalize(foundToken) == ';')) then
          error("only string literals allowed as arguments for ls,\
           but found '"..foundToken.."'")
       end
       lastValid = foundPos[1]
       foundPos = sqlparsing.find(tokens,
                      lastValid+1,
                      true,
                      false,
                      sqlparsing.iswhitespaceorcomment,
                      sqlparsing.isany)    
      end
      if ( #addFilters > 0 ) then
       local filterText = table.concat(addFilters, " OR ")
       return returnText.." AND ("..filterText..")".." ORDER BY "..searchCol
      else
       return returnText.." ORDER BY "..searchCol
      end
  end
 
  function processUnixCommands(input)
     local tokens = sqlparsing.tokenize(input)
     local findResult = sqlparsing.find(tokens,
                    1,
                    true,
                    false,
                    sqlparsing.iswhitespaceorcomment,
                    sqlparsing.isany)
     if (findResult==nil) then
    return input
     end
     local command = tokens[findResult[1]]
     if (sqlparsing.normalize( command )=='LS') then
    return processLS(input, tokens, findResult[1])
     end    
     return input;
end
/
Copy
--/
CREATE SCRIPT sql_preprocessing.preprocessWithUnixTools() AS
   import( 'sql_preprocessing.addunixcommands', 'unixCommands' )
   sqlparsing.setsqltext(
    unixCommands.processUnixCommands(sqlparsing.getsqltext()));
/
 
ALTER SESSION SET sql_preprocessor_script=
        sql_preprocessing.preprocessWithUnixTools;

OPEN SCHEMA sql_preprocessing;
LS '%unix%';

CLOSE SCHEMA;
LS;
ANY/ALL

ANY and ALL SQL constructs are currently not supported by Exasol. However, using the following script, you can add this functionality.

Copy
--/
CREATE SCRIPT sql_preprocessing.transformAnyAll() AS
    function rebuildAnyAll(inputsqltext)
    local sqltext = inputsqltext;
    local tokens = sqlparsing.tokenize(sqltext);
    local found = true;
    local searchStart = 1;
    -- search for sequence >|>=|<|<= ANY|ALL ( SELECT
    repeat
        local foundPositions =
        sqlparsing.find(tokens,
            searchStart,
            true,
            false,
            sqlparsing.iswhitespaceorcomment,
            function (token)
              return (token=='<' or token=='<=' or token=='>'
            or token=='>=' or token=='!=' or token=='<>'
            or token=='=');
            end, -- match <|<=|>|>=|=|!=|<>
             function ( token )
              local normToken = sqlparsing.normalize(token);
              return (normToken=='ANY' or normToken=='SOME'
            or normToken=='ALL');
           end, -- match ANY|ALL
           '(', -- match (
           'SELECT' -- match SELECT
        );
    if (foundPositions==nil) then
        found = false;
        break;
    end
    local operatorPos = foundPositions[1];
    local anyAllPos = foundPositions[2];
    local openBracketPos = foundPositions[3];
    searchStart = anyAllPos + 1
    foundPositions = sqlparsing.find(tokens,
                    openBracketPos,
                    true,
                    true,
                    sqlparsing.iswhitespaceorcomment,
                    ')');
    if (foundPositions ~= nil) then
       local closeBracketPos = foundPositions[1]
       local operatorToken = tokens[operatorPos];
       local anyOrAll = sqlparsing.normalize(tokens[anyAllPos]);
       if (operatorToken=='<' or operatorToken=='<='    
        or operatorToken=='>' or operatorToken=='>=') then
        -- now we have <|<=|>|>= ANY|ALL (SELECT <something> FROM
        -- rebuild to <|<=|>|>= (SELECT MIN|MAX(<something>) FROM
        local setfunction = 'MIN';
        if ( ((anyOrAll=='ANY' or anyOrAll=='SOME') and
              (operatorToken=='<' or operatorToken=='<=')
             ) or
              (anyOrAll=='ALL' and (operatorToken=='>'
                         or operatorToken=='>=')
             )
        ) then
        setfunction = 'MAX';        
       end
       tokens[anyAllPos] = '';
       tokens[openBracketPos] =
        '(SELECT ' .. setfunction .. '(anytab.anycol) FROM (';
       tokens[closeBracketPos] = ') as anytab(anycol) )';
       elseif (operatorToken=='=' and anyOrAll=='ALL') then
       -- special rebuild for = ALL
       -- rebuild to=(SELECT CASE WHEN COUNT(DISTINCT <something>)==1
       --         THEN FIRST_VALUE(<something>) ELSE NULL END FROM
       tokens[anyAllPos] = '';
       tokens[openBracketPos] =
        '(SELECT CASE WHEN COUNT(DISTINCT anytab.anycol) = 1 \
        THEN FIRST_VALUE(anytab.anycol) ELSE NULL END FROM (';     
       tokens[closeBracketPos] = ') as anytab(anycol) )';
    elseif ((operatorToken=='!=' or operatorToken=='<>')
        and anyOrAll=='ALL') then    
       -- special rebuild for != ALL
       -- rebuild to NOT IN
       tokens[operatorPos] = ' NOT IN '
       tokens[anyAllPos] = ''
       elseif (operatorToken=='!=' and
        (anyOrAll=='ANY' or anyOrAll=='SOME')) then
      --special rebuild for != ANY, rebuild to
      -- CASE WHEN (SELECT COUNT(DISTINCT <something>) FROM ...) == 1
      -- THEN operand != (SELECT FIRST_VALUE(<something>) FROM ...)
      -- ELSE operand IS NOT NULL END
      --note: This case would normally require to determine the operand
      --       which requires full understanding of a value expression
      --       in SQL standard which is nearly impossible in
      --       preprocessing (and very susceptible to errors)
      --       so we evaluate the
      --       SELECT COUNT(DISTINCT <something) FROM ...) == 1 here and
      --       insert the correct expression
      --
      -- first preprocess the inner query
      local queryText = table.concat(tokens,
                      '',
                            openBracketPos,
                      closeBracketPos)
      queryText = rebuildAnyAll( queryText )
      -- since the subquery was already processed we can continue
      -- searching *after* the SELECT
      searchStart = closeBracketPos + 1
      local distinctQueryText='SELECT COUNT(DISTINCT anytab.anycol) \
                     FROM '..queryText..' AS anytab(anycol)'
      local success, result = pquery(distinctQueryText)
      if (success) then
        if (result[1][1] == 1) then
           tokens[anyAllPos] ='(SELECT FIRST_VALUE(anytab.anycol) \
                    FROM '..queryText..' AS anytab(anycol))'
        else
           tokens[operatorPos] = ' IS NOT NULL '
           tokens[anyAllPos] = ''
        end
        -- clear all tokens of the SELECT
        for curTokenNr=openBracketPos,closeBracketPos do
          tokens[curTokenNr] = ''
        end    
          end
       end
       end
   until found == false;
   return table.concat(tokens);
 end
/
Copy
--/
CREATE SCRIPT sql_preprocessing.preprocessAnyAll AS
  import('sql_preprocessing.transformAnyAll', 'anyallparser');
  sqlparsing.setsqltext(
    anyallparser.rebuildAnyAll(sqlparsing.getsqltext()))
/
 
CREATE TABLE t1 (i INT);
INSERT INTO t1 VALUES 1,2,3,4,5,6,7,8,9,10;
CREATE TABLE t2 (j INT);
INSERT INTO t2 VALUES 5,6,7;

SELECT i FROM t1 WHERE i < ALL(SELECT j FROM t2);
-- Error: [0A000] Feature not supported: comparison with quantifier ALL

ALTER SESSION SET sql_preprocessor_script=
           sql_preprocessing.preprocessAnyAll;
SELECT i FROM t1 WHERE i < ALL(SELECT j FROM t2);