SQL Preprocessor

This article explains how to use the SQL preprocessor in Exasol.

Exasol provides an SQL preprocessor that can preprocess all executed SQL statements. Using the preprocessor, unsupported SQL constructs can be transformed into existing SQL features (see examples below). Additionally, you can introduce syntactic sugar by replacing simple constructs with more complex elements.

By default, the SQL preprocessor is deactivated. Using the statements ALTER SESSION and ALTER SYSTEM, you can define a script for a session or for the whole system that is responsible for preprocessing all SQL commands. Before an SQL statement is passed to the actual database compiler, the preprocessor performs a text transformation. Within the script, you can get and set the original text and manipulate it by using the auxiliary library sqlparsing. For details about the scripting language, see Scripting.

Be cautious before activating SQL preprocessing for a whole system, since SQL statements may stop working if the preprocessing has been implemented incorrectly.

You can deactivate the preprocessing using the statements ALTER SESSION and ALTER SYSTEM (by setting the parameter SQL_PREPROCESSOR_SCRIPT to the NULL value), as these statements are deliberately excluded from the preprocessing.

For data security reasons, all statements that include passwords are also excluded from preprocessing (CREATE USER, ALTER USER, CREATE CONNECTION, ALTER CONNECTION, IMPORT, EXPORT if the IDENTIFIED BY clause was specified).

In the auditing table EXA_DBA_AUDIT_SQL, a separate entry for the execution of the preprocessor script is added (EXECUTE SCRIPT and the original text within a comment). The executed transformed SQL statement is listed in another entry.

Library sqlparsing

Exasol provides a library with parsing capabilities for SQL text manipulation.

Split into tokens
- tokenize()
Identification of token types
- iswhitespace()
- iscomment()
- iswhitespaceorcomment()
- isidentifier()
- iskeyword()
- isstringliteral()
- isnumericliteral()
- isany()
Normalizing a string
- normalize()
Finding token sequences
- find()
Access to the SQL text
- getsqltext()
- setsqltext()

Details

sqlparsing.tokenize(sqlstring)

Splits an input string into an array of strings which correspond to the tokens recognized by the database compiler. If you concatenate these tokens, you will get the original input string (including upper/lowercase, line breaks, whitespaces, and so on). Therefore, the equation table.concat(tokens)==sqlstring is valid.
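
The round-trip property described above can be checked with a small script. This is a minimal sketch: the script name roundtrip_check is hypothetical, and it assumes that print() output is returned through WITH OUTPUT, as in the other examples in this article.

```lua
--/
CREATE SCRIPT roundtrip_check(sql_text) AS
    -- Split the statement into tokens, then glue them back together.
    local tokens = sqlparsing.tokenize(sql_text)
    local rebuilt = table.concat(tokens)
    -- According to the description of tokenize(), rebuilt should equal the input,
    -- including whitespace and comments.
    print('number of tokens: '..#tokens)
    print('round trip ok: '..tostring(rebuilt == sql_text))
/

EXECUTE SCRIPT roundtrip_check('SELECT  dummy /* c */ FROM dual') WITH OUTPUT;
```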

The possible tokens are as follows:

- Valid SQL identifiers, for example, test.tab or "foo"."bar"
- Keywords, for example, SELECT
- String literals, for example, 'abcdef'
- Numerical literals without sign, for example, 123e4
- Connected whitespaces corresponding to the SQL standard
- Comments corresponding to the SQL standard (line and block comments)
- Multi-character tokens, for example, ':=', '::', '||', '->', '>>', '<<', '>=', '<=', '<>', '!=', '^='
- Single-character tokens, for example, '+', '-', '/', '*', '~'

--/
CREATE SCRIPT example(sql_text) AS
    local tokens = sqlparsing.tokenize(sql_text)
    for i=1,#tokens do
        print(tokens[i])  
    end
/
 
EXECUTE SCRIPT example('SELECT dummy FROM dual') WITH OUTPUT;

sqlparsing.iscomment(tokenstring)

Returns whether the input string is a comment token.

sqlparsing.iswhitespace(tokenstring)

Returns whether the input string is a whitespace token.

sqlparsing.iswhitespaceorcomment(tokenstring)

Returns whether the input string is a whitespace or comment token. This function is useful as the ignoreFunction parameter of find(), because it filters out all tokens that are irrelevant according to the SQL standard.

sqlparsing.isidentifier(tokenstring)

Returns whether the input string is an identifier token.

sqlparsing.iskeyword(tokenstring)

Returns whether the input string is an SQL keyword token (for example, SELECT, FROM, TABLE). For non-reserved keywords, the functions isidentifier() and iskeyword() both return true. You can therefore identify a non-reserved keyword by checking that both functions return true for the token.
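
The token type functions can be combined to classify each token of a statement. The following is a minimal sketch rather than a definitive implementation: the script name classify_tokens and the printed labels are hypothetical, and the check for non-reserved keywords relies on the behavior described above.

```lua
--/
CREATE SCRIPT classify_tokens(sql_text) AS
    local tokens = sqlparsing.tokenize(sql_text)
    for i = 1, #tokens do
        local t = tokens[i]
        -- Skip whitespace and comments, they are not relevant here.
        if not sqlparsing.iswhitespaceorcomment(t) then
            local kind = 'other'
            if sqlparsing.iskeyword(t) and sqlparsing.isidentifier(t) then
                kind = 'non-reserved keyword'
            elseif sqlparsing.iskeyword(t) then
                kind = 'keyword'
            elseif sqlparsing.isidentifier(t) then
                kind = 'identifier'
            elseif sqlparsing.isstringliteral(t) then
                kind = 'string literal'
            elseif sqlparsing.isnumericliteral(t) then
                kind = 'numeric literal'
            end
            print(t..' -> '..kind)
        end
    end
/

EXECUTE SCRIPT classify_tokens('SELECT dummy, 42, ''x'' FROM dual') WITH OUTPUT;
```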

sqlparsing.isstringliteral(tokenstring)

Returns whether the input string is a string literal token.

sqlparsing.isnumericliteral(tokenstring)

Returns whether the input string is a numeric literal token.

sqlparsing.isany(tokenstring)

Always returns true. This can be useful if you want to find the first relevant token of any type (as a match function in find()).

sqlparsing.normalize(tokenstring)

Returns a normalized string for similar representations (upper/lowercase identifiers), on the basis of the following rules:

- Regular identifiers are transformed into uppercase letters, for example, dual -> DUAL
- Keywords are transformed into uppercase letters, for example, From -> FROM
- Whitespace tokens of any size are replaced by a single whitespace
- In numerical literals, an optional lowercase "e" is replaced by "E", for example, 1.2e34 -> 1.2E34
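
Because normalize() works on a single token, a whole statement can be normalized by applying it token by token. The sketch below assumes a hypothetical script name normalize_text. Under the rules above, the printed text should have uppercase keywords and regular identifiers, single whitespaces, and an uppercase "E" in the numeric literal.

```lua
--/
CREATE SCRIPT normalize_text(sql_text) AS
    local tokens = sqlparsing.tokenize(sql_text)
    local parts = {}
    for i = 1, #tokens do
        -- Normalize every token and collect the results in order.
        parts[#parts + 1] = sqlparsing.normalize(tokens[i])
    end
    print(table.concat(parts))
/

EXECUTE SCRIPT normalize_text('select   x  From dual where y = 1.2e34') WITH OUTPUT;
```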

sqlparsing.find(tokenlist, startTokenNr, searchForward, searchSameLevel, ignoreFunction, match1, [match2, ... matchN])

Searches the token list, starting from position startTokenNr, forward or backward (searchForward), and optionally only within the current level of brackets (searchSameLevel), for the directly successive sequence of tokens matched by the parameters match1, ... matchN. During the search, all tokens for which ignoreFunction returns true are not considered by the match functions. Only the first occurrence is returned. To find multiple occurrences, call find() repeatedly with a start position that indicates where the next search begins.

If the searched token sequence is found, then an array of size N is returned (in case of N match elements) whose X-th entry contains the position of the token within the token list which was matched by matchX. If the token sequence was not found, the function returns nil.
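
The repeated-call pattern and the returned position array can be sketched as follows. The script name find_all_select is hypothetical. The loop relies on find() returning nil once no further match exists, which is also how the ls example later in this article ends its loop.

```lua
--/
CREATE SCRIPT find_all_select(sql_text) AS
    local tokens = sqlparsing.tokenize(sql_text)
    local start = 1
    while true do
        -- Search forward across all bracket levels for the next SELECT.
        local hit = sqlparsing.find(tokens, start, true, false,
                                    sqlparsing.iswhitespaceorcomment,
                                    'SELECT')
        if hit == nil then
            break
        end
        -- hit[1] is the position of the token matched by match1.
        print('SELECT at token position '..hit[1])
        -- Continue behind the match to find the next occurrence.
        start = hit[1] + 1
    end
/

EXECUTE SCRIPT find_all_select('SELECT a FROM (SELECT a FROM t)') WITH OUTPUT;
```

Note that the reported positions count whitespace tokens as well, because tokenize() keeps them in the token list.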

The details of the parameters are as follows:

Parameter	Description
tokenlist	List of tokens which is produced by the function `tokenize`.
startTokenNr	Number of the first token to be considered for the search.
searchForward	Defines whether the search should be applied forward (`true`) or backward (`false`). This affects only the direction in which the search process moves across the list; the match functions always search forward. That means that if you search for the token sequence KEYWORD, IDENTIFIER within the token list 'select' 'abc' 'from' 'dual' and start from position 3, then 'from' 'dual' will be matched and not 'from' 'abc', even when searching backward. If you start your search at position 2, then the backward search will return 'select' 'abc', and the forward search will return 'from' 'dual'.
searchSameLevel	Defines whether the search should be limited to the current level of brackets (`true`) or also beyond (`false`). This applies only to the match of the first token of the sequence. Subsequent tokens can also be located in more inner bracket levels. That means that the search for the token sequence '=' '(' 'SELECT' is also possible if it is constrained to the current level, although the 'SELECT' is located in the next inner bracket level. The option `searchSameLevel` is especially useful for finding the corresponding closing bracket, for example, of a subquery. Example: Search for the closing bracket within the token sequence 'SELECT' 't1.x' '+' '(' 'SELECT' 'min' '(' 'y' ')' 'FROM' 't2' ')' 'FROM' 't1' which corresponds to the bracket at position 4: `sqlparsing.find(tokens, 4, true, true, sqlparsing.iswhitespaceorcomment, ')')`.
ignoreFunction	A function of type function(string)->bool is expected. The tokens for which the function `ignoreFunction` returns true will be ignored by the match functions. That means you can specify token types which may occur within the sequence without breaking the match. In many cases, the function `iswhitespaceorcomment` is useful for that purpose.
match1...matchN	By `match1..matchN`, the searched token sequence is specified. These parameters should either be functions of type function(tokenstring)->bool or simple strings. A token sequence is searched where the first token matches `match1`, the second matches `match2`, and so on, while tokens in between are ignored if function `ignoreFunction` returns true. If a parameter is a string, then the comparison `normalize(tokenstring)==normalize(matchstring)` is applied.
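
The bracket search described for searchSameLevel can be sketched as follows. The script name first_bracket_content is hypothetical. For the statement used here, the script is expected to print the text of the inner subquery, assuming find() behaves as described in the table above.

```lua
--/
CREATE SCRIPT first_bracket_content(sql_text) AS
    local tokens = sqlparsing.tokenize(sql_text)
    -- Locate the first opening bracket anywhere in the statement.
    local openPos = sqlparsing.find(tokens, 1, true, false,
                                    sqlparsing.iswhitespaceorcomment, '(')
    if openPos == nil then
        print('no bracket found')
    else
        -- Start at the opening bracket and stay on its bracket level,
        -- so that brackets of nested calls such as min(y) are skipped.
        local closePos = sqlparsing.find(tokens, openPos[1], true, true,
                                         sqlparsing.iswhitespaceorcomment, ')')
        if closePos == nil then
            print('bracket is not closed')
        else
            -- Print everything between the two brackets.
            print(table.concat(tokens, '', openPos[1] + 1, closePos[1] - 1))
        end
    end
/

EXECUTE SCRIPT first_bracket_content('SELECT t1.x+(SELECT min(y) FROM t2) FROM t1') WITH OUTPUT;
```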

sqlparsing.getsqltext()

Returns the current SQL statement text.

This function is only available within the main SQL Preprocessor script.

sqlparsing.setsqltext(string)

Sets the SQL statement text to a new value which will be passed to the database compiler for execution.

This function is only available within the main SQL Preprocessor script.

Best Practice

For better management of the SQL Preprocessor, consider the following:

- Extensively test a preprocessor script in your own session before activating it throughout the system.
- Implement the SQL processing in separate auxiliary scripts and integrate them into one main script that is just a wrapper handing over the SQL text (for example, sqlparsing.setsqltext(myscript.preprocess(sqlparsing.getsqltext()))). This is because the functions getsqltext() and setsqltext() are only available during preprocessing and not in normal script executions. With this separation, you can test the processing on several test SQL constructs (for example, on your own daily SQL history stored in a table) before activating the main script as the preprocessor script.
- Ensure that all necessary privileges are granted to execute the preprocessor script. It is recommended that you start a test with a user without special rights. Otherwise, certain user groups could be blocked from executing any SQL statements.
- Keep the preprocessing as simple as possible. In particular, use query() and pquery() only in exceptional cases if you activate the preprocessing globally, because all SQL queries will be slowed down, and parallel access to similar tables increases the risk of transaction conflicts.
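
The separation into an auxiliary script, a test driver, and a thin main script can be sketched as follows. All script names (myscript, test_myscript, preprocess_main) are hypothetical, the preprocess() function is only a placeholder that returns the text unchanged, and the schema sql_preprocessing is assumed to exist as in the examples of this article.

```lua
--/
CREATE SCRIPT sql_preprocessing.myscript() AS
    -- Auxiliary library: all rewriting logic lives here.
    function preprocess(sqltext)
        -- Placeholder: return the statement text unchanged.
        return sqltext
    end
/

--/
CREATE SCRIPT sql_preprocessing.test_myscript(sql_text) AS
    -- Test driver: runs as a normal script, so it does not need
    -- getsqltext() or setsqltext().
    import('sql_preprocessing.myscript', 'myscript')
    print(myscript.preprocess(sql_text))
/

EXECUTE SCRIPT sql_preprocessing.test_myscript('SELECT 1 FROM dual') WITH OUTPUT;

--/
CREATE SCRIPT sql_preprocessing.preprocess_main() AS
    -- Main preprocessor script: only hands the SQL text over.
    import('sql_preprocessing.myscript', 'myscript')
    sqlparsing.setsqltext(myscript.preprocess(sqlparsing.getsqltext()))
/
```

With this layout, only preprocess_main would be set as the preprocessor script, and only after the test driver has produced the expected text for representative statements.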

Examples

This section shows some examples of preprocessor scripts that illustrate the functionality.

IF() function

In this example, the IF() function (currently not supported in Exasol) is transformed into an equivalent CASE WHEN expression.

--/
CREATE SCRIPT sql_preprocessing.transformIf() AS
   function processIf(sqltext)
      -- Each pass rewrites the first IF(...) found and then re-tokenizes.
      while (true) do
         local tokens = sqlparsing.tokenize(sqltext)
         -- ifStart[1] is the position of 'IF', ifStart[2] that of its '('.
         local ifStart = sqlparsing.find(tokens,
                                         1,
                                         true,
                                         false,
                                         sqlparsing.iswhitespaceorcomment,
                                         'IF',
                                         '(' )
         if (ifStart==nil) then
            break;
         end
         -- Closing bracket that corresponds to the opening bracket.
         local ifEnd = sqlparsing.find(tokens,
                                       ifStart[2],
                                       true,
                                       true,
                                       sqlparsing.iswhitespaceorcomment,
                                       ')')
         if (ifEnd==nil) then
            error("if statement not ended properly")
         end
         -- First and second comma on the bracket level of the arguments.
         local commas1 = sqlparsing.find(tokens,
                                         ifStart[2]+1,
                                         true,
                                         true,
                                         sqlparsing.iswhitespaceorcomment,
                                         ',' )
         if (commas1==nil) then
            error("invalid if function")
         end
         local commas2 = sqlparsing.find(tokens,
                                         commas1[1]+1,
                                         true,
                                         true,
                                         sqlparsing.iswhitespaceorcomment,
                                         ',' )
         if (commas2==nil) then
            error("invalid if function")
         end
         -- Extract the three arguments as text.
         local ifParam1=table.concat(tokens, '', ifStart[2]+1, commas1[1]-1)
         local ifParam2=table.concat(tokens, '', commas1[1]+1, commas2[1]-1)
         local ifParam3=table.concat(tokens, '', commas2[1]+1, ifEnd[1]-1)
         local caseStmt='CASE WHEN ('..ifParam1..') != 0 \
             THEN ('..ifParam2..') \
             ELSE ('..ifParam3..') END '
         -- Replace the IF(...) call by the CASE expression.
         sqltext=table.concat(tokens, '', 1, ifStart[1]-1)
                 ..caseStmt
                 ..table.concat(tokens, '', ifEnd[1]+1)
      end
      return sqltext
   end
/

--/
CREATE SCRIPT sql_preprocessing.preprocessIf() AS
   import( 'sql_preprocessing.transformIf', 'transformIf' )
   sqlparsing.setsqltext(
    transformIf.processIf(sqlparsing.getsqltext()))
/
 
SELECT IF( 3+4 > 5, 6, 7 ) from dual;

ALTER SESSION SET sql_preprocessor_script=
           sql_preprocessing.preprocessIf;
SELECT IF( 3+4 > 5, 6, 7 ) AS col1 FROM dual;

ls command

In this example, the Unix command ls is made available in the database. This command returns either the list of all objects within the current schema or the list of all schemas if no schema is opened. Additionally, you can apply filters (case insensitive), for example, ls '%name%' to display all objects whose name contains the text 'name'.

--/
CREATE SCRIPT sql_preprocessing.addunixcommands() AS
  function processLS(input, tokens, commandPos)
    local result = query("SELECT CURRENT_SCHEMA")
    local current_schema = result[1][1]
    local returnText = ""
    local searchCol = ""
    if (current_schema==null) then
       returnText = "SELECT schema_name FROM exa_schemas WHERE true"
       searchCol = "schema_name"
    elseif (current_schema=='SYS' or current_schema=='EXA_STATISTICS') then
       returnText = "SELECT object_name, object_type FROM exa_syscat \
             WHERE schema_name='"..current_schema.."'"
       searchCol = "object_name"
    else
       returnText = "SELECT object_name, object_type FROM exa_all_objects \
             WHERE root_type='SCHEMA' \
                  AND root_name='"..current_schema.."'"
       searchCol = "object_name"
    end
    local addFilters = {}
    local lastValid = commandPos
    local foundPos = sqlparsing.find(tokens,
                    lastValid+1,
                    true,
                    false,
                    sqlparsing.iswhitespaceorcomment,
                    sqlparsing.isany)
    while (not(foundPos==nil) )
    do
       local foundToken = tokens[foundPos[1]]
       if (sqlparsing.isstringliteral(foundToken)) then
        addFilters[#addFilters+1] = "UPPER("..searchCol..") \
                          LIKE UPPER("..foundToken .. ")"
       elseif (not (sqlparsing.normalize(foundToken) == ';')) then
          error("only string literals allowed as arguments for ls,\
           but found '"..foundToken.."'")
       end
       lastValid = foundPos[1]
       foundPos = sqlparsing.find(tokens,
                      lastValid+1,
                      true,
                      false,
                      sqlparsing.iswhitespaceorcomment,
                      sqlparsing.isany)    
      end
      if ( #addFilters > 0 ) then
       local filterText = table.concat(addFilters, " OR ")
       return returnText.." AND ("..filterText..")".." ORDER BY "..searchCol
      else
       return returnText.." ORDER BY "..searchCol
      end
  end
 
  function processUnixCommands(input)
     local tokens = sqlparsing.tokenize(input)
     local findResult = sqlparsing.find(tokens,
                                        1,
                                        true,
                                        false,
                                        sqlparsing.iswhitespaceorcomment,
                                        sqlparsing.isany)
     if (findResult==nil) then
        return input
     end
     local command = tokens[findResult[1]]
     if (sqlparsing.normalize( command )=='LS') then
        return processLS(input, tokens, findResult[1])
     end
     return input;
  end
/

--/
CREATE SCRIPT sql_preprocessing.preprocessWithUnixTools() AS
   import( 'sql_preprocessing.addunixcommands', 'unixCommands' )
   sqlparsing.setsqltext(
    unixCommands.processUnixCommands(sqlparsing.getsqltext()));
/
 
ALTER SESSION SET sql_preprocessor_script=
        sql_preprocessing.preprocessWithUnixTools;

OPEN SCHEMA sql_preprocessing;
LS '%unix%';

CLOSE SCHEMA;
LS;

ANY/ALL

ANY and ALL SQL constructs are currently not supported by Exasol. However, using the following script, you can add this functionality.

--/
CREATE SCRIPT sql_preprocessing.transformAnyAll() AS
    function rebuildAnyAll(inputsqltext)
    local sqltext = inputsqltext;
    local tokens = sqlparsing.tokenize(sqltext);
    local found = true;
    local searchStart = 1;
    -- search for sequence <|<=|>|>=|=|!=|<> ANY|SOME|ALL ( SELECT
    repeat
        local foundPositions =
        sqlparsing.find(tokens,
            searchStart,
            true,
            false,
            sqlparsing.iswhitespaceorcomment,
            function (token)
              return (token=='<' or token=='<=' or token=='>'
            or token=='>=' or token=='!=' or token=='<>'
            or token=='=');
            end, -- match <|<=|>|>=|=|!=|<>
             function ( token )
              local normToken = sqlparsing.normalize(token);
              return (normToken=='ANY' or normToken=='SOME'
            or normToken=='ALL');
           end, -- match ANY|ALL
           '(', -- match (
           'SELECT' -- match SELECT
        );
    if (foundPositions==nil) then
        found = false;
        break;
    end
    local operatorPos = foundPositions[1];
    local anyAllPos = foundPositions[2];
    local openBracketPos = foundPositions[3];
    searchStart = anyAllPos + 1
    foundPositions = sqlparsing.find(tokens,
                    openBracketPos,
                    true,
                    true,
                    sqlparsing.iswhitespaceorcomment,
                    ')');
    if (foundPositions ~= nil) then
       local closeBracketPos = foundPositions[1]
       local operatorToken = tokens[operatorPos];
       local anyOrAll = sqlparsing.normalize(tokens[anyAllPos]);
       if (operatorToken=='<' or operatorToken=='<='    
        or operatorToken=='>' or operatorToken=='>=') then
        -- now we have <|<=|>|>= ANY|ALL (SELECT <something> FROM
        -- rebuild to <|<=|>|>= (SELECT MIN|MAX(<something>) FROM
        local setfunction = 'MIN';
        if ( ((anyOrAll=='ANY' or anyOrAll=='SOME') and
              (operatorToken=='<' or operatorToken=='<=')
             ) or
              (anyOrAll=='ALL' and (operatorToken=='>'
                         or operatorToken=='>=')
             )
        ) then
        setfunction = 'MAX';        
       end
       tokens[anyAllPos] = '';
       tokens[openBracketPos] =
        '(SELECT ' .. setfunction .. '(anytab.anycol) FROM (';
       tokens[closeBracketPos] = ') as anytab(anycol) )';
       elseif (operatorToken=='=' and anyOrAll=='ALL') then
       -- special rebuild for = ALL
       -- rebuild to=(SELECT CASE WHEN COUNT(DISTINCT <something>)==1
       --         THEN FIRST_VALUE(<something>) ELSE NULL END FROM
       tokens[anyAllPos] = '';
       tokens[openBracketPos] =
        '(SELECT CASE WHEN COUNT(DISTINCT anytab.anycol) = 1 \
        THEN FIRST_VALUE(anytab.anycol) ELSE NULL END FROM (';     
       tokens[closeBracketPos] = ') as anytab(anycol) )';
    elseif ((operatorToken=='!=' or operatorToken=='<>')
        and anyOrAll=='ALL') then    
       -- special rebuild for != ALL
       -- rebuild to NOT IN
       tokens[operatorPos] = ' NOT IN '
       tokens[anyAllPos] = ''
       elseif ((operatorToken=='!=' or operatorToken=='<>') and
        (anyOrAll=='ANY' or anyOrAll=='SOME')) then
      --special rebuild for != ANY, rebuild to
      -- CASE WHEN (SELECT COUNT(DISTINCT <something>) FROM ...) == 1
      -- THEN operand != (SELECT FIRST_VALUE(<something>) FROM ...)
      -- ELSE operand IS NOT NULL END
      --note: This case would normally require to determine the operand
      --       which requires full understanding of a value expression
      --       in SQL standard which is nearly impossible in
      --       preprocessing (and very susceptible to errors)
      --       so we evaluate the
      --       SELECT COUNT(DISTINCT <something) FROM ...) == 1 here and
      --       insert the correct expression
      --
      -- first preprocess the inner query
      local queryText = table.concat(tokens,
                      '',
                            openBracketPos,
                      closeBracketPos)
      queryText = rebuildAnyAll( queryText )
      -- since the subquery was already processed we can continue
      -- searching *after* the SELECT
      searchStart = closeBracketPos + 1
      local distinctQueryText='SELECT COUNT(DISTINCT anytab.anycol) \
                     FROM '..queryText..' AS anytab(anycol)'
      local success, result = pquery(distinctQueryText)
      if (success) then
        if (result[1][1] == 1) then
           tokens[anyAllPos] ='(SELECT FIRST_VALUE(anytab.anycol) \
                    FROM '..queryText..' AS anytab(anycol))'
        else
           tokens[operatorPos] = ' IS NOT NULL '
           tokens[anyAllPos] = ''
        end
        -- clear all tokens of the SELECT
        for curTokenNr=openBracketPos,closeBracketPos do
          tokens[curTokenNr] = ''
        end    
          end
       end
       end
   until found == false;
   return table.concat(tokens);
 end
/

--/
CREATE SCRIPT sql_preprocessing.preprocessAnyAll AS
  import('sql_preprocessing.transformAnyAll', 'anyallparser');
  sqlparsing.setsqltext(
    anyallparser.rebuildAnyAll(sqlparsing.getsqltext()))
/
 
CREATE TABLE t1 (i INT);
INSERT INTO t1 VALUES 1,2,3,4,5,6,7,8,9,10;
CREATE TABLE t2 (j INT);
INSERT INTO t2 VALUES 5,6,7;

SELECT i FROM t1 WHERE i < ALL(SELECT j FROM t2);
Error: [0A000] Feature not supported: comparison with quantifier ALL

ALTER SESSION SET sql_preprocessor_script=
           sql_preprocessing.preprocessAnyAll;
SELECT i FROM t1 WHERE i < ALL(SELECT j FROM t2);
