SQL Preprocessor

Exasol provides an SQL Preprocessor that can preprocess all executed SQL statements. Using the preprocessor, unsupported SQL constructs can be transformed into existing SQL features (see examples below). Additionally, you can introduce syntactic sugar by replacing simple constructs with more complex elements.

By default, the SQL Preprocessor is deactivated. Using the statements ALTER SESSION and ALTER SYSTEM, you can define a script for a session or for the whole system that is responsible for the preprocessing of all SQL commands. Before an SQL statement is passed to the actual database compiler, the preprocessor does a text transformation. Within the script, you can get and set the original text and manipulate it by using the auxiliary library (sqlparsing). For details of scripting language, see Scripting.

The SQL Preprocessor provides you flexibility to extend the SQL language of Exasol. However, you should also be cautious before activating such an SQL manipulation throughout the system. The SQL statements may stop working if it is not handled correctly.

You can deactivate the preprocessing using the statements ALTER SESSION and ALTER SYSTEM (by setting the parameter SQL_PREPROCESSOR_SCRIPT to the NULL value), these statements are deliberately excluded from the preprocessing. For data security reasons, all statements which include passwords are excluded (CREATE USER, ALTER USER, CREATE CONNECTION, ALTER CONNECTION, IMPORT, EXPORT if the IDENTIFIED BY clause was specified).

In the auditing table EXA_DBA_AUDIT_SQL, a separate entry for the execution of the preprocessor script is added (EXECUTE SCRIPT and the original text within a comment). The executed transformed SQL statement is listed in another entry.

Library sqlparsing

Exasol provides a library with parsing capabilities for SQL text manipulation.

  • Split into tokens
    • tokenize()
  • Identification of token types
    • iswhitespace()
    • iscomment()
    • iswhitespaceorcomment()
    • isidentifier()
    • iskeyword()
    • isstringliteral()
    • isnumericliteral()
    • isany()
  • Normalizing a string
    • normalize()
  • Finding token sequences
    • find()
  • Access to the SQL text
    • getsqltext()
    • setsqltext()

Details

sqlparsing.tokenize(sqlstring)

Splits an input string into an array of strings which correspond to the tokens recognized by the database compiler. If you concatenate these tokens, you will get the original input string (including upper/lower case, line breaks, whitespaces, and so on). Therefore, the equation table.concat(tokens)==sqlstring is valid.

The possible tokens are as follows:

  • Valid SQL identifiers, for example, test.tab or "foo"."bar"
  • Keywords, for example, SELECT
  • String literals, for example, 'abcdef'
  • Numerical literals without sign, for example, 123e4
  • Connected whitespaces corresponding to the SQL standard
  • Comments corresponding to the SQL standard (line and block comments)
  • Multi-character tokens, for example, ':=', '::', '||', '->', '>>', '<<', '>=', '<=', '<>', '!=', '^='
  • Single-character tokens, for example, '+', '-', '/', '*', '~'
--/
CREATE SCRIPT example(sql_text) AS
    local tokens = sqlparsing.tokenize(sql_text)
    for i=1,#tokens do
        print(tokens[i])  
    end
/
 
EXECUTE SCRIPT example('SELECT dummy FROM dual') WITH OUTPUT;

OUTPUT
---------------------
SELECT
 
dummy
 
FROM
 
dual

sqlparsing.iscomment(tokenstring)

Returns whether the input string is a comment token.

sqlparsing.iswhitespace(tokenstring)

Returns whether the input string is a whitespace token.

sqlparsing.iswhitespaceorcomment(tokenstring)

Returns whether the input string is a whitespace or comment token. This function can be useful for function find(), because it filters all irrelevant tokens corresponding to the SQL standard.

sqlparsing.isidentifier(tokenstring)

Returns whether the input string is an identifier token.

sqlparsing.iskeyword(tokenstring)

Returns whether the input string is a SQL keyword token (for example, SELECT, FROM, TABLE). The functions isidentifier() and iskeyword() return both true for non-reserved keywords. Therefore, you can identify non-reserved keywords.

sqlparsing.isstringliteral(tokenstring)

Returns whether the input string is a string literal token.

sqlparsing.isnumericliteral(tokenstring)

Returns whether the input string is a numeric literal token.

sqlparsing.isany(tokenstring)

Always returns true. This can be useful if you want to find any first relevant token (as match function within the method find()).

sqlparsing.normalize(tokenstring)

Returns a normalized string for similar representations (upper/lower case identifiers), on the basis of the following rules:

  • Regular identifiers are transformed into upper-case letters, for example, dual -> DUAL
  • Keywords are transformed in upper-case letters, for example, From -> FROM
  • Whitespace-Token of any size are replaced by a single whitespace
  • In numerical literals, an optional lower-case 'e' is replaced by 'E', for example, 1.2e34 -> 1.2E34

sqlparsing.find(tokenlist, startTokenNr, searchForward, searchSameLevel, ignoreFunction, match1, [match2, ... matchN])

Searches in the token list, starting from positions startTokenNr, forward or backward (searchForward), and optionally only within the current level of brackets (searchSameLevel), for the directly successive sequence of tokens which are matched by parameters match1, ... matchN. In that search process, all tokens that match by function ignoreFunction will be not considered by the match functions.

If the searched token sequence is found, then an array of size N is returned (in case of N match elements) whose X-th entry contains the position of the token within the token list which was matched by matchX. If the token sequence was not found, the function returns nil.

The details of the parameters are as follows:

tokenlist

List of tokens which is produced by the function tokenize.

startTokenNr

Number of the first token to be considered for the search.

searchForward

Defines whether the search should be applied forward (true) or backward (false). This affects only the direction by that the search process is moving across the list, but the match functions always search forward.

That means that if you search the token sequence KEYWORD, IDENTIFIER, within the token list 'select' 'abc' 'from' 'dual', and start from position 3, then 'from' 'dual' will be matched and not 'from' 'abc', even when searching backward. If you start your search at position 2, then the backward search will return 'select' 'abc', and the forward search will return 'from' 'dual'.

searchSameLevel

Defines whether the search should be limited to the current level of brackets (true) or also beyond (false). This applies only to the match of the first token of the sequence. Subsequent tokens can also be located in more inner bracket levels. That means that the search of the token sequence '=' '(' 'SELECT' is also possible if it is constrained to the current level, although the 'SELECT' is located in the next inner bracket level. The option searchSameLevel is especially useful for finding the corresponding closing bracket, for example, of a subquery.

Example: Search the closing bracket within the token sequence 'SELECT' 't1.x' '+' '(' 'SELECT' 'min' '(' 'y' ')' 'FROM' 't2' ')' 'FROM' 't1' which corresponds to the bracket at position 4: sqlparsing.find(tokens, 4, true, true, sqlparsing. iswhitespaceorcomment, ')').

ignoreFunction

A function of type function(string)->bool is expected. The tokens for which the function ignoreFunction returns true will be ignored by the match functions. That means, you can specify tokens types which may occur within the sequence without breaking the match. In many cases, the function iswhitespaceorcomment is useful for that purpose.

match1...matchN

By match1..matchN, the searched token sequence is specified. These parameters should either be functions of type function(tokenstring)->bool or simple strings. A token sequence is searched where the first token matches match1, the second matches match2, and so on, while tokens in between are ignored if function ignoreFunction returns true. If a parameter is a string, then the comparison normalize(tokenstring)== normalize(matchstring) is applied.

sqlparsing.getsqltext()

Returns the current SQL statement text.

This function is only available within the main SQL Preprocessor script.

sqlparsing.setsqltext(string)

Sets the SQL statement text to a new value which will be passed to the database compiler for execution.

This function is only available within the main SQL Preprocessor script.

Best Practice

For better management of the SQL Preprocessor, you should consider the followings:

  • Extensively test a preprocessor script in your own session before activating it throughout the system.
  • Implement the SQL processing by using separate auxiliary scripts and integrated in one main script which is just a wrapper to hand over the SQL text (for example. sqlparsing.setsqltext(myscript.preprocess( sqlparsing.getsqltext()))). This is because the functions getsqltext() and setsqltext() are only available within the preprocessing and not in normal script executions. By the separation you can test the processing on several test SQL constructs (on your own daily SQL history stored within a table) before activating the main script as preprocessor script.
  • Ensure that all necessary privileges are granted to execute the preprocessor script. It is recommended that you start a test with a user without special rights. Otherwise, certain user groups could be blocked for executing any SQL statements.
  • Make the preprocessing as simple as possible. Especially, query() and pquery() should only be used in exceptional cases if you activate the preprocessing globally, because all SQL queries will be decelerated, and a parallel access on similar tables increases the risk of transaction conflicts.

Examples

This section shows some examples for Preprocessor scripts that explain the functionality.

IF() function

In this example, the IF() function (currently not supported in Exasol) is transformed into an equivalent CASEWHEN expression.

--/
CREATE SCRIPT transformIf() AS
   function processIf(sqltext)
    while (true) do
	local tokens = sqlparsing.tokenize(sqltext)
	local ifStart = sqlparsing.find(tokens,
					1,
					true,
					false,
					sqlparsing.iswhitespaceorcomment,
					'IF',
					'(' )
	if (ifStart==nil) then
	  break;
	end
	local ifEnd = sqlparsing.find(tokens,
				ifStart[2],
				true,
				true,
				sqlparsing.iswhitespaceorcomment,
				')')
	if (ifEnd==nil) then
	  error("if statement not ended properly")
	  break;
	end
	local commas1 = sqlparsing.find(tokens,
					ifStart[2]+1,
					true,
					true,
					sqlparsing.iswhitespaceorcomment,
					',' )
	if (commas1==nil) then
	  error("invalid if function")
	  break;
	end
	local commas2 = sqlparsing.find(tokens,
					commas1[1]+1,
					true,
					true,
					sqlparsing.iswhitespaceorcomment,
					',' )
	if (commas2==nil) then
	  error("invalid if function")
	  break;
	end
	local ifParam1=table.concat(tokens, '', ifStart[2]+1, commas1[1]-1)
	local ifParam2=table.concat(tokens, '', commas1[1]+1, commas2[1]-1)
	local ifParam3=table.concat(tokens, '', commas2[1]+1, ifEnd[1]-1)
	local caseStmt='CASE WHEN ('..ifParam1..') != 0 \
		THEN ('..ifParam2..') \
		ELSE ('..ifParam3..') END '
	sqltext=table.concat(tokens, '',1,
		ifStart[1]-1)..caseStmt..table.concat(tokens,
					'',
					ifEnd[1]+1)
	end
	return sqltext
end
/ 

--/
CREATE SCRIPT sql_preprocessing.preprocessIf() AS
   import( 'sql_preprocessing.transformIf', 'transformIf' )
   sqlparsing.setsqltext(
	transformIf.processIf(sqlparsing.getsqltext()))
/
 
SELECT IF( 3+4 > 5, 6, 7 ) from dual;
Error: [42000] syntax error, unexpected IF_ [line 1, column 8]

ALTER SESSION SET sql_preprocessor_script=
		   sql_preprocessing.preprocessIf;
SELECT IF( 3+4 > 5, 6, 7 ) AS col1 FROM dual;

COL1
----
6

ls command

In this example, Unix command ls is transferred to the database. This command returns either the list of all objects within a schema or the list of all schemas if no schema is opened. Additionally, you can apply filters (case insensitive) for example, ls '%name%' to display all objects whose name contains the text 'name'.

--/
CREATE SCRIPT sql_preprocessing.addunixcommands() AS
  function processLS(input, tokens, commandPos)
	local result = query("SELECT CURRENT_SCHEMA")
	local current_schema = result[1][1]
	local returnText = ""
	local searchCol = ""
	if (current_schema==null) then
	   returnText = "SELECT schema_name FROM exa_schemas WHERE true"
	   searchCol = "schema_name"
	elseif (current_schema=='SYS' or current_schema=='EXA_STATISTICS') then
	   returnText = "SELECT object_name, object_type FROM exa_syscat \
			 WHERE schema_name='"..current_schema.."'"
	   searchCol = "object_name"
	else
	   returnText = "SELECT object_name, object_type FROM exa_all_objects \
			 WHERE root_type='SCHEMA' \
	     		 AND root_name='"..current_schema.."'"
	   searchCol = "object_name"
	end
	local addFilters = {}
	local lastValid = commandPos
	local foundPos = sqlparsing.find(tokens,
					lastValid+1,
					true,
					false,
					sqlparsing.iswhitespaceorcomment,
					sqlparsing.isany)
	while (not(foundPos==nil) )
	do
	   local foundToken = tokens[foundPos[1]]
	   if (sqlparsing.isstringliteral(foundToken)) then
		addFilters[#addFilters+1] = "UPPER("..searchCol..") \
					      LIKE UPPER("..foundToken .. ")"
	   elseif (not (sqlparsing.normalize(foundToken) == ';')) then
	      error("only string literals allowed as arguments for ls,\
		   but found '"..foundToken.."'")
	   end
	   lastValid = foundPos[1]
	   foundPos = sqlparsing.find(tokens,
				      lastValid+1,
				      true,
				      false,
				      sqlparsing.iswhitespaceorcomment,
				      sqlparsing.isany)	
      end
      if ( #addFilters > 0 ) then
	   local filterText = table.concat(addFilters, " OR ")
	   return returnText.." AND ("..filterText..")".." ORDER BY "..searchCol
      else
	   return returnText.." ORDER BY "..searchCol
      end
  end
 
  function processUnixCommands(input)
     local tokens = sqlparsing.tokenize(input)
     local findResult = sqlparsing.find(tokens,
					1,
					true,
					false,
					sqlparsing.iswhitespaceorcomment,
					sqlparsing.isany)
     if (findResult==nil) then
	return input
     end
     local command = tokens[findResult[1]]
     if (sqlparsing.normalize( command )=='LS') then
	return processLS(input, tokens, findResult[1])
     end    
     return input;
end
/
 
--/
CREATE SCRIPT sql_preprocessing.preprocessWithUnixTools() AS
   import( 'sql_preprocessing.addunixcommands', 'unixCommands' )
   sqlparsing.setsqltext(
	unixCommands.processUnixCommands(sqlparsing.getsqltext()));
/
 
ALTER SESSION SET sql_preprocessor_script=
		sql_preprocessing.preprocessWithUnixTools;

OPEN SCHEMA sql_preprocessing;
LS '%unix%';
 
OBJECT_NAME 		    OBJECT_TYPE
-------------------------- ---------------
ADDUNIXCOMMANDS 	    SCRIPT
PREPROCESSWITHUNIXTOOLS    SCRIPT

CLOSE SCHEMA;
LS;
 
SCHEMA_NAME
--------------------------
SCHEMA_1
SCHEMA_2
SCHEMA_3
SQL_PREPROCESSING

ANY/ALL

ANY and ALL SQL constructs are currently not supported by Exasol. However, using the following script, you can add this functionality.

--/
CREATE SCRIPT sql_preprocessing.transformAnyAll() AS
    function rebuildAnyAll(inputsqltext)
	local sqltext = inputsqltext;
	local tokens = sqlparsing.tokenize(sqltext);
	local found = true;
	local searchStart = 1;
	-- search for sequence >|>=|<|<= ANY|ALL ( SELECT
	repeat
	    local foundPositions =
		sqlparsing.find(tokens,
		    searchStart,
		    true,
		    false,
		    sqlparsing.iswhitespaceorcomment,
		    function (token)
		      return (token=='<' or token=='<=' or token=='>'
			or token=='>=' or token=='!=' or token=='<>'
			or token=='=');
		    end, -- match <|<=|>|>=|=|!=|<>
	 	    function ( token )
		      local normToken = sqlparsing.normalize(token);
		      return (normToken=='ANY' or normToken=='SOME'
			or normToken=='ALL');
		   end, -- match ANY|ALL
		   '(', -- match (
		   'SELECT' -- match SELECT
		);
	if (foundPositions==nil) then
	    found = false;
	    break;
	end
	local operatorPos = foundPositions[1];
	local anyAllPos = foundPositions[2];
	local openBracketPos = foundPositions[3];
	searchStart = anyAllPos + 1
	foundPositions = sqlparsing.find(tokens,
					openBracketPos,
					true,
					true,
					sqlparsing.iswhitespaceorcomment,
					')');
	if (foundPositions ~= nil) then
	   local closeBracketPos = foundPositions[1]
	   local operatorToken = tokens[operatorPos];
	   local anyOrAll = sqlparsing.normalize(tokens[anyAllPos]);
	   if (operatorToken=='<' or operatorToken=='<='	
		or operatorToken=='>' or operatorToken=='>=') then
		-- now we have <|<=|>|>= ANY|ALL (SELECT <something> FROM
		-- rebuild to <|<=|>|>= (SELECT MIN|MAX(<something>) FROM
		local setfunction = 'MIN';
		if ( ((anyOrAll=='ANY' or anyOrAll=='SOME') and
		      (operatorToken=='<' or operatorToken=='<=')
		     ) or
		      (anyOrAll=='ALL' and (operatorToken=='>'
					     or operatorToken=='>=')
		     )
		) then
		setfunction = 'MAX';		
	   end
	   tokens[anyAllPos] = '';
	   tokens[openBracketPos] =
		'(SELECT ' .. setfunction .. '(anytab.anycol) FROM (';
	   tokens[closeBracketPos] = ') as anytab(anycol) )';
       elseif (operatorToken=='=' and anyOrAll=='ALL') then
	   -- special rebuild for = ALL
	   -- rebuild to=(SELECT CASE WHEN COUNT(DISTINCT <something>)==1
	   -- 		THEN FIRST_VALUE(<something>) ELSE NULL END FROM
	   tokens[anyAllPos] = '';
	   tokens[openBracketPos] =
		'(SELECT CASE WHEN COUNT(DISTINCT anytab.anycol) = 1 \
		THEN FIRST_VALUE(anytab.anycol) ELSE NULL END FROM (';	 
	   tokens[closeBracketPos] = ') as anytab(anycol) )';
	elseif ((operatorToken=='!=' or operatorToken=='<>')
		and anyOrAll=='ALL') then	
	   -- special rebuild for != ALL
	   -- rebuild to NOT IN
	   tokens[operatorPos] = ' NOT IN '
	   tokens[anyAllPos] = ''
       elseif (operatorToken=='!=' and
		(anyOrAll=='ANY' or anyOrAll=='SOME')) then
	  --special rebuild for != ANY, rebuild to
	  -- CASE WHEN (SELECT COUNT(DISTINCT <something>) FROM ...) == 1
	  -- THEN operand != (SELECT FIRST_VALUE(<something>) FROM ...)
	  -- ELSE operand IS NOT NULL END
	  --note: This case would normally require to determine the operand
	  -- 	  which requires full understanding of a value expression
	  -- 	  in SQL standard which is nearly impossible in
	  -- 	  preprocessing (and very susceptible to errors)
	  -- 	  so we evaluate the
	  -- 	  SELECT COUNT(DISTINCT <something) FROM ...) == 1 here and
	  -- 	  insert the correct expression
	  --
	  -- first preprocess the inner query
	  local queryText = table.concat(tokens,
					  '',
				      	  openBracketPos,
					  closeBracketPos)
	  queryText = rebuildAnyAll( queryText )
	  -- since the subquery was already processed we can continue
	  -- searching *after* the SELECT
	  searchStart = closeBracketPos + 1
	  local distinctQueryText='SELECT COUNT(DISTINCT anytab.anycol) \
	  			   FROM '..queryText..' AS anytab(anycol)'
	  local success, result = pquery(distinctQueryText)
	  if (success) then
		if (result[1][1] == 1) then
		   tokens[anyAllPos] ='(SELECT FIRST_VALUE(anytab.anycol) \
					FROM '..queryText..' AS anytab(anycol))'
		else
		   tokens[operatorPos] = ' IS NOT NULL '
		   tokens[anyAllPos] = ''
		end
		-- clear all tokens of the SELECT
		for curTokenNr=openBracketPos,closeBracketPos do
		  tokens[curTokenNr] = ''
		end	
	      end
	   end
       end
   until found == false;
   return table.concat(tokens);
 end
/
 
--/
CREATE SCRIPT sql_preprocessing.preprocessAnyAll AS
  import('sql_preprocessing.transformAnyAll', 'anyallparser');
  sqlparsing.setsqltext(
    anyallparser.rebuildAnyAll(sqlparsing.getsqltext()))
/
 
CREATE TABLE t1 (i INT);
INSERT INTO t1 VALUES 1,2,3,4,5,6,7,8,9,10;
CREATE TABLE t2 (j INT);
INSERT INTO t2 VALUES 5,6,7;

SELECT i FROM t1 WHERE i < ALL(SELECT j FROM t2);
Error: [0A000] Feature not supported: comparison with quantifier ALL

ALTER SESSION SET sql_preprocessor_script=
		   sql_preprocessing.preprocessAnyAll;
SELECT i FROM t1 WHERE i < ALL(SELECT j FROM t2);

I
-------------------
		   1
		   2
		   3
		   4