Lua

This article explains how to use Lua 5.4 for UDF scripting in Exasol.

For additional guidelines on using Lua with Exasol, see the Exasol Lua Style Guide.

For more information about the Lua language, refer to the official Lua documentation.

run() and cleanup() Methods

The method run() is called for each input tuple (SCALAR) or each group (SET). Its parameter is a kind of execution context and provides access to the data and the iterator in case of a SET script.

To initialize expensive steps (for example, opening external connections), you can write code outside the run() method, this code is executed once at the beginning by each virtual machine. For deinitialization purpose, the method cleanup() is called once for each virtual machine at the end of the execution.

Parameters

The internal Lua data types and the database SQL types are not identical. Therefore, casts must be done for the input and output data:

Datatype SQL  Datatype Lua

DOUBLE

number

DECIMAL and INTEGER

decimal

BOOLEAN

boolean

VARCHAR and CHAR

string

Other

Not supported

The decimal type is not a standard Lua datatype. For more details about the Lua datatypes, see General Script Language.

NULL values are represented by the special constant NULL.

The input parameters can be addressed by their names, for example, ctx.my_input. ctx refers to the name of the context parameter that is passed to the run() method.

However, you can also use a dynamic number of parameters through the notation (...), for example, CREATE LUA SCALAR SCRIPT my_script (...). The parameters can then be accessed through an index, for example, ctx[1]. The number of parameters and their data types (both determined during the call of the script) are part of the metadata.

Metadata

The following metadata can be accessed via global variables:

Metadata Description

exa.meta.database_name

Database name

exa.meta.database_version

Database version

exa.meta.script_language

Name and version of the script language

exa.meta.script_name

Name of the script

exa.meta.script_schema

Schema in which the script is stored

exa.meta.current_schema

Schema that is currently opened

exa.meta.script_code

Code of the script

exa.meta.session_id

Session ID

exa.meta.statement_id

Statement ID within the session

exa.meta.current_user

Current user

exa.meta.scope_user

Scope user (current_user or the owner of a view if the UDF script is called within a view)

exa.meta.node_count

Number of cluster nodes

exa.meta.node_id

Local node ID starting with 0

exa.meta.vm_id

Unique ID for the local machine (the IDs of the virtual machines have no relation to each other)

exa.meta.input_type

Type of the input data (SCALAR or SET)

exa.meta.input_column_count

Number of input columns

exa.meta.input_columns[]

Array including the following information: {name, type, sql_type, precision, scale, length}

exa.meta.output_type

Type of the output data (RETURNS or EMITS)

exa.meta.output_column_count

Number of output columns

exa.meta.output_columns[]

Array including the following information: {name, type, sql_type, precision, scale, length}

Data iterator

For scripts having multiple input tuples per call (keyword SET), you can iterate through that data using the method next(). The method is accessible through the context. Initially, the iterator points to the first input row. Therefore, a repeat...until loop can be used to iterate through the data (see examples).

If the input data is empty, then the run() method will not be called, and similar to aggregate functions, the NULL value is returned as result for example, SELECT MAX(x) FROM t WHERE false.

The method reset() resets the iterator to the first input element. If required for your algorithm, you can use it to do multiple iterations through the data.

The method size() returns the number of input values.

emit()

You can return multiple output tuples per call (keyword EMITS) through the method emit().

The method expects as many parameters as output columns were defined. In the case of dynamic output parameters, use an array using unpack(), for example, ctx.emit(unpack({1,"a"})).

Import other scripts

You can import the other scripts through the method exa.import(), for example as shown below.

Syntax
exa.import("<schema>.<script>", "<alias>")
Example
--/
CREATE OR REPLACE LUA SCALAR SCRIPT LIB.MYLIB() RETURNS INT AS
function helloWorld()
  return "Hello Lua World!"
end
/

--/
CREATE OR REPLACE LUA SCALAR SCRIPT TEST.MYHELLOWORLD() 
   RETURNS VARCHAR(2000) AS

exa.import("LIB.MYLIB","MYLIB")

function run(ctx)
  return MYLIB.helloWorld()
end
/

select TEST.MYHELLOWORLD();

You can also import scripting programs, however, their parameters will be ignored.

Access connection definitions

The data that has been specified when defining connections with CREATE CONNECTION is available in LUA UDF scripts through the method exa.get_connection("<connection_ name>"). The result is a Lua table with the following fields:

Fields Description

type

The type of the connection definition. Currently only the type PASSWORD is used.

address

The part of the connection definition that follows the TO keyword in the CREATE CONNECTION command.

user

The part of the connection definition that follows the USER keyword in the CREATE CONNECTION command.

password

The part of the connection definition that follows the IDENTIFIED BY keyword in the CREATE CONNECTION command.

Database interaction in adapter scripts

In adapter scripts the functions exa.query_no_preprocessing and exa.pquery_no_preprocessing can be used to execute queries. The functions work similar to the query and pquery functions, but with the following restrictions:

  • Only allowed during the execution of the statements CREATE VIRTUAL SCHEMA, ALTER VIRTUAL SCHEMA, and DROP VIRTUAL SCHEMA, during the Virtual Schema API request types createVirtualSchema, refresh, setProperties, and dropVirtualSchema.
  • The queries always run with the privileges of the virtual schema owner.
  • No preprocessor script is used for the queries. This makes the queries safe from modification by the user of the virtual schema.
  • Only SELECT queries are allowed.
  • Not allowed in batch mode.
  • No query caching.

Auxiliary libraries

The following table lists the supported libraries:

Libraries Description
math Mathematical calculations
table Table operations
string Text processing
unicode.utf8 Similar functions like string, but supports unicode data
socket Support for internet access through HTTP, FTP, or SMTP
lxp XML parsing
cjson JSON parsing

For more information about the libraries, see Libraries.

The modules math, table, string, and unicode.utf8 are already provided in the namespace. You can load the others through require().
You cannot install additional Lua libraries.

Dynamic output parameters callback function

If the UDF script is defined with dynamic output parameters and the output parameters cannot be determined (by specifying EMITS in the query or through INSERT INTO SELECT), you can implement the database call to the function default_output_columns(). The expected return value is a String with the names and types of the output columns, "a int, b varchar(100)". For an explanation of when the method is called, see Dynamic input and output parameters. The method will be executed only once on a single node.

You can access metadata using exa.meta variables. For example, to find out the number and types of input columns.

User defined import callback function

To support a user defined import you can implement the callback method generate_sql_for_import_spec(import_spec). For syntax, see Dynamic input and output parameters and the IMPORT statement. The parameter import_spec contains all information about the executed IMPORT FROM SCRIPT statement. The function has to generate and return a SELECT statement which will retrieve the data to be imported. For more information about using UDFs in IMPORT statements, see IMPORT.

import_spec is a Lua table with the following fields:

Fields Description
parameters Parameters specified in the IMPORT statement. For example the parameter FOO can be obtained by accessing import_spec.parameters.FOO. The value nil is returned if the accessed parameter was not specified.
is_subselect

This is true, if the IMPORT is used inside a SELECT statement and not inside an IMPORT INTO table statement.

subselect_column_names

This is only defined, if is_subselect is true and the user specified the target column names and types. It returns a list of strings with the names of all specified columns. The value nil is returned if the target columns are not specified.

subselect_column_types

This is only defined, if is_subselect is true and the user specified the target columns and names. It returns a list of strings with the types of all specified columns. The types are returned in SQL format (for example, "VARCHAR(100)"). The value nil is returned if the target columns are not specified.

connection_name

This returns the name of the connection, if specified. Otherwise it returns nil. The UDF script can then obtain the connection information through exa.get_connection(name).

connection This is only defined, if the user provided connection information. It returns a Lua table similar to such that is returned by exa.get_connection(name). Returns nil if no connection information are specified.

If a password was specified, it will be transferred into clear-text and might be shown in the logs. It is recommended to create a CONNECTION before and to specify only the connection name (can be obtained from the connection_name field). The actual connection information can be obtained through exa.get_connection(name).

User defined export callback function

To support a user defined export you can implement the callback method generate_sql_for_export_spec(export_spec). For syntax, see Dynamic input and output parameters and the EXPORT statement. The parameter export_spec contains all information about the executed EXPORT INTO SCRIPT statement. The function has to generate and return a SELECT statement which will generate the data to be exported. The FROM part of that string can be a dummy table (for example, DUAL) since the export command is aware which table should be exported. But it has to be specified to be able to compile the SQL string successfully. For more information about using UDFs in EXPORT statements, see EXPORT.

export_spec is a Lua table with the following fields:

Fields Description
parameters

Parameters specified in the EXPORT statement. For example the parameter FOO could be obtained by accessing export_spec.parameters.FOO. The value nil is returned if the accessed parameter was not specified.

source_column_names List of column names of the resulting table that should be exported.
has_truncate Boolean value from the EXPORT command option that defines whether the content of the target table should be truncated before the data transfer.
has_replace Boolean value from the EXPORT command option that defines whether the target table should be deleted before the data transfer.
created_by String value from the EXPORT command option that defines the creation text that is executed in the target system before the data transfer.
connection_name

This returns the name of the connection, if it was specified. Otherwise it returns nil. The UDF script can then obtain the connection information through exa.get_connection(name).

connection

This is defined only if the user provides connection information. It returns a Lua table similar to exa.get_connection(name). Returns nil if no connection information is specified.

If a password was specified, it will be transferred into clear-text and might be shown in the logs. It is recommended to create a CONNECTION before and to specify only the connection name (can be obtained from the connection_name field). The actual connection information can be obtained through exa.get_connection(name).

Example
/*
  This example loads from a webserver
     and processes the following file goalies.xml:

     <?xml version='1.0' encoding='UTF-8'?>
     <users>
         <user active="1">
             <first_name>Manuel</first_name>
             <last_name>Neuer</last_name>
         </user>
         <user active="1">
             <first_name>Joe</first_name>
             <last_name>Hart</last_name>
         </user>
         <user active="0">
             <first_name>Oliver</first_name>
             <last_name>Kahn</last_name>
         </user>
     </users>
*/

--/
CREATE LUA SCALAR SCRIPT process_users(url VARCHAR(500))
EMITS (firstname VARCHAR(20), lastname VARCHAR(20)) AS
require("lxp")
HTTP = require("socket.http")
local in_user_tag = false;
local in_first_name_tag = false;
local in_last_name_tag = false;
local current = {}
local users = {}

p = lxp.new(
    {StartElement = function(p, tag, attr)
      if tag == "user" and attr.active == "1" then in_user_tag = true; end
      if tag == "first_name" then in_first_name_tag = true; end
      if tag == "last_name" then in_last_name_tag = true; end
    end,
    EndElement = function(p, tag)
      if tag == "user" then in_user_tag = false; end
      if tag == "first_name" then in_first_name_tag = false; end
      if tag == "last_name" then in_last_name_tag = false; end
    end,
    CharacterData = function(p, txt)
      if in_user_tag then
        if in_first_name_tag then current.first_name = txt; end
        if in_last_name_tag then current.last_name = txt; end
      end
      if current.first_name and current.last_name then
        users[#users+1] = current
        current = {}
      end
    end})
    
function run(ctx)
    content = http.request(ctx.url)
    p:parse(content); p:parse(); p:close();
    for i=1, #users do
      ctx.emit(users[i].first_name, users[i].last_name)
    end
end
/

SELECT process_users ('http://www.my_valid_webserver/goalies.xml')
FROM DUAL;

Adapter script callback function

For virtual schemas an adapter script must define the function adapter_call(request_json). The parameter is a string in JSON containing the Virtual Schema API request. The return value must also be a string in JSON containing the response. For the Virtual Schema API Documentation, see Information for Developers. The callback function will be executed only on a single node.