Lua

This article explains how to use Lua 5.4 for UDF scripting in Exasol.

Lua is integrated in Exasol and has the smallest process overhead of all the supported scripting languages. If overall performance is the most important criteria, we recommend that you use Lua.

For additional guidelines on using Lua with Exasol, see the Exasol Lua Style Guide on GitHub.

For more information about the Lua language, refer to the official Lua documentation.

Parameters

The internal Lua data types and the database SQL types are not identical, which means that casts must be done for the input and output data. Data types that are not included in the following table are not supported.

SQL data type Lua data type

DOUBLE

number

DECIMAL

decimal

INTEGER decimal

The decimal type is not a standard Lua data type. For more details about Lua data types, see General Script Language.

BOOLEAN

boolean

VARCHAR

string

CHAR string

NULL values are represented by the special constant NULL.

The input parameters can be addressed by their names. For example, ctx.my_input. ctx refers to the name of the context parameter that is passed to the run() method. You can also use a dynamic number of parameters through the notation (...) (for example, CREATE LUA SCALAR SCRIPT my_script (...)). The parameters can then be accessed through an index (for example, ctx[1]).

The number of parameters and their respective data types are determined when the script is called, and are part of the metadata.

Metadata

The following metadata can be accessed through global variables:

Metadata Description

exa.meta.database_name

Database name

exa.meta.database_version

Database version

exa.meta.script_language

Name and version of the script language

exa.meta.script_name

Name of the script

exa.meta.script_schema

Schema in which the script is stored

exa.meta.current_schema

Schema that is currently opened

exa.meta.script_code

Code of the script

exa.meta.session_id

Session ID

exa.meta.statement_id

Statement ID within the session

exa.meta.current_user

Current user

exa.meta.scope_user

Scope user (current_user or the owner of a view if the UDF script is called within a view)

exa.meta.node_count

Number of cluster nodes

exa.meta.node_id

Local node ID starting with 0

exa.meta.vm_id

Unique ID for the local machine (the IDs of the virtual machines have no relation to each other)

exa.meta.input_type

Type of the input data (SCALAR or SET)

exa.meta.input_column_count

Number of input columns

exa.meta.input_columns[]

Array including the following information: {name, type, sql_type, precision, scale, length}

exa.meta.output_type

Type of the output data (RETURNS or EMITS)

exa.meta.output_column_count

Number of output columns

exa.meta.output_columns[]

Array including the following information: {name, type, sql_type, precision, scale, length}

Methods run() and cleanup()

The method run() is called for each input tuple (SCALAR) or for each group (SET). Its parameter is a kind of execution context and provides access to the data and the iterator in case of a SET script.

To initialize expensive steps (for example, opening external connections), you can write code outside the run() method, this code is executed once at the beginning by each virtual machine. For deinitialization purpose, the method cleanup() is called once for each virtual machine at the end of the execution.

Method emit()

You can use the method emit() to return multiple output tuples per call (keyword EMITS).

This method expects as many parameters as output columns were defined. In the case of dynamic output parameters, use an array using unpack(), for example, ctx.emit(unpack({1,"a"})).

Data iterator

For scripts that have multiple input tuples per call (keyword SET), you can iterate through that data using the method next() which is accessible through the context. The iterator initially points to the first input row. A repeat...until loop can thus be used to iterate through the data.

If the input data is empty, the method run() is not called and the return value is NULL. For example, SELECT MAX(x) FROM t WHERE false.

The method reset() resets the iterator to the first input element. If required for your algorithm, you can use this method to do multiple iterations through the data.

The method size() returns the number of input values.

Import other scripts

You can import other scripts using the method exa.import().

Scripting programs can also be imported, but their parameters will be ignored.

Syntax
exa.import("<schema>.<script>", "<alias>")
Example
--/
CREATE OR REPLACE LUA SCALAR SCRIPT LIB.MYLIB() RETURNS INT AS
function helloWorld()
  return "Hello Lua World!"
end
/

--/
CREATE OR REPLACE LUA SCALAR SCRIPT TEST.MYHELLOWORLD() 
   RETURNS VARCHAR(2000) AS

exa.import("LIB.MYLIB","MYLIB")

function run(ctx)
  return MYLIB.helloWorld()
end
/

select TEST.MYHELLOWORLD();

Access connection definitions

The data specified when you define a connection with CREATE CONNECTION is available in LUA UDF scripts through the method exa.get_connection("<connection_ name>"). The result is a Lua table with the following fields:

Field Description

type

The type of the connection definition. Currently only the type PASSWORD is used.

address

The part of the connection definition that follows the TO keyword in the CREATE CONNECTION command.

user

The part of the connection definition that follows the USER keyword in the CREATE CONNECTION command.

password

The part of the connection definition that follows the IDENTIFIED BY keyword in the CREATE CONNECTION command.

Database interaction in adapter scripts

In adapter scripts, the functions exa.query_no_preprocessing and exa.pquery_no_preprocessing can be used to execute queries. These functions work similar to the query and pquery functions, but with the following restrictions:

  • The functions are only allowed during the execution of the statements CREATE VIRTUAL SCHEMA, ALTER VIRTUAL SCHEMA, and DROP VIRTUAL SCHEMA, and during the virtual schema API request types createVirtualSchema, refresh, setProperties, and dropVirtualSchema.
  • The queries always run with the privileges of the virtual schema owner.
  • No preprocessor script is used for the queries. This makes the queries safe from modification by the user of the virtual schema.
  • Only SELECT queries are allowed.
  • Batch mode is not allowed.
  • Query caching is not used.

Auxiliary libraries and modules

The following Lua libraries and modules are supported:

Library/module Description
math Mathematical calculations
table Table operations
string Text processing
unicode.utf8 Similar to string, but supports Unicode encoding.
luasocket LuaSocket is a library that provides support for the TCP and UDP transport layers, and contains a set of modules that add support for the HTTP, FTP, and SMTP protocols for applications that communicate over the internet.
luaexpat LuaExpat is a SAX XML parser based on the Expat library that contains the lxp module.
lua-cjson Lua CJSON is a module that provides JSON support for Lua.
luaossl LuaOSSL is an OpenSSL module for Lua.
luasec LuaSec is a binding for OpenSSL to provide TLS/SSL communication. It takes an established TCP connection and creates a secure session between the peers.

The math, table, string, and unicode.utf8 modules are already provided in the namespace. To load the other modules, use require(). It is not possible to install additional Lua libraries.

Dynamic output parameters callback function

If the UDF script is defined with dynamic output parameters and the output parameters cannot be determined by specifying EMITS in the query or through INSERT INTO SELECT, you can use the function default_output_columns(). The expected return value is a string with the names and types of the output columns, for example: "a int, b varchar(100)".

You can access metadata such as the number and types of input columns with the callback function by using exa.meta variables.

The method will be executed only once on a single node.

For more information, see Dynamic input and output parameters.

User defined import callback function

To support a user defined import, you can implement the callback method generate_sql_for_import_spec(import_spec). For an explanation of the syntax, see Dynamic input and output parameters and IMPORT.

The parameter import_spec contains all information about the executed IMPORT FROM SCRIPT statement. The function has to generate and return a SELECT statement which will retrieve the data to be imported.

For more information about using UDFs in IMPORT statements, see Using IMPORT with UDFs.

import_spec is a Lua table with the following fields:

Field Description
parameters

Returns parameters specified in the IMPORT statement. For example, the parameter FOO can be obtained by accessing import_spec.parameters.FOO.

Returns nil If a parameter was not specified.

is_subselect

Boolean which is is true if the IMPORT is used inside a SELECT statement and not inside an IMPORT INTO table statement.

subselect_column_names

Returns a list of strings with the names of all specified columns.

Only defined if is_subselect is true and the user specified the target column names and types.

Returns nil if target columns are not specified.

subselect_column_types

Returns a list of strings with the types of all specified columns in SQL format. For example, "VARCHAR(100)".

Only defined if is_subselect is true and the user specified the target columns and names.

Returns nil if target columns are not specified.

connection_name

Returns the name of the connection. The UDF script can then obtain the connection information through exa.get_connection(name).

Returns nil if connection information is not specified.

connection

Returns a Lua table similar to that returned by exa.get_connection(name).

Returns nil if connection information is not specified.

If you specify a password it will be transmitted as cleartext, which means it will be visible in the logs. We recommend that you always create a CONNECTION first and then specify only the connection name, which can then be obtained from the connection_name field. The actual connection information can be obtained through exa.get_connection(name).

User defined export callback function

To support a user defined export you can implement the callback method generate_sql_for_export_spec(export_spec). For syntax, see Dynamic input and output parameters and the EXPORT statement.

The parameter export_spec contains all information about the executed EXPORT INTO SCRIPT statement. The function has to generate and return a SELECT statement which will generate the data to be exported. Because the export command is aware which table should be exported, the FROM part of the string can be a dummy table (for example, DUAL), but it must be specified for the SQL string to be successfully compiled.

For more information about using UDFs in EXPORT statements, see Using EXPORT with UDFs.

export_spec is a Lua table with the following fields:

Fields Description
parameters

Returns parameters specified in the EXPORT statement. For example, the parameter FOO could be obtained by accessing export_spec.parameters.FOO.

Returns nil if the accessed parameter was not specified.

source_column_names List of column names of the resulting table that should be exported.
has_truncate Boolean value from the EXPORT command option that defines whether the content of the target table should be truncated before the data transfer.
has_replace Boolean value from the EXPORT command option that defines whether the target table should be deleted before the data transfer.
created_by String value from the EXPORT command option that defines the creation text that is executed in the target system before the data transfer.
connection_name

Returns the name of the connection. The UDF script can then obtain the connection information through exa.get_connection(name).

Only defined if the user provided connection information, otherwise it returns nil.

Returns nil if connection information is not specified.

connection

Returns a Lua table similar to that returned by exa.get_connection(name).

Only defined if the user provided connection information.

Returns nil if connection information is not specified.

If you specify a password it will be transmitted as cleartext, which means it will be visible in the logs. We recommend that you always create a CONNECTION first and then specify only the connection name, which can then be obtained from the connection_name field. The actual connection information can be obtained through exa.get_connection(name).

Example 1

This example shows how to convert an XML document to structured data using the XML parser module.

CREATE OR REPLACE TABLE XML_USERS(xml VARCHAR(2000));
INSERT INTO XML_USERS VALUES(
'<?xml version="1.0" encoding="UTF-8"?>
<users>
<user active="1">
<first_name>Manuel</first_name>
<last_name>Neuer</last_name>
</user>
<user active="1">
<first_name>Joe</first_name>
<last_name>Hart</last_name>
</user>
<user active="0">
<first_name>Oliver</first_name>
<last_name>Kahn</last_name>
</user>
</users>');

CREATE OR REPLACE LUA SCALAR SCRIPT process_users(xml VARCHAR(2000))
EMITS (firstname VARCHAR(20), lastname VARCHAR(20)) AS

require("lxp")
local in_user_tag = false;
local in_first_name_tag = false;
local in_last_name_tag = false;
local current = {}
local users = {}

p = lxp.new(
    {StartElement = function(p, tag, attr)
        if tag == "user" and attr.active == "1" then in_user_tag = true; end
        if tag == "first_name" then in_first_name_tag = true; end
        if tag == "last_name" then in_last_name_tag = true; end
    end,
    EndElement = function(p, tag)
        if tag == "user" then in_user_tag = false; end
        if tag == "first_name" then in_first_name_tag = false; end
        if tag == "last_name" then in_last_name_tag = false; end
    end,
    CharacterData = function(p, txt)
        if in_user_tag then
        if in_first_name_tag then current.first_name = txt; end
        if in_last_name_tag then current.last_name = txt; end
    end
    if current.first_name and current.last_name then
        users[#users+1] = current
        current = {}
    end
    end})

function run(ctx)
    p:parse(ctx.xml); p:parse(); p:close();
    for i=1, #users do
        ctx.emit(users[i].first_name, users[i].last_name)
    end
end
/

SELECT process_users(xml) from xml_users;
Example 2

This example shows how to implement secure communication with an external service. The database opens a secure HTTPS communication channel to a server, then sends a request and processes the response from the server.

CREATE OR REPLACE LUA SCALAR SCRIPT server_response(url VARCHAR(500))
EMITS (response VARCHAR(10000)) AS

socket = require("socket")
ssl = require("ssl")
                      

-- HTTP simple response parse
--  Status line ( HTTP/1.0 200 OK )
--  Headers (multiple headers are allowed, Content-Length handling, etc) 
--  Body  
function parse_http_response(response)
    local status_line, headers, body = "", {}, ""
    local header_data = {}
    local body_start_index = 1

    -- Extract status line
    local header_end_index = response:find("\r\n")
    status_line = response:sub(1, header_end_index - 1)
    body_start_index = header_end_index + 2

    -- Extract headers
    while true do
        local line_end = response:find("\r\n", body_start_index)
        if not line_end or line_end == body_start_index then
            body_start_index = line_end and (line_end + 2) or body_start_index
            break
        end

        local line = response:sub(body_start_index, line_end - 1)
        body_start_index = line_end + 2

        local key, value = line:match("^(.-):%s*(.*)")
        if key and value then
            local lower_key = key:lower()
            headers[lower_key] = value
            header_data[#header_data + 1] = line
        end
    end

    -- Read body
    if headers["transfer-encoding"] == "chunked" then
        local chunks = {}
        while true do
            local chunk_size_end = response:find("\r\n", body_start_index)
            if not chunk_size_end then break end

            local chunk_size_str = response:sub(body_start_index, chunk_size_end - 1)
            local chunk_size = tonumber(chunk_size_str, 16)
            body_start_index = chunk_size_end + 2

            if chunk_size == 0 then break end

            local chunk = response:sub(body_start_index, body_start_index + chunk_size - 1)
            table.insert(chunks, chunk)
            body_start_index = body_start_index + chunk_size + 2  -- Skip over chunk and CRLF
        end
        body = table.concat(chunks)
    elseif headers["content-length"] then
        local content_length = tonumber(headers["content-length"])
        body = response:sub(body_start_index, body_start_index + content_length - 1)
    else
        body = response:sub(body_start_index)
    end

    return {
        status_line = status_line,
        headers = headers,
        header_data = table.concat(header_data, "\n"),
        body = body
    }
end

-- Function to handle HTTPS requests (using socket.tcp and ssl.wrap)
function simple_https_request(request_params)
    local parsed_url = require("socket.url").parse(request_params.url)

    local peer = socket.tcp()
    peer:connect(parsed_url.host, parsed_url.port or 443)

    -- TLS/SSL client parameters
    local params = {
      mode = "client",
      protocol = "tlsv1_2",
      verify = "none", -- No verification
      options = "all",
    }

    -- TLS/SSL initialization
    local peer = assert( ssl.wrap(peer, params) )
    assert(peer:dohandshake())

    local request = string.format("%s %s HTTP/1.1\r\nHost: %s\r\n"
        request_params.method or "GET",
        parsed_url.path or "/",
        parsed_url.host)

    -- Add headers
    for k,v in pairs(request_params.headers or {})do
        request = request .. string.format("%s: %s\r\n", k, v)
    end

    request = request .. "Connection: close\r\n\r\n"

    peer:send(request)

    -- Read response
    local raw_response = {}
    while true do
        local chunk, status, partial = peer:receive(1024)
        if chunk then
            table.insert(raw_response,chunk)
        elseif partial and #partial>0 then
            table.insert(raw_response,partial)
            break
        end
        if status == "closed" then break end
    end

    peer:close()

    return parse_http_response(table.concat(raw_response)) 
end

function run(ctx)

    local request_params={
        url = ctx.url,
        method = "GET",
        headers = {["User-Agent"]="Simple LuaSocket HTTPS Client"},
    }

    response = simple_https_request(request_params)
    ctx.emit(response.status_line)
    ctx.emit(response.header_data)
    ctx.emit(string.sub(response.body,1,100))

end
/

SELECT server_response ('https://www.example.com') FROM DUAL;

Adapter script callback function

For virtual schemas, an adapter script must define the function adapter_call(request_json). The parameter is a JSON string containing the virtual schema API request. The return value must also be a JSON string containing the response. The callback function will be executed only on a single node.

For documentation and examples for the virtual schema API, see Information for Developers on GitHub.