Lua
This article explains how to use Lua 5.4 for UDF scripting in Exasol.
Lua is integrated in Exasol and has the smallest process overhead of all the supported scripting languages. If overall performance is the most important criteria, we recommend that you use Lua.
For additional guidelines on using Lua with Exasol, see the Exasol Lua Style Guide on GitHub.
For more information about the Lua language, refer to the official Lua documentation.
Parameters
The internal Lua data types and the database SQL types are not identical, which means that casts must be done for the input and output data. Data types that are not included in the following table are not supported.
SQL data type | Lua data type |
---|---|
|
|
|
|
INTEGER
|
decimal
The decimal type is not a standard Lua data type. For more details about Lua data types, see General Script Language. |
|
|
|
|
CHAR
|
string
|
NULL
values are represented by the special constant NULL
.
The input parameters can be addressed by their names. For example, ctx.my_input
. ctx
refers to the name of the context parameter that is passed to the run()
method. You can also use a dynamic number of parameters through the notation (...)
(for example, CREATE LUA SCALAR SCRIPT my_script (...)
). The parameters can then be accessed
through an index (for example, ctx[1]
).
The number of parameters and their respective data types are determined when the script is called, and are part of the metadata.
Metadata
The following metadata can be accessed through global variables:
Metadata | Description |
---|---|
|
Database name |
|
Database version |
|
Name and version of the script language |
|
Name of the script |
|
Schema in which the script is stored |
|
Schema that is currently opened |
|
Code of the script |
|
Session ID |
|
Statement ID within the session |
|
Current user |
|
Scope user ( |
|
Number of cluster nodes |
|
Local node ID starting with 0 |
|
Unique ID for the local machine (the IDs of the virtual machines have no relation to each other) |
|
Type of the input data ( |
|
Number of input columns |
|
Array including the following information: {name, type, sql_type, precision, scale, length} |
|
Type of the output data ( |
|
Number of output columns |
|
Array including the following information: {name, type, sql_type, precision, scale, length} |
Methods run() and cleanup()
The method run()
is called for each input tuple (SCALAR) or for each group (SET). Its
parameter is a kind of execution context and provides access to the data and the iterator in
case of a SET script.
To initialize expensive steps (for example, opening external connections), you can write code outside
the run()
method, this code is executed once at the beginning by each virtual machine.
For deinitialization purpose, the method cleanup()
is called once
for each virtual machine at the end of the execution.
Method emit()
You can use the method emit()
to return multiple output tuples per call (keyword EMITS).
This method expects as many parameters as output columns were defined. In the case of
dynamic output parameters, use an array using unpack()
, for example, ctx.emit(unpack({1,"a"}))
.
Data iterator
For scripts that have multiple input tuples per call (keyword SET), you can iterate through
that data using the method next()
which is accessible through the context. The iterator initially points to the first input row. A repeat...until
loop can thus be used to iterate through the data.
If the input data is empty, the method run()
is not called and the return value is NULL
. For example, SELECT MAX(x) FROM t WHERE
false
.
The method reset()
resets the iterator to the first input element.
If required for your algorithm, you can use this method to do multiple iterations through the data.
The method size()
returns the number of input values.
Import other scripts
You can import other scripts using the method exa.import()
.
Scripting programs can also be imported, but their parameters will be ignored.
Syntax
Example
--/
CREATE OR REPLACE LUA SCALAR SCRIPT LIB.MYLIB() RETURNS INT AS
function helloWorld()
return "Hello Lua World!"
end
/
--/
CREATE OR REPLACE LUA SCALAR SCRIPT TEST.MYHELLOWORLD()
RETURNS VARCHAR(2000) AS
exa.import("LIB.MYLIB","MYLIB")
function run(ctx)
return MYLIB.helloWorld()
end
/
select TEST.MYHELLOWORLD();
Access connection definitions
The data specified when you define a connection with CREATE CONNECTION
is available in LUA UDF
scripts through the method exa.get_connection("<connection_
name>")
. The result is a Lua table with the following fields:
Field | Description |
---|---|
|
The type of the connection definition. Currently only the type |
|
The part of the connection definition that follows the |
|
The part of the connection definition that follows the |
|
The part of the connection definition that follows the |
Database interaction in adapter scripts
In adapter scripts, the functions exa.query_no_preprocessing
and exa.pquery_no_preprocessing
can be used to execute queries. These functions work similar to the query and pquery functions, but with the following restrictions:
- The functions are only allowed during the execution of the statements
CREATE VIRTUAL SCHEMA
,ALTER VIRTUAL SCHEMA
, andDROP VIRTUAL SCHEMA
, and during the virtual schema API request typescreateVirtualSchema
,refresh
,setProperties
, anddropVirtualSchema
. - The queries always run with the privileges of the virtual schema owner.
- No preprocessor script is used for the queries. This makes the queries safe from modification by the user of the virtual schema.
- Only
SELECT
queries are allowed. - Batch mode is not allowed.
- Query caching is not used.
Auxiliary libraries and modules
The following Lua libraries and modules are supported:
Library/module | Description |
---|---|
math
|
Mathematical calculations |
table
|
Table operations |
string
|
Text processing |
unicode.utf8
|
Similar to string , but supports Unicode encoding. |
luasocket
|
LuaSocket is a library that provides support for the TCP and UDP transport layers, and contains a set of modules that add support for the HTTP, FTP, and SMTP protocols for applications that communicate over the internet. |
luaexpat
|
LuaExpat is a SAX XML parser based on the Expat library that contains the lxp module. |
lua-cjson |
Lua CJSON is a module that provides JSON support for Lua. |
luaossl
|
LuaOSSL is an OpenSSL module for Lua. |
luasec
|
LuaSec is a binding for OpenSSL to provide TLS/SSL communication. It takes an established TCP connection and creates a secure session between the peers. |
The math
, table
, string
, and unicode.utf8
modules are
already provided in the namespace. To load the other modules, use require()
. It is not possible to install additional Lua libraries.
Dynamic output parameters callback function
If the UDF script is defined with dynamic output parameters and the output parameters
cannot be determined by specifying EMITS
in the query or through INSERT INTO SELECT
, you can use the function default_output_columns()
. The expected return value is a string with the names and types of the output columns, for example: "a int, b varchar(100)"
.
You can access metadata such as the number and types of input columns with the callback function by using exa.meta
variables.
The method will be executed only once on a single node.
For more information, see Dynamic input and output parameters.
User defined import callback function
To support a user defined import, you can implement the callback method generate_sql_for_import_spec(import_spec)
. For an explanation of the syntax, see Dynamic input and output parameters and IMPORT.
The parameter import_spec
contains all information about the executed IMPORT FROM SCRIPT
statement. The
function has to generate and return a SELECT
statement which will retrieve the data
to be imported.
For more information about using UDFs in IMPORT statements, see Using IMPORT with UDFs.
import_spec
is a Lua table with the following fields:
Field | Description |
---|---|
parameters
|
Returns parameters specified in the Returns |
is_subselect
|
Boolean which is is true if the |
subselect_column_names
|
Returns a list of strings with the names of all specified columns. Only defined if Returns |
subselect_column_types
|
Returns
a list of strings with the types of all specified
columns in SQL format. For example, Only defined if Returns |
connection_name
|
Returns the name of the connection. The UDF script
can then obtain the connection information through Returns |
connection
|
Returns a Lua table similar to that
returned by Returns |
If you specify a password it will be transmitted as cleartext, which means it will be visible in the logs. We recommend that you always create a CONNECTION
first and then specify only the connection name, which can then be
obtained from the connection_name
field. The actual connection
information can be obtained through exa.get_connection(name)
.
User defined export callback function
To support a user defined export you can implement the callback method generate_sql_for_export_spec(export_spec)
. For syntax, see Dynamic input and output parameters and the EXPORT statement.
The parameter export_spec
contains all information about the executed EXPORT INTO SCRIPT
statement. The function has to generate and return a SELECT
statement which will generate the data to be exported. Because the export command is aware which table should be exported, the FROM
part of the string can be a dummy table (for example, DUAL
), but it must be specified for the SQL string to be successfully compiled.
For more information about using UDFs in EXPORT statements, see Using EXPORT with UDFs.
export_spec
is a Lua table with the following fields:
Fields | Description |
---|---|
parameters
|
Returns parameters specified in the Returns |
source_column_names
|
List of column names of the resulting table that should be exported. |
has_truncate
|
Boolean value from the EXPORT command option that defines whether the content of the target table should be truncated before the data transfer.
|
has_replace
|
Boolean value from the EXPORT command option that defines whether the target table should be deleted before the data transfer.
|
created_by
|
String value from the EXPORT command option that defines the creation text that is executed in the target system before the data transfer.
|
connection_name
|
Returns the name of the connection. The UDF script
can then obtain the connection information through Only defined if the user provided connection information, otherwise it returns nil. Returns |
connection
|
Returns a Lua table similar to that
returned by Only defined if the user provided connection information. Returns |
If you specify a password it will be transmitted as cleartext, which means it will be visible in the logs. We recommend that you always create a CONNECTION
first and then specify only the connection name, which can then be
obtained from the connection_name
field. The actual connection
information can be obtained through exa.get_connection(name)
.
Example 1
This example shows how to convert an XML document to structured data using the XML parser module.
CREATE OR REPLACE TABLE XML_USERS(xml VARCHAR(2000));
INSERT INTO XML_USERS VALUES(
'<?xml version="1.0" encoding="UTF-8"?>
<users>
<user active="1">
<first_name>Manuel</first_name>
<last_name>Neuer</last_name>
</user>
<user active="1">
<first_name>Joe</first_name>
<last_name>Hart</last_name>
</user>
<user active="0">
<first_name>Oliver</first_name>
<last_name>Kahn</last_name>
</user>
</users>');
CREATE OR REPLACE LUA SCALAR SCRIPT process_users(xml VARCHAR(2000))
EMITS (firstname VARCHAR(20), lastname VARCHAR(20)) AS
require("lxp")
local in_user_tag = false;
local in_first_name_tag = false;
local in_last_name_tag = false;
local current = {}
local users = {}
p = lxp.new(
{StartElement = function(p, tag, attr)
if tag == "user" and attr.active == "1" then in_user_tag = true; end
if tag == "first_name" then in_first_name_tag = true; end
if tag == "last_name" then in_last_name_tag = true; end
end,
EndElement = function(p, tag)
if tag == "user" then in_user_tag = false; end
if tag == "first_name" then in_first_name_tag = false; end
if tag == "last_name" then in_last_name_tag = false; end
end,
CharacterData = function(p, txt)
if in_user_tag then
if in_first_name_tag then current.first_name = txt; end
if in_last_name_tag then current.last_name = txt; end
end
if current.first_name and current.last_name then
users[#users+1] = current
current = {}
end
end})
function run(ctx)
p:parse(ctx.xml); p:parse(); p:close();
for i=1, #users do
ctx.emit(users[i].first_name, users[i].last_name)
end
end
/
SELECT process_users(xml) from xml_users;
Example 2
This example shows how to implement secure communication with an external service. The database opens a secure HTTPS communication channel to a server, then sends a request and processes the response from the server.
CREATE OR REPLACE LUA SCALAR SCRIPT server_response(url VARCHAR(500))
EMITS (response VARCHAR(10000)) AS
socket = require("socket")
ssl = require("ssl")
-- HTTP simple response parse
-- Status line ( HTTP/1.0 200 OK )
-- Headers (multiple headers are allowed, Content-Length handling, etc)
-- Body
function parse_http_response(response)
local status_line, headers, body = "", {}, ""
local header_data = {}
local body_start_index = 1
-- Extract status line
local header_end_index = response:find("\r\n")
status_line = response:sub(1, header_end_index - 1)
body_start_index = header_end_index + 2
-- Extract headers
while true do
local line_end = response:find("\r\n", body_start_index)
if not line_end or line_end == body_start_index then
body_start_index = line_end and (line_end + 2) or body_start_index
break
end
local line = response:sub(body_start_index, line_end - 1)
body_start_index = line_end + 2
local key, value = line:match("^(.-):%s*(.*)")
if key and value then
local lower_key = key:lower()
headers[lower_key] = value
header_data[#header_data + 1] = line
end
end
-- Read body
if headers["transfer-encoding"] == "chunked" then
local chunks = {}
while true do
local chunk_size_end = response:find("\r\n", body_start_index)
if not chunk_size_end then break end
local chunk_size_str = response:sub(body_start_index, chunk_size_end - 1)
local chunk_size = tonumber(chunk_size_str, 16)
body_start_index = chunk_size_end + 2
if chunk_size == 0 then break end
local chunk = response:sub(body_start_index, body_start_index + chunk_size - 1)
table.insert(chunks, chunk)
body_start_index = body_start_index + chunk_size + 2 -- Skip over chunk and CRLF
end
body = table.concat(chunks)
elseif headers["content-length"] then
local content_length = tonumber(headers["content-length"])
body = response:sub(body_start_index, body_start_index + content_length - 1)
else
body = response:sub(body_start_index)
end
return {
status_line = status_line,
headers = headers,
header_data = table.concat(header_data, "\n"),
body = body
}
end
-- Function to handle HTTPS requests (using socket.tcp and ssl.wrap)
function simple_https_request(request_params)
local parsed_url = require("socket.url").parse(request_params.url)
local peer = socket.tcp()
peer:connect(parsed_url.host, parsed_url.port or 443)
-- TLS/SSL client parameters
local params = {
mode = "client",
protocol = "tlsv1_2",
verify = "none", -- No verification
options = "all",
}
-- TLS/SSL initialization
local peer = assert( ssl.wrap(peer, params) )
assert(peer:dohandshake())
local request = string.format("%s %s HTTP/1.1\r\nHost: %s\r\n",
request_params.method or "GET",
parsed_url.path or "/",
parsed_url.host)
-- Add headers
for k,v in pairs(request_params.headers or {})do
request = request .. string.format("%s: %s\r\n", k, v)
end
request = request .. "Connection: close\r\n\r\n"
peer:send(request)
-- Read response
local raw_response = {}
while true do
local chunk, status, partial = peer:receive(1024)
if chunk then
table.insert(raw_response,chunk)
elseif partial and #partial>0 then
table.insert(raw_response,partial)
break
end
if status == "closed" then break end
end
peer:close()
return parse_http_response(table.concat(raw_response))
end
function run(ctx)
local request_params={
url = ctx.url,
method = "GET",
headers = {["User-Agent"]="Simple LuaSocket HTTPS Client"},
}
response = simple_https_request(request_params)
ctx.emit(response.status_line)
ctx.emit(response.header_data)
ctx.emit(string.sub(response.body,1,100))
end
/
SELECT server_response ('https://www.example.com') FROM DUAL;
Adapter script callback function
For virtual schemas, an adapter script must define the function adapter_call(request_json)
. The parameter is a JSON string containing the virtual schema API request. The return value must also be a JSON string containing the response. The callback function will be executed only on a single node.
For documentation and examples for the virtual schema API, see Information for Developers on GitHub.