Lua
This article explains how to use Lua 5.4 for UDF scripting in Exasol.
For additional guidelines on using Lua with Exasol, see the Exasol Lua Style Guide.
For more information about the Lua language, refer to the official Lua documentation.
run() and cleanup() Methods
The method run()
is called for each input tuple (SCALAR
) or each group (SET
). Its
parameter is a kind of execution context and provides access to the data and the iterator in
case of a SET script.
To initialize expensive steps (for example, opening external connections), you can write code outside
the run()
method, this code is executed once at the beginning by each virtual machine.
For deinitialization purpose, the method cleanup()
is called once
for each virtual machine at the end of the execution.
Parameters
The internal Lua data types and the database SQL types are not identical. Therefore, casts must be done for the input and output data:
Datatype SQL | Datatype Lua |
---|---|
|
|
|
|
|
|
|
|
|
|
The decimal type is not a standard Lua datatype. For more details about the Lua datatypes, see General Script Language.
NULL
values are represented by the special constant NULL
.
The input parameters can be addressed by their names, for example, ctx.my_input
. ctx
refers to the name of the context parameter that is passed to the run()
method.
However, you can also use a dynamic number of parameters through the notation (...)
, for example, CREATE
LUA SCALAR SCRIPT my_script (...)
. The parameters can then be accessed
through an index, for example, ctx[1]
. The number of parameters and their data types (both determined
during the call of the script) are part of the metadata.
Metadata
The following metadata can be accessed via global variables:
Metadata | Description |
---|---|
|
Database name |
|
Database version |
|
Name and version of the script language |
|
Name of the script |
|
Schema in which the script is stored |
|
Schema that is currently opened |
|
Code of the script |
|
Session ID |
|
Statement ID within the session |
|
Current user |
|
Scope user ( |
|
Number of cluster nodes |
|
Local node ID starting with 0 |
|
Unique ID for the local machine (the IDs of the virtual machines have no relation to each other) |
|
Type of the input data ( |
|
Number of input columns |
|
Array including the following information: {name, type, sql_type, precision, scale, length} |
|
Type of the output data ( |
|
Number of output columns |
|
Array including the following information: {name, type, sql_type, precision, scale, length} |
Data iterator
For scripts having multiple input tuples per call (keyword SET), you can iterate through
that data using the method next()
. The method is accessible through the context. Initially, the iterator points to the first input row. Therefore, a repeat...until
loop can be used to iterate through the data (see examples).
If the input data is empty, then the run()
method will not be called,
and similar to aggregate functions, the NULL value is returned as
result for example, SELECT MAX(x) FROM t WHERE
false
.
The method reset()
resets the iterator to the first input element.
If required for your algorithm, you can use it to do multiple iterations through the data.
The method size()
returns the number of input values.
emit()
You can return multiple output tuples per call (keyword EMITS) through the method emit()
.
The method expects as many parameters as output columns were defined. In the case of
dynamic output parameters, use an array using unpack()
, for example, ctx.emit(unpack({1,"a"}))
.
Import other scripts
You can import the other scripts through the method exa.import()
, for example as shown below.
Syntax
Example
--/
CREATE OR REPLACE LUA SCALAR SCRIPT LIB.MYLIB() RETURNS INT AS
function helloWorld()
return "Hello Lua World!"
end
/
--/
CREATE OR REPLACE LUA SCALAR SCRIPT TEST.MYHELLOWORLD()
RETURNS VARCHAR(2000) AS
exa.import("LIB.MYLIB","MYLIB")
function run(ctx)
return MYLIB.helloWorld()
end
/
select TEST.MYHELLOWORLD();
You can also import scripting programs, however, their parameters will be ignored.
Access connection definitions
The data that has been specified when defining connections with CREATE CONNECTION
is available in LUA UDF
scripts through the method exa.get_connection("<connection_
name>")
. The result is a Lua table with the following fields:
Fields | Description |
---|---|
|
The type of the connection definition. Currently only the type |
|
The part of the connection definition that follows the |
|
The part of the connection definition that follows the |
|
The part of the connection definition that follows the |
Database interaction in adapter scripts
In adapter scripts the functions exa.query_no_preprocessing
and exa.pquery_no_preprocessing
can be used to execute queries. The functions work similar to the query and pquery functions, but with the following restrictions:
- Only allowed during the execution of the statements
CREATE VIRTUAL SCHEMA
,ALTER VIRTUAL SCHEMA
, andDROP VIRTUAL SCHEMA
, during the Virtual Schema API request typescreateVirtualSchema
,refresh
,setProperties
, anddropVirtualSchema
. - The queries always run with the privileges of the virtual schema owner.
- No preprocessor script is used for the queries. This makes the queries safe from modification by the user of the virtual schema.
- Only
SELECT
queries are allowed. - Not allowed in batch mode.
- No query caching.
Auxiliary libraries
The following table lists the supported libraries:
Libraries | Description |
---|---|
math
|
Mathematical calculations |
table
|
Table operations |
string
|
Text processing |
unicode.utf8
|
Similar functions like string, but supports unicode data |
socket
|
Support for internet access through HTTP, FTP, or SMTP |
lxp
|
XML parsing |
cjson |
JSON parsing |
For more information about the libraries, see Libraries.
The modules math
, table
, string
, and unicode.utf8
are
already provided in the namespace. You can load the others through require()
.
You cannot install additional Lua libraries.
Dynamic output parameters callback function
If the UDF script is defined with dynamic output parameters and the output parameters
cannot be determined (by specifying EMITS
in the query or through INSERT INTO SELECT
), you can implement the database call to the function default_output_columns()
. The expected return value is a String with the names and types of the output columns, "a int, b varchar(100)". For an explanation of when the method is called, see Dynamic input and output parameters. The method will be executed only once on a single node.
You can access metadata using exa.meta
variables. For example,
to find out the number and types of input columns.
User defined import callback function
To support a user defined import you can implement the callback method generate_sql_for_import_spec(import_spec)
. For syntax, see Dynamic input and output parameters and the IMPORT statement. The parameter import_spec
contains all information about the executed IMPORT FROM SCRIPT
statement. The
function has to generate and return a SELECT
statement which will retrieve the data
to be imported. For more information about using UDFs in IMPORT statements, see IMPORT.
import_spec
is a Lua table with the following fields:
Fields | Description |
---|---|
parameters
|
Parameters specified in the IMPORT statement. For
example the parameter FOO can be obtained by accessing
import_spec.parameters.FOO . The
value nil is returned if the accessed parameter was
not specified. |
is_subselect
|
This is true, if the |
subselect_column_names
|
This is only defined, if |
subselect_column_types
|
This is only defined, if |
connection_name
|
This returns the name of the connection, if
specified. Otherwise it returns |
connection
|
This is only defined, if the user provided connection
information. It returns a Lua table similar to such that
is returned by exa.get_connection(name) .
Returns nil if no connection information are specified. |
If a password was specified, it will be transferred into clear-text and
might be shown in the logs. It is recommended to create a CONNECTION
before and to specify only the connection name (can be
obtained from the connection_name
field). The actual connection
information can be obtained through exa.get_connection(name)
.
User defined export callback function
To support a user defined export you can implement the callback method generate_sql_for_export_spec(export_spec)
. For syntax, see Dynamic input and output parameters and the EXPORT statement. The parameter export_spec
contains all information about the executed EXPORT INTO SCRIPT
statement. The function has to generate and return a SELECT
statement which will generate the data to be exported. The FROM
part of that string can be a dummy table (for example, DUAL) since the export command is aware which table should be exported. But it has to be specified to be able to compile the SQL string successfully. For more information about using UDFs in EXPORT statements, see EXPORT.
export_spec
is a Lua table with the following fields:
Fields | Description |
---|---|
parameters
|
Parameters specified in the |
source_column_names
|
List of column names of the resulting table that should be exported. |
has_truncate
|
Boolean value from the EXPORT command option that defines whether the content of the target table should be truncated before the data transfer.
|
has_replace
|
Boolean value from the EXPORT command option that defines whether the target table should be deleted before the data transfer.
|
created_by
|
String value from the EXPORT command option that defines the creation text that is executed in the target system before the data transfer.
|
connection_name
|
This returns the name of the connection, if it was specified. Otherwise it returns |
connection
|
This is defined only if the user provides connection information. It returns a Lua table similar to |
If a password was specified, it will be transferred into clear-text and
might be shown in the logs. It is recommended to create a CONNECTION
before and to specify only the connection name (can be
obtained from the connection_name
field). The actual connection
information can be obtained through exa.get_connection(name)
.
Example
/*
This example loads from a webserver
and processes the following file goalies.xml:
<?xml version='1.0' encoding='UTF-8'?>
<users>
<user active="1">
<first_name>Manuel</first_name>
<last_name>Neuer</last_name>
</user>
<user active="1">
<first_name>Joe</first_name>
<last_name>Hart</last_name>
</user>
<user active="0">
<first_name>Oliver</first_name>
<last_name>Kahn</last_name>
</user>
</users>
*/
--/
CREATE LUA SCALAR SCRIPT process_users(url VARCHAR(500))
EMITS (firstname VARCHAR(20), lastname VARCHAR(20)) AS
require("lxp")
HTTP = require("socket.http")
local in_user_tag = false;
local in_first_name_tag = false;
local in_last_name_tag = false;
local current = {}
local users = {}
p = lxp.new(
{StartElement = function(p, tag, attr)
if tag == "user" and attr.active == "1" then in_user_tag = true; end
if tag == "first_name" then in_first_name_tag = true; end
if tag == "last_name" then in_last_name_tag = true; end
end,
EndElement = function(p, tag)
if tag == "user" then in_user_tag = false; end
if tag == "first_name" then in_first_name_tag = false; end
if tag == "last_name" then in_last_name_tag = false; end
end,
CharacterData = function(p, txt)
if in_user_tag then
if in_first_name_tag then current.first_name = txt; end
if in_last_name_tag then current.last_name = txt; end
end
if current.first_name and current.last_name then
users[#users+1] = current
current = {}
end
end})
function run(ctx)
content = http.request(ctx.url)
p:parse(content); p:parse(); p:close();
for i=1, #users do
ctx.emit(users[i].first_name, users[i].last_name)
end
end
/
SELECT process_users ('http://www.my_valid_webserver/goalies.xml')
FROM DUAL;
Adapter script callback function
For virtual schemas an adapter script must define the function adapter_call(request_json)
. The parameter is a string in JSON containing the Virtual Schema API request. The return value must also be a string in JSON containing the response. For the Virtual Schema API Documentation, see Information for Developers. The callback function will be executed only on a single node.