Lua
This section provides the details about Lua
run() and cleanup() Methods
The method run() is called for each input tuple (SCALAR) or each group (SET). Its parameter is a kind of execution context and provides access to the data and the iterator in case of a SET script.
To initialize expensive steps (for example, opening external connections), you can write code outside the run() method, this code is executed once at the beginning by each virtual machine. For deinitialization purpose, the method cleanup() is called once for each virtual machine at the end of the execution.
Parameters
The internal Lua data types and the database SQL types are not identical. Therefore, casts must be done for the input and output data:
Datatype SQL | Datatype Lua |
---|---|
DOUBLE |
number |
DECIMAL and INTEGER |
decimal |
BOOLEAN |
boolean |
VARCHAR and CHAR |
string |
Other |
Not supported |
You can consider the details about Lua types described in Scripting, especially for the special type decimal.
NULL values are represented by the special constant NULL.
The input parameters can be addressed by their names, for example, ctx.my_input. ctx refers to the name of the context parameter that is passed to the run() method.
However, you can also use a dynamic number of parameters through the notation (...), for example, CREATE LUA SCALAR SCRIPT my_script (...). The parameters can then be accessed through an index, for example, ctx[1]. The number of parameters and their data types (both determined during the call of the script) are part of the metadata.
Metadata
The following metadata can be accessed via global variables:
Metadata | Description |
---|---|
exa.meta.database_name |
Database name |
exa.meta.database_version |
Database version |
exa.meta.script_language |
Name and version of the script language |
exa.meta.script_name |
Name of the script |
exa.meta.script_schema |
Schema in which the script is stored |
exa.meta.current_schema |
Schema which is currently opened |
exa.meta.script_code |
Code of the script |
exa.meta.session_id |
Session ID |
exa.meta.statement_id |
Statement ID within the session |
exa.meta.current_user |
Current user |
exa.meta.scope_user |
Scope user (current_user or the owner of a view if the udf script is called within a view) |
exa.meta.node_count |
Number of cluster nodes |
exa.meta.node_id |
Local node ID starting with 0 |
exa.meta.vm_id |
Unique ID for the local machine (the ids of the virtual machines have no relation to each other) |
exa.meta.input_type |
Type of the input data (SCALAR or SET) |
exa.meta.input_column_count |
Number of input columns |
exa.meta.input_columns[] |
Array including the following information: {name, type, sql_type, precision, scale, length} |
exa.meta.output_type |
Type of the output data (RETURNS or EMITS) |
exa.meta.output_column_count |
Number of output columns |
exa.meta.output_columns[] |
Array including the following information: {name, type, sql_type, precision, scale, length} |
Data Iterator
For scripts having multiple input tuples per call (keyword SET), you can iterate through that data using the method next(). The method is accessible through the context. Initially, the iterator points to the first input row. Therefore, a repeat...until loop can be used to iterate through the data (see examples).
If the input data is empty, then the run() method will not be called, and similar to aggregate functions, the NULL value is returned as result for example, SELECT MAX(x) FROM t WHERE false.
The method reset() resets the iterator to the first input element. If required for your algorithm, you can use it to do multiple iterations through the data.
The method size() returns the number of input values.
emit()
You can return multiple output tuples per call (keyword EMITS) through the method emit().
The method expects as many parameters as output columns were defined. In the case of dynamic output parameters, use an array using unpack(), for example, ctx.emit(unpack({1,"a"})).
Import of Other Scripts
You can import the other scripts through the method exa.import(), for example as shown below.
Syntax
Example
--/
CREATE OR REPLACE LUA SCALAR SCRIPT LIB.MYLIB() RETURNS INT AS
function helloWorld()
return "Hello Lua World!"
end
/
--/
CREATE OR REPLACE LUA SCALAR SCRIPT TEST.MYHELLOWORLD()
RETURNS VARCHAR(2000) AS
exa.import("LIB.MYLIB","MYLIB")
function run(ctx)
return MYLIB.helloWorld()
end
/
select TEST.MYHELLOWORLD();
You can also import scripting programs, however, their parameters will be ignored.
Accessing Connection Definitions
The data that has been specified when defining connections with CREATE CONNECTION is available in LUA UDF scripts through the method exa.get_connection("<connection_ name>"). The result is a Lua table with the following fields:
Fields | Description |
---|---|
type |
The type of the connection definition. Currently only the type PASSWORD is used. |
address |
The part of the connection definition that follows the TO keyword in the CREATE CONNECTION command. |
user |
The part of the connection definition that follows the USER keyword in the CREATE CONNECTION command. |
password |
The part of the connection definition that follows the IDENTIFIED BY keyword in the CREATE CONNECTION command. |
Auxiliary Libraries
The following table lists the supported libraries:
Libraries | Description |
---|---|
math | Mathematical calculations |
table | Table operations |
string | Text processing |
unicode.utf8 | Similar functions like string, but supports unicode data |
socket | Support for internet access through HTTP, FTP, or SMTP |
lxp | XML parsing |
cjson | JSON parsing |
For more information about the libraries, see Libraries.
The modules math, table, string, and unicode.utf8 are
already provided in the namespace. You can load the others through require().
You cannot install additional Lua libraries.
Dynamic Output Parameters Callback Function
If the UDF script is defined with dynamic output parameters and the output parameters cannot be determined (by specifying EMITS in the query or through INSERT INTO SELECT), you can implement the database call to the function default_output_columns(). The expected return value is a String with the names and types of the output columns, "a int, b varchar(100)". For an explanation of when the method is called, see Dynamic Input and Output Parameters. The method will be executed only once on a single node.
You can access metadata using exa.meta variables. For example, to find out the number and types of input columns.
User Defined Import Callback Function
To support a user defined import you can implement the callback method generate_sql_for_import_spec(import_spec). For syntax, see Dynamic Input and Output Parameters and the IMPORT statement. The parameter import_spec contains all information about the executed IMPORT FROM SCRIPT statement. The function has to generate and return a SELECT statement which will retrieve the data to be imported. For more information about using UDFs in IMPORT statements, see Import.
import_spec is a Lua table with the following fields:
Fields | Description |
---|---|
parameters | Parameters specified in the IMPORT statement. For example the parameter FOO can be obtained by accessing import_spec.parameters.FOO. The value nil is returned if the accessed parameter was not specified. |
is_subselect |
This is true, if the IMPORT is used inside a SELECT statement and not inside an IMPORT INTO table statement. |
subselect_column_names |
This is only defined, if is_subselect is true and the user specified the target column names and types. It returns a list of strings with the names of all specified columns. The value nil is returned if the target columns are not specified. |
subselect_column_types |
This is only defined, if is_subselect is true and the user specified the target columns and names. It returns a list of strings with the types of all specified columns. The types are returned in SQL format (for example, "VARCHAR(100)"). The value nil is returned if the target columns are not specified. |
connection_name |
This returns the name of the connection, if specified. Otherwise it returns nil. The UDF script can then obtain the connection information through exa.get_connection(name). |
connection | This is only defined, if the user provided connection information. It returns a Lua table similar to such that is returned by exa.get_connection(name). Returns nil if no connection information are specified. |
If a password was specified, it will be transferred into clear-text and might be shown in the logs. It is recommended to create a CONNECTION before and to specify only the connection name (can be obtained from the connection_name field). The actual connection information can be obtained through exa.get_connection(name).
User Defined Export Callback Function
To support a user defined export you can implement the callback method generate_sql_for_export_spec(export_spec). For syntax, see Dynamic Input and Output Parameters and the EXPORT statement. The parameter export_spec contains all information about the executed EXPORT INTO SCRIPT statement. The function has to generate and return a SELECT statement which will generate the data to be exported. The FROM part of that string can be a dummy table (for example, DUAL) since the export command is aware which table should be exported. But it has to be specified to be able to compile the SQL string successfully. For more information about using UDFs in EXPORT statements, see Export.
export_spec is a Lua table with the following fields:
Fields | Description |
---|---|
parameters |
Parameters specified in the EXPORT statement. For example the parameter FOO could be obtained by accessing export_spec.parameters.FOO. The value nil is returned if the accessed parameter was not specified. |
source_column_names | List of column names of the resulting table that should be exported. |
has_truncate | Boolean value from the EXPORT command option that defines whether the content of the target table should be truncated before the data transfer. |
has_replace | Boolean value from the EXPORT command option that defines whether the target table should be deleted before the data transfer. |
created_by | String value from the EXPORT command option that defines the creation text that is executed in the target system before the data transfer. |
connection_name |
This returns the name of the connection, if it was specified. Otherwise it returns nil. The UDF script can then obtain the connection information through exa.get_connection(name). |
connection |
This is defined only if the user provides connection information. It returns a Lua table similar to exa.get_connection(name). Returns nil if no connection information is specified. |
If a password was specified, it will be transferred into clear-text and might be shown in the logs. It is recommended to create a CONNECTION before and to specify only the connection name (can be obtained from the connection_name field). The actual connection information can be obtained through exa.get_connection(name).
Example
/*
This example loads from a webserver
and processes the following file goalies.xml:
<?xml version='1.0' encoding='UTF-8'?>
<users>
<user active="1">
<first_name>Manuel</first_name>
<last_name>Neuer</last_name>
</user>
<user active="1">
<first_name>Joe</first_name>
<last_name>Hart</last_name>
</user>
<user active="0">
<first_name>Oliver</first_name>
<last_name>Kahn</last_name>
</user>
</users>
*/
--/
CREATE LUA SCALAR SCRIPT process_users(url VARCHAR(500))
EMITS (firstname VARCHAR(20), lastname VARCHAR(20)) AS
require("lxp")
HTTP = require("socket.http")
local in_user_tag = false;
local in_first_name_tag = false;
local in_last_name_tag = false;
local current = {}
local users = {}
p = lxp.new(
{StartElement = function(p, tag, attr)
if tag == "user" and attr.active == "1" then in_user_tag = true; end
if tag == "first_name" then in_first_name_tag = true; end
if tag == "last_name" then in_last_name_tag = true; end
end,
EndElement = function(p, tag)
if tag == "user" then in_user_tag = false; end
if tag == "first_name" then in_first_name_tag = false; end
if tag == "last_name" then in_last_name_tag = false; end
end,
CharacterData = function(p, txt)
if in_user_tag then
if in_first_name_tag then current.first_name = txt; end
if in_last_name_tag then current.last_name = txt; end
end
if current.first_name and current.last_name then
users[#users+1] = current
current = {}
end
end})
function run(ctx)
content = http.request(ctx.url)
p:parse(content); p:parse(); p:close();
for i=1, #users do
ctx.emit(users[i].first_name, users[i].last_name)
end
end
/
SELECT process_users ('http://www.my_valid_webserver/goalies.xml')
FROM DUAL;