Java
This article explains how to use Java for UDF scripting in Exasol.
The following describes using Java 11 with OpenJDK 11. For more information about the Java language, refer to the Official Java Documentation.
Java main class
Guidelines for using Java:
-
Do not use inline Java. It is difficult to test, has no IDE support, and is slow.
-
Always use separate JARs for your Java UDF scripts.
-
If you build an external JAR, it depends on the UDF script's API for Java. You must include the UDF script's API with the Java package. That package can be found on Exasol's artifactory.
-
By default, the script main class (which includes the methods
run()
and optionallyinit()
andcleanup()
) must be named exactly like the name of the script. Make sure to consider the general rules for identifiers. -
You can also specify the script class explicitly using the keyword
%scriptclass
. For example,%scriptclass com.mycompany.MyScriptClass;
).
All classes that are defined directly in the script go implicitly inside the
Java package package com.exasol
. The statement package
com.exasol;
is implicitly added to the beginning of the script code.
All callback functions (run()
, init()
, cleanup()
, getDefaultOutputColumns()
,
and generateSqlForImportSpec()
) have to be
implemented within the script main class.
Exasol UDF API for Java
The Exasol UDF API for Java library facilitates the development of Java code. All released versions of the library artifact as well as examples of use with different package managers can be found on Maven Central.
The library is used by specifying the groupId, artifactId, and version as a dependency in your project.
Examples
Replace the version number used in the following examples (1.0.2) with the latest released version. For information about the latest version number, refer to Maven Central.
Maven:
<dependency>
<groupId>com.exasol</groupId>
<artifactId>udf-api-java</artifactId>
<version>1.0.2</version>
</dependency>
Gradle (Groovy):
Gradle (Kotlin):
An example of a complete Maven POM file can be found here: https://github.com/exasol/virtual-schema-common-java/blob/main/pom.xml.
API documentation
The full documentation for the API is available here: https://exasol.github.io/udf-api-java/com/exasol/package-summary.html.
run(), init() and cleanup() methods
The method static <Type> run(ExaMetadata, ExaIterator)
is called for each input
tuple (SCALAR) or each group (SET). Its parameter is a metadata object (ExaMetadata) and an iterator
for the data access (ExaIterator).
To initialize expensive steps (opening external connections), you can define the method static
void init(ExaMetadata)
. This will be executed once at the start by each virtual machine. For deinitialization purposes, the method static void cleanup(ExaMetadata)
exists which
is called once for each virtual machine, at the end of execution.
Data Iterator
The following methods are provided in the class ExaIterator:
next()
emit(Object... values)
reset()
size()
getInteger(String name)
andgetInteger(int column)
getLong(String name)
andgetLong(int column)
getBigDecimal(String name)
andgetBigDecimal(int column)
getDouble(String name)
andgetDouble(int column)
getString(String name)
andgetString(int column)
getBoolean(String name)
andgetBoolean(int column)
getDate(String name)
andgetDate(int column)
getTimestamp(String name)
andgetTimestamp(int column)
Usage notes
-
If the input data is empty, the
run()
method will not be called and the NULL value is returned as the result (similar to aggregate functions). For example:SELECT MAX(x) FROM t WHERE false
. -
The
next()
method allows you to iterate through data in scripts that have multiple input tuples per call (keyword SET). The iterator initially points to the first input row. For iterating you can use awhile(true)
loop which is aborted in caseif (!ctx.next())
.The
next()
method is only applicable to SET scripts. -
The
emit()
method allows you to return multiple output tuples per call (keyword EMITS).The
emit()
method expects as many parameters as output columns were defined. In the case of dynamic output parameters, it is handy in Java to use an Object Array. For example:iter.emit(new Object[] {1,"a"})
).The
emit()
method is only allowed for SET scripts. -
The
reset()
method resets the iterator to the first input element. This allows you to do multiple iterations through the data if this is necessary for your algorithm.The
reset()
method is only allowed for SET scripts. -
The
size()
method returns the number of input rows in scripts.In SET scripts the method returns the number of rows for the current group.
In SCALAR scripts the size is always 1, since the input to a scalar script is a single row.
Parameters
The internal Java data types and the database SQL types are not identical. Therefore, casts must be done for the input and output data:
Datatype SQL | Datatype Java |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The value null
is the equivalent of the SQL NULL
.
The input data can be accessed through get()
methods. For example, ctx.getString("url")
, where
ctx
refers to the name of the context parameter that is passed to the run()
method.
You can use a dynamic number of parameters through the notation (...)
, for example, CREATE JAVA SCALAR
SCRIPT my_script (...)
. The parameters can then be accessed through an index,
ctx.getString(0)
for the first parameter. The number of parameters and their data types, which are both
determined during the call of the script, are part of the metadata.
Metadata
You can access the following metadata through the object ExaMetadata
.
Metadata | Description |
---|---|
|
Database name |
|
Database version |
|
Name and version of the language |
String getScriptName()
|
Name of the script |
String getScriptSchema()
|
Schema in which the script is stored |
String getCurrentSchema()
|
Schema which is currently opened |
String getScriptCode()
|
Text of the script |
String getSessionId()
|
Session ID |
long getStatementId()
|
Statement ID within the session |
long getCurrentUser()
|
Current user |
long getScopeUser()
|
Scope user (getCurrentUser() or the owner of a view if the udf script is called within a view) |
long getNodeCount()
|
Number of nodes in the cluster |
long getNodeId()
|
Local node i, starting with 0 |
String getVmId()
|
Unique ID for the local machine (the ids of the virtual machines have no relation to each other) |
BigInteger getMemoryLimit()
|
Memory limit of the Java process |
String getInputType()
|
Type of the input data (SCALAR or SET) |
long getInputColumnCount()
|
Number of input columns |
String getInputColumnName(int column)
|
Name of an input column |
Class getInputColumnType(int column)
|
Type of an input column |
String getInputColumnSqlType(int column)
|
SQL type of an input column |
long getInputColumnPrecision(int column)
|
Precision of an input column |
long getInputColumnScale(int column)
|
Scale of an input column |
long getInputColumnLength(int column)
|
Length of an input column |
String getOutputType()
|
Type of the output data (RETURNS or EMITS) |
long getOutputColumnCount()
|
Number of output columns |
String getOutputColumnName(int column)
|
Name of an output column |
Class getOutputColumnType(int column)
|
Type of an output column |
String getOutputColumnSqlType(int column)
|
SQL type of an output column |
long getOutputColumnPrecision(
int column)
|
Precision of an output column |
long getOutputColumnScale(int column)
|
Scale of an output column |
long getOutputColumnLength(int column)
|
Length of an output column |
Class importScript(String name)
|
Import of another script |
Exceptions
The following Exasol-specific exceptions can be thrown:
ExaCompilationException
ExaDataTypeException
ExaIterationException
Import of other scripts
Besides importScript()
method of class ExaMetadata
, other scripts can be imported
using the keyword %import
. After that, the script is accessible
in the namespace (for example, OTHER_SCRIPT.my_method()
).
Syntax
Examples
CREATE SCHEMA IF NOT EXISTS TEST;
--/
CREATE OR REPLACE JAVA SCALAR SCRIPT TEST.JAVADEMO() RETURNS VARCHAR(2000) AS
class JAVADEMO {
static String run(ExaMetadata exa, ExaIterator ctx) throws Exception {
return "Minimal Java UDF";
}
}
/
SELECT TEST.JAVADEMO();
CREATE SCHEMA IF NOT EXISTS LIB;
--/
CREATE OR REPLACE JAVA SCALAR SCRIPT LIB.MYLIB() RETURNS VARCHAR(2000) AS class MYLIB {
public static String helloWorld(){
return "Hello Java World!";
}
}
/
CREATE SCHEMA IF NOT EXISTS TEST;
--/
CREATE OR REPLACE JAVA SCALAR SCRIPT TEST.MYHELLOWORLD() RETURNS VARCHAR(2000) AS
%import LIB.MYLIB;
public class MYHELLOWORLD {
public static String run(ExaMetadata exa, ExaIterator ctx) throws Exception {
return MYLIB.helloWorld();
}
}
/
select TEST.MYHELLOWORLD();
Access connection definitions
The data that has been specified when defining connections with CREATE CONNECTION
is available
in Java UDF scripts through method ExaMetadata.getConnection(String connectionName)
. The result is a Java object that implements the Interface com.exasol.ExaConnectionInformation
which features the following methods:
Methods | Description |
---|---|
|
The type of the connection definition. |
|
The part of the connection definition that followed the |
|
The part of the connection definition that followed the |
|
The part of the connection definition that followed the |
The following example shows how to access the connection definitions in a Java UDF.
Example
CREATE CONNECTION MY_CONNECTION
TO 'http://my-server.com'
USER 'my_user'
IDENTIFIED BY 'my_password';
--/
CREATE OR REPLACE JAVA SCALAR SCRIPT RETURN_CONNECTION()
EMITS (TYPE_FIELD VARCHAR(2000000),ADDRESS_FIELD VARCHAR(2000000),USER_FIELD VARCHAR(2000000),PW_FIELD VARCHAR(2000000)) AS
public class RETURN_CONNECTION {
public static void run(ExaMetadata exa, ExaIterator ctx) throws Exception {
String connectionName = "MY_CONNECTION";
ExaConnectionInformation connection = exa.getConnection(connectionName);
ExaConnectionInformation.ConnectionType type=connection.getType();
String address = connection.getAddress();
String userName = connection.getUser();
String pass = connection.getPassword();
ctx.emit(type.toString(),address,userName,pass);
}
}
/
SELECT RETURN_CONNECTION() FROM DUAL;
JVM options
To enable tuning script performance depending on its memory requirements, the %jvmoption
keyword can be used to specify the following Java VM options:
- Initial heap size (-Xms)
- Maximum heap size (-Xmx)
- Thread stack size (-Xss)
Example, %jvmoption -Xms128m -Xmx1024m -Xss512k;
This sets the initial Java heap size to 128 MiB, maximum heap size to 1024 MiB, and thread stack size to 512 KiB.
If multiple values are given for a single option (for example, resulting from the import of another script), the last value will be used.
Dynamic output parameters callback function
If the UDF script was defined with dynamic output parameters and the output parameters can not be
determined (by specifying EMITS
in the query or through INSERT INTO SELECT
), the database calls the method public static String getDefaultOutputColumns(ExaMetadata exa)
which you can implement. The expected return value is a String with the names and types of the output
columns, for example, "a int, b varchar(100)"
. See Dynamic input and output parameters for an
explanation when this method is called including examples.
The method will be executed only once on a single node.
This function should be implemented within the script's main class.
User Defined Import Callback Function
To support a user defined import you can implement the callback method public static String generateSqlForImportSpec(ExaMetadata meta, ExaImportSpecification importSpec)
. See Dynamic input and output parameters and the IMPORT for the syntax. The parameter importSpec
contains all information about the executed IMPORT FROM SCRIPT
statement. The function has to generate and return a SELECT
statement which will retrieve the data to be imported.
importSpec
is an object with the following fields:
Fields | Description |
---|---|
|
Parameters specified in the IMPORT statement. |
|
This is true, if the |
|
If |
|
If |
|
This method returns true if a connection was specified. The
UDF script can then obtain the connection information through
|
|
If |
|
This returns true if connection information was provided. The
UDF script can then obtain the connection information through
|
|
If The password is transferred into plaintext and could be visible in the logs. Therefore, It is recommended to create a This function should be implemented within the script's main class. |
User Defined Export Callback Function
To support a user defined export, you can implement the callback method public static String generateSqlForExportSpec(ExaMetadata meta, ExaExportSpecification export_spec)
. See Dynamic input and output parameters and the EXPORT for the syntax. The parameter export_spec
contains all information about the executed EXPORT INTO SCRIPT
statement. The function has to generate and return a SELECT
statement which will generate the data to be exported. The FROM
part of that string can be a dummy table (DUAL) since the export command is aware which table should be exported. But it has to be specified to be able to compile the SQL string successfully.
export_spec
is an object with the following fields:
Fields | Description |
---|---|
|
Parameters specified in the |
|
List of column names of the resulting table that shall be exported. |
|
Boolean value from the |
|
Boolean value from the |
|
Boolean value from the |
|
String value from the |
|
This method returns true if a connection was specified. |
|
If |
|
This returns true if connection information was provided. The
UDF script can then obtain the connection information through
|
|
If The password is transferred into plaintext and could be visible in the logs. Therefore, It is recommended to create a This function should be implemented within the script's main class. |
Example
/*
This example loads from a webserver
and processes the following file goalies.xml:
<?xml version='1.0' encoding='UTF-8'?>
<users>
<user active="1">
<first_name>Manuel</first_name>
<last_name>Neuer</last_name>
</user>
<user active="1">
<first_name>Joe</first_name>
<last_name>Hart</last_name>
</user>
<user active="0">
<first_name>Oliver</first_name>
<last_name>Kahn</last_name>
</user>
</users>
*/
--/
CREATE JAVA SCALAR SCRIPT process_users(url VARCHAR(500))
EMITS (firstname VARCHAR(20), lastname VARCHAR(20)) AS
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
public class PROCESS_USERS {
public static void run(ExaMetadata exa, ExaIterator ctx) throws Exception {
URL url = new URL(ctx.getString("url"));
URLConnection conn = url.openConnection();
DocumentBuilder docBuilder =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = docBuilder.parse(conn.getInputStream());
NodeList nodes =
doc.getDocumentElement().getElementsByTagName("user");
for (int i = 0; i < nodes.getLength(); i++) {
if (nodes.item(i).getNodeType() != Node.ELEMENT_NODE)
continue;
Element elem = (Element)nodes.item(i);
if (!elem.getAttribute("active").equals("1"))
continue;
Node name = elem.getElementsByTagName("first_name").item(0);
String firstName = name.getChildNodes().item(0).getNodeValue();
name = elem.getElementsByTagName("last_name").item(0);
String lastName = name.getChildNodes().item(0).getNodeValue();
ctx.emit(firstName, lastName);
}
}
}
/
SELECT process_users ('http://www.my_valid_webserver/goalies.xml')
FROM DUAL;
Adapter script callback function
For virtual schemas an adapter script must define the function String adapterCall(final ExaMetadata metadata, final String requestJson)
. The parameter is a string in JSON containing the Virtual Schema API request. The return value must also be a string in JSON containing the response. For the Virtual Schema API Documentation, see Information for Developers. The callback function will be executed only on a single node.