Java
This article explains how to use Java for UDF scripting in Exasol.
Exasol supports Java 11 with OpenJDK 11 and Java 17 with OpenJDK 17.
The script language container (SLC) for Java is provided in different flavors for Java 11 and Java 17, with two different versions of the script options parser. Version 1 is the default parser for Java 11, and version 2 is the default parser for Java 17.
-
For more information and downloads, see Exasol Script Languages.
-
To learn more about the differences between the two script option parser versions, see Exasol UDF Script Options: User Documentation.
-
To learn how to migrate UDF scripts from version 1 to version 2, see the Migration Guide.
For more information about the Java language, refer to the Java Documentation.
Java main class
Usage notes
-
Do not use inline Java. It is difficult to test, has no IDE support, and is slow.
-
Always use separate JARs for your Java UDF scripts.
-
If you build an external JAR, you must include the Exasol UDF API for Java with the Java package.
-
All classes that are defined directly in the script go implicitly inside the
com.exasol
Java package.The statement
package com.exasol;
is implicitly added to the beginning of the script code. -
The callback functions
run()
,init()
,cleanup()
,getDefaultOutputColumns()
, andgenerateSqlForImportSpec()
must be implemented within the main class in the script. -
The name of the script main class must be identical to the name of the script and must follow the general rules for identifiers.
-
You can specify the script class explicitly using the keyword
%scriptclass
. For example,%scriptclass com.mycompany.MyScriptClass;
).
For more details about the syntax for %jvmoption
, %jar
, %import
, %scriptclass
, and other script options, see Exasol UDF Script Options: User Documentation.
Exasol UDF API for Java
The Exasol UDF API for Java library facilitates the development of Java code. All released versions of the library artifact as well as examples of use with different package managers can be found on Maven Central.
The library is used by specifying the groupId
, artifactId
, and version
as a dependency in your project.
Examples
Replace the version number used in the examples (1.0.5) with the number of the latest released version on Maven Central.
Example 1 - Maven:
<dependency>
<groupId>com.exasol</groupId>
<artifactId>udf-api-java</artifactId>
<version>1.0.5</version>
</dependency>
Example 2 - Gradle (Groovy):
Example 3 - Gradle (Kotlin):
Example of a complete Maven POM file
https://github.com/exasol/virtual-schema-common-java/blob/main/pom.xml.
API documentation
https://exasol.github.io/udf-api-java/com/exasol/package-summary.html.
Parameters
The internal Java data types are not identical to SQL data types, which means that casts are necessary for input and output data.
SQL data type | Java data type |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The Java literal null
is the equivalent of the SQL value NULL
.
The input data can be accessed through get()
functions. For example, ctx.getString("url")
, where
ctx
refers to the name of the context parameter that is passed to the run()
function.
You can use a dynamic number of parameters through the notation (...)
, for example, CREATE JAVA SCALAR
SCRIPT my_script (...)
. The parameters can then be accessed through an index where
ctx.getString(0)
is the first parameter.
The number of parameters and their data types are part of the metadata and are determined during the call of the script.
Metadata
You can access the following metadata through the object ExaMetadata
:
Metadata | Description |
---|---|
|
Database name |
|
Database version |
|
Name and version of the language |
String getScriptName()
|
Name of the script |
String getScriptSchema()
|
Schema in which the script is stored |
String getCurrentSchema()
|
Schema which is currently opened |
String getScriptCode()
|
Text of the script |
String getSessionId()
|
Session ID |
long getStatementId()
|
Statement ID within the session |
long getCurrentUser()
|
Current user |
long getScopeUser()
|
Scope user (getCurrentUser() or the owner of a view if the udf script is called within a view) |
long getNodeCount()
|
Number of nodes in the cluster |
long getNodeId()
|
Local node i, starting with 0 |
String getVmId()
|
Unique ID for the local machine (the ids of the virtual machines have no relation to each other) |
BigInteger getMemoryLimit()
|
Memory limit of the running UDF process |
String getInputType()
|
Type of the input data (SCALAR or SET) |
long getInputColumnCount()
|
Number of input columns |
String getInputColumnName(int column)
|
Name of an input column |
Class getInputColumnType(int column)
|
Type of an input column |
String getInputColumnSqlType(int column)
|
SQL type of an input column |
long getInputColumnPrecision(int column)
|
Precision of an input column |
long getInputColumnScale(int column)
|
Scale of an input column |
long getInputColumnLength(int column)
|
Length of an input column |
String getOutputType()
|
Type of the output data (RETURNS or EMITS) |
long getOutputColumnCount()
|
Number of output columns |
String getOutputColumnName(int column)
|
Name of an output column |
Class getOutputColumnType(int column)
|
Type of an output column |
String getOutputColumnSqlType(int column)
|
SQL type of an output column |
long getOutputColumnPrecision(
int column)
|
Precision of an output column |
long getOutputColumnScale(int column)
|
Scale of an output column |
long getOutputColumnLength(int column)
|
Length of an output column |
Class importScript(String name)
|
Import of another script |
Callback functions
run(), init(), cleanup()
The callback function static <Type> run(ExaMetadata, ExaIterator)
is called for each input
tuple (SCALAR) or each group (SET). Its parameter consists of a metadata object, ExaMetadata
, and an iterator
for the data access, ExaIterator
.
The callback function static void init(ExaMetadata)
can be used to initialize expensive steps (opening external connections). This function is called once for each virtual machine at the start of execution.
The callback function static void cleanup(ExaMetadata)
can be used for deinitialization purposes. This function is called once for each virtual machine at the end of execution.
Dynamic output parameters callback function
If the UDF script was defined with dynamic output parameters, and if the output parameters cannot be
determined by specifying EMITS
in the query or through INSERT INTO SELECT
, the database calls the following function that you can implement:
public static String getDefaultOutputColumns(ExaMetadata exa)
The expected return value is a string with the names and types of the output
columns. For example, "a int, b varchar(100)"
.
For more details and examples, see Dynamic input and output parameters.
The function must be implemented within the script main class.
The function will be executed only once on a single node.
User defined import callback function
To support a user defined import you can implement the following callback function:
public static String generateSqlForImportSpec(ExaMetadata meta, ExaImportSpecification importSpec)
.
For details about the syntax, see Dynamic input and output parameters and IMPORT.
The importSpec
parameter contains all information about the executed IMPORT FROM SCRIPT
statement. The function has to generate and return a SELECT
statement that will retrieve the data to be imported.
The importSpec
parameter is an object with the following fields:
Fields | Description |
---|---|
|
Parameters specified in the |
|
Returns true if the |
|
Returns the names of all specified
columns if |
|
Returns the types of all specified
columns if The types are returned in SQL format, for example |
|
Returns true if a connection was specified. The
UDF script can then obtain the connection information through
|
|
Returns the name of the
specified connection if |
|
Returns true if connection information was provided. The
UDF script can then obtain the connection information through
|
|
Returns
the connection information provided by the user If This function must be implemented within the main class in the script. |
Passwords in the connection information will be transferred as plaintext and can be visible in the logs. We recommend that you use CREATE CONNECTION to create a connection object to store connection strings and credentials, and in the script only specify the the
name of the connection, which can be obtained from the connection_name
field. The actual
connection information can then be obtained through ExaMetadata.getConnection(connectionName)
.
See also Access connection definitions.
User defined export callback function
To support a user defined export you can implement the following callback function:
public static String generateSqlForExportSpec(ExaMetadata meta, ExaExportSpecification export_spec)
.
For details about the syntax, see Dynamic input and output parameters and EXPORT.
The export_spec
parameter contains all information about the executed EXPORT INTO SCRIPT
statement. The function has to generate and return a SELECT
statement that will generate the data to be exported. The FROM
part of that string is always required but can be a dummy table (DUAL) since the export command is aware which table should be exported.
The export_spec
parameter is an object with the following fields:
Fields | Description |
---|---|
|
Parameters specified in the |
|
List of column names of the resulting table that shall be exported. |
|
Boolean value from the |
|
Boolean value from the |
|
Boolean value from the |
|
String value from the |
|
Returns true if a connection was specified. |
|
Returns the name of the
specified connection if |
|
Returns true if connection information was provided. The
UDF script can then obtain the connection information through
|
|
Returns
the connection information provided by the user if This function must be implemented within the main class in the script. |
Passwords in the connection information will be transferred as plaintext and can be visible in the logs. We recommend that you use CREATE CONNECTION to create a connection object to store connection strings and credentials, and in the script only specify the the
name of the connection, which can be obtained from the connection_name
field. The actual
connection information can then be obtained through ExaMetadata.getConnection(connectionName)
.
See also Access connection definitions.
Example:
/*
This example loads from a webserver
and processes the following file goalies.xml:
<?xml version='1.0' encoding='UTF-8'?>
<users>
<user active="1">
<first_name>Manuel</first_name>
<last_name>Neuer</last_name>
</user>
<user active="1">
<first_name>Joe</first_name>
<last_name>Hart</last_name>
</user>
<user active="0">
<first_name>Oliver</first_name>
<last_name>Kahn</last_name>
</user>
</users>
*/
--/
CREATE JAVA SCALAR SCRIPT process_users(url VARCHAR(500))
EMITS (firstname VARCHAR(20), lastname VARCHAR(20)) AS
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
public class PROCESS_USERS {
public static void run(ExaMetadata exa, ExaIterator ctx) throws Exception {
URL url = new URL(ctx.getString("url"));
URLConnection conn = url.openConnection();
DocumentBuilder docBuilder =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = docBuilder.parse(conn.getInputStream());
NodeList nodes =
doc.getDocumentElement().getElementsByTagName("user");
for (int i = 0; i < nodes.getLength(); i++) {
if (nodes.item(i).getNodeType() != Node.ELEMENT_NODE)
continue;
Element elem = (Element)nodes.item(i);
if (!elem.getAttribute("active").equals("1"))
continue;
Node name = elem.getElementsByTagName("first_name").item(0);
String firstName = name.getChildNodes().item(0).getNodeValue();
name = elem.getElementsByTagName("last_name").item(0);
String lastName = name.getChildNodes().item(0).getNodeValue();
ctx.emit(firstName, lastName);
}
}
}
/
SELECT process_users ('http://www.my_valid_webserver/goalies.xml')
FROM DUAL;
Adapter script callback function
For virtual schemas, an adapter script must define the following function:
String adapterCall(final ExaMetadata metadata, final String requestJson)
.
The parameter in this callback function is a JSON string containing the virtual schema API request. The return value must also be a JSON string containing the response.
The function will be executed only once on a single node.
For virtual schema API documentation, see Information for Developers.
Data iterator
The following functions are provided in the ExaIterator
class:
next()
emit(Object... values)
reset()
size()
getInteger(String name)
andgetInteger(int column)
getLong(String name)
andgetLong(int column)
getBigDecimal(String name)
andgetBigDecimal(int column)
getDouble(String name)
andgetDouble(int column)
getString(String name)
andgetString(int column)
getBoolean(String name)
andgetBoolean(int column)
getDate(String name)
andgetDate(int column)
getTimestamp(String name)
andgetTimestamp(int column)
Usage notes
-
If the input data is empty, the
run()
function will not be called and the NULL value is returned as the result (similar to aggregate functions). For example:SELECT MAX(x) FROM t WHERE false
. -
The
next()
function allows you to iterate through data in scripts that have multiple input tuples per call (keyword SET). The iterator initially points to the first input row. For iterating you can use awhile(true)
loop that is aborted in caseif (!ctx.next())
.The
next()
function is only applicable to SET scripts. -
The
emit()
function allows you to return multiple output tuples per call (keyword EMITS).This function expects as many parameters as there are defined output columns. In the case of dynamic output parameters, it is handy in Java to use an object array. For example:
iter.emit(new Object[]{1,"a"})
).The
emit()
function is only allowed for SET scripts. -
The
reset()
function resets the iterator to the first input element. This allows you to do multiple iterations through the data if this is necessary for your algorithm.The
reset()
function is only allowed for SET scripts. -
The
size()
function returns the number of input rows in scripts.-
In SET scripts the function returns the number of rows for the current group.
-
In SCALAR scripts the size is always 1, since the input to a scalar script is a single row.
-
Access connection definitions
When you define a connection with CREATE CONNECTION
, the data in the connection object can be obtained by a Java UDF script through the function ExaMetadata.getConnection(String connectionName)
. The result is a Java object that implements the Interface com.exasol.ExaConnectionInformation
, which features the following functions:
Function | Description |
---|---|
|
The type of the connection definition.
|
|
The part of the connection definition that followed the |
|
The part of the connection definition that followed the |
|
The part of the connection definition that followed the |
Example:
The following example shows how to access the connection definitions in a Java UDF.
CREATE CONNECTION MY_CONNECTION
TO 'http://my-server.com'
USER 'my_user'
IDENTIFIED BY 'my_password';
--/
CREATE OR REPLACE JAVA SCALAR SCRIPT RETURN_CONNECTION()
EMITS (TYPE_FIELD VARCHAR(2000000),ADDRESS_FIELD VARCHAR(2000000),USER_FIELD VARCHAR(2000000),PW_FIELD VARCHAR(2000000)) AS
public class RETURN_CONNECTION {
public static void run(ExaMetadata exa, ExaIterator ctx) throws Exception {
String connectionName = "MY_CONNECTION";
ExaConnectionInformation connection = exa.getConnection(connectionName);
ExaConnectionInformation.ConnectionType type=connection.getType();
String address = connection.getAddress();
String userName = connection.getUser();
String pass = connection.getPassword();
ctx.emit(type.toString(),address,userName,pass);
}
}
/
SELECT RETURN_CONNECTION() FROM DUAL;
JVM options
To enable tuning script performance depending on its memory requirements, you can use the %jvmoption
keyword to specify the following Java VM options:
- Initial heap size (
-Xms
) - Maximum heap size (
-Xmx
) - Thread stack size (
-Xss
)
Example:
This example sets the initial Java heap size to 128 MiB, maximum heap size to 1024 MiB, and thread stack size to 512 KiB.
%jvmoption -Xms128m -Xmx1024m -Xss512k;
If multiple values are given for a single option (for example, resulting from the import of another script), the last value will be used.
Import other scripts
In addition to the importScript()
function of class ExaMetadata
, you can import other scripts by using the keyword %import
. When a script has been imported it will be accessible
in the namespace. For example: OTHER_SCRIPT.my_method()
.
Syntax
Examples:
CREATE SCHEMA IF NOT EXISTS TEST;
--/
CREATE OR REPLACE JAVA SCALAR SCRIPT TEST.JAVADEMO() RETURNS VARCHAR(2000) AS
class JAVADEMO {
static String run(ExaMetadata exa, ExaIterator ctx) throws Exception {
return "Minimal Java UDF";
}
}
/
SELECT TEST.JAVADEMO();
CREATE SCHEMA IF NOT EXISTS LIB;
--/
CREATE OR REPLACE JAVA SCALAR SCRIPT LIB.MYLIB() RETURNS VARCHAR(2000) AS class MYLIB {
public static String helloWorld(){
return "Hello Java World!";
}
}
/
CREATE SCHEMA IF NOT EXISTS TEST;
--/
CREATE OR REPLACE JAVA SCALAR SCRIPT TEST.MYHELLOWORLD() RETURNS VARCHAR(2000) AS
%import LIB.MYLIB;
public class MYHELLOWORLD {
public static String run(ExaMetadata exa, ExaIterator ctx) throws Exception {
return MYLIB.helloWorld();
}
}
/
select TEST.MYHELLOWORLD();
Exceptions
The following Exasol-specific exceptions can be thrown:
-
ExaCompilationException
-
This exception happens if the Java code of the UDF cannot be compiled. Possible reasons are:
- Invalid Java code
- Missing dependencies (%jar options)
- Invalid script options
-
ExaDataTypeException
-
This exception happens if the input/output data types of the UDF declaration do not match with the Java source code. For example:
-
--/
CREATE OR REPLACE java SCALAR SCRIPT test() RETURNS DOUBLE AS
class TEST {
static String run(ExaMetadata exa, ExaIterator ctx) throws Exception
{
return "Hello World";
}
};
/results in:
-
ExaIterationException
-
This exception happens if the UDF tries to access the data iterator after the iteration has ended (calling
next
after the iterator returnedfalse
).