Java

This article explains how to use Java for UDF scripting in Exasol.

Exasol supports Java 11 with OpenJDK 11 and Java 17 with OpenJDK 17.

The script language container (SLC) for Java is provided in different flavors for Java 11 and Java 17, with two different versions of the script options parser. Version 1 is the default parser for Java 11, and version 2 is the default parser for Java 17.

For more information about the Java language, refer to the Java Documentation.

Java main class

Usage notes

  • Do not use inline Java. It is difficult to test, has no IDE support, and is slow.

  • Always use separate JARs for your Java UDF scripts.

  • If you build an external JAR, you must include the Exasol UDF API for Java with the Java package.

  • All classes that are defined directly in the script go implicitly inside the com.exasol Java package.

    The statement package com.exasol; is implicitly added to the beginning of the script code.

  • The callback functions run(), init(), cleanup(), getDefaultOutputColumns(), and generateSqlForImportSpec() must be implemented within the main class in the script.

  • The name of the script main class must be identical to the name of the script and must follow the general rules for identifiers.

  • You can specify the script class explicitly using the keyword %scriptclass. For example, %scriptclass com.mycompany.MyScriptClass;).

For more details about the syntax for %jvmoption, %jar, %import, %scriptclass, and other script options, see Exasol UDF Script Options: User Documentation.

Exasol UDF API for Java

The Exasol UDF API for Java library facilitates the development of Java code. All released versions of the library artifact as well as examples of use with different package managers can be found on Maven Central.

The library is used by specifying the groupId, artifactId, and version as a dependency in your project.

Examples

Replace the version number used in the examples (1.0.5) with the number of the latest released version on Maven Central.

Example 1 - Maven:
<dependency>
    <groupId>com.exasol</groupId>
    <artifactId>udf-api-java</artifactId>
    <version>1.0.5</version>
</dependency>
Example 2 - Gradle (Groovy):
implementation 'com.exasol:udf-api-java:1.0.5'
Example 3 - Gradle (Kotlin):
implementation("com.exasol:udf-api-java:1.0.5")

Example of a complete Maven POM file

https://github.com/exasol/virtual-schema-common-java/blob/main/pom.xml.

API documentation

https://exasol.github.io/udf-api-java/com/exasol/package-summary.html.

Parameters

The internal Java data types are not identical to SQL data types, which means that casts are necessary for input and output data.

SQL data type Java data type

DECIMAL(p,0)

Integer, Long, BigDecimal

DECIMAL(p,s)

BigDecimal

DOUBLE

Double

DATE

java.sql.Date

TIMESTAMP

java.sql.Timestamp

BOOLEAN

Boolean

VARCHAR and CHAR

String

Other

not supported

The Java literal null is the equivalent of the SQL value NULL.

The input data can be accessed through get() functions. For example, ctx.getString("url"), where ctx refers to the name of the context parameter that is passed to the run() function.

You can use a dynamic number of parameters through the notation (...), for example, CREATE JAVA SCALAR SCRIPT my_script (...). The parameters can then be accessed through an index where ctx.getString(0) is the first parameter.

The number of parameters and their data types are part of the metadata and are determined during the call of the script.

Metadata

You can access the following metadata through the object ExaMetadata:

Metadata Description

String getDatabaseName()

Database name

String getDatabaseVersion()

Database version

String getScriptLanguage()

Name and version of the language

String getScriptName()

Name of the script

String getScriptSchema()

Schema in which the script is stored

String getCurrentSchema()

Schema which is currently opened

String getScriptCode()

Text of the script

String getSessionId()

Session ID

long getStatementId()

Statement ID within the session

long getCurrentUser()

Current user

long getScopeUser()

Scope user (getCurrentUser() or the owner of a view if the udf script is called within a view)

long getNodeCount()

Number of nodes in the cluster

long getNodeId()

Local node i, starting with 0

String getVmId()

Unique ID for the local machine (the ids of the virtual machines have no relation to each other)

BigInteger getMemoryLimit()

Memory limit of the running UDF process

String getInputType()

Type of the input data (SCALAR or SET)

long getInputColumnCount()

Number of input columns

String getInputColumnName(int column)

Name of an input column

Class getInputColumnType(int column)

Type of an input column

String getInputColumnSqlType(int column)

SQL type of an input column

long getInputColumnPrecision(int column)

Precision of an input column

long getInputColumnScale(int column)

Scale of an input column

long getInputColumnLength(int column)

Length of an input column

String getOutputType()

Type of the output data (RETURNS or EMITS)

long getOutputColumnCount()

Number of output columns

String getOutputColumnName(int column)

Name of an output column

Class getOutputColumnType(int column)

Type of an output column

String getOutputColumnSqlType(int column)

SQL type of an output column

long getOutputColumnPrecision( int column)

Precision of an output column

long getOutputColumnScale(int column)

Scale of an output column

long getOutputColumnLength(int column)

Length of an output column

Class importScript(String name)

Import of another script

Callback functions

run(), init(), cleanup()

The callback function static <Type> run(ExaMetadata, ExaIterator) is called for each input tuple (SCALAR) or each group (SET). Its parameter consists of a metadata object, ExaMetadata, and an iterator for the data access, ExaIterator.

The callback function static void init(ExaMetadata) can be used to initialize expensive steps (opening external connections). This function is called once for each virtual machine at the start of execution.

The callback function static void cleanup(ExaMetadata) can be used for deinitialization purposes. This function is called once for each virtual machine at the end of execution.

Dynamic output parameters callback function

If the UDF script was defined with dynamic output parameters, and if the output parameters cannot be determined by specifying EMITS in the query or through INSERT INTO SELECT, the database calls the following function that you can implement:

public static String getDefaultOutputColumns(ExaMetadata exa)

The expected return value is a string with the names and types of the output columns. For example, "a int, b varchar(100)".

For more details and examples, see Dynamic input and output parameters.

The function must be implemented within the script main class.

The function will be executed only once on a single node.

User defined import callback function

To support a user defined import you can implement the following callback function:

public static String generateSqlForImportSpec(ExaMetadata meta, ExaImportSpecification importSpec).

For details about the syntax, see Dynamic input and output parameters and IMPORT.

The importSpec parameter contains all information about the executed IMPORT FROM SCRIPT statement. The function has to generate and return a SELECT statement that will retrieve the data to be imported.

The importSpec parameter is an object with the following fields:

Fields Description

Map<String, String> getParameters()

Parameters specified in the IMPORT statement.

boolean isSubselect()

Returns true if the IMPORT statement is used inside a SELECT statement and not inside an IMPORT INTO table statement.

List<String> getSubselectColumnNames()

Returns the names of all specified columns if isSubselect() is true and the user specified the target column names and types.

List<String> getSubselectColumnTypes()

Returns the types of all specified columns if isSubselect() is true and the user specified the target column names and types.

The types are returned in SQL format, for example "VARCHAR(100)".

boolean hasConnectionName()

Returns true if a connection was specified. The UDF script can then obtain the connection information through ExaMetadata.getConnection(name).

String getConnectionName()

Returns the name of the specified connection if hasConnection() is true.

boolean hasConnectionInformation()

Returns true if connection information was provided. The UDF script can then obtain the connection information through getConnectionInformation().

ExaConnectionInformation getConnectionInformation()

Returns the connection information provided by the user If hasConnectionInformation() is true.

This function must be implemented within the main class in the script.

Passwords in the connection information will be transferred as plaintext and can be visible in the logs. We recommend that you use CREATE CONNECTION to create a connection object to store connection strings and credentials, and in the script only specify the the name of the connection, which can be obtained from the connection_name field. The actual connection information can then be obtained through ExaMetadata.getConnection(connectionName).

See also Access connection definitions.

User defined export callback function

To support a user defined export you can implement the following callback function:

public static String generateSqlForExportSpec(ExaMetadata meta, ExaExportSpecification export_spec).

For details about the syntax, see Dynamic input and output parameters and EXPORT.

The export_spec parameter contains all information about the executed EXPORT INTO SCRIPT statement. The function has to generate and return a SELECT statement that will generate the data to be exported. The FROM part of that string is always required but can be a dummy table (DUAL) since the export command is aware which table should be exported.

The export_spec parameter is an object with the following fields:

Fields Description

Map<String, String> getParameters()

Parameters specified in the EXPORT statement.

List<String> getSourceColumnNames()

List of column names of the resulting table that shall be exported.

boolean hasTruncate()

Boolean value from the EXPORT command option that defines whether the content of the target table should be truncated before the data transfer.

boolean hasReplace()

Boolean value from the EXPORT command option that defines whether the target table should be deleted before the data transfer.

boolean hasCreatedBy()

Boolean value from the EXPORT command option that defines whether a creation text was specified or not.

String getCreatedBy()

String value from the EXPORT command option that defines the creation text that is executed in the target system before the data transfer.

boolean hasConnectionName()

Returns true if a connection was specified.

String getConnectionName()

Returns the name of the specified connection if hasConnection() is true.

boolean hasConnectionInformation()

Returns true if connection information was provided. The UDF script can then obtain the connection information through getConnectionInformation().

ExaConnectionInformation getConnectionInformation()

Returns the connection information provided by the user if hasConnectionInformation() is true.

This function must be implemented within the main class in the script.

Passwords in the connection information will be transferred as plaintext and can be visible in the logs. We recommend that you use CREATE CONNECTION to create a connection object to store connection strings and credentials, and in the script only specify the the name of the connection, which can be obtained from the connection_name field. The actual connection information can then be obtained through ExaMetadata.getConnection(connectionName).

See also Access connection definitions.

Example:
/*
    This example loads from a webserver
    and processes the following file goalies.xml:

    <?xml version='1.0' encoding='UTF-8'?>
    <users>
        <user active="1">
        <first_name>Manuel</first_name>
        <last_name>Neuer</last_name>
        </user>
        <user active="1">
        <first_name>Joe</first_name>
        <last_name>Hart</last_name>
        </user>
        <user active="0">
        <first_name>Oliver</first_name>
        <last_name>Kahn</last_name>
        </user>
    </users>
*/
 
--/
    CREATE JAVA SCALAR SCRIPT process_users(url VARCHAR(500))
    EMITS (firstname VARCHAR(20), lastname VARCHAR(20)) AS
  
    import java.net.URL;
    import java.net.URLConnection;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;
    import org.w3c.dom.Node;
    import org.w3c.dom.Element;
   
  public class PROCESS_USERS {
      public static void run(ExaMetadata exa, ExaIterator ctx) throws Exception {
      URL url = new URL(ctx.getString("url"));
      URLConnection conn = url.openConnection();
      DocumentBuilder docBuilder =
      DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = docBuilder.parse(conn.getInputStream());
      NodeList nodes =
      doc.getDocumentElement().getElementsByTagName("user");
      for (int i = 0; i < nodes.getLength(); i++) {
             if (nodes.item(i).getNodeType() != Node.ELEMENT_NODE)
             continue;
             Element elem = (Element)nodes.item(i);
             if (!elem.getAttribute("active").equals("1"))
                continue;
             Node name = elem.getElementsByTagName("first_name").item(0);
         String firstName = name.getChildNodes().item(0).getNodeValue();
         name = elem.getElementsByTagName("last_name").item(0);
         String lastName = name.getChildNodes().item(0).getNodeValue();
         ctx.emit(firstName, lastName);
      }
       }
}
/
SELECT process_users ('http://www.my_valid_webserver/goalies.xml')
FROM DUAL;

Adapter script callback function

For virtual schemas, an adapter script must define the following function:

String adapterCall(final ExaMetadata metadata, final String requestJson).

The parameter in this callback function is a JSON string containing the virtual schema API request. The return value must also be a JSON string containing the response.

The function will be executed only once on a single node.

For virtual schema API documentation, see Information for Developers.

Data iterator

The following functions are provided in the ExaIterator class:

  • next()
  • emit(Object... values)
  • reset()
  • size()
  • getInteger(String name) and getInteger(int column)
  • getLong(String name) and getLong(int column)
  • getBigDecimal(String name) and getBigDecimal(int column)
  • getDouble(String name) and getDouble(int column)
  • getString(String name) and getString(int column)
  • getBoolean(String name) and getBoolean(int column)
  • getDate(String name) and getDate(int column)
  • getTimestamp(String name) and getTimestamp(int column)

Usage notes

  • If the input data is empty, the run() function will not be called and the NULL value is returned as the result (similar to aggregate functions). For example: SELECT MAX(x) FROM t WHERE false.

  • The next() function allows you to iterate through data in scripts that have multiple input tuples per call (keyword SET). The iterator initially points to the first input row. For iterating you can use a while(true) loop that is aborted in case if (!ctx.next()).

    The next() function is only applicable to SET scripts.

  • The emit() function allows you to return multiple output tuples per call (keyword EMITS).

    This function expects as many parameters as there are defined output columns. In the case of dynamic output parameters, it is handy in Java to use an object array. For example: iter.emit(new Object[]{1,"a"})).

    The emit() function is only allowed for SET scripts.

  • The reset() function resets the iterator to the first input element. This allows you to do multiple iterations through the data if this is necessary for your algorithm.

    The reset() function is only allowed for SET scripts.

  • The size() function returns the number of input rows in scripts.

    • In SET scripts the function returns the number of rows for the current group.

    • In SCALAR scripts the size is always 1, since the input to a scalar script is a single row.

Access connection definitions

When you define a connection with CREATE CONNECTION, the data in the connection object can be obtained by a Java UDF script through the function ExaMetadata.getConnection(String connectionName). The result is a Java object that implements the Interface com.exasol.ExaConnectionInformation, which features the following functions:

Function Description

ExaConnectionInformation.ConnectionType ExaConnectionInformation.getType()

The type of the connection definition.

ConnectionType is an enumeration that only contains the PASSWORD entry.

String ExaConnectionInformation.getAddress()

The part of the connection definition that followed the TO keyword in the CREATE CONNECTION command.

String ExaConnectionInformation.getUser()

The part of the connection definition that followed the USER keyword in the CREATE CONNECTION command.

String ExaConnectionInformation.getPassword()

The part of the connection definition that followed the IDENTIFIED BY keyword in the CREATE CONNECTION command.

Example:

The following example shows how to access the connection definitions in a Java UDF.

CREATE CONNECTION MY_CONNECTION
TO 'http://my-server.com'
USER 'my_user'
IDENTIFIED BY 'my_password';

--/
CREATE OR REPLACE JAVA SCALAR SCRIPT RETURN_CONNECTION()
EMITS (TYPE_FIELD VARCHAR(2000000),ADDRESS_FIELD VARCHAR(2000000),USER_FIELD VARCHAR(2000000),PW_FIELD VARCHAR(2000000)) AS
public class RETURN_CONNECTION {
    public static void run(ExaMetadata exa, ExaIterator ctx) throws Exception {
        String connectionName = "MY_CONNECTION";
        ExaConnectionInformation connection = exa.getConnection(connectionName);
        ExaConnectionInformation.ConnectionType type=connection.getType();
        String address = connection.getAddress();
        String userName = connection.getUser();
        String pass = connection.getPassword();
        ctx.emit(type.toString(),address,userName,pass);
    }
}
/

SELECT RETURN_CONNECTION() FROM DUAL;

JVM options

To enable tuning script performance depending on its memory requirements, you can use the %jvmoption keyword to specify the following Java VM options:

  • Initial heap size (-Xms)
  • Maximum heap size (-Xmx)
  • Thread stack size (-Xss)
Example:

This example sets the initial Java heap size to 128 MiB, maximum heap size to 1024 MiB, and thread stack size to 512 KiB.

%jvmoption -Xms128m -Xmx1024m -Xss512k;

If multiple values are given for a single option (for example, resulting from the import of another script), the last value will be used.

Import other scripts

In addition to the importScript() function of class ExaMetadata, you can import other scripts by using the keyword %import. When a script has been imported it will be accessible in the namespace. For example: OTHER_SCRIPT.my_method().

Syntax
%import <schema>.<script>
Examples:
CREATE SCHEMA IF NOT EXISTS TEST;

--/
CREATE OR REPLACE JAVA SCALAR SCRIPT TEST.JAVADEMO() RETURNS VARCHAR(2000) AS

class JAVADEMO {
  static String run(ExaMetadata exa, ExaIterator ctx) throws Exception {
    return "Minimal Java UDF";
  }
}
/

SELECT TEST.JAVADEMO();

CREATE SCHEMA IF NOT EXISTS LIB;

--/
CREATE OR REPLACE JAVA SCALAR SCRIPT LIB.MYLIB() RETURNS VARCHAR(2000) AS class MYLIB {
 public static String helloWorld(){
        return "Hello Java World!";
 }
}
/

CREATE SCHEMA IF NOT EXISTS TEST;

--/
CREATE OR REPLACE JAVA SCALAR SCRIPT TEST.MYHELLOWORLD() RETURNS VARCHAR(2000) AS 

%import LIB.MYLIB;

public class MYHELLOWORLD {
 public static String run(ExaMetadata exa, ExaIterator ctx) throws Exception {
 return MYLIB.helloWorld();
 }
}
/

select TEST.MYHELLOWORLD();

Exceptions

The following Exasol-specific exceptions can be thrown:

ExaCompilationException

This exception happens if the Java code of the UDF cannot be compiled. Possible reasons are:

  • Invalid Java code
  • Missing dependencies (%jar options)
  • Invalid script options

ExaDataTypeException

This exception happens if the input/output data types of the UDF declaration do not match with the Java source code. For example:

--/
CREATE OR REPLACE java SCALAR SCRIPT test() RETURNS DOUBLE AS 
class TEST {
    static String run(ExaMetadata exa, ExaIterator ctx) throws Exception
    {
        return "Hello World";
    }
};
/

results in:

SQL Error [22002]: VM error: F-UDF-CL-LIB-1127: F-UDF-CL-SL-JAVA-1002: F-UDF-CL-SL-JAVA-1013: 
com.exasol.ExaDataTypeException: E-UDF-CL-SL-JAVA-1114: emit column 'RETURN' is of type DOUBLE but data given have type java.lang.String (Session: 1832716691549061120)

ExaIterationException

This exception happens if the UDF tries to access the data iterator after the iteration has ended (calling next after the iterator returned false).