Java

This article explains how to use Java for UDF scripting in Exasol.

The following describes using Java 11 with OpenJDK 11. For more information about the Java language, refer to the Official Java Documentation.

Java main class

Guidelines for using Java:

  • Do not use inline Java. It is difficult to test, has no IDE support, and is slow.

  • Always use separate JARs for your Java UDF scripts.

  • If you build an external JAR, it depends on the UDF script's API for Java. You must include the UDF script's API with the Java package. That package can be found on Exasol's artifactory.

  • By default, the script main class (which includes the methods run() and optionally init() and cleanup()) must be named exactly like the name of the script. Make sure to consider the general rules for identifiers.

  • You can also specify the script class explicitly using the keyword %scriptclass. For example, %scriptclass com.mycompany.MyScriptClass;).

All classes that are defined directly in the script go implicitly inside the Java package package com.exasol. The statement package com.exasol; is implicitly added to the beginning of the script code.

All callback functions (run(), init(), cleanup(), getDefaultOutputColumns(), and generateSqlForImportSpec()) have to be implemented within the script main class.

Exasol UDF API for Java

The Exasol UDF API for Java library facilitates the development of Java code. All released versions of the library artifact as well as examples of use with different package managers can be found on Maven Central.

The library is used by specifying the groupId, artifactId, and version as a dependency in your project.

Examples

Replace the version number used in the following examples (1.0.2) with the latest released version. For information about the latest version number, refer to Maven Central.

Maven:

<dependency>
    <groupId>com.exasol</groupId>
    <artifactId>udf-api-java</artifactId>
    <version>1.0.2</version>
</dependency>

Gradle (Groovy):

implementation 'com.exasol:udf-api-java:1.0.2'

Gradle (Kotlin):

implementation("com.exasol:udf-api-java:1.0.2")

An example of a complete Maven POM file can be found here: https://github.com/exasol/virtual-schema-common-java/blob/main/pom.xml.

API documentation

The full documentation for the API is available here: https://exasol.github.io/udf-api-java/com/exasol/package-summary.html.

run(), init() and cleanup() methods

The method static <Type> run(ExaMetadata, ExaIterator) is called for each input tuple (SCALAR) or each group (SET). Its parameter is a metadata object (ExaMetadata) and an iterator for the data access (ExaIterator).

To initialize expensive steps (opening external connections), you can define the method static void init(ExaMetadata). This will be executed once at the start by each virtual machine. For deinitialization purposes, the method static void cleanup(ExaMetadata) exists which is called once for each virtual machine, at the end of execution.

Data Iterator

The following methods are provided in the class ExaIterator:

  • next()
  • emit(Object... values)
  • reset()
  • size()
  • getInteger(String name) and getInteger(int column)
  • getLong(String name) and getLong(int column)
  • getBigDecimal(String name) and getBigDecimal(int column)
  • getDouble(String name) and getDouble(int column)
  • getString(String name) and getString(int column)
  • getBoolean(String name) and getBoolean(int column)
  • getDate(String name) and getDate(int column)
  • getTimestamp(String name) and getTimestamp(int column)

Usage notes

  • If the input data is empty, the run() method will not be called and the NULL value is returned as the result (similar to aggregate functions). For example: SELECT MAX(x) FROM t WHERE false.

  • The next() method allows you to iterate through data in scripts that have multiple input tuples per call (keyword SET). The iterator initially points to the first input row. For iterating you can use a while(true) loop which is aborted in case if (!ctx.next()).

    The next() method is only applicable to SET scripts.

  • The emit() method allows you to return multiple output tuples per call (keyword EMITS).

    The emit() method expects as many parameters as output columns were defined. In the case of dynamic output parameters, it is handy in Java to use an Object Array. For example: iter.emit(new Object[] {1,"a"})).

    The emit() method is only allowed for SET scripts.

  • The reset() method resets the iterator to the first input element. This allows you to do multiple iterations through the data if this is necessary for your algorithm.

    The reset() method is only allowed for SET scripts.

  • The size() method returns the number of input rows in scripts.

    In SET scripts the method returns the number of rows for the current group.

    In SCALAR scripts the size is always 1, since the input to a scalar script is a single row.

Parameters

The internal Java data types and the database SQL types are not identical. Therefore, casts must be done for the input and output data:

Datatype SQL  Datatype Java

DECIMAL(p,0)

Integer, Long, BigDecimal

DECIMAL(p,s)

BigDecimal

DOUBLE

Double

DATE

java.sql.Date

TIMESTAMP

java.sql.Timestamp

BOOLEAN

Boolean

VARCHAR and CHAR

String

Other

not supported

The value null is the equivalent of the SQL NULL.

The input data can be accessed through get() methods. For example, ctx.getString("url"), where ctx refers to the name of the context parameter that is passed to the run() method.

You can use a dynamic number of parameters through the notation (...), for example, CREATE JAVA SCALAR SCRIPT my_script (...). The parameters can then be accessed through an index, ctx.getString(0) for the first parameter. The number of parameters and their data types, which are both determined during the call of the script, are part of the metadata.

Metadata

You can access the following metadata through the object ExaMetadata.

Metadata Description

String getDatabaseName()

Database name

String getDatabaseVersion()

Database version

String getScriptLanguage()

Name and version of the language

String getScriptName()

Name of the script

String getScriptSchema()

Schema in which the script is stored

String getCurrentSchema()

Schema which is currently opened

String getScriptCode()

Text of the script

String getSessionId()

Session ID

long getStatementId()

Statement ID within the session

long getCurrentUser()

Current user

long getScopeUser()

Scope user (getCurrentUser() or the owner of a view if the udf script is called within a view)

long getNodeCount()

Number of nodes in the cluster

long getNodeId()

Local node i, starting with 0

String getVmId()

Unique ID for the local machine (the ids of the virtual machines have no relation to each other)

BigInteger getMemoryLimit()

Memory limit of the Java process

String getInputType()

Type of the input data (SCALAR or SET)

long getInputColumnCount()

Number of input columns

String getInputColumnName(int column)

Name of an input column

Class getInputColumnType(int column)

Type of an input column

String getInputColumnSqlType(int column)

SQL type of an input column

long getInputColumnPrecision(int column)

Precision of an input column

long getInputColumnScale(int column)

Scale of an input column

long getInputColumnLength(int column)

Length of an input column

String getOutputType()

Type of the output data (RETURNS or EMITS)

long getOutputColumnCount()

Number of output columns

String getOutputColumnName(int column)

Name of an output column

Class getOutputColumnType(int column)

Type of an output column

String getOutputColumnSqlType(int column)

SQL type of an output column

long getOutputColumnPrecision( int column)

Precision of an output column

long getOutputColumnScale(int column)

Scale of an output column

long getOutputColumnLength(int column)

Length of an output column

Class importScript(String name)

Import of another script

Exceptions

The following Exasol-specific exceptions can be thrown:

  • ExaCompilationException
  • ExaDataTypeException
  • ExaIterationException

Import of other scripts

Besides importScript() method of class ExaMetadata, other scripts can be imported using the keyword %import. After that, the script is accessible in the namespace (for example, OTHER_SCRIPT.my_method()).

Syntax
%import <schema>.<script>
Examples
CREATE SCHEMA IF NOT EXISTS TEST;

--/
CREATE OR REPLACE JAVA SCALAR SCRIPT TEST.JAVADEMO() RETURNS VARCHAR(2000) AS

class JAVADEMO {
  static String run(ExaMetadata exa, ExaIterator ctx) throws Exception {
    return "Minimal Java UDF";
  }
}
/

SELECT TEST.JAVADEMO();

CREATE SCHEMA IF NOT EXISTS LIB;

--/
CREATE OR REPLACE JAVA SCALAR SCRIPT LIB.MYLIB() RETURNS VARCHAR(2000) AS class MYLIB {
 public static String helloWorld(){
        return "Hello Java World!";
 }
}
/

CREATE SCHEMA IF NOT EXISTS TEST;

--/
CREATE OR REPLACE JAVA SCALAR SCRIPT TEST.MYHELLOWORLD() RETURNS VARCHAR(2000) AS 

%import LIB.MYLIB;

public class MYHELLOWORLD {
 public static String run(ExaMetadata exa, ExaIterator ctx) throws Exception {
 return MYLIB.helloWorld();
 }
}
/

select TEST.MYHELLOWORLD();

Access connection definitions

The data that has been specified when defining connections with CREATE CONNECTION is available in Java UDF scripts through method ExaMetadata.getConnection(String connectionName). The result is a Java object that implements the Interface com.exasol.ExaConnectionInformation which features the following methods:

Methods Description

ExaConnectionInformation.ConnectionType ExaConnectionInformation.getType()

The type of the connection definition. ConnectionType is an enumeration which currently only contains the entry PASSWORD.

String ExaConnectionInformation.getAddress()

The part of the connection definition that followed the TO keyword in the CREATE CONNECTION command.

String ExaConnectionInformation.getUser()

The part of the connection definition that followed the USER keyword in the CREATE CONNECTION command.

String ExaConnectionInformation.getPassword()

The part of the connection definition that followed the IDENTIFIED BY keyword in the CREATE CONNECTION command.

The following example shows how to access the connection definitions in a Java UDF.

Example
CREATE CONNECTION MY_CONNECTION
TO 'http://my-server.com'
USER 'my_user'
IDENTIFIED BY 'my_password';

--/
CREATE OR REPLACE JAVA SCALAR SCRIPT RETURN_CONNECTION()
EMITS (TYPE_FIELD VARCHAR(2000000),ADDRESS_FIELD VARCHAR(2000000),USER_FIELD VARCHAR(2000000),PW_FIELD VARCHAR(2000000)) AS
public class RETURN_CONNECTION {
    public static void run(ExaMetadata exa, ExaIterator ctx) throws Exception {
        String connectionName = "MY_CONNECTION";
        ExaConnectionInformation connection = exa.getConnection(connectionName);
        ExaConnectionInformation.ConnectionType type=connection.getType();
        String address = connection.getAddress();
        String userName = connection.getUser();
        String pass = connection.getPassword();
        ctx.emit(type.toString(),address,userName,pass);
    }
}
/

SELECT RETURN_CONNECTION() FROM DUAL;

Integrate your own JAR packages

For details, see Adding New Packages to Existing Script Languages.

JVM options

To enable tuning script performance depending on its memory requirements, the %jvmoption keyword can be used to specify the following Java VM options:

  • Initial heap size (-Xms)
  • Maximum heap size (-Xmx)
  • Thread stack size (-Xss)

Example, %jvmoption -Xms128m -Xmx1024m -Xss512k;

This sets the initial Java heap size to 128 MiB, maximum heap size to 1024 MiB, and thread stack size to 512 KiB.

If multiple values are given for a single option (for example, resulting from the import of another script), the last value will be used.

Dynamic output parameters callback function

If the UDF script was defined with dynamic output parameters and the output parameters can not be determined (by specifying EMITS in the query or through INSERT INTO SELECT), the database calls the method public static String getDefaultOutputColumns(ExaMetadata exa) which you can implement. The expected return value is a String with the names and types of the output columns, for example, "a int, b varchar(100)". See Dynamic input and output parameters for an explanation when this method is called including examples.

The method will be executed only once on a single node.

This function should be implemented within the script's main class.

User Defined Import Callback Function

To support a user defined import you can implement the callback method public static String generateSqlForImportSpec(ExaMetadata meta, ExaImportSpecification importSpec). See Dynamic input and output parameters and the IMPORT for the syntax. The parameter importSpec contains all information about the executed IMPORT FROM SCRIPT statement. The function has to generate and return a SELECT statement which will retrieve the data to be imported.

importSpec is an object with the following fields:

Fields Description

Map<String, String> getParameters()

Parameters specified in the IMPORT statement.

boolean isSubselect()

This is true, if the IMPORT is used inside a SELECT statement and not inside an IMPORT INTO table statement.

List<String> getSubselectColumnNames()

If isSubselect() is true and the user specified the target column names and types, this returns the names of all specified columns.

List<String> getSubselectColumnTypes()

If isSubselect() is true and the user specified the target column names and types, this returns the types of all specified columns. The types are returned in SQL format ("VARCHAR(100)").

boolean hasConnectionName()

This method returns true if a connection was specified. The UDF script can then obtain the connection information through ExaMetadata.getConnection(name).

String getConnectionName()

If hasConnection() is true, this returns the name of the specified connection.

boolean hasConnectionInformation()

This returns true if connection information was provided. The UDF script can then obtain the connection information through getConnectionInformation().

ExaConnectionInformation getConnectionInformation()

If hasConnectionInformation() is true, this returns the connection information provided by the user. See above in this table for the definition of ExaConnectionInformation.

The password is transferred into plaintext and could be visible in the logs. Therefore, It is recommended to create a CONNECTION before and to specify only the connection name (can be obtained from connection_name field). The actual connection information can be obtained through ExaMetadata.getConnection(connectionName).

This function should be implemented within the script's main class.

User Defined Export Callback Function

To support a user defined export, you can implement the callback method public static String generateSqlForExportSpec(ExaMetadata meta, ExaExportSpecification export_spec). See Dynamic input and output parameters and the EXPORT for the syntax. The parameter export_spec contains all information about the executed EXPORT INTO SCRIPT statement. The function has to generate and return a SELECT statement which will generate the data to be exported. The FROM part of that string can be a dummy table (DUAL) since the export command is aware which table should be exported. But it has to be specified to be able to compile the SQL string successfully.

export_spec is an object with the following fields:

Fields Description

Map<String, String> getParameters()

Parameters specified in the EXPORT statement.

List<String> getSourceColumnNames()

List of column names of the resulting table that shall be exported.

boolean hasTruncate()

Boolean value from the EXPORT command option that defines whether the content of the target table should be truncated before the data transfer.

boolean hasReplace()

Boolean value from the EXPORT command option that defines whether the target table should be deleted before the data transfer.

boolean hasCreatedBy()

Boolean value from the EXPORT command option that defines whether a creation text was specified or not.

String getCreatedBy()

String value from the EXPORT command option that defines the creation text that is executed in the target system before the data transfer.

boolean hasConnectionName()

This method returns true if a connection was specified.

String getConnectionName()

If hasConnection() is true, this returns the name of the specified connection.

boolean hasConnectionInformation()

This returns true if connection information was provided. The UDF script can then obtain the connection information through getConnectionInformation().

ExaConnectionInformation getConnectionInformation()

If hasConnectionInformation() is true, this returns the connection information provided by the user. See above in this table for the definition of ExaConnectionInformation.

The password is transferred into plaintext and could be visible in the logs. Therefore, It is recommended to create a CONNECTION before and to specify only the connection name (can be obtained from connection_name field). The actual connection information can be obtained through ExaMetadata.getConnection(connectionName).

This function should be implemented within the script's main class.

Example
/*
    This example loads from a webserver
    and processes the following file goalies.xml:

    <?xml version='1.0' encoding='UTF-8'?>
    <users>
        <user active="1">
        <first_name>Manuel</first_name>
        <last_name>Neuer</last_name>
        </user>
        <user active="1">
        <first_name>Joe</first_name>
        <last_name>Hart</last_name>
        </user>
        <user active="0">
        <first_name>Oliver</first_name>
        <last_name>Kahn</last_name>
        </user>
    </users>
*/
 
--/
    CREATE JAVA SCALAR SCRIPT process_users(url VARCHAR(500))
    EMITS (firstname VARCHAR(20), lastname VARCHAR(20)) AS
  
    import java.net.URL;
    import java.net.URLConnection;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;
    import org.w3c.dom.Node;
    import org.w3c.dom.Element;
   
  public class PROCESS_USERS {
      public static void run(ExaMetadata exa, ExaIterator ctx) throws Exception {
      URL url = new URL(ctx.getString("url"));
      URLConnection conn = url.openConnection();
      DocumentBuilder docBuilder =
      DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = docBuilder.parse(conn.getInputStream());
      NodeList nodes =
      doc.getDocumentElement().getElementsByTagName("user");
      for (int i = 0; i < nodes.getLength(); i++) {
             if (nodes.item(i).getNodeType() != Node.ELEMENT_NODE)
             continue;
             Element elem = (Element)nodes.item(i);
             if (!elem.getAttribute("active").equals("1"))
                continue;
             Node name = elem.getElementsByTagName("first_name").item(0);
         String firstName = name.getChildNodes().item(0).getNodeValue();
         name = elem.getElementsByTagName("last_name").item(0);
         String lastName = name.getChildNodes().item(0).getNodeValue();
         ctx.emit(firstName, lastName);
      }
       }
}
/
SELECT process_users ('http://www.my_valid_webserver/goalies.xml')
FROM DUAL;

Adapter script callback function

For virtual schemas an adapter script must define the function String adapterCall(final ExaMetadata metadata, final String requestJson). The parameter is a string in JSON containing the Virtual Schema API request. The return value must also be a string in JSON containing the response. For the Virtual Schema API Documentation, see Information for Developers. The callback function will be executed only on a single node.