Adding New Packages to Existing Script Languages

This article explains how to expand existing script languages with additional packages or libraries.

A script language container (SLC) consists of a Linux container with a complete Linux distribution and all required libraries. You can install the libraries and other dependencies to an SLC with a package manager such as apt or pip. This simplifies installation of packages with many dependencies.

Exasol provides different flavors of SLCs for Python, R, and Java. A flavor is essentially a build description that contains a number of files and a description of how they are connected. Exasol also provides various packages for these languages. You can choose to use one of our pre-built SLCs, or you can extend one of our flavors with additional packages and build your own SLC. How to do this is described in this article.

You can download the current releases of our pre-built SLCs from our GitHub repository, where you will also find additional documentation about the SLCs and their compatibility with different versions of Exasol.

The Lua script language is not expandable and is natively compiled into the Exasol database software.

Use a pre-built script language container (SLC)

Step 1: Download a pre-built SLC

Download the current version of the pre-built SLC from the Releases page.

Step 2: Upload the SLC to the BucketFS in Exasol

For instructions on how to upload the SLC to BucketFS, see also Manage Buckets and Files in BucketFS.

The following example using curl defines an environment for the examples in the subsequent steps:

local_container_file = "path/to/container.tar.gz" 
bucketfs_host = "192.168.6.75" 
bucketfs_port = "1234" 
write_password = "write-password" 
bucketfs_name = "bfsdefault"
bucket_name = "default" 
path_in_bucket = "path/in/bucket/" 
container_file_in_bucket = "container.tar.gz"
container_name = "container"
protocol = "HTTP" # or HTTPS 

curl -vX PUT -T "$local_container_file" "$protocol://w:$write_password@$bucketfs_host:$bucketfs_port/$bucket_name/$path_in_bucket$container_file_in_bucket

The uploaded SLC must have the file extension .tar.gz, .tar.bz2, or .zip. Files with other file extensions will not be extracted and the contents will not be available in BucketFS.

The path_in_bucket value must end with a forward slash / if it is used. This value is optional and can be left empty.

Step 3: Activate the new SLC

When the SLC is uploaded to BucketFS it will be distributed and extracted to all nodes (this will typically take a few minutes). When the SLC has been installed on all nodes, you can activate and use it.

To activate a new SLC you must edit the session parameter SCRIPT_LANGUAGES. This parameter defines a list of aliases for script language/container combinations. You can change a script alias as needed, either session-wide using ALTER SESSION or system-wide using ALTER SYSTEM. Changing a script alias is especially useful if you have multiple language versions in parallel, or when migrating from one version to another.

The current session and system parameters are found in the EXA_PARAMETERS system table:

SELECT session_value FROM EXA_PARAMETERS
WHERE parameter_name='SCRIPT_LANGUAGES';

The default values are internal macros to make the parameter compact and to dynamically use the last installed version. For example, the alias for bultin_java would look like the following: 

JAVA=localzmq+protobuf:///bfsdefault/default/EXAClusterOS/ScriptLanguages-release-standard-exasol-<version>/?lang=java#/buckets/bfsdefault/default/EXAClusterOS/ScriptLanguages-release-standard-exasol-<version>/exaudf/exaudfclient

Best practices:

  • Always activate and test the SLC in your session before you activate it system-wide.

  • Add new aliases for non-standard SLCs, and only overwrite the built-in defined aliases if necessary.

Define your own script aliases

For SLCs derived from the pre-built SLCs, you can build the ALTER SESSION statement using the exaslct tool with the command generate-language-activation and specifying the bucket details. This method is less error-prone than writing the statement from scratch. For more information, see Exasol Script Languages Container Tool.

The SCRIPT_LANGUAGES parameter is a string that lists all active script aliases separated with spaces. Each script alias consists of an ALIAS_NAME and either a LANGUAGE_URL or the builtin_* macro.

The syntax for the SCRIPT_LANGUAGES parameter is as follows:

<ALIAS_NAME_1>=<LANGUAGE_URL_1> <ALIAS_NAME_2>=<LANGUAGE_URL2_> ...'

The LANGUAGE_URL contains the following information for the database:

  • path_to_linux_container: BucketFS path to the Linux container that is used for the root file system of the UDF.
  • path_to_client: BucketFS path to the script client. Usually, this path should be a sub-directory of the path to the Linux container.
  • client_param_list: Parameter list for the script client.

The complete script alias is as follows:

<alias>=localzmq+protobuf:///<path_to_linux_container>/[?<client_param_list>]#<path_to_client>

For SLCs derived from the pre-built SLCs you can use the following template to create your language URL:

<alias>=localzmq+protobuf:///<bucketfs_name>/<bucket_name>/<path_in_bucket><container_name>/?lang=<language>#buckets/<bucketfs_name>/<bucket_name>/<path_in_bucket><container_name>/exaudf/exaudfclient

The following language parameter options are supported: 

  • java
  • r
  • python

You can also generate the language URL for the alias using exaslct, which is the tool used by Exasol to build the pre-built containers. For more information and downloads, see Exasol Script Languages Container Tool.

The script alias must not contain spaces.

The container_name is the file name of the SLC in BucketFS, without the file extension. The file extenstion must be omitted since BucketFS extracts archives with the file extensions .tar.gz, .tar.bz2, or .zip.

The path_in_bucket value must end with a forward slash / if it is used. This value is optional and can be left empty.

Example

Based on the contents and examples above, we get the following ALTER SESSION command: 

ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON3=builtin_python3 R=builtin_r JAVA=builtin_java MY_PYTHON=localzmq+protobuf:///bfsdefault/default/path/in/bucket/container/?lang=python#buckets
/bfsdefault/default/path/in/bucket/container/exaudf/exaudfclient';

This adds the new script alias MY_PYTHON to the SCRIPT_LANGUAGES parameter, defining the container as container.tar.gz (file endings must be omitted) and the language as python.

To use the alias, create a new UDF using the CREATE SCRIPT command and specify the new script as MY_PYTHON:

--/
CREATE MY_PYTHON SCALAR SCRIPT my_python_udf(input_parameter VARCHAR(2000))
RETURNS VARCHAR(2000) AS

def run(ctx):
  return ctx.input_parameter
/

Customize an existing SLC

You can build a customized script language container (SLC) with the libraries that you need. The easiest way to do this is by using exaslct, which is the tool used by Exasol to build the pre-built containers. Using exaslct you can build a new container from a build description that we call flavor. A flavor contains a number of Docker files and a description of how they are connected. For all flavors, we also provide a dedicated mechanism to customize them. After you customize and build your container, the rest of the installation procedure is the same as for our pre-built containers.

For more information and downloads, see Exasol Script Languages Container Tool.

Exasol AI Lab

Exasol AI Lab provides a virtual machine image that can be used as a developer sandbox to easily build script language containers. The image is pre-configured with all necessary dependencies installed, and is available in multiple editions involving different technology stacks.

The Exasol AI Lab image contains a Jupyter Notebook environment, which includes a tutorial that will do the following:

  1. Download the script-languages-release repositories

  2. Customize the packages

  3. Build the script language container

  4. Test the new package

For more information, see the Exasol AI Lab User Guide .

Adding JAR files to a Java UDF

As an alternative to adding a new script language container, you can specify required JAR files in a Java UDF using their BucketFS path.

For example, if you want to use Google’s library to process telephone numbers, you could upload the file in a bucket named javalib. The corresponding local bucket path would look like the following: /buckets/bfsdefault/javalib/libphonenumber-4.2.jar.

In the following script, you can see how the path is configured to import the library.

--/
CREATE JAVA SCALAR SCRIPT jphone(num VARCHAR(2000))
RETURNS VARCHAR(2000) AS
%jar /buckets/bfsdefault/javalib/libphonenumber-4.2.jar;
 
import com.google.i18n.phonenumbers.PhoneNumberUtil;
import com.google.i18n.phonenumbers.NumberParseException;
import com.google.i18n.phonenumbers.Phonenumber.PhoneNumber;
class JPHONE {
  static String run(ExaMetadata exa, ExaIterator ctx) throws Exception {
    PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
    try {
      PhoneNumber swissNumberProto = phoneUtil.parse(ctx.getString("num"),
                            "DE");
      return swissNumberProto.toString();  
    } catch (NumberParseException e) {
      System.err.println("NumberParseException thrown: " + e.toString());
    }
    return "failed";
  }
}
/