Adding New Packages to Existing Script Languages
This section explains how to expand existing script languages with additional packages or libraries using BucketFS.
The Lua script language is not expandable and is natively compiled into the Exasol database software.
Script Language Container
A script language container consists of a Linux container with a complete Linux distribution and all required libraries. You can install the libraries and other dependencies to a script language container with the usual package manager, such as apt-get or pip. This simplifies the installation of packages with many dependencies.
Exasol provides different flavors of script language containers for Python 2 and 3, R, and Java on GitHub. A flavor is essentially a build description that contains a number of files and a description of how they are connected. We also provide various packages for these languages. You can use one of our pre-built containers or extend one of our flavors with additional packages and build your container.
Using a Pre-built Script Language Container
You can download the current version of the pre-built container from the release section on GitHub.
All containers specify the minimum Exasol version they are compatible with. You can use them with later version of Exasol than specified. For example, the container standard-EXASOL-6.0.0-release-....tar.gz is compatible with Exasol 6.0.0 and later. For more information about the containers and their compatibility, refer to the Exasol Script Language Flavors documentation.
For instructions on how to upload the new container to BucketFS, see Manage Buckets and Files in BucketFS. The following example using curl defines an environment for the examples in the subsequent steps:
local_container_file = "path/to/container.tar.gz"
bucketfs_host = "192.168.6.75"
bucketfs_port = "1234"
write_password = "write-password"
bucketfs_name = "bfsdefault"
bucket_name = "default"
path_in_bucket = "path/in/bucket/"
container_file_in_bucket = "container.tar.gz"
container_name = "container"
protocol = "HTTP" # or HTTPS
curl -vX PUT -T "$local_container_file" "$protocol://w:$write_password@$bucketfs_host:$bucketfs_port/$bucket_name/$path_in_bucket$container_file_in_bucket
The uploaded container must have the file extension .tar.gz, .tar.bz2, or .zip. Otherwise the container will not get extracted and its contents will not be available in BucketFS.
The value for path_in_bucket
can be empty. If it is not empty it must end with a forward slash /
.
When the container is uploaded to BucketFS it will be distributed and extracted to all nodes. After a few minutes, you can activate it and use it.
We recommend activating the container for your session first and testing it before activating it system-wide.
To activate a new container you must edit the session parameter SCRIPT_LANGUAGES
. The SCRIPT_LANGUAGES
parameter defines a list of aliases for script language/container combinations. You can change a script alias using ALTER SESSION (session-wide) and ALTER SYSTEM (system-wide) commands. Changing the script alias is especially useful if you have multiple language versions in parallel, or when migrating from one version to another.
We recommend adding new aliases for non-standard containers and only overwrite the built-in defined aliases if necessary.
The current session and system parameters are available in the EXA_PARAMETERS system table.
The default values are internal macros to make the parameter compact and to dynamically use the last installed version. For example, the alias for bultin_java
would look like the following:
JAVA=localzmq+protobuf:///bfsdefault/default/EXAClusterOS/ScriptLanguages-release-standard-exasol-<version>/?lang=java#/buckets/bfsdefault/default/EXAClusterOS/ScriptLanguages-release-standard-exasol-<version>/exaudf/exaudfclient
In the next section, we explain how you can define your own script alias for another script language container.
Define Your Own Script Aliases
For containers derived from the script language containers on GitHub, you can use exaslct
to build the ALTER SESSION statement by running it with the command "generate-language-activation"
and specifying the bucket details. This method is less error-prone than writing it from scratch. For more information, see the documentation on GitHub.
The SCRIPT_LANGUAGES
parameter is a string that lists all active script aliases separated with spaces. Each script alias consists of an ALIAS_NAME
and a LANGUAGE_URL
or the builtin_*
macro.
The syntax for the SCRIPT_LANGUAGES
parameter is defined as follows:
A script alias cannot contain spaces.
The LANGUAGE_URL
contains the following information for the database:
path_to_linux_container
: BucketFS path to the Linux container which is used for the root file system of the UDF.path_to_client
: BucketFS path to the script client. Usually, this path should be a sub-directory of the path to the Linux container.client_param_list
: Parameter list for the script client.
The complete script alias is as shown below:
<alias>=localzmq+protobuf:///<path_to_linux_container>/[?<client_param_list>]#<path_to_client>
For containers derived from our script language containers on GitHub, you can use the following template to create your language URL, or use exaslct to generate the command for you:
<alias>=localzmq+protobuf:///<bucketfs_name>/<bucket_name>/<path_in_bucket><container_name>/?lang=<language>#buckets/<bucketfs_name>/<bucket_name>/<path_in_bucket><container_name>/exaudf/exaudfclient[_py3]
Exasol currently provides the following language
parameter options:
- java
- r
- python
To switch between Python 2 and Python 3 you need to change the name of the script client from exaudfclient
to exaudfclient_py3
.
The container_name
corresponds to the file name of the container in the BucketFS without the file extension. The reason for this is that the BucketFS extracts files with the file extension .tar.gz, .tar.bz2 or .zip.
The value for path_in_bucket
can be empty. However, if it is not empty it must end with a '/'.
Example
Based on the contents and examples from above, we get the following ALTER SESSION
command:
ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON=builtin_python PYTHON3=builtin_python3 R=builtin_r JAVA=builtin_java MY_PYTHON=localzmq+protobuf:///bfsdefault/default/path/in/bucket/container/?lang=python#buckets
/bfsdefault/default/path/in/bucket/container/exaudf/exaudfclient';
Next, we add the new script alias MY_PYTHON
with the script language container in container.tar.gz and python as the language to the SCRIPT_LANGUAGES
parameter. You can now use this in a UDF by creating a new UDF with CREATE SCRIPT command and specify the new script as MY_PYTHON
. For example, you can create a UDF as shown below:
Customizing an Existing Container
You can build a customized script language container with the libraries that you need. The easiest way to do this is by using exaslct, which is available on our GitHub. Exasol also uses exaslct to build the pre-built containers.
Using exaslct you can build a new container from a build description that we call flavor. A flavor contains a number of Docker files and a description of how they are connected. For all flavors, we also provide a dedicated mechanism to customize them. After you customize and build your container, the rest of the installation procedure is the same as for our pre-built containers.
For more details, see script-languages-release on GitHub.
Script Languages Developer Sandbox
Exasol provides virtual machine images that can be used as a developer sandbox to easily build script language containers. The virtual machine image provides:
-
The script-languages-release repositories
-
All necessary dependencies to execute exaslct in script-languages-release (including a correctly configured docker runtime)
-
A running Jupyterlab instance that is automatically started when the vm boots
For more information and downloads, see the Script-Languages-Developer-Sandbox User Guide on GitHub.
Adding Java files (.jar) to a Java UDF
As an alternative to adding a new script language container, you can specify required JAR-Files in a Java UDF via there BucketFS Path.
If you for instance want to use Google's library to process telephone numbers, you could upload the file similarly to the examples above in the
bucket named javalib. The corresponding local bucket path would look like the following: /buckets/bfsdefault/javalib/libphonenumber-4.2.jar
.
In the following script, you can see how the path is configured to import the library.
--/
CREATE JAVA SCALAR SCRIPT jphone(num VARCHAR(2000))
RETURNS VARCHAR(2000) AS
%jar /buckets/bfsdefault/javalib/libphonenumber-4.2.jar;
import com.google.i18n.phonenumbers.PhoneNumberUtil;
import com.google.i18n.phonenumbers.NumberParseException;
import com.google.i18n.phonenumbers.Phonenumber.PhoneNumber;
class JPHONE {
static String run(ExaMetadata exa, ExaIterator ctx) throws Exception {
PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
try {
PhoneNumber swissNumberProto = phoneUtil.parse(ctx.getString("num"),
"DE");
return swissNumberProto.toString();
} catch (NumberParseException e) {
System.err.println("NumberParseException thrown: " + e.toString());
}
return "failed";
}
}
/