Python 3
This section explains how to use Python 3 for UDF scripting in Exasol.
For additional information about Python, refer to the official Python Documentation.
run() and cleanup() Methods
The run() method is called for each input tuple (SCALAR) or each group (SET). Its parameter is a kind of execution context that provides access to the data and, in the case of a SET script, to the iterator.
To initialize expensive steps (such as opening external connections), you can write code outside the run() method; this code is executed once at the beginning by each virtual machine.
For deinitialization, the cleanup() method is available. It is called once for each virtual machine, at the end of the execution.
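As an illustration, the following sketch shows where initialization and cleanup code is placed. The helper names open_connection() and process() are hypothetical placeholders, not part of the API:

```sql
--/
CREATE OR REPLACE PYTHON3 SET SCRIPT TEST.CONNECTION_DEMO(x DOUBLE) RETURNS DOUBLE AS
-- Code outside run() is executed once per virtual machine, so expensive
-- initialization such as opening an external connection belongs here.
-- open_connection() is a hypothetical helper, not part of the API.
conn = open_connection()

def run(ctx):
    # Called once per group (SET script)
    return process(conn, ctx)

def cleanup():
    # Called once per virtual machine, at the end of the execution
    conn.close()
/
```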
Parameters
The internal Python data types and the database SQL types are not identical. Therefore, casts must be done for the input and output data:
SQL data type | Python 3 data type
---|---
DECIMAL(p,0) | int
DECIMAL(p,s) | decimal.Decimal
DOUBLE | float
DATE | datetime.date
TIMESTAMP | datetime.datetime
BOOLEAN | bool
VARCHAR and CHAR | str
The value None is the equivalent of the SQL NULL.
For better performance, you should prefer DOUBLE to DECIMAL for the parameter types.
The input parameters can be addressed by their names, for example ctx.my_input. ctx refers to the name of the context parameter that is passed to the run() method.
You can also use a dynamic number of parameters via the notation (...), for example CREATE PYTHON3 SCALAR SCRIPT my_script (...). The parameters can then be accessed through an index (ctx[0] for the first parameter). The number of parameters and their data types (both determined when the script is called) are part of the metadata.
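The two styles of parameter access can be sketched as follows (script and column names are illustrative):

```sql
--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT TEST.ADD_TAX(my_input DOUBLE) RETURNS DOUBLE AS
def run(ctx):
    # Access the input parameter by its name
    return ctx.my_input * 1.19
/

--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT TEST.SUM_ALL(...) RETURNS DOUBLE AS
def run(ctx):
    # With dynamic parameters, access by index; the count comes from the metadata
    total = 0.0
    for i in range(exa.meta.input_column_count):
        if ctx[i] is not None:  # None corresponds to SQL NULL
            total += ctx[i]
    return total
/
```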
Metadata
You can access the following metadata through global variables:
Metadata | Description
---|---
exa.meta.database_name | Database name
exa.meta.database_version | Database version
exa.meta.script_language | Name and version of the script language
exa.meta.script_name | Name of the script
exa.meta.script_schema | Schema in which the script is stored
exa.meta.current_schema | Schema which is currently opened
exa.meta.script_code | Code of the script
exa.meta.session_id | Session ID
exa.meta.statement_id | Statement ID within the session
exa.meta.current_user | Current user
exa.meta.scope_user | Scope user (current_user, or the owner of the view if the script is called from within a view)
exa.meta.node_count | Number of cluster nodes
exa.meta.node_id | Local node ID starting with 0
exa.meta.vm_id | Unique ID for the local machine (the IDs of the virtual machines have no relation to each other)
exa.meta.input_type | Type of the input data (SCALAR or SET)
exa.meta.input_column_count | Number of input columns
exa.meta.input_columns | Array including the following information: {name, type, sql_type, precision, scale, length}
exa.meta.output_type | Type of the output data (RETURNS or EMITS)
exa.meta.output_column_count | Number of output columns
exa.meta.output_columns | Array including the following information: {name, type, sql_type, precision, scale, length}
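For example, a script can inspect its own input columns through exa.meta. This sketch combines a few metadata fields into one string:

```sql
--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT TEST.DESCRIBE_INPUT(a DOUBLE, b VARCHAR(100)) RETURNS VARCHAR(2000) AS
def run(ctx):
    # Collect the names of all input columns from the metadata
    cols = ', '.join(c.name for c in exa.meta.input_columns)
    return 'script %s has %d input columns: %s' % (
        exa.meta.script_name, exa.meta.input_column_count, cols)
/
```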
Data Iterator
For scripts with multiple input tuples per call (keyword SET), you can iterate through that data using the method next(), which is accessible through the context. Initially, the iterator points to the first input row. To iterate, you can use a while True loop that is exited when ctx.next() returns False.
If the input data is empty, the run() method is not called, and, similar to aggregate functions, the NULL value is returned as the result (for example, SELECT MAX(x) FROM t WHERE false).
Additionally, the method reset() resets the iterator to the first input element, so you can iterate through the data multiple times if your algorithm requires it.
The method size() returns the number of input values.
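A typical SET script that aggregates its group using next() and size() might look like the following sketch:

```sql
--/
CREATE OR REPLACE PYTHON3 SET SCRIPT TEST.MY_AVG(x DOUBLE) RETURNS DOUBLE AS
def run(ctx):
    total = 0.0
    while True:
        if ctx.x is not None:
            total += ctx.x
        if not ctx.next():
            break          # iterator is exhausted
    # ctx.reset() would allow a second pass over the same group
    return total / ctx.size()
/
```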
emit()
You can return multiple output tuples per call (keyword EMITS) using the method emit(). The method expects as many parameters as output columns were defined. In the case of dynamic output parameters, it is handy in Python to use a list object, which can be unpacked using *, for example ctx.emit(*currentRow).
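A sketch of an EMITS script, including the unpacking of a list:

```sql
--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT TEST.DUPLICATE(x VARCHAR(100))
EMITS (y VARCHAR(100), n INT) AS
def run(ctx):
    # One emit() call per output row; one argument per output column
    ctx.emit(ctx.x, 1)
    currentRow = [ctx.x, 2]
    ctx.emit(*currentRow)  # unpacking a list has the same effect
/
```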
Import other scripts
Other scripts can be imported through the method exa.import_script(). The return value of this method must be assigned to a variable that represents the imported module.
Syntax
<alias> = exa.import_script('<schema>.<script>')
Examples
CREATE SCHEMA IF NOT EXISTS TEST;
--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT TEST.PYTHON_DEMO() RETURNS VARCHAR(2000) AS
def run(ctx):
    return "Minimal Python UDF"
/
select TEST.PYTHON_DEMO();
CREATE SCHEMA IF NOT EXISTS LIB;
--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT LIB.MYLIB() RETURNS INT AS
def helloWorld():
    return "Hello Python3 World!"
/
CREATE SCHEMA IF NOT EXISTS TEST;
--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT TEST.MYHELLOWORLD() RETURNS VARCHAR(2000) AS
l = exa.import_script('LIB.MYLIB')
def run(ctx):
    return l.helloWorld()
/
select TEST.MYHELLOWORLD();
Access connection definitions
The data that has been specified when defining connections with CREATE CONNECTION is available in Python UDF scripts through the method exa.get_connection("<connection_name>"). The result is a Python object with the following fields:
Fields | Description
---|---
type | The type of the connection definition.
address | The part of the connection definition that followed the TO keyword.
user | The part of the connection definition that followed the USER keyword.
password | The part of the connection definition that followed the IDENTIFIED BY keyword.
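For example, assuming a connection created with CREATE CONNECTION MY_CONN TO '...' USER '...' IDENTIFIED BY '...' (the connection name is illustrative):

```sql
--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT TEST.SHOW_CONN_TYPE() RETURNS VARCHAR(2000) AS
def run(ctx):
    conn = exa.get_connection('MY_CONN')
    # conn.address, conn.user, and conn.password are available as well
    return conn.type
/
```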
Auxiliary libraries
In addition to the standard language library, the following libraries are provided:
Libraries | Description
---|---
lxml | XML processing. For more details, see http://pypi.python.org/pypi/lxml.
NumPy | Numeric calculations. For details, see http://www.numpy.org.
PyTables | Hierarchical database package. For details, see http://www.pytables.org.
pytz | Time zone functions. For details, see http://pytz.sourceforge.net.
redis | Interface for Redis. For details, see http://pypi.python.org/pypi/redis/.
scikit-learn | Machine Learning. For details, see http://scikit-learn.org.
SciPy | Scientific tools. For details, see http://www.scipy.org. For this library, the required build tool atlas is available at http://pypi.python.org/pypi/atlas.
ujson | UltraJSON is an ultra fast JSON encoder and decoder written in pure C with bindings for Python 2.5+ and 3. For more details, see https://pypi.org/project/ujson/.
pyexasol | Official Python driver for Exasol. For more details, see https://github.com/exasol/pyexasol.
requests | Standard for making HTTP requests in Python. For more details, see https://pypi.org/project/requests/.
pycurl | Can be used to fetch objects identified by a URL from a Python program. For more details, see https://pypi.org/project/pycurl/.
boto3 | Boto3 is the Amazon Web Services (AWS) SDK for Python. For more details, see https://pypi.org/project/boto3/.
boto | Boto is the (deprecated) Amazon Web Services (AWS) SDK for Python. For more details, see https://pypi.org/project/boto/.
ldap | Python-ldap provides an object-oriented API to access LDAP directory servers from Python programs. For more details, see https://www.python-ldap.org/en/latest/.
roman | Converts an integer to a roman numeral. For more details, see https://pypi.org/project/roman/.
OpenSSL | A Python wrapper module around the OpenSSL library. For more details, see https://www.pyopenssl.org/en/stable/.
smbc | Binding for the Samba client library libsmbclient. For more details, see https://pypi.org/project/pysmbc/.
leveldb | Binding for the key-value database LevelDB. For more details, see https://code.google.com/archive/p/py-leveldb/.
pyodbc | Database API module for ODBC. For more details, see https://github.com/mkleehammer/pyodbc/wiki.
pandas | Data structures and data-analysis tools for working with structured and time-series data. For more details, see https://pandas.pydata.org.
pycparser | Parser for the C language. For more details, see https://github.com/eliben/pycparser.
cffi | C Foreign Functions Interface to interact with C code from Python. For more details, see https://cffi.readthedocs.io/en/latest/.
protobuf | Google's platform-neutral mechanism for serializing structured data. For more details, see https://developers.google.com/protocol-buffers/.
pykickstart | Library for reading and writing kickstart files. For more details, see https://pykickstart.readthedocs.io/en/latest/.
martian | Library for embedding configuration information in Python code. For more details, see https://pypi.org/project/martian/.
Dynamic output parameters callback
If the UDF script is defined with dynamic output parameters and the output parameters cannot be determined by specifying EMITS in the query or by using INSERT INTO SELECT, the database calls the method default_output_columns(), which you can implement in the script. The expected return value is a string with the names and types of the output columns, for example "a int, b varchar(100)".
For more details about when default_output_columns() is called, see Dynamic input and output parameters.
You can access the metadata exa.meta in the method to find out the number and types of the input columns.
The method is executed only once, on a single node.
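A sketch of a script with dynamic output parameters that simply mirrors its input columns (names are illustrative):

```sql
--/
CREATE OR REPLACE PYTHON3 SET SCRIPT TEST.PASS_THROUGH(...) EMITS (...) AS
def run(ctx):
    while True:
        ctx.emit(*[ctx[i] for i in range(exa.meta.input_column_count)])
        if not ctx.next():
            break

def default_output_columns():
    # Derive the output columns from the input columns,
    # e.g. "c0 DOUBLE, c1 VARCHAR(100)"
    cols = ['c%d %s' % (i, col.sql_type)
            for i, col in enumerate(exa.meta.input_columns)]
    return ', '.join(cols)
/
```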
User defined import callback
To support a user defined import, you can implement the callback method generate_sql_for_import_spec(import_spec). For details about the syntax, see Dynamic input and output parameters and IMPORT.
The parameter import_spec contains all information about the executed IMPORT FROM SCRIPT statement. The function has to generate and return a SELECT statement that will retrieve the data to be imported.
The import_spec parameter has the following fields:
Field | Description
---|---
parameters | Parameters specified in the IMPORT statement.
is_subselect | This is true if the IMPORT is used inside a SELECT statement and not inside an IMPORT INTO <table> statement.
subselect_column_names | If is_subselect is true and the user specified the target columns, this returns the names of the specified columns.
subselect_column_types | If is_subselect is true and the user specified the target columns, this returns the types of the specified columns in SQL format (for example, "VARCHAR(100)").
connection_name | This returns the name of the connection, if it was specified. Otherwise it returns None.
connection | This is only defined if the user provided connection information. It returns an object similar to the return value of exa.get_connection(). Otherwise it returns None.
The password is transferred in plaintext and can be visible in the logs. We recommend that you create a CONNECTION and specify only the connection name, which can be obtained from the connection_name field. The actual connection information can then be obtained through exa.get_connection(name).
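A minimal sketch of such a callback. The parameter name NUM_ROWS, the script name, and the generated SELECT are illustrative assumptions:

```sql
--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT TEST.MY_IMPORT_SCRIPT(...) EMITS (...) AS
def generate_sql_for_import_spec(import_spec):
    # NUM_ROWS is a hypothetical user parameter passed in the IMPORT statement
    num_rows = import_spec.parameters['NUM_ROWS']
    if import_spec.connection_name is not None:
        # Resolve the actual credentials from the named connection
        conn = exa.get_connection(import_spec.connection_name)
    # Return a SELECT statement that produces the data to be imported
    return 'SELECT %s AS generated_value' % num_rows
/
```

Such a script would then be invoked with a statement like IMPORT FROM SCRIPT TEST.MY_IMPORT_SCRIPT WITH NUM_ROWS='5';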
User defined export callback
To support a user defined export, you can implement the callback method generate_sql_for_export_spec(export_spec). For details about the syntax, see Dynamic input and output parameters and EXPORT.
The parameter export_spec contains all information about the executed EXPORT INTO SCRIPT statement. The function has to generate and return a SELECT statement that will generate the data to be exported. The FROM part of that string can be a dummy table (DUAL), since the export command knows which table should be exported, but it must be specified so that the SQL string can be compiled.
The parameter export_spec has the following fields:
Field | Description
---|---
parameters | Parameters specified in the EXPORT statement.
source_column_names | List of column names of the resulting table that should be exported.
has_truncate | Boolean value from the TRUNCATE option that defines whether the content of the target table should be truncated before the data transfer.
has_replace | Boolean value from the REPLACE option that defines whether the target table should be dropped before the data transfer.
created_by | String value from the CREATED BY option that defines a creation statement to be executed in the target system before the data transfer.
connection_name | This returns the name of the connection, if it was specified. Otherwise it returns None.
connection | This is only defined if the user provided connection information. It returns an object similar to the return value of exa.get_connection(). Otherwise it returns None.
The password is transferred in plaintext and can be visible in the logs. We recommend that you create a CONNECTION and specify only the connection name, which can be obtained from the connection_name field. The actual connection information can then be obtained through exa.get_connection(name).
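A minimal sketch of such a callback (the script name and the generated SELECT are illustrative assumptions):

```sql
--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT TEST.MY_EXPORT_SCRIPT(...) EMITS (...) AS
def generate_sql_for_export_spec(export_spec):
    # Build a SELECT over the exported columns; DUAL serves as the
    # dummy FROM clause mentioned above.
    cols = ', '.join(export_spec.source_column_names)
    if export_spec.connection_name is not None:
        # Resolve the actual credentials from the named connection
        conn = exa.get_connection(export_spec.connection_name)
    return 'SELECT %s FROM DUAL' % cols
/
```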
Example
/*
This example loads from a webserver
and processes the following file goalies.xml:
<?xml version='1.0' encoding='UTF-8'?>
<users>
<user active="1">
<first_name>Manuel</first_name>
<last_name>Neuer</last_name>
</user>
<user active="1">
<first_name>Joe</first_name>
<last_name>Hart</last_name>
</user>
<user active="0">
<first_name>Oliver</first_name>
<last_name>Kahn</last_name>
</user>
</users>
*/
--/
CREATE PYTHON3 SCALAR SCRIPT process_users(url VARCHAR(500))
EMITS (firstname VARCHAR(20), lastname VARCHAR(20)) AS
import urllib.request
import lxml.etree as etree
def run(ctx):
    data = b''.join(urllib.request.urlopen(ctx.url).readlines())
    tree = etree.XML(data)
    for user in tree.findall('user/[@active="1"]'):
        fn = user.findtext('first_name')
        ln = user.findtext('last_name')
        ctx.emit(fn, ln)
/
Adapter script callback
For virtual schemas, an adapter script must define the function adapter_call(request_json). The parameter is a JSON string containing the Virtual Schema API request; the return value must also be a JSON string, containing the response. The callback function is executed on a single node only.
For the Virtual Schema API documentation, see Information for Developers.
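The skeleton of such an adapter can be sketched in plain Python as follows. The request type name and the minimal response shown here are illustrative assumptions, not a complete implementation of the Virtual Schema API:

```python
import json

def adapter_call(request_json):
    # Parse the JSON request and dispatch on its type
    request = json.loads(request_json)
    request_type = request["type"]
    if request_type == "getCapabilities":
        # Illustrative minimal answer: report no pushdown capabilities
        response = {"type": "getCapabilities", "capabilities": []}
    else:
        raise ValueError("unsupported request type: %s" % request_type)
    return json.dumps(response)

print(adapter_call('{"type": "getCapabilities"}'))
```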
Activate Python 3 in databases installed prior to version 6.2
In a new installation of Exasol version 6.2 or later, Python 3 is activated by default. In a system that was installed with a version prior to 6.2 and then updated to a later version, Python 3 must be explicitly activated.
To check if Python 3 is active in your database, query the value of the SCRIPT_LANGUAGES parameter (for example, in the EXA_PARAMETERS system table).
If the result contains PYTHON3=builtin_python3, Python 3 is active in your system and you do not have to make any changes. If the result does not contain this string, you must explicitly activate Python 3. To do this, append a space and PYTHON3=builtin_python3 to the value returned by the query, then use ALTER SYSTEM SET SCRIPT_LANGUAGES to update the parameter. For example:
ALTER SYSTEM SET SCRIPT_LANGUAGES = 'PYTHON=builtin_python R=builtin_r JAVA=builtin_java PYTHON3=builtin_python3';
We recommend that you test the changes using ALTER SESSION before making system-wide changes using ALTER SYSTEM.
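For example, to try the activation only in the current session first (adjust the value so it matches your existing SCRIPT_LANGUAGES setting plus the new entry):

```sql
ALTER SESSION SET SCRIPT_LANGUAGES = 'PYTHON=builtin_python R=builtin_r JAVA=builtin_java PYTHON3=builtin_python3';
```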