Python 2.7 (Deprecated)
Python 2 reached end of life in 2020 and is no longer supported in Exasol. We recommend that you use Python 3 in all UDFs.
Python 2.7 was the final version of the Python 2 series. If you need to continue using Python 2 in your UDFs, refer to Using Python 2 with Exasol 7.1.20 or later for a suggested workaround.
This section provides details about the now deprecated Python 2.7. For additional information, see the official Python Documentation.
run() and cleanup() Methods
The method run() is called for each input tuple (SCALAR) or once for each group (SET). Its parameter is an execution context, which provides access to the data and, in the case of a SET script, to the iterator.
To initialize expensive steps such as opening external connections, you can write code outside the run() method; this code is executed once at the beginning by each virtual machine.
For deinitialization, the method cleanup() is available. It is called once for each virtual machine, at the end of the execution.
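The life cycle described above can be sketched as follows. The SimpleContext class and the driver code at the bottom are stand-ins for what Exasol provides and does internally; only the placement of the initialization, run(), and cleanup() code reflects the actual UDF structure.

```python
# Sketch of the UDF life cycle. Module-level code runs once per virtual
# machine, run() is called per input tuple (SCALAR), cleanup() once at
# the end. SimpleContext mocks the execution context Exasol passes in.

# Expensive one-time initialization goes at module level,
# e.g. opening an external connection (here just a log entry).
call_log = []
call_log.append('init')          # executed once per virtual machine

def run(ctx):
    # called for each input tuple; ctx gives access to the inputs
    return ctx.value * 2

def cleanup():
    # called once per virtual machine after all tuples were processed
    call_log.append('cleanup')

# --- tiny driver to illustrate the calling order (not part of a UDF) ---
class SimpleContext(object):
    def __init__(self, value):
        self.value = value

results = [run(SimpleContext(v)) for v in (1, 2, 3)]
cleanup()
```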
Parameters
The internal Python data types and the database SQL types are not identical. Therefore, casts must be done for the input and output data:
Datatype SQL | Datatype Python |
---|---|
DECIMAL(p,0) | int |
DECIMAL(p,s) | decimal.Decimal |
DOUBLE | float |
DATE | datetime.date |
TIMESTAMP | datetime.datetime |
BOOLEAN | bool |
VARCHAR and CHAR | unicode |
The Python keyword None is the equivalent of SQL NULL.
The input parameters can be addressed by their names, for example ctx.my_input. ctx refers to the name of the context parameter that is passed to the run() method.
You can also use a dynamic number of parameters via the notation (...), for example CREATE PYTHON SCALAR SCRIPT my_script (...). The parameters can then be accessed through an index (ctx[0] for the first parameter). The number of parameters and their data types (both determined during the call of the script) are part of the metadata.
For better performance, use DOUBLE instead of DECIMAL for parameter types.
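The mapping in the table above can be illustrated with plain Python values. This is a standalone sketch of the Python-side representations, not code that runs inside a UDF; note that in Python 2.7, VARCHAR and CHAR values arrive as unicode strings.

```python
import decimal
import datetime

# Illustrative Python-side values for each SQL type from the table above.
dec_p0   = 42                                 # DECIMAL(9,0)  -> int
dec_ps   = decimal.Decimal('3.14')            # DECIMAL(9,2)  -> decimal.Decimal
dbl      = 2.718                              # DOUBLE        -> float
date_val = datetime.date(2020, 1, 1)          # DATE          -> datetime.date
ts_val   = datetime.datetime(2020, 1, 1, 12)  # TIMESTAMP     -> datetime.datetime
flag     = True                               # BOOLEAN       -> bool
nullable = None                               # SQL NULL      -> None

# Casts between the types must be explicit, e.g. when a DOUBLE result
# is needed from a DECIMAL input:
as_double = float(dec_ps)
```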
Metadata
You can access the following metadata through global variables:
Metadata | Description |
---|---|
exa.meta.database_name | Database name |
exa.meta.database_version | Database version |
exa.meta.script_language | Name and version of the script language |
exa.meta.script_name | Name of the script |
exa.meta.script_schema | Schema in which the script is stored |
exa.meta.current_schema | Schema which is currently opened |
exa.meta.script_code | Code of the script |
exa.meta.session_id | Session ID |
exa.meta.statement_id | Statement ID within the session |
exa.meta.current_user | Current user |
exa.meta.scope_user | Scope user (current_user or, if the script is called within a view, the owner of that view) |
exa.meta.node_count | Number of cluster nodes |
exa.meta.node_id | Local node ID starting with 0 |
exa.meta.vm_id | Unique ID for the local machine (the IDs of the virtual machines have no relation to each other) |
exa.meta.input_type | Type of the input data (SCALAR or SET) |
exa.meta.input_column_count | Number of input columns |
exa.meta.input_columns | Array including the following information: {name, type, sql_type, precision, scale, length} |
exa.meta.output_type | Type of the output data (RETURNS or EMITS) |
exa.meta.output_column_count | Number of output columns |
exa.meta.output_columns | Array including the following information: {name, type, sql_type, precision, scale, length} |
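A short sketch of how these metadata fields are typically read. The Meta and Exa classes are mocks standing in for the global exa object that Exasol provides automatically; only the field names follow the table above.

```python
# Mock of the global exa object to illustrate reading metadata fields.
# In a real UDF, exa is provided by the database; these classes only
# stand in for it here.
class Meta(object):
    database_name = 'exa_db'
    script_name = 'MY_SCRIPT'
    input_type = 'SET'
    input_column_count = 2
    node_count = 4
    node_id = 0

class Exa(object):
    meta = Meta()

exa = Exa()   # provided automatically inside a real UDF

# e.g. branch on whether the script processes groups or single tuples
is_set_script = (exa.meta.input_type == 'SET')
```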
Data Iterator
For scripts processing multiple input tuples per call (keyword SET), you can iterate through the data using the method next(), which is accessible through the context. Initially, the iterator points to the first input row. A typical pattern is a while True loop that breaks when ctx.next() returns False (if not ctx.next(): break).
If the input data is empty, the run() method is not called, and, similar to aggregate functions, NULL is returned as the result (for example, SELECT MAX(x) FROM t WHERE false).
Additionally, the method reset() resets the iterator to the first input element. You can therefore iterate through the data multiple times if your algorithm requires it.
The method size() returns the number of input values.
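The iteration pattern described above can be sketched like this. SetContext is a mock that mimics the next(), reset(), and size() behavior of the real SET-script context; only the shape of the run() loop reflects actual UDF code.

```python
# Minimal stand-in for the SET-script context: next(), reset() and
# size() behave like the iterator described above. Real UDFs receive
# this object from Exasol; this mock only illustrates the pattern.
class SetContext(object):
    def __init__(self, rows):
        self._rows = rows
        self._pos = 0

    @property
    def x(self):                  # input column named "x"
        return self._rows[self._pos]

    def next(self):
        # advance to the next row; returns False once data is exhausted
        if self._pos + 1 < len(self._rows):
            self._pos += 1
            return True
        return False

    def reset(self):
        self._pos = 0             # back to the first input element

    def size(self):
        return len(self._rows)

def run(ctx):
    # typical SET iteration: while True, break if not ctx.next()
    total = 0
    while True:
        total += ctx.x
        if not ctx.next():
            break
    ctx.reset()   # a second pass would start at the first row again
    return total

result = run(SetContext([1, 2, 3, 4]))
```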
emit()
You can return multiple output tuples per call (keyword EMITS) using the method emit(). The method expects as many parameters as output columns were defined. In the case of dynamic output parameters, it is handy in Python to collect the values in a list object, which can be unpacked using *, as in ctx.emit(*currentRow).
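The unpacking technique can be sketched as follows. EmitContext is a mock that records emitted tuples in place of the real EMITS context; the two emit() calls show the fixed and the dynamic (list-unpacking) form.

```python
# Mock EMITS context that records emitted tuples; illustrates passing
# one argument per output column, and unpacking a list with *.
class EmitContext(object):
    def __init__(self):
        self.emitted = []

    def emit(self, *cols):
        # expects exactly as many arguments as defined output columns
        self.emitted.append(cols)

ctx = EmitContext()
ctx.emit('Manuel', 'Neuer')     # fixed output columns

current_row = ['Joe', 'Hart']   # e.g. built dynamically at runtime
ctx.emit(*current_row)          # the list is unpacked into the columns
```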
Import of Other Scripts
Other scripts can be imported through the method exa.import_script(). The return value of this method must be assigned to a variable, representing the imported module.
Syntax
Example
--/
CREATE OR REPLACE PYTHON SCALAR SCRIPT LIB.MYLIB() RETURNS INT AS
def helloWorld():
    return "Hello Python World!"
/

--/
CREATE OR REPLACE PYTHON SCALAR SCRIPT TEST.MYHELLOWORLD() RETURNS VARCHAR(2000) AS
l = exa.import_script('LIB.MYLIB')
def run(ctx):
    return l.helloWorld()
/

SELECT TEST.MYHELLOWORLD();
Accessing Connection Definitions
The data that has been specified when defining connections with CREATE CONNECTION is available in Python UDF scripts through the method exa.get_connection("<connection_name>"). The result is a Python object with the following fields:
Fields | Description |
---|---|
type | The type of the connection definition. |
address | The part of the connection definition that followed the TO keyword. |
user | The part of the connection definition that followed the USER keyword. |
password | The part of the connection definition that followed the IDENTIFIED BY keyword. |
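Reading these fields can be sketched as follows. The Connection class and get_connection() function are mocks that mimic exa.get_connection(); the connection name MY_FTP and its values are made up for illustration.

```python
# Sketch of reading connection fields. Connection mimics the object
# returned by exa.get_connection() (type, address, user, password);
# get_connection() stands in for the real exa method.
class Connection(object):
    def __init__(self, type, address, user, password):
        self.type = type
        self.address = address
        self.user = user
        self.password = password

def get_connection(name):
    # stand-in for exa.get_connection(name) with a made-up connection
    connections = {
        'MY_FTP': Connection('PASSWORD', 'ftp://example.com', 'alice', 'secret'),
    }
    return connections[name]

conn = get_connection('MY_FTP')
url = conn.address   # e.g. used to open the external resource
```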
Auxiliary Libraries
The following libraries are provided in addition to the standard language distribution:
Libraries | Description |
---|---|
cjson | Processing of JSON objects (for details, see http://pypi.python.org/pypi/python-cjson) |
lxml | XML processing (for details, see http://pypi.python.org/pypi/lxml) |
NumPy | Numeric calculations (for details, see http://www.numpy.org) |
PyTables | Hierarchical database package (for details, see http://www.pytables.org). For this library, the required build package HDF5 is available at http://www.h5py.org. |
pytz | Time zone functions (for details, see http://pytz.sourceforge.net) |
redis | Interface for Redis (for details, see http://pypi.python.org/pypi/redis/) |
scikit-learn | Machine Learning (for details, see http://scikit-learn.org) |
SciPy | Scientific tools (for details, see http://www.scipy.org). For this library, the required build tool atlas is available at http://pypi.python.org/pypi/atlas. |
requests | Standard for making HTTP requests in Python. For more details, see https://pypi.org/project/requests/ |
pycurl | Can be used to fetch objects identified by a URL from a Python program. For more details, see https://pypi.org/project/pycurl/ |
boto3 | Boto3 is the Amazon Web Services (AWS) SDK for Python. For more details, see https://pypi.org/project/boto3/ |
boto | Boto is the (deprecated) Amazon Web Services (AWS) SDK for Python. For more details, see https://pypi.org/project/boto/ |
Dynamic Output Parameters Callback Function
If the UDF script is defined with dynamic output parameters and the output parameters cannot be determined (by specifying EMITS in the query or by INSERT INTO SELECT), the database calls the method default_output_columns(), which you can implement. The expected return value is a string with the names and types of the output columns, for example "a int, b varchar(100)". For more details about when this method is called, see Dynamic input and output parameters.
You can access the metadata through exa.meta in this method to find out the number and types of the input columns.
The method is executed only once, on a single node.
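A possible implementation can be sketched like this. The Column and Meta classes mock the parts of exa.meta the callback would inspect (exa_meta stands in for the real exa.meta); the strategy of echoing the input columns back as output columns is just one illustrative choice.

```python
# Sketch of default_output_columns(): derive the output column string
# from the input columns in the metadata. Column and Meta are mocks of
# the relevant parts of exa.meta.
class Column(object):
    def __init__(self, name, sql_type):
        self.name = name
        self.sql_type = sql_type

class Meta(object):
    input_column_count = 2
    input_columns = [Column('a', 'DECIMAL(18,0)'), Column('b', 'VARCHAR(100)')]

exa_meta = Meta()   # in a real script, this is exa.meta

def default_output_columns():
    # echo every input column back as an output column
    cols = ['%s %s' % (c.name, c.sql_type) for c in exa_meta.input_columns]
    return ', '.join(cols)

columns = default_output_columns()
```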
User Defined Import Callback Function
To support a user defined import, you can implement the callback method generate_sql_for_import_spec(import_spec). For syntax, see Dynamic input and output parameters and the IMPORT statement. The parameter import_spec contains all information about the executed IMPORT FROM SCRIPT statement. The function has to generate and return a SELECT statement that retrieves the data to be imported. import_spec has the following fields:
Fields | Description |
---|---|
parameters | Parameters specified in the IMPORT statement. |
is_subselect | This is true if the IMPORT is used inside a SELECT statement and not inside an IMPORT INTO <table> statement. |
subselect_column_names | If is_subselect is true and the user specified the target columns, this returns the names of all specified columns. |
subselect_column_types | If is_subselect is true and the user specified the target columns, this returns the types of all specified columns. |
connection_name | This returns the name of the connection, if it was specified. Otherwise it returns None. The UDF script can then obtain the connection information through exa.get_connection(name). |
connection | This is only defined if the user provided connection information. It returns an object similar to the return value of exa.get_connection(name). Otherwise it is None. |
The password is transferred in plaintext and could be visible in the logs. It is therefore recommended to create a CONNECTION beforehand and to specify only the connection name (which can be obtained from the connection_name field). The actual connection information can then be obtained through exa.get_connection(name).
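A minimal sketch of this callback, under stated assumptions: ImportSpec mocks the fields listed above, and my_import_udf is a hypothetical UDF that would perform the actual data retrieval; only the shape of the returned SELECT reflects the mechanism described.

```python
# Sketch of generate_sql_for_import_spec(): build the SELECT statement
# that performs the import. ImportSpec mocks the fields described above;
# my_import_udf is a hypothetical helper UDF, not a real Exasol function.
class ImportSpec(object):
    def __init__(self, parameters, connection_name):
        self.parameters = parameters          # dict of script parameters
        self.connection_name = connection_name
        self.is_subselect = False
        self.subselect_column_names = None
        self.subselect_column_types = None
        self.connection = None

def generate_sql_for_import_spec(import_spec):
    # pass only the connection *name* on; the UDF doing the actual
    # import can look up credentials via exa.get_connection(name)
    return "SELECT my_import_udf('%s', '%s')" % (
        import_spec.connection_name,
        import_spec.parameters['FILE'])

sql = generate_sql_for_import_spec(
    ImportSpec({'FILE': 'data.csv'}, 'MY_CONN'))
```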
User Defined Export Callback Function
To support a user defined export, you can implement the callback method generate_sql_for_export_spec(export_spec). For syntax, see Dynamic input and output parameters and the EXPORT statement. The parameter export_spec contains all information about the executed EXPORT INTO SCRIPT statement. The function has to generate and return a SELECT SQL statement that generates the data to be exported. The FROM part of that string can be a dummy table (DUAL), since the export command already knows which table should be exported, but it has to be specified so that the SQL string can be compiled successfully.
export_spec has the following fields:
Fields | Description |
---|---|
parameters | Parameters specified in the EXPORT statement. |
source_column_names | List of column names of the resulting table that should be exported. |
has_truncate | Boolean value from the EXPORT option TRUNCATE. |
has_replace | Boolean value from the EXPORT option REPLACE. |
created_by | String value from the EXPORT option CREATED BY. |
connection_name | This returns the name of the connection, if it was specified. Otherwise it returns None. |
connection | This is only defined if the user provided connection information. It returns an object similar to the return value of exa.get_connection(name). Otherwise it is None. |
The password is transferred in plaintext and could be visible in the logs. It is therefore recommended to create a CONNECTION beforehand and to specify only the connection name (which can be obtained from the connection_name field). The actual connection information can then be obtained through exa.get_connection(name).
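A minimal sketch of the export counterpart, under stated assumptions: ExportSpec mocks the fields listed above, and my_export_udf is a hypothetical UDF that would write the data out; the FROM DUAL part follows the requirement described above.

```python
# Sketch of generate_sql_for_export_spec(): return a SELECT whose FROM
# part may be the dummy table DUAL, since the export command already
# knows the source table. ExportSpec mocks the fields described above;
# my_export_udf is a hypothetical helper UDF.
class ExportSpec(object):
    def __init__(self, parameters, source_column_names, connection_name):
        self.parameters = parameters
        self.source_column_names = source_column_names
        self.has_truncate = False
        self.has_replace = False
        self.created_by = None
        self.connection_name = connection_name
        self.connection = None

def generate_sql_for_export_spec(export_spec):
    cols = ', '.join(export_spec.source_column_names)
    return "SELECT my_export_udf('%s', '%s') FROM DUAL" % (
        export_spec.connection_name, cols)

sql = generate_sql_for_export_spec(
    ExportSpec({'PATH': 'out/'}, ['id', 'name'], 'MY_CONN'))
```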
Example
/*
This example loads from a webserver
and processes the following file goalies.xml:
<?xml version='1.0' encoding='UTF-8'?>
<users>
<user active="1">
<first_name>Manuel</first_name>
<last_name>Neuer</last_name>
</user>
<user active="1">
<first_name>Joe</first_name>
<last_name>Hart</last_name>
</user>
<user active="0">
<first_name>Oliver</first_name>
<last_name>Kahn</last_name>
</user>
</users>
*/
--/
CREATE PYTHON SCALAR SCRIPT process_users(url VARCHAR(500))
EMITS (firstname VARCHAR(20), lastname VARCHAR(20)) AS
import re
import xml.etree.cElementTree as etree
from urllib2 import urlopen
def run(ctx):
    content = etree.parse(urlopen(ctx.url))
    for user in content.findall('user/[@active="1"]'):
        ctx.emit(user.find('first_name').text, user.find('last_name').text)
/
SELECT process_users ('http://www.my_valid_webserver/goalies.xml')
FROM DUAL;
Adapter Script Callback Function
For virtual schemas an adapter script must define the function adapter_call(request_json)
. The parameter is a string in JSON containing the Virtual Schema API request. The return value must also be a string in JSON containing the response. For the Virtual Schema API Documentation, see Information for Developers. The callback function will be executed only on a single node.
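The skeleton of such a function can be sketched as follows. The request type and the response shape here are simplified placeholders, not the full Virtual Schema API (see Information for Developers for that); only the JSON-string-in, JSON-string-out contract reflects the mechanism described.

```python
import json

# Sketch of an adapter_call() skeleton: the request arrives as a JSON
# string and the response is returned as a JSON string. The handled
# request type and response fields are simplified placeholders, not
# the full Virtual Schema API.
def adapter_call(request_json):
    request = json.loads(request_json)
    if request['type'] == 'getCapabilities':
        response = {'type': 'getCapabilities', 'capabilities': []}
    else:
        raise ValueError('unsupported request type: %s' % request['type'])
    return json.dumps(response)

reply = adapter_call('{"type": "getCapabilities"}')
```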