Protegrity

Protegrity allows you use tokens and mask data in your Exasol database. This enhances your data security by substituting the sensitive data with non-sensitive data as token. The document describes how you can connect to Protegrity with Exasol for data tokenization using UDFs.

Environment

The document is created using the following environment as an example:

  • Exasol installation with four nodes
  • Protegrity Enterprise Security Administrator
  • Exasol’s Protegrity EXAoperation Plugin Package
  • Python 3

Prerequisites

You must have the following to connect your Exasol database with Protegrity:

  • A running Exasol database
  • Plugin.Security.Protegrity-<version>.pkg

    Contact Exasol Support for the package.

  • A running instance of Protegrity Enterprise Security Administrator
  • Python 3
  • Exasol XML-RPC package configured (see XML-RPC Commandline)
  • PEP server configuration and server key data that allows you to connect to an ESA server. You can obtain the data from ESA using PepServerSetup_Linux_x64_<version>.sh script provided by Protegrity. Add this data in an archive file (for example, PEP_data.tgz) on the server where you run the XML-RPC commands.

Installation and Configuration

Step 1: Upload the Protegrity Plugin

  1. Log into EXAoperation as an administrator user.
  2. Go to Software > Versions > Software Update File.
  3. Upload the Protegrity package Plugin.Security.Protegrity-<version>.pkg.
  4. Click Submit.

Step 2: Install the Plugin through XML-RPC

  1. Run the following command to create an XML-RPC connection with Exasol cluster.

    from xmlrpclib import Server as xmlrpc
    from ssl import _create_unverified_context as ssl_context
    from pprint import pprint as pp
    from base64 import b64encode
    import getpass
    server = "<ip or hostname of ESA License Server>" 
    user = "<ESA user name>"
    password = getpass.getpass(prompt='Enter Password:')
    server = xmlrpc('https://%s:%s@%s/cluster1/' % (user,password,server) ,context = ssl_context())
  2. Run the following command to check the Protegrity plugin is available on the cluster.

    pp(server.showPluginList())
    #['Security.Protegrity-<version>']
  3. (Optional) you can run the following command to check the available options with the plugin.

    pp(server.showPluginFunctions('Security.Protegrity-<version>'))
        #'INSTALL': 'Install plugin.',
        #'UPLOAD_DATA': 'Upload data directory.',
        #'UNINSTALL': 'Uninstall plugin.',
        #'START': 'Start pepserver.',
        #'STOP': 'Stop pepserver.',
        #'STATUS': 'Show status of plugin (not installed, started, stopped).',
        #'LAST_MODIFIED': 'Show the timestamp when a file last has been modified',
        #'SIZE': 'Show the size of a file',
        #'LIST_FILES': 'Shows the files in a subfolder of /opt/protegrity/'
  4. Run the following command to store the name of the plugin and the list of database nodes in Python variables.

    pname = 'Security.Protegrity-<version>'
    nlist = server.getNodeList()
  5. Run the following command to install the plugin on all the nodes of your Exasol cluster. The following command uses a four-node cluster as an example.

    pp([[node] + server.callPlugin(pname, node, 'INSTALL', '') for node in nlist])
    #[['n0011', 0, ''], ['n0012', 0, ''], ['n0013', 0, ''], ['n0014', 0, '']]
  6. Run the following command to check the status of the PEP server software.

    pp([[node] + server.callPlugin(pname, node, 'STATUS', '') for node in nlist])
    #[['n0011', 0, 'stopped'],
    #['n0012', 0, 'stopped'],
    #['n0013', 0, 'stopped'],
    #['n0014', 0, 'stopped']]

Step 3: Configure the PEP and ESA

After installing the plugin, do the following to configure the PEP server:

  1. Upload the archive file with PEP server keys and configurations (see Prerequisites) to the cluster using the following command.

    pdata = b64encode(open('PEP_data.tgz', 'rb').read())
    pp([[node,] + server.callPlugin(pname, node, 'UPLOAD_DATA', pdata) for node in nlist])
  2. Start the PEP servers.

    pp([[node] + server.callPlugin(pname, node, 'START', '') for node in nlist])
    #[['n0011', 0, 'started'],
    #['n0012', 0, 'started'],
    #['n0013', 0, 'started'],
    #['n0014', 0, 'started']]

Create UDFs

Exasol provides you with loosely-coupled UDFs to work with Protegrity. The UDFs are interpreted by external processes to implement any programming language and can be extended with arbitrary libraries and functionality.

Map UDF Language Name with UDF Language Implementation

The definition of the UDF mentions the programming language PYTHON in its definition. For loosely coupled UDF languages, such as, Python, Java, R, and Protegrity, the database needs to know what language implementation to use.

For Protegrity, the content of the SCRIPT_LANGUAGES variable contains a mapping.

The following command uses udfplugin:///protegrity_710 from the Protegrity UDF Language provided by Exasol.

ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON=builtin_python R=builtin_r JAVA=builtin_java PYTHON3=builtin_python3 PROTEGRITY=udfplugin:///protegrity_710'

The following command uses udfplugin:///protegrity_710 and udfplugin:///protegrity_710_ua from the Protegrity UDF Language provided by Exasol.

ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON=builtin_python R=builtin_r JAVA=builtin_java PYTHON3=builtin_python3 PROTEGRITY_SCOPE_USER_AUTH=udfplugin:///protegrity_710 PROTEGRITY_SESSION_USER_AUTH=udfplugin:///protegrity_710_ua'

For details about the differences between the two commands, see the Map Exasol Database Users with Protegrity Users section in this topic.

Map Exasol Database Users with Protegrity Users

Protegrity UDFs use Exasol database user names for authentication in the ESA server.

Database user names in Exasol aren't case sensitive, and when you want to expose them to an external systems in uppercase, you can do the following:

  • Use uppercase user names in the ESA server.
  • Specify option case-sensitive = no in the pepserver.cfg file.

udfplugin_protegrity_710_ua uses the current user of the SQL session as the user name when interacting with Protegrity.

udfplugin:///protegrity_710_ua uses the scope user of a query. This is the same user as the session user with the notable exception of database views that contain Protegrity UDFs. For these, the owner of the view is used when interacting with Protegrity.

Available UDFs

You can use the following forms to create Protegrity UDFs.

UDF Form Template
No data form (NDF)
CREATE OR REPLACE <PROTEGRITY_LANGUAGE_NAME> SCALAR SCRIPT
x.get_protegrity_version() RETURNS VARCHAR(2000000) AS getversion
[EXTERNAL_IV]
/
Identity form (IF)
CREATE <PROTEGRITY_LANGUAGE_NAME> SCALAR SCRIPT
<SCRIPT_NAME> (<INPUT> VARCHAR(2000000))
RETURNS VARCHAR(2000000) AS identity
[EXTERNAL_IV]
/
Dynamic form (DF)
CREATE <PROTEGRITY_LANGUAGE_NAME> SCALAR SCRIPT
<SCRIPT_NAME> (<PROTECTION_METHOD>VARCHAR(20), <DATA_ELEMENT>VARCHAR(20), <INPUT>VARCHAR(2000000))
RETURNS VARCHAR(2000000) AS dynamic
[EXTERNAL_IV]
/
Fixed method form (FMF)
CREATE <PROTEGRITY_LANGUAGE_NAME> SCALAR SCRIPT
<SCRIPT_NAME> (<DATA_ELEM>VARCHAR(20), <INPUT>VARCHAR(2000000)) 
RETURNS VARCHAR(2000000)AS <PROTECTION_METHOD>
[EXTERNAL_IV]
/
Fixed method and data element form (FMEF)
CREATE <PROTEGRITY_LANGUAGE_NAME> SCALAR SCRIPT
<SCRIPT_NAME> (<INPUT>VARCHAR(2000000))
RETURNS VARCHAR(2000000) AS <PROTECTION_METHOD>:<DATA_ELEM>
[EXTERNAL_IV]
/

The following variables are used in the templates:

  • PROTEGRITY_LANGUAGE_NAME: Any name that syntactically conforms to the rules of SQL identifiers in Exasol and is not reserved. At the time the UDF is created or invoked, this identifier must be mapped to a UDF plugin that implements the Protegrity connector.
  • SCRIPT_NAME: Any name that is allowed as UDF name in Exasol.
  • INPUT: Any name that is allowed as UDF parameter name in Exasol. The name itself has no meaning for the UDF plugin and only serves documentary purposes via the SQL metadata.
  • PROTECTION_METHOD: One of the allowed protection method names.
  • DATA_ELEM: The name of the ESA data element.
  • EXTERNAL_IV: An optional additional initialization, supported by some data elements.

Protection Behaviors

The basic protection methods are protect and unprotect. The methods call Protegrity protection APIs one value at a time and don't perform any transformation. Exasol VARCHAR data, which is always encoded as UTF-8 strings, is sent to the protection APIs in raw form. Other protection methods offer special behaviors which are described in the following table.

Special Behavior Description
rt (“right truncate”) White space at the end of the character data values is removed before calling Protegrity protection APIs.
u16 (“UTF-16”)

Internally, Exasol stores character data using UTF-8. With this option, data is converted to UTF-16 before calling protect. And when calling unprotect, the unprotected data is converted from UTF-16 back to UTF-8 before sending it to Exasol.

This behavior is useful when migrating from legacy systems that internally store character values as UTF-16. Using these methods, the old protected values are still usable in Exasol.

x (“hex”) This behavior returns the protected value as hexadecimal character string. There is no corresponding version for unprotect.
b (“bulk”)

This behavior processes data bulkwise leading to potentially better performance.

For this behavior, there is a hard size limit on the data that can be protected (4096 bytes) enforced by the UDF plugin connector. In normal mode, there also might be size limits, but they are enforced by Protegrity’s protection APIs and are beyond the scope of this document.

Protection Methods

The following table describes all available methods and their behaviors with the form they are used with.

Protection Method Description Special Behavior Supported UDF Form
identity This method is only supported in identity form. This method simply returns its input without calling Protegrity’s protection APIs and is probably useful for debugging. - IF, DF
dynamic

This method must be chosen for the dynamic UDF form. It allows to dynamically (even on a per value basis) specify the protection method and the data element. All methods that can be used in DF form are allowed.

The behavior depends on the method argument when calling the UDF.

- DF
protect The standard protect method. When using this method, values are unprotected using Protegrity’s APIs one value at a time. - DF, FMEF
unprotect The standard protect method. When using this method, values are unprotected using Protegrity’s APIs one value at a time. - DF, FMEF
protectu16 Similar to protect with addition of u16 behavior. u16 DF, FMEF
protectu16rt Similar to protect with addition of u16 and rt behavior. u16, rt DF, FMEF
protectx Similar to protect with addition of x behavior. x DF, FMEF
protectxu16 Similar to protect with addition of u16 and x behavior. x, u16 DF, FMEF
protectxu16rt Similar to protect with addition of u16, rt, and x behavior. x, u16, rt DF, FMEF
unprotectu16 Similar to unprotect with addition of u16 behavior. u16 DF, FMEF
unprotectu16rt Similar to unprotect with addition of u16 and rt behavior. u16, rt DF, FMEF
bprotect Similar to protect with addition of b behavior. b FMEF
bprotectu16 Similar to protect with addition of u16 and b behavior. b, u16 FMEF
bprotectu16rt Similar to protect with addition of u16, rt, and b behavior. b, u16, rt FMEF
bunprotect Similar to unprotect with addition of b behavior. b FMEF
bunprotectu16 Similar to unprotect with addition of u16 and b behavior. b, u16 FMEF
bunprotectu16rt Similar to unprotect with addition of u16, rt, and b behavior. b, u16, rt FMEF
protectd This method is the fixed method form version of protect. - FMF
protectdu16 This method is the fixed method form version of protectu16. u16 FMF
protectdu16rt This method is the fixed method form version of protectu16rt. u16, rt FMF
protectxd This method is the fixed method form version of protectx. - FMF
protectxdu16 This method is the fixed method form version of protectxu16. u16 FMF
protectxdu16rt This method is the fixed method form version of protectxu16rt. u16, rt FMF
getversion This method creates a function that returns the Protegrity API version - NDF
whoami This method creates a function that returns the user name for Protegrity API calls. - NDF

External Initialization Vector (IV)

Some of the data elements in Protegrity support external IV. You can specify these elements in the UDFs in the following ways.

Inline External IVs

To specify an inline external IV data, use the following syntax in the UDF definition:

IV:some secret text here

For example:

--/
CREATE OR REPLACE PROTEGRITY SCALAR SCRIPT
prot_test1(din VARCHAR(2000000))
RETURNS VARCHAR(2000000) AS
protect:test1
IV:this-is-a-secret-iv-as-inline-string
/

External IV Hidden in Exasol Connection Object

To specify a connection as external IV data, use the following syntax in the UDF definition:

IV_CONNECTION:iv_connection1

Exasol provides connection objects (see CREATE CONNECTION) for different use cases. You can use the connection objects to manage external IVs. For example,

CREATE CONNECTION
iv_connection1 TO 'protegrity_iv'
USER 'extiv-1-secret';

--/
CREATE OR REPLACE PROTEGRITY SCALAR SCRIPT
prot_test1(din VARCHAR(2000000))
RETURNS VARCHAR(2000000) AS
protect:test1
IV_CONNECTION:iv_connection1
/

In the above example, the address component TO must contain the string protegrity_iv. The USER component is used as external IV data.

Example UDF with Results

--/
CREATE OR REPLACE PROTEGRITY SCALAR SCRIPT
prot_test1(din VARCHAR(2000000))
RETURNS VARCHAR(2000000) AS
protect:test1
/

select prot_test1('Hello World'); -- produces 'lm8Q3 3RLQi'

--/
CREATE OR REPLACE PROTEGRITY SCALAR SCRIPT
prot_test1(din VARCHAR(2000000))
RETURNS VARCHAR(2000000) AS
unprotect:test1
/

open schema prot_1;
select unprot_test1('lm8Q3 3RLQi'); -- produces 'Hello World'

Limitations

Exasol and Protegrity integration has the following limitations:

  • Only data elements for tokenization (and not for encryption) are supported.
  • Only one global Protegrity software installation per Exasol cluster.
  • Currently, ESA log message aggregation is not available.
  • Protegrity UDFs only support character datatypes.
  • The maximum token size for bulk operations is 4096.