Protegrity
Protegrity allows you use tokens and mask data in your Exasol database. This enhances your data security by substituting the sensitive data with non-sensitive data as token. The document describes how you can connect to Protegrity with Exasol for data tokenization using UDFs.
Environment
The document is created using the following environment as an example:
- Exasol installation with four nodes
- Protegrity Enterprise Security Administrator
- Exasol’s Protegrity EXAoperation Plugin Package
- Python 3
Prerequisites
You must have the following to connect your Exasol database with Protegrity:
- A running Exasol database
- Plugin.Security.Protegrity-<version>.pkg
Contact Exasol Support for the package.
- A running instance of Protegrity Enterprise Security Administrator
- Python 3
- Exasol XML-RPC package configured (see XML-RPC Commandline)
- PEP server configuration and server key data that allows you to connect to an ESA server. You can obtain the data from ESA using PepServerSetup_Linux_x64_<version>.sh script provided by Protegrity. Add this data in an archive file (for example, PEP_data.tgz) on the server where you run the XML-RPC commands.
Installation and Configuration
Step 1: Upload the Protegrity Plugin
- Log into EXAoperation as an administrator user.
- Go to Software > Versions > Software Update File.
- Upload the Protegrity package Plugin.Security.Protegrity-<version>.pkg.
- Click Submit.
Step 2: Install the Plugin through XML-RPC
-
Run the following command to create an XML-RPC connection with Exasol cluster.
from xmlrpclib import Server as xmlrpc
from ssl import _create_unverified_context as ssl_context
from pprint import pprint as pp
from base64 import b64encode
import getpass
server = "<ip or hostname of ESA License Server>"
user = "<ESA user name>"
password = getpass.getpass(prompt='Enter Password:')
server = xmlrpc('https://%s:%s@%s/cluster1/' % (user,password,server) ,context = ssl_context()) -
Run the following command to check the Protegrity plugin is available on the cluster.
-
(Optional) you can run the following command to check the available options with the plugin.
pp(server.showPluginFunctions('Security.Protegrity-<version>'))
#'INSTALL': 'Install plugin.',
#'UPLOAD_DATA': 'Upload data directory.',
#'UNINSTALL': 'Uninstall plugin.',
#'START': 'Start pepserver.',
#'STOP': 'Stop pepserver.',
#'STATUS': 'Show status of plugin (not installed, started, stopped).',
#'LAST_MODIFIED': 'Show the timestamp when a file last has been modified',
#'SIZE': 'Show the size of a file',
#'LIST_FILES': 'Shows the files in a subfolder of /opt/protegrity/' -
Run the following command to store the name of the plugin and the list of database nodes in Python variables.
-
Run the following command to install the plugin on all the nodes of your Exasol cluster. The following command uses a four-node cluster as an example.
-
Run the following command to check the status of the PEP server software.
Step 3: Configure the PEP and ESA
After installing the plugin, do the following to configure the PEP server:
-
Upload the archive file with PEP server keys and configurations (see Prerequisites) to the cluster using the following command.
-
Start the PEP servers.
Create UDFs
Exasol provides you with loosely-coupled UDFs to work with Protegrity. The UDFs are interpreted by external processes to implement any programming language and can be extended with arbitrary libraries and functionality.
Map UDF Language Name with UDF Language Implementation
The definition of the UDF mentions the programming language PYTHON in its definition. For loosely coupled UDF languages, such as, Python, Java, R, and Protegrity, the database needs to know what language implementation to use.
For Protegrity, the content of the SCRIPT_LANGUAGES variable contains a mapping.
The following command uses udfplugin:///protegrity_710 from the Protegrity UDF Language provided by Exasol.
ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON=builtin_python R=builtin_r JAVA=builtin_java PYTHON3=builtin_python3 PROTEGRITY=udfplugin:///protegrity_710'
The following command uses udfplugin:///protegrity_710 and udfplugin:///protegrity_710_ua from the Protegrity UDF Language provided by Exasol.
ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON=builtin_python R=builtin_r JAVA=builtin_java PYTHON3=builtin_python3 PROTEGRITY_SCOPE_USER_AUTH=udfplugin:///protegrity_710 PROTEGRITY_SESSION_USER_AUTH=udfplugin:///protegrity_710_ua'
For details about the differences between the two commands, see the Map Exasol Database Users with Protegrity Users section in this topic.
Map Exasol Database Users with Protegrity Users
Protegrity UDFs use Exasol database user names for authentication in the ESA server.
Database user names in Exasol aren't case sensitive, and when you want to expose them to an external systems in uppercase, you can do the following:
- Use uppercase user names in the ESA server.
- Specify option case-sensitive = no in the pepserver.cfg file.
udfplugin_protegrity_710_ua uses the current user of the SQL session as the user name when interacting with Protegrity.
udfplugin:///protegrity_710_ua uses the scope user of a query. This is the same user as the session user with the notable exception of database views that contain Protegrity UDFs. For these, the owner of the view is used when interacting with Protegrity.
Available UDFs
You can use the following forms to create Protegrity UDFs.
UDF Form | Template |
---|---|
No data form (NDF) | |
Identity form (IF) | |
Dynamic form (DF) | |
Fixed method form (FMF) | |
Fixed method and data element form (FMEF) |
The following variables are used in the templates:
- PROTEGRITY_LANGUAGE_NAME: Any name that syntactically conforms to the rules of SQL identifiers in Exasol and is not reserved. At the time the UDF is created or invoked, this identifier must be mapped to a UDF plugin that implements the Protegrity connector.
- SCRIPT_NAME: Any name that is allowed as UDF name in Exasol.
- INPUT: Any name that is allowed as UDF parameter name in Exasol. The name itself has no meaning for the UDF plugin and only serves documentary purposes via the SQL metadata.
- PROTECTION_METHOD: One of the allowed protection method names.
- DATA_ELEM: The name of the ESA data element.
- EXTERNAL_IV: An optional additional initialization, supported by some data elements.
Protection Behaviors
The basic protection methods are protect and unprotect. The methods call Protegrity protection APIs one value at a time and don't perform any transformation. Exasol VARCHAR data, which is always encoded as UTF-8 strings, is sent to the protection APIs in raw form. Other protection methods offer special behaviors which are described in the following table.
Special Behavior | Description |
---|---|
rt (“right truncate”) | White space at the end of the character data values is removed before calling Protegrity protection APIs. |
u16 (“UTF-16”) |
Internally, Exasol stores character data using UTF-8. With this option, data is converted to UTF-16 before calling protect. And when calling unprotect, the unprotected data is converted from UTF-16 back to UTF-8 before sending it to Exasol. This behavior is useful when migrating from legacy systems that internally store character values as UTF-16. Using these methods, the old protected values are still usable in Exasol. |
x (“hex”) | This behavior returns the protected value as hexadecimal character string. There is no corresponding version for unprotect. |
b (“bulk”) |
This behavior processes data bulkwise leading to potentially better performance. For this behavior, there is a hard size limit on the data that can be protected (4096 bytes) enforced by the UDF plugin connector. In normal mode, there also might be size limits, but they are enforced by Protegrity’s protection APIs and are beyond the scope of this document. |
Protection Methods
The following table describes all available methods and their behaviors with the form they are used with.
Protection Method | Description | Special Behavior | Supported UDF Form |
---|---|---|---|
identity | This method is only supported in identity form. This method simply returns its input without calling Protegrity’s protection APIs and is probably useful for debugging. | - | IF, DF |
dynamic |
This method must be chosen for the dynamic UDF form. It allows to dynamically (even on a per value basis) specify the protection method and the data element. All methods that can be used in DF form are allowed. The behavior depends on the method argument when calling the UDF. |
- | DF |
protect | The standard protect method. When using this method, values are unprotected using Protegrity’s APIs one value at a time. | - | DF, FMEF |
unprotect | The standard protect method. When using this method, values are unprotected using Protegrity’s APIs one value at a time. | - | DF, FMEF |
protectu16 | Similar to protect with addition of u16 behavior. | u16 | DF, FMEF |
protectu16rt | Similar to protect with addition of u16 and rt behavior. | u16, rt | DF, FMEF |
protectx | Similar to protect with addition of x behavior. | x | DF, FMEF |
protectxu16 | Similar to protect with addition of u16 and x behavior. | x, u16 | DF, FMEF |
protectxu16rt | Similar to protect with addition of u16, rt, and x behavior. | x, u16, rt | DF, FMEF |
unprotectu16 | Similar to unprotect with addition of u16 behavior. | u16 | DF, FMEF |
unprotectu16rt | Similar to unprotect with addition of u16 and rt behavior. | u16, rt | DF, FMEF |
bprotect | Similar to protect with addition of b behavior. | b | FMEF |
bprotectu16 | Similar to protect with addition of u16 and b behavior. | b, u16 | FMEF |
bprotectu16rt | Similar to protect with addition of u16, rt, and b behavior. | b, u16, rt | FMEF |
bunprotect | Similar to unprotect with addition of b behavior. | b | FMEF |
bunprotectu16 | Similar to unprotect with addition of u16 and b behavior. | b, u16 | FMEF |
bunprotectu16rt | Similar to unprotect with addition of u16, rt, and b behavior. | b, u16, rt | FMEF |
protectd | This method is the fixed method form version of protect. | - | FMF |
protectdu16 | This method is the fixed method form version of protectu16. | u16 | FMF |
protectdu16rt | This method is the fixed method form version of protectu16rt. | u16, rt | FMF |
protectxd | This method is the fixed method form version of protectx. | - | FMF |
protectxdu16 | This method is the fixed method form version of protectxu16. | u16 | FMF |
protectxdu16rt | This method is the fixed method form version of protectxu16rt. | u16, rt | FMF |
getversion | This method creates a function that returns the Protegrity API version | - | NDF |
whoami | This method creates a function that returns the user name for Protegrity API calls. | - | NDF |
External Initialization Vector (IV)
Some of the data elements in Protegrity support external IV. You can specify these elements in the UDFs in the following ways.
Inline External IVs
To specify an inline external IV data, use the following syntax in the UDF definition:
IV:some secret text here
For example:
--/
CREATE OR REPLACE PROTEGRITY SCALAR SCRIPT
prot_test1(din VARCHAR(2000000))
RETURNS VARCHAR(2000000) AS
protect:test1
IV:this-is-a-secret-iv-as-inline-string
/
External IV Hidden in Exasol Connection Object
To specify a connection as external IV data, use the following syntax in the UDF definition:
IV_CONNECTION:iv_connection1
Exasol provides connection objects (see CREATE CONNECTION) for different use cases. You can use the connection objects to manage external IVs. For example,
CREATE CONNECTION
iv_connection1 TO 'protegrity_iv'
USER 'extiv-1-secret';
--/
CREATE OR REPLACE PROTEGRITY SCALAR SCRIPT
prot_test1(din VARCHAR(2000000))
RETURNS VARCHAR(2000000) AS
protect:test1
IV_CONNECTION:iv_connection1
/
In the above example, the address component TO must contain the string protegrity_iv. The USER component is used as external IV data.
Example UDF with Results
--/
CREATE OR REPLACE PROTEGRITY SCALAR SCRIPT
prot_test1(din VARCHAR(2000000))
RETURNS VARCHAR(2000000) AS
protect:test1
/
select prot_test1('Hello World'); -- produces 'lm8Q3 3RLQi'
--/
CREATE OR REPLACE PROTEGRITY SCALAR SCRIPT
prot_test1(din VARCHAR(2000000))
RETURNS VARCHAR(2000000) AS
unprotect:test1
/
open schema prot_1;
select unprot_test1('lm8Q3 3RLQi'); -- produces 'Hello World'
Limitations
Exasol and Protegrity integration has the following limitations:
- Only data elements for tokenization (and not for encryption) are supported.
- Only one global Protegrity software installation per Exasol cluster.
- Currently, ESA log message aggregation is not available.
- Protegrity UDFs only support character datatypes.
- The maximum token size for bulk operations is 4096.