Virtual Schema User Guide

This article describes how to create and use virtual schemas.

When you create a virtual schema you must specify a corresponding adapter. The adapter is a script that implements the logic for accessing data from the external system. Exasol provides adapters for several supported data sources, referred to as dialects. You can also use the generic JDBC dialect as a fallback to work with any JDBC driver that is not listed among the supported data sources. User guides for all dialects are provided in the respective repositories on GitHub.

For more details and links, see Supported Data Sources in this section and the Supported Dialects page on GitHub.

Create a virtual schema

The following steps are required to create a virtual schema:

Step 1: Deploy JDBC driver files

The JDBC driver is used by the native bulk loader in Exasol, EXALoader, to run the IMPORT statements generated by the virtual schema adapter and to read metadata from the data source.

To learn how to upload the JDBC driver, see Add JDBC Driver.

Some JDBC drivers may contain multiple files and you need to upload all of them. For more information, see Supported Data Sources.

Step 2: Install the adapter script

Create a schema to hold the adapter script using CREATE SCHEMA.

CREATE SCHEMA SCHEMA_FOR_VS_SCRIPT;

Create an adapter script using CREATE SCRIPT. Replace all the placeholders in the following example with the paths to your virtual schema libraries (JAR files) and JDBC driver. The exact syntax for your dialect can be found in the respective dialect user guides in our GitHub repository.

--/
CREATE JAVA ADAPTER SCRIPT SCHEMA_FOR_VS_SCRIPT.JDBC_ADAPTER_SCRIPT AS
  %scriptclass com.exasol.adapter.RequestDispatcher;
  %jar /buckets/your-bucket-fs/your-bucket/virtual-schema-dist-<version>-<dialect>-<version>.jar;
  %jar /buckets/your-bucket-fs/your-bucket/<JDBC driver>.jar;
/

The default path to the JDBC drivers is /buckets/bfsdefault/default/drivers/jdbc/<Database_name>/<JDBC driver>.jar

Some SQL clients may have problems determinining where a script ends or with truncating scripts. For suggested solutions to these issues, see Troubleshooting SQL clients.

Step 3: Define a JDBC connection

To create a JDBC connection, use CREATE CONNECTION.

CREATE CONNECTION JDBC_CONNECTION TO '<jdbc connection string>' USER '<username>' IDENTIFIED BY '<password>';

Step 4: Create a virtual schema

To create the virtual schema, use CREATE VIRTUAL SCHEMA and specify the adapter script and the connection that you created. The adapter retrieves the metadata through JDBC and maps them to virtual tables. The metadata (virtual tables, columns, and data types) are then cached in Exasol.

CREATE VIRTUAL SCHEMA VS_TEST
USING SCHEMA_FOR_VS_SCRIPT.JDBC_ADAPTER_SCRIPT
WITH
  CONNECTION_NAME = 'JDBC_CONNECTION'
  SCHEMA_NAME     = '<SCHEMA_NAME>';

For a list of all properties supported by the JDBC adapter, see Adapters and Properties.

Using the adapter

The following SQL commands can be used to manage the adapter: 

CREATE SCHEMA

Creates a virtual schema.

Example:

CREATE VIRTUAL SCHEMA VS_TEST USING SCHEMA_FOR_VS_SCRIPT.JDBC_ADAPTER_SCRIPT WITH
  CONNECTION_NAME = 'JDBC_CONNECTION'
  SCHEMA_NAME     = '<SCHEMA_NAME>';

DROP SCHEMA

Deletes a virtual schema and all of its virtual tables.

Example :

DROP VIRTUAL SCHEMA VS_TEST;

ALTER SCHEMA

Modifies the properties of a virtual schema or refreshes the metadata using REFRESH option.

Example:

ALTER VIRTUAL SCHEMA VS_TEST REFRESH;
ALTER VIRTUAL SCHEMA VS_TEST REFRESH TABLES T1 T2; 
-- refresh only these tables
ALTER VIRTUAL SCHEMA VS_TEST SET TABLE_FILTER='MY_TABLE, CLICKS';
ALTER VIRTUAL SCHEMA VS_TEST SET TABLE_FILTER=null;

EXPLAIN VIRTUAL

Analyzes the resulting queries for the external system that are created by the Exasol compiler.

Example:

EXPLAIN VIRTUAL SELECT * FROM MY_TABLE;

OPEN SCHEMA + SELECT

Explores the tables in the virtual schema like in a regular schema.

Example:

OPEN SCHEMA VS_TEST;
SELECT * FROM MY_TABLE;
SELECT count(*) FROM MY_TABLE;
SELECT DISTINCT USER_ID FROM MY_TABLE;

Virtual schema limitations

When developing custom virtual schema dialects, keep the following limits in mind:

Request limitations

The limitations of a virtual schema request are as follows:

  • SQL statements are limited to 64 MiB. This affects the size of CREATE VIRTUAL SCHEMA, including all its properties.

  • A single property is limited to two million characters, which is the VARCHAR limit in Exasol.

Response limitations

The limitations of a virtual schema response are as follows:

  • Virtual schema responses are limited to 2 GiB by protocol buffer.

  • Adapter notes are limited to two million characters.

  • Value lists are limited by the SQL compiler run time. In general, value lists should not exceed 100.000 entries.

Exasol recommends using the IMPORT statement for virtual schema results instead of using value lists, as this allows parallel import.

ORDER BY clause limitations

The limitations for use of the ORDER BY clause in virtual schemas are as follows:

  • ORDER BY can be used without limitations in virtual schemas that do not use IMPORT:
    • Exasol virtual schemas using a local connection. For more information, see VSEXA dialect guide.
    • Virtual schemas using row level security (RLS). RLS implementations in Java and Lua are currently available.
  • For virtual schemas using IMPORT, only unordered transfer is supported. The outermost order of the imported result rows is therefore not guaranteed.

  • Document-based virtual schemas do not support ORDER BY COLUMN at all.

If you need ordering, wrap your query inside an extra SELECT * FROM (<virtual-schema-query> ORDER BY FALSE) ORDER BY <criteria> [, ...] statement. The query will then be executed on the target Exasol database, not on the source of the virtual schema.

Troubleshooting SQL clients

DBeaver

Problem: If a script contains multiple semicolons or newlines, DBeaver has problems identifying where a script ends.

Solution: The simplest workaround is to highlight the whole script and execute it as a single step.

DbVisualizer

Problem: DbVisualizer has problem with delimiting scripts.

Solution: To tell DbVisualizer that a part of a script should be handled as a single statement, you can insert a SQL block begin identifier just before the block and a block end identifier after the block. The default begin identifier consists of two dashes followed by a forward slash (--/), while the block end identifier consists of a single slash (/). The delimiter must be the only text on the line.

For more information about using SQL block begin/end identifiers, see the DbVisualizer User Guide.