How Virtual Schemas Work

Every time you access the data of a virtual schema, a container of the corresponding language is started (for example, a JVM or a Python container) on one node. In this container, the code of the adapter script is loaded. Exasol interacts with the adapter script using a request-response protocol encoded in JSON. The database takes the active part sending the request by invoking a callback method.

Here is an example of accessing a virtual table.

SELECT name FROM VIRTUAL_SCHEMA_TEST.users WHERE name like 'A%';

In the above example, following activities go on in the back-end.

  1. Exasol determines that a virtual table is involved, looks up the corresponding adapter script and starts the language container on a single node in the Exasol cluster.
  2. Exasol sends a request to the adapter, asking for the capabilities of the adapter.
  3. The adapter returns a response including the supported capabilities, for example, whether it supports specific WHERE clause filters or specific scalar functions.
  4. Exasol sends an appropriate pushdown request by considering the specific adapter capabilities. For example, the information for column projections (in the example above, only the single column name is necessary) or filter conditions is included.
  5. The adapter processes this request and sends back a certain SQL query in Exasol syntax which will be executed afterward. This query is an IMPORT statement or a SELECT statement including a row-emitting UDF script which handles the data processing.
  6. The example above could be transformed into these two alternatives (IMPORT and SELECT):

    • SELECT name FROM (
          IMPORT FROM JDBC AT ... 
          STATEMENT 'SELECT name from remoteschema.users WHERE name LIKE ''A%''' );
    • SELECT name FROM
          SELECT GET_FROM_MY_DATA_SOURCE('data-source-address', 'required_column=name') ) 
          WHERE name LIKE 'A%';

    In the first alternative, the adapter can handle filter conditions and creates an IMPORT command including a statement which is sent to the external system.

    In the second alternative, a UDF script is used with two parameters handing over the address of the data source and the column projection, but not using any logic for the filter condition. This would then be processed by Exasol rather than the data source.

    The examples show the fully transformed query while only the inner statements are created by the adapter.

  7. The received data is directly integrated into the overall query execution of Exasol.

To learn more about the virtual schema, see the Exasol Virtual Schema repository on GitHub.

If you want to enhance existing adapter or create a new adapter, use the EXPLAIN VIRTUAL statement to analyze what the adapter is pushing down to the underlying system.