Data Migration

To load data, you have the following options: 

  1. Importing from Oracle to Exasol
  2. Loading via Flat Files

Importing from Oracle to Exasol

There are two different drivers (JDBC and OCI) for loading data from Oracle directly into Exasol. The OCI driver (ORA connection) is recommended because it provides better loading performance. However, there are cases where falling back to the JDBC driver is necessary, for example when loading CLOB or INTERVAL columns from Oracle to Exasol.

A CONNECTION in Exasol covers the functionality of a DB-LINK in Oracle and more.
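As a sketch, connections such as the `oracle_oci` and `oracle_jdbc` connections used in the examples below could be defined as follows (host, port, service name, and credentials are placeholders and must be replaced with your own values):

```sql
-- OCI-based connection (requires the Oracle instant client on the Exasol nodes)
CREATE CONNECTION oracle_oci
    TO '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=oracle-host)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=orclpdb)))'
    USER 'c##db_mig' IDENTIFIED BY 'secret';

-- JDBC-based connection (requires the Oracle JDBC driver to be installed in Exasol)
CREATE CONNECTION oracle_jdbc
    TO 'jdbc:oracle:thin:@//oracle-host:1521/orclpdb'
    USER 'c##db_mig' IDENTIFIED BY 'secret';
```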

import into "C##DB_MIG"."DIM_PRODUCT"
from ora at oracle_oci 
statement 'select * from "C##DB_MIG"."DIM_PRODUCT"';

import into "C##DB_MIG"."DIM_PRODUCT"
from jdbc at oracle_jdbc 
statement 'select * from "C##DB_MIG".DIM_PRODUCT';

The import statements shown above load the data using a single statement each. This is fine for smaller tables (for example, fewer than 10 million rows). For larger tables, it is recommended to use IMPORT with multiple statements, which are executed in parallel and speed up the import accordingly.

Keep in mind that the idea is to divide the data into similarly sized subsets, so that the import for each statement is balanced for optimal performance.

Additionally, the Oracle optimizer hint for parallel execution (/*+parallel*/) can be used to speed up the SELECT statements on the Oracle side even further, but keep an eye on the system load on Oracle.

There are different ways to define similarly sized subsets. Filter conditions on columns that divide the data similarly into the number of statements used are a good option, even more so if there are maintained indexes and histograms on the filter column in Oracle.

import into "C##MART".sales 
from ora at oracle_oci 
statement 'select /*+parallel*/ * from "C##MART".SALES where sales_quarter = 1'
statement 'select /*+parallel*/ * from "C##MART".SALES where sales_quarter = 2'
statement 'select /*+parallel*/ * from "C##MART".SALES where sales_quarter = 3'
statement 'select /*+parallel*/ * from "C##MART".SALES where sales_quarter = 4';

If the table in Oracle is partitioned, another option is to access individual partitions directly, making use of partition pruning in Oracle. This is the default approach in the Oracle migration scripts if partitions are available.

import into "C##MART".sales 
from ora at oracle_oci
statement 'select /*+parallel*/ * from "C##MART".SALES partition("SYS_P124122")'
statement 'select /*+parallel*/ * from "C##MART".SALES partition("SYS_P124137") 
union all  select /*+parallel*/ * from "C##MART".SALES partition("SYS_P124175")'
statement 'select /*+parallel*/ * from "C##MART".SALES partition("SYS_P125112")'
statement 'select /*+parallel*/ * from "C##MART".SALES partition("SYS_P125209") 
union all  select /*+parallel*/ * from "C##MART".SALES partition("SYS_P127135") 
union all  select /*+parallel*/ * from "C##MART".SALES partition("SYS_P128187")';

If the table has no partitions or indexes and there is no easy way to define similarly sized subsets, the last option is to use the ROWID of the table to assign each row to a "bucket" that can be used as a filter for each statement. The Oracle migration script uses this approach if no partitions are available.

import into "C##MART".sales 
from ora at oracle_oci 
statement 'select /*+parallel*/ * from "C##MART".SALES where ora_hash(rowid, 3) = 0'
statement 'select /*+parallel*/ * from "C##MART".SALES where ora_hash(rowid, 3) = 1'
statement 'select /*+parallel*/ * from "C##MART".SALES where ora_hash(rowid, 3) = 2'
statement 'select /*+parallel*/ * from "C##MART".SALES where ora_hash(rowid, 3) = 3';

Loading via Flat Files

In some cases, loading data via flat files may be your only viable option. Possible scenarios include:

  • Extracting the source data consumes too many Oracle resources, especially on mission-critical production systems, and you cannot extract data at arbitrary times.
  • Exasol runs in the cloud and there is no direct connection to the on-premises Oracle database. In that case, the data must be staged somewhere accessible to Exasol, for example on Amazon S3.

Even though importing from a flat file is fast, there are ways to speed it up even further. For large data exports, the recommendation is to split the data into multiple files so that the files are imported in parallel.

Compressing each file with ZIP, GZIP, or BZIP2 reduces the file size and can speed up the import process even further. For more details, refer to the IMPORT statement.
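For illustration, a parallel import of several gzip-compressed files staged on S3 might look like the following (the bucket URL and file names are placeholders). Exasol decompresses files based on their extension and processes multiple FILE clauses in parallel:

```sql
IMPORT INTO "C##MART".SALES
FROM CSV AT 'https://my-bucket.s3.amazonaws.com'
FILE 'sales_part1.csv.gz'
FILE 'sales_part2.csv.gz'
FILE 'sales_part3.csv.gz'
FILE 'sales_part4.csv.gz';
```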

In Oracle, data from files can be loaded via external tables, which represent those files as tables in the database. When an external table is queried, the corresponding file is read on the fly in the background. Exasol does not support this exact behavior, but it can be replicated via a view with a subimport. Another option is simply to load the file into the database via an IMPORT statement.
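A view with a subimport might be sketched as follows (the column list, bucket URL, and file name are assumptions and must match the actual file):

```sql
-- The file is read on the fly each time the view is queried,
-- similar to an external table in Oracle
CREATE VIEW "C##MART".SALES_FILE AS
SELECT * FROM (
    IMPORT INTO (sales_id DECIMAL(18), sales_quarter DECIMAL(1), amount DECIMAL(18,2))
    FROM CSV AT 'https://my-bucket.s3.amazonaws.com'
    FILE 'sales.csv'
);
```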

To get an overview of the external tables and directories, run the query in Appendix D: External tables and directories on Oracle.

After loading the data into Exasol, the next step is to check the completeness and correctness of the initial load. The Migration Testing section provides more details on this.
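As a minimal sketch of a completeness check, a subimport can compare row counts between the two systems in a single query (table and connection names as in the examples above):

```sql
-- Compare the local Exasol row count with the count fetched from Oracle
SELECT exa.cnt AS exasol_rows, ora.cnt AS oracle_rows
FROM (SELECT count(*) AS cnt FROM "C##MART".SALES) exa,
     (IMPORT INTO (cnt DECIMAL(18))
      FROM ORA AT oracle_oci
      STATEMENT 'select count(*) from "C##MART".SALES') ora;
```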

Additional Resources

For additional information, see the following sections: