Data Migration

To load data, you have the following options: 

  1. Generate IMPORT Statements with Exasol Migration Script
  2. Loading via Flat Files

Generate IMPORT Statements with Exasol Migration Script

The Teradata to Exasol migration script also generates the IMPORT statements for each table. As an example, let's take a closer look at the generated import for DIMACCOUNT:

IMPORT INTO "ADVENTUREWORKSDW"."DIMACCOUNT" 
FROM 
JDBC AT TERADATA_DB 
STATEMENT 
'select "AccountKey","ParentAccountKey","AccountCodeAlternateKey","ParentAccountCodeAlternateKey","AccountDescription","AccountType","Operator","CustomMembers","ValueType","CustomMemberOptions" from AdventureWorksDW.DimAccount';

The generated imports use JDBC, and only one statement per table is generated. This is fine for smaller tables (for example, fewer than 10 million rows). For larger tables, it is highly recommended to use IMPORT with multiple statements (for example, 10 statements), which are executed in parallel and speed up the import process accordingly.

The following example loads a fact table in parallel by filtering on the date column: 

IMPORT INTO "ADVENTUREWORKSDW"."FACTINTERNETSALES" 
FROM 
JDBC AT TERADATA_DB 
STATEMENT 
'select * from AdventureWorksDW.FactInternetSales where trunc(OrderDate, ''MM'')  = ''2013-05-01'''
STATEMENT 
'select * from AdventureWorksDW.FactInternetSales where trunc(OrderDate, ''MM'')  = ''2013-06-01'''
STATEMENT 
'select * from AdventureWorksDW.FactInternetSales where trunc(OrderDate, ''MM'')  = ''2013-07-01'''
STATEMENT 
'select * from AdventureWorksDW.FactInternetSales where trunc(OrderDate, ''MM'')  = ''2013-08-01'''
STATEMENT 
'select * from AdventureWorksDW.FactInternetSales where trunc(OrderDate, ''MM'')  = ''2013-09-01'''
STATEMENT 
'select * from AdventureWorksDW.FactInternetSales where trunc(OrderDate, ''MM'')  = ''2013-10-01'''
;

The above example loads six months of the table FactInternetSales in parallel. You can also run a higher number of statements in parallel, depending on the size of your Exasol cluster and the available Teradata resources.

For each statement, a separate connection to Teradata is opened. Consequently, you could hit the connection and resource limits of the technical user performing the load. Keep an eye on Teradata resource monitoring during your initial load.
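As a rough sketch of such a check, the following query on Teradata's DBC.SessionInfoV system view counts the sessions currently held by the loading user. The user name MIG_USER is a placeholder; substitute your own technical user:

```sql
-- Count active Teradata sessions for the technical loading user.
-- Run this on Teradata; MIG_USER is a hypothetical user name.
SELECT UserName,
       COUNT(*) AS session_count
FROM DBC.SessionInfoV
WHERE UserName = 'MIG_USER'
GROUP BY UserName;
```

If the count approaches the user's configured session limit, reduce the number of parallel STATEMENT clauses in your IMPORT.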

Additional Resources

For additional information, see the following sections:

Loading via Flat Files

In some cases, loading data via flat files may be your only viable option. Possible scenarios include:

  • Extracting the source data consumes too many Teradata resources, especially on mission-critical production systems, so you cannot extract data at arbitrary times.
  • Exasol runs in the cloud and there is no direct connection to the on-premises Teradata system. In that case, the data must be staged somewhere accessible to Exasol, for example, on Amazon S3.

Even though importing from flat files is fast, there are ways to speed it up even further. For large data exports, the recommendation is to split the data into multiple files so that the files can be imported in parallel.
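As a sketch, a split export staged on a remote source such as Amazon S3 can be loaded with a single IMPORT that lists multiple FILE clauses, which Exasol reads in parallel. The bucket name, file names, and credential placeholders below are illustrative:

```sql
-- Load several CSV parts of one table in a single IMPORT.
-- Exasol fetches the listed files in parallel across the cluster.
-- Bucket, file names, and credentials are placeholders.
IMPORT INTO "ADVENTUREWORKSDW"."FACTINTERNETSALES"
FROM CSV AT 'https://my-staging-bucket.s3.amazonaws.com'
USER '<access_key>' IDENTIFIED BY '<secret_key>'
FILE 'factinternetsales_part1.csv'
FILE 'factinternetsales_part2.csv'
FILE 'factinternetsales_part3.csv'
FILE 'factinternetsales_part4.csv'
COLUMN SEPARATOR = ','
SKIP = 1;
```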

Compressing each file with ZIP, GZIP, or BZIP2 can reduce the file size and speed up the import process even further. For more details, refer to the IMPORT statement.
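Exasol recognizes the compression format from the file extension, so a compressed export can be loaded directly without any extra options. A minimal sketch, with an illustrative file name:

```sql
-- Import a gzip-compressed CSV file; Exasol infers the
-- compression from the .gz extension automatically.
IMPORT INTO "ADVENTUREWORKSDW"."DIMACCOUNT"
FROM LOCAL CSV FILE 'dimaccount.csv.gz'
COLUMN SEPARATOR = ',';
```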

After loading the data into Exasol, the next step is to check the completeness and correctness of the initial load. The Migration Testing section provides more details on this.

Additional Resources

For additional information, see the following sections: