Migration Testing

Testing is an integral part of the data warehouse migration process, and a well-defined test plan is a key contributor to a successful migration. You can perform data warehouse migration testing in multiple phases throughout the migration process. This helps you identify gaps and incorporate the necessary modifications as early as possible.

Automate testing as much as possible, making each test repeatable and enabling a consistent approach to evaluating results. However, ensure that you do not skip any testing step in the process.

For this section, we assume that you have already run the relevant queries / jobs in Teradata and can use their result sets as a reference.

When testing the data warehouse migration, it is important to ensure that the end results in Teradata and Exasol are identical. In addition, you have to consider testing SLAs; include SLA tests in your test procedures whenever possible. For example, consider testing SLAs that define the following (a sketch of one such check follows the list):

  • Extraction time frames for source systems
  • Availability
  • Backup / recovery
  • Loading frequency
  • Time when the data must be ready for reports
  • How long report queries can take
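As an illustration, a runtime SLA such as "report queries must finish within 60 seconds" can be checked against Exasol's statistical system tables after a test run. This is a minimal sketch; the 60-second threshold is an assumption for the example, and the available columns are described in the Exasol documentation of EXA_SQL_LAST_DAY.

SELECT SESSION_ID, 
       COMMAND_NAME, 
       DURATION,      -- runtime in seconds
       START_TIME 
FROM   EXA_STATISTICS.EXA_SQL_LAST_DAY 
WHERE  DURATION > 60  -- assumed SLA threshold
ORDER BY DURATION DESC;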

Sample Test Documentation

When conducting the tests, it is highly advisable to define a test documentation template to enforce a complete and standardized testing procedure. This documentation must cover:

  • Job ID, Job Name, Job Type
  • Test Overview – describing what and how it will be tested
  • Sources and Targets of the Job
    • Structure – Teradata and translated Exasol DDLs
    • Data – Making sure the data in the source and target tables of the query are the same in Teradata and Exasol
      • Queries that loaded the initial state
  • Queries to test – Original Teradata and translated to Exasol
  • Check queries to compare the result sets of the Teradata SQL and translated Exasol SQL
  • Analysis – What is causing the difference?
  • Adjustments / Solution

Generic Test Approach

The Database Migration script in our GitHub repository automatically generates DDLs for tables and constraints in Exasol. This script also creates import statements to load data into Exasol and includes generic test statements.

A generic Teradata to Exasol migration test procedure is structured as follows:

  1. To test a load, a test table is created for each table of the migration. The test table's name is the original table name with the suffix "_MIG_CHK" appended.

  2. The test table has two rows that contain the results of the test queries for Teradata and Exasol. The first column indicates the database, and the second column indicates the overall row count of the table. Depending on the data type, the remaining columns contain metrics such as:
    1. Maximum / minimum values / lengths
    2. Count of distinct values
    3. Count of null values
    4. Distinct values
    5. Average values
  3. Example: 

CREATE OR REPLACE TABLE "ADVENTUREWORKSDW"."ADVENTUREWORKSDWBUILDVERSION_MIG_CHK" AS 
SELECT 
    CAST('Exasol' AS                     VARCHAR(20))   AS "DB_SYSTEM", 
    CAST(COUNT(*) AS                     DECIMAL(36,0)) AS "CNT", 
    CAST(MIN(LENGTH("DBVERSION")) AS     DECIMAL(36,0)) AS "DBVERSION_MIN_LEN", 
    CAST(MAX(LENGTH("DBVERSION")) AS     DECIMAL(36,0)) AS "DBVERSION_MAX_LEN", 
    CAST(COUNT(DISTINCT("DBVERSION")) AS DECIMAL(36,0)) AS "DBVERSION_CNT_DST", 
    ...
FROM 
    "ADVENTUREWORKSDW"."ADVENTUREWORKSDWBUILDVERSION" 
 
UNION ALL 
 
SELECT * 
FROM 
    (IMPORT FROM jdbc AT TERADATA_DB STATEMENT '
    SELECT 
        CAST(''Teradata'' AS VARCHAR(20))   AS "DB_SYSTEM", 
        CAST(COUNT(*) AS DECIMAL(36,0)) AS "CNT", 
        CAST(MIN(LENGTH("DBVersion")) AS DECIMAL(36,0)) AS "DBVERSION_MIN_LEN", 
        CAST(MAX(LENGTH("DBVersion")) AS DECIMAL(36,0)) AS "DBVERSION_MAX_LEN", 
        CAST(COUNT(DISTINCT("DBVersion" (CASESPECIFIC))) AS DECIMAL(36,0)) AS "DBVERSION_CNT_DST", 
        ...
    FROM 
        "AdventureWorksDW"."AdventureWorksDWBuildVersion"' 
);
  4. A summary table called MIGRATION_CHECK is created in the Database Migration schema. This summary table contains columns for the schema name, the table name, the metric name (which is the column name of the specific test table), the Exasol and Teradata results, and a timestamp. This structure allows for a more straightforward analysis of the differences between the Teradata and Exasol results.
    Example: 
CREATE OR REPLACE TABLE database_migration.migration_check ( 
        schema_name VARCHAR(128), 
        table_name VARCHAR(128), 
        column_name VARCHAR(128), 
        exasol_metric VARCHAR(50), 
        teradata_metric VARCHAR(50), 
        check_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP 
    );


INSERT INTO "DATABASE_MIGRATION"."MIGRATION_CHECK" (
            schema_name, table_name, column_name, exasol_metric, teradata_metric
)
WITH Exasol AS (   
        SELECT  'ADVENTUREWORKSDW'  AS schema_name, 'ADVENTUREWORKSDWBUILDVERSION_MIG_CHK' AS table_name, 
                'CNT' AS column_name, TO_CHAR("CNT") AS Exasol_metric 
        FROM    "ADVENTUREWORKSDW"."ADVENTUREWORKSDWBUILDVERSION_MIG_CHK" 
        WHERE    DB_SYSTEM = 'Exasol' 
         
        UNION ALL 
         
        SELECT  'ADVENTUREWORKSDW' AS schema_name, 'ADVENTUREWORKSDWBUILDVERSION_MIG_CHK' AS table_name, 
                'DBVERSION_CNT_DST' AS column_name, TO_CHAR("DBVERSION_CNT_DST") AS Exasol_metric 
        FROM    "ADVENTUREWORKSDW"."ADVENTUREWORKSDWBUILDVERSION_MIG_CHK" 
        WHERE   DB_SYSTEM = 'Exasol' 
         
        UNION ALL 
        ...
        
    ) 
    , 
    Teradata AS (   
        SELECT  'ADVENTUREWORKSDW' AS schema_name, 'ADVENTUREWORKSDWBUILDVERSION_MIG_CHK' AS table_name, 
                'CNT' AS column_name, TO_CHAR("CNT") AS Teradata_metric 
        FROM    "ADVENTUREWORKSDW"."ADVENTUREWORKSDWBUILDVERSION_MIG_CHK" 
        WHERE   DB_SYSTEM = 'Teradata' 
         
        UNION ALL 
         
        SELECT  'ADVENTUREWORKSDW' AS schema_name, 'ADVENTUREWORKSDWBUILDVERSION_MIG_CHK' AS table_name, 
                'DBVERSION_CNT_DST' AS column_name, TO_CHAR("DBVERSION_CNT_DST") AS Teradata_metric 
        FROM    "ADVENTUREWORKSDW"."ADVENTUREWORKSDWBUILDVERSION_MIG_CHK" 
        WHERE   DB_SYSTEM = 'Teradata' 
         
        UNION ALL 
        ... 
        
    ) 
SELECT      e.schema_name, e.table_name, e.column_name, e.exasol_metric, t.teradata_metric 
FROM        exasol e 
JOIN        teradata t 
ON          e.schema_name = t.schema_name 
AND         e.table_name = t.table_name 
AND         e.column_name = t.column_name
;
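
Once the MIGRATION_CHECK table is filled, analyzing the differences is a simple comparison of the two metric columns. A minimal sketch, based on the table definition above:

SELECT schema_name, table_name, column_name, 
       exasol_metric, teradata_metric 
FROM   "DATABASE_MIGRATION"."MIGRATION_CHECK" 
WHERE  exasol_metric <> teradata_metric                          -- values differ
   OR (exasol_metric IS NULL AND teradata_metric IS NOT NULL)   -- one side missing
   OR (exasol_metric IS NOT NULL AND teradata_metric IS NULL);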

While the generic test statements might be sufficient for initial loads from Teradata and for source loads, ETL / ELT processes might need some adjustments, such as additional metrics and / or filters.

Initial Loads

One of the first technical milestones is to migrate the relevant structures and data from Teradata into Exasol. The generic tests provide a good baseline to check the data immediately after it has been imported from Teradata into Exasol.

There are other aspects that you still need to consider, such as:

  • The right character representation
  • The handling of empty strings and null values in Teradata and Exasol (see the sketch after this list)
  • Checking that the correct data is in the correct column (for example, loading the SHIPPING_DATE column into ORDER_DATE and vice versa)
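
On the second point: Exasol treats empty strings as NULL, whereas Teradata distinguishes between '' and NULL, so such rows are easy to miss with a plain row count. A minimal sketch of a dedicated check, using the sample table from the example above:

-- In Exasol, '' is stored as NULL, so a single count covers both cases:
SELECT COUNT(*) AS null_cnt 
FROM   "ADVENTUREWORKSDW"."ADVENTUREWORKSDWBUILDVERSION" 
WHERE  "DBVERSION" IS NULL;

-- In Teradata, count NULL and empty values separately and compare the sum
-- with the Exasol result:
SELECT SUM(CASE WHEN "DBVersion" IS NULL THEN 1 ELSE 0 END) AS null_cnt, 
       SUM(CASE WHEN "DBVersion" = ''   THEN 1 ELSE 0 END) AS empty_cnt 
FROM   "AdventureWorksDW"."AdventureWorksDWBuildVersion";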

Source Loads

There are two different ways you can load data into the data warehouse.

  • Using an external data provider: When an external data provider pushes data into the data warehouse, the provider needs everything required to export the data into Exasol: drivers, network access, schemas, users with specific roles / rights, and so on.
  • Pulling data into the data warehouse: The other way to load data is by pulling it into the data warehouse, for example, through import jobs scheduled in an ETL / ELT tool or by a dedicated job scheduler.

The sources can be, for example, files, other source databases, or even APIs. Even though the different sources may rely on similar metrics, each source type may require a different testing approach due to its restrictions or limitations.

To test the above-mentioned ways of loading data, Teradata's data loading procedures must be translated into the Exasol dialect. To create a test scenario, define the data in the source and target tables, and make sure that the source and target tables contain the same data in Teradata and Exasol. As a best practice, compare the test setup between Teradata and Exasol before and after executing the source loads; the testing procedures generated by the migration script are a good baseline to spot obvious differences.
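
For illustration, a file-based source load in Exasol could look like the following sketch. The target table, file path, and CSV options are assumptions for the example:

-- Hypothetical file-based source load; run the generated _MIG_CHK tests
-- before and after to compare the state on both systems.
IMPORT INTO "ADVENTUREWORKSDW"."ORDERS" 
FROM LOCAL CSV FILE '/tmp/orders.csv' 
COLUMN SEPARATOR = ';' 
SKIP = 1;  -- skip the header row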

ETL / ELT Processes

ETL / ELT procedures may contain complex SQL constructs, functions, and procedures to load the core data warehouse and compute business logic. The translation of the ETL / ELT processes from Teradata to Exasol is key to ensure the same results in every step through the data warehouse.

The testing procedure for the ETL / ELT processes is similar to source loads; however, the test queries could require more metrics due to the complex nature of these statements. These additional metrics could include uniqueness, referential integrity, various other data cleaning metrics, and data transformation / business logic with their test cases.

For more details, see Verification of the Primary Key Property (PRIMARY KEY) and Verification of the Foreign Key Property (FOREIGN KEY).
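
Uniqueness and referential integrity can also be spot-checked with plain SQL when the corresponding constraints are not (yet) enforced. A minimal sketch; the table and column names are hypothetical:

-- Uniqueness: business keys that occur more than once
SELECT "ORDER_ID", COUNT(*) AS cnt 
FROM   "SALES"."ORDERS" 
GROUP BY "ORDER_ID" 
HAVING COUNT(*) > 1;

-- Referential integrity: fact rows without a matching dimension row
SELECT COUNT(*) AS orphan_cnt 
FROM   "SALES"."ORDERS" o 
LEFT JOIN "SALES"."CUSTOMERS" c 
       ON o."CUSTOMER_ID" = c."CUSTOMER_ID" 
WHERE  c."CUSTOMER_ID" IS NULL;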

Roles / Users / Rights

In addition to structures, data, and processes, testing users' rights and roles is a crucial part of the whole migration. While roles and users can easily be migrated from Teradata to Exasol and tested, mapping and testing access rights require some work.

To identify missing rights for a user while testing, execute all commands in Exasol with the same user as in Teradata. Additionally, you can define use cases that focus on testing the operations a user / role is not allowed to perform, to ensure that users have exactly the privileges they need. You can also include scenarios to test row-level security.
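
A negative test could look like the following sketch. The role, user, and schema names are assumptions for the example:

-- Grant read-only access through a role (hypothetical names)
CREATE ROLE REPORT_ROLE; 
GRANT SELECT ON SCHEMA "ADVENTUREWORKSDW" TO REPORT_ROLE; 
GRANT REPORT_ROLE TO REPORT_USER;

-- Connected as REPORT_USER, this statement is expected to fail with an
-- insufficient privileges error; if it succeeds, the rights mapping is wrong:
DELETE FROM "ADVENTUREWORKSDW"."ADVENTUREWORKSDWBUILDVERSION";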

For more details about access rights in Teradata, refer to the Teradata documentation website.

For more details about access management in Exasol, see the corresponding sections of the Exasol documentation.

BI Tools and Other Applications

A data warehouse is usually the source for many different data consumers, such as BI tools, ETL tools, and other applications. One requirement is that Exasol supplies these data consumers with the same data as Teradata did, and that all the operations the data consumers perform deliver the same results as before. Therefore, verifying and validating the integration of all the relevant external components, such as job schedulers, reports in BI tools, or other applications, is important for the success of the whole migration.