IMPORT

Purpose

Use the IMPORT command to transfer data from external data sources into Exasol.

You can specify a table that the imported data should be inserted into. If no table is specified, the data is returned as a result set.

Prerequisites

  • You must have the system privileges IMPORT and INSERT.
  • When using a connection, you must have the system privilege USE ANY CONNECTION or the connection must be granted either to the user or to one of the user roles by the GRANT statement. For more information, see CREATE CONNECTION.
  • When using an error table, you must have the necessary privileges to write or insert data.
  • On the source system, you must have privileges to read the table contents or the files.

Syntax

import::=

IMPORT syntax diagram 1

IMPORT syntax diagram 2

import_columns::=

Import columns syntax diagram

dbms_src::=

Database source syntax diagram

Database source syntax diagram 2

file_src::=

File source syntax diagram 1

File source syntax diagram 2

connection_def::=

Connection definition syntax diagram

user_identification:=

user identification

cert_verification:=

cert verification

cloud_connection_def::=

Cloud connection syntax diagram

csv_cols::=

CSV columns syntax diagram

fbv_cols::=

FBV columns syntax diagram

file_opts::=

File options syntax diagram

script_src::=

Script source syntax diagram

error_clause::=

Error clause syntax diagram

reject_clause::=

Reject clause syntax diagram

error_dst::=

Error destination syntax diagram

Usage notes

  • Importing from a CSV file is the fastest way to import data into Exasol.

  • When importing from JDBC or CSV sources, decimals are truncated if the target data type has less precision than the source data type.

  • Lines starting with # (hash) in an imported file will be ignored.

    For more information about formatting rules for data records, see File Format and Details .

  • The IMPORT statement runs in parallel on multiple nodes, which means that row packets arrive in no particular order. Ordering is therefore not guaranteed when using IMPORT, even if the source query contains an ORDER BY clause.

  • Import statements can be used within SELECT queries. See also the examples in IMPORT.

  • For more information about ETL processes, see ETL in Exasol.

You can monitor the data transfer operation by opening a second connection and viewing the progress in the column ACTIVITY of the system table EXA_USER_SESSIONS.

Explanation of elements and parameters in IMPORT

import_columns

Instead of importing data into a table by specifying a table name, you can specify a list of columns to perform a temporary import. The data imported into Exasol is then not persistent, but is returned as a result set.

To specify the output columns to be the same as the columns of an existing table in the Exasol database, use the LIKE clause.

dbms_src

Defines the database source whose connection data is specified in connection_def.

You can choose among an Exasol connection (EXA), a native connection to an Oracle database (ORA), or a JDBC connection to any database (JDBC). For more information about adding drivers, see Driver Management.

The source data can either be a database table as identifier (for example, MY_SCHEMA.MY_TABLE) or a database statement as a string (for example, 'SELECT * FROM DUAL'). When using a database statement as a string, the expression is executed on the source database, for example, a SQL query or a procedure call.

When using the TABLE syntax (as opposed to STATEMENT), the table name identifier is treated similarly to Exasol tables. If the remote system expects case-sensitive syntax, you must use quote marks to delimit the table names.

Parallel EXA to EXA connections

To enable parallelization in IMPORT/EXPORT operations using an Exasol connection (EXA), you can define multiple hosts in the connection string. Parallel connections are then established using the existing database port. No additional ports need to be opened by the database.

The connection string supports OpenID authentication and the following ODBC parameters: SSLCertificate, AuthMethod, HostTimeOut, LoginTimeout, Encryption, LogMode, EXALogFile, and Fingerprint. Other parameters are not supported.

You must include all hosts in the connection string.

To achieve optimal parallelization: 

  • If you import from an Exasol database using FROM EXA, importing is always parallelized. This means that for Exasol, loading tables directly is significantly faster than using the STATEMENT option.

  • If you import data from Oracle sources using FROM ORA with the TABLE option, partitioned tables will be loaded in parallel.

  • Specifying multiple STATEMENT clauses will allow them to be executed in parallel, which may improve performance compared to loading all data as a single STATEMENT. You can only specify multiple statements for JDBC and Oracle sources.

file_src

Specifies the data file source.

  • The source file can be in either CSV or FBV format, and must comply to the format specifications described in CSV Data Format or Fixblock Data format (FBV).

  • File names may only consist of ASCII characters. Using a BOM (byte order mark) is not supported.

  • Compressed files are recognized by their file extension. Supported extensions are .zip, .gz (gzip) and .bz2 (bzip2).

  • When System.in is specified as filename, data is read from the standard input stream.

Remote files

You can import data from files that are stored on remote services.

FTP, FTPS, SFTP, HTTP, and HTTPS servers are supported if connection data is defined using connection_def.

  • For FTP and FTPS servers, only passive mode is supported.

  • For HTTP and HTTPS servers, only basic authentication is supported.

  • For HTTP and HTTPS connections, HTTP query parameters can be specified by appending them to the file name. For example:

    FILE 'file.csv?op=OPEN&user.name=user'

  • In case of URLs starting with ftps://, Exasol uses the implicit encryption.

  • In case of URLs starting with ftp://, Exasol encrypts the user name and password (explicit encryption) if the server supports this. If the server requires encryption, the whole data transfer is done encrypted.

  • For encrypted connections, TLS certificate verification is disabled by default. You can enable or disable certificate verification in the following ways:

    • Set the database parameter -etlCheckCertsDefault=1

    • Enable or disable certificate verification directly in the SQL statement:

      VERIFY CERTIFICATE: Enables certificate verification.

      IGNORE CERTIFICATE: Disables certificate verification.

      PUBLIC KEY 'sha256//*******': Specifies the public key for certificate verification.

  • If you specify a folder, the result contains one row for each file in the given folder, with one column containing the filename.

Local files

You can import local files from your client system. When importing local files, the JDBC driver opens an internal connection to the cluster and provides an HTTP or HTTPS server, depending on if the SECURE option is specified. If the SECURE keyword is used, the server uses the HTTPS protocol and the data is transferred encrypted. If the SECURE keyword is omitted, the server uses HTTP and the connection will be unencrypted.

The default option for the internal connection is determined by the main connection:

  • If the main connection is encrypted, the local import operation is also encrypted by default.

  • If the main connection is not encrypted, the local import operation is also not encrypted by default. In this case, you can override the driver behavior by using the SECURE keyword to enforce a secure connection.

When using the SECURE option, data transfer performance will be reduced.

This functionality is only supported when using EXAplus and the JDBC driver. It cannot be used in prepared statements or within database scripts. If you want to process a local file by an explicit program, you can use the tool EXAjload, which is included in the JDBC driver package. For more information about how to use EXAjload, execute the program without parameters.

Cloud storage

You can import files from cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage. The connection to the cloud storage service is then defined in cloud_connection_def. For more information, see also Load Data from CSV/FBV Files.

connection_def

Defines the connection to an external database or file server. The connection can be specified in a connection string, which may include authentication information.

For regular ETL jobs you can also make use of connections, where connection information such as username and password can easily be encapsulated. For more information, see CREATE CONNECTION.

The declaration of USER  and IDENTIFIED BY within the command are optional. If they are omitted, the authentication information in the connection string or the connection object are used.

Kerberos SSO

For JDBC connections, it is possible to use Kerberos authentication by specifying specific data in the IDENTIFIED BY field. This data consists of a key that indicates that Kerberos authentication should be used (ExaAuthType=Kerberos), a base64 encoded configuration file, and a base64 encoded keytab file containing the credentials for the principal.

Example
IMPORT INTO table1 
    FROM JDBC AT '<JDBC_URL>' 
    USER '<kerberos_principal>' 
    IDENTIFIED BY 'ExaAuthType=Kerberos;<base64_krb_conf>;<base64_keytab>' TABLE table2; 

cloud_connection_def

Defines a connection to a cloud storage service.

The connection can be specified in a connection string, which may include authentication information. The format of the connection string can be different depending on the cloud service.

For more information, see Load Data from CSV/FBV Files.

Example
CLOUD AZURE BLOBSTORAGE 'DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net' 
    USER '<accountName>' IDENTIFIED BY '<key>';

csv_cols

Defines how the column data is written to a CSV file. For more information, see CSV DATA Format.

Option Description

col_nr

Defines the column number. The first column is 1.

You can also define a column range, for example, 5..8 for columns 5, 6, 7, 8. Column numbers must be in ascending order.

FORMAT

Optional format definition for numbers or datetime values. The default is the session format.

For more information, see Numeric format models and Date/time format models.

Example

In the following example, four columns are loaded from the CSV file, with the fourth column using a specified date format.

(1..3,4 FORMAT='DD-MM-YYYY')

For more complete examples of how to use FORMAT, see Examples.

fbv_cols

Defines which columns are read from an FBV file, and how they are interpreted. For more information, see Fixblock Data Format.

The following elements can be specified in an FBV file:

Element Description
SIZE

Defines the number of bytes of the column. The size must always be specified.

START Start byte of the column. It starts with zero. The START values must be in ascending order.
FORMAT

Optional format definition for numbers or date time values. Default: session format.

For more information, see Numeric format models and Date/time format models.

ALIGN

Alignment of the column.

Allowed values are LEFT or RIGHT. The default value is LEFT.

PADDING

Padding characters for columns.

By default, a space character is used. You can also specify an ASCII character, either in plain text ('+'), as a hexadecimal value ('0x09'), or as an abbreviation ('NUL', 'TAB', 'LF', 'CR', 'ESC').

Example

In this example, four columns are imported from an FBV file. The first column is padded with “x” characters. After the first 12 bytes there is a gap, and the fourth column has the date format specified for the FBV file.

(SIZE=8 PADDING='x' ALIGN=RIGHT,
 SIZE=4,
 START=17 SIZE=8,
 SIZE=32 FORMAT='DD-MM-YYYY')

file_opts

Option Description
COLUMN DELIMITER

Defines the field delimiter for CSV files.

By default, the double quote character " is used. You can specify any string as a delimiter, either as plain text ('"'), as a hexadecimal value ('0x09'), or as an ASCII abbreviation ('NUL', 'TAB', 'LF', 'CR', 'ESC').

A plain text value is limited to 10 characters, which are automatically converted to the encoding set for the file with the ENCODING option.

A hexadecimal value is limited to 10 bytes (not characters) and will not be converted.

If you do not want to use any field delimiter, define an empty string ('').

COLUMN SEPARATOR

Defines the field separator for CSV files.

By default, comma is used. You can specify any string as a separator, either as plain text (','), as a hexadecimal value ('0x09'), or as an ASCII abbreviation ('NUL', 'TAB', 'LF', 'CR', 'ESC').

A plain text value is limited to 10 characters, which are automatically converted to the encoding set for the file with the ENCODING option.

A hexadecimal value is limited to 10 bytes (not characters) and will not be converted.

ENCODING

Encoding of the CSV or FBV files. Default is UTF-8.

For information about supported encodings, see Supported Encodings for ETL Processes.

NULL

Additional representation of NULL values. This option is only supported for CSV files and only applies to fields not enclosed in field delimiters. Regardless of this option, an empty string in the input data always represents a NULL value.

ROW SEPARATOR

Defines the line break character. 

'LF' (default) corresponds to the ASCII character 0x0a (Unix/Linux).

'CR' corresponds to the ASCII character 0x0d (macOS).

'CRLF' corresponds to the ASCII characters 0x0d and 0x0a (Windows).

'NONE' means no line break. This is only allowed in FBV files.

SKIP

Number of rows that will be omitted in the import. This can be useful if you have to include header information within the data files. SKIP defines rows by line breaks (defined by ROW SEPARATOR) even if they occur inside data.

TRIM, LTRIM, RTRIM

Defines whether spaces are removed at the border of CSV columns.

LTRIM: from the left, RTRIM: from the right, TRIM: from both sides.

By default, no spaces are removed.

Boolean values

The following boolean value pairs are automatically accepted when inserted into a boolean column:

'1/0', 'TRUE/FALSE', 'true/false', 'True/False', 'T/F', 't/f', 'y/n', 'Y/N', 'yes/no', 'Yes/No', 'YES/NO'

For more information, see EXPORT.

script_src

Specifies the UDF script to be used for a user-defined import. Optionally, you can define a connection or properties that will be forwarded to the script. The specified script will generate a SELECT statement internally that does the actual import. The script implements a special callback function that receives the import specification (for example, parameters and connection information) and returns a SELECT statement. For more information, see User-defined IMPORT/EXPORT Using UDFs.

connection_def

Optional connection definition for being able to encapsulate connection information such as password. For more information, see connection_def.

WITH parameter=value...

Optional parameters to be passed to the script. Each script can define the mandatory and optional parameters it supports. Parameters are simple key-value pairs, with the value being a string.

Example
... WITH PARAM_1='val1' PARAM_2 = 'val2';

error_clause

Defines how many invalid rows of the source are allowed.

For example, in the case of REJECT LIMIT 5, the statement will work fine if there are less than or equal to five invalid rows and would throw an exception after the sixth row. The exact row that causes the exception is non-deterministic and may vary.

You can write the faulty rows into a CSV file (not FBV) or to a local table within Exasol to process or analyze them later.

An optional expression can be specified for identification reasons in case you use the same error table or file multiple times. You can also use CURRENT_TIMESTAMP for this.

Table

For every faulty row, the following columns are created: row number, error message, [expression], and a truncated flag, followed by the actual data. The truncated flag indicates whether the data was truncated to the maximum string length.

CSV file For every faulty row, a comment row is created with the row number, error message, [expression], followed by the actual data row.

Constraint violation errors will throw an exception even if the REJECT LIMIT has not been reached.

Examples

-- import data from local csv file
-- using semicolon as separator and skipping the first 5 lines

IMPORT INTO my_table
    FROM LOCAL CSV FILE '~/my_table.csv'
    COLUMN SEPARATOR = ';' SKIP = 5;
-- import data from local csv file with custom datetime format for column 7 and error clause

IMPORT INTO my_table
FROM LOCAL CSV FILE 'C:\Data\f3.csv.zip' (1..6, 7 FORMAT = 'YYYY-MM-DD HH24:MI:SS:FF3', 8..9)
COLUMN SEPARATOR = ';'
ROW SEPARATOR = 'CRLF'
COLUMN DELIMITER = '"'
ENCODING = 'UTF-8'
SKIP = 1
ERRORS INTO error_table (CURRENT_TIMESTAMP) REJECT LIMIT UNLIMITED;
-- import data from two csv files on a server using tls/ssl certificate

IMPORT INTO my_table FROM CSV
    AT 'https://192.168.1.1:8080/' USER 'my_user' IDENTIFIED BY 'my_secret'
    VERIFY CERTIFICATE PUBLIC KEY 'my_public_key'
    FILE 'tab1_part1.csv' FILE 'tab1_part2.csv'
    COLUMN SEPARATOR = ';'
    SKIP = 5;
-- import data from fbv file on ftp server

CREATE CONNECTION my_fileserver
    TO 'ftp://192.168.1.2/' USER 'my_user' IDENTIFIED BY 'my_secret';

IMPORT INTO my_table FROM FBV
    AT my_fileserver
    FILE 'tab2_part1.fbv'
    (SIZE=8 PADDING='+' ALIGN=RIGHT,
    SIZE=4,
    SIZE=8,
    SIZE=32 FORMAT='DD-MM-YYYY' );
-- import data from an oracle database

CREATE CONNECTION my_oracle
    TO '(DESCRIPTION =
            (ADDRESS_LIST = 
                (ADDRESS =
                    (PROTOCOL = TCP)
                    (HOST = 192.168.0.25)
                    (PORT = 1521)
                )
            )
            (CONNECT_DATA = (SERVICE_NAME = orautf8))
        )';
        
IMPORT INTO my_table (col1, col2, col4) FROM ORA
    AT my_oracle
    USER 'my_user' IDENTIFIED BY 'my_secret'
    STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' '
    ERRORS INTO error_table (CURRENT_TIMESTAMP) REJECT LIMIT 10;
-- import data from microsoft sql server using jdbc

IMPORT INTO my_table FROM JDBC DRIVER='MSSQL'
    AT 'jdbc:sqlserver://dbserver;databaseName=testdb'
    USER 'my_user' IDENTIFIED BY 'my_secret'
    STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' ';
-- import from a csv file on a hadoop node

IMPORT INTO my_table FROM CSV
    AT 'http://HadoopNode:50070/webhdfs/v1/tmp'
    FILE 'file.csv?op=OPEN&user.name=user';
-- import data from another exasol database, using multiple hosts for parallelization

IMPORT INTO my_table FROM EXA
    AT "my_exasol_1,my_exasol_2:8563;HostTimeOut=1000;Encryption=Y"    
    USER 'my_user' 
    IDENTIFIED BY 'my_secret'
    TABLE MY_SCHEMA.MY_TABLE;
-- import data using a udf script for hcatalog

IMPORT INTO my_table FROM SCRIPT etl.import_hcat_table
    WITH HCAT_DB = 'default'
         HCAT_TABLE = 'my_hcat_table'
         HCAT_ADDRESS = 'hcatalog-server:50111'
         HDFS_USER = 'hdfs';

Import from cloud storage services

-- import data from csv file in amazon s3 bucket
    
IMPORT INTO my_table FROM CSV
       AT 'https://<bucketname>.s3[-REGION].amazonaws.com/'
       USER '<AccessKeyID>' IDENTIFIED BY '<SecretAccessKey>'
       FILE 'file.csv';
-- import data from csv file in azure blob

IMPORT INTO my_table FROM CSV 
    AT CLOUD AZURE BLOBSTORAGE 'DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net' 
    USER '<AccountName>' IDENTIFIED BY '<AccountKey>' 
    FILE '<container>/<blob>';
-- import data from csv file on google cloud storage

IMPORT INTO my_table FROM CSV
    AT 'https://testbucket.storage.googleapis.com' 
    USER '<Access ID>' IDENTIFIED BY '<secret>'
    FILE 'my_path/file.csv';

Import using SELECT result set

-- getting a result set using IMPORT (which can also be used as a sub-select)

SELECT * FROM (
IMPORT INTO (i INT, v VARCHAR(200)) FROM EXA
    AT my_exasol
    TABLE MY_SCHEMA.MY_TABLE
);
-- result set IMPORT without INTO clause

IMPORT FROM JDBC
    AT my_jdbc_conn
    STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' ';
-- result set IMPORT with INTO and LIKE clause

IMPORT INTO (LIKE CAT) FROM JDBC
    AT my_exa_conn
    STATEMENT ' SELECT OBJECT_NAME, OBJECT_TYPE FROM EXA_USER_OBJECTS WHERE OBJECT_TYPE IN (''TABLE'', ''VIEW'') ';