IMPORT

Purpose

Use the IMPORT command to transfer data from external data sources into Exasol. If you specify a target table, the imported data is inserted into that table. Otherwise, the data is returned as a result set.

Prerequisites

  • In the source system, you need privileges to read the table contents or the files.
  • In Exasol, you need the system privilege IMPORT, as well as the INSERT privilege on the target table (see the example after this list).
  • When using a connection, you need the system privilege USE ANY CONNECTION, or the connection must be granted to the user or one of the user's roles with the GRANT statement. For additional information, refer to the CREATE CONNECTION statement.
  • When using an error table, you need the appropriate privileges to write or insert data.
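
For example, the required privileges could be granted as follows (a minimal sketch; the user name bob, the table my_schema.my_table, and the connection my_conn are placeholder names):

GRANT IMPORT TO bob;                          -- system privilege to run IMPORT
GRANT INSERT ON my_schema.my_table TO bob;    -- allows inserting the imported rows
GRANT CONNECTION my_conn TO bob;              -- allows using the connection my_conn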

Syntax

import::=

IMPORT syntax diagram 1

IMPORT syntax diagram 2

import_columns::=

Import columns syntax diagram

dbms_src::=

Database source syntax diagram

Database source syntax diagram 2

file_src::=

File source syntax diagram 1

File source syntax diagram 2

connection_def::=

Connection definition syntax diagram

cloud_connection_def::=

Cloud connection syntax diagram

csv_cols::=

CSV columns syntax diagram

fbv_cols::=

FBV columns syntax diagram

file_opts::=

File options syntax diagram

error_clause::=

Error clause syntax diagram

reject_clause::=

Reject clause syntax diagram

error_dst::=

Error destination syntax diagram

script_src::=

Script source syntax diagram

Usage Notes

  • You can monitor the progress of the data transfer from a second connection using the system table EXA_USER_SESSIONS (column ACTIVITY).
  • Import statements can also be used within SELECT queries. For more information, refer to SELECT statement in the Query Language (DQL) section.
  • In case of an IMPORT from JDBC or CSV sources, decimals are truncated if the target data type has less precision than the source data type.
  • Lines starting with # (hash) will be ignored. For additional information about formatting rules for data records, refer to File Format and Details.
  • For additional information about ETL processes, refer to the ETL in Exasol section.

The following table provides you with an overview of the different elements and their meaning in the IMPORT command: 

dbms_src

Defines the database source whose connection data is specified in the connection_def. You can choose among an Exasol connection (EXA), a native connection to an Oracle database (ORA) or a JDBC connection to any database (JDBC). For more information about adding drivers, see Driver Management.

The source data can either be a database table as identifier (for example, MY_SCHEMA.MY_TABLE) or a database statement as a string (for example, 'SELECT * FROM DUAL'). In the second case, the expression is executed on the source database, for example, a SQL query or a procedure call.

When using the TABLE syntax (as opposed to STATEMENT), the table name identifier is treated similarly to Exasol tables. If the remote system expects case-sensitive syntax, you must delimit the table names with quotation marks, as shown below.
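
For example, assuming a connection my_exasol exists, a case-sensitive remote table name can be delimited like this (the table and schema names are placeholders):

IMPORT INTO table_x FROM EXA
       AT my_exasol
       TABLE "MySchema"."MyTable";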

To achieve optimal parallelization: 

  • Importing from Exasol databases (FROM EXA) is always parallelized. This means that for Exasol, loading tables directly is significantly faster than using the STATEMENT option.
  • If you import data from Oracle sources (FROM ORA, using the TABLE option), partitioned tables will be loaded in parallel.
  • Specifying multiple statements is only possible for JDBC and Oracle sources.
file_src

Specifies the data file source, which can be remote or local files. The source files can be either CSV or FBV files and must comply with the format specifications in the CSV Data Format and the Fixblock Data Format (FBV). File names may only consist of ASCII characters. A BOM (Byte Order Mark) is not supported.

Compressed files are recognized by their file extension. Supported extensions are .zip, .gz (gzip), and .bz2 (bzip2). When System.in is specified as the file name, data is read from the standard input stream.
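
For example, a gzip-compressed CSV file is imported like any uncompressed file; the .gz extension is detected automatically (a sketch assuming a connection my_fileserver exists):

IMPORT INTO table_x FROM CSV
       AT my_fileserver
       FILE 'data.csv.gz';   -- decompressed on the fly based on the extension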

Remote Files: 

FTP, FTPS, HTTP, and HTTPS servers are supported if the connection data is defined through the connection_def. Consider the following when using a remote file source:

  • Certificates are not verified for encrypted connections.
  • If you specify a folder, the result contains one row for each file in the given folder with one column containing the filename.
  • For URLs starting with “ftps://”, implicit encryption is used.
  • For URLs starting with “ftp://”, Exasol encrypts the user name and password (explicit encryption) if the server supports it. If the server requires encryption, the whole data transfer is encrypted.
  • For FTP and FTPS servers, only passive mode is supported.
  • For HTTP and HTTPS servers, only basic authentication is supported.
  • For HTTP and HTTPS connections, HTTP query parameters can be specified by appending them to the file name.

    Example: FILE 'file.csv?op=OPEN&user.name=user'

Local Files:

You can also import local files from your client system. For local file import, the JDBC driver opens an internal connection to the cluster and provides an HTTP or HTTPS (SECURE option) server. With the SECURE option, the data is transferred encrypted, but performance is slower.

This functionality is only supported for EXAplus and the JDBC driver. It cannot be used in prepared statements or within database scripts.
If you want to process a local file with a separate program, you can use the tool EXAjload, which is included in the JDBC driver package. Run it without parameters to display usage information.

Cloud Storage Service:

You can import files from a cloud storage service. See cloud_connection_def.

script_src

Specifies the UDF script to be used for a user-defined import. Optionally, you can define a connection or properties that are forwarded to the script. The specified script internally generates an SQL statement that performs the actual import using SELECT. The script implements a special callback function that receives the import specification (for example, parameters and connection information) and returns an SQL statement. For more information, refer to the User-defined IMPORT using UDFs section.

connection_def

Optional connection definition to encapsulate connection information such as the password. For more information, refer to connection_def below.

WITH parameter=value...

Optional parameters to be passed to the script. Each script can define the mandatory and optional parameters it supports. Parameters are simple key-value pairs, with the value being a string. For example:

... WITH PARAM_1='val1' PARAM_2 = 'val2';
connection_def

Defines the connection to the external database or file server. It can be specified as a connection string (for example, 'ftp://192.168.1.1/') together with the corresponding login information.

For regular ETL jobs, you can also make use of connections, where the connection data such as user name and password can easily be encapsulated. For more information, refer to the CREATE CONNECTION section in Access Control Using SQL (DCL).

Specifying the user name and password within the IMPORT command is optional. If they are omitted, the data from the connection string or the connection object is used.
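
For example, credentials stored in a connection object can be overridden for a single statement (a sketch; my_fileserver and the credentials are placeholders):

-- uses the user name and password stored in the connection
IMPORT INTO table_x FROM CSV AT my_fileserver FILE 'data.csv';
-- overrides the stored credentials for this statement only
IMPORT INTO table_x FROM CSV
       AT my_fileserver USER 'agent_009' IDENTIFIED BY 'secret'
       FILE 'data.csv';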

cloud_connection_def

Defines a connection to a cloud storage service. This can be specified within a connection string and the corresponding authentication information. The connection string format may vary by cloud service.

Example:

CLOUD AZURE BLOBSTORAGE 'DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net' 
USER '<accountName>' 
IDENTIFIED BY '<key>'
;
csv_cols

Defines which columns of the CSV file are imported and how they are interpreted. For more information, refer to the CSV Data Format section.

col_nr Defines the column number, starting from 1. Alternatively, you can define a column range (for example, 5..8 for columns 5, 6, 7, and 8). Column numbers must be in ascending order.
FORMAT Optional format definition for numbers or datetime values (default: session format). For more information, refer to the Numeric Format Models and Date/Time Format Models sections.

Example

In this example, the first four columns of the CSV file are loaded, and the fourth column uses the specified date format.

(1..3,4 FORMAT='DD-MM-YYYY')
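
Embedded in a complete statement, the column list directly follows the file name (a sketch assuming a connection my_fileserver exists):

IMPORT INTO table_x FROM CSV
       AT my_fileserver
       FILE 'data.csv'
       (1..3, 4 FORMAT='DD-MM-YYYY');   -- import columns 1-4, column 4 as a date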
fbv_cols

Defines which columns of the FBV file are imported and how they are interpreted. For more information, refer to the Fixblock Data Format section.

The following elements can be specified for the columns of an FBV file:

SIZE Defines the number of bytes in the column. The size must always be specified.
START Start byte of the column, counting from 0. The START values must be in ascending order.
FORMAT Optional format definition for numbers or datetime values (default: session format). For more information, refer to the Numeric Format Models and Date/Time Format Models sections.
ALIGN Alignment of the column (LEFT or RIGHT). The default value is LEFT.
PADDING Padding character for columns. By default, a space is used. You can also specify an ASCII character, either in plain text ('+'), as a hexadecimal value ('0x09'), or as an abbreviation ('NUL', 'TAB', 'LF', 'CR', 'ESC').

Example

In this example, four columns are imported from the FBV file. The first column is aligned to the right and padded with “x” characters. After the first 12 bytes there is a gap, and the fourth column has the date format specified for the FBV file.

(SIZE=8 PADDING='x' ALIGN=RIGHT,
 SIZE=4,
 START=17 SIZE=8,
 SIZE=32 FORMAT='DD-MM-YYYY')
file_opts
ENCODING Encoding of the CSV or FBV files (default: UTF-8). For information on the supported encodings, refer to the Supported Encodings for ETL Processes section.
ROW SEPARATOR

Line break character: 

  • 'LF' (default): Corresponds to the ASCII character 0x0a (Unix)
  • 'CR': Corresponds to the ASCII character 0x0d (Macintosh)
  • 'CRLF': Corresponds to the ASCII characters 0x0d and 0x0a (Windows)
  • 'NONE': No line break. This is only allowed in FBV files.
NULL

Additional representation of NULL values. This option is only supported for CSV files and only applies to fields not enclosed in field delimiters. Regardless of this option, an empty string in the input data always represents a NULL value.

SKIP

Number of rows to be omitted. This can be useful if the data files contain header information. SKIP corresponds to the number of line breaks (ROW SEPARATOR), even if they occur inside the data.

TRIM, LTRIM, RTRIM

Defines whether spaces are removed at the borders of CSV columns (LTRIM: from the left, RTRIM: from the right, TRIM: from both sides). By default, no spaces are trimmed.

COLUMN SEPARATOR

Defines the field separator for CSV files. In the default case, the comma (,) is used. You can specify any string, either as plain text (for example, ','), as a hexadecimal value (for example, '0x09') or as an abbreviation (one of 'NUL', 'TAB', 'LF', 'CR', 'ESC').

A plain text value is limited to 10 characters, which will be automatically converted to the file's specified ENCODING (see above).

A hexadecimal value is limited to 10 bytes (not characters) and will not be converted.

COLUMN DELIMITER

Defines the field delimiter for CSV files. By default, the double quote (") is used. You can specify any string, either as plain text ('"'), as a hexadecimal value ('0x09'), or as an abbreviation ('NUL', 'TAB', 'LF', 'CR', 'ESC').

A plain text value is limited to 10 characters and will be automatically converted to the file's specified ENCODING.

A hexadecimal value is limited to 10 bytes (not characters) and will not be converted. If you do not want to use any field delimiter, you can define the empty string ('').

ROW SIZE

Applicable only for FBV files. If the last column of the FBV file is not used, then this value must be specified to recognize the end of a row. Otherwise, the end of a row is implicitly calculated by the last column defined in the IMPORT command.

For example, in the case of (SIZE=4 START=5), it is assumed that one column is read with 4 bytes and that the row consists of 9 bytes overall.

The boolean value pairs '1/0', 'TRUE/FALSE', 'true/false', 'True/False', 'T/F', 't/f', 'y/n', 'Y/N', 'yes/no', 'Yes/No', and 'YES/NO' are automatically accepted when inserted into a BOOLEAN column. For more information, see EXPORT.
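
The following sketch combines several of these options in one CSV import (my_fileserver and all option values are placeholders):

IMPORT INTO table_x FROM CSV
       AT my_fileserver
       FILE 'data.csv'
       ENCODING = 'LATIN-1'       -- source file encoding
       ROW SEPARATOR = 'CRLF'     -- Windows line breaks
       COLUMN SEPARATOR = 'TAB'   -- tab-separated fields
       NULL = 'NA'                -- 'NA' is read as NULL
       SKIP = 1                   -- skip the header row
       TRIM;                      -- trim spaces on both sides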

error_clause

This clause defines how many invalid rows of the source are allowed. For example, with REJECT LIMIT 5 the statement succeeds if there are at most five invalid rows and throws an exception at the sixth invalid row. The exact row that causes the exception is non-deterministic and may vary.

Additionally, you can write the faulty rows into a file (CSV but not FBV) or a local table within Exasol to process or analyze them later.

Table

For every faulty row, the following columns are created: row number, error message, expression (optional), truncated flag, and the actual data. The truncated flag indicates whether the data was truncated to the maximum string length.

CSV file For every faulty row, a comment row is created containing the row number, error message, and expression (optional), followed by the actual data row.

The optional expression can be specified for identification purposes in case you use the same error table or file multiple times. You could also use CURRENT_TIMESTAMP for this.

Constraint violation errors will throw an exception even if the REJECT LIMIT has not been reached.
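
For example, faulty rows can be collected in a CSV file on the same file server (a sketch assuming a connection my_fileserver exists; the expression CURRENT_TIMESTAMP marks each run):

IMPORT INTO table_x FROM CSV
       AT my_fileserver
       FILE 'data.csv'
       ERRORS INTO CSV AT my_fileserver FILE 'errors.csv' (CURRENT_TIMESTAMP)
       REJECT LIMIT 5;   -- fail the statement at the sixth invalid row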

import_columns

Instead of importing data into a table by specifying the table name, you can specify a list of columns to perform a temporary import. In this case, the imported data is not persisted in Exasol but returned as a result set.

You can use the LIKE clause to specify the output columns to be the same as the columns of an existing table in the Exasol database.

Examples

IMPORT INTO table_1 FROM CSV
       AT 'http://192.168.1.1:8080/' USER 'agent_007' IDENTIFIED BY 'secret'
       FILE 'tab1_part1.csv' FILE 'tab1_part2.csv'
       COLUMN SEPARATOR = ';'
       SKIP = 5;

CREATE CONNECTION my_fileserver
       TO 'ftp://192.168.1.2/' USER 'agent_007' IDENTIFIED BY 'secret';
IMPORT INTO table_2 FROM FBV
       AT my_fileserver
       FILE 'tab2_part1.fbv'
       (SIZE=8 PADDING='+' ALIGN=RIGHT,
        SIZE=4,
        SIZE=8,
        SIZE=32 FORMAT='DD-MM-YYYY' );

CREATE CONNECTION my_oracle
  TO '(DESCRIPTION =
         (ADDRESS_LIST = (ADDRESS =
                            (PROTOCOL = TCP)
                            (HOST = 192.168.0.25)(PORT = 1521)
                          )
         )
         (CONNECT_DATA = (SERVICE_NAME = orautf8))
       )';
IMPORT INTO table_3 (col1, col2, col4) FROM ORA
       AT my_oracle
       USER 'agent_008' IDENTIFIED BY 'secret'
       STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' '
       ERRORS INTO error_table (CURRENT_TIMESTAMP) REJECT LIMIT 10;
IMPORT INTO table_4 FROM JDBC DRIVER='MSSQL'
       AT 'jdbc:sqlserver://dbserver;databaseName=testdb'
       USER 'agent_008' IDENTIFIED BY 'secret'
       STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' ';
IMPORT INTO table_5 FROM CSV
       AT 'http://HadoopNode:50070/webhdfs/v1/tmp'
       FILE 'file.csv?op=OPEN&user.name=user';
IMPORT INTO table_6 FROM EXA
       AT my_exasol
       TABLE MY_SCHEMA.MY_TABLE;
IMPORT INTO table_7 FROM SCRIPT etl.import_hcat_table
WITH   HCAT_DB = 'default'
       HCAT_TABLE = 'my_hcat_table'
       HCAT_ADDRESS = 'hcatalog-server:50111'
       HDFS_USER = 'hdfs';
IMPORT INTO table_8  
       FROM LOCAL CSV FILE '~/my_table.csv'
       COLUMN SEPARATOR = ';' SKIP = 5;
-- getting a result set using IMPORT (which can also be used as a sub-select):
SELECT * FROM (
IMPORT INTO (i INT, v VARCHAR(200)) FROM EXA
       AT my_exasol
       TABLE MY_SCHEMA.MY_TABLE
);
-- result set IMPORT without INTO clause:
IMPORT FROM JDBC
       AT my_jdbc_conn
       STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' ';
-- result set IMPORT with INTO and LIKE clause:
IMPORT INTO (LIKE CAT) FROM JDBC
       AT my_exa_conn
       STATEMENT ' SELECT OBJECT_NAME, OBJECT_TYPE FROM EXA_USER_OBJECTS WHERE OBJECT_TYPE IN (''TABLE'', ''VIEW'') ';

Import from an Amazon S3 bucket:

IMPORT INTO table_1 FROM CSV
    AT 'https://<bucketname>.s3.amazonaws.com'
    USER '<AccessKeyID>' IDENTIFIED BY '<SecretAccessKey>'
    FILE 'file.csv';

Import from an Amazon S3 bucket using a fully qualified bucket URL:

An Amazon S3 bucket URL using the legacy global endpoint format (<bucket-name>.s3.amazonaws.com) may need up to 24 hours after bucket creation before it becomes available.

A fully qualified Amazon S3 bucket URL that includes the AWS region (<bucket-name>.s3.<region-code>.amazonaws.com) will become available immediately.

IMPORT INTO table_1 FROM CSV
       AT 'https://<bucketname>.s3.<region>.amazonaws.com/'
       USER '<AccessKeyID>' IDENTIFIED BY '<SecretAccessKey>'
       FILE 'file.csv';

Import from Azure Blob Storage:

IMPORT INTO table1 FROM CSV 
    AT CLOUD AZURE BLOBSTORAGE 'DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net' 
    USER '<AccountName>' IDENTIFIED BY '<AccountKey>' 
    FILE '<container>/<blob>';