IMPORT

Purpose

Use the IMPORT command to transfer data from external data sources into Exasol. You can specify a table name to insert the imported data into that table. Otherwise, the data is returned as a result set.

Prerequisites

  • In the source system, you need privileges to read the table contents or the files.
  • In Exasol, you need the system privilege IMPORT, plus INSERT privileges on the target table.
  • When using a connection, you need either the system privilege USE ANY CONNECTION, or the connection must have been granted to you or to one of your roles by the GRANT statement. For additional information, refer to the CREATE CONNECTION statement.
  • When using an error table, you need the appropriate privileges to write or insert data into it.

Syntax

import::=

Import Statement

import_columns::=

Import_Columns

dbms_src::=

DBMS Source

connection_def::=

Connection Definition

file_src::=

File Src

csv_cols::=

CSV Import Cols

fbv_cols::=

fbv import cols

file_opts::=

File Import Options

error_clause::=

Error Clause

error_dst::=

Error dst

script_src::=

Script Src

Usage Notes

  • The progress of the data transfer can be monitored through a second connection using the system table EXA_USER_SESSIONS (column ACTIVITY).
  • IMPORT statements can also be used within SELECT queries. For more information, refer to the SELECT statement in the Query Language (DQL) section.
  • In case of an IMPORT from JDBC or CSV sources, decimals are truncated if the target data type has lower precision than the source data type.
  • For additional information about ETL processes, refer to the ETL Processes section.
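
Using IMPORT as a subselect can be sketched as follows; since no target table exists, the column types are declared inline. The connection data and file name are hypothetical:

```sql
-- Hypothetical: read a remote CSV file and filter it directly,
-- without materializing the data in a table first.
SELECT order_id, amount
  FROM (IMPORT INTO (order_id INT, amount DECIMAL(18,2))
        FROM CSV
        AT 'http://192.168.1.1:8080/' USER 'agent_007' IDENTIFIED BY 'secret'
        FILE 'orders.csv')
 WHERE amount > 100;
```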

The following table provides you with an overview of the different elements and their meaning in the IMPORT command: 

Element Meaning
dbms_src

Defines the database source whose connection data is specified in the connection_def. You can choose among an Exasol connection (EXA), a native connection to an Oracle database (ORA) or a JDBC connection to any database (JDBC).

Some JDBC drivers are already delivered by default (visible in EXAoperation) and can be used within the connection string (for example, jdbc:mysql, jdbc:postgres). You can additionally configure JDBC drivers in EXAoperation and select them through the DRIVER option if their prefix is ambiguous.

Only the pre-installed JDBC drivers (marked gray in EXAoperation) are tested and officially supported. However, our support team will try to help you in case of problems with other drivers.

The source data can either be a database table (as an identifier like, for example, MY_SCHEMA.MY_TABLE) or a database statement (as a string like, for example, 'SELECT "TEST" FROM DUAL'). In the second case, this expression is executed on the source database, for example, a SQL query or a procedure call.

When using the TABLE syntax (as opposed to STATEMENT), the table name identifier is treated similarly to Exasol tables. Therefore, you have to quote table names if your remote system expects case-sensitive syntax.
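
A minimal sketch of this quoting rule, using a hypothetical connection and a case-sensitive table name on the remote system:

```sql
-- Hypothetical: the remote table is named "Orders" (mixed case).
-- Without the double quotes, the identifier would be normalized to upper case
-- and the remote system would not find the table.
IMPORT INTO target_table FROM JDBC
       AT my_jdbc_connection
       TABLE MY_SCHEMA."Orders";
```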

To achieve an optimal parallelization: 

  • Importing from Exasol databases is always parallelized. For Exasol, loading tables directly is significantly faster than using the STATEMENT option.
  • If you import data from Oracle sources, partitioned tables will be loaded in parallel.
  • Specifying multiple statements is only possible for JDBC and Oracle sources.
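
The last point can be sketched as follows: splitting the source data into disjoint statements lets the loader read a JDBC source through several connections. The URL and the partitioning predicate are hypothetical:

```sql
-- Hypothetical: two disjoint SELECTs over the same source table,
-- allowing the import to run over parallel connections.
IMPORT INTO target_table FROM JDBC
       AT 'jdbc:mysql://192.168.0.30/my_db'
       USER 'agent_008' IDENTIFIED BY 'secret'
       STATEMENT 'SELECT * FROM orders WHERE MOD(id, 2) = 0'
       STATEMENT 'SELECT * FROM orders WHERE MOD(id, 2) = 1';
```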
file_src

Specifies the data file source, which can be remote or local files. The source files can be either CSV or FBV files and must comply with the format specifications in the CSV Data Format and the Fixblock Data Format (FBV) sections. File names may only consist of ASCII characters. A byte order mark (BOM) is not supported.

Compressed files are recognized by their file extension. Supported extensions are .zip, .gz (gzip) and .bz2 (bzip2). When System.in is specified as filename, data is read from the standard input stream (System.in).

Remote Files: 

FTP, FTPS, SFTP, HTTP, and HTTPS servers are supported; their connection data is defined through the connection_def. Consider the following when using a remote data file source:

  • Certificates are not verified for encrypted connections.
  • If you specify a folder, the result contains one row for each file in the given folder with one column containing the filename.
  • In case of URLs starting with 'ftps://', the implicit encryption is used.
  • In case of URLs starting with 'ftp://', Exasol encrypts the user name and password (explicit encryption), if the server supports this. If the server requires encryption, then the whole data transfer is done encrypted.
  • For HTTP and HTTPS servers only basic authentication is supported.
  • For HTTP and HTTPS connections, HTTP query parameters can be specified by appending them to the file name (for example, FILE 'file.csv?op=OPEN&user.name=user').

Local Files:

You can also import local files from your client system. When specifying the SECURE option, the data is transferred encrypted, but with slower performance.

This functionality is only supported for EXAplus and the JDBC driver. It can neither be used in prepared statements nor within database scripts. If you want to process a local file through a separate program, you can use the tool EXAjload, which is delivered within the JDBC driver package. Execute this program without parameters to get information about its usage.

For importing local files, the JDBC driver opens an internal connection to the cluster and provides an HTTP or HTTPS (SECURE-Option) server.

script_src

Specifies the UDF script to be used for a user-defined import. Optionally, you can define a connection or properties which are forwarded to the script. The specified script will internally generate an SQL statement that does the actual import using INSERT INTO SELECT. The script implements a special callback function which receives the import specification (for example, parameters and connection information) and returns an SQL statement. For more information, refer to the User-defined IMPORT using UDFs section.

connection_def

Optional connection definition for encapsulating connection information such as the password. For more information, refer to connection_def below.

WITH parameter=value...

Optional parameters to be passed to the script. Each script can define the mandatory and optional parameters it supports. Parameters are simple key-value pairs, with the value being a string. For example:

... WITH PARAM_1='val1' PARAM_2 = 'val2';
connection_def

Defines the connection to the external database or file server. It can be specified as a connection string (for example, 'ftp://192.168.1.1/') with the corresponding login information.

For regular ETL jobs, you can also make use of connections where the connection data like user name and password can easily be encapsulated. For more information, refer to the CREATE CONNECTION section in Access Control Using SQL (DCL).

The declaration of user name and password within the IMPORT command is optional. If they are omitted, the data from the connection string or the connection object is used.

For JDBC connections, it is possible to use Kerberos authentication by defining specific data in the IDENTIFIED BY field. This data consists of a key indicating that Kerberos authentication should be used (ExaAuthType=Kerberos), a base64-encoded configuration file, and a base64-encoded keytab file containing the credentials for the Kerberos principal. The syntax is as follows:
IMPORT INTO table1 FROM JDBC AT '<JDBC_URL>' USER '<kerberos_principal>' IDENTIFIED BY 'ExaAuthType=Kerberos;<base64_krb_conf>;<base64_keytab>' TABLE table2;

csv_cols

Defines which columns of the CSV file are loaded and how they are interpreted. For more information, refer to the CSV Data Format section.

col_nr Defines the column number, starting from 1. Alternatively, you can define a column range (for example, 5..8 for columns 5, 6, 7, and 8). Column numbers must be in ascending order.
FORMAT Optional format definition for numbers or datetime values (default: session format). For more information, refer to the Numeric Format Models and Date/Time Format Models sections.

Example

In this example, the first four columns of the CSV file are loaded, and a date format is applied to the fourth column.

(1..3,4 FORMAT='DD-MM-YYYY')
fbv_cols

Defines which columns of the FBV file are loaded and how they are interpreted. For more information, refer to the Fixblock Data Format section.

The following elements can be specified for an FBV column:

SIZE Defines the number of bytes of the column. The size must always be specified.
START Start byte of the column, counting from zero. The START values must be in ascending order.
FORMAT Optional format definition for numbers or datetime values (default: session format). For more information, refer to the Numeric Format Models and Date/Time Format Models sections.
ALIGN Alignment of the column (LEFT or RIGHT). The default value is LEFT.
PADDING Padding characters for columns. By default, spaces are used. You can also specify an ASCII character, either in plain text ('+'), as a hexadecimal value ('0x09'), or as an abbreviation ('NUL', 'TAB', 'LF', 'CR', 'ESC').

Example

In this example, four columns are imported from the FBV file. The first column is aligned to the right and padded with 'x' characters. After the first 12 bytes, a gap exists (the third column starts at byte 17), and a date format is specified for the fourth column.

(SIZE=8 PADDING='x' ALIGN=RIGHT,
 SIZE=4,
 START=17 SIZE=8,
 SIZE=32 FORMAT='DD-MM-YYYY')
file_opts
ENCODING Encoding of the CSV or FBV files (default: UTF8). For information on the supported encodings, refer to the Supported Encodings for ETL Processes section.
ROW SEPARATOR

Line break character: 

  • 'LF' (default): Corresponds to the ASCII character 0x0a (Unix)
  • 'CR': Corresponds to the ASCII character 0x0d (Macintosh)
  • 'CRLF': Corresponds to the ASCII characters 0x0d and 0x0a (Windows)
  • 'NONE': No line break. This is only allowed in FBV files.
NULL

Additional representation of NULL values. This option is only supported for CSV files and only applies to fields not enclosed in field delimiters. Regardless of this option, an empty string in the input data always represents a NULL value.

SKIP

Number of rows that will be omitted. This can be useful if the data files include header information. SKIP corresponds to the number of line breaks (ROW SEPARATOR), even if they occur inside data.

TRIM, LTRIM, RTRIM

Defines whether spaces are trimmed at the borders of CSV columns (LTRIM: from the left, RTRIM: from the right, TRIM: from both sides). By default, no spaces are trimmed.

COLUMN SEPARATOR

Defines the field separator for CSV files. In the default case, the comma (,) is used. You can specify any string, either as plain text (for example, ','), as a hexadecimal value (for example, '0x09') or as an abbreviation (one of 'NUL', 'TAB', 'LF', 'CR', 'ESC'). A plain text value is limited to 10 characters, which will be automatically converted to the file's specified ENCODING (see above). A hexadecimal value is limited to 10 bytes (not characters) and will not be converted.

COLUMN DELIMITER

Defines the field delimiter for CSV files. By default, the double quote (") is used. You can specify any string, either as plain text ('"'), as a hexadecimal value ('0x09'), or as an abbreviation ('NUL', 'TAB', 'LF', 'CR', 'ESC'). A plain text value is limited to 10 characters and will be automatically converted to the file's specified ENCODING. A hexadecimal value is limited to 10 bytes (not characters) and will not be converted. If you do not want to use any field delimiter, you can define the empty string ('').

ROW SIZE

Applicable only for FBV files. If the last column of the FBV file is not used, then this value must be specified to recognize the end of a row. Otherwise, the end of a row is implicitly calculated by the last column defined in the IMPORT command. For example, in the case of (SIZE=4 START=5), it is assumed that one column is read with 4 bytes and that the row consists of 9 bytes overall.
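
Several of these file options can be combined in one statement. A hypothetical CSV import using a tab-separated, Latin-1 encoded file with a two-line header, an additional NULL representation, and trimmed fields:

```sql
-- Hypothetical server, credentials, and file name; the option values
-- illustrate ENCODING, COLUMN SEPARATOR, NULL, TRIM, and SKIP together.
IMPORT INTO target_table FROM CSV
       AT 'ftp://192.168.1.2/' USER 'agent_007' IDENTIFIED BY 'secret'
       FILE 'data.csv'
       ENCODING = 'ISO-8859-1'
       COLUMN SEPARATOR = 'TAB'
       NULL = 'NA'
       TRIM
       SKIP = 2;
```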

error_clause

This clause defines how many invalid rows of the source are allowed. For example, with REJECT LIMIT 5, the statement succeeds if there are at most five invalid rows and throws an exception at the sixth invalid row.

Additionally, you can write the faulty rows into a file (CSV but not FBV) or a local table within Exasol to process or analyze them later.

Table

For every faulty row, the following columns are created: row number, error message, expression (optional), truncated flag, and the actual data. The truncated flag indicates whether the data was truncated to the maximum string length.

CSV file For every faulty row, a comment row is created with row number, error message, and expression (optional), followed by the actual data row.

The optional expression can be specified for identification purposes in case you use the same error table or file multiple times. You could, for example, use CURRENT_TIMESTAMP for this.

Example

IMPORT INTO table_1 FROM CSV
       AT 'http://192.168.1.1:8080/' USER 'agent_007' IDENTIFIED BY 'secret'
       FILE 'tab1_part1.csv' FILE 'tab1_part2.csv'
       COLUMN SEPARATOR = ';'
       SKIP = 5;

CREATE CONNECTION my_fileserver
       TO 'ftp://192.168.1.2/' USER 'agent_007' IDENTIFIED BY 'secret';

IMPORT INTO table_2 FROM FBV
       AT my_fileserver
       FILE 'tab2_part1.fbv'
       (SIZE=8 PADDING='+' ALIGN=RIGHT,
        SIZE=4,
        SIZE=8,
        SIZE=32 FORMAT='DD-MM-YYYY' );

CREATE CONNECTION my_oracle
  TO '(DESCRIPTION =
         (ADDRESS_LIST = (ADDRESS =
                            (PROTOCOL = TCP)
                            (HOST = 192.168.0.25)(PORT = 1521)
                          )
         )
         (CONNECT_DATA = (SERVICE_NAME = orautf8))
       )';

IMPORT INTO table_3 (col1, col2, col4) FROM ORA
       AT my_oracle
       USER 'agent_008' IDENTIFIED BY 'secret'
       STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' '
       ERRORS INTO error_table (CURRENT_TIMESTAMP) REJECT LIMIT 10;

IMPORT INTO table_4 FROM JDBC
       AT 'jdbc:exa:192.168.6.11..14:8563'
       USER 'agent_008' IDENTIFIED BY 'secret'
       STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' ';

IMPORT INTO table_5 FROM CSV
       AT 'http://HadoopNode:50070/webhdfs/v1/tmp'
       FILE 'file.csv?op=OPEN&user.name=user';

IMPORT INTO table_6 FROM CSV
       AT 'https://testbucket.s3.amazonaws.com'
       USER '<AccessKeyID>' IDENTIFIED BY '<SecretAccessKey>'
       FILE 'file.csv';

IMPORT INTO table_7 FROM EXA
       AT my_exasol
       TABLE MY_SCHEMA.MY_TABLE;

IMPORT INTO table_8 FROM SCRIPT etl.import_hcat_table
WITH   HCAT_DB = 'default'
       HCAT_TABLE = 'my_hcat_table'
       HCAT_ADDRESS = 'hcatalog-server:50111'
       HDFS_USER = 'hdfs';

IMPORT INTO table_9  
       FROM LOCAL CSV FILE '~/my_table.csv'
       COLUMN SEPARATOR = ';' SKIP = 5;