Use the IMPORT command to transfer data from external data sources into Exasol. You can specify a table name to insert the imported data into that table. Otherwise, the data is returned as a result set.
- In the source system, you need to have privileges to read the table contents or the files.
- In Exasol, you need the system privilege IMPORT, plus the INSERT privilege on the target table to insert rows into it.
- When using a connection, you need either the system privilege USE ANY CONNECTION, or the connection must be granted to the user or to one of the user's roles with the GRANT statement. For additional information, refer to the CREATE CONNECTION statement.
- When using an error table, you need the appropriate rights for writing or inserting data.
- The progress of the data transfer can be viewed in the system table EXA_USER_SESSIONS (column ACTIVITY) from a second connection.
- Import statements can also be used within SELECT queries. For more information, refer to SELECT statement in the Query Language (DQL) section.
- In case of an IMPORT from JDBC or CSV sources, decimals are truncated if the target data type has less precision than the source data type.
- For additional information about ETL processes, refer to the ETL Processes section.
The following table provides you with an overview of the different elements and their meaning in the IMPORT command:
Defines the database source whose connection data is specified in the connection_def. You can choose among an Exasol connection (EXA), a native connection to an Oracle database (ORA) or a JDBC connection to any database (JDBC).
Only the pre-installed JDBC drivers (marked gray in EXAoperation) are tested and officially supported. However, our support team will try to help you in case of problems with other drivers.
The source data can be either a database table (specified as an identifier, for example MY_SCHEMA.MY_TABLE) or a database statement (specified as a string, for example 'SELECT "TEST" FROM DUAL'). In the latter case, the expression is executed on the source database, for example a SQL query or a procedure call.
When using the TABLE syntax (as opposed to STATEMENT), the table name identifier is treated like an Exasol table name. Therefore, you must quote table names if the remote system expects case-sensitive syntax.
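As an illustration (the connection and table names are placeholders), quoting controls how the remote table name is resolved:

```sql
-- Unquoted names are treated like Exasol identifiers (case-insensitive):
IMPORT INTO table_a FROM ORA AT my_oracle TABLE MY_SCHEMA.MY_TABLE;

-- Quoted names preserve case for remote systems with case-sensitive syntax:
IMPORT INTO table_a FROM ORA AT my_oracle TABLE "MySchema"."MyTable";
```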
To achieve an optimal parallelization:
Specifies the data file source, which can be remote or local files. The source files can be either CSV or FBV files and must comply with the format specifications in the CSV Data Format and the Fixblock Data Format (FBV) sections. File names may only consist of ASCII characters. A BOM (Byte Order Mark) is not supported.
Compressed files are recognized by their file extension. Supported extensions are .zip, .gz (gzip), and .bz2 (bzip2). If System.in is specified as the file name, data is read from the standard input stream.
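As a sketch (the host and file names are assumptions, not part of the original examples), a compressed file is handled transparently based on its extension:

```sql
IMPORT INTO table_c FROM CSV
    AT 'http://192.168.1.1:8080/'
    FILE 'data.csv.gz';  -- decompressed automatically based on the .gz extension
```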
FTP, FTPS, SFTP, HTTP, and HTTPS servers are supported; their connection data is defined through the connection_def. Consider the following when using a remote data file source:
You can also import local files from your client system. When you specify the SECURE option, the data is transferred encrypted, but with slower performance.
This functionality is only supported for EXAplus and the JDBC driver. It can be used neither in prepared statements nor within database scripts. If you want to process a local file with your own program, you can use the tool EXAjload, which is delivered with the JDBC driver package. Execute this program without parameters to get information about its usage.
For importing local files, the JDBC driver opens an internal connection to the cluster and provides an HTTP or HTTPS (SECURE-Option) server.
Specifies the UDF script to be used for a user-defined import. Optionally, you can define a connection or properties which is forwarded to the script. The specified script will internally generate an SQL statement that does the actual import using INSERT INTO SELECT. The script implements a special callback function which receives the import specification (for example, parameters and connection information) and returns an SQL statement. For more information, refer to the User-defined IMPORT using UDFs section.
Optional connection definition to encapsulate connection information such as password. For more information, refer to the connection_def.
Optional parameters to be passed to the script. Each script can define the mandatory and optional parameters it supports. Parameters are simple key-value pairs, with the value being a string. For example:
... WITH PARAM_1='val1' PARAM_2 = 'val2';
Defines the connection to the external database or file server. This can be specified within a connection string (for example, 'ftp://192.168.1.1/') and the corresponding login information.
For regular ETL jobs, you can also make use of connections, where the connection data like user name and password can easily be encapsulated. For more information, refer to the CREATE CONNECTION section in Access Control Using SQL (DCL).
Specifying the user name and password within the IMPORT command is optional. If they are omitted, the data from the connection string or the connection object is used.
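The statements below sketch this behavior (the connection name and credentials are illustrative): when USER and IDENTIFIED BY are omitted from the IMPORT, the credentials stored in the connection object are used.

```sql
CREATE CONNECTION my_ftp
    TO 'ftp://192.168.1.2/'
    USER 'agent_007' IDENTIFIED BY 'secret';

-- Credentials omitted here, so the ones stored in my_ftp are used:
IMPORT INTO table_b FROM CSV AT my_ftp FILE 'data.csv';
```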
For JDBC connections, you can use Kerberos authentication by defining specific data in the IDENTIFIED BY field. This data consists of a key indicating that Kerberos authentication should be used (ExaAuthType=Kerberos), a base64-encoded configuration file, and a base64-encoded keytab file containing the credentials for the Kerberos principal.
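A rough sketch of such a definition, assuming the three parts described above are concatenated into the IDENTIFIED BY string (the placeholders and the exact delimiters are illustrative, not verified syntax):

```sql
IMPORT INTO table_k FROM JDBC
    AT 'jdbc:exa:192.168.6.11..14:8563'  -- hypothetical JDBC URL
    USER '<kerberos_principal>'
    IDENTIFIED BY 'ExaAuthType=Kerberos;<base64 krb5.conf>;<base64 keytab>';
```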
Defines which columns of the CSV file are imported and how they are interpreted. For more information, refer to the CSV Data Format section. For example:

(1..4, 5 FORMAT='DD-MM-YYYY')

In this example, the first four columns are loaded from the CSV file by their column numbers, and a date format is specified for the fifth column.
Defines which columns of the FBV file are imported and how they are interpreted. For more information, refer to the Fixblock Data Format section.
The following elements can be specified for each column of an FBV file: SIZE, START, FORMAT, ALIGN, and PADDING.
For example:

(SIZE=8 PADDING='x' ALIGN=RIGHT, SIZE=4, START=17 SIZE=8, SIZE=32 FORMAT='DD-MM-YYYY')

In this example, four columns are imported from the FBV file. The first column is aligned to the right and padded with x characters. A gap follows after the first 12 bytes, and a date format is specified for the fourth column.
This clause defines how many invalid rows of the source are allowed. For example, with REJECT LIMIT 5, the statement succeeds if there are at most five invalid rows and throws an exception when the sixth invalid row is encountered. The exact row that causes the exception is non-deterministic and may vary.
Additionally, you can write the faulty rows into a file (CSV but not FBV) or a local table within Exasol to process or analyze them later.
The optional expression can be specified for identification purposes in case you use the same error table or file multiple times. You can also use CURRENT_TIMESTAMP for this.
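Combining the pieces above (the table, server, and file names are illustrative), faulty rows can be diverted to an error table tagged with a timestamp while capping the number of invalid rows:

```sql
IMPORT INTO target_table FROM CSV
    AT 'http://192.168.1.1:8080/' FILE 'data.csv'
    ERRORS INTO my_error_table (CURRENT_TIMESTAMP)
    REJECT LIMIT 5;
```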
Instead of importing data into a table by specifying the table name, you can specify a list of columns to perform a temporary import. In this case, the imported data is not persisted in an Exasol table but returned as a result set. You can use the LIKE clause to make the output columns match those of an existing table in the Exasol database.
IMPORT INTO table_1 FROM CSV
       AT 'http://192.168.1.1:8080/'
       USER 'agent_007' IDENTIFIED BY 'secret'
       FILE 'tab1_part1.csv' FILE 'tab1_part2.csv'
       COLUMN SEPARATOR = ';'
       SKIP = 5;

CREATE CONNECTION my_fileserver
       TO 'ftp://192.168.1.2/'
       USER 'agent_007' IDENTIFIED BY 'secret';

IMPORT INTO table_2 FROM FBV
       AT my_fileserver
       FILE 'tab2_part1.fbv'
       (SIZE=8 PADDING='+' ALIGN=RIGHT,
        SIZE=4,
        SIZE=8,
        SIZE=32 FORMAT='DD-MM-YYYY');

CREATE CONNECTION my_oracle
       TO '(DESCRIPTION =
             (ADDRESS_LIST = (ADDRESS =
               (PROTOCOL = TCP)
               (HOST = 192.168.0.25)
               (PORT = 1521)))
             (CONNECT_DATA = (SERVICE_NAME = orautf8)))';

IMPORT INTO table_3 (col1, col2, col4) FROM ORA
       AT my_oracle
       USER 'agent_008' IDENTIFIED BY 'secret'
       STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' '
       ERRORS INTO error_table (CURRENT_TIMESTAMP)
       REJECT LIMIT 10;

IMPORT INTO table_4 FROM JDBC
       AT 'jdbc:exa:192.168.6.11..14:8563'
       USER 'agent_008' IDENTIFIED BY 'secret'
       STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' ';

IMPORT INTO table_5 FROM CSV
       AT 'http://HadoopNode:50070/webhdfs/v1/tmp'
       FILE 'file.csv?op=OPEN&user.name=user';

IMPORT INTO table_6 FROM EXA
       AT my_exasol
       TABLE MY_SCHEMA.MY_TABLE;

IMPORT INTO table_7 FROM SCRIPT etl.import_hcat_table
       WITH HCAT_DB = 'default'
            HCAT_TABLE = 'my_hcat_table'
            HCAT_ADDRESS = 'hcatalog-server:50111'
            HDFS_USER = 'hdfs';

IMPORT INTO table_8 FROM LOCAL CSV FILE '~/my_table.csv'
       COLUMN SEPARATOR = ';'
       SKIP = 5;
-- getting a result set using IMPORT (which can also be used as a sub-select):
SELECT * FROM (
    IMPORT INTO (i INT, v VARCHAR(200)) FROM EXA
           AT my_exasol
           TABLE MY_SCHEMA.MY_TABLE
);

-- result set IMPORT without INTO clause:
IMPORT FROM JDBC AT my_jdbc_conn
       STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' ';

-- result set IMPORT with INTO and LIKE clause:
IMPORT INTO (LIKE CAT) FROM JDBC AT my_exa_conn
       STATEMENT ' SELECT OBJECT_NAME, OBJECT_TYPE
                   FROM EXA_USER_OBJECTS
                   WHERE OBJECT_TYPE IN (''TABLE'', ''VIEW'') ';
You can use the following examples to import from an Amazon S3 bucket:
IMPORT INTO table_1 FROM CSV AT 'https://<bucketname>.s3.amazonaws.com' USER '<AccessKeyID>' IDENTIFIED BY '<SecretAccessKey>' FILE 'file.csv';
Or, with a fully qualified bucket URL:
IMPORT INTO table_1 FROM CSV AT 'https://<bucketname>.s3-<region>.amazonaws.com/' USER '<AccessKeyID>' IDENTIFIED BY '<SecretAccessKey>' FILE 'file.csv';