IMPORT
Purpose
Use the IMPORT command to transfer data from external data sources into Exasol. If you specify a table name, the imported data is inserted into that table. Otherwise, the data is returned as a result set.
Prerequisites
- In the source system, you need to have privileges to read the table contents or the files.
- In Exasol, you need the system privilege IMPORT as well as the INSERT privilege on the target table to insert rows into it.
- When using a connection, you need the system privilege USE ANY CONNECTION, or the connection must be granted to the user or to one of the user's roles with the GRANT statement (see the example after this list). For additional information, refer to the CREATE CONNECTION statement.
- When using an error table, you need the appropriate rights for writing or inserting data.
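For instance, granting an existing connection so that a user can reference it in IMPORT might look like the following sketch (the connection and user names are placeholders):
GRANT CONNECTION my_fileserver TO my_user;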
Syntax
import::=
import_columns::=
dbms_src::=
file_src::=
connection_def::=
cloud_connection_def::=
csv_cols::=
fbv_cols::=
file_opts::=
error_clause::=
reject_clause::=
error_dst::=
script_src::=
Usage Notes
- The progress of the data transfer can be monitored in the system table EXA_USER_SESSIONS (column ACTIVITY) from a second database connection, as shown in the example after these notes.
- Import statements can also be used within SELECT queries. For more information, refer to SELECT statement in the Query Language (DQL) section.
- In case of an IMPORT from JDBC or CSV sources, decimals are truncated if the target data type has less precision than the source data type.
- Lines starting with # (hash) are ignored. For additional information about formatting rules for data records, refer to File Format and Details.
- For additional information about ETL processes, refer to the ETL in Exasol section.
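A minimal sketch of such a progress check, run from a second session while the IMPORT is active (the SESSION_ID column is assumed in addition to the ACTIVITY column mentioned above):
SELECT session_id, activity
FROM exa_user_sessions;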
The following overview describes the different elements of the IMPORT command and their meaning:
dbms_src

Defines the database source whose connection data is specified in the connection_def. You can choose between an Exasol connection (EXA), a native connection to an Oracle database (ORA), or a JDBC connection to any database (JDBC).

The source data can either be a database table given as an identifier (for example, MY_SCHEMA.MY_TABLE) or a database statement given as a string (for example, 'SELECT * FROM DUAL'). In the latter case, the expression is executed on the source database, for example a SQL query or a procedure call.

When using the TABLE syntax (as opposed to STATEMENT), the table name identifier is treated like an Exasol table name. If the remote system expects case-sensitive syntax, you must use quote marks to delimit the table name.

To achieve optimal parallelization:
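Referring to the TABLE syntax described above, the following sketch imports from a remote table whose name is case sensitive (the connection name and identifiers are placeholders):
IMPORT INTO table_1 FROM JDBC
AT my_jdbc_conn
TABLE "MySchema"."CaseSensitiveTable";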
file_src

Specifies the data file source, which can be remote files or local files. The source files can either be CSV or FBV files and should comply with the format specifications in the CSV Data Format and the Fixblock Data Format (FBV). File names may only consist of ASCII characters. A BOM (byte order mark) is not supported.

Compressed files are recognized by their file extension. Supported extensions are .zip, .gz (gzip), and .bz2 (bzip2). When System.in is specified as the file name, data is read from the standard input stream (System.in).

Remote Files: FTP, FTPS, SFTP, HTTP, and HTTPS servers are supported, whose connection data is defined through the connection_def. The following are some of the considerations when using a remote data file source:

Local Files: You can also import local files from your client system. For importing local files, the JDBC driver opens an internal connection to the cluster and provides an HTTP or HTTPS (SECURE option) server. When specifying the SECURE option, the data is transferred encrypted, but with slower performance. This functionality is only supported for EXAplus and the JDBC driver. It cannot be used in prepared statements or within database scripts.

Cloud Storage Service: You can import files from a cloud storage service. See cloud_connection_def.
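As an illustration of the extension-based compression handling described above, the following sketch imports a gzip-compressed CSV file from an HTTP server (server address and file name are placeholders):
IMPORT INTO table_1 FROM CSV
AT 'http://192.168.1.1:8080/'
FILE 'tab1_part1.csv.gz'
COLUMN SEPARATOR = ';';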
script_src

Specifies the UDF script to be used for a user-defined import. Optionally, you can define a connection or properties that are forwarded to the script. The specified script internally generates an SQL statement that does the actual import using SELECT. The script implements a special callback function which receives the import specification (for example, parameters and connection information) and returns an SQL statement. For more information, refer to the User-defined IMPORT using UDFs section.

connection_def: Optional connection definition to encapsulate connection information such as the password. For more information, refer to the connection_def.

WITH parameter=value...: Optional parameters to be passed to the script. Each script can define the mandatory and optional parameters it supports. Parameters are simple key-value pairs, with the value being a string. For example: ... WITH PARAM_1='val1' PARAM_2 = 'val2';
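A sketch of a user-defined import that passes both a connection and script parameters (the script name, connection name, and parameter names are placeholders):
IMPORT INTO table_1 FROM SCRIPT etl.my_import_script
AT my_connection
WITH PARAM_1 = 'val1' PARAM_2 = 'val2';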
connection_def

Defines the connection to the external database or file server. This can be specified as a connection string (for example, 'ftp://192.168.1.1/') with the corresponding login information. For regular ETL jobs, you can also make use of connections, where the connection data such as user name and password can easily be encapsulated. For more information, refer to the CREATE CONNECTION section in Access Control Using SQL (DCL).

The declaration of user name and password within the IMPORT command is optional. If they are omitted, the data from the connection string or the connection object is used.
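Credentials given directly in the IMPORT statement are used instead of those stored in the connection object; a minimal sketch (connection name and credentials are placeholders):
IMPORT INTO table_2 FROM CSV
AT my_fileserver USER 'agent_009' IDENTIFIED BY 'other_secret'
FILE 'tab2.csv';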
cloud_connection_def

Defines a connection to a cloud storage service. This can be specified as a connection string with the corresponding authentication information. The connection string format may vary by cloud service. Example:
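A sketch using the Amazon S3 connection string format shown in the examples below (the connection name, bucket name, region, and credentials are placeholders):
CREATE CONNECTION my_s3
TO 'https://<bucketname>.s3-<region>.amazonaws.com/'
USER '<AccessKeyID>' IDENTIFIED BY '<SecretAccessKey>';
IMPORT INTO table_1 FROM CSV
AT my_s3
FILE 'file.csv';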
csv_cols

Defines which columns of the CSV files are loaded and how they are interpreted. For more information, refer to the CSV Data Format section.

Example: In this example, the first four columns of the CSV file are loaded, and the fourth column uses the specified date format.

(1..3,4 FORMAT='DD-MM-YYYY')
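Embedded in a complete statement, the csv_cols definition above might be used like the following sketch (connection and file name are placeholders):
IMPORT INTO table_1 FROM CSV
AT my_fileserver
FILE 'tab1.csv'
(1..3, 4 FORMAT='DD-MM-YYYY')
COLUMN SEPARATOR = ';';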
fbv_cols

Defines which columns of the FBV files are loaded and how they are interpreted. For more information, refer to the Fixblock Data Format section. The following elements can be specified in an FBV file:

Example: In this example, four columns are imported from the FBV file. The first column is aligned to the right and padded with “x” characters. After the first 12 bytes there is a gap, and the fourth column has the date format specified for the FBV file.
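A sketch matching that description; the column sizes, the START offset used to express the gap, and the connection and file names are assumptions for illustration:
IMPORT INTO table_2 FROM FBV
AT my_fileserver
FILE 'tab2.fbv'
(SIZE=8 PADDING='x' ALIGN=RIGHT,
SIZE=4,
START=16 SIZE=8,
SIZE=32 FORMAT='DD-MM-YYYY');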
file_opts

Boolean values '1/0', 'TRUE/FALSE', 'true/false', 'True/False', 'T/F', 't/f', 'y/n', 'Y/N', 'yes/no', 'Yes/No', 'YES/NO' are automatically accepted when inserted into a boolean column. For more information, see EXPORT.
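As in the examples further below, file options such as COLUMN SEPARATOR and SKIP are appended after the file list; a minimal sketch (connection and file name are placeholders):
IMPORT INTO table_1 FROM CSV
AT my_fileserver
FILE 'tab1.csv'
COLUMN SEPARATOR = ';'
SKIP = 1;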
error_clause

This clause defines how many invalid rows of the source are allowed. For example, with REJECT LIMIT 5 the statement works fine as long as there are at most five invalid rows, and throws an exception on the sixth invalid row. The exact row which causes the exception is non-deterministic and may vary. Additionally, you can write the faulty rows into a file (CSV, but not FBV) or a local table within Exasol to process or analyze them later.

The optional expression can be specified for identification purposes in case you use the same error table or file multiple times. You could also use CURRENT_TIMESTAMP for this. Constraint violation errors throw an exception even if the REJECT LIMIT has not been reached.
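For instance, rejected rows can be collected in a local error table and tagged with the statement's timestamp, as in this sketch (table, connection, and file names are placeholders):
IMPORT INTO table_1 FROM CSV
AT my_fileserver
FILE 'tab1.csv'
ERRORS INTO error_table (CURRENT_TIMESTAMP) REJECT LIMIT 5;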
import_columns

Instead of importing data into a table by specifying the name of the table, you can specify a list of columns to perform a temporary import. This way, the imported data is not stored persistently in Exasol but returned as a result set. You can use the LIKE clause to specify that the output columns are the same as the columns of an existing table in the Exasol database.
Examples
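-- import two CSV files from an HTTP server, using ';' as column separator and skipping the first five rows: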
IMPORT INTO table_1 FROM CSV
AT 'http://192.168.1.1:8080/' USER 'agent_007' IDENTIFIED BY 'secret'
FILE 'tab1_part1.csv' FILE 'tab1_part2.csv'
COLUMN SEPARATOR = ';'
SKIP = 5;
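-- create a connection to an FTP server and import an FBV file with four columns through it: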
CREATE CONNECTION my_fileserver
TO 'ftp://192.168.1.2/' USER 'agent_007' IDENTIFIED BY 'secret';
IMPORT INTO table_2 FROM FBV
AT my_fileserver
FILE 'tab2_part1.fbv'
(SIZE=8 PADDING='+' ALIGN=RIGHT,
SIZE=4,
SIZE=8,
SIZE=32 FORMAT='DD-MM-YYYY' );
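-- create an Oracle connection and import the result of a query into three columns of table_3, allowing at most 10 invalid rows: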
CREATE CONNECTION my_oracle
TO '(DESCRIPTION =
(ADDRESS_LIST = (ADDRESS =
(PROTOCOL = TCP)
(HOST = 192.168.0.25)(PORT = 1521)
)
)
(CONNECT_DATA = (SERVICE_NAME = orautf8))
)';
IMPORT INTO table_3 (col1, col2, col4) FROM ORA
AT my_oracle
USER 'agent_008' IDENTIFIED BY 'secret'
STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' '
ERRORS INTO error_table (CURRENT_TIMESTAMP) REJECT LIMIT 10;
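-- import the result of a query from a SQL Server database via JDBC: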
IMPORT INTO table_4 FROM JDBC DRIVER='MSSQL'
AT 'jdbc:sqlserver://dbserver;databaseName=testdb'
USER 'agent_008' IDENTIFIED BY 'secret'
STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' ';
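-- import a CSV file from Hadoop via WebHDFS: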
IMPORT INTO table_5 FROM CSV
AT 'http://HadoopNode:50070/webhdfs/v1/tmp'
FILE 'file.csv?op=OPEN&user.name=user';
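-- user-defined import through a UDF script, passing script-specific parameters: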
IMPORT INTO table_7 FROM SCRIPT etl.import_hcat_table
WITH HCAT_DB = 'default'
HCAT_TABLE = 'my_hcat_table'
HCAT_ADDRESS = 'hcatalog-server:50111'
HDFS_USER = 'hdfs';
-- getting a result set using IMPORT (which can also be used as a sub-select):
SELECT * FROM (
IMPORT INTO (i INT, v VARCHAR(200)) FROM EXA
AT my_exasol
TABLE MY_SCHEMA.MY_TABLE
);
-- result set IMPORT without INTO clause:
IMPORT FROM JDBC
AT my_jdbc_conn
STATEMENT ' SELECT * FROM orders WHERE order_state=''OK'' ';
-- result set IMPORT with INTO and LIKE clause:
IMPORT INTO (LIKE CAT) FROM JDBC
AT my_exa_conn
STATEMENT ' SELECT OBJECT_NAME, OBJECT_TYPE FROM EXA_USER_OBJECTS WHERE OBJECT_TYPE IN (''TABLE'', ''VIEW'') ';
Import from an Amazon S3 bucket:
IMPORT INTO table_1 FROM CSV
AT 'https://<bucketname>.s3.amazonaws.com'
USER '<AccessKeyID>' IDENTIFIED BY '<SecretAccessKey>'
FILE 'file.csv';
Import from an Amazon S3 bucket using a fully qualified bucket URL:
An AWS bucket URL using the legacy global endpoint format (<bucketname>.s3.amazonaws.com) may need up to 24 hours after bucket creation before it becomes available. A fully qualified Amazon S3 URL that includes the AWS region (<bucketname>.s3-<region>.amazonaws.com) will work immediately.
IMPORT INTO table_1 FROM CSV
AT 'https://<bucketname>.s3-<region>.amazonaws.com/'
USER '<AccessKeyID>' IDENTIFIED BY '<SecretAccessKey>'
FILE 'file.csv';
Import from Azure Blob Storage: