File Format and Details

For file processing, the Comma separated Value (CSV) and the FBV (Fix Block Value) file formats are supported which are described later on. For the best performance, you can specify an HTTP or FTP server, and even split the files into several parts to allow Exasol to read/write the data in parallel.

With the IMPORT command you can load CSV / FBV files directly from your local file system or from via FTP(s), SFTP and HTTP(s) servers or from S3 or Hadoop.

For the scenario where integration with an external bulk loading tool is required, you can use the exajload tool that is available as a part of the JDBC driver package. exajload is a bulk loading tool for local files. You can use it if you want to start a loading job by scripts without opening a JDBC connection. To know about the available options, run command ./exajload -help.

CSV Data Format

Exasol supported CSV format is described below.

Formatting rules for data records

Comments A data record which has a "#" (sharp) as its first character is ignored. Comments, header, and footer can be used in this manner.
Separation of the fields

The fields within a data record are separated with the field separator.

John, Doe,120 Any St., Anytown
Row separation

The data records are separated from one another with the row separator. Each row is equal to one data record.

John, Doe,120 any str., Anytown, 12a 
John, Doe,120 any str., Anytown, 12b

Formatting rules within a field for data presentation

Spacing characters

Spacing character at the beginning and/or the end of a field can optionally be trimmed by specifying the TRIM option in the IMPORT statement. In case of the option RTRIM, the data

John , Doe ,120 Any St.

is presented as

John, Doe,120 Any St.
NULL value

NULL is represented as an empty data field. Therefore,

John,,Any St.

is interpreted as

John,NULL,Any St.

Note: Empty strings are interpreted as NULL values in the database.

Row separation

The data records are separated from one another with the row separator. Each row is equal to one data record.

John, Doe,120 any str., Anytown, 12a 
John, Doe,120 any str., Anytown, 12b

Numbers can be used in floating point or exponential notation. Optionally you can use a format (for more information, see Numeric Format Models ). You can also consider the settings for the session parameter NLS_NUMERIC_CHARACTERS.

For example,

John, Doe,120 Any St.,123,12.3,-1.23,-1.21E+10,4.1E-12
Timestamp/Date If no explicit format (see Date/Time Format Models ) is specified, then the current default formats are used that are defined by the session parameters NLS_DATE_FORMAT and NLS_TIMESTAMP_FORMAT. You can also consider the session parameter NLS_DATE_LANGUAGE for certain format elements.
Boolean value Boolean values are TRUE or FALSE. Those values can be represented by different values. For example, 0 for FALSE or T for TRUE (For more information, see Boolean Data Type).
Special characters

If the field separator, field delimiter or row separator occurs within the data, the affected field must be enclosed in the field delimiter. To contain the field delimiter itself in the data, write the delimiter twice consecutively.

Example (field separator: comma, field delimiter: double quotes):

Conference room 1,"John, ""please"" call me back!",""

The above is read as a data record with three fields:

Field #1: Conference room 1

Field #2: John, "please" call me back!

Field #3: Empty string which corresponds to a NULL value within the database.

The same example without using the field delimiter will result in an error:

Conference room 1,John, "please" call me back!,""

Fixblock Data format (FBV)

Exasol supports text files with a fixed byte structure also known as Fix Block Value (FBV). This format has fixed width for the columns and it's easier to parse. A delimiter between two values is not required.

Column width A fixed number of bytes is used per column. If the size of the content is smaller than the column width, the content is supplemented with the specified padding characters.
Column alignment

With left alignment the content is at the beginning of the column and the padding characters follow.

With right alignment the content is at the end of the column and is preceded by the padding characters.

Row separation The linefeed character optionally follows at the end of a data record.
NULL values To realize NULL values , a column is written across the entire length with padding characters.
Numbers Numbers are stored in a form readable to humans, whereby you can import floating point and scientific notation numbers. However, the column width must be maintained and padding characters used as necessary.
Explicit formats Optionally, you can specify a format for numbers or datetime values. For more information, see Numeric Format Models and Date/Time Format Models

Supported Encodings for ETL Processes

In this section supported encoding for ETL processes through IMPORT and EXPORT are listed.

Encoding Aliases


US-ASCII, US, ISO-IR-6, ANSI_X3.4-1968, ANSI_X3.4-1986, ISO_646.IRV:1991, ISO646- US, IBM367, IBM-367, CP367, CP-367, 367


ISO8859-1, ISO88591, LATIN-1, LATIN1, L1, ISO-IR-100, ISO_8859-1:1987, ISO_8859- 1, IBM819, IBM-819, CP819, CP-819, 819


ISO8859-2, ISO88592, LATIN-2, LATIN2, L2, ISO-IR-101, ISO_8859-2:1987, ISO_8859- 2


ISO8859-3, ISO88593, LATIN-3, LATIN3, L3, ISO-IR-109, ISO_8859-3:1988, ISO_8859- 3


ISO8859-4, ISO88594, LATIN-4, LATIN4, L4, ISO-IR-110, ISO_8859-4:1988, ISO_8859- 4


ISO8859-5, ISO88595, CYRILLIC, ISO-IR-144, ISO_8859-5:1988, ISO_8859-5


ISO8859-6, ISO88596, ARABIC, ISO-IR-127, ISO_8859-6:1987, ISO_8859-6, ECMA- 114, ASMO-708


ISO8859-7, ISO88597, GREEK, GREEK8, ISO-IR-126, ISO_8859-7:1987, ISO_8859-7, ELOT_928, ECMA-118


ISO8859-8, ISO88598, HEBREW, ISO-IR-138, ISO_8859-8:1988, ISO_8859-8


ISO8859-9, ISO88599, LATIN-5, LATIN5, L5, ISO-IR-148, ISO_8859-9:1989, ISO_8859- 9


ISO8859-11, ISO885911


ISO8859-13, ISO885913, LATIN-7, LATIN7, L7, ISO-IR-179


ISO8859-15, ISO885915, LATIN-9, LATIN9, L9


IBM-850, CP850, CP-850, 850


IBM-852, CP852, CP-852, 852


IBM-855, CP855, CP-855, 855


IBM-856, CP856, CP-856, 856


IBM-857, CP857, CP-857, 857


IBM-860, CP860, CP-860, 860


IBM-861, CP861, CP-861, 861, CP-IS


IBM-862, CP862, CP-862, 862


IBM-863, CP863, CP-863, 863


IBM-864, CP864, CP-864, 864


IBM-865, CP865, CP-865, 865


IBM-866, CP866, CP-866, 866


IBM-868, CP868, CP-868, 868, CP-AR


IBM-869, CP869, CP-869, 869, CP-GR


CP1250, CP-1250, 1250, MS-EE


CP1251, CP-1251, 1251, MS-CYRL


CP1252, CP-1252, 1252, MS-ANSI


CP1253, CP-1253, 1253, MS-GREEK


CP1254, CP-1254, 1254, MS-TURK


CP1255, CP-1255, 1255, MS-HEBR


CP1256, CP-1256, 1256, MS-ARAB


CP1257, CP-1257, 1257, WINBALTRIM


CP1258, CP-1258, 1258


CP874, CP-874, 874, IBM874, IBM-874


WINDOWS-932, CP932, CP-932, 932


CP936, CP-936, 936, GBK, MS936, MS-936


WINDOWS-949, CP-949, 949


WINDOWS-950, CP950, CP-950, 950, BIG, BIG5, BIG-5, BIG-FIVE, BIGFIVE, CN- BIG5, BIG5-CP950




UTF-8, ISO10646/UTF8