File Format and Details

For file processing, the Comma separated Value (CSV) and the FBV (Fix Block Value) file formats are supported which are described later on. For the best performance, you can specify an HTTP or FTP server, and even split the files into several parts to allow Exasol to read/write the data in parallel.

With the IMPORT command you can load CSV / FBV files directly from your local file system or from via FTP(s), SFTP and HTTP(s) servers or from S3 or Hadoop.

For the scenario where integration with an external bulk loading tool is required, you can use the exajload tool that is available as a part of the JDBC driver package. exajload is a bulk loading tool for local files. You can use it if you want to start a loading job by scripts without opening a JDBC connection. To know about the available options, run command ./exajload -help.

CSV Data Format

Exasol supported CSV format is described below.

Formatting rules for data records

Comments A data record which has a "#" (sharp) as its first character is ignored. Comments, header, and footer can be used in this manner.
Separation of the fields

The fields within a data record are separated with the field separator.

John, Doe,120 Any St., Anytown
Row separation

The data records are separated from one another with the row separator. Each row is equal to one data record.

John, Doe,120 any str., Anytown, 12a 
John, Doe,120 any str., Anytown, 12b

Formatting rules within a field for data presentation

Spacing characters

Spacing character at the beginning and/or the end of a field can optionally be trimmed by specifying the TRIM option in the IMPORT statement. In case of the option RTRIM, the data

John , Doe ,120 Any St.

is presented as

John, Doe,120 Any St.
NULL value

NULL is represented as an empty data field. Therefore,

John,,Any St.

is interpreted as

John,NULL,Any St.

Note: Empty strings are interpreted as NULL values in the database.

Row separation

The data records are separated from one another with the row separator. Each row is equal to one data record.

John, Doe,120 any str., Anytown, 12a 
John, Doe,120 any str., Anytown, 12b
Numbers

Numbers can be used in floating point or exponential notation. Optionally you can use a format (for more information, see Numeric Format Models ). You can also consider the settings for the session parameter NLS_NUMERIC_CHARACTERS.

For example,

John, Doe,120 Any St.,123,12.3,-1.23,-1.21E+10,4.1E-12
Timestamp/Date If no explicit format (see Date/Time Format Models ) is specified, then the current default formats are used that are defined by the session parameters NLS_DATE_FORMAT and NLS_TIMESTAMP_FORMAT. You can also consider the session parameter NLS_DATE_LANGUAGE for certain format elements.
Boolean value Boolean values are TRUE or FALSE. Those values can be represented by different values. For example, 0 for FALSE or T for TRUE (For more information, see Boolean Data Type).
Special characters

If the field separator, field delimiter or row separator occurs within the data, the affected field must be enclosed in the field delimiter. To contain the field delimiter itself in the data, write the delimiter twice consecutively.

Example (field separator: comma, field delimiter: double quotes):

Conference room 1,"John, ""please"" call me back!",""

The above is read as a data record with three fields:

Field #1: Conference room 1

Field #2: John, "please" call me back!

Field #3: Empty string which corresponds to a NULL value within the database.

The same example without using the field delimiter will result in an error:

Conference room 1,John, "please" call me back!,""

Fixblock Data format (FBV)

Exasol supports text files with a fixed byte structure also known as Fix Block Value (FBV). This format has fixed width for the columns and it's easier to parse. A delimiter between two values is not required.

Column width A fixed number of bytes is used per column. If the size of the content is smaller than the column width, the content is supplemented with the specified padding characters.
Column alignment

With left alignment the content is at the beginning of the column and the padding characters follow.

With right alignment the content is at the end of the column and is preceded by the padding characters.

Row separation The linefeed character optionally follows at the end of a data record.
NULL values To realize NULL values , a column is written across the entire length with padding characters.
Numbers Numbers are stored in a form readable to humans, whereby you can import floating point and scientific notation numbers. However, the column width must be maintained and padding characters used as necessary.
Explicit formats Optionally, you can specify a format for numbers or datetime values. For more information, see Numeric Format Models and Date/Time Format Models

Supported Encodings for ETL Processes

In this section supported encoding for ETL processes through IMPORT and EXPORT are listed.

Encoding Aliases

ASCII

US-ASCII, US, ISO-IR-6, ANSI_X3.4-1968, ANSI_X3.4-1986, ISO_646.IRV:1991, ISO646- US, IBM367, IBM-367, CP367, CP-367, 367

ISO-8859-1

ISO8859-1, ISO88591, LATIN-1, LATIN1, L1, ISO-IR-100, ISO_8859-1:1987, ISO_8859- 1, IBM819, IBM-819, CP819, CP-819, 819

ISO-8859-2

ISO8859-2, ISO88592, LATIN-2, LATIN2, L2, ISO-IR-101, ISO_8859-2:1987, ISO_8859- 2

ISO-8859-3

ISO8859-3, ISO88593, LATIN-3, LATIN3, L3, ISO-IR-109, ISO_8859-3:1988, ISO_8859- 3

ISO-8859-4

ISO8859-4, ISO88594, LATIN-4, LATIN4, L4, ISO-IR-110, ISO_8859-4:1988, ISO_8859- 4

ISO-8859-5

ISO8859-5, ISO88595, CYRILLIC, ISO-IR-144, ISO_8859-5:1988, ISO_8859-5

ISO-8859-6

ISO8859-6, ISO88596, ARABIC, ISO-IR-127, ISO_8859-6:1987, ISO_8859-6, ECMA- 114, ASMO-708

ISO-8859-7

ISO8859-7, ISO88597, GREEK, GREEK8, ISO-IR-126, ISO_8859-7:1987, ISO_8859-7, ELOT_928, ECMA-118

ISO-8859-8

ISO8859-8, ISO88598, HEBREW, ISO-IR-138, ISO_8859-8:1988, ISO_8859-8

ISO-8859-9

ISO8859-9, ISO88599, LATIN-5, LATIN5, L5, ISO-IR-148, ISO_8859-9:1989, ISO_8859- 9

ISO-8859-11

ISO8859-11, ISO885911

ISO-8859-13

ISO8859-13, ISO885913, LATIN-7, LATIN7, L7, ISO-IR-179

ISO-8859-15

ISO8859-15, ISO885915, LATIN-9, LATIN9, L9

IBM850

IBM-850, CP850, CP-850, 850

IBM852

IBM-852, CP852, CP-852, 852

IBM855

IBM-855, CP855, CP-855, 855

IBM856

IBM-856, CP856, CP-856, 856

IBM857

IBM-857, CP857, CP-857, 857

IBM860

IBM-860, CP860, CP-860, 860

IBM861

IBM-861, CP861, CP-861, 861, CP-IS

IBM862

IBM-862, CP862, CP-862, 862

IBM863

IBM-863, CP863, CP-863, 863

IBM864

IBM-864, CP864, CP-864, 864

IBM865

IBM-865, CP865, CP-865, 865

IBM866

IBM-866, CP866, CP-866, 866

IBM868

IBM-868, CP868, CP-868, 868, CP-AR

IBM869

IBM-869, CP869, CP-869, 869, CP-GR

WINDOWS-1250

CP1250, CP-1250, 1250, MS-EE

WINDOWS-1251

CP1251, CP-1251, 1251, MS-CYRL

WINDOWS-1252

CP1252, CP-1252, 1252, MS-ANSI

WINDOWS-1253

CP1253, CP-1253, 1253, MS-GREEK

WINDOWS-1254

CP1254, CP-1254, 1254, MS-TURK

WINDOWS-1255

CP1255, CP-1255, 1255, MS-HEBR

WINDOWS-1256

CP1256, CP-1256, 1256, MS-ARAB

WINDOWS-1257

CP1257, CP-1257, 1257, WINBALTRIM

WINDOWS-1258

CP1258, CP-1258, 1258

WINDOWS-874

CP874, CP-874, 874, IBM874, IBM-874

WINDOWS-31J

WINDOWS-932, CP932, CP-932, 932

WINDOWS-936

CP936, CP-936, 936, GBK, MS936, MS-936

CP949

WINDOWS-949, CP-949, 949

BIG5

WINDOWS-950, CP950, CP-950, 950, BIG, BIG5, BIG-5, BIG-FIVE, BIGFIVE, CN- BIG5, BIG5-CP950

SHIFT-JIS

SJIS

UTF-8

UTF-8, ISO10646/UTF8