CSV and FBV file formats and details

Learn about data formatting rules when loading data from CSV and FBV files.

You can use the IMPORT command to load files from your local file system or from a remote file storage service. To improve performance when loading from remote storage, you can split the files into several parts to allow Exasol to read/write the data in parallel.

This article describes the data formatting rules for comma-separated value (CSV) and fixed block value (FBV) files.

To learn about data formatting and options when importing Parquet files, see Importing Parquet files.

EXAjload

The native bulk loader in Exasol is normally the most efficient way to load data. If integration with an external bulk loading tool is desired you can use the Java command-line tool EXAjload, which is available as a part of the JDBC driver package.

You can use EXAjload to start a loading job using a script without opening a JDBC connection. To learn more, use ./exajload -help (Linux) or java -jar ./exajload.jar -help (Windows) after installing the JDBC driver.

For additional information and examples, see also EXAjload: advanced techniques.

CSV data format

Formatting rules for data records

Comments A data record which has a "#" (hash) as its first character is ignored. Comments, header, and footer can be added this way.
Separation of the fields

The fields within a data record are separated with the field separator.

John, Doe,120 Any St., Anytown
Row separation

The data records are separated from one another with the row separator. Each row is equal to one data record.

John, Doe,120 any str., Anytown, 12a 
John, Doe,120 any str., Anytown, 12b

Formatting rules within a field for data presentation

Spacing characters

Spacing character at the beginning and/or the end of a field can optionally be trimmed by specifying the TRIM option in the IMPORT statement. In case of the option RTRIM, the data

John , Doe ,120 Any St.

is presented as

John, Doe,120 Any St.
NULL value

NULL is represented as an empty data field. Therefore,

John,,Any St.

is interpreted as

John,NULL,Any St.

Note: Empty strings are interpreted as NULL values in the database.

Row separation

The data records are separated from one another with the row separator. Each row is equal to one data record.

John, Doe,120 any str., Anytown, 12a 
John, Doe,120 any str., Anytown, 12b
Numbers

Numbers can be used in floating point or exponential notation. Optionally you can use a format (for more information, see Numeric format models ). You can also consider the settings for the session parameter NLS_NUMERIC_CHARACTERS.

For example,

John, Doe,120 Any St.,123,12.3,-1.23,-1.21E+10,4.1E-12
Timestamp/Date If no explicit format (see Date/time format models ) is specified, then the current default formats are used that are defined by the session parameters NLS_DATE_FORMAT and NLS_TIMESTAMP_FORMAT. You can also consider the session parameter NLS_DATE_LANGUAGE for certain format elements.
Boolean value Boolean values are TRUE or FALSE. Those values can be represented by different values. For example, 0 for FALSE or T for TRUE (For more information, see Boolean data type).
Special characters

If the field separator, field delimiter or row separator occurs within the data, the affected field must be enclosed in the field delimiter. To contain the field delimiter itself in the data, write the delimiter twice consecutively.

Example (field separator: comma, field delimiter: double quotes):

Conference room 1,"John, ""please"" call me back!",""

The above is read as a data record with three fields:

Field #1: Conference room 1

Field #2: John, "please" call me back!

Field #3: Empty string which corresponds to a NULL value within the database.

The same example without using the field delimiter will result in an error:

Conference room 1,John, "please" call me back!,""

FBV data format

Exasol supports text files with a fixed byte structure, also known as fixed block value (FBV). This format uses a fixed width for the columns. A delimiter between two values is therefore not required.

Column width A fixed number of bytes is used per column. If the size of the content is smaller than the column width, the content is supplemented with the specified padding characters.
Column alignment

With left alignment the content is at the beginning of the column and the padding characters follow.

With right alignment the content is at the end of the column and is preceded by the padding characters.

Row separation The linefeed character optionally follows at the end of a data record.
NULL values To realize NULL values , a column is written across the entire length with padding characters.
Numbers Numbers are stored in a form readable to humans, whereby you can import floating point and scientific notation numbers. However, the column width must be maintained and padding characters used as necessary.
Explicit formats Optionally, you can specify a format for numbers or datetime values. For more information, see Numeric format models and Date/time format models