File Format and Details
This article describes the file formats that are supported when loading data from files into an Exasol database.
Exasol supports loading data from files in comma-separated value (CSV) and fixed block value (FBV) formats. You can use the IMPORT command to load CSV/FBV files directly from your local file system or from a remote file storage service. To improve performance when loading from a remote storage service, you can split the files into several parts to allow Exasol to read/write the data in parallel.
If integration with an external bulk loading tool is required, you can use the command-line tool exajload, which is available as a part of the JDBC driver package. You can use this tool to start a loading job using a script without opening a JDBC connection. To learn more about the available options in exajload, run the command ./exajload -help
(Linux) or java -jar ./exajload.jar -help
(Windows). For additional information and examples on how to use EXAjload, see EXAjload: advanced techniques.
CSV data format
Formatting rules for data records
Comments | A data record which has a "#" (hash) as its first character is ignored. Comments, header, and footer can be added this way. |
Separation of the fields |
The fields within a data record are separated with the field separator. John, Doe,120 Any St., Anytown |
Row separation |
The data records are separated from one another with the row separator. Each row is equal to one data record. John, Doe,120 any str., Anytown, 12a John, Doe,120 any str., Anytown, 12b |
Formatting rules within a field for data presentation
Spacing characters |
Spacing character at the beginning and/or the end of a field can optionally be trimmed by specifying the TRIM option in the IMPORT statement. In case of the option RTRIM, the data John , Doe ,120 Any St. is presented as John, Doe,120 Any St. |
NULL value |
NULL is represented as an empty data field. Therefore, John,,Any St. is interpreted as John,NULL,Any St. Note: Empty strings are interpreted as NULL values in the database. |
Row separation |
The data records are separated from one another with the row separator. Each row is equal to one data record. John, Doe,120 any str., Anytown, 12a John, Doe,120 any str., Anytown, 12b |
Numbers |
Numbers can be used in floating point or exponential notation. Optionally you can use a format (for more information, see Numeric format models ). You can also consider the settings for the session parameter For example, John, Doe,120 Any St.,123,12.3,-1.23,-1.21E+10,4.1E-12 |
Timestamp/Date | If no explicit format (see Date/time format models ) is specified, then the current default formats are used that are defined by the session parameters NLS_DATE_FORMAT and NLS_TIMESTAMP_FORMAT . You can also consider the session parameter NLS_DATE_LANGUAGE for certain format elements. |
Boolean value | Boolean values are TRUE or FALSE . Those values can be represented by different values. For example, 0 for FALSE or T for TRUE (For more information, see Boolean data type). |
Special characters |
If the field separator, field delimiter or row separator occurs within the data, the affected field must be enclosed in the field delimiter. To contain the field delimiter itself in the data, write the delimiter twice consecutively. Example (field separator: comma, field delimiter: double quotes): Conference room 1,"John, ""please"" call me back!","" The above is read as a data record with three fields: Field #1: Field #2: Field #3: Empty string which corresponds to a NULL value within the database. The same example without using the field delimiter will result in an error: Conference room 1,John, "please" call me back!,"" |
FBV data format
Exasol supports text files with a fixed byte structure, also known as fixed block value (FBV). This format uses a fixed width for the columns. A delimiter between two values is therefore not required.
Column width | A fixed number of bytes is used per column. If the size of the content is smaller than the column width, the content is supplemented with the specified padding characters. |
Column alignment |
With left alignment the content is at the beginning of the column and the padding characters follow. With right alignment the content is at the end of the column and is preceded by the padding characters. |
Row separation | The linefeed character optionally follows at the end of a data record. |
NULL values | To realize NULL values , a column is written across the entire length with padding characters. |
Numbers | Numbers are stored in a form readable to humans, whereby you can import floating point and scientific notation numbers. However, the column width must be maintained and padding characters used as necessary. |
Explicit formats | Optionally, you can specify a format for numbers or datetime values. For more information, see Numeric format models and Date/time format models |
Supported encodings for ETL processes
This table lists the encodings that are supported for ETL processes through IMPORT and EXPORT.
Encoding | Aliases |
---|---|
ASCII |
US-ASCII, US, ISO-IR-6, ANSI_X3.4-1968, ANSI_X3.4-1986, ISO_646.IRV:1991, ISO646- US, IBM367, IBM-367, CP367, CP-367, 367 |
ISO-8859-1 |
ISO8859-1, ISO88591, LATIN-1, LATIN1, L1, ISO-IR-100, ISO_8859-1:1987, ISO_8859- 1, IBM819, IBM-819, CP819, CP-819, 819 |
ISO-8859-2 |
ISO8859-2, ISO88592, LATIN-2, LATIN2, L2, ISO-IR-101, ISO_8859-2:1987, ISO_8859- 2 |
ISO-8859-3 |
ISO8859-3, ISO88593, LATIN-3, LATIN3, L3, ISO-IR-109, ISO_8859-3:1988, ISO_8859- 3 |
ISO-8859-4 |
ISO8859-4, ISO88594, LATIN-4, LATIN4, L4, ISO-IR-110, ISO_8859-4:1988, ISO_8859- 4 |
ISO-8859-5 |
ISO8859-5, ISO88595, CYRILLIC, ISO-IR-144, ISO_8859-5:1988, ISO_8859-5 |
ISO-8859-6 |
ISO8859-6, ISO88596, ARABIC, ISO-IR-127, ISO_8859-6:1987, ISO_8859-6, ECMA- 114, ASMO-708 |
ISO-8859-7 |
ISO8859-7, ISO88597, GREEK, GREEK8, ISO-IR-126, ISO_8859-7:1987, ISO_8859-7, ELOT_928, ECMA-118 |
ISO-8859-8 |
ISO8859-8, ISO88598, HEBREW, ISO-IR-138, ISO_8859-8:1988, ISO_8859-8 |
ISO-8859-9 |
ISO8859-9, ISO88599, LATIN-5, LATIN5, L5, ISO-IR-148, ISO_8859-9:1989, ISO_8859- 9 |
ISO-8859-11 |
ISO8859-11, ISO885911 |
ISO-8859-13 |
ISO8859-13, ISO885913, LATIN-7, LATIN7, L7, ISO-IR-179 |
ISO-8859-15 |
ISO8859-15, ISO885915, LATIN-9, LATIN9, L9 |
IBM850 |
IBM-850, CP850, CP-850, 850 |
IBM852 |
IBM-852, CP852, CP-852, 852 |
IBM855 |
IBM-855, CP855, CP-855, 855 |
IBM856 |
IBM-856, CP856, CP-856, 856 |
IBM857 |
IBM-857, CP857, CP-857, 857 |
IBM860 |
IBM-860, CP860, CP-860, 860 |
IBM861 |
IBM-861, CP861, CP-861, 861, CP-IS |
IBM862 |
IBM-862, CP862, CP-862, 862 |
IBM863 |
IBM-863, CP863, CP-863, 863 |
IBM864 |
IBM-864, CP864, CP-864, 864 |
IBM865 |
IBM-865, CP865, CP-865, 865 |
IBM866 |
IBM-866, CP866, CP-866, 866 |
IBM868 |
IBM-868, CP868, CP-868, 868, CP-AR |
IBM869 |
IBM-869, CP869, CP-869, 869, CP-GR |
WINDOWS-1250 |
CP1250, CP-1250, 1250, MS-EE |
WINDOWS-1251 |
CP1251, CP-1251, 1251, MS-CYRL |
WINDOWS-1252 |
CP1252, CP-1252, 1252, MS-ANSI |
WINDOWS-1253 |
CP1253, CP-1253, 1253, MS-GREEK |
WINDOWS-1254 |
CP1254, CP-1254, 1254, MS-TURK |
WINDOWS-1255 |
CP1255, CP-1255, 1255, MS-HEBR |
WINDOWS-1256 |
CP1256, CP-1256, 1256, MS-ARAB |
WINDOWS-1257 |
CP1257, CP-1257, 1257, WINBALTRIM |
WINDOWS-1258 |
CP1258, CP-1258, 1258 |
WINDOWS-874 |
CP874, CP-874, 874, IBM874, IBM-874 |
WINDOWS-31J |
WINDOWS-932, CP932, CP-932, 932 |
WINDOWS-936 |
CP936, CP-936, 936, GBK, MS936, MS-936 |
CP949 |
WINDOWS-949, CP-949, 949 |
BIG5 |
WINDOWS-950, CP950, CP-950, 950, BIG, BIG5, BIG-5, BIG-FIVE, BIGFIVE, CN- BIG5, BIG5-CP950 |
SHIFT-JIS |
SJIS |
UTF-8 |
UTF-8, ISO10646/UTF8 |