File Format and Details
For file processing, comma-separated value (CSV) and fix block value (FBV) file formats are supported. To improve performance you can specify an FTP/FTPS/HTTP/HTTPS server, and/or split the files into several parts to allow Exasol to read/write the data in parallel.
With the IMPORT command you can load CSV/FBV files directly from your local file system, from FTP/FTPS/HTTP/HTTPS servers, or from S3 or Hadoop.
If integration with an external bulk loading tool is required you can use the exajload tool, which is available as a part of the JDBC driver package. exajload is a bulk loading tool for local files. You can use it if you want to start a loading job by scripts without opening a JDBC connection. To learn more about the available options, run the command ./exajload -help for Linux environments or java -jar ./exajload.jar -help for Windows environments.
For more information about EXAjload including examples of use, see EXAjload: advanced techiniques.
CSV Data Format
Formatting rules for data records
Comments | A data record which has a "#" (hash) as its first character is ignored. Comments, header, and footer can be added this way. |
Separation of the fields |
The fields within a data record are separated with the field separator. John, Doe,120 Any St., Anytown |
Row separation |
The data records are separated from one another with the row separator. Each row is equal to one data record. John, Doe,120 any str., Anytown, 12a John, Doe,120 any str., Anytown, 12b |
Formatting rules within a field for data presentation
Spacing characters |
Spacing character at the beginning and/or the end of a field can optionally be trimmed by specifying the TRIM option in the IMPORT statement. In case of the option RTRIM, the data John , Doe ,120 Any St. is presented as John, Doe,120 Any St. |
NULL value |
NULL is represented as an empty data field. Therefore, John,,Any St. is interpreted as John,NULL,Any St. Note: Empty strings are interpreted as NULL values in the database. |
Row separation |
The data records are separated from one another with the row separator. Each row is equal to one data record. John, Doe,120 any str., Anytown, 12a John, Doe,120 any str., Anytown, 12b |
Numbers |
Numbers can be used in floating point or exponential notation. Optionally you can use a format (for more information, see Numeric Format Models ). You can also consider the settings for the session parameter NLS_NUMERIC_CHARACTERS. For example, John, Doe,120 Any St.,123,12.3,-1.23,-1.21E+10,4.1E-12 |
Timestamp/Date | If no explicit format (see Date/Time Format Models ) is specified, then the current default formats are used that are defined by the session parameters NLS_DATE_FORMAT and NLS_TIMESTAMP_FORMAT. You can also consider the session parameter NLS_DATE_LANGUAGE for certain format elements. |
Boolean value | Boolean values are TRUE or FALSE. Those values can be represented by different values. For example, 0 for FALSE or T for TRUE (For more information, see Boolean data type). |
Special characters |
If the field separator, field delimiter or row separator occurs within the data, the affected field must be enclosed in the field delimiter. To contain the field delimiter itself in the data, write the delimiter twice consecutively. Example (field separator: comma, field delimiter: double quotes): Conference room 1,"John, ""please"" call me back!","" The above is read as a data record with three fields: Field #1: Conference room 1 Field #2: John, "please" call me back! Field #3: Empty string which corresponds to a NULL value within the database. The same example without using the field delimiter will result in an error: Conference room 1,John, "please" call me back!,"" |
Fixblock Data format (FBV)
Exasol supports text files with a fixed byte structure also known as Fix Block Value (FBV). This format has fixed width for the columns and it's easier to parse. A delimiter between two values is not required.
Column width | A fixed number of bytes is used per column. If the size of the content is smaller than the column width, the content is supplemented with the specified padding characters. |
Column alignment |
With left alignment the content is at the beginning of the column and the padding characters follow. With right alignment the content is at the end of the column and is preceded by the padding characters. |
Row separation | The linefeed character optionally follows at the end of a data record. |
NULL values | To realize NULL values , a column is written across the entire length with padding characters. |
Numbers | Numbers are stored in a form readable to humans, whereby you can import floating point and scientific notation numbers. However, the column width must be maintained and padding characters used as necessary. |
Explicit formats | Optionally, you can specify a format for numbers or datetime values. For more information, see Numeric Format Models and Date/Time Format Models |
Supported Encodings for ETL Processes
In this section supported encoding for ETL processes through IMPORT and EXPORT are listed.
Encoding | Aliases |
---|---|
ASCII |
US-ASCII, US, ISO-IR-6, ANSI_X3.4-1968, ANSI_X3.4-1986, ISO_646.IRV:1991, ISO646- US, IBM367, IBM-367, CP367, CP-367, 367 |
ISO-8859-1 |
ISO8859-1, ISO88591, LATIN-1, LATIN1, L1, ISO-IR-100, ISO_8859-1:1987, ISO_8859- 1, IBM819, IBM-819, CP819, CP-819, 819 |
ISO-8859-2 |
ISO8859-2, ISO88592, LATIN-2, LATIN2, L2, ISO-IR-101, ISO_8859-2:1987, ISO_8859- 2 |
ISO-8859-3 |
ISO8859-3, ISO88593, LATIN-3, LATIN3, L3, ISO-IR-109, ISO_8859-3:1988, ISO_8859- 3 |
ISO-8859-4 |
ISO8859-4, ISO88594, LATIN-4, LATIN4, L4, ISO-IR-110, ISO_8859-4:1988, ISO_8859- 4 |
ISO-8859-5 |
ISO8859-5, ISO88595, CYRILLIC, ISO-IR-144, ISO_8859-5:1988, ISO_8859-5 |
ISO-8859-6 |
ISO8859-6, ISO88596, ARABIC, ISO-IR-127, ISO_8859-6:1987, ISO_8859-6, ECMA- 114, ASMO-708 |
ISO-8859-7 |
ISO8859-7, ISO88597, GREEK, GREEK8, ISO-IR-126, ISO_8859-7:1987, ISO_8859-7, ELOT_928, ECMA-118 |
ISO-8859-8 |
ISO8859-8, ISO88598, HEBREW, ISO-IR-138, ISO_8859-8:1988, ISO_8859-8 |
ISO-8859-9 |
ISO8859-9, ISO88599, LATIN-5, LATIN5, L5, ISO-IR-148, ISO_8859-9:1989, ISO_8859- 9 |
ISO-8859-11 |
ISO8859-11, ISO885911 |
ISO-8859-13 |
ISO8859-13, ISO885913, LATIN-7, LATIN7, L7, ISO-IR-179 |
ISO-8859-15 |
ISO8859-15, ISO885915, LATIN-9, LATIN9, L9 |
IBM850 |
IBM-850, CP850, CP-850, 850 |
IBM852 |
IBM-852, CP852, CP-852, 852 |
IBM855 |
IBM-855, CP855, CP-855, 855 |
IBM856 |
IBM-856, CP856, CP-856, 856 |
IBM857 |
IBM-857, CP857, CP-857, 857 |
IBM860 |
IBM-860, CP860, CP-860, 860 |
IBM861 |
IBM-861, CP861, CP-861, 861, CP-IS |
IBM862 |
IBM-862, CP862, CP-862, 862 |
IBM863 |
IBM-863, CP863, CP-863, 863 |
IBM864 |
IBM-864, CP864, CP-864, 864 |
IBM865 |
IBM-865, CP865, CP-865, 865 |
IBM866 |
IBM-866, CP866, CP-866, 866 |
IBM868 |
IBM-868, CP868, CP-868, 868, CP-AR |
IBM869 |
IBM-869, CP869, CP-869, 869, CP-GR |
WINDOWS-1250 |
CP1250, CP-1250, 1250, MS-EE |
WINDOWS-1251 |
CP1251, CP-1251, 1251, MS-CYRL |
WINDOWS-1252 |
CP1252, CP-1252, 1252, MS-ANSI |
WINDOWS-1253 |
CP1253, CP-1253, 1253, MS-GREEK |
WINDOWS-1254 |
CP1254, CP-1254, 1254, MS-TURK |
WINDOWS-1255 |
CP1255, CP-1255, 1255, MS-HEBR |
WINDOWS-1256 |
CP1256, CP-1256, 1256, MS-ARAB |
WINDOWS-1257 |
CP1257, CP-1257, 1257, WINBALTRIM |
WINDOWS-1258 |
CP1258, CP-1258, 1258 |
WINDOWS-874 |
CP874, CP-874, 874, IBM874, IBM-874 |
WINDOWS-31J |
WINDOWS-932, CP932, CP-932, 932 |
WINDOWS-936 |
CP936, CP-936, 936, GBK, MS936, MS-936 |
CP949 |
WINDOWS-949, CP-949, 949 |
BIG5 |
WINDOWS-950, CP950, CP-950, 950, BIG, BIG5, BIG-5, BIG-FIVE, BIGFIVE, CN- BIG5, BIG5-CP950 |
SHIFT-JIS |
SJIS |
UTF-8 |
UTF-8, ISO10646/UTF8 |