Character Encoding

This article describes character encoding support and sorting rules in Exasol.

Supported character sets and encoding

Exasol supports the Unicode and ASCII character sets. Unicode characters are defined using code points, which in Exasol are transformed to binary values using the UTF-8 character encoding standard (UTF = Unicode Transformation Format). The first 128 code points in Unicode are identical to the ASCII character set, which makes ASCII a subset of Unicode.

Exasol does not support SQL collation.

UTF‑8

Binary encoding of Unicode code points that uses 1 to 4 bytes per character, depending on the value of the code point.

The first 128 code points are identical to the ASCII character set.
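The variable byte length can be illustrated outside the database. The following is a minimal Python sketch, not Exasol code; the helper name utf8_length and the sample characters are chosen here purely for illustration.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


# Illustrative samples: one character from each UTF-8 length class.
samples = ["A", "\u00c4", "\u20ac", "\U0001F600"]  # A, Ä, €, 😀
lengths = {f"U+{ord(ch):04X}": utf8_length(ch) for ch in samples}

for code_point, size in lengths.items():
    print(code_point, size)
```

Code points below U+0080 take 1 byte, which is why ASCII text has the same byte representation in UTF-8.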

ASCII

Binary encoding that uses 1 byte per character, of which only the lower 7 bits are used.

Extended ASCII (8 bits) is not supported in Exasol.

Examples:

Character	Decimal value	Unicode code point	UTF-8 binary	ASCII binary (7-bit)
NULL (control character)	0	U+0000	00000000	00000000
SPACE	32	U+0020	00100000	00100000
A	65	U+0041	01000001	01000001
a	97	U+0061	01100001	01100001
DELETE (control character)	127	U+007F	01111111	01111111
Ä	196	U+00C4	11000011 10000100	-
ä	228	U+00E4	11000011 10100100	-
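Rows like these can be reproduced with a few lines of Python. This is a sketch for checking values by hand, assuming standard UTF-8 as implemented by Python's str.encode; the helper names utf8_bits and is_7bit_ascii are hypothetical and not part of any Exasol tooling.

```python
def utf8_bits(ch: str) -> str:
    """Render the UTF-8 encoding of one character as space-separated bit groups."""
    return " ".join(f"{byte:08b}" for byte in ch.encode("utf-8"))


def is_7bit_ascii(ch: str) -> bool:
    """True if the character falls within the 7-bit ASCII range (0 to 127)."""
    return ord(ch) < 128


for ch in ["A", "a", "\u00c4", "\u00e4"]:  # A, a, Ä, ä
    ascii_column = utf8_bits(ch) if is_7bit_ascii(ch) else "-"
    print(ch, ord(ch), f"U+{ord(ch):04X}", utf8_bits(ch), ascii_column, sep="\t")
```

For characters outside the ASCII range, the first byte starts with 110 and the second with 10, which is the UTF-8 marker pattern for a 2-byte sequence.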

Sorting behavior

When comparing strings explicitly using <, >, <=, or >=, or implicitly using ORDER BY, the sort order is based on binary character values. This results in the following sorting behavior:

Sorting is case-sensitive. UPPERCASE characters are sorted before lowercase in ascending order. For example, aardvark will come after Zebra when sorting in ascending order.
Sorting is accent-sensitive. Characters with accents (diacritics) are compared and sorted according to their UTF-8 binary representation. For example, Åhus is sorted after both Axberg and Zoo in ascending order.
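The ordering described above can be approximated outside the database by comparing the UTF-8 bytes of each string. The following Python sketch only simulates a binary sort; it assumes that byte-wise comparison of UTF-8 matches the behavior described here, and the function name binary_sort is hypothetical.

```python
def binary_sort(values):
    """Sort strings by their UTF-8 byte sequences (no collation rules applied)."""
    return sorted(values, key=lambda s: s.encode("utf-8"))


# "\u00c5hus" is Åhus written with the precomposed character U+00C5.
names = ["aardvark", "Zebra", "\u00c5hus", "Axberg", "Zoo"]
result = binary_sort(names)
print(result)
```

In this simulation, uppercase names come first, aardvark follows Zoo, and Åhus comes last because its first byte (0xC3) is greater than any ASCII byte.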

Supported encodings for ETL processes

String data types

SET ENCODING in EXAplus
