Character Encoding
This article describes character encoding support and sorting rules in Exasol.
Supported character sets and encoding
Exasol supports the Unicode and ASCII character sets. Unicode characters are defined using code points, which in Exasol are transformed to binary values using the UTF-8 character encoding standard (UTF = Unicode Transformation Format). The first 128 code points in Unicode are identical to the ASCII character set, which makes ASCII a subset of Unicode.
Exasol does not support SQL collation.
Encoding | Description |
---|---|
UTF-8 | Binary encoding of the code point value of Unicode characters, using 1 to 4 bytes per character depending on the value of the code point. The first 128 code points are identical to the ASCII character set. |
ASCII | Binary encoding using 1 byte per character, of which only the lower 7 bits are used. Extended ASCII (8 bits) is not supported in Exasol. |
Examples:
Character | Decimal value | Unicode code point | UTF-8 binary | ASCII binary (7-bit) |
---|---|---|---|---|
NULL (control character) | 0 | U+0000 | 00000000 | 00000000 |
SPACE | 32 | U+0020 | 00100000 | 00100000 |
A | 65 | U+0041 | 01000001 | 01000001 |
a | 97 | U+0061 | 01100001 | 01100001 |
DELETE (control character) | 127 | U+007F | 01111111 | 01111111 |
Ä | 196 | U+00C4 | 11000011 10000100 | - |
ä | 228 | U+00E4 | 11000011 10100100 | - |
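
The bit patterns for the characters in the table can be computed with a few lines of Python. This is a minimal illustrative sketch that runs outside Exasol; the helper name `utf8_bits` is hypothetical and not part of any Exasol API.

```python
def utf8_bits(ch: str) -> str:
    """Return the UTF-8 encoding of one character as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in ch.encode("utf-8"))


# The characters from the examples table: NULL, SPACE, A, a, DELETE, Ä, ä
for ch in ["\x00", " ", "A", "a", "\x7f", "Ä", "ä"]:
    # Print the code point, its decimal value, and its UTF-8 bit pattern
    print(f"U+{ord(ch):04X}  {ord(ch):>3}  {utf8_bits(ch)}")
```

Characters with code points up to U+007F produce a single byte whose highest bit is 0, which is why they coincide with 7-bit ASCII. Ä and ä lie above that range and are encoded in two bytes.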
Sorting behavior
When comparing strings explicitly using <, >, <=, >=, or implicitly using ORDER BY, the sort order is based on binary character values. This results in the following sorting behavior:
- Sorting is case-sensitive. Uppercase characters are sorted before lowercase characters in ascending order. For example, aardvark comes after Zebra when sorting in ascending order.
- Sorting is accent-sensitive. Characters with accents (diacritics) are compared and sorted according to their UTF-8 binary representation. For example, Åhus is sorted after both Axberg and Zoo in ascending order.
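
The ordering described above can be emulated outside the database by sorting strings on their UTF-8 bytes. The following Python sketch is only an approximation for illustration, not Exasol's implementation; the function name `binary_sort` is hypothetical.

```python
def binary_sort(values):
    """Sort strings in ascending order of their UTF-8 byte sequences."""
    return sorted(values, key=lambda s: s.encode("utf-8"))


words = ["aardvark", "Zebra", "Åhus", "Axberg", "Zoo"]
print(binary_sort(words))
# → ['Axberg', 'Zebra', 'Zoo', 'aardvark', 'Åhus']
```

Uppercase A and Z (bytes 0x41 and 0x5A) precede lowercase a (0x61), and Å starts with the byte 0xC3, which is higher than any ASCII byte, so Åhus sorts last.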