Sizing for Data Types

When you calculate the raw_size of different data types, it is distinguished among CHAR, VARCHAR, and all other (numeric, data, boolean) data types. The categories and the sizes of the data types are described in the following sections.

VARCHAR(x)

This data type is proportional to the number of characters in a string. For example, a text string 12345 has raw_size 5 even if the VARCHAR(100). However, the UTF-8 characters use more than 1 byte per character. Therefore, generic UTF-8 strings of type VARCHAR may use up to 4*<number of characters in a string bytes.

CHAR(x)

This data type always has at least x bytes. Using the above example, a text string 12345 with data type CHAR(100) will have at least 100 bytes of raw_size. It will be more if UTF-8 characters used.

Other Types

The other data types have a fixed size. The following table describes the raw_size for some common data types.

Data Type Raw Size in Bytes
DATE 3
TIMESTAMP 7
DECIMAL(18,0) 8
DECIMAL(36,0) 16
DECIMAL(9,0) 4
BOOLEAN 1

Calculate Mem and Raw Size

For any column of a table, you can use get the mem_size and the raw_size by looking up $EXA_COLUMN_SIZES. The mem_size and the raw_size is always in bytes.

Do the following to calculate the raw_size of a data type:

  1. Create a dummy table containing columns with the required data types.
  2. Insert some rows in the table.
  3. Look up the $EXA_COLUMN_SIZES in the table and divide the column raw_size by the number of rows.