Sizing Considerations

The purpose of hardware sizing is to establish the kind of hardware configuration you need, based on your requirements and the factors in your environment. The sizing estimate produces the following output:

  • Total disk space required, which is the sum of required database disk size and required backup disk size
  • Database RAM

Factors that Impact Sizing

There are several factors that impact your sizing calculation for Exasol. These are the most common ones:

  • The expected volume of your raw data: The volume of your uncompressed data has the largest impact on the sizing estimation. The larger the data volume, the more storage space is required.
  • Performance: Performance depends mainly on the amount of RAM provided to the database. You can use the common rule of thumb (10% of uncompressed data) or derive more precise values by calculating the amount of active data or by using the values in Exasol system tables.
  • Cluster redundancy: Your planned redundancy for the cluster impacts the required storage space. If you are planning to have a redundancy of '2' - in other words, the same data is stored in two segments - required storage space is doubled.
  • Number of reserve nodes: Reserve nodes are used in the case of a node failure. Reserve nodes need to have the same hardware configuration as the active nodes in a cluster.
  • Backup strategy: Certain backup strategies require more storage space. If you plan on storing backups in-cluster with a redundancy of '2', then this must be taken into account in the sizing estimation.
  • Business continuity strategy: The architecture of your business continuity solution also influences sizing. For example, if you plan on having a reserve cluster in a separate data center, you will need to double your hardware costs. For more information, see Business Continuity.
  • Operating System Memory Swap: Exasol recommends setting 10% of total RAM as OS memory swap. If you follow this recommendation, consider how much of the total physical RAM in your cluster remains usable for the database after allocating that 10%, as shown in the sketch below.
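
To make the arithmetic behind the last two factors concrete, here is a minimal Python sketch. It is illustrative only, not part of any Exasol tooling, and all figures are hypothetical:

    # Illustrative sizing helpers; not an official Exasol tool.

    def usable_db_ram(total_physical_ram_gib, os_swap_fraction=0.10):
        """RAM left for the database after reserving 10% for OS memory swap."""
        return total_physical_ram_gib * (1.0 - os_swap_fraction)

    def storage_with_redundancy(data_gib, redundancy=2):
        """Storage needed when the same data is stored in `redundancy` segments."""
        return data_gib * redundancy

    # Hypothetical example: nodes with 512 GiB physical RAM each.
    print(usable_db_ram(512))             # 460.8 GiB usable per node
    print(storage_with_redundancy(1000))  # 2000 GiB for 1000 GiB of data at redundancy 2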

Database Disk Size

The estimation for required database disk space sums up the following:

  • Compressed data volume: The volume of data once it has been compressed.
  • Index volume: Indexes are automatically created and maintained by Exasol databases, and require database disk space.
  • Statistical and auditing data volume: Statistical data volume is small. However, if you switch auditing on in the system, the required disk space increases because each login and each query is stored in the corresponding auditing tables.
  • Reserve space for temporary data and fragmentation: When intermediate results do not fit into the database RAM, they are swapped out to a temporary volume. Additionally, the persistent volume can become fragmented to some degree. It is advisable to calculate reserve disk space for these potential issues to avoid problems with insufficient disk space.

Database Disk Space Calculation

[(compressed data volume + index volume + statistics & auditing volume) * redundancy] + fragmentation & temp headroom volume

Example

With the following input data:

  • Compressed data (net): 1000 GiB
  • Indexes size (15% of compressed data): 150 GiB
  • Statistical & auditing data size (5% of compressed data): 50 GiB
  • Redundancy: 2
  • Headroom for temp and fragmentation (60% of the net overall data volume, before redundancy): 720 GiB

We get the following numbers:

Compressed data (net):               1000 GiB
Overall data volume (net):           1200 GiB (1000 GiB compressed data + 200 GiB indexes, statistical & auditing data)
Overall data volume with redundancy: 2400 GiB (1200 GiB x 2)
Total DB disk space:                 3120 GiB (2400 GiB overall data volume with redundancy + 720 GiB temp & fragmentation headroom)
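
As a cross-check, here is a small Python sketch of the calculation above. The function name and defaults are illustrative, not part of any Exasol API:

    def db_disk_space_gib(compressed_gib, index_gib, stats_audit_gib,
                          redundancy=2, headroom_fraction=0.60):
        """Database disk space: net data volume times redundancy, plus
        temp/fragmentation headroom taken as a fraction of the net volume."""
        net = compressed_gib + index_gib + stats_audit_gib
        return net * redundancy + net * headroom_fraction

    print(db_disk_space_gib(1000, 150, 50))  # 3120.0 GiB, matching the example above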

Backup Disk Size

The equations for calculating the required backup disk space differ slightly depending on whether or not you store redundant backup data inside the cluster.

Backup Disk Space Calculation

Without internal backup on the cluster:

(full backup size * (no. of full backups + 1)) + (incremental backup size * no. of incremental backups)

With internal backup on the cluster:

[(full backup size * (no. of full backups + 1)) + (incremental backup size * no. of incremental backups)] * 2

While a new backup is being created, the old one is not yet removed, so for some time an additional backup occupies space. This is why the formulas use (no. of full backups + 1).

Example

With the following input data:

  • Full backup count: 2
  • Incremental backup count: 3
  • Maximum incremental backup size: 100% of full backup
  • Cluster internal backup: Yes
  • Backup redundancy: 2

The final backup space calculation includes headroom of one extra full backup, to avoid running out of disk space while a new backup is being created.

Overall data volume (net):                   1200 GiB (compressed data + indexes + statistical & auditing data)
Full backup data size:                       3600 GiB (1200 GiB x (2 full backups + 1 extra kept during creation))
Incremental backup data size:                3600 GiB (1200 GiB x 3)
Backup space (without redundancy):           7200 GiB (full backup data size + incremental backup data size)
Backup disk space required with redundancy: 14400 GiB (7200 GiB x 2, because the backups in this example are cluster-internal)
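
The same calculation as a Python sketch (names are illustrative, not part of any Exasol API):

    def backup_disk_space_gib(full_backup_gib, full_count, incr_backup_gib, incr_count,
                              cluster_internal=False):
        """Backup disk space. One extra full backup is counted because the old
        backup is only removed after the new one is complete; cluster-internal
        backups are stored twice (redundancy 2)."""
        space = full_backup_gib * (full_count + 1) + incr_backup_gib * incr_count
        return space * 2 if cluster_internal else space

    # Example inputs: 1200 GiB full backups, 2 fulls, 3 incrementals at 100% of a full.
    print(backup_disk_space_gib(1200, 2, 1200, 3, cluster_internal=True))  # 14400 GiB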

Database RAM

An Exasol database typically performs well with database RAM equal to about 10% of the raw (uncompressed) data volume. Beyond this basic assumption, additional variables affect the estimate of the required database RAM:

  • Index volume: Higher index volumes can negatively impact system performance, and require more database RAM.
  • Reserve space for temporary data: When intermediate results do not fit into the database RAM, they are swapped out to a temporary volume, which causes significant performance deterioration. It is therefore advisable to reserve extra headroom for TEMP DB RAM.
  • User-defined functions (UDFs): When processing large amounts of data using UDFs, the RAM required for those UDFs needs to be available on every node. UDFs are executed in parallel, with up to as many instances per node as there are cores. You therefore have to consider the total amount of RAM that all UDF instances need while processing queries. For example, if a query uses 500 MiB per UDF instance on 72-core machines in an 8-node cluster, this requires roughly an additional 282 GiB of database RAM (500 MiB x 72 x 8).

Database RAM Calculation

MAX(compressed data volume * database RAM estimation %, index volume * index scale factor) + compressed data volume * temp DB RAM headroom % + UDF RAM per instance * number of cores per node * number of nodes

If you have a running system, you can use the RECOMMENDED_DB_RAM_SIZE* columns of the EXA_DB_SIZE* statistical system tables to get a recommended database RAM size.
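
For example, here is a minimal sketch using the open-source pyexasol driver. The DSN and credentials are placeholders, and the table and column names used here (EXA_STATISTICS.EXA_DB_SIZE_LAST_DAY, RECOMMENDED_DB_RAM_SIZE) should be verified against your Exasol version:

    import pyexasol

    # Placeholder connection details; replace with your own DSN and credentials.
    conn = pyexasol.connect(dsn="exasol-host:8563", user="sys", password="<password>")

    # EXA_DB_SIZE_LAST_DAY is one of the EXA_DB_SIZE* statistical system tables;
    # RECOMMENDED_DB_RAM_SIZE holds the sizing recommendation.
    stmt = conn.execute(
        "SELECT MEASURE_TIME, RECOMMENDED_DB_RAM_SIZE "
        "FROM EXA_STATISTICS.EXA_DB_SIZE_LAST_DAY "
        "ORDER BY MEASURE_TIME DESC"
    )
    for measure_time, recommended_ram in stmt:
        print(measure_time, recommended_ram)

    conn.close()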

Example

With the following input data:

  • Database RAM estimation: 20% of compressed data
  • Index scale factor: 1.3
  • Temp DB RAM headroom: 0.0

MAX(1000 GiB * 20%, 150 GiB * 1.3) + 1000 GiB * 0% = 200 GiB

Compressed data (net):     1000 GiB
Overall data volume (net): 1200 GiB
DB RAM estimation:         200 GiB (result of the calculation above)
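
The formula as a Python sketch (names are illustrative), reproducing the example above and the earlier UDF figure:

    def db_ram_gib(compressed_gib, ram_estimation_pct, index_gib, index_scale_factor,
                   temp_headroom_pct=0.0, udf_ram_gib=0.0, cores_per_node=0, node_count=0):
        """Database RAM per the calculation above: the larger of the data-based
        and index-based estimates, plus temp headroom and UDF RAM."""
        base = max(compressed_gib * ram_estimation_pct, index_gib * index_scale_factor)
        return base + compressed_gib * temp_headroom_pct + udf_ram_gib * cores_per_node * node_count

    print(db_ram_gib(1000, 0.20, 150, 1.3))  # 200.0 GiB, as in the example
    print(500 / 1024 * 72 * 8)               # ~281 GiB extra for the earlier UDF example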