Sizing Considerations

This section explains how to determine the disk space and RAM required for your Exasol database.

The sizing calculations in this section will provide the following output:

  • Required total disk space, which is the sum of the required database disk space and required backup disk space
  • Required database RAM (DB RAM)

Factors that impact sizing

Several factors affect the disk space and RAM required for an Exasol system. These are the most common ones:

The expected volume of your raw data

The volume of your uncompressed data has the largest impact on the estimation of the required storage disk space. The larger the data volume, the more storage space is required.

Performance

Performance depends mainly on how much RAM is provided for the database (DB RAM). When estimating how much DB RAM you need, you can use a rule of thumb of 10 % of your uncompressed data volume. For a more precise value, you can calculate the amount of active data or use the actual values in Exasol system tables.

Cluster redundancy

The planned redundancy for the cluster impacts the required storage space. With redundancy 2 – meaning that the same data is stored in two segments – the required storage space is doubled.

Number of reserve nodes

Reserve nodes are used in case of a node failure. A reserve node must have the same hardware configuration as the active nodes in a cluster. For more information, see Fail Safety (On-Prem).

Backup strategy

Certain backup strategies require more storage space. If you plan on storing backups in the cluster (local backups) with redundancy 2, then this must be taken into account when estimating disk size. For more information, see Backup and Restore.

Business continuity strategy

The architecture of your business continuity solution influences sizing. For example, if you plan on having a reserve cluster in a separate data center, you will need to double your hardware costs. For more information, see Business Continuity.

Operating system reserved RAM

Exasol recommends that you reserve 10 % RAM for the operating system on a node. You need to consider how much of the total available physical RAM for your cluster is going to be usable for the database (DB RAM) after allocating RAM as OS memory reserve.

Database disk space

The estimation for required database disk space includes the following:

  • Compressed data volume: The volume of data once it has been compressed.
  • Index volume: Indexes are automatically created and maintained by Exasol databases and require database disk space.
  • Statistical and auditing data volume: Statistical data volume is small. However, if you switch auditing on in the system, the required disk space increases because each login and each query is stored in the corresponding auditing tables.
  • Reserve space for fragmentation: The persistent volume can become fragmented to some degree. It is advisable to reserve additional disk space to avoid problems with insufficient disk space due to fragmentation.
  • Reserve space for temporary data: When intermediate results do not fit into the database RAM, they are swapped out to a temporary volume, which causes significant performance deterioration. We therefore recommend that you reserve extra disk space as headroom for this temporary data.

Database disk space calculation

To calculate the required disk space for the database, use the following equation:

[(compressed data volume + index volume + statistical and auditing data volume) * redundancy] + fragmentation + temp headroom volume

Example

In this example we have the following input parameters:

  • Compressed data: 1000 GiB
  • Indexes (15 % of compressed data): 150 GiB
  • Statistical and auditing data (5 % of compressed data): 50 GiB
  • Redundancy: 2
  • Headroom for temp and fragmentation (60 % of the overall data volume without redundancy): 720 GiB

This gives us this result:

  • Compressed data (net): 1000 GiB
  • Overall data volume (net): 1200 GiB (1000 GiB compressed data + 200 GiB indexes + statistical and auditing data)
  • Overall data volume with redundancy: 2400 GiB (1200 GiB x 2)
  • Total DB disk space: 3120 GiB (2400 GiB overall data volume with redundancy + 720 GiB headroom, 60 % of the overall data volume)
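As a quick sanity check of the example above, the equation can be sketched in Python (the percentages and sizes are this example's own assumptions, not fixed values):

```python
def database_disk_space(compressed_gib, index_gib, stats_gib,
                        redundancy, headroom_gib):
    """Required database disk space in GiB, following the equation above:
    [(compressed + indexes + stats/auditing) * redundancy] + headroom."""
    net = compressed_gib + index_gib + stats_gib
    return net * redundancy + headroom_gib

# Example inputs: 1000 GiB compressed, 150 GiB indexes, 50 GiB stats/auditing,
# redundancy 2, headroom = 60 % of the 1200 GiB net volume = 720 GiB.
headroom = 0.60 * (1000 + 150 + 50)
print(database_disk_space(1000, 150, 50, redundancy=2, headroom_gib=headroom))  # 3120.0
```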

Backup disk space

To calculate the required backup disk space, use the following equations:

Without internal backup on the cluster

(full backup size * (number of full backups + 1)) + (incremental backup size * number of incremental backups)

With internal backup on the cluster

[(full backup size * (number of full backups + 1)) + (incremental backup size * number of incremental backups)] * 2

Creating a new backup does not remove the old backup, which is why there must always be enough space for an extra backup in addition to the total number of stored backups (number of full backups + 1).
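Both equations can be sketched as a small helper (the backup sizes below are hypothetical, chosen only to illustrate the formula):

```python
def backup_disk_space(full_gib, n_full, incr_gib, n_incr, internal=False):
    """Backup disk space in GiB per the equations above. The +1 keeps room
    for the extra full backup that exists while a new one is being written;
    cluster-internal backups with redundancy 2 double the total."""
    space = full_gib * (n_full + 1) + incr_gib * n_incr
    return space * 2 if internal else space

# Hypothetical sizes: 1000 GiB full backup, 1 stored full backup,
# 200 GiB per incremental backup, 3 stored incrementals.
print(backup_disk_space(1000, 1, 200, 3))                 # 2600 GiB
print(backup_disk_space(1000, 1, 200, 3, internal=True))  # 5200 GiB
```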

Example

In this example we have the following input parameters:

  • Number of full backups: 2
  • Number of incremental backups: 3
  • Maximum incremental backup size: 100 % of full backup
  • Cluster internal backup: Yes
  • Backup redundancy: 2

The results of the backup space calculation – including headroom for an extra backup during backup creation – will then be:

  • Overall data volume (net): 1200 GiB (compressed data + indexes + statistical and auditing data)
  • Full backup data size: 2400 GiB (1200 GiB x 2)
  • Incremental backup data size: 3600 GiB (1200 GiB x 3)
  • Backup space (without redundancy): 6000 GiB (full backup data size + incremental backup data size)
  • Backup disk space required with redundancy: 12000 GiB (backup space without redundancy x 2, because the backups in this example are cluster-internal)

Because different versions of an object can be accessed by multiple queries run by different users, the backup can be larger than the physical layout of the objects themselves. We recommend that you include additional space in the archive volume to allow for this.

Database RAM (DB RAM)

An Exasol database typically performs well with database RAM of 10 % of the raw (uncompressed) data volume. In addition to this basic assumption, there are additional variables that affect the estimation of the required DB RAM:

  • Index volume: Higher index volumes can negatively impact system performance and require more DB RAM.

  • Reserve space for temporary data: When intermediate results do not fit into the DB RAM they are swapped out to a temporary volume, which causes significant performance deterioration. We therefore recommend that you reserve extra headroom for temporary DB RAM.

  • User-defined functions (UDFs): When processing large amounts of data using UDFs, the RAM required for those UDFs needs to be available on every node. The UDFs are executed in parallel, which means that there can be as many instances of a UDF per node as there are cores. Therefore, you have to consider the total amount of RAM that the UDF instances need for processing the queries. For example, if a query uses 500 MiB per UDF instance on a 72-core machine in an 8-node cluster, this requires an additional 282 GiB of DB RAM.

    You can also specify how many UDF instances are created within the UDF. For more information, see UDF Instance Limiting.
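The 282 GiB figure from the UDF example above can be verified with a one-line calculation (500 MiB per instance, 72 cores, 8 nodes, as stated):

```python
MIB_PER_GIB = 1024

# One UDF instance per core, on every node in the cluster.
udf_ram_gib = 500 * 72 * 8 / MIB_PER_GIB
print(udf_ram_gib)  # 281.25, i.e. roughly 282 GiB of additional DB RAM
```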

Database RAM calculation

Use the following equation to calculate the required database RAM:

MAX(compressed data volume * database RAM estimation %, index size * index scale factor) + compressed data volume * temporary DB RAM headroom % + MAX_UDF_RAM * number-of-cores * number-of-nodes

If you have a running system, you can use the RECOMMENDED_DB_RAM_SIZE_* columns of the EXA_DB_SIZE_* statistical system tables to get a recommended database RAM size. For more information, see Statistical System Tables.

Example

In this example we have the following input parameters:

  • Index scale factor: 1.3
  • Temporary DB RAM headroom: 0.0

The resulting formula would then be:

MAX(1000 GiB * 20 %, 150 GiB * 1.3) + 1000 GiB * 0 % = 200 GiB

Which gives us this result:

  • Compressed data (net): 1000 GiB
  • Overall data volume: 1200 GiB
  • Estimated required DB RAM: 200 GiB
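The DB RAM equation and the example above can be sketched as follows (the 20 % estimation factor and the 1.3 index scale factor are this example's assumptions):

```python
def db_ram_gib(compressed_gib, ram_pct, index_gib, index_factor,
               temp_pct=0.0, udf_ram_gib=0.0, cores=0, nodes=0):
    """Estimated DB RAM in GiB per the equation above."""
    base = max(compressed_gib * ram_pct, index_gib * index_factor)
    return base + compressed_gib * temp_pct + udf_ram_gib * cores * nodes

# MAX(1000 GiB * 20 %, 150 GiB * 1.3) + 1000 GiB * 0 %
print(db_ram_gib(1000, ram_pct=0.20, index_gib=150, index_factor=1.3))  # 200.0
```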