Sizing Considerations

This section explains how to determine the necessary size of hardware resources (disk space and RAM) for your Exasol database.

The outputs of the hardware sizing calculations in this section are:

  • Required total disk space, which is the sum of the required database disk space and required backup disk space
  • Required database RAM

To get a sizing estimate based on your data size and computing requirements, you can use the Exasol Cloud Deployment Wizard. The wizard suggests the best possible configuration based on your raw data size.

Factors that Impact Sizing

There are several factors that affect your sizing calculation for Exasol. These are the most common ones:

  • The expected volume of your raw data: The volume of your uncompressed data has the largest impact on the sizing estimation. The larger the data volume, the more storage space is required.
  • Performance: Performance depends mainly on the amount of RAM provided to the database. You may use the most common value (10 % of uncompressed data) or go for more precise values by calculating the amount of active data or using the values of Exasol system tables.
  • Cluster redundancy: Your planned redundancy for the cluster impacts the required storage space. If you are planning to have a redundancy of “2” – the same data is stored in two segments – the required storage space is doubled.
  • Number of reserve nodes: Reserve nodes are used in case of a node failure. Reserve nodes need to have the same hardware configuration as the active nodes in a cluster. For more information, see Fail Safety (Cloud).
  • Backup strategy: Certain backup strategies require more storage space. If you plan on storing backups in-cluster with a redundancy of “2”, then this must be taken into account in the sizing estimation. For more information, see Backup and Restore.
  • Operating system memory swap: Exasol recommends setting 10 % of total RAM as OS memory swap. If you follow this recommendation, consider how much of the total physical RAM in your cluster remains usable for the database after allocating RAM for OS memory swap.

Database Disk Space

The estimation for required database disk space sums up the following:

  • Compressed data volume: The volume of data once it has been compressed.
  • Index volume: Indexes are automatically created and maintained by Exasol databases and require database disk space.
  • Statistical and auditing data volume: Statistical data volume is small. However, if you switch auditing on in the system, the required disk space increases because each login and each query is stored in the corresponding auditing tables.
  • Reserve space for fragmentation: The persistent volume can become fragmented to some degree. It is advisable to reserve additional disk space to avoid problems with insufficient disk space due to fragmentation.
  • Reserve space for temporary data: When intermediate results do not fit into the database RAM, they are swapped out to a temporary volume, which causes significant performance deterioration. We therefore recommend that you reserve extra disk space as headroom for temporary data.

Database Disk Space Calculation

Use the following equation to calculate the required database disk space:

[(compressed data volume + index volume + statistical and auditing data volume) * redundancy] + fragmentation + temp headroom volume

Example

In this example we have the following input data:

  • Compressed data (net): 1000 GiB
  • Indexes (15 % of compressed data): 150 GiB
  • Statistical and auditing data (5 % of compressed data): 50 GiB
  • Redundancy: 2
  • Headroom for temp and fragmentation (60 % of overall data volume, without redundancy): 720 GiB

Which gives us this result:

  • Compressed data (net): 1000 GiB
  • Overall data volume (net): 1200 GiB (1000 GiB compressed data + 200 GiB indexes + statistical & auditing data)
  • Overall data volume with redundancy: 2400 GiB (1200 GiB × 2)
  • Total DB disk space: 3120 GiB (2400 GiB overall data volume with redundancy + 720 GiB temp and fragmentation headroom)
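The disk-space arithmetic of the worked example can be sketched as a small helper. This is a hypothetical function for illustration, not part of any Exasol tooling; the input figures are the example's assumptions.

```python
def db_disk_space_gib(compressed, indexes, stats, redundancy, headroom_pct):
    """Estimate required database disk space in GiB.

    headroom_pct (temp + fragmentation reserve) is applied to the
    overall data volume without redundancy, as in the example above.
    """
    overall = compressed + indexes + stats       # overall data volume (net)
    with_redundancy = overall * redundancy       # doubled for redundancy 2
    headroom = overall * headroom_pct            # temp + fragmentation reserve
    return with_redundancy + headroom

# Worked example: 1000 GiB compressed, 150 GiB indexes, 50 GiB statistics,
# redundancy 2, 60 % headroom.
print(db_disk_space_gib(1000, 150, 50, 2, 0.60))  # 3120.0
```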

Database RAM

An Exasol database typically performs well with database RAM of 10 % of the raw (uncompressed) data volume. Beyond this basic assumption, several variables affect the estimate of the database RAM required:

  • Index volume: Higher index volumes can negatively impact system performance and require more database RAM.

  • Reserve space for temporary data: When intermediate results do not fit into the database RAM, they are swapped out to a temporary volume, which causes significant performance deterioration. We therefore recommend that you reserve extra headroom for TEMP DB RAM.

  • User-defined functions (UDFs): When processing large amounts of data using user-defined functions, the RAM required for those UDFs needs to be available on every node. The UDFs are executed in parallel, which means that there can be as many instances of a UDF per node as there are cores. Therefore, you have to consider the total amount of RAM that the UDF instances need for processing the queries. For example, if a query uses 500 MiB per UDF instance on a 72-core machine in an 8-node cluster, this requires approximately an additional 282 GiB of database RAM.

    You can also specify how many UDF instances are created within the UDF. For more information, see UDF Instance Limiting.
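As a sanity check on the UDF figures above (500 MiB per instance, 72 cores, and 8 nodes are the example's assumptions), the extra RAM can be computed directly:

```python
MIB_PER_INSTANCE = 500  # RAM needed by one UDF instance (example assumption)
CORES_PER_NODE = 72     # up to one UDF instance runs per core
NODES = 8

extra_ram_gib = MIB_PER_INSTANCE * CORES_PER_NODE * NODES / 1024
print(f"{extra_ram_gib} GiB")  # 281.25 GiB, roughly the 282 GiB cited above
```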

Database RAM Calculation

Use the following equation to calculate the required database RAM:

MAX( compressed data volume * database RAM estimation %, index size * index scale factor ) + compressed data volume * temp DB RAM headroom % + MAX_UDF_RAM * number of cores * number of nodes, where MAX_UDF_RAM is the RAM required per UDF instance

If you have a running system, you can use the RECOMMENDED_DB_RAM_SIZE_* columns of the EXA_DB_SIZE_* statistical system tables to get a recommended database RAM size. For more information, see Statistical System Tables.

Example

In this example we have the following input data:

  • Index scale factor: 1.3
  • Temp DB RAM headroom: 0.0

The resulting formula would then be:

MAX(1000 GiB * 20 %, 150 GiB * 1.3) + 1000 GiB * 0 % = MAX(200 GiB, 195 GiB) + 0 GiB = 200 GiB

Which gives us this result:

  • Compressed data (net): 1000 GiB
  • Overall data volume: 1200 GiB
  • DB RAM estimation: 200 GiB
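The RAM equation above can be sketched as a small helper. This is a hypothetical function for illustration (not part of any Exasol tooling); the UDF term is optional and defaults to zero.

```python
def db_ram_gib(compressed, ram_pct, index, index_scale,
               temp_headroom_pct, udf_ram_gib_per_instance=0.0,
               cores=0, nodes=0):
    """Estimate required database RAM in GiB (all sizes in GiB)."""
    # MAX(compressed data * RAM estimation %, index size * index scale factor)
    base = max(compressed * ram_pct, index * index_scale)
    temp = compressed * temp_headroom_pct           # TEMP DB RAM headroom
    udf = udf_ram_gib_per_instance * cores * nodes  # one instance per core
    return base + temp + udf

# Worked example: MAX(1000 * 20 %, 150 * 1.3) + 1000 * 0 % = 200 GiB
print(db_ram_gib(1000, 0.20, 150, 1.3, 0.0))  # 200.0
```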

Advanced Settings

The Exasol Cloud Deployment Wizard provides an advanced sizing estimate based on input parameters. You can provide the raw data size and select other available options to see the best possible configuration and its cost. To learn how to use the advanced settings of the Exasol Cloud Deployment Wizard, see Advanced Mode.

The input parameters are used to provide you with suggested hardware configuration options. The following sections describe the logic behind the recommendations provided by the Exasol Cloud Deployment Wizard.

These tables are only intended to help you understand the configuration that the Exasol Cloud Deployment Wizard provides based on your input. These values cannot be configured in the wizard.

DB Disk Space

The following factors are used in the DB disk space estimation:

Index Size

Expected index size as a percentage of compressed data volume. Default value in the Cloud Deployment Wizard: 15 % of compressed data volume.

Indexes are automatically created and maintained by the Exasol database. The index size depends on the chosen data model and queries, and can range from 2 % to over 100 % of compressed data. A typical Exasol system has an index volume of about 15 % of compressed data.

If you are already running an Exasol system, you can get the size of the indexes by referring to AUXILIARY_SIZE in the statistical system tables.

Auditing Data

Expected size of statistical and auditing data as a percentage of compressed data volume. Default value in the Cloud Deployment Wizard: 5 % of compressed data volume.

Statistical data itself is small. However, if you switch on auditing, every login and query is stored in the auditing tables. In this case, use a higher estimate for auditing data than the default.

If you have auditing turned on but do not want to provide extra disk space, you can archive historical auditing data offline and truncate the auditing tables on a regular basis.

If you are already running an Exasol system, you can get the size of the statistical data by referring to STATISTICS_SIZE in the statistical system tables.

Reserve for TEMP & Fragmentation

Expected disk space reserved for temporary data and fragmentation, as a percentage of overall data volume. Default value in the Cloud Deployment Wizard: 60 % of the overall data volume (including indexes and statistical data), without redundancy.

When intermediate results of queries do not fit into the allocated database RAM, they are swapped out to a temporary volume. In addition, the persistent data volume can become fragmented. To cover the potential impact of these issues on database disk space, the Cloud Deployment Wizard by default adds an additional 60 % of the overall data volume (without redundancy) as disk space.

Redundancy

The planned redundancy of the cluster. Default value in the Cloud Deployment Wizard: redundancy of 2.

Data is usually stored redundantly in the cluster to ensure that a full set of data is still available after a node outage. A redundancy of "2" means that there are two instances of each committed block in the cluster, so the required data volume is doubled. Note that the redundant copies are only used in the case of server failures.

DB RAM

The following variables are used in the DB RAM estimation:

Index Scale Factor

The scale factor that provides headroom for the index volume.

To ensure that index maintenance does not impact system performance, extra database RAM should be reserved for processing indexes. The index scale factor is multiplied by the estimated index volume and represents the minimum database RAM required. The index scale factor is especially important if you expect a high index volume.

Temporary Database RAM Headroom

Additional database RAM for processing temporary data, as a percentage of compressed data volume.

Set a value for this if your system will create large volumes of intermediate results.