Hardware sizing considerations for GPU support

This article describes additional hardware sizing considerations for clusters that use GPU support.

RAM sizing

When you plan the hardware sizing for a cluster that will use GPU support for UDFs, base the RAM sizing on the requirements of your data science workloads. In particular, the nodes must have enough RAM for the UDFs in addition to the typical requirement for BI query loads. We recommend adding at least 64 to 128 GiB of RAM per data node on top of the normal sizing calculations for the cluster.

For general guidelines on how to calculate RAM, see Sizing Guidelines.
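As a rough illustration, the following sketch shows how the recommendation adds up for a single data node. The base value of 256 GiB is a hypothetical result of the standard sizing calculation, not a recommendation.

```python
# Minimal sketch: per-node RAM estimate for a GPU-enabled data node.
# base_ram_gib is a placeholder for the result of the normal sizing
# calculation (see Sizing Guidelines); the UDF headroom follows the
# 64-128 GiB recommendation above.

base_ram_gib = 256        # hypothetical result of the standard cluster sizing
udf_headroom_gib = 128    # extra RAM reserved for data science UDFs (64-128 GiB)

total_ram_per_node_gib = base_ram_gib + udf_headroom_gib
print(f"RAM per data node: {total_ram_per_node_gib} GiB")  # 384 GiB
```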

CPU sizing

GPU support allows you to offload some of the computation from the CPU to the GPU. However, the CPU is still needed to preprocess the data and feed it to the GPU. As a rule of thumb, plan for 4 to 8 CPU cores per GPU in your data nodes. If you run additional workloads on the nodes, plan for even more cores.
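The following sketch illustrates this rule of thumb for a hypothetical node with two GPUs; the GPU count and the reserve for other workloads are assumptions, not recommendations.

```python
# Minimal sketch: CPU core estimate per data node, based on the
# 4-8 cores-per-GPU rule of thumb. All numbers are illustrative.

gpus_per_node = 2          # hypothetical GPU count per data node
cores_per_gpu = 8          # upper end of the 4-8 cores-per-GPU rule
other_workload_cores = 16  # hypothetical reserve for BI queries and other workloads

cpu_cores_per_node = gpus_per_node * cores_per_gpu + other_workload_cores
print(f"CPU cores per data node: {cpu_cores_per_node}")  # 32
```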

GPU sizing

GPU sizing depends on the use cases you plan to run. GPU models differ in properties such as the number of cores, the amount of VRAM, memory bandwidth, the interconnect between GPUs and CPU (for example, PCIe or NVLink), and support for special compute units such as Tensor Cores. All of these properties influence performance.

Exasol only supports NVIDIA Data Center GPUs.

LLM sizes and GPU requirements

The size of a large language model (LLM) is usually measured in parameters, which can range from hundreds of millions to trillions. The amount of VRAM required to run an LLM efficiently depends on the size of the model and the precision of the computations (for example, FP32, FP16, or INT8).

The following table provides typical values to use as a starting point when calculating VRAM requirements.

| LLM size (number of parameters) | Typical VRAM usage | Note |
|---|---|---|
| Tiny (100M to 2B) | 2 to 4 GiB | |
| Small (2B to 10B) | 6 to 16 GiB | |
| Medium (10B to 20B) | 16 to 24 GiB | |
| Large (20B to 70B) | 24 to 48 GiB | Single or multiple high-end GPUs |
| Very large (70B to 110B) | ≥80 GiB | Single or multiple high-end GPUs |
| Super large (>110B) | ≥160 GiB | Multiple high-end GPUs with ≥80 GiB VRAM each |

In practice, the actual VRAM requirement for running an LLM efficiently will be somewhat higher than indicated in the table, because additional memory is needed to store intermediate calculation results, the optimizer state (when training), and input data.
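If you want to estimate VRAM requirements yourself, the following sketch applies the common rule of thumb of parameter count multiplied by bytes per parameter, with an assumed 20% overhead for intermediate results and input data. The overhead factor is illustrative; actual requirements depend on the model and the runtime.

```python
# Rough sketch: rule-of-thumb VRAM estimate for running an LLM.
# VRAM ~= parameters * bytes per parameter * overhead factor.
# The 20% overhead is an assumed allowance for intermediate results
# and input data; it does not cover optimizer state for training.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def estimate_vram_gib(num_params: float, precision: str = "fp16",
                      overhead: float = 1.2) -> float:
    """Return an approximate VRAM requirement in GiB."""
    return num_params * BYTES_PER_PARAM[precision] * overhead / (1024 ** 3)

# Example: a 7B-parameter model in FP16 comes out at roughly 15-16 GiB,
# which matches the "Small" row in the table above.
print(f"{estimate_vram_gib(7e9, 'fp16'):.1f} GiB")
```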

GPU sizing for classic machine learning

Classic machine learning models and methods such as random forest, linear regression, K-means, or PCA can also benefit from GPUs. For traditional machine learning models, the same requirements as for tiny LLMs (see the table above) are typically sufficient. For clustering or dimensionality reduction methods such as K-means and PCA, the required VRAM depends on the size of the data, since these methods usually need the whole dataset in VRAM.
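The following sketch estimates how much VRAM a dense dataset occupies when it is held entirely in GPU memory as float32 values; the row and column counts are hypothetical, and GPU libraries typically need additional working memory on top.

```python
# Minimal sketch: VRAM needed to hold a dense float32 dataset for
# methods such as K-means or PCA. Numbers are illustrative only.

rows = 50_000_000    # hypothetical number of samples
cols = 100           # hypothetical number of features
bytes_per_value = 4  # float32

dataset_gib = rows * cols * bytes_per_value / (1024 ** 3)
print(f"Dataset size in VRAM: {dataset_gib:.1f} GiB")  # ~18.6 GiB
```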

Hardware requirements for specific stages

The hardware requirements for training or fine-tuning AI models differ significantly from those for inference.

Training/fine-tuning stage requirements

In the training/fine-tuning stage, the goal is to adapt the model to specific data for a given task. This stage requires significant computing power, typically relying on high-end GPUs with ample memory and high-speed networking. Training or fine-tuning can be an intensive and time-consuming process, especially when it involves large datasets or deep models.

Inference stage requirements

Inference is the stage where a trained model is used to generate predictions based on new input data. While inference does not require as much computational power as fine-tuning, it demands low latency and high throughput to ensure that predictions are made in real time or near real time. Smaller GPUs, such as the NVIDIA T4 or A10, are often preferred for inference tasks because they provide a balance between performance and cost. CPUs may also be sufficient for smaller models or when low latency is not a priority.