In-database models (UDFs)

Learn how to run machine learning models inside the database using UDFs.

Exasol lets you run machine learning models directly inside the database using user-defined functions (UDFs). Instead of extracting data to an external service for inference, you deploy your model into Exasol and call it from SQL. The database engine executes the model code in parallel across cluster nodes, keeping data local and avoiding the overhead of network round-trips.

This approach works well for batch inference over large tables, NLP tasks on text columns, and iterative algorithms that benefit from proximity to the data.

How Python UDFs execute

A UDF is a function you define in Python (or Lua, Java, or R) that the database calls during query processing. When you run a query that invokes a Python UDF, Exasol spawns one or more virtual machines across cluster nodes. Each VM runs a script language container (SLC), which is a Linux container bundled with a Python runtime and preinstalled libraries such as pandas and scikit-learn.

Your UDF code must implement a run() function. Exasol calls this function and passes a context object that provides access to the input data. The UDF reads input values from the context, performs its computation, and either returns a result or emits output rows.

CREATE OR REPLACE PYTHON3 SCALAR SCRIPT my_schema.predict(input_value DOUBLE)
RETURNS DOUBLE AS

def run(ctx):
    # A real UDF would load a model and run a prediction here;
    # this stub simply doubles the input value.
    return ctx.input_value * 2.0
/
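
Once the script exists, you call it from SQL like any built-in function. The table and column names below are hypothetical:

SELECT id, my_schema.predict(feature) AS prediction
FROM my_schema.samples;

Exasol distributes the input rows across UDF VMs on all nodes, so the query parallelizes automatically.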

If you need additional Python packages beyond what the default SLC provides, you create a custom SLC and add them. The SLC is then uploaded to BucketFS, where all database nodes can access it. For GPU-accelerated workloads that require CUDA libraries in the SLC, see GPU acceleration for UDFs.
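As a sketch of the upload step: BucketFS exposes an HTTP interface, so you can push an SLC archive to it with any HTTP client. The host, port, bucket name, and password below are placeholders; check your BucketFS service configuration for the actual values.

import requests

# Placeholders: adjust host, port, bucket, and credentials for your cluster.
BUCKETFS_URL = "http://<bucketfs-host>:2580/default/my_custom_slc.tar.gz"

with open("my_custom_slc.tar.gz", "rb") as f:
    response = requests.put(
        BUCKETFS_URL,
        data=f,
        auth=("w", "<write-password>"),  # "w" is the BucketFS write user
    )
response.raise_for_status()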

How models are stored in BucketFS and loaded at runtime

BucketFS is a synchronous file system available on every node in an Exasol cluster. When you upload a file to BucketFS, it is automatically replicated so that UDFs running on any node can read it.

For ML workloads, you store trained model artifacts (serialized PyTorch models, scikit-learn pickles, Hugging Face model directories) in BucketFS. At runtime, your UDF code reads the model file from a local path that maps to the BucketFS location, deserializes it, and uses it for inference.
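Inside the UDF container, bucket contents appear under a local path of the form /buckets/<bucketfs-service>/<bucket>/<file>. The sketch below assumes a scikit-learn model pickled to model.pkl in the default bucket; the script name, path, and feature columns are illustrative:

CREATE OR REPLACE PYTHON3 SCALAR SCRIPT my_schema.score(f1 DOUBLE, f2 DOUBLE)
RETURNS DOUBLE AS

import pickle

# BucketFS file as seen from inside the UDF container (hypothetical path).
MODEL_PATH = "/buckets/bfsdefault/default/model.pkl"

# Module-level code runs once per VM, so the model is deserialized once
# and reused for every row that VM processes.
with open(MODEL_PATH, "rb") as f:
    model = pickle.load(f)

def run(ctx):
    return float(model.predict([[ctx.f1, ctx.f2]])[0])
/

Loading the model at module level rather than inside run() avoids deserializing it once per row.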

The Transformers Extension follows this pattern. You download a Hugging Face model into BucketFS using a utility UDF, and then run inference against that cached model without re-downloading it. The BucketFS connection object stores the endpoint URL, bucket name, and credentials. You create it once and reference it by name in your UDF calls.
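A minimal sketch of creating such a connection object follows; the exact address format the Transformers Extension expects is described in its documentation, so treat the TO string here as a placeholder:

CREATE OR REPLACE CONNECTION my_bfs_connection
    TO '<bucketfs-address>'
    USER '<bucketfs-user>'
    IDENTIFIED BY '<bucketfs-password>';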

For full SQL examples of downloading models and running inference, see Open source models (Hugging Face).

SCALAR vs SET UDFs for inference workloads

UDFs have two input modes that determine how the database feeds data to your code.

SCALAR UDFs process one row at a time. Exasol calls run() once per input row and distributes rows across multiple VMs on all cluster nodes. This gives you maximum parallelism and is the natural fit for row-level inference, where each row is an independent prediction. Most inference UDFs (classification, entity extraction, translation) use SCALAR mode.

SET UDFs receive a group of rows at once. Inside run(), you iterate through the group using the context object. With a GROUP BY clause, Exasol creates one VM per group and distributes groups across nodes. Without GROUP BY, all rows go to a single VM on a single node, which limits parallelism.

SET UDFs are useful when your model needs to see multiple rows together, for example when computing aggregate statistics, running batch normalization, or performing sequence-level operations where context spans rows.
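The following sketch shows the SET shape: the script receives a group of rows, iterates with ctx.next(), and emits one output row per group. All names are hypothetical, and the code assumes non-NULL values:

CREATE OR REPLACE PYTHON3 SET SCRIPT my_schema.group_mean(grp VARCHAR(100), val DOUBLE)
EMITS (grp VARCHAR(100), mean_val DOUBLE) AS

def run(ctx):
    group_name = ctx.grp        # constant within a group
    total, count = 0.0, 0
    while True:
        total += ctx.val
        count += 1
        if not ctx.next():      # advance to the next row; False at end of group
            break
    ctx.emit(group_name, total / count)
/

Called with a GROUP BY, each group is processed independently:

SELECT my_schema.group_mean(category, price)
FROM my_schema.products
GROUP BY category;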

For most inference tasks, prefer SCALAR UDFs. They parallelize naturally and keep your code simple.

When to use in-database vs external models

In-database models and external model serving (through self-hosted endpoints or cloud APIs) solve different problems. Choose based on your workload characteristics. For a full comparison of all three model connection paths, see the overview in Connect to AI models.

Use in-database models when:

  • You are running batch inference over large tables. Keeping data local avoids transfer latency.
  • Your model fits within the memory available to UDF VMs (limits depend on your cluster configuration).
  • You want a simple deployment with no external infrastructure to manage.
  • You need the inference to run within a single SQL transaction.

Use external models when:

  • You need LLM capabilities that smaller in-database models cannot provide.
  • Your model requires GPU acceleration that the database nodes do not provide.
  • You need to serve real-time, low-latency predictions to applications outside the database.
  • The model is too large to load efficiently in a UDF container.
  • You need model versioning, A/B testing, or other MLOps features that a serving platform provides.

In practice, most teams use a hybrid approach. They run batch scoring jobs in-database for analytics and reporting, while calling external models for tasks that require larger or more capable LLMs.