MLflow model serving
Learn how to serve models as HTTP endpoints using the Exasol MLflow Server.
The Exasol MLflow Server lets you deploy Hugging Face models as HTTP endpoints and call them from Exasol UDFs or external applications. It provides an MLflow-compatible serving layer that sits between your database and your models, so you can run inference over HTTP without embedding model execution directly in UDF code.
How it works
The MLflow server loads a Hugging Face model and exposes it through an HTTP API. Your Exasol UDFs send inference requests to this endpoint and receive predictions back as HTTP responses. External applications outside of Exasol can also call the same endpoint.
This architecture separates model hosting from database execution. The model runs in its own process with dedicated resources (CPU or GPU), while Exasol handles the data retrieval and result processing. You can scale the model server independently of your database cluster.
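The request/response flow above can be sketched as a small Python client. The `/invocations` path and the `{"inputs": …}` / `{"predictions": …}` JSON shapes follow MLflow's standard scoring protocol; the host, port, and field names here are assumptions you should adapt to your deployment.

```python
import json
import urllib.request

MLFLOW_URL = "http://localhost:5000/invocations"  # assumed host and port

def build_payload(texts):
    """Build an MLflow-style scoring payload for a batch of input strings."""
    return {"inputs": texts}

def predict(texts, url=MLFLOW_URL, timeout=30):
    """POST a batch of texts to the serving endpoint and return its predictions."""
    data = json.dumps(build_payload(texts)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["predictions"]
```

Because the endpoint is plain HTTP, the same call works from a UDF, a notebook, or any other application that can reach the server.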
When to use MLflow model serving
Exasol offers two main approaches for running inference against ML models. Each fits different requirements.
| Approach | How it works | Best for |
|---|---|---|
| MLflow model serving | Models run as HTTP endpoints on a separate server | Teams already using MLflow; models that need dedicated GPU resources; serving a model to multiple consumers (Exasol + other apps) |
| Direct UDF inference (Transformers Extension) | Models run inside Exasol UDFs on the database nodes | Low-latency inference on data already in Exasol; simpler deployment with no external services |
Choose MLflow model serving when you want a single model endpoint that multiple systems can call, or when your team already manages models through MLflow's tracking and registry workflow. Choose direct UDF inference when you want the simplest setup and your inference workload runs entirely within Exasol. For a broader comparison of all model connection paths, see the introduction in Connect to AI models.
Prerequisites
- Python 3 (see Exasol MLflow Server on GitHub for the minimum supported version)
- An Exasol database instance (version 7.1 or later)
- Network connectivity between your Exasol cluster and the machine hosting the MLflow server
Set up the MLflow server
The following steps illustrate the general setup pattern. The repository README has the most current installation and configuration instructions.
Install the server
Clone the repository and install the dependencies:
```shell
git clone https://github.com/exasol-labs/exasol-labs-mlflow-server.git
cd exasol-labs-mlflow-server
pip install -r requirements.txt
```
Start the server
```shell
python -m exasol_mlflow_server --model <model-name> --host 0.0.0.0 --port 5000
```
Replace <model-name> with the Hugging Face model identifier you want to serve (for example, distilbert-base-uncased). For the full list of available startup flags, see Exasol MLflow Server on GitHub.
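Before wiring the endpoint into UDFs, it can help to smoke-test it from any machine that can reach the server. The sketch below assumes the endpoint follows MLflow's scoring protocol (a `predictions` field in the JSON response); the transport is injectable so the check itself is easy to unit-test without a running server.

```python
import json
import urllib.error
import urllib.request

def check_endpoint(url, sample_text, opener=None, timeout=10):
    """Send one sample input; return True if the server answers with predictions."""
    opener = opener or urllib.request.urlopen
    data = json.dumps({"inputs": [sample_text]}).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    try:
        with opener(req, timeout=timeout) as resp:
            body = json.loads(resp.read())
        # A well-formed scoring response carries a "predictions" field.
        return "predictions" in body
    except (urllib.error.URLError, ValueError):
        return False
```

Run it once with a representative input before pointing production UDFs at the endpoint.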
Call the endpoint from Exasol
Once the server is running, you can call it from a Python UDF in Exasol using HTTP requests. The following example illustrates the general pattern; the actual request format and endpoint path depend on the server's API.
```sql
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT MY_SCHEMA.PREDICT_VIA_MLFLOW(input_text VARCHAR(2000))
RETURNS VARCHAR(2000) AS
import requests

def run(ctx):
    # Send the input text to the MLflow serving endpoint.
    response = requests.post(
        'http://<mlflow-server-host>:5000/invocations',
        json={"inputs": [ctx.input_text]},
        headers={"Content-Type": "application/json"}
    )
    # Fail loudly on HTTP errors instead of parsing an error body.
    response.raise_for_status()
    # Return the first (and only) prediction for this row.
    return response.json()["predictions"][0]
/
```
Replace <mlflow-server-host> with the hostname or IP address of the machine running the MLflow server. The example assumes the requests library is available in the UDF's script language container.
You can then call this UDF from SQL:
```sql
SELECT MY_SCHEMA.PREDICT_VIA_MLFLOW(text_column)
FROM MY_SCHEMA.MY_TABLE;
```
Architecture considerations
**Resource isolation.** Because the model runs on a separate server, you avoid consuming CPU and memory on the Exasol database nodes for inference. This matters for large models or high-throughput inference workloads.
**Network latency.** Every inference call is an HTTP round trip. For bulk inference on millions of rows, the network overhead can add up. If latency is a concern, consider the Transformers Extension for in-database inference instead.
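One way to mitigate the round-trip cost is to batch rows before calling the endpoint, for example from a UDF that collects many rows per HTTP request. A sketch of the batching logic follows; the batch size is an assumption to tune, and `send` is a stand-in for the actual HTTP call.

```python
def batched(rows, batch_size=64):
    """Yield successive fixed-size batches from an iterable of rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly short, batch
        yield batch

def predict_in_batches(rows, send, batch_size=64):
    """Call `send` once per batch of rows instead of once per row.

    `send` stands in for the HTTP call, e.g.:
    lambda texts: requests.post(url, json={"inputs": texts}).json()["predictions"]
    """
    predictions = []
    for batch in batched(rows, batch_size):
        predictions.extend(send(batch))
    return predictions
```

With 64 rows per request instead of one, the number of HTTP round trips drops by the same factor, at the cost of holding a batch in memory inside the UDF.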
**Model lifecycle.** MLflow provides built-in model versioning, experiment tracking, and a model registry. If your team already uses these features, the MLflow server integrates naturally into that workflow.
Further reading
- Exasol MLflow Server on GitHub
- Exasol MLflow Benchmarks on GitHub for performance comparisons between in-database and external inference patterns
- MLflow documentation
- Hugging Face model hub