Text summarization

Learn how to run text summarization in an Exasol database by connecting it to a self-hosted LLM.

You can run text summarization entirely within your own infrastructure by connecting Exasol to a self-hosted LLM. This approach keeps your data on-premises, avoids per-token API charges, and gives you full control over model selection and performance tuning.

The pattern uses a Python UDF that sends text to a locally running model server (Ollama) and returns the summary as a SQL result. No data leaves your network.

How it works

The architecture has three components:

  1. Exasol database stores the text data and executes the UDF.
  2. Ollama runs on a local machine or server and hosts the LLM.
  3. A Python UDF bridges the two by making HTTP requests from inside the database to the Ollama API.

When you run a query that calls the summarization UDF, Exasol passes each row’s text to the UDF. The UDF sends that text to Ollama with a prompt instruction, receives the generated summary, and returns it as part of the SQL result set.

Prerequisites

  • An Exasol database (Docker or on-premises)
  • Ollama installed on a host accessible from the Exasol cluster
  • The Mistral 7B model (or another Ollama-compatible model) downloaded
  • Approximately 4 GB of disk space for the model

Set up Ollama

Install Ollama on the host (see the Ollama documentation for platform-specific instructions), then pull the Mistral model:

ollama pull mistral:latest
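
By default, Ollama listens only on localhost. If Exasol runs on a different host, configure Ollama to listen on all interfaces before starting it (how you set this depends on whether Ollama runs as a system service or is started directly):

OLLAMA_HOST=0.0.0.0 ollama serve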

Verify that the Ollama API is reachable by sending a test request from the machine running Exasol:

curl http://<OLLAMA_HOST_IP>:11434/api/generate \
  -d '{"model": "mistral:latest", "prompt": "Hello", "stream": false}'

You should receive a JSON response containing a response field.
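
An abridged example of what the response looks like (field values here are illustrative):

{
  "model": "mistral:latest",
  "response": "Hello! How can I help you today?",
  "done": true,
  "total_duration": 1534236901
}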

Create the connection object

Store the Ollama endpoint in a connection object so your UDFs can reference it by name:

CREATE OR REPLACE CONNECTION OLLAMA_CONNECTION
    TO 'http://<OLLAMA_HOST_IP>:11434/api/generate'
    IDENTIFIED BY '';

Replace <OLLAMA_HOST_IP> with the actual IP address of the host running Ollama.
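
Depending on your privilege setup, the user executing the UDF may also need the ACCESS privilege on the connection so the script can read its details. A sketch, assuming a user named MY_USER (adjust to your environment):

GRANT ACCESS ON CONNECTION OLLAMA_CONNECTION FOR SCHEMA my_schema TO MY_USER;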

Create the summarization UDF

This SCALAR UDF takes a text value, reads the Ollama endpoint from a connection object, sends the text to Ollama, and returns the summary:

CREATE OR REPLACE PYTHON3 SCALAR SCRIPT my_schema.summarize_article(
    article_text VARCHAR(20000)
)
RETURNS VARCHAR(2000) AS
import requests

def run(ctx):
    # Read the Ollama endpoint from the connection object instead of
    # hardcoding it in the script body.
    conn_info = exa.get_connection('OLLAMA_CONNECTION')
    ollama_url = conn_info.address

    text = ctx.article_text
    if text is None or text.strip() == '':
        return None
    try:
        payload = {
            'model': 'mistral:latest',
            'prompt': 'Summarize this news article in exactly one brief sentence: ' + text,
            'stream': False,
            # Low temperature for consistent output; cap generation at 50 tokens.
            'options': {'temperature': 0.3, 'num_predict': 50}
        }
        response = requests.post(
            ollama_url,
            json=payload,
            timeout=15
        )
        # Fail fast on HTTP errors instead of raising a KeyError below.
        response.raise_for_status()
        return response.json()['response'].strip()
    except Exception as e:
        # Return the error text so one failed row does not abort the query.
        return 'ERROR: ' + str(e)
/

The exa.get_connection() call retrieves the URL you stored in the OLLAMA_CONNECTION object, so you do not need to hardcode the Ollama IP in the script body.
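
The connection object also exposes user and password fields. If your Ollama endpoint sits behind a reverse proxy that requires a token (an assumption, not part of this setup), one option is to store the token in the connection's password and forward it as a header, replacing the requests.post call above:

# Hypothetical variant: forward a token stored in the connection object.
conn_info = exa.get_connection('OLLAMA_CONNECTION')
headers = {}
if conn_info.password:
    # Assumes the proxy expects a bearer token; adjust to your proxy.
    headers['Authorization'] = 'Bearer ' + conn_info.password
response = requests.post(conn_info.address, json=payload, headers=headers, timeout=15)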

The requests library is available in Exasol’s default script language container (SLC). If you use a custom SLC, verify that requests is included.

For a reusable version that takes the connection name as a parameter and also reports how long each summary took, use a SET UDF instead:

CREATE OR REPLACE PYTHON3 SET SCRIPT my_schema.summarize_with_timing(
    connection_name VARCHAR(200),
    heading VARCHAR(1000),
    article_text VARCHAR(20000)
)
EMITS (heading VARCHAR(1000), summary VARCHAR(2000), duration_seconds DOUBLE) AS
import requests

def run(ctx):
    # The connection name arrives as a parameter, so the script is not
    # tied to one specific connection object.
    conn_info = exa.get_connection(ctx.connection_name)
    ollama_url = conn_info.address

    # A SET UDF receives a whole group of rows; iterate them with ctx.next().
    while True:
        heading = ctx.heading
        text = ctx.article_text

        if text is None or text.strip() == '':
            ctx.emit(heading, None, None)
            if not ctx.next():
                break
            continue

        try:
            payload = {
                'model': 'mistral:latest',
                'prompt': 'Summarize this news article in exactly one brief sentence: ' + text,
                'stream': False,
                'options': {'temperature': 0.3, 'num_predict': 50}
            }
            response = requests.post(ollama_url, json=payload, timeout=15)
            response.raise_for_status()
            result = response.json()
            # Ollama reports total_duration in nanoseconds; convert to seconds.
            duration = float(result.get('total_duration', 0)) / 1000000000.0
            summary = result['response'].strip()
            ctx.emit(heading, summary, duration)
        except Exception as e:
            ctx.emit(heading, 'ERROR: ' + str(e), 0.0)

        if not ctx.next():
            break
/

Run summarization queries

Summarize articles using the scalar UDF:

SELECT heading, my_schema.summarize_article(article) AS summary
FROM my_schema.articles;
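
Each row triggers one model call, so it can help to test on a small subset first:

SELECT heading, my_schema.summarize_article(article) AS summary
FROM my_schema.articles
LIMIT 10;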

Or use the SET UDF with timing data:

SELECT my_schema.summarize_with_timing(
    'OLLAMA_CONNECTION',
    heading,
    article
)
FROM my_schema.articles
GROUP BY heading, article;

The GROUP BY clause controls how input rows are batched for a SET UDF. Exasol groups the rows according to the GROUP BY columns and passes each group to the UDF as a batch. Here, grouping by heading, article means each row is its own group, so the UDF processes one article at a time. If you omit GROUP BY, Exasol sends all rows as a single group, and the UDF processes them in one run() call.
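
To batch several rows into one run() call instead, group by a coarser key. A sketch, assuming the articles table had a category column (hypothetical):

SELECT my_schema.summarize_with_timing(
    'OLLAMA_CONNECTION',
    heading,
    article
)
FROM my_schema.articles
GROUP BY category;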

Performance expectations

With Mistral 7B on consumer hardware (CPU-only), expect approximately 1 to 3 seconds per article summary. Performance varies based on article length, hardware capabilities, and model load.
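
You can measure throughput on your own hardware with the timing UDF above, for example by averaging the reported durations:

SELECT AVG(duration_seconds) AS avg_seconds
FROM (
    SELECT my_schema.summarize_with_timing(
        'OLLAMA_CONNECTION', heading, article
    )
    FROM my_schema.articles
    GROUP BY heading, article
) t;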

Tuning options

Temperature controls output randomness. Use a low value (0.1 to 0.3) for consistent, focused summaries; higher values produce more varied output.

num_predict limits the number of generated tokens. For single-sentence summaries, 50 tokens is usually sufficient. Increase this if you need longer summaries.
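
For example, to generate longer and slightly more varied summaries, adjust the options in the UDF payload (the values here are illustrative starting points, not tuned recommendations):

payload = {
    'model': 'mistral:latest',
    'prompt': 'Summarize this news article in two to three sentences: ' + text,
    'stream': False,
    # Larger token budget for multi-sentence output, slightly higher temperature.
    'options': {'temperature': 0.5, 'num_predict': 150}
}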

Model selection affects both quality and speed. Mistral 7B offers a good balance of quality and speed with an Apache 2.0 license. Smaller models (such as Qwen 4B) are faster but may produce lower-quality summaries. Larger models improve quality at the cost of latency.

Quantized models (4-bit or 8-bit variants) reduce memory requirements and improve throughput. If your hardware is constrained, try a quantized version of your chosen model.
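
Ollama publishes quantized variants under model-specific tags. Tag names vary by model, so check the Ollama model library for what is available; for example:

ollama pull mistral:7b-instruct-q4_0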

Alternative approach: Transformers Extension

If you prefer to run models entirely inside the database without an external Ollama server, the Transformers Extension supports text generation using Hugging Face models stored in BucketFS. The Transformers Extension runs PyTorch-based models as UDFs, keeping the entire pipeline within Exasol. This approach is better suited for shorter text generation tasks and scenarios where you cannot run an external model server.

Next steps