Open source models (Hugging Face)
Learn how to run pretrained models inside your database with the Exasol Transformers Extension.
The Exasol Transformers Extension lets you run pretrained Hugging Face models directly inside your Exasol database. You download models into BucketFS and call them through SQL UDFs, so your data never leaves the database for inference.
What the Transformers Extension does
The extension provides two categories of UDF scripts. Utility UDFs let you download, list, and delete Hugging Face models stored in BucketFS. Prediction UDFs run inference against those models on data in your Exasol tables.
The extension supports PyTorch-based models from the Hugging Face Hub. It requires Exasol DB version 7.1 or later and Python 3.10 or later.
Supported tasks
The Transformers Extension supports the following Hugging Face task types:
| Task type | What it does |
|---|---|
| sequence_classification | Classify text into categories using pretrained classifiers |
| zero_shot_classification | Classify text against arbitrary labels without task-specific training |
| token_classification | Extract named entities (persons, organizations, locations) from text |
| question_answering | Answer questions based on context passages stored in Exasol tables |
| text_generation | Generate text continuations from a prompt |
| filling_mask | Complete masked tokens in a sentence |
| translation | Translate text between languages |
Each task type has a corresponding prediction UDF:
| Task type | Prediction UDF |
|---|---|
| sequence_classification | AI_CLASSIFY_EXTENDED |
| zero_shot_classification | AI_CUSTOM_CLASSIFY_EXTENDED |
| token_classification | AI_EXTRACT_EXTENDED |
| question_answering | AI_ANSWER_EXTENDED |
| text_generation | AI_COMPLETE_EXTENDED |
| filling_mask | AI_FILL_MASK_EXTENDED |
| translation | AI_TRANSLATE_EXTENDED |
An additional AI_ENTAILMENT_EXTENDED UDF handles text comparison and entailment tasks.
Install and deploy
Install the Python package:
pip install exasol-transformers-extension
Deploy the extension to your Exasol database. This installs the script language container (SLC) and creates the UDF scripts:
python -m exasol_transformers_extension.deploy \
--dsn <host:port> \
--db-user <username> \
--db-pass <password> \
--schema <target_schema> \
--bucketfs-conn-name <connection_name> \
--token-conn-name <token_connection_name>
Key deployment options:
| Option | Default | Description |
|---|---|---|
| --dsn | (required) | Exasol database host and port |
| --db-user | (required) | Database username |
| --db-pass | (required) | Database password |
| --schema | (required) | Target schema for UDF scripts |
| --[no-]deploy-slc | True | Install the script language container |
| --[no-]deploy-scripts | True | Install UDF scripts |
| --bucketfs-conn-name | (none) | Name of the BucketFS connection object |
| --token-conn-name | (none) | Name of the Hugging Face token connection object |
For the full list of deployment options, run python -m exasol_transformers_extension.deploy --help or see the Exasol Transformers Extension GitHub repository.
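If you script deployments, the options above can be assembled into an argument list before handing them to a process runner. The following is a minimal sketch, not part of the extension: the helper name `deploy_command` and the `EXASOL_DB_PASS` environment variable are assumptions made for illustration, and only flags listed in the table above are used. It builds the command but does not run it.

```python
import os
import shlex
import sys


def deploy_command(dsn, db_user, db_pass, schema,
                   bucketfs_conn_name=None, token_conn_name=None,
                   deploy_slc=True, deploy_scripts=True):
    """Build the argument list for the deploy module (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "exasol_transformers_extension.deploy",
        "--dsn", dsn,
        "--db-user", db_user,
        "--db-pass", db_pass,
        "--schema", schema,
    ]
    # Optional connection names are only passed when they are set.
    if bucketfs_conn_name:
        cmd += ["--bucketfs-conn-name", bucketfs_conn_name]
    if token_conn_name:
        cmd += ["--token-conn-name", token_conn_name]
    # Both deploy steps default to True, so only the negated flags are added.
    if not deploy_slc:
        cmd.append("--no-deploy-slc")
    if not deploy_scripts:
        cmd.append("--no-deploy-scripts")
    return cmd


# Example: recreate the UDF scripts without reinstalling the SLC.
# EXASOL_DB_PASS is an assumed variable name, not one the extension reads.
cmd = deploy_command(
    dsn="<host:port>",
    db_user="<username>",
    db_pass=os.environ.get("EXASOL_DB_PASS", "<password>"),
    schema="<target_schema>",
    bucketfs_conn_name="MyBucketFSConnection",
    deploy_slc=False,
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would start the deployment. Building a list rather than a single string avoids shell quoting problems with passwords that contain special characters.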
Set up a BucketFS connection
Before you can download or use models, create a connection object that points to your BucketFS:
CREATE OR REPLACE CONNECTION "MyBucketFSConnection"
TO '{"url":"https://my_cluster:6583", "bucket_name":"default", "service_name":"bfsdefault"}'
USER '{"username":"w"}'
IDENTIFIED BY '{"password":"write-password"}';
Replace my_cluster, the bucket name, the service name, and the credentials with your actual BucketFS configuration.
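The three string values in the connection statement are JSON documents embedded in SQL string literals, so a stray quote in a password can break either layer. The sketch below shows one way to generate the statement safely. It assumes the field names from the example above; the helper names `sql_literal` and `bucketfs_connection_sql` are hypothetical and not part of the extension.

```python
import json


def sql_literal(value):
    """Quote a string as a SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def bucketfs_connection_sql(name, url, bucket_name, service_name,
                            username, password):
    """Build a CREATE CONNECTION statement in the layout shown above."""
    if '"' in name:
        raise ValueError("connection name must not contain double quotes")
    # json.dumps handles JSON escaping; sql_literal handles SQL escaping.
    address = json.dumps(
        {"url": url, "bucket_name": bucket_name, "service_name": service_name}
    )
    user = json.dumps({"username": username})
    secret = json.dumps({"password": password})
    return (
        f'CREATE OR REPLACE CONNECTION "{name}"\n'
        f"TO {sql_literal(address)}\n"
        f"USER {sql_literal(user)}\n"
        f"IDENTIFIED BY {sql_literal(secret)};"
    )


statement = bucketfs_connection_sql(
    "MyBucketFSConnection",
    "https://my_cluster:6583",
    "default",
    "bfsdefault",
    "w",
    "write-password",
)
print(statement)
```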
Download and manage models
Download a model from Hugging Face
Use the TE_MODEL_DOWNLOADER_UDF to download a model from the Hugging Face Hub into BucketFS:
SELECT TE_MODEL_DOWNLOADER_UDF(
'MyBucketFSConnection',
'my_models',
'bert-base-uncased',
'sequence_classification',
''
);
The parameters are:
- The BucketFS connection name
- A subfolder path in BucketFS where models are stored
- The Hugging Face model name
- The task type (one of: filling_mask, question_answering, sequence_classification, text_generation, token_classification, translation, zero_shot_classification)
- The Hugging Face token connection name (empty string if not needed)
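Because the task type must be one of a fixed set of values, a typo is easiest to catch before the statement reaches the database. The following sketch generates the downloader call and rejects unknown task types. The helper name `downloader_sql` is hypothetical; the parameter order and the list of task types are taken from this page.

```python
SUPPORTED_TASKS = {
    "filling_mask",
    "question_answering",
    "sequence_classification",
    "text_generation",
    "token_classification",
    "translation",
    "zero_shot_classification",
}


def sql_literal(value):
    """Quote a string as a SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def downloader_sql(connection, sub_dir, model_name, task_type,
                   token_connection=""):
    """Build a TE_MODEL_DOWNLOADER_UDF call with the five parameters above."""
    if task_type not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task type: {task_type!r}")
    args = [connection, sub_dir, model_name, task_type, token_connection]
    body = ",\n".join(sql_literal(arg) for arg in args)
    return f"SELECT TE_MODEL_DOWNLOADER_UDF(\n{body}\n);"


download = downloader_sql(
    "MyBucketFSConnection", "my_models",
    "bert-base-uncased", "sequence_classification",
)
print(download)
```

Leaving `token_connection` at its default produces the empty string used for public models.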
Access private or gated models
To download models that require authentication, create a connection object with your Hugging Face token and pass its name as the fifth parameter to the downloader UDF. Set the token connection name during deployment using the --token-conn-name option.
List downloaded models
To see which models are stored in BucketFS:
SELECT TE_LIST_MODELS_UDF(
'MyBucketFSConnection',
'my_models'
);
Delete a model
To remove a model from BucketFS:
SELECT TE_DELETE_MODEL_UDF(
'MyBucketFSConnection',
'my_models',
'bert-base-uncased',
'sequence_classification'
);
Offline model upload (air-gapped environments)
If your Exasol database does not have internet access, you can upload models using a command-line script instead of the downloader UDF. Download the model files on a machine with internet access, then upload them to BucketFS using the upload script included in the exasol-transformers-extension package.
For more information, see the Exasol Transformers Extension GitHub repository.
Run inference with SQL
Once a model is downloaded to BucketFS, you can run inference directly in SQL. Each prediction UDF takes the same first five parameters: a device ID (use NULL for default), the BucketFS connection name, the model subfolder, the model name, and the task type. Additional parameters vary by task.
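Since the first five parameters are shared, the call for any task can be generated from the task type alone plus the task-specific arguments. The sketch below illustrates that pattern under the assumptions stated on this page: the task-to-UDF mapping is copied from the table above, and the helper name `prediction_sql` is hypothetical. Task-specific arguments and the table name are inserted unquoted, so only pass trusted column names or expressions.

```python
# Task type -> prediction UDF, copied from the mapping table on this page.
PREDICTION_UDFS = {
    "sequence_classification": "AI_CLASSIFY_EXTENDED",
    "zero_shot_classification": "AI_CUSTOM_CLASSIFY_EXTENDED",
    "token_classification": "AI_EXTRACT_EXTENDED",
    "question_answering": "AI_ANSWER_EXTENDED",
    "text_generation": "AI_COMPLETE_EXTENDED",
    "filling_mask": "AI_FILL_MASK_EXTENDED",
    "translation": "AI_TRANSLATE_EXTENDED",
}


def sql_literal(value):
    """Quote a string as a SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def prediction_sql(task_type, connection, sub_dir, model_name,
                   table, task_args, device_id=None):
    """Build a prediction query: five shared parameters, then task_args."""
    if task_type not in PREDICTION_UDFS:
        raise ValueError(f"unsupported task type: {task_type!r}")
    shared = [
        "NULL" if device_id is None else str(int(device_id)),
        sql_literal(connection),
        sql_literal(sub_dir),
        sql_literal(model_name),
        sql_literal(task_type),
    ]
    # task_args are column names or SQL expressions, inserted as-is.
    body = ",\n".join(shared + list(task_args))
    return f"SELECT {PREDICTION_UDFS[task_type]}(\n{body}\n) FROM {table};"


query = prediction_sql(
    "question_answering",
    "MyBucketFSConnection",
    "my_models",
    "deepset/roberta-base-squad2",
    "my_schema.my_table",
    ["question_column", "context_column"],
)
print(query)
```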
Text classification
Classify text using a pretrained sequence classification model:
SELECT AI_CLASSIFY_EXTENDED(
NULL,
'MyBucketFSConnection',
'my_models',
'bert-base-uncased',
'sequence_classification',
text_column
) FROM my_schema.my_table;
Named entity recognition
Extract entities from text using a token classification model:
SELECT AI_EXTRACT_EXTENDED(
NULL,
'MyBucketFSConnection',
'my_models',
'dslim/bert-base-NER',
'token_classification',
text_column
) FROM my_schema.my_table;
Question answering
Answer questions based on context passages:
SELECT AI_ANSWER_EXTENDED(
NULL,
'MyBucketFSConnection',
'my_models',
'deepset/roberta-base-squad2',
'question_answering',
question_column,
context_column
) FROM my_schema.my_table;
Text generation
Generate text continuations from a prompt:
SELECT AI_COMPLETE_EXTENDED(
NULL,
'MyBucketFSConnection',
'my_models',
'gpt2',
'text_generation',
prompt_column
) FROM my_schema.my_table;
Translation
Translate text between languages:
SELECT AI_TRANSLATE_EXTENDED(
NULL,
'MyBucketFSConnection',
'my_models',
'Helsinki-NLP/opus-mt-en-de',
'translation',
text_column
) FROM my_schema.my_table;
Fill mask
Complete masked tokens in a sentence using a masked language model:
SELECT AI_FILL_MASK_EXTENDED(
NULL,
'MyBucketFSConnection',
'my_models',
'bert-base-uncased',
'filling_mask',
text_column
) FROM my_schema.my_table;
Zero-shot classification
Classify text against arbitrary labels without task-specific fine-tuning. Zero-shot classification uses the AI_CUSTOM_CLASSIFY_EXTENDED UDF:
SELECT AI_CUSTOM_CLASSIFY_EXTENDED(
NULL,
'MyBucketFSConnection',
'my_models',
'facebook/bart-large-mnli',
'zero_shot_classification',
text_column
) FROM my_schema.my_table;
Model storage in BucketFS
All downloaded models are stored in Exasol’s BucketFS, a distributed file system built into the database cluster. A downloaded model stays in BucketFS until you delete it, so later inference calls load it from there instead of fetching it from the Hugging Face Hub again.
Models are organized by the subfolder path you specify when downloading (the second parameter to TE_MODEL_DOWNLOADER_UDF). Use the TE_LIST_MODELS_UDF to see what is currently stored and TE_DELETE_MODEL_UDF to remove models you no longer need.