Metadata enrichment with the Metadata Agent
Learn how to generate rich metadata using the Metadata Agent in Exasol.
The Metadata Agent is an AI agent that analyzes your Exasol database and automatically generates metadata, including descriptions, relationships, and data quality annotations. It examines schema structures and sample data to produce documentation that would otherwise require manual effort from database administrators.
Rich metadata is not just useful for human readers. It directly improves the accuracy of AI features that depend on schema understanding, including text-to-SQL, MCP-based assistants, and RAG pipelines.
Why metadata matters for AI
When an LLM generates a SQL query from a natural language question, it needs to understand what each table and column represents. A column named rev_q3 is ambiguous. A column named rev_q3 with the comment Revenue in USD for Q3 of the fiscal year
is not.
The Metadata Agent addresses this gap by generating descriptions automatically. The result is a self-describing database
where every schema object has meaningful documentation that both humans and AI systems can use.
Specifically, better metadata improves:
- Text-to-SQL accuracy. LLMs generate more correct queries when they understand column semantics.
- MCP Server responses. The MCP Server exposes schema metadata to AI assistants. Richer metadata means better context for the assistant.
- Data discovery. Analysts spend less time asking colleagues what a table contains when the database documents itself.
How the Metadata Agent works
The Metadata Agent connects to your Exasol database, reads the schema catalog, and uses an LLM to generate descriptions for tables and columns. It writes these descriptions back to the database as comments.
You point the agent at a schema, and it produces metadata without manual intervention.
Prerequisites
- An Exasol database
- Python 3.8 or later
- Ollama installed and running with the Mistral model
- Network connectivity between the agent host and the Exasol database
Installation
Clone the repository and install the dependencies:
git clone https://github.com/exasol-labs/metadata-agent.git
cd metadata-agent
pip install -r requirements.txt
The agent uses Ollama with the Mistral model as its local LLM. Install and start Ollama, then pull the model:
ollama pull mistral
Running the agent
Set the required environment variables for your Exasol connection:
export EXASOL_DSN="<host>:<port>"
export EXASOL_USER="<username>"
export EXASOL_PASSWORD="<password>"
Then run the agent against your target schema. The agent provides several run modes:
- Default interactive mode: generates suggestions, lets you review, approve, and apply them
--auto-approve: automatically approve and apply all generated metadata--generate-only: generate suggestions without applying them--apply-only: apply previously generated suggestions--dry-run: preview what would be changed without modifying the database
The agent also includes a web UI powered by Gradio, accessible at http://localhost:7860, and a Jupyter tutorial (data_steward_tutorial.ipynb) for step-by-step walkthroughs.
For the full list of CLI commands and configuration options, see Metadata Agent on GitHub.
The agent processes each table in the schema:
- Reads the table structure (column names, types, constraints).
- Samples data from the table to understand content patterns.
- Sends the schema information and sample data to an LLM.
- Receives generated descriptions and writes them as database comments.
What the agent produces
The Metadata Agent generates several types of metadata:
Descriptions. Summaries for tables and columns explaining their contents and business purpose. For example, a table description might read: Customer order history including order dates, amounts, and fulfillment status.
A column comment might read: Total order amount in EUR, excluding VAT.
Relationship annotations. Notes about how tables relate to each other in the broader schema.
Data quality annotations. Observations about data quality issues discovered during analysis.
Using enriched metadata with other AI features
Once the Metadata Agent has documented your schema, other Exasol AI features benefit automatically.
MCP Server. When an AI assistant connects to Exasol through the MCP Server, it reads schema metadata as part of its context. Descriptions generated by the Metadata Agent appear in the assistant's schema understanding, leading to more accurate query generation and better conversational responses.
Text-to-SQL. Any text-to-SQL system that reads Exasol’s catalog comments will benefit from the enriched metadata. The LLM generating SQL has more context to disambiguate column names and understand table relationships.
AI-assisted querying. When AI assistants generate SQL against your database (using MCP or text-to-SQL), enriched metadata helps them produce more accurate queries by understanding what each table and column represents.
Practical considerations
Review generated metadata. The agent uses an LLM to generate descriptions, so the output may occasionally be inaccurate. Review the generated comments before relying on them for production AI workloads. Incorrect metadata can mislead downstream AI systems.
Schema size. Processing a large schema with hundreds of tables takes time and LLM API calls. Consider running the agent incrementally, starting with the most-queried or most-ambiguous tables.
Cost. Each table and column description requires LLM inference. If you use a commercial LLM API, factor in the token cost for processing your full schema.
Re-running after schema changes. When you add or modify tables, re-run the agent to keep metadata current. Stale descriptions are worse than no descriptions, because they mislead users and AI systems.
Reviewing generated metadata
After the agent runs, you can query the Exasol system catalog to see the generated comments:
SELECT
COLUMN_SCHEMA,
COLUMN_TABLE,
COLUMN_NAME,
COLUMN_COMMENT
FROM SYS.EXA_ALL_COLUMNS
WHERE COLUMN_SCHEMA = 'MY_SCHEMA'
AND COLUMN_COMMENT IS NOT NULL
ORDER BY COLUMN_TABLE, COLUMN_ORDINAL_POSITION;
For table-level comments:
SELECT
TABLE_SCHEMA,
TABLE_NAME,
TABLE_COMMENT
FROM SYS.EXA_ALL_TABLES
WHERE TABLE_SCHEMA = 'MY_SCHEMA'
AND TABLE_COMMENT IS NOT NULL
ORDER BY TABLE_NAME;
Review the output to confirm that descriptions are accurate before relying on them for downstream AI features.
Related resources
- Metadata Agent on GitHub for source code and current documentation
- Connect AI assistants (MCP Server) to see how enriched metadata improves MCP-based interactions