Sales forecasting

This tutorial teaches you how to build a predictive model with Exasol AI Lab.

The sales forecasting example keeps the full workflow inside the database: you load historical sales data into Exasol, train a forecasting model in a Jupyter notebook, and generate predictions, all without moving the data out of Exasol.

This tutorial uses the pre-built sales forecasting notebooks included with Exasol AI Lab.

What you will build

The example trains a model on historical store sales data to predict future revenue. It covers the full ML workflow:

  1. Load and prepare sales data in Exasol.
  2. Explore the data to understand patterns and seasonality.
  3. Train a forecasting model using Python libraries.
  4. Generate predictions and evaluate model accuracy.

The notebooks handle each step with a mix of SQL (for data operations in Exasol) and Python (for model training and evaluation).

Prerequisites

  • Exasol AI Lab installed and running (see Set up Exasol AI Lab)
  • An Exasol database instance (the Docker DB bundled with Exasol AI Lab works, or use an on-premises instance)
  • The initial Exasol AI Lab configuration completed (the main_config notebook)

Set up the example

Configure the database connection

Open the main_config notebook in Exasol AI Lab’s JupyterLab interface and run all cells. Select whether you are using the Docker DB that comes with Exasol AI Lab or an on-premises Exasol instance. The final cell creates the schema needed for the example.

Install the demo files

Exasol AI Lab ships with example notebooks. After launching Exasol AI Lab, check the JupyterLab file browser for a sales forecasting folder or demo package. If the examples are distributed as a .zip file, extract them into a sales_forecasting folder in Exasol AI Lab’s root directory. You can extract from within JupyterLab:

import zipfile

# Extract the demo notebooks into a sales_forecasting folder
with zipfile.ZipFile('sales_forecasting.zip', 'r') as zip_ref:
    zip_ref.extractall('sales_forecasting')

The exact file name and location may vary by Exasol AI Lab version. Consult the Exasol AI Lab repository or the first_steps.ipynb notebook for pointers to the current demo packages.

Run the example

Step 1: Load the data

Look for a data-loading notebook (commonly named something like export_store_data) in the sales forecasting folder. This notebook loads the sample sales dataset into your Exasol database. It creates the necessary tables and imports historical transaction records.

Run all cells in order. When complete, you should have a populated sales data table in your Exasol schema.
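
To confirm the load succeeded, you can query the new table directly from a notebook cell. The following is a minimal sketch using pyexasol; the connection details and the MY_SCHEMA.SALES name are placeholders, so substitute the schema and table your loader notebook actually created.

import pyexasol

# Connection details are placeholders; match them to your main_config setup
conn = pyexasol.connect(dsn='localhost:8563', user='sys', password='exasol')

# Count the rows the loader notebook imported (table name is a placeholder)
row_count = conn.execute('SELECT COUNT(*) FROM MY_SCHEMA.SALES').fetchval()
print(f'Loaded {row_count} rows')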

Step 2: Train and predict

Open the main sales forecasting notebook. This notebook walks through:

Data exploration. SQL queries examine the sales data: total revenue by store, seasonal patterns, trends over time. Understanding these patterns helps you choose appropriate model features.
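
As an illustration of this step, an exploration cell might aggregate revenue per store and month inside Exasol and pull only the small result into pandas. The sketch below assumes a placeholder MY_SCHEMA.SALES table with STORE_ID, SALE_DATE, and AMOUNT columns; the actual notebook queries will differ.

import pyexasol

conn = pyexasol.connect(dsn='localhost:8563', user='sys', password='exasol')

# Aggregate in Exasol; only the per-store monthly totals reach the notebook
revenue = conn.export_to_pandas("""
    SELECT store_id,
           TRUNC(sale_date, 'MM') AS sales_month,
           SUM(amount)            AS monthly_revenue
    FROM my_schema.sales
    GROUP BY store_id, TRUNC(sale_date, 'MM')
    ORDER BY store_id, sales_month
""")
print(revenue.head())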

Feature engineering. The notebook prepares the data for model training by extracting time-based features (day of week, month, holiday indicators) and aggregating sales at the appropriate granularity.
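
For example, once the aggregated data is in a DataFrame, calendar features can be derived in pandas. The column names below are placeholders, and a real holiday calendar would replace the single hard-coded date.

import pandas as pd

# Placeholder data: one row per day, with a SALE_DATE column
df = pd.DataFrame({
    'SALE_DATE': pd.to_datetime(['2024-01-05', '2024-01-06', '2024-12-25']),
    'AMOUNT': [1200.0, 1550.0, 300.0],
})

# Extract calendar features the model can use to learn seasonality
df['DAY_OF_WEEK'] = df['SALE_DATE'].dt.dayofweek    # 0 = Monday
df['MONTH'] = df['SALE_DATE'].dt.month
df['IS_HOLIDAY'] = df['SALE_DATE'].isin(
    pd.to_datetime(['2024-12-25'])                  # your holiday calendar here
)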

Model training. A forecasting model is trained on the historical data. The notebook uses Python ML libraries available in the Exasol AI Lab environment.

The specific model type (ARIMA, Prophet, gradient boosting, or another approach) depends on the notebook version. The notebooks may offer multiple approaches or let you choose.
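
As one concrete possibility, the sketch below trains a gradient boosting regressor with scikit-learn on the placeholder features from the previous sketch. It assumes scikit-learn is available in your Exasol AI Lab environment and is not necessarily the approach your notebook version takes.

from sklearn.ensemble import GradientBoostingRegressor

# Placeholder feature matrix and target; the notebook defines the real ones
feature_cols = ['DAY_OF_WEEK', 'MONTH', 'IS_HOLIDAY']
model = GradientBoostingRegressor(random_state=42)
model.fit(df[feature_cols], df['AMOUNT'])   # df from the feature sketch above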

Prediction and evaluation. The trained model generates predictions for future periods. The notebook compares predictions against held-out test data and reports accuracy metrics.
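
Continuing the sketch above, a time-based holdout keeps the most recent periods out of training and scores the model's predictions against them; mean absolute error is one common metric.

from sklearn.metrics import mean_absolute_error

# Hold out the most recent rows as a test period (placeholder split)
train, test = df.iloc[:-1], df.iloc[-1:]
model.fit(train[feature_cols], train['AMOUNT'])

predictions = model.predict(test[feature_cols])
print('MAE:', mean_absolute_error(test['AMOUNT'], predictions))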

Run the cells sequentially. Each cell builds on the output of the previous one.

How Exasol supports the workflow

The sales forecasting example illustrates a common pattern for ML workloads on Exasol.

Data stays in the database. The training data lives in Exasol tables. The notebook connects to Exasol using pyexasol (included in Exasol AI Lab) and runs SQL queries to retrieve data for analysis and training. Large datasets do not need to be exported to files.

SQL for data preparation. Aggregations, joins, and filtering happen in Exasol using SQL, where the MPP engine can process them in parallel. Only the prepared feature set is pulled into Python for model training.
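
A sketch of the pattern, again with placeholder names (the my_schema.stores dimension table is hypothetical): the join and aggregation run inside Exasol, and only the prepared feature rows cross the wire.

import pyexasol

conn = pyexasol.connect(dsn='localhost:8563', user='sys', password='exasol')

# Join and aggregate in Exasol; fetch only the prepared feature set
features = conn.export_to_pandas("""
    SELECT s.store_id,
           st.region,
           TRUNC(s.sale_date, 'MM') AS sales_month,
           SUM(s.amount)            AS monthly_revenue
    FROM my_schema.sales s
    JOIN my_schema.stores st ON st.store_id = s.store_id
    GROUP BY s.store_id, st.region, TRUNC(s.sale_date, 'MM')
""")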

Notebook for ML logic. Model training, hyperparameter tuning, and evaluation run in the Jupyter environment where Python ML libraries are available. This separation lets each component do what it does best: Exasol for data processing, Python for model logic.

Adapting the example to your data

To use this pattern with your own sales data:

  1. Prepare your data in Exasol. Load your historical sales records into an Exasol table with columns for date, sales amount, and any relevant dimensions (store, product category, region), as sketched after this list.
  2. Adjust the SQL queries. Modify the data exploration and feature engineering queries to match your table structure and column names.
  3. Tune the model. Experiment with different feature sets, training periods, and model parameters to fit your specific forecasting problem.
  4. Schedule retraining. As new sales data accumulates, periodically retrain the model to keep predictions accurate.
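
For step 1, here is a minimal loading sketch using pyexasol's import_from_pandas. The schema, table, and column names are placeholders; adjust them to your own dimensions.

import pandas as pd
import pyexasol

# Setting the schema here lets the statements below use bare table names
conn = pyexasol.connect(dsn='localhost:8563', user='sys',
                        password='exasol', schema='MY_SCHEMA')

# Placeholder target table; add columns for your own dimensions
conn.execute("""
    CREATE TABLE IF NOT EXISTS sales (
        sale_date DATE,
        store_id  VARCHAR(20),
        amount    DECIMAL(18,2)
    )
""")

# df_sales stands in for your own historical records
df_sales = pd.DataFrame({
    'sale_date': pd.to_datetime(['2024-01-05']),
    'store_id': ['S001'],
    'amount': [1200.50],
})
conn.import_from_pandas(df_sales, 'SALES')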

Next steps