Predict Through UDF in SQL
This section explains how to run the prediction model from the Exasol native environment in the SQL development tool.
The model can be deployed using a UDF that dynamically executes R code. This is similar to what is described in Train and Predict in Exasol, except for the way you get the data from the database via the context object ctx
.
For more details on R UDFs, see R.
-- 1. Create prediction function
CREATE OR REPLACE R SET SCRIPT r_demo.predict3(...) EMITS (id INT, RealValue DOUBLE, Prediction DOUBLE) AS
# Load library
library(randomForest)
# Get number of columns
numCols <- exa$meta$input_column_count
# Create empty list with length numCols
l = vector("list", numCols)
run <- function(ctx) {
# Load the model from the bucket
rf_model = readRDS("/buckets/bucketfs1/bucket1/rf_model")
# ...or depending on the file type
#load("/buckets/bucketfs1/bucket1/rf_model.dat")
# Split input into modest sized chunks, this can be adjusted for memory consumption
repeat{
if (!ctx$next_row(1000000))
break
# Populate list with data columns
for (i in 1:numCols){
l[[i]] <- ctx[[i]]()
}
# Convert to data.frame for modeling
df = as.data.frame(l)
# Set the column names
# From the model object, note: model/scenario specific
# the_colnames <- names(rf_model$forest$xlevels)
# colnames(df) = the_colnames
# Assign manually, note: not dynamic
# Note: case sensitive when using double quotes
colnames(df) <- c("id", "crim", "zn", "indus", "chas", "nox", "rm",
"age", "dis", "rad", "tax", "ptratio", "b", "lstat", "medv", "split")
# Get predictions
prediction <- predict(rf_model, newdata = df)
# Output ids, real values and predicted ones
ctx$emit(df$id, df$medv, prediction)
}
}
/
-- 2. Run prediction function
SELECT r_demo.predict3(id,crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat, medv, split)
FROM r_demo.boston_housing
WHERE split = 'Test'
GROUP BY IPROC() -- Node number for data locality
, MOD(ROWNUM, 5) -- Number of processes per node
ORDER BY id;
The UDFs you created are saved in your schema and can be accessed using an SQL client.