Predict Through UDF in SQL

You can run the prediction model from Exasol native environment in the SQL development tool. The model can be deployed using a UDF which dynamically executes R code. It's similar to what you do in section Train and Predict with R through UDFs, except for the way you get the data from the database via the context object ctx. To know more about it, see R with UDF Scripts.

-- 1. Create prediction function
CREATE OR REPLACE R SET SCRIPT r_demo.predict3(...) EMITS (id INT, RealValue DOUBLE, Prediction DOUBLE) AS

# Load library
library(randomForest)

# Get number of columns
numCols <- exa$meta$input_column_count

# Create empty list with length numCols
l = vector("list", numCols)

run <- function(ctx) {

  # Load the model from the bucket
  rf_model = readRDS("/buckets/bucketfs1/bucket1/rf_model")

  # ...or depending on the file type
  #load("/buckets/bucketfs1/bucket1/rf_model.dat")

  # Split input into modest sized chunks, this can be adjusted for memory consumption
  repeat{
    if (!ctx$next_row(1000000))
        break

  # Populate list with data columns
  for (i in 1:numCols){

        l[[i]] <- ctx[[i]]()

  }

  # Convert to data.frame for modeling
  df = as.data.frame(l)

  # Set the column names

  # From the model object, note: model/scenario specific
  # the_colnames <- names(rf_model$forest$xlevels)
  # colnames(df) = the_colnames

  # Assign manually, note: not dynamic
  # Note: case sensitive when using double quotes
  colnames(df) <- c("id", "crim", "zn", "indus", "chas", "nox", "rm",
  "age", "dis", "rad", "tax", "ptratio", "b", "lstat", "medv", "split")

  # Get predictions
  prediction   <- predict(rf_model, newdata = df)

  # Output ids, real values and predicted ones
  ctx$emit(df$id, df$medv, prediction)
  }
}
/

-- 2. Run prediction function
SELECT r_demo.predict3(id,crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat, medv, split)
FROM   r_demo.boston_housing
WHERE  split = 'Test'
GROUP  BY IPROC()         -- Node number for data locality
        , MOD(IPROC(), 5) -- Number of processes per node
ORDER  BY id;

The UDFs you created are saved in your schema and can be accessed via an SQL client.