Train and Predict with R using UDFs

This section illustrates how to use R and Exasol, both standalone and in combination, to run a machine learning algorithm such as Random Forest on sample data. The purpose of the demo is to show how to use the Exasol R package and Exasol user-defined functions (UDFs).

UDF scripts let you write your own analysis, processing, or data-generation functions and execute them in parallel inside Exasol. See UDF Scripts for more information.

UDF scripts offer a flexible interface for implementing your requirements in Java, Lua, Python, or R within Exasol's native environment. However, you are not limited to creating UDFs from inside Exasol. The Exasol R package also lets you create UDFs with the exa.createScript() function, which deploys R code dynamically from any R environment into an Exasol database. Run ?exa.createScript in RStudio for more information about this function.

The learning process starts by training a model on a sample of training data. The trained model is then used to make predictions on separate, unseen test data. The data in this demo comes from the Boston Housing dataset. This is a regression exercise because the response variable we want to predict, the median value of owner-occupied homes (medv), is continuous.
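The train/predict cycle described above can be sketched entirely in local R before any Exasol component is involved. This is a minimal sketch, not the demo's own code: it assumes the Boston data frame shipped with the MASS package and the CRAN package randomForest are installed, and the 70/30 split, the seed, and ntree = 500 are illustrative choices.

```r
# Minimal local sketch (assumption: MASS and randomForest are installed;
# split ratio, seed and ntree are illustrative, not taken from the demo)
library(MASS)          # ships the Boston housing data frame
library(randomForest)  # CRAN package providing randomForest()

data(Boston)
set.seed(42)           # make the random split and the forest reproducible

# Hold out 30% of the rows as unseen test data
train_idx <- sample(nrow(Boston), size = round(0.7 * nrow(Boston)))
train <- Boston[train_idx, ]
test  <- Boston[-train_idx, ]

# Regression forest: medv (continuous) explained by all other columns
rf <- randomForest(medv ~ ., data = train, ntree = 500)

# Predict on the held-out rows and report the root mean squared error
pred <- predict(rf, newdata = test)
rmse <- sqrt(mean((test$medv - pred)^2))
print(rmse)
```

Because medv is numeric, randomForest() fits a regression forest rather than a classifier; printing rf additionally shows the out-of-bag error estimate.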

Traditionally, you load the data from a local machine into R or RStudio and then train the model, evaluate it, and make predictions in that same environment. In this demo, however, you also use the following additional scenarios: