Train and Predict with R through UDFs

This section illustrates how to use R and Exasol, both separately and together, to run a machine learning algorithm such as Random Forest (RF) on test data. The purpose of the demo is to show how to use the Exasol R package and Exasol's user-defined functions (UDFs).

UDF scripts provide you with the ability to program your own analysis, processing, or generation functions, and execute them in parallel inside Exasol. See UDF Scripts for more information.

UDF scripts provide a flexible interface for implementing your requirements by integrating the Java, Lua, Python, and R languages into the native Exasol environment. However, you are not limited to creating UDFs from within Exasol. The Exasol R package also lets you work with UDFs through the exa.createScript() function, which deploys R code dynamically from any R environment into the Exasol database, where it runs in parallel. Run ?exa.createScript in R for more information.

The learning process starts with training a model on sample training data. The trained model can then be used to make predictions on separate (unseen) test data. In this case, the data comes from the Boston Housing dataset. This is a regression exercise because the response variable you are trying to predict, the median home value medv, is continuous.
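As a point of reference, the train-then-predict workflow described above can be sketched in plain R, without Exasol. This is only a minimal illustration, assuming the MASS package (which ships a Boston data frame containing medv) and the randomForest package are installed. The 80/20 split, the seed, and the number of trees are arbitrary choices made for this sketch and are not taken from the demo.

```r
# Minimal local sketch (no Exasol involved).
# Assumption: the MASS and randomForest packages are installed.
library(MASS)          # provides the Boston housing data frame
library(randomForest)  # provides randomForest() and its predict() method

set.seed(42)  # arbitrary seed so the split is reproducible

# Hold out 20% of the rows as unseen test data (split ratio is an assumption).
n <- nrow(Boston)
train_idx <- sample(n, size = floor(0.8 * n))
train <- Boston[train_idx, ]
test <- Boston[-train_idx, ]

# Train a regression forest: medv is the response, all other columns are predictors.
rf_model <- randomForest(medv ~ ., data = train, ntree = 500)

# Predict the median home value for the held-out rows.
pred <- predict(rf_model, newdata = test)

# Evaluate with root mean squared error on the test set.
rmse <- sqrt(mean((pred - test$medv)^2))
print(rmse)
```

Because medv is numeric, randomForest() fits a regression forest rather than a classifier, which matches the regression setting described above.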

Traditionally, you load the data from a local machine into R or RStudio. You then run the analysis in that environment by training the model, evaluating it, and making predictions. In this demo, however, you also work through some additional scenarios: