Train and Predict with R using UDFs

This section describes how to train a model and make predictions using R with Exasol user-defined functions (UDFs).

The topics in this section provide a hands-on demonstration of how to use R and Exasol, both as standalone tools and combined, to run a machine learning algorithm such as Random Forest on test data. The purpose of the demo is to show you how to use the Exasol R package and Exasol’s UDFs.

UDF scripts provide a flexible interface for implementing your requirements by integrating the Java, Lua, Python, and R languages in a native Exasol environment. With UDF scripts you can program your own analysis, processing, or generation functions and execute them in parallel inside Exasol. However, you are not limited to creating UDFs from within Exasol. The Exasol R package also lets you create UDFs through the exa.createScript() function, which deploys R code dynamically from any R environment into an Exasol database. For more information about this function, run ?exa.createScript in R or RStudio.

For more information about how to use UDFs, see UDF Scripts.
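As a rough illustration of deploying R code with exa.createScript(), the following sketch wraps the call in a helper function. It assumes that the exasol package is installed and that conn is an open connection to an Exasol database. The helper name deploy_mean_udf, the script name test.mymean, and the column definitions are hypothetical placeholders, and the argument names follow the package's published example, so verify them with ?exa.createScript before relying on them.

```r
# Hypothetical helper: deploys a small R UDF through an existing connection.
# Assumptions: `conn` is an open connection to Exasol; the script name
# "test.mymean" and the column definitions are placeholders for illustration.
deploy_mean_udf <- function(conn) {
  exasol::exa.createScript(
    conn,
    "test.mymean",
    function(data) {
      # Read all rows of the current group into memory
      data$next_row(NA)
      # Emit one row per group: the group id and the mean of its values
      data$emit(data$groupid[[1]], mean(data$val))
    },
    inArgs  = c("groupid INT", "val DOUBLE"),
    outArgs = c("groupid INT", "mean DOUBLE")
  )
}
```

Defining the helper does not contact the database; the UDF is only created when deploy_mean_udf(conn) is called with a live connection.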

The learning process starts with training a model on sample training data. The trained model can then be used to make predictions on separate (unseen) test data. The data in this case comes from the Boston Housing dataset. This is a regression exercise because the response variable to predict, medv (the median value of housing), is continuous.
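The train-then-predict workflow described above can be sketched in plain R as follows. This is a minimal local example and not the demo's own code: it assumes the MASS package (which ships a copy of the Boston data as the Boston data frame) and the randomForest package are installed, and the 80/20 split, the seed, and ntree = 500 are arbitrary choices made for illustration.

```r
# Sketch: train a Random Forest regressor locally and predict medv on held-out rows.
# Assumptions: the MASS and randomForest packages are installed.
library(MASS)          # provides the Boston data frame; medv is the response
library(randomForest)

set.seed(42)           # make the split and the forest reproducible

# Hold out roughly 20% of the rows as unseen test data (ratio is an assumption)
n <- nrow(Boston)
test_idx <- sample(n, size = round(0.2 * n))
train <- Boston[-test_idx, ]
test  <- Boston[test_idx, ]

# medv is numeric, so randomForest fits a regression forest
rf_model <- randomForest(medv ~ ., data = train, ntree = 500)

# Predict on the held-out rows and compute the root mean squared error
pred <- predict(rf_model, newdata = test)
rmse <- sqrt(mean((pred - test$medv)^2))
print(rmse)
```

A lower RMSE on the held-out rows indicates that the model generalizes better to data it was not trained on.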

Traditionally, you load the data from a local machine into R or RStudio, and then train the model, evaluate it, and make predictions in that environment. This demo also covers some additional scenarios: