Exasol provides a package for R that you can use to connect to an Exasol database and then read and write data from/to the database from your R environment. The R package is optimized for reading data using
exa.readData() function and
exa.writeData() function to write the data into the database on a multi-node cluster.
This section explains how you can install and then use the R package with a sample dataset from Boston Housing.
To use the R package with Exasol, you need the followings:
- R programming language
- Exasol R Package
- An SQL client
- Exasol ODBC Driver
- A Running Exasol instance
Install developer extensions for R to build from sources:
- For Windows: Install Rtools matching your version of R from here.
- For Linux: Install the R-devel (RPM) or r-base-dev (Debian) package.
Always check the official R-project's download section for your OS to find the latest package.
Install ODBC Driver
Install Exasol ODBC driver by following the steps mentioned in ODBC Driver. After installation is complete, it's recommended to create a DSN pointing to your database instance. For more information about it, see README.txt file of the downloaded ODBC package.
On Linux, you also need the development files for ODBC. Therefore, install unixodbc-devel (RPM) or unixodbc-dev (Debian) package.
You can install the R Package using
devtools package. If
devtools package is not installed, run the following command:
Run the following command to install the Exasol R Package:
You can use the following methods to get help regarding the package:
- ?exasol: Shows package documentation with examples for each method.
- ?dbConnect: Shows documentation of individual commands with Exasol-specific parameters.
Connection to Database
You can connect with Exasol database via
dbConnect function. In the arguments of the
dbConnect function make sure to specify the correct
exahost. This is the IP address that connects to the database and might change in case of a virtual machine.
# Connect to Exasol DB with an ODBC DSN
exaconn <- dbConnect(
drv = "exa", # EXAdriver object
exahost = "192.168.56.103:8563", # IP of database cluster
uid = "sys", # Username
pwd = "exasol") # Password
# List all tables in Exasol (returns a character vector).
DBI package also has similar functions. If you use the
DBI package, you might need to call
dbConnect function by using
::: operation in R. For example,
You can write into Exasol database using the
exa.writeData function. This is a low level function and is faster than
dbWriteTable function provided by
Step 1: Load Library in R
Before exploring the arguments of
exa.writeData, load some data in R from the
mlbench library that contains the BostonHousing dataset.
#Add the mlbench library containing the Boston Housing dataset. This is used as an example dataset
# Load some data into R
Step 2: Create Schema
Exasol stores data in tables that are organized in schemas, therefore, you need to create a schema for the table you are going to work with.
To create a schema in the Exasol database in R, you can use the
RODBC package which implements ODBC database connectivity in R.
# Create an empty table in r_demo with the name `boston_housing` and specify
# column names and types as per Exasol standards.
"CREATE OR REPLACE TABLE boston_housing(
# Check again the list of tables contained in the Exasol database
Step 3: Write Data
The database now contains a new schema and table with its column names. To populate the columns with the actual data, exa.writeData is used. Specify the connection, source data set name, and the table name in the arguments of the function.
# Finally write the data into Exasol
exa.writeData(exaconn, data = BostonHousing, tableName = "boston_housing")
Step 4: Verify the Data (optional)
Check the newly created table in an SQL client of your choice.
You can read from Exasol using the exa.readTable function (it's faster than dbReadTable function). In the arguments, specify the connection and the query to be executed. This pulls the query's result over from the database into a data.frame.
# Read table from the Exasol database into the data.frame object called boston_housing_exasol
boston_housing_exasol = exa.readData(exaconn, "SELECT * FROM boston_housing")
You can view the same data that you transferred to the Exasol database.
bhplot = ggplot(boston_housing_exasol, aes(RM, MEDV)) +
geom_point(colour = "black", shape = 1) +
ggtitle("House price vs. # of rooms per house")
A plot of the MEDV (median house value) vs RM (rooms per house) is shown below.
When done, you can clean up and close the database connection.
# Optional: Drop schema and the table it contains
# odbcQuery(exaconn, "DROP SCHEMA r_demo CASCADE")# Disconnect
Train and Predict through UDFs
If you want to know how to use R and Exasol, both as standalone tools and combined, to run a machine learning algorithm such as Random Forests (RF) on some data, see Train and Predict with R through UDFs.