R Package
This section explains how to install and use the R package with a sample dataset.
Exasol provides a package for R that you can use to connect to an Exasol database and then read and write data from/to the database from your R environment. The R package is optimized for reading data using exa.readData()
function and exa.writeData()
function to write the data into the database on a multi-node cluster.
Prerequisites
To use the R package with Exasol, you need the following:
- R programming language
- RStudio
- Exasol R Package
- An SQL client
- Exasol ODBC Driver
- A running Exasol instance
Installation
Install Rtools
Install developer extensions for R to build from sources:
- For Windows: Install Rtools matching your version of R from here.
- For Linux: Install the R-devel (RPM) or r-base-dev (Debian) package.
Always check the official R-project's download section for your OS to find the latest package.
Install ODBC Driver
Install Exasol ODBC driver by following the steps mentioned in ODBC Driver. After installation is complete, it's recommended to create a DSN pointing to your database instance. For more information about it, see README.txt file of the downloaded ODBC package.
On Linux, you also need the development files for ODBC. Therefore, install unixodbc-devel (RPM) or unixodbc-dev (Debian) package.
Install devtools
You can install the R Package using devtools
package. If devtools
package is not installed, run the following command:
Install R Package
Run the following command to install the Exasol R Package:
Help
You can use the following commands in R to get help regarding the package:
- ?exasol: Shows package documentation with examples for each method.
- ?dbConnect: Shows documentation of individual commands with Exasol-specific parameters.
Connection to Database
You can connect with Exasol database via dbConnect
function. In the arguments of the dbConnect
function make sure to specify the correct exahost
. This is the IP address that connects to the database and might change in case of a virtual machine.
# Connect to Exasol DB with an ODBC DSN
exaconn <- dbConnect(
drv = "exa", # EXAdriver object
exahost = "192.168.56.103:8563", # IP of database cluster
uid = "sys", # Username
pwd = "exasol") # Password
# List all tables in Exasol (returns a character vector).
dbListTables(exaconn)
The DBI
package also has similar functions. If you use the DBI
package, you might need to call dbConnect
function by using :::
operation in R. For example, exasol:::dbConnect
.
Write Data
You can write into Exasol database using the exa.writeData
function. This is a low level function and is faster than dbWriteTable
function provided by DBI
.
Step 1: Load Library in R
Before exploring the arguments of exa.writeData
, load some data in R from the mlbench
library that contains the BostonHousing dataset.
#Add the mlbench library containing the Boston Housing dataset. This is used as an example dataset
library(mlbench)
# Load some data into R
data("BostonHousing")
Step 2: Create Schema and Tables
Exasol stores data in tables that are organized in schemas, therefore, you need to create a schema for the table you are going to work with.
To create a schema in the Exasol database in R, you can use the RODBC
package which implements ODBC database connectivity in R.
# Create an empty table in r_demo with the name `boston_housing` and specify
# column names and types as per Exasol standards.
odbcQuery(exaconn,
"CREATE OR REPLACE TABLE boston_housing(
crim DOUBLE,
zn DOUBLE,
indus DOUBLE,
chas VARCHAR(10),
nox DOUBLE,
rm DOUBLE,
age DOUBLE,
dis DOUBLE,
rad DOUBLE,
tax DOUBLE,
ptratio DOUBLE,
b DOUBLE,
lstat DOUBLE,
medv DOUBLE
)")
# Check again the list of tables contained in the Exasol database
dbListTables(exaconn)
Step 3: Write Data
The database now contains a new schema and table with its column names. To populate the columns with the actual data, exa.writeData is used. Specify the connection, source data set name, and the table name in the arguments of the function.
# Finally write the data into Exasol
exa.writeData(exaconn, data = BostonHousing, tableName = "boston_housing")
Step 4: Verify the Data (optional)
Check the newly created table in an SQL client of your choice.
Read Data
You can read from Exasol using the exa.readTable function (it's faster than dbReadTable function). In the arguments, specify the connection and the query to be executed. This pulls the query's result over from the database into a data.frame.
# Read table from the Exasol database into the data.frame object called boston_housing_exasol
boston_housing_exasol = exa.readData(exaconn, "SELECT * FROM boston_housing")
You can view the same data that you transferred to the Exasol database.
#Plotting library
library(ggplot2)
bhplot = ggplot(boston_housing_exasol, aes(RM, MEDV)) +
geom_point(colour = "black", shape = 1) +
ggtitle("House price vs. # of rooms per house")
plot(bhplot)
A plot of the MEDV (median house value) vs RM (rooms per house) is shown below.
Close Connection
When done, close the database connection.
Train and Predict through UDFs
If you want to know how to use R and Exasol, both as standalone tools and combined, to run a machine learning algorithm such as Random Forests (RF) on some data, see Train and Predict with R using UDFs.