Apache Spark Integration

This section describes how to integrate Apache Spark with Exasol. The integration is an open-source project officially supported by Exasol.

Apache Spark Exasol Connector

The Spark Exasol Connector is an open-source project that provides an integration between Apache Spark and Exasol. You can use this connector in your Spark application to create Spark dataframes from Exasol queries and to save dataframes as Exasol tables.

Prerequisites

To integrate the Spark application with Exasol, you need the following:

  • An operational Spark cluster
  • An operational Exasol cluster
  • Enough resources in the Spark cluster to start at least as many executors as there are Exasol data nodes
  • Access to the Exasol nodes from the Spark cluster on port 8563 and on the port range 20000-21000
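Before submitting a job, you can verify from a Spark driver or worker machine that the Exasol nodes are reachable on the required ports. The following is a minimal sketch using only the Java standard library; the host address is a placeholder for one of your Exasol data nodes.

```scala
import java.net.{InetSocketAddress, Socket}

// Returns true if a TCP connection to host:port succeeds within timeoutMs.
def isReachable(host: String, port: Int, timeoutMs: Int): Boolean = {
  val socket = new Socket()
  try {
    socket.connect(new InetSocketAddress(host, port), timeoutMs)
    true
  } catch {
    case _: Exception => false
  } finally {
    socket.close()
  }
}

// Example: check the Exasol database port (8563) on a hypothetical data node.
val ok = isReachable("10.0.0.11", 8563, 2000)
println(s"Port 8563 reachable: $ok")
```

Repeat the check for a few ports in the 20000-21000 range to confirm that the data-transfer ports are open as well.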

Setup

You can use one of the following methods to include the Spark Exasol Connector as a dependency in your Spark application:

build.sbt

resolvers ++= Seq("Exasol Releases" at "https://maven.exasol.com/artifactory/exasol-releases")

libraryDependencies += "com.exasol" %% "spark-connector" % "<LATEST_VERSION>"

maven.pom

<repository>
    <id>maven.exasol.com</id>
    <url>https://maven.exasol.com/artifactory/exasol-releases</url>
</repository>

<dependency>
    <groupId>com.exasol</groupId>
    <artifactId>spark-connector_2.11</artifactId>
    <version><LATEST_VERSION></version>
</dependency>

spark-shell

spark-shell \
    --repositories https://maven.exasol.com/artifactory/exasol-releases \
    --packages com.exasol:spark-connector_2.11:<LATEST_VERSION>

spark-submit

spark-submit \
    --master spark://spark-master-url:7077 \
    --repositories https://maven.exasol.com/artifactory/exasol-releases \
    --packages com.exasol:spark-connector_2.11:<LATEST_VERSION> \
    --class com.myorg.MySparkClass \
    --conf spark.exasol.password=exaTru3P@ss \
    path/to/project/folder/jars/spark-exasol-connector-<LATEST_VERSION>.jar

Examples

The following example shows how to use the connector in a Spark/Scala application.

// An Exasol SQL query string
val exasolQueryString = """
    SELECT SALES_DATE, MARKET_ID, PRICE
    FROM RETAIL.SALES
    WHERE MARKET_ID IN (661, 534, 667)
"""

// Creates a dataframe from the given query
val df = sparkSession
     .read
     .format("exasol")
     .option("host", "10.0.0.11")
     .option("port", "8563")
     .option("username", "sys")
     .option("password", "exaPass")
     .option("query", exasolQueryString)
     .load()


df.collect().foreach(println)

// Saves the dataframe as an Exasol table
df.write
     .mode("append")
     .option("host", "10.0.0.11")
     .option("port", "8563")
     .option("username", "sys")
     .option("password", "exaPass")
     .option("table", "RETAIL.ADJUSTED_SALES")
     .format("exasol")
     .save()

For more usage examples, see the Spark Exasol Connector repository on GitHub.

Contribute to the Project

Exasol encourages contributions to this open-source project. To learn how to contribute, see Contributing.