Apache Spark Integration
This section explains how to integrate Apache Spark with Exasol using the Spark Exasol Connector, an open-source project officially supported by Exasol.
Apache Spark Exasol Connector
The Spark Exasol Connector is an open-source project that integrates Apache Spark with Exasol. You can use it in your Spark applications to create Spark DataFrames from Exasol queries and to save DataFrames as Exasol tables.
Prerequisites
To integrate the Spark application with Exasol, you need the following:
- An operational Spark cluster
- An operational Exasol cluster
- Enough resources in the Spark cluster to start at least as many executors as there are Exasol data nodes
- Access to the Exasol nodes from the Spark cluster on port 8563 and on the port range 20000-21000
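Before submitting a job, you can verify the network prerequisite from a Spark node. The following is a minimal sketch using only the JVM standard library; the helper name `isReachable` and the host `10.0.0.11` are illustrative placeholders, not part of the connector:

```scala
import java.net.{InetSocketAddress, Socket}

// Assumed helper (not part of the connector): returns true if a TCP
// connection to host:port succeeds within the given timeout.
def isReachable(host: String, port: Int, timeoutMillis: Int): Boolean = {
  val socket = new Socket()
  try {
    socket.connect(new InetSocketAddress(host, port), timeoutMillis)
    true
  } catch {
    case _: Exception => false
  } finally {
    socket.close()
  }
}

// Check the JDBC port and the start of the data-transfer port range
// against one of your Exasol data nodes (replace 10.0.0.11).
Seq(8563, 20000).foreach { p =>
  println(s"10.0.0.11:$p reachable: ${isReachable("10.0.0.11", p, 2000)}")
}
```

Run this check from each Spark worker host, since executors on any node may open connections to the Exasol data nodes.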
Setup
You can use one of the following methods to include the Spark Exasol Connector as a dependency in your Spark application:
build.sbt
resolvers ++= Seq("Exasol Releases" at "https://maven.exasol.com/artifactory/exasol-releases")
libraryDependencies += "com.exasol" %% "spark-connector" % "<LATEST_VERSION>"
maven.pom
<repository>
<id>maven.exasol.com</id>
<url>https://maven.exasol.com/artifactory/exasol-releases</url>
</repository>
<dependency>
<groupId>com.exasol</groupId>
<artifactId>spark-connector_2.11</artifactId>
<version><LATEST_VERSION></version>
</dependency>
spark-shell
spark-shell \
--repositories https://maven.exasol.com/artifactory/exasol-releases \
--packages com.exasol:spark-connector_2.11:<LATEST_VERSION>
spark-submit
spark-submit \
--master spark://spark-master-url:7077 \
--repositories https://maven.exasol.com/artifactory/exasol-releases \
--packages com.exasol:spark-connector_2.11:<LATEST_VERSION> \
--class com.myorg.MySparkClass \
--conf spark.exasol.password=exaTru3P@ss \
path/to/project/folder/jars/spark-exasol-connector-<LATEST_VERSION>.jar
Examples
The following example shows how to use the connector in a Spark application written in Scala.
// An Exasol SQL query string
val exasolQueryString = """
SELECT SALES_DATE, MARKET_ID, PRICE
FROM RETAIL.SALES
WHERE MARKET_ID IN (661, 534, 667)
"""
// Creates a dataframe from the given query
val df = sparkSession
.read
.format("exasol")
.option("host", "10.0.0.11")
.option("port", "8563")
.option("username", "sys")
.option("password", "exaPass")
.option("query", exasolQueryString)
.load()
df.collect().foreach(println)
// Saves the dataframe as an Exasol table
df
.write
.mode("append")
.mode("append")
.option("host", "10.0.0.11")
.option("port", "8563")
.option("username", "sys")
.option("password", "exaPass")
.option("table", "RETAIL.ADJUSTED_SALES")
.format("exasol")
.save()
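As the `spark.exasol.password` setting in the spark-submit example above suggests, connector options can also be supplied through Spark configuration with the `spark.exasol.` prefix, which keeps credentials out of application code. A sketch of a `spark-defaults.conf` fragment, assuming the prefix applies to the other connection options as well (the values shown are placeholders):

```
spark.exasol.host      10.0.0.11
spark.exasol.port      8563
spark.exasol.username  sys
spark.exasol.password  exaTru3P@ss
```

Options set explicitly with `.option(...)` in application code would then only be needed for per-job settings such as the query or target table.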
For more usage examples, see the Spark Exasol Connector repository on GitHub.
Contribute to the Project
Exasol encourages contributions to this open-source project. To learn how to contribute, see Contributing.