Apache Spark integration
This article explains how to integrate Apache Spark with Exasol.
Apache Spark Exasol Connector
The Spark Exasol Connector is an open source project that integrates Apache Spark with Exasol. You can use the connector in a Spark application to create Spark DataFrames from Exasol queries and to save DataFrames as Exasol tables.
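The following is a minimal PySpark sketch of the two operations described above: reading a query result into a DataFrame and saving a DataFrame as a table. It assumes the connector is already on the Spark classpath. The data source name "exasol" and the option keys (host, port, username, password, query, table) are assumptions that may differ between connector versions, so verify them against the Spark Exasol Connector User Guide. The helper names are hypothetical.

```python
# Sketch only: the format name "exasol" and all option keys are assumptions;
# check the connector's user guide for the names your version expects.

def exasol_options(host, username, password, port=8563):
    """Collect the connection options shared by reads and writes."""
    return {
        "host": host,
        "port": str(port),
        "username": username,
        "password": password,
    }


def read_query(spark, options, query):
    """Create a DataFrame from the result of an Exasol query."""
    return (
        spark.read.format("exasol")
        .options(**options)
        .option("query", query)
        .load()
    )


def save_table(df, options, table, mode="append"):
    """Save a DataFrame to an Exasol table such as 'SCHEMA.TABLE'."""
    (
        df.write.format("exasol")
        .options(**options)
        .option("table", table)
        .mode(mode)
        .save()
    )
```

With an existing SparkSession named spark, a call might look like read_query(spark, exasol_options("10.0.0.11", "user", "secret"), "SELECT * FROM MY_SCHEMA.MY_TABLE"), where the host, credentials, and table name are placeholders.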
Prerequisites
To integrate an Apache Spark application with Exasol, you need the following:
- An operational Spark cluster
- An operational Exasol cluster
- Enough resources in the Spark cluster to start at least as many executors as there are Exasol data nodes.
- Access to Exasol nodes from the Spark cluster on port 8563 and on port range 20000-21000.
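Two of the prerequisites above can be checked before you submit a job. The sketch below uses only the Python standard library and is meant to run on a Spark node. The function names are hypothetical. It probes port 8563 only, on the assumption that ports in the 20000-21000 range may not accept connections while no query is running, so a refused connection there would not be conclusive.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unreachable_nodes(hosts, port=8563, timeout=2.0):
    """Return the Exasol node addresses that cannot be reached on the given port."""
    return [h for h in hosts if not port_open(h, port, timeout)]


def enough_executors(num_executors, num_exasol_data_nodes):
    """Check that there are at least as many executors as Exasol data nodes."""
    return num_executors >= num_exasol_data_nodes
```

For example, unreachable_nodes(["10.0.0.11", "10.0.0.12"]) returns the subset of those placeholder addresses that did not accept a connection on port 8563. An empty list means every node was reachable.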
Setup
For more information and examples of how to integrate Apache Spark with Exasol, refer to the Spark Exasol Connector User Guide.
Contribute to the Project
If you want to contribute to the Spark Exasol Connector open source project, see Contributing.