Apache Spark Integration

This section explains how to integrate Apache Spark with Exasol.

Apache Spark Exasol Connector

The Spark Exasol Connector is an open source project that integrates Apache Spark with Exasol. In your Spark application, you can use the connector to create Spark DataFrames from Exasol queries and to save DataFrames as Exasol tables.
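The sketch below shows the two operations described above: reading the result of an Exasol query into a DataFrame and saving a DataFrame into an Exasol table. It assumes the connector is registered as the Spark data source `exasol` and accepts the options `host`, `port`, `username`, `password`, `query` (for reads) and `table` (for writes). Confirm these names against the Spark Exasol Connector User Guide for your connector version. The host address and credentials are placeholders, and the helper function names are hypothetical. The helpers take the Spark session or DataFrame as a parameter and call only standard PySpark reader and writer methods (`format`, `options`, `mode`, `load`, `save`).

```python
from typing import Any, Dict

# Placeholder connection settings (assumed values, not real ones): replace the
# host and credentials with those of your own Exasol cluster.
EXASOL_OPTIONS: Dict[str, str] = {
    "host": "10.0.0.11",      # hypothetical IP address of an Exasol data node
    "port": "8563",           # Exasol database port from the prerequisites
    "username": "<user>",     # placeholder user name
    "password": "<password>", # placeholder; load real secrets from a secure store
}


def exasol_read_options(query: str, base: Dict[str, str] = EXASOL_OPTIONS) -> Dict[str, str]:
    """Return a new dict: the connection settings plus the query to run in Exasol."""
    return {**base, "query": query}


def exasol_write_options(table: str, base: Dict[str, str] = EXASOL_OPTIONS) -> Dict[str, str]:
    """Return a new dict: the connection settings plus the target Exasol table."""
    return {**base, "table": table}


def read_exasol_query(spark: Any, query: str) -> Any:
    """Run `query` in Exasol and return the result as a Spark DataFrame."""
    return spark.read.format("exasol").options(**exasol_read_options(query)).load()


def write_exasol_table(df: Any, table: str, mode: str = "append") -> None:
    """Save `df` into the Exasol table `table`, given as "SCHEMA.TABLE"."""
    df.write.format("exasol").mode(mode).options(**exasol_write_options(table)).save()
```

For example, with an active `SparkSession` named `spark` and the connector JAR on the Spark classpath, `read_exasol_query(spark, "SELECT * FROM RETAIL.SALES")` returns a DataFrame, and `write_exasol_table(df, "RETAIL.SALES_COPY")` appends it to an existing table (`RETAIL.SALES` and `RETAIL.SALES_COPY` are hypothetical names). The option helpers build a fresh dictionary on each call, so the shared connection settings are never mutated.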

Prerequisites

To integrate an Apache Spark application with Exasol, you need the following:

  • An operational Spark cluster
  • An operational Exasol cluster
  • Enough resources in the Spark cluster to start at least as many executors as there are Exasol data nodes.
  • Network access from the Spark cluster to the Exasol nodes on port 8563 and on ports 20000-21000.
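Two of the prerequisites above can be checked before you submit a job. The following is a minimal pre-flight sketch, not an official tool. It tests whether a TCP connection to a data node can be opened and whether a planned executor count covers all data nodes. It only probes port 8563 and does not check the 20000-21000 range. The function names are hypothetical. Run it from a machine inside the Spark cluster so that it sees the same network path as the executors. The executor count is whatever your cluster manager will actually start, for example the value of `spark.executor.instances` when static allocation is used.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def has_enough_executors(num_executors: int, num_exasol_data_nodes: int) -> bool:
    """Prerequisite check: at least one Spark executor per Exasol data node."""
    return num_executors >= num_exasol_data_nodes
```

For example, `has_enough_executors(4, 3)` returns `True` for a three-node Exasol cluster, and `port_is_reachable("10.0.0.11", 8563)` (a hypothetical node address) reports whether the database port is open from the current host.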

Setup

For more information and examples of how to integrate Apache Spark with Exasol, refer to the Spark Exasol Connector User Guide.

Contribute to the Project

Exasol welcomes contributions to this open source project. For information about how to contribute, see Contributing.