PyExasol

PyExasol is a custom Python driver for Exasol created by Badoo. It helps handle the massive data volumes commonly associated with this database.

You can expect a significant performance improvement over existing ODBC/JDBC solutions in single-process scenarios involving pandas. You can also split a data set across multiple processes and servers to achieve linear scalability, so with PyExasol you are not limited to a single CPU core.

Prerequisites

To run PyExasol, you need:

  • An Exasol installation
  • An environment with a Python 3 installation, version 3.6 or above
  • pip, to install additional Python packages
  • Network connectivity: you must be able to reach (for example, ping) the Exasol instance from the computer with the Python installation.
  • Optionally, Anaconda with Jupyter Notebook. The procedure in this document uses Jupyter Notebook as an interactive environment, but it is not required for using the PyExasol package.
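The Python version and connectivity prerequisites can be checked from Python itself. The following is a minimal sketch under stated assumptions: the helper names python_version_ok and port_reachable are hypothetical, and instead of ICMP ping it opens a TCP connection to the database port with the standard library's socket.create_connection, which can succeed even where ping is blocked by a firewall.

```python
import socket
import sys


def python_version_ok(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= minimum


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds.

    Hypothetical helper: it only proves that something is listening on the
    port, not that it is Exasol or that your credentials are valid.
    """
    try:
        # create_connection resolves the host and connects; the socket is closed on exit
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts
        return False
```

For example, you could call port_reachable('<HostIP>', 8563), assuming your instance listens on 8563 (a common Exasol default; use the port from your own DSN). A False result usually points to a wrong address, a firewall, or a VPN issue rather than to PyExasol.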

Procedure

  1. Launch Anaconda Navigator on your system.
  2. Launch Jupyter Notebook from the navigator home screen.

    The Jupyter Notebook Dashboard opens in your default web browser.
  3. Click New > Python 3 on the Jupyter Notebook Dashboard.

    Jupyter Notebook Editor opens in a new tab.
  4. Enter the following command in a code cell and click Run.
    %pip install pyexasol

    The pyexasol package is installed.

    You may need to restart the kernel to use the updated package. To do so, select Kernel > Restart in the Jupyter Notebook Editor menu.

  5. Enter the following command in a code cell and click Run.
    import pyexasol

    The pyexasol module is imported into the notebook.

  6. Enter the following command in a code cell and click Run to connect to the Exasol database.
    C = pyexasol.connect(dsn='<HostIP:port>', user='<username>', password='<password>')

    The following parameters are used in the above command:

    • dsn: Data source name containing the host name or IP address and the port of the Exasol database, in the form <host:port>.
    • user: Database username to log in.
    • password: Password for the database user.
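  7. Enter the following commands to load the result of a query into a pandas.DataFrame and display its first rows. This requires pandas, which is included in Anaconda.
    import pyexasol
    # Connect with compression enabled to reduce the data sent over the network
    C = pyexasol.connect(dsn='<host:port>', user='<username>', password='<password>', compression=True)
    # Run the query and load the result set into a pandas.DataFrame
    df = C.export_to_pandas("SELECT * FROM EXA_ALL_USERS")
    print(df.head())
    C.close()  # close the connection when you are done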
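Steps 6 and 7 type the password directly into the notebook. If you prefer to keep credentials out of the notebook, one option is to read them from environment variables. This is a minimal sketch under that assumption: the variable names EXASOL_DSN, EXASOL_USER and EXASOL_PASSWORD and the helper exasol_connection_params are hypothetical, and the DSN check only covers the simple <host:port> form used in this guide (PyExasol itself may accept other DSN forms).

```python
import os


def exasol_connection_params(prefix="EXASOL_", compression=True):
    """Build keyword arguments for pyexasol.connect() from environment variables.

    Hypothetical convention: <prefix>DSN, <prefix>USER and <prefix>PASSWORD.
    """
    missing = [name for name in ("DSN", "USER", "PASSWORD")
               if not os.environ.get(prefix + name)]
    if missing:
        raise KeyError("missing environment variables: "
                       + ", ".join(prefix + name for name in missing))

    dsn = os.environ[prefix + "DSN"]
    # Basic sanity check for the "<host>:<port>" form only
    host, sep, port = dsn.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"DSN must look like '<host>:<port>', got {dsn!r}")

    return {
        "dsn": dsn,
        "user": os.environ[prefix + "USER"],
        "password": os.environ[prefix + "PASSWORD"],
        "compression": compression,
    }
```

With the variables set before starting Jupyter, the connection in step 7 could then be written as C = pyexasol.connect(**exasol_connection_params()).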

Additional Information

For additional information (examples, reference documentation, best practices), refer to the PyExasol GitHub repository.