BucketFS

This section describes the BucketFS file system in Exasol.

What is BucketFS?

BucketFS is a synchronous file system that is available on all database nodes in an Exasol cluster. Every node in the cluster can connect to the BucketFS service and sees the same content as the other nodes.

A BucketFS service contains one or more buckets, and each bucket can store multiple files. Each bucket can have different access privileges. Folders are not supported directly, but if you specify an upload path that includes folders, these folders are created. If all files in a folder are deleted, the folder is dropped automatically.
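The implicit folder behavior described above can be pictured with a small model. This is only an illustrative sketch, not how BucketFS is implemented: it treats a bucket as a flat set of file paths and derives the visible folders from them. The function name `implied_folders` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def implied_folders(file_paths):
    """Return the set of folders implied by a collection of file paths."""
    folders = set()
    for path in file_paths:
        # Every ancestor of a file, except the bucket root ("."), counts as a folder.
        for parent in PurePosixPath(path).parents:
            if str(parent) != ".":
                folders.add(str(parent))
    return folders


# Hypothetical bucket content: folders exist only because files live in them.
files = {"models/v1/model.pkl", "models/v1/meta.json", "weather/2024.csv"}
print(sorted(implied_folders(files)))  # ['models', 'models/v1', 'weather']

# Deleting the last file in "weather" makes the folder disappear as well.
files.discard("weather/2024.csv")
print(sorted(implied_folders(files)))  # ['models', 'models/v1']
```

In this model there is nothing to create or remove for a folder itself, which matches the description that folders appear on upload and vanish when their last file is deleted.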

Each configured data disk in Exasol has a preinstalled BucketFS service with a default bucket. You can create additional BucketFS services as needed.

Writing data to BucketFS is an atomic operation. Files are not locked, so the most recent write operation overwrites the file. In contrast to the database itself, BucketFS is a purely file-based system and has no transactional semantics.

When scripts are executed in parallel on the Exasol cluster, all instances sometimes need to access the same external data. Your algorithms could, for example, use a statistical model or weather data. You can use an external service such as a file server for this purpose, but for performance reasons it may be better to have the data available locally on the cluster nodes. BucketFS was developed for such use cases, where data should be stored synchronously and replicated across the cluster.
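As a rough sketch of how a script instance might locate such shared data on its local node, the helper below builds a file path from a service name, a bucket name, and a path inside the bucket. The `/buckets/<service>/<bucket>/<path>` layout, the names `bfsdefault` and `default`, and the helper `bucketfs_local_path` are assumptions for illustration and are not taken from this section. Check them against your own installation.

```python
from pathlib import PurePosixPath

# Assumed mount point for buckets inside the script environment.
BUCKETS_ROOT = PurePosixPath("/buckets")


def bucketfs_local_path(service, bucket, file_path):
    """Build the assumed local path of a bucket file as seen by a script."""
    # Strip a leading slash so the file path is always joined below the bucket.
    return BUCKETS_ROOT / service / bucket / file_path.lstrip("/")


# Hypothetical service, bucket, and file names.
model_path = bucketfs_local_path("bfsdefault", "default", "models/v1/model.pkl")
print(model_path)  # /buckets/bfsdefault/default/models/v1/model.pkl
```

Because every node sees the same bucket content, each parallel script instance would resolve the same path and read its own local replica instead of contacting an external server.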


Setting up BucketFS and creating buckets

You can configure BucketFS in the EXAoperation user interface. There is a preinstalled default BucketFS service for the configured data disk. To create additional file system services, you only need to specify the data disk and the HTTP or HTTPS ports. If you follow the link of a BucketFS ID, you can create and configure any number of buckets within this BucketFS service. Besides the bucket name, you must specify read and write passwords and define whether the bucket should be publicly readable (accessible to everyone).

For detailed steps on how to create a new BucketFS service and create new buckets, see BucketFS Setup.

For information about how to expand script languages (for example, installing additional R packages) or integrate new languages into the script framework using BucketFS, see Adding New Packages to Existing Script Languages.

Usage notes

  • The default BucketFS service has a default bucket with the Java, Python and R script languages preinstalled. For storing larger amounts of user data we recommend that you create a separate BucketFS service on a separate partition.

    For information about how to create additional BucketFS services, see Create New BucketFS Service.

  • The data in BucketFS is replicated locally on every server and automatically synchronized. For performance reasons you should therefore not store very large amounts of data in BucketFS.

  • The data in BucketFS is not part of the database backups and must be backed up manually if needed.

    For information about how to download and upload files in BucketFS, see Manage Buckets and Files in BucketFS.

  • The default BucketFS service does not have a TLS port defined in a new installation of Exasol. To access the service, you must add an HTTP or HTTPS port. The recommended default port for the BucketFS service is 2580.

    To configure the ports in the default BucketFS service, open EXAoperation and navigate to the Services > EXABuckets > EXABucketFS Services screen, then select the service and click Edit.
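Once a port is configured, files are typically transferred to a bucket over HTTP or HTTPS. The following minimal sketch only prepares an upload request and does not send it. It assumes that a file is addressed as `<scheme>://<host>:<port>/<bucket>/<path>`, that uploads use HTTP PUT with basic authentication, and that the write password is paired with the user name `w`. None of these details are stated in this section, and the host name, password, and the helper `build_upload_request` are hypothetical.

```python
import base64
import urllib.request


def build_upload_request(host, port, bucket, path_in_bucket, data,
                         write_password, use_tls=False):
    """Prepare (but do not send) a PUT request that uploads data to a bucket."""
    scheme = "https" if use_tls else "http"
    # Assumed URL layout: the bucket name followed by the path inside the bucket.
    url = f"{scheme}://{host}:{port}/{bucket}/{path_in_bucket.lstrip('/')}"
    # Assumed credentials: user "w" combined with the bucket's write password.
    token = base64.b64encode(f"w:{write_password}".encode()).decode()
    request = urllib.request.Request(url, data=data, method="PUT")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host and password; 2580 is the recommended port named above.
req = build_upload_request("exasol-node.example.com", 2580, "default",
                           "models/v1/model.pkl", b"model bytes", "secret")
print(req.get_method(), req.full_url)
# To actually send the request: urllib.request.urlopen(req)
```

Since writes are atomic and unlocked, sending a second request to the same path would simply replace the file written by the first one.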