BucketFS

The BucketFS file system is a synchronous file system available in Exasol. This section provides you with an overview of BucketFS in Exasol and how to use it.

What is BucketFS?

BucketFS is a synchronous file system (also known as replicated file system) available in the Exasol cluster. Each cluster node can connect to this service through the HTTPS interface and will see exactly the same content.

A BucketFS service contains a number of buckets, and every bucket stores a number of files. Each bucket can have different access privileges. Folders are not supported directly, but if you specify an upload path including folders, these will be created. If all files from a folder are deleted, the folder will be dropped automatically.
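As a rough illustration of how a service, a bucket, and an upload path with folders fit together, the following Python sketch builds the URL of a file in a bucket and prepares an HTTP PUT request for it. The host name, bucket name, and file path are hypothetical. Using PUT for uploads is an assumption about the HTTP interface that is not stated in this section. The request is only constructed, not sent, and authentication is omitted.

```python
from urllib.parse import quote
from urllib.request import Request


def bucket_file_url(host: str, port: int, bucket: str, path: str, scheme: str = "https") -> str:
    """Build the URL of a file inside a bucket; slashes in `path` act as folders."""
    clean_path = quote(path.strip("/"))  # quote() leaves "/" unescaped by default
    return f"{scheme}://{host}:{port}/{quote(bucket)}/{clean_path}"


def build_upload_request(url: str, data: bytes) -> Request:
    """Prepare (but do not send) an HTTP PUT that would store `data` at `url`."""
    # Assumption: uploads are plain HTTP PUT requests against the file URL.
    return Request(url, data=data, method="PUT")


# Hypothetical host, bucket, and path; the "models/2024" folders are implied by the path.
url = bucket_file_url("exasol-node.example.com", 2580, "mybucket", "models/2024/weather.csv")
request = build_upload_request(url, b"station,temp\nA,21.5\n")
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would perform the upload, provided the bucket accepts the credentials supplied with it.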

Writing data to BucketFS is an atomic operation. Files are not locked, so the latest write operation overwrites the file. In contrast to the database itself, BucketFS is a purely file-based system and has no transactional semantics.
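The overwrite behavior can be pictured with a toy in-memory model. This is not BucketFS code. The ToyBucket class and the file name are invented for illustration, and the model only mimics the "latest write wins" rule described above.

```python
class ToyBucket:
    """In-memory stand-in for a bucket: path -> bytes, no locks, no transactions."""

    def __init__(self) -> None:
        self._files = {}

    def put(self, path: str, data: bytes) -> None:
        # One assignment replaces the whole value, so a reader sees either
        # the old content or the new content, never a mix of both.
        self._files[path] = data

    def get(self, path: str) -> bytes:
        return self._files[path]


bucket = ToyBucket()
bucket.put("config/settings.json", b'{"version": 1}')
bucket.put("config/settings.json", b'{"version": 2}')  # overwrites; no conflict, no rollback
print(bucket.get("config/settings.json"))
```

If several writers must not overwrite each other's files, that coordination has to happen outside BucketFS, for example by using distinct file names.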

When scripts are executed in parallel on the Exasol cluster, some use cases require all instances to access the same external data, for example a statistical model or weather data. You could serve such data from an external service such as a file server, but performance is better when the data is available locally on the cluster nodes. BucketFS was developed for these use cases, where data should be stored synchronously and replicated across the cluster.
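The following Python sketch shows the idea of a script reading a replicated file from the local node instead of calling an external service. It assumes that buckets are visible to scripts under a local directory of the form /buckets/<service>/<bucket>/. That layout, the service name bfsdefault, the bucket name default, and the JSON "model" are assumptions for illustration and are not confirmed by this section. The demo uses a temporary directory in place of the real mount so that it runs anywhere.

```python
import json
import tempfile
from pathlib import Path


def bucket_local_path(service: str, bucket: str, file_path: str, root: str = "/buckets") -> Path:
    """Return the node-local path of a replicated file (the layout is an assumption)."""
    return Path(root) / service / bucket / file_path


def load_model(path: Path) -> dict:
    """Read a small JSON 'model' from the local file system."""
    with path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


# A temporary directory stands in for the node-local mount point.
with tempfile.TemporaryDirectory() as fake_root:
    model_path = bucket_local_path("bfsdefault", "default", "models/weather.json", root=fake_root)
    model_path.parent.mkdir(parents=True)
    model_path.write_text(json.dumps({"slope": 0.5, "intercept": 2.0}), encoding="utf-8")

    model = load_model(model_path)
    prediction = model["slope"] * 10 + model["intercept"]
    print(prediction)
```

Because every node holds its own copy of the file, each parallel script instance reads from local storage and no network round trip is needed at execution time.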

Setting Up BucketFS and Creating Buckets

You can configure BucketFS in the EXAoperation user interface. There is a pre-installed default BucketFS service for the configured data disk. To create additional file system services, you only need to specify the data disk and the HTTPS ports. If you follow the link of a BucketFS ID, you can create and configure any number of buckets within this BucketFS. Besides the bucket name, you have to specify read and write passwords and define whether the bucket should be publicly readable (accessible to everyone).
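To illustrate the difference between a publicly readable bucket and one protected by a read password, the sketch below prepares a GET request with or without credentials. It assumes that passwords are sent via HTTP Basic authentication and that the user name for read access is "r". Both are assumptions not stated in this section, and the host and bucket names are hypothetical. The requests are only constructed, not sent.

```python
import base64
from typing import Optional
from urllib.request import Request


def build_download_request(url: str, read_password: Optional[str] = None, user: str = "r") -> Request:
    """Prepare a GET request; attach Basic auth only when a password is given."""
    request = Request(url, method="GET")
    if read_password is not None:
        # Assumption: the read password is supplied via HTTP Basic authentication.
        token = base64.b64encode(f"{user}:{read_password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


# Publicly readable bucket: no credentials needed.
public_request = build_download_request(
    "https://exasol-node.example.com:2580/public_bucket/readme.txt"
)
# Protected bucket: the read password is attached to the request.
private_request = build_download_request(
    "https://exasol-node.example.com:2580/private_bucket/model.bin", read_password="secret"
)
print(public_request.has_header("Authorization"), private_request.has_header("Authorization"))
```

In real use, the password would come from a secret store or environment variable rather than a string literal.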

For detailed steps on how to create a new BucketFS service and create new buckets, see BucketFS Setup.

For additional information on BucketFS, including how to use it to expand the script languages (for example, by installing additional R packages) or to integrate completely new languages into the script framework, see the Adding New Packages to Existing Script Languages section.

Usage Notes

  • BucketFS provides a default bucket that contains the pre-installed script languages (Java, Python, R). To store larger amounts of user data, we recommend that you create a separate BucketFS instance on a separate partition.
  • The data in BucketFS is replicated locally on every server and automatically synchronized. Therefore, you should not store very large amounts of data in BucketFS.
  • The data in BucketFS is not part of the database backups and has to be backed up manually if required.
  • In a fresh installation of Exasol, the default BucketFS service does not have a TLS port defined. As an admin, you need to add the HTTP or HTTPS port number. The recommended default port for BucketFS service is 2580.
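Because BucketFS data is not covered by database backups, a manual backup has to copy the files out of each bucket. The sketch below shows only the planning step: it pairs each file in a bucket with a download URL and a local target path. The function name plan_backup, the host, the bucket, the file list, and the destination directory are all hypothetical. How the file list is obtained is left open, and the URL layout (base URL, bucket, file path) is an assumption.

```python
from pathlib import Path, PurePosixPath
from typing import List, Tuple


def plan_backup(base_url: str, bucket: str, file_paths: List[str], dest_dir: str) -> List[Tuple[str, Path]]:
    """Pair each bucket file with a download URL and a local backup path."""
    plan = []
    for file_path in file_paths:
        relative = PurePosixPath(file_path.lstrip("/"))
        # Assumed URL layout: <base_url>/<bucket>/<path inside the bucket>
        url = f"{base_url.rstrip('/')}/{bucket}/{relative}"
        plan.append((url, Path(dest_dir) / bucket / relative))
    return plan


plan = plan_backup(
    "https://exasol-node.example.com:2580",
    "mybucket",
    ["models/weather.json", "lib/helpers.py"],
    "/var/backups/bucketfs",
)
for url, target in plan:
    print(url, "->", target)
```

A backup job would then download each URL to its target path, creating parent directories as needed, and repeat this for every bucket that holds data worth keeping.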