BucketFS

The BucketFS file system is a synchronous file system available in the Exasol cluster. This section provides you with an overview of BucketFS in Exasol and how to use it.

What is BucketFS?

The BucketFS file system is a synchronous file system (also known as a replicated file system) available in the Exasol cluster. Each cluster node can connect to this service through the HTTP interface and sees exactly the same content.

The data is replicated locally on every server and synchronized automatically. Because every node holds a full copy, you should not store large amounts of data there.

The data in BucketFS is not part of the database backups and has to be backed up manually if required.

One BucketFS service contains a number of so-called buckets, and every bucket stores a number of files. Each bucket can have different access privileges. Folders are not supported directly, but if you specify an upload path that includes folders, these folders are created. If all files in a folder are deleted, the folder is dropped automatically.
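To illustrate how an upload path with folders maps onto a bucket, here is a minimal Python sketch. It assumes a URL layout of the form `scheme://host:port/bucket/path`, which this section does not spell out. The helper `bucket_file_url`, the host address, the bucket name, and the file name are all hypothetical and used only for illustration.

```python
from urllib.parse import quote


def bucket_file_url(host: str, port: int, bucket: str, path: str,
                    scheme: str = "http") -> str:
    """Build the URL of a file in a bucket (assumed layout, see above).

    Folders are not created separately: they are just the leading
    segments of the upload path.
    """
    # Drop empty segments so "/models//v1/x" and "models/v1/x" are equivalent.
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError("path must name a file")
    # Percent-encode each segment on its own so "/" stays a separator.
    quoted_path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{scheme}://{host}:{port}/{quote(bucket, safe='')}/{quoted_path}"


# Hypothetical host, bucket, and file name.
url = bucket_file_url("192.0.2.10", 2580, "bucket1", "models/v1/weather model.pkl")
print(url)  # http://192.0.2.10:2580/bucket1/models/v1/weather%20model.pkl
```

Uploading to this URL would implicitly create the folders `models` and `models/v1` if the layout assumption holds.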

Writing data is an atomic operation. There is no lock on files, so the latest write operation overwrites the file. In contrast to the database itself, BucketFS is a purely file-based system and has no transactional semantics.
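The following Python sketch is a toy in-memory model of these semantics, not the BucketFS implementation or its API. The class `BucketModel` and its methods are hypothetical. It only shows two behaviors described above: a later write replaces the whole file, and a folder exists only as long as some file path contains it.

```python
class BucketModel:
    """Toy in-memory stand-in for a bucket (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._files = {}  # maps file path -> file content (bytes)

    def put(self, path: str, data: bytes) -> None:
        # The whole file is replaced in one step; nothing is merged or
        # appended, and no lock prevents a later writer from overwriting it.
        self._files[path] = bytes(data)

    def get(self, path: str) -> bytes:
        return self._files[path]

    def delete(self, path: str) -> None:
        del self._files[path]

    def folders(self) -> set:
        # Folders are derived from file paths, so a folder whose last file
        # was deleted no longer appears.
        return {path.rsplit("/", 1)[0] for path in self._files if "/" in path}


bucket = BucketModel()
bucket.put("models/weather.bin", b"version-1")
bucket.put("models/weather.bin", b"version-2")  # the latest write wins
print(bucket.get("models/weather.bin"))  # b'version-2'
print(bucket.folders())                  # {'models'}

bucket.delete("models/weather.bin")
print(bucket.folders())                  # set()
```

Because there is no transaction spanning several files, a reader in the real system may see a new version of one file next to an old version of another. Applications that need consistent sets of files have to handle that themselves, for example by writing to versioned paths.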

When scripts are executed in parallel on the Exasol cluster, there are use cases where all instances have to access the same external data. Your algorithms could, for example, use a statistical model or weather data. For such requirements, you can use an external service (for example, a file server). In terms of performance, however, it is better to have such data available locally on the cluster nodes. The Exasol BucketFS file system is designed for these use cases, where data should be stored synchronously and replicated across the cluster.

Watch this video to learn how to set up and use BucketFS within UDF scripts.

Setting Up BucketFS and Creating Buckets

You can configure BucketFS in the EXAoperation user interface. There is a pre-installed default BucketFS service for the configured data disk. If you want to create additional file system services, you only need to specify the data disk and the ports for HTTP(S). If you follow the link of a BucketFS ID, you can create and configure any number of buckets within this BucketFS. Besides the bucket name, you have to specify read and write passwords and define whether the bucket should be publicly readable (accessible to everyone).
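As a rough sketch of how the read and write passwords and the public-read setting could affect a client, the Python example below builds HTTP requests without sending them. It assumes HTTP Basic authentication with the user names `r` (read) and `w` (write). Neither the authentication scheme nor the user names are stated in this section, so treat them as assumptions. The helper `bucket_request`, the URL, and the passwords are hypothetical.

```python
import base64
import urllib.request
from typing import Optional


def bucket_request(url: str, method: str = "GET",
                   user: Optional[str] = None,
                   password: Optional[str] = None,
                   data: Optional[bytes] = None) -> urllib.request.Request:
    """Prepare (but do not send) a request for a file in a bucket."""
    request = urllib.request.Request(url, data=data, method=method)
    if user is not None and password is not None:
        # Assumption: credentials are passed as HTTP Basic authentication.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


# Hypothetical URL and passwords.
url = "http://192.0.2.10:2580/bucket1/models/weather.bin"

# Publicly readable bucket: a read needs no credentials.
public_read = bucket_request(url)

# Non-public bucket: a read uses the read password (assumed user "r").
private_read = bucket_request(url, user="r", password="read-secret")

# A write uses the write password (assumed user "w").
upload = bucket_request(url, method="PUT", user="w",
                        password="write-secret", data=b"model bytes")

print(public_read.has_header("Authorization"))   # False
print(private_read.has_header("Authorization"))  # True
print(upload.get_method())                       # PUT
```

To send one of these requests, you would pass it to `urllib.request.urlopen`. The sketch stops short of that so it can be run without a cluster.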

By default, there is a bucket in the default BucketFS that contains the pre-installed script languages (Java, Python, R). However, to store larger amounts of user data, it is recommended to create a separate BucketFS instance on a separate partition.

In a fresh installation of Exasol, the default BucketFS service does not have a TLS port defined. As an admin, you need to add the HTTP or HTTPS port number. The recommended default port for the BucketFS service is 2580.

For detailed steps on how to create a new BucketFS service and how to create new buckets in a BucketFS service, refer to the BucketFS Setup section.

Next Steps

To learn how to set up BucketFS, refer to the BucketFS Setup section. Once BucketFS is set up, refer to the Access Control section to learn how to access it.

See Also:

For additional information on BucketFS and on how to use it to expand the script languages (for example, by installing additional R packages) or even to integrate completely new languages into the script framework, refer to the Adding New Packages to Existing Script Languages section.