BucketFS

This section describes the BucketFS file system in Exasol.

What is BucketFS?

BucketFS is a synchronous file system that is available on all database nodes in an Exasol cluster. Each node in the cluster can connect to the BucketFS service and sees the same content as the other nodes.

A BucketFS service contains one or more buckets, and each bucket can store multiple files. Each bucket can have different access privileges. Folders are not supported directly, but if you specify an upload path that includes folders, these folders are created. If all files in a folder are deleted, the folder is dropped automatically.
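The folder behavior described above can be illustrated with a toy model. The sketch below is not how BucketFS is implemented; `ToyBucket` is a hypothetical in-memory class that only mimics the described semantics, where folders are implied by the upload path and disappear once the last file in them is deleted.

```python
class ToyBucket:
    """Hypothetical in-memory stand-in for a bucket: a flat map from path to bytes."""

    def __init__(self):
        self._files = {}

    def upload(self, path, data):
        # Folders are never created explicitly; they are implied by the path.
        self._files[path.strip("/")] = data

    def delete(self, path):
        self._files.pop(path.strip("/"), None)

    def folders(self):
        # A folder exists only while at least one file is stored underneath it.
        found = set()
        for path in self._files:
            parts = path.split("/")[:-1]
            for depth in range(1, len(parts) + 1):
                found.add("/".join(parts[:depth]))
        return sorted(found)


bucket = ToyBucket()
bucket.upload("models/v1/model.pkl", b"...")
bucket.upload("models/v1/readme.txt", b"...")
print(bucket.folders())  # ['models', 'models/v1']

bucket.delete("models/v1/model.pkl")
bucket.delete("models/v1/readme.txt")
print(bucket.folders())  # []
```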

Each configured data disk in Exasol has a preinstalled BucketFS service with a default bucket. You can create additional BucketFS services as needed.

Writing data to BucketFS is an atomic operation. Files are not locked, so the latest write operation overwrites the file. In contrast to the database itself, BucketFS is a purely file-based system and has no transactional semantics.

When scripts are executed in parallel on the Exasol cluster, all instances sometimes need to access the same external data. For example, your algorithms could use a statistical model or weather data. You can use an external service such as a file server for this purpose, but performance may be better if the data is available locally on the cluster nodes. BucketFS was developed for such use cases, where data should be stored synchronously and replicated across the cluster.
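As an illustration of local access from a script, the following sketch reads a shared file from the node's local file system. It assumes that bucket contents are visible to scripts under `/buckets/<service>/<bucket>/<path>` and that the default service and bucket are named `bfsdefault` and `default`; check both against your installation. The helpers `bucket_path` and `load_lines` and the file name are hypothetical.

```python
from pathlib import PurePosixPath


def bucket_path(file_path, service="bfsdefault", bucket="default", root="/buckets"):
    """Build the assumed local path of a bucket file as seen from a script."""
    return PurePosixPath(root) / service / bucket / file_path.lstrip("/")


def load_lines(file_path, **kwargs):
    """Read a text file from the bucket; every script instance reads its node-local copy."""
    with open(str(bucket_path(file_path, **kwargs)), encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


print(bucket_path("weather/stations.csv"))
# /buckets/bfsdefault/default/weather/stations.csv
```

Because the file is replicated to every node, each parallel script instance can call `load_lines` without contacting an external server.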

For detailed steps on how to create a new BucketFS service and create new buckets, see BucketFS Setup.

For information about how to expand script languages (for example, installing additional R packages) or integrate new languages into the script framework using BucketFS, see Adding New Packages to Existing Script Languages.

Usage notes

  • The default BucketFS service has a default bucket with the Java, Python and R script languages preinstalled. For storing larger amounts of user data we recommend that you create a separate BucketFS service on a separate partition.

    For information about how to create additional BucketFS services, see Create New BucketFS Service.

  • The data in BucketFS is replicated locally on every server and automatically synchronized. For performance reasons you should therefore not store very large amounts of data in BucketFS.

  • The data in BucketFS is not part of the database backups and must be backed up manually if needed.

    For information about how to download and upload files in BucketFS, see Manage Buckets and Files in BucketFS.

  • The default BucketFS service has HTTP deactivated and HTTPS activated on port 2581 by default.

    To change the ports in the BucketFS service, use the ConfD job bucketfs_modify.
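For manual backups or uploads over HTTPS, requests against the default port can be prepared as in the sketch below. It rests on several assumptions that are not stated in this section: that files are addressed as `https://<host>:<port>/<bucket>/<path>`, that uploads use HTTP PUT, and that authentication is HTTP Basic with the user `w` for writing and `r` for reading. The host name, passwords, and helper functions are hypothetical. The code only builds the requests and does not contact a cluster.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_request(host, bucket, file_path, user, password, method, data=None, port=2581):
    """Prepare an HTTPS request for a file in a bucket (assumed URL layout)."""
    url = f"https://{host}:{port}/{quote(bucket)}/{quote(file_path.lstrip('/'))}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Basic {token}")
    return request


def upload_request(host, bucket, file_path, data, write_password, port=2581):
    # Assumption: the write user is named "w".
    return build_request(host, bucket, file_path, "w", write_password, "PUT", data, port)


def download_request(host, bucket, file_path, read_password, port=2581):
    # Assumption: the read user is named "r".
    return build_request(host, bucket, file_path, "r", read_password, "GET", None, port)


req = upload_request("exasol.example.com", "default", "models/model.pkl", b"payload", "secret")
print(req.get_method(), req.full_url)
# PUT https://exasol.example.com:2581/default/models/model.pkl
```

To send a prepared request, pass it to `urllib.request.urlopen`. This requires a reachable cluster and a certificate that the client trusts. If you change the port with `bucketfs_modify`, pass the new value as `port`.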