For the database, buckets look similar to standard external data sources. Access to a bucket from the database is therefore controlled through a database CONNECTION object. This connection contains the path to the bucket and the read password. Additionally, for the bucket to be accessible to a user, you need to grant the connection to that user using the GRANT CONNECTION statement.
A publicly readable bucket is accessible to everyone, not just database users. We therefore recommend that you do not make buckets publicly readable and instead require the appropriate read and write passwords for access.
CREATE CONNECTION my_bucket_access TO 'bucketfs:bfsdefault/bucket1'
IDENTIFIED BY 'readpw';
GRANT CONNECTION my_bucket_access TO my_user;
As with external clients, write access from scripts is only possible through HTTP(S); however, you still need to be careful with the parallelism of script processes.
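As a sketch of such HTTP(S) write access, the helper below builds an authenticated PUT request for uploading a file to a bucket. The host name, port, bucket name, and password are placeholders for your installation; BucketFS uses HTTP basic authentication, where writes use the user name "w" together with the write password.

```python
import base64
import urllib.request

def bucketfs_put_request(host, port, bucket, remote_name, data, write_pw):
    """Build an authenticated HTTP PUT request for a BucketFS upload.

    All connection details are placeholders; adapt them to your system.
    """
    url = f"http://{host}:{port}/{bucket}/{remote_name}"
    req = urllib.request.Request(url, data=data, method="PUT")
    # Basic auth: user "w" plus the bucket's write password.
    token = base64.b64encode(f"w:{write_pw}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

# Usage (requires a reachable BucketFS service):
# req = bucketfs_put_request("exasol-host", 2580, "bucket1", "file1",
#                            b"hello", "writepw")
# urllib.request.urlopen(req)
```

Keep in mind that when many script processes run this kind of upload in parallel, they all hit the same BucketFS service, so the degree of parallelism should be limited.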
After you have defined a CONNECTION to the bucket and granted it to a user, you can create a script that lists the files under a local path, as shown in the example below. Note that the equivalent local path for the previously created bucket bucket1 is /buckets/bfsdefault/bucket1.
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT "LS" ("my_path" VARCHAR(100))
EMITS ("FILES" VARCHAR(100)) AS
import os

def run(ctx):
    for line in os.listdir(ctx.my_path):
        ctx.emit(line)
/

SELECT LS('/buckets/bfsdefault/bucket1');

FILES
---------------------
file1
tar1

Listing the local path of the archive tar1 shows its extracted contents:

SELECT LS('/buckets/bfsdefault/bucket1/tar1');

FILES
---------------------
a
b
As the example shows, archives (.zip, .tar, .tar.gz, or .tgz) are always extracted for script access on the local file system. From outside (for example, via curl), you only see the archive itself, while scripts can locally work with the extracted files.
If you store archives (.zip, .tar, .tar.gz, or .tgz) in BucketFS, both the original archive and the extracted files are stored, and they therefore need twice the storage space.
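The effect on storage can be illustrated locally: the space BucketFS needs is roughly the size of the archive plus the total size of its extracted members. The sketch below builds a small in-memory tar archive and computes both parts; it is a local illustration only, not a BucketFS API.

```python
import io
import tarfile

def stored_bytes(archive_bytes):
    """Archive size plus the total size of its extracted file members."""
    extracted = 0
    with tarfile.open(fileobj=io.BytesIO(archive_bytes)) as tar:
        for member in tar.getmembers():
            if member.isfile():
                extracted += member.size
    return len(archive_bytes) + extracted

# Build a small in-memory tar containing one 1000-byte file "a".
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    payload = b"x" * 1000
    info = tarfile.TarInfo(name="a")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
archive = buf.getvalue()

# Total space = archive bytes + 1000 extracted bytes.
print(stored_bytes(archive) - len(archive))  # 1000
```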
If you want to work on an archive directly, you can avoid the extraction by renaming the file extension (for example, .zipx instead of .zip).
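A small helper can pick such a non-extracting name before upload. Following the document's .zipx example, it simply appends "x" to a recognized archive extension so that BucketFS no longer treats the file as an archive; the function name and the appended-"x" convention for the other extensions are illustrative choices.

```python
# Recognized archive extensions that BucketFS extracts automatically.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz")

def non_extracting_name(filename):
    """Return an upload name that avoids automatic extraction."""
    if filename.endswith(ARCHIVE_SUFFIXES):
        return filename + "x"  # e.g. model.zip -> model.zipx
    return filename

print(non_extracting_name("model.zip"))  # model.zipx
print(non_extracting_name("data.csv"))   # data.csv (unchanged)
```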
Once you have access to BucketFS from within the database, you can use it in your UDFs. For more information on how to use it, refer to the Adding New Packages to Existing Script Languages section.
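One common pattern in that context is making a package stored in BucketFS importable from a UDF by putting the bucket's local path on sys.path. The package path below is a hypothetical example; it would typically be an archive uploaded to the bucket and automatically extracted.

```python
import sys

# Hypothetical local path of a package extracted inside the bucket.
PACKAGE_PATH = "/buckets/bfsdefault/bucket1/my_package"

# Prepend the bucket path so imports resolve from it first.
if PACKAGE_PATH not in sys.path:
    sys.path.insert(0, PACKAGE_PATH)

# import my_module  # would now resolve from the bucket inside the UDF
```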