Database Access

From the database's perspective, buckets look similar to standard external data sources. Therefore, access control to a bucket is established through a database CONNECTION object. The connection contains the path to the bucket and the read password. Additionally, for the bucket to be accessible and visible to a user, you need to grant the connection to that user using the GRANT CONNECTION command.

A publicly readable bucket is accessible to all users, not just database users. Therefore, we recommend that you do not make buckets publicly readable, and that users instead access them with the appropriate read and write passwords.

CREATE CONNECTION my_bucket_access TO 'bucketfs:bfsdefault/bucket1'
  IDENTIFIED BY 'readpw';

GRANT CONNECTION my_bucket_access TO my_user;

As with external clients, write access from scripts is only possible through HTTP(S); however, you still need to be careful with the parallelism of script processes.
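
For illustration, a script can upload a file by sending an HTTP PUT request to the BucketFS port. The following sketch is not part of the standard examples: the script name, host, port, bucket name, and write password are placeholders to adjust for your setup. BucketFS write access authenticates via HTTP basic authentication with the user name w and the bucket's write password.

--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT upload_to_bucket("file_name" VARCHAR(100), "content" VARCHAR(2000))
RETURNS INT AS
import base64
import urllib.request

def run(ctx):
    # Placeholder host, BucketFS port, and bucket name -- adjust to your setup
    url = 'http://192.168.1.100:2580/bucket1/' + ctx.file_name
    request = urllib.request.Request(url, data=ctx.content.encode('utf-8'), method='PUT')
    # Write access authenticates as user "w" with the bucket's write password
    token = base64.b64encode(b'w:writepw').decode('ascii')
    request.add_header('Authorization', 'Basic ' + token)
    with urllib.request.urlopen(request) as response:
        return response.status
/

A call such as SELECT upload_to_bucket('hello.txt', 'hello world'); would then store hello.txt in bucket1. Because a scalar script runs once per input row, calling it on a large table can issue many requests in parallel, which is why the parallelism of script processes matters here.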

After you have defined a CONNECTION to the bucket and granted it to a user, you can create a script that lists the files under a local path, as shown in the example below. The equivalent local path for the previously created bucket bucket1 is /buckets/bfsdefault/bucket1.

--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT "LS" ("my_path" VARCHAR(100)) EMITS ("FILES" VARCHAR(100)) AS
import os
def run(ctx):
    # Emit one row per entry found under the given local bucket path
    for line in os.listdir(ctx.my_path):
        ctx.emit(line)
/
 
SELECT ls('/buckets/bfsdefault/bucket1');
FILES
---------------------
file1
tar1
 
SELECT ls('/buckets/bfsdefault/bucket1/tar1/');
FILES
---------------------
a
b

As shown in the example, archives (.zip, .tar, .tar.gz, or .tgz) are automatically extracted for script access on the local file system. From outside (for example, via curl), you see only the archive, while scripts can use the extracted files locally.

If you store archives (.zip, .tar, .tar.gz, or .tgz) in BucketFS, both the original archive and the extracted files are stored and therefore need twice the storage space.

If you want to work on an archive directly, you can avoid the extraction by renaming the file extension (for example, .zipx instead of .zip).
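
For example, an archive uploaded under a hypothetical name my_archive.zipx is kept as a single file in the bucket, so a script can open it directly with Python's zipfile module. The script and file names below are illustrative only:

--/
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT list_zip_entries("zip_path" VARCHAR(200)) EMITS ("ENTRY" VARCHAR(200)) AS
import zipfile

def run(ctx):
    # The .zipx extension prevents automatic extraction, so the archive itself
    # is available under the local bucket path and can be read as a ZIP file.
    with zipfile.ZipFile(ctx.zip_path) as archive:
        for name in archive.namelist():
            ctx.emit(name)
/

SELECT list_zip_entries('/buckets/bfsdefault/bucket1/my_archive.zipx');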

Once you have access to BucketFS from within the database, you can use it in your UDFs. For more information on how to use it, refer to the Adding New Packages to Existing Script Languages section.