Loading Data from Google BigQuery

This section explains how to connect Exasol to Google BigQuery and then load data. One way to import data is to use the Simba BigQuery JDBC driver.

To view details about data type mappings or to migrate data, see the BigQuery to Exasol migration scripts in our GitHub repository.

Prerequisites

  • A Google Cloud account (http://cloud.google.com).
  • The Simba BigQuery JDBC driver.
  • A BucketFS bucket.
  • An Exasol instance with access to the internet.

Procedure

Configuring the connection between Exasol and Google BigQuery involves the following steps.

Step 1: Create Service Account

  1. Log in to Google Developers Console, and create a new project or select an existing project.
  2. In the sidebar, select IAM & Admin section and then choose Service accounts.
  3. Click Create Service Account for your project.
  4. Enter the service account details and grant the corresponding BigQuery roles from the Role dropdown list.
  5. Select the checkbox next to Furnish a new private key and ensure JSON is selected as Key type. Save the JSON file to your computer.
  6. Click Create. For additional information, refer to Creating and managing service accounts in the Google Cloud documentation.

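If you prefer the command line, the same steps can be sketched with the gcloud CLI. The project ID, service account name, key-file name, and role below are placeholders; adjust them to your environment and required permissions:

```shell
# Create a service account in your project (names are placeholders)
gcloud iam service-accounts create bq-exasol-loader \
    --project=my-project-id \
    --display-name="BigQuery loader for Exasol"

# Grant a BigQuery role to the service account
gcloud projects add-iam-policy-binding my-project-id \
    --member="serviceAccount:bq-exasol-loader@my-project-id.iam.gserviceaccount.com" \
    --role="roles/bigquery.user"

# Create and download a JSON key file for the service account
gcloud iam service-accounts keys create my-key.json \
    --iam-account=bq-exasol-loader@my-project-id.iam.gserviceaccount.com
```

These commands require an authenticated gcloud installation with sufficient permissions on the project.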
Step 2: Add the JDBC Driver

To add a JDBC driver, you must create a configuration file called settings.cfg with the following settings:

DRIVERNAME=BIGQUERY
DRIVERMAIN=com.simba.googlebigquery.jdbc42.Driver
PREFIX=jdbc:bigquery:
NOSECURITY=YES
FETCHSIZE=100000
INSERTSIZE=-1

The BigQuery JDBC driver consists of multiple JAR files, each of which must be listed in the configuration file. You can use the following bash command to generate the JAR entries:

ls -p /path/to/bigquery/jars | grep -v / | grep '\.jar$' | sed 's/^/JAR=/'

Append the output to the end of settings.cfg. The complete settings.cfg file should look similar to the following example.

The list of files may differ depending on the version of the driver.

DRIVERNAME=BIGQUERY
DRIVERMAIN=com.simba.googlebigquery.jdbc42.Driver
PREFIX=jdbc:bigquery:
NOSECURITY=YES
FETCHSIZE=100000
INSERTSIZE=-1
JAR=GoogleBigQueryJDBC42.jar
JAR=animal-sniffer-annotations-1.21.jar
JAR=annotations-4.1.1.4.jar
JAR=api-common-2.1.5.jar
JAR=auto-value-annotations-1.9.jar
JAR=avro-1.11.0.jar
JAR=commons-codec-1.15.jar
JAR=commons-compress-1.21.jar
JAR=commons-logging-1.2.jar
JAR=conscrypt-openjdk-uber-2.5.1.jar
JAR=error_prone_annotations-2.11.0.jar
JAR=failureaccess-1.0.1.jar
JAR=gax-2.13.0.jar
JAR=gax-grpc-2.13.0.jar
JAR=google-api-client-1.33.4.jar
JAR=google-api-services-bigquery-v2-rev20220326-1.32.1.jar
JAR=google-auth-library-credentials-1.6.0.jar
JAR=google-auth-library-oauth2-http-1.6.0.jar
JAR=google-cloud-bigquerystorage-2.12.0.jar
JAR=google-cloud-core-2.5.11.jar
JAR=google-http-client-1.41.5.jar
JAR=google-http-client-apache-v2-1.41.5.jar
JAR=google-http-client-gson-1.41.5.jar
JAR=google-oauth-client-1.33.1.jar
JAR=grpc-alts-1.45.0.jar
JAR=grpc-api-1.45.0.jar
JAR=grpc-auth-1.45.0.jar
JAR=grpc-context-1.45.0.jar
JAR=grpc-core-1.45.0.jar
JAR=grpc-google-cloud-bigquerystorage-v1-2.12.0.jar
JAR=grpc-google-cloud-bigquerystorage-v1beta1-0.136.0.jar
JAR=grpc-google-cloud-bigquerystorage-v1beta2-0.136.0.jar
JAR=grpc-grpclb-1.45.0.jar
JAR=grpc-netty-shaded-1.45.0.jar
JAR=grpc-protobuf-1.45.0.jar
JAR=grpc-protobuf-lite-1.45.0.jar
JAR=grpc-stub-1.45.0.jar
JAR=gson-2.9.0.jar
JAR=guava-31.1-jre.jar
JAR=httpclient-4.5.13.jar
JAR=httpcore-4.4.15.jar
JAR=j2objc-annotations-1.3.jar
JAR=jackson-annotations-2.12.5.jar
JAR=jackson-core-2.12.5.jar
JAR=jackson-databind-2.12.5.jar
JAR=javax.annotation-api-1.3.2.jar
JAR=joda-time-2.10.10.jar
JAR=json-20200518.jar
JAR=jsr305-3.0.2.jar
JAR=listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
JAR=opencensus-api-0.31.0.jar
JAR=opencensus-contrib-http-util-0.31.0.jar
JAR=perfmark-api-0.23.0.jar
JAR=proto-google-cloud-bigquerystorage-v1-2.12.0.jar
JAR=proto-google-cloud-bigquerystorage-v1beta1-0.136.0.jar
JAR=proto-google-cloud-bigquerystorage-v1beta2-0.136.0.jar
JAR=proto-google-common-protos-2.8.0.jar
JAR=proto-google-iam-v1-1.2.10.jar
JAR=protobuf-java-3.19.4.jar
JAR=protobuf-java-util-3.19.4.jar
JAR=slf4j-api-1.7.32.jar
JAR=threetenbp-1.6.0.jar
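As a quick sanity check of the JAR-entry generation, you can try the pipeline against a scratch directory. The directory and file names below are illustrative only:

```shell
# Illustrative scratch directory with dummy driver files
mkdir -p /tmp/bq_jar_demo
touch /tmp/bq_jar_demo/GoogleBigQueryJDBC42.jar /tmp/bq_jar_demo/gson-2.9.0.jar

# Turn each .jar file name into a JAR= entry for settings.cfg
ls -p /tmp/bq_jar_demo | grep -v / | grep '\.jar$' | sed 's/^/JAR=/'
```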

Follow the procedure described in Add JDBC Drivers to upload the settings.cfg configuration file and the JDBC driver jar file(s).

To upload all files in your current directory to BucketFS, run the following command:

Replace WRITE_PW and DATABASE_NODE_IP with the appropriate values.

find . -type f -exec curl --insecure -X PUT -T {} https://w:$WRITE_PW@$DATABASE_NODE_IP:2581/default/drivers/jdbc/bigquery/{} \;

The --insecure (or -k) option tells curl to skip TLS certificate verification, which allows you to connect to an HTTPS server that does not have a valid certificate. Only use this option if certificate verification is not possible and you trust the server.
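To verify that the files arrived, you can list the bucket contents over HTTP using the bucket's read password (READ_PW and DATABASE_NODE_IP are placeholders; this assumes the default BucketFS HTTPS port 2581 used above):

```shell
# List all files in the "default" bucket; the uploaded driver files should appear
curl --insecure https://r:$READ_PW@$DATABASE_NODE_IP:2581/default
```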

Step 3: Upload JSON Key File to BucketFS

Upload the saved JSON key file to the same bucket you uploaded the JDBC driver to. For more information, see Manage Buckets and Files in BucketFS.

Step 4: Create Database Connection

Create the database connection through SQL using the following pattern:

CREATE CONNECTION BQ_CON TO 'jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<your-project-id>;OAuthType=0;Timeout=10000;OAuthServiceAcctEmail=<your-service-account>;OAuthPvtKeyPath=/exa/data/bucketfs/<bucketfs-service>/.dest/<bucket-name>/<your-account-keyfile>;Location=<location>[;EnableSession=1]';

The Location parameter defines where data can be stored and processed. The location can be either a region (a specific geographic place, such as Frankfurt) or a multi-region (EU or US). For more information, see BigQuery locations.

The EnableSession=1 property is only required when using Simba BigQuery JDBC Driver version 1.3.0.1001 or later. For more information, see BigQuery: Transaction control statements are supported only in scripts or sessions.

Examples

Connection details:

  • Project ID: exa-migration
  • Service account: migration-test@exa-migration.iam.gserviceaccount.com
  • BucketFS service: bfsdefault
  • Bucket name: default
  • JSON key-file: /drivers/jdbc/bigquery/my-key.json
  • Location: EU (multi-region)

The resulting connection statement:

CREATE CONNECTION BQ_CON TO 'jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=exa-migration;OAuthType=0;Timeout=10000;OAuthServiceAcctEmail=migration-test@exa-migration.iam.gserviceaccount.com;OAuthPvtKeyPath=/exa/data/bucketfs/bfsdefault/.dest/default/drivers/jdbc/bigquery/my-key.json;Location=EU;EnableSession=1';

You can test the connection with the following statement:

IMPORT FROM JDBC AT BQ_CON STATEMENT 'SELECT 1';

Load Data

You can use the IMPORT statement to load data through the connection you created above. IMPORT supports loading data from a table or from the result of a SQL statement.
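For example, assuming a target table MY_SCHEMA.MY_TABLE already exists in Exasol and a dataset my_dataset exists in BigQuery (all table and column names here are illustrative), either form could look like this:

```sql
-- Load a whole BigQuery table
IMPORT INTO MY_SCHEMA.MY_TABLE
FROM JDBC AT BQ_CON
TABLE my_dataset.my_table;

-- Load the result of a SQL statement
IMPORT INTO MY_SCHEMA.MY_TABLE
FROM JDBC AT BQ_CON
STATEMENT 'SELECT id, name FROM my_dataset.my_table WHERE active = true';
```

The STATEMENT form lets you filter or transform the data on the BigQuery side before it is transferred.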