Load data from Apache Hive

This section explains how to connect to Apache Hive and load data from it using the Cloudera Hive JDBC driver.

Prerequisites

  • Hive must be reachable from the Exasol system.

  • The user credentials in the connection must be valid.

Download driver

Download a Cloudera Hive JDBC driver version that is compatible with your Hive installation.

If you want to use another driver, contact Support.

Add JDBC driver

To add the JDBC driver, create a configuration file called settings.cfg with the following settings:

DRIVERNAME=HiveCloudera
PREFIX=jdbc:hive2:
FETCHSIZE=100000
INSERTSIZE=-1

Follow the procedure described in Add JDBC Driver to upload the settings.cfg configuration and JDBC driver files to Exasol.

With some driver versions, you may receive an error message indicating file permission issues. In that case, you may have to disable the security manager by adding the line NOSECURITY=YES to the settings.cfg file.
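For example, the complete settings.cfg file with the security manager disabled looks like this:

DRIVERNAME=HiveCloudera
PREFIX=jdbc:hive2:
FETCHSIZE=100000
INSERTSIZE=-1
NOSECURITY=YES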

Create connection

To create the connection, use one of the following statements, depending on your Hadoop distribution. Replace the connection string and credentials as needed.

-- Connecting EXAloader via Cloudera Hive driver to Cloudera, 
-- MapR, and Hortonworks Hadoop distributions

-- cloudera-quickstart-vm-5.13.0-0-vmware
create or replace connection hive_conn to 'jdbc:hive2://192.168.42.133:10000'
    user 'cloudera' identified by 'cloudera';

-- MapR-Sandbox-For-Hadoop-6.1.0-vmware
create or replace connection hive_conn to 'jdbc:hive2://192.168.42.134:10000'
    user 'mapr' identified by 'mapr';

-- Azure Hortonworks Sandbox with HDP 2.6.4
create or replace connection hive_conn to 'jdbc:hive2://192.168.42.1:10000'
    user 'raj_ops' identified by 'raj_ops';

To test the connection, use the following statements:

-- EXPORT test for Cloudera driver
export exa_syscat
    into jdbc driver = 'HiveCloudera' at hive_conn table exa_syscat created by
    'create table exa_syscat (schema_name varchar(128), object_name varchar(128),
    object_type varchar(15), object_comment varchar(2000))'
    replace;

-- IMPORT test for Cloudera driver
import into (schema_name varchar(128), object_name varchar(128), object_type varchar(15),
    object_comment varchar(2000))
    from jdbc driver = 'HiveCloudera' at hive_conn table exa_syscat;

Load data

You can use the IMPORT statement to load data through the connection you created above. IMPORT supports loading data either from a table or from a SQL statement, as shown in the examples below.
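The following statements are a sketch of both variants. The Hive table sample_schema.sample_table and the Exasol target table my_table are placeholder names; replace them with your own objects.

-- Load an entire Hive table into an Exasol table
import into my_table
    from jdbc driver = 'HiveCloudera' at hive_conn table sample_schema.sample_table;

-- Load the result of a SQL statement executed on Hive
import into my_table
    from jdbc driver = 'HiveCloudera' at hive_conn
    statement 'select * from sample_schema.sample_table where year = 2023';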

The IMPORT statement also supports Kerberos authentication through the connection string. For more details, see the example in connection_def.
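As a rough sketch only, a Kerberos-enabled connection could combine the Cloudera driver's Kerberos properties (AuthMech, KrbRealm, KrbHostFQDN, KrbServiceName) in the connection string. The host, realm, and principal below are placeholders, and the exact credential format is described in connection_def.

-- Kerberos connection sketch: host, realm, and principal are placeholders
create or replace connection hive_conn_kerberos
    to 'jdbc:hive2://hive-host.example.com:10000;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=hive-host.example.com;KrbServiceName=hive'
    user '<kerberos principal>' identified by '<kerberos credentials, see connection_def>';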