Importing Parquet files

Learn how to load data from Parquet files stored locally, on remote file servers, or in the cloud.

You can use the IMPORT command to load data from Parquet files that are stored on cloud storage services, on remote file servers using FTP/FTPS, SFTP, and HTTP/HTTPS, and on local systems. Exasol supports the Apache Iceberg open table format in Parquet files.

Multiple Parquet files can be imported in parallel across all cluster nodes, enabling higher throughput for larger datasets. Single Parquet files can be read using multiple parallel operations for faster and more efficient data read operations.

In Exasol 2026.1 and later you can use glob patterns to import multiple Parquet files dynamically and recursively from S3-compatible storage. To learn more, see Load files using pattern matching (S3).

This article describes how to import Parquet files from cloud storage. To learn how to import Parquet files stored on local or remote systems, see Load data from local or remote files.

All Parquet import features are fully supported in Exasol 2026.1.0 and later, with partial support in earlier versions. For more details about which features are supported in your Exasol version, see the Release notes.

Connect to storage

This article describes the available features and considerations when importing Parquet files. To learn how to connect to your storage service or local storage, see the following articles:

Supported data types

The following Parquet data types are supported:

INT8, INT16, INT32, INT64, UINT8, UINT16, UINT32, UINT64, FLOAT, DOUBLE

Additional supported data types:

Decimal128, String, LargeString, Date32, Date64, Bool, Timestamp

Source data tracking

Additional SQL options enable row-origin tracking by appending SHA256 file hash and source row number as metadata during the import.

The target table must contain suitable columns to store the metadata.

SQL option Description Column type
SOURCE FILE HASH_SHA256 Allows secure mapping of rows to their respective file origins.

CHAR(64)

VARCHAR(n) with n>=64

HASHTYPE (256 BIT)

SOURCE ROW NUMBER Captures the position of the row in the source file (0-based index). DECIMAL(p,s)

Column mapping

This feature is available in Exasol 2025.1.5 and later.

You can specify source column names by using WITH SOURCE COLUMN NAMES=() in the IMPORT statement. This allows you to explicitly map Parquet columns to columns in the target table.

Usage notes

  • Column names are case sensitive. If there is a case mismatch between the source/target column names, the target column defaults to NULL.

  • The columns in the source must be in the exact same order as the target columns.

  • Duplicate column names in the Parquet file are not allowed. The query will abort with error ETL-2231.

  • Columns in the source that are not mapped to a target column are ignored.

  • If a source column is missing, the corresponding target column defaults to NULL.

SkipCols cannot be used together with the SOURCE COLUMN NAMES option. If both are specified in the same query, the import operation fails with the exception ETL-2230.

Type compatibility

Automatic casting is done for compatible Parquet data types. For example, INT32 → INT64. Unsupported types will result in a runtime exception.

Target column type

Supported Parquet types

Unsupported types

Integer / Decimal

int8, int16, int32, int64, uint8, uint16, uint32, uint64, decimal128, boolean

float, double, string, timestamp, etc.

Double

int8, int16, int32, int64, uint8, uint16, uint32, uint64, float, double

decimal128, string, etc.

(Varchar, Char, Hash)

string, large_string

numeric, binary, bool, timestamp

Date

date32, date64

string, numeric, timestamp, etc.

Boolean

boolean

numeric, string, date, timestamp

Timestamp

timestamp, date32, date64

numeric, string, bool

Usage

Scenario 1: Column mapping for diverse schemas

When working with datasets where source file schemas vary:

  • Map target table columns to specific Parquet columns using the WITH SOURCE COLUMN NAMES=() option.

  • Unmapped Parquet columns are ignored.

Scenario 2: Strict import schema enforcement

  • Ensure that all target table columns have corresponding mapped source columns for the import operation to succeed.

Best practices

  • Use the WITH SOURCE COLUMN NAMES=() option when loading from schema-diverse Parquet datasets to enforce mapping consistency.

  • Make sure that there is no case mismatch between the column names in the Parquet file and the target database table to avoid unexpected NULL values.

For an example of how to use column mapping, see Example 3 - Use explicit column mapping.

Column skipping

The SkipCols configuration parameter enables selective exclusion of columns from the imported Parquet data based on their index positions (0-based). This provides precise control over which columns to include or exclude while importing data. Skipped columns are defined as part of the connection string using ranges or discrete column indices.

SkipCols cannot be used together with the SOURCE COLUMN NAMES option. If both are specified in the same query, the import operation fails with the exception ETL-2230.

For more information and examples of use, see Configuration parameters and Examples.

Load files using pattern matching (S3)

This feature is available in Exasol 2026.1 and later.

Pattern matching is only supported when importing Parquet files from S3 storage.

The FILE parameter in IMPORT FROM PARQUET supports glob patterns when accessing data from S3-compatible storage systems. This allows importing multiple files without specifying each file explicitly. Glob patterns are interpreted in the file path and enable flexible selection of files across directories and subdirectories.

Supported patterns

Pattern

Description

*

Matches zero or more characters

?

Matches exactly one character

[abc]

Matches one character from the specified set

[!abc] or [^abc]

Matches one character not in the specified set

/**/

Matches directories recursively

The characters *, ?, [, ] are interpreted as glob pattern operators if they are used in the file path. To match them literally, escape them using a backslash /

Examples

Import all Parquet files from the bucket root:
Copy
IMPORT INTO my_table
FROM PARQUET
AT 's3://bucket/'
USER '...'
IDENTIFIED BY '...'
FILE '*.parquet';
Import recursively from a directory and its subdirectories:
Copy
...
FILE 'data/**/*.parquet';
Import files from a directory (non-recursive):
Copy
...
FILE 'data/*';
Match single character:
Copy
...
FILE 'data/file?.parquet';
Match a character set:
Copy
...
FILE 'data/file[abc].parquet';
Exclude characters:
Copy
...
FILE 'data/file[!abc].parquet';

Usage notes

  • Glob patterns are supported only in the FILE clause.

  • The bucket name in the AT clause must be specified explicitly and does not support glob patterns.

  • Glob patterns are supported only for S3-compatible storage systems.

  • Recursive matching is only enabled when using the exact syntax /**/. For example:

    • data/**/*.parquet → recursive

    • data**/*.parquet → not recursive

Error handling

  • If no files match the specified pattern, the query fails.

  • If any matched file is not a valid Parquet file or has an incompatible schema, the entire import fails.

  • If multiple patterns are specified, each pattern must match at least one file.

  • Glob patterns are not allowed in the connection definition (AT clause).

  • Using glob patterns with unsupported storage systems results in an error.

Permissions

The user must have permission to use the s3:ListBucket action in AWS. Without this permission, the import fails. For more information, refer to the Amazon AWS documentation.

Configuration parameters

Configuration parameters can be set to be query-specific using the connection string, and as global defaults using database parameters. If a parameter is set in the connection string, it will override the default database parameter.

Usage notes

  • Connection string parameters are case-insensitive.

  • Connection string parameters override database parameter values when both are provided.

  • Default values are sourced from the database parameters if not expressly set.

Parameter Description Validation
MaxConnections

Number of Parquet files imported in parallel on a single node.

Must be > 0
MaxConcurrentReads

Number of parallel read operations per file.

Set to 1 for sequential reads.

0 is not allowed.

Set to 1 when importing local files.

Must be > 0
MaxBatchFetchSize Maximum simultaneous buffers fetched during import. Must be > 0
MaxRows Maximum number of rows to read in a batch. Must be > 0
MaxRetry Maximum number of attempts to connect. Must be ≥ 0
SkipCols

Skip specific column indexes during import.

Example: SkipCols=1,3..8,11 excludes columns 1, 3, 4, 5, 6, 7, 8, and 11. All other columns remain part of the imported dataset.

Only integers (0to 9), commas (,) to separate values, and ranges with two dots (..) are allowed.

A range must have exactly two integers, one before and one after the two dots (..). For example, 3..5 is valid, but 3.. or ..5 are invalid.

The starting value must be less than or equal to the ending value. For example, 3..5 is valid, but 5..3 is invalid.

AWS S3 and S3-compatible storage only
DebugMode Enable AWS structured logging. Must be ≥ 0
EndPoint URL when connecting to S3-compatible storage service (not AWS) -
UseProxy

Boolean, specifies if the proxy specified in the -etlProxy database parameter should be used when connecting to an S3-compatible storage service (not AWS)

1 (default) = use proxy if one is specified

0 = do not use a proxy

-

Log levels (AWS)

The log level determines the granularity and amount of information logged by the system. Select a log level to tailor the logging output based on the use case, such as error debugging, performance monitoring, or high-level event tracking.

To enable structured logging for AWS, the database parameters -etlLogLevel and -traceLevel must both be set to at least 2.

Value Log level Description

0

OFF (default)

Disables logging completely. No log messages are written.

1

FATAL

Logs only critical errors that cause the application or service to terminate.

2

ERROR

Logs errors that occur during execution but do not stop the application.

3

WARN

Logs events that are potentially harmful to the process but are recoverable.

4

INFO

Logs informational messages about the execution flow, such as successful initializations and key events.

5

DEBUG

Logs detailed debug information, typically for development or troubleshooting purposes.

≥ 6

TRACE

Logs highly detailed information, useful for deep debugging and issue resolution.

Schema inference

This feature is supported in Exasol 2026.1 and later.

The schema inference feature allows you to import Parquet files without first creating a target table in the database. The loader will automatically create the table if it does not exist. The syntax is:

CREATE OR REPLACE TABLE <table_name> AS (IMPORT FROM PARQUET <standard parquet import statement options>)

If the target table already exists, an exception is thrown and the import job is aborted.

Column mapping is not supported with schema inference.

Data type mapping

The data type of the target table is derived from the Parquet data type using the following mapping:

Parquet data type Exasol data type

PARQUET_BOOL

BOOLEAN

PARQUET_INT8

DECIMAL(3,0)

PARQUET_INT16

DECIMAL(5,0)

PARQUET_INT32

DECIMAL(10,0)

PARQUET_INT64

DECIMAL(20,0)

PARQUET_UINT8

DECIMAL(3,0)

PARQUET_UINT16

DECIMAL(5,0)

PARQUET_UINT32

DECIMAL(10,0)

PARQUET_UINT64

DECIMAL(20,0)

PARQUET_DECIMAL128

DECIMAL(p,s)

Precision and scale (p,s) are set according to the Parquet file metadata.

p and s must be ≤ 36

PARQUET_FLOAT

DOUBLE PRECISION

PARQUET_DOUBLE

DOUBLE PRECISION

PARQUET_STRING

VARCHAR(n)

n (VARCHAR length) must be ≤ 2,000,000

PARQUET_LARGE_STRING

VARCHAR(n)

n (VARCHAR length) must be ≤ 2,000,000

PARQUET_DATE32

DATE

PARQUET_DATE64

DATE

PARQUET_TIMESTAMP

TIMESTAMP(p)

Example:
Copy
CREATE OR REPLACE TABLE my_schema.my_table 
    AS (
    IMPORT FROM PARQUET AT 'my_s3_bucket.s3.eu-west-1.amazonaws.com;MaxConnections=2;MaxBatchFetchSize=4;SkipCols=1,3..8,11'' 
    USER 'my_user' 
    IDENTIFIED BY 'my_secret_key' FILE 'dup2.parquet'
    )
;

Examples

Exaple 1: Create a connection object
Copy
-- Amazon S3 connection
CREATE OR REPLACE CONNECTION my_conn
    TO 'https://my_s3_bucket.s3.eu-west-1.amazonaws.com'
    USER 'my_access_key' IDENTIFIED BY 'my_secret_key'
;

-- S3-compatible storage
CREATE OR REPLACE CONNECTION my_conn
    TO 's3://my_bucket;EndPoint=https://example.com:9000'
    USER 'my_user' IDENTIFIED BY 'my_secret'
;

-- Azure Blob Storage
CREATE CONNECTION my_conn
    TO 'DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net'
    USER 'my_account_name' IDENTIFIED BY 'my_account_key'
;
Example 2 - Create schema and table with file tracking

In this example we connect using a stored connection (see examples above), create a new schema and table, then load data from two Parquet files, using SQL options to store source metadata for row-origin tracking.

Copy
CREATE SCHEMA IF NOT EXISTS MY_SCHEMA;

CREATE TABLE MY_SCHEMA.MY_TABLE (sourcefile HASHTYPE (256 BIT), c1 INT, rownumber INT);

IMPORT INTO MY_SCHEMA.MY_TABLE
    FROM PARQUET AT my_conn
    FILE 'nested/path/to/test_1.parquet' FILE 'nested/path/to/test_2.parquet' 
    WITH 
      SOURCE FILE HASH_SHA256 = sourcefile
      SOURCE ROW NUMBER = rownumber 
;
Example 3 - Use explicit column mapping
Copy
CREATE SCHEMA IF NOT EXISTS MY_SCHEMA;

CREATE TABLE MY_SCHEMA.MY_TABLE (
  c1 INT,
  name VARCHAR(256),
  mapped_column VARCHAR(256),
  sourcefile HASHTYPE (256 BIT),
  rownumber INT
);

IMPORT INTO MY_SCHEMA.MY_TABLE
  FROM PARQUET AT my_s3_bucket
  FILE 'file_1.parquet'
  FILE 'file_2.parquet'
  WITH
    SOURCE COLUMN NAMES = ('col1', 'col2', 'renamed_col')
    SOURCE FILE HASH_SHA256 = sourcefile
    SOURCE ROW NUMBER = rownumber
;
Example 4 - Use configuration parameters in connection string

The following examples show a connection to AWS. The procedure is identical for all storage types.

Copy
-- Limit the number of files to import concurrently to 2
-- Increase the number of buffers to fetch during the import to 4
-- Exclude columns 1, 3, 4, 5, 6, 7, 8, and 11 from the imported dataset 
CREATE OR REPLACE CONNECTION my_conn
    TO 'https://my_s3_bucket.s3.eu-west-1.amazonaws.com;MaxConnections=2;MaxBatchFetchSize=4;SkipCols=1,3..8,11'
    USER 'my_access_key' IDENTIFIED BY 'my_secret_key'
;

-- Set a dynamic batch size
CREATE OR REPLACE CONNECTION my_conn 
    TO 'https://my_s3_bucket.s3.eu-west-1.amazonaws.com/;MaxRows=2048' 
    USER 'my_access_key' IDENTIFIED BY 'my_secret_key'
;