Profiling

This article explains how to use profiling in Exasol to analyze and improve performance.

Exasol omits complex tuning mechanisms such as execution hints, creation of different index types, calculation of table statistics, etc. Queries in Exasol are instead analyzed by the query optimizer, and the corresponding tuning actions are automatically executed. This is a key factor in the high performance of an Exasol database.

However, there are scenarios where you want to know how much time certain parts of a query take to execute in order to analyze long-running queries and rewrite them if needed. In Exasol, you can use the profiling feature to analyze queries. When profiling is enabled, information about a query is gathered during execution and can be retrieved from the system tables.

Enable profiling

By default, profiling is only activated for the running queries and provides information through the system tables EXA_DBA_PROFILE_RUNNING and EXA_USER_PROFILE_RUNNING.

You can also enable profiling for a session or system wide using the statements ALTER SESSION or ALTER SYSTEM. When profiling is enabled session or system wide, the corresponding profiling data is gathered during the execution of queries and is collected in the system tables EXA_USER_PROFILE_LAST_DAY and EXA_DBA_PROFILE_LAST_DAY.

For example, to enable profiling for queries in the current session, use the following statement:

ALTER SESSION SET PROFILE='ON';

Enabling profiling on the system level may impact performance. We recommend that you only enable profiling for individual sessions, or enable it only temporarily on the system level for troubleshooting.

Data is committed to the system tables periodically. This means that there is usually a delay after executing a query until you can access the profile in the system tables. If you want to analyze the profile directly after executing a query, use the command FLUSH STATISTICS to force the commit of statistical data.

What’s in a profile

A profile includes information about the execution time (DURATION), CPU usage (CPU), memory usage (TEMP_DB_RAM_PEAK and PERSISTENT_DB_RAM_PEAK), and network usage (NET) for the different parts of a query execution.

The profile also shows the number of processed rows (OBJECT_ROWS) and the number of resulting rows (OUT_ROWS), plus any additional information about each execution part (REMARKS).

The following tables describe the columns in a profile and the different execution parts.

Profile columns

Column Description

SESSION_ID

Unique identifier of the session.

STMT_ID

ID of the statement within a session.

COMMAND_NAME

Name of the statement.

For example: SELECT, COMMIT, or MERGE.

COMMAND_CLASS The class of the SQL command/statement (DDL, DML, DCL, DQL)

PART_ID

ID of the execution part within the statement.

PART_NAME

Name of the execution part.

For more details, see Execution parts.

PART_INFO

Extended information about the execution part:

GLOBAL = global action (for example, a global join)

EXPRESSION INDEX = non-persistent join index which is built on an expression

NL JOIN = nested loop join (cross product)

REPLICATED = replicated object (for example, small tables)

TEMPORARY = temporary object (for intermediate results)

IN SNAPSHOT MODE = one or more objects were accessed in snapshot mode

NOTE: some execution parts do not contain any information in this column.

OBJECT_SCHEMA

Schema of the processed object.

OBJECT_NAME

Name of the processed object.

OBJECT_ROWS

Number of rows of the processed object.

OUT_ROWS

Number of result rows of the execution part.

DURATION

Duration of the execution part in seconds.

CPU

Average CPU utilization in percent of the execution part over the duration of the execution.

TEMP_DB_RAM_PEAK

Maximum usage of temporary DB memory in MiB of the execution part, measured cluster wide.

PERSISTENT_DB_RAM_PEAK

Maximum usage of modified, persistent DB memory in MiB of the execution part, measured cluster wide.

HDD_READ

Average hard disk read ratio in MiB per second over the duration of the execution, measured per node.

A value < 0 indicates that data had to be loaded into the main memory.

HDD_WRITE

Average hard disk write ratio in MiB per second over the duration of the execution, measured per node.

This column only reflects the data written during a commit. For other statements its value is NULL.

NET

Average network traffic ratio in MiB per second over the duration of the execution (sum of send/receive per node).

REMARKS

Additional information about the execution part such as the index type (global or local) and which index is used for a join. For example:

LOCAL INDEX (SALES_ID)

SALES_POSITIONS(SALES_ID) => LOCAL INDEX (SALES_ID)

For analytic functions, this column provides information about the number of partition key expressions (#Partitionkeys), order key combinations for the same partition keys (#Orderings), and analytical expressions (#Functions) used. For example:

#Partitionkeys=2, #Orderings=1, #Functions=2

SQL_TEXT

The complete statement text of the corresponding query.

Execution parts

PART_NAME Description

ANALYTIC FUNCTION

Computation of analytic functions (without sorting of data).

Includes one separate ANALYTIC FUNCTION entry for each distinct partition by and/or order by clause.

COLUMN STATISTICS

Computation of statistics on one or more columns, required for internal database function.

COMMIT

Commit of the transaction, persistent write to disk.

COMPILE/EXECUTE

Compilation and execution of the statement (including query optimization).

CONNECT BY

Computation of hierarchical queries.

CONSTRAINT CHECK

Checking constraints (primary/foreign key, NOT NULL).

CREATE UNION

Under certain circumstances, the optimizer can create a combined table out of several tables connected by UNION ALL and process it much faster.

DELETE

Deleting data (DELETE, TRUNCATE or MERGE).

DISTRIBUTE / PARTITION

Distribution and/or partition of data.

EXISTS

EXISTS or IN computation.

EXPORT

Execution of the EXPORT command.

FULL JOIN

Full outer join to a table.

GROUP BY

Calculation of the GROUP BY aggregation.

GROUPING SETS

Calculation of the GROUPING SETS aggregation.

IMPORT

Execution of the IMPORT command.

INDEX CREATE

Creation of internal indexes. For more information, see Indexes.

INDEX INSERT

Update on internal indexes. For more information, see Indexes.

INDEX REBUILD

Recreation of internal indexes. For more information, see Indexes.

INSERT

Inserting data in a table (INSERT, MERGE, or IMPORT), in temporary data blocks, or in a result set.

JOIN

Join to a table.

OUTER JOIN

Outer Join to a table.

PREFERENCE PROCESSING

Processing of Skyline queries. For more information, see Skyline.

PUSHDOWN

Pushdown SQL statement generated by the adapter for queries on virtual objects.

QUERY CACHE RESULT

Accessing the Query Cache.

RECOMPRESS

Recompressing data.

REPLICATE

Cluster-wide replication of data. For example, replicating small tables to avoid global joins.

ROLLBACK

Rollback of the transaction.

SCAN

Scan of a table. The OBJECT_SCHEMA and OBJECT_NAME columns show which tables were scanned.

SORT

Sorting the data (ORDER BY, also in case of analytic functions).

SYSTEM TABLE

Collecting the data for a system table.

UNION TABLE

This part is created for each individual UNION ALL optimized table.

UPDATE

Update of data (UPDATE or MERGE).

WAIT FOR COMMIT

Waiting until another transaction finishes.

Examples

Example 1
-- switch on profiling
ALTER SESSION SET PROFILE='ON';

-- run query
SELECT YEAR(o_orderdate) AS "YEAR", COUNT(*)
FROM orders
GROUP BY YEAR(o_orderdate)
ORDER BY 1 LIMIT 5;
            
YEAR COUNT(*)
1992 227089
1993 226645
1994 227597
1995 228637
1996 228626
-- switch off profiling again
ALTER SESSION SET PROFILE='OFF';

-- update statistics
FLUSH STATISTICS;

-- read profiling information
SELECT part_id, part_name, object_name, object_rows, out_rows, duration
    FROM exa_user_profile_last_day
    WHERE CURRENT_STATEMENT-5 = stmt_id 
    AND CURRENT_SESSION=session_id;
  PART_NAME OBJECT_NAME OBJECT_ROWS OUT_ROWS DURATION
1 COMPILE / EXECUTE       0.020672
2 SCAN ORDERS 1500000 1500000 0.247681
3 GROUP BY tmp_subselect0 0 7 0.020913
4 SORT tmp_subselect0 7 5 0.006341
Example 2

The following example

-- enable profiling for the session
ALTER SESSION SET PROFILE='ON';

-- run the query
SELECT COUNT(*
    FROM sales s JOIN sales_positions sp 
    ON s.sales_id=sp.sales_id;

-- disable profiling again
ALTER SESSION SET PROFILE='OFF';

-- update statistics
FLUSH STATISTICS;

-- get statement info
SELECT DISTINCT STMT_ID,SQL_TEXT
    FROM EXA_USER_PROFILE_LAST_DAY
    WHERE SESSION_ID=CURRENT_SESSION
    ORDER BY STMT_ID DESC;
STMT_ID SQL_TEXT
12 SELECT COUNT(*) FROM sales s JOIN sales_positions sp ON s.sales_id=sp.sales_id;
11 ALTER SESSION SET PROFILE='ON'
-- get profile information
SELECT
PART_ID,PART_NAME,PART_INFO,OBJECT_NAME,OBJECT_ROWS,OUT_ROWS,DURATION,
CPU,TEMP_DB_RAM_PEAK,HDD_READ,HDD_WRITE,NET,REMARKS
    FROM exa_user_profile_last_day
    WHERE STMT_ID=12
    AND SESSION_ID=current_session;
PART_ID PART_NAME PART_INFO OBJECT_NAME OBJECT_ROWS OUT_ROWS DURATION CPU TEMP_DB_RAM_PEAK HDD_READ HDD_WRITE NET REMARKS
1 COMPILE / EXECUTE (null) (null) (null) (null) 3.647 95.9 37.5 0.0 (null) 0.2 (null)
4 INDEX CREATE (null) SALES_POSITIONS 455140507 455140507 21.965 78.5 2130.0 0.1 (null) 0.0 LOCAL INDEX (SALES_ID)
5 SCAN (null) SALES 43611378 43611378 0.013 99.3 36.0 0.0 (null) 0.0 (null)
6 JOIN (null) SALES_POSITIONS 455140507 455140507 1.583 (null) (null) (null) (null) (null) SALES(SALES_ID) => LOCAL INDEX (SALES_ID)
7 GROUP BY GLOBAL on TEMPORARY table tmp_subselect0 0 1 0.076 97.3 37.5 0.0 (null) 0.0 (null)

This profile shows that the two tables were joined for the first time - hence the INDEX CREATE part.

SALES_POSITIONS is indexed, while SALES becomes the root table for the following join. This is a local join because the two tables are distributed on the join column SALES_ID.