Profiling

This article explains how to use profiling in Exasol to analyze and improve performance.

Exasol omits complex tuning mechanisms such as execution hints, creation of different index types, calculation of table statistics, etc. Queries in Exasol are instead analyzed by the query optimizer, and the corresponding tuning actions are automatically executed. This is a key factor in the high performance of an Exasol database.

However, there are scenarios where you want to know how much time certain parts of a query take to execute in order to analyze long-running queries and rewrite them if needed. In Exasol, you can use the profiling feature to analyze queries. When profiling is enabled, information about a query is gathered during execution and can be retrieved from the system tables.

Enable profiling

By default, profiling is only activated for the running queries and provides information through the system tables EXA_DBA_PROFILE_RUNNING and EXA_USER_PROFILE_RUNNING.

You can also enable profiling for a session or system wide using the statements ALTER SESSION or ALTER SYSTEM. When profiling is enabled session or system wide, the corresponding profiling data is gathered during the execution of queries and is collected in the system tables EXA_USER_PROFILE_LAST_DAY and EXA_DBA_PROFILE_LAST_DAY.

For example, to enable profiling for queries in the current session, use the following statement:

Copy

ALTER SESSION SET PROFILE='ON';

Enabling profiling on the system level may impact performance. We recommend that you only enable profiling for individual sessions, or enable it only temporarily on the system level for troubleshooting.

Data is committed to the system tables periodically. This means that there is usually a delay after executing a query until you can access the profile in the system tables. If you want to analyze the profile directly after executing a query, use the command FLUSH STATISTICS to force the commit of statistical data.

What’s in a profile

A profile includes information about the execution time (DURATION), CPU usage (CPU), memory usage (TEMP_DB_RAM_PEAK and PERSISTENT_DB_RAM_PEAK), and network usage (NET) for the different parts of a query execution.

The profile also shows the number of processed rows (OBJECT_ROWS) and the number of resulting rows (OUT_ROWS), plus any additional information about each execution part (REMARKS).

The following tables describe the columns in a profile and the different execution parts.

Profile columns

Column	Description
`SESSION_ID`	Unique identifier of the session.
`STMT_ID`	ID of the statement within a session.
`COMMAND_NAME`	Name of the statement. For example: SELECT, COMMIT, or MERGE.
`COMMAND_CLASS`	The class of the SQL command/statement (DDL, DML, DCL, DQL)
`PART_ID`	ID of the execution part within the statement.
`PART_NAME`	Name of the execution part. For more details, see Execution parts.
`PART_INFO`	Extended information about the execution part: GLOBAL = global action (for example, a global join) EXPRESSION INDEX = non-persistent join index which is built on an expression NL JOIN = nested loop join (cross product) REPLICATED = replicated object (for example, small tables) TEMPORARY = temporary object (for intermediate results) IN SNAPSHOT MODE = one or more objects were accessed in snapshot mode NOTE: some execution parts do not contain any information in this column.
`OBJECT_SCHEMA`	Schema of the processed object.
`OBJECT_NAME`	Name of the processed object.
`OBJECT_ROWS`	Number of rows of the processed object.
`OUT_ROWS`	Number of result rows of the execution part.
`DURATION`	Duration of the execution part in seconds.
`CPU`	Average CPU utilization in percent of the execution part over the duration of the execution.
`TEMP_DB_RAM_PEAK`	Maximum usage of temporary DB memory in MiB of the execution part, measured cluster wide.
`PERSISTENT_DB_RAM_PEAK`	Maximum usage of modified, persistent DB memory in MiB of the execution part, measured cluster wide.
`HDD_READ`	Average hard disk read ratio in MiB per second over the duration of the execution, measured per node. A value < 0 indicates that data had to be loaded into the main memory.
`HDD_WRITE`	Average hard disk write ratio in MiB per second over the duration of the execution, measured per node. This column only reflects the data written during a commit. For other statements its value is `NULL`.
`NET`	Average network traffic ratio in MiB per second over the duration of the execution (sum of send/receive per node).
`REMARKS`	Additional information about the execution part such as the index type (global or local) and which index is used for a join. For example: `LOCAL INDEX (SALES_ID)` `SALES_POSITIONS(SALES_ID) => LOCAL INDEX (SALES_ID)` For analytic functions, this column provides information about the number of partition key expressions (#Partitionkeys), order key combinations for the same partition keys (#Orderings), and analytical expressions (#Functions) used. For example: `#Partitionkeys=2, #Orderings=1, #Functions=2`
`SQL_TEXT`	The complete statement text of the corresponding query.

Execution parts

PART_NAME	Description
ANALYTIC FUNCTION	Computation of analytic functions (without sorting of data). Includes one separate `ANALYTIC FUNCTION` entry for each distinct partition by and/or order by clause.
COLUMN STATISTICS	Computation of statistics on one or more columns, required for internal database function.
COMMIT	Commit of the transaction, persistent write to disk.
COMPILE/EXECUTE	Compilation and execution of the statement (including query optimization).
CONNECT BY	Computation of hierarchical queries.
CONSTRAINT CHECK	Checking constraints (primary/foreign key, NOT NULL).
CREATE UNION	Under certain circumstances, the optimizer can create a combined table out of several tables connected by UNION ALL and process it much faster.
DELETE	Deleting data (DELETE, TRUNCATE or MERGE).
DISTRIBUTE / PARTITION	Distribution and/or partition of data.
EXISTS	EXISTS or IN computation.
EXPORT	Execution of the EXPORT command.
FULL JOIN	Full outer join to a table.
GROUP BY	Calculation of the GROUP BY aggregation.
GROUPING SETS	Calculation of the GROUPING SETS aggregation.
IMPORT	Execution of the IMPORT command.
INDEX CREATE	Creation of internal indexes. For more information, see Indexes.
INDEX INSERT	Update on internal indexes. For more information, see Indexes.
INDEX REBUILD	Recreation of internal indexes. For more information, see Indexes.
INSERT	Inserting data in a table (INSERT, MERGE, or IMPORT), in temporary data blocks, or in a result set.
JOIN	Join to a table.
OUTER JOIN	Outer Join to a table.
PREFERENCE PROCESSING	Processing of Skyline queries. For more information, see Skyline.
PUSHDOWN	Pushdown SQL statement generated by the adapter for queries on virtual objects.
QUERY CACHE RESULT	Accessing the Query Cache.
RECOMPRESS	Recompressing data.
REPLICATE	Cluster-wide replication of data. For example, replicating small tables to avoid global joins.
ROLLBACK	Rollback of the transaction.
SCAN	Scan of a table. The `OBJECT_SCHEMA` and `OBJECT_NAME` columns show which tables were scanned.
SORT	Sorting the data (ORDER BY, also in case of analytic functions).
SYSTEM TABLE	Collecting the data for a system table.
UNION TABLE	This part is created for each individual UNION ALL optimized table.
UPDATE	Update of data (UPDATE or MERGE).
WAIT FOR COMMIT	Waiting until another transaction finishes.

Examples

Example 1

Copy

-- switch on profiling
ALTER SESSION SET PROFILE='ON';

-- run query
SELECT YEAR(o_orderdate) AS "YEAR", COUNT(*)
FROM orders
GROUP BY YEAR(o_orderdate)
ORDER BY 1 LIMIT 5;

YEAR	COUNT(*)
1992	227089
1993	226645
1994	227597
1995	228637
1996	228626

Copy

-- switch off profiling again
ALTER SESSION SET PROFILE='OFF';

-- update statistics
FLUSH STATISTICS;

-- read profiling information
SELECT part_id, part_name, object_name, object_rows, out_rows, duration
    FROM exa_user_profile_last_day
    WHERE CURRENT_STATEMENT-5 = stmt_id 
    AND CURRENT_SESSION=session_id;

	PART_NAME	OBJECT_NAME	OBJECT_ROWS	OUT_ROWS	DURATION
1	COMPILE / EXECUTE				0.020672
2	SCAN	ORDERS	1500000	1500000	0.247681
3	GROUP BY	tmp_subselect0	0	7	0.020913
4	SORT	tmp_subselect0	7	5	0.006341

Example 2

The following example

Copy

-- enable profiling for the session
ALTER SESSION SET PROFILE='ON';

-- run the query
SELECT COUNT(*) 
    FROM sales s JOIN sales_positions sp 
    ON s.sales_id=sp.sales_id;

-- disable profiling again
ALTER SESSION SET PROFILE='OFF';

-- update statistics
FLUSH STATISTICS;

-- get statement info
SELECT DISTINCT STMT_ID,SQL_TEXT
    FROM EXA_USER_PROFILE_LAST_DAY
    WHERE SESSION_ID=CURRENT_SESSION
    ORDER BY STMT_ID DESC;

STMT_ID	SQL_TEXT
12	SELECT COUNT(*) FROM sales s JOIN sales_positions sp ON s.sales_id=sp.sales_id;
11	ALTER SESSION SET PROFILE='ON'

Copy

-- get profile information
SELECT
PART_ID,PART_NAME,PART_INFO,OBJECT_NAME,OBJECT_ROWS,OUT_ROWS,DURATION,
CPU,TEMP_DB_RAM_PEAK,HDD_READ,HDD_WRITE,NET,REMARKS
    FROM exa_user_profile_last_day
    WHERE STMT_ID=12
    AND SESSION_ID=current_session;

PART_ID	PART_NAME	PART_INFO	OBJECT_NAME	OBJECT_ROWS	OUT_ROWS	DURATION	CPU	TEMP_DB_RAM_PEAK	HDD_READ	HDD_WRITE	NET	REMARKS
1	COMPILE / EXECUTE	(null)	(null)	(null)	(null)	3.647	95.9	37.5	0.0	(null)	0.2	(null)
4	INDEX CREATE	(null)	SALES_POSITIONS	455140507	455140507	21.965	78.5	2130.0	0.1	(null)	0.0	LOCAL INDEX (SALES_ID)
5	SCAN	(null)	SALES	43611378	43611378	0.013	99.3	36.0	0.0	(null)	0.0	(null)
6	JOIN	(null)	SALES_POSITIONS	455140507	455140507	1.583	(null)	(null)	(null)	(null)	(null)	SALES(SALES_ID) => LOCAL INDEX (SALES_ID)
7	GROUP BY	GLOBAL on TEMPORARY table	tmp_subselect0	0	1	0.076	97.3	37.5	0.0	(null)	0.0	(null)

This profile shows that the two tables were joined for the first time - hence the INDEX CREATE part.

SALES_POSITIONS is indexed, while SALES becomes the root table for the following join. This is a local join because the two tables are distributed on the join column SALES_ID.

Profiling

Enable profiling

What’s in a profile

Profile columns

Execution parts

Examples

Example 1

Example 2

PRODUCT

RESOURCES