Profiling
This article explains how to use profiling in Exasol to analyze and improve performance.
Exasol omits complex tuning mechanisms such as execution hints, creation of different index types, calculation of table statistics, etc. Queries in Exasol are instead analyzed by the query optimizer, and the corresponding tuning actions are automatically executed. This is a key factor in the high performance of an Exasol database.
However, there are scenarios where you want to know how much time certain parts of a query take to execute in order to analyze long-running queries and rewrite them if needed. In Exasol, you can use the profiling feature to analyze queries. When profiling is enabled, information about a query is gathered during execution and can be retrieved from the system tables.
Enable profiling
By default, profiling is only activated for the running queries and provides information through the system tables EXA_DBA_PROFILE_RUNNING and EXA_USER_PROFILE_RUNNING.
You can also enable profiling for a session or system wide using the statements ALTER SESSION or ALTER SYSTEM. When profiling is enabled session or system wide, the corresponding profiling data is gathered during the execution of queries and is collected in the system tables EXA_USER_PROFILE_LAST_DAY and EXA_DBA_PROFILE_LAST_DAY.
For example, to enable profiling for queries in the current session, use the following statement:
Enabling profiling on the system level may impact performance. We recommend that you only enable profiling for individual sessions, or enable it only temporarily on the system level for troubleshooting.
Data is committed to the system tables periodically. This means that there is usually a delay after executing a query until you can access the profile in the system tables. If you want
to analyze the profile directly after executing a query, use the command FLUSH STATISTICS
to force the commit of statistical data.
What’s in a profile
A profile includes information about the execution time (DURATION
), CPU usage (CPU
), memory usage (TEMP_DB_RAM_PEAK
and PERSISTENT_DB_RAM_PEAK
), and network usage (NET
) for the different parts of a query execution.
The profile also shows the number of processed
rows (OBJECT_ROWS
) and the number of resulting rows (OUT_ROWS
), plus any additional information about each execution part
(REMARKS
).
The following tables describe the columns in a profile and the different execution parts.
Profile columns
Column | Description |
---|---|
|
Unique identifier of the session. |
|
ID of the statement within a session. |
|
Name of the statement. For example: SELECT, COMMIT, or MERGE. |
COMMAND_CLASS
|
The class of the SQL command/statement (DDL, DML, DCL, DQL) |
|
ID of the execution part within the statement. |
|
Name of the execution part. For more details, see Execution parts. |
|
Extended information about the execution part: GLOBAL = global action (for example, a global join) EXPRESSION INDEX = non-persistent join index which is built on an expression NL JOIN = nested loop join (cross product) REPLICATED = replicated object (for example, small tables) TEMPORARY = temporary object (for intermediate results) IN SNAPSHOT MODE = one or more objects were accessed in snapshot mode NOTE: some execution parts do not contain any information in this column. |
|
Schema of the processed object. |
|
Name of the processed object. |
|
Number of rows of the processed object. |
|
Number of result rows of the execution part. |
|
Duration of the execution part in seconds. |
|
Average CPU utilization in percent of the execution part over the duration of the execution. |
|
Maximum usage of temporary DB memory in MiB of the execution part, measured cluster wide. |
|
Maximum usage of modified, persistent DB memory in MiB of the execution part, measured cluster wide. |
|
Average hard disk read ratio in MiB per second over the duration of the execution, measured per node. A value < 0 indicates that data had to be loaded into the main memory. |
|
Average hard disk write ratio in MiB per second over the duration of the execution, measured per node. This column only reflects the data written during a commit. For other statements its value is |
|
Average network traffic ratio in MiB per second over the duration of the execution (sum of send/receive per node). |
|
Additional information about the execution part such as the index type (global or local) and which index is used for a join. For example:
For analytic functions, this column provides information about the number of partition key expressions (#Partitionkeys), order key combinations for the same partition keys (#Orderings), and analytical expressions (#Functions) used. For example:
|
|
The complete statement text of the corresponding query. |
Execution parts
PART_NAME | Description |
---|---|
ANALYTIC FUNCTION |
Computation of analytic functions (without sorting of data). Includes one separate |
COLUMN STATISTICS |
Computation of statistics on one or more columns, required for internal database function. |
COMMIT |
Commit of the transaction, persistent write to disk. |
COMPILE/EXECUTE |
Compilation and execution of the statement (including query optimization). |
CONNECT BY |
Computation of hierarchical queries. |
CONSTRAINT CHECK |
Checking constraints (primary/foreign key, NOT NULL). |
CREATE UNION |
Under certain circumstances, the optimizer can create a combined table out of several tables connected by UNION ALL and process it much faster. |
DELETE |
Deleting data (DELETE, TRUNCATE or MERGE). |
DISTRIBUTE / PARTITION |
Distribution and/or partition of data. |
EXISTS |
EXISTS or IN computation. |
EXPORT |
Execution of the EXPORT command. |
FULL JOIN |
Full outer join to a table. |
GROUP BY |
Calculation of the GROUP BY aggregation. |
GROUPING SETS |
Calculation of the GROUPING SETS aggregation. |
IMPORT |
Execution of the IMPORT command. |
INDEX CREATE |
Creation of internal indexes. For more information, see Indexes. |
INDEX INSERT |
Update on internal indexes. For more information, see Indexes. |
INDEX REBUILD |
Recreation of internal indexes. For more information, see Indexes. |
INSERT |
Inserting data in a table (INSERT, MERGE, or IMPORT), in temporary data blocks, or in a result set. |
JOIN |
Join to a table. |
OUTER JOIN |
Outer Join to a table. |
PREFERENCE PROCESSING |
Processing of Skyline queries. |
PUSHDOWN |
Pushdown SQL statement generated by the adapter for queries on virtual objects. |
QUERY CACHE RESULT |
Accessing the Query Cache. |
RECOMPRESS |
Recompressing data. |
REPLICATE |
Cluster-wide replication of data. For example, replicating small tables to avoid global joins. |
ROLLBACK |
Rollback of the transaction. |
SCAN |
Scan of a table. The |
SORT |
Sorting the data (ORDER BY, also in case of analytic functions). |
SYSTEM TABLE |
Collecting the data for a system table. |
UNION TABLE |
This part is created for each individual UNION ALL optimized table. |
UPDATE |
Update of data (UPDATE or MERGE). |
WAIT FOR COMMIT |
Waiting until another transaction finishes. |
Examples
Example 1
-- switch on profiling
ALTER SESSION SET PROFILE='ON';
-- run query
SELECT YEAR(o_orderdate) AS "YEAR", COUNT(*)
FROM orders
GROUP BY YEAR(o_orderdate)
ORDER BY 1 LIMIT 5;
YEAR | COUNT(*) |
---|---|
1992 | 227089 |
1993 | 226645 |
1994 | 227597 |
1995 | 228637 |
1996 | 228626 |
-- switch off profiling again
ALTER SESSION SET PROFILE='OFF';
-- update statistics
FLUSH STATISTICS;
-- read profiling information
SELECT part_id, part_name, object_name, object_rows, out_rows, duration
FROM exa_user_profile_last_day
WHERE CURRENT_STATEMENT-5 = stmt_id
AND CURRENT_SESSION=session_id;
PART_NAME | OBJECT_NAME | OBJECT_ROWS | OUT_ROWS | DURATION | |
---|---|---|---|---|---|
1 | COMPILE / EXECUTE | 0.020672 | |||
2 | SCAN | ORDERS | 1500000 | 1500000 | 0.247681 |
3 | GROUP BY | tmp_subselect0 | 0 | 7 | 0.020913 |
4 | SORT | tmp_subselect0 | 7 | 5 | 0.006341 |
Example 2
The following example
-- enable profiling for the session
ALTER SESSION SET PROFILE='ON';
-- run the query
SELECT COUNT(*)
FROM sales s JOIN sales_positions sp
ON s.sales_id=sp.sales_id;
-- disable profiling again
ALTER SESSION SET PROFILE='OFF';
-- update statistics
FLUSH STATISTICS;
-- get statement info
SELECT DISTINCT STMT_ID,SQL_TEXT
FROM EXA_USER_PROFILE_LAST_DAY
WHERE SESSION_ID=CURRENT_SESSION
ORDER BY STMT_ID DESC;
STMT_ID | SQL_TEXT |
---|---|
12 | SELECT COUNT(*) FROM sales s JOIN sales_positions sp ON s.sales_id=sp.sales_id; |
11 | ALTER SESSION SET PROFILE='ON' |
-- get profile information
SELECT
PART_ID,PART_NAME,PART_INFO,OBJECT_NAME,OBJECT_ROWS,OUT_ROWS,DURATION,
CPU,TEMP_DB_RAM_PEAK,HDD_READ,HDD_WRITE,NET,REMARKS
FROM exa_user_profile_last_day
WHERE STMT_ID=12
AND SESSION_ID=current_session;
PART_ID | PART_NAME | PART_INFO | OBJECT_NAME | OBJECT_ROWS | OUT_ROWS | DURATION | CPU | TEMP_DB_RAM_PEAK | HDD_READ | HDD_WRITE | NET | REMARKS |
1 | COMPILE / EXECUTE | (null) | (null) | (null) | (null) | 3.647 | 95.9 | 37.5 | 0.0 | (null) | 0.2 | (null) |
4 | INDEX CREATE | (null) | SALES_POSITIONS | 455140507 | 455140507 | 21.965 | 78.5 | 2130.0 | 0.1 | (null) | 0.0 | LOCAL INDEX (SALES_ID) |
5 | SCAN | (null) | SALES | 43611378 | 43611378 | 0.013 | 99.3 | 36.0 | 0.0 | (null) | 0.0 | (null) |
6 | JOIN | (null) | SALES_POSITIONS | 455140507 | 455140507 | 1.583 | (null) | (null) | (null) | (null) | (null) | SALES(SALES_ID) => LOCAL INDEX (SALES_ID) |
7 | GROUP BY | GLOBAL on TEMPORARY table | tmp_subselect0 | 0 | 1 | 0.076 | 97.3 | 37.5 | 0.0 | (null) | 0.0 | (null) |
This profile shows that the two tables were joined for the first time - hence the INDEX CREATE
part.
SALES_POSITIONS
is indexed, while SALES
becomes the root table for the following join. This is a local join because the two tables are distributed on the join column SALES_ID
.