Profiling
This section explains how to use the profiling feature in Exasol.
Exasol deliberately omits complex tuning mechanisms for customers, for example, execution hints, creation of different index types, calculation of table statistics and so on. Queries in Exasol are analyzed by the query optimizer and corresponding tuning actions are executed fully automatically.
However, there are situations where you want to know how much time the certain execution parts of a query take. Long running queries can then be analyzed and maybe re-written. This kind of information can also be provided to Exasol to continuously improve the query optimizer.
In Exasol, you can switch on the profiling feature on demand. Afterward, the corresponding information is gathered and provided to the customer through system tables. Further details are described in the following sections.
Enable Profiling
By default, the profiling is only activated for the running queries and provides the current status through the system tables EXA_DBA_PROFILE_RUNNING and EXA_USER_PROFILE_RUNNING.
You can enable the general profiling feature through the statements ALTER SESSION or ALTER SYSTEM by setting the option profile='on'. After that, the corresponding profiling data is gathered during the execution of queries and is collected in the system tables EXA_USER_PROFILE_LAST_DAY and EXA_DBA_PROFILE_LAST_DAY.
This profiling data is part of the statistical system tables which are committed periodically. Profiling data is only kept for 24 hours, after that, old query profiles are deleted from the system tables. Therefore, a delay occurs until the data is provided within the system tables. If you want to analyze the profiling information directly after executing a query, you can use the command FLUSH STATISTICS to enforce the COMMIT of the statistical data.
Enabling profiling on the system level may impact performance. Therefore, Exasol recommends enabling profiling only for individual sessions or temporarily on the system level for troubleshooting.
How to Read Profiling
Profiling, when enabled, lets you analyze queries. The profile of each query includes key information on the execution parts, such as the execution time (DURATION), the CPU usage (CPU), the memory usage (TEMP_DB_RAM_PEAK and PERSISTENT_DB_RAM_PEAK), or the network communication (NET). Additionally, the number of processed rows (OBJECT_ROWS), the number of resulting rows (OUT_ROWS), and additional information about a part (REMARKS) is gathered.
Some parts do not contain additional information in PART_INFO .
Profiling consists of the following three common execution parts amongst other parts, for a full list of execution parts, refer to the PART_NAME in the table below:
- COMPILE/EXECUTE: This is the first part. This shows you the time the database needs to compile the query. Information such as CPU, TEMP_DB_RAM_PEAK, hard disk, network, and so on are a part of the COMPILE/EXECUTE step.
- SCAN: Next is the SCAN. This stands for scanning a table. OBJECT_SCHEMA and OBJECT_NAME show which tables were scanned.
- INSERT: Insert does not actually insert into a table. However, it inserts the found rows into a result set. The result set is a temporary table.
The following is a list of all profiling information of a running query.
Column | Meaning |
---|---|
SESSION_ID |
The Session ID is the unique identifier of the session. |
STMT_ID |
Serially numbered ID of statement within a session. |
COMMAND_NAME |
Name of the statement (for example, SELECT, COMMIT, or MERGE). |
PART_ID |
Serially numbered ID of the execution part within the statement. |
PART_NAME |
Name of the execution part. For descriptions of the different execution parts, see the Execution Parts table below. |
PART_INFO |
Extended information of the execution part:
|
OBJECT_SCHEMA |
Schema of the processed object. |
OBJECT_NAME |
Name of the processed object. |
OBJECT_ROWS |
Number of rows of the processed object. |
OUT_ROWS |
Number of result rows of the execution part. |
DURATION |
Duration of the execution part in seconds. |
CPU |
CPU utilization in percent of the execution part (averaged over the duration). |
TEMP_DB_RAM_PEAK |
Usage of temporary DB memory of the execution part in MiB (cluster wide, maximum over the duration). |
PERSISTENT_DB_RAM_PEAK |
Usage of modified, persistent DB memory of the execution part in MiB (cluster wide, maximum over the duration). |
HDD_READ |
Hard disk read ratio in MiB per second (per node, averaged over the duration).
If this value is larger than 0, then data had to be loaded into the main memory. |
HDD_WRITE |
Hard disk write ratio in MiB per second (per node, averaged over the duration).
This column reflects only the data written during a COMMIT. For other statements its value is NULL. |
NET |
Network traffic ratio in MiB per second (sum of send/receive, per node, aver aged over the duration). |
REMARKS |
Any additional information such as index type and columns, or which index is used for a join. Some examples for the REMARKS column:
|
SQL_TEXT |
Displays the complete statement text of the corresponding query. |
Execution Parts
PART_NAME | Meaning |
---|---|
COMPILE/EXECUTE |
Compilation and execution of the statement (including query optimization) |
SCAN |
Scan of a table |
JOIN |
Join to a table |
FULL JOIN |
Full outer join to a table |
OUTER JOIN |
Outer Join to a table |
EXISTS |
EXISTS or IN computation |
GROUP BY |
Calculation of the GROUP BY aggregation |
GROUPING SETS |
Calculation of the GROUPING SETS aggregation |
SORT |
Sorting the data (ORDER BY, also in case of analytic functions) |
ANALYTIC FUNCTION |
Computation of analytic functions (without sorting of data). It includes one separate ANALYTIC FUNCTION entry for each distinct partition by and/or order by clause |
CONNECT BY |
Computation of hierarchical queries |
PREFERENCE PROCESSING |
Processing of Skyline queries |
PUSHDOWN |
Pushdown SQL statement generated by the adapter for queries on virtual objects |
QUERY CACHE RESULT |
Accessing the Query Cache |
CREATE UNION |
Under certain circumstances, the optimizer can create a combined table out of several tables connected by UNION ALL, and process it much faster |
UNION TABLE |
This part is created for each individual UNION ALL optimized table |
INSERT |
Inserting data in a table (INSERT, MERGE, or IMPORT), in temporary data blocks, or in a result set |
UPDATE |
Update of data (UPDATE or MERGE) |
DELETE |
Deleting data (DELETE, TRUNCATE or MERGE) |
IMPORT |
Execution of the IMPORT command |
EXPORT |
Execution of the EXPORT command |
COMMIT |
Commit of the transaction, persistent write to hard disk |
ROLLBACK |
Rollback of the transaction |
WAIT FOR COMMIT |
Waiting until another transaction finishes |
INDEX CREATE |
Creation of internal indexes. For more information, see Indexes. |
INDEX INSERT |
Update on internal indexes. For more information, see Indexes. |
INDEX REBUILD |
Recreation of internal indexes. For more information, see Indexes. |
CONSTRAINT CHECK |
Checking constraints (primary/foreign key, NOT NULL) |
DISTRIBUTE / PARTITION |
Distribution and/or partition of data |
RECOMPRESS |
Recompressing data |
REPLICATE |
Cluster-wide replication of data (e.g. replicating small tables to avoid global joins) |
SYSTEM TABLE |
Collecting the data for a system table |
COLUMN STATISTICS |
Computation of statistics on one or more columns, required for internal database function |
How to Use Profiling
When the profiling for a query is enabled, you can see the exact execution steps of the query and important information about the details of this step, including:
- Which objects were involved
- Results of filtering activities
- If data was read or written to disk
- Network and CPU usage
You can use this information to better understand exactly what a query is doing and potentially improve any long-running parts of the query.
The below queries will enable the profiling for the session. You can also set profiling system-wide.
Since profiling is enabled for the session, you can now run queries. For the following example we have used the TPCH dataset in our Public Demo system. This below query will gather information on the customers who have returned the highest value items since 1997.
SELECT C.C_CUSTKEY, C.C_NAME,C_PHONE, COUNT(L.L_LINENUMBER) TOTAL_RETURNED, SUM(L.L_EXTENDEDPRICE) DOLLARS_RETURNED FROM LINEITEM L
LEFT JOIN ORDERS O ON L.L_ORDERKEY = O.O_ORDERKEY
LEFT JOIN CUSTOMER C ON C.C_CUSTKEY = O.O_CUSTKEY
WHERE YEAR(O_ORDERDATE) > '1997'
GROUP BY 1,2,3
ORDER BY LOCAL.DOLLARS_RETURNED DESC;
Now perform a flush statistics and view the profile from this query.
FLUSH STATISTICS;
SELECT * FROM EXA_USER_PROFILE_LAST_DAY WHERE SESSION_ID = CURRENT_SESSION ORDER BY STMT_ID, PART_ID;
Click on the following image to display the profile:
Based on this information from the image, we can see that the optimizer improved the query. It used ORDERS as the base table and applied a filter on the ORDERS table, which reduced the number of rows processed from 1.5 billion rows to about 133 million. This operation was performed completely in-memory and did not read any data from the disk.
Afterward, it joined the CUSTOMER and LINEITEM to that table. The REMARKS column shows us the columns that we were joining on. The JOIN on LINEITEM was a GLOBAL join, which means that proper distribution keys could improve the query duration by converting the global join to a local join. For more information on this, see Profiling in the Performance Guide.
Next, we performed the GROUP BY operation to get the COUNT and SUM that we specified in the SELECT clause. This operation required a high amount of Network traffic because data was transferred between the nodes and used over 30% of available CPU resources. The results of this operation were stored into a temporary table and generated about 21 GiB of temporary data. Finally, our temporary data was sorted, and the results were sent to the client.
For more information, you can take Exasol Performance Management training course, and view some best practices regarding profiling.
Example
The following examples shows how the profiling data looks like for a simple query.
-- switch on profiling
ALTER SESSION SET PROFILE='ON';
-- run query
SELECT YEAR(o_orderdate) AS "YEAR", COUNT(*)
FROM orders
GROUP BY YEAR(o_orderdate)
ORDER BY 1 LIMIT 5;
Results
YEAR | COUNT(*) |
---|---|
1992 | 227089 |
1993 | 226645 |
1994 | 227597 |
1995 | 228637 |
1996 | 228626 |
-- read profiling information
SELECT part_id, part_name, object_name, object_rows, out_rows, duration
FROM exa_user_profile_last_day
WHERE CURRENT_STATEMENT-5 = stmt_id AND CURRENT_SESSION=session_id;
Results
PART_NAME | OBJECT_NAME | OBJECT_ROWS | OUT_ROWS | DURATION | |
---|---|---|---|---|---|
1 | COMPILE / EXECUTE | 0.020672 | |||
2 | SCAN | ORDERS | 1500000 | 1500000 | 0.247681 |
3 | GROUP BY | tmp_subselect0 | 0 | 7 | 0.020913 |
4 | SORT | tmp_subselect0 | 7 | 5 | 0.006341 |