Indexes
This article explains how indexes in Exasol work.
Exasol is a self-tuning database, and this is especially true when it comes to indexes. Unlike other DBMS’s, indexes are automatically generated, reused, and discarded by the system as necessary. The database will determine when an index is needed and automatically create it during query execution if it does not already exist. Indexes are used for joins and under certain circumstances for filtering the data of a table (index scan).
Index creation
During query execution, the database will determine when an index is needed. If the needed index does not exist, it will be automatically created. For this reason, you may notice that a query takes longer to run on the first iteration because it is creating the necessary indexes. Specifically, an index may only be created for JOIN
operations. You can view this process by looking at the Profiling for the given query with the part name INDEX CREATE
.
Indexes are only persisted if the transaction is committed. Since the indexes are created within the query execution, we recommend sending a COMMIT statement after every SELECT statement in order to persist the index. Tools or processes which send ROLLBACK commands after SELECT statements may interfere with this process and require manual intervention to persist the index. In some cases, repeated queries may cause indexes to be re-created over and over again because the index is never persisted, which causes worse performance.
Depending on the size of the objects, the query which creates an index may take significantly longer. Subsequent runs of the query can then use these indexes, and the query will run much faster than in the first execution.
Index maintenance
During DML operations, the indexes are also updated to account for the new data. Depending on the amount of changed data, the index will either be altered or will be completely re-created. With this mechanism, you do not need to worry about re-creating indexes as they are updated or re-created automatically. Based on the DML operation, an index is maintained in the following ways:
-
INSERT: New index entries are added.
-
DELETE: Rows are marked as deleted until 25% of rows have been deleted. At that point, the table is reorganized, the deleted rows are dropped, and all indexes are automatically rebuilt.
-
UPDATE: Statements that affect less than 15% of rows will update index key entries. If more than 15% of rows are updated, the index is automatically rebuilt.
Indexes that are not used (read) after 35 days are automatically dropped. Write operations do not reset the 35 day counter.
If an index does not exist and the following statement updates the database, an index will be created but will not be persisted since it is based on outdated table data. The next DML operation on that table will automatically recreate this index. In the meantime, the index will however not be visible in metadata system tables such as EXA_DBA_INDICES.
Types of indexes
Local index
A local index stores information on a per-node basis. For example, given a local index on (A.x)
, the index part on node 1 will only contain references to rows of A
that are stored on node 1. Local indexes are perfect for table scans and local joins, as all information is available without requiring network traffic.
Global index
A global index stores information on a per-table basis, but behaves like a table with a distribution key. For example, all references to a certain key are stored on a well-defined node in the cluster, even if the rows referenced reside on different or multiple nodes. If a distribution key is set on the table and the index contains all columns of that distribution key, the index will be distributed in line with the table, effectively making it a local index.
Expression indexes
Expression indexes are temporary indexes that are created and used during the computation of a JOIN
. These indexes are created in any of the following conditions:
-
The JOIN condition needs to compute an expression.
-
The data types of the join conditions do not match, and an implicit cast needs to be performed.
The following example shows in which cases these expression indexes are created:
-- Preparation
create or replace table t1 (id int, name varchar(20000));
create or replace table t2 (id int, name varchar(20000));
insert into t1 values (1, 'Jeff'),(2,'John'),(3,'Jacob');
insert into t2 values (1, 'Jeff'),(2,'John'),(4,'Max');
The following query will generate an expression index because there is an additional expression in the join condition (t1.id + 1
):
By modifying the data types to no longer match, the queries will also create an expression index. This happens because the database performs an implicit CAST in order to compare the data types. For example:
Expression indexes are never persisted after the query finishes, so it will take time for subsequent queries to build the same expression index again. Therefore, Exasol recommends to avoid expressions in join conditions where possible and ensure that the data types of the columns being joined on match.
Manual index operations
Although the database handles all index operations automatically, there may be situations where you want to create or drop them manually. In rare cases, additional indexes on filter columns may improve query performance.
Only perform these operations based on the advice and guidance of Exasol Support.
Create index
Drop index
System tables
The following system tables provide additional information on the available indexes in the database:
The overall compressed size of all indexes are found in the AUXILIARY_SIZE
column in EXA_DB_SIZE_LAST_DAY and similar tables.