Indexes and Hierarchies

Indexed representations are used for columns that have a reasonably small number of unique values. Multi-dimensional indexes, called hierarchies, are useful in many analysis scenarios.

Indexes

Consider a filter device for a categorical column: the filter shows the range of values for the column, and when the filter is adjusted, the set of selected rows is modified accordingly. To support filtering, the system needs efficient ways both to find the unique values of a column and, for a given set of unique values, to find the corresponding set of rows.

If the number of unique values is smaller than the number of rows, it is more efficient to store the values in an indexed representation. The indexing procedure creates simple mappings from row index to value index, and vice versa.

Creating an indexed representation of a column that is then discarded
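
The two mappings described above can be illustrated with a small, language-neutral sketch in plain Python. This is not the product's API: the function names `build_index` and `select_rows` and the sample column are hypothetical, chosen only to show how a row-to-value mapping and a value-to-rows mapping can support a categorical filter.

```python
def build_index(column):
    """Build an indexed representation of a column (illustrative only).

    Returns the sorted unique values, a row -> value-index mapping,
    and a value-index -> rows mapping.
    """
    unique_values = sorted(set(column))
    position = {value: i for i, value in enumerate(unique_values)}

    # Row index -> value index.
    row_to_value = [position[value] for value in column]

    # Value index -> list of row indexes holding that value.
    value_to_rows = [[] for _ in unique_values]
    for row, value_index in enumerate(row_to_value):
        value_to_rows[value_index].append(row)

    return unique_values, row_to_value, value_to_rows


def select_rows(unique_values, value_to_rows, wanted):
    """Return the rows whose value is in `wanted`, as a filter would."""
    selected = []
    for value_index, value in enumerate(unique_values):
        if value in wanted:
            selected.extend(value_to_rows[value_index])
    return sorted(selected)


# Hypothetical sample data: six rows, three unique values.
column = ["red", "blue", "red", "green", "blue", "red"]
unique_values, row_to_value, value_to_rows = build_index(column)
selected = select_rows(unique_values, value_to_rows, {"red", "green"})
```

With this layout, listing the unique values for the filter and resolving a selection to rows both avoid rescanning the full column.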

Hierarchies

An indexed column is an example of a one-dimensional index. Multi-dimensional indexes, known as data hierarchies, are also supported.

Consider a two-level hierarchy based on a date column as an example. The first level uses the year part of the date values to partition the row set into disjoint subsets, one for each year present. The second level is based on the month part, and further partitions each of those subsets. Using such a hierarchy, it is easy to find all rows corresponding to a specific year, or all rows for a certain month of a specific year:

Creating a date hierarchy from a date column
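
The year/month partitioning described above can also be sketched in plain Python. Again, this is only an illustration under stated assumptions, not the product's API: `build_date_hierarchy`, `rows_for_year`, and `rows_for_month` are hypothetical names, and the hierarchy is modeled as a nested dictionary of row-index lists.

```python
from collections import defaultdict
from datetime import date


def build_date_hierarchy(dates):
    """Partition row indexes by year, then by month (illustrative only)."""
    hierarchy = defaultdict(lambda: defaultdict(list))
    for row, value in enumerate(dates):
        hierarchy[value.year][value.month].append(row)
    return hierarchy


def rows_for_year(hierarchy, year):
    """All rows whose date falls in the given year."""
    months = hierarchy.get(year, {})
    return sorted(row for rows in months.values() for row in rows)


def rows_for_month(hierarchy, year, month):
    """All rows whose date falls in the given month of the given year."""
    return list(hierarchy.get(year, {}).get(month, []))


# Hypothetical sample date column with five rows.
dates = [
    date(2021, 3, 14),
    date(2022, 1, 5),
    date(2021, 3, 30),
    date(2021, 11, 2),
    date(2022, 1, 20),
]
hierarchy = build_date_hierarchy(dates)
rows_2021 = rows_for_year(hierarchy, 2021)
rows_march_2021 = rows_for_month(hierarchy, 2021, 3)
```

Because every row lands in exactly one year subset and one month subset within it, the subsets at each level are disjoint, which is the property the two-level hierarchy relies on.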