In some cases, values in the grouping columns of the input table are clustered, so that similar values appear close together. For example, if a table contains a column that is always set to the current date, all rows with a single date are relatively close within the table. The Clustered Hash Group By algorithm exploits this clustering.
The optimizer may use Clustered Hash Group By when grouping tables that are significantly larger than the available memory. In particular, it is effective when the HAVING predicate returns only a small proportion of rows.
The Clustered Hash Group By algorithm can lead to significant wasted work on the part of the optimizer if it is chosen in an environment where data is being updated at the same time that queries are being executed. Clustered Hash Group By is therefore most appropriate for OLAP workloads characterized by occasional batch-style updates and read-based queries. Set the optimization_workload option to OLAP to instruct the optimizer that it should include the Clustered Hash Group By algorithm in the possibilities it investigates. See optimization_workload option [database].
When creating an index or foreign key that can be used in an OLAP workload, specify the FOR OLAP WORKLOAD clause. This clause causes the database server to maintain a statistic used by Clustered Group By Hash regarding the maximum page distance between two rows within the same key. See CREATE INDEX statement, CREATE TABLE statement, and ALTER TABLE statement.
For more information about OLAP, see OLAP Support.