The Hash Distinct algorithm reads its input, and builds an in-memory hash table. If an input row is found in the hash table, it is ignored; otherwise, it is written to a work table. If the input does not completely fit into the in-memory hash table, it is partitioned into smaller work tables, and processed recursively.
The Hash Distinct algorithm:
works very well if the distinct rows fit into an in-memory table, irrespective of the total number of rows in the input.
uses a work table, and as such can provide insensitive or value sensitive semantics.
returns a row as soon as it finds one that has not previously been returned. However, the results of a hash distinct must be fully materialized before returning from the query. If necessary, the optimizer adds a work table to the execution plan to ensure this.
locks the rows of its input.
The optimizer avoids generating access plans using the hash distinct algorithm if it detects that a low memory situation may occur during query execution. If the Hash Distinct algorithm executes in an environment where there is very little cache memory available, then it is not able to complete. In this case, the Hash Distinct method discards its interim results, and the Indexed Distinct algorithm is used instead.
Memory governor limits are dependent upon the server's multiprogramming level and the number of active connections. See Threading in SQL Anywhere, and Setting the database server's multiprogramming level.
The Indexed Distinct algorithm maintains a work table of the unique rows from the input. As rows are read from the input, an index on the work table is searched to find a previously seen duplicate of the input row. If one is found, the input row is ignored. Otherwise, the input row is inserted into the work table. The work table index is created on all the columns of the SELECT-list; in order to improve index performance, a hash expression is included as the first expression. This hash expression is a computed value embodying the values of all the columns in the SELECT-list.
The Indexed Distinct method returns distinct rows as they are encountered. This allows it to return the first few rows quickly compared to other duplicate elimination methods. The Indexed Distinct algorithm only stores two rows in memory at a time, and can work well in extremely low memory situations. However, if the number of distinct rows is large, the execution cost of the Indexed Distinct algorithm is typically worse than Hash Distinct. The work table used to store distinct rows may not fit in cache, leading to rereading work table pages many times in a random access pattern.
Since the Indexed Distinct method uses a work table, it cannot provide fully-sensitive semantics; however, it also does not provide fully-insensitive semantics, and another work table is required for insensitive cursors. The Indexed Distinct method locks the rows of its input.
When the Indexed Distinct algorithm is needed due to low memory, a performance counter is incremented. You can see this using the QueryLowMemoryStrategy database or connection property, in the QueryLowMemoryStrategy statistic in the graphical plan (when run with statistics), or in the Query: Low Memory Strategies counter in the Windows Performance Monitor. See Performance Monitor statistics.