The most important component of the column statistics used by the optimizer are **histograms**. Histograms store information about the distribution of values in a column. In SQL Anywhere, a histogram represents the data
distribution for a column by dividing the domain of the column into a set of consecutive value ranges (also called **buckets**) and by remembering, for each value range (or bucket), the number of rows in the table for which the column value falls in
the bucket.

SQL Anywhere pays particular attention to single column values that are present in a large number of rows in the table. Significant
single value selectivities are maintained in singleton histogram buckets (for example, buckets that encompass a single value
in the column domain). SQL Anywhere tries to maintain a minimum number of singleton buckets in each histogram, usually between
10 and 100 depending upon the size of the table. Additionally, all single values with selectivities greater than 1% are kept
as singleton buckets. As a result, a histogram for a given column remembers the top *N* single value selectivities for the column where the value of *N* is dependent upon the size of the table and the number of single value selectivities that are greater than 1%.

Once the minimum number of value ranges has been met, low-selectivity frequencies are replaced by large-selectivity frequencies as they come along. The histogram will only have more than the minimum number of singleton value ranges after it has seen enough values with a selectivity of greater than 1%.

For more information about column statistics, see SYSCOLSTAT system view.

Discuss this page in DocCommentXchange. Send feedback about this page using email. |
Copyright © 2009, iAnywhere Solutions, Inc. - SQL Anywhere 11.0.1 |