You can tune the performance of your queries by choosing a specific size for the maximum hash. A hash key represents the actual values of the indexed column. An index hash key aims to avoid the expensive operation of finding, loading, and then unpacking the rows to determine the indexed value, by including enough of the actual row data with a row ID, which is always part of an index entry when no hash is used.
A row ID allows UltraLite to locate the actual row data in the database file. If you set the hash size to 0 (that is, disable index hashing), then the index entry only contains this row ID. For all other hash sizes, the hash key—which can contain all or part of the transformed data in that row—is stored along with the row ID in the index page.
However, how much row data the hash key includes is partly determined by the maximum hash size property you configure, and partly by how much is actually needed for the data type of the column. See Choosing an optimal hash size.
The value of an index hash maintains the order of the actual row data of indexed columns. For example, if you have indexed a LastName column for a table called Employees, you may see four names ordered as follows:
This means if you hashed the first six letters your hash keys for these row values would appear as follows:
While these entries look the same, note that the first Anders in the list is used to represent the actual row value of Anders, but last Anders in the list is used to represent the actual row value Anderson.
Now, consider the following statement:
SELECT * FROM Employees WHERE LastName = 'Andersen'
If the Employees table only contained a very high proportion of names similar to Andersen, then as the hash currently stands, it may not offer enough uniqueness in the hash to gain any performance benefits. This is because UltraLite cannot determine if any of the hash keys actually meets the conditions of this statement. When duplicate index hash keys like these exist, UltraLite still needs to find the table row that matches the row ID in question, then load and then unpack the data so the value can be evaluated.
You only gain a performance benefit when UltraLite can discern enough uniqueness among each hash key, thereby making query condition evaluation immediate to the index itself. No further processing would then be required. For example, if the Employees table had thousands of names, then there is still enough benefit to be gained by a hash of six letters. This is because the index for all row entries would contain a proportionate number of unique hash keys. However, if the Employees table again only contained an inordinate number of names similar to Andersen, then you may want to hash at least seven letters so there are proportionately more unique hash keys used in the index. This would then create the following index hash keys:
That would mean that if you used the previous statement, only two of the four row values would need to be unpacked and evaluated, rather than all four.