begriffs open source - ai-pg/blob - full-docs/txt/hash-index.txt

   1
   2 65.6. Hash Indexes #
   3
   4    65.6.1. Overview
   5    65.6.2. Implementation
   6
   7 65.6.1. Overview #
   8
   9    PostgreSQL includes an implementation of persistent on-disk hash
  10    indexes, which are fully crash recoverable. Any data type can be
  11    indexed by a hash index, including data types that do not have a
  12    well-defined linear ordering. Hash indexes store only the hash value of
  13    the data being indexed, thus there are no restrictions on the size of
  14    the data column being indexed.
  15
  16    Hash indexes support only single-column indexes and do not allow
  17    uniqueness checking.
  18
  19    Hash indexes support only the = operator, so WHERE clauses that specify
  20    range operations will not be able to take advantage of hash indexes.
  21
  22    Each hash index tuple stores just the 4-byte hash value, not the actual
  23    column value. As a result, hash indexes may be much smaller than
  24    B-trees when indexing longer data items such as UUIDs, URLs, etc. The
  25    absence of the column value also makes all hash index scans lossy. Hash
  26    indexes may take part in bitmap index scans and backward scans.
  27
  28    Hash indexes are best optimized for SELECT and UPDATE-heavy workloads
  29    that use equality scans on larger tables. In a B-tree index, searches
  30    must descend through the tree until the leaf page is found. In tables
  31    with millions of rows, this descent can increase access time to data.
  32    The equivalent of a leaf page in a hash index is referred to as a
  33    bucket page. In contrast, a hash index allows accessing the bucket
  34    pages directly, thereby potentially reducing index access time in
  35    larger tables. This reduction in "logical I/O" becomes even more
  36    pronounced on indexes/data larger than shared_buffers/RAM.
  37
  38    Hash indexes have been designed to cope with uneven distributions of
  39    hash values. Direct access to the bucket pages works well if the hash
  40    values are evenly distributed. When inserts mean that the bucket page
  41    becomes full, additional overflow pages are chained to that specific
  42    bucket page, locally expanding the storage for index tuples that match
  43    that hash value. When scanning a hash bucket during queries, we need to
  44    scan through all of the overflow pages. Thus an unbalanced hash index
  45    might actually be worse than a B-tree in terms of number of block
  46    accesses required, for some data.
  47
  48    As a result of the overflow cases, we can say that hash indexes are
  49    most suitable for unique, nearly unique data or data with a low number
  50    of rows per hash bucket. One possible way to avoid problems is to
  51    exclude highly non-unique values from the index using a partial index
  52    condition, but this may not be suitable in many cases.
  53
  54    Like B-Trees, hash indexes perform simple index tuple deletion. This is
  55    a deferred maintenance operation that deletes index tuples that are
  56    known to be safe to delete (those whose item identifier's LP_DEAD bit
  57    is already set). If an insert finds no space is available on a page we
  58    try to avoid creating a new overflow page by attempting to remove dead
  59    index tuples. Removal cannot occur if the page is pinned at that time.
  60    Deletion of dead index pointers also occurs during VACUUM.
  61
  62    If it can, VACUUM will also try to squeeze the index tuples onto as few
  63    overflow pages as possible, minimizing the overflow chain. If an
  64    overflow page becomes empty, overflow pages can be recycled for reuse
  65    in other buckets, though we never return them to the operating system.
  66    There is currently no provision to shrink a hash index, other than by
  67    rebuilding it with REINDEX. There is no provision for reducing the
  68    number of buckets, either.
  69
  70    Hash indexes may expand the number of bucket pages as the number of
  71    rows indexed grows. The hash key-to-bucket-number mapping is chosen so
  72    that the index can be incrementally expanded. When a new bucket is to
  73    be added to the index, exactly one existing bucket will need to be
  74    "split", with some of its tuples being transferred to the new bucket
  75    according to the updated key-to-bucket-number mapping.
  76
  77    The expansion occurs in the foreground, which could increase execution
  78    time for user inserts. Thus, hash indexes may not be suitable for
  79    tables with rapidly increasing number of rows.
  80
  81 65.6.2. Implementation #
  82
  83    There are four kinds of pages in a hash index: the meta page (page
  84    zero), which contains statically allocated control information; primary
  85    bucket pages; overflow pages; and bitmap pages, which keep track of
  86    overflow pages that have been freed and are available for re-use. For
  87    addressing purposes, bitmap pages are regarded as a subset of the
  88    overflow pages.
  89
  90    Both scanning the index and inserting tuples require locating the
  91    bucket where a given tuple ought to be located. To do this, we need the
  92    bucket count, highmask, and lowmask from the metapage; however, it's
  93    undesirable for performance reasons to have to have to lock and pin the
  94    metapage for every such operation. Instead, we retain a cached copy of
  95    the metapage in each backend's relcache entry. This will produce the
  96    correct bucket mapping as long as the target bucket hasn't been split
  97    since the last cache refresh.
  98
  99    Primary bucket pages and overflow pages are allocated independently
 100    since any given index might need more or fewer overflow pages relative
 101    to its number of buckets. The hash code uses an interesting set of
 102    addressing rules to support a variable number of overflow pages while
 103    not having to move primary bucket pages around after they are created.
 104
 105    Each row in the table indexed is represented by a single index tuple in
 106    the hash index. Hash index tuples are stored in bucket pages, and if
 107    they exist, overflow pages. We speed up searches by keeping the index
 108    entries in any one index page sorted by hash code, thus allowing binary
 109    search to be used within an index page. Note however that there is *no*
 110    assumption about the relative ordering of hash codes across different
 111    index pages of a bucket.
 112
 113    The bucket splitting algorithms to expand the hash index are too
 114    complex to be worthy of mention here, though are described in more
 115    detail in src/backend/access/hash/README. The split algorithm is crash
 116    safe and can be restarted if not completed successfully.