1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>F.6. bloom — bloom filter index access method</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="basic-archive.html" title="F.5. basic_archive — an example WAL archive module" /><link rel="next" href="btree-gin.html" title="F.7. btree_gin — GIN operator classes with B-tree behavior" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">F.6. bloom — bloom filter index access method</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="basic-archive.html" title="F.5. basic_archive — an example WAL archive module">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules and Extensions">Up</a></td><th width="60%" align="center">Appendix F. Additional Supplied Modules and Extensions</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="btree-gin.html" title="F.7. btree_gin — GIN operator classes with B-tree behavior">Next</a></td></tr></table><hr /></div><div class="sect1" id="BLOOM"><div class="titlepage"><div><div><h2 class="title" style="clear: both">F.6. bloom — bloom filter index access method <a href="#BLOOM" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="bloom.html#BLOOM-PARAMETERS">F.6.1. Parameters</a></span></dt><dt><span class="sect2"><a href="bloom.html#BLOOM-EXAMPLES">F.6.2. Examples</a></span></dt><dt><span class="sect2"><a href="bloom.html#BLOOM-OPERATOR-CLASS-INTERFACE">F.6.3. Operator Class Interface</a></span></dt><dt><span class="sect2"><a href="bloom.html#BLOOM-LIMITATIONS">F.6.4. Limitations</a></span></dt><dt><span class="sect2"><a href="bloom.html#BLOOM-AUTHORS">F.6.5. Authors</a></span></dt></dl></div><a id="id-1.11.7.16.2" class="indexterm"></a><p>
3 <code class="literal">bloom</code> provides an index access method based on
4 <a class="ulink" href="https://en.wikipedia.org/wiki/Bloom_filter" target="_top">Bloom filters</a>.
6 A Bloom filter is a space-efficient data structure that is used to test
7 whether an element is a member of a set. In the case of an index access
8 method, it allows fast exclusion of non-matching tuples via signatures
9 whose size is determined at index creation.
11 A signature is a lossy representation of the indexed attribute(s), and as
12 such is prone to reporting false positives; that is, it may be reported
13 that an element is in the set, when it is not. So index search results
14 must always be rechecked using the actual attribute values from the heap
15 entry. Larger signatures reduce the odds of a false positive and thus
16 reduce the number of useless heap visits, but of course also make the index
17 larger and hence slower to scan.
19 This type of index is most useful when a table has many attributes and
20 queries test arbitrary combinations of them. A traditional btree index is
21 faster than a bloom index, but it can require many btree indexes to support
22 all possible queries where one needs only a single bloom index. Note
23 however that bloom indexes only support equality queries, whereas btree
24 indexes can also perform inequality and range searches.
25 </p><div class="sect2" id="BLOOM-PARAMETERS"><div class="titlepage"><div><div><h3 class="title">F.6.1. Parameters <a href="#BLOOM-PARAMETERS" class="id_link">#</a></h3></div></div></div><p>
26 A <code class="literal">bloom</code> index accepts the following parameters in its
27 <code class="literal">WITH</code> clause:
28 </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">length</code></span></dt><dd><p>
29 Length of each signature (index entry) in bits. It is rounded up to the
30 nearest multiple of <code class="literal">16</code>. The default is
31 <code class="literal">80</code> bits and the maximum is <code class="literal">4096</code>.
32 </p></dd></dl></div><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">col1 — col32</code></span></dt><dd><p>
33 Number of bits generated for each index column. Each parameter's name
34 refers to the number of the index column that it controls. The default
35 is <code class="literal">2</code> bits and the maximum is <code class="literal">4095</code>.
36 Parameters for index columns not actually used are ignored.
37 </p></dd></dl></div></div><div class="sect2" id="BLOOM-EXAMPLES"><div class="titlepage"><div><div><h3 class="title">F.6.2. Examples <a href="#BLOOM-EXAMPLES" class="id_link">#</a></h3></div></div></div><p>
38 This is an example of creating a bloom index:
39 </p><pre class="programlisting">
40 CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
41 WITH (length=80, col1=2, col2=2, col3=4);
43 The index is created with a signature length of 80 bits, with attributes
44 i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits. We could
45 have omitted the <code class="literal">length</code>, <code class="literal">col1</code>,
46 and <code class="literal">col2</code> specifications since those have the default values.
48 Here is a more complete example of bloom index definition and usage, as
49 well as a comparison with equivalent btree indexes. The bloom index is
50 considerably smaller than the btree index, and can perform better.
51 </p><pre class="programlisting">
52 =# CREATE TABLE tbloom AS
54 (random() * 1000000)::int as i1,
55 (random() * 1000000)::int as i2,
56 (random() * 1000000)::int as i3,
57 (random() * 1000000)::int as i4,
58 (random() * 1000000)::int as i5,
59 (random() * 1000000)::int as i6
61 generate_series(1,10000000);
64 A sequential scan over this large table takes a long time:
65 </p><pre class="programlisting">
66 =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
68 ------------------------------------------------------------------------------------------------------
69 Seq Scan on tbloom (cost=0.00..213744.00 rows=250 width=24) (actual time=357.059..357.059 rows=0.00 loops=1)
70 Filter: ((i2 = 898732) AND (i5 = 123451))
71 Rows Removed by Filter: 10000000
72 Buffers: shared hit=63744
73 Planning Time: 0.346 ms
74 Execution Time: 357.076 ms
78 Even with the btree index defined the result will still be a
80 </p><pre class="programlisting">
81 =# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
83 =# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
88 =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
90 ------------------------------------------------------------------------------------------------------
91 Seq Scan on tbloom (cost=0.00..213744.00 rows=2 width=24) (actual time=351.016..351.017 rows=0.00 loops=1)
92 Filter: ((i2 = 898732) AND (i5 = 123451))
93 Rows Removed by Filter: 10000000
94 Buffers: shared hit=63744
95 Planning Time: 0.138 ms
96 Execution Time: 351.035 ms
100 Having the bloom index defined on the table is better than btree in
101 handling this type of search:
102 </p><pre class="programlisting">
103 =# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
105 =# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
110 =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
112 ---------------------------------------------------------------------------------------------------------------------
113 Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=22.605..22.606 rows=0.00 loops=1)
114 Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
115 Rows Removed by Index Recheck: 2300
116 Heap Blocks: exact=2256
117 Buffers: shared hit=21864
118 -> Bitmap Index Scan on bloomidx (cost=0.00..178436.00 rows=1 width=0) (actual time=20.005..20.005 rows=2300.00 loops=1)
119 Index Cond: ((i2 = 898732) AND (i5 = 123451))
121 Buffers: shared hit=19608
122 Planning Time: 0.099 ms
123 Execution Time: 22.632 ms
127 Now, the main problem with the btree search is that btree is inefficient
128 when the search conditions do not constrain the leading index column(s).
129 A better strategy for btree is to create a separate index on each column.
130 Then the planner will choose something like this:
131 </p><pre class="programlisting">
132 =# CREATE INDEX btreeidx1 ON tbloom (i1);
134 =# CREATE INDEX btreeidx2 ON tbloom (i2);
136 =# CREATE INDEX btreeidx3 ON tbloom (i3);
138 =# CREATE INDEX btreeidx4 ON tbloom (i4);
140 =# CREATE INDEX btreeidx5 ON tbloom (i5);
142 =# CREATE INDEX btreeidx6 ON tbloom (i6);
144 =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
146 ---------------------------------------------------------------------------------------------------------------------------
147 Bitmap Heap Scan on tbloom (cost=9.29..13.30 rows=1 width=24) (actual time=0.032..0.033 rows=0.00 loops=1)
148 Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
149 Buffers: shared read=6
150 -> BitmapAnd (cost=9.29..9.29 rows=1 width=0) (actual time=0.047..0.047 rows=0.00 loops=1)
151 Buffers: shared hit=6
152 -> Bitmap Index Scan on btreeidx5 (cost=0.00..4.52 rows=11 width=0) (actual time=0.026..0.026 rows=7.00 loops=1)
153 Index Cond: (i5 = 123451)
155 Buffers: shared hit=3
156 -> Bitmap Index Scan on btreeidx2 (cost=0.00..4.52 rows=11 width=0) (actual time=0.007..0.007 rows=8.00 loops=1)
157 Index Cond: (i2 = 898732)
159 Buffers: shared hit=3
160 Planning Time: 0.264 ms
161 Execution Time: 0.047 ms
164 Although this query runs much faster than with either of the single
165 indexes, we pay a penalty in index size. Each of the single-column
166 btree indexes occupies 88.5 MB, so the total space needed is 531 MB,
167 over three times the space used by the bloom index.
168 </p></div><div class="sect2" id="BLOOM-OPERATOR-CLASS-INTERFACE"><div class="titlepage"><div><div><h3 class="title">F.6.3. Operator Class Interface <a href="#BLOOM-OPERATOR-CLASS-INTERFACE" class="id_link">#</a></h3></div></div></div><p>
169 An operator class for bloom indexes requires only a hash function for the
170 indexed data type and an equality operator for searching. This example
171 shows the operator class definition for the <code class="type">text</code> data type:
172 </p><pre class="programlisting">
173 CREATE OPERATOR CLASS text_ops
174 DEFAULT FOR TYPE text USING bloom AS
175 OPERATOR 1 =(text, text),
176 FUNCTION 1 hashtext(text);
177 </pre></div><div class="sect2" id="BLOOM-LIMITATIONS"><div class="titlepage"><div><div><h3 class="title">F.6.4. Limitations <a href="#BLOOM-LIMITATIONS" class="id_link">#</a></h3></div></div></div><p>
178 </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
179 Only operator classes for <code class="type">int4</code> and <code class="type">text</code> are
180 included with the module.
181 </p></li><li class="listitem"><p>
182 Only the <code class="literal">=</code> operator is supported for search. But
183 it is possible to add support for arrays with union and intersection
184 operations in the future.
185 </p></li><li class="listitem"><p>
186 <code class="literal">bloom</code> access method doesn't support
187 <code class="literal">UNIQUE</code> indexes.
188 </p></li><li class="listitem"><p>
189 <code class="literal">bloom</code> access method doesn't support searching for
190 <code class="literal">NULL</code> values.
191 </p></li></ul></div><p>
192 </p></div><div class="sect2" id="BLOOM-AUTHORS"><div class="titlepage"><div><div><h3 class="title">F.6.5. Authors <a href="#BLOOM-AUTHORS" class="id_link">#</a></h3></div></div></div><p>
193 Teodor Sigaev <code class="email"><<a class="email" href="mailto:teodor@postgrespro.ru">teodor@postgrespro.ru</a>></code>,
194 Postgres Professional, Moscow, Russia
196 Alexander Korotkov <code class="email"><<a class="email" href="mailto:a.korotkov@postgrespro.ru">a.korotkov@postgrespro.ru</a>></code>,
197 Postgres Professional, Moscow, Russia
199 Oleg Bartunov <code class="email"><<a class="email" href="mailto:obartunov@postgrespro.ru">obartunov@postgrespro.ru</a>></code>,
200 Postgres Professional, Moscow, Russia
201 </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="basic-archive.html" title="F.5. basic_archive — an example WAL archive module">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules and Extensions">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="btree-gin.html" title="F.7. btree_gin — GIN operator classes with B-tree behavior">Next</a></td></tr><tr><td width="40%" align="left" valign="top">F.5. basic_archive — an example WAL archive module </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="40%" align="right" valign="top"> F.7. btree_gin — GIN operator classes with B-tree behavior</td></tr></table></div></body></html>