1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>12.4. Additional Features</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="textsearch-controls.html" title="12.3. Controlling Text Search" /><link rel="next" href="textsearch-parsers.html" title="12.5. Parsers" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">12.4. Additional Features</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="textsearch-controls.html" title="12.3. Controlling Text Search">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="textsearch.html" title="Chapter 12. Full Text Search">Up</a></td><th width="60%" align="center">Chapter 12. Full Text Search</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="textsearch-parsers.html" title="12.5. Parsers">Next</a></td></tr></table><hr /></div><div class="sect1" id="TEXTSEARCH-FEATURES"><div class="titlepage"><div><div><h2 class="title" style="clear: both">12.4. Additional Features <a href="#TEXTSEARCH-FEATURES" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="textsearch-features.html#TEXTSEARCH-MANIPULATE-TSVECTOR">12.4.1. Manipulating Documents</a></span></dt><dt><span class="sect2"><a href="textsearch-features.html#TEXTSEARCH-MANIPULATE-TSQUERY">12.4.2. Manipulating Queries</a></span></dt><dt><span class="sect2"><a href="textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS">12.4.3. Triggers for Automatic Updates</a></span></dt><dt><span class="sect2"><a href="textsearch-features.html#TEXTSEARCH-STATISTICS">12.4.4. Gathering Document Statistics</a></span></dt></dl></div><p>
3 This section describes additional functions and operators that are
4 useful in connection with text search.
5 </p><div class="sect2" id="TEXTSEARCH-MANIPULATE-TSVECTOR"><div class="titlepage"><div><div><h3 class="title">12.4.1. Manipulating Documents <a href="#TEXTSEARCH-MANIPULATE-TSVECTOR" class="id_link">#</a></h3></div></div></div><p>
6 <a class="xref" href="textsearch-controls.html#TEXTSEARCH-PARSING-DOCUMENTS" title="12.3.1. Parsing Documents">Section 12.3.1</a> showed how raw textual
7 documents can be converted into <code class="type">tsvector</code> values.
8 <span class="productname">PostgreSQL</span> also provides functions and
9 operators that can be used to manipulate documents that are already
10 in <code class="type">tsvector</code> form.
11 </p><div class="variablelist"><dl class="variablelist"><dt><span class="term">
12 <a id="id-1.5.11.7.3.3.1.1.1" class="indexterm"></a>
14 <code class="literal"><code class="type">tsvector</code> || <code class="type">tsvector</code></code>
16 The <code class="type">tsvector</code> concatenation operator
17 returns a vector which combines the lexemes and positional information
18 of the two vectors given as arguments. Positions and weight labels
19 are retained during the concatenation.
20 Positions appearing in the right-hand vector are offset by the largest
21 position mentioned in the left-hand vector, so that the result is
22 nearly equivalent to the result of performing <code class="function">to_tsvector</code>
23 on the concatenation of the two original document strings. (The
24 equivalence is not exact, because any stop-words removed from the
25 end of the left-hand argument will not affect the result, whereas
26 they would have affected the positions of the lexemes in the
27 right-hand argument if textual concatenation were used.)
29 One advantage of using concatenation in the vector form, rather than
30 concatenating text before applying <code class="function">to_tsvector</code>, is that
31 you can use different configurations to parse different sections
32 of the document. Also, because the <code class="function">setweight</code> function
33 marks all lexemes of the given vector the same way, it is necessary
34 to parse the text and do <code class="function">setweight</code> before concatenating
35 if you want to label different parts of the document with different
37 </p></dd><dt><span class="term">
38 <a id="id-1.5.11.7.3.3.2.1.1" class="indexterm"></a>
40 <code class="literal">setweight(<em class="replaceable"><code>vector</code></em> <code class="type">tsvector</code>, <em class="replaceable"><code>weight</code></em> <code class="type">"char"</code>) returns <code class="type">tsvector</code></code>
42 <code class="function">setweight</code> returns a copy of the input vector in which every
43 position has been labeled with the given <em class="replaceable"><code>weight</code></em>, either
44 <code class="literal">A</code>, <code class="literal">B</code>, <code class="literal">C</code>, or
45 <code class="literal">D</code>. (<code class="literal">D</code> is the default for new
46 vectors and as such is not displayed on output.) These labels are
47 retained when vectors are concatenated, allowing words from different
48 parts of a document to be weighted differently by ranking functions.
50 Note that weight labels apply to <span class="emphasis"><em>positions</em></span>, not
51 <span class="emphasis"><em>lexemes</em></span>. If the input vector has been stripped of
52 positions then <code class="function">setweight</code> does nothing.
53 </p></dd><dt><span class="term">
54 <a id="id-1.5.11.7.3.3.3.1.1" class="indexterm"></a>
56 <code class="literal">length(<em class="replaceable"><code>vector</code></em> <code class="type">tsvector</code>) returns <code class="type">integer</code></code>
58 Returns the number of lexemes stored in the vector.
59 </p></dd><dt><span class="term">
60 <a id="id-1.5.11.7.3.3.4.1.1" class="indexterm"></a>
62 <code class="literal">strip(<em class="replaceable"><code>vector</code></em> <code class="type">tsvector</code>) returns <code class="type">tsvector</code></code>
64 Returns a vector that lists the same lexemes as the given vector, but
65 lacks any position or weight information. The result is usually much
66 smaller than an unstripped vector, but it is also less useful.
67 Relevance ranking does not work as well on stripped vectors as
68 unstripped ones. Also,
69 the <code class="literal"><-></code> (FOLLOWED BY) <code class="type">tsquery</code> operator
70 will never match stripped input, since it cannot determine the
71 distance between lexeme occurrences.
72 </p></dd></dl></div><p>
73 A full list of <code class="type">tsvector</code>-related functions is available
74 in <a class="xref" href="functions-textsearch.html#TEXTSEARCH-FUNCTIONS-TABLE" title="Table 9.43. Text Search Functions">Table 9.43</a>.
75 </p></div><div class="sect2" id="TEXTSEARCH-MANIPULATE-TSQUERY"><div class="titlepage"><div><div><h3 class="title">12.4.2. Manipulating Queries <a href="#TEXTSEARCH-MANIPULATE-TSQUERY" class="id_link">#</a></h3></div></div></div><p>
76 <a class="xref" href="textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES" title="12.3.2. Parsing Queries">Section 12.3.2</a> showed how raw textual
77 queries can be converted into <code class="type">tsquery</code> values.
78 <span class="productname">PostgreSQL</span> also provides functions and
79 operators that can be used to manipulate queries that are already
80 in <code class="type">tsquery</code> form.
81 </p><div class="variablelist"><dl class="variablelist"><dt><span class="term">
82 <code class="literal"><code class="type">tsquery</code> && <code class="type">tsquery</code></code>
84 Returns the AND-combination of the two given queries.
85 </p></dd><dt><span class="term">
86 <code class="literal"><code class="type">tsquery</code> || <code class="type">tsquery</code></code>
88 Returns the OR-combination of the two given queries.
89 </p></dd><dt><span class="term">
90 <code class="literal">!! <code class="type">tsquery</code></code>
92 Returns the negation (NOT) of the given query.
93 </p></dd><dt><span class="term">
94 <code class="literal"><code class="type">tsquery</code> <-> <code class="type">tsquery</code></code>
96 Returns a query that searches for a match to the first given query
97 immediately followed by a match to the second given query, using
98 the <code class="literal"><-></code> (FOLLOWED BY)
99 <code class="type">tsquery</code> operator. For example:
101 </p><pre class="screen">
102 SELECT to_tsquery('fat') <-> to_tsquery('cat | rat');
104 ----------------------------
105 'fat' <-> ( 'cat' | 'rat' )
107 </p></dd><dt><span class="term">
108 <a id="id-1.5.11.7.4.3.5.1.1" class="indexterm"></a>
110 <code class="literal">tsquery_phrase(<em class="replaceable"><code>query1</code></em> <code class="type">tsquery</code>, <em class="replaceable"><code>query2</code></em> <code class="type">tsquery</code> [, <em class="replaceable"><code>distance</code></em> <code class="type">integer</code> ]) returns <code class="type">tsquery</code></code>
112 Returns a query that searches for a match to the first given query
113 followed by a match to the second given query at a distance of exactly
114 <em class="replaceable"><code>distance</code></em> lexemes, using
115 the <code class="literal"><<em class="replaceable"><code>N</code></em>></code>
116 <code class="type">tsquery</code> operator. For example:
118 </p><pre class="screen">
119 SELECT tsquery_phrase(to_tsquery('fat'), to_tsquery('cat'), 10);
122 'fat' <10> 'cat'
124 </p></dd><dt><span class="term">
125 <a id="id-1.5.11.7.4.3.6.1.1" class="indexterm"></a>
127 <code class="literal">numnode(<em class="replaceable"><code>query</code></em> <code class="type">tsquery</code>) returns <code class="type">integer</code></code>
129 Returns the number of nodes (lexemes plus operators) in a
130 <code class="type">tsquery</code>. This function is useful
131 to determine if the <em class="replaceable"><code>query</code></em> is meaningful
132 (returns > 0), or contains only stop words (returns 0).
135 </p><pre class="screen">
136 SELECT numnode(plainto_tsquery('the any'));
137 NOTICE: query contains only stopword(s) or doesn't contain lexeme(s), ignored
142 SELECT numnode('foo & bar'::tsquery);
147 </p></dd><dt><span class="term">
148 <a id="id-1.5.11.7.4.3.7.1.1" class="indexterm"></a>
150 <code class="literal">querytree(<em class="replaceable"><code>query</code></em> <code class="type">tsquery</code>) returns <code class="type">text</code></code>
152 Returns the portion of a <code class="type">tsquery</code> that can be used for
153 searching an index. This function is useful for detecting
154 unindexable queries, for example those containing only stop words
155 or only negated terms. For example:
157 </p><pre class="screen">
158 SELECT querytree(to_tsquery('defined'));
163 SELECT querytree(to_tsquery('!defined'));
168 </p></dd></dl></div><div class="sect3" id="TEXTSEARCH-QUERY-REWRITING"><div class="titlepage"><div><div><h4 class="title">12.4.2.1. Query Rewriting <a href="#TEXTSEARCH-QUERY-REWRITING" class="id_link">#</a></h4></div></div></div><a id="id-1.5.11.7.4.4.2" class="indexterm"></a><p>
169 The <code class="function">ts_rewrite</code> family of functions search a
170 given <code class="type">tsquery</code> for occurrences of a target
171 subquery, and replace each occurrence with a
172 substitute subquery. In essence this operation is a
173 <code class="type">tsquery</code>-specific version of substring replacement.
174 A target and substitute combination can be
175 thought of as a <em class="firstterm">query rewrite rule</em>. A collection
176 of such rewrite rules can be a powerful search aid.
177 For example, you can expand the search using synonyms
178 (e.g., <code class="literal">new york</code>, <code class="literal">big apple</code>, <code class="literal">nyc</code>,
179 <code class="literal">gotham</code>) or narrow the search to direct the user to some hot
180 topic. There is some overlap in functionality between this feature
181 and thesaurus dictionaries (<a class="xref" href="textsearch-dictionaries.html#TEXTSEARCH-THESAURUS" title="12.6.4. Thesaurus Dictionary">Section 12.6.4</a>).
182 However, you can modify a set of rewrite rules on-the-fly without
183 reindexing, whereas updating a thesaurus requires reindexing to be
185 </p><div class="variablelist"><dl class="variablelist"><dt><span class="term">
186 <code class="literal">ts_rewrite (<em class="replaceable"><code>query</code></em> <code class="type">tsquery</code>, <em class="replaceable"><code>target</code></em> <code class="type">tsquery</code>, <em class="replaceable"><code>substitute</code></em> <code class="type">tsquery</code>) returns <code class="type">tsquery</code></code>
188 This form of <code class="function">ts_rewrite</code> simply applies a single
189 rewrite rule: <em class="replaceable"><code>target</code></em>
190 is replaced by <em class="replaceable"><code>substitute</code></em>
191 wherever it appears in <em class="replaceable"><code>query</code></em>. For example:
193 </p><pre class="screen">
194 SELECT ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'c'::tsquery);
199 </p></dd><dt><span class="term">
200 <code class="literal">ts_rewrite (<em class="replaceable"><code>query</code></em> <code class="type">tsquery</code>, <em class="replaceable"><code>select</code></em> <code class="type">text</code>) returns <code class="type">tsquery</code></code>
202 This form of <code class="function">ts_rewrite</code> accepts a starting
203 <em class="replaceable"><code>query</code></em> and an SQL <em class="replaceable"><code>select</code></em> command, which
204 is given as a text string. The <em class="replaceable"><code>select</code></em> must yield two
205 columns of <code class="type">tsquery</code> type. For each row of the
206 <em class="replaceable"><code>select</code></em> result, occurrences of the first column value
207 (the target) are replaced by the second column value (the substitute)
208 within the current <em class="replaceable"><code>query</code></em> value. For example:
210 </p><pre class="screen">
211 CREATE TABLE aliases (t tsquery PRIMARY KEY, s tsquery);
212 INSERT INTO aliases VALUES('a', 'c');
214 SELECT ts_rewrite('a & b'::tsquery, 'SELECT t,s FROM aliases');
220 Note that when multiple rewrite rules are applied in this way,
221 the order of application can be important; so in practice you will
222 want the source query to <code class="literal">ORDER BY</code> some ordering key.
223 </p></dd></dl></div><p>
224 Let's consider a real-life astronomical example. We'll expand query
225 <code class="literal">supernovae</code> using table-driven rewriting rules:
227 </p><pre class="screen">
228 CREATE TABLE aliases (t tsquery primary key, s tsquery);
229 INSERT INTO aliases VALUES(to_tsquery('supernovae'), to_tsquery('supernovae|sn'));
231 SELECT ts_rewrite(to_tsquery('supernovae & crab'), 'SELECT * FROM aliases');
233 ---------------------------------
234 'crab' & ( 'supernova' | 'sn' )
237 We can change the rewriting rules just by updating the table:
239 </p><pre class="screen">
241 SET s = to_tsquery('supernovae|sn & !nebulae')
242 WHERE t = to_tsquery('supernovae');
244 SELECT ts_rewrite(to_tsquery('supernovae & crab'), 'SELECT * FROM aliases');
246 ---------------------------------------------
247 'crab' & ( 'supernova' | 'sn' & !'nebula' )
250 Rewriting can be slow when there are many rewriting rules, since it
251 checks every rule for a possible match. To filter out obvious non-candidate
252 rules we can use the containment operators for the <code class="type">tsquery</code>
253 type. In the example below, we select only those rules which might match
256 </p><pre class="screen">
257 SELECT ts_rewrite('a & b'::tsquery,
258 'SELECT t,s FROM aliases WHERE ''a & b''::tsquery @> t');
263 </p></div></div><div class="sect2" id="TEXTSEARCH-UPDATE-TRIGGERS"><div class="titlepage"><div><div><h3 class="title">12.4.3. Triggers for Automatic Updates <a href="#TEXTSEARCH-UPDATE-TRIGGERS" class="id_link">#</a></h3></div></div></div><a id="id-1.5.11.7.5.2" class="indexterm"></a><div class="note"><h3 class="title">Note</h3><p>
264 The method described in this section has been obsoleted by the use of
265 stored generated columns, as described in <a class="xref" href="textsearch-tables.html#TEXTSEARCH-TABLES-INDEX" title="12.2.2. Creating Indexes">Section 12.2.2</a>.
267 When using a separate column to store the <code class="type">tsvector</code> representation
268 of your documents, it is necessary to create a trigger to update the
269 <code class="type">tsvector</code> column when the document content columns change.
270 Two built-in trigger functions are available for this, or you can write
272 </p><pre class="synopsis">
273 tsvector_update_trigger(<em class="replaceable"><code>tsvector_column_name</code></em>, <em class="replaceable"><code>config_name</code></em>, <em class="replaceable"><code>text_column_name</code></em> [<span class="optional">, ... </span>])
274 tsvector_update_trigger_column(<em class="replaceable"><code>tsvector_column_name</code></em>, <em class="replaceable"><code>config_column_name</code></em>, <em class="replaceable"><code>text_column_name</code></em> [<span class="optional">, ... </span>])
276 These trigger functions automatically compute a <code class="type">tsvector</code>
277 column from one or more textual columns, under the control of
278 parameters specified in the <code class="command">CREATE TRIGGER</code> command.
279 An example of their use is:
281 </p><pre class="screen">
282 CREATE TABLE messages (
288 CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
289 ON messages FOR EACH ROW EXECUTE FUNCTION
290 tsvector_update_trigger(tsv, 'pg_catalog.english', title, body);
292 INSERT INTO messages VALUES('title here', 'the body text is here');
294 SELECT * FROM messages;
296 ------------+-----------------------+----------------------------
297 title here | the body text is here | 'bodi':4 'text':5 'titl':1
299 SELECT title, body FROM messages WHERE tsv @@ to_tsquery('title & body');
301 ------------+-----------------------
302 title here | the body text is here
305 Having created this trigger, any change in <code class="structfield">title</code> or
306 <code class="structfield">body</code> will automatically be reflected into
307 <code class="structfield">tsv</code>, without the application having to worry about it.
309 The first trigger argument must be the name of the <code class="type">tsvector</code>
310 column to be updated. The second argument specifies the text search
311 configuration to be used to perform the conversion. For
312 <code class="function">tsvector_update_trigger</code>, the configuration name is simply
313 given as the second trigger argument. It must be schema-qualified as
314 shown above, so that the trigger behavior will not change with changes
315 in <code class="varname">search_path</code>. For
316 <code class="function">tsvector_update_trigger_column</code>, the second trigger argument
317 is the name of another table column, which must be of type
318 <code class="type">regconfig</code>. This allows a per-row selection of configuration
319 to be made. The remaining argument(s) are the names of textual columns
320 (of type <code class="type">text</code>, <code class="type">varchar</code>, or <code class="type">char</code>). These
321 will be included in the document in the order given. NULL values will
322 be skipped (but the other columns will still be indexed).
324 A limitation of these built-in triggers is that they treat all the
325 input columns alike. To process columns differently — for
326 example, to weight title differently from body — it is necessary
327 to write a custom trigger. Here is an example using
328 <span class="application">PL/pgSQL</span> as the trigger language:
330 </p><pre class="programlisting">
331 CREATE FUNCTION messages_trigger() RETURNS trigger AS $$
334 setweight(to_tsvector('pg_catalog.english', coalesce(new.title,'')), 'A') ||
335 setweight(to_tsvector('pg_catalog.english', coalesce(new.body,'')), 'D');
340 CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
341 ON messages FOR EACH ROW EXECUTE FUNCTION messages_trigger();
344 Keep in mind that it is important to specify the configuration name
345 explicitly when creating <code class="type">tsvector</code> values inside triggers,
346 so that the column's contents will not be affected by changes to
347 <code class="varname">default_text_search_config</code>. Failure to do this is likely to
348 lead to problems such as search results changing after a dump and restore.
349 </p></div><div class="sect2" id="TEXTSEARCH-STATISTICS"><div class="titlepage"><div><div><h3 class="title">12.4.4. Gathering Document Statistics <a href="#TEXTSEARCH-STATISTICS" class="id_link">#</a></h3></div></div></div><a id="id-1.5.11.7.6.2" class="indexterm"></a><p>
350 The function <code class="function">ts_stat</code> is useful for checking your
351 configuration and for finding stop-word candidates.
352 </p><pre class="synopsis">
353 ts_stat(<em class="replaceable"><code>sqlquery</code></em> <code class="type">text</code>, [<span class="optional"> <em class="replaceable"><code>weights</code></em> <code class="type">text</code>, </span>]
354 OUT <em class="replaceable"><code>word</code></em> <code class="type">text</code>, OUT <em class="replaceable"><code>ndoc</code></em> <code class="type">integer</code>,
355 OUT <em class="replaceable"><code>nentry</code></em> <code class="type">integer</code>) returns <code class="type">setof record</code>
357 <em class="replaceable"><code>sqlquery</code></em> is a text value containing an SQL
358 query which must return a single <code class="type">tsvector</code> column.
359 <code class="function">ts_stat</code> executes the query and returns statistics about
360 each distinct lexeme (word) contained in the <code class="type">tsvector</code>
361 data. The columns returned are
363 </p><div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: bullet; "><li class="listitem" style="list-style-type: disc"><p>
364 <em class="replaceable"><code>word</code></em> <code class="type">text</code> — the value of a lexeme
365 </p></li><li class="listitem" style="list-style-type: disc"><p>
366 <em class="replaceable"><code>ndoc</code></em> <code class="type">integer</code> — number of documents
367 (<code class="type">tsvector</code>s) the word occurred in
368 </p></li><li class="listitem" style="list-style-type: disc"><p>
369 <em class="replaceable"><code>nentry</code></em> <code class="type">integer</code> — total number of
370 occurrences of the word
371 </p></li></ul></div><p>
373 If <em class="replaceable"><code>weights</code></em> is supplied, only occurrences
374 having one of those weights are counted.
376 For example, to find the ten most frequent words in a document collection:
378 </p><pre class="programlisting">
379 SELECT * FROM ts_stat('SELECT vector FROM apod')
380 ORDER BY nentry DESC, ndoc DESC, word
384 The same, but counting only word occurrences with weight <code class="literal">A</code>
385 or <code class="literal">B</code>:
387 </p><pre class="programlisting">
388 SELECT * FROM ts_stat('SELECT vector FROM apod', 'ab')
389 ORDER BY nentry DESC, ndoc DESC, word
392 </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="textsearch-controls.html" title="12.3. Controlling Text Search">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="textsearch.html" title="Chapter 12. Full Text Search">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="textsearch-parsers.html" title="12.5. Parsers">Next</a></td></tr><tr><td width="40%" align="left" valign="top">12.3. Controlling Text Search </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 12.5. Parsers</td></tr></table></div></body></html>