9.13. Text Search Functions and Operators
Table 9.42, Table 9.43 and Table 9.44 summarize the functions and
operators that are provided for full text searching. See Chapter 12 for
a detailed explanation of PostgreSQL's text search facility.
Table 9.42. Text Search Operators
tsvector @@ tsquery → boolean
tsquery @@ tsvector → boolean
Does tsvector match tsquery? (The arguments can be given in either
order.)
to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat') → t
text @@ tsquery → boolean
Does text string, after implicit invocation of to_tsvector(), match
tsquery?
'fat cats ate rats' @@ to_tsquery('cat & rat') → t
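The text form of @@ is convenient in ad-hoc queries. A minimal sketch of how either form is typically used in a WHERE clause (the table and column names here are illustrative, not from the manual):

```sql
-- Hypothetical table of documents to search.
CREATE TABLE docs (id serial PRIMARY KEY, body text);
INSERT INTO docs (body) VALUES ('fat cats ate rats'), ('dogs chase cars');

-- Explicit to_tsvector() form; only rows matching 'cat & rat' are returned.
SELECT id, body
FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'cat & rat');
```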
tsvector || tsvector → tsvector
Concatenates two tsvectors. If both inputs contain lexeme positions,
the second input's positions are adjusted accordingly.
'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector → 'a':1 'b':2,5 'c':3
'd':4
tsquery && tsquery → tsquery
ANDs two tsquerys together, producing a query that matches documents
that match both input queries.
'fat | rat'::tsquery && 'cat'::tsquery → ( 'fat' | 'rat' ) & 'cat'
tsquery || tsquery → tsquery
ORs two tsquerys together, producing a query that matches documents
that match either input query.
'fat | rat'::tsquery || 'cat'::tsquery → 'fat' | 'rat' | 'cat'
!! tsquery → tsquery
Negates a tsquery, producing a query that matches documents that do not
match the input query.
!! 'cat'::tsquery → !'cat'
tsquery <-> tsquery → tsquery
Constructs a phrase query, which matches if the two input queries match
at successive lexemes.
to_tsquery('fat') <-> to_tsquery('rat') → 'fat' <-> 'rat'
tsquery @> tsquery → boolean
Does first tsquery contain the second? (This considers only whether all
the lexemes appearing in one query appear in the other, ignoring the
combining operators.)
'cat'::tsquery @> 'cat & rat'::tsquery → f
tsquery <@ tsquery → boolean
Is first tsquery contained in the second? (This considers only whether
all the lexemes appearing in one query appear in the other, ignoring
the combining operators.)
'cat'::tsquery <@ 'cat & rat'::tsquery → t
'cat'::tsquery <@ '!cat & rat'::tsquery → t
In addition to these specialized operators, the usual comparison
operators shown in Table 9.1 are available for types tsvector and
tsquery. These are not very useful for text searching but allow, for
example, unique indexes to be built on columns of these types.
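A minimal sketch of such a unique index (the table and index names are illustrative):

```sql
-- Hypothetical table of saved searches; the ordinary btree comparison
-- operators for tsquery make a unique index possible.
CREATE TABLE saved_queries (q tsquery);
CREATE UNIQUE INDEX saved_queries_q_key ON saved_queries (q);
```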
Table 9.43. Text Search Functions
array_to_tsvector ( text[] ) → tsvector
Converts an array of text strings to a tsvector. The given strings are
used as lexemes as-is, without further processing. Array elements must
not be empty strings or NULL.
array_to_tsvector('{fat,cat,rat}'::text[]) → 'cat' 'fat' 'rat'
get_current_ts_config ( ) → regconfig
Returns the OID of the current default text search configuration (as
set by default_text_search_config).
get_current_ts_config() → english
length ( tsvector ) → integer
Returns the number of lexemes in the tsvector.
length('fat:2,4 cat:3 rat:5A'::tsvector) → 3
numnode ( tsquery ) → integer
Returns the number of lexemes plus operators in the tsquery.
numnode('(fat & rat) | cat'::tsquery) → 5
plainto_tsquery ( [ config regconfig, ] query text ) → tsquery
Converts text to a tsquery, normalizing words according to the
specified or default configuration. Any punctuation in the string is
ignored (it does not determine query operators). The resulting query
matches documents containing all non-stopwords in the text.
plainto_tsquery('english', 'The Fat Rats') → 'fat' & 'rat'
phraseto_tsquery ( [ config regconfig, ] query text ) → tsquery
Converts text to a tsquery, normalizing words according to the
specified or default configuration. Any punctuation in the string is
ignored (it does not determine query operators). The resulting query
matches phrases containing all non-stopwords in the text.
phraseto_tsquery('english', 'The Fat Rats') → 'fat' <-> 'rat'
phraseto_tsquery('english', 'The Cat and Rats') → 'cat' <2> 'rat'
websearch_to_tsquery ( [ config regconfig, ] query text ) → tsquery
Converts text to a tsquery, normalizing words according to the
specified or default configuration. Quoted word sequences are converted
to phrase tests. The word “or” is understood as producing an OR
operator, and a dash produces a NOT operator; other punctuation is
ignored. This approximates the behavior of some common web search
tools.
websearch_to_tsquery('english', '"fat rat" or cat dog') → 'fat' <->
'rat' | 'cat' & 'dog'
querytree ( tsquery ) → text
Produces a representation of the indexable portion of a tsquery. A
result that is empty or just T indicates a non-indexable query.
querytree('foo & ! bar'::tsquery) → 'foo'
setweight ( vector tsvector, weight "char" ) → tsvector
Assigns the specified weight to each element of the vector.
setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A') → 'cat':3A 'fat':2A,4A
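setweight is commonly combined with the || concatenation operator to build a single vector in which, say, title words outrank body words when ranking. A sketch of that pattern (the input strings here are illustrative):

```sql
-- Weight title lexemes 'A' and body lexemes 'D', then concatenate;
-- ranking functions can then favor title matches.
SELECT setweight(to_tsvector('english', 'The Fat Rats'), 'A') ||
       setweight(to_tsvector('english', 'a story about rats and cats'), 'D');
```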
setweight ( vector tsvector, weight "char", lexemes text[] ) → tsvector
Assigns the specified weight to elements of the vector that are listed
in lexemes. The strings in lexemes are taken as lexemes as-is, without
further processing. Strings that do not match any lexeme in vector are
ignored.
setweight('fat:2,4 cat:3 rat:5,6B'::tsvector, 'A', '{cat,rat}') →
'cat':3A 'fat':2,4 'rat':5A,6A
strip ( tsvector ) → tsvector
Removes positions and weights from the tsvector.
strip('fat:2,4 cat:3 rat:5A'::tsvector) → 'cat' 'fat' 'rat'
to_tsquery ( [ config regconfig, ] query text ) → tsquery
Converts text to a tsquery, normalizing words according to the
specified or default configuration. The words must be combined by valid
tsquery operators.
to_tsquery('english', 'The & Fat & Rats') → 'fat' & 'rat'
to_tsvector ( [ config regconfig, ] document text ) → tsvector
Converts text to a tsvector, normalizing words according to the
specified or default configuration. Position information is included in
the result.
to_tsvector('english', 'The Fat Rats') → 'fat':2 'rat':3
to_tsvector ( [ config regconfig, ] document json ) → tsvector
to_tsvector ( [ config regconfig, ] document jsonb ) → tsvector
Converts each string value in the JSON document to a tsvector,
normalizing words according to the specified or default configuration.
The results are then concatenated in document order to produce the
output. Position information is generated as though one stopword exists
between each pair of string values. (Beware that “document order” of
the fields of a JSON object is implementation-dependent when the input
is jsonb; observe the difference in the examples.)
to_tsvector('english', '{"aa": "The Fat Rats", "b": "dog"}'::json) →
'dog':5 'fat':2 'rat':3
to_tsvector('english', '{"aa": "The Fat Rats", "b": "dog"}'::jsonb) →
'dog':1 'fat':4 'rat':5
json_to_tsvector ( [ config regconfig, ] document json, filter jsonb )
→ tsvector
jsonb_to_tsvector ( [ config regconfig, ] document jsonb, filter jsonb
) → tsvector
Selects each item in the JSON document that is requested by the filter
and converts each one to a tsvector, normalizing words according to the
specified or default configuration. The results are then concatenated
in document order to produce the output. Position information is
generated as though one stopword exists between each pair of selected
items. (Beware that “document order” of the fields of a JSON object is
implementation-dependent when the input is jsonb.) The filter must be a
jsonb array containing zero or more of these keywords: "string" (to
include all string values), "numeric" (to include all numeric values),
"boolean" (to include all boolean values), "key" (to include all keys),
or "all" (to include all the above). As a special case, the filter can
also be a simple JSON value that is one of these keywords.
json_to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::json,
'["string", "numeric"]') → '123':5 'fat':2 'rat':3
json_to_tsvector('english', '{"cat": "The Fat Rats", "dog":
123}'::json, '"all"') → '123':9 'cat':1 'dog':7 'fat':4 'rat':5
ts_delete ( vector tsvector, lexeme text ) → tsvector
Removes any occurrence of the given lexeme from the vector. The lexeme
string is treated as a lexeme as-is, without further processing.
ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, 'fat') → 'cat':3 'rat':5A
ts_delete ( vector tsvector, lexemes text[] ) → tsvector
Removes any occurrences of the lexemes in lexemes from the vector. The
strings in lexemes are taken as lexemes as-is, without further
processing. Strings that do not match any lexeme in vector are ignored.
ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat']) →
'cat':3
ts_filter ( vector tsvector, weights "char"[] ) → tsvector
Selects only elements with the given weights from the vector.
ts_filter('fat:2,4 cat:3b,7c rat:5A'::tsvector, '{a,b}') → 'cat':3B
'rat':5A
ts_headline ( [ config regconfig, ] document text, query tsquery [,
options text ] ) → text
Displays, in an abbreviated form, the match(es) for the query in the
document, which must be raw text not a tsvector. Words in the document
are normalized according to the specified or default configuration
before matching to the query. Use of this function is discussed in
Section 12.3.4, which also describes the available options.
ts_headline('The fat cat ate the rat.', 'cat') → The fat <b>cat</b> ate
the rat.
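A sketch of passing an options string; StartSel, StopSel, MaxWords, and MinWords are among the options described in Section 12.3.4:

```sql
-- Override the default <b>...</b> highlight delimiters and
-- constrain the headline length.
SELECT ts_headline('english',
  'The fat cat ate the rat.',
  to_tsquery('english', 'cat & rat'),
  'StartSel=<<, StopSel=>>, MaxWords=10, MinWords=5');
```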
ts_headline ( [ config regconfig, ] document json, query tsquery [,
options text ] ) → text
ts_headline ( [ config regconfig, ] document jsonb, query tsquery [,
options text ] ) → text
Displays, in an abbreviated form, match(es) for the query that occur in
string values within the JSON document. See Section 12.3.4 for more
details.
ts_headline('{"cat":"raining cats and dogs"}'::jsonb, 'cat') → {"cat":
"raining <b>cats</b> and dogs"}
ts_rank ( [ weights real[], ] vector tsvector, query tsquery [,
normalization integer ] ) → real
Computes a score showing how well the vector matches the query. See
Section 12.3.3 for details.
ts_rank(to_tsvector('raining cats and dogs'), 'cat') → 0.06079271
ts_rank_cd ( [ weights real[], ] vector tsvector, query tsquery [,
normalization integer ] ) → real
Computes a score showing how well the vector matches the query, using a
cover density algorithm. See Section 12.3.3 for details.
ts_rank_cd(to_tsvector('raining cats and dogs'), 'cat') → 0.1
ts_rewrite ( query tsquery, target tsquery, substitute tsquery ) →
tsquery
Replaces occurrences of target with substitute within the query. See
Section 12.4.2.1 for details.
ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'foo|bar'::tsquery) → 'b' &
( 'foo' | 'bar' )
ts_rewrite ( query tsquery, select text ) → tsquery
Replaces portions of the query according to target(s) and substitute(s)
obtained by executing a SELECT command. See Section 12.4.2.1 for
details.
SELECT ts_rewrite('a & b'::tsquery, 'SELECT t,s FROM aliases') → 'b' &
( 'foo' | 'bar' )
tsquery_phrase ( query1 tsquery, query2 tsquery ) → tsquery
Constructs a phrase query that searches for matches of query1 and
query2 at successive lexemes (same as <-> operator).
tsquery_phrase(to_tsquery('fat'), to_tsquery('cat')) → 'fat' <-> 'cat'
tsquery_phrase ( query1 tsquery, query2 tsquery, distance integer ) →
tsquery
Constructs a phrase query that searches for matches of query1 and
query2 that occur exactly distance lexemes apart.
tsquery_phrase(to_tsquery('fat'), to_tsquery('cat'), 10) → 'fat' <10>
'cat'
tsvector_to_array ( tsvector ) → text[]
Converts a tsvector to an array of lexemes.
tsvector_to_array('fat:2,4 cat:3 rat:5A'::tsvector) → {cat,fat,rat}
unnest ( tsvector ) → setof record ( lexeme text, positions smallint[],
weights text[] )
Expands a tsvector into a set of rows, one per lexeme.
select * from unnest('cat:3 fat:2,4 rat:5A'::tsvector) →
 lexeme | positions | weights
--------+-----------+---------
 cat    | {3}       | {D}
 fat    | {2,4}     | {D,D}
 rat    | {5}       | {A}
All the text search functions that accept an optional regconfig
argument will use the configuration specified by
default_text_search_config when that argument is omitted.
The functions in Table 9.44 are listed separately because they are not
usually used in everyday text searching operations. They are primarily
helpful for development and debugging of new text search
configurations.
Table 9.44. Text Search Debugging Functions
ts_debug ( [ config regconfig, ] document text ) → setof record ( alias
text, description text, token text, dictionaries regdictionary[],
dictionary regdictionary, lexemes text[] )
Extracts and normalizes tokens from the document according to the
specified or default text search configuration, and returns information
about how each token was processed. See Section 12.8.1 for details.
ts_debug('english', 'The Brightest supernovaes') → (asciiword,"Word,
all ASCII",The,{english_stem},english_stem,{}) ...
ts_lexize ( dict regdictionary, token text ) → text[]
Returns an array of replacement lexemes if the input token is known to
the dictionary, or an empty array if the token is known to the
dictionary but it is a stop word, or NULL if it is not a known word.
See Section 12.8.3 for details.
ts_lexize('english_stem', 'stars') → {star}
ts_parse ( parser_name text, document text ) → setof record ( tokid
integer, token text )
Extracts tokens from the document using the named parser. See
Section 12.8.2 for details.
ts_parse('default', 'foo - bar') → (1,foo) ...
ts_parse ( parser_oid oid, document text ) → setof record ( tokid
integer, token text )
Extracts tokens from the document using a parser specified by OID. See
Section 12.8.2 for details.
ts_parse(3722, 'foo - bar') → (1,foo) ...
ts_token_type ( parser_name text ) → setof record ( tokid integer,
alias text, description text )
Returns a table that describes each type of token the named parser can
recognize. See Section 12.8.2 for details.
ts_token_type('default') → (1,asciiword,"Word, all ASCII") ...
ts_token_type ( parser_oid oid ) → setof record ( tokid integer, alias
text, description text )
Returns a table that describes each type of token a parser specified by
OID can recognize. See Section 12.8.2 for details.
ts_token_type(3722) → (1,asciiword,"Word, all ASCII") ...
ts_stat ( sqlquery text [, weights text ] ) → setof record ( word text,
ndoc integer, nentry integer )
Executes the sqlquery, which must return a single tsvector column, and
returns statistics about each distinct lexeme contained in the data.
See Section 12.4.4 for details.
ts_stat('SELECT vector FROM apod') → (foo,10,15) ...
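A typical use of ts_stat, sketched here with an illustrative table and column (not from the manual), is to list the most frequent lexemes in a document collection:

```sql
-- ndoc = number of documents containing the lexeme,
-- nentry = total number of occurrences.
SELECT word, ndoc, nentry
FROM ts_stat('SELECT to_tsvector(''english'', body) FROM docs')
ORDER BY nentry DESC, ndoc DESC, word
LIMIT 10;
```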