9.13. Text Search Functions and Operators
Table 9.42, Table 9.43 and Table 9.44 summarize the functions and
operators that are provided for full text searching. See Chapter 12 for
a detailed explanation of PostgreSQL's text search facility.
Table 9.42. Text Search Operators
Operator
Description
Example(s)
tsvector @@ tsquery → boolean
tsquery @@ tsvector → boolean
Does tsvector match tsquery? (The arguments can be given in either
order.)
to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat') → t
text @@ tsquery → boolean
Does text string, after implicit invocation of to_tsvector(), match
tsquery?
'fat cats ate rats' @@ to_tsquery('cat & rat') → t
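In practice the @@ operator usually appears in a WHERE clause. A minimal sketch, assuming a hypothetical documents table with a text column body:

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE documents (id serial PRIMARY KEY, body text);
INSERT INTO documents (body) VALUES ('fat cats ate rats');

-- Typical use of @@: both sides are normalized through
-- to_tsvector/to_tsquery before matching.
SELECT id FROM documents
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'cat & rat');
```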
tsvector || tsvector → tsvector
Concatenates two tsvectors. If both inputs contain lexeme positions,
the second input's positions are adjusted accordingly.
'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector → 'a':1 'b':2,5 'c':3
'd':4
tsquery && tsquery → tsquery
ANDs two tsquerys together, producing a query that matches documents
that match both input queries.
'fat | rat'::tsquery && 'cat'::tsquery → ( 'fat' | 'rat' ) & 'cat'
tsquery || tsquery → tsquery
ORs two tsquerys together, producing a query that matches documents
that match either input query.
'fat | rat'::tsquery || 'cat'::tsquery → 'fat' | 'rat' | 'cat'
!! tsquery → tsquery
Negates a tsquery, producing a query that matches documents that do not
match the input query.
!! 'cat'::tsquery → !'cat'
tsquery <-> tsquery → tsquery
Constructs a phrase query, which matches if the two input queries match
at successive lexemes.
to_tsquery('fat') <-> to_tsquery('rat') → 'fat' <-> 'rat'
tsquery @> tsquery → boolean
Does first tsquery contain the second? (This considers only whether all
the lexemes appearing in one query appear in the other, ignoring the
combining operators.)
'cat'::tsquery @> 'cat & rat'::tsquery → f
tsquery <@ tsquery → boolean
Is first tsquery contained in the second? (This considers only whether
all the lexemes appearing in one query appear in the other, ignoring
the combining operators.)
'cat'::tsquery <@ 'cat & rat'::tsquery → t
'cat'::tsquery <@ '!cat & rat'::tsquery → t
In addition to these specialized operators, the usual comparison
operators shown in Table 9.1 are available for types tsvector and
tsquery. These are not very useful for text searching but allow, for
example, unique indexes to be built on columns of these types.
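For example, a unique index on a tsquery column relies only on these ordinary comparison operators, not on text search semantics (the table and index names below are illustrative):

```sql
-- Hypothetical table of saved queries; the unique index uses the
-- btree comparison operators for tsquery from Table 9.1.
CREATE TABLE saved_queries (q tsquery);
CREATE UNIQUE INDEX saved_queries_q_key ON saved_queries (q);
```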
Table 9.43. Text Search Functions
Function
Description
Example(s)
array_to_tsvector ( text[] ) → tsvector
Converts an array of text strings to a tsvector. The given strings are
used as lexemes as-is, without further processing. Array elements must
not be empty strings or NULL.
array_to_tsvector('{fat,cat,rat}'::text[]) → 'cat' 'fat' 'rat'
get_current_ts_config ( ) → regconfig
Returns the OID of the current default text search configuration (as
set by default_text_search_config).
get_current_ts_config() → english
length ( tsvector ) → integer
Returns the number of lexemes in the tsvector.
length('fat:2,4 cat:3 rat:5A'::tsvector) → 3
numnode ( tsquery ) → integer
Returns the number of lexemes plus operators in the tsquery.
numnode('(fat & rat) | cat'::tsquery) → 5
plainto_tsquery ( [ config regconfig, ] query text ) → tsquery
Converts text to a tsquery, normalizing words according to the
specified or default configuration. Any punctuation in the string is
ignored (it does not determine query operators). The resulting query
matches documents containing all non-stopwords in the text.
plainto_tsquery('english', 'The Fat Rats') → 'fat' & 'rat'
phraseto_tsquery ( [ config regconfig, ] query text ) → tsquery
Converts text to a tsquery, normalizing words according to the
specified or default configuration. Any punctuation in the string is
ignored (it does not determine query operators). The resulting query
matches phrases containing all non-stopwords in the text.
phraseto_tsquery('english', 'The Fat Rats') → 'fat' <-> 'rat'
phraseto_tsquery('english', 'The Cat and Rats') → 'cat' <2> 'rat'
websearch_to_tsquery ( [ config regconfig, ] query text ) → tsquery
Converts text to a tsquery, normalizing words according to the
specified or default configuration. Quoted word sequences are converted
to phrase tests. The word “or” is understood as producing an OR
operator, and a dash produces a NOT operator; other punctuation is
ignored. This approximates the behavior of some common web search
tools.
websearch_to_tsquery('english', '"fat rat" or cat dog') → 'fat' <->
'rat' | 'cat' & 'dog'
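Because it accepts free-form syntax rather than requiring valid tsquery operators, websearch_to_tsquery is convenient for end-user input. A sketch, again assuming a hypothetical documents table with a text column body:

```sql
SELECT body FROM documents
WHERE to_tsvector('english', body)
      @@ websearch_to_tsquery('english', '"fat rat" or cat -dog');
```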
querytree ( tsquery ) → text
Produces a representation of the indexable portion of a tsquery. A
result that is empty or just T indicates a non-indexable query.
querytree('foo & ! bar'::tsquery) → 'foo'
setweight ( vector tsvector, weight "char" ) → tsvector
Assigns the specified weight to each element of the vector.
setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A') → 'cat':3A 'fat':2A,4A
'rat':5A
setweight ( vector tsvector, weight "char", lexemes text[] ) → tsvector
Assigns the specified weight to elements of the vector that are listed
in lexemes. The strings in lexemes are taken as lexemes as-is, without
further processing. Strings that do not match any lexeme in vector are
ignored.
setweight('fat:2,4 cat:3 rat:5,6B'::tsvector, 'A', '{cat,rat}') →
'cat':3A 'fat':2,4 'rat':5A,6A
strip ( tsvector ) → tsvector
Removes positions and weights from the tsvector.
strip('fat:2,4 cat:3 rat:5A'::tsvector) → 'cat' 'fat' 'rat'
to_tsquery ( [ config regconfig, ] query text ) → tsquery
Converts text to a tsquery, normalizing words according to the
specified or default configuration. The words must be combined by valid
tsquery operators.
to_tsquery('english', 'The & Fat & Rats') → 'fat' & 'rat'
to_tsvector ( [ config regconfig, ] document text ) → tsvector
Converts text to a tsvector, normalizing words according to the
specified or default configuration. Position information is included in
the result.
to_tsvector('english', 'The Fat Rats') → 'fat':2 'rat':3
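When to_tsvector is applied to the same column in every search, an expression index avoids recomputing it for each row (indexing is discussed in Chapter 12). A sketch, with hypothetical table and column names:

```sql
-- GIN expression index so that @@ searches need not recompute
-- to_tsvector for every row.
CREATE INDEX documents_body_fts_idx
    ON documents USING GIN (to_tsvector('english', body));
```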
to_tsvector ( [ config regconfig, ] document json ) → tsvector
to_tsvector ( [ config regconfig, ] document jsonb ) → tsvector
Converts each string value in the JSON document to a tsvector,
normalizing words according to the specified or default configuration.
The results are then concatenated in document order to produce the
output. Position information is generated as though one stopword exists
between each pair of string values. (Beware that “document order” of
the fields of a JSON object is implementation-dependent when the input
is jsonb; observe the difference in the examples.)
to_tsvector('english', '{"aa": "The Fat Rats", "b": "dog"}'::json) →
'dog':5 'fat':2 'rat':3
to_tsvector('english', '{"aa": "The Fat Rats", "b": "dog"}'::jsonb) →
'dog':1 'fat':4 'rat':5
json_to_tsvector ( [ config regconfig, ] document json, filter jsonb )
→ tsvector
jsonb_to_tsvector ( [ config regconfig, ] document jsonb, filter jsonb
) → tsvector
Selects each item in the JSON document that is requested by the filter
and converts each one to a tsvector, normalizing words according to the
specified or default configuration. The results are then concatenated
in document order to produce the output. Position information is
generated as though one stopword exists between each pair of selected
items. (Beware that “document order” of the fields of a JSON object is
implementation-dependent when the input is jsonb.) The filter must be a
jsonb array containing zero or more of these keywords: "string" (to
include all string values), "numeric" (to include all numeric values),
"boolean" (to include all boolean values), "key" (to include all keys),
or "all" (to include all the above). As a special case, the filter can
also be a simple JSON value that is one of these keywords.
json_to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::json,
'["string", "numeric"]') → '123':5 'fat':2 'rat':3
json_to_tsvector('english', '{"cat": "The Fat Rats", "dog":
123}'::json, '"all"') → '123':9 'cat':1 'dog':7 'fat':4 'rat':5
ts_delete ( vector tsvector, lexeme text ) → tsvector
Removes any occurrence of the given lexeme from the vector. The lexeme
string is treated as a lexeme as-is, without further processing.
ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, 'fat') → 'cat':3 'rat':5A
ts_delete ( vector tsvector, lexemes text[] ) → tsvector
Removes any occurrences of the lexemes in lexemes from the vector. The
strings in lexemes are taken as lexemes as-is, without further
processing. Strings that do not match any lexeme in vector are ignored.
ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat']) →
'cat':3
ts_filter ( vector tsvector, weights "char"[] ) → tsvector
Selects only elements with the given weights from the vector.
ts_filter('fat:2,4 cat:3b,7c rat:5A'::tsvector, '{a,b}') → 'cat':3B
'rat':5A
ts_headline ( [ config regconfig, ] document text, query tsquery [,
options text ] ) → text
Displays, in an abbreviated form, the match(es) for the query in the
document, which must be raw text, not a tsvector. Words in the document
are normalized according to the specified or default configuration
before matching to the query. Use of this function is discussed in
Section 12.3.4, which also describes the available options.
ts_headline('The fat cat ate the rat.', 'cat') → The fat <b>cat</b> ate
the rat.
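The options string accepts settings such as StartSel, StopSel, MaxWords, and MinWords, described in Section 12.3.4. For example, using hypothetical <mark> delimiters instead of the default <b> tags:

```sql
SELECT ts_headline('english',
                   'The fat cat ate the rat.',
                   to_tsquery('english', 'cat'),
                   'StartSel=<mark>, StopSel=</mark>, MaxWords=10, MinWords=5');
```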
ts_headline ( [ config regconfig, ] document json, query tsquery [,
options text ] ) → text
ts_headline ( [ config regconfig, ] document jsonb, query tsquery [,
options text ] ) → text
Displays, in an abbreviated form, match(es) for the query that occur in
string values within the JSON document. See Section 12.3.4 for more
details.
ts_headline('{"cat":"raining cats and dogs"}'::jsonb, 'cat') → {"cat":
"raining <b>cats</b> and dogs"}
ts_rank ( [ weights real[], ] vector tsvector, query tsquery [,
normalization integer ] ) → real
Computes a score showing how well the vector matches the query. See
Section 12.3.3 for details.
ts_rank(to_tsvector('raining cats and dogs'), 'cat') → 0.06079271
ts_rank_cd ( [ weights real[], ] vector tsvector, query tsquery [,
normalization integer ] ) → real
Computes a score showing how well the vector matches the query, using a
cover density algorithm. See Section 12.3.3 for details.
ts_rank_cd(to_tsvector('raining cats and dogs'), 'cat') → 0.1
ts_rewrite ( query tsquery, target tsquery, substitute tsquery ) →
tsquery
Replaces occurrences of target with substitute within the query. See
Section 12.4.2.1 for details.
ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'foo|bar'::tsquery) → 'b' &
( 'foo' | 'bar' )
ts_rewrite ( query tsquery, select text ) → tsquery
Replaces portions of the query according to target(s) and substitute(s)
obtained by executing a SELECT command. See Section 12.4.2.1 for
details.
SELECT ts_rewrite('a & b'::tsquery, 'SELECT t,s FROM aliases') → 'b' &
( 'foo' | 'bar' )
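The example above presumes an aliases table whose rows supply target and substitute pairs; a minimal sketch of such a table:

```sql
-- Hypothetical substitution table; the column names t and s match the
-- SELECT text used in the example.
CREATE TABLE aliases (t tsquery PRIMARY KEY, s tsquery);
INSERT INTO aliases VALUES ('a'::tsquery, 'foo|bar'::tsquery);
```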
tsquery_phrase ( query1 tsquery, query2 tsquery ) → tsquery
Constructs a phrase query that searches for matches of query1 and
query2 at successive lexemes (same as <-> operator).
tsquery_phrase(to_tsquery('fat'), to_tsquery('cat')) → 'fat' <-> 'cat'
tsquery_phrase ( query1 tsquery, query2 tsquery, distance integer ) →
tsquery
Constructs a phrase query that searches for matches of query1 and
query2 that occur exactly distance lexemes apart.
tsquery_phrase(to_tsquery('fat'), to_tsquery('cat'), 10) → 'fat' <10>
'cat'
tsvector_to_array ( tsvector ) → text[]
Converts a tsvector to an array of lexemes.
tsvector_to_array('fat:2,4 cat:3 rat:5A'::tsvector) → {cat,fat,rat}
unnest ( tsvector ) → setof record ( lexeme text, positions smallint[],
weights text )
Expands a tsvector into a set of rows, one per lexeme.
select * from unnest('cat:3 fat:2,4 rat:5A'::tsvector) →
lexeme | positions | weights
--------+-----------+---------
cat | {3} | {D}
fat | {2,4} | {D,D}
rat | {5} | {A}
Note
All the text search functions that accept an optional regconfig
argument will use the configuration specified by
default_text_search_config when that argument is omitted.
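For example, setting default_text_search_config makes the one-argument forms equivalent to passing that configuration explicitly:

```sql
SET default_text_search_config = 'pg_catalog.english';
SELECT to_tsvector('The Fat Rats');            -- uses english
SELECT to_tsvector('english', 'The Fat Rats'); -- same result
```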
The functions in Table 9.44 are listed separately because they are not
usually used in everyday text searching operations. They are primarily
helpful for development and debugging of new text search
configurations.
Table 9.44. Text Search Debugging Functions
Function
Description
Example(s)
ts_debug ( [ config regconfig, ] document text ) → setof record ( alias
text, description text, token text, dictionaries regdictionary[],
dictionary regdictionary, lexemes text[] )
Extracts and normalizes tokens from the document according to the
specified or default text search configuration, and returns information
about how each token was processed. See Section 12.8.1 for details.
ts_debug('english', 'The Brightest supernovaes') → (asciiword,"Word,
all ASCII",The,{english_stem},english_stem,{}) ...
ts_lexize ( dict regdictionary, token text ) → text[]
Returns an array of replacement lexemes if the input token is known to
the dictionary, or an empty array if the token is known to the
dictionary but it is a stop word, or NULL if it is not a known word.
See Section 12.8.3 for details.
ts_lexize('english_stem', 'stars') → {star}
ts_parse ( parser_name text, document text ) → setof record ( tokid
integer, token text )
Extracts tokens from the document using the named parser. See
Section 12.8.2 for details.
ts_parse('default', 'foo - bar') → (1,foo) ...
ts_parse ( parser_oid oid, document text ) → setof record ( tokid
integer, token text )
Extracts tokens from the document using a parser specified by OID. See
Section 12.8.2 for details.
ts_parse(3722, 'foo - bar') → (1,foo) ...
ts_token_type ( parser_name text ) → setof record ( tokid integer,
alias text, description text )
Returns a table that describes each type of token the named parser can
recognize. See Section 12.8.2 for details.
ts_token_type('default') → (1,asciiword,"Word, all ASCII") ...
ts_token_type ( parser_oid oid ) → setof record ( tokid integer, alias
text, description text )
Returns a table that describes each type of token a parser specified by
OID can recognize. See Section 12.8.2 for details.
ts_token_type(3722) → (1,asciiword,"Word, all ASCII") ...
ts_stat ( sqlquery text [, weights text ] ) → setof record ( word text,
ndoc integer, nentry integer )
Executes the sqlquery, which must return a single tsvector column, and
returns statistics about each distinct lexeme contained in the data.
See Section 12.4.4 for details.
ts_stat('SELECT vector FROM apod') → (foo,10,15) ...
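The optional weights argument restricts the counts to occurrences carrying one of the listed weights; as in the example above, the apod table is hypothetical:

```sql
-- Count only lexeme occurrences weighted A or B.
SELECT word, ndoc, nentry
FROM ts_stat('SELECT vector FROM apod', 'ab')
ORDER BY nentry DESC, word
LIMIT 10;
```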