   1
   2 9.13. Text Search Functions and Operators
   3
   4    Table 9.42, Table 9.43 and Table 9.44 summarize the functions and
   5    operators that are provided for full text searching. See Chapter 12 for
   6    a detailed explanation of PostgreSQL's text search facility.
   7
   8    Table 9.42. Text Search Operators
   9
  10    Operator
  11
  12    Description
  13
  14    Example(s)
  15
  16    tsvector @@ tsquery → boolean
  17
  18    tsquery @@ tsvector → boolean
  19
  20    Does tsvector match tsquery? (The arguments can be given in either
  21    order.)
  22
  23    to_tsvector('fat cats ate rats') @@ to_tsquery('cat & rat') → t
  24
  25    text @@ tsquery → boolean
  26
  27    Does text string, after implicit invocation of to_tsvector(), match
  28    tsquery?
  29
  30    'fat cats ate rats' @@ to_tsquery('cat & rat') → t
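
In practice the match operator usually appears in a WHERE clause. The following is a minimal sketch, assuming a hypothetical table docs(id, body) that is not part of this section. The configuration is spelled out so the result does not depend on default_text_search_config.

```sql
-- Hypothetical table and rows, for illustration only.
CREATE TEMP TABLE docs (id integer, body text);
INSERT INTO docs VALUES
    (1, 'fat cats ate rats'),
    (2, 'the dog slept');

-- Returns only row 1: its body contains both 'cat' and 'rat'
-- after normalization.
SELECT id
FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'cat & rat');
```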
  31
  32    tsvector || tsvector → tsvector
  33
  34    Concatenates two tsvectors. If both inputs contain lexeme positions,
  35    the second input's positions are adjusted accordingly.
  36
  37    'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector → 'a':1 'b':2,5 'c':3
  38    'd':4
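
The position adjustment can be traced in the documented example, and the same operator is often used to combine differently weighted parts of a document. The second query below is only a sketch; its input strings are made up.

```sql
-- The first vector's highest position is 2, so the second vector's
-- positions 1, 2, 3 become 3, 4, 5.
SELECT 'a:1 b:2'::tsvector || 'c:1 d:2 b:3'::tsvector;
-- 'a':1 'b':2,5 'c':3 'd':4

-- Weight a title and a body differently, then combine them.
-- Both strings are invented for this sketch.
SELECT setweight(to_tsvector('english', 'Fat Cats'), 'A') ||
       setweight(to_tsvector('english', 'ate rats'), 'B');
```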
  39
  40    tsquery && tsquery → tsquery
  41
  42    ANDs two tsquerys together, producing a query that matches documents
  43    that match both input queries.
  44
  45    'fat | rat'::tsquery && 'cat'::tsquery → ( 'fat' | 'rat' ) & 'cat'
  46
  47    tsquery || tsquery → tsquery
  48
  49    ORs two tsquerys together, producing a query that matches documents
  50    that match either input query.
  51
  52    'fat | rat'::tsquery || 'cat'::tsquery → 'fat' | 'rat' | 'cat'
  53
  54    !! tsquery → tsquery
  55
  56    Negates a tsquery, producing a query that matches documents that do not
  57    match the input query.
  58
  59    !! 'cat'::tsquery → !'cat'
  60
  61    tsquery <-> tsquery → tsquery
  62
  63    Constructs a phrase query, which matches if the two input queries match
  64    at successive lexemes.
  65
  66    to_tsquery('fat') <-> to_tsquery('rat') → 'fat' <-> 'rat'
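
The query-building operators above can be chained. The sketch below repeats the documented results and then combines a phrase with a negation; the final query is an illustration, not an example taken from this section.

```sql
SELECT 'fat | rat'::tsquery && 'cat'::tsquery;   -- ( 'fat' | 'rat' ) & 'cat'
SELECT 'fat | rat'::tsquery || 'cat'::tsquery;   -- 'fat' | 'rat' | 'cat'
SELECT !! 'cat'::tsquery;                        -- !'cat'

-- Documents that contain the phrase "fat rat" but not "cat".
SELECT (to_tsquery('english', 'fat') <-> to_tsquery('english', 'rat'))
       && (!! to_tsquery('english', 'cat'));
```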
  67
  68    tsquery @> tsquery → boolean
  69
  70    Does first tsquery contain the second? (This considers only whether all
  71    the lexemes appearing in one query appear in the other, ignoring the
  72    combining operators.)
  73
  74    'cat'::tsquery @> 'cat & rat'::tsquery → f
  75
  76    tsquery <@ tsquery → boolean
  77
  78    Is first tsquery contained in the second? (This considers only whether
  79    all the lexemes appearing in one query appear in the other, ignoring
  80    the combining operators.)
  81
  82    'cat'::tsquery <@ 'cat & rat'::tsquery → t
  83
  84    'cat'::tsquery <@ '!cat & rat'::tsquery → t
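
Because containment looks only at lexemes, a negated lexeme still counts as present. A short sketch of both directions:

```sql
SELECT 'cat & rat'::tsquery @> 'cat'::tsquery;    -- t: 'cat' appears in the left query
SELECT 'cat'::tsquery @> 'cat & rat'::tsquery;    -- f: 'rat' is missing from the left query
SELECT 'cat'::tsquery <@ '!cat & rat'::tsquery;   -- t: the negation is ignored
```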
  85
  86    In addition to these specialized operators, the usual comparison
  87    operators shown in Table 9.1 are available for types tsvector and
  88    tsquery. These are not very useful for text searching but allow, for
  89    example, unique indexes to be built on columns of these types.
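
As a sketch of the unique-index case, assuming a hypothetical table lexeme_sets (the table and index names are made up):

```sql
-- Hypothetical table; the names are invented for this sketch.
CREATE TEMP TABLE lexeme_sets (v tsvector);
CREATE UNIQUE INDEX lexeme_sets_v_key ON lexeme_sets (v);

INSERT INTO lexeme_sets VALUES ('fat cat'::tsvector);

-- tsvector input sorts and de-duplicates lexemes, so these two values
-- are equal, and inserting 'cat fat'::tsvector into the table above
-- would violate the unique index.
SELECT 'fat cat'::tsvector = 'cat fat'::tsvector;  -- t
```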
  90
  91    Table 9.43. Text Search Functions
  92
  93    Function
  94
  95    Description
  96
  97    Example(s)
  98
  99    array_to_tsvector ( text[] ) → tsvector
 100
 101    Converts an array of text strings to a tsvector. The given strings are
 102    used as lexemes as-is, without further processing. Array elements must
 103    not be empty strings or NULL.
 104
 105    array_to_tsvector('{fat,cat,rat}'::text[]) → 'cat' 'fat' 'rat'
 106
 107    get_current_ts_config ( ) → regconfig
 108
 109    Returns the OID of the current default text search configuration (as
 110    set by default_text_search_config).
 111
 112    get_current_ts_config() → english
 113
 114    length ( tsvector ) → integer
 115
 116    Returns the number of lexemes in the tsvector.
 117
 118    length('fat:2,4 cat:3 rat:5A'::tsvector) → 3
 119
 120    numnode ( tsquery ) → integer
 121
 122    Returns the number of lexemes plus operators in the tsquery.
 123
 124    numnode('(fat & rat) | cat'::tsquery) → 5
 125
 126    plainto_tsquery ( [ config regconfig, ] query text ) → tsquery
 127
 128    Converts text to a tsquery, normalizing words according to the
 129    specified or default configuration. Any punctuation in the string is
 130    ignored (it does not determine query operators). The resulting query
 131    matches documents containing all non-stopwords in the text.
 132
 133    plainto_tsquery('english', 'The Fat Rats') → 'fat' & 'rat'
 134
 135    phraseto_tsquery ( [ config regconfig, ] query text ) → tsquery
 136
 137    Converts text to a tsquery, normalizing words according to the
 138    specified or default configuration. Any punctuation in the string is
 139    ignored (it does not determine query operators). The resulting query
 140    matches phrases containing all non-stopwords in the text.
 141
 142    phraseto_tsquery('english', 'The Fat Rats') → 'fat' <-> 'rat'
 143
 144    phraseto_tsquery('english', 'The Cat and Rats') → 'cat' <2> 'rat'
 145
 146    websearch_to_tsquery ( [ config regconfig, ] query text ) → tsquery
 147
 148    Converts text to a tsquery, normalizing words according to the
 149    specified or default configuration. Quoted word sequences are converted
 150    to phrase tests. The word “or” is understood as producing an OR
 151    operator, and a dash produces a NOT operator; other punctuation is
 152    ignored. This approximates the behavior of some common web search
 153    tools.
 154
 155    websearch_to_tsquery('english', '"fat rat" or cat dog') → 'fat' <->
 156    'rat' | 'cat' & 'dog'
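
The three plain-text converters differ mainly in which operator they place between words. Run side by side on the documented inputs, plus one extra websearch input added here to show the dash:

```sql
SELECT plainto_tsquery('english', 'The Fat Rats');       -- 'fat' & 'rat'
SELECT phraseto_tsquery('english', 'The Fat Rats');      -- 'fat' <-> 'rat'

-- The stopword "and" is dropped but still counts toward the distance.
SELECT phraseto_tsquery('english', 'The Cat and Rats');  -- 'cat' <2> 'rat'

SELECT websearch_to_tsquery('english', '"fat rat" or cat dog');
-- 'fat' <-> 'rat' | 'cat' & 'dog'

-- A leading dash negates the term that follows it.
SELECT websearch_to_tsquery('english', 'cat -dog');      -- 'cat' & !'dog'
```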
 157
 158    querytree ( tsquery ) → text
 159
 160    Produces a representation of the indexable portion of a tsquery. A
 161    result that is empty or just T indicates a non-indexable query.
 162
 163    querytree('foo & ! bar'::tsquery) → 'foo'
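
A query made up only of negations has no indexable portion, which querytree reports as T. A short sketch of both cases:

```sql
SELECT querytree('foo & ! bar'::tsquery);  -- 'foo'
SELECT querytree('! bar'::tsquery);        -- T
```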
 164
 165    setweight ( vector tsvector, weight "char" ) → tsvector
 166
 167    Assigns the specified weight to each element of the vector.
 168
 169    setweight('fat:2,4 cat:3 rat:5B'::tsvector, 'A') → 'cat':3A 'fat':2A,4A
 170    'rat':5A
 171
 172    setweight ( vector tsvector, weight "char", lexemes text[] ) → tsvector
 173
 174    Assigns the specified weight to elements of the vector that are listed
 175    in lexemes. The strings in lexemes are taken as lexemes as-is, without
 176    further processing. Strings that do not match any lexeme in vector are
 177    ignored.
 178
 179    setweight('fat:2,4 cat:3 rat:5,6B'::tsvector, 'A', '{cat,rat}') →
 180    'cat':3A 'fat':2,4 'rat':5A,6A
 181
 182    strip ( tsvector ) → tsvector
 183
 184    Removes positions and weights from the tsvector.
 185
 186    strip('fat:2,4 cat:3 rat:5A'::tsvector) → 'cat' 'fat' 'rat'
 187
 188    to_tsquery ( [ config regconfig, ] query text ) → tsquery
 189
 190    Converts text to a tsquery, normalizing words according to the
 191    specified or default configuration. The words must be combined by valid
 192    tsquery operators.
 193
 194    to_tsquery('english', 'The & Fat & Rats') → 'fat' & 'rat'
 195
 196    to_tsvector ( [ config regconfig, ] document text ) → tsvector
 197
 198    Converts text to a tsvector, normalizing words according to the
 199    specified or default configuration. Position information is included in
 200    the result.
 201
 202    to_tsvector('english', 'The Fat Rats') → 'fat':2 'rat':3
 203
 204    to_tsvector ( [ config regconfig, ] document json ) → tsvector
 205
 206    to_tsvector ( [ config regconfig, ] document jsonb ) → tsvector
 207
 208    Converts each string value in the JSON document to a tsvector,
 209    normalizing words according to the specified or default configuration.
 210    The results are then concatenated in document order to produce the
 211    output. Position information is generated as though one stopword exists
 212    between each pair of string values. (Beware that “document order” of
 213    the fields of a JSON object is implementation-dependent when the input
 214    is jsonb; observe the difference in the examples.)
 215
 216    to_tsvector('english', '{"aa": "The Fat Rats", "b": "dog"}'::json) →
 217    'dog':5 'fat':2 'rat':3
 218
 219    to_tsvector('english', '{"aa": "The Fat Rats", "b": "dog"}'::jsonb) →
 220    'dog':1 'fat':4 'rat':5
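
The positions in the two documented results can be traced by hand. The comments below walk through them:

```sql
-- json keeps the key order as written: "The Fat Rats" occupies
-- positions 1-3, position 4 is the implied gap, and "dog" lands at 5.
SELECT to_tsvector('english', '{"aa": "The Fat Rats", "b": "dog"}'::json);
-- 'dog':5 'fat':2 'rat':3

-- jsonb stores key "b" ahead of "aa" here, so "dog" comes first
-- (position 1), position 2 is the gap, and "The Fat Rats" occupies 3-5.
SELECT to_tsvector('english', '{"aa": "The Fat Rats", "b": "dog"}'::jsonb);
-- 'dog':1 'fat':4 'rat':5
```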
 221
 222    json_to_tsvector ( [ config regconfig, ] document json, filter jsonb )
 223    → tsvector
 224
 225    jsonb_to_tsvector ( [ config regconfig, ] document jsonb, filter jsonb
 226    ) → tsvector
 227
 228    Selects each item in the JSON document that is requested by the filter
 229    and converts each one to a tsvector, normalizing words according to the
 230    specified or default configuration. The results are then concatenated
 231    in document order to produce the output. Position information is
 232    generated as though one stopword exists between each pair of selected
 233    items. (Beware that “document order” of the fields of a JSON object is
 234    implementation-dependent when the input is jsonb.) The filter must be a
 235    jsonb array containing zero or more of these keywords: "string" (to
 236    include all string values), "numeric" (to include all numeric values),
 237    "boolean" (to include all boolean values), "key" (to include all keys),
 238    or "all" (to include all the above). As a special case, the filter can
 239    also be a simple JSON value that is one of these keywords.
 240
 241    json_to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::json,
 242    '["string", "numeric"]') → '123':5 'fat':2 'rat':3
 243
 244    json_to_tsvector('english', '{"cat": "The Fat Rats", "dog":
 245    123}'::json, '"all"') → '123':9 'cat':1 'dog':7 'fat':4 'rat':5
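
To see the effect of the filter, the same document can be indexed with and without its numeric value. The first call uses the single-keyword form of the filter:

```sql
-- Only string values are indexed; the number 123 is skipped.
SELECT json_to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::json,
                        '"string"');
-- 'fat':2 'rat':3

-- Strings and numbers are both indexed.
SELECT json_to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::json,
                        '["string", "numeric"]');
-- '123':5 'fat':2 'rat':3
```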
 246
 247    ts_delete ( vector tsvector, lexeme text ) → tsvector
 248
 249    Removes any occurrence of the given lexeme from the vector. The lexeme
 250    string is treated as a lexeme as-is, without further processing.
 251
 252    ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, 'fat') → 'cat':3 'rat':5A
 253
 254    ts_delete ( vector tsvector, lexemes text[] ) → tsvector
 255
 256    Removes any occurrences of the lexemes in lexemes from the vector. The
 257    strings in lexemes are taken as lexemes as-is, without further
 258    processing. Strings that do not match any lexeme in vector are ignored.
 259
 260    ts_delete('fat:2,4 cat:3 rat:5A'::tsvector, ARRAY['fat','rat']) →
 261    'cat':3
 262
 263    ts_filter ( vector tsvector, weights "char"[] ) → tsvector
 264
 265    Selects only elements with the given weights from the vector.
 266
 267    ts_filter('fat:2,4 cat:3b,7c rat:5A'::tsvector, '{a,b}') → 'cat':3B
 268    'rat':5A
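
setweight and ts_filter are natural counterparts: one labels entries, the other selects by label. A small sketch that chains them, reusing the documented setweight input:

```sql
-- Promote 'cat' and 'rat' to weight A; 'fat' keeps the default weight D.
SELECT setweight('fat:2,4 cat:3 rat:5,6B'::tsvector, 'A', '{cat,rat}');
-- 'cat':3A 'fat':2,4 'rat':5A,6A

-- Keep only the weight-A entries of that result.
SELECT ts_filter(
    setweight('fat:2,4 cat:3 rat:5,6B'::tsvector, 'A', '{cat,rat}'),
    '{a}');
-- 'cat':3A 'rat':5A,6A
```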
 269
 270    ts_headline ( [ config regconfig, ] document text, query tsquery [,
 271    options text ] ) → text
 272
  273    Displays, in an abbreviated form, the match(es) for the query in the
  274    document, which must be raw text, not a tsvector. Words in the document
 275    are normalized according to the specified or default configuration
 276    before matching to the query. Use of this function is discussed in
 277    Section 12.3.4, which also describes the available options.
 278
 279    ts_headline('The fat cat ate the rat.', 'cat') → The fat <b>cat</b> ate
 280    the rat.
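
The options string can change how matches are marked. The sketch below assumes the StartSel and StopSel options described in Section 12.3.4 and replaces the default <b> and </b> delimiters; the chosen delimiters are arbitrary.

```sql
-- Wrap each match in << and >> instead of <b> and </b>.
SELECT ts_headline('english',
                   'The fat cat ate the rat.',
                   to_tsquery('english', 'cat'),
                   'StartSel = <<, StopSel = >>');
```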
 281
 282    ts_headline ( [ config regconfig, ] document json, query tsquery [,
 283    options text ] ) → text
 284
 285    ts_headline ( [ config regconfig, ] document jsonb, query tsquery [,
 286    options text ] ) → text
 287
 288    Displays, in an abbreviated form, match(es) for the query that occur in
 289    string values within the JSON document. See Section 12.3.4 for more
 290    details.
 291
 292    ts_headline('{"cat":"raining cats and dogs"}'::jsonb, 'cat') → {"cat":
 293    "raining <b>cats</b> and dogs"}
 294
 295    ts_rank ( [ weights real[], ] vector tsvector, query tsquery [,
 296    normalization integer ] ) → real
 297
 298    Computes a score showing how well the vector matches the query. See
 299    Section 12.3.3 for details.
 300
 301    ts_rank(to_tsvector('raining cats and dogs'), 'cat') → 0.06079271
 302
 303    ts_rank_cd ( [ weights real[], ] vector tsvector, query tsquery [,
 304    normalization integer ] ) → real
 305
 306    Computes a score showing how well the vector matches the query, using a
 307    cover density algorithm. See Section 12.3.3 for details.
 308
 309    ts_rank_cd(to_tsvector('raining cats and dogs'), 'cat') → 0.1
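
Ranking functions are typically used to order the rows that match a query. A minimal sketch, assuming a hypothetical table articles(id, body) with made-up rows:

```sql
-- Hypothetical table and rows, for illustration only.
CREATE TEMP TABLE articles (id integer, body text);
INSERT INTO articles VALUES
    (1, 'raining cats and dogs'),
    (2, 'a cat chased another cat'),
    (3, 'no felines here');

-- Rows 1 and 2 match; they are returned best match first.
SELECT id, ts_rank(to_tsvector('english', body), query) AS rank
FROM articles, to_tsquery('english', 'cat') AS query
WHERE to_tsvector('english', body) @@ query
ORDER BY rank DESC;
```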
 310
 311    ts_rewrite ( query tsquery, target tsquery, substitute tsquery ) →
 312    tsquery
 313
 314    Replaces occurrences of target with substitute within the query. See
 315    Section 12.4.2.1 for details.
 316
 317    ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'foo|bar'::tsquery) → 'b' &
 318    ( 'foo' | 'bar' )
 319
 320    ts_rewrite ( query tsquery, select text ) → tsquery
 321
 322    Replaces portions of the query according to target(s) and substitute(s)
 323    obtained by executing a SELECT command. See Section 12.4.2.1 for
 324    details.
 325
 326    SELECT ts_rewrite('a & b'::tsquery, 'SELECT t,s FROM aliases') → 'b' &
 327    ( 'foo' | 'bar' )
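
The example above presupposes a table named aliases. One way it could be set up to give the result shown is sketched here; the single row is an assumption chosen to match that result.

```sql
-- t holds the target to look for, s the substitute to put in its place.
CREATE TEMP TABLE aliases (t tsquery PRIMARY KEY, s tsquery);
INSERT INTO aliases VALUES ('a', 'foo | bar');

SELECT ts_rewrite('a & b'::tsquery, 'SELECT t, s FROM aliases');
-- 'b' & ( 'foo' | 'bar' )
```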
 328
 329    tsquery_phrase ( query1 tsquery, query2 tsquery ) → tsquery
 330
 331    Constructs a phrase query that searches for matches of query1 and
 332    query2 at successive lexemes (same as <-> operator).
 333
 334    tsquery_phrase(to_tsquery('fat'), to_tsquery('cat')) → 'fat' <-> 'cat'
 335
 336    tsquery_phrase ( query1 tsquery, query2 tsquery, distance integer ) →
 337    tsquery
 338
 339    Constructs a phrase query that searches for matches of query1 and
 340    query2 that occur exactly distance lexemes apart.
 341
 342    tsquery_phrase(to_tsquery('fat'), to_tsquery('cat'), 10) → 'fat' <10>
 343    'cat'
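
The distance must match exactly, as a small sketch against a hand-written tsvector shows:

```sql
SELECT tsquery_phrase('fat'::tsquery, 'cat'::tsquery, 2);  -- 'fat' <2> 'cat'

-- 'fat' at position 1 and 'cat' at position 3 are exactly 2 apart.
SELECT 'fat:1 cat:3'::tsvector
       @@ tsquery_phrase('fat'::tsquery, 'cat'::tsquery, 2);  -- t

-- The two-argument form requires adjacent lexemes, so it does not match.
SELECT 'fat:1 cat:3'::tsvector
       @@ tsquery_phrase('fat'::tsquery, 'cat'::tsquery);     -- f
```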
 344
 345    tsvector_to_array ( tsvector ) → text[]
 346
 347    Converts a tsvector to an array of lexemes.
 348
 349    tsvector_to_array('fat:2,4 cat:3 rat:5A'::tsvector) → {cat,fat,rat}
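
Since the array holds only lexemes, converting to an array and back discards positions and weights. A short sketch of the round trip:

```sql
SELECT tsvector_to_array('fat:2,4 cat:3 rat:5A'::tsvector);
-- {cat,fat,rat}

-- Positions and weights are gone after the round trip.
SELECT array_to_tsvector(
           tsvector_to_array('fat:2,4 cat:3 rat:5A'::tsvector));
-- 'cat' 'fat' 'rat'
```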
 350
  351    unnest ( tsvector ) → setof record ( lexeme text, positions smallint[],
  352    weights text[] )
 353
 354    Expands a tsvector into a set of rows, one per lexeme.
 355
 356    select * from unnest('cat:3 fat:2,4 rat:5A'::tsvector) →
 357  lexeme | positions | weights
 358 --------+-----------+---------
 359  cat    | {3}       | {D}
 360  fat    | {2,4}     | {D,D}
 361  rat    | {5}       | {A}
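
Because the result is an ordinary row set, it can be processed with regular SQL. As a sketch, this counts how often each lexeme occurs in the same vector:

```sql
SELECT lexeme, cardinality(positions) AS occurrences
FROM unnest('cat:3 fat:2,4 rat:5A'::tsvector)
ORDER BY lexeme;
-- cat | 1
-- fat | 2
-- rat | 1
```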
 362
 363 Note
 364
 365    All the text search functions that accept an optional regconfig
 366    argument will use the configuration specified by
 367    default_text_search_config when that argument is omitted.
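
The effect of the default can be seen by switching it for the session. This sketch assumes the built-in simple and english configurations:

```sql
SET default_text_search_config = 'pg_catalog.simple';

SELECT get_current_ts_config();                 -- simple

-- With the argument omitted, the simple configuration applies:
-- no stemming and no stopword removal.
SELECT to_tsvector('The Fat Rats');             -- 'fat':2 'rats':3 'the':1

-- An explicit configuration overrides the default.
SELECT to_tsvector('english', 'The Fat Rats');  -- 'fat':2 'rat':3

RESET default_text_search_config;
```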
 368
 369    The functions in Table 9.44 are listed separately because they are not
 370    usually used in everyday text searching operations. They are primarily
 371    helpful for development and debugging of new text search
 372    configurations.
 373
 374    Table 9.44. Text Search Debugging Functions
 375
 376    Function
 377
 378    Description
 379
 380    Example(s)
 381
 382    ts_debug ( [ config regconfig, ] document text ) → setof record ( alias
 383    text, description text, token text, dictionaries regdictionary[],
 384    dictionary regdictionary, lexemes text[] )
 385
 386    Extracts and normalizes tokens from the document according to the
 387    specified or default text search configuration, and returns information
 388    about how each token was processed. See Section 12.8.1 for details.
 389
 390    ts_debug('english', 'The Brightest supernovaes') → (asciiword,"Word,
 391    all ASCII",The,{english_stem},english_stem,{}) ...
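
Since ts_debug returns a row set, it is usually easier to read when called in the FROM clause with only the columns of interest selected. A sketch on the documented input:

```sql
-- One row per token, including the blanks between words.
-- The stopword "The" is reported with an empty lexemes array.
SELECT alias, token, lexemes
FROM ts_debug('english', 'The Brightest supernovaes');
```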
 392
 393    ts_lexize ( dict regdictionary, token text ) → text[]
 394
  395    Returns an array of replacement lexemes if the input token is known to
  396    the dictionary, an empty array if the token is known to the dictionary
  397    but is a stop word, or NULL if it is not a known word.
 398    See Section 12.8.3 for details.
 399
 400    ts_lexize('english_stem', 'stars') → {star}
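
The first two outcomes can be seen with the english_stem dictionary:

```sql
SELECT ts_lexize('english_stem', 'stars');  -- {star}
SELECT ts_lexize('english_stem', 'a');      -- {}  (a stop word)
```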
 401
 402    ts_parse ( parser_name text, document text ) → setof record ( tokid
 403    integer, token text )
 404
 405    Extracts tokens from the document using the named parser. See
 406    Section 12.8.2 for details.
 407
 408    ts_parse('default', 'foo - bar') → (1,foo) ...
 409
 410    ts_parse ( parser_oid oid, document text ) → setof record ( tokid
 411    integer, token text )
 412
 413    Extracts tokens from the document using a parser specified by OID. See
 414    Section 12.8.2 for details.
 415
 416    ts_parse(3722, 'foo - bar') → (1,foo) ...
 417
 418    ts_token_type ( parser_name text ) → setof record ( tokid integer,
 419    alias text, description text )
 420
 421    Returns a table that describes each type of token the named parser can
 422    recognize. See Section 12.8.2 for details.
 423
 424    ts_token_type('default') → (1,asciiword,"Word, all ASCII") ...
 425
 426    ts_token_type ( parser_oid oid ) → setof record ( tokid integer, alias
 427    text, description text )
 428
 429    Returns a table that describes each type of token a parser specified by
 430    OID can recognize. See Section 12.8.2 for details.
 431
 432    ts_token_type(3722) → (1,asciiword,"Word, all ASCII") ...
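
The numeric tokid values returned by ts_parse are easier to read when joined to ts_token_type. A sketch using the default parser:

```sql
-- Label each parsed token with the alias of its token type.
SELECT p.tokid, t.alias, p.token
FROM ts_parse('default', 'foo - bar') AS p
JOIN ts_token_type('default') AS t USING (tokid);
```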
 433
 434    ts_stat ( sqlquery text [, weights text ] ) → setof record ( word text,
 435    ndoc integer, nentry integer )
 436
 437    Executes the sqlquery, which must return a single tsvector column, and
 438    returns statistics about each distinct lexeme contained in the data.
 439    See Section 12.4.4 for details.
 440
 441    ts_stat('SELECT vector FROM apod') → (foo,10,15) ...
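
To make the three output columns concrete, here is a sketch with a stand-in apod table. Its contents are invented; the table used in the example above is not shown in this section.

```sql
-- Stand-in table with two documents.
CREATE TEMP TABLE apod (vector tsvector);
INSERT INTO apod VALUES ('foo:1,2 bar:3'), ('foo:1');

-- ndoc counts documents containing the word,
-- nentry counts its total occurrences.
SELECT word, ndoc, nentry
FROM ts_stat('SELECT vector FROM apod')
ORDER BY nentry DESC, ndoc DESC, word;
-- foo | 2 | 3
-- bar | 1 | 1
```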