12.2. Tables and Indexes

   12.2.1. Searching a Table
   12.2.2. Creating Indexes

   The examples in the previous section illustrated full text matching
   using simple constant strings. This section shows how to search table
   data, optionally using indexes.

12.2.1. Searching a Table

   It is possible to do a full text search without an index. A simple
   query to print the title of each row that contains the word friend in
   its body field is:
SELECT title
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend');

   This will also find related words such as friends and friendly, since
   all these are reduced to the same normalized lexeme.
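
   A quick way to see this normalization is to call to_tsvector and
   to_tsquery on literal strings. The sketch below assumes the built-in
   english configuration; the sample words are illustrative only, and the
   exact lexemes depend on the configuration's dictionaries.

```sql
-- Assumes the built-in "english" configuration. Both "friends" and
-- "friendly" are reduced to the lexeme friend; the stop word "and" is dropped.
SELECT to_tsvector('english', 'friends and friendly');

-- Every spelling below matches the same query, because all of them
-- normalize to the same lexeme.
SELECT word,
       to_tsvector('english', word) @@ to_tsquery('english', 'friend') AS matches
FROM unnest(ARRAY['friend', 'friends', 'friendly']) AS word;
```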

   The query above specifies that the english configuration is to be used
   to parse and normalize the strings. Alternatively, we could omit the
   configuration parameters:
SELECT title
FROM pgweb
WHERE to_tsvector(body) @@ to_tsquery('friend');

   This query will use the configuration set by
   default_text_search_config.
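
   The current default can be inspected and changed per session. This is
   a minimal sketch; it assumes the built-in pg_catalog.english
   configuration is available, and the SET only affects the current
   session. The sample sentence is made up.

```sql
-- Show the configuration that the one-argument forms currently use.
SHOW default_text_search_config;
SELECT get_current_ts_config();

-- Override it for this session only; pg_catalog.english is built in.
SET default_text_search_config = 'pg_catalog.english';

-- With the setting above, the one- and two-argument calls agree.
SELECT to_tsvector(sample_text) = to_tsvector('english', sample_text) AS same_result
FROM (VALUES ('My friends are friendly')) AS sample(sample_text);
```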

   A more complex example is to select the ten most recent documents that
   contain create and table in the title or body:
SELECT title
FROM pgweb
WHERE to_tsvector(title || ' ' || body) @@ to_tsquery('create & table')
ORDER BY last_mod_date DESC
LIMIT 10;

   For clarity, we omitted the coalesce function calls that would be
   needed to find rows that contain NULL in one of the two fields;
   without them, title || ' ' || body is NULL for such rows, so they can
   never match.
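
   A version of the query with the coalesce calls restored might look
   like the sketch below. To stay self-contained, it creates a
   hypothetical temporary table named pgweb (which shadows any real pgweb
   for the rest of the session) holding two made-up rows, each missing
   one of the two fields.

```sql
-- Hypothetical, throwaway stand-in for pgweb; pg_temp makes sure only a
-- temporary table of that name is ever dropped.
DROP TABLE IF EXISTS pg_temp.pgweb;
CREATE TEMP TABLE pgweb (title text, body text, last_mod_date timestamptz);

INSERT INTO pgweb VALUES
    ('How to create a table', NULL, now()),      -- body is NULL
    (NULL, 'We create a new table here', now());  -- title is NULL

-- coalesce turns each NULL into '' so the concatenation is never NULL
-- and both rows can be matched.
SELECT title
FROM pgweb
WHERE to_tsvector(coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ to_tsquery('create & table')
ORDER BY last_mod_date DESC
LIMIT 10;
```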

   Although these queries will work without an index, most applications
   will find this approach too slow, except perhaps for occasional ad-hoc
   searches. Practical use of text searching usually requires creating an
   index.

12.2.2. Creating Indexes

   We can create a GIN index (Section 12.9) to speed up text searches:
CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', body));

   Notice that the two-argument version of to_tsvector is used. Only text
   search functions that specify a configuration name can be used in
   expression indexes (Section 11.7). This is because the index contents
   must be unaffected by default_text_search_config. If they were
   affected, the index contents might be inconsistent because different
   entries could contain tsvectors that were created with different text
   search configurations, and there would be no way to guess which was
   which. It would be impossible to dump and restore such an index
   correctly.
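
   The distinction shows up in the system catalogs: the two-argument
   to_tsvector(regconfig, text) is marked IMMUTABLE, while the
   one-argument to_tsvector(text) is only STABLE because its result
   depends on default_text_search_config. The sketch below checks pg_proc
   and then tries to index the one-argument form on a hypothetical
   temporary pgweb table, catching the error; the index name
   pgweb_bad_idx is made up for illustration.

```sql
-- provolatile: 'i' = IMMUTABLE, 's' = STABLE.
SELECT oid::regprocedure AS signature, provolatile
FROM pg_proc
WHERE oid IN ('to_tsvector(text)'::regprocedure,
              'to_tsvector(regconfig, text)'::regprocedure);

-- Hypothetical temporary stand-in for pgweb.
DROP TABLE IF EXISTS pg_temp.pgweb;
CREATE TEMP TABLE pgweb (title text, body text, last_mod_date timestamptz);

-- Index expressions may only call IMMUTABLE functions, so this is
-- rejected; the error is caught and reported as a NOTICE instead.
DO $$
BEGIN
    CREATE INDEX pgweb_bad_idx ON pgweb USING GIN (to_tsvector(body));
    RAISE NOTICE 'index unexpectedly created';
EXCEPTION WHEN others THEN
    RAISE NOTICE 'rejected: %', SQLERRM;
END
$$;
```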

   Because the two-argument version of to_tsvector was used in the index
   above, only a query reference that uses the two-argument version of
   to_tsvector with the same configuration name will use that index. That
   is, WHERE to_tsvector('english', body) @@ 'a & b' can use the index,
   but WHERE to_tsvector(body) @@ 'a & b' cannot. This ensures that an
   index will be used only with the same configuration used to create the
   index entries.
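
   One way to check which queries can use the index is EXPLAIN. In this
   sketch the temporary pgweb table is hypothetical and empty, so
   enable_seqscan is switched off for the session to make the planner
   choose the index whenever it is applicable at all; the exact plan text
   varies between PostgreSQL versions.

```sql
-- Hypothetical temporary stand-in for pgweb, with the index from above.
DROP TABLE IF EXISTS pg_temp.pgweb;
CREATE TEMP TABLE pgweb (title text, body text, last_mod_date timestamptz);
CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', body));

-- Penalize sequential scans so index use is visible on a tiny table.
SET enable_seqscan = off;

-- Same expression as the index: the plan should mention pgweb_idx.
EXPLAIN SELECT title FROM pgweb
WHERE to_tsvector('english', body) @@ 'a & b';

-- One-argument form is a different expression: pgweb_idx is not usable.
EXPLAIN SELECT title FROM pgweb
WHERE to_tsvector(body) @@ 'a & b';

RESET enable_seqscan;
```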

   It is possible to set up more complex expression indexes wherein the
   configuration name is specified by another column, e.g.:
CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector(config_name, body));

   where config_name is a column in the pgweb table. This allows mixed
   configurations in the same index while recording which configuration
   was used for each index entry. This would be useful, for example, if
   the document collection contained documents in different languages.
   Again, queries that are meant to use the index must be phrased to
   match, e.g., WHERE to_tsvector(config_name, body) @@ 'a & b'.
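
   A sketch of such a table follows. It assumes config_name is declared
   with type regconfig, so that it can be passed directly as the first
   argument of to_tsvector(regconfig, text). The temporary table, its
   rows, and the German sample text are hypothetical; german is one of
   the built-in configurations.

```sql
-- Hypothetical temporary stand-in for pgweb with a per-row configuration.
DROP TABLE IF EXISTS pg_temp.pgweb;
CREATE TEMP TABLE pgweb (
    title         text,
    body          text,
    config_name   regconfig,   -- assumption: regconfig, not text
    last_mod_date timestamptz
);
CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector(config_name, body));

INSERT INTO pgweb (title, body, config_name) VALUES
    ('English page', 'The friends were friendly', 'english'),
    ('German page',  'Die Freunde waren freundlich', 'german');

-- Each row is normalized with its own configuration; the WHERE clause
-- repeats the indexed expression so the index is eligible.
SELECT title
FROM pgweb
WHERE to_tsvector(config_name, body) @@ to_tsquery('english', 'friend');

SELECT title
FROM pgweb
WHERE to_tsvector(config_name, body) @@ to_tsquery('german', 'Freunde');
```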

   Indexes can even concatenate columns:
CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', title || ' ' || body));
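
   As with the single-column index, a query can only benefit from this
   index if its WHERE clause uses the same expression: the same
   configuration name, the same columns in the same order, and the same
   ' ' separator. A minimal sketch against a hypothetical temporary pgweb
   table with one made-up row:

```sql
-- Hypothetical temporary stand-in for pgweb with the concatenated index.
DROP TABLE IF EXISTS pg_temp.pgweb;
CREATE TEMP TABLE pgweb (title text, body text, last_mod_date timestamptz);
CREATE INDEX pgweb_idx ON pgweb
    USING GIN (to_tsvector('english', title || ' ' || body));

INSERT INTO pgweb VALUES
    ('Create a table', 'Walkthrough of CREATE TABLE options', now());

-- The left-hand side repeats the indexed expression, so pgweb_idx is
-- eligible for this query.
SELECT title
FROM pgweb
WHERE to_tsvector('english', title || ' ' || body)
      @@ to_tsquery('english', 'create & table')
ORDER BY last_mod_date DESC
LIMIT 10;
```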

   Another approach is to create a separate tsvector column to hold the
   output of to_tsvector. To keep this column automatically up to date
   with its source data, use a stored generated column. This example is a
   concatenation of title and body, using coalesce to ensure that one
   field will still be indexed when the other is NULL:
ALTER TABLE pgweb
    ADD COLUMN textsearchable_index_col tsvector
               GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;
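
   The generated column is maintained by the server: it is computed on
   INSERT, recomputed when title or body is updated, and cannot be
   written directly. The sketch below needs PostgreSQL 12 or later (for
   generated columns) and uses a hypothetical temporary pgweb table with
   made-up rows to show the column being filled in even while body is
   NULL.

```sql
-- Hypothetical temporary stand-in for pgweb.
DROP TABLE IF EXISTS pg_temp.pgweb;
CREATE TEMP TABLE pgweb (title text, body text, last_mod_date timestamptz);
ALTER TABLE pgweb
    ADD COLUMN textsearchable_index_col tsvector
               GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;

-- body is NULL, yet the generated column still holds the title's lexemes.
INSERT INTO pgweb (title, body) VALUES ('Create a table', NULL);
SELECT title, textsearchable_index_col FROM pgweb;

-- Updating a source column recomputes the stored tsvector.
UPDATE pgweb SET body = 'Friendly advice' WHERE title = 'Create a table';
SELECT textsearchable_index_col @@ to_tsquery('english', 'friend') AS has_friend
FROM pgweb;
```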

   Then we create a GIN index to speed up the search:
CREATE INDEX textsearch_idx ON pgweb USING GIN (textsearchable_index_col);

   Now we are ready to perform a fast full text search:
SELECT title
FROM pgweb
WHERE textsearchable_index_col @@ to_tsquery('create & table')
ORDER BY last_mod_date DESC
LIMIT 10;

   One advantage of the separate-column approach over an expression index
   is that it is not necessary to explicitly specify the text search
   configuration in queries in order to make use of the index. As shown in
   the example above, the query can depend on default_text_search_config.
   Another advantage is that searches will be faster, since it will not be
   necessary to redo the to_tsvector calls to verify index matches. (This
   is more important when using a GiST index than a GIN index; see
   Section 12.9.) The expression-index approach is simpler to set up,
   however, and it requires less disk space since the tsvector
   representation is not stored explicitly.
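
   The first advantage can be checked with EXPLAIN. The sketch below
   rebuilds a hypothetical temporary pgweb table with the generated column
   and index described above, disables sequential scans for the session so
   the empty table still shows an index plan, and runs the query with and
   without a configuration argument. Note that to_tsquery('create & table')
   still normalizes its words with default_text_search_config, so it only
   matches as intended when that setting agrees with the english
   configuration used for the stored column; the second query pins the
   configuration to remove that dependency.

```sql
-- Hypothetical temporary stand-in for pgweb (PostgreSQL 12+).
DROP TABLE IF EXISTS pg_temp.pgweb;
CREATE TEMP TABLE pgweb (title text, body text, last_mod_date timestamptz);
ALTER TABLE pgweb
    ADD COLUMN textsearchable_index_col tsvector
               GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;
CREATE INDEX textsearch_idx ON pgweb USING GIN (textsearchable_index_col);

-- Penalize sequential scans so index use is visible on a tiny table.
SET enable_seqscan = off;

-- No configuration argument: the index on the column is still usable.
EXPLAIN SELECT title FROM pgweb
WHERE textsearchable_index_col @@ to_tsquery('create & table');

-- Pinning the query configuration keeps its lexemes in step with the
-- english configuration used by the generated column.
EXPLAIN SELECT title FROM pgweb
WHERE textsearchable_index_col @@ to_tsquery('english', 'create & table');

RESET enable_seqscan;
```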