<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>12.8. Testing and Debugging Text Search</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="textsearch-configuration.html" title="12.7. Configuration Example" /><link rel="next" href="textsearch-indexes.html" title="12.9. Preferred Index Types for Text Search" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">12.8. Testing and Debugging Text Search</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="textsearch-configuration.html" title="12.7. Configuration Example">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="textsearch.html" title="Chapter 12. Full Text Search">Up</a></td><th width="60%" align="center">Chapter 12. Full Text Search</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="textsearch-indexes.html" title="12.9. Preferred Index Types for Text Search">Next</a></td></tr></table><hr /></div><div class="sect1" id="TEXTSEARCH-DEBUGGING"><div class="titlepage"><div><div><h2 class="title" style="clear: both">12.8. Testing and Debugging Text Search <a href="#TEXTSEARCH-DEBUGGING" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="textsearch-debugging.html#TEXTSEARCH-CONFIGURATION-TESTING">12.8.1. Configuration Testing</a></span></dt><dt><span class="sect2"><a href="textsearch-debugging.html#TEXTSEARCH-PARSER-TESTING">12.8.2. Parser Testing</a></span></dt><dt><span class="sect2"><a href="textsearch-debugging.html#TEXTSEARCH-DICTIONARY-TESTING">12.8.3. Dictionary Testing</a></span></dt></dl></div><p>
The behavior of a custom text search configuration can easily become
confusing. The functions described
in this section are useful for testing text search objects. You can
test a complete configuration, or test parsers and dictionaries separately.
</p><div class="sect2" id="TEXTSEARCH-CONFIGURATION-TESTING"><div class="titlepage"><div><div><h3 class="title">12.8.1. Configuration Testing <a href="#TEXTSEARCH-CONFIGURATION-TESTING" class="id_link">#</a></h3></div></div></div><p>
The function <code class="function">ts_debug</code> allows easy testing of a
text search configuration.
</p><a id="id-1.5.11.11.3.3" class="indexterm"></a><pre class="synopsis">
ts_debug([<span class="optional"> <em class="replaceable"><code>config</code></em> <code class="type">regconfig</code>, </span>] <em class="replaceable"><code>document</code></em> <code class="type">text</code>,
         OUT <em class="replaceable"><code>alias</code></em> <code class="type">text</code>,
         OUT <em class="replaceable"><code>description</code></em> <code class="type">text</code>,
         OUT <em class="replaceable"><code>token</code></em> <code class="type">text</code>,
         OUT <em class="replaceable"><code>dictionaries</code></em> <code class="type">regdictionary[]</code>,
         OUT <em class="replaceable"><code>dictionary</code></em> <code class="type">regdictionary</code>,
         OUT <em class="replaceable"><code>lexemes</code></em> <code class="type">text[]</code>)
         returns <code class="type">setof record</code>
</pre><p>
<code class="function">ts_debug</code> displays information about every token of
<em class="replaceable"><code>document</code></em> as produced by the
parser and processed by the configured dictionaries. It uses the
configuration specified by <em class="replaceable"><code>config</code></em>,
or <code class="varname">default_text_search_config</code> if that argument is
omitted.
</p><p>
<code class="function">ts_debug</code> returns one row for each token identified in the text
by the parser. The columns returned are
</p><div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: bullet; "><li class="listitem" style="list-style-type: disc"><p>
<em class="replaceable"><code>alias</code></em> <code class="type">text</code> — short name of the token type
</p></li><li class="listitem" style="list-style-type: disc"><p>
<em class="replaceable"><code>description</code></em> <code class="type">text</code> — description of the
token type
</p></li><li class="listitem" style="list-style-type: disc"><p>
<em class="replaceable"><code>token</code></em> <code class="type">text</code> — text of the token
</p></li><li class="listitem" style="list-style-type: disc"><p>
<em class="replaceable"><code>dictionaries</code></em> <code class="type">regdictionary[]</code> — the
dictionaries selected by the configuration for this token type
</p></li><li class="listitem" style="list-style-type: disc"><p>
<em class="replaceable"><code>dictionary</code></em> <code class="type">regdictionary</code> — the dictionary
that recognized the token, or <code class="literal">NULL</code> if none did
</p></li><li class="listitem" style="list-style-type: disc"><p>
<em class="replaceable"><code>lexemes</code></em> <code class="type">text[]</code> — the lexeme(s) produced
by the dictionary that recognized the token, or <code class="literal">NULL</code> if
none did; an empty array (<code class="literal">{}</code>) means it was recognized as a
stop word
</p></li></ul></div><p>
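As a minimal sketch of how these columns can be used together, the
following query keeps only the tokens for which a dictionary produced at
least one lexeme. The sample text and the choice of the built-in
<code class="literal">english</code> configuration are illustrative assumptions:
</p><pre class="programlisting">
-- Stop words (empty lexemes array) and unrecognized tokens (NULL lexemes)
-- are both filtered out, because cardinality() is 0 or NULL for them.
SELECT token, dictionary, lexemes
FROM ts_debug('english', 'a fat cat sat on a mat')
WHERE cardinality(lexemes) &gt; 0;
</pre><p>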
Here is a simple example showing the full output:
</p><pre class="screen">
SELECT * FROM ts_debug('english', 'a fat cat sat on a mat - it ate a fat rats');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | fat   | {english_stem} | english_stem | {fat}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | cat   | {english_stem} | english_stem | {cat}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | sat   | {english_stem} | english_stem | {sat}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | on    | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | mat   | {english_stem} | english_stem | {mat}
 blank     | Space symbols   |       | {}             |              |
 blank     | Space symbols   | -     | {}             |              |
 asciiword | Word, all ASCII | it    | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | ate   | {english_stem} | english_stem | {ate}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | fat   | {english_stem} | english_stem | {fat}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | rats  | {english_stem} | english_stem | {rat}
</pre>
<p>
For a more extensive demonstration, we
first create a <code class="literal">public.english</code> configuration and
Ispell dictionary for the English language:
</p><pre class="programlisting">
CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = pg_catalog.english );
-- Assumed option values: they require Ispell files english.dict, english.affix and english.stop in tsearch_data.
CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell, DictFile = english,
    AffFile = english, StopWords = english
);
ALTER TEXT SEARCH CONFIGURATION public.english
   ALTER MAPPING FOR asciiword WITH english_ispell, english_stem;
</pre><pre class="screen">
SELECT * FROM ts_debug('public.english', 'The Brightest supernovaes');
   alias   |   description   |    token    |         dictionaries          |   dictionary   |   lexemes
-----------+-----------------+-------------+-------------------------------+----------------+-------------
 asciiword | Word, all ASCII | The         | {english_ispell,english_stem} | english_ispell | {}
 blank     | Space symbols   |             | {}                            |                |
 asciiword | Word, all ASCII | Brightest   | {english_ispell,english_stem} | english_ispell | {bright}
 blank     | Space symbols   |             | {}                            |                |
 asciiword | Word, all ASCII | supernovaes | {english_ispell,english_stem} | english_stem   | {supernova}
</pre>
<p>
In this example, the word <code class="literal">Brightest</code> was recognized by the
parser as an <code class="literal">ASCII word</code> (alias <code class="literal">asciiword</code>).
For this token type the dictionary list is
<code class="literal">english_ispell</code> and
<code class="literal">english_stem</code>. The word was recognized by
<code class="literal">english_ispell</code>, which reduced it to the base form
<code class="literal">bright</code>. The word <code class="literal">supernovaes</code> is
unknown to the <code class="literal">english_ispell</code> dictionary so it
was passed to the next dictionary, and, fortunately, was recognized (in
fact, <code class="literal">english_stem</code> is a Snowball dictionary which
recognizes everything; that is why it was placed at the end of the
dictionary list).
</p><p>
The word <code class="literal">The</code> was recognized by the
<code class="literal">english_ispell</code> dictionary as a stop word (<a class="xref" href="textsearch-dictionaries.html#TEXTSEARCH-STOPWORDS" title="12.6.1. Stop Words">Section 12.6.1</a>) and will not be indexed.
The spaces are discarded too, since the configuration provides no
dictionaries at all for them.
</p>
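<p>
One way to confirm what actually reaches the index is to pass the same text
through <code class="function">to_tsvector</code>. This sketch assumes the
<code class="literal">public.english</code> configuration created above; only
tokens that produced lexemes should appear in the result:
</p><pre class="programlisting">
-- Stop words and blanks yield no lexemes, so they are absent from the tsvector.
SELECT to_tsvector('public.english', 'The Brightest supernovaes');
</pre>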
<p>
You can reduce the width of the output by explicitly specifying which columns
you want to see:
</p><pre class="screen">
SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.english', 'The Brightest supernovaes');
   alias   |    token    |   dictionary   |   lexemes
-----------+-------------+----------------+-------------
 asciiword | The         | english_ispell | {}
 blank     |             |                |
 asciiword | Brightest   | english_ispell | {bright}
 blank     |             |                |
 asciiword | supernovaes | english_stem   | {supernova}
</pre>
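<p>
The number of rows can be reduced in the same way. As a sketch, assuming the
same <code class="literal">public.english</code> configuration, this query
drops the <code class="literal">blank</code> tokens:
</p><pre class="programlisting">
-- Keep only tokens whose type is something other than "blank" (space symbols).
SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.english', 'The Brightest supernovaes')
WHERE alias &lt;&gt; 'blank';
</pre>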
</div><div class="sect2" id="TEXTSEARCH-PARSER-TESTING"><div class="titlepage"><div><div><h3 class="title">12.8.2. Parser Testing <a href="#TEXTSEARCH-PARSER-TESTING" class="id_link">#</a></h3></div></div></div><p>
The following functions allow direct testing of a text search parser.
</p><a id="id-1.5.11.11.4.3" class="indexterm"></a><pre class="synopsis">
ts_parse(<em class="replaceable"><code>parser_name</code></em> <code class="type">text</code>, <em class="replaceable"><code>document</code></em> <code class="type">text</code>,
         OUT <em class="replaceable"><code>tokid</code></em> <code class="type">integer</code>, OUT <em class="replaceable"><code>token</code></em> <code class="type">text</code>) returns <code class="type">setof record</code>
ts_parse(<em class="replaceable"><code>parser_oid</code></em> <code class="type">oid</code>, <em class="replaceable"><code>document</code></em> <code class="type">text</code>,
         OUT <em class="replaceable"><code>tokid</code></em> <code class="type">integer</code>, OUT <em class="replaceable"><code>token</code></em> <code class="type">text</code>) returns <code class="type">setof record</code>
</pre><p>
<code class="function">ts_parse</code> parses the given <em class="replaceable"><code>document</code></em>
and returns a series of records, one for each token produced by
parsing. Each record includes a <code class="varname">tokid</code> showing the
assigned token type and a <code class="varname">token</code> which is the text of the
token. For example:
</p><pre class="screen">
SELECT * FROM ts_parse('default', '123 - a number');
 tokid | token
-------+--------
    22 | 123
    12 |
    12 | -
     1 | a
    12 |
     1 | number
</pre>
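<p>
The second form takes the parser's OID instead of its name. A minimal sketch,
assuming the OID is looked up in the <code class="structname">pg_ts_parser</code>
system catalog and that only one parser is named <code class="literal">default</code>:
</p><pre class="programlisting">
-- Look up the OID of the parser named "default" and pass it to ts_parse.
SELECT *
FROM ts_parse((SELECT oid FROM pg_ts_parser WHERE prsname = 'default'),
              '123 - a number');
</pre>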
<a id="id-1.5.11.11.4.6" class="indexterm"></a><pre class="synopsis">
ts_token_type(<em class="replaceable"><code>parser_name</code></em> <code class="type">text</code>, OUT <em class="replaceable"><code>tokid</code></em> <code class="type">integer</code>,
              OUT <em class="replaceable"><code>alias</code></em> <code class="type">text</code>, OUT <em class="replaceable"><code>description</code></em> <code class="type">text</code>) returns <code class="type">setof record</code>
ts_token_type(<em class="replaceable"><code>parser_oid</code></em> <code class="type">oid</code>, OUT <em class="replaceable"><code>tokid</code></em> <code class="type">integer</code>,
              OUT <em class="replaceable"><code>alias</code></em> <code class="type">text</code>, OUT <em class="replaceable"><code>description</code></em> <code class="type">text</code>) returns <code class="type">setof record</code>
</pre><p>
<code class="function">ts_token_type</code> returns a table which describes each type of
token the specified parser can recognize. For each token type, the table
gives the integer <code class="varname">tokid</code> that the parser uses to label a
token of that type, the <code class="varname">alias</code> that names the token type
in configuration commands, and a short <code class="varname">description</code>. For
example:
</p><pre class="screen">
SELECT * FROM ts_token_type('default');
 tokid |      alias      |               description
-------+-----------------+------------------------------------------
     1 | asciiword       | Word, all ASCII
     2 | word            | Word, all letters
     3 | numword         | Word, letters and digits
     4 | email           | Email address
     5 | url             | URL
     6 | host            | Host
     7 | sfloat          | Scientific notation
     8 | version         | Version number
     9 | hword_numpart   | Hyphenated word part, letters and digits
    10 | hword_part      | Hyphenated word part, all letters
    11 | hword_asciipart | Hyphenated word part, all ASCII
    12 | blank           | Space symbols
    13 | tag             | XML tag
    14 | protocol        | Protocol head
    15 | numhword        | Hyphenated word, letters and digits
    16 | asciihword      | Hyphenated word, all ASCII
    17 | hword           | Hyphenated word, all letters
    18 | url_path        | URL path
    19 | file            | File or path name
    20 | float           | Decimal notation
    21 | int             | Signed integer
    22 | uint            | Unsigned integer
    23 | entity          | XML entity
</pre>
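<p>
The two functions can be combined to label parser output with token type names.
The query below is a sketch rather than a built-in facility; the sample text and
the names <code class="literal">p</code>, <code class="literal">t</code> and
<code class="literal">ord</code> are illustrative:
</p><pre class="programlisting">
-- WITH ORDINALITY numbers the tokens so that their original order
-- can be restored after the join.
SELECT p.tokid, t.alias, p.token
FROM ts_parse('default', '123 - a number') WITH ORDINALITY AS p(tokid, token, ord)
JOIN ts_token_type('default') AS t ON t.tokid = p.tokid
ORDER BY p.ord;
</pre>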
</div><div class="sect2" id="TEXTSEARCH-DICTIONARY-TESTING"><div class="titlepage"><div><div><h3 class="title">12.8.3. Dictionary Testing <a href="#TEXTSEARCH-DICTIONARY-TESTING" class="id_link">#</a></h3></div></div></div><p>
The <code class="function">ts_lexize</code> function facilitates dictionary testing.
</p><a id="id-1.5.11.11.5.3" class="indexterm"></a><pre class="synopsis">
ts_lexize(<em class="replaceable"><code>dict</code></em> <code class="type">regdictionary</code>, <em class="replaceable"><code>token</code></em> <code class="type">text</code>) returns <code class="type">text[]</code>
</pre><p>
<code class="function">ts_lexize</code> returns an array of lexemes if the input
<em class="replaceable"><code>token</code></em> is known to the dictionary,
or an empty array if the token
is known to the dictionary but is a stop word, or
<code class="literal">NULL</code> if it is an unknown word.
</p><p>
Examples:
</p><pre class="screen">
SELECT ts_lexize('english_stem', 'stars');
 ts_lexize
-----------
 {star}

SELECT ts_lexize('english_stem', 'a');
 ts_lexize
-----------
 {}
</pre>
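<p>
The three possible outcomes can be told apart in a single query. This is a
sketch; the word list is arbitrary, and because <code class="literal">english_stem</code>
is a Snowball dictionary that recognizes everything, the
<code class="literal">unknown</code> branch is not expected to be reached with it:
</p><pre class="programlisting">
-- NULL means the word is unknown; an empty array means a stop word;
-- anything else is the list of lexemes produced by the dictionary.
SELECT word,
       CASE
           WHEN ts_lexize('english_stem', word) IS NULL THEN 'unknown'
           WHEN cardinality(ts_lexize('english_stem', word)) = 0 THEN 'stop word'
           ELSE 'recognized'
       END AS outcome
FROM unnest(ARRAY['stars', 'a']) AS word;
</pre>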
<div class="note"><h3 class="title">Note</h3><p>
The <code class="function">ts_lexize</code> function expects a single
<span class="emphasis"><em>token</em></span>, not text. Here is a case
where this can be confusing:
</p><pre class="screen">
SELECT ts_lexize('thesaurus_astro', 'supernovae stars') is null;
 ?column?
----------
 t
</pre>
<p>
The thesaurus dictionary <code class="literal">thesaurus_astro</code> does know the
phrase <code class="literal">supernovae stars</code>, but <code class="function">ts_lexize</code>
returns <code class="literal">NULL</code> since it does not parse the input text but treats it as a single
token. Use <code class="function">plainto_tsquery</code> or <code class="function">to_tsvector</code> to
test thesaurus dictionaries, for example:
</p><pre class="screen">
SELECT plainto_tsquery('supernovae stars');
</pre>
</div></div></div></body></html>