1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>F.16. fuzzystrmatch — determine string similarities and distance</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="file-fdw.html" title="F.15. file_fdw — access data files in the server's file system" /><link rel="next" href="hstore.html" title="F.17. hstore — hstore key/value datatype" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">F.16. fuzzystrmatch — determine string similarities and distance</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="file-fdw.html" title="F.15. file_fdw — access data files in the server's file system">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules and Extensions">Up</a></td><th width="60%" align="center">Appendix F. Additional Supplied Modules and Extensions</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="hstore.html" title="F.17. hstore — hstore key/value datatype">Next</a></td></tr></table><hr /></div><div class="sect1" id="FUZZYSTRMATCH"><div class="titlepage"><div><div><h2 class="title" style="clear: both">F.16. fuzzystrmatch — determine string similarities and distance <a href="#FUZZYSTRMATCH" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="fuzzystrmatch.html#FUZZYSTRMATCH-SOUNDEX">F.16.1. Soundex</a></span></dt><dt><span class="sect2"><a href="fuzzystrmatch.html#FUZZYSTRMATCH-DAITCH-MOKOTOFF">F.16.2. Daitch-Mokotoff Soundex</a></span></dt><dt><span class="sect2"><a href="fuzzystrmatch.html#FUZZYSTRMATCH-LEVENSHTEIN">F.16.3. Levenshtein</a></span></dt><dt><span class="sect2"><a href="fuzzystrmatch.html#FUZZYSTRMATCH-METAPHONE">F.16.4. Metaphone</a></span></dt><dt><span class="sect2"><a href="fuzzystrmatch.html#FUZZYSTRMATCH-DOUBLE-METAPHONE">F.16.5. Double Metaphone</a></span></dt></dl></div><a id="id-1.11.7.26.2" class="indexterm"></a><p>
3 The <code class="filename">fuzzystrmatch</code> module provides several
4 functions to determine similarities and distance between strings.
5 </p><div class="caution"><h3 class="title">Caution</h3><p>
6 At present, the <code class="function">soundex</code>, <code class="function">metaphone</code>,
7 <code class="function">dmetaphone</code>, and <code class="function">dmetaphone_alt</code> functions do
8 not work well with multibyte encodings (such as UTF-8).
9 Use <code class="function">daitch_mokotoff</code>
10 or <code class="function">levenshtein</code> with such data.
12 This module is considered <span class="quote">“<span class="quote">trusted</span>”</span>, that is, it can be
13 installed by non-superusers who have <code class="literal">CREATE</code> privilege
14 on the current database.
15 </p><div class="sect2" id="FUZZYSTRMATCH-SOUNDEX"><div class="titlepage"><div><div><h3 class="title">F.16.1. Soundex <a href="#FUZZYSTRMATCH-SOUNDEX" class="id_link">#</a></h3></div></div></div><p>
16 The Soundex system is a method of matching similar-sounding names
17 by converting them to the same code. It was initially used by the
18 United States Census in 1880, 1900, and 1910. Note that Soundex
19 is not very useful for non-English names.
21 The <code class="filename">fuzzystrmatch</code> module provides two functions
22 for working with Soundex codes:
23 </p><a id="id-1.11.7.26.6.4" class="indexterm"></a><a id="id-1.11.7.26.6.5" class="indexterm"></a><pre class="synopsis">
24 soundex(text) returns text
25 difference(text, text) returns int
27 The <code class="function">soundex</code> function converts a string to its Soundex code.
28 The <code class="function">difference</code> function converts two strings to their Soundex
29 codes and then reports the number of matching code positions. Since
30 Soundex codes have four characters, the result ranges from zero to four,
31 with zero being no match and four being an exact match. (Thus, the
32 function is misnamed — <code class="function">similarity</code> would have been
35 Here are some usage examples:
36 </p><pre class="programlisting">
37 SELECT soundex('hello world!');
39 SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
40 SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
41 SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
43 CREATE TABLE s (nm text);
45 INSERT INTO s VALUES ('john');
46 INSERT INTO s VALUES ('joan');
47 INSERT INTO s VALUES ('wobbly');
48 INSERT INTO s VALUES ('jack');
50 SELECT * FROM s WHERE soundex(nm) = soundex('john');
52 SELECT * FROM s WHERE difference(s.nm, 'john') > 2;
53 </pre></div><div class="sect2" id="FUZZYSTRMATCH-DAITCH-MOKOTOFF"><div class="titlepage"><div><div><h3 class="title">F.16.2. Daitch-Mokotoff Soundex <a href="#FUZZYSTRMATCH-DAITCH-MOKOTOFF" class="id_link">#</a></h3></div></div></div><p>
54 Like the original Soundex system, Daitch-Mokotoff Soundex matches
55 similar-sounding names by converting them to the same code.
56 However, Daitch-Mokotoff Soundex is significantly more useful for
57 non-English names than the original system.
58 Major improvements over the original system include:
60 </p><div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: bullet; "><li class="listitem" style="list-style-type: disc"><p>
61 The code is based on the first six meaningful letters rather than four.
62 </p></li><li class="listitem" style="list-style-type: disc"><p>
63 A letter or combination of letters maps into ten possible codes rather
65 </p></li><li class="listitem" style="list-style-type: disc"><p>
66 Where two consecutive letters have a single sound, they are coded as a
68 </p></li><li class="listitem" style="list-style-type: disc"><p>
69 When a letter or combination of letters may have different sounds,
70 multiple codes are emitted to cover all possibilities.
71 </p></li></ul></div><p>
72 </p><a id="id-1.11.7.26.7.3" class="indexterm"></a><p>
73 This function generates the Daitch-Mokotoff soundex codes for its input:
74 </p><pre class="synopsis">
75 daitch_mokotoff(<em class="parameter"><code>source</code></em> text) returns text[]
77 The result may contain one or more codes depending on how many plausible
78 pronunciations there are, so it is represented as an array.
80 Since a Daitch-Mokotoff soundex code consists of only 6 digits,
81 <em class="parameter"><code>source</code></em> should be preferably a single word or name.
83 Here are some examples:
84 </p><pre class="programlisting">
85 SELECT daitch_mokotoff('George');
90 SELECT daitch_mokotoff('John');
95 SELECT daitch_mokotoff('Bierschbach');
97 -----------------------------------------------------------
98 {794575,794574,794750,794740,745750,745740,747500,747400}
100 SELECT daitch_mokotoff('Schwartzenegger');
105 For matching of single names, returned text arrays can be matched
106 directly using the <code class="literal">&&</code> operator: any overlap
107 can be considered a match. A GIN index may
108 be used for efficiency, see <a class="xref" href="gin.html" title="65.4. GIN Indexes">Section 65.4</a> and this example:
109 </p><pre class="programlisting">
110 CREATE TABLE s (nm text);
111 CREATE INDEX ix_s_dm ON s USING gin (daitch_mokotoff(nm)) WITH (fastupdate = off);
113 INSERT INTO s (nm) VALUES
120 SELECT * FROM s WHERE daitch_mokotoff(nm) && daitch_mokotoff('Swartzenegger');
121 SELECT * FROM s WHERE daitch_mokotoff(nm) && daitch_mokotoff('Jane');
122 SELECT * FROM s WHERE daitch_mokotoff(nm) && daitch_mokotoff('Jens');
124 For indexing and matching of any number of names in any order, Full Text
125 Search features can be used. See <a class="xref" href="textsearch.html" title="Chapter 12. Full Text Search">Chapter 12</a> and this
127 </p><pre class="programlisting">
128 CREATE FUNCTION soundex_tsvector(v_name text) RETURNS tsvector
130 SELECT to_tsvector('simple',
131 string_agg(array_to_string(daitch_mokotoff(n), ' '), ' '))
132 FROM regexp_split_to_table(v_name, '\s+') AS n;
135 CREATE FUNCTION soundex_tsquery(v_name text) RETURNS tsquery
137 SELECT string_agg('(' || array_to_string(daitch_mokotoff(n), '|') || ')', '&')::tsquery
138 FROM regexp_split_to_table(v_name, '\s+') AS n;
141 CREATE TABLE s (nm text);
142 CREATE INDEX ix_s_txt ON s USING gin (soundex_tsvector(nm)) WITH (fastupdate = off);
144 INSERT INTO s (nm) VALUES
151 SELECT * FROM s WHERE soundex_tsvector(nm) @@ soundex_tsquery('john');
152 SELECT * FROM s WHERE soundex_tsvector(nm) @@ soundex_tsquery('jane doe');
153 SELECT * FROM s WHERE soundex_tsvector(nm) @@ soundex_tsquery('john public');
154 SELECT * FROM s WHERE soundex_tsvector(nm) @@ soundex_tsquery('besst, giorgio');
155 SELECT * FROM s WHERE soundex_tsvector(nm) @@ soundex_tsquery('Jameson John');
157 If it is desired to avoid recalculation of soundex codes during index
158 rechecks, an index on a separate column can be used instead of an index on
159 an expression. A stored generated column can be used for this; see
160 <a class="xref" href="ddl-generated-columns.html" title="5.4. Generated Columns">Section 5.4</a>.
161 </p></div><div class="sect2" id="FUZZYSTRMATCH-LEVENSHTEIN"><div class="titlepage"><div><div><h3 class="title">F.16.3. Levenshtein <a href="#FUZZYSTRMATCH-LEVENSHTEIN" class="id_link">#</a></h3></div></div></div><p>
162 This function calculates the Levenshtein distance between two strings:
163 </p><a id="id-1.11.7.26.8.3" class="indexterm"></a><a id="id-1.11.7.26.8.4" class="indexterm"></a><pre class="synopsis">
164 levenshtein(source text, target text, ins_cost int, del_cost int, sub_cost int) returns int
165 levenshtein(source text, target text) returns int
166 levenshtein_less_equal(source text, target text, ins_cost int, del_cost int, sub_cost int, max_d int) returns int
167 levenshtein_less_equal(source text, target text, max_d int) returns int
169 Both <code class="literal">source</code> and <code class="literal">target</code> can be any
170 non-null string, with a maximum of 255 characters. The cost parameters
171 specify how much to charge for a character insertion, deletion, or
172 substitution, respectively. You can omit the cost parameters, as in
173 the second version of the function; in that case they all default to 1.
175 <code class="function">levenshtein_less_equal</code> is an accelerated version of the
176 Levenshtein function for use when only small distances are of interest.
177 If the actual distance is less than or equal to <code class="literal">max_d</code>,
178 then <code class="function">levenshtein_less_equal</code> returns the correct
179 distance; otherwise it returns some value greater than <code class="literal">max_d</code>.
180 If <code class="literal">max_d</code> is negative then the behavior is the same as
181 <code class="function">levenshtein</code>.
184 </p><pre class="screen">
185 test=# SELECT levenshtein('GUMBO', 'GAMBOL');
191 test=# SELECT levenshtein('GUMBO', 'GAMBOL', 2, 1, 1);
197 test=# SELECT levenshtein_less_equal('extensive', 'exhaustive', 2);
198 levenshtein_less_equal
199 ------------------------
203 test=# SELECT levenshtein_less_equal('extensive', 'exhaustive', 4);
204 levenshtein_less_equal
205 ------------------------
208 </pre></div><div class="sect2" id="FUZZYSTRMATCH-METAPHONE"><div class="titlepage"><div><div><h3 class="title">F.16.4. Metaphone <a href="#FUZZYSTRMATCH-METAPHONE" class="id_link">#</a></h3></div></div></div><p>
209 Metaphone, like Soundex, is based on the idea of constructing a
210 representative code for an input string. Two strings are then
211 deemed similar if they have the same codes.
213 This function calculates the metaphone code of an input string:
214 </p><a id="id-1.11.7.26.9.4" class="indexterm"></a><pre class="synopsis">
215 metaphone(source text, max_output_length int) returns text
217 <code class="literal">source</code> has to be a non-null string with a maximum of
218 255 characters. <code class="literal">max_output_length</code> sets the maximum
219 length of the output metaphone code; if longer, the output is truncated
223 </p><pre class="screen">
224 test=# SELECT metaphone('GUMBO', 4);
229 </pre></div><div class="sect2" id="FUZZYSTRMATCH-DOUBLE-METAPHONE"><div class="titlepage"><div><div><h3 class="title">F.16.5. Double Metaphone <a href="#FUZZYSTRMATCH-DOUBLE-METAPHONE" class="id_link">#</a></h3></div></div></div><p>
230 The Double Metaphone system computes two <span class="quote">“<span class="quote">sounds like</span>”</span> strings
231 for a given input string — a <span class="quote">“<span class="quote">primary</span>”</span> and an
232 <span class="quote">“<span class="quote">alternate</span>”</span>. In most cases they are the same, but for non-English
233 names especially they can be a bit different, depending on pronunciation.
234 These functions compute the primary and alternate codes:
235 </p><a id="id-1.11.7.26.10.3" class="indexterm"></a><a id="id-1.11.7.26.10.4" class="indexterm"></a><pre class="synopsis">
236 dmetaphone(source text) returns text
237 dmetaphone_alt(source text) returns text
239 There is no length limit on the input strings.
242 </p><pre class="screen">
243 test=# SELECT dmetaphone('gumbo');
248 </pre></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="file-fdw.html" title="F.15. file_fdw — access data files in the server's file system">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules and Extensions">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="hstore.html" title="F.17. hstore — hstore key/value datatype">Next</a></td></tr><tr><td width="40%" align="left" valign="top">F.15. file_fdw — access data files in the server's file system </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="40%" align="right" valign="top"> F.17. hstore — hstore key/value datatype</td></tr></table></div></body></html>