1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>9.7. Pattern Matching</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="functions-bitstring.html" title="9.6. Bit String Functions and Operators" /><link rel="next" href="functions-formatting.html" title="9.8. Data Type Formatting Functions" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">9.7. Pattern Matching</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="functions-bitstring.html" title="9.6. Bit String Functions and Operators">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="functions.html" title="Chapter 9. Functions and Operators">Up</a></td><th width="60%" align="center">Chapter 9. Functions and Operators</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="functions-formatting.html" title="9.8. Data Type Formatting Functions">Next</a></td></tr></table><hr /></div><div class="sect1" id="FUNCTIONS-MATCHING"><div class="titlepage"><div><div><h2 class="title" style="clear: both">9.7. Pattern Matching <a href="#FUNCTIONS-MATCHING" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="functions-matching.html#FUNCTIONS-LIKE">9.7.1. <code class="function">LIKE</code></a></span></dt><dt><span class="sect2"><a href="functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP">9.7.2. <code class="function">SIMILAR TO</code> Regular Expressions</a></span></dt><dt><span class="sect2"><a href="functions-matching.html#FUNCTIONS-POSIX-REGEXP">9.7.3. <acronym class="acronym">POSIX</acronym> Regular Expressions</a></span></dt></dl></div><a id="id-1.5.8.13.2" class="indexterm"></a><p>
3 There are three separate approaches to pattern matching provided
4 by <span class="productname">PostgreSQL</span>: the traditional
5 <acronym class="acronym">SQL</acronym> <code class="function">LIKE</code> operator, the
6 more recent <code class="function">SIMILAR TO</code> operator (added in
7 SQL:1999), and <acronym class="acronym">POSIX</acronym>-style regular
8 expressions. Aside from the basic <span class="quote">“<span class="quote">does this string match
9 this pattern?</span>”</span> operators, functions are available to extract
10 or replace matching substrings and to split a string at matching
12 </p><div class="tip"><h3 class="title">Tip</h3><p>
13 If you have pattern matching needs that go beyond this,
14 consider writing a user-defined function in Perl or Tcl.
15 </p></div><div class="caution"><h3 class="title">Caution</h3><p>
16 While most regular-expression searches can be executed very quickly,
17 regular expressions can be contrived that take arbitrary amounts of
18 time and memory to process. Be wary of accepting regular-expression
19 search patterns from hostile sources. If you must do so, it is
20 advisable to impose a statement timeout.
22 Searches using <code class="function">SIMILAR TO</code> patterns have the same
23 security hazards, since <code class="function">SIMILAR TO</code> provides many
24 of the same capabilities as <acronym class="acronym">POSIX</acronym>-style regular
27 <code class="function">LIKE</code> searches, being much simpler than the other
28 two options, are safer to use with possibly-hostile pattern sources.
30 <code class="function">SIMILAR TO</code> and <acronym class="acronym">POSIX</acronym>-style regular
31 expressions do not support nondeterministic collations. If required, use
32 <code class="function">LIKE</code> or apply a different collation to the expression
33 to work around this limitation.
34 </p><div class="sect2" id="FUNCTIONS-LIKE"><div class="titlepage"><div><div><h3 class="title">9.7.1. <code class="function">LIKE</code> <a href="#FUNCTIONS-LIKE" class="id_link">#</a></h3></div></div></div><a id="id-1.5.8.13.7.2" class="indexterm"></a><pre class="synopsis">
35 <em class="replaceable"><code>string</code></em> LIKE <em class="replaceable"><code>pattern</code></em> [<span class="optional">ESCAPE <em class="replaceable"><code>escape-character</code></em></span>]
36 <em class="replaceable"><code>string</code></em> NOT LIKE <em class="replaceable"><code>pattern</code></em> [<span class="optional">ESCAPE <em class="replaceable"><code>escape-character</code></em></span>]
38 The <code class="function">LIKE</code> expression returns true if the
39 <em class="replaceable"><code>string</code></em> matches the supplied
40 <em class="replaceable"><code>pattern</code></em>. (As
41 expected, the <code class="function">NOT LIKE</code> expression returns
42 false if <code class="function">LIKE</code> returns true, and vice versa.
43 An equivalent expression is
44 <code class="literal">NOT (<em class="replaceable"><code>string</code></em> LIKE
45 <em class="replaceable"><code>pattern</code></em>)</code>.)
47 If <em class="replaceable"><code>pattern</code></em> does not contain percent
48 signs or underscores, then the pattern only represents the string
49 itself; in that case <code class="function">LIKE</code> acts like the
50 equals operator. An underscore (<code class="literal">_</code>) in
51 <em class="replaceable"><code>pattern</code></em> stands for (matches) any single
52 character; a percent sign (<code class="literal">%</code>) matches any sequence
53 of zero or more characters.
56 </p><pre class="programlisting">
57 'abc' LIKE 'abc' <em class="lineannotation"><span class="lineannotation">true</span></em>
58 'abc' LIKE 'a%' <em class="lineannotation"><span class="lineannotation">true</span></em>
59 'abc' LIKE '_b_' <em class="lineannotation"><span class="lineannotation">true</span></em>
60 'abc' LIKE 'c' <em class="lineannotation"><span class="lineannotation">false</span></em>
63 <code class="function">LIKE</code> pattern matching supports nondeterministic
64 collations (see <a class="xref" href="collation.html#COLLATION-NONDETERMINISTIC" title="23.2.2.4. Nondeterministic Collations">Section 23.2.2.4</a>), such as
65 case-insensitive collations or collations that, say, ignore punctuation.
66 So with a case-insensitive collation, one could have:
67 </p><pre class="programlisting">
68 'AbC' LIKE 'abc' COLLATE case_insensitive <em class="lineannotation"><span class="lineannotation">true</span></em>
69 'AbC' LIKE 'a%' COLLATE case_insensitive <em class="lineannotation"><span class="lineannotation">true</span></em>
71 With collations that ignore certain characters or in general that consider
72 strings of different lengths equal, the semantics can become a bit more
73 complicated. Consider these examples:
74 </p><pre class="programlisting">
75 '.foo.' LIKE 'foo' COLLATE ign_punct <em class="lineannotation"><span class="lineannotation">true</span></em>
76 '.foo.' LIKE 'f_o' COLLATE ign_punct <em class="lineannotation"><span class="lineannotation">true</span></em>
77 '.foo.' LIKE '_oo' COLLATE ign_punct <em class="lineannotation"><span class="lineannotation">false</span></em>
79 The way the matching works is that the pattern is partitioned into
80 sequences of wildcards and non-wildcard strings (wildcards being
81 <code class="literal">_</code> and <code class="literal">%</code>). For example, the pattern
82 <code class="literal">f_o</code> is partitioned into <code class="literal">f, _, o</code>, the
83 pattern <code class="literal">_oo</code> is partitioned into <code class="literal">_,
84 oo</code>. The input string matches the pattern if it can be
85 partitioned in such a way that the wildcards match one character or any
86 number of characters respectively and the non-wildcard partitions are
87 equal under the applicable collation. So for example, <code class="literal">'.foo.'
88 LIKE 'f_o' COLLATE ign_punct</code> is true because one can partition
89 <code class="literal">.foo.</code> into <code class="literal">.f, o, o.</code>, and then
90 <code class="literal">'.f' = 'f' COLLATE ign_punct</code>, <code class="literal">'o'</code>
91 matches the <code class="literal">_</code> wildcard, and <code class="literal">'o.' = 'o' COLLATE
92 ign_punct</code>. But <code class="literal">'.foo.' LIKE '_oo' COLLATE
93 ign_punct</code> is false because <code class="literal">.foo.</code> cannot be
94 partitioned in a way that the first character is any character and the
95 rest of the string compares equal to <code class="literal">oo</code>. (Note that
96 the single-character wildcard always matches exactly one character,
97 independent of the collation. So in this example, the
98 <code class="literal">_</code> would match <code class="literal">.</code>, but then the rest
99 of the input string won't match the rest of the pattern.)
101 <code class="function">LIKE</code> pattern matching always covers the entire
102 string. Therefore, if it's desired to match a sequence anywhere within
103 a string, the pattern must start and end with a percent sign.
105 To match a literal underscore or percent sign without matching
106 other characters, the respective character in
107 <em class="replaceable"><code>pattern</code></em> must be
108 preceded by the escape character. The default escape
109 character is the backslash but a different one can be selected by
110 using the <code class="literal">ESCAPE</code> clause. To match the escape
111 character itself, write two escape characters.
112 </p><div class="note"><h3 class="title">Note</h3><p>
113 If you have <a class="xref" href="runtime-config-compatible.html#GUC-STANDARD-CONFORMING-STRINGS">standard_conforming_strings</a> turned off,
114 any backslashes you write in literal string constants will need to be
115 doubled. See <a class="xref" href="sql-syntax-lexical.html#SQL-SYNTAX-STRINGS" title="4.1.2.1. String Constants">Section 4.1.2.1</a> for more information.
117 It's also possible to select no escape character by writing
118 <code class="literal">ESCAPE ''</code>. This effectively disables the
119 escape mechanism, which makes it impossible to turn off the
120 special meaning of underscore and percent signs in the pattern.
122 According to the SQL standard, omitting <code class="literal">ESCAPE</code>
123 means there is no escape character (rather than defaulting to a
124 backslash), and a zero-length <code class="literal">ESCAPE</code> value is
125 disallowed. <span class="productname">PostgreSQL</span>'s behavior in
126 this regard is therefore slightly nonstandard.
128 The key word <code class="token">ILIKE</code> can be used instead of
129 <code class="token">LIKE</code> to make the match case-insensitive according to the
130 active locale. (But this does not support nondeterministic collations.)
131 This is not in the <acronym class="acronym">SQL</acronym> standard but is a
132 <span class="productname">PostgreSQL</span> extension.
134 The operator <code class="literal">~~</code> is equivalent to
135 <code class="function">LIKE</code>, and <code class="literal">~~*</code> corresponds to
136 <code class="function">ILIKE</code>. There are also
137 <code class="literal">!~~</code> and <code class="literal">!~~*</code> operators that
138 represent <code class="function">NOT LIKE</code> and <code class="function">NOT
139 ILIKE</code>, respectively. All of these operators are
140 <span class="productname">PostgreSQL</span>-specific. You may see these
141 operator names in <code class="command">EXPLAIN</code> output and similar
142 places, since the parser actually translates <code class="function">LIKE</code>
143 et al. to these operators.
145 The phrases <code class="function">LIKE</code>, <code class="function">ILIKE</code>,
146 <code class="function">NOT LIKE</code>, and <code class="function">NOT ILIKE</code> are
147 generally treated as operators
148 in <span class="productname">PostgreSQL</span> syntax; for example they can
149 be used in <em class="replaceable"><code>expression</code></em>
150 <em class="replaceable"><code>operator</code></em> ANY
151 (<em class="replaceable"><code>subquery</code></em>) constructs, although
152 an <code class="literal">ESCAPE</code> clause cannot be included there. In some
153 obscure cases it may be necessary to use the underlying operator names
156 Also see the starts-with operator <code class="literal">^@</code> and the
157 corresponding <code class="function">starts_with()</code> function, which are
158 useful in cases where simply matching the beginning of a string is
160 </p></div><div class="sect2" id="FUNCTIONS-SIMILARTO-REGEXP"><div class="titlepage"><div><div><h3 class="title">9.7.2. <code class="function">SIMILAR TO</code> Regular Expressions <a href="#FUNCTIONS-SIMILARTO-REGEXP" class="id_link">#</a></h3></div></div></div><a id="id-1.5.8.13.8.2" class="indexterm"></a><a id="id-1.5.8.13.8.3" class="indexterm"></a><a id="id-1.5.8.13.8.4" class="indexterm"></a><pre class="synopsis">
161 <em class="replaceable"><code>string</code></em> SIMILAR TO <em class="replaceable"><code>pattern</code></em> [<span class="optional">ESCAPE <em class="replaceable"><code>escape-character</code></em></span>]
162 <em class="replaceable"><code>string</code></em> NOT SIMILAR TO <em class="replaceable"><code>pattern</code></em> [<span class="optional">ESCAPE <em class="replaceable"><code>escape-character</code></em></span>]
164 The <code class="function">SIMILAR TO</code> operator returns true or
165 false depending on whether its pattern matches the given string.
166 It is similar to <code class="function">LIKE</code>, except that it
167 interprets the pattern using the SQL standard's definition of a
168 regular expression. SQL regular expressions are a curious cross
169 between <code class="function">LIKE</code> notation and common (POSIX) regular
172 Like <code class="function">LIKE</code>, the <code class="function">SIMILAR TO</code>
173 operator succeeds only if its pattern matches the entire string;
174 this is unlike common regular expression behavior where the pattern
175 can match any part of the string.
177 <code class="function">LIKE</code>, <code class="function">SIMILAR TO</code> uses
178 <code class="literal">_</code> and <code class="literal">%</code> as wildcard characters denoting
179 any single character and any string, respectively (these are
180 comparable to <code class="literal">.</code> and <code class="literal">.*</code> in POSIX regular
183 In addition to these facilities borrowed from <code class="function">LIKE</code>,
184 <code class="function">SIMILAR TO</code> supports these pattern-matching
185 metacharacters borrowed from POSIX regular expressions:
187 </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
188 <code class="literal">|</code> denotes alternation (either of two alternatives).
189 </p></li><li class="listitem"><p>
190 <code class="literal">*</code> denotes repetition of the previous item zero
192 </p></li><li class="listitem"><p>
193 <code class="literal">+</code> denotes repetition of the previous item one
195 </p></li><li class="listitem"><p>
196 <code class="literal">?</code> denotes repetition of the previous item zero
198 </p></li><li class="listitem"><p>
199 <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">}</code> denotes repetition
200 of the previous item exactly <em class="replaceable"><code>m</code></em> times.
201 </p></li><li class="listitem"><p>
202 <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,}</code> denotes repetition
203 of the previous item <em class="replaceable"><code>m</code></em> or more times.
204 </p></li><li class="listitem"><p>
205 <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,</code><em class="replaceable"><code>n</code></em><code class="literal">}</code>
206 denotes repetition of the previous item at least <em class="replaceable"><code>m</code></em> and
207 not more than <em class="replaceable"><code>n</code></em> times.
208 </p></li><li class="listitem"><p>
209 Parentheses <code class="literal">()</code> can be used to group items into
210 a single logical item.
211 </p></li><li class="listitem"><p>
212 A bracket expression <code class="literal">[...]</code> specifies a character
213 class, just as in POSIX regular expressions.
214 </p></li></ul></div><p>
216 Notice that the period (<code class="literal">.</code>) is not a metacharacter
217 for <code class="function">SIMILAR TO</code>.
219 As with <code class="function">LIKE</code>, a backslash disables the special
220 meaning of any of these metacharacters. A different escape character
221 can be specified with <code class="literal">ESCAPE</code>, or the escape
222 capability can be disabled by writing <code class="literal">ESCAPE ''</code>.
224 According to the SQL standard, omitting <code class="literal">ESCAPE</code>
225 means there is no escape character (rather than defaulting to a
226 backslash), and a zero-length <code class="literal">ESCAPE</code> value is
227 disallowed. <span class="productname">PostgreSQL</span>'s behavior in
228 this regard is therefore slightly nonstandard.
230 Another nonstandard extension is that following the escape character
231 with a letter or digit provides access to the escape sequences
232 defined for POSIX regular expressions; see
233 <a class="xref" href="functions-matching.html#POSIX-CHARACTER-ENTRY-ESCAPES-TABLE" title="Table 9.20. Regular Expression Character-Entry Escapes">Table 9.20</a>,
234 <a class="xref" href="functions-matching.html#POSIX-CLASS-SHORTHAND-ESCAPES-TABLE" title="Table 9.21. Regular Expression Class-Shorthand Escapes">Table 9.21</a>, and
235 <a class="xref" href="functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE" title="Table 9.22. Regular Expression Constraint Escapes">Table 9.22</a> below.
238 </p><pre class="programlisting">
239 'abc' SIMILAR TO 'abc' <em class="lineannotation"><span class="lineannotation">true</span></em>
240 'abc' SIMILAR TO 'a' <em class="lineannotation"><span class="lineannotation">false</span></em>
241 'abc' SIMILAR TO '%(b|d)%' <em class="lineannotation"><span class="lineannotation">true</span></em>
242 'abc' SIMILAR TO '(b|c)%' <em class="lineannotation"><span class="lineannotation">false</span></em>
243 '-abc-' SIMILAR TO '%\mabc\M%' <em class="lineannotation"><span class="lineannotation">true</span></em>
244 'xabcy' SIMILAR TO '%\mabc\M%' <em class="lineannotation"><span class="lineannotation">false</span></em>
247 The <code class="function">substring</code> function with three parameters
248 provides extraction of a substring that matches an SQL
249 regular expression pattern. The function can be written according
250 to standard SQL syntax:
251 </p><pre class="synopsis">
252 substring(<em class="replaceable"><code>string</code></em> similar <em class="replaceable"><code>pattern</code></em> escape <em class="replaceable"><code>escape-character</code></em>)
254 or using the now obsolete SQL:1999 syntax:
255 </p><pre class="synopsis">
256 substring(<em class="replaceable"><code>string</code></em> from <em class="replaceable"><code>pattern</code></em> for <em class="replaceable"><code>escape-character</code></em>)
258 or as a plain three-argument function:
259 </p><pre class="synopsis">
260 substring(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>, <em class="replaceable"><code>escape-character</code></em>)
262 As with <code class="literal">SIMILAR TO</code>, the
263 specified pattern must match the entire data string, or else the
264 function fails and returns null. To indicate the part of the
265 pattern for which the matching data sub-string is of interest,
266 the pattern should contain
267 two occurrences of the escape character followed by a double quote
268 (<code class="literal">"</code>).
269 The text matching the portion of the pattern
270 between these separators is returned when the match is successful.
272 The escape-double-quote separators actually
273 divide <code class="function">substring</code>'s pattern into three independent
274 regular expressions; for example, a vertical bar (<code class="literal">|</code>)
275 in any of the three sections affects only that section. Also, the first
276 and third of these regular expressions are defined to match the smallest
277 possible amount of text, not the largest, when there is any ambiguity
278 about how much of the data string matches which pattern. (In POSIX
279 parlance, the first and third regular expressions are forced to be
282 As an extension to the SQL standard, <span class="productname">PostgreSQL</span>
283 allows there to be just one escape-double-quote separator, in which case
284 the third regular expression is taken as empty; or no separators, in which
285 case the first and third regular expressions are taken as empty.
287 Some examples, with <code class="literal">#"</code> delimiting the return string:
288 </p><pre class="programlisting">
289 substring('foobar' similar '%#"o_b#"%' escape '#') <em class="lineannotation"><span class="lineannotation">oob</span></em>
290 substring('foobar' similar '#"o_b#"%' escape '#') <em class="lineannotation"><span class="lineannotation">NULL</span></em>
292 </p></div><div class="sect2" id="FUNCTIONS-POSIX-REGEXP"><div class="titlepage"><div><div><h3 class="title">9.7.3. <acronym class="acronym">POSIX</acronym> Regular Expressions <a href="#FUNCTIONS-POSIX-REGEXP" class="id_link">#</a></h3></div></div></div><a id="id-1.5.8.13.9.2" class="indexterm"></a><a id="id-1.5.8.13.9.3" class="indexterm"></a><a id="id-1.5.8.13.9.4" class="indexterm"></a><a id="id-1.5.8.13.9.5" class="indexterm"></a><a id="id-1.5.8.13.9.6" class="indexterm"></a><a id="id-1.5.8.13.9.7" class="indexterm"></a><a id="id-1.5.8.13.9.8" class="indexterm"></a><a id="id-1.5.8.13.9.9" class="indexterm"></a><a id="id-1.5.8.13.9.10" class="indexterm"></a><a id="id-1.5.8.13.9.11" class="indexterm"></a><a id="id-1.5.8.13.9.12" class="indexterm"></a><p>
293 <a class="xref" href="functions-matching.html#FUNCTIONS-POSIX-TABLE" title="Table 9.16. Regular Expression Match Operators">Table 9.16</a> lists the available
294 operators for pattern matching using POSIX regular expressions.
295 </p><div class="table" id="FUNCTIONS-POSIX-TABLE"><p class="title"><strong>Table 9.16. Regular Expression Match Operators</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Match Operators" border="1"><colgroup><col /></colgroup><thead><tr><th class="func_table_entry"><p class="func_signature">
303 </p></th></tr></thead><tbody><tr><td class="func_table_entry"><p class="func_signature">
304 <code class="type">text</code> <code class="literal">~</code> <code class="type">text</code>
305 → <code class="returnvalue">boolean</code>
308 String matches regular expression, case sensitively
311 <code class="literal">'thomas' ~ 't.*ma'</code>
312 → <code class="returnvalue">t</code>
313 </p></td></tr><tr><td class="func_table_entry"><p class="func_signature">
314 <code class="type">text</code> <code class="literal">~*</code> <code class="type">text</code>
315 → <code class="returnvalue">boolean</code>
318 String matches regular expression, case-insensitively
321 <code class="literal">'thomas' ~* 'T.*ma'</code>
322 → <code class="returnvalue">t</code>
323 </p></td></tr><tr><td class="func_table_entry"><p class="func_signature">
324 <code class="type">text</code> <code class="literal">!~</code> <code class="type">text</code>
325 → <code class="returnvalue">boolean</code>
328 String does not match regular expression, case sensitively
331 <code class="literal">'thomas' !~ 't.*max'</code>
332 → <code class="returnvalue">t</code>
333 </p></td></tr><tr><td class="func_table_entry"><p class="func_signature">
334 <code class="type">text</code> <code class="literal">!~*</code> <code class="type">text</code>
335 → <code class="returnvalue">boolean</code>
338 String does not match regular expression, case-insensitively
341 <code class="literal">'thomas' !~* 'T.*ma'</code>
342 → <code class="returnvalue">f</code>
343 </p></td></tr></tbody></table></div></div><br class="table-break" /><p>
344 <acronym class="acronym">POSIX</acronym> regular expressions provide a more
345 powerful means for pattern matching than the <code class="function">LIKE</code> and
346 <code class="function">SIMILAR TO</code> operators.
347 Many Unix tools such as <code class="command">egrep</code>,
348 <code class="command">sed</code>, or <code class="command">awk</code> use a pattern
349 matching language that is similar to the one described here.
351 A regular expression is a character sequence that is an
352 abbreviated definition of a set of strings (a <em class="firstterm">regular
353 set</em>). A string is said to match a regular expression
354 if it is a member of the regular set described by the regular
355 expression. As with <code class="function">LIKE</code>, pattern characters
356 match string characters exactly unless they are special characters
357 in the regular expression language — but regular expressions use
358 different special characters than <code class="function">LIKE</code> does.
359 Unlike <code class="function">LIKE</code> patterns, a
360 regular expression is allowed to match anywhere within a string, unless
361 the regular expression is explicitly anchored to the beginning or
365 </p><pre class="programlisting">
366 'abcd' ~ 'bc' <em class="lineannotation"><span class="lineannotation">true</span></em>
367 'abcd' ~ 'a.c' <em class="lineannotation"><span class="lineannotation">true — dot matches any character</span></em>
368 'abcd' ~ 'a.*d' <em class="lineannotation"><span class="lineannotation">true — <code class="literal">*</code> repeats the preceding pattern item</span></em>
369 'abcd' ~ '(b|x)' <em class="lineannotation"><span class="lineannotation">true — <code class="literal">|</code> means OR, parentheses group</span></em>
370 'abcd' ~ '^a' <em class="lineannotation"><span class="lineannotation">true — <code class="literal">^</code> anchors to start of string</span></em>
371 'abcd' ~ '^(b|c)' <em class="lineannotation"><span class="lineannotation">false — would match except for anchoring</span></em>
374 The <acronym class="acronym">POSIX</acronym> pattern language is described in much
375 greater detail below.
377 The <code class="function">substring</code> function with two parameters,
378 <code class="function">substring(<em class="replaceable"><code>string</code></em> from
379 <em class="replaceable"><code>pattern</code></em>)</code>, provides extraction of a
381 that matches a POSIX regular expression pattern. It returns null if
382 there is no match, otherwise the first portion of the text that matched the
383 pattern. But if the pattern contains any parentheses, the portion
384 of the text that matched the first parenthesized subexpression (the
385 one whose left parenthesis comes first) is
386 returned. You can put parentheses around the whole expression
387 if you want to use parentheses within it without triggering this
388 exception. If you need parentheses in the pattern before the
389 subexpression you want to extract, see the non-capturing parentheses
393 </p><pre class="programlisting">
394 substring('foobar' from 'o.b') <em class="lineannotation"><span class="lineannotation">oob</span></em>
395 substring('foobar' from 'o(.)b') <em class="lineannotation"><span class="lineannotation">o</span></em>
398 The <code class="function">regexp_count</code> function counts the number of
399 places where a POSIX regular expression pattern matches a string.
401 <code class="function">regexp_count</code>(<em class="replaceable"><code>string</code></em>,
402 <em class="replaceable"><code>pattern</code></em>
403 [<span class="optional">, <em class="replaceable"><code>start</code></em>
404 [<span class="optional">, <em class="replaceable"><code>flags</code></em>
406 <em class="replaceable"><code>pattern</code></em> is searched for
407 in <em class="replaceable"><code>string</code></em>, normally from the beginning of
408 the string, but if the <em class="replaceable"><code>start</code></em> parameter is
409 provided then beginning from that character index.
410 The <em class="replaceable"><code>flags</code></em> parameter is an optional text
411 string containing zero or more single-letter flags that change the
412 function's behavior. For example, including <code class="literal">i</code> in
413 <em class="replaceable"><code>flags</code></em> specifies case-insensitive matching.
414 Supported flags are described in
415 <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
418 </p><pre class="programlisting">
419 regexp_count('ABCABCAXYaxy', 'A.') <em class="lineannotation"><span class="lineannotation">3</span></em>
420 regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') <em class="lineannotation"><span class="lineannotation">4</span></em>
423 The <code class="function">regexp_instr</code> function returns the starting or
424 ending position of the <em class="replaceable"><code>N</code></em>'th match of a
425 POSIX regular expression pattern to a string, or zero if there is no
426 such match. It has the syntax
427 <code class="function">regexp_instr</code>(<em class="replaceable"><code>string</code></em>,
428 <em class="replaceable"><code>pattern</code></em>
429 [<span class="optional">, <em class="replaceable"><code>start</code></em>
430 [<span class="optional">, <em class="replaceable"><code>N</code></em>
431 [<span class="optional">, <em class="replaceable"><code>endoption</code></em>
432 [<span class="optional">, <em class="replaceable"><code>flags</code></em>
433 [<span class="optional">, <em class="replaceable"><code>subexpr</code></em>
434 </span>]</span>]</span>]</span>]</span>]).
435 <em class="replaceable"><code>pattern</code></em> is searched for
436 in <em class="replaceable"><code>string</code></em>, normally from the beginning of
437 the string, but if the <em class="replaceable"><code>start</code></em> parameter is
438 provided then beginning from that character index.
439 If <em class="replaceable"><code>N</code></em> is specified
440 then the <em class="replaceable"><code>N</code></em>'th match of the pattern
441 is located, otherwise the first match is located.
442 If the <em class="replaceable"><code>endoption</code></em> parameter is omitted or
443 specified as zero, the function returns the position of the first
444 character of the match. Otherwise, <em class="replaceable"><code>endoption</code></em>
445 must be one, and the function returns the position of the character
447 The <em class="replaceable"><code>flags</code></em> parameter is an optional text
448 string containing zero or more single-letter flags that change the
449 function's behavior. Supported flags are described
450 in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
451 For a pattern containing parenthesized
452 subexpressions, <em class="replaceable"><code>subexpr</code></em> is an integer
453 indicating which subexpression is of interest: the result identifies
454 the position of the substring matching that subexpression.
455 Subexpressions are numbered in the order of their leading parentheses.
456 When <em class="replaceable"><code>subexpr</code></em> is omitted or zero, the result
457 identifies the position of the whole match regardless of
458 parenthesized subexpressions.
461 </p><pre class="programlisting">
462 regexp_instr('number of your street, town zip, FR', '[^,]+', 1, 2)
463 <em class="lineannotation"><span class="lineannotation">23</span></em>
464 regexp_instr(string=>'ABCDEFGHI', pattern=>'(c..)(...)', start=>1, "N"=>1, endoption=>0, flags=>'i', subexpr=>2)
465 <em class="lineannotation"><span class="lineannotation">6</span></em>
468 The <code class="function">regexp_like</code> function checks whether a match
469 of a POSIX regular expression pattern occurs within a string,
470 returning boolean true or false. It has the syntax
471 <code class="function">regexp_like</code>(<em class="replaceable"><code>string</code></em>,
472 <em class="replaceable"><code>pattern</code></em>
473 [<span class="optional">, <em class="replaceable"><code>flags</code></em> </span>]).
474 The <em class="replaceable"><code>flags</code></em> parameter is an optional text
475 string containing zero or more single-letter flags that change the
476 function's behavior. Supported flags are described
477 in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
478 This function has the same results as the <code class="literal">~</code>
479 operator if no flags are specified. If only the <code class="literal">i</code>
480 flag is specified, it has the same results as
481 the <code class="literal">~*</code> operator.
484 </p><pre class="programlisting">
485 regexp_like('Hello World', 'world') <em class="lineannotation"><span class="lineannotation">false</span></em>
486 regexp_like('Hello World', 'world', 'i') <em class="lineannotation"><span class="lineannotation">true</span></em>
489 The <code class="function">regexp_match</code> function returns a text array of
490 matching substring(s) within the first match of a POSIX
491 regular expression pattern to a string. It has the syntax
492 <code class="function">regexp_match</code>(<em class="replaceable"><code>string</code></em>,
493 <em class="replaceable"><code>pattern</code></em> [<span class="optional">, <em class="replaceable"><code>flags</code></em> </span>]).
494 If there is no match, the result is <code class="literal">NULL</code>.
495 If a match is found, and the <em class="replaceable"><code>pattern</code></em> contains no
496 parenthesized subexpressions, then the result is a single-element text
497 array containing the substring matching the whole pattern.
498 If a match is found, and the <em class="replaceable"><code>pattern</code></em> contains
499 parenthesized subexpressions, then the result is a text array
500 whose <em class="replaceable"><code>n</code></em>'th element is the substring matching
501 the <em class="replaceable"><code>n</code></em>'th parenthesized subexpression of
502 the <em class="replaceable"><code>pattern</code></em> (not counting <span class="quote">“<span class="quote">non-capturing</span>”</span>
503 parentheses; see below for details).
504 The <em class="replaceable"><code>flags</code></em> parameter is an optional text string
505 containing zero or more single-letter flags that change the function's
506 behavior. Supported flags are described
507 in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
510 </p><pre class="programlisting">
511 SELECT regexp_match('foobarbequebaz', 'bar.*que');
517 SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
523 </p><div class="tip"><h3 class="title">Tip</h3><p>
524 In the common case where you just want the whole matching substring
525 or <code class="literal">NULL</code> for no match, the best solution is to
526 use <code class="function">regexp_substr()</code>.
527 However, <code class="function">regexp_substr()</code> only exists
528 in <span class="productname">PostgreSQL</span> version 15 and up. When
529 working in older versions, you can extract the first element
530 of <code class="function">regexp_match()</code>'s result, for example:
531 </p><pre class="programlisting">
532 SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
539 The <code class="function">regexp_matches</code> function returns a set of text arrays
540 of matching substring(s) within matches of a POSIX regular
541 expression pattern to a string. It has the same syntax as
542 <code class="function">regexp_match</code>.
543 This function returns no rows if there is no match, one row if there is
544 a match and the <code class="literal">g</code> flag is not given, or <em class="replaceable"><code>N</code></em>
545 rows if there are <em class="replaceable"><code>N</code></em> matches and the <code class="literal">g</code> flag
546 is given. Each returned row is a text array containing the whole
547 matched substring or the substrings matching parenthesized
548 subexpressions of the <em class="replaceable"><code>pattern</code></em>, just as described above
549 for <code class="function">regexp_match</code>.
550 <code class="function">regexp_matches</code> accepts all the flags shown
551 in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>, plus
552 the <code class="literal">g</code> flag which commands it to return all matches, not
556 </p><pre class="programlisting">
557 SELECT regexp_matches('foo', 'not there');
562 SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
569 </p><div class="tip"><h3 class="title">Tip</h3><p>
570 In most cases <code class="function">regexp_matches()</code> should be used with
571 the <code class="literal">g</code> flag, since if you only want the first match, it's
572 easier and more efficient to use <code class="function">regexp_match()</code>.
573 However, <code class="function">regexp_match()</code> only exists
574 in <span class="productname">PostgreSQL</span> version 10 and up. When working in older
575 versions, a common trick is to place a <code class="function">regexp_matches()</code>
576 call in a sub-select, for example:
577 </p><pre class="programlisting">
578 SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
580 This produces a text array if there's a match, or <code class="literal">NULL</code> if
581 not, the same as <code class="function">regexp_match()</code> would do. Without the
582 sub-select, this query would produce no output at all for table rows
583 without a match, which is typically not the desired behavior.
585 The <code class="function">regexp_replace</code> function provides substitution of
586 new text for substrings that match POSIX regular expression patterns.
588 <code class="function">regexp_replace</code>(<em class="replaceable"><code>string</code></em>,
589 <em class="replaceable"><code>pattern</code></em>, <em class="replaceable"><code>replacement</code></em>
590 [<span class="optional">, <em class="replaceable"><code>flags</code></em> </span>])
592 <code class="function">regexp_replace</code>(<em class="replaceable"><code>string</code></em>,
593 <em class="replaceable"><code>pattern</code></em>, <em class="replaceable"><code>replacement</code></em>,
594 <em class="replaceable"><code>start</code></em>
595 [<span class="optional">, <em class="replaceable"><code>N</code></em>
596 [<span class="optional">, <em class="replaceable"><code>flags</code></em> </span>]</span>]).
597 The source <em class="replaceable"><code>string</code></em> is returned unchanged if
598 there is no match to the <em class="replaceable"><code>pattern</code></em>. If there is a
599 match, the <em class="replaceable"><code>string</code></em> is returned with the
600 <em class="replaceable"><code>replacement</code></em> string substituted for the matching
601 substring. The <em class="replaceable"><code>replacement</code></em> string can contain
602 <code class="literal">\</code><em class="replaceable"><code>n</code></em>, where <em class="replaceable"><code>n</code></em> is 1
603 through 9, to indicate that the source substring matching the
604 <em class="replaceable"><code>n</code></em>'th parenthesized subexpression of the pattern should be
605 inserted, and it can contain <code class="literal">\&</code> to indicate that the
606 substring matching the entire pattern should be inserted. Write
607 <code class="literal">\\</code> if you need to put a literal backslash in the replacement
609 <em class="replaceable"><code>pattern</code></em> is searched for
610 in <em class="replaceable"><code>string</code></em>, normally from the beginning of
611 the string, but if the <em class="replaceable"><code>start</code></em> parameter is
612 provided then beginning from that character index.
613 By default, only the first match of the pattern is replaced.
614 If <em class="replaceable"><code>N</code></em> is specified and is greater than zero,
615 then the <em class="replaceable"><code>N</code></em>'th match of the pattern
617 If the <code class="literal">g</code> flag is given, or
618 if <em class="replaceable"><code>N</code></em> is specified and is zero, then all
619 matches at or after the <em class="replaceable"><code>start</code></em> position are
620 replaced. (The <code class="literal">g</code> flag is ignored
621 when <em class="replaceable"><code>N</code></em> is specified.)
622 The <em class="replaceable"><code>flags</code></em> parameter is an optional text
623 string containing zero or more single-letter flags that change the
624 function's behavior. Supported flags (though
625 not <code class="literal">g</code>) are
626 described in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
629 </p><pre class="programlisting">
630 regexp_replace('foobarbaz', 'b..', 'X')
631 <em class="lineannotation"><span class="lineannotation">fooXbaz</span></em>
632 regexp_replace('foobarbaz', 'b..', 'X', 'g')
633 <em class="lineannotation"><span class="lineannotation">fooXX</span></em>
634 regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
635 <em class="lineannotation"><span class="lineannotation">fooXarYXazY</span></em>
636 regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i')
637 <em class="lineannotation"><span class="lineannotation">X PXstgrXSQL fXnctXXn</span></em>
638 regexp_replace(string=>'A PostgreSQL function', pattern=>'a|e|i|o|u', replacement=>'X', start=>1, "N"=>3, flags=>'i')
639 <em class="lineannotation"><span class="lineannotation">A PostgrXSQL function</span></em>
642 The <code class="function">regexp_split_to_table</code> function splits a string using a POSIX
643 regular expression pattern as a delimiter. It has the syntax
644 <code class="function">regexp_split_to_table</code>(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>
645 [<span class="optional">, <em class="replaceable"><code>flags</code></em> </span>]).
646 If there is no match to the <em class="replaceable"><code>pattern</code></em>, the function returns the
647 <em class="replaceable"><code>string</code></em>. If there is at least one match, for each match it returns
648 the text from the end of the last match (or the beginning of the string)
649 to the beginning of the match. When there are no more matches, it
650 returns the text from the end of the last match to the end of the string.
651 The <em class="replaceable"><code>flags</code></em> parameter is an optional text string containing
652 zero or more single-letter flags that change the function's behavior.
653 <code class="function">regexp_split_to_table</code> supports the flags described in
654 <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
656 The <code class="function">regexp_split_to_array</code> function behaves the same as
657 <code class="function">regexp_split_to_table</code>, except that <code class="function">regexp_split_to_array</code>
658 returns its result as an array of <code class="type">text</code>. It has the syntax
659 <code class="function">regexp_split_to_array</code>(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>
660 [<span class="optional">, <em class="replaceable"><code>flags</code></em> </span>]).
661 The parameters are the same as for <code class="function">regexp_split_to_table</code>.
664 </p><pre class="programlisting">
665 SELECT foo FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', '\s+') AS foo;
679 SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', '\s+');
680 regexp_split_to_array
681 -----------------------------------------------
682 {the,quick,brown,fox,jumps,over,the,lazy,dog}
685 SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
707 As the last example demonstrates, the regexp split functions ignore
708 zero-length matches that occur at the start or end of the string
709 or immediately after a previous match. This is contrary to the strict
710 definition of regexp matching that is implemented by
711 the other regexp functions, but is usually the most convenient behavior
712 in practice. Other software systems such as Perl use similar definitions.
714 The <code class="function">regexp_substr</code> function returns the substring
715 that matches a POSIX regular expression pattern,
716 or <code class="literal">NULL</code> if there is no match. It has the syntax
717 <code class="function">regexp_substr</code>(<em class="replaceable"><code>string</code></em>,
718 <em class="replaceable"><code>pattern</code></em>
719 [<span class="optional">, <em class="replaceable"><code>start</code></em>
720 [<span class="optional">, <em class="replaceable"><code>N</code></em>
721 [<span class="optional">, <em class="replaceable"><code>flags</code></em>
722 [<span class="optional">, <em class="replaceable"><code>subexpr</code></em>
723 </span>]</span>]</span>]</span>]).
724 <em class="replaceable"><code>pattern</code></em> is searched for
725 in <em class="replaceable"><code>string</code></em>, normally from the beginning of
726 the string, but if the <em class="replaceable"><code>start</code></em> parameter is
727 provided then beginning from that character index.
728 If <em class="replaceable"><code>N</code></em> is specified
729 then the <em class="replaceable"><code>N</code></em>'th match of the pattern
730 is returned, otherwise the first match is returned.
731 The <em class="replaceable"><code>flags</code></em> parameter is an optional text
732 string containing zero or more single-letter flags that change the
733 function's behavior. Supported flags are described
734 in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
735 For a pattern containing parenthesized
736 subexpressions, <em class="replaceable"><code>subexpr</code></em> is an integer
737 indicating which subexpression is of interest: the result is the
738 substring matching that subexpression.
739 Subexpressions are numbered in the order of their leading parentheses.
740 When <em class="replaceable"><code>subexpr</code></em> is omitted or zero, the result
741 is the whole match regardless of parenthesized subexpressions.
744 </p><pre class="programlisting">
745 regexp_substr('number of your street, town zip, FR', '[^,]+', 1, 2)
746 <em class="lineannotation"><span class="lineannotation"> town zip</span></em>
747 regexp_substr('ABCDEFGHI', '(c..)(...)', 1, 1, 'i', 2)
748 <em class="lineannotation"><span class="lineannotation">FGH</span></em>
750 </p><div class="sect3" id="POSIX-SYNTAX-DETAILS"><div class="titlepage"><div><div><h4 class="title">9.7.3.1. Regular Expression Details <a href="#POSIX-SYNTAX-DETAILS" class="id_link">#</a></h4></div></div></div><p>
751 <span class="productname">PostgreSQL</span>'s regular expressions are implemented
752 using a software package written by Henry Spencer. Much of
753 the description of regular expressions below is copied verbatim from his
756 Regular expressions (<acronym class="acronym">RE</acronym>s), as defined in
757 <acronym class="acronym">POSIX</acronym> 1003.2, come in two forms:
758 <em class="firstterm">extended</em> <acronym class="acronym">RE</acronym>s or <acronym class="acronym">ERE</acronym>s
759 (roughly those of <code class="command">egrep</code>), and
760 <em class="firstterm">basic</em> <acronym class="acronym">RE</acronym>s or <acronym class="acronym">BRE</acronym>s
761 (roughly those of <code class="command">ed</code>).
762 <span class="productname">PostgreSQL</span> supports both forms, and
763 also implements some extensions
764 that are not in the POSIX standard, but have become widely used
765 due to their availability in programming languages such as Perl and Tcl.
766 <acronym class="acronym">RE</acronym>s using these non-POSIX extensions are called
767 <em class="firstterm">advanced</em> <acronym class="acronym">RE</acronym>s or <acronym class="acronym">ARE</acronym>s
768 in this documentation. AREs are almost an exact superset of EREs,
769 but BREs have several notational incompatibilities (as well as being
771 We first describe the ARE and ERE forms, noting features that apply
772 only to AREs, and then describe how BREs differ.
773 </p><div class="note"><h3 class="title">Note</h3><p>
774 <span class="productname">PostgreSQL</span> always initially presumes that a regular
775 expression follows the ARE rules. However, the more limited ERE or
776 BRE rules can be chosen by prepending an <em class="firstterm">embedded option</em>
777 to the RE pattern, as described in <a class="xref" href="functions-matching.html#POSIX-METASYNTAX" title="9.7.3.4. Regular Expression Metasyntax">Section 9.7.3.4</a>.
778 This can be useful for compatibility with applications that expect
779 exactly the <acronym class="acronym">POSIX</acronym> 1003.2 rules.
781 A regular expression is defined as one or more
782 <em class="firstterm">branches</em>, separated by
783 <code class="literal">|</code>. It matches anything that matches one of the
786 A branch is zero or more <em class="firstterm">quantified atoms</em> or
787 <em class="firstterm">constraints</em>, concatenated.
788 It matches a match for the first, followed by a match for the second, etc.;
789 an empty branch matches the empty string.
791 A quantified atom is an <em class="firstterm">atom</em> possibly followed
792 by a single <em class="firstterm">quantifier</em>.
793 Without a quantifier, it matches a match for the atom.
794 With a quantifier, it can match some number of matches of the atom.
795 An <em class="firstterm">atom</em> can be any of the possibilities
796 shown in <a class="xref" href="functions-matching.html#POSIX-ATOMS-TABLE" title="Table 9.17. Regular Expression Atoms">Table 9.17</a>.
797 The possible quantifiers and their meanings are shown in
798 <a class="xref" href="functions-matching.html#POSIX-QUANTIFIERS-TABLE" title="Table 9.18. Regular Expression Quantifiers">Table 9.18</a>.
800 A <em class="firstterm">constraint</em> matches an empty string, but matches only when
801 specific conditions are met. A constraint can be used where an atom
802 could be used, except it cannot be followed by a quantifier.
803 The simple constraints are shown in
804 <a class="xref" href="functions-matching.html#POSIX-CONSTRAINTS-TABLE" title="Table 9.19. Regular Expression Constraints">Table 9.19</a>;
805 some more constraints are described later.
806 </p><div class="table" id="POSIX-ATOMS-TABLE"><p class="title"><strong>Table 9.17. Regular Expression Atoms</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Atoms" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Atom</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">(</code><em class="replaceable"><code>re</code></em><code class="literal">)</code> </td><td> (where <em class="replaceable"><code>re</code></em> is any regular expression)
808 <em class="replaceable"><code>re</code></em>, with the match noted for possible reporting </td></tr><tr><td> <code class="literal">(?:</code><em class="replaceable"><code>re</code></em><code class="literal">)</code> </td><td> as above, but the match is not noted for reporting
809 (a <span class="quote">“<span class="quote">non-capturing</span>”</span> set of parentheses)
810 (AREs only) </td></tr><tr><td> <code class="literal">.</code> </td><td> matches any single character </td></tr><tr><td> <code class="literal">[</code><em class="replaceable"><code>chars</code></em><code class="literal">]</code> </td><td> a <em class="firstterm">bracket expression</em>,
811 matching any one of the <em class="replaceable"><code>chars</code></em> (see
812 <a class="xref" href="functions-matching.html#POSIX-BRACKET-EXPRESSIONS" title="9.7.3.2. Bracket Expressions">Section 9.7.3.2</a> for more detail) </td></tr><tr><td> <code class="literal">\</code><em class="replaceable"><code>k</code></em> </td><td> (where <em class="replaceable"><code>k</code></em> is a non-alphanumeric character)
813 matches that character taken as an ordinary character,
814 e.g., <code class="literal">\\</code> matches a backslash character </td></tr><tr><td> <code class="literal">\</code><em class="replaceable"><code>c</code></em> </td><td> where <em class="replaceable"><code>c</code></em> is alphanumeric
815 (possibly followed by other characters)
816 is an <em class="firstterm">escape</em>, see <a class="xref" href="functions-matching.html#POSIX-ESCAPE-SEQUENCES" title="9.7.3.3. Regular Expression Escapes">Section 9.7.3.3</a>
817 (AREs only; in EREs and BREs, this matches <em class="replaceable"><code>c</code></em>) </td></tr><tr><td> <code class="literal">{</code> </td><td> when followed by a character other than a digit,
818 matches the left-brace character <code class="literal">{</code>;
819 when followed by a digit, it is the beginning of a
820 <em class="replaceable"><code>bound</code></em> (see below) </td></tr><tr><td> <em class="replaceable"><code>x</code></em> </td><td> where <em class="replaceable"><code>x</code></em> is a single character with no other
821 significance, matches that character </td></tr></tbody></table></div></div><br class="table-break" /><p>
822 An RE cannot end with a backslash (<code class="literal">\</code>).
823 </p><div class="note"><h3 class="title">Note</h3><p>
824 If you have <a class="xref" href="runtime-config-compatible.html#GUC-STANDARD-CONFORMING-STRINGS">standard_conforming_strings</a> turned off,
825 any backslashes you write in literal string constants will need to be
826 doubled. See <a class="xref" href="sql-syntax-lexical.html#SQL-SYNTAX-STRINGS" title="4.1.2.1. String Constants">Section 4.1.2.1</a> for more information.
827 </p></div><div class="table" id="POSIX-QUANTIFIERS-TABLE"><p class="title"><strong>Table 9.18. Regular Expression Quantifiers</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Quantifiers" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Quantifier</th><th>Matches</th></tr></thead><tbody><tr><td> <code class="literal">*</code> </td><td> a sequence of 0 or more matches of the atom </td></tr><tr><td> <code class="literal">+</code> </td><td> a sequence of 1 or more matches of the atom </td></tr><tr><td> <code class="literal">?</code> </td><td> a sequence of 0 or 1 matches of the atom </td></tr><tr><td> <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">}</code> </td><td> a sequence of exactly <em class="replaceable"><code>m</code></em> matches of the atom </td></tr><tr><td> <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,}</code> </td><td> a sequence of <em class="replaceable"><code>m</code></em> or more matches of the atom </td></tr><tr><td>
828 <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,</code><em class="replaceable"><code>n</code></em><code class="literal">}</code> </td><td> a sequence of <em class="replaceable"><code>m</code></em> through <em class="replaceable"><code>n</code></em>
829 (inclusive) matches of the atom; <em class="replaceable"><code>m</code></em> cannot exceed
830 <em class="replaceable"><code>n</code></em> </td></tr><tr><td> <code class="literal">*?</code> </td><td> non-greedy version of <code class="literal">*</code> </td></tr><tr><td> <code class="literal">+?</code> </td><td> non-greedy version of <code class="literal">+</code> </td></tr><tr><td> <code class="literal">??</code> </td><td> non-greedy version of <code class="literal">?</code> </td></tr><tr><td> <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">}?</code> </td><td> non-greedy version of <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">}</code> </td></tr><tr><td> <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,}?</code> </td><td> non-greedy version of <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,}</code> </td></tr><tr><td>
831 <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,</code><em class="replaceable"><code>n</code></em><code class="literal">}?</code> </td><td> non-greedy version of <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,</code><em class="replaceable"><code>n</code></em><code class="literal">}</code> </td></tr></tbody></table></div></div><br class="table-break" /><p>
832 The forms using <code class="literal">{</code><em class="replaceable"><code>...</code></em><code class="literal">}</code>
833 are known as <em class="firstterm">bounds</em>.
834 The numbers <em class="replaceable"><code>m</code></em> and <em class="replaceable"><code>n</code></em> within a bound are
835 unsigned decimal integers with permissible values from 0 to 255 inclusive.
837 <em class="firstterm">Non-greedy</em> quantifiers (available in AREs only) match the
838 same possibilities as their corresponding normal (<em class="firstterm">greedy</em>)
839 counterparts, but prefer the smallest number rather than the largest
841 See <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a> for more detail.
842 </p><div class="note"><h3 class="title">Note</h3><p>
843 A quantifier cannot immediately follow another quantifier, e.g.,
844 <code class="literal">**</code> is invalid.
846 begin an expression or subexpression or follow
847 <code class="literal">^</code> or <code class="literal">|</code>.
848 </p></div><div class="table" id="POSIX-CONSTRAINTS-TABLE"><p class="title"><strong>Table 9.19. Regular Expression Constraints</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Constraints" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Constraint</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">^</code> </td><td> matches at the beginning of the string </td></tr><tr><td> <code class="literal">$</code> </td><td> matches at the end of the string </td></tr><tr><td> <code class="literal">(?=</code><em class="replaceable"><code>re</code></em><code class="literal">)</code> </td><td> <em class="firstterm">positive lookahead</em> matches at any point
849 where a substring matching <em class="replaceable"><code>re</code></em> begins
850 (AREs only) </td></tr><tr><td> <code class="literal">(?!</code><em class="replaceable"><code>re</code></em><code class="literal">)</code> </td><td> <em class="firstterm">negative lookahead</em> matches at any point
851 where no substring matching <em class="replaceable"><code>re</code></em> begins
852 (AREs only) </td></tr><tr><td> <code class="literal">(?<=</code><em class="replaceable"><code>re</code></em><code class="literal">)</code> </td><td> <em class="firstterm">positive lookbehind</em> matches at any point
853 where a substring matching <em class="replaceable"><code>re</code></em> ends
854 (AREs only) </td></tr><tr><td> <code class="literal">(?<!</code><em class="replaceable"><code>re</code></em><code class="literal">)</code> </td><td> <em class="firstterm">negative lookbehind</em> matches at any point
855 where no substring matching <em class="replaceable"><code>re</code></em> ends
856 (AREs only) </td></tr></tbody></table></div></div><br class="table-break" /><p>
857 Lookahead and lookbehind constraints cannot contain <em class="firstterm">back
858 references</em> (see <a class="xref" href="functions-matching.html#POSIX-ESCAPE-SEQUENCES" title="9.7.3.3. Regular Expression Escapes">Section 9.7.3.3</a>),
859 and all parentheses within them are considered non-capturing.
860 </p></div><div class="sect3" id="POSIX-BRACKET-EXPRESSIONS"><div class="titlepage"><div><div><h4 class="title">9.7.3.2. Bracket Expressions <a href="#POSIX-BRACKET-EXPRESSIONS" class="id_link">#</a></h4></div></div></div><p>
861 A <em class="firstterm">bracket expression</em> is a list of
862 characters enclosed in <code class="literal">[]</code>. It normally matches
863 any single character from the list (but see below). If the list
864 begins with <code class="literal">^</code>, it matches any single character
865 <span class="emphasis"><em>not</em></span> from the rest of the list.
867 in the list are separated by <code class="literal">-</code>, this is
868 shorthand for the full range of characters between those two
869 (inclusive) in the collating sequence,
870 e.g., <code class="literal">[0-9]</code> in <acronym class="acronym">ASCII</acronym> matches
871 any decimal digit. It is illegal for two ranges to share an
872 endpoint, e.g., <code class="literal">a-c-e</code>. Ranges are very
873 collating-sequence-dependent, so portable programs should avoid
876 To include a literal <code class="literal">]</code> in the list, make it the
877 first character (after <code class="literal">^</code>, if that is used). To
878 include a literal <code class="literal">-</code>, make it the first or last
879 character, or the second endpoint of a range. To use a literal
880 <code class="literal">-</code> as the first endpoint of a range, enclose it
881 in <code class="literal">[.</code> and <code class="literal">.]</code> to make it a
882 collating element (see below). With the exception of these characters,
883 some combinations using <code class="literal">[</code>
884 (see next paragraphs), and escapes (AREs only), all other special
885 characters lose their special significance within a bracket expression.
886 In particular, <code class="literal">\</code> is not special when following
887 ERE or BRE rules, though it is special (as introducing an escape)
890 Within a bracket expression, a collating element (a character, a
891 multiple-character sequence that collates as if it were a single
892 character, or a collating-sequence name for either) enclosed in
893 <code class="literal">[.</code> and <code class="literal">.]</code> stands for the
894 sequence of characters of that collating element. The sequence is
895 treated as a single element of the bracket expression's list. This
897 expression containing a multiple-character collating element to
898 match more than one character, e.g., if the collating sequence
899 includes a <code class="literal">ch</code> collating element, then the RE
900 <code class="literal">[[.ch.]]*c</code> matches the first five characters of
901 <code class="literal">chchcc</code>.
902 </p><div class="note"><h3 class="title">Note</h3><p>
903 <span class="productname">PostgreSQL</span> currently does not support multi-character collating
904 elements. This information describes possible future behavior.
906 Within a bracket expression, a collating element enclosed in
907 <code class="literal">[=</code> and <code class="literal">=]</code> is an <em class="firstterm">equivalence
908 class</em>, standing for the sequences of characters of all collating
909 elements equivalent to that one, including itself. (If there are
910 no other equivalent collating elements, the treatment is as if the
911 enclosing delimiters were <code class="literal">[.</code> and
912 <code class="literal">.]</code>.) For example, if <code class="literal">o</code> and
913 <code class="literal">^</code> are the members of an equivalence class, then
914 <code class="literal">[[=o=]]</code>, <code class="literal">[[=^=]]</code>, and
915 <code class="literal">[o^]</code> are all synonymous. An equivalence class
916 cannot be an endpoint of a range.
918 Within a bracket expression, the name of a character class
919 enclosed in <code class="literal">[:</code> and <code class="literal">:]</code> stands
920 for the list of all characters belonging to that class. A character
921 class cannot be used as an endpoint of a range.
922 The <acronym class="acronym">POSIX</acronym> standard defines these character class
924 <code class="literal">alnum</code> (letters and numeric digits),
925 <code class="literal">alpha</code> (letters),
926 <code class="literal">blank</code> (space and tab),
927 <code class="literal">cntrl</code> (control characters),
928 <code class="literal">digit</code> (numeric digits),
929 <code class="literal">graph</code> (printable characters except space),
930 <code class="literal">lower</code> (lower-case letters),
931 <code class="literal">print</code> (printable characters including space),
932 <code class="literal">punct</code> (punctuation),
933 <code class="literal">space</code> (any white space),
934 <code class="literal">upper</code> (upper-case letters),
935 and <code class="literal">xdigit</code> (hexadecimal digits).
936 The behavior of these standard character classes is generally
937 consistent across platforms for characters in the 7-bit ASCII set.
938 Whether a given non-ASCII character is considered to belong to one
939 of these classes depends on the <em class="firstterm">collation</em>
940 that is used for the regular-expression function or operator
941 (see <a class="xref" href="collation.html" title="23.2. Collation Support">Section 23.2</a>), or by default on the
942 database's <code class="envar">LC_CTYPE</code> locale setting (see
943 <a class="xref" href="locale.html" title="23.1. Locale Support">Section 23.1</a>). The classification of non-ASCII
944 characters can vary across platforms even in similarly-named
945 locales. (But the <code class="literal">C</code> locale never considers any
946 non-ASCII characters to belong to any of these classes.)
947 In addition to these standard character
948 classes, <span class="productname">PostgreSQL</span> defines
949 the <code class="literal">word</code> character class, which is the same as
950 <code class="literal">alnum</code> plus the underscore (<code class="literal">_</code>)
952 the <code class="literal">ascii</code> character class, which contains exactly
955 There are two special cases of bracket expressions: the bracket
956 expressions <code class="literal">[[:<:]]</code> and
957 <code class="literal">[[:>:]]</code> are constraints,
958 matching empty strings at the beginning
959 and end of a word respectively. A word is defined as a sequence
960 of word characters that is neither preceded nor followed by word
961 characters. A word character is any character belonging to the
962 <code class="literal">word</code> character class, that is, any letter, digit,
963 or underscore. This is an extension, compatible with but not
964 specified by <acronym class="acronym">POSIX</acronym> 1003.2, and should be used with
965 caution in software intended to be portable to other systems.
966 The constraint escapes described below are usually preferable; they
967 are no more standard, but are easier to type.
968 </p></div><div class="sect3" id="POSIX-ESCAPE-SEQUENCES"><div class="titlepage"><div><div><h4 class="title">9.7.3.3. Regular Expression Escapes <a href="#POSIX-ESCAPE-SEQUENCES" class="id_link">#</a></h4></div></div></div><p>
969 <em class="firstterm">Escapes</em> are special sequences beginning with <code class="literal">\</code>
970 followed by an alphanumeric character. Escapes come in several varieties:
971 character entry, class shorthands, constraint escapes, and back references.
972 A <code class="literal">\</code> followed by an alphanumeric character but not constituting
973 a valid escape is illegal in AREs.
974 In EREs, there are no escapes: outside a bracket expression,
975 a <code class="literal">\</code> followed by an alphanumeric character merely stands for
976 that character as an ordinary character, and inside a bracket expression,
977 <code class="literal">\</code> is an ordinary character.
978 (The latter is the one actual incompatibility between EREs and AREs.)
980 <em class="firstterm">Character-entry escapes</em> exist to make it easier to specify
981 non-printing and other inconvenient characters in REs. They are
982 shown in <a class="xref" href="functions-matching.html#POSIX-CHARACTER-ENTRY-ESCAPES-TABLE" title="Table 9.20. Regular Expression Character-Entry Escapes">Table 9.20</a>.
984 <em class="firstterm">Class-shorthand escapes</em> provide shorthands for certain
985 commonly-used character classes. They are
986 shown in <a class="xref" href="functions-matching.html#POSIX-CLASS-SHORTHAND-ESCAPES-TABLE" title="Table 9.21. Regular Expression Class-Shorthand Escapes">Table 9.21</a>.
988 A <em class="firstterm">constraint escape</em> is a constraint,
989 matching the empty string if specific conditions are met,
990 written as an escape. They are
991 shown in <a class="xref" href="functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE" title="Table 9.22. Regular Expression Constraint Escapes">Table 9.22</a>.
993 A <em class="firstterm">back reference</em> (<code class="literal">\</code><em class="replaceable"><code>n</code></em>) matches the
994 same string matched by the previous parenthesized subexpression specified
995 by the number <em class="replaceable"><code>n</code></em>
996 (see <a class="xref" href="functions-matching.html#POSIX-CONSTRAINT-BACKREF-TABLE" title="Table 9.23. Regular Expression Back References">Table 9.23</a>). For example,
997 <code class="literal">([bc])\1</code> matches <code class="literal">bb</code> or <code class="literal">cc</code>
998 but not <code class="literal">bc</code> or <code class="literal">cb</code>.
999 The subexpression must entirely precede the back reference in the RE.
1000 Subexpressions are numbered in the order of their leading parentheses.
1001 Non-capturing parentheses do not define subexpressions.
1002 The back reference considers only the string characters matched by the
1003 referenced subexpression, not any constraints contained in it. For
1004 example, <code class="literal">(^\d)\1</code> will match <code class="literal">22</code>.
1005 </p><div class="table" id="POSIX-CHARACTER-ENTRY-ESCAPES-TABLE"><p class="title"><strong>Table 9.20. Regular Expression Character-Entry Escapes</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Character-Entry Escapes" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Escape</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">\a</code> </td><td> alert (bell) character, as in C </td></tr><tr><td> <code class="literal">\b</code> </td><td> backspace, as in C </td></tr><tr><td> <code class="literal">\B</code> </td><td> synonym for backslash (<code class="literal">\</code>) to help reduce the need for backslash
1006 doubling </td></tr><tr><td> <code class="literal">\c</code><em class="replaceable"><code>X</code></em> </td><td> (where <em class="replaceable"><code>X</code></em> is any character) the character whose
1007 low-order 5 bits are the same as those of
1008 <em class="replaceable"><code>X</code></em>, and whose other bits are all zero </td></tr><tr><td> <code class="literal">\e</code> </td><td> the character whose collating-sequence name
1009 is <code class="literal">ESC</code>,
1010 or failing that, the character with octal value <code class="literal">033</code> </td></tr><tr><td> <code class="literal">\f</code> </td><td> form feed, as in C </td></tr><tr><td> <code class="literal">\n</code> </td><td> newline, as in C </td></tr><tr><td> <code class="literal">\r</code> </td><td> carriage return, as in C </td></tr><tr><td> <code class="literal">\t</code> </td><td> horizontal tab, as in C </td></tr><tr><td> <code class="literal">\u</code><em class="replaceable"><code>wxyz</code></em> </td><td> (where <em class="replaceable"><code>wxyz</code></em> is exactly four hexadecimal digits)
1011 the character whose hexadecimal value is
1012 <code class="literal">0x</code><em class="replaceable"><code>wxyz</code></em>
1013 </td></tr><tr><td> <code class="literal">\U</code><em class="replaceable"><code>stuvwxyz</code></em> </td><td> (where <em class="replaceable"><code>stuvwxyz</code></em> is exactly eight hexadecimal
1015 the character whose hexadecimal value is
1016 <code class="literal">0x</code><em class="replaceable"><code>stuvwxyz</code></em>
1017 </td></tr><tr><td> <code class="literal">\v</code> </td><td> vertical tab, as in C </td></tr><tr><td> <code class="literal">\x</code><em class="replaceable"><code>hhh</code></em> </td><td> (where <em class="replaceable"><code>hhh</code></em> is any sequence of hexadecimal
1019 the character whose hexadecimal value is
1020 <code class="literal">0x</code><em class="replaceable"><code>hhh</code></em>
1021 (a single character no matter how many hexadecimal digits are used)
1022 </td></tr><tr><td> <code class="literal">\0</code> </td><td> the character whose value is <code class="literal">0</code> (the null byte)</td></tr><tr><td> <code class="literal">\</code><em class="replaceable"><code>xy</code></em> </td><td> (where <em class="replaceable"><code>xy</code></em> is exactly two octal digits,
1023 and is not a <em class="firstterm">back reference</em>)
1024 the character whose octal value is
1025 <code class="literal">0</code><em class="replaceable"><code>xy</code></em> </td></tr><tr><td> <code class="literal">\</code><em class="replaceable"><code>xyz</code></em> </td><td> (where <em class="replaceable"><code>xyz</code></em> is exactly three octal digits,
1026 and is not a <em class="firstterm">back reference</em>)
1027 the character whose octal value is
1028 <code class="literal">0</code><em class="replaceable"><code>xyz</code></em> </td></tr></tbody></table></div></div><br class="table-break" /><p>
1029 Hexadecimal digits are <code class="literal">0</code>-<code class="literal">9</code>,
1030 <code class="literal">a</code>-<code class="literal">f</code>, and <code class="literal">A</code>-<code class="literal">F</code>.
1031 Octal digits are <code class="literal">0</code>-<code class="literal">7</code>.
1033 Numeric character-entry escapes specifying values outside the ASCII range
1034 (0–127) have meanings dependent on the database encoding. When the
1035 encoding is UTF-8, escape values are equivalent to Unicode code points,
1036 for example <code class="literal">\u1234</code> means the character <code class="literal">U+1234</code>.
1037 For other multibyte encodings, character-entry escapes usually just
1038 specify the concatenation of the byte values for the character. If the
1039 escape value does not correspond to any legal character in the database
1040 encoding, no error will be raised, but it will never match any data.
1042 The character-entry escapes are always taken as ordinary characters.
1043 For example, <code class="literal">\135</code> is <code class="literal">]</code> in ASCII, but
1044 <code class="literal">\135</code> does not terminate a bracket expression.
1045 </p><div class="table" id="POSIX-CLASS-SHORTHAND-ESCAPES-TABLE"><p class="title"><strong>Table 9.21. Regular Expression Class-Shorthand Escapes</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Class-Shorthand Escapes" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Escape</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">\d</code> </td><td> matches any digit, like
1046 <code class="literal">[[:digit:]]</code> </td></tr><tr><td> <code class="literal">\s</code> </td><td> matches any whitespace character, like
1047 <code class="literal">[[:space:]]</code> </td></tr><tr><td> <code class="literal">\w</code> </td><td> matches any word character, like
1048 <code class="literal">[[:word:]]</code> </td></tr><tr><td> <code class="literal">\D</code> </td><td> matches any non-digit, like
1049 <code class="literal">[^[:digit:]]</code> </td></tr><tr><td> <code class="literal">\S</code> </td><td> matches any non-whitespace character, like
1050 <code class="literal">[^[:space:]]</code> </td></tr><tr><td> <code class="literal">\W</code> </td><td> matches any non-word character, like
1051 <code class="literal">[^[:word:]]</code> </td></tr></tbody></table></div></div><br class="table-break" /><p>
1052 The class-shorthand escapes also work within bracket expressions,
1053 although the definitions shown above are not quite syntactically
1054 valid in that context.
1055 For example, <code class="literal">[a-c\d]</code> is equivalent to
1056 <code class="literal">[a-c[:digit:]]</code>.
1057 </p><div class="table" id="POSIX-CONSTRAINT-ESCAPES-TABLE"><p class="title"><strong>Table 9.22. Regular Expression Constraint Escapes</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Constraint Escapes" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Escape</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">\A</code> </td><td> matches only at the beginning of the string
1058 (see <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a> for how this differs from
1059 <code class="literal">^</code>) </td></tr><tr><td> <code class="literal">\m</code> </td><td> matches only at the beginning of a word </td></tr><tr><td> <code class="literal">\M</code> </td><td> matches only at the end of a word </td></tr><tr><td> <code class="literal">\y</code> </td><td> matches only at the beginning or end of a word </td></tr><tr><td> <code class="literal">\Y</code> </td><td> matches only at a point that is not the beginning or end of a
1060 word </td></tr><tr><td> <code class="literal">\Z</code> </td><td> matches only at the end of the string
1061 (see <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a> for how this differs from
1062 <code class="literal">$</code>) </td></tr></tbody></table></div></div><br class="table-break" /><p>
1063 A word is defined as in the specification of
1064 <code class="literal">[[:<:]]</code> and <code class="literal">[[:>:]]</code> above.
1065 Constraint escapes are illegal within bracket expressions.
1066 </p><div class="table" id="POSIX-CONSTRAINT-BACKREF-TABLE"><p class="title"><strong>Table 9.23. Regular Expression Back References</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Back References" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Escape</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">\</code><em class="replaceable"><code>m</code></em> </td><td> (where <em class="replaceable"><code>m</code></em> is a nonzero digit)
1067 a back reference to the <em class="replaceable"><code>m</code></em>'th subexpression </td></tr><tr><td> <code class="literal">\</code><em class="replaceable"><code>mnn</code></em> </td><td> (where <em class="replaceable"><code>m</code></em> is a nonzero digit, and
1068 <em class="replaceable"><code>nn</code></em> is some more digits, and the decimal value
1069 <em class="replaceable"><code>mnn</code></em> is not greater than the number of closing capturing
1070 parentheses seen so far)
1071 a back reference to the <em class="replaceable"><code>mnn</code></em>'th subexpression </td></tr></tbody></table></div></div><br class="table-break" /><div class="note"><h3 class="title">Note</h3><p>
1072 There is an inherent ambiguity between octal character-entry
1073 escapes and back references, which is resolved by the following heuristics,
1075 A leading zero always indicates an octal escape.
1076 A single non-zero digit, not followed by another digit,
1077 is always taken as a back reference.
1078 A multi-digit sequence not starting with a zero is taken as a back
1079 reference if it comes after a suitable subexpression
1080 (i.e., the number is in the legal range for a back reference),
1081 and otherwise is taken as octal.
1082 </p></div></div><div class="sect3" id="POSIX-METASYNTAX"><div class="titlepage"><div><div><h4 class="title">9.7.3.4. Regular Expression Metasyntax <a href="#POSIX-METASYNTAX" class="id_link">#</a></h4></div></div></div><p>
1083 In addition to the main syntax described above, there are some special
1084 forms and miscellaneous syntactic facilities available.
1086 An RE can begin with one of two special <em class="firstterm">director</em> prefixes.
1087 If an RE begins with <code class="literal">***:</code>,
1088 the rest of the RE is taken as an ARE. (This normally has no effect in
1089 <span class="productname">PostgreSQL</span>, since REs are assumed to be AREs;
1090 but it does have an effect if ERE or BRE mode had been specified by
1091 the <em class="replaceable"><code>flags</code></em> parameter to a regex function.)
1092 If an RE begins with <code class="literal">***=</code>,
1093 the rest of the RE is taken to be a literal string,
1094 with all characters considered ordinary characters.
1096 An ARE can begin with <em class="firstterm">embedded options</em>:
1097 a sequence <code class="literal">(?</code><em class="replaceable"><code>xyz</code></em><code class="literal">)</code>
1098 (where <em class="replaceable"><code>xyz</code></em> is one or more alphabetic characters)
1099 specifies options affecting the rest of the RE.
1100 These options override any previously determined options —
1101 in particular, they can override the case-sensitivity behavior implied by
1102 a regex operator, or the <em class="replaceable"><code>flags</code></em> parameter to a regex
1104 The available option letters are
1105 shown in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
1106 Note that these same option letters are used in the <em class="replaceable"><code>flags</code></em>
1107 parameters of regex functions.
1108 </p><div class="table" id="POSIX-EMBEDDED-OPTIONS-TABLE"><p class="title"><strong>Table 9.24. ARE Embedded-Option Letters</strong></p><div class="table-contents"><table class="table" summary="ARE Embedded-Option Letters" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Option</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">b</code> </td><td> rest of RE is a BRE </td></tr><tr><td> <code class="literal">c</code> </td><td> case-sensitive matching (overrides operator type) </td></tr><tr><td> <code class="literal">e</code> </td><td> rest of RE is an ERE </td></tr><tr><td> <code class="literal">i</code> </td><td> case-insensitive matching (see
1109 <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a>) (overrides operator type) </td></tr><tr><td> <code class="literal">m</code> </td><td> historical synonym for <code class="literal">n</code> </td></tr><tr><td> <code class="literal">n</code> </td><td> newline-sensitive matching (see
1110 <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a>) </td></tr><tr><td> <code class="literal">p</code> </td><td> partial newline-sensitive matching (see
1111 <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a>) </td></tr><tr><td> <code class="literal">q</code> </td><td> rest of RE is a literal (<span class="quote">“<span class="quote">quoted</span>”</span>) string, all ordinary
1112 characters </td></tr><tr><td> <code class="literal">s</code> </td><td> non-newline-sensitive matching (default) </td></tr><tr><td> <code class="literal">t</code> </td><td> tight syntax (default; see below) </td></tr><tr><td> <code class="literal">w</code> </td><td> inverse partial newline-sensitive (<span class="quote">“<span class="quote">weird</span>”</span>) matching
1113 (see <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a>) </td></tr><tr><td> <code class="literal">x</code> </td><td> expanded syntax (see below) </td></tr></tbody></table></div></div><br class="table-break" /><p>
1114 Embedded options take effect at the <code class="literal">)</code> terminating the sequence.
1115 They can appear only at the start of an ARE (after the
1116 <code class="literal">***:</code> director if any).
1118 In addition to the usual (<em class="firstterm">tight</em>) RE syntax, in which all
1119 characters are significant, there is an <em class="firstterm">expanded</em> syntax,
1120 available by specifying the embedded <code class="literal">x</code> option.
1121 In the expanded syntax,
1122 white-space characters in the RE are ignored, as are
1123 all characters between a <code class="literal">#</code>
1124 and the following newline (or the end of the RE). This
1125 permits paragraphing and commenting a complex RE.
1126 There are three exceptions to that basic rule:
1128 </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
1129 a white-space character or <code class="literal">#</code> preceded by <code class="literal">\</code> is
1131 </p></li><li class="listitem"><p>
1132 white space or <code class="literal">#</code> within a bracket expression is retained
1133 </p></li><li class="listitem"><p>
1134 white space and comments cannot appear within multi-character symbols,
1135 such as <code class="literal">(?:</code>
1136 </p></li></ul></div><p>
1138 For this purpose, white-space characters are blank, tab, newline, and
1139 any character that belongs to the <em class="replaceable"><code>space</code></em> character class.
1141 Finally, in an ARE, outside bracket expressions, the sequence
1142 <code class="literal">(?#</code><em class="replaceable"><code>ttt</code></em><code class="literal">)</code>
1143 (where <em class="replaceable"><code>ttt</code></em> is any text not containing a <code class="literal">)</code>)
1144 is a comment, completely ignored.
1145 Again, this is not allowed between the characters of
1146 multi-character symbols, like <code class="literal">(?:</code>.
1147 Such comments are more a historical artifact than a useful facility,
1148 and their use is deprecated; use the expanded syntax instead.
1150 <span class="emphasis"><em>None</em></span> of these metasyntax extensions is available if
1151 an initial <code class="literal">***=</code> director
1152 has specified that the user's input be treated as a literal string
1153 rather than as an RE.
1154 </p></div><div class="sect3" id="POSIX-MATCHING-RULES"><div class="titlepage"><div><div><h4 class="title">9.7.3.5. Regular Expression Matching Rules <a href="#POSIX-MATCHING-RULES" class="id_link">#</a></h4></div></div></div><p>
1155 In the event that an RE could match more than one substring of a given
1156 string, the RE matches the one starting earliest in the string.
1157 If the RE could match more than one substring starting at that point,
1158 either the longest possible match or the shortest possible match will
1159 be taken, depending on whether the RE is <em class="firstterm">greedy</em> or
1160 <em class="firstterm">non-greedy</em>.
1162 Whether an RE is greedy or not is determined by the following rules:
1163 </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
1164 Most atoms, and all constraints, have no greediness attribute (because
1165 they cannot match variable amounts of text anyway).
1166 </p></li><li class="listitem"><p>
1167 Adding parentheses around an RE does not change its greediness.
1168 </p></li><li class="listitem"><p>
1169 A quantified atom with a fixed-repetition quantifier
1170 (<code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">}</code>
1172 <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">}?</code>)
1173 has the same greediness (possibly none) as the atom itself.
1174 </p></li><li class="listitem"><p>
1175 A quantified atom with other normal quantifiers (including
1176 <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,</code><em class="replaceable"><code>n</code></em><code class="literal">}</code>
1177 with <em class="replaceable"><code>m</code></em> equal to <em class="replaceable"><code>n</code></em>)
1178 is greedy (prefers longest match).
1179 </p></li><li class="listitem"><p>
1180 A quantified atom with a non-greedy quantifier (including
1181 <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,</code><em class="replaceable"><code>n</code></em><code class="literal">}?</code>
1182 with <em class="replaceable"><code>m</code></em> equal to <em class="replaceable"><code>n</code></em>)
1183 is non-greedy (prefers shortest match).
1184 </p></li><li class="listitem"><p>
1185 A branch — that is, an RE that has no top-level
1186 <code class="literal">|</code> operator — has the same greediness as the first
1187 quantified atom in it that has a greediness attribute.
1188 </p></li><li class="listitem"><p>
1189 An RE consisting of two or more branches connected by the
1190 <code class="literal">|</code> operator is always greedy.
1191 </p></li></ul></div><p>
1193 The above rules associate greediness attributes not only with individual
1194 quantified atoms, but with branches and entire REs that contain quantified
1195 atoms. What that means is that the matching is done in such a way that
1196 the branch, or whole RE, matches the longest or shortest possible
1197 substring <span class="emphasis"><em>as a whole</em></span>. Once the length of the entire match
1198 is determined, the part of it that matches any particular subexpression
1199 is determined on the basis of the greediness attribute of that
1200 subexpression, with subexpressions starting earlier in the RE taking
1201 priority over ones starting later.
1203 An example of what this means:
1204 </p><pre class="screen">
1205 SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');
1206 <em class="lineannotation"><span class="lineannotation">Result: </span></em><code class="computeroutput">123</code>
1207 SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
1208 <em class="lineannotation"><span class="lineannotation">Result: </span></em><code class="computeroutput">1</code>
1210 In the first case, the RE as a whole is greedy because <code class="literal">Y*</code>
1211 is greedy. It can match beginning at the <code class="literal">Y</code>, and it matches
1212 the longest possible string starting there, i.e., <code class="literal">Y123</code>.
1213 The output is the parenthesized part of that, or <code class="literal">123</code>.
1214 In the second case, the RE as a whole is non-greedy because <code class="literal">Y*?</code>
1215 is non-greedy. It can match beginning at the <code class="literal">Y</code>, and it matches
1216 the shortest possible string starting there, i.e., <code class="literal">Y1</code>.
1217 The subexpression <code class="literal">[0-9]{1,3}</code> is greedy but it cannot change
1218 the decision as to the overall match length; so it is forced to match
1219 just <code class="literal">1</code>.
1221 In short, when an RE contains both greedy and non-greedy subexpressions,
1222 the total match length is either as long as possible or as short as
1223 possible, according to the attribute assigned to the whole RE. The
1224 attributes assigned to the subexpressions only affect how much of that
1225 match they are allowed to <span class="quote">“<span class="quote">eat</span>”</span> relative to each other.
1227 The quantifiers <code class="literal">{1,1}</code> and <code class="literal">{1,1}?</code>
1228 can be used to force greediness or non-greediness, respectively,
1229 on a subexpression or a whole RE.
1230 This is useful when you need the whole RE to have a greediness attribute
1231 different from what's deduced from its elements. As an example,
1232 suppose that we are trying to separate a string containing some digits
1233 into the digits and the parts before and after them. We might try to
1235 </p><pre class="screen">
1236 SELECT regexp_match('abc01234xyz', '(.*)(\d+)(.*)');
1237 <em class="lineannotation"><span class="lineannotation">Result: </span></em><code class="computeroutput">{abc0123,4,xyz}</code>
1239 That didn't work: the first <code class="literal">.*</code> is greedy so
1240 it <span class="quote">“<span class="quote">eats</span>”</span> as much as it can, leaving the <code class="literal">\d+</code> to
1241 match at the last possible place, the last digit. We might try to fix
1242 that by making it non-greedy:
1243 </p><pre class="screen">
1244 SELECT regexp_match('abc01234xyz', '(.*?)(\d+)(.*)');
1245 <em class="lineannotation"><span class="lineannotation">Result: </span></em><code class="computeroutput">{abc,0,""}</code>
1247 That didn't work either, because now the RE as a whole is non-greedy
1248 and so it ends the overall match as soon as possible. We can get what
1249 we want by forcing the RE as a whole to be greedy:
1250 </p><pre class="screen">
1251 SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
1252 <em class="lineannotation"><span class="lineannotation">Result: </span></em><code class="computeroutput">{abc,01234,xyz}</code>
1254 Controlling the RE's overall greediness separately from its components'
1255 greediness allows great flexibility in handling variable-length patterns.
1257 When deciding what is a longer or shorter match,
1258 match lengths are measured in characters, not collating elements.
1259 An empty string is considered longer than no match at all.
1261 <code class="literal">bb*</code>
1262 matches the three middle characters of <code class="literal">abbbc</code>;
1263 <code class="literal">(week|wee)(night|knights)</code>
1264 matches all ten characters of <code class="literal">weeknights</code>;
1265 when <code class="literal">(.*).*</code>
1266 is matched against <code class="literal">abc</code> the parenthesized subexpression
1267 matches all three characters; and when
1268 <code class="literal">(a*)*</code> is matched against <code class="literal">bc</code>
1269 both the whole RE and the parenthesized
1270 subexpression match an empty string.
1272 If case-independent matching is specified,
1273 the effect is much as if all case distinctions had vanished from the
1275 When an alphabetic that exists in multiple cases appears as an
1276 ordinary character outside a bracket expression, it is effectively
1277 transformed into a bracket expression containing both cases,
1278 e.g., <code class="literal">x</code> becomes <code class="literal">[xX]</code>.
1279 When it appears inside a bracket expression, all case counterparts
1280 of it are added to the bracket expression, e.g.,
1281 <code class="literal">[x]</code> becomes <code class="literal">[xX]</code>
1282 and <code class="literal">[^x]</code> becomes <code class="literal">[^xX]</code>.
1284 If newline-sensitive matching is specified, <code class="literal">.</code>
1285 and bracket expressions using <code class="literal">^</code>
1286 will never match the newline character
1287 (so that matches will not cross lines unless the RE
1288 explicitly includes a newline)
1289 and <code class="literal">^</code> and <code class="literal">$</code>
1290 will match the empty string after and before a newline
1291 respectively, in addition to matching at beginning and end of string
1293 But the ARE escapes <code class="literal">\A</code> and <code class="literal">\Z</code>
1294 continue to match beginning or end of string <span class="emphasis"><em>only</em></span>.
1295 Also, the character class shorthands <code class="literal">\D</code>
1296 and <code class="literal">\W</code> will match a newline regardless of this mode.
1297 (Before <span class="productname">PostgreSQL</span> 14, they did not match
1298 newlines when in newline-sensitive mode.
1299 Write <code class="literal">[^[:digit:]]</code>
1300 or <code class="literal">[^[:word:]]</code> to get the old behavior.)
1302 If partial newline-sensitive matching is specified,
1303 this affects <code class="literal">.</code> and bracket expressions
1304 as with newline-sensitive matching, but not <code class="literal">^</code>
1305 and <code class="literal">$</code>.
1307 If inverse partial newline-sensitive matching is specified,
1308 this affects <code class="literal">^</code> and <code class="literal">$</code>
1309 as with newline-sensitive matching, but not <code class="literal">.</code>
1310 and bracket expressions.
1311 This isn't very useful but is provided for symmetry.
1312 </p></div><div class="sect3" id="POSIX-LIMITS-COMPATIBILITY"><div class="titlepage"><div><div><h4 class="title">9.7.3.6. Limits and Compatibility <a href="#POSIX-LIMITS-COMPATIBILITY" class="id_link">#</a></h4></div></div></div><p>
1313 No particular limit is imposed on the length of REs in this
1314 implementation. However,
1315 programs intended to be highly portable should not employ REs longer
1317 as a POSIX-compliant implementation can refuse to accept such REs.
1319 The only feature of AREs that is actually incompatible with
1320 POSIX EREs is that <code class="literal">\</code> does not lose its special
1321 significance inside bracket expressions.
1322 All other ARE features use syntax which is illegal or has
1323 undefined or unspecified effects in POSIX EREs;
1324 the <code class="literal">***</code> syntax of directors likewise is outside the POSIX
1325 syntax for both BREs and EREs.
1327 Many of the ARE extensions are borrowed from Perl, but some have
1328 been changed to clean them up, and a few Perl extensions are not present.
1329 Incompatibilities of note include <code class="literal">\b</code>, <code class="literal">\B</code>,
1330 the lack of special treatment for a trailing newline,
1331 the addition of complemented bracket expressions to the things
1332 affected by newline-sensitive matching,
1333 the restrictions on parentheses and back references in lookahead/lookbehind
1334 constraints, and the longest/shortest-match (rather than first-match)
1336 </p></div><div class="sect3" id="POSIX-BASIC-REGEXES"><div class="titlepage"><div><div><h4 class="title">9.7.3.7. Basic Regular Expressions <a href="#POSIX-BASIC-REGEXES" class="id_link">#</a></h4></div></div></div><p>
1337 BREs differ from EREs in several respects.
1338 In BREs, <code class="literal">|</code>, <code class="literal">+</code>, and <code class="literal">?</code>
1339 are ordinary characters and there is no equivalent
1340 for their functionality.
1341 The delimiters for bounds are
1342 <code class="literal">\{</code> and <code class="literal">\}</code>,
1343 with <code class="literal">{</code> and <code class="literal">}</code>
1344 by themselves ordinary characters.
1345 The parentheses for nested subexpressions are
1346 <code class="literal">\(</code> and <code class="literal">\)</code>,
1347 with <code class="literal">(</code> and <code class="literal">)</code> by themselves ordinary characters.
1348 <code class="literal">^</code> is an ordinary character except at the beginning of the
1349 RE or the beginning of a parenthesized subexpression,
1350 <code class="literal">$</code> is an ordinary character except at the end of the
1351 RE or the end of a parenthesized subexpression,
1352 and <code class="literal">*</code> is an ordinary character if it appears at the beginning
1353 of the RE or the beginning of a parenthesized subexpression
1354 (after a possible leading <code class="literal">^</code>).
1355 Finally, single-digit back references are available, and
1356 <code class="literal">\<</code> and <code class="literal">\></code>
1358 <code class="literal">[[:<:]]</code> and <code class="literal">[[:>:]]</code>
1359 respectively; no other escapes are available in BREs.
1360 </p></div><div class="sect3" id="POSIX-VS-XQUERY"><div class="titlepage"><div><div><h4 class="title">9.7.3.8. Differences from SQL Standard and XQuery <a href="#POSIX-VS-XQUERY" class="id_link">#</a></h4></div></div></div><a id="id-1.5.8.13.9.48.2" class="indexterm"></a><a id="id-1.5.8.13.9.48.3" class="indexterm"></a><a id="id-1.5.8.13.9.48.4" class="indexterm"></a><a id="id-1.5.8.13.9.48.5" class="indexterm"></a><a id="id-1.5.8.13.9.48.6" class="indexterm"></a><a id="id-1.5.8.13.9.48.7" class="indexterm"></a><p>
1361 Since SQL:2008, the SQL standard includes regular expression operators
1362 and functions that performs pattern
1363 matching according to the XQuery regular expression
1365 </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="literal">LIKE_REGEX</code></p></li><li class="listitem"><p><code class="literal">OCCURRENCES_REGEX</code></p></li><li class="listitem"><p><code class="literal">POSITION_REGEX</code></p></li><li class="listitem"><p><code class="literal">SUBSTRING_REGEX</code></p></li><li class="listitem"><p><code class="literal">TRANSLATE_REGEX</code></p></li></ul></div><p>
1366 <span class="productname">PostgreSQL</span> does not currently implement these
1367 operators and functions. You can get approximately equivalent
1368 functionality in each case as shown in <a class="xref" href="functions-matching.html#FUNCTIONS-REGEXP-SQL-TABLE" title="Table 9.25. Regular Expression Functions Equivalencies">Table 9.25</a>. (Various optional clauses on
1369 both sides have been omitted in this table.)
1370 </p><div class="table" id="FUNCTIONS-REGEXP-SQL-TABLE"><p class="title"><strong>Table 9.25. Regular Expression Functions Equivalencies</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Functions Equivalencies" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>SQL standard</th><th><span class="productname">PostgreSQL</span></th></tr></thead><tbody><tr><td><code class="literal"><em class="replaceable"><code>string</code></em> LIKE_REGEX <em class="replaceable"><code>pattern</code></em></code></td><td><code class="literal">regexp_like(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>)</code> or <code class="literal"><em class="replaceable"><code>string</code></em> ~ <em class="replaceable"><code>pattern</code></em></code></td></tr><tr><td><code class="literal">OCCURRENCES_REGEX(<em class="replaceable"><code>pattern</code></em> IN <em class="replaceable"><code>string</code></em>)</code></td><td><code class="literal">regexp_count(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>)</code></td></tr><tr><td><code class="literal">POSITION_REGEX(<em class="replaceable"><code>pattern</code></em> IN <em class="replaceable"><code>string</code></em>)</code></td><td><code class="literal">regexp_instr(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>)</code></td></tr><tr><td><code class="literal">SUBSTRING_REGEX(<em class="replaceable"><code>pattern</code></em> IN <em class="replaceable"><code>string</code></em>)</code></td><td><code class="literal">regexp_substr(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>)</code></td></tr><tr><td><code class="literal">TRANSLATE_REGEX(<em class="replaceable"><code>pattern</code></em> IN <em class="replaceable"><code>string</code></em> WITH <em class="replaceable"><code>replacement</code></em>)</code></td><td><code class="literal">regexp_replace(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>, <em class="replaceable"><code>replacement</code></em>)</code></td></tr></tbody></table></div></div><br class="table-break" /><p>
1371 Regular expression functions similar to those provided by PostgreSQL are
1372 also available in a number of other SQL implementations, whereas the
1373 SQL-standard functions are not as widely implemented. Some of the
1374 details of the regular expression syntax will likely differ in each
1377 The SQL-standard operators and functions use XQuery regular expressions,
1378 which are quite close to the ARE syntax described above.
1379 Notable differences between the existing POSIX-based
1380 regular-expression feature and XQuery regular expressions include:
1382 </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
1383 XQuery character class subtraction is not supported. An example of
1384 this feature is using the following to match only English
1385 consonants: <code class="literal">[a-z-[aeiou]]</code>.
1386 </p></li><li class="listitem"><p>
1387 XQuery character class shorthands <code class="literal">\c</code>,
1388 <code class="literal">\C</code>, <code class="literal">\i</code>,
1389 and <code class="literal">\I</code> are not supported.
1390 </p></li><li class="listitem"><p>
1391 XQuery character class elements
1392 using <code class="literal">\p{UnicodeProperty}</code> or the
1393 inverse <code class="literal">\P{UnicodeProperty}</code> are not supported.
1394 </p></li><li class="listitem"><p>
1395 POSIX interprets character classes such as <code class="literal">\w</code>
1396 (see <a class="xref" href="functions-matching.html#POSIX-CLASS-SHORTHAND-ESCAPES-TABLE" title="Table 9.21. Regular Expression Class-Shorthand Escapes">Table 9.21</a>)
1397 according to the prevailing locale (which you can control by
1398 attaching a <code class="literal">COLLATE</code> clause to the operator or
1399 function). XQuery specifies these classes by reference to Unicode
1400 character properties, so equivalent behavior is obtained only with
1401 a locale that follows the Unicode rules.
1402 </p></li><li class="listitem"><p>
1403 The SQL standard (not XQuery itself) attempts to cater for more
1404 variants of <span class="quote">“<span class="quote">newline</span>”</span> than POSIX does. The
1405 newline-sensitive matching options described above consider only
1406 ASCII NL (<code class="literal">\n</code>) to be a newline, but SQL would have
1407 us treat CR (<code class="literal">\r</code>), CRLF (<code class="literal">\r\n</code>)
1408 (a Windows-style newline), and some Unicode-only characters like
1409 LINE SEPARATOR (U+2028) as newlines as well.
1410 Notably, <code class="literal">.</code> and <code class="literal">\s</code> should
1411 count <code class="literal">\r\n</code> as one character not two according to
1413 </p></li><li class="listitem"><p>
1414 Of the character-entry escapes described in
1415 <a class="xref" href="functions-matching.html#POSIX-CHARACTER-ENTRY-ESCAPES-TABLE" title="Table 9.20. Regular Expression Character-Entry Escapes">Table 9.20</a>,
1416 XQuery supports only <code class="literal">\n</code>, <code class="literal">\r</code>,
1417 and <code class="literal">\t</code>.
1418 </p></li><li class="listitem"><p>
1419 XQuery does not support
1420 the <code class="literal">[:<em class="replaceable"><code>name</code></em>:]</code> syntax
1421 for character classes within bracket expressions.
1422 </p></li><li class="listitem"><p>
1423 XQuery does not have lookahead or lookbehind constraints,
1424 nor any of the constraint escapes described in
1425 <a class="xref" href="functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE" title="Table 9.22. Regular Expression Constraint Escapes">Table 9.22</a>.
1426 </p></li><li class="listitem"><p>
1427 The metasyntax forms described in <a class="xref" href="functions-matching.html#POSIX-METASYNTAX" title="9.7.3.4. Regular Expression Metasyntax">Section 9.7.3.4</a>
1428 do not exist in XQuery.
1429 </p></li><li class="listitem"><p>
1430 The regular expression flag letters defined by XQuery are
1431 related to but not the same as the option letters for POSIX
1432 (<a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>). While the
1433 <code class="literal">i</code> and <code class="literal">q</code> options behave the
1434 same, others do not:
1435 </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p>
1436 XQuery's <code class="literal">s</code> (allow dot to match newline)
1437 and <code class="literal">m</code> (allow <code class="literal">^</code>
1438 and <code class="literal">$</code> to match at newlines) flags provide
1439 access to the same behaviors as
1440 POSIX's <code class="literal">n</code>, <code class="literal">p</code>
1441 and <code class="literal">w</code> flags, but they
1442 do <span class="emphasis"><em>not</em></span> match the behavior of
1443 POSIX's <code class="literal">s</code> and <code class="literal">m</code> flags.
1444 Note in particular that dot-matches-newline is the default
1445 behavior in POSIX but not XQuery.
1446 </p></li><li class="listitem"><p>
1447 XQuery's <code class="literal">x</code> (ignore whitespace in pattern) flag
1448 is noticeably different from POSIX's expanded-mode flag.
1449 POSIX's <code class="literal">x</code> flag also
1450 allows <code class="literal">#</code> to begin a comment in the pattern,
1451 and POSIX will not ignore a whitespace character after a
1453 </p></li></ul></div><p>
1454 </p></li></ul></div><p>
1455 </p></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="functions-bitstring.html" title="9.6. Bit String Functions and Operators">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="functions.html" title="Chapter 9. Functions and Operators">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="functions-formatting.html" title="9.8. Data Type Formatting Functions">Next</a></td></tr><tr><td width="40%" align="left" valign="top">9.6. Bit String Functions and Operators </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 9.8. Data Type Formatting Functions</td></tr></table></div></body></html>