begriffs open source - ai-pg/blob - full-docs/html/functions-matching.html

   1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
   2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>9.7. Pattern Matching</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="functions-bitstring.html" title="9.6. Bit String Functions and Operators" /><link rel="next" href="functions-formatting.html" title="9.8. Data Type Formatting Functions" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">9.7. Pattern Matching</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="functions-bitstring.html" title="9.6. Bit String Functions and Operators">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="functions.html" title="Chapter 9. Functions and Operators">Up</a></td><th width="60%" align="center">Chapter 9. Functions and Operators</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="functions-formatting.html" title="9.8. Data Type Formatting Functions">Next</a></td></tr></table><hr /></div><div class="sect1" id="FUNCTIONS-MATCHING"><div class="titlepage"><div><div><h2 class="title" style="clear: both">9.7. Pattern Matching <a href="#FUNCTIONS-MATCHING" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="functions-matching.html#FUNCTIONS-LIKE">9.7.1. <code class="function">LIKE</code></a></span></dt><dt><span class="sect2"><a href="functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP">9.7.2. <code class="function">SIMILAR TO</code> Regular Expressions</a></span></dt><dt><span class="sect2"><a href="functions-matching.html#FUNCTIONS-POSIX-REGEXP">9.7.3. <acronym class="acronym">POSIX</acronym> Regular Expressions</a></span></dt></dl></div><a id="id-1.5.8.13.2" class="indexterm"></a><p>
   3     There are three separate approaches to pattern matching provided
   4     by <span class="productname">PostgreSQL</span>: the traditional
   5     <acronym class="acronym">SQL</acronym> <code class="function">LIKE</code> operator, the
   6     more recent <code class="function">SIMILAR TO</code> operator (added in
   7     SQL:1999), and <acronym class="acronym">POSIX</acronym>-style regular
   8     expressions.  Aside from the basic <span class="quote">“<span class="quote">does this string match
   9     this pattern?</span>”</span> operators, functions are available to extract
  10     or replace matching substrings and to split a string at matching
  11     locations.
  12    </p><div class="tip"><h3 class="title">Tip</h3><p>
  13      If you have pattern matching needs that go beyond this,
  14      consider writing a user-defined function in Perl or Tcl.
  15     </p></div><div class="caution"><h3 class="title">Caution</h3><p>
  16      While most regular-expression searches can be executed very quickly,
  17      regular expressions can be contrived that take arbitrary amounts of
  18      time and memory to process.  Be wary of accepting regular-expression
  19      search patterns from hostile sources.  If you must do so, it is
  20      advisable to impose a statement timeout.
  21     </p><p>
  22      Searches using <code class="function">SIMILAR TO</code> patterns have the same
  23      security hazards, since <code class="function">SIMILAR TO</code> provides many
  24      of the same capabilities as <acronym class="acronym">POSIX</acronym>-style regular
  25      expressions.
  26     </p><p>
  27      <code class="function">LIKE</code> searches, being much simpler than the other
  28      two options, are safer to use with possibly-hostile pattern sources.
  29     </p></div><p>
  30     <code class="function">SIMILAR TO</code> and <acronym class="acronym">POSIX</acronym>-style regular
  31     expressions do not support nondeterministic collations.  If required, use
  32     <code class="function">LIKE</code> or apply a different collation to the expression
  33     to work around this limitation.
  34    </p><div class="sect2" id="FUNCTIONS-LIKE"><div class="titlepage"><div><div><h3 class="title">9.7.1. <code class="function">LIKE</code> <a href="#FUNCTIONS-LIKE" class="id_link">#</a></h3></div></div></div><a id="id-1.5.8.13.7.2" class="indexterm"></a><pre class="synopsis">
  35 <em class="replaceable"><code>string</code></em> LIKE <em class="replaceable"><code>pattern</code></em> [<span class="optional">ESCAPE <em class="replaceable"><code>escape-character</code></em></span>]
  36 <em class="replaceable"><code>string</code></em> NOT LIKE <em class="replaceable"><code>pattern</code></em> [<span class="optional">ESCAPE <em class="replaceable"><code>escape-character</code></em></span>]
  37 </pre><p>
  38      The <code class="function">LIKE</code> expression returns true if the
  39      <em class="replaceable"><code>string</code></em> matches the supplied
  40      <em class="replaceable"><code>pattern</code></em>.  (As
  41      expected, the <code class="function">NOT LIKE</code> expression returns
  42      false if <code class="function">LIKE</code> returns true, and vice versa.
  43      An equivalent expression is
  44      <code class="literal">NOT (<em class="replaceable"><code>string</code></em> LIKE
  45       <em class="replaceable"><code>pattern</code></em>)</code>.)
  46     </p><p>
  47      If <em class="replaceable"><code>pattern</code></em> does not contain percent
  48      signs or underscores, then the pattern only represents the string
  49      itself; in that case <code class="function">LIKE</code> acts like the
  50      equals operator.  An underscore (<code class="literal">_</code>) in
  51      <em class="replaceable"><code>pattern</code></em> stands for (matches) any single
  52      character; a percent sign (<code class="literal">%</code>) matches any sequence
  53      of zero or more characters.
  54     </p><p>
  55     Some examples:
  56 </p><pre class="programlisting">
  57 'abc' LIKE 'abc'    <em class="lineannotation"><span class="lineannotation">true</span></em>
  58 'abc' LIKE 'a%'     <em class="lineannotation"><span class="lineannotation">true</span></em>
  59 'abc' LIKE '_b_'    <em class="lineannotation"><span class="lineannotation">true</span></em>
  60 'abc' LIKE 'c'      <em class="lineannotation"><span class="lineannotation">false</span></em>
  61 </pre><p>
  62    </p><p>
  63     <code class="function">LIKE</code> pattern matching supports nondeterministic
  64     collations (see <a class="xref" href="collation.html#COLLATION-NONDETERMINISTIC" title="23.2.2.4. Nondeterministic Collations">Section 23.2.2.4</a>), such as
  65     case-insensitive collations or collations that, say, ignore punctuation.
  66     So with a case-insensitive collation, one could have:
  67 </p><pre class="programlisting">
  68 'AbC' LIKE 'abc' COLLATE case_insensitive    <em class="lineannotation"><span class="lineannotation">true</span></em>
  69 'AbC' LIKE 'a%' COLLATE case_insensitive     <em class="lineannotation"><span class="lineannotation">true</span></em>
  70 </pre><p>
  71     With collations that ignore certain characters or in general that consider
  72     strings of different lengths equal, the semantics can become a bit more
  73     complicated.  Consider these examples:
  74 </p><pre class="programlisting">
  75 '.foo.' LIKE 'foo' COLLATE ign_punct    <em class="lineannotation"><span class="lineannotation">true</span></em>
  76 '.foo.' LIKE 'f_o' COLLATE ign_punct    <em class="lineannotation"><span class="lineannotation">true</span></em>
  77 '.foo.' LIKE '_oo' COLLATE ign_punct    <em class="lineannotation"><span class="lineannotation">false</span></em>
  78 </pre><p>
  79     The way the matching works is that the pattern is partitioned into
  80     sequences of wildcards and non-wildcard strings (wildcards being
  81     <code class="literal">_</code> and <code class="literal">%</code>).  For example, the pattern
  82     <code class="literal">f_o</code> is partitioned into <code class="literal">f, _, o</code>, the
  83     pattern <code class="literal">_oo</code> is partitioned into <code class="literal">_,
  84     oo</code>.  The input string matches the pattern if it can be
  85     partitioned in such a way that the wildcards match one character or any
  86     number of characters respectively and the non-wildcard partitions are
  87     equal under the applicable collation.  So for example, <code class="literal">'.foo.'
  88     LIKE 'f_o' COLLATE ign_punct</code> is true because one can partition
  89     <code class="literal">.foo.</code> into <code class="literal">.f, o, o.</code>, and then
  90     <code class="literal">'.f' = 'f' COLLATE ign_punct</code>, <code class="literal">'o'</code>
  91     matches the <code class="literal">_</code> wildcard, and <code class="literal">'o.' = 'o' COLLATE
  92     ign_punct</code>.  But <code class="literal">'.foo.' LIKE '_oo' COLLATE
  93     ign_punct</code> is false because <code class="literal">.foo.</code> cannot be
  94     partitioned in a way that the first character is any character and the
  95     rest of the string compares equal to <code class="literal">oo</code>.  (Note that
  96     the single-character wildcard always matches exactly one character,
  97     independent of the collation.  So in this example, the
  98     <code class="literal">_</code> would match <code class="literal">.</code>, but then the rest
  99     of the input string won't match the rest of the pattern.)
 100    </p><p>
 101     <code class="function">LIKE</code> pattern matching always covers the entire
 102     string.  Therefore, if it's desired to match a sequence anywhere within
 103     a string, the pattern must start and end with a percent sign.
 104    </p><p>
 105     To match a literal underscore or percent sign without matching
 106     other characters, the respective character in
 107     <em class="replaceable"><code>pattern</code></em> must be
 108     preceded by the escape character.  The default escape
 109     character is the backslash but a different one can be selected by
 110     using the <code class="literal">ESCAPE</code> clause.  To match the escape
 111     character itself, write two escape characters.
 112    </p><div class="note"><h3 class="title">Note</h3><p>
 113      If you have <a class="xref" href="runtime-config-compatible.html#GUC-STANDARD-CONFORMING-STRINGS">standard_conforming_strings</a> turned off,
 114      any backslashes you write in literal string constants will need to be
 115      doubled.  See <a class="xref" href="sql-syntax-lexical.html#SQL-SYNTAX-STRINGS" title="4.1.2.1. String Constants">Section 4.1.2.1</a> for more information.
 116     </p></div><p>
 117     It's also possible to select no escape character by writing
 118     <code class="literal">ESCAPE ''</code>.  This effectively disables the
 119     escape mechanism, which makes it impossible to turn off the
 120     special meaning of underscore and percent signs in the pattern.
 121    </p><p>
 122     According to the SQL standard, omitting <code class="literal">ESCAPE</code>
 123     means there is no escape character (rather than defaulting to a
 124     backslash), and a zero-length <code class="literal">ESCAPE</code> value is
 125     disallowed.  <span class="productname">PostgreSQL</span>'s behavior in
 126     this regard is therefore slightly nonstandard.
 127    </p><p>
 128     The key word <code class="token">ILIKE</code> can be used instead of
 129     <code class="token">LIKE</code> to make the match case-insensitive according to the
 130     active locale.  (But this does not support nondeterministic collations.)
 131     This is not in the <acronym class="acronym">SQL</acronym> standard but is a
 132     <span class="productname">PostgreSQL</span> extension.
 133    </p><p>
 134     The operator <code class="literal">~~</code> is equivalent to
 135     <code class="function">LIKE</code>, and <code class="literal">~~*</code> corresponds to
 136     <code class="function">ILIKE</code>.  There are also
 137     <code class="literal">!~~</code> and <code class="literal">!~~*</code> operators that
 138     represent <code class="function">NOT LIKE</code> and <code class="function">NOT
 139     ILIKE</code>, respectively.  All of these operators are
 140     <span class="productname">PostgreSQL</span>-specific.  You may see these
 141     operator names in <code class="command">EXPLAIN</code> output and similar
 142     places, since the parser actually translates <code class="function">LIKE</code>
 143     et al. to these operators.
 144    </p><p>
 145     The phrases <code class="function">LIKE</code>, <code class="function">ILIKE</code>,
 146     <code class="function">NOT LIKE</code>, and <code class="function">NOT ILIKE</code> are
 147     generally treated as operators
 148     in <span class="productname">PostgreSQL</span> syntax; for example they can
 149     be used in <em class="replaceable"><code>expression</code></em>
 150     <em class="replaceable"><code>operator</code></em> ANY
 151     (<em class="replaceable"><code>subquery</code></em>) constructs, although
 152     an <code class="literal">ESCAPE</code> clause cannot be included there.  In some
 153     obscure cases it may be necessary to use the underlying operator names
 154     instead.
 155    </p><p>
 156     Also see the starts-with operator <code class="literal">^@</code> and the
 157     corresponding <code class="function">starts_with()</code> function, which are
 158     useful in cases where simply matching the beginning of a string is
 159     needed.
 160    </p></div><div class="sect2" id="FUNCTIONS-SIMILARTO-REGEXP"><div class="titlepage"><div><div><h3 class="title">9.7.2. <code class="function">SIMILAR TO</code> Regular Expressions <a href="#FUNCTIONS-SIMILARTO-REGEXP" class="id_link">#</a></h3></div></div></div><a id="id-1.5.8.13.8.2" class="indexterm"></a><a id="id-1.5.8.13.8.3" class="indexterm"></a><a id="id-1.5.8.13.8.4" class="indexterm"></a><pre class="synopsis">
 161 <em class="replaceable"><code>string</code></em> SIMILAR TO <em class="replaceable"><code>pattern</code></em> [<span class="optional">ESCAPE <em class="replaceable"><code>escape-character</code></em></span>]
 162 <em class="replaceable"><code>string</code></em> NOT SIMILAR TO <em class="replaceable"><code>pattern</code></em> [<span class="optional">ESCAPE <em class="replaceable"><code>escape-character</code></em></span>]
 163 </pre><p>
 164     The <code class="function">SIMILAR TO</code> operator returns true or
 165     false depending on whether its pattern matches the given string.
 166     It is similar to <code class="function">LIKE</code>, except that it
 167     interprets the pattern using the SQL standard's definition of a
 168     regular expression.  SQL regular expressions are a curious cross
 169     between <code class="function">LIKE</code> notation and common (POSIX) regular
 170     expression notation.
 171    </p><p>
 172     Like <code class="function">LIKE</code>, the <code class="function">SIMILAR TO</code>
 173     operator succeeds only if its pattern matches the entire string;
 174     this is unlike common regular expression behavior where the pattern
 175     can match any part of the string.
 176     Also like
 177     <code class="function">LIKE</code>, <code class="function">SIMILAR TO</code> uses
 178     <code class="literal">_</code> and <code class="literal">%</code> as wildcard characters denoting
 179     any single character and any string, respectively (these are
 180     comparable to <code class="literal">.</code> and <code class="literal">.*</code> in POSIX regular
 181     expressions).
 182    </p><p>
 183     In addition to these facilities borrowed from <code class="function">LIKE</code>,
 184     <code class="function">SIMILAR TO</code> supports these pattern-matching
 185     metacharacters borrowed from POSIX regular expressions:
 186
 187    </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
 188       <code class="literal">|</code> denotes alternation (either of two alternatives).
 189      </p></li><li class="listitem"><p>
 190       <code class="literal">*</code> denotes repetition of the previous item zero
 191       or more times.
 192      </p></li><li class="listitem"><p>
 193       <code class="literal">+</code> denotes repetition of the previous item one
 194       or more times.
 195      </p></li><li class="listitem"><p>
 196       <code class="literal">?</code> denotes repetition of the previous item zero
 197       or one time.
 198      </p></li><li class="listitem"><p>
 199       <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">}</code> denotes repetition
 200       of the previous item exactly <em class="replaceable"><code>m</code></em> times.
 201      </p></li><li class="listitem"><p>
 202       <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,}</code> denotes repetition
 203       of the previous item <em class="replaceable"><code>m</code></em> or more times.
 204      </p></li><li class="listitem"><p>
 205       <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,</code><em class="replaceable"><code>n</code></em><code class="literal">}</code>
 206       denotes repetition of the previous item at least <em class="replaceable"><code>m</code></em> and
 207       not more than <em class="replaceable"><code>n</code></em> times.
 208      </p></li><li class="listitem"><p>
 209       Parentheses <code class="literal">()</code> can be used to group items into
 210       a single logical item.
 211      </p></li><li class="listitem"><p>
 212       A bracket expression <code class="literal">[...]</code> specifies a character
 213       class, just as in POSIX regular expressions.
 214      </p></li></ul></div><p>
 215
 216     Notice that the period (<code class="literal">.</code>) is not a metacharacter
 217     for <code class="function">SIMILAR TO</code>.
 218    </p><p>
 219     As with <code class="function">LIKE</code>, a backslash disables the special
 220     meaning of any of these metacharacters.  A different escape character
 221     can be specified with <code class="literal">ESCAPE</code>, or the escape
 222     capability can be disabled by writing <code class="literal">ESCAPE ''</code>.
 223    </p><p>
 224     According to the SQL standard, omitting <code class="literal">ESCAPE</code>
 225     means there is no escape character (rather than defaulting to a
 226     backslash), and a zero-length <code class="literal">ESCAPE</code> value is
 227     disallowed.  <span class="productname">PostgreSQL</span>'s behavior in
 228     this regard is therefore slightly nonstandard.
 229    </p><p>
 230     Another nonstandard extension is that following the escape character
 231     with a letter or digit provides access to the escape sequences
 232     defined for POSIX regular expressions; see
 233     <a class="xref" href="functions-matching.html#POSIX-CHARACTER-ENTRY-ESCAPES-TABLE" title="Table 9.20. Regular Expression Character-Entry Escapes">Table 9.20</a>,
 234     <a class="xref" href="functions-matching.html#POSIX-CLASS-SHORTHAND-ESCAPES-TABLE" title="Table 9.21. Regular Expression Class-Shorthand Escapes">Table 9.21</a>, and
 235     <a class="xref" href="functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE" title="Table 9.22. Regular Expression Constraint Escapes">Table 9.22</a> below.
 236    </p><p>
 237     Some examples:
 238 </p><pre class="programlisting">
 239 'abc' SIMILAR TO 'abc'          <em class="lineannotation"><span class="lineannotation">true</span></em>
 240 'abc' SIMILAR TO 'a'            <em class="lineannotation"><span class="lineannotation">false</span></em>
 241 'abc' SIMILAR TO '%(b|d)%'      <em class="lineannotation"><span class="lineannotation">true</span></em>
 242 'abc' SIMILAR TO '(b|c)%'       <em class="lineannotation"><span class="lineannotation">false</span></em>
 243 '-abc-' SIMILAR TO '%\mabc\M%'  <em class="lineannotation"><span class="lineannotation">true</span></em>
 244 'xabcy' SIMILAR TO '%\mabc\M%'  <em class="lineannotation"><span class="lineannotation">false</span></em>
 245 </pre><p>
 246    </p><p>
 247     The <code class="function">substring</code> function with three parameters
 248     provides extraction of a substring that matches an SQL
 249     regular expression pattern.  The function can be written according
 250     to standard SQL syntax:
 251 </p><pre class="synopsis">
 252 substring(<em class="replaceable"><code>string</code></em> similar <em class="replaceable"><code>pattern</code></em> escape <em class="replaceable"><code>escape-character</code></em>)
 253 </pre><p>
 254     or using the now obsolete SQL:1999 syntax:
 255 </p><pre class="synopsis">
 256 substring(<em class="replaceable"><code>string</code></em> from <em class="replaceable"><code>pattern</code></em> for <em class="replaceable"><code>escape-character</code></em>)
 257 </pre><p>
 258     or as a plain three-argument function:
 259 </p><pre class="synopsis">
 260 substring(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>, <em class="replaceable"><code>escape-character</code></em>)
 261 </pre><p>
 262     As with <code class="literal">SIMILAR TO</code>, the
 263     specified pattern must match the entire data string, or else the
 264     function fails and returns null.  To indicate the part of the
 265     pattern for which the matching data sub-string is of interest,
 266     the pattern should contain
 267     two occurrences of the escape character followed by a double quote
 268     (<code class="literal">"</code>).
 269     The text matching the portion of the pattern
 270     between these separators is returned when the match is successful.
 271    </p><p>
 272     The escape-double-quote separators actually
 273     divide <code class="function">substring</code>'s pattern into three independent
 274     regular expressions; for example, a vertical bar (<code class="literal">|</code>)
 275     in any of the three sections affects only that section.  Also, the first
 276     and third of these regular expressions are defined to match the smallest
 277     possible amount of text, not the largest, when there is any ambiguity
 278     about how much of the data string matches which pattern.  (In POSIX
 279     parlance, the first and third regular expressions are forced to be
 280     non-greedy.)
 281    </p><p>
 282     As an extension to the SQL standard, <span class="productname">PostgreSQL</span>
 283     allows there to be just one escape-double-quote separator, in which case
 284     the third regular expression is taken as empty; or no separators, in which
 285     case the first and third regular expressions are taken as empty.
 286    </p><p>
 287     Some examples, with <code class="literal">#"</code> delimiting the return string:
 288 </p><pre class="programlisting">
 289 substring('foobar' similar '%#"o_b#"%' escape '#')   <em class="lineannotation"><span class="lineannotation">oob</span></em>
 290 substring('foobar' similar '#"o_b#"%' escape '#')    <em class="lineannotation"><span class="lineannotation">NULL</span></em>
 291 </pre><p>
 292    </p></div><div class="sect2" id="FUNCTIONS-POSIX-REGEXP"><div class="titlepage"><div><div><h3 class="title">9.7.3. <acronym class="acronym">POSIX</acronym> Regular Expressions <a href="#FUNCTIONS-POSIX-REGEXP" class="id_link">#</a></h3></div></div></div><a id="id-1.5.8.13.9.2" class="indexterm"></a><a id="id-1.5.8.13.9.3" class="indexterm"></a><a id="id-1.5.8.13.9.4" class="indexterm"></a><a id="id-1.5.8.13.9.5" class="indexterm"></a><a id="id-1.5.8.13.9.6" class="indexterm"></a><a id="id-1.5.8.13.9.7" class="indexterm"></a><a id="id-1.5.8.13.9.8" class="indexterm"></a><a id="id-1.5.8.13.9.9" class="indexterm"></a><a id="id-1.5.8.13.9.10" class="indexterm"></a><a id="id-1.5.8.13.9.11" class="indexterm"></a><a id="id-1.5.8.13.9.12" class="indexterm"></a><p>
 293     <a class="xref" href="functions-matching.html#FUNCTIONS-POSIX-TABLE" title="Table 9.16. Regular Expression Match Operators">Table 9.16</a> lists the available
 294     operators for pattern matching using POSIX regular expressions.
 295    </p><div class="table" id="FUNCTIONS-POSIX-TABLE"><p class="title"><strong>Table 9.16. Regular Expression Match Operators</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Match Operators" border="1"><colgroup><col /></colgroup><thead><tr><th class="func_table_entry"><p class="func_signature">
 296         Operator
 297        </p>
 298        <p>
 299         Description
 300        </p>
 301        <p>
 302         Example(s)
 303        </p></th></tr></thead><tbody><tr><td class="func_table_entry"><p class="func_signature">
 304         <code class="type">text</code> <code class="literal">~</code> <code class="type">text</code>
 305         → <code class="returnvalue">boolean</code>
 306        </p>
 307        <p>
 308         String matches regular expression, case sensitively
 309        </p>
 310        <p>
 311         <code class="literal">'thomas' ~ 't.*ma'</code>
 312         → <code class="returnvalue">t</code>
 313        </p></td></tr><tr><td class="func_table_entry"><p class="func_signature">
 314         <code class="type">text</code> <code class="literal">~*</code> <code class="type">text</code>
 315         → <code class="returnvalue">boolean</code>
 316        </p>
 317        <p>
 318         String matches regular expression, case-insensitively
 319        </p>
 320        <p>
 321         <code class="literal">'thomas' ~* 'T.*ma'</code>
 322         → <code class="returnvalue">t</code>
 323        </p></td></tr><tr><td class="func_table_entry"><p class="func_signature">
 324         <code class="type">text</code> <code class="literal">!~</code> <code class="type">text</code>
 325         → <code class="returnvalue">boolean</code>
 326        </p>
 327        <p>
 328         String does not match regular expression, case sensitively
 329        </p>
 330        <p>
 331         <code class="literal">'thomas' !~ 't.*max'</code>
 332         → <code class="returnvalue">t</code>
 333        </p></td></tr><tr><td class="func_table_entry"><p class="func_signature">
 334         <code class="type">text</code> <code class="literal">!~*</code> <code class="type">text</code>
 335         → <code class="returnvalue">boolean</code>
 336        </p>
 337        <p>
 338         String does not match regular expression, case-insensitively
 339        </p>
 340        <p>
 341         <code class="literal">'thomas' !~* 'T.*ma'</code>
 342         → <code class="returnvalue">f</code>
 343        </p></td></tr></tbody></table></div></div><br class="table-break" /><p>
 344      <acronym class="acronym">POSIX</acronym> regular expressions provide a more
 345      powerful means for pattern matching than the <code class="function">LIKE</code> and
 346      <code class="function">SIMILAR TO</code> operators.
 347      Many Unix tools such as <code class="command">egrep</code>,
 348      <code class="command">sed</code>, or <code class="command">awk</code> use a pattern
 349      matching language that is similar to the one described here.
 350     </p><p>
 351      A regular expression is a character sequence that is an
 352      abbreviated definition of a set of strings (a <em class="firstterm">regular
 353      set</em>).  A string is said to match a regular expression
 354      if it is a member of the regular set described by the regular
 355      expression.  As with <code class="function">LIKE</code>, pattern characters
 356      match string characters exactly unless they are special characters
 357      in the regular expression language — but regular expressions use
 358      different special characters than <code class="function">LIKE</code> does.
 359      Unlike <code class="function">LIKE</code> patterns, a
 360      regular expression is allowed to match anywhere within a string, unless
 361      the regular expression is explicitly anchored to the beginning or
 362      end of the string.
 363     </p><p>
 364      Some examples:
 365 </p><pre class="programlisting">
 366 'abcd' ~ 'bc'     <em class="lineannotation"><span class="lineannotation">true</span></em>
 367 'abcd' ~ 'a.c'    <em class="lineannotation"><span class="lineannotation">true — dot matches any character</span></em>
 368 'abcd' ~ 'a.*d'   <em class="lineannotation"><span class="lineannotation">true — <code class="literal">*</code> repeats the preceding pattern item</span></em>
 369 'abcd' ~ '(b|x)'  <em class="lineannotation"><span class="lineannotation">true — <code class="literal">|</code> means OR, parentheses group</span></em>
 370 'abcd' ~ '^a'     <em class="lineannotation"><span class="lineannotation">true — <code class="literal">^</code> anchors to start of string</span></em>
 371 'abcd' ~ '^(b|c)' <em class="lineannotation"><span class="lineannotation">false — would match except for anchoring</span></em>
 372 </pre><p>
 373     </p><p>
 374      The <acronym class="acronym">POSIX</acronym> pattern language is described in much
 375      greater detail below.
 376     </p><p>
 377      The <code class="function">substring</code> function with two parameters,
 378      <code class="function">substring(<em class="replaceable"><code>string</code></em> from
 379      <em class="replaceable"><code>pattern</code></em>)</code>, provides extraction of a
 380      substring
 381      that matches a POSIX regular expression pattern.  It returns null if
 382      there is no match, otherwise the first portion of the text that matched the
 383      pattern.  But if the pattern contains any parentheses, the portion
 384      of the text that matched the first parenthesized subexpression (the
 385      one whose left parenthesis comes first) is
 386      returned.  You can put parentheses around the whole expression
 387      if you want to use parentheses within it without triggering this
 388      exception.  If you need parentheses in the pattern before the
 389      subexpression you want to extract, see the non-capturing parentheses
 390      described below.
 391     </p><p>
 392     Some examples:
 393 </p><pre class="programlisting">
 394 substring('foobar' from 'o.b')     <em class="lineannotation"><span class="lineannotation">oob</span></em>
 395 substring('foobar' from 'o(.)b')   <em class="lineannotation"><span class="lineannotation">o</span></em>
 396 </pre><p>
 397    </p><p>
 398      The <code class="function">regexp_count</code> function counts the number of
 399      places where a POSIX regular expression pattern matches a string.
 400      It has the syntax
 401      <code class="function">regexp_count</code>(<em class="replaceable"><code>string</code></em>,
 402      <em class="replaceable"><code>pattern</code></em>
 403      [<span class="optional">, <em class="replaceable"><code>start</code></em>
 404      [<span class="optional">, <em class="replaceable"><code>flags</code></em>
 405      </span>]</span>]).
 406      <em class="replaceable"><code>pattern</code></em> is searched for
 407      in <em class="replaceable"><code>string</code></em>, normally from the beginning of
 408      the string, but if the <em class="replaceable"><code>start</code></em> parameter is
 409      provided then beginning from that character index.
 410      The <em class="replaceable"><code>flags</code></em> parameter is an optional text
 411      string containing zero or more single-letter flags that change the
 412      function's behavior.  For example, including <code class="literal">i</code> in
 413      <em class="replaceable"><code>flags</code></em> specifies case-insensitive matching.
 414      Supported flags are described in
 415      <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
 416     </p><p>
 417      Some examples:
 418 </p><pre class="programlisting">
 419 regexp_count('ABCABCAXYaxy', 'A.')          <em class="lineannotation"><span class="lineannotation">3</span></em>
 420 regexp_count('ABCABCAXYaxy', 'A.', 1, 'i')  <em class="lineannotation"><span class="lineannotation">4</span></em>
 421 </pre><p>
 422     </p><p>
 423      The <code class="function">regexp_instr</code> function returns the starting or
 424      ending position of the <em class="replaceable"><code>N</code></em>'th match of a
 425      POSIX regular expression pattern to a string, or zero if there is no
 426      such match.  It has the syntax
 427      <code class="function">regexp_instr</code>(<em class="replaceable"><code>string</code></em>,
 428      <em class="replaceable"><code>pattern</code></em>
 429      [<span class="optional">, <em class="replaceable"><code>start</code></em>
 430      [<span class="optional">, <em class="replaceable"><code>N</code></em>
 431      [<span class="optional">, <em class="replaceable"><code>endoption</code></em>
 432      [<span class="optional">, <em class="replaceable"><code>flags</code></em>
 433      [<span class="optional">, <em class="replaceable"><code>subexpr</code></em>
 434      </span>]</span>]</span>]</span>]</span>]).
 435      <em class="replaceable"><code>pattern</code></em> is searched for
 436      in <em class="replaceable"><code>string</code></em>, normally from the beginning of
 437      the string, but if the <em class="replaceable"><code>start</code></em> parameter is
 438      provided then beginning from that character index.
 439      If <em class="replaceable"><code>N</code></em> is specified
 440      then the <em class="replaceable"><code>N</code></em>'th match of the pattern
 441      is located, otherwise the first match is located.
 442      If the <em class="replaceable"><code>endoption</code></em> parameter is omitted or
 443      specified as zero, the function returns the position of the first
 444      character of the match.  Otherwise, <em class="replaceable"><code>endoption</code></em>
 445      must be one, and the function returns the position of the character
 446      following the match.
 447      The <em class="replaceable"><code>flags</code></em> parameter is an optional text
 448      string containing zero or more single-letter flags that change the
 449      function's behavior.  Supported flags are described
 450      in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
 451      For a pattern containing parenthesized
 452      subexpressions, <em class="replaceable"><code>subexpr</code></em> is an integer
 453      indicating which subexpression is of interest: the result identifies
 454      the position of the substring matching that subexpression.
 455      Subexpressions are numbered in the order of their leading parentheses.
 456      When <em class="replaceable"><code>subexpr</code></em> is omitted or zero, the result
 457      identifies the position of the whole match regardless of
 458      parenthesized subexpressions.
 459     </p><p>
 460      Some examples:
 461 </p><pre class="programlisting">
 462 regexp_instr('number of your street, town zip, FR', '[^,]+', 1, 2)
 463                                    <em class="lineannotation"><span class="lineannotation">23</span></em>
 464 regexp_instr(string=&gt;'ABCDEFGHI', pattern=&gt;'(c..)(...)', start=&gt;1, "N"=&gt;1, endoption=&gt;0, flags=&gt;'i', subexpr=&gt;2)
 465                                    <em class="lineannotation"><span class="lineannotation">6</span></em>
 466 </pre><p>
 467     </p><p>
 468      The <code class="function">regexp_like</code> function checks whether a match
 469      of a POSIX regular expression pattern occurs within a string,
 470      returning boolean true or false.  It has the syntax
 471      <code class="function">regexp_like</code>(<em class="replaceable"><code>string</code></em>,
 472      <em class="replaceable"><code>pattern</code></em>
 473      [<span class="optional">, <em class="replaceable"><code>flags</code></em> </span>]).
 474      The <em class="replaceable"><code>flags</code></em> parameter is an optional text
 475      string containing zero or more single-letter flags that change the
 476      function's behavior.  Supported flags are described
 477      in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
 478      This function has the same results as the <code class="literal">~</code>
 479      operator if no flags are specified.  If only the <code class="literal">i</code>
 480      flag is specified, it has the same results as
 481      the <code class="literal">~*</code> operator.
 482     </p><p>
 483      Some examples:
 484 </p><pre class="programlisting">
 485 regexp_like('Hello World', 'world')       <em class="lineannotation"><span class="lineannotation">false</span></em>
 486 regexp_like('Hello World', 'world', 'i')  <em class="lineannotation"><span class="lineannotation">true</span></em>
 487 </pre><p>
 488     </p><p>
 489      The <code class="function">regexp_match</code> function returns a text array of
 490      matching substring(s) within the first match of a POSIX
 491      regular expression pattern to a string.  It has the syntax
 492      <code class="function">regexp_match</code>(<em class="replaceable"><code>string</code></em>,
 493      <em class="replaceable"><code>pattern</code></em> [<span class="optional">, <em class="replaceable"><code>flags</code></em> </span>]).
 494      If there is no match, the result is <code class="literal">NULL</code>.
 495      If a match is found, and the <em class="replaceable"><code>pattern</code></em> contains no
 496      parenthesized subexpressions, then the result is a single-element text
 497      array containing the substring matching the whole pattern.
 498      If a match is found, and the <em class="replaceable"><code>pattern</code></em> contains
 499      parenthesized subexpressions, then the result is a text array
 500      whose <em class="replaceable"><code>n</code></em>'th element is the substring matching
 501      the <em class="replaceable"><code>n</code></em>'th parenthesized subexpression of
 502      the <em class="replaceable"><code>pattern</code></em> (not counting <span class="quote">“<span class="quote">non-capturing</span>”</span>
 503      parentheses; see below for details).
 504      The <em class="replaceable"><code>flags</code></em> parameter is an optional text string
 505      containing zero or more single-letter flags that change the function's
 506      behavior.  Supported flags are described
 507      in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
 508     </p><p>
 509     Some examples:
 510 </p><pre class="programlisting">
 511 SELECT regexp_match('foobarbequebaz', 'bar.*que');
 512  regexp_match
 513 --------------
 514  {barbeque}
 515 (1 row)
 516
 517 SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
 518  regexp_match
 519 --------------
 520  {bar,beque}
 521 (1 row)
 522 </pre><p>
 523    </p><div class="tip"><h3 class="title">Tip</h3><p>
 524       In the common case where you just want the whole matching substring
 525       or <code class="literal">NULL</code> for no match, the best solution is to
 526       use <code class="function">regexp_substr()</code>.
 527       However, <code class="function">regexp_substr()</code> only exists
 528       in <span class="productname">PostgreSQL</span> version 15 and up.  When
 529       working in older versions, you can extract the first element
 530       of <code class="function">regexp_match()</code>'s result, for example:
 531 </p><pre class="programlisting">
 532 SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
 533  regexp_match
 534 --------------
 535  barbeque
 536 (1 row)
 537 </pre><p>
 538      </p></div><p>
 539      The <code class="function">regexp_matches</code> function returns a set of text arrays
 540      of matching substring(s) within matches of a POSIX regular
 541      expression pattern to a string.  It has the same syntax as
 542      <code class="function">regexp_match</code>.
 543      This function returns no rows if there is no match, one row if there is
 544      a match and the <code class="literal">g</code> flag is not given, or <em class="replaceable"><code>N</code></em>
 545      rows if there are <em class="replaceable"><code>N</code></em> matches and the <code class="literal">g</code> flag
 546      is given.  Each returned row is a text array containing the whole
 547      matched substring or the substrings matching parenthesized
 548      subexpressions of the <em class="replaceable"><code>pattern</code></em>, just as described above
 549      for <code class="function">regexp_match</code>.
 550      <code class="function">regexp_matches</code> accepts all the flags shown
 551      in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>, plus
 552      the <code class="literal">g</code> flag which commands it to return all matches, not
 553      just the first one.
 554     </p><p>
 555     Some examples:
 556 </p><pre class="programlisting">
 557 SELECT regexp_matches('foo', 'not there');
 558  regexp_matches
 559 ----------------
 560 (0 rows)
 561
 562 SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
 563  regexp_matches
 564 ----------------
 565  {bar,beque}
 566  {bazil,barf}
 567 (2 rows)
 568 </pre><p>
 569    </p><div class="tip"><h3 class="title">Tip</h3><p>
 570      In most cases <code class="function">regexp_matches()</code> should be used with
 571      the <code class="literal">g</code> flag, since if you only want the first match, it's
 572      easier and more efficient to use <code class="function">regexp_match()</code>.
 573      However, <code class="function">regexp_match()</code> only exists
 574      in <span class="productname">PostgreSQL</span> version 10 and up.  When working in older
 575      versions, a common trick is to place a <code class="function">regexp_matches()</code>
 576      call in a sub-select, for example:
 577 </p><pre class="programlisting">
 578 SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;
 579 </pre><p>
 580      This produces a text array if there's a match, or <code class="literal">NULL</code> if
 581      not, the same as <code class="function">regexp_match()</code> would do.  Without the
 582      sub-select, this query would produce no output at all for table rows
 583      without a match, which is typically not the desired behavior.
 584     </p></div><p>
 585      The <code class="function">regexp_replace</code> function provides substitution of
 586      new text for substrings that match POSIX regular expression patterns.
 587      It has the syntax
 588      <code class="function">regexp_replace</code>(<em class="replaceable"><code>string</code></em>,
 589      <em class="replaceable"><code>pattern</code></em>, <em class="replaceable"><code>replacement</code></em>
 590      [<span class="optional">, <em class="replaceable"><code>flags</code></em> </span>])
 591      or
 592      <code class="function">regexp_replace</code>(<em class="replaceable"><code>string</code></em>,
 593      <em class="replaceable"><code>pattern</code></em>, <em class="replaceable"><code>replacement</code></em>,
 594      <em class="replaceable"><code>start</code></em>
 595      [<span class="optional">, <em class="replaceable"><code>N</code></em>
 596      [<span class="optional">, <em class="replaceable"><code>flags</code></em> </span>]</span>]).
 597      The source <em class="replaceable"><code>string</code></em> is returned unchanged if
 598      there is no match to the <em class="replaceable"><code>pattern</code></em>.  If there is a
 599      match, the <em class="replaceable"><code>string</code></em> is returned with the
 600      <em class="replaceable"><code>replacement</code></em> string substituted for the matching
 601      substring.  The <em class="replaceable"><code>replacement</code></em> string can contain
 602      <code class="literal">\</code><em class="replaceable"><code>n</code></em>, where <em class="replaceable"><code>n</code></em> is 1
 603      through 9, to indicate that the source substring matching the
 604      <em class="replaceable"><code>n</code></em>'th parenthesized subexpression of the pattern should be
 605      inserted, and it can contain <code class="literal">\&amp;</code> to indicate that the
 606      substring matching the entire pattern should be inserted.  Write
 607      <code class="literal">\\</code> if you need to put a literal backslash in the replacement
 608      text.
 609      <em class="replaceable"><code>pattern</code></em> is searched for
 610      in <em class="replaceable"><code>string</code></em>, normally from the beginning of
 611      the string, but if the <em class="replaceable"><code>start</code></em> parameter is
 612      provided then beginning from that character index.
 613      By default, only the first match of the pattern is replaced.
 614      If <em class="replaceable"><code>N</code></em> is specified and is greater than zero,
 615      then the <em class="replaceable"><code>N</code></em>'th match of the pattern
 616      is replaced.
 617      If the <code class="literal">g</code> flag is given, or
 618      if <em class="replaceable"><code>N</code></em> is specified and is zero, then all
 619      matches at or after the <em class="replaceable"><code>start</code></em> position are
 620      replaced.  (The <code class="literal">g</code> flag is ignored
 621      when <em class="replaceable"><code>N</code></em> is specified.)
 622      The <em class="replaceable"><code>flags</code></em> parameter is an optional text
 623      string containing zero or more single-letter flags that change the
 624      function's behavior.  Supported flags (though
 625      not <code class="literal">g</code>) are
 626      described in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
 627     </p><p>
 628     Some examples:
 629 </p><pre class="programlisting">
 630 regexp_replace('foobarbaz', 'b..', 'X')
 631                                    <em class="lineannotation"><span class="lineannotation">fooXbaz</span></em>
 632 regexp_replace('foobarbaz', 'b..', 'X', 'g')
 633                                    <em class="lineannotation"><span class="lineannotation">fooXX</span></em>
 634 regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
 635                                    <em class="lineannotation"><span class="lineannotation">fooXarYXazY</span></em>
 636 regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i')
 637                                    <em class="lineannotation"><span class="lineannotation">X PXstgrXSQL fXnctXXn</span></em>
 638 regexp_replace(string=&gt;'A PostgreSQL function', pattern=&gt;'a|e|i|o|u', replacement=&gt;'X', start=&gt;1, "N"=&gt;3, flags=&gt;'i')
 639                                    <em class="lineannotation"><span class="lineannotation">A PostgrXSQL function</span></em>
 640 </pre><p>
 641    </p><p>
 642      The <code class="function">regexp_split_to_table</code> function splits a string using a POSIX
 643      regular expression pattern as a delimiter.  It has the syntax
 644      <code class="function">regexp_split_to_table</code>(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>
 645      [<span class="optional">, <em class="replaceable"><code>flags</code></em> </span>]).
 646      If there is no match to the <em class="replaceable"><code>pattern</code></em>, the function returns the
 647      <em class="replaceable"><code>string</code></em>.  If there is at least one match, for each match it returns
 648      the text from the end of the last match (or the beginning of the string)
 649      to the beginning of the match.  When there are no more matches, it
 650      returns the text from the end of the last match to the end of the string.
 651      The <em class="replaceable"><code>flags</code></em> parameter is an optional text string containing
 652      zero or more single-letter flags that change the function's behavior.
 653      <code class="function">regexp_split_to_table</code> supports the flags described in
 654      <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
 655     </p><p>
 656      The <code class="function">regexp_split_to_array</code> function behaves the same as
 657      <code class="function">regexp_split_to_table</code>, except that <code class="function">regexp_split_to_array</code>
 658      returns its result as an array of <code class="type">text</code>.  It has the syntax
 659      <code class="function">regexp_split_to_array</code>(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>
 660      [<span class="optional">, <em class="replaceable"><code>flags</code></em> </span>]).
 661      The parameters are the same as for <code class="function">regexp_split_to_table</code>.
 662     </p><p>
 663     Some examples:
 664 </p><pre class="programlisting">
 665 SELECT foo FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', '\s+') AS foo;
 666   foo
 667 -------
 668  the
 669  quick
 670  brown
 671  fox
 672  jumps
 673  over
 674  the
 675  lazy
 676  dog
 677 (9 rows)
 678
 679 SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', '\s+');
 680               regexp_split_to_array
 681 -----------------------------------------------
 682  {the,quick,brown,fox,jumps,over,the,lazy,dog}
 683 (1 row)
 684
 685 SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
 686  foo
 687 -----
 688  t
 689  h
 690  e
 691  q
 692  u
 693  i
 694  c
 695  k
 696  b
 697  r
 698  o
 699  w
 700  n
 701  f
 702  o
 703  x
 704 (16 rows)
 705 </pre><p>
 706    </p><p>
 707     As the last example demonstrates, the regexp split functions ignore
 708     zero-length matches that occur at the start or end of the string
 709     or immediately after a previous match.  This is contrary to the strict
 710     definition of regexp matching that is implemented by
 711     the other regexp functions, but is usually the most convenient behavior
 712     in practice.  Other software systems such as Perl use similar definitions.
 713    </p><p>
 714      The <code class="function">regexp_substr</code> function returns the substring
 715      that matches a POSIX regular expression pattern,
 716      or <code class="literal">NULL</code> if there is no match.  It has the syntax
 717      <code class="function">regexp_substr</code>(<em class="replaceable"><code>string</code></em>,
 718      <em class="replaceable"><code>pattern</code></em>
 719      [<span class="optional">, <em class="replaceable"><code>start</code></em>
 720      [<span class="optional">, <em class="replaceable"><code>N</code></em>
 721      [<span class="optional">, <em class="replaceable"><code>flags</code></em>
 722      [<span class="optional">, <em class="replaceable"><code>subexpr</code></em>
 723      </span>]</span>]</span>]</span>]).
 724      <em class="replaceable"><code>pattern</code></em> is searched for
 725      in <em class="replaceable"><code>string</code></em>, normally from the beginning of
 726      the string, but if the <em class="replaceable"><code>start</code></em> parameter is
 727      provided then beginning from that character index.
 728      If <em class="replaceable"><code>N</code></em> is specified
 729      then the <em class="replaceable"><code>N</code></em>'th match of the pattern
 730      is returned, otherwise the first match is returned.
 731      The <em class="replaceable"><code>flags</code></em> parameter is an optional text
 732      string containing zero or more single-letter flags that change the
 733      function's behavior.  Supported flags are described
 734      in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
 735      For a pattern containing parenthesized
 736      subexpressions, <em class="replaceable"><code>subexpr</code></em> is an integer
 737      indicating which subexpression is of interest: the result is the
 738      substring matching that subexpression.
 739      Subexpressions are numbered in the order of their leading parentheses.
 740      When <em class="replaceable"><code>subexpr</code></em> is omitted or zero, the result
 741      is the whole match regardless of parenthesized subexpressions.
 742     </p><p>
 743      Some examples:
 744 </p><pre class="programlisting">
 745 regexp_substr('number of your street, town zip, FR', '[^,]+', 1, 2)
 746                                    <em class="lineannotation"><span class="lineannotation"> town zip</span></em>
 747 regexp_substr('ABCDEFGHI', '(c..)(...)', 1, 1, 'i', 2)
 748                                    <em class="lineannotation"><span class="lineannotation">FGH</span></em>
 749 </pre><p>
 750     </p><div class="sect3" id="POSIX-SYNTAX-DETAILS"><div class="titlepage"><div><div><h4 class="title">9.7.3.1. Regular Expression Details <a href="#POSIX-SYNTAX-DETAILS" class="id_link">#</a></h4></div></div></div><p>
 751     <span class="productname">PostgreSQL</span>'s regular expressions are implemented
 752     using a software package written by Henry Spencer.  Much of
 753     the description of regular expressions below is copied verbatim from his
 754     manual.
 755    </p><p>
 756     Regular expressions (<acronym class="acronym">RE</acronym>s), as defined in
 757     <acronym class="acronym">POSIX</acronym> 1003.2, come in two forms:
 758     <em class="firstterm">extended</em> <acronym class="acronym">RE</acronym>s or <acronym class="acronym">ERE</acronym>s
 759     (roughly those of <code class="command">egrep</code>), and
 760     <em class="firstterm">basic</em> <acronym class="acronym">RE</acronym>s or <acronym class="acronym">BRE</acronym>s
 761     (roughly those of <code class="command">ed</code>).
 762     <span class="productname">PostgreSQL</span> supports both forms, and
 763     also implements some extensions
 764     that are not in the POSIX standard, but have become widely used
 765     due to their availability in programming languages such as Perl and Tcl.
 766     <acronym class="acronym">RE</acronym>s using these non-POSIX extensions are called
 767     <em class="firstterm">advanced</em> <acronym class="acronym">RE</acronym>s or <acronym class="acronym">ARE</acronym>s
 768     in this documentation.  AREs are almost an exact superset of EREs,
 769     but BREs have several notational incompatibilities (as well as being
 770     much more limited).
 771     We first describe the ARE and ERE forms, noting features that apply
 772     only to AREs, and then describe how BREs differ.
 773    </p><div class="note"><h3 class="title">Note</h3><p>
 774      <span class="productname">PostgreSQL</span> always initially presumes that a regular
 775      expression follows the ARE rules.  However, the more limited ERE or
 776      BRE rules can be chosen by prepending an <em class="firstterm">embedded option</em>
 777      to the RE pattern, as described in <a class="xref" href="functions-matching.html#POSIX-METASYNTAX" title="9.7.3.4. Regular Expression Metasyntax">Section 9.7.3.4</a>.
 778      This can be useful for compatibility with applications that expect
 779      exactly the <acronym class="acronym">POSIX</acronym> 1003.2 rules.
 780     </p></div><p>
 781     A regular expression is defined as one or more
 782     <em class="firstterm">branches</em>, separated by
 783     <code class="literal">|</code>.  It matches anything that matches one of the
 784     branches.
 785    </p><p>
 786     A branch is zero or more <em class="firstterm">quantified atoms</em> or
 787     <em class="firstterm">constraints</em>, concatenated.
 788     It matches a match for the first, followed by a match for the second, etc.;
 789     an empty branch matches the empty string.
 790    </p><p>
 791     A quantified atom is an <em class="firstterm">atom</em> possibly followed
 792     by a single <em class="firstterm">quantifier</em>.
 793     Without a quantifier, it matches a match for the atom.
 794     With a quantifier, it can match some number of matches of the atom.
 795     An <em class="firstterm">atom</em> can be any of the possibilities
 796     shown in <a class="xref" href="functions-matching.html#POSIX-ATOMS-TABLE" title="Table 9.17. Regular Expression Atoms">Table 9.17</a>.
 797     The possible quantifiers and their meanings are shown in
 798     <a class="xref" href="functions-matching.html#POSIX-QUANTIFIERS-TABLE" title="Table 9.18. Regular Expression Quantifiers">Table 9.18</a>.
 799    </p><p>
 800     A <em class="firstterm">constraint</em> matches an empty string, but matches only when
 801     specific conditions are met.  A constraint can be used where an atom
 802     could be used, except it cannot be followed by a quantifier.
 803     The simple constraints are shown in
 804     <a class="xref" href="functions-matching.html#POSIX-CONSTRAINTS-TABLE" title="Table 9.19. Regular Expression Constraints">Table 9.19</a>;
 805     some more constraints are described later.
 806    </p><div class="table" id="POSIX-ATOMS-TABLE"><p class="title"><strong>Table 9.17. Regular Expression Atoms</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Atoms" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Atom</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">(</code><em class="replaceable"><code>re</code></em><code class="literal">)</code> </td><td> (where <em class="replaceable"><code>re</code></em> is any regular expression)
 807        matches a match for
 808        <em class="replaceable"><code>re</code></em>, with the match noted for possible reporting </td></tr><tr><td> <code class="literal">(?:</code><em class="replaceable"><code>re</code></em><code class="literal">)</code> </td><td> as above, but the match is not noted for reporting
 809        (a <span class="quote">“<span class="quote">non-capturing</span>”</span> set of parentheses)
 810        (AREs only) </td></tr><tr><td> <code class="literal">.</code> </td><td> matches any single character </td></tr><tr><td> <code class="literal">[</code><em class="replaceable"><code>chars</code></em><code class="literal">]</code> </td><td> a <em class="firstterm">bracket expression</em>,
 811        matching any one of the <em class="replaceable"><code>chars</code></em> (see
 812        <a class="xref" href="functions-matching.html#POSIX-BRACKET-EXPRESSIONS" title="9.7.3.2. Bracket Expressions">Section 9.7.3.2</a> for more detail) </td></tr><tr><td> <code class="literal">\</code><em class="replaceable"><code>k</code></em> </td><td> (where <em class="replaceable"><code>k</code></em> is a non-alphanumeric character)
 813        matches that character taken as an ordinary character,
 814        e.g., <code class="literal">\\</code> matches a backslash character </td></tr><tr><td> <code class="literal">\</code><em class="replaceable"><code>c</code></em> </td><td> where <em class="replaceable"><code>c</code></em> is alphanumeric
 815        (possibly followed by other characters)
 816        is an <em class="firstterm">escape</em>, see <a class="xref" href="functions-matching.html#POSIX-ESCAPE-SEQUENCES" title="9.7.3.3. Regular Expression Escapes">Section 9.7.3.3</a>
 817        (AREs only; in EREs and BREs, this matches <em class="replaceable"><code>c</code></em>) </td></tr><tr><td> <code class="literal">{</code> </td><td> when followed by a character other than a digit,
 818        matches the left-brace character <code class="literal">{</code>;
 819        when followed by a digit, it is the beginning of a
 820        <em class="replaceable"><code>bound</code></em> (see below) </td></tr><tr><td> <em class="replaceable"><code>x</code></em> </td><td> where <em class="replaceable"><code>x</code></em> is a single character with no other
 821        significance, matches that character </td></tr></tbody></table></div></div><br class="table-break" /><p>
 822     An RE cannot end with a backslash (<code class="literal">\</code>).
 823    </p><div class="note"><h3 class="title">Note</h3><p>
 824      If you have <a class="xref" href="runtime-config-compatible.html#GUC-STANDARD-CONFORMING-STRINGS">standard_conforming_strings</a> turned off,
 825      any backslashes you write in literal string constants will need to be
 826      doubled.  See <a class="xref" href="sql-syntax-lexical.html#SQL-SYNTAX-STRINGS" title="4.1.2.1. String Constants">Section 4.1.2.1</a> for more information.
 827     </p></div><div class="table" id="POSIX-QUANTIFIERS-TABLE"><p class="title"><strong>Table 9.18. Regular Expression Quantifiers</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Quantifiers" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Quantifier</th><th>Matches</th></tr></thead><tbody><tr><td> <code class="literal">*</code> </td><td> a sequence of 0 or more matches of the atom </td></tr><tr><td> <code class="literal">+</code> </td><td> a sequence of 1 or more matches of the atom </td></tr><tr><td> <code class="literal">?</code> </td><td> a sequence of 0 or 1 matches of the atom </td></tr><tr><td> <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">}</code> </td><td> a sequence of exactly <em class="replaceable"><code>m</code></em> matches of the atom </td></tr><tr><td> <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,}</code> </td><td> a sequence of <em class="replaceable"><code>m</code></em> or more matches of the atom </td></tr><tr><td>
 828        <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,</code><em class="replaceable"><code>n</code></em><code class="literal">}</code> </td><td> a sequence of <em class="replaceable"><code>m</code></em> through <em class="replaceable"><code>n</code></em>
 829        (inclusive) matches of the atom; <em class="replaceable"><code>m</code></em> cannot exceed
 830        <em class="replaceable"><code>n</code></em> </td></tr><tr><td> <code class="literal">*?</code> </td><td> non-greedy version of <code class="literal">*</code> </td></tr><tr><td> <code class="literal">+?</code> </td><td> non-greedy version of <code class="literal">+</code> </td></tr><tr><td> <code class="literal">??</code> </td><td> non-greedy version of <code class="literal">?</code> </td></tr><tr><td> <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">}?</code> </td><td> non-greedy version of <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">}</code> </td></tr><tr><td> <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,}?</code> </td><td> non-greedy version of <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,}</code> </td></tr><tr><td>
 831        <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,</code><em class="replaceable"><code>n</code></em><code class="literal">}?</code> </td><td> non-greedy version of <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,</code><em class="replaceable"><code>n</code></em><code class="literal">}</code> </td></tr></tbody></table></div></div><br class="table-break" /><p>
 832     The forms using <code class="literal">{</code><em class="replaceable"><code>...</code></em><code class="literal">}</code>
 833     are known as <em class="firstterm">bounds</em>.
 834     The numbers <em class="replaceable"><code>m</code></em> and <em class="replaceable"><code>n</code></em> within a bound are
 835     unsigned decimal integers with permissible values from 0 to 255 inclusive.
 836    </p><p>
 837      <em class="firstterm">Non-greedy</em> quantifiers (available in AREs only) match the
 838      same possibilities as their corresponding normal (<em class="firstterm">greedy</em>)
 839      counterparts, but prefer the smallest number rather than the largest
 840      number of matches.
 841      See <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a> for more detail.
 842    </p><div class="note"><h3 class="title">Note</h3><p>
 843      A quantifier cannot immediately follow another quantifier, e.g.,
 844      <code class="literal">**</code> is invalid.
 845      A quantifier cannot
 846      begin an expression or subexpression or follow
 847      <code class="literal">^</code> or <code class="literal">|</code>.
 848     </p></div><div class="table" id="POSIX-CONSTRAINTS-TABLE"><p class="title"><strong>Table 9.19. Regular Expression Constraints</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Constraints" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Constraint</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">^</code> </td><td> matches at the beginning of the string </td></tr><tr><td> <code class="literal">$</code> </td><td> matches at the end of the string </td></tr><tr><td> <code class="literal">(?=</code><em class="replaceable"><code>re</code></em><code class="literal">)</code> </td><td> <em class="firstterm">positive lookahead</em> matches at any point
 849        where a substring matching <em class="replaceable"><code>re</code></em> begins
 850        (AREs only) </td></tr><tr><td> <code class="literal">(?!</code><em class="replaceable"><code>re</code></em><code class="literal">)</code> </td><td> <em class="firstterm">negative lookahead</em> matches at any point
 851        where no substring matching <em class="replaceable"><code>re</code></em> begins
 852        (AREs only) </td></tr><tr><td> <code class="literal">(?&lt;=</code><em class="replaceable"><code>re</code></em><code class="literal">)</code> </td><td> <em class="firstterm">positive lookbehind</em> matches at any point
 853        where a substring matching <em class="replaceable"><code>re</code></em> ends
 854        (AREs only) </td></tr><tr><td> <code class="literal">(?&lt;!</code><em class="replaceable"><code>re</code></em><code class="literal">)</code> </td><td> <em class="firstterm">negative lookbehind</em> matches at any point
 855        where no substring matching <em class="replaceable"><code>re</code></em> ends
 856        (AREs only) </td></tr></tbody></table></div></div><br class="table-break" /><p>
 857     Lookahead and lookbehind constraints cannot contain <em class="firstterm">back
 858     references</em> (see <a class="xref" href="functions-matching.html#POSIX-ESCAPE-SEQUENCES" title="9.7.3.3. Regular Expression Escapes">Section 9.7.3.3</a>),
 859     and all parentheses within them are considered non-capturing.
 860    </p></div><div class="sect3" id="POSIX-BRACKET-EXPRESSIONS"><div class="titlepage"><div><div><h4 class="title">9.7.3.2. Bracket Expressions <a href="#POSIX-BRACKET-EXPRESSIONS" class="id_link">#</a></h4></div></div></div><p>
 861     A <em class="firstterm">bracket expression</em> is a list of
 862     characters enclosed in <code class="literal">[]</code>.  It normally matches
 863     any single character from the list (but see below).  If the list
 864     begins with <code class="literal">^</code>, it matches any single character
 865     <span class="emphasis"><em>not</em></span> from the rest of the list.
 866     If two characters
 867     in the list are separated by <code class="literal">-</code>, this is
 868     shorthand for the full range of characters between those two
 869     (inclusive) in the collating sequence,
 870     e.g., <code class="literal">[0-9]</code> in <acronym class="acronym">ASCII</acronym> matches
 871     any decimal digit.  It is illegal for two ranges to share an
 872     endpoint, e.g.,  <code class="literal">a-c-e</code>.  Ranges are very
 873     collating-sequence-dependent, so portable programs should avoid
 874     relying on them.
 875    </p><p>
 876     To include a literal <code class="literal">]</code> in the list, make it the
 877     first character (after <code class="literal">^</code>, if that is used).  To
 878     include a literal <code class="literal">-</code>, make it the first or last
 879     character, or the second endpoint of a range.  To use a literal
 880     <code class="literal">-</code> as the first endpoint of a range, enclose it
 881     in <code class="literal">[.</code> and <code class="literal">.]</code> to make it a
 882     collating element (see below).  With the exception of these characters,
 883     some combinations using <code class="literal">[</code>
 884     (see next paragraphs), and escapes (AREs only), all other special
 885     characters lose their special significance within a bracket expression.
 886     In particular, <code class="literal">\</code> is not special when following
 887     ERE or BRE rules, though it is special (as introducing an escape)
 888     in AREs.
 889    </p><p>
 890     Within a bracket expression, a collating element (a character, a
 891     multiple-character sequence that collates as if it were a single
 892     character, or a collating-sequence name for either) enclosed in
 893     <code class="literal">[.</code> and <code class="literal">.]</code> stands for the
 894     sequence of characters of that collating element.  The sequence is
 895     treated as a single element of the bracket expression's list.  This
 896     allows a bracket
 897     expression containing a multiple-character collating element to
 898     match more than one character, e.g., if the collating sequence
 899     includes a <code class="literal">ch</code> collating element, then the RE
 900     <code class="literal">[[.ch.]]*c</code> matches the first five characters of
 901     <code class="literal">chchcc</code>.
 902    </p><div class="note"><h3 class="title">Note</h3><p>
 903      <span class="productname">PostgreSQL</span> currently does not support multi-character collating
 904      elements. This information describes possible future behavior.
 905     </p></div><p>
 906     Within a bracket expression, a collating element enclosed in
 907     <code class="literal">[=</code> and <code class="literal">=]</code> is an <em class="firstterm">equivalence
 908     class</em>, standing for the sequences of characters of all collating
 909     elements equivalent to that one, including itself.  (If there are
 910     no other equivalent collating elements, the treatment is as if the
 911     enclosing delimiters were <code class="literal">[.</code> and
 912     <code class="literal">.]</code>.)  For example, if <code class="literal">o</code> and
 913     <code class="literal">^</code> are the members of an equivalence class, then
 914     <code class="literal">[[=o=]]</code>, <code class="literal">[[=^=]]</code>, and
 915     <code class="literal">[o^]</code> are all synonymous.  An equivalence class
 916     cannot be an endpoint of a range.
 917    </p><p>
 918     Within a bracket expression, the name of a character class
 919     enclosed in <code class="literal">[:</code> and <code class="literal">:]</code> stands
 920     for the list of all characters belonging to that class.  A character
 921     class cannot be used as an endpoint of a range.
 922     The <acronym class="acronym">POSIX</acronym> standard defines these character class
 923     names:
 924     <code class="literal">alnum</code> (letters and numeric digits),
 925     <code class="literal">alpha</code> (letters),
 926     <code class="literal">blank</code> (space and tab),
 927     <code class="literal">cntrl</code> (control characters),
 928     <code class="literal">digit</code> (numeric digits),
 929     <code class="literal">graph</code> (printable characters except space),
 930     <code class="literal">lower</code> (lower-case letters),
 931     <code class="literal">print</code> (printable characters including space),
 932     <code class="literal">punct</code> (punctuation),
 933     <code class="literal">space</code> (any white space),
 934     <code class="literal">upper</code> (upper-case letters),
 935     and <code class="literal">xdigit</code> (hexadecimal digits).
 936     The behavior of these standard character classes is generally
 937     consistent across platforms for characters in the 7-bit ASCII set.
 938     Whether a given non-ASCII character is considered to belong to one
 939     of these classes depends on the <em class="firstterm">collation</em>
 940     that is used for the regular-expression function or operator
 941     (see <a class="xref" href="collation.html" title="23.2. Collation Support">Section 23.2</a>), or by default on the
 942     database's <code class="envar">LC_CTYPE</code> locale setting (see
 943     <a class="xref" href="locale.html" title="23.1. Locale Support">Section 23.1</a>).  The classification of non-ASCII
 944     characters can vary across platforms even in similarly-named
 945     locales.  (But the <code class="literal">C</code> locale never considers any
 946     non-ASCII characters to belong to any of these classes.)
 947     In addition to these standard character
 948     classes, <span class="productname">PostgreSQL</span> defines
 949     the <code class="literal">word</code> character class, which is the same as
 950     <code class="literal">alnum</code> plus the underscore (<code class="literal">_</code>)
 951     character, and
 952     the <code class="literal">ascii</code> character class, which contains exactly
 953     the 7-bit ASCII set.
 954    </p><p>
 955     There are two special cases of bracket expressions:  the bracket
 956     expressions <code class="literal">[[:&lt;:]]</code> and
 957     <code class="literal">[[:&gt;:]]</code> are constraints,
 958     matching empty strings at the beginning
 959     and end of a word respectively.  A word is defined as a sequence
 960     of word characters that is neither preceded nor followed by word
 961     characters.  A word character is any character belonging to the
 962     <code class="literal">word</code> character class, that is, any letter, digit,
 963     or underscore.  This is an extension, compatible with but not
 964     specified by <acronym class="acronym">POSIX</acronym> 1003.2, and should be used with
 965     caution in software intended to be portable to other systems.
 966     The constraint escapes described below are usually preferable; they
 967     are no more standard, but are easier to type.
 968    </p></div><div class="sect3" id="POSIX-ESCAPE-SEQUENCES"><div class="titlepage"><div><div><h4 class="title">9.7.3.3. Regular Expression Escapes <a href="#POSIX-ESCAPE-SEQUENCES" class="id_link">#</a></h4></div></div></div><p>
 969     <em class="firstterm">Escapes</em> are special sequences beginning with <code class="literal">\</code>
 970     followed by an alphanumeric character. Escapes come in several varieties:
 971     character entry, class shorthands, constraint escapes, and back references.
 972     A <code class="literal">\</code> followed by an alphanumeric character but not constituting
 973     a valid escape is illegal in AREs.
 974     In EREs, there are no escapes: outside a bracket expression,
 975     a <code class="literal">\</code> followed by an alphanumeric character merely stands for
 976     that character as an ordinary character, and inside a bracket expression,
 977     <code class="literal">\</code> is an ordinary character.
 978     (The latter is the one actual incompatibility between EREs and AREs.)
 979    </p><p>
 980     <em class="firstterm">Character-entry escapes</em> exist to make it easier to specify
 981     non-printing and other inconvenient characters in REs.  They are
 982     shown in <a class="xref" href="functions-matching.html#POSIX-CHARACTER-ENTRY-ESCAPES-TABLE" title="Table 9.20. Regular Expression Character-Entry Escapes">Table 9.20</a>.
 983    </p><p>
 984     <em class="firstterm">Class-shorthand escapes</em> provide shorthands for certain
 985     commonly-used character classes.  They are
 986     shown in <a class="xref" href="functions-matching.html#POSIX-CLASS-SHORTHAND-ESCAPES-TABLE" title="Table 9.21. Regular Expression Class-Shorthand Escapes">Table 9.21</a>.
 987    </p><p>
 988     A <em class="firstterm">constraint escape</em> is a constraint,
 989     matching the empty string if specific conditions are met,
 990     written as an escape.  They are
 991     shown in <a class="xref" href="functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE" title="Table 9.22. Regular Expression Constraint Escapes">Table 9.22</a>.
 992    </p><p>
 993     A <em class="firstterm">back reference</em> (<code class="literal">\</code><em class="replaceable"><code>n</code></em>) matches the
 994     same string matched by the previous parenthesized subexpression specified
 995     by the number <em class="replaceable"><code>n</code></em>
 996     (see <a class="xref" href="functions-matching.html#POSIX-CONSTRAINT-BACKREF-TABLE" title="Table 9.23. Regular Expression Back References">Table 9.23</a>).  For example,
 997     <code class="literal">([bc])\1</code> matches <code class="literal">bb</code> or <code class="literal">cc</code>
 998     but not <code class="literal">bc</code> or <code class="literal">cb</code>.
 999     The subexpression must entirely precede the back reference in the RE.
1000     Subexpressions are numbered in the order of their leading parentheses.
1001     Non-capturing parentheses do not define subexpressions.
1002     The back reference considers only the string characters matched by the
1003     referenced subexpression, not any constraints contained in it.  For
1004     example, <code class="literal">(^\d)\1</code> will match <code class="literal">22</code>.
1005    </p><div class="table" id="POSIX-CHARACTER-ENTRY-ESCAPES-TABLE"><p class="title"><strong>Table 9.20. Regular Expression Character-Entry Escapes</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Character-Entry Escapes" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Escape</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">\a</code> </td><td> alert (bell) character, as in C </td></tr><tr><td> <code class="literal">\b</code> </td><td> backspace, as in C </td></tr><tr><td> <code class="literal">\B</code> </td><td> synonym for backslash (<code class="literal">\</code>) to help reduce the need for backslash
1006        doubling </td></tr><tr><td> <code class="literal">\c</code><em class="replaceable"><code>X</code></em> </td><td> (where <em class="replaceable"><code>X</code></em> is any character) the character whose
1007        low-order 5 bits are the same as those of
1008        <em class="replaceable"><code>X</code></em>, and whose other bits are all zero </td></tr><tr><td> <code class="literal">\e</code> </td><td> the character whose collating-sequence name
1009        is <code class="literal">ESC</code>,
1010        or failing that, the character with octal value <code class="literal">033</code> </td></tr><tr><td> <code class="literal">\f</code> </td><td> form feed, as in C </td></tr><tr><td> <code class="literal">\n</code> </td><td> newline, as in C </td></tr><tr><td> <code class="literal">\r</code> </td><td> carriage return, as in C </td></tr><tr><td> <code class="literal">\t</code> </td><td> horizontal tab, as in C </td></tr><tr><td> <code class="literal">\u</code><em class="replaceable"><code>wxyz</code></em> </td><td> (where <em class="replaceable"><code>wxyz</code></em> is exactly four hexadecimal digits)
1011        the character whose hexadecimal value is
1012        <code class="literal">0x</code><em class="replaceable"><code>wxyz</code></em>
1013        </td></tr><tr><td> <code class="literal">\U</code><em class="replaceable"><code>stuvwxyz</code></em> </td><td> (where <em class="replaceable"><code>stuvwxyz</code></em> is exactly eight hexadecimal
1014        digits)
1015        the character whose hexadecimal value is
1016        <code class="literal">0x</code><em class="replaceable"><code>stuvwxyz</code></em>
1017        </td></tr><tr><td> <code class="literal">\v</code> </td><td> vertical tab, as in C </td></tr><tr><td> <code class="literal">\x</code><em class="replaceable"><code>hhh</code></em> </td><td> (where <em class="replaceable"><code>hhh</code></em> is any sequence of hexadecimal
1018        digits)
1019        the character whose hexadecimal value is
1020        <code class="literal">0x</code><em class="replaceable"><code>hhh</code></em>
1021        (a single character no matter how many hexadecimal digits are used)
1022        </td></tr><tr><td> <code class="literal">\0</code> </td><td> the character whose value is <code class="literal">0</code> (the null byte)</td></tr><tr><td> <code class="literal">\</code><em class="replaceable"><code>xy</code></em> </td><td> (where <em class="replaceable"><code>xy</code></em> is exactly two octal digits,
1023        and is not a <em class="firstterm">back reference</em>)
1024        the character whose octal value is
1025        <code class="literal">0</code><em class="replaceable"><code>xy</code></em> </td></tr><tr><td> <code class="literal">\</code><em class="replaceable"><code>xyz</code></em> </td><td> (where <em class="replaceable"><code>xyz</code></em> is exactly three octal digits,
1026        and is not a <em class="firstterm">back reference</em>)
1027        the character whose octal value is
1028        <code class="literal">0</code><em class="replaceable"><code>xyz</code></em> </td></tr></tbody></table></div></div><br class="table-break" /><p>
1029     Hexadecimal digits are <code class="literal">0</code>-<code class="literal">9</code>,
1030     <code class="literal">a</code>-<code class="literal">f</code>, and <code class="literal">A</code>-<code class="literal">F</code>.
1031     Octal digits are <code class="literal">0</code>-<code class="literal">7</code>.
1032    </p><p>
1033     Numeric character-entry escapes specifying values outside the ASCII range
1034     (0–127) have meanings dependent on the database encoding.  When the
1035     encoding is UTF-8, escape values are equivalent to Unicode code points,
1036     for example <code class="literal">\u1234</code> means the character <code class="literal">U+1234</code>.
1037     For other multibyte encodings, character-entry escapes usually just
1038     specify the concatenation of the byte values for the character.  If the
1039     escape value does not correspond to any legal character in the database
1040     encoding, no error will be raised, but it will never match any data.
1041    </p><p>
1042     The character-entry escapes are always taken as ordinary characters.
1043     For example, <code class="literal">\135</code> is <code class="literal">]</code> in ASCII, but
1044     <code class="literal">\135</code> does not terminate a bracket expression.
1045    </p><div class="table" id="POSIX-CLASS-SHORTHAND-ESCAPES-TABLE"><p class="title"><strong>Table 9.21. Regular Expression Class-Shorthand Escapes</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Class-Shorthand Escapes" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Escape</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">\d</code> </td><td> matches any digit, like
1046         <code class="literal">[[:digit:]]</code> </td></tr><tr><td> <code class="literal">\s</code> </td><td> matches any whitespace character, like
1047         <code class="literal">[[:space:]]</code> </td></tr><tr><td> <code class="literal">\w</code> </td><td> matches any word character, like
1048         <code class="literal">[[:word:]]</code> </td></tr><tr><td> <code class="literal">\D</code> </td><td> matches any non-digit, like
1049         <code class="literal">[^[:digit:]]</code> </td></tr><tr><td> <code class="literal">\S</code> </td><td> matches any non-whitespace character, like
1050         <code class="literal">[^[:space:]]</code> </td></tr><tr><td> <code class="literal">\W</code> </td><td> matches any non-word character, like
1051         <code class="literal">[^[:word:]]</code> </td></tr></tbody></table></div></div><br class="table-break" /><p>
1052     The class-shorthand escapes also work within bracket expressions,
1053     although the definitions shown above are not quite syntactically
1054     valid in that context.
1055     For example, <code class="literal">[a-c\d]</code> is equivalent to
1056     <code class="literal">[a-c[:digit:]]</code>.
1057    </p><div class="table" id="POSIX-CONSTRAINT-ESCAPES-TABLE"><p class="title"><strong>Table 9.22. Regular Expression Constraint Escapes</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Constraint Escapes" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Escape</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">\A</code> </td><td> matches only at the beginning of the string
1058        (see <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a> for how this differs from
1059        <code class="literal">^</code>) </td></tr><tr><td> <code class="literal">\m</code> </td><td> matches only at the beginning of a word </td></tr><tr><td> <code class="literal">\M</code> </td><td> matches only at the end of a word </td></tr><tr><td> <code class="literal">\y</code> </td><td> matches only at the beginning or end of a word </td></tr><tr><td> <code class="literal">\Y</code> </td><td> matches only at a point that is not the beginning or end of a
1060        word </td></tr><tr><td> <code class="literal">\Z</code> </td><td> matches only at the end of the string
1061        (see <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a> for how this differs from
1062        <code class="literal">$</code>) </td></tr></tbody></table></div></div><br class="table-break" /><p>
1063     A word is defined as in the specification of
1064     <code class="literal">[[:&lt;:]]</code> and <code class="literal">[[:&gt;:]]</code> above.
1065     Constraint escapes are illegal within bracket expressions.
1066    </p><div class="table" id="POSIX-CONSTRAINT-BACKREF-TABLE"><p class="title"><strong>Table 9.23. Regular Expression Back References</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Back References" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Escape</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">\</code><em class="replaceable"><code>m</code></em> </td><td> (where <em class="replaceable"><code>m</code></em> is a nonzero digit)
1067        a back reference to the <em class="replaceable"><code>m</code></em>'th subexpression </td></tr><tr><td> <code class="literal">\</code><em class="replaceable"><code>mnn</code></em> </td><td> (where <em class="replaceable"><code>m</code></em> is a nonzero digit, and
1068        <em class="replaceable"><code>nn</code></em> is some more digits, and the decimal value
1069        <em class="replaceable"><code>mnn</code></em> is not greater than the number of closing capturing
1070        parentheses seen so far)
1071        a back reference to the <em class="replaceable"><code>mnn</code></em>'th subexpression </td></tr></tbody></table></div></div><br class="table-break" /><div class="note"><h3 class="title">Note</h3><p>
1072      There is an inherent ambiguity between octal character-entry
1073      escapes and back references, which is resolved by the following heuristics,
1074      as hinted at above.
1075      A leading zero always indicates an octal escape.
1076      A single non-zero digit, not followed by another digit,
1077      is always taken as a back reference.
1078      A multi-digit sequence not starting with a zero is taken as a back
1079      reference if it comes after a suitable subexpression
1080      (i.e., the number is in the legal range for a back reference),
1081      and otherwise is taken as octal.
1082     </p></div></div><div class="sect3" id="POSIX-METASYNTAX"><div class="titlepage"><div><div><h4 class="title">9.7.3.4. Regular Expression Metasyntax <a href="#POSIX-METASYNTAX" class="id_link">#</a></h4></div></div></div><p>
1083     In addition to the main syntax described above, there are some special
1084     forms and miscellaneous syntactic facilities available.
1085    </p><p>
1086     An RE can begin with one of two special <em class="firstterm">director</em> prefixes.
1087     If an RE begins with <code class="literal">***:</code>,
1088     the rest of the RE is taken as an ARE.  (This normally has no effect in
1089     <span class="productname">PostgreSQL</span>, since REs are assumed to be AREs;
1090     but it does have an effect if ERE or BRE mode had been specified by
1091     the <em class="replaceable"><code>flags</code></em> parameter to a regex function.)
1092     If an RE begins with <code class="literal">***=</code>,
1093     the rest of the RE is taken to be a literal string,
1094     with all characters considered ordinary characters.
1095    </p><p>
1096     An ARE can begin with <em class="firstterm">embedded options</em>:
1097     a sequence <code class="literal">(?</code><em class="replaceable"><code>xyz</code></em><code class="literal">)</code>
1098     (where <em class="replaceable"><code>xyz</code></em> is one or more alphabetic characters)
1099     specifies options affecting the rest of the RE.
1100     These options override any previously determined options —
1101     in particular, they can override the case-sensitivity behavior implied by
1102     a regex operator, or the <em class="replaceable"><code>flags</code></em> parameter to a regex
1103     function.
1104     The available option letters are
1105     shown in <a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>.
1106     Note that these same option letters are used in the <em class="replaceable"><code>flags</code></em>
1107     parameters of regex functions.
1108    </p><div class="table" id="POSIX-EMBEDDED-OPTIONS-TABLE"><p class="title"><strong>Table 9.24. ARE Embedded-Option Letters</strong></p><div class="table-contents"><table class="table" summary="ARE Embedded-Option Letters" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Option</th><th>Description</th></tr></thead><tbody><tr><td> <code class="literal">b</code> </td><td> rest of RE is a BRE </td></tr><tr><td> <code class="literal">c</code> </td><td> case-sensitive matching (overrides operator type) </td></tr><tr><td> <code class="literal">e</code> </td><td> rest of RE is an ERE </td></tr><tr><td> <code class="literal">i</code> </td><td> case-insensitive matching (see
1109        <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a>) (overrides operator type) </td></tr><tr><td> <code class="literal">m</code> </td><td> historical synonym for <code class="literal">n</code> </td></tr><tr><td> <code class="literal">n</code> </td><td> newline-sensitive matching (see
1110        <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a>) </td></tr><tr><td> <code class="literal">p</code> </td><td> partial newline-sensitive matching (see
1111        <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a>) </td></tr><tr><td> <code class="literal">q</code> </td><td> rest of RE is a literal (<span class="quote">“<span class="quote">quoted</span>”</span>) string, all ordinary
1112        characters </td></tr><tr><td> <code class="literal">s</code> </td><td> non-newline-sensitive matching (default) </td></tr><tr><td> <code class="literal">t</code> </td><td> tight syntax (default; see below) </td></tr><tr><td> <code class="literal">w</code> </td><td> inverse partial newline-sensitive (<span class="quote">“<span class="quote">weird</span>”</span>) matching
1113        (see <a class="xref" href="functions-matching.html#POSIX-MATCHING-RULES" title="9.7.3.5. Regular Expression Matching Rules">Section 9.7.3.5</a>) </td></tr><tr><td> <code class="literal">x</code> </td><td> expanded syntax (see below) </td></tr></tbody></table></div></div><br class="table-break" /><p>
1114     Embedded options take effect at the <code class="literal">)</code> terminating the sequence.
1115     They can appear only at the start of an ARE (after the
1116     <code class="literal">***:</code> director if any).
1117    </p><p>
1118     In addition to the usual (<em class="firstterm">tight</em>) RE syntax, in which all
1119     characters are significant, there is an <em class="firstterm">expanded</em> syntax,
1120     available by specifying the embedded <code class="literal">x</code> option.
1121     In the expanded syntax,
1122     white-space characters in the RE are ignored, as are
1123     all characters between a <code class="literal">#</code>
1124     and the following newline (or the end of the RE).  This
1125     permits paragraphing and commenting a complex RE.
1126     There are three exceptions to that basic rule:
1127
1128     </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
1129        a white-space character or <code class="literal">#</code> preceded by <code class="literal">\</code> is
1130        retained
1131       </p></li><li class="listitem"><p>
1132        white space or <code class="literal">#</code> within a bracket expression is retained
1133       </p></li><li class="listitem"><p>
1134        white space and comments cannot appear within multi-character symbols,
1135        such as <code class="literal">(?:</code>
1136       </p></li></ul></div><p>
1137
1138     For this purpose, white-space characters are blank, tab, newline, and
1139     any character that belongs to the <em class="replaceable"><code>space</code></em> character class.
1140    </p><p>
1141     Finally, in an ARE, outside bracket expressions, the sequence
1142     <code class="literal">(?#</code><em class="replaceable"><code>ttt</code></em><code class="literal">)</code>
1143     (where <em class="replaceable"><code>ttt</code></em> is any text not containing a <code class="literal">)</code>)
1144     is a comment, completely ignored.
1145     Again, this is not allowed between the characters of
1146     multi-character symbols, like <code class="literal">(?:</code>.
1147     Such comments are more a historical artifact than a useful facility,
1148     and their use is deprecated; use the expanded syntax instead.
1149    </p><p>
1150     <span class="emphasis"><em>None</em></span> of these metasyntax extensions is available if
1151     an initial <code class="literal">***=</code> director
1152     has specified that the user's input be treated as a literal string
1153     rather than as an RE.
1154    </p></div><div class="sect3" id="POSIX-MATCHING-RULES"><div class="titlepage"><div><div><h4 class="title">9.7.3.5. Regular Expression Matching Rules <a href="#POSIX-MATCHING-RULES" class="id_link">#</a></h4></div></div></div><p>
1155     In the event that an RE could match more than one substring of a given
1156     string, the RE matches the one starting earliest in the string.
1157     If the RE could match more than one substring starting at that point,
1158     either the longest possible match or the shortest possible match will
1159     be taken, depending on whether the RE is <em class="firstterm">greedy</em> or
1160     <em class="firstterm">non-greedy</em>.
1161    </p><p>
1162     Whether an RE is greedy or not is determined by the following rules:
1163     </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
1164        Most atoms, and all constraints, have no greediness attribute (because
1165        they cannot match variable amounts of text anyway).
1166       </p></li><li class="listitem"><p>
1167        Adding parentheses around an RE does not change its greediness.
1168       </p></li><li class="listitem"><p>
1169        A quantified atom with a fixed-repetition quantifier
1170        (<code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">}</code>
1171        or
1172        <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">}?</code>)
1173        has the same greediness (possibly none) as the atom itself.
1174       </p></li><li class="listitem"><p>
1175        A quantified atom with other normal quantifiers (including
1176        <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,</code><em class="replaceable"><code>n</code></em><code class="literal">}</code>
1177        with <em class="replaceable"><code>m</code></em> equal to <em class="replaceable"><code>n</code></em>)
1178        is greedy (prefers longest match).
1179       </p></li><li class="listitem"><p>
1180        A quantified atom with a non-greedy quantifier (including
1181        <code class="literal">{</code><em class="replaceable"><code>m</code></em><code class="literal">,</code><em class="replaceable"><code>n</code></em><code class="literal">}?</code>
1182        with <em class="replaceable"><code>m</code></em> equal to <em class="replaceable"><code>n</code></em>)
1183        is non-greedy (prefers shortest match).
1184       </p></li><li class="listitem"><p>
1185        A branch — that is, an RE that has no top-level
1186        <code class="literal">|</code> operator — has the same greediness as the first
1187        quantified atom in it that has a greediness attribute.
1188       </p></li><li class="listitem"><p>
1189        An RE consisting of two or more branches connected by the
1190        <code class="literal">|</code> operator is always greedy.
1191       </p></li></ul></div><p>
1192    </p><p>
1193     The above rules associate greediness attributes not only with individual
1194     quantified atoms, but with branches and entire REs that contain quantified
1195     atoms.  What that means is that the matching is done in such a way that
1196     the branch, or whole RE, matches the longest or shortest possible
1197     substring <span class="emphasis"><em>as a whole</em></span>.  Once the length of the entire match
1198     is determined, the part of it that matches any particular subexpression
1199     is determined on the basis of the greediness attribute of that
1200     subexpression, with subexpressions starting earlier in the RE taking
1201     priority over ones starting later.
1202    </p><p>
1203     An example of what this means:
1204 </p><pre class="screen">
1205 SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');
1206 <em class="lineannotation"><span class="lineannotation">Result: </span></em><code class="computeroutput">123</code>
1207 SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
1208 <em class="lineannotation"><span class="lineannotation">Result: </span></em><code class="computeroutput">1</code>
1209 </pre><p>
1210     In the first case, the RE as a whole is greedy because <code class="literal">Y*</code>
1211     is greedy.  It can match beginning at the <code class="literal">Y</code>, and it matches
1212     the longest possible string starting there, i.e., <code class="literal">Y123</code>.
1213     The output is the parenthesized part of that, or <code class="literal">123</code>.
1214     In the second case, the RE as a whole is non-greedy because <code class="literal">Y*?</code>
1215     is non-greedy.  It can match beginning at the <code class="literal">Y</code>, and it matches
1216     the shortest possible string starting there, i.e., <code class="literal">Y1</code>.
1217     The subexpression <code class="literal">[0-9]{1,3}</code> is greedy but it cannot change
1218     the decision as to the overall match length; so it is forced to match
1219     just <code class="literal">1</code>.
1220    </p><p>
1221     In short, when an RE contains both greedy and non-greedy subexpressions,
1222     the total match length is either as long as possible or as short as
1223     possible, according to the attribute assigned to the whole RE.  The
1224     attributes assigned to the subexpressions only affect how much of that
1225     match they are allowed to <span class="quote">“<span class="quote">eat</span>”</span> relative to each other.
1226    </p><p>
1227     The quantifiers <code class="literal">{1,1}</code> and <code class="literal">{1,1}?</code>
1228     can be used to force greediness or non-greediness, respectively,
1229     on a subexpression or a whole RE.
1230     This is useful when you need the whole RE to have a greediness attribute
1231     different from what's deduced from its elements.  As an example,
1232     suppose that we are trying to separate a string containing some digits
1233     into the digits and the parts before and after them.  We might try to
1234     do that like this:
1235 </p><pre class="screen">
1236 SELECT regexp_match('abc01234xyz', '(.*)(\d+)(.*)');
1237 <em class="lineannotation"><span class="lineannotation">Result: </span></em><code class="computeroutput">{abc0123,4,xyz}</code>
1238 </pre><p>
1239     That didn't work: the first <code class="literal">.*</code> is greedy so
1240     it <span class="quote">“<span class="quote">eats</span>”</span> as much as it can, leaving the <code class="literal">\d+</code> to
1241     match at the last possible place, the last digit.  We might try to fix
1242     that by making it non-greedy:
1243 </p><pre class="screen">
1244 SELECT regexp_match('abc01234xyz', '(.*?)(\d+)(.*)');
1245 <em class="lineannotation"><span class="lineannotation">Result: </span></em><code class="computeroutput">{abc,0,""}</code>
1246 </pre><p>
1247     That didn't work either, because now the RE as a whole is non-greedy
1248     and so it ends the overall match as soon as possible.  We can get what
1249     we want by forcing the RE as a whole to be greedy:
1250 </p><pre class="screen">
1251 SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
1252 <em class="lineannotation"><span class="lineannotation">Result: </span></em><code class="computeroutput">{abc,01234,xyz}</code>
1253 </pre><p>
1254     Controlling the RE's overall greediness separately from its components'
1255     greediness allows great flexibility in handling variable-length patterns.
1256    </p><p>
1257     When deciding what is a longer or shorter match,
1258     match lengths are measured in characters, not collating elements.
1259     An empty string is considered longer than no match at all.
1260     For example:
1261     <code class="literal">bb*</code>
1262     matches the three middle characters of <code class="literal">abbbc</code>;
1263     <code class="literal">(week|wee)(night|knights)</code>
1264     matches all ten characters of <code class="literal">weeknights</code>;
1265     when <code class="literal">(.*).*</code>
1266     is matched against <code class="literal">abc</code> the parenthesized subexpression
1267     matches all three characters; and when
1268     <code class="literal">(a*)*</code> is matched against <code class="literal">bc</code>
1269     both the whole RE and the parenthesized
1270     subexpression match an empty string.
1271    </p><p>
1272     If case-independent matching is specified,
1273     the effect is much as if all case distinctions had vanished from the
1274     alphabet.
1275     When an alphabetic that exists in multiple cases appears as an
1276     ordinary character outside a bracket expression, it is effectively
1277     transformed into a bracket expression containing both cases,
1278     e.g., <code class="literal">x</code> becomes <code class="literal">[xX]</code>.
1279     When it appears inside a bracket expression, all case counterparts
1280     of it are added to the bracket expression, e.g.,
1281     <code class="literal">[x]</code> becomes <code class="literal">[xX]</code>
1282     and <code class="literal">[^x]</code> becomes <code class="literal">[^xX]</code>.
1283    </p><p>
1284     If newline-sensitive matching is specified, <code class="literal">.</code>
1285     and bracket expressions using <code class="literal">^</code>
1286     will never match the newline character
1287     (so that matches will not cross lines unless the RE
1288     explicitly includes a newline)
1289     and <code class="literal">^</code> and <code class="literal">$</code>
1290     will match the empty string after and before a newline
1291     respectively, in addition to matching at beginning and end of string
1292     respectively.
1293     But the ARE escapes <code class="literal">\A</code> and <code class="literal">\Z</code>
1294     continue to match beginning or end of string <span class="emphasis"><em>only</em></span>.
1295     Also, the character class shorthands <code class="literal">\D</code>
1296     and <code class="literal">\W</code> will match a newline regardless of this mode.
1297     (Before <span class="productname">PostgreSQL</span> 14, they did not match
1298     newlines when in newline-sensitive mode.
1299     Write <code class="literal">[^[:digit:]]</code>
1300     or <code class="literal">[^[:word:]]</code> to get the old behavior.)
1301    </p><p>
1302     If partial newline-sensitive matching is specified,
1303     this affects <code class="literal">.</code> and bracket expressions
1304     as with newline-sensitive matching, but not <code class="literal">^</code>
1305     and <code class="literal">$</code>.
1306    </p><p>
1307     If inverse partial newline-sensitive matching is specified,
1308     this affects <code class="literal">^</code> and <code class="literal">$</code>
1309     as with newline-sensitive matching, but not <code class="literal">.</code>
1310     and bracket expressions.
1311     This isn't very useful but is provided for symmetry.
1312    </p></div><div class="sect3" id="POSIX-LIMITS-COMPATIBILITY"><div class="titlepage"><div><div><h4 class="title">9.7.3.6. Limits and Compatibility <a href="#POSIX-LIMITS-COMPATIBILITY" class="id_link">#</a></h4></div></div></div><p>
1313     No particular limit is imposed on the length of REs in this
1314     implementation.  However,
1315     programs intended to be highly portable should not employ REs longer
1316     than 256 bytes,
1317     as a POSIX-compliant implementation can refuse to accept such REs.
1318    </p><p>
1319     The only feature of AREs that is actually incompatible with
1320     POSIX EREs is that <code class="literal">\</code> does not lose its special
1321     significance inside bracket expressions.
1322     All other ARE features use syntax which is illegal or has
1323     undefined or unspecified effects in POSIX EREs;
1324     the <code class="literal">***</code> syntax of directors likewise is outside the POSIX
1325     syntax for both BREs and EREs.
1326    </p><p>
1327     Many of the ARE extensions are borrowed from Perl, but some have
1328     been changed to clean them up, and a few Perl extensions are not present.
1329     Incompatibilities of note include <code class="literal">\b</code>, <code class="literal">\B</code>,
1330     the lack of special treatment for a trailing newline,
1331     the addition of complemented bracket expressions to the things
1332     affected by newline-sensitive matching,
1333     the restrictions on parentheses and back references in lookahead/lookbehind
1334     constraints, and the longest/shortest-match (rather than first-match)
1335     matching semantics.
1336    </p></div><div class="sect3" id="POSIX-BASIC-REGEXES"><div class="titlepage"><div><div><h4 class="title">9.7.3.7. Basic Regular Expressions <a href="#POSIX-BASIC-REGEXES" class="id_link">#</a></h4></div></div></div><p>
1337     BREs differ from EREs in several respects.
1338     In BREs, <code class="literal">|</code>, <code class="literal">+</code>, and <code class="literal">?</code>
1339     are ordinary characters and there is no equivalent
1340     for their functionality.
1341     The delimiters for bounds are
1342     <code class="literal">\{</code> and <code class="literal">\}</code>,
1343     with <code class="literal">{</code> and <code class="literal">}</code>
1344     by themselves ordinary characters.
1345     The parentheses for nested subexpressions are
1346     <code class="literal">\(</code> and <code class="literal">\)</code>,
1347     with <code class="literal">(</code> and <code class="literal">)</code> by themselves ordinary characters.
1348     <code class="literal">^</code> is an ordinary character except at the beginning of the
1349     RE or the beginning of a parenthesized subexpression,
1350     <code class="literal">$</code> is an ordinary character except at the end of the
1351     RE or the end of a parenthesized subexpression,
1352     and <code class="literal">*</code> is an ordinary character if it appears at the beginning
1353     of the RE or the beginning of a parenthesized subexpression
1354     (after a possible leading <code class="literal">^</code>).
1355     Finally, single-digit back references are available, and
1356     <code class="literal">\&lt;</code> and <code class="literal">\&gt;</code>
1357     are synonyms for
1358     <code class="literal">[[:&lt;:]]</code> and <code class="literal">[[:&gt;:]]</code>
1359     respectively; no other escapes are available in BREs.
1360    </p></div><div class="sect3" id="POSIX-VS-XQUERY"><div class="titlepage"><div><div><h4 class="title">9.7.3.8. Differences from SQL Standard and XQuery <a href="#POSIX-VS-XQUERY" class="id_link">#</a></h4></div></div></div><a id="id-1.5.8.13.9.48.2" class="indexterm"></a><a id="id-1.5.8.13.9.48.3" class="indexterm"></a><a id="id-1.5.8.13.9.48.4" class="indexterm"></a><a id="id-1.5.8.13.9.48.5" class="indexterm"></a><a id="id-1.5.8.13.9.48.6" class="indexterm"></a><a id="id-1.5.8.13.9.48.7" class="indexterm"></a><p>
1361      Since SQL:2008, the SQL standard includes regular expression operators
1362      and functions that performs pattern
1363      matching according to the XQuery regular expression
1364      standard:
1365      </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><code class="literal">LIKE_REGEX</code></p></li><li class="listitem"><p><code class="literal">OCCURRENCES_REGEX</code></p></li><li class="listitem"><p><code class="literal">POSITION_REGEX</code></p></li><li class="listitem"><p><code class="literal">SUBSTRING_REGEX</code></p></li><li class="listitem"><p><code class="literal">TRANSLATE_REGEX</code></p></li></ul></div><p>
1366      <span class="productname">PostgreSQL</span> does not currently implement these
1367      operators and functions.  You can get approximately equivalent
1368      functionality in each case as shown in <a class="xref" href="functions-matching.html#FUNCTIONS-REGEXP-SQL-TABLE" title="Table 9.25. Regular Expression Functions Equivalencies">Table 9.25</a>.  (Various optional clauses on
1369      both sides have been omitted in this table.)
1370     </p><div class="table" id="FUNCTIONS-REGEXP-SQL-TABLE"><p class="title"><strong>Table 9.25. Regular Expression Functions Equivalencies</strong></p><div class="table-contents"><table class="table" summary="Regular Expression Functions Equivalencies" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>SQL standard</th><th><span class="productname">PostgreSQL</span></th></tr></thead><tbody><tr><td><code class="literal"><em class="replaceable"><code>string</code></em> LIKE_REGEX <em class="replaceable"><code>pattern</code></em></code></td><td><code class="literal">regexp_like(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>)</code> or <code class="literal"><em class="replaceable"><code>string</code></em> ~ <em class="replaceable"><code>pattern</code></em></code></td></tr><tr><td><code class="literal">OCCURRENCES_REGEX(<em class="replaceable"><code>pattern</code></em> IN <em class="replaceable"><code>string</code></em>)</code></td><td><code class="literal">regexp_count(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>)</code></td></tr><tr><td><code class="literal">POSITION_REGEX(<em class="replaceable"><code>pattern</code></em> IN <em class="replaceable"><code>string</code></em>)</code></td><td><code class="literal">regexp_instr(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>)</code></td></tr><tr><td><code class="literal">SUBSTRING_REGEX(<em class="replaceable"><code>pattern</code></em> IN <em class="replaceable"><code>string</code></em>)</code></td><td><code class="literal">regexp_substr(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>)</code></td></tr><tr><td><code class="literal">TRANSLATE_REGEX(<em class="replaceable"><code>pattern</code></em> IN <em class="replaceable"><code>string</code></em> WITH <em class="replaceable"><code>replacement</code></em>)</code></td><td><code class="literal">regexp_replace(<em class="replaceable"><code>string</code></em>, <em class="replaceable"><code>pattern</code></em>, <em class="replaceable"><code>replacement</code></em>)</code></td></tr></tbody></table></div></div><br class="table-break" /><p>
1371      Regular expression functions similar to those provided by PostgreSQL are
1372      also available in a number of other SQL implementations, whereas the
1373      SQL-standard functions are not as widely implemented.  Some of the
1374      details of the regular expression syntax will likely differ in each
1375      implementation.
1376     </p><p>
1377      The SQL-standard operators and functions use XQuery regular expressions,
1378      which are quite close to the ARE syntax described above.
1379      Notable differences between the existing POSIX-based
1380      regular-expression feature and XQuery regular expressions include:
1381
1382      </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
1383         XQuery character class subtraction is not supported.  An example of
1384         this feature is using the following to match only English
1385         consonants: <code class="literal">[a-z-[aeiou]]</code>.
1386        </p></li><li class="listitem"><p>
1387         XQuery character class shorthands <code class="literal">\c</code>,
1388         <code class="literal">\C</code>, <code class="literal">\i</code>,
1389         and <code class="literal">\I</code> are not supported.
1390        </p></li><li class="listitem"><p>
1391         XQuery character class elements
1392         using <code class="literal">\p{UnicodeProperty}</code> or the
1393         inverse <code class="literal">\P{UnicodeProperty}</code> are not supported.
1394        </p></li><li class="listitem"><p>
1395         POSIX interprets character classes such as <code class="literal">\w</code>
1396         (see <a class="xref" href="functions-matching.html#POSIX-CLASS-SHORTHAND-ESCAPES-TABLE" title="Table 9.21. Regular Expression Class-Shorthand Escapes">Table 9.21</a>)
1397         according to the prevailing locale (which you can control by
1398         attaching a <code class="literal">COLLATE</code> clause to the operator or
1399         function).  XQuery specifies these classes by reference to Unicode
1400         character properties, so equivalent behavior is obtained only with
1401         a locale that follows the Unicode rules.
1402        </p></li><li class="listitem"><p>
1403         The SQL standard (not XQuery itself) attempts to cater for more
1404         variants of <span class="quote">“<span class="quote">newline</span>”</span> than POSIX does.  The
1405         newline-sensitive matching options described above consider only
1406         ASCII NL (<code class="literal">\n</code>) to be a newline, but SQL would have
1407         us treat CR (<code class="literal">\r</code>), CRLF (<code class="literal">\r\n</code>)
1408         (a Windows-style newline), and some Unicode-only characters like
1409         LINE SEPARATOR (U+2028) as newlines as well.
1410         Notably, <code class="literal">.</code> and <code class="literal">\s</code> should
1411         count <code class="literal">\r\n</code> as one character not two according to
1412         SQL.
1413        </p></li><li class="listitem"><p>
1414         Of the character-entry escapes described in
1415         <a class="xref" href="functions-matching.html#POSIX-CHARACTER-ENTRY-ESCAPES-TABLE" title="Table 9.20. Regular Expression Character-Entry Escapes">Table 9.20</a>,
1416         XQuery supports only <code class="literal">\n</code>, <code class="literal">\r</code>,
1417         and <code class="literal">\t</code>.
1418        </p></li><li class="listitem"><p>
1419         XQuery does not support
1420         the <code class="literal">[:<em class="replaceable"><code>name</code></em>:]</code> syntax
1421         for character classes within bracket expressions.
1422        </p></li><li class="listitem"><p>
1423         XQuery does not have lookahead or lookbehind constraints,
1424         nor any of the constraint escapes described in
1425         <a class="xref" href="functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE" title="Table 9.22. Regular Expression Constraint Escapes">Table 9.22</a>.
1426        </p></li><li class="listitem"><p>
1427         The metasyntax forms described in <a class="xref" href="functions-matching.html#POSIX-METASYNTAX" title="9.7.3.4. Regular Expression Metasyntax">Section 9.7.3.4</a>
1428         do not exist in XQuery.
1429        </p></li><li class="listitem"><p>
1430         The regular expression flag letters defined by XQuery are
1431         related to but not the same as the option letters for POSIX
1432         (<a class="xref" href="functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE" title="Table 9.24. ARE Embedded-Option Letters">Table 9.24</a>).  While the
1433         <code class="literal">i</code> and <code class="literal">q</code> options behave the
1434         same, others do not:
1435         </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p>
1436            XQuery's <code class="literal">s</code> (allow dot to match newline)
1437            and <code class="literal">m</code> (allow <code class="literal">^</code>
1438            and <code class="literal">$</code> to match at newlines) flags provide
1439            access to the same behaviors as
1440            POSIX's <code class="literal">n</code>, <code class="literal">p</code>
1441            and <code class="literal">w</code> flags, but they
1442            do <span class="emphasis"><em>not</em></span> match the behavior of
1443            POSIX's <code class="literal">s</code> and <code class="literal">m</code> flags.
1444            Note in particular that dot-matches-newline is the default
1445            behavior in POSIX but not XQuery.
1446           </p></li><li class="listitem"><p>
1447            XQuery's <code class="literal">x</code> (ignore whitespace in pattern) flag
1448            is noticeably different from POSIX's expanded-mode flag.
1449            POSIX's <code class="literal">x</code> flag also
1450            allows <code class="literal">#</code> to begin a comment in the pattern,
1451            and POSIX will not ignore a whitespace character after a
1452            backslash.
1453           </p></li></ul></div><p>
1454        </p></li></ul></div><p>
1455     </p></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="functions-bitstring.html" title="9.6. Bit String Functions and Operators">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="functions.html" title="Chapter 9. Functions and Operators">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="functions-formatting.html" title="9.8. Data Type Formatting Functions">Next</a></td></tr><tr><td width="40%" align="left" valign="top">9.6. Bit String Functions and Operators </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 9.8. Data Type Formatting Functions</td></tr></table></div></body></html>