1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>23.2. Collation Support</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="locale.html" title="23.1. Locale Support" /><link rel="next" href="multibyte.html" title="23.3. Character Set Support" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">23.2. Collation Support</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="locale.html" title="23.1. Locale Support">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="charset.html" title="Chapter 23. Localization">Up</a></td><th width="60%" align="center">Chapter 23. Localization</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="multibyte.html" title="23.3. Character Set Support">Next</a></td></tr></table><hr /></div><div class="sect1" id="COLLATION"><div class="titlepage"><div><div><h2 class="title" style="clear: both">23.2. Collation Support <a href="#COLLATION" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="collation.html#COLLATION-CONCEPTS">23.2.1. Concepts</a></span></dt><dt><span class="sect2"><a href="collation.html#COLLATION-MANAGING">23.2.2. Managing Collations</a></span></dt><dt><span class="sect2"><a href="collation.html#ICU-CUSTOM-COLLATIONS">23.2.3. ICU Custom Collations</a></span></dt></dl></div><a id="id-1.6.10.4.2" class="indexterm"></a><p>
3 The collation feature allows specifying the sort order and character
4 classification behavior of data per-column, or even per-operation.
5 This alleviates the restriction that the
6 <code class="symbol">LC_COLLATE</code> and <code class="symbol">LC_CTYPE</code> settings
7 of a database cannot be changed after its creation.
8 </p><div class="sect2" id="COLLATION-CONCEPTS"><div class="titlepage"><div><div><h3 class="title">23.2.1. Concepts <a href="#COLLATION-CONCEPTS" class="id_link">#</a></h3></div></div></div><p>
9 Conceptually, every expression of a collatable data type has a
10 collation. (The built-in collatable data types are
11 <code class="type">text</code>, <code class="type">varchar</code>, and <code class="type">char</code>.
12 User-defined base types can also be marked collatable, and of course
13 a <a class="glossterm" href="glossary.html#GLOSSARY-DOMAIN"><em class="glossterm"><a class="glossterm" href="glossary.html#GLOSSARY-DOMAIN" title="Domain">domain</a></em></a> over a
14 collatable data type is collatable.) If the
15 expression is a column reference, the collation of the expression is the
16 defined collation of the column. If the expression is a constant, the
17 collation is the default collation of the data type of the
18 constant. The collation of a more complex expression is derived
19 from the collations of its inputs, as described below.
21 The collation of an expression can be the <span class="quote">“<span class="quote">default</span>”</span>
22 collation, which means the locale settings defined for the
23 database. It is also possible for an expression's collation to be
24 indeterminate. In such cases, ordering operations and other
25 operations that need to know the collation will fail.
27 When the database system has to perform an ordering or a character
28 classification, it uses the collation of the input expression. This
29 happens, for example, with <code class="literal">ORDER BY</code> clauses
30 and function or operator calls such as <code class="literal"><</code>.
31 The collation to apply for an <code class="literal">ORDER BY</code> clause
32 is simply the collation of the sort key. The collation to apply for a
33 function or operator call is derived from the arguments, as described
34 below. In addition to comparison operators, collations are taken into
35 account by functions that convert between lower and upper case
36 letters, such as <code class="function">lower</code>, <code class="function">upper</code>, and
37 <code class="function">initcap</code>; by pattern matching operators; and by
38 <code class="function">to_char</code> and related functions.
40 For a function or operator call, the collation that is derived by
41 examining the argument collations is used at run time for performing
42 the specified operation. If the result of the function or operator
43 call is of a collatable data type, the collation is also used at parse
44 time as the defined collation of the function or operator expression,
45 in case there is a surrounding expression that requires knowledge of
48 The <em class="firstterm">collation derivation</em> of an expression can be
49 implicit or explicit. This distinction affects how collations are
50 combined when multiple different collations appear in an
51 expression. An explicit collation derivation occurs when a
52 <code class="literal">COLLATE</code> clause is used; all other collation
53 derivations are implicit. When multiple collations need to be
54 combined, for example in a function call, the following rules are
57 </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>
58 If any input expression has an explicit collation derivation, then
59 all explicitly derived collations among the input expressions must be
60 the same, otherwise an error is raised. If any explicitly
61 derived collation is present, that is the result of the
62 collation combination.
63 </p></li><li class="listitem"><p>
64 Otherwise, all input expressions must have the same implicit
65 collation derivation or the default collation. If any non-default
66 collation is present, that is the result of the collation combination.
67 Otherwise, the result is the default collation.
68 </p></li><li class="listitem"><p>
69 If there are conflicting non-default implicit collations among the
70 input expressions, then the combination is deemed to have indeterminate
71 collation. This is not an error condition unless the particular
72 function being invoked requires knowledge of the collation it should
73 apply. If it does, an error will be raised at run-time.
74 </p></li></ol></div><p>
76 For example, consider this table definition:
77 </p><pre class="programlisting">
79 a text COLLATE "de_DE",
80 b text COLLATE "es_ES",
86 </p><pre class="programlisting">
87 SELECT a < 'foo' FROM test1;
89 the <code class="literal"><</code> comparison is performed according to
90 <code class="literal">de_DE</code> rules, because the expression combines an
91 implicitly derived collation with the default collation. But in
92 </p><pre class="programlisting">
93 SELECT a < ('foo' COLLATE "fr_FR") FROM test1;
95 the comparison is performed using <code class="literal">fr_FR</code> rules,
96 because the explicit collation derivation overrides the implicit one.
98 </p><pre class="programlisting">
99 SELECT a < b FROM test1;
101 the parser cannot determine which collation to apply, since the
102 <code class="structfield">a</code> and <code class="structfield">b</code> columns have conflicting
103 implicit collations. Since the <code class="literal"><</code> operator
104 does need to know which collation to use, this will result in an
105 error. The error can be resolved by attaching an explicit collation
106 specifier to either input expression, thus:
107 </p><pre class="programlisting">
108 SELECT a < b COLLATE "de_DE" FROM test1;
111 </p><pre class="programlisting">
112 SELECT a COLLATE "de_DE" < b FROM test1;
114 On the other hand, the structurally similar case
115 </p><pre class="programlisting">
116 SELECT a || b FROM test1;
118 does not result in an error, because the <code class="literal">||</code> operator
119 does not care about collations: its result is the same regardless
122 The collation assigned to a function or operator's combined input
123 expressions is also considered to apply to the function or operator's
124 result, if the function or operator delivers a result of a collatable
126 </p><pre class="programlisting">
127 SELECT * FROM test1 ORDER BY a || 'foo';
129 the ordering will be done according to <code class="literal">de_DE</code> rules.
131 </p><pre class="programlisting">
132 SELECT * FROM test1 ORDER BY a || b;
134 results in an error, because even though the <code class="literal">||</code> operator
135 doesn't need to know a collation, the <code class="literal">ORDER BY</code> clause does.
136 As before, the conflict can be resolved with an explicit collation
138 </p><pre class="programlisting">
139 SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
141 </p></div><div class="sect2" id="COLLATION-MANAGING"><div class="titlepage"><div><div><h3 class="title">23.2.2. Managing Collations <a href="#COLLATION-MANAGING" class="id_link">#</a></h3></div></div></div><p>
142 A collation is an SQL schema object that maps an SQL name to locales
143 provided by libraries installed in the operating system. A collation
144 definition has a <em class="firstterm">provider</em> that specifies which
145 library supplies the locale data. One standard provider name
146 is <code class="literal">libc</code>, which uses the locales provided by the
147 operating system C library. These are the locales used by most tools
148 provided by the operating system. Another provider
149 is <code class="literal">icu</code>, which uses the external
150 ICU<a id="id-1.6.10.4.5.2.4" class="indexterm"></a> library. ICU locales can only be
151 used if support for ICU was configured when PostgreSQL was built.
153 A collation object provided by <code class="literal">libc</code> maps to a
154 combination of <code class="symbol">LC_COLLATE</code> and <code class="symbol">LC_CTYPE</code>
155 settings, as accepted by the <code class="literal">setlocale()</code> system library call. (As
156 the name would suggest, the main purpose of a collation is to set
157 <code class="symbol">LC_COLLATE</code>, which controls the sort order. But
158 it is rarely necessary in practice to have an
159 <code class="symbol">LC_CTYPE</code> setting that is different from
160 <code class="symbol">LC_COLLATE</code>, so it is more convenient to collect
161 these under one concept than to create another infrastructure for
162 setting <code class="symbol">LC_CTYPE</code> per expression.) Also,
163 a <code class="literal">libc</code> collation
164 is tied to a character set encoding (see <a class="xref" href="multibyte.html" title="23.3. Character Set Support">Section 23.3</a>).
165 The same collation name may exist for different encodings.
167 A collation object provided by <code class="literal">icu</code> maps to a named
168 collator provided by the ICU library. ICU does not support
169 separate <span class="quote">“<span class="quote">collate</span>”</span> and <span class="quote">“<span class="quote">ctype</span>”</span> settings, so
170 they are always the same. Also, ICU collations are independent of the
171 encoding, so there is always only one ICU collation of a given name in
173 </p><div class="sect3" id="COLLATION-MANAGING-STANDARD"><div class="titlepage"><div><div><h4 class="title">23.2.2.1. Standard Collations <a href="#COLLATION-MANAGING-STANDARD" class="id_link">#</a></h4></div></div></div><p>
174 On all platforms, the following collations are supported:
176 </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">unicode</code></span></dt><dd><p>
177 This SQL standard collation sorts using the Unicode Collation
178 Algorithm with the Default Unicode Collation Element Table. It is
179 available in all encodings. ICU support is required to use this
180 collation, and behavior may change if <span class="productname">PostgreSQL</span> is built with a
181 different version of ICU. (This collation has the same behavior as
182 the ICU root locale; see <a class="xref" href="collation.html#COLLATION-MANAGING-PREDEFINED-ICU-UND-X-ICU"><code class="literal">und-x-icu</code> (for <span class="quote">“<span class="quote">undefined</span>”</span>)</a>.)
183 </p></dd><dt><span class="term"><code class="literal">ucs_basic</code></span></dt><dd><p>
184 This SQL standard collation sorts using the Unicode code point values
185 rather than natural language order, and only the ASCII letters
186 <span class="quote">“<span class="quote"><code class="literal">A</code></span>”</span> through
187 <span class="quote">“<span class="quote"><code class="literal">Z</code></span>”</span> are treated as letters. The
188 behavior is efficient and stable across all versions. Only available
189 for encoding <code class="literal">UTF8</code>. (This collation has the same
190 behavior as the libc locale specification <code class="literal">C</code> in
191 <code class="literal">UTF8</code> encoding.)
192 </p></dd><dt><span class="term"><code class="literal">pg_unicode_fast</code></span></dt><dd><p>
193 This collation sorts by Unicode code point values rather than natural
194 language order. For the functions <code class="function">lower</code>,
195 <code class="function">initcap</code>, and <code class="function">upper</code> it uses
196 Unicode full case mapping. For pattern matching (including regular
197 expressions), it uses the Standard variant of Unicode <a class="ulink" href="https://www.unicode.org/reports/tr18/#Compatibility_Properties" target="_top">Compatibility
198 Properties</a>. Behavior is efficient and stable within a
199 <span class="productname">Postgres</span> major version. It is only
200 available for encoding <code class="literal">UTF8</code>.
201 </p></dd><dt><span class="term"><code class="literal">pg_c_utf8</code></span></dt><dd><p>
202 This collation sorts by Unicode code point values rather than natural
203 language order. For the functions <code class="function">lower</code>,
204 <code class="function">initcap</code>, and <code class="function">upper</code>, it uses
205 Unicode simple case mapping. For pattern matching (including regular
206 expressions), it uses the POSIX Compatible variant of Unicode <a class="ulink" href="https://www.unicode.org/reports/tr18/#Compatibility_Properties" target="_top">Compatibility
207 Properties</a>. Behavior is efficient and stable within a
208 <span class="productname">PostgreSQL</span> major version. This collation is
209 only available for encoding <code class="literal">UTF8</code>.
210 </p></dd><dt><span class="term"><code class="literal">C</code> (equivalent to <code class="literal">POSIX</code>)</span></dt><dd><p>
211 The <code class="literal">C</code> and <code class="literal">POSIX</code> collations are
212 based on <span class="quote">“<span class="quote">traditional C</span>”</span> behavior. They sort by byte
213 values rather than natural language order, and only the ASCII letters
214 <span class="quote">“<span class="quote"><code class="literal">A</code></span>”</span> through
215 <span class="quote">“<span class="quote"><code class="literal">Z</code></span>”</span> are treated as letters. The
216 behavior is efficient and stable across all versions for a given
217 database encoding, but behavior may vary between different database
219 </p></dd><dt><span class="term"><code class="literal">default</code></span></dt><dd><p>
220 The <code class="literal">default</code> collation selects the locale specified
221 at database creation time.
222 </p></dd></dl></div><p>
224 Additional collations may be available depending on operating system
225 support. The efficiency and stability of these additional collations
226 depend on the collation provider, the provider version, and the locale.
227 </p></div><div class="sect3" id="COLLATION-MANAGING-PREDEFINED"><div class="titlepage"><div><div><h4 class="title">23.2.2.2. Predefined Collations <a href="#COLLATION-MANAGING-PREDEFINED" class="id_link">#</a></h4></div></div></div><p>
228 If the operating system provides support for using multiple locales
229 within a single program (<code class="function">newlocale</code> and related functions),
230 or if support for ICU is configured,
231 then when a database cluster is initialized, <code class="command">initdb</code>
232 populates the system catalog <code class="literal">pg_collation</code> with
233 collations based on all the locales it finds in the operating
236 To inspect the currently available locales, use the query <code class="literal">SELECT
237 * FROM pg_collation</code>, or the command <code class="command">\dOS+</code>
238 in <span class="application">psql</span>.
239 </p><div class="sect4" id="COLLATION-MANAGING-PREDEFINED-LIBC"><div class="titlepage"><div><div><h5 class="title">23.2.2.2.1. libc Collations <a href="#COLLATION-MANAGING-PREDEFINED-LIBC" class="id_link">#</a></h5></div></div></div><p>
240 For example, the operating system might
241 provide a locale named <code class="literal">de_DE.utf8</code>.
242 <code class="command">initdb</code> would then create a collation named
243 <code class="literal">de_DE.utf8</code> for encoding <code class="literal">UTF8</code>
244 that has both <code class="symbol">LC_COLLATE</code> and
245 <code class="symbol">LC_CTYPE</code> set to <code class="literal">de_DE.utf8</code>.
246 It will also create a collation with the <code class="literal">.utf8</code>
247 tag stripped off the name. So you could also use the collation
248 under the name <code class="literal">de_DE</code>, which is less cumbersome
249 to write and makes the name less encoding-dependent. Note that,
250 nevertheless, the initial set of collation names is
253 The default set of collations provided by <code class="literal">libc</code> map
254 directly to the locales installed in the operating system, which can be
255 listed using the command <code class="literal">locale -a</code>. In case
256 a <code class="literal">libc</code> collation is needed that has different values
257 for <code class="symbol">LC_COLLATE</code> and <code class="symbol">LC_CTYPE</code>, or if new
258 locales are installed in the operating system after the database system
259 was initialized, then a new collation may be created using
260 the <a class="xref" href="sql-createcollation.html" title="CREATE COLLATION"><span class="refentrytitle">CREATE COLLATION</span></a> command.
261 New operating system locales can also be imported en masse using
262 the <a class="link" href="functions-admin.html#FUNCTIONS-ADMIN-COLLATION" title="Table 9.104. Collation Management Functions"><code class="function">pg_import_system_collations()</code></a> function.
264 Within any particular database, only collations that use that
265 database's encoding are of interest. Other entries in
266 <code class="literal">pg_collation</code> are ignored. Thus, a stripped collation
267 name such as <code class="literal">de_DE</code> can be considered unique
268 within a given database even though it would not be unique globally.
269 Use of the stripped collation names is recommended, since it will
270 make one fewer thing you need to change if you decide to change to
271 another database encoding. Note however that the <code class="literal">default</code>,
272 <code class="literal">C</code>, and <code class="literal">POSIX</code> collations can be used regardless of
273 the database encoding.
275 <span class="productname">PostgreSQL</span> considers distinct collation
276 objects to be incompatible even when they have identical properties.
278 </p><pre class="programlisting">
279 SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
281 will draw an error even though the <code class="literal">C</code> and <code class="literal">POSIX</code>
282 collations have identical behaviors. Mixing stripped and non-stripped
283 collation names is therefore not recommended.
284 </p></div><div class="sect4" id="COLLATION-MANAGING-PREDEFINED-ICU"><div class="titlepage"><div><div><h5 class="title">23.2.2.2.2. ICU Collations <a href="#COLLATION-MANAGING-PREDEFINED-ICU" class="id_link">#</a></h5></div></div></div><p>
285 With ICU, it is not sensible to enumerate all possible locale names. ICU
286 uses a particular naming system for locales, but there are many more ways
287 to name a locale than there are actually distinct locales.
288 <code class="command">initdb</code> uses the ICU APIs to extract a set of distinct
289 locales to populate the initial set of collations. Collations provided by
290 ICU are created in the SQL environment with names in BCP 47 language tag
291 format, with a <span class="quote">“<span class="quote">private use</span>”</span>
292 extension <code class="literal">-x-icu</code> appended, to distinguish them from
295 Here are some example collations that might be created:
297 </p><div class="variablelist"><dl class="variablelist"><dt id="COLLATION-MANAGING-PREDEFINED-ICU-DE-X-ICU"><span class="term"><code class="literal">de-x-icu</code></span> <a href="#COLLATION-MANAGING-PREDEFINED-ICU-DE-X-ICU" class="id_link">#</a></dt><dd><p>German collation, default variant</p></dd><dt id="COLLATION-MANAGING-PREDEFINED-ICU-DE-AT-X-ICU"><span class="term"><code class="literal">de-AT-x-icu</code></span> <a href="#COLLATION-MANAGING-PREDEFINED-ICU-DE-AT-X-ICU" class="id_link">#</a></dt><dd><p>German collation for Austria, default variant</p><p>
298 (There are also, say, <code class="literal">de-DE-x-icu</code>
299 or <code class="literal">de-CH-x-icu</code>, but as of this writing, they are
300 equivalent to <code class="literal">de-x-icu</code>.)
301 </p></dd><dt id="COLLATION-MANAGING-PREDEFINED-ICU-UND-X-ICU"><span class="term"><code class="literal">und-x-icu</code> (for <span class="quote">“<span class="quote">undefined</span>”</span>)</span> <a href="#COLLATION-MANAGING-PREDEFINED-ICU-UND-X-ICU" class="id_link">#</a></dt><dd><p>
302 ICU <span class="quote">“<span class="quote">root</span>”</span> collation. Use this to get a reasonable
303 language-agnostic sort order.
304 </p></dd></dl></div><p>
306 Some (less frequently used) encodings are not supported by ICU. When the
307 database encoding is one of these, ICU collation entries
308 in <code class="literal">pg_collation</code> are ignored. Attempting to use one
309 will draw an error along the lines of <span class="quote">“<span class="quote">collation "de-x-icu" for
310 encoding "WIN874" does not exist</span>”</span>.
311 </p></div></div><div class="sect3" id="COLLATION-CREATE"><div class="titlepage"><div><div><h4 class="title">23.2.2.3. Creating New Collation Objects <a href="#COLLATION-CREATE" class="id_link">#</a></h4></div></div></div><p>
312 If the standard and predefined collations are not sufficient, users can
313 create their own collation objects using the SQL
314 command <a class="xref" href="sql-createcollation.html" title="CREATE COLLATION"><span class="refentrytitle">CREATE COLLATION</span></a>.
316 The standard and predefined collations are in the
317 schema <code class="literal">pg_catalog</code>, like all predefined objects.
318 User-defined collations should be created in user schemas. This also
319 ensures that they are saved by <code class="command">pg_dump</code>.
320 </p><div class="sect4" id="COLLATION-MANAGING-CREATE-LIBC"><div class="titlepage"><div><div><h5 class="title">23.2.2.3.1. libc Collations <a href="#COLLATION-MANAGING-CREATE-LIBC" class="id_link">#</a></h5></div></div></div><p>
321 New libc collations can be created like this:
322 </p><pre class="programlisting">
323 CREATE COLLATION german (provider = libc, locale = 'de_DE');
325 The exact values that are acceptable for the <code class="literal">locale</code>
326 clause in this command depend on the operating system. On Unix-like
327 systems, the command <code class="literal">locale -a</code> will show a list.
329 Since the predefined libc collations already include all collations
330 defined in the operating system when the database instance is
331 initialized, it is not often necessary to manually create new ones.
332 Reasons might be if a different naming system is desired (in which case
333 see also <a class="xref" href="collation.html#COLLATION-COPY" title="23.2.2.3.3. Copying Collations">Section 23.2.2.3.3</a>) or if the operating system has
334 been upgraded to provide new locale definitions (in which case see
335 also <a class="link" href="functions-admin.html#FUNCTIONS-ADMIN-COLLATION" title="Table 9.104. Collation Management Functions"><code class="function">pg_import_system_collations()</code></a>).
336 </p></div><div class="sect4" id="COLLATION-MANAGING-CREATE-ICU"><div class="titlepage"><div><div><h5 class="title">23.2.2.3.2. ICU Collations <a href="#COLLATION-MANAGING-CREATE-ICU" class="id_link">#</a></h5></div></div></div><p>
337 ICU collations can be created like:
339 </p><pre class="programlisting">
340 CREATE COLLATION german (provider = icu, locale = 'de-DE');
343 ICU locales are specified as a BCP 47 <a class="link" href="locale.html#ICU-LANGUAGE-TAG" title="23.1.5.3. Language Tag">Language Tag</a>, but can also accept most
344 libc-style locale names. If possible, libc-style locale names are
345 transformed into language tags.
347 New ICU collations can customize collation behavior extensively by
348 including collation attributes in the language tag. See <a class="xref" href="collation.html#ICU-CUSTOM-COLLATIONS" title="23.2.3. ICU Custom Collations">Section 23.2.3</a> for details and examples.
349 </p></div><div class="sect4" id="COLLATION-COPY"><div class="titlepage"><div><div><h5 class="title">23.2.2.3.3. Copying Collations <a href="#COLLATION-COPY" class="id_link">#</a></h5></div></div></div><p>
350 The command <a class="xref" href="sql-createcollation.html" title="CREATE COLLATION"><span class="refentrytitle">CREATE COLLATION</span></a> can also be used to
351 create a new collation from an existing collation, which can be useful to
352 be able to use operating-system-independent collation names in
353 applications, create compatibility names, or use an ICU-provided collation
354 under a more readable name. For example:
355 </p><pre class="programlisting">
356 CREATE COLLATION german FROM "de_DE";
357 CREATE COLLATION french FROM "fr-x-icu";
359 </p></div></div><div class="sect3" id="COLLATION-NONDETERMINISTIC"><div class="titlepage"><div><div><h4 class="title">23.2.2.4. Nondeterministic Collations <a href="#COLLATION-NONDETERMINISTIC" class="id_link">#</a></h4></div></div></div><p>
360 A collation is either <em class="firstterm">deterministic</em> or
361 <em class="firstterm">nondeterministic</em>. A deterministic collation uses
362 deterministic comparisons, which means that it considers strings to be
363 equal only if they consist of the same byte sequence. Nondeterministic
364 comparison may determine strings to be equal even if they consist of
365 different bytes. Typical situations include case-insensitive comparison,
366 accent-insensitive comparison, as well as comparison of strings in
367 different Unicode normal forms. It is up to the collation provider to
368 actually implement such insensitive comparisons; the deterministic flag
369 only determines whether ties are to be broken using bytewise comparison.
370 See also <a class="ulink" href="https://www.unicode.org/reports/tr10" target="_top">Unicode Technical
371 Standard 10</a> for more information on the terminology.
373 To create a nondeterministic collation, specify the property
374 <code class="literal">deterministic = false</code> to <code class="command">CREATE
375 COLLATION</code>, for example:
376 </p><pre class="programlisting">
377 CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
379 This example would use the standard Unicode collation in a
380 nondeterministic way. In particular, this would allow strings in
381 different normal forms to be compared correctly. More interesting
382 examples make use of the ICU customization facilities explained above.
384 </p><pre class="programlisting">
385 CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
386 CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
389 All standard and predefined collations are deterministic, all
390 user-defined collations are deterministic by default. While
391 nondeterministic collations give a more <span class="quote">“<span class="quote">correct</span>”</span> behavior,
392 especially when considering the full power of Unicode and its many
393 special cases, they also have some drawbacks. Foremost, their use leads
394 to a performance penalty. Note, in particular, that B-tree cannot use
395 deduplication with indexes that use a nondeterministic collation. Also,
396 certain operations are not possible with nondeterministic collations,
397 such as some pattern matching operations. Therefore, they should be used
398 only in cases where they are specifically wanted.
399 </p><div class="tip"><h3 class="title">Tip</h3><p>
400 To deal with text in different Unicode normalization forms, it is also
401 an option to use the functions/expressions
402 <code class="function">normalize</code> and <code class="literal">is normalized</code> to
403 preprocess or check the strings, instead of using nondeterministic
404 collations. There are different trade-offs for each approach.
405 </p></div></div></div><div class="sect2" id="ICU-CUSTOM-COLLATIONS"><div class="titlepage"><div><div><h3 class="title">23.2.3. ICU Custom Collations <a href="#ICU-CUSTOM-COLLATIONS" class="id_link">#</a></h3></div></div></div><p>
406 ICU allows extensive control over collation behavior by defining new
407 collations with collation settings as a part of the language tag. These
408 settings can modify the collation order to suit a variety of needs. For
411 </p><pre class="programlisting">
412 -- ignore differences in accents and case
413 CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
414 SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
415 SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
417 -- upper case letters sort before lower case.
418 CREATE COLLATION upper_first (provider = icu, locale = 'und-u-kf-upper');
419 SELECT 'B' < 'b' COLLATE upper_first; -- true
421 -- treat digits numerically and ignore punctuation
422 CREATE COLLATION num_ignore_punct (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-kn');
423 SELECT 'id-45' < 'id-123' COLLATE num_ignore_punct; -- true
424 SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
427 Many of the available options are described in <a class="xref" href="collation.html#ICU-COLLATION-SETTINGS" title="23.2.3.2. Collation Settings for an ICU Locale">Section 23.2.3.2</a>, or see <a class="xref" href="collation.html#ICU-EXTERNAL-REFERENCES" title="23.2.3.5. External References for ICU">Section 23.2.3.5</a> for more details.
428 </p><div class="sect3" id="ICU-COLLATION-COMPARISON-LEVELS"><div class="titlepage"><div><div><h4 class="title">23.2.3.1. ICU Comparison Levels <a href="#ICU-COLLATION-COMPARISON-LEVELS" class="id_link">#</a></h4></div></div></div><p>
429 Comparison of two strings (collation) in ICU is determined by a
430 multi-level process, where textual features are grouped into
431 "levels". Treatment of each level is controlled by the <a class="link" href="collation.html#ICU-COLLATION-SETTINGS-TABLE" title="Table 23.2. ICU Collation Settings">collation settings</a>. Higher
432 levels correspond to finer textual features.
434 <a class="xref" href="collation.html#ICU-COLLATION-LEVELS" title="Table 23.1. ICU Collation Levels">Table 23.1</a> shows which textual feature
435 differences are considered significant when determining equality at the
436 given level. The Unicode character <code class="literal">U+2063</code> is an
437 invisible separator, and as seen in the table, is ignored for at all
438 levels of comparison less than <code class="literal">identic</code>.
439 </p><div class="table" id="ICU-COLLATION-LEVELS"><p class="title"><strong>Table 23.1. ICU Collation Levels</strong></p><div class="table-contents"><table class="table" summary="ICU Collation Levels" border="1"><colgroup><col class="col1" /><col class="col2" /><col class="col3" /><col class="col4" /><col class="col5" /><col class="col6" /><col class="col7" /><col class="col8" /></colgroup><thead><tr><th>Level</th><th>Description</th><th><code class="literal">'f' = 'f'</code></th><th><code class="literal">'ab' = U&'a\2063b'</code></th><th><code class="literal">'x-y' = 'x_y'</code></th><th><code class="literal">'g' = 'G'</code></th><th><code class="literal">'n' = 'ñ'</code></th><th><code class="literal">'y' = 'z'</code></th></tr></thead><tbody><tr><td>level1</td><td>Base Character</td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">false</code></td></tr><tr><td>level2</td><td>Accents</td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td></tr><tr><td>level3</td><td>Case/Variants</td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td></tr><tr><td>level4</td><td>Punctuation<a href="#ftn.id-1.6.10.4.6.3.4.2.10.4.2.1" class="footnote"><sup class="footnote" id="id-1.6.10.4.6.3.4.2.10.4.2.1">[a]</sup></a></td><td><code class="literal">true</code></td><td><code class="literal">true</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td></tr><tr><td>identic</td><td>All</td><td><code class="literal">true</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td><td><code class="literal">false</code></td></tr></tbody><tbody class="footnotes"><tr><td colspan="8"><div id="ftn.id-1.6.10.4.6.3.4.2.10.4.2.1" class="footnote"><p><a href="#id-1.6.10.4.6.3.4.2.10.4.2.1" class="para"><sup class="para">[a] </sup></a>only with
440 <code class="literal">ka-shifted</code>; see <a class="xref" href="collation.html#ICU-COLLATION-SETTINGS-TABLE" title="Table 23.2. ICU Collation Settings">Table 23.2</a></p></div></td></tr></tbody></table></div></div><br class="table-break" /><p>
441 At every level, even with full normalization off, basic normalization is
442 performed. For example, <code class="literal">'á'</code> may be composed of the
443 code points <code class="literal">U&'\0061\0301'</code> or the single code
444 point <code class="literal">U&'\00E1'</code>, and those sequences will be
445 considered equal even at the <code class="literal">identic</code> level. To treat
446 any difference in code point representation as distinct, use a collation
447 created with <code class="symbol">deterministic</code> set to
448 <code class="literal">true</code>.
449 </p><div class="sect4" id="ICU-COLLATION-LEVEL-EXAMPLES"><div class="titlepage"><div><div><h5 class="title">23.2.3.1.1. Collation Level Examples <a href="#ICU-COLLATION-LEVEL-EXAMPLES" class="id_link">#</a></h5></div></div></div><pre class="programlisting">
450 CREATE COLLATION level3 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level3');
451 CREATE COLLATION level4 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level4');
452 CREATE COLLATION identic (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-identic');
454 -- invisible separator ignored at all levels except identic
455 SELECT 'ab' = U&'a\2063b' COLLATE level4; -- true
456 SELECT 'ab' = U&'a\2063b' COLLATE identic; -- false
458 -- punctuation ignored at level3 but not at level 4
459 SELECT 'x-y' = 'x_y' COLLATE level3; -- true
460 SELECT 'x-y' = 'x_y' COLLATE level4; -- false
461 </pre></div></div><div class="sect3" id="ICU-COLLATION-SETTINGS"><div class="titlepage"><div><div><h4 class="title">23.2.3.2. Collation Settings for an ICU Locale <a href="#ICU-COLLATION-SETTINGS" class="id_link">#</a></h4></div></div></div><p>
462 <a class="xref" href="collation.html#ICU-COLLATION-SETTINGS-TABLE" title="Table 23.2. ICU Collation Settings">Table 23.2</a> shows the available
463 collation settings, which can be used as part of a language tag to
464 customize a collation.
465 </p><div class="table" id="ICU-COLLATION-SETTINGS-TABLE"><p class="title"><strong>Table 23.2. ICU Collation Settings</strong></p><div class="table-contents"><table class="table" summary="ICU Collation Settings" border="1"><colgroup><col class="col1" /><col class="col2" /><col class="col3" /><col class="col4" /></colgroup><thead><tr><th>Key</th><th>Values</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><code class="literal">co</code></td><td><code class="literal">emoji</code>, <code class="literal">phonebk</code>, <code class="literal">standard</code>, <em class="replaceable"><code>...</code></em></td><td><code class="literal">standard</code></td><td>
466 Collation type. See <a class="xref" href="collation.html#ICU-EXTERNAL-REFERENCES" title="23.2.3.5. External References for ICU">Section 23.2.3.5</a> for additional options and details.
467 </td></tr><tr><td><code class="literal">ka</code></td><td><code class="literal">noignore</code>, <code class="literal">shifted</code></td><td><code class="literal">noignore</code></td><td>
468 If set to <code class="literal">shifted</code>, causes some characters
469 (e.g. punctuation or space) to be ignored in comparison. Key
470 <code class="literal">ks</code> must be set to <code class="literal">level3</code> or
471 lower to take effect. Set key <code class="literal">kv</code> to control which
472 character classes are ignored.
473 </td></tr><tr><td><code class="literal">kb</code></td><td><code class="literal">true</code>, <code class="literal">false</code></td><td><code class="literal">false</code></td><td>
474 Backwards comparison for the level 2 differences. For example,
475 locale <code class="literal">und-u-kb</code> sorts <code class="literal">'àe'</code>
476 before <code class="literal">'aé'</code>.
477 </td></tr><tr><td><code class="literal">kc</code></td><td><code class="literal">true</code>, <code class="literal">false</code></td><td><code class="literal">false</code></td><td>
479 Separates case into a "level 2.5" that falls between accents and
480 other level 3 features.
483 If set to <code class="literal">true</code> and <code class="literal">ks</code> is set
484 to <code class="literal">level1</code>, will ignore accents but take case
487 </td></tr><tr><td><code class="literal">kf</code></td><td>
488 <code class="literal">upper</code>, <code class="literal">lower</code>,
489 <code class="literal">false</code>
490 </td><td><code class="literal">false</code></td><td>
491 If set to <code class="literal">upper</code>, upper case sorts before lower
492 case. If set to <code class="literal">lower</code>, lower case sorts before
493 upper case. If set to <code class="literal">false</code>, the sort depends on
494 the rules of the locale.
495 </td></tr><tr><td><code class="literal">kn</code></td><td><code class="literal">true</code>, <code class="literal">false</code></td><td><code class="literal">false</code></td><td>
496 If set to <code class="literal">true</code>, numbers within a string are
497 treated as a single numeric value rather than a sequence of
498 digits. For example, <code class="literal">'id-45'</code> sorts before
499 <code class="literal">'id-123'</code>.
500 </td></tr><tr><td><code class="literal">kk</code></td><td><code class="literal">true</code>, <code class="literal">false</code></td><td><code class="literal">false</code></td><td>
502 Enable full normalization; may affect performance. Basic
503 normalization is performed even when set to
504 <code class="literal">false</code>. Locales for languages that require full
505 normalization typically enable it by default.
508 Full normalization is important in some cases, such as when
509 multiple accents are applied to a single character. For example,
510 the code point sequences <code class="literal">U&'\0065\0323\0302'</code>
511 and <code class="literal">U&'\0065\0302\0323'</code> represent
512 an <code class="literal">e</code> with circumflex and dot-below accents
513 applied in different orders. With full normalization
514 on, these code point sequences are treated as equal; otherwise they
517 </td></tr><tr><td><code class="literal">kr</code></td><td>
518 <code class="literal">space</code>, <code class="literal">punct</code>,
519 <code class="literal">symbol</code>, <code class="literal">currency</code>,
520 <code class="literal">digit</code>, <em class="replaceable"><code>script-id</code></em>
523 Set to one or more of the valid values, or any BCP 47
524 <em class="replaceable"><code>script-id</code></em>, e.g. <code class="literal">latn</code>
525 ("Latin") or <code class="literal">grek</code> ("Greek"). Multiple values are
526 separated by "<code class="literal">-</code>".
529 Redefines the ordering of classes of characters; those characters
530 belonging to a class earlier in the list sort before characters
531 belonging to a class later in the list. For instance, the value
532 <code class="literal">digit-currency-space</code> (as part of a language tag
533 like <code class="literal">und-u-kr-digit-currency-space</code>) sorts
534 punctuation before digits and spaces.
536 </td></tr><tr><td><code class="literal">ks</code></td><td><code class="literal">level1</code>, <code class="literal">level2</code>, <code class="literal">level3</code>, <code class="literal">level4</code>, <code class="literal">identic</code></td><td><code class="literal">level3</code></td><td>
537 Sensitivity (or "strength") when determining equality, with
538 <code class="literal">level1</code> the least sensitive to differences and
539 <code class="literal">identic</code> the most sensitive to differences. See
540 <a class="xref" href="collation.html#ICU-COLLATION-LEVELS" title="Table 23.1. ICU Collation Levels">Table 23.1</a> for details.
541 </td></tr><tr><td><code class="literal">kv</code></td><td>
542 <code class="literal">space</code>, <code class="literal">punct</code>,
543 <code class="literal">symbol</code>, <code class="literal">currency</code>
544 </td><td><code class="literal">punct</code></td><td>
545 Classes of characters ignored during comparison at level 3. Setting
546 to a later value includes earlier values;
547 e.g. <code class="literal">symbol</code> also includes
548 <code class="literal">punct</code> and <code class="literal">space</code> in the
549 characters to be ignored. Key <code class="literal">ka</code> must be set to
550 <code class="literal">shifted</code> and key <code class="literal">ks</code> must be set
551 to <code class="literal">level3</code> or lower to take effect.
552 </td></tr></tbody></table></div></div><br class="table-break" /><p>
553 Defaults may depend on locale. The above table is not meant to be
554 complete. See <a class="xref" href="collation.html#ICU-EXTERNAL-REFERENCES" title="23.2.3.5. External References for ICU">Section 23.2.3.5</a> for additional
556 </p><div class="note"><h3 class="title">Note</h3><p>
557 For many collation settings, you must create the collation with
558 <code class="option">deterministic</code> set to <code class="literal">false</code> for the
559 setting to have the desired effect (see <a class="xref" href="collation.html#COLLATION-NONDETERMINISTIC" title="23.2.2.4. Nondeterministic Collations">Section 23.2.2.4</a>). Additionally, some settings
560 only take effect when the key <code class="literal">ka</code> is set to
561 <code class="literal">shifted</code> (see <a class="xref" href="collation.html#ICU-COLLATION-SETTINGS-TABLE" title="Table 23.2. ICU Collation Settings">Table 23.2</a>).
562 </p></div></div><div class="sect3" id="ICU-LOCALE-EXAMPLES"><div class="titlepage"><div><div><h4 class="title">23.2.3.3. Collation Settings Examples <a href="#ICU-LOCALE-EXAMPLES" class="id_link">#</a></h4></div></div></div><div class="variablelist"><dl class="variablelist"><dt id="COLLATION-MANAGING-CREATE-ICU-DE-U-CO-PHONEBK-X-ICU"><span class="term"><code class="literal">CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</code></span> <a href="#COLLATION-MANAGING-CREATE-ICU-DE-U-CO-PHONEBK-X-ICU" class="id_link">#</a></dt><dd><p>German collation with phone book collation type</p></dd><dt id="COLLATION-MANAGING-CREATE-ICU-UND-U-CO-EMOJI-X-ICU"><span class="term"><code class="literal">CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</code></span> <a href="#COLLATION-MANAGING-CREATE-ICU-UND-U-CO-EMOJI-X-ICU" class="id_link">#</a></dt><dd><p>
563 Root collation with Emoji collation type, per Unicode Technical Standard #51
564 </p></dd><dt id="COLLATION-MANAGING-CREATE-ICU-EN-U-KR-GREK-LATN"><span class="term"><code class="literal">CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');</code></span> <a href="#COLLATION-MANAGING-CREATE-ICU-EN-U-KR-GREK-LATN" class="id_link">#</a></dt><dd><p>
565 Sort Greek letters before Latin ones. (The default is Latin before Greek.)
566 </p></dd><dt id="COLLATION-MANAGING-CREATE-ICU-EN-U-KF-UPPER"><span class="term"><code class="literal">CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</code></span> <a href="#COLLATION-MANAGING-CREATE-ICU-EN-U-KF-UPPER" class="id_link">#</a></dt><dd><p>
567 Sort upper-case letters before lower-case letters. (The default is
568 lower-case letters first.)
569 </p></dd><dt id="COLLATION-MANAGING-CREATE-ICU-EN-U-KF-UPPER-KR-GREK-LATN"><span class="term"><code class="literal">CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</code></span> <a href="#COLLATION-MANAGING-CREATE-ICU-EN-U-KF-UPPER-KR-GREK-LATN" class="id_link">#</a></dt><dd><p>
570 Combines both of the above options.
571 </p></dd></dl></div></div><div class="sect3" id="ICU-TAILORING-RULES"><div class="titlepage"><div><div><h4 class="title">23.2.3.4. ICU Tailoring Rules <a href="#ICU-TAILORING-RULES" class="id_link">#</a></h4></div></div></div><p>
572 If the options provided by the collation settings shown above are not
573 sufficient, the order of collation elements can be changed with tailoring
574 rules, whose syntax is detailed at <a class="ulink" href="https://unicode-org.github.io/icu/userguide/collation/customization/" target="_top">https://unicode-org.github.io/icu/userguide/collation/customization/</a>.
576 This small example creates a collation based on the root locale with a
578 </p><pre class="programlisting">
579 CREATE COLLATION custom (provider = icu, locale = 'und', rules = '&V << w <<< W');
581 With this rule, the letter <span class="quote">“<span class="quote">W</span>”</span> is sorted after
582 <span class="quote">“<span class="quote">V</span>”</span>, but is treated as a secondary difference similar to an
583 accent. Rules like this are contained in the locale definitions of some
584 languages. (Of course, if a locale definition already contains the
585 desired rules, then they don't need to be specified again explicitly.)
587 Here is a more complex example. The following statement sets up a
588 collation named <code class="literal">ebcdic</code> with rules to sort US-ASCII
589 characters in the order of the EBCDIC encoding.
591 </p><pre class="programlisting">
592 CREATE COLLATION ebcdic (provider = icu, locale = 'und',
594 & ' ' < '.' < '<' < '(' < '+' < \|
595 < '&' < '!' < '$' < '*' < ')' < ';'
596 < '-' < '/' < ',' < '%' < '_' < '>' < '?'
597 < '`' < ':' < '#' < '@' < \' < '=' < '"'
598 <*a-r < '~' <*s-z < '^' < '[' < ']'
599 < '{' <*A-I < '}' <*J-R < '\' <*S-Z <*0-9
603 FROM (VALUES ('a'), ('b'), ('A'), ('B'), ('1'), ('2'), ('!'), ('^')) AS x(c)
604 ORDER BY c COLLATE ebcdic;
616 </p></div><div class="sect3" id="ICU-EXTERNAL-REFERENCES"><div class="titlepage"><div><div><h4 class="title">23.2.3.5. External References for ICU <a href="#ICU-EXTERNAL-REFERENCES" class="id_link">#</a></h4></div></div></div><p>
617 This section (<a class="xref" href="collation.html#ICU-CUSTOM-COLLATIONS" title="23.2.3. ICU Custom Collations">Section 23.2.3</a>) is only a brief
618 overview of ICU behavior and language tags. Refer to the following
619 documents for technical details, additional options, and new behavior:
620 </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
621 <a class="ulink" href="https://www.unicode.org/reports/tr35/tr35-collation.html" target="_top">Unicode Technical Standard #35</a>
622 </p></li><li class="listitem"><p>
623 <a class="ulink" href="https://www.rfc-editor.org/info/bcp47" target="_top">BCP 47</a>
624 </p></li><li class="listitem"><p>
625 <a class="ulink" href="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml" target="_top">CLDR repository</a>
626 </p></li><li class="listitem"><p>
627 <a class="ulink" href="https://unicode-org.github.io/icu/userguide/locale/" target="_top">https://unicode-org.github.io/icu/userguide/locale/</a>
628 </p></li><li class="listitem"><p>
629 <a class="ulink" href="https://unicode-org.github.io/icu/userguide/collation/" target="_top">https://unicode-org.github.io/icu/userguide/collation/</a>
630 </p></li></ul></div></div></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="locale.html" title="23.1. Locale Support">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="charset.html" title="Chapter 23. Localization">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="multibyte.html" title="23.3. Character Set Support">Next</a></td></tr><tr><td width="40%" align="left" valign="top">23.1. Locale Support </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 23.3. Character Set Support</td></tr></table></div></body></html>