begriffs open source - ai-pg/blob - full-docs/txt/collation.txt

   1
   2 23.2. Collation Support #
   3
   4    23.2.1. Concepts
   5    23.2.2. Managing Collations
   6    23.2.3. ICU Custom Collations
   7
   8    The collation feature allows specifying the sort order and character
   9    classification behavior of data per-column, or even per-operation. This
  10    alleviates the restriction that the LC_COLLATE and LC_CTYPE settings of
  11    a database cannot be changed after its creation.
  12
  13 23.2.1. Concepts #
  14
  15    Conceptually, every expression of a collatable data type has a
  16    collation. (The built-in collatable data types are text, varchar, and
  17    char. User-defined base types can also be marked collatable, and of
  18    course a domain over a collatable data type is collatable.) If the
  19    expression is a column reference, the collation of the expression is
  20    the defined collation of the column. If the expression is a constant,
  21    the collation is the default collation of the data type of the
  22    constant. The collation of a more complex expression is derived from
  23    the collations of its inputs, as described below.
  24
  25    The collation of an expression can be the “default” collation, which
  26    means the locale settings defined for the database. It is also possible
  27    for an expression's collation to be indeterminate. In such cases,
  28    ordering operations and other operations that need to know the
  29    collation will fail.
  30
  31    When the database system has to perform an ordering or a character
  32    classification, it uses the collation of the input expression. This
  33    happens, for example, with ORDER BY clauses and function or operator
  34    calls such as <. The collation to apply for an ORDER BY clause is
  35    simply the collation of the sort key. The collation to apply for a
  36    function or operator call is derived from the arguments, as described
  37    below. In addition to comparison operators, collations are taken into
  38    account by functions that convert between lower and upper case letters,
  39    such as lower, upper, and initcap; by pattern matching operators; and
  40    by to_char and related functions.
  41
  42    For a function or operator call, the collation that is derived by
  43    examining the argument collations is used at run time for performing
  44    the specified operation. If the result of the function or operator call
  45    is of a collatable data type, the collation is also used at parse time
  46    as the defined collation of the function or operator expression, in
  47    case there is a surrounding expression that requires knowledge of its
  48    collation.
  49
  50    The collation derivation of an expression can be implicit or explicit.
  51    This distinction affects how collations are combined when multiple
  52    different collations appear in an expression. An explicit collation
  53    derivation occurs when a COLLATE clause is used; all other collation
  54    derivations are implicit. When multiple collations need to be combined,
  55    for example in a function call, the following rules are used:
  56     1. If any input expression has an explicit collation derivation, then
  57        all explicitly derived collations among the input expressions must
  58        be the same, otherwise an error is raised. If any explicitly
  59        derived collation is present, that is the result of the collation
  60        combination.
  61     2. Otherwise, all input expressions must have the same implicit
  62        collation derivation or the default collation. If any non-default
  63        collation is present, that is the result of the collation
  64        combination. Otherwise, the result is the default collation.
  65     3. If there are conflicting non-default implicit collations among the
  66        input expressions, then the combination is deemed to have
  67        indeterminate collation. This is not an error condition unless the
  68        particular function being invoked requires knowledge of the
  69        collation it should apply. If it does, an error will be raised at
  70        run-time.
  71
  72    For example, consider this table definition:
  73 CREATE TABLE test1 (
  74     a text COLLATE "de_DE",
  75     b text COLLATE "es_ES",
  76     ...
  77 );
  78
  79    Then in
  80 SELECT a < 'foo' FROM test1;
  81
  82    the < comparison is performed according to de_DE rules, because the
  83    expression combines an implicitly derived collation with the default
  84    collation. But in
  85 SELECT a < ('foo' COLLATE "fr_FR") FROM test1;
  86
  87    the comparison is performed using fr_FR rules, because the explicit
  88    collation derivation overrides the implicit one. Furthermore, given
  89 SELECT a < b FROM test1;
  90
  91    the parser cannot determine which collation to apply, since the a and b
  92    columns have conflicting implicit collations. Since the < operator does
  93    need to know which collation to use, this will result in an error. The
  94    error can be resolved by attaching an explicit collation specifier to
  95    either input expression, thus:
  96 SELECT a < b COLLATE "de_DE" FROM test1;
  97
  98    or equivalently
  99 SELECT a COLLATE "de_DE" < b FROM test1;
 100
 101    On the other hand, the structurally similar case
 102 SELECT a || b FROM test1;
 103
 104    does not result in an error, because the || operator does not care
 105    about collations: its result is the same regardless of the collation.
 106
 107    The collation assigned to a function or operator's combined input
 108    expressions is also considered to apply to the function or operator's
 109    result, if the function or operator delivers a result of a collatable
 110    data type. So, in
 111 SELECT * FROM test1 ORDER BY a || 'foo';
 112
 113    the ordering will be done according to de_DE rules. But this query:
 114 SELECT * FROM test1 ORDER BY a || b;
 115
 116    results in an error, because even though the || operator doesn't need
 117    to know a collation, the ORDER BY clause does. As before, the conflict
 118    can be resolved with an explicit collation specifier:
 119 SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
 120
 121 23.2.2. Managing Collations #
 122
 123    A collation is an SQL schema object that maps an SQL name to locales
 124    provided by libraries installed in the operating system. A collation
 125    definition has a provider that specifies which library supplies the
 126    locale data. One standard provider name is libc, which uses the locales
 127    provided by the operating system C library. These are the locales used
 128    by most tools provided by the operating system. Another provider is
 129    icu, which uses the external ICU library. ICU locales can only be used
 130    if support for ICU was configured when PostgreSQL was built.
 131
 132    A collation object provided by libc maps to a combination of LC_COLLATE
 133    and LC_CTYPE settings, as accepted by the setlocale() system library
 134    call. (As the name would suggest, the main purpose of a collation is to
 135    set LC_COLLATE, which controls the sort order. But it is rarely
 136    necessary in practice to have an LC_CTYPE setting that is different
 137    from LC_COLLATE, so it is more convenient to collect these under one
 138    concept than to create another infrastructure for setting LC_CTYPE per
 139    expression.) Also, a libc collation is tied to a character set encoding
 140    (see Section 23.3). The same collation name may exist for different
 141    encodings.
 142
 143    A collation object provided by icu maps to a named collator provided by
 144    the ICU library. ICU does not support separate “collate” and “ctype”
 145    settings, so they are always the same. Also, ICU collations are
 146    independent of the encoding, so there is always only one ICU collation
 147    of a given name in a database.
 148
 149 23.2.2.1. Standard Collations #
 150
 151    On all platforms, the following collations are supported:
 152
 153    unicode
 154           This SQL standard collation sorts using the Unicode Collation
 155           Algorithm with the Default Unicode Collation Element Table. It
 156           is available in all encodings. ICU support is required to use
 157           this collation, and behavior may change if PostgreSQL is built
 158           with a different version of ICU. (This collation has the same
 159           behavior as the ICU root locale; see und-x-icu (for
 160           “undefined”).)
 161
 162    ucs_basic
 163           This SQL standard collation sorts using the Unicode code point
 164           values rather than natural language order, and only the ASCII
 165           letters “A” through “Z” are treated as letters. The behavior is
 166           efficient and stable across all versions. Only available for
 167           encoding UTF8. (This collation has the same behavior as the libc
 168           locale specification C in UTF8 encoding.)
 169
 170    pg_unicode_fast
 171           This collation sorts by Unicode code point values rather than
 172           natural language order. For the functions lower, initcap, and
 173           upper it uses Unicode full case mapping. For pattern matching
 174           (including regular expressions), it uses the Standard variant of
 175           Unicode Compatibility Properties. Behavior is efficient and
 176           stable within a Postgres major version. It is only available for
 177           encoding UTF8.
 178
 179    pg_c_utf8
 180           This collation sorts by Unicode code point values rather than
 181           natural language order. For the functions lower, initcap, and
 182           upper, it uses Unicode simple case mapping. For pattern matching
 183           (including regular expressions), it uses the POSIX Compatible
 184           variant of Unicode Compatibility Properties. Behavior is
 185           efficient and stable within a PostgreSQL major version. This
 186           collation is only available for encoding UTF8.
 187
 188    C (equivalent to POSIX)
 189           The C and POSIX collations are based on “traditional C”
 190           behavior. They sort by byte values rather than natural language
 191           order, and only the ASCII letters “A” through “Z” are treated as
 192           letters. The behavior is efficient and stable across all
 193           versions for a given database encoding, but behavior may vary
 194           between different database encodings.
 195
 196    default
 197           The default collation selects the locale specified at database
 198           creation time.
 199
 200    Additional collations may be available depending on operating system
 201    support. The efficiency and stability of these additional collations
 202    depend on the collation provider, the provider version, and the locale.
 203
 204 23.2.2.2. Predefined Collations #
 205
 206    If the operating system provides support for using multiple locales
 207    within a single program (newlocale and related functions), or if
 208    support for ICU is configured, then when a database cluster is
 209    initialized, initdb populates the system catalog pg_collation with
 210    collations based on all the locales it finds in the operating system at
 211    the time.
 212
 213    To inspect the currently available locales, use the query SELECT * FROM
 214    pg_collation, or the command \dOS+ in psql.
 215
 216 23.2.2.2.1. libc Collations #
 217
 218    For example, the operating system might provide a locale named
 219    de_DE.utf8. initdb would then create a collation named de_DE.utf8 for
 220    encoding UTF8 that has both LC_COLLATE and LC_CTYPE set to de_DE.utf8.
 221    It will also create a collation with the .utf8 tag stripped off the
 222    name. So you could also use the collation under the name de_DE, which
 223    is less cumbersome to write and makes the name less encoding-dependent.
 224    Note that, nevertheless, the initial set of collation names is
 225    platform-dependent.
 226
 227    The default set of collations provided by libc map directly to the
 228    locales installed in the operating system, which can be listed using
 229    the command locale -a. In case a libc collation is needed that has
 230    different values for LC_COLLATE and LC_CTYPE, or if new locales are
 231    installed in the operating system after the database system was
 232    initialized, then a new collation may be created using the CREATE
 233    COLLATION command. New operating system locales can also be imported en
 234    masse using the pg_import_system_collations() function.
 235
 236    Within any particular database, only collations that use that
 237    database's encoding are of interest. Other entries in pg_collation are
 238    ignored. Thus, a stripped collation name such as de_DE can be
 239    considered unique within a given database even though it would not be
 240    unique globally. Use of the stripped collation names is recommended,
 241    since it will make one fewer thing you need to change if you decide to
 242    change to another database encoding. Note however that the default, C,
 243    and POSIX collations can be used regardless of the database encoding.
 244
 245    PostgreSQL considers distinct collation objects to be incompatible even
 246    when they have identical properties. Thus for example,
 247 SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
 248
 249    will draw an error even though the C and POSIX collations have
 250    identical behaviors. Mixing stripped and non-stripped collation names
 251    is therefore not recommended.
 252
 253 23.2.2.2.2. ICU Collations #
 254
 255    With ICU, it is not sensible to enumerate all possible locale names.
 256    ICU uses a particular naming system for locales, but there are many
 257    more ways to name a locale than there are actually distinct locales.
 258    initdb uses the ICU APIs to extract a set of distinct locales to
 259    populate the initial set of collations. Collations provided by ICU are
 260    created in the SQL environment with names in BCP 47 language tag
 261    format, with a “private use” extension -x-icu appended, to distinguish
 262    them from libc locales.
 263
 264    Here are some example collations that might be created:
 265
 266    de-x-icu #
 267           German collation, default variant
 268
 269    de-AT-x-icu #
 270           German collation for Austria, default variant
 271
 272           (There are also, say, de-DE-x-icu or de-CH-x-icu, but as of this
 273           writing, they are equivalent to de-x-icu.)
 274
 275    und-x-icu (for “undefined”) #
 276           ICU “root” collation. Use this to get a reasonable
 277           language-agnostic sort order.
 278
 279    Some (less frequently used) encodings are not supported by ICU. When
 280    the database encoding is one of these, ICU collation entries in
 281    pg_collation are ignored. Attempting to use one will draw an error
 282    along the lines of “collation "de-x-icu" for encoding "WIN874" does not
 283    exist”.
 284
 285 23.2.2.3. Creating New Collation Objects #
 286
 287    If the standard and predefined collations are not sufficient, users can
 288    create their own collation objects using the SQL command CREATE
 289    COLLATION.
 290
 291    The standard and predefined collations are in the schema pg_catalog,
 292    like all predefined objects. User-defined collations should be created
 293    in user schemas. This also ensures that they are saved by pg_dump.
 294
 295 23.2.2.3.1. libc Collations #
 296
 297    New libc collations can be created like this:
 298 CREATE COLLATION german (provider = libc, locale = 'de_DE');
 299
 300    The exact values that are acceptable for the locale clause in this
 301    command depend on the operating system. On Unix-like systems, the
 302    command locale -a will show a list.
 303
 304    Since the predefined libc collations already include all collations
 305    defined in the operating system when the database instance is
 306    initialized, it is not often necessary to manually create new ones.
 307    Reasons might be if a different naming system is desired (in which case
 308    see also Section 23.2.2.3.3) or if the operating system has been
 309    upgraded to provide new locale definitions (in which case see also
 310    pg_import_system_collations()).
 311
 312 23.2.2.3.2. ICU Collations #
 313
 314    ICU collations can be created like:
 315 CREATE COLLATION german (provider = icu, locale = 'de-DE');
 316
 317    ICU locales are specified as a BCP 47 Language Tag, but can also accept
 318    most libc-style locale names. If possible, libc-style locale names are
 319    transformed into language tags.
 320
 321    New ICU collations can customize collation behavior extensively by
 322    including collation attributes in the language tag. See Section 23.2.3
 323    for details and examples.
 324
 325 23.2.2.3.3. Copying Collations #
 326
 327    The command CREATE COLLATION can also be used to create a new collation
 328    from an existing collation, which can be useful to be able to use
 329    operating-system-independent collation names in applications, create
 330    compatibility names, or use an ICU-provided collation under a more
 331    readable name. For example:
 332 CREATE COLLATION german FROM "de_DE";
 333 CREATE COLLATION french FROM "fr-x-icu";
 334
 335 23.2.2.4. Nondeterministic Collations #
 336
 337    A collation is either deterministic or nondeterministic. A
 338    deterministic collation uses deterministic comparisons, which means
 339    that it considers strings to be equal only if they consist of the same
 340    byte sequence. Nondeterministic comparison may determine strings to be
 341    equal even if they consist of different bytes. Typical situations
 342    include case-insensitive comparison, accent-insensitive comparison, as
 343    well as comparison of strings in different Unicode normal forms. It is
 344    up to the collation provider to actually implement such insensitive
 345    comparisons; the deterministic flag only determines whether ties are to
 346    be broken using bytewise comparison. See also Unicode Technical
 347    Standard 10 for more information on the terminology.
 348
 349    To create a nondeterministic collation, specify the property
 350    deterministic = false to CREATE COLLATION, for example:
 351 CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
 352
 353    This example would use the standard Unicode collation in a
 354    nondeterministic way. In particular, this would allow strings in
 355    different normal forms to be compared correctly. More interesting
 356    examples make use of the ICU customization facilities explained above.
 357    For example:
 358 CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', d
 359 eterministic = false);
 360 CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
 361 ue', deterministic = false);
 362
 363    All standard and predefined collations are deterministic, all
 364    user-defined collations are deterministic by default. While
 365    nondeterministic collations give a more “correct” behavior, especially
 366    when considering the full power of Unicode and its many special cases,
 367    they also have some drawbacks. Foremost, their use leads to a
 368    performance penalty. Note, in particular, that B-tree cannot use
 369    deduplication with indexes that use a nondeterministic collation. Also,
 370    certain operations are not possible with nondeterministic collations,
 371    such as some pattern matching operations. Therefore, they should be
 372    used only in cases where they are specifically wanted.
 373
 374 Tip
 375
 376    To deal with text in different Unicode normalization forms, it is also
 377    an option to use the functions/expressions normalize and is normalized
 378    to preprocess or check the strings, instead of using nondeterministic
 379    collations. There are different trade-offs for each approach.
 380
 381 23.2.3. ICU Custom Collations #
 382
 383    ICU allows extensive control over collation behavior by defining new
 384    collations with collation settings as a part of the language tag. These
 385    settings can modify the collation order to suit a variety of needs. For
 386    instance:
 387 -- ignore differences in accents and case
 388 CREATE COLLATION ignore_accent_case (provider = icu, deterministic = false, loca
 389 le = 'und-u-ks-level1');
 390 SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
 391 SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
 392
 393 -- upper case letters sort before lower case.
 394 CREATE COLLATION upper_first (provider = icu, locale = 'und-u-kf-upper');
 395 SELECT 'B' < 'b' COLLATE upper_first; -- true
 396
 397 -- treat digits numerically and ignore punctuation
 398 CREATE COLLATION num_ignore_punct (provider = icu, deterministic = false, locale
 399  = 'und-u-ka-shifted-kn');
 400 SELECT 'id-45' < 'id-123' COLLATE num_ignore_punct; -- true
 401 SELECT 'w;x*y-z' = 'wxyz' COLLATE num_ignore_punct; -- true
 402
 403    Many of the available options are described in Section 23.2.3.2, or see
 404    Section 23.2.3.5 for more details.
 405
 406 23.2.3.1. ICU Comparison Levels #
 407
 408    Comparison of two strings (collation) in ICU is determined by a
 409    multi-level process, where textual features are grouped into "levels".
 410    Treatment of each level is controlled by the collation settings. Higher
 411    levels correspond to finer textual features.
 412
 413    Table 23.1 shows which textual feature differences are considered
 414    significant when determining equality at the given level. The Unicode
 415    character U+2063 is an invisible separator, and as seen in the table,
 416    is ignored for at all levels of comparison less than identic.
 417
 418    Table 23.1. ICU Collation Levels
 419    Level Description 'f' = 'f' 'ab' = U&'a\2063b' 'x-y' = 'x_y' 'g' = 'G'
 420    'n' = 'ñ' 'y' = 'z'
 421    level1 Base Character true true true true true false
 422    level2 Accents true true true true false false
 423    level3 Case/Variants true true true false false false
 424    level4 Punctuation^[a] true true false false false false
 425    identic All true false false false false false
 426
 427    ^[a] only with ka-shifted; see Table 23.2
 428
 429    At every level, even with full normalization off, basic normalization
 430    is performed. For example, 'á' may be composed of the code points
 431    U&'\0061\0301' or the single code point U&'\00E1', and those sequences
 432    will be considered equal even at the identic level. To treat any
 433    difference in code point representation as distinct, use a collation
 434    created with deterministic set to true.
 435
 436 23.2.3.1.1. Collation Level Examples #
 437
 438 CREATE COLLATION level3 (provider = icu, deterministic = false, locale = 'und-u-
 439 ka-shifted-ks-level3');
 440 CREATE COLLATION level4 (provider = icu, deterministic = false, locale = 'und-u-
 441 ka-shifted-ks-level4');
 442 CREATE COLLATION identic (provider = icu, deterministic = false, locale = 'und-u
 443 -ka-shifted-ks-identic');
 444
 445 -- invisible separator ignored at all levels except identic
 446 SELECT 'ab' = U&'a\2063b' COLLATE level4; -- true
 447 SELECT 'ab' = U&'a\2063b' COLLATE identic; -- false
 448
 449 -- punctuation ignored at level3 but not at level 4
 450 SELECT 'x-y' = 'x_y' COLLATE level3; -- true
 451 SELECT 'x-y' = 'x_y' COLLATE level4; -- false
 452
 453 23.2.3.2. Collation Settings for an ICU Locale #
 454
 455    Table 23.2 shows the available collation settings, which can be used as
 456    part of a language tag to customize a collation.
 457
 458    Table 23.2. ICU Collation Settings
 459    Key Values Default Description
 460    co emoji, phonebk, standard, ... standard Collation type. See
 461    Section 23.2.3.5 for additional options and details.
 462    ka noignore, shifted noignore If set to shifted, causes some characters
 463    (e.g. punctuation or space) to be ignored in comparison. Key ks must be
 464    set to level3 or lower to take effect. Set key kv to control which
 465    character classes are ignored.
 466    kb true, false false Backwards comparison for the level 2 differences.
 467    For example, locale und-u-kb sorts 'àe' before 'aé'.
 468    kc true, false false
 469
 470    Separates case into a "level 2.5" that falls between accents and other
 471    level 3 features.
 472
 473    If set to true and ks is set to level1, will ignore accents but take
 474    case into account.
 475    kf upper, lower, false false If set to upper, upper case sorts before
 476    lower case. If set to lower, lower case sorts before upper case. If set
 477    to false, the sort depends on the rules of the locale.
 478    kn true, false false If set to true, numbers within a string are
 479    treated as a single numeric value rather than a sequence of digits. For
 480    example, 'id-45' sorts before 'id-123'.
 481    kk true, false false
 482
 483    Enable full normalization; may affect performance. Basic normalization
 484    is performed even when set to false. Locales for languages that require
 485    full normalization typically enable it by default.
 486
 487    Full normalization is important in some cases, such as when multiple
 488    accents are applied to a single character. For example, the code point
 489    sequences U&'\0065\0323\0302' and U&'\0065\0302\0323' represent an e
 490    with circumflex and dot-below accents applied in different orders. With
 491    full normalization on, these code point sequences are treated as equal;
 492    otherwise they are unequal.
 493    kr space, punct, symbol, currency, digit, script-id
 494
 495    Set to one or more of the valid values, or any BCP 47 script-id, e.g.
 496    latn ("Latin") or grek ("Greek"). Multiple values are separated by "-".
 497
 498    Redefines the ordering of classes of characters; those characters
 499    belonging to a class earlier in the list sort before characters
 500    belonging to a class later in the list. For instance, the value
 501    digit-currency-space (as part of a language tag like
 502    und-u-kr-digit-currency-space) sorts punctuation before digits and
 503    spaces.
 504    ks level1, level2, level3, level4, identic level3 Sensitivity (or
 505    "strength") when determining equality, with level1 the least sensitive
 506    to differences and identic the most sensitive to differences. See
 507    Table 23.1 for details.
 508    kv space, punct, symbol, currency punct Classes of characters ignored
 509    during comparison at level 3. Setting to a later value includes earlier
 510    values; e.g. symbol also includes punct and space in the characters to
 511    be ignored. Key ka must be set to shifted and key ks must be set to
 512    level3 or lower to take effect.
 513
 514    Defaults may depend on locale. The above table is not meant to be
 515    complete. See Section 23.2.3.5 for additional options and details.
 516
 517 Note
 518
 519    For many collation settings, you must create the collation with
 520    deterministic set to false for the setting to have the desired effect
 521    (see Section 23.2.2.4). Additionally, some settings only take effect
 522    when the key ka is set to shifted (see Table 23.2).
 523
 524 23.2.3.3. Collation Settings Examples #
 525
 526    CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale =
 527           'de-u-co-phonebk'); #
 528           German collation with phone book collation type
 529
 530    CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale =
 531           'und-u-co-emoji'); #
 532           Root collation with Emoji collation type, per Unicode Technical
 533           Standard #51
 534
 535    CREATE COLLATION latinlast (provider = icu, locale =
 536           'en-u-kr-grek-latn'); #
 537           Sort Greek letters before Latin ones. (The default is Latin
 538           before Greek.)
 539
 540    CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');
 541           #
 542           Sort upper-case letters before lower-case letters. (The default
 543           is lower-case letters first.)
 544
 545    CREATE COLLATION special (provider = icu, locale =
 546           'en-u-kf-upper-kr-grek-latn'); #
 547           Combines both of the above options.
 548
 549 23.2.3.4. ICU Tailoring Rules #
 550
 551    If the options provided by the collation settings shown above are not
 552    sufficient, the order of collation elements can be changed with
 553    tailoring rules, whose syntax is detailed at
 554    https://unicode-org.github.io/icu/userguide/collation/customization/.
 555
 556    This small example creates a collation based on the root locale with a
 557    tailoring rule:
 558 CREATE COLLATION custom (provider = icu, locale = 'und', rules = '&V << w <<< W'
 559 );
 560
 561    With this rule, the letter “W” is sorted after “V”, but is treated as a
 562    secondary difference similar to an accent. Rules like this are
 563    contained in the locale definitions of some languages. (Of course, if a
 564    locale definition already contains the desired rules, then they don't
 565    need to be specified again explicitly.)
 566
 567    Here is a more complex example. The following statement sets up a
 568    collation named ebcdic with rules to sort US-ASCII characters in the
 569    order of the EBCDIC encoding.
 570 CREATE COLLATION ebcdic (provider = icu, locale = 'und',
 571 rules = $$
 572 & ' ' < '.' < '<' < '(' < '+' < \|
 573 < '&' < '!' < '$' < '*' < ')' < ';'
 574 < '-' < '/' < ',' < '%' < '_' < '>' < '?'
 575 < '`' < ':' < '#' < '@' < \' < '=' < '"'
 576 <*a-r < '~' <*s-z < '^' < '[' < ']'
 577 < '{' <*A-I < '}' <*J-R < '\' <*S-Z <*0-9
 578 $$);
 579
 580 SELECT c
 581 FROM (VALUES ('a'), ('b'), ('A'), ('B'), ('1'), ('2'), ('!'), ('^')) AS x(c)
 582 ORDER BY c COLLATE ebcdic;
 583  c
 584 ---
 585  !
 586  a
 587  b
 588  ^
 589  A
 590  B
 591  1
 592  2
 593
 594 23.2.3.5. External References for ICU #
 595
 596    This section (Section 23.2.3) is only a brief overview of ICU behavior
 597    and language tags. Refer to the following documents for technical
 598    details, additional options, and new behavior:
 599      * Unicode Technical Standard #35
 600      * BCP 47
 601      * CLDR repository
 602      * https://unicode-org.github.io/icu/userguide/locale/
 603      * https://unicode-org.github.io/icu/userguide/collation/