6 23.1.3. Selecting Locales
7 23.1.4. Locale Providers
11 Locale support refers to an application respecting cultural preferences
12 regarding alphabets, sorting, number formatting, etc. PostgreSQL uses
13 the standard ISO C and POSIX locale facilities provided by the server
14 operating system. For additional information refer to the documentation
19 Locale support is automatically initialized when a database cluster is
20 created using initdb. initdb will initialize the database cluster with
21 the locale setting of its execution environment by default, so if your
22 system is already set to use the locale that you want in your database
23 cluster then there is nothing else you need to do. If you want to use a
24 different locale (or you are not sure which locale your system is set
25 to), you can instruct initdb exactly which locale to use by specifying
26 the --locale option. For example:
29 This example for Unix systems sets the locale to Swedish (sv) as spoken
30 in Sweden (SE). Other possibilities might include en_US (U.S. English)
31 and fr_CA (French Canadian). If more than one character set can be used
32 for a locale then the specifications can take the form
33 language_territory.codeset. For example, fr_BE.UTF-8 represents the
34 French language (fr) as spoken in Belgium (BE), with a UTF-8 character
37 What locales are available on your system under what names depends on
38 what was provided by the operating system vendor and what was
39 installed. On most Unix systems, the command locale -a will provide a
40 list of available locales. Windows uses more verbose locale names, such
41 as German_Germany or Swedish_Sweden.1252, but the principles are the
44 Occasionally it is useful to mix rules from several locales, e.g., use
45 English collation rules but Spanish messages. To support that, a set of
46 locale subcategories exist that control only certain aspects of the
48 LC_COLLATE String sort order
49 LC_CTYPE Character classification (What is a letter? Its upper-case
51 LC_MESSAGES Language of messages
52 LC_MONETARY Formatting of currency amounts
53 LC_NUMERIC Formatting of numbers
54 LC_TIME Formatting of dates and times
56 The category names translate into names of initdb options to override
57 the locale choice for a specific category. For instance, to set the
58 locale to French Canadian, but use U.S. rules for formatting currency,
59 use initdb --locale=fr_CA --lc-monetary=en_US.
61 If you want the system to behave as if it had no locale support, use
62 the special locale name C, or equivalently POSIX.
64 Some locale categories must have their values fixed when the database
65 is created. You can use different settings for different databases, but
66 once a database is created, you cannot change them for that database
67 anymore. LC_COLLATE and LC_CTYPE are these categories. They affect the
68 sort order of indexes, so they must be kept fixed, or indexes on text
69 columns would become corrupt. (But you can alleviate this restriction
70 using collations, as discussed in Section 23.2.) The default values for
71 these categories are determined when initdb is run, and those values
72 are used when new databases are created, unless specified otherwise in
73 the CREATE DATABASE command.
75 The other locale categories can be changed whenever desired by setting
76 the server configuration parameters that have the same name as the
77 locale categories (see Section 19.11.2 for details). The values that
78 are chosen by initdb are actually only written into the configuration
79 file postgresql.conf to serve as defaults when the server is started.
80 If you remove these assignments from postgresql.conf then the server
81 will inherit the settings from its execution environment.
83 Note that the locale behavior of the server is determined by the
84 environment variables seen by the server, not by the environment of any
85 client. Therefore, be careful to configure the correct locale settings
86 before starting the server. A consequence of this is that if client and
87 server are set up in different locales, messages might appear in
88 different languages depending on where they originated.
92 When we speak of inheriting the locale from the execution environment,
93 this means the following on most operating systems: For a given locale
94 category, say the collation, the following environment variables are
95 consulted in this order until one is found to be set: LC_ALL,
96 LC_COLLATE (or the variable corresponding to the respective category),
97 LANG. If none of these environment variables are set then the locale
100 Some message localization libraries also look at the environment
101 variable LANGUAGE which overrides all other locale settings for the
102 purpose of setting the language of messages. If in doubt, please refer
103 to the documentation of your operating system, in particular the
104 documentation about gettext.
106 To enable messages to be translated to the user's preferred language,
107 NLS must have been selected at build time (configure --enable-nls). All
108 other locale support is built in automatically.
112 The locale settings influence the following SQL features:
113 * Sort order in queries using ORDER BY or the standard comparison
114 operators on textual data
115 * The upper, lower, and initcap functions
116 * Pattern matching operators (LIKE, SIMILAR TO, and POSIX-style
117 regular expressions); locales affect both case insensitive matching
118 and the classification of characters by character-class regular
120 * The to_char family of functions
121 * The ability to use indexes with LIKE clauses
123 The drawback of using locales other than C or POSIX in PostgreSQL is
124 its performance impact. It slows character handling and prevents
125 ordinary indexes from being used by LIKE. For this reason use locales
126 only if you actually need them.
128 As a workaround to allow PostgreSQL to use indexes with LIKE clauses
129 under a non-C locale, several custom operator classes exist. These
130 allow the creation of an index that performs a strict
131 character-by-character comparison, ignoring locale comparison rules.
132 Refer to Section 11.10 for more information. Another approach is to
133 create indexes using the C collation, as discussed in Section 23.2.
135 23.1.3. Selecting Locales #
137 Locales can be selected in different scopes depending on requirements.
138 The above overview showed how locales are specified using initdb to set
139 the defaults for the entire cluster. The following list shows where
140 locales can be selected. Each item provides the defaults for the
141 subsequent items, and each lower item allows overriding the defaults on
143 1. As explained above, the environment of the operating system
144 provides the defaults for the locales of a newly initialized
145 database cluster. In many cases, this is enough: if the operating
146 system is configured for the desired language/territory, by default
147 PostgreSQL will also behave according to that locale.
148 2. As shown above, command-line options for initdb specify the locale
149 settings for a newly initialized database cluster. Use this if the
150 operating system does not have the locale configuration you want
151 for your database system.
152 3. A locale can be selected separately for each database. The SQL
153 command CREATE DATABASE and its command-line equivalent createdb
154 have options for that. Use this for example if a database cluster
155 houses databases for multiple tenants with different requirements.
156 4. Locale settings can be made for individual table columns. This uses
157 an SQL object called collation and is explained in Section 23.2.
158 Use this for example to sort data in different languages or
159 customize the sort order of a particular table.
160 5. Finally, locales can be selected for an individual query. Again,
161 this uses SQL collation objects. This could be used to change the
162 sort order based on run-time choices or for ad-hoc experimentation.
164 23.1.4. Locale Providers #
166 A locale provider specifies which library defines the locale behavior
167 for collations and character classifications.
169 The commands and tools that select the locale settings, as described
170 above, each have an option to select the locale provider. Here is an
171 example to initialize a database cluster using the ICU provider:
172 initdb --locale-provider=icu --icu-locale=en
174 See the description of the respective commands and programs for
175 details. Note that you can mix locale providers at different
176 granularities, for example use libc by default for the cluster but have
177 one database that uses the icu provider, and then have collation
178 objects using either provider within those databases.
180 Regardless of the locale provider, the operating system is still used
181 to provide some locale-aware behavior, such as messages (see
184 The available locale providers are listed below:
187 The builtin provider uses built-in operations. Only the C,
188 C.UTF-8, and PG_UNICODE_FAST locales are supported for this
191 The C locale behavior is identical to the C locale in the libc
192 provider. When using this locale, the behavior may depend on the
195 The C.UTF-8 locale is available only for when the database
196 encoding is UTF-8, and the behavior is based on Unicode. The
197 collation uses the code point values only. The regular
198 expression character classes are based on the "POSIX Compatible"
199 semantics, and the case mapping is the "simple" variant.
201 The PG_UNICODE_FAST locale is available only when the database
202 encoding is UTF-8, and the behavior is based on Unicode. The
203 collation uses the code point values only. The regular
204 expression character classes are based on the "Standard"
205 semantics, and the case mapping is the "full" variant.
208 The icu provider uses the external ICU library. PostgreSQL must
209 have been configured with support.
211 ICU provides collation and character classification behavior
212 that is independent of the operating system and database
213 encoding, which is preferable if you expect to transition to
214 other platforms without any change in results. LC_COLLATE and
215 LC_CTYPE can be set independently of the ICU locale.
219 For the ICU provider, results may depend on the version of the
220 ICU library used, as it is updated to reflect changes in natural
224 The libc provider uses the operating system's C library. The
225 collation and character classification behavior is controlled by
226 the settings LC_COLLATE and LC_CTYPE, so they cannot be set
231 The same locale name may have different behavior on different
232 platforms when using the libc provider.
234 23.1.5. ICU Locales #
236 23.1.5.1. ICU Locale Names #
238 The ICU format for the locale name is a Language Tag.
239 CREATE COLLATION mycollation1 (provider = icu, locale = 'ja-JP');
240 CREATE COLLATION mycollation2 (provider = icu, locale = 'fr');
242 23.1.5.2. Locale Canonicalization and Validation #
244 When defining a new ICU collation object or database with ICU as the
245 provider, the given locale name is transformed ("canonicalized") into a
246 language tag if not already in that form. For instance,
247 CREATE COLLATION mycollation3 (provider = icu, locale = 'en-US-u-kn-true');
248 NOTICE: using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
249 CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8');
250 NOTICE: using standard form "de-DE" for locale "de_DE.utf8"
252 If you see this notice, ensure that the provider and locale are the
253 expected result. For consistent results when using the ICU provider,
254 specify the canonical language tag instead of relying on the
257 A locale with no language name, or the special language name root, is
258 transformed to have the language und ("undefined").
260 ICU can transform most libc locale names, as well as some other
261 formats, into language tags for easier transition to ICU. If a libc
262 locale name is used in ICU, it may not have precisely the same behavior
265 If there is a problem interpreting the locale name, or if the locale
266 name represents a language or region that ICU does not recognize, you
267 will see the following warning:
268 CREATE COLLATION nonsense (provider = icu, locale = 'nonsense');
269 WARNING: ICU locale "nonsense" has unknown language "nonsense"
270 HINT: To disable ICU locale validation, set parameter icu_validation_level to D
274 icu_validation_level controls how the message is reported. Unless set
275 to ERROR, the collation will still be created, but the behavior may not
276 be what the user intended.
278 23.1.5.3. Language Tag #
280 A language tag, defined in BCP 47, is a standardized identifier used to
281 identify languages, regions, and other information about a locale.
283 Basic language tags are simply language-region; or even just language.
284 The language is a language code (e.g. fr for French), and region is a
285 region code (e.g. CA for Canada). Examples: ja-JP, de, or fr-CA.
287 Collation settings may be included in the language tag to customize
288 collation behavior. ICU allows extensive customization, such as
289 sensitivity (or insensitivity) to accents, case, and punctuation;
290 treatment of digits within text; and many other options to satisfy a
293 To include this additional collation information in a language tag,
294 append -u, which indicates there are additional collation settings,
295 followed by one or more -key-value pairs. The key is the key for a
296 collation setting and value is a valid value for that setting. For
297 boolean settings, the -key may be specified without a corresponding
298 -value, which implies a value of true.
300 For example, the language tag en-US-u-kn-ks-level2 means the locale
301 with the English language in the US region, with collation settings kn
302 set to true and ks set to level2. Those settings mean the collation
303 will be case-insensitive and treat a sequence of digits as a single
305 CREATE COLLATION mycollation5 (provider = icu, deterministic = false, locale = '
306 en-US-u-kn-ks-level2');
307 SELECT 'aB' = 'Ab' COLLATE mycollation5 as result;
313 SELECT 'N-45' < 'N-123' COLLATE mycollation5 as result;
319 See Section 23.2.3 for details and additional examples of using
320 language tags with custom collation information for the locale.
324 If locale support doesn't work according to the explanation above,
325 check that the locale support in your operating system is correctly
326 configured. To check what locales are installed on your system, you can
327 use the command locale -a if your operating system provides it.
329 Check that PostgreSQL is actually using the locale that you think it
330 is. The LC_COLLATE and LC_CTYPE settings are determined when a database
331 is created, and cannot be changed except by creating a new database.
332 Other locale settings including LC_MESSAGES and LC_MONETARY are
333 initially determined by the environment the server is started in, but
334 can be changed on-the-fly. You can check the active locale settings
335 using the SHOW command.
337 The directory src/test/locale in the source distribution contains a
338 test suite for PostgreSQL's locale support.
340 Client applications that handle server-side errors by parsing the text
341 of the error message will obviously have problems when the server's
342 messages are in a different language. Authors of such applications are
343 advised to make use of the error code scheme instead.
345 Maintaining catalogs of message translations requires the on-going
346 efforts of many volunteers that want to see PostgreSQL speak their
347 preferred language well. If messages in your language are currently not
348 available or not fully translated, your assistance would be
349 appreciated. If you want to help, refer to Chapter 56 or write to the
350 developers' mailing list.