23.3. Character Set Support

   23.3.1. Supported Character Sets
   23.3.2. Setting the Character Set
   23.3.3. Automatic Character Set Conversion Between Server and Client
   23.3.4. Available Character Set Conversions
   23.3.5. Further Reading

   The character set support in PostgreSQL allows you to store text in a
   variety of character sets (also called encodings), including
   single-byte character sets such as the ISO 8859 series and
   multiple-byte character sets such as EUC (Extended Unix Code), UTF-8,
   and Mule internal code. All supported character sets can be used
   transparently by clients, but a few are not supported for use within
   the server (that is, as a server-side encoding). The default character
   set is selected while initializing your PostgreSQL database cluster
   using initdb. It can be overridden when you create a database, so you
   can have multiple databases each with a different character set.

   An important restriction, however, is that each database's character
   set must be compatible with the database's LC_CTYPE (character
   classification) and LC_COLLATE (string sort order) locale settings. For
   C or POSIX locale, any character set is allowed, but for other
   libc-provided locales there is only one character set that will work
   correctly. (On Windows, however, UTF-8 encoding can be used with any
   locale.) If you have ICU support configured, ICU-provided locales can
   be used with most but not all server-side encodings.

23.3.1. Supported Character Sets

   Table 23.3 shows the character sets available for use in PostgreSQL.

   Table 23.3. PostgreSQL Character Sets
   Name            Description                        Language                        Server?  ICU?  Bytes/Char  Aliases
   BIG5            Big Five                           Traditional Chinese             No       No    1–2         WIN950, Windows950
   EUC_CN          Extended UNIX Code-CN              Simplified Chinese              Yes      Yes   1–3
   EUC_JP          Extended UNIX Code-JP              Japanese                        Yes      Yes   1–3
   EUC_JIS_2004    Extended UNIX Code-JP, JIS X 0213  Japanese                        Yes      No    1–3
   EUC_KR          Extended UNIX Code-KR              Korean                          Yes      Yes   1–3
   EUC_TW          Extended UNIX Code-TW              Traditional Chinese, Taiwanese  Yes      Yes   1–4
   GB18030         National Standard                  Chinese                         No       No    1–4
   GBK             Extended National Standard         Simplified Chinese              No       No    1–2         WIN936, Windows936
   ISO_8859_5      ISO 8859-5, ECMA 113               Latin/Cyrillic                  Yes      Yes   1
   ISO_8859_6      ISO 8859-6, ECMA 114               Latin/Arabic                    Yes      Yes   1
   ISO_8859_7      ISO 8859-7, ECMA 118               Latin/Greek                     Yes      Yes   1
   ISO_8859_8      ISO 8859-8, ECMA 121               Latin/Hebrew                    Yes      Yes   1
   JOHAB           JOHAB                              Korean (Hangul)                 No       No    1–3
   KOI8R           KOI8-R                             Cyrillic (Russian)              Yes      Yes   1           KOI8
   KOI8U           KOI8-U                             Cyrillic (Ukrainian)            Yes      Yes   1
   LATIN1          ISO 8859-1, ECMA 94                Western European                Yes      Yes   1           ISO88591
   LATIN2          ISO 8859-2, ECMA 94                Central European                Yes      Yes   1           ISO88592
   LATIN3          ISO 8859-3, ECMA 94                South European                  Yes      Yes   1           ISO88593
   LATIN4          ISO 8859-4, ECMA 94                North European                  Yes      Yes   1           ISO88594
   LATIN5          ISO 8859-9, ECMA 128               Turkish                         Yes      Yes   1           ISO88599
   LATIN6          ISO 8859-10, ECMA 144              Nordic                          Yes      Yes   1           ISO885910
   LATIN7          ISO 8859-13                        Baltic                          Yes      Yes   1           ISO885913
   LATIN8          ISO 8859-14                        Celtic                          Yes      Yes   1           ISO885914
   LATIN9          ISO 8859-15                        LATIN1 with Euro and accents    Yes      Yes   1           ISO885915
   LATIN10         ISO 8859-16, ASRO SR 14111         Romanian                        Yes      No    1           ISO885916
   MULE_INTERNAL   Mule internal code                 Multilingual Emacs              Yes      No    1–4
   SJIS            Shift JIS                          Japanese                        No       No    1–2         Mskanji, ShiftJIS, WIN932, Windows932
   SHIFT_JIS_2004  Shift JIS, JIS X 0213              Japanese                        No       No    1–2
   SQL_ASCII       unspecified (see text)             any                             Yes      No    1
   UHC             Unified Hangul Code                Korean                          No       No    1–2         WIN949, Windows949
   UTF8            Unicode, 8-bit                     all                             Yes      Yes   1–4         Unicode
   WIN866          Windows CP866                      Cyrillic                        Yes      Yes   1           ALT
   WIN874          Windows CP874                      Thai                            Yes      No    1
   WIN1250         Windows CP1250                     Central European                Yes      Yes   1
   WIN1251         Windows CP1251                     Cyrillic                        Yes      Yes   1           WIN
   WIN1252         Windows CP1252                     Western European                Yes      Yes   1
   WIN1253         Windows CP1253                     Greek                           Yes      Yes   1
   WIN1254         Windows CP1254                     Turkish                         Yes      Yes   1
   WIN1255         Windows CP1255                     Hebrew                          Yes      Yes   1
   WIN1256         Windows CP1256                     Arabic                          Yes      Yes   1
   WIN1257         Windows CP1257                     Baltic                          Yes      Yes   1
   WIN1258         Windows CP1258                     Vietnamese                      Yes      Yes   1           ABC, TCVN, TCVN5712, VSCII

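   The Bytes/Char column of Table 23.3 can be checked empirically. The
   following is a minimal sketch in Python. It uses Python's own codecs
   (utf-8, latin-1, euc_jp) as stand-ins for the PostgreSQL encodings
   UTF8, LATIN1, and EUC_JP; that correspondence is an assumption made
   for illustration, and the helper name encoded_length is hypothetical.

```python
def encoded_length(ch: str, codec: str) -> int:
    """Return how many bytes a single character occupies in the given codec."""
    return len(ch.encode(codec))


# UTF-8 is variable width: between 1 and 4 bytes per character.
utf8_lengths = [encoded_length(ch, "utf-8") for ch in ("A", "é", "€", "😀")]

# LATIN1 is a single-byte character set.
latin1_length = encoded_length("é", "latin-1")

# EUC_JP stores a JIS X 0208 kanji in two bytes.
eucjp_length = encoded_length("日", "euc_jp")

print(utf8_lengths, latin1_length, eucjp_length)
```

   The same character can therefore need a different amount of storage
   depending on the encoding chosen for the database.
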
   Not all client APIs support all the listed character sets. For example,
   the PostgreSQL JDBC driver does not support MULE_INTERNAL, LATIN6,
   LATIN8, and LATIN10.

   The SQL_ASCII setting behaves considerably differently from the other
   settings. When the server character set is SQL_ASCII, the server
   interprets byte values 0–127 according to the ASCII standard, while
   byte values 128–255 are taken as uninterpreted characters. No encoding
   conversion will be done when the setting is SQL_ASCII. Thus, this
   setting is not so much a declaration that a specific encoding is in
   use, as a declaration of ignorance about the encoding. In most cases,
   if you are working with any non-ASCII data, it is unwise to use the
   SQL_ASCII setting because PostgreSQL will be unable to help you by
   converting or validating non-ASCII characters.

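   To see why uninterpreted high bytes are risky, consider the sketch
   below. It is plain Python using Python's codecs, not PostgreSQL code,
   and the sample byte string is invented for illustration. It shows one
   byte sequence that reads as different text under two single-byte
   encodings and is not valid at all as UTF-8.

```python
raw = b"caf\xe9"  # three ASCII bytes followed by one byte in the 128-255 range

# Byte values 0-127 have a fixed meaning under ASCII.
ascii_part = raw[:3].decode("ascii")

# The byte 0xE9 means nothing until an encoding is declared.
as_latin1 = raw.decode("latin-1")  # LATIN1 reading
as_koi8r = raw.decode("koi8_r")    # KOI8-R reading of the very same bytes

# Under UTF-8 the sequence is malformed, so validation would reject it.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(ascii_part, as_latin1, as_koi8r, valid_utf8)
```

   With a declared encoding, this kind of ambiguity can be detected when
   the data is stored; with SQL_ASCII nothing checks the bytes.
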
23.3.2. Setting the Character Set

   initdb defines the default character set (encoding) for a PostgreSQL
   cluster. For example,
initdb -E EUC_JP

   sets the default character set to EUC_JP (Extended Unix Code for
   Japanese). You can use --encoding instead of -E if you prefer longer
   option strings. If no -E or --encoding option is given, initdb attempts
   to determine the appropriate encoding to use based on the specified or
   default locale.

   You can specify a non-default encoding at database creation time,
   provided that the encoding is compatible with the selected locale:
createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean

   This will create a database named korean that uses the character set
   EUC_KR, and locale ko_KR. Another way to accomplish this is to use this
   SQL command:
CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;

   Notice that the above commands specify copying the template0 database.
   When copying any other database, the encoding and locale settings
   cannot be changed from those of the source database, because that might
   result in corrupt data. For more information see Section 22.3.

   The encoding for a database is stored in the system catalog
   pg_database. You can see it by using the psql -l option or the \l
   command.
$ psql -l
                                         List of databases
   Name    |  Owner   | Encoding  |  Collation  |    Ctype    |          Access Privileges
-----------+----------+-----------+-------------+-------------+-------------------------------------
 clocaledb | hlinnaka | SQL_ASCII | C           | C           |
 englishdb | hlinnaka | UTF8      | en_GB.UTF8  | en_GB.UTF8  |
 japanese  | hlinnaka | UTF8      | ja_JP.UTF8  | ja_JP.UTF8  |
 korean    | hlinnaka | EUC_KR    | ko_KR.euckr | ko_KR.euckr |
 postgres  | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  |
 template0 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
 template1 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
(7 rows)

Important

   On most modern operating systems, PostgreSQL can determine which
   character set is implied by the LC_CTYPE setting, and it will enforce
   that only the matching database encoding is used. On older systems it
   is your responsibility to ensure that you use the encoding expected by
   the locale you have selected. A mistake in this area is likely to lead
   to strange behavior of locale-dependent operations such as sorting.

   PostgreSQL will allow superusers to create databases with SQL_ASCII
   encoding even when LC_CTYPE is not C or POSIX. As noted above,
   SQL_ASCII does not enforce that the data stored in the database has any
   particular encoding, and so this choice poses risks of locale-dependent
   misbehavior. Using this combination of settings is deprecated and may
   someday be forbidden altogether.

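   The compatibility rule between locale and encoding can be pictured
   with a small sketch. This is a hypothetical Python model, not how the
   server performs the check: it simply parses the codeset suffix of a
   locale name, and the mapping table, function names, and the decision
   to treat unknown codesets as unrestricted are all assumptions made
   for illustration. Only the locale names come from the examples above.

```python
# Assumed mapping from a locale's codeset suffix to a PostgreSQL encoding
# name. It covers only the codesets that appear in this section.
CODESET_TO_ENCODING = {"euckr": "EUC_KR", "utf8": "UTF8"}


def implied_encoding(locale_name):
    """Return the encoding a locale name implies, or None if unrestricted/unknown."""
    if locale_name in ("C", "POSIX"):
        return None  # any character set is allowed with these locales
    _, _, codeset = locale_name.partition(".")
    key = codeset.lower().replace("-", "")
    return CODESET_TO_ENCODING.get(key)


def is_compatible(encoding, lc_ctype):
    """True if the encoding matches what the locale implies (or nothing is implied)."""
    implied = implied_encoding(lc_ctype)
    return implied is None or implied == encoding


print(is_compatible("EUC_KR", "ko_KR.euckr"))
print(is_compatible("UTF8", "ko_KR.euckr"))
```

   In this model the korean database from the earlier example passes the
   check, while pairing UTF8 with ko_KR.euckr does not.
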
23.3.3. Automatic Character Set Conversion Between Server and Client

   PostgreSQL supports automatic character set conversion between server
   and client for many combinations of character sets (Section 23.3.4
   shows which ones).

   To enable automatic character set conversion, you have to tell
   PostgreSQL the character set (encoding) you would like to use in the
   client. There are several ways to accomplish this:
     * Using the \encoding command in psql. \encoding allows you to change
       client encoding on the fly. For example, to change the encoding to
       SJIS, type:
\encoding SJIS

     * libpq (Section 32.11) has functions to control the client encoding.
     * Using SET client_encoding TO. Setting the client encoding can be
       done with this SQL command:
SET CLIENT_ENCODING TO 'value';

       You can also use the standard SQL syntax SET NAMES for this
       purpose:
SET NAMES 'value';

       To query the current client encoding:
SHOW client_encoding;

       To return to the default encoding:
RESET client_encoding;

     * Using PGCLIENTENCODING. If the environment variable
       PGCLIENTENCODING is defined in the client's environment, that
       client encoding is automatically selected when a connection to the
       server is made. (This can subsequently be overridden using any of
       the other methods mentioned above.)
     * Using the configuration variable client_encoding. If the
       client_encoding variable is set, that client encoding is
       automatically selected when a connection to the server is made.
       (This can subsequently be overridden using any of the other methods
       mentioned above.)

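   How these methods interact can be pictured with a small sketch. This
   is a hypothetical Python model, not libpq code: the function name and
   its arguments are invented for illustration, and ranking
   PGCLIENTENCODING above the client_encoding configuration variable is
   an assumption that the list above does not spell out. What the list
   does state is that an explicit session-level setting overrides both
   connection-time sources.

```python
def effective_client_encoding(config_default, environ, session_value=None):
    """Model which client encoding is in effect for a session."""
    # 1. An explicit SET client_encoding, SET NAMES, or \encoding wins.
    if session_value is not None:
        return session_value
    # 2. Otherwise use PGCLIENTENCODING if it was set at connection time
    #    (assumed here to take priority over the configuration variable).
    env_value = environ.get("PGCLIENTENCODING")
    if env_value:
        return env_value
    # 3. Otherwise fall back to the client_encoding configuration variable.
    return config_default


print(effective_client_encoding("UTF8", {}))
print(effective_client_encoding("UTF8", {"PGCLIENTENCODING": "SJIS"}))
print(effective_client_encoding("UTF8", {"PGCLIENTENCODING": "SJIS"}, "LATIN1"))
```
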
   If a particular character cannot be converted, an error is reported.
   For example, suppose you chose EUC_JP for the server and LATIN1 for
   the client, and some Japanese characters are returned that have no
   representation in LATIN1.

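   The failure case can be reproduced outside the database. The sketch
   below is plain Python using Python's euc_jp and latin-1 codecs as
   stand-ins for EUC_JP and LATIN1 (an assumption for illustration; the
   server's own error message and behavior are not shown here).

```python
def convert(data: bytes, source: str, destination: str) -> bytes:
    """Re-encode bytes from one codec to another, raising if a character has no mapping."""
    return data.decode(source).encode(destination)


server_bytes = "日本語".encode("euc_jp")  # text as it would be stored under EUC_JP

try:
    convert(server_bytes, "euc_jp", "latin-1")
    converted = True
except UnicodeEncodeError:
    # Japanese characters have no representation in LATIN1.
    converted = False

# Pure ASCII text survives the same conversion unchanged.
ascii_result = convert("abc".encode("euc_jp"), "euc_jp", "latin-1")

print(converted, ascii_result)
```
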
   If the client character set is defined as SQL_ASCII, encoding
   conversion is disabled, regardless of the server's character set.
   (However, if the server's character set is not SQL_ASCII, the server
   will still check that incoming data is valid for that encoding; so the
   net effect is as though the client character set were the same as the
   server's.) Just as for the server, use of SQL_ASCII is unwise unless
   you are working with all-ASCII data.

23.3.4. Available Character Set Conversions

   PostgreSQL allows conversion between any two character sets for which a
   conversion function is listed in the pg_conversion system catalog.
   PostgreSQL comes with some predefined conversions, as summarized in
   Table 23.4 and shown in more detail in Table 23.5. You can create a new
   conversion using the SQL command CREATE CONVERSION. (To be used for
   automatic client/server conversions, a conversion must be marked as
   “default” for its character set pair.)

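   The idea that a conversion exists only when its (source, destination)
   pair is listed can be modeled as a dictionary lookup. This is a
   minimal Python sketch, not a query against pg_conversion: the
   dictionary holds a handful of pairs copied from the built-in
   conversions in Table 23.5, and the function names are hypothetical.

```python
# A few (source, destination) pairs taken from the built-in conversions.
BUILTIN_CONVERSIONS = {
    ("EUC_JP", "SJIS"): "euc_jp_to_sjis",
    ("SJIS", "EUC_JP"): "sjis_to_euc_jp",
    ("EUC_JP", "UTF8"): "euc_jp_to_utf8",
    ("UTF8", "EUC_JP"): "utf8_to_euc_jp",
    ("LATIN1", "UTF8"): "iso_8859_1_to_utf8",
    ("UTF8", "LATIN1"): "utf8_to_iso_8859_1",
}


def find_conversion(source, destination):
    """Return the conversion name for a pair, or None if no conversion is listed."""
    return BUILTIN_CONVERSIONS.get((source, destination))


def can_convert_both_ways(server, client):
    """Client/server traffic needs a conversion in each direction."""
    if server == client:
        return True  # identical encodings need no conversion
    return (find_conversion(server, client) is not None
            and find_conversion(client, server) is not None)


print(find_conversion("EUC_JP", "SJIS"))
print(can_convert_both_ways("EUC_JP", "LATIN1"))
```

   In this model an EUC_JP server can talk to an SJIS client, but not to
   a LATIN1 client, which agrees with the EUC_JP row of Table 23.4.
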
   Table 23.4. Built-in Client/Server Character Set Conversions
   Server Character Set  Available Client Character Sets
   BIG5                  not supported as a server encoding
   EUC_CN                EUC_CN, MULE_INTERNAL, UTF8
   EUC_JP                EUC_JP, MULE_INTERNAL, SJIS, UTF8
   EUC_JIS_2004          EUC_JIS_2004, SHIFT_JIS_2004, UTF8
   EUC_KR                EUC_KR, MULE_INTERNAL, UTF8
   EUC_TW                EUC_TW, BIG5, MULE_INTERNAL, UTF8
   GB18030               not supported as a server encoding
   GBK                   not supported as a server encoding
   ISO_8859_5            ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN866, WIN1251
   ISO_8859_6            ISO_8859_6, UTF8
   ISO_8859_7            ISO_8859_7, UTF8
   ISO_8859_8            ISO_8859_8, UTF8
   JOHAB                 not supported as a server encoding
   KOI8R                 KOI8R, ISO_8859_5, MULE_INTERNAL, UTF8, WIN866, WIN1251
   KOI8U                 KOI8U, UTF8
   LATIN1                LATIN1, MULE_INTERNAL, UTF8
   LATIN2                LATIN2, MULE_INTERNAL, UTF8, WIN1250
   LATIN3                LATIN3, MULE_INTERNAL, UTF8
   LATIN4                LATIN4, MULE_INTERNAL, UTF8
   LATIN5                LATIN5, UTF8
   LATIN6                LATIN6, UTF8
   LATIN7                LATIN7, UTF8
   LATIN8                LATIN8, UTF8
   LATIN9                LATIN9, UTF8
   LATIN10               LATIN10, UTF8
   MULE_INTERNAL         MULE_INTERNAL, BIG5, EUC_CN, EUC_JP, EUC_KR, EUC_TW,
                         ISO_8859_5, KOI8R, LATIN1 to LATIN4, SJIS, WIN866,
                         WIN1250, WIN1251
   SJIS                  not supported as a server encoding
   SHIFT_JIS_2004        not supported as a server encoding
   SQL_ASCII             any (no conversion will be performed)
   UHC                   not supported as a server encoding
   UTF8                  all supported encodings
   WIN866                WIN866, ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN1251
   WIN874                WIN874, UTF8
   WIN1250               WIN1250, LATIN2, MULE_INTERNAL, UTF8
   WIN1251               WIN1251, ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN866
   WIN1252               WIN1252, UTF8
   WIN1253               WIN1253, UTF8
   WIN1254               WIN1254, UTF8
   WIN1255               WIN1255, UTF8
   WIN1256               WIN1256, UTF8
   WIN1257               WIN1257, UTF8
   WIN1258               WIN1258, UTF8

   Table 23.5. All Built-in Character Set Conversions
   Conversion Name ^[a]           Source Encoding Destination Encoding
   big5_to_euc_tw                 BIG5            EUC_TW
   big5_to_mic                    BIG5            MULE_INTERNAL
   big5_to_utf8                   BIG5            UTF8
   euc_cn_to_mic                  EUC_CN          MULE_INTERNAL
   euc_cn_to_utf8                 EUC_CN          UTF8
   euc_jp_to_mic                  EUC_JP          MULE_INTERNAL
   euc_jp_to_sjis                 EUC_JP          SJIS
   euc_jp_to_utf8                 EUC_JP          UTF8
   euc_kr_to_mic                  EUC_KR          MULE_INTERNAL
   euc_kr_to_utf8                 EUC_KR          UTF8
   euc_tw_to_big5                 EUC_TW          BIG5
   euc_tw_to_mic                  EUC_TW          MULE_INTERNAL
   euc_tw_to_utf8                 EUC_TW          UTF8
   gb18030_to_utf8                GB18030         UTF8
   gbk_to_utf8                    GBK             UTF8
   iso_8859_10_to_utf8            LATIN6          UTF8
   iso_8859_13_to_utf8            LATIN7          UTF8
   iso_8859_14_to_utf8            LATIN8          UTF8
   iso_8859_15_to_utf8            LATIN9          UTF8
   iso_8859_16_to_utf8            LATIN10         UTF8
   iso_8859_1_to_mic              LATIN1          MULE_INTERNAL
   iso_8859_1_to_utf8             LATIN1          UTF8
   iso_8859_2_to_mic              LATIN2          MULE_INTERNAL
   iso_8859_2_to_utf8             LATIN2          UTF8
   iso_8859_2_to_windows_1250     LATIN2          WIN1250
   iso_8859_3_to_mic              LATIN3          MULE_INTERNAL
   iso_8859_3_to_utf8             LATIN3          UTF8
   iso_8859_4_to_mic              LATIN4          MULE_INTERNAL
   iso_8859_4_to_utf8             LATIN4          UTF8
   iso_8859_5_to_koi8_r           ISO_8859_5      KOI8R
   iso_8859_5_to_mic              ISO_8859_5      MULE_INTERNAL
   iso_8859_5_to_utf8             ISO_8859_5      UTF8
   iso_8859_5_to_windows_1251     ISO_8859_5      WIN1251
   iso_8859_5_to_windows_866      ISO_8859_5      WIN866
   iso_8859_6_to_utf8             ISO_8859_6      UTF8
   iso_8859_7_to_utf8             ISO_8859_7      UTF8
   iso_8859_8_to_utf8             ISO_8859_8      UTF8
   iso_8859_9_to_utf8             LATIN5          UTF8
   johab_to_utf8                  JOHAB           UTF8
   koi8_r_to_iso_8859_5           KOI8R           ISO_8859_5
   koi8_r_to_mic                  KOI8R           MULE_INTERNAL
   koi8_r_to_utf8                 KOI8R           UTF8
   koi8_r_to_windows_1251         KOI8R           WIN1251
   koi8_r_to_windows_866          KOI8R           WIN866
   koi8_u_to_utf8                 KOI8U           UTF8
   mic_to_big5                    MULE_INTERNAL   BIG5
   mic_to_euc_cn                  MULE_INTERNAL   EUC_CN
   mic_to_euc_jp                  MULE_INTERNAL   EUC_JP
   mic_to_euc_kr                  MULE_INTERNAL   EUC_KR
   mic_to_euc_tw                  MULE_INTERNAL   EUC_TW
   mic_to_iso_8859_1              MULE_INTERNAL   LATIN1
   mic_to_iso_8859_2              MULE_INTERNAL   LATIN2
   mic_to_iso_8859_3              MULE_INTERNAL   LATIN3
   mic_to_iso_8859_4              MULE_INTERNAL   LATIN4
   mic_to_iso_8859_5              MULE_INTERNAL   ISO_8859_5
   mic_to_koi8_r                  MULE_INTERNAL   KOI8R
   mic_to_sjis                    MULE_INTERNAL   SJIS
   mic_to_windows_1250            MULE_INTERNAL   WIN1250
   mic_to_windows_1251            MULE_INTERNAL   WIN1251
   mic_to_windows_866             MULE_INTERNAL   WIN866
   sjis_to_euc_jp                 SJIS            EUC_JP
   sjis_to_mic                    SJIS            MULE_INTERNAL
   sjis_to_utf8                   SJIS            UTF8
   windows_1258_to_utf8           WIN1258         UTF8
   uhc_to_utf8                    UHC             UTF8
   utf8_to_big5                   UTF8            BIG5
   utf8_to_euc_cn                 UTF8            EUC_CN
   utf8_to_euc_jp                 UTF8            EUC_JP
   utf8_to_euc_kr                 UTF8            EUC_KR
   utf8_to_euc_tw                 UTF8            EUC_TW
   utf8_to_gb18030                UTF8            GB18030
   utf8_to_gbk                    UTF8            GBK
   utf8_to_iso_8859_1             UTF8            LATIN1
   utf8_to_iso_8859_10            UTF8            LATIN6
   utf8_to_iso_8859_13            UTF8            LATIN7
   utf8_to_iso_8859_14            UTF8            LATIN8
   utf8_to_iso_8859_15            UTF8            LATIN9
   utf8_to_iso_8859_16            UTF8            LATIN10
   utf8_to_iso_8859_2             UTF8            LATIN2
   utf8_to_iso_8859_3             UTF8            LATIN3
   utf8_to_iso_8859_4             UTF8            LATIN4
   utf8_to_iso_8859_5             UTF8            ISO_8859_5
   utf8_to_iso_8859_6             UTF8            ISO_8859_6
   utf8_to_iso_8859_7             UTF8            ISO_8859_7
   utf8_to_iso_8859_8             UTF8            ISO_8859_8
   utf8_to_iso_8859_9             UTF8            LATIN5
   utf8_to_johab                  UTF8            JOHAB
   utf8_to_koi8_r                 UTF8            KOI8R
   utf8_to_koi8_u                 UTF8            KOI8U
   utf8_to_sjis                   UTF8            SJIS
   utf8_to_windows_1258           UTF8            WIN1258
   utf8_to_uhc                    UTF8            UHC
   utf8_to_windows_1250           UTF8            WIN1250
   utf8_to_windows_1251           UTF8            WIN1251
   utf8_to_windows_1252           UTF8            WIN1252
   utf8_to_windows_1253           UTF8            WIN1253
   utf8_to_windows_1254           UTF8            WIN1254
   utf8_to_windows_1255           UTF8            WIN1255
   utf8_to_windows_1256           UTF8            WIN1256
   utf8_to_windows_1257           UTF8            WIN1257
   utf8_to_windows_866            UTF8            WIN866
   utf8_to_windows_874            UTF8            WIN874
   windows_1250_to_iso_8859_2     WIN1250         LATIN2
   windows_1250_to_mic            WIN1250         MULE_INTERNAL
   windows_1250_to_utf8           WIN1250         UTF8
   windows_1251_to_iso_8859_5     WIN1251         ISO_8859_5
   windows_1251_to_koi8_r         WIN1251         KOI8R
   windows_1251_to_mic            WIN1251         MULE_INTERNAL
   windows_1251_to_utf8           WIN1251         UTF8
   windows_1251_to_windows_866    WIN1251         WIN866
   windows_1252_to_utf8           WIN1252         UTF8
   windows_1256_to_utf8           WIN1256         UTF8
   windows_866_to_iso_8859_5      WIN866          ISO_8859_5
   windows_866_to_koi8_r          WIN866          KOI8R
   windows_866_to_mic             WIN866          MULE_INTERNAL
   windows_866_to_utf8            WIN866          UTF8
   windows_866_to_windows_1251    WIN866          WIN1251
   windows_874_to_utf8            WIN874          UTF8
   euc_jis_2004_to_utf8           EUC_JIS_2004    UTF8
   utf8_to_euc_jis_2004           UTF8            EUC_JIS_2004
   shift_jis_2004_to_utf8         SHIFT_JIS_2004  UTF8
   utf8_to_shift_jis_2004         UTF8            SHIFT_JIS_2004
   euc_jis_2004_to_shift_jis_2004 EUC_JIS_2004    SHIFT_JIS_2004
   shift_jis_2004_to_euc_jis_2004 SHIFT_JIS_2004  EUC_JIS_2004

   ^[a] The conversion names follow a standard naming scheme: the official
   name of the source encoding with all non-alphanumeric characters
   replaced by underscores, followed by _to_, followed by the similarly
   processed destination encoding name. Therefore, these names sometimes
   deviate from the customary encoding names shown in Table 23.3.

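   The naming scheme in the footnote is mechanical enough to sketch in
   a few lines of Python. The function name is hypothetical, the
   spellings passed in as "official names" are assumptions chosen so the
   results match entries in Table 23.5, and the lowercasing step is
   inferred from the table rather than stated in the footnote.

```python
import re


def conversion_name(source_official: str, destination_official: str) -> str:
    """Build a conversion name from two official encoding names."""

    def normalize(name: str) -> str:
        # Replace every non-alphanumeric character with an underscore,
        # then lowercase (the lowercasing is an assumption based on the table).
        return re.sub(r"[^0-9A-Za-z]", "_", name).lower()

    return f"{normalize(source_official)}_to_{normalize(destination_official)}"


print(conversion_name("KOI8-R", "ISO 8859-5"))
print(conversion_name("UTF8", "Windows 1251"))
```

   This also shows why LATIN6 appears as iso_8859_10 in conversion names:
   the name is derived from the official ISO designation, not from the
   customary PostgreSQL alias.
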
23.3.5. Further Reading

   These are good sources to start learning about various kinds of
   encoding systems.

   CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing
          Contains detailed explanations of EUC_JP, EUC_CN, EUC_KR,
          EUC_TW.

   https://www.unicode.org/
          The web site of the Unicode Consortium.

   RFC 3629
          UTF-8 (8-bit UCS/Unicode Transformation Format) is defined here.
 415    RFC 3629
 416           UTF-8 (8-bit UCS/Unicode Transformation Format) is defined here.