8.13.1. Creating XML Values
8.13.2. Encoding Handling
8.13.3. Accessing XML Values
The xml data type can be used to store XML data. Its advantage over
storing XML data in a text field is that it checks the input values for
well-formedness, and there are support functions to perform type-safe
operations on it; see Section 9.15. Use of this data type requires the
installation to have been built with configure --with-libxml.
The xml type can store well-formed “documents”, as defined by the XML
standard, as well as “content” fragments, which are defined by
reference to the more permissive “document node” of the XQuery and
XPath data model. Roughly, this means that content fragments can have
more than one top-level element or character node. The expression
xmlvalue IS DOCUMENT can be used to evaluate whether a particular xml
value is a full document or only a content fragment.
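As a brief illustration of the document/content distinction, the following queries are a minimal sketch; the literal values are made up for this example. XMLPARSE with CONTENT is used so that both inputs are accepted regardless of the session's XML option.

```sql
-- One top-level element: a document (and also valid content).
SELECT XMLPARSE(CONTENT '<book><title>Manual</title></book>') IS DOCUMENT AS is_doc;  -- true

-- Character data plus two top-level elements: a content fragment only.
SELECT XMLPARSE(CONTENT 'abc<foo>bar</foo><bar>foo</bar>') IS DOCUMENT AS is_doc;     -- false
```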
Limits and compatibility notes for the xml data type can be found in
8.13.1. Creating XML Values
To produce a value of type xml from character data, use the function
xmlparse:
XMLPARSE ( { DOCUMENT | CONTENT } value)
Examples:
XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...
XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')
While this is the only way to convert character strings into XML values
according to the SQL standard, the PostgreSQL-specific syntaxes:
xml '<foo>bar</foo>'
'<foo>bar</foo>'::xml
can also be used.
The xml type does not validate input values against a document type
declaration (DTD), even when the input value specifies a DTD. There is
also currently no built-in support for validating against other XML
schema languages such as XML Schema.
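The difference between checking well-formedness and validating against a DTD can be sketched as follows. This example assumes the support function xml_is_well_formed_document is available in the installation; the sample inputs are invented for illustration.

```sql
-- Well-formedness is checked: mismatched tags are rejected.
SELECT xml_is_well_formed_document('<a><b></b></a>');  -- true
SELECT xml_is_well_formed_document('<a><b></a>');      -- false

-- The internal DTD declares <a> as EMPTY, but the value is not validated
-- against it, so this input is expected to be accepted even though it
-- violates the declaration.
SELECT XMLPARSE(DOCUMENT '<!DOCTYPE a [<!ELEMENT a EMPTY>]><a>text</a>');
```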
The inverse operation, producing a character string value from xml,
uses the function xmlserialize:
XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [ NO ] INDENT ] )
type can be character, character varying, or text (or an alias for one
of those). Again, according to the SQL standard, this is the only way
to convert between type xml and character types, but PostgreSQL also
allows you to simply cast the value.
The INDENT option causes the result to be pretty-printed, while NO
INDENT (which is the default) just emits the original input string.
Casting to a character type likewise produces the original string.
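The three ways of getting text back out can be compared side by side. This is a sketch only, assuming a server version that supports the INDENT option described above; the sample document is made up.

```sql
-- Default (NO INDENT): the string comes back as it was entered.
SELECT XMLSERIALIZE(DOCUMENT XMLPARSE(DOCUMENT '<a><b>1</b><c>2</c></a>') AS text);

-- INDENT: the same value, pretty-printed with nested elements on their own lines.
SELECT XMLSERIALIZE(DOCUMENT XMLPARSE(DOCUMENT '<a><b>1</b><c>2</c></a>') AS text INDENT);

-- PostgreSQL-specific shortcut: a plain cast, which also yields the original string.
SELECT XMLPARSE(DOCUMENT '<a><b>1</b><c>2</c></a>')::text;
```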
When a character string value is cast to or from type xml without going
through XMLPARSE or XMLSERIALIZE, respectively, the choice of DOCUMENT
versus CONTENT is determined by the “XML option” session configuration
parameter, which can be set using the standard command:
SET XML OPTION { DOCUMENT | CONTENT };
or the more PostgreSQL-like syntax
SET xmloption TO { DOCUMENT | CONTENT };
The default is CONTENT, so all forms of XML data are allowed.
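The effect of the XML option on a plain cast can be sketched like this; the input string is an invented content fragment.

```sql
-- With DOCUMENT, a cast accepts only full documents.
SET xmloption TO DOCUMENT;
SELECT 'abc<foo>bar</foo>'::xml;   -- expected to raise an error: not a document

-- With CONTENT (the default), the same fragment is accepted.
SET xmloption TO CONTENT;
SELECT 'abc<foo>bar</foo>'::xml;
```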
8.13.2. Encoding Handling
Care must be taken when dealing with multiple character encodings on
the client, server, and in the XML data passed through them. When using
the text mode to pass queries to the server and query results to the
client (which is the normal mode), PostgreSQL converts all character
data passed between the client and the server and vice versa to the
character encoding of the respective end; see Section 23.3. This
includes string representations of XML values, such as in the above
examples. This would ordinarily mean that encoding declarations
contained in XML data can become invalid as the character data is
converted to other encodings while traveling between client and server,
because the embedded encoding declaration is not changed. To cope with
this behavior, encoding declarations contained in character strings
presented for input to the xml type are ignored, and content is assumed
to be in the current server encoding. Consequently, for correct
processing, character strings of XML data must be sent from the client
in the current client encoding. It is the responsibility of the client
to either convert documents to the current client encoding before
sending them to the server, or to adjust the client encoding
appropriately. On output, values of type xml will not have an encoding
declaration, and clients should assume all data is in the current
client encoding.
When using binary mode to pass query parameters to the server and query
results back to the client, no encoding conversion is performed, so the
situation is different. In this case, an encoding declaration in the
XML data will be observed, and if it is absent, the data will be
assumed to be in UTF-8 (as required by the XML standard; note that
PostgreSQL does not support UTF-16). On output, data will have an
encoding declaration specifying the client encoding, unless the client
encoding is UTF-8, in which case it will be omitted.
Needless to say, processing XML data with PostgreSQL will be less
error-prone and more efficient if the XML data encoding, client
encoding, and server encoding are the same. Since XML data is
internally processed in UTF-8, computations will be most efficient if
the server encoding is also UTF-8.
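One way to check whether the encodings line up in a given session is sketched below. The sample document and its ISO-8859-1 declaration are made up, and the last query illustrates the text-mode rule described earlier rather than any guaranteed output format.

```sql
-- Inspect the encodings involved in this session.
SHOW server_encoding;
SHOW client_encoding;

-- Switch the client side to UTF-8; the client must then actually send UTF-8 bytes.
SET client_encoding TO 'UTF8';

-- In text mode the embedded encoding declaration is ignored on input,
-- so this value is interpreted according to the session encodings,
-- not as ISO-8859-1.
SELECT '<?xml version="1.0" encoding="ISO-8859-1"?><a>x</a>'::xml;
```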
Some XML-related functions may not work at all on non-ASCII data when
the server encoding is not UTF-8. This is known to be an issue for
xmltable() and xpath() in particular.
8.13.3. Accessing XML Values
The xml data type is unusual in that it does not provide any comparison
operators. This is because there is no well-defined and universally
useful comparison algorithm for XML data. One consequence of this is
that you cannot retrieve rows by comparing an xml column against a
search value. XML values should therefore typically be accompanied by a
separate key field such as an ID. An alternative solution for comparing
XML values is to convert them to character strings first, but note that
character string comparison has little to do with a useful XML
comparison method.
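A small sketch of why string comparison is a poor substitute: the two values below represent the same empty element but differ as character strings. The literals are invented for illustration.

```sql
-- Comparing xml values directly is not possible; a statement such as
--   SELECT XMLPARSE(CONTENT '<a/>') = XMLPARSE(CONTENT '<a/>');
-- fails because no equality operator exists for type xml.

-- Comparing the text forms works, but it is purely textual.
SELECT XMLPARSE(CONTENT '<a/>')::text = XMLPARSE(CONTENT '<a></a>')::text AS same;  -- false
```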
Since there are no comparison operators for the xml data type, it is
not possible to create an index directly on a column of this type. If
speedy searches in XML data are desired, possible workarounds include
casting the expression to a character string type and indexing that, or
indexing an XPath expression. Of course, the actual query would have to
be adjusted to search by the indexed expression.
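The XPath workaround might look like the following sketch. The table docs, its columns, the index name, and the /book/title path are all hypothetical and chosen only for illustration.

```sql
-- Hypothetical table holding XML documents next to a separate key.
CREATE TABLE docs (id integer PRIMARY KEY, body xml);

-- Expression index on the text of the first /book/title node.
CREATE INDEX docs_title_idx
    ON docs (((xpath('/book/title/text()', body))[1]::text));

-- The query has to use the same expression for the index to be considered.
SELECT id
FROM docs
WHERE (xpath('/book/title/text()', body))[1]::text = 'Manual';
```

Indexing a narrow extracted value like this keeps index entries small; indexing the whole document cast to text is also possible in principle but may be impractical for large documents.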
The text-search functionality in PostgreSQL can also be used to speed
up full-document searches of XML data. The necessary preprocessing
support is, however, not yet available in the PostgreSQL distribution.