2 F.50. xml2 — XPath querying and XSLT functionality #
4 F.50.1. Deprecation Notice
5 F.50.2. Description of Functions
10 The xml2 module provides XPath querying and XSLT functionality.
12 F.50.1. Deprecation Notice #
14 From PostgreSQL 8.3 on, there is XML-related functionality based on the
15 SQL/XML standard in the core server. That functionality covers XML
16 syntax checking and XPath queries, which is what this module does, and
17 more, but the API is not at all compatible. It is planned that this
18 module will be removed in a future version of PostgreSQL in favor of
19 the newer standard API, so you are encouraged to try converting your
20 applications. If you find that some of the functionality of this module
21 is not available in an adequate form with the newer API, please explain
22 your issue to <pgsql-hackers@lists.postgresql.org> so that the
23 deficiency can be addressed.
25 F.50.2. Description of Functions #
27 Table F.37 shows the functions provided by this module. These functions
28 provide straightforward XML parsing and XPath queries.
30 Table F.37. xml2 Functions
36 xml_valid ( document text ) → boolean
38 Parses the given document and returns true if the document is
39 well-formed XML. (Note: this is an alias for the standard PostgreSQL
40 function xml_is_well_formed(). The name xml_valid() is technically
41 incorrect since validity and well-formedness have different meanings in
44 xpath_string ( document text, query text ) → text
46 Evaluates the XPath query on the supplied document, and casts the
49 xpath_number ( document text, query text ) → real
51 Evaluates the XPath query on the supplied document, and casts the
54 xpath_bool ( document text, query text ) → boolean
56 Evaluates the XPath query on the supplied document, and casts the
59 xpath_nodeset ( document text, query text, toptag text, itemtag text )
62 Evaluates the query on the document and wraps the result in XML tags.
63 If the result is multivalued, the output will look like:
65 <itemtag>Value 1 which could be an XML fragment</itemtag>
66 <itemtag>Value 2....</itemtag>
69 If either toptag or itemtag is an empty string, the relevant tag is
72 xpath_nodeset ( document text, query text, itemtag text ) → text
74 Like xpath_nodeset(document, query, toptag, itemtag) but result omits
77 xpath_nodeset ( document text, query text ) → text
79 Like xpath_nodeset(document, query, toptag, itemtag) but result omits
82 xpath_list ( document text, query text, separator text ) → text
84 Evaluates the query on the document and returns multiple values
85 separated by the specified separator, for example Value 1,Value 2,Value
88 xpath_list ( document text, query text ) → text
90 This is a wrapper for the above function that uses , as the separator.
94 xpath_table(text key, text document, text relation, text xpaths, text criteria)
97 xpath_table is a table function that evaluates a set of XPath queries
98 on each of a set of documents and returns the results as a table. The
99 primary key field from the original document table is returned as the
100 first column of the result so that the result set can readily be used
101 in joins. The parameters are described in Table F.38.
103 Table F.38. xpath_table Parameters
104 Parameter Description
107 the name of the “key” field — this is just a field to be used as the
108 first column of the output table, i.e., it identifies the record from
109 which each output row came (see note below about multiple values)
112 the name of the field containing the XML document
115 the name of the table or view containing the documents
118 one or more XPath expressions, separated by |
121 the contents of the WHERE clause. This cannot be omitted, so use true
122 or 1=1 if you want to process all the rows in the relation
124 These parameters (except the XPath strings) are just substituted into a
125 plain SQL SELECT statement, so you have some flexibility — the
128 SELECT <key>, <document> FROM <relation> WHERE <criteria>
130 so those parameters can be anything valid in those particular
131 locations. The result from this SELECT needs to return exactly two
132 columns (which it will unless you try to list multiple fields for key
133 or document). Beware that this simplistic approach requires that you
134 validate any user-supplied values to avoid SQL injection attacks.
136 The function has to be used in a FROM expression, with an AS clause to
137 specify the output columns; for example
139 xpath_table('article_id',
142 '/article/author|/article/pages|/article/title',
143 'date_entered > ''2003-01-01'' ')
144 AS t(article_id integer, author text, page_count integer, title text);
146 The AS clause defines the names and types of the columns in the output
147 table. The first is the “key” field and the rest correspond to the
148 XPath queries. If there are more XPath queries than result columns, the
149 extra queries will be ignored. If there are more result columns than
150 XPath queries, the extra columns will be NULL.
152 Notice that this example defines the page_count result column as an
153 integer. The function deals internally with string representations, so
154 when you say you want an integer in the output, it will take the string
155 representation of the XPath result and use PostgreSQL input functions
156 to transform it into an integer (or whatever type the AS clause
157 requests). An error will result if it can't do this — for example if
158 the result is empty — so you may wish to just stick to text as the
159 column type if you think your data has any problems.
161 The calling SELECT statement doesn't necessarily have to be just SELECT
162 * — it can reference the output columns by name or join them to other
163 tables. The function produces a virtual table with which you can
164 perform any operation you wish (e.g., aggregation, joining, sorting
165 etc.). So we could also have:
166 SELECT t.title, p.fullname, p.email
167 FROM xpath_table('article_id', 'article_xml', 'articles',
168 '/article/title|/article/author/@id',
169 'xpath_string(article_xml,''/article/@date'') > ''2003-03-20''
171 AS t(article_id integer, title text, author_id integer),
173 WHERE t.author_id = p.person_id;
175 as a more complicated example. Of course, you could wrap all of this in
176 a view for convenience.
178 F.50.3.1. Multivalued Results #
180 The xpath_table function assumes that the results of each XPath query
181 might be multivalued, so the number of rows returned by the function
182 may not be the same as the number of input documents. The first row
183 returned contains the first result from each query, the second row the
184 second result from each query. If one of the queries has fewer values
185 than the others, null values will be returned instead.
187 In some cases, a user will know that a given XPath query will return
188 only a single result (perhaps a unique document identifier) — if used
189 alongside an XPath query returning multiple results, the single-valued
190 result will appear only on the first row of the result. The solution to
191 this is to use the key field as part of a join against a simpler XPath
192 query. As an example:
198 INSERT INTO test VALUES (1, '<doc num="C1">
199 <line num="L1"><a>1</a><b>2</b><c>3</c></line>
200 <line num="L2"><a>11</a><b>22</b><c>33</c></line>
203 INSERT INTO test VALUES (2, '<doc num="C2">
204 <line num="L1"><a>111</a><b>222</b><c>333</c></line>
205 <line num="L2"><a>111</a><b>222</b><c>333</c></line>
209 xpath_table('id','xml','test',
210 '/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
212 AS t(id int, doc_num varchar(10), line_num varchar(10), val1 int, val2 int, va
214 WHERE id = 1 ORDER BY doc_num, line_num
216 id | doc_num | line_num | val1 | val2 | val3
217 ----+---------+----------+------+------+------
218 1 | C1 | L1 | 1 | 2 | 3
219 1 | | L2 | 11 | 22 | 33
221 To get doc_num on every line, the solution is to use two invocations of
222 xpath_table and join the results:
223 SELECT t.*,i.doc_num FROM
224 xpath_table('id', 'xml', 'test',
225 '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
227 AS t(id int, line_num varchar(10), val1 int, val2 int, val3 int),
228 xpath_table('id', 'xml', 'test', '/doc/@num', 'true')
229 AS i(id int, doc_num varchar(10))
230 WHERE i.id=t.id AND i.id=1
231 ORDER BY doc_num, line_num;
233 id | line_num | val1 | val2 | val3 | doc_num
234 ----+----------+------+------+------+---------
235 1 | L1 | 1 | 2 | 3 | C1
236 1 | L2 | 11 | 22 | 33 | C1
239 F.50.4. XSLT Functions #
241 The following functions are available if libxslt is installed:
243 F.50.4.1. xslt_process #
245 xslt_process(text document, text stylesheet, text paramlist) returns text
247 This function applies the XSL stylesheet to the document and returns
248 the transformed result. The paramlist is a list of parameter
249 assignments to be used in the transformation, specified in the form
250 a=1,b=2. Note that the parameter parsing is very simple-minded:
251 parameter values cannot contain commas!
253 There is also a two-parameter version of xslt_process which does not
254 pass any parameters to the transformation.
258 John Gray <jgray@azuli.co.uk>
260 Development of this module was sponsored by Torchbox Ltd.
261 (www.torchbox.com). It has the same BSD license as PostgreSQL.