<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>36.13. User-Defined Types</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="xaggr.html" title="36.12. User-Defined Aggregates" /><link rel="next" href="xoper.html" title="36.14. User-Defined Operators" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">36.13. User-Defined Types</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="xaggr.html" title="36.12. User-Defined Aggregates">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="extend.html" title="Chapter 36. Extending SQL">Up</a></td><th width="60%" align="center">Chapter 36. Extending <acronym class="acronym">SQL</acronym></th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="xoper.html" title="36.14. User-Defined Operators">Next</a></td></tr></table><hr /></div><div class="sect1" id="XTYPES"><div class="titlepage"><div><div><h2 class="title" style="clear: both">36.13. User-Defined Types <a href="#XTYPES" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="xtypes.html#XTYPES-TOAST">36.13.1. TOAST Considerations</a></span></dt></dl></div><a id="id-1.8.3.16.2" class="indexterm"></a><p>
   As described in <a class="xref" href="extend-type-system.html" title="36.2. The PostgreSQL Type System">Section 36.2</a>,
   <span class="productname">PostgreSQL</span> can be extended to support new
   data types.  This section describes how to define new base types,
   which are data types defined below the level of the <acronym class="acronym">SQL</acronym>
   language.  Creating a new base type requires implementing functions
   to operate on the type in a low-level language, usually C.
  </p><p>
   The examples in this section can be found in
   <code class="filename">complex.sql</code> and <code class="filename">complex.c</code>
   in the <code class="filename">src/tutorial</code> directory of the source distribution.
   See the <code class="filename">README</code> file in that directory for instructions
   about running the examples.
  </p><p>
  <a id="id-1.8.3.16.5.1" class="indexterm"></a>
  <a id="id-1.8.3.16.5.2" class="indexterm"></a>
  A user-defined type must always have input and output functions.
  These functions determine how the type appears in strings (for input
  by the user and output to the user) and how the type is organized in
  memory.  The input function takes a null-terminated character string
  as its argument and returns the internal (in memory) representation
  of the type.  The output function takes the internal representation
  of the type as argument and returns a null-terminated character
  string.  If we want to do anything more with the type than merely
  store it, we must provide additional functions to implement whatever
  operations we'd like to have for the type.
 </p><p>
  Suppose we want to define a type <code class="type">complex</code> that represents
  complex numbers. A natural way to represent a complex number in
  memory would be the following C structure:

</p><pre class="programlisting">
typedef struct Complex {
    double      x;
    double      y;
} Complex;
</pre><p>

  We will need to make this a pass-by-reference type, since it's too
  large to fit into a single <code class="type">Datum</code> value.
 </p><p>
  As the external string representation of the type, we choose a
  string of the form <code class="literal">(x,y)</code>.
 </p><p>
  The input and output functions are usually not hard to write,
  especially the output function.  But when defining the external
  string representation of the type, remember that you must eventually
  write a complete and robust parser for that representation as your
  input function.  For instance:

</p><pre class="programlisting">
PG_FUNCTION_INFO_V1(complex_in);

Datum
complex_in(PG_FUNCTION_ARGS)
{
    char       *str = PG_GETARG_CSTRING(0);
    double      x,
                y;
    Complex    *result;

    if (sscanf(str, " ( %lf , %lf )", &amp;x, &amp;y) != 2)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                 errmsg("invalid input syntax for type %s: \"%s\"",
                        "complex", str)));

    result = (Complex *) palloc(sizeof(Complex));
    result-&gt;x = x;
    result-&gt;y = y;
    PG_RETURN_POINTER(result);
}

</pre><p>
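  The <code class="literal">sscanf</code> pattern above only checks that both numbers were
  converted, so it also accepts inputs such as <code class="literal">(1,2</code> (no closing
  parenthesis) and <code class="literal">(1,2)xyz</code> (trailing garbage).  A stricter check
  can append a <code class="literal">%n</code> directive, which records how many characters were
  consumed and is reached only if the closing parenthesis matched.  The following is a minimal
  sketch in plain C, not the code in <code class="filename">complex.c</code>; the helper name
  <code class="function">parse_complex</code> is hypothetical, and inside
  <code class="function">complex_in</code> a false result would lead to the same
  <code class="function">ereport</code> call as before.
</p>

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of complex.c: strictly parse "(x,y)".
 * Returns true only if the whole string, apart from surrounding
 * whitespace, is a parenthesized pair of floating-point numbers.
 */
bool
parse_complex(const char *str, double *x, double *y)
{
    int         consumed = -1;  /* stays -1 unless sscanf reaches %n */

    /*
     * Same pattern as complex_in plus " %n": the trailing blank skips any
     * whitespace after ')', and %n stores the number of characters read
     * so far.  %n is reached only if the ')' matched, and it does not
     * count toward sscanf's return value.
     */
    if (sscanf(str, " ( %lf , %lf ) %n", x, y, &consumed) != 2)
        return false;
    if (consumed < 0)
        return false;               /* missing ')', e.g. "(1,2" */
    return str[consumed] == '\0';   /* reject trailing text, e.g. "(1,2)xyz" */
}
```

<p>
  Because <code class="literal">%n</code> does not count toward the value returned by
  <code class="literal">sscanf</code>, the success test is still a comparison with 2.
</p><p>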

  The output function can simply be:

</p><pre class="programlisting">
PG_FUNCTION_INFO_V1(complex_out);

Datum
complex_out(PG_FUNCTION_ARGS)
{
    Complex    *complex = (Complex *) PG_GETARG_POINTER(0);
    char       *result;

    result = psprintf("(%g,%g)", complex-&gt;x, complex-&gt;y);
    PG_RETURN_CSTRING(result);
}

</pre><p>
 </p><p>
  You should be careful to make the input and output functions inverses of
  each other.  If you do not, you will have severe problems when you
  need to dump your data into a file and then read it back in.  This
  is a particularly common problem when floating-point numbers are
  involved: for example, the <code class="literal">%g</code> conversion used by
  <code class="function">complex_out</code> prints only six significant digits,
  so a value such as 1/3 is not reproduced exactly when it is read back in.
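</p><p>
  The effect is easy to reproduce outside the server.  The sketch below uses plain C and a
  hypothetical helper <code class="function">roundtrips</code>, which formats a value with
  <code class="literal">%.*g</code> at a given precision and parses it back with
  <code class="function">strtod</code>.  Six significant digits (the <code class="literal">%g</code>
  default) lose information, whereas 17 significant digits are enough to recover any IEEE 754
  double exactly, so an output function could use <code class="literal">%.17g</code> at the cost
  of longer output.
</p>

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: print value with %g at the given precision, parse
 * the text back with strtod, and report whether the original value was
 * recovered bit for bit.
 */
bool
roundtrips(double value, int precision)
{
    char        buf[64];
    double      back;

    snprintf(buf, sizeof(buf), "%.*g", precision, value);
    back = strtod(buf, NULL);
    return back == value;
}
```

<p>
  For example, <code class="literal">roundtrips(1.0 / 3.0, 6)</code> is false because the value is
  printed as <code class="literal">0.333333</code>, while
  <code class="literal">roundtrips(1.0 / 3.0, 17)</code> is true.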
 </p><p>
  Optionally, a user-defined type can provide binary input and output
  routines.  Binary I/O is normally faster but less portable than textual
  I/O.  As with textual I/O, it is up to you to define exactly what the
  external binary representation is.  Most of the built-in data types
  try to provide a machine-independent binary representation.  For
  <code class="type">complex</code>, we will piggy-back on the binary I/O converters
  for type <code class="type">float8</code>:

</p><pre class="programlisting">
PG_FUNCTION_INFO_V1(complex_recv);

Datum
complex_recv(PG_FUNCTION_ARGS)
{
    StringInfo  buf = (StringInfo) PG_GETARG_POINTER(0);
    Complex    *result;

    result = (Complex *) palloc(sizeof(Complex));
    result-&gt;x = pq_getmsgfloat8(buf);
    result-&gt;y = pq_getmsgfloat8(buf);
    PG_RETURN_POINTER(result);
}

PG_FUNCTION_INFO_V1(complex_send);

Datum
complex_send(PG_FUNCTION_ARGS)
{
    Complex    *complex = (Complex *) PG_GETARG_POINTER(0);
    StringInfoData buf;

    pq_begintypsend(&amp;buf);
    pq_sendfloat8(&amp;buf, complex-&gt;x);
    pq_sendfloat8(&amp;buf, complex-&gt;y);
    PG_RETURN_BYTEA_P(pq_endtypsend(&amp;buf));
}

</pre><p>
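  The bytes these two functions exchange are simple to describe:
  <code class="function">pq_sendfloat8</code> writes the 8-byte IEEE 754 bit pattern of a
  <code class="type">float8</code> in network (big-endian) byte order, so a
  <code class="type">complex</code> value travels as 16 bytes, <code class="structfield">x</code>
  first.  The plain-C sketch below reproduces that layout without PostgreSQL headers; the
  function names are hypothetical, and it assumes that <code class="type">double</code> is an
  IEEE 754 binary64 value, as it is on mainstream platforms.
</p>

```c
#include <stdint.h>
#include <string.h>

/*
 * Plain-C sketch of the 16-byte binary format produced by complex_send:
 * each double's IEEE 754 bit pattern is written as a 64-bit integer in
 * network (big-endian) byte order.  All names here are hypothetical.
 */
void
put_float8(unsigned char *out, double value)
{
    uint64_t    bits;

    memcpy(&bits, &value, sizeof(bits));    /* reinterpret without aliasing issues */
    for (int i = 7; i >= 0; i--)
    {
        out[i] = (unsigned char) (bits & 0xFF); /* lowest byte goes last */
        bits >>= 8;
    }
}

double
get_float8(const unsigned char *in)
{
    uint64_t    bits = 0;
    double      value;

    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];     /* highest byte comes first */
    memcpy(&value, &bits, sizeof(value));
    return value;
}

/* Counterparts of complex_send and complex_recv: x first, then y. */
void
encode_complex(unsigned char out[16], double x, double y)
{
    put_float8(out, x);
    put_float8(out + 8, y);
}

void
decode_complex(const unsigned char in[16], double *x, double *y)
{
    *x = get_float8(in);
    *y = get_float8(in + 8);
}
```

<p>
  Because the byte order is fixed rather than taken from the host, the encoding is the same on
  little-endian and big-endian machines, which is what makes this binary format portable.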
 </p><p>
  Once we have written the I/O functions and compiled them into a shared
  library, we can define the <code class="type">complex</code> type in SQL.
  First we declare it as a shell type:

</p><pre class="programlisting">
CREATE TYPE complex;
</pre><p>

  This serves as a placeholder that allows us to reference the type while
  defining its I/O functions.  Now we can define the I/O functions:

</p><pre class="programlisting">
CREATE FUNCTION complex_in(cstring)
    RETURNS complex
    AS '<em class="replaceable"><code>filename</code></em>'
    LANGUAGE C IMMUTABLE STRICT;

CREATE FUNCTION complex_out(complex)
    RETURNS cstring
    AS '<em class="replaceable"><code>filename</code></em>'
    LANGUAGE C IMMUTABLE STRICT;

CREATE FUNCTION complex_recv(internal)
    RETURNS complex
    AS '<em class="replaceable"><code>filename</code></em>'
    LANGUAGE C IMMUTABLE STRICT;

CREATE FUNCTION complex_send(complex)
    RETURNS bytea
    AS '<em class="replaceable"><code>filename</code></em>'
    LANGUAGE C IMMUTABLE STRICT;
</pre><p>
 </p><p>
  Finally, we can provide the full definition of the data type:
</p><pre class="programlisting">
CREATE TYPE complex (
    internallength = 16,
    input = complex_in,
    output = complex_out,
    receive = complex_recv,
    send = complex_send,
    alignment = double
);
</pre><p>
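  The values given for <code class="literal">internallength</code> and
  <code class="literal">alignment</code> must agree with the C declaration of
  <code class="structname">Complex</code>, and <code class="command">CREATE TYPE</code> has no way
  to check this.  One low-cost safeguard, sketched below in plain C11 without PostgreSQL
  headers, is a set of compile-time assertions placed next to the struct definition; the
  assertion messages are illustrative.
</p>

```c
#include <stdalign.h>

/* Same declaration as in complex.c. */
typedef struct Complex
{
    double      x;
    double      y;
} Complex;

/* Matches internallength = 16 in CREATE TYPE. */
_Static_assert(sizeof(Complex) == 16,
               "internallength in CREATE TYPE must equal sizeof(Complex)");

/* Matches alignment = double in CREATE TYPE. */
_Static_assert(alignof(Complex) == alignof(double),
               "alignment in CREATE TYPE must match the struct's alignment");

/* A pass-by-value type must fit in a Datum, which is at most 8 bytes. */
_Static_assert(sizeof(Complex) > 8,
               "expected Complex to be too large to pass by value");
```

<p>
  The last assertion restates why the type is pass-by-reference: a <code class="type">Datum</code>
  is at most 8 bytes wide, while a <code class="type">Complex</code> needs 16.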
 </p><p>
  <a id="id-1.8.3.16.13.1" class="indexterm"></a>
  When you define a new base type,
  <span class="productname">PostgreSQL</span> automatically provides support
  for arrays of that type.  The array type typically
  has the same name as the base type with the underscore character
  (<code class="literal">_</code>) prepended.
 </p><p>
  Once the data type exists, we can declare additional functions to
  provide useful operations on the data type.  Operators can then be
  defined atop the functions, and if needed, operator classes can be
  created to support indexing of the data type.  These additional
  layers are discussed in the following sections.
 </p><p>
  If the internal representation of the data type is variable-length, the
  internal representation must follow the standard layout for variable-length
  data: the first four bytes must be a <code class="type">char[4]</code> field
  (customarily named <code class="structfield">vl_len_</code>) that is never
  accessed directly.  You must use the <code class="function">SET_VARSIZE()</code>
  macro to store the total size of the datum (including the length field itself)
  in this field and <code class="function">VARSIZE()</code> to retrieve it.
  (These macros exist because the length field may be encoded in a
  platform-dependent way.)
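</p><p>
  The layout rule can be pictured with a small plain-C sketch.  In real extension code,
  <code class="function">palloc</code>, <code class="literal">VARHDRSZ</code>,
  <code class="function">SET_VARSIZE()</code> and <code class="function">VARDATA()</code> fill the
  roles played here by <code class="function">malloc</code>, <code class="literal">MY_HDRSZ</code>,
  and the hypothetical helpers <code class="function">set_size</code> and
  <code class="function">get_size</code>.  These helpers store the length as a plain 32-bit
  integer purely for illustration; real code must not assume that, since the encoding is
  platform-dependent.
</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Layout sketch of a variable-length value: a 4-byte length word that is
 * never read or written directly, followed by the payload.  set_size and
 * get_size are hypothetical stand-ins for SET_VARSIZE and VARSIZE; unlike
 * the real macros they store the length as a plain uint32_t.
 */
typedef struct MyVarlena
{
    char        vl_len_[4];     /* length word: access only via helpers */
    char        data[];         /* payload (C99 flexible array member) */
} MyVarlena;

#define MY_HDRSZ offsetof(MyVarlena, data)  /* plays the role of VARHDRSZ */

void
set_size(MyVarlena *v, uint32_t total)
{
    memcpy(v->vl_len_, &total, sizeof(total));
}

uint32_t
get_size(const MyVarlena *v)
{
    uint32_t    total;

    memcpy(&total, v->vl_len_, sizeof(total));
    return total;
}

/* Allocate a value holding len payload bytes; the size includes the header. */
MyVarlena *
make_value(const char *payload, uint32_t len)
{
    uint32_t    total = (uint32_t) MY_HDRSZ + len;
    MyVarlena  *v = malloc(total);

    if (v == NULL)
        return NULL;
    set_size(v, total);
    memcpy(v->data, payload, len);
    return v;
}
```

<p>
  For a five-byte payload the stored size is 9: four bytes of length word plus the data.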
 </p><p>
  For further details see the description of the
  <a class="xref" href="sql-createtype.html" title="CREATE TYPE"><span class="refentrytitle">CREATE TYPE</span></a> command.
 </p><div class="sect2" id="XTYPES-TOAST"><div class="titlepage"><div><div><h3 class="title">36.13.1. TOAST Considerations <a href="#XTYPES-TOAST" class="id_link">#</a></h3></div></div></div><a id="id-1.8.3.16.17.2" class="indexterm"></a><p>
  If the values of your data type vary in size (in internal form), it's
  usually desirable to make the data type <acronym class="acronym">TOAST</acronym>-able (see <a class="xref" href="storage-toast.html" title="66.2. TOAST">Section 66.2</a>). You should do this even if the values are always
  too small to be compressed or stored externally, because
  <acronym class="acronym">TOAST</acronym> can save space on small data too, by reducing header
  overhead.
 </p><p>
  To support <acronym class="acronym">TOAST</acronym> storage, the C functions operating on the data
  type must always be careful to unpack any toasted values they are handed
  by using <code class="function">PG_DETOAST_DATUM</code>.  (This detail is customarily hidden
  by defining type-specific <code class="function">GETARG_DATATYPE_P</code> macros.)
  Then, when running the <code class="command">CREATE TYPE</code> command, specify the
  internal length as <code class="literal">variable</code> and select some appropriate storage
  option other than <code class="literal">plain</code>.
 </p><p>
  If data alignment is unimportant (either just for a specific function or
  because the data type specifies byte alignment anyway) then it's possible
  to avoid some of the overhead of <code class="function">PG_DETOAST_DATUM</code>. You can use
  <code class="function">PG_DETOAST_DATUM_PACKED</code> instead (customarily hidden by
  defining a <code class="function">GETARG_DATATYPE_PP</code> macro) and use the macros
  <code class="function">VARSIZE_ANY_EXHDR</code> and <code class="function">VARDATA_ANY</code> to access
  a potentially-packed datum.
  Note that the data returned by these macros is not guaranteed to be aligned, even if the data
  type definition specifies an alignment. If the alignment is important you
  must go through the regular <code class="function">PG_DETOAST_DATUM</code> interface.
 </p><div class="note"><h3 class="title">Note</h3><p>
   Older code frequently declares <code class="structfield">vl_len_</code> as an
   <code class="type">int32</code> field instead of <code class="type">char[4]</code>.  This is OK as long as
   the struct definition has other fields that have at least <code class="type">int32</code>
   alignment.  But it is dangerous to use such a struct definition when
   working with a potentially unaligned datum; the compiler may take it as
   license to assume the datum actually is aligned, leading to core dumps on
   architectures that are strict about alignment.
  </p></div><p>
  Another feature that's enabled by <acronym class="acronym">TOAST</acronym> support is the
  possibility of having an <em class="firstterm">expanded</em> in-memory data
  representation that is more convenient to work with than the format that
  is stored on disk.  The regular or <span class="quote">“<span class="quote">flat</span>”</span> varlena storage format
  is ultimately just a blob of bytes; it cannot, for example, contain
  pointers, since it may get copied to other locations in memory.
  For data types with a complicated internal structure, the flat format may be quite expensive to work
  with, so <span class="productname">PostgreSQL</span> provides a way to <span class="quote">“<span class="quote">expand</span>”</span>
  the flat format into a representation that is more suited to computation,
  and then pass that format in-memory between functions of the data type.
 </p><p>
  To use expanded storage, a data type must define an expanded format that
  follows the rules given in <code class="filename">src/include/utils/expandeddatum.h</code>,
  and provide functions to <span class="quote">“<span class="quote">expand</span>”</span> a flat varlena value into
  expanded format and <span class="quote">“<span class="quote">flatten</span>”</span> the expanded format back to the
  regular varlena representation.  Then ensure that all C functions for
  the data type can accept either representation, possibly by converting
  one into the other immediately upon receipt.  This does not require fixing
  all existing functions for the data type at once, because the standard
  <code class="function">PG_DETOAST_DATUM</code> macro is defined to convert expanded inputs
  into regular flat format.  Therefore, existing functions that work with
  the flat varlena format will continue to work, though slightly
  inefficiently, with expanded inputs; they need not be converted until and
  unless better performance is important.
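</p><p>
  As a rough illustration of the two representations, the plain-C sketch below uses a
  hypothetical flat format (a 32-bit element count followed by that many doubles, in one
  contiguous buffer that can be copied byte for byte) and a hypothetical expanded format that
  keeps the elements in a separately allocated array.  The functions
  <code class="function">vec_get_flat_size</code> and <code class="function">vec_flatten_into</code>
  loosely correspond to the <code class="function">get_flat_size</code> and
  <code class="function">flatten_into</code> methods that
  <code class="filename">src/include/utils/expandeddatum.h</code> asks an expanded object to
  provide; the real interface works on an <code class="structname">ExpandedObjectHeader</code>,
  and a real flat value would also carry the varlena length word.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical flat format: a uint32_t element count followed by that many
 * doubles in one contiguous buffer.  Hypothetical expanded format: the same
 * data as an ordinary C struct whose elements live in a separate allocation.
 */
typedef struct ExpandedVec
{
    uint32_t    n;
    double     *items;
} ExpandedVec;

/* "Expand": build the pointer-based form from the flat bytes. */
ExpandedVec *
vec_expand(const unsigned char *flat)
{
    ExpandedVec *ev = malloc(sizeof(ExpandedVec));

    if (ev == NULL)
        return NULL;
    memcpy(&ev->n, flat, sizeof(uint32_t));
    ev->items = malloc((ev->n > 0 ? ev->n : 1) * sizeof(double));
    if (ev->items == NULL)
    {
        free(ev);
        return NULL;
    }
    /* memcpy copes with the doubles being unaligned inside the blob */
    memcpy(ev->items, flat + sizeof(uint32_t), (size_t) ev->n * sizeof(double));
    return ev;
}

/* Loosely mirrors get_flat_size: bytes needed for the flattened form. */
size_t
vec_get_flat_size(const ExpandedVec *ev)
{
    return sizeof(uint32_t) + (size_t) ev->n * sizeof(double);
}

/* Loosely mirrors flatten_into: write the flat form into a caller buffer. */
void
vec_flatten_into(const ExpandedVec *ev, unsigned char *result,
                 size_t allocated_size)
{
    assert(allocated_size == vec_get_flat_size(ev));
    memcpy(result, &ev->n, sizeof(uint32_t));
    memcpy(result + sizeof(uint32_t), ev->items,
           (size_t) ev->n * sizeof(double));
}
```

<p>
  Flattening is a two-step protocol: the caller first asks for the size so that it can allocate
  the destination, then asks for the bytes to be written there, and the two steps must agree.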
 </p><p>
  C functions that know how to work with an expanded representation
  typically fall into two categories: those that can only handle expanded
  format, and those that can handle either expanded or flat varlena inputs.
  The former are easier to write but may be less efficient overall, because
  converting a flat input to expanded form for use by a single function may
  cost more than is saved by operating on the expanded format.
  When only the expanded format needs to be handled, conversion of flat inputs to
  expanded form can be hidden inside an argument-fetching macro, so that
  the function appears no more complex than one working with traditional
  varlena input.
  To handle both types of input, write an argument-fetching function that
  will detoast external, short-header, and compressed varlena inputs, but
  not expanded inputs.  Such a function can be defined as returning a
  pointer to a union of the flat varlena format and the expanded format.
  Callers can use the <code class="function">VARATT_IS_EXPANDED_HEADER()</code> macro to
  determine which format they received.
 </p><p>
  The <acronym class="acronym">TOAST</acronym> infrastructure not only allows regular varlena
  values to be distinguished from expanded values, but also
  distinguishes <span class="quote">“<span class="quote">read-write</span>”</span> and <span class="quote">“<span class="quote">read-only</span>”</span> pointers to
  expanded values.  C functions that only need to examine an expanded
  value, or will only change it in safe and non-semantically-visible ways,
  need not care which type of pointer they receive.  C functions that
  produce a modified version of an input value are allowed to modify an
  expanded input value in-place if they receive a read-write pointer, but
  must not modify the input if they receive a read-only pointer; in that
  case they have to copy the value first, producing a new value to modify.
  A C function that has constructed a new expanded value should always
  return a read-write pointer to it.  Also, a C function that is modifying
  a read-write expanded value in-place should take care to leave the value
  in a sane state if it fails partway through.
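</p><p>
  The copy-on-write rule for read-only pointers can be illustrated with a plain-C sketch.  The
  types <code class="structname">ExpandedVec</code> and <code class="structname">VecRef</code> and
  the functions <code class="function">vec_copy</code> and <code class="function">vec_set</code> are
  hypothetical; in particular, the explicit <code class="structfield">read_write</code> flag only
  models the distinction that <productname>PostgreSQL</productname> itself carries in the
  expanded-object pointer datum.
</p>

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical expanded value: a fixed-length array of doubles. */
typedef struct ExpandedVec
{
    uint32_t    n;
    double     *items;
} ExpandedVec;

/* Hypothetical reference; the flag models read-write vs read-only pointers. */
typedef struct VecRef
{
    ExpandedVec *vec;
    bool        read_write;
} VecRef;

/* Deep copy, used when the input may not be modified. */
ExpandedVec *
vec_copy(const ExpandedVec *src)
{
    ExpandedVec *dst = malloc(sizeof(ExpandedVec));

    if (dst == NULL)
        return NULL;
    dst->n = src->n;
    dst->items = malloc((src->n > 0 ? src->n : 1) * sizeof(double));
    if (dst->items == NULL)
    {
        free(dst);
        return NULL;
    }
    memcpy(dst->items, src->items, (size_t) src->n * sizeof(double));
    return dst;
}

/*
 * Return "in" with element i replaced by v.  A read-write input is
 * modified in place; a read-only input is copied first.  The result is
 * always a read-write reference.  On failure the result's vec is NULL and
 * the input is left unchanged.
 */
VecRef
vec_set(VecRef in, uint32_t i, double v)
{
    VecRef      fail = {NULL, false};
    ExpandedVec *target = in.vec;

    /* Validate before changing anything, so failure leaves no partial state. */
    if (in.vec == NULL || i >= in.vec->n)
        return fail;
    if (!in.read_write)
    {
        target = vec_copy(in.vec);
        if (target == NULL)
            return fail;
    }
    target->items[i] = v;
    return (VecRef) {target, true};
}
```

<p>
  Validating the arguments before touching either value means that a failed call leaves the
  input exactly as it was, in line with the requirement to keep a read-write value sane on
  failure.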
 </p><p>
  For examples of working with expanded values, see the standard array
  infrastructure, particularly
  <code class="filename">src/backend/utils/adt/array_expanded.c</code>.
 </p></div></div></body></html>