2 F.39. seg — a datatype for line segments or floating point intervals #
11 This module implements a data type seg for representing line segments,
12 or floating point intervals. seg can represent uncertainty in the
13 interval endpoints, making it especially useful for representing
14 laboratory measurements.
16 This module is considered “trusted”, that is, it can be installed by
17 non-superusers who have CREATE privilege on the current database.
21 The geometry of measurements is usually more complex than that of a
22 point in a numeric continuum. A measurement is usually a segment of
23 that continuum with somewhat fuzzy limits. The measurements come out as
24 intervals because of uncertainty and randomness, as well as because the
25 value being measured may naturally be an interval indicating some
26 condition, such as the temperature range of stability of a protein.
28 Using just common sense, it appears more convenient to store such data
29 as intervals, rather than pairs of numbers. In practice, it even turns
30 out more efficient in most applications.
32 Further along the line of common sense, the fuzziness of the limits
33 suggests that the use of traditional numeric data types leads to a
34 certain loss of information. Consider this: your instrument reads 6.50,
35 and you input this reading into the database. What do you get when you
37 test=> select 6.50 :: float8 as "pH";
43 In the world of measurements, 6.50 is not the same as 6.5. It may
44 sometimes be critically different. The experimenters usually write down
45 (and publish) the digits they trust. 6.50 is actually a fuzzy interval
46 contained within a bigger and even fuzzier interval, 6.5, with their
47 center points being (probably) the only common feature they share. We
48 definitely do not want such different data items to appear the same.
50 Conclusion? It is nice to have a special data type that can record the
51 limits of an interval with arbitrarily variable precision. Variable in
52 the sense that each data element records its own precision.
55 test=> select '6.25 .. 6.50'::seg as "pH";
63 The external representation of an interval is formed using one or two
64 floating-point numbers joined by the range operator (.. or ...).
65 Alternatively, it can be specified as a center point plus or minus a
66 deviation. Optional certainty indicators (<, > or ~) can be stored as
67 well. (Certainty indicators are ignored by all the built-in operators,
68 however.) Table F.29 gives an overview of allowed representations;
69 Table F.30 shows some examples.
71 In Table F.29, x, y, and delta denote floating-point numbers. x and y,
72 but not delta, can be preceded by a certainty indicator.
74 Table F.29. seg External Representations
75 x Single value (zero-length interval)
76 x .. y Interval from x to y
77 x (+-) delta Interval from x - delta to x + delta
78 x .. Open interval with lower bound x
79 .. x Open interval with upper bound x
81 Table F.30. Examples of Valid seg Input
82 5.0 Creates a zero-length segment (a point, if you will)
83 ~5.0 Creates a zero-length segment and records ~ in the data. ~ is
84 ignored by seg operations, but is preserved as a comment.
85 <5.0 Creates a point at 5.0. < is ignored but is preserved as a
87 >5.0 Creates a point at 5.0. > is ignored but is preserved as a
89 5(+-)0.3 Creates an interval 4.7 .. 5.3. Note that the (+-) notation
91 50 .. Everything that is greater than or equal to 50
92 .. 0 Everything that is less than or equal to 0
93 1.5e-2 .. 2E-2 Creates an interval 0.015 .. 0.02
94 1 ... 2 The same as 1...2, or 1 .. 2, or 1..2 (spaces around the range
97 Because the ... operator is widely used in data sources, it is allowed
98 as an alternative spelling of the .. operator. Unfortunately, this
99 creates a parsing ambiguity: it is not clear whether the upper bound in
100 0...23 is meant to be 23 or 0.23. This is resolved by requiring at
101 least one digit before the decimal point in all numbers in seg input.
103 As a sanity check, seg rejects intervals with the lower bound greater
104 than the upper, for example 5 .. 2.
108 seg values are stored internally as pairs of 32-bit floating point
109 numbers. This means that numbers with more than 7 significant digits
112 Numbers with 7 or fewer significant digits retain their original
113 precision. That is, if your query returns 0.00, you will be sure that
114 the trailing zeroes are not the artifacts of formatting: they reflect
115 the precision of the original data. The number of leading zeroes does
116 not affect precision: the value 0.0067 is considered to have just 2
121 The seg module includes a GiST index operator class for seg values. The
122 operators supported by the GiST operator class are shown in Table F.31.
124 Table F.31. Seg GiST Operators
132 Is the first seg entirely to the left of the second? [a, b] << [c, d]
137 Is the first seg entirely to the right of the second? [a, b] >> [c, d]
142 Does the first seg not extend to the right of the second? [a, b] &< [c,
143 d] is true if b <= d.
147 Does the first seg not extend to the left of the second? [a, b] &> [c,
148 d] is true if a >= c.
152 Are the two segs equal?
156 Do the two segs overlap?
160 Does the first seg contain the second?
164 Is the first seg contained in the second?
166 In addition to the above operators, the usual comparison operators
167 shown in Table 9.1 are available for type seg. These operators first
168 compare (a) to (c), and if these are equal, compare (b) to (d). That
169 results in reasonably good sorting in most cases, which is useful if
170 you want to use ORDER BY with this type.
174 For examples of usage, see the regression test sql/seg.sql.
176 The mechanism that converts (+-) to regular ranges isn't completely
177 accurate in determining the number of significant digits for the
178 boundaries. For example, it adds an extra digit to the lower boundary
179 if the resulting interval includes a power of ten:
180 postgres=> select '10(+-)1'::seg as seg;
183 9.0 .. 11 -- should be: 9 .. 11
185 The performance of an R-tree index can largely depend on the initial
186 order of input values. It may be very helpful to sort the input table
187 on the seg column; see the script sort-segments.pl for an example.
191 Original author: Gene Selkov, Jr. <selkovjr@mcs.anl.gov>, Mathematics
192 and Computer Science Division, Argonne National Laboratory.
194 My thanks are primarily to Prof. Joe Hellerstein
195 (https://dsf.berkeley.edu/jmh/) for elucidating the gist of the GiST
196 (http://gist.cs.berkeley.edu/). I am also grateful to all Postgres
197 developers, present and past, for enabling myself to create my own
198 world and live undisturbed in it. And I would like to acknowledge my
199 gratitude to Argonne Lab and to the U.S. Department of Energy for the
200 years of faithful support of my database research.