68.2. System Catalog Initial Data

68.2.1. Data File Format
68.2.2. OID Assignment
68.2.3. OID Reference Lookup
68.2.4. Automatic Creation of Array Types
68.2.5. Recipes for Editing Data Files

Each catalog that has any manually-created initial data (some do not)
has a corresponding .dat file that contains its initial data in an
editable format.

68.2.1. Data File Format

Each .dat file contains Perl data structure literals that are simply
eval'd to produce an in-memory data structure consisting of an array of
hash references, one per catalog row. A slightly modified excerpt from
pg_database.dat will demonstrate the key features:

    [

    # A comment could appear here.
    { oid => '1', oid_symbol => 'Template1DbOid',
      descr => 'database\'s default template',
      datname => 'template1', encoding => 'ENCODING',
      datlocprovider => 'LOCALE_PROVIDER', datistemplate => 't',
      datallowconn => 't', dathasloginevt => 'f', datconnlimit => '-1', datfrozenxid => '0',
      datminmxid => '1', dattablespace => 'pg_default', datcollate => 'LC_COLLATE',
      datctype => 'LC_CTYPE', datlocale => 'DATLOCALE', datacl => '_null_' },

    ]

  * The overall file layout is: open square bracket, one or more sets
    of curly braces each of which represents a catalog row, close
    square bracket. Write a comma after each closing curly brace.
  * Within each catalog row, write comma-separated key => value pairs.
    The allowed keys are the names of the catalog's columns, plus the
    metadata keys oid, oid_symbol, array_type_oid, and descr. (The use
    of oid and oid_symbol is described in Section 68.2.2 below, while
    array_type_oid is described in Section 68.2.4. descr supplies a
    description string for the object, which will be inserted into
    pg_description or pg_shdescription as appropriate.) While the
    metadata keys are optional, the catalog's defined columns must all
    be provided, except when the catalog's .h file specifies a default
    value for the column. (In the example above, the datdba field has
    been omitted because pg_database.h supplies a suitable default
    value.)
  * All values must be single-quoted. Escape single quotes used within
    a value with a backslash. Backslashes meant as data can, but need
    not, be doubled; this follows Perl's rules for simple quoted
    literals. Note that backslashes appearing as data will be treated
    as escapes by the bootstrap scanner, according to the same rules as
    for escape string constants (see Section 4.1.2.2); for example, \t
    converts to a tab character. If you actually want a backslash in
    the final value, you will need to write four of them: Perl strips
    two, leaving \\ for the bootstrap scanner to see.
  * Null values are represented by _null_. (Note that there is no way
    to create a value that is just that string.)
  * Comments are preceded by #, and must be on their own lines.
  * Field values that are OIDs of other catalog entries should be
    represented by symbolic names rather than actual numeric OIDs. (In
    the example above, dattablespace contains such a reference.) This
    is described in Section 68.2.3 below.
  * Since hashes are unordered data structures, field order and line
    layout aren't semantically significant. However, to maintain a
    consistent appearance, we set a few rules that are applied by the
    formatting script reformat_dat_file.pl:
      + Within each pair of curly braces, the metadata fields oid,
        oid_symbol, array_type_oid, and descr (if present) come first,
        in that order, then the catalog's own fields appear in their
        defined order.
      + Newlines are inserted between fields as needed to limit line
        length to 80 characters, if possible. A newline is also
        inserted between the metadata fields and the regular fields.
      + If the catalog's .h file specifies a default value for a
        column, and a data entry has that same value,
        reformat_dat_file.pl will omit it from the data file. This
        keeps the data representation compact.
      + reformat_dat_file.pl preserves blank lines and comment lines
        as-is.
    It's recommended to run reformat_dat_file.pl before submitting
    catalog data patches. For convenience, you can simply change to
    src/include/catalog/ and run make reformat-dat-files.
  * If you want to add a new method of making the data representation
    smaller, you must implement it in reformat_dat_file.pl and also
    teach Catalog::ParseData() how to expand the data back into the
    full representation.

68.2.2. OID Assignment

A catalog row appearing in the initial data can be given a
manually-assigned OID by writing an oid => nnnn metadata field.
Furthermore, if an OID is assigned, a C macro for that OID can be
created by writing an oid_symbol => name metadata field.

Pre-loaded catalog rows must have preassigned OIDs if there are OID
references to them in other pre-loaded rows. A preassigned OID is also
needed if the row's OID must be referenced from C code. If neither case
applies, the oid metadata field can be omitted, in which case the
bootstrap code assigns an OID automatically. In practice we usually
preassign OIDs for all or none of the pre-loaded rows in a given
catalog, even if only some of them are actually cross-referenced.

Writing the actual numeric value of any OID in C code is considered
very bad form; always use a macro instead. Direct references to
pg_proc OIDs are common enough that there's a special mechanism to
create the necessary macros automatically; see
src/backend/utils/Gen_fmgrtab.pl. Similarly (but, for historical
reasons, not done the same way), there's an automatic method for
creating macros for pg_type OIDs. oid_symbol entries are therefore not
necessary in those two catalogs. Likewise, macros for the pg_class OIDs
of system catalogs and indexes are set up automatically. For all other
system catalogs, you have to manually specify any macros you need via
oid_symbol entries.

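To make the "always use a macro" rule concrete, here is a small C sketch. It assumes that an oid_symbol => 'Template1DbOid' entry like the one in the pg_database.dat excerpt ends up as a #define in the catalog's generated header (pg_database_d.h); the macro is repeated inline so the fragment compiles on its own. The Oid typedef mirrors PostgreSQL's, and is_template1() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;   /* same underlying type as PostgreSQL's Oid */

/* Assumed shape of what oid_symbol => 'Template1DbOid' produces in the
 * generated header; repeated here only to keep the example standalone. */
#define Template1DbOid 1

/* Good: the intent is obvious and the code keeps working if the
 * catalog entry is ever renumbered before release.
 * Bad form would be writing "dboid == 1" directly. */
bool is_template1(Oid dboid)
{
    return dboid == Template1DbOid;
}
```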
To find an available OID for a new pre-loaded row, run the script
src/include/catalog/unused_oids. It prints inclusive ranges of unused
OIDs (e.g., the output line 45-900 means OIDs 45 through 900 have not
been allocated yet). Currently, OIDs 1–9999 are reserved for manual
assignment; the unused_oids script simply looks through the catalog
headers and .dat files to see which ones do not appear. You can also
use the duplicate_oids script to check for mistakes. (genbki.pl will
assign OIDs for any rows that didn't get one hand-assigned to them, and
it will also detect duplicate OIDs at compile time.)

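The range computation that unused_oids reports can be sketched in C as follows. This only models the "inclusive ranges of unused OIDs in 1–9999" part, assuming the set of used OIDs has already been collected from the headers and .dat files (the real script is written in Perl and does that scanning itself). The OidRange type and find_unused_ranges() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MANUAL_OID 9999   /* top of the manually-assigned range */

typedef struct
{
    unsigned lo;
    unsigned hi;              /* inclusive, like the "45-900" output */
} OidRange;

/* Fill out[] with the inclusive ranges of OIDs in 1..MAX_MANUAL_OID that
 * do not appear in used[] (any order, duplicates allowed). Returns the
 * number of ranges written, at most max_out. */
size_t find_unused_ranges(const unsigned *used, size_t nused,
                          OidRange *out, size_t max_out)
{
    static bool taken[MAX_MANUAL_OID + 1];
    size_t n = 0;

    for (unsigned oid = 0; oid <= MAX_MANUAL_OID; oid++)
        taken[oid] = false;
    for (size_t i = 0; i < nused; i++)
        if (used[i] >= 1 && used[i] <= MAX_MANUAL_OID)
            taken[used[i]] = true;

    for (unsigned oid = 1; oid <= MAX_MANUAL_OID && n < max_out;)
    {
        if (taken[oid])
        {
            oid++;
            continue;
        }
        unsigned start = oid;
        while (oid <= MAX_MANUAL_OID && !taken[oid])
            oid++;
        out[n].lo = start;
        out[n].hi = oid - 1;
        n++;
    }
    return n;
}
```

With used OIDs {1, 2, 3, 10, 11, 9999}, the free ranges are 4-9 and 12-9998.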
When choosing OIDs for a patch that is not expected to be committed
immediately, best practice is to use a group of more-or-less
consecutive OIDs starting with some random choice in the range
8000–9999. This minimizes the risk of OID collisions with other patches
being developed concurrently. To keep the 8000–9999 range free for
development purposes, after a patch has been committed to the master
git repository its OIDs should be renumbered into available space below
that range. Typically, this will be done near the end of each
development cycle, moving all OIDs consumed by patches committed in
that cycle at the same time. The script renumber_oids.pl can be used
for this purpose. If an uncommitted patch is found to have OID
conflicts with some recently-committed patch, renumber_oids.pl may also
be useful for recovering from that situation.

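The "random starting point in 8000–9999" practice can be sketched as below. This is a hypothetical helper, not a project script: it assumes you already have the list of OIDs in use (for example, from unused_oids output), and it insists on a strictly consecutive free block, which is stricter than the "more-or-less consecutive" guidance above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define DEV_OID_LO 8000u
#define DEV_OID_HI 9999u

/* True if oid appears in the caller-supplied list of used OIDs. */
static bool oid_in_use(unsigned oid, const unsigned *used, size_t nused)
{
    for (size_t i = 0; i < nused; i++)
        if (used[i] == oid)
            return true;
    return false;
}

/* Pick a random start in 8000-9999 such that start .. start+n-1 are all
 * free and inside the range; retry a bounded number of times.
 * Returns the first OID of the block, or 0 if none was found.
 * Seed rand() with srand() before calling. */
unsigned pick_dev_oid_block(unsigned n, const unsigned *used, size_t nused)
{
    if (n == 0 || n > DEV_OID_HI - DEV_OID_LO + 1)
        return 0;

    unsigned span = DEV_OID_HI - DEV_OID_LO + 2 - n;  /* number of valid starts */

    for (int attempt = 0; attempt < 1000; attempt++)
    {
        unsigned start = DEV_OID_LO + (unsigned) rand() % span;
        bool ok = true;

        for (unsigned oid = start; oid < start + n && ok; oid++)
            ok = !oid_in_use(oid, used, nused);
        if (ok)
            return start;
    }
    return 0;
}
```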
Because of this convention of possibly renumbering OIDs assigned by
patches, the OIDs assigned by a patch should not be considered stable
until the patch has been included in an official release. We do not
change manually-assigned object OIDs once released, however, as that
would create assorted compatibility problems.

If genbki.pl needs to assign an OID to a catalog entry that does not
have a manually-assigned OID, it will use a value in the range
10000–11999. The server's OID counter is set to 10000 at the start of a
bootstrap run, so that any objects created on-the-fly during bootstrap
processing also receive OIDs in this range. (The usual OID assignment
mechanism takes care of preventing any conflicts.)

Objects with OIDs below FirstUnpinnedObjectId (12000) are considered
“pinned”, preventing them from being deleted. (There are a small number
of exceptions, which are hard-wired into IsPinnedObject().) initdb
forces the OID counter up to FirstUnpinnedObjectId as soon as it's
ready to create unpinned objects. Thus objects created during the later
phases of initdb, such as objects created while running the
information_schema.sql script, will not be pinned, while all objects
known to genbki.pl will be.

OIDs assigned during normal database operation are constrained to be
16384 or higher. This ensures that the range 10000–16383 is free for
OIDs assigned automatically by genbki.pl or during initdb. These
automatically-assigned OIDs are not considered stable, and may change
from one installation to another.

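The OID ranges described in this section can be summarized as a small C classifier. The three boundary macros use the names from PostgreSQL's transam.h (FirstGenbkiObjectId, FirstUnpinnedObjectId, FirstNormalObjectId) with the values given above; oid_origin() and oid_is_pinned_simplified() are hypothetical helpers, and the pinning test deliberately ignores the hard-wired exceptions in IsPinnedObject().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;

#define FirstGenbkiObjectId    10000   /* genbki.pl / bootstrap start here */
#define FirstUnpinnedObjectId  12000   /* initdb jumps here for unpinned objects */
#define FirstNormalObjectId    16384   /* normal operation starts here */

/* Where an OID in a freshly initialized cluster most likely came from. */
const char *oid_origin(Oid oid)
{
    if (oid == 0)
        return "invalid";
    if (oid < FirstGenbkiObjectId)
        return "manual";               /* 1-9999, hand-assigned in .dat/.h */
    if (oid < FirstUnpinnedObjectId)
        return "genbki/bootstrap";     /* 10000-11999, pinned */
    if (oid < FirstNormalObjectId)
        return "initdb (unpinned)";    /* 12000-16383, later initdb phases */
    return "normal";                   /* 16384 and up */
}

/* Simplified: real code must also honor IsPinnedObject()'s exceptions. */
bool oid_is_pinned_simplified(Oid oid)
{
    return oid != 0 && oid < FirstUnpinnedObjectId;
}
```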
68.2.3. OID Reference Lookup

In principle, cross-references from one initial catalog row to another
could be written simply by putting the preassigned OID of the
referenced row in the referencing field. However, that is against
project policy, because it is error-prone, hard to read, and subject to
breakage if a newly-assigned OID is renumbered. Therefore genbki.pl
provides mechanisms to write symbolic references instead. The rules are
as follows:

  * Use of symbolic references is enabled in a particular catalog
    column by attaching BKI_LOOKUP(lookuprule) to the column's
    definition, where lookuprule is the name of the referenced catalog,
    e.g., pg_proc. BKI_LOOKUP can be attached to columns of type Oid,
    regproc, oidvector, or Oid[]; in the latter two cases it implies
    performing a lookup on each element of the array.
  * It's also permissible to attach BKI_LOOKUP(encoding) to integer
    columns to reference character set encodings, which are not
    currently represented as catalog OIDs, but have a set of values
    known to genbki.pl.
  * In some catalog columns, it's allowed for entries to be zero
    instead of a valid reference. If this is allowed, write
    BKI_LOOKUP_OPT instead of BKI_LOOKUP. Then you can write 0 for an
    entry. (If the column is declared regproc, you can optionally write
    - instead of 0.) Except for this special case, all entries in a
    BKI_LOOKUP column must be symbolic references. genbki.pl will warn
    about unrecognized names.
  * Most kinds of catalog objects are simply referenced by their names.
    Note that type names must exactly match the referenced pg_type
    entry's typname; you do not get to use any aliases such as integer
    for int4.
  * A function can be represented by its proname, if that is unique
    among the pg_proc.dat entries (this works like regproc input).
    Otherwise, write it as proname(argtypename,argtypename,...), like
    regprocedure. The argument type names must be spelled exactly as
    they are in the pg_proc.dat entry's proargtypes field. Do not
    insert any spaces.
  * Operators are represented by oprname(lefttype,righttype), writing
    the type names exactly as they appear in the pg_operator.dat
    entry's oprleft and oprright fields. (Write 0 for the omitted
    operand of a unary operator.)
  * The names of opclasses and opfamilies are only unique within an
    access method, so they are represented by
    access_method_name/object_name.
  * In none of these cases is there any provision for
    schema-qualification; all objects created during bootstrap are
    expected to be in the pg_catalog schema.

genbki.pl resolves all symbolic references while it runs, and puts
simple numeric OIDs into the emitted BKI file. There is therefore no
need for the bootstrap backend to deal with symbolic references.

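The function-reference rules above can be illustrated with a small C model of the lookup. It is a sketch under stated assumptions, not genbki.pl's code: the three ProcEntry rows and their OIDs are made up for the example, "0" is accepted only when the caller says the column is BKI_LOOKUP_OPT, a bare proname must match exactly one entry, and a name(types) reference must match the comma-joined proargtypes exactly, with no spaces. Unresolvable names produce a warning and 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;

/* A hypothetical slice of pg_proc.dat; the OIDs are invented. */
typedef struct
{
    Oid         oid;
    const char *proname;
    const char *proargtypes;   /* space-separated, as in pg_proc.dat */
} ProcEntry;

static const ProcEntry procs[] = {
    {9001, "int4pl", "int4 int4"},
    {9002, "abs", "int4"},
    {9003, "abs", "float8"},
};
#define NPROCS (sizeof(procs) / sizeof(procs[0]))

/* Build "proname(type,type,...)" with no spaces. */
static void proc_signature(const ProcEntry *p, char *buf, size_t bufsz)
{
    int len = snprintf(buf, bufsz, "%s(", p->proname);

    assert(len > 0 && (size_t) len + strlen(p->proargtypes) + 2 <= bufsz);
    for (const char *c = p->proargtypes; *c; c++)
        buf[len++] = (*c == ' ') ? ',' : *c;
    buf[len++] = ')';
    buf[len] = '\0';
}

/* Resolve a symbolic pg_proc reference to a numeric OID. */
Oid resolve_proc_ref(const char *ref, int lookup_opt)
{
    char sig[128];
    Oid  found = 0;
    int  nmatch = 0;

    if (lookup_opt && strcmp(ref, "0") == 0)
        return 0;                       /* explicit "no reference" */

    for (size_t i = 0; i < NPROCS; i++)
    {
        if (strchr(ref, '(') != NULL)
        {
            proc_signature(&procs[i], sig, sizeof sig);
            if (strcmp(sig, ref) != 0)
                continue;
        }
        else if (strcmp(procs[i].proname, ref) != 0)
            continue;
        found = procs[i].oid;
        nmatch++;
    }
    if (nmatch != 1)
    {
        fprintf(stderr, "warning: cannot resolve \"%s\" (%d matches)\n",
                ref, nmatch);
        return 0;
    }
    return found;
}
```

In this model, "abs" fails because two entries share that proname, and "abs(float8, int4)" fails because of the space and the wrong argument list, while "abs(float8)" resolves to the invented OID 9003.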
It's desirable to mark OID reference columns with BKI_LOOKUP or
BKI_LOOKUP_OPT even if the catalog has no initial data that requires
lookup. This allows genbki.pl to record the foreign key relationships
that exist in the system catalogs. That information is used in the
regression tests to check for incorrect entries. See also the macros
DECLARE_FOREIGN_KEY, DECLARE_FOREIGN_KEY_OPT,
DECLARE_ARRAY_FOREIGN_KEY, and DECLARE_ARRAY_FOREIGN_KEY_OPT, which are
used to declare foreign key relationships that are too complex for
BKI_LOOKUP (typically, multi-column foreign keys).

68.2.4. Automatic Creation of Array Types

Most scalar data types should have a corresponding array type (that is,
a standard varlena array type whose element type is the scalar type,
and which is referenced by the typarray field of the scalar type's
pg_type entry). genbki.pl is able to generate the pg_type entry for the
array type automatically in most cases.

To use this facility, just write an array_type_oid => nnnn metadata
field in the scalar type's pg_type entry, specifying the OID to use for
the array type. You may then omit the typarray field, since it will be
filled automatically with that OID.

The generated array type's name is the scalar type's name with an
underscore prepended. The array entry's other fields are filled from
BKI_ARRAY_DEFAULT(value) annotations in pg_type.h, or if there isn't
one, copied from the scalar type. (There's also a special case for
typalign.) Then the typelem and typarray fields of the two entries are
set to cross-reference each other.

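The derivation can be sketched in C using only a handful of pg_type fields. Assumptions, stated plainly: the array's typlen comes from a BKI_ARRAY_DEFAULT(-1) annotation, its own typarray is 0, and the typalign special case is taken to be "int alignment unless the element type needs double alignment". TypeRow and make_array_type() are hypothetical; genbki.pl does this in Perl over the full set of pg_type columns.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;

/* A few pg_type fields, enough to show the derivation. */
typedef struct
{
    Oid  oid;
    char typname[64];
    int  typlen;
    char typalign;
    Oid  typelem;
    Oid  typarray;
} TypeRow;

/* Derive the array type for *scalar using array_oid (the value of the
 * array_type_oid metadata field), and cross-link the two rows. */
TypeRow make_array_type(TypeRow *scalar, Oid array_oid)
{
    TypeRow arr = *scalar;       /* fields without an override are copied */

    arr.oid = array_oid;
    snprintf(arr.typname, sizeof arr.typname, "_%s", scalar->typname);
    arr.typlen = -1;             /* assumed BKI_ARRAY_DEFAULT: varlena */
    /* assumed typalign special case: 'i' unless the element needs 'd' */
    arr.typalign = (scalar->typalign == 'd') ? 'd' : 'i';
    arr.typelem = scalar->oid;   /* array -> element */
    arr.typarray = 0;            /* arrays have no array type of their own */
    scalar->typarray = array_oid;  /* element -> array */
    return arr;
}
```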
68.2.5. Recipes for Editing Data Files

Here are some suggestions about the easiest ways to perform common
tasks when updating catalog data files.

Add a new column with a default to a catalog: Add the column to the
header file with a BKI_DEFAULT(value) annotation. The data file need
only be adjusted by adding the field in existing rows where a
non-default value is needed.

Add a default value to an existing column that doesn't have one: Add a
BKI_DEFAULT annotation to the header file, then run make
reformat-dat-files to remove now-redundant field entries.

Remove a column, whether it has a default or not: Remove the column
from the header, then run make reformat-dat-files to remove now-useless
field entries.

Change or remove an existing default value: You cannot simply change
the header file, since that will cause the current data to be
interpreted incorrectly. First run make expand-dat-files to rewrite the
data files with all default values inserted explicitly, then change or
remove the BKI_DEFAULT annotation, then run make reformat-dat-files to
remove superfluous fields again.

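Why the expand-then-reformat order matters can be shown with a toy C model of default compaction and expansion. It assumes a hypothetical three-column catalog whose rows are arrays of string values, with NULL meaning "field omitted from the .dat file"; compact_row() and expand_row() only imitate the effect of make reformat-dat-files and make expand-dat-files, not their implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NCOLS 3

/* One row as stored in a .dat file: NULL means the field is omitted. */
typedef struct
{
    const char *val[NCOLS];
} DatRow;

/* Reformat-style compaction: drop fields equal to the column default.
 * A NULL entry in defaults[] means the column has no BKI_DEFAULT. */
void compact_row(DatRow *row, const char *const defaults[NCOLS])
{
    for (int i = 0; i < NCOLS; i++)
        if (defaults[i] && row->val[i] && strcmp(row->val[i], defaults[i]) == 0)
            row->val[i] = NULL;
}

/* Expand-style: write every omitted field out explicitly. Returns 0 if
 * an omitted field has no default, i.e. the row would be invalid. */
int expand_row(DatRow *row, const char *const defaults[NCOLS])
{
    for (int i = 0; i < NCOLS; i++)
        if (!row->val[i])
        {
            if (!defaults[i])
                return 0;
            row->val[i] = defaults[i];
        }
    return 1;
}
```

If the third column's default changes from i to v, expanding a compacted row with the new defaults silently turns its i into v; expanding with the old defaults first and compacting with the new ones keeps the value i, written out explicitly.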
Ad-hoc bulk editing: reformat_dat_file.pl can be adapted to perform
many kinds of bulk changes. Look for its block comments showing where
one-off code can be inserted. In the following example, we are going to
consolidate two Boolean fields in pg_proc into a char field:

   1. Add the new column, with a default, to pg_proc.h:

        +    /* see PROKIND_ categories below */
        +    char        prokind BKI_DEFAULT(f);

   2. Create a new script based on reformat_dat_file.pl to insert
      appropriate values on-the-fly:

        -           # At this point we have the full row in memory as a hash
        -           # and can do any operations we want. As written, it only
        -           # removes default values, but this script can be adapted to
        -           # do one-off bulk-editing.
        +           # One-off change to migrate to prokind
        +           # Default has already been filled in by now, so change to other
        +           # values as appropriate
        +           if ($values{proisagg} eq 't')
        +           {
        +               $values{prokind} = 'a';
        +           }
        +           elsif ($values{proiswindow} eq 't')
        +           {
        +               $values{prokind} = 'w';
        +           }

   3. Run the new script:

        $ cd src/include/catalog
        $ perl rewrite_dat_with_prokind.pl pg_proc.dat

      At this point pg_proc.dat has all three columns, prokind, proisagg,
      and proiswindow, though they will appear only in rows where they
      have non-default values.
   4. Remove the old columns from pg_proc.h:

        -    /* is it an aggregate? */
        -    bool        proisagg BKI_DEFAULT(f);

        -    /* is it a window function? */
        -    bool        proiswindow BKI_DEFAULT(f);

   5. Finally, run make reformat-dat-files to remove the useless old
      entries from pg_proc.dat.

For further examples of scripts used for bulk editing, see
convert_oid2name.pl and remove_pg_type_oid_symbols.pl attached to this
message:
https://www.postgresql.org/message-id/CAJVSVGVX8gXnPm+Xa=DxR7kFYprcQ1tNcCT5D0O3ShfnM6jehA@mail.gmail.com