68.2. System Catalog Initial Data

   68.2.1. Data File Format
   68.2.2. OID Assignment
   68.2.3. OID Reference Lookup
   68.2.4. Automatic Creation of Array Types
   68.2.5. Recipes for Editing Data Files

   Each catalog that has any manually-created initial data (some do not)
   has a corresponding .dat file that contains its initial data in an
   editable format.

68.2.1. Data File Format

   Each .dat file contains Perl data structure literals that are simply
   eval'd to produce an in-memory data structure consisting of an array of
   hash references, one per catalog row. A slightly modified excerpt from
   pg_database.dat will demonstrate the key features:
[

# A comment could appear here.
{ oid => '1', oid_symbol => 'Template1DbOid',
  descr => 'database\'s default template',
  datname => 'template1', encoding => 'ENCODING',
  datlocprovider => 'LOCALE_PROVIDER', datistemplate => 't',
  datallowconn => 't', dathasloginevt => 'f', datconnlimit => '-1', datfrozenxid => '0',
  datminmxid => '1', dattablespace => 'pg_default', datcollate => 'LC_COLLATE',
  datctype => 'LC_CTYPE', datlocale => 'DATLOCALE', datacl => '_null_' },

]
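
   As a rough illustration of the in-memory form, the Python sketch below turns the key => 'value' pairs of one row into a dictionary. It is not how genbki.pl works: the real tooling simply evals the Perl text. The regular expression, the helper names, and the choice to represent _null_ as None are all assumptions of this sketch. The sketch also models the two rounds of backslash processing described in the points that follow. Perl's single-quote rules are applied first. Then comes a deliberately reduced version of the bootstrap scanner's escape-string rules, which handles only \\ and \t explicitly.

```python
import re

# Hypothetical stand-in for Perl's eval of one row: matches  key => 'value'
# where the value may contain backslash escapes such as \' or \\.
PAIR_RE = re.compile(r"(\w+)\s*=>\s*'((?:[^'\\]|\\.)*)'")


def perl_single_quote_unescape(text: str) -> str:
    r"""Perl single-quoted literals: only \\ and \' are escapes."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in "\\'":
            out.append(text[i + 1])  # keep only the escaped character
            i += 2
        else:
            out.append(text[i])      # any other backslash stays as data
            i += 1
    return "".join(out)


def bootstrap_unescape(text: str) -> str:
    r"""Simplified second pass modeled on escape string constants.

    Only \\ (one backslash) and \t (a tab) are translated explicitly; any
    other backslash-escaped character becomes that character. The real
    scanner understands more escapes than this.
    """
    table = {"\\": "\\", "t": "\t"}
    return re.sub(r"\\(.)", lambda m: table.get(m.group(1), m.group(1)), text)


def parse_row(row_text: str) -> dict:
    """Turn the text of one { ... } row into a dict; _null_ becomes None."""
    row = {}
    for key, raw in PAIR_RE.findall(row_text):
        value = perl_single_quote_unescape(raw)
        row[key] = None if value == "_null_" else value
    return row


example = r"""{ oid => '1', oid_symbol => 'Template1DbOid',
  descr => 'database\'s default template',
  datname => 'template1', dattablespace => 'pg_default', datacl => '_null_' }"""

row = parse_row(example)
print(row["descr"])   # database's default template
print(row["datacl"])  # None

# Four backslashes written in a .dat file end up as one backslash.
print(bootstrap_unescape(perl_single_quote_unescape("\\\\\\\\")))  # prints a single backslash
```

   Doubling a backslash is optional at the Perl stage because a lone backslash before any character other than \ or ' is kept as data. That is why \t reaches the bootstrap scanner unchanged and only there becomes a tab.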

   Points to note:
     * The overall file layout is: open square bracket, one or more sets
       of curly braces each of which represents a catalog row, close
       square bracket. Write a comma after each closing curly brace.
     * Within each catalog row, write comma-separated key => value pairs.
       The allowed keys are the names of the catalog's columns, plus the
       metadata keys oid, oid_symbol, array_type_oid, and descr. (The use
       of oid and oid_symbol is described in Section 68.2.2 below, while
       array_type_oid is described in Section 68.2.4. descr supplies a
       description string for the object, which will be inserted into
       pg_description or pg_shdescription as appropriate.) While the
       metadata keys are optional, the catalog's defined columns must all
       be provided, except when the catalog's .h file specifies a default
       value for the column. (In the example above, the datdba field has
       been omitted because pg_database.h supplies a suitable default
       value for it.)
     * All values must be single-quoted. Escape single quotes used within
       a value with a backslash. Backslashes meant as data can, but need
       not, be doubled; this follows Perl's rules for simple quoted
       literals. Note that backslashes appearing as data will be treated
       as escapes by the bootstrap scanner, according to the same rules as
       for escape string constants (see Section 4.1.2.2); for example \t
       converts to a tab character. If you actually want a backslash in
       the final value, you will need to write four of them: Perl strips
       two, leaving \\ for the bootstrap scanner to see.
     * Null values are represented by _null_. (Note that there is no way
       to create a value that is just that string.)
     * Comments are preceded by #, and must be on their own lines.
     * Field values that are OIDs of other catalog entries should be
       represented by symbolic names rather than actual numeric OIDs. (In
       the example above, dattablespace contains such a reference.) This
       is described in Section 68.2.3 below.
     * Since hashes are unordered data structures, field order and line
       layout aren't semantically significant. However, to maintain a
       consistent appearance, we set a few rules that are applied by the
       formatting script reformat_dat_file.pl:
          + Within each pair of curly braces, the metadata fields oid,
            oid_symbol, array_type_oid, and descr (if present) come first,
            in that order, then the catalog's own fields appear in their
            defined order.
          + Newlines are inserted between fields as needed to limit line
            length to 80 characters, if possible. A newline is also
            inserted between the metadata fields and the regular fields.
          + If the catalog's .h file specifies a default value for a
            column, and a data entry has that same value,
            reformat_dat_file.pl will omit it from the data file. This
            keeps the data representation compact.
          + reformat_dat_file.pl preserves blank lines and comment lines
            as-is.
       It's recommended to run reformat_dat_file.pl before submitting
       catalog data patches. For convenience, you can simply change to
       src/include/catalog/ and run make reformat-dat-files.
     * If you want to add a new method of making the data representation
       smaller, you must implement it in reformat_dat_file.pl and also
       teach Catalog::ParseData() how to expand the data back into the
       full representation.
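
   The field-ordering and default-omission rules can be sketched in Python, together with the reverse expansion that Catalog::ParseData() has to perform. The catalog pg_example, its columns, and its defaults are hypothetical stand-ins for a real .h file with BKI_DEFAULT annotations. Line wrapping, blank lines, and comments are not modeled.

```python
METADATA_KEYS = ["oid", "oid_symbol", "array_type_oid", "descr"]

# Hypothetical catalog "pg_example": columns in their defined order, and the
# defaults that BKI_DEFAULT(...) annotations in its .h file would declare.
COLUMNS = ["exname", "exowner", "exflag"]
DEFAULTS = {"exowner": "POSTGRES", "exflag": "f"}


def compact_row(row: dict) -> dict:
    """Order fields (metadata first) and omit values equal to the default."""
    out = {key: row[key] for key in METADATA_KEYS if key in row}
    for col in COLUMNS:
        if col not in row:
            continue  # already omitted; the default will apply
        if col in DEFAULTS and row[col] == DEFAULTS[col]:
            continue  # redundant with the declared default
        out[col] = row[col]
    return out


def expand_row(row: dict) -> dict:
    """Inverse step: write every column out explicitly."""
    out = {key: row[key] for key in METADATA_KEYS if key in row}
    for col in COLUMNS:
        if col in row:
            out[col] = row[col]
        elif col in DEFAULTS:
            out[col] = DEFAULTS[col]
        else:
            raise KeyError(f"{col} has no value and no default")
    return out


full = {"exflag": "t", "exname": "demo", "oid": "9001", "exowner": "POSTGRES"}
print(list(compact_row(full)))                 # ['oid', 'exname', 'exflag']
print(expand_row(compact_row(full)) == full)   # True
```

   The two functions must agree on the same defaults, which is why a new compaction rule needs a matching expansion step.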

68.2.2. OID Assignment

   A catalog row appearing in the initial data can be given a
   manually-assigned OID by writing an oid => nnnn metadata field.
   Furthermore, if an OID is assigned, a C macro for that OID can be
   created by writing an oid_symbol => name metadata field.

   Pre-loaded catalog rows must have preassigned OIDs if there are OID
   references to them in other pre-loaded rows. A preassigned OID is also
   needed if the row's OID must be referenced from C code. If neither case
   applies, the oid metadata field can be omitted, in which case the
   bootstrap code assigns an OID automatically. In practice we usually
   preassign OIDs for all or none of the pre-loaded rows in a given
   catalog, even if only some of them are actually cross-referenced.

   Writing the actual numeric value of any OID in C code is considered
   very bad form; always use a macro instead. Direct references to
   pg_proc OIDs are common enough that there's a special mechanism to
   create the necessary macros automatically; see
   src/backend/utils/Gen_fmgrtab.pl. Similarly (but, for historical
   reasons, not done the same way), there's an automatic method for
   creating macros for pg_type OIDs. oid_symbol entries are therefore not
   necessary in those two catalogs. Likewise, macros for the pg_class OIDs
   of system catalogs and indexes are set up automatically. For all other
   system catalogs, you have to manually specify any macros you need via
   oid_symbol entries.
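
   The Python sketch below shows the idea behind oid_symbol: each row that carries both an oid and an oid_symbol yields one C macro for that OID. The plain "#define SYMBOL OID" output format, the decision to reject an oid_symbol that has no oid, and the name emit_oid_macros are all assumptions made for illustration. They do not describe genbki.pl's actual output.

```python
def emit_oid_macros(rows):
    """Return a '#define SYMBOL OID' line for each row with an oid_symbol."""
    lines = []
    for row in rows:
        symbol = row.get("oid_symbol")
        if symbol is None:
            continue  # no macro requested for this row
        if "oid" not in row:
            # Assumption: a symbol without a manually assigned OID is an error.
            raise ValueError(f"{symbol}: oid_symbol requires an oid")
        lines.append(f"#define {symbol} {row['oid']}")
    return lines


rows = [
    {"oid": "1", "oid_symbol": "Template1DbOid", "datname": "template1"},
    {"oid": "9000", "datname": "example"},
    {"datname": "no_oid_at_all"},
]
print("\n".join(emit_oid_macros(rows)))  # #define Template1DbOid 1
```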

   To find an available OID for a new pre-loaded row, run the script
   src/include/catalog/unused_oids. It prints inclusive ranges of unused
   OIDs (e.g., the output line 45-900 means OIDs 45 through 900 have not
   been allocated yet). Currently, OIDs 1–9999 are reserved for manual
   assignment; the unused_oids script simply looks through the catalog
   headers and .dat files to see which ones do not appear. You can also
   use the duplicate_oids script to check for mistakes. (genbki.pl will
   assign OIDs for any rows that didn't get one hand-assigned to them, and
   it will also detect duplicate OIDs at compile time.)
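
   The Python sketch below mimics the two checks on an in-memory list of OIDs rather than scanning headers and .dat files. It reports the free inclusive ranges within 1–9999 in the lo-hi style quoted above, and it also reports the values that occur more than once. The function names are hypothetical. Printing a single free OID as a bare number is an assumption about the output format.

```python
from collections import Counter

MANUAL_LO, MANUAL_HI = 1, 9999  # OIDs reserved for manual assignment


def unused_ranges(used, lo=MANUAL_LO, hi=MANUAL_HI):
    """Return inclusive (start, end) pairs of OIDs in [lo, hi] not in used."""
    taken = set(used)
    ranges = []
    start = None
    for oid in range(lo, hi + 1):
        if oid not in taken and start is None:
            start = oid                      # a free run begins
        elif oid in taken and start is not None:
            ranges.append((start, oid - 1))  # the free run just ended
            start = None
    if start is not None:
        ranges.append((start, hi))
    return ranges


def format_ranges(ranges):
    """Render ranges as lines like '45-900' (a single OID as just '45')."""
    return [f"{a}-{b}" if a != b else str(a) for a, b in ranges]


def duplicate_oids(used):
    """Return OIDs that appear more than once, in ascending order."""
    return sorted(oid for oid, count in Counter(used).items() if count > 1)


used = [1, 2, 3, 3, 5]
print(format_ranges(unused_ranges(used, hi=10)))  # ['4', '6-10']
print(duplicate_oids(used))                       # [3]
```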

   When choosing OIDs for a patch that is not expected to be committed
   immediately, best practice is to use a group of more-or-less
   consecutive OIDs starting with some random choice in the range
   8000–9999. This minimizes the risk of OID collisions with other patches
   being developed concurrently. To keep the 8000–9999 range free for
   development purposes, after a patch has been committed to the master
   git repository its OIDs should be renumbered into available space below
   that range. Typically, this will be done near the end of each
   development cycle, moving all OIDs consumed by patches committed in
   that cycle at the same time. The script renumber_oids.pl can be used
   for this purpose. If an uncommitted patch is found to have OID
   conflicts with some recently-committed patch, renumber_oids.pl may also
   be useful for recovering from that situation.
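
   The advice about patch OIDs can be sketched in Python as follows. The sketch picks a random starting point in 8000–9999 and takes the next few consecutive OIDs that are not already used. The function name is hypothetical, and this is not the behavior of any PostgreSQL script. The text only asks for more-or-less consecutive OIDs, so requiring a strictly consecutive block is a simplification.

```python
import random

DEV_LO, DEV_HI = 8000, 9999  # range suggested for uncommitted patches


def pick_dev_oids(used, count, rng=random):
    """Return `count` consecutive unused OIDs from a random start point."""
    taken = set(used)
    starts = [
        start
        for start in range(DEV_LO, DEV_HI - count + 2)
        if not any(oid in taken for oid in range(start, start + count))
    ]
    if not starts:
        raise ValueError(f"no block of {count} free OIDs in {DEV_LO}-{DEV_HI}")
    start = rng.choice(starts)
    return list(range(start, start + count))


rng = random.Random(2024)  # seeded only to make the demo repeatable
print(pick_dev_oids(used={8000, 8001, 9500}, count=4, rng=rng))
```

   Choosing uniformly among all feasible start points keeps the whole block inside 8000–9999. It also spreads concurrent patches across the range, which is the point of the random choice.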

   Because of this convention of possibly renumbering OIDs assigned by
   patches, the OIDs assigned by a patch should not be considered stable
   until the patch has been included in an official release. We do not
   change manually-assigned object OIDs once released, however, as that
   would create assorted compatibility problems.

   If genbki.pl needs to assign an OID to a catalog entry that does not
   have a manually-assigned OID, it will use a value in the range
   10000–11999. The server's OID counter is set to 10000 at the start of a
   bootstrap run, so that any objects created on-the-fly during bootstrap
   processing also receive OIDs in this range. (The usual OID assignment
   mechanism takes care of preventing any conflicts.)
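
   A minimal Python sketch of this fallback for a single list of rows follows. Rows that lack an oid receive consecutive values starting at 10000, and exhausting the range is treated as an error. The error handling and the function name are assumptions, and genbki.pl's own bookkeeping is not reproduced here.

```python
GENBKI_FIRST, GENBKI_LAST = 10000, 11999  # range used by genbki.pl


def assign_missing_oids(rows):
    """Give each row without an 'oid' key the next value in 10000-11999."""
    next_oid = GENBKI_FIRST
    for row in rows:
        if "oid" in row:
            continue  # manually assigned OIDs are left untouched
        if next_oid > GENBKI_LAST:
            raise RuntimeError("ran out of OIDs in the genbki.pl range")
        row["oid"] = str(next_oid)
        next_oid += 1
    return rows


rows = [{"oid": "42", "name": "a"}, {"name": "b"}, {"name": "c"}]
print([row["oid"] for row in assign_missing_oids(rows)])
# ['42', '10000', '10001']
```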

   Objects with OIDs below FirstUnpinnedObjectId (12000) are considered
   “pinned”, preventing them from being deleted. (There are a small number
   of exceptions, which are hard-wired into IsPinnedObject().) initdb
   forces the OID counter up to FirstUnpinnedObjectId as soon as it's
   ready to create unpinned objects. Thus objects created during the later
   phases of initdb, such as objects created while running the
   information_schema.sql script, will not be pinned, while all objects
   known to genbki.pl will be.

   OIDs assigned during normal database operation are constrained to be
   16384 or higher. This ensures that the range 10000–16383 is free for
   OIDs assigned automatically by genbki.pl or during initdb. These
   automatically-assigned OIDs are not considered stable, and may change
   from one installation to another.
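
   The ranges described in this section can be summarized by a small Python classifier. The boundaries come from the text above: 1–9999 for manual assignment, 10000–11999 for genbki.pl and bootstrap, 12000 for FirstUnpinnedObjectId, and 16384 for normal operation. The constant names and category labels are invented for this sketch, and the hard-wired exceptions in IsPinnedObject() are not modeled.

```python
FIRST_GENBKI_OID = 10000     # start of the genbki.pl / bootstrap range
FIRST_UNPINNED_OID = 12000   # FirstUnpinnedObjectId in the text
FIRST_NORMAL_OID = 16384     # lowest OID handed out in normal operation


def classify_oid(oid: int) -> str:
    """Say which mechanism an OID in this range would come from."""
    if oid < 1:
        raise ValueError("expected a positive OID")
    if oid < FIRST_GENBKI_OID:
        return "manually assigned (pinned)"
    if oid < FIRST_UNPINNED_OID:
        return "genbki.pl or bootstrap (pinned)"
    if oid < FIRST_NORMAL_OID:
        return "later initdb phases (unpinned)"
    return "normal operation (unpinned)"


def is_pinned(oid: int) -> bool:
    """Below FirstUnpinnedObjectId; IsPinnedObject()'s exceptions ignored."""
    return oid < FIRST_UNPINNED_OID


for oid in (1, 10500, 13000, 16384):
    print(oid, classify_oid(oid))
```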

68.2.3. OID Reference Lookup

   In principle, cross-references from one initial catalog row to another
   could be written just by writing the preassigned OID of the referenced
   row in the referencing field. However, that is against project policy,
   because it is error-prone, hard to read, and subject to breakage if a
   newly-assigned OID is renumbered. Therefore genbki.pl provides
   mechanisms to write symbolic references instead. The rules are as
   follows:
     * Use of symbolic references is enabled in a particular catalog
       column by attaching BKI_LOOKUP(lookuprule) to the column's
       definition, where lookuprule is the name of the referenced catalog,
       e.g., pg_proc. BKI_LOOKUP can be attached to columns of type Oid,
       regproc, oidvector, or Oid[]; in the latter two cases it implies
       performing a lookup on each element of the array.
     * It's also permissible to attach BKI_LOOKUP(encoding) to integer
       columns to reference character set encodings, which are not
       currently represented as catalog OIDs, but have a set of values
       known to genbki.pl.
     * In some catalog columns, it's allowed for entries to be zero
       instead of a valid reference. If this is allowed, write
       BKI_LOOKUP_OPT instead of BKI_LOOKUP. Then you can write 0 for an
       entry. (If the column is declared regproc, you can optionally
       write - instead of 0.) Except for this special case, all entries
       in a BKI_LOOKUP column must be symbolic references. genbki.pl will
       warn about unrecognized names.
     * Most kinds of catalog objects are simply referenced by their names.
       Note that type names must exactly match the referenced pg_type
       entry's typname; you do not get to use any aliases such as integer
       for int4.
     * A function can be represented by its proname, if that is unique
       among the pg_proc.dat entries (this works like regproc input).
       Otherwise, write it as proname(argtypename,argtypename,...), like
       regprocedure. The argument type names must be spelled exactly as
       they are in the pg_proc.dat entry's proargtypes field. Do not
       insert any spaces.
     * Operators are represented by oprname(lefttype,righttype), writing
       the type names exactly as they appear in the pg_operator.dat
       entry's oprleft and oprright fields. (Write 0 for the omitted
       operand of a unary operator.)
     * The names of opclasses and opfamilies are only unique within an
       access method, so they are represented by
       access_method_name/object_name.
     * In none of these cases is there any provision for
       schema-qualification; all objects created during bootstrap are
       expected to be in the pg_catalog schema.
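
   The Python sketch below illustrates several of these rules with a resolver over tiny in-memory tables. It covers:
     * plain type names;
     * unique function names versus proname(argtypes) signatures;
     * operators written as oprname(lefttype,righttype);
     * access_method/opclass names;
     * the 0 (or -) that is accepted only in optional columns.
   The tables, their OID values, and the function names are invented for illustration. genbki.pl warns about unrecognized names, whereas this sketch raises an exception.

```python
# Placeholder lookup tables, keyed the way .dat files spell references.
# All OID values are invented for this example.
TYPE_BY_NAME = {"int4": 9301, "float8": 9302, "text": 9303}
PROC_BY_SIGNATURE = {"lower(text)": 9001, "abs(int4)": 9002, "abs(float8)": 9003}
OPERATOR_BY_SIGNATURE = {"+(int4,int4)": 9101, "-(0,int4)": 9102}
OPCLASS_BY_NAME = {"btree/int4_ops": 9201}


def unique_proc_names() -> dict:
    """Map each proname that occurs exactly once to its OID."""
    seen = {}
    for signature, oid in PROC_BY_SIGNATURE.items():
        seen.setdefault(signature.split("(", 1)[0], []).append(oid)
    return {name: oids[0] for name, oids in seen.items() if len(oids) == 1}


def lookup(rule: str, ref: str, optional: bool = False, regproc: bool = False) -> int:
    """Resolve one symbolic reference for a column with BKI_LOOKUP(rule)."""
    if optional and (ref == "0" or (regproc and ref == "-")):
        return 0  # only BKI_LOOKUP_OPT columns may hold zero
    if " " in ref:
        raise ValueError(f"{ref!r}: spaces are not allowed")
    if rule == "pg_type":
        table = TYPE_BY_NAME
    elif rule == "pg_proc":
        table = PROC_BY_SIGNATURE if "(" in ref else unique_proc_names()
    elif rule == "pg_operator":
        table = OPERATOR_BY_SIGNATURE
    elif rule == "pg_opclass":
        table = OPCLASS_BY_NAME
    else:
        raise ValueError(f"lookup rule {rule!r} not covered by this sketch")
    if ref not in table:
        # genbki.pl warns here; this sketch raises instead.
        raise KeyError(f"unrecognized {rule} reference {ref!r}")
    return table[ref]


def lookup_oidvector(rule: str, refs: str) -> str:
    """oidvector values are space-separated; resolve each element."""
    return " ".join(str(lookup(rule, ref)) for ref in refs.split())


print(lookup("pg_proc", "lower"))                  # 9001
print(lookup("pg_proc", "abs(float8)"))            # 9003
print(lookup("pg_operator", "-(0,int4)"))          # 9102
print(lookup_oidvector("pg_type", "int4 float8"))  # 9301 9302
```

   The bare name abs is rejected here because two entries share it. That mirrors the rule that a proname alone works only when it is unique.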

   genbki.pl resolves all symbolic references while it runs, and puts
   simple numeric OIDs into the emitted BKI file. There is therefore no
   need for the bootstrap backend to deal with symbolic references.

   It's desirable to mark OID reference columns with BKI_LOOKUP or
   BKI_LOOKUP_OPT even if the catalog has no initial data that requires
   lookup. This allows genbki.pl to record the foreign key relationships
   that exist in the system catalogs. That information is used in the
   regression tests to check for incorrect entries. See also the macros
   DECLARE_FOREIGN_KEY, DECLARE_FOREIGN_KEY_OPT,
   DECLARE_ARRAY_FOREIGN_KEY, and DECLARE_ARRAY_FOREIGN_KEY_OPT, which are
   used to declare foreign key relationships that are too complex for
   BKI_LOOKUP (typically, multi-column foreign keys).
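
   Once each lookup column records which catalog it points at, a consistency check reduces to a membership test. In the Python sketch below, a hypothetical FOREIGN_KEYS table and invented rows stand in for that recorded information. The sketch reports every value that does not match an OID in the referenced catalog, and it accepts 0 only for columns modeled on BKI_LOOKUP_OPT. It works on in-memory rows purely to show the idea; it is not how the regression tests perform the check.

```python
# Hypothetical foreign-key metadata of the kind BKI_LOOKUP annotations record:
# (referencing catalog, column) -> (referenced catalog, zero allowed?)
FOREIGN_KEYS = {
    ("pg_example", "exproc"): ("pg_proc", False),  # like BKI_LOOKUP(pg_proc)
    ("pg_example", "extype"): ("pg_type", True),   # like BKI_LOOKUP_OPT(pg_type)
}


def find_bad_references(catalogs: dict) -> list:
    """Return (catalog, column, value) for values that reference nothing."""
    bad = []
    for (catalog, column), (target, zero_ok) in FOREIGN_KEYS.items():
        valid = {row["oid"] for row in catalogs[target]}
        for row in catalogs[catalog]:
            value = row[column]
            if value == 0 and zero_ok:
                continue  # permitted by the _OPT variant
            if value not in valid:
                bad.append((catalog, column, value))
    return bad


catalogs = {
    "pg_proc": [{"oid": 1}, {"oid": 2}],
    "pg_type": [{"oid": 10}],
    "pg_example": [
        {"exproc": 1, "extype": 0},
        {"exproc": 3, "extype": 10},
        {"exproc": 2, "extype": 11},
    ],
}
print(find_bad_references(catalogs))
# [('pg_example', 'exproc', 3), ('pg_example', 'extype', 11)]
```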

68.2.4. Automatic Creation of Array Types

   Most scalar data types should have a corresponding array type (that is,
   a standard varlena array type whose element type is the scalar type,
   and which is referenced by the typarray field of the scalar type's
   pg_type entry). genbki.pl is able to generate the pg_type entry for the
   array type automatically in most cases.

   To use this facility, just write an array_type_oid => nnnn metadata
   field in the scalar type's pg_type entry, specifying the OID to use for
   the array type. You may then omit the typarray field, since it will be
   filled automatically with that OID.

   The generated array type's name is the scalar type's name with an
   underscore prepended. The array entry's other fields are filled from
   BKI_ARRAY_DEFAULT(value) annotations in pg_type.h, or if there isn't
   one, copied from the scalar type. (There's also a special case for
   typalign.) Then the typelem and typarray fields of the two entries are
   set to cross-reference each other.
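
   A Python sketch of this generation step follows. The ARRAY_DEFAULTS dictionary stands in for the BKI_ARRAY_DEFAULT annotations, and its particular entries are assumptions chosen for illustration. The sample int4 row is likewise only example data. The typalign special case is not modeled. The cross-links are stored as type names, on the assumption that they are resolved to OIDs later like any other symbolic reference.

```python
METADATA_KEYS = ("oid", "oid_symbol", "array_type_oid", "descr")

# Stand-ins for BKI_ARRAY_DEFAULT(value) annotations; these particular
# entries are assumptions chosen for illustration.
ARRAY_DEFAULTS = {"typlen": "-1", "typbyval": "f", "typcategory": "A", "typarray": "0"}


def make_array_type(scalar: dict) -> dict:
    """Build the array type's pg_type row and cross-link the two rows."""
    # Start from the scalar type's own columns (metadata keys are not copied).
    array = {k: v for k, v in scalar.items() if k not in METADATA_KEYS}
    array.update(ARRAY_DEFAULTS)           # annotated fields override copies
    array["oid"] = scalar["array_type_oid"]
    array["typname"] = "_" + scalar["typname"]
    array["typelem"] = scalar["typname"]   # symbolic link to the element type
    scalar["typarray"] = array["typname"]  # and back from the element type
    return array


int4 = {"oid": "23", "array_type_oid": "1007", "typname": "int4",
        "typlen": "4", "typbyval": "t", "typcategory": "N"}
arr = make_array_type(int4)
print(arr["oid"], arr["typname"], arr["typelem"], int4["typarray"])
# 1007 _int4 int4 _int4
```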

68.2.5. Recipes for Editing Data Files

   Here are some suggestions about the easiest ways to perform common
   tasks when updating catalog data files.

   Add a new column with a default to a catalog:  Add the column to the
   header file with a BKI_DEFAULT(value) annotation. The data file need
   only be adjusted by adding the field in existing rows where a
   non-default value is needed.

   Add a default value to an existing column that doesn't have one:  Add a
   BKI_DEFAULT annotation to the header file, then run make
   reformat-dat-files to remove now-redundant field entries.

   Remove a column, whether it has a default or not:  Remove the column
   from the header, then run make reformat-dat-files to remove now-useless
   field entries.

   Change or remove an existing default value:  You cannot simply change
   the header file, since that will cause the current data to be
   interpreted incorrectly. First run make expand-dat-files to rewrite the
   data files with all default values inserted explicitly, then change or
   remove the BKI_DEFAULT annotation, then run make reformat-dat-files to
   remove superfluous fields again.
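
   A short Python sketch shows why the expand-first order matters. If the annotation changes first, a compact row that relied on the old default silently picks up the new one. Expanding beforehand preserves the stored value. The column name and default values are hypothetical, and the two helpers only approximate what the make targets do.

```python
def expand(row: dict, defaults: dict) -> dict:
    """Write omitted columns out explicitly (roughly, expand-dat-files)."""
    return {**defaults, **row}


def compact(row: dict, defaults: dict) -> dict:
    """Drop columns equal to their default (roughly, reformat-dat-files)."""
    return {k: v for k, v in row.items() if k not in defaults or defaults[k] != v}


old_defaults = {"exflag": "f"}   # hypothetical BKI_DEFAULT(f)
new_defaults = {"exflag": "t"}   # the annotation after the change
stored = {"exname": "a"}         # compact row that relied on the old default

# Wrong order: change the annotation first, then read the compact data.
wrong = expand(stored, new_defaults)

# Right order: expand with the old default, change it, then compact again.
right = compact(expand(stored, old_defaults), new_defaults)

print(wrong)  # {'exflag': 't', 'exname': 'a'}
print(right)  # {'exflag': 'f', 'exname': 'a'}
```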

   Ad-hoc bulk editing:  reformat_dat_file.pl can be adapted to perform
   many kinds of bulk changes. Look for its block comments showing where
   one-off code can be inserted. In the following example, we are going to
   consolidate two Boolean fields in pg_proc into a char field:
    1. Add the new column, with a default, to pg_proc.h:
+    /* see PROKIND_ categories below */
+    char        prokind BKI_DEFAULT(f);

    2. Create a new script based on reformat_dat_file.pl to insert
       appropriate values on-the-fly:
-           # At this point we have the full row in memory as a hash
-           # and can do any operations we want. As written, it only
-           # removes default values, but this script can be adapted to
-           # do one-off bulk-editing.
+           # One-off change to migrate to prokind
+           # Default has already been filled in by now, so change to other
+           # values as appropriate
+           if ($values{proisagg} eq 't')
+           {
+               $values{prokind} = 'a';
+           }
+           elsif ($values{proiswindow} eq 't')
+           {
+               $values{prokind} = 'w';
+           }

    3. Run the new script:
$ cd src/include/catalog
$ perl rewrite_dat_with_prokind.pl pg_proc.dat

       At this point pg_proc.dat has all three columns, prokind, proisagg,
       and proiswindow, though they will appear only in rows where they
       have non-default values.
    4. Remove the old columns from pg_proc.h:
-    /* is it an aggregate? */
-    bool        proisagg BKI_DEFAULT(f);
-
-    /* is it a window function? */
-    bool        proiswindow BKI_DEFAULT(f);

    5. Finally, run make reformat-dat-files to remove the useless old
       entries from pg_proc.dat.
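
   For comparison, the same one-off migration can be sketched in Python on rows held as dictionaries, with a missing proisagg or proiswindow treated as its default 'f'. The sketch folds steps 2 through 5 into two functions. In the real workflow the logic lives in a copy of reformat_dat_file.pl, and the old columns are removed from pg_proc.h. The function names and sample rows are hypothetical.

```python
def migrate_prokind(rows: list) -> list:
    """Steps 2-4: derive prokind, then drop proisagg and proiswindow."""
    for row in rows:
        row.setdefault("prokind", "f")  # BKI_DEFAULT(f) from step 1
        if row.get("proisagg", "f") == "t":
            row["prokind"] = "a"
        elif row.get("proiswindow", "f") == "t":
            row["prokind"] = "w"
        row.pop("proisagg", None)
        row.pop("proiswindow", None)
    return rows


def drop_defaults(rows: list, defaults: dict) -> list:
    """Step 5: omit fields whose value equals the column default."""
    return [
        {k: v for k, v in row.items() if k not in defaults or defaults[k] != v}
        for row in rows
    ]


rows = [
    {"proname": "count", "proisagg": "t"},
    {"proname": "row_number", "proiswindow": "t"},
    {"proname": "lower"},
]
print(drop_defaults(migrate_prokind(rows), {"prokind": "f"}))
# [{'proname': 'count', 'prokind': 'a'}, {'proname': 'row_number', 'prokind': 'w'}, {'proname': 'lower'}]
```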

   For further examples of scripts used for bulk editing, see
   convert_oid2name.pl and remove_pg_type_oid_symbols.pl attached to this
   message:
   https://www.postgresql.org/message-id/CAJVSVGVX8gXnPm+Xa=DxR7kFYprcQ1tNcCT5D0O3ShfnM6jehA@mail.gmail.com