   2 CREATE INDEX
   3
   4    CREATE INDEX — define a new index
   5
   6 Synopsis
   7
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ INCLUDE ( column_name [, ...] ) ]
    [ NULLS [ NOT ] DISTINCT ]
    [ WITH ( storage_parameter [= value] [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]
  18
  19 Description
  20
  21    CREATE INDEX constructs an index on the specified column(s) of the
  22    specified relation, which can be a table or a materialized view.
  23    Indexes are primarily used to enhance database performance (though
  24    inappropriate use can result in slower performance).
  25
  26    The key field(s) for the index are specified as column names, or
  27    alternatively as expressions written in parentheses. Multiple fields
  28    can be specified if the index method supports multicolumn indexes.
  29
  30    An index field can be an expression computed from the values of one or
  31    more columns of the table row. This feature can be used to obtain fast
  32    access to data based on some transformation of the basic data. For
  33    example, an index computed on upper(col) would allow the clause WHERE
  34    upper(col) = 'JIM' to use an index.
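
   As a minimal sketch of the upper(col) case above, the following
   assumes a hypothetical table people with a text column name; neither
   name comes from this page.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE people (id integer, name text);

-- Index the result of upper(name) rather than the raw column value.
-- The expression is a plain function call, so one pair of parentheses
-- is enough here.
CREATE INDEX people_upper_name_idx ON people (upper(name));

-- A query that repeats the indexed expression is a candidate for the index.
SELECT * FROM people WHERE upper(name) = 'JIM';
```

   Whether the planner actually chooses the index depends on table size
   and statistics; the sketch only shows the matching expression form.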
  35
  36    PostgreSQL provides the index methods B-tree, hash, GiST, SP-GiST, GIN,
  37    and BRIN. Users can also define their own index methods, but that is
  38    fairly complicated.
  39
   When the WHERE clause is present, a partial index is created. A partial
   index contains entries for only a portion of a table, usually a portion
   that is more useful for indexing than the rest of the table. For
   example, if a table contains both billed and unbilled orders, and the
   unbilled orders make up a small fraction of the table yet are accessed
   often, you can improve performance by creating an index on just that
   portion. Another possible application is to use WHERE with UNIQUE to
   enforce uniqueness over a subset of a table. See Section 11.8 for more
   discussion.
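
   The two uses of a partial index described above might look like the
   following sketch. The orders table, its columns, and the index names
   are assumptions made for illustration.

```sql
-- Hypothetical table mixing billed and unbilled orders.
CREATE TABLE orders (order_nr integer, customer_id integer, billed boolean);

-- Index only the unbilled rows (billed is false or null).
CREATE INDEX orders_unbilled_idx ON orders (order_nr)
    WHERE billed IS NOT TRUE;

-- WHERE combined with UNIQUE: allow at most one unbilled order per
-- customer, while billed rows remain unconstrained.
CREATE UNIQUE INDEX orders_one_unbilled_per_customer_idx
    ON orders (customer_id)
    WHERE billed IS NOT TRUE;
```

   A query must include a condition that implies the index predicate
   (here, billed IS NOT TRUE) for the partial index to be considered.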
  49
  50    The expression used in the WHERE clause can refer only to columns of
  51    the underlying table, but it can use all columns, not just the ones
  52    being indexed. Presently, subqueries and aggregate expressions are also
  53    forbidden in WHERE. The same restrictions apply to index fields that
  54    are expressions.
  55
  56    All functions and operators used in an index definition must be
  57    “immutable”, that is, their results must depend only on their arguments
  58    and never on any outside influence (such as the contents of another
  59    table or the current time). This restriction ensures that the behavior
  60    of the index is well-defined. To use a user-defined function in an
  61    index expression or WHERE clause, remember to mark the function
  62    immutable when you create it.
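
   As a sketch of the last point, the following defines a user-defined
   function marked IMMUTABLE and uses it in an index expression. The
   contacts table and the normalize_email function are hypothetical.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE contacts (id integer, email text);

-- Hypothetical helper. It is declared IMMUTABLE because its result
-- depends only on its argument.
CREATE FUNCTION normalize_email(addr text) RETURNS text
    LANGUAGE sql
    IMMUTABLE
    AS $$ SELECT lower(btrim(addr)) $$;

-- The function can now appear in an index definition.
CREATE INDEX contacts_email_norm_idx ON contacts (normalize_email(email));
```

   Declaring a function IMMUTABLE is a promise to the system, not
   something it verifies; if the function's behavior later changes, the
   index may need to be rebuilt.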
  63
  64 Parameters
  65
  66    UNIQUE
          Causes the system to check for duplicate values in the table
          when the index is created (if data already exists) and each
          time data is added. Attempts to insert or update data that would
          result in duplicate entries will generate an error.
  71
  72           Additional restrictions apply when unique indexes are applied to
  73           partitioned tables; see CREATE TABLE.
  74
  75    CONCURRENTLY
  76           When this option is used, PostgreSQL will build the index
  77           without taking any locks that prevent concurrent inserts,
  78           updates, or deletes on the table; whereas a standard index build
  79           locks out writes (but not reads) on the table until it's done.
  80           There are several caveats to be aware of when using this option
  81           — see Building Indexes Concurrently below.
  82
  83           For temporary tables, CREATE INDEX is always non-concurrent, as
  84           no other session can access them, and non-concurrent index
  85           creation is cheaper.
  86
  87    IF NOT EXISTS
          Do not throw an error if a relation with the same name already
          exists. A notice is issued in this case. Note that there is no
          guarantee that the existing index is anything like the one that
          would have been created. An index name is required when IF NOT
          EXISTS is specified.
  93
  94    INCLUDE
  95           The optional INCLUDE clause specifies a list of columns which
  96           will be included in the index as non-key columns. A non-key
  97           column cannot be used in an index scan search qualification, and
  98           it is disregarded for purposes of any uniqueness or exclusion
  99           constraint enforced by the index. However, an index-only scan
 100           can return the contents of non-key columns without having to
 101           visit the index's table, since they are available directly from
 102           the index entry. Thus, addition of non-key columns allows
 103           index-only scans to be used for queries that otherwise could not
 104           use them.
 105
 106           It's wise to be conservative about adding non-key columns to an
 107           index, especially wide columns. If an index tuple exceeds the
 108           maximum size allowed for the index type, data insertion will
 109           fail. In any case, non-key columns duplicate data from the
 110           index's table and bloat the size of the index, thus potentially
 111           slowing searches. Furthermore, B-tree deduplication is never
 112           used with indexes that have a non-key column.
 113
 114           Columns listed in the INCLUDE clause don't need appropriate
 115           operator classes; the clause can include columns whose data
 116           types don't have operator classes defined for a given access
 117           method.
 118
 119           Expressions are not supported as included columns since they
 120           cannot be used in index-only scans.
 121
 122           Currently, the B-tree, GiST and SP-GiST index access methods
 123           support this feature. In these indexes, the values of columns
 124           listed in the INCLUDE clause are included in leaf tuples which
 125           correspond to heap tuples, but are not included in upper-level
 126           index entries used for tree navigation.
 127
 128    name
 129           The name of the index to be created. No schema name can be
 130           included here; the index is always created in the same schema as
 131           its parent table. The name of the index must be distinct from
 132           the name of any other relation (table, sequence, index, view,
 133           materialized view, or foreign table) in that schema. If the name
 134           is omitted, PostgreSQL chooses a suitable name based on the
 135           parent table's name and the indexed column name(s).
 136
 137    ONLY
          Indicates that index creation should not recurse to the
          partitions, if the table is partitioned. The default is to
          recurse.
 140
 141    table_name
 142           The name (possibly schema-qualified) of the table to be indexed.
 143
 144    method
 145           The name of the index method to be used. Choices are btree,
 146           hash, gist, spgist, gin, brin, or user-installed access methods
 147           like bloom. The default method is btree.
 148
 149    column_name
 150           The name of a column of the table.
 151
 152    expression
 153           An expression based on one or more columns of the table. The
 154           expression usually must be written with surrounding parentheses,
 155           as shown in the syntax. However, the parentheses can be omitted
 156           if the expression has the form of a function call.
 157
 158    collation
 159           The name of the collation to use for the index. By default, the
 160           index uses the collation declared for the column to be indexed
 161           or the result collation of the expression to be indexed. Indexes
 162           with non-default collations can be useful for queries that
 163           involve expressions using non-default collations.
 164
 165    opclass
 166           The name of an operator class. See below for details.
 167
 168    opclass_parameter
 169           The name of an operator class parameter. See below for details.
 170
 171    ASC
 172           Specifies ascending sort order (which is the default).
 173
 174    DESC
 175           Specifies descending sort order.
 176
 177    NULLS FIRST
 178           Specifies that nulls sort before non-nulls. This is the default
 179           when DESC is specified.
 180
 181    NULLS LAST
 182           Specifies that nulls sort after non-nulls. This is the default
 183           when DESC is not specified.
 184
   NULLS DISTINCT
   NULLS NOT DISTINCT
          Specifies whether, for a unique index, null values should be
          considered distinct (not equal). The default is that they are
          distinct, so that a unique index could contain multiple null
          values in a column.
 191
 192    storage_parameter
 193           The name of an index-method-specific storage parameter. See
 194           Index Storage Parameters below for details.
 195
 196    tablespace_name
 197           The tablespace in which to create the index. If not specified,
 198           default_tablespace is consulted, or temp_tablespaces for indexes
 199           on temporary tables.
 200
 201    predicate
 202           The constraint expression for a partial index.
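
   To illustrate the NULLS DISTINCT and NULLS NOT DISTINCT parameters
   described above, here is a minimal sketch. The members table and the
   index names are hypothetical.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE members (id integer, badge_no integer);

-- Default behavior (NULLS DISTINCT): nulls never conflict with each other.
CREATE UNIQUE INDEX members_badge_idx ON members (badge_no);
INSERT INTO members VALUES (1, NULL), (2, NULL);  -- both rows are accepted

-- Switch to NULLS NOT DISTINCT: at most one null is allowed, so the
-- second null row has to go before the index can be built.
DROP INDEX members_badge_idx;
DELETE FROM members WHERE id = 2;
CREATE UNIQUE INDEX members_badge_nnd_idx ON members (badge_no)
    NULLS NOT DISTINCT;

-- With this index in place, another null badge_no is rejected:
-- INSERT INTO members VALUES (3, NULL);
```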
 203
 204 Index Storage Parameters
 205
 206    The optional WITH clause specifies storage parameters for the index.
 207    Each index method has its own set of allowed storage parameters.
 208
 209    The B-tree, hash, GiST and SP-GiST index methods all accept this
 210    parameter:
 211
   fillfactor (integer)
          Controls how full the index method will try to pack index pages.
          For B-trees, leaf pages are filled to this percentage during
          initial index builds, and also when extending the index at the
          right (adding new largest key values). If pages subsequently
          become completely full, they will be split, leading to
          fragmentation of the on-disk index structure. B-trees use a
          default fillfactor of 90, but any integer value from 10 to 100
          can be selected.
 221
 222           B-tree indexes on tables where many inserts and/or updates are
 223           anticipated can benefit from lower fillfactor settings at CREATE
 224           INDEX time (following bulk loading into the table). Values in
 225           the range of 50 - 90 can usefully “smooth out” the rate of page
 226           splits during the early life of the B-tree index (lowering
 227           fillfactor like this may even lower the absolute number of page
 228           splits, though this effect is highly workload dependent). The
 229           B-tree bottom-up index deletion technique described in
 230           Section 65.1.4.2 is dependent on having some “extra” space on
 231           pages to store “extra” tuple versions, and so can be affected by
 232           fillfactor (though the effect is usually not significant).
 233
 234           In other specific cases it might be useful to increase
 235           fillfactor to 100 at CREATE INDEX time as a way of maximizing
 236           space utilization. You should only consider this when you are
 237           completely sure that the table is static (i.e. that it will
 238           never be affected by either inserts or updates). A fillfactor
 239           setting of 100 otherwise risks harming performance: even a few
 240           updates or inserts will cause a sudden flood of page splits.
 241
 242           The other index methods use fillfactor in different but roughly
 243           analogous ways; the default fillfactor varies between methods.
 244
 245    B-tree indexes additionally accept this parameter:
 246
   deduplicate_items (boolean)
          Controls usage of the B-tree deduplication technique described
          in Section 65.1.4.3. Set to ON or OFF to enable or disable the
          optimization. (Alternative spellings of ON and OFF are allowed
          as described in Section 19.1.) The default is ON.
 252
 253 Note
 254
 255           Turning deduplicate_items off via ALTER INDEX prevents future
 256           insertions from triggering deduplication, but does not in itself
 257           make existing posting list tuples use the standard tuple
 258           representation.
 259
 260    GiST indexes additionally accept this parameter:
 261
   buffering (enum)
          Controls whether the buffered build technique described in
          Section 65.2.4.1 is used to build the index. With OFF, buffering
          is disabled; with ON, it is enabled; and with AUTO, it is
          initially disabled but is turned on automatically once the
          index size reaches effective_cache_size. The default is AUTO.
          Note that if a sorted build is possible, it will be used instead
          of a buffered build unless buffering=ON is specified.
 270
 271    GIN indexes accept these parameters:
 272
   fastupdate (boolean)
          Controls usage of the fast update technique described in
          Section 65.4.4.1. ON enables fast update, OFF disables it. The
          default is ON.
 277
 278 Note
 279
 280           Turning fastupdate off via ALTER INDEX prevents future
 281           insertions from going into the list of pending index entries,
 282           but does not in itself flush existing entries. You might want to
 283           VACUUM the table or call the gin_clean_pending_list function
 284           afterward to ensure the pending list is emptied.
 285
   gin_pending_list_limit (integer)
          Overrides the global setting of gin_pending_list_limit for this
          index. This value is specified in kilobytes.
 289
 290    BRIN indexes accept these parameters:
 291
   pages_per_range (integer)
          Defines the number of table blocks that make up one block range
          for each entry of a BRIN index (see Section 65.5.1 for more
          details). The default is 128.
 296
   autosummarize (boolean)
          Defines whether a summarization run is queued for the previous
          page range whenever an insertion is detected on the next one
          (see Section 65.5.1.1 for more details). The default is off.
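
   The BRIN and GIN storage parameters above could be used as in the
   following sketch. The events table, its columns, the index names, and
   the value 32 are assumptions chosen for illustration, not
   recommendations.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE events (id bigint, created_at timestamptz, tags text[]);

-- BRIN index with smaller block ranges than the default of 128 and
-- with automatic summarization enabled.
CREATE INDEX events_created_brin_idx ON events USING brin (created_at)
    WITH (pages_per_range = 32, autosummarize = on);

-- GIN index created with the default fastupdate = on.
CREATE INDEX events_tags_gin_idx ON events USING gin (tags);

-- Turning fastupdate off later does not flush entries that are already
-- pending, so clean the pending list explicitly afterward.
ALTER INDEX events_tags_gin_idx SET (fastupdate = off);
SELECT gin_clean_pending_list('events_tags_gin_idx'::regclass);
```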
 301
 302 Building Indexes Concurrently
 303
 304    Creating an index can interfere with regular operation of a database.
 305    Normally PostgreSQL locks the table to be indexed against writes and
 306    performs the entire index build with a single scan of the table. Other
 307    transactions can still read the table, but if they try to insert,
 308    update, or delete rows in the table they will block until the index
 309    build is finished. This could have a severe effect if the system is a
 310    live production database. Very large tables can take many hours to be
 311    indexed, and even for smaller tables, an index build can lock out
 312    writers for periods that are unacceptably long for a production system.
 313
 314    PostgreSQL supports building indexes without locking out writes. This
 315    method is invoked by specifying the CONCURRENTLY option of CREATE
 316    INDEX. When this option is used, PostgreSQL must perform two scans of
 317    the table, and in addition it must wait for all existing transactions
 318    that could potentially modify or use the index to terminate. Thus this
 319    method requires more total work than a standard index build and takes
 320    significantly longer to complete. However, since it allows normal
 321    operations to continue while the index is built, this method is useful
 322    for adding new indexes in a production environment. Of course, the
 323    extra CPU and I/O load imposed by the index creation might slow other
 324    operations.
 325
 326    In a concurrent index build, the index is actually entered as an
 327    “invalid” index into the system catalogs in one transaction, then two
 328    table scans occur in two more transactions. Before each table scan, the
 329    index build must wait for existing transactions that have modified the
 330    table to terminate. After the second scan, the index build must wait
 331    for any transactions that have a snapshot (see Chapter 13) predating
 332    the second scan to terminate, including transactions used by any phase
 333    of concurrent index builds on other tables, if the indexes involved are
 334    partial or have columns that are not simple column references. Then
 335    finally the index can be marked “valid” and ready for use, and the
 336    CREATE INDEX command terminates. Even then, however, the index may not
 337    be immediately usable for queries: in the worst case, it cannot be used
 338    as long as transactions exist that predate the start of the index
 339    build.
 340
 341    If a problem arises while scanning the table, such as a deadlock or a
 342    uniqueness violation in a unique index, the CREATE INDEX command will
 343    fail but leave behind an “invalid” index. This index will be ignored
 344    for querying purposes because it might be incomplete; however it will
 345    still consume update overhead. The psql \d command will report such an
 346    index as INVALID:
postgres=# \d tab
       Table "public.tab"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 col    | integer |           |          |
Indexes:
    "idx" btree (col) INVALID
 354
   The recommended recovery method in such cases is to drop the index and
   try again to perform CREATE INDEX CONCURRENTLY. (Another possibility is
   to rebuild the index with REINDEX INDEX CONCURRENTLY.)
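
   A sketch of that recovery procedure follows, reusing the tab, col, and
   idx names from the psql output above. The catalog query is one possible
   way to find invalid indexes and is an addition for illustration.

```sql
-- Setup matching the \d output above (skipped if the table already exists).
CREATE TABLE IF NOT EXISTS tab (col integer);

-- List indexes currently marked invalid in this database.
SELECT indexrelid::regclass AS index_name,
       indrelid::regclass   AS table_name
FROM pg_index
WHERE NOT indisvalid;

-- Recommended recovery: drop the leftover index and build it again.
DROP INDEX CONCURRENTLY IF EXISTS idx;
CREATE INDEX CONCURRENTLY idx ON tab (col);

-- Alternative: rebuild the existing index in place.
-- REINDEX INDEX CONCURRENTLY idx;
```

   These commands cannot be run inside a transaction block, so they must
   be issued as separate top-level statements.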
 358
 359    Another caveat when building a unique index concurrently is that the
 360    uniqueness constraint is already being enforced against other
 361    transactions when the second table scan begins. This means that
 362    constraint violations could be reported in other queries prior to the
 363    index becoming available for use, or even in cases where the index
 364    build eventually fails. Also, if a failure does occur in the second
 365    scan, the “invalid” index continues to enforce its uniqueness
 366    constraint afterwards.
 367
 368    Concurrent builds of expression indexes and partial indexes are
 369    supported. Errors occurring in the evaluation of these expressions
 370    could cause behavior similar to that described above for unique
 371    constraint violations.
 372
 373    Regular index builds permit other regular index builds on the same
 374    table to occur simultaneously, but only one concurrent index build can
 375    occur on a table at a time. In either case, schema modification of the
 376    table is not allowed while the index is being built. Another difference
 377    is that a regular CREATE INDEX command can be performed within a
 378    transaction block, but CREATE INDEX CONCURRENTLY cannot.
 379
   Concurrent builds for indexes on partitioned tables are currently not
   supported. However, you may concurrently build the index on each
   partition individually and then finally create the partitioned index
   non-concurrently, in order to reduce the time during which writes to
   the partitioned table are locked out. In this case, building the
   partitioned index is a metadata-only operation.
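
   The partition-by-partition approach can be sketched as follows. The
   measurements table, its partitions, and all index names are
   hypothetical.

```sql
-- Hypothetical partitioned table with two partitions.
CREATE TABLE measurements (city_id integer, logdate date)
    PARTITION BY RANGE (logdate);
CREATE TABLE measurements_2024 PARTITION OF measurements
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE measurements_2025 PARTITION OF measurements
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Step 1: build the index on each partition without locking out writes.
CREATE INDEX CONCURRENTLY measurements_2024_logdate_idx
    ON measurements_2024 (logdate);
CREATE INDEX CONCURRENTLY measurements_2025_logdate_idx
    ON measurements_2025 (logdate);

-- Step 2: create the partitioned index non-concurrently. The equivalent
-- partition indexes already exist, so they are attached, not rebuilt.
CREATE INDEX measurements_logdate_idx ON measurements (logdate);
```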
 386
 387 Notes
 388
 389    See Chapter 11 for information about when indexes can be used, when
 390    they are not used, and in which particular situations they can be
 391    useful.
 392
 393    Currently, only the B-tree, GiST, GIN, and BRIN index methods support
 394    multiple-key-column indexes. Whether there can be multiple key columns
 395    is independent of whether INCLUDE columns can be added to the index.
 396    Indexes can have up to 32 columns, including INCLUDE columns. (This
 397    limit can be altered when building PostgreSQL.) Only B-tree currently
 398    supports unique indexes.
 399
 400    An operator class with optional parameters can be specified for each
 401    column of an index. The operator class identifies the operators to be
 402    used by the index for that column. For example, a B-tree index on
 403    four-byte integers would use the int4_ops class; this operator class
 404    includes comparison functions for four-byte integers. In practice the
 405    default operator class for the column's data type is usually
 406    sufficient. The main point of having operator classes is that for some
 407    data types, there could be more than one meaningful ordering. For
 408    example, we might want to sort a complex-number data type either by
 409    absolute value or by real part. We could do this by defining two
 410    operator classes for the data type and then selecting the proper class
 411    when creating an index. More information about operator classes is in
 412    Section 11.10 and in Section 36.16.
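
   As a small sketch of selecting a non-default operator class, the
   following uses the built-in text_pattern_ops class, which compares
   text character by character rather than by the collation's rules. The
   products table and the index names are hypothetical.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE products (id integer, sku text);

-- Default operator class for text: ordering follows the column's collation.
CREATE INDEX products_sku_idx ON products (sku);

-- Explicit operator class: supports left-anchored pattern matches such
-- as LIKE 'AB-%' when the database does not use the C locale.
CREATE INDEX products_sku_pattern_idx ON products (sku text_pattern_ops);

SELECT * FROM products WHERE sku LIKE 'AB-%';
```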
 413
 414    When CREATE INDEX is invoked on a partitioned table, the default
 415    behavior is to recurse to all partitions to ensure they all have
 416    matching indexes. Each partition is first checked to determine whether
 417    an equivalent index already exists, and if so, that index will become
 418    attached as a partition index to the index being created, which will
 419    become its parent index. If no matching index exists, a new index will
 420    be created and automatically attached; the name of the new index in
 421    each partition will be determined as if no index name had been
 422    specified in the command. If the ONLY option is specified, no recursion
 423    is done, and the index is marked invalid. (ALTER INDEX ... ATTACH
 424    PARTITION marks the index valid, once all partitions acquire matching
 425    indexes.) Note, however, that any partition that is created in the
 426    future using CREATE TABLE ... PARTITION OF will automatically have a
 427    matching index, regardless of whether ONLY is specified.
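
   The ONLY and ATTACH PARTITION workflow described above might look like
   this sketch. The readings table, its single partition, and the index
   names are hypothetical.

```sql
-- Hypothetical partitioned table with one partition.
CREATE TABLE readings (sensor_id integer, taken_on date)
    PARTITION BY RANGE (taken_on);
CREATE TABLE readings_2024 PARTITION OF readings
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- With ONLY, no recursion is done and the parent index starts out invalid.
CREATE INDEX readings_sensor_idx ON ONLY readings (sensor_id);

-- Build a matching index on the partition, then attach it to the parent.
CREATE INDEX readings_2024_sensor_idx ON readings_2024 (sensor_id);
ALTER INDEX readings_sensor_idx ATTACH PARTITION readings_2024_sensor_idx;

-- Once every partition has an attached index, the parent is marked valid.
SELECT indisvalid
FROM pg_index
WHERE indexrelid = 'readings_sensor_idx'::regclass;
```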
 428
 429    For index methods that support ordered scans (currently, only B-tree),
 430    the optional clauses ASC, DESC, NULLS FIRST, and/or NULLS LAST can be
 431    specified to modify the sort ordering of the index. Since an ordered
 432    index can be scanned either forward or backward, it is not normally
 433    useful to create a single-column DESC index — that sort ordering is
 434    already available with a regular index. The value of these options is
 435    that multicolumn indexes can be created that match the sort ordering
 436    requested by a mixed-ordering query, such as SELECT ... ORDER BY x ASC,
 437    y DESC. The NULLS options are useful if you need to support “nulls sort
 438    low” behavior, rather than the default “nulls sort high”, in queries
 439    that depend on indexes to avoid sorting steps.
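
   A minimal sketch of the mixed-ordering and NULLS cases follows. The
   scores table and the index names are hypothetical; x and y are taken
   from the query shown above.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE scores (x integer, y integer);

-- Multicolumn index whose ordering matches ORDER BY x ASC, y DESC.
CREATE INDEX scores_x_asc_y_desc_idx ON scores (x ASC, y DESC);
SELECT * FROM scores ORDER BY x ASC, y DESC;

-- "Nulls sort low": nulls come first in an ascending index.
CREATE INDEX scores_y_nulls_first_idx ON scores (y NULLS FIRST);
SELECT * FROM scores ORDER BY y NULLS FIRST;
```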
 440
   The system regularly collects statistics on all of a table's columns.
   Newly created non-expression indexes can immediately use these
   statistics to determine an index's usefulness. For new expression
   indexes, it is necessary to run ANALYZE or wait for the autovacuum
   daemon to analyze the table to generate statistics for these indexes.
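
   For example, statistics for a new expression index can be gathered
   right away with a manual ANALYZE. This is a sketch using a hypothetical
   articles table. The final query assumes that expression statistics
   are listed in pg_stats under the index's name.

```sql
-- Hypothetical table with some sample rows.
CREATE TABLE articles (id integer, title text);
INSERT INTO articles
    SELECT g, 'Title ' || g FROM generate_series(1, 1000) AS g;

-- New expression index: no statistics exist yet for lower(title).
CREATE INDEX articles_lower_title_idx ON articles (lower(title));

-- Collect statistics now rather than waiting for autovacuum.
ANALYZE articles;

-- Inspect the statistics gathered for the indexed expression.
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename = 'articles_lower_title_idx';
```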
 446
 447    While CREATE INDEX is running, the search_path is temporarily changed
 448    to pg_catalog, pg_temp.
 449
 450    For most index methods, the speed of creating an index is dependent on
 451    the setting of maintenance_work_mem. Larger values will reduce the time
 452    needed for index creation, so long as you don't make it larger than the
 453    amount of memory really available, which would drive the machine into
 454    swapping.
 455
 456    PostgreSQL can build indexes while leveraging multiple CPUs in order to
 457    process the table rows faster. This feature is known as parallel index
 458    build. For index methods that support building indexes in parallel
 459    (currently, B-tree, GIN, and BRIN), maintenance_work_mem specifies the
 460    maximum amount of memory that can be used by each index build operation
 461    as a whole, regardless of how many worker processes were started.
 462    Generally, a cost model automatically determines how many worker
 463    processes should be requested, if any.
 464
 465    Parallel index builds may benefit from increasing maintenance_work_mem
 466    where an equivalent serial index build will see little or no benefit.
 467    Note that maintenance_work_mem may influence the number of worker
 468    processes requested, since parallel workers must have at least a 32MB
 469    share of the total maintenance_work_mem budget. There must also be a
 470    remaining 32MB share for the leader process. Increasing
 471    max_parallel_maintenance_workers may allow more workers to be used,
 472    which will reduce the time needed for index creation, so long as the
 473    index build is not already I/O bound. Of course, there should also be
 474    sufficient CPU capacity that would otherwise lie idle.
 475
 476    Setting a value for parallel_workers via ALTER TABLE directly controls
 477    how many parallel worker processes will be requested by a CREATE INDEX
 478    against the table. This bypasses the cost model completely, and
 479    prevents maintenance_work_mem from affecting how many parallel workers
 480    are requested. Setting parallel_workers to 0 via ALTER TABLE will
 481    disable parallel index builds on the table in all cases.
 482
 483 Tip
 484
 485    You might want to reset parallel_workers after setting it as part of
 486    tuning an index build. This avoids inadvertent changes to query plans,
 487    since parallel_workers affects all parallel table scans.
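
   Putting the tuning advice above together, one possible session looks
   like this sketch. The big_table name and the specific values (256MB,
   4 workers) are assumptions for illustration, not recommendations.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE big_table (id bigint, payload text);

-- Session-level settings; choose values that fit the memory and CPU
-- actually available on the machine.
SET maintenance_work_mem = '256MB';
SET max_parallel_maintenance_workers = 4;

-- Request a fixed number of workers, bypassing the cost model.
ALTER TABLE big_table SET (parallel_workers = 4);

CREATE INDEX big_table_id_idx ON big_table (id);

-- Reset afterward so that ordinary parallel scans of the table are not
-- affected by the build-time setting.
ALTER TABLE big_table RESET (parallel_workers);
```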
 488
 489    While CREATE INDEX with the CONCURRENTLY option supports parallel
 490    builds without special restrictions, only the first table scan is
 491    actually performed in parallel.
 492
 493    Use DROP INDEX to remove an index.
 494
 495    Like any long-running transaction, CREATE INDEX on a table can affect
 496    which tuples can be removed by concurrent VACUUM on any other table.
 497
 498    Prior releases of PostgreSQL also had an R-tree index method. This
 499    method has been removed because it had no significant advantages over
 500    the GiST method. If USING rtree is specified, CREATE INDEX will
 501    interpret it as USING gist, to simplify conversion of old databases to
 502    GiST.
 503
 504    Each backend running CREATE INDEX will report its progress in the
 505    pg_stat_progress_create_index view. See Section 27.4.4 for details.
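
   A query along the following lines, run from another session while an
   index build is in progress, is one way to watch it. The selection of
   columns is a choice made for this sketch.

```sql
-- One row per backend currently running CREATE INDEX or REINDEX.
SELECT pid,
       relid::regclass       AS table_name,
       index_relid::regclass AS index_name,
       command,
       phase,
       blocks_done,
       blocks_total,
       tuples_done,
       tuples_total
FROM pg_stat_progress_create_index;
```

   The view returns no rows when no index build is running.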
 506
 507 Examples
 508
   To create a unique B-tree index on the column title in the table films:
CREATE UNIQUE INDEX title_idx ON films (title);

   To create a unique B-tree index on the column title with included
   columns director and rating in the table films:
CREATE UNIQUE INDEX title_idx ON films (title) INCLUDE (director, rating);

   To create a B-tree index with deduplication disabled:
CREATE INDEX title_idx ON films (title) WITH (deduplicate_items = off);

   To create an index on the expression lower(title), allowing efficient
   case-insensitive searches:
CREATE INDEX ON films ((lower(title)));

   (In this example we have chosen to omit the index name, so the system
   will choose a name, typically films_lower_idx.)

   To create an index with non-default collation:
CREATE INDEX title_idx_german ON films (title COLLATE "de_DE");

   To create an index with non-default sort ordering of nulls:
CREATE INDEX title_idx_nulls_low ON films (title NULLS FIRST);

   To create an index with non-default fill factor:
CREATE UNIQUE INDEX title_idx ON films (title) WITH (fillfactor = 70);

   To create a GIN index with fast updates disabled:
CREATE INDEX gin_idx ON documents_table USING GIN (locations) WITH (fastupdate = off);

   To create an index on the column code in the table films and have the
   index reside in the tablespace indexspace:
CREATE INDEX code_idx ON films (code) TABLESPACE indexspace;

   To create a GiST index on a point attribute so that we can efficiently
   use box operators on the result of the conversion function:
CREATE INDEX pointloc
    ON points USING gist (box(location,location));
SELECT * FROM points
    WHERE box(location,location) && '(0,0),(1,1)'::box;

   To create an index without locking out writes to the table:
CREATE INDEX CONCURRENTLY sales_quantity_index ON sales_table (quantity);
 552
 553 Compatibility
 554
 555    CREATE INDEX is a PostgreSQL language extension. There are no
 556    provisions for indexes in the SQL standard.
 557
 558 See Also
 559
 560    ALTER INDEX, DROP INDEX, REINDEX, Section 27.4.4