51.3. The Parser Stage

   51.3.1. Parser
   51.3.2. Transformation Process

   The parser stage consists of two parts:
     * The parser, defined in gram.y and scan.l, is built using the Unix
       tools bison and flex.
     * The transformation process modifies and augments the data
       structures returned by the parser.

51.3.1. Parser

   The parser has to check the query string (which arrives as plain text)
   for valid syntax. If the syntax is correct, a parse tree is built up
   and handed back; otherwise an error is returned. The parser and lexer
   are implemented using the well-known Unix tools bison and flex.

   The lexer is defined in the file scan.l and is responsible for
   recognizing identifiers, SQL key words, and so on. For every key word
   or identifier that is found, a token is generated and handed to the
   parser.
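
   As a rough illustration of what a lexer does, the following Python
   sketch splits a query string into tokens. It is not the code in
   scan.l: the key word list, the token class names, and the tokenize
   function are hypothetical and chosen only for this example.

```python
import re

# Hypothetical, tiny key word list; the real lexer knows far more.
KEYWORDS = {"SELECT", "FROM", "WHERE"}

# One alternative per token class; numbers are tried before identifiers.
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<IDENT>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<OP>[*,=<>();])
  | (?P<SPACE>\s+)
""", re.VERBOSE)


def tokenize(query):
    """Return a list of (kind, text) tokens for a query string."""
    tokens = []
    pos = 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {query[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SPACE":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "IDENT" and text.upper() in KEYWORDS:
            kind, text = "KEYWORD", text.upper()  # key words are case-insensitive
        elif kind == "IDENT":
            text = text.lower()  # unquoted identifiers are folded to lower case
        tokens.append((kind, text))
    return tokens
```

   Each (kind, text) pair plays the role of a token handed to the parser;
   a character that matches no token class is reported as an error.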

   The parser is defined in the file gram.y and consists of a set of
   grammar rules and actions that are executed whenever a rule is fired.
   The code of the actions (which is actually C code) is used to build up
   the parse tree.
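
   The idea of rules paired with tree-building actions can be sketched in
   Python as follows. This is a hand-written recursive-descent parser for
   one tiny statement form, not the table-driven parser that bison
   generates from gram.y. The function parse_select and the dictionary
   layout are hypothetical; the node names are borrowed loosely from
   PostgreSQL's raw parse tree.

```python
def parse_select(tokens):
    """Parse: SELECT ident [, ident]... FROM ident

    `tokens` is a list of (kind, text) tuples, as a lexer might produce.
    """
    pos = 0

    def expect(kind, text=None):
        # Consume one token of the given kind (and text, if given).
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok_kind, tok_text = tokens[pos]
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"syntax error at or near {tok_text!r}")
        pos += 1
        return tok_text

    # Rule: select_stmt -> SELECT target_list FROM table_ref
    expect("KEYWORD", "SELECT")

    # Rule: target_list -> ident | target_list ',' ident
    # Action: append one ColumnRef node per column name.
    targets = [{"node": "ColumnRef", "name": expect("IDENT")}]
    while pos < len(tokens) and tokens[pos] == ("OP", ","):
        pos += 1
        targets.append({"node": "ColumnRef", "name": expect("IDENT")})

    expect("KEYWORD", "FROM")
    table = {"node": "RangeVar", "relname": expect("IDENT")}

    if pos != len(tokens):
        raise SyntaxError(f"syntax error at or near {tokens[pos][1]!r}")

    # Action for select_stmt: assemble the finished subtree.
    return {"node": "SelectStmt", "targetList": targets, "fromClause": [table]}
```

   Note that the parser accepts any well-formed statement; it never checks
   whether the named table or columns exist.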

   The file scan.l is transformed to the C source file scan.c using the
   program flex, and gram.y is transformed to gram.c using bison. After
   these transformations have taken place, a normal C compiler can be
   used to create the parser. Never make any changes to the generated C
   files, as they will be overwritten the next time flex or bison is
   called.

Note

   These transformations and compilations are normally done
   automatically using the makefiles shipped with the PostgreSQL source
   distribution.

   A detailed description of bison or the grammar rules given in gram.y
   would be beyond the scope of this manual. There are many books and
   documents dealing with flex and bison. You should be familiar with
   bison before you start to study the grammar given in gram.y;
   otherwise you won't understand what happens there.

51.3.2. Transformation Process

   The parser stage creates a parse tree using only fixed rules about the
   syntactic structure of SQL. It does not make any lookups in the system
   catalogs, so it cannot understand the detailed semantics of the
   requested operations. After the parser completes, the transformation
   process takes the tree handed back by the parser as input and does the
   semantic interpretation needed to understand which tables, functions,
   and operators are referenced by the query. The data structure that is
   built to represent this information is called the query tree.
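
   The difference between the two trees can be sketched in Python. The
   catalog below is a hypothetical dictionary standing in for the system
   catalogs (the relation identifier 16384 is an arbitrary example value),
   and transform_select is an illustrative name, not a PostgreSQL
   function. The sketch only shows that name resolution happens in this
   step and can fail even for syntactically valid input.

```python
# Hypothetical stand-in for the system catalogs:
# table name -> relation id and column types.
CATALOG = {
    "users": {"relid": 16384, "columns": {"id": "integer", "name": "text"}},
}


def transform_select(parse_tree, catalog):
    """Turn a raw SelectStmt-like dict into a query-tree-like dict."""
    relname = parse_tree["fromClause"][0]["relname"]
    if relname not in catalog:
        raise LookupError(f'relation "{relname}" does not exist')
    rel = catalog[relname]

    target_list = []
    for col in parse_tree["targetList"]:
        name = col["name"]
        if name not in rel["columns"]:
            raise LookupError(f'column "{name}" does not exist')
        # The resolved node carries the column's data type from the catalog.
        target_list.append(
            {"node": "Var", "name": name, "type": rel["columns"][name]}
        )

    return {
        "node": "Query",
        "commandType": "SELECT",
        "rtable": [{"relid": rel["relid"], "relname": relname}],
        "targetList": target_list,
    }
```

   A raw tree that names a nonexistent column parses without complaint
   but is rejected here, because only this step consults the catalog.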

   The reason for separating raw parsing from semantic analysis is that
   system catalog lookups can only be done within a transaction, and we do
   not wish to start a transaction immediately upon receiving a query
   string. The raw parsing stage is sufficient to identify the transaction
   control commands (BEGIN, ROLLBACK, etc.), and these can then be
   correctly executed without any further analysis. Once we know that we
   are dealing with an actual query (such as SELECT or UPDATE), it is okay
   to start a transaction if we're not already in one. Only then can the
   transformation process be invoked.
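
   The ordering described above can be sketched as a small dispatcher in
   Python. The Session class, its log messages, and the "kind" field are
   hypothetical, and the sketch leaves out much of what a real server
   tracks (for example, failed transactions). It only shows that the
   statement kind from the raw parse tree decides whether a transaction
   must exist before analysis.

```python
TRANSACTION_COMMANDS = {"BEGIN", "COMMIT", "ROLLBACK"}


class Session:
    """Toy model of one client session."""

    def __init__(self):
        self.in_transaction = False
        self.log = []

    def handle(self, raw_stmt):
        kind = raw_stmt["kind"]  # known from raw parsing alone

        if kind in TRANSACTION_COMMANDS:
            # No catalog lookup is needed, so no analysis is performed.
            self.in_transaction = (kind == "BEGIN")
            self.log.append(f"executed {kind} without analysis")
            return

        # An actual query: make sure a transaction exists first.
        started_here = not self.in_transaction
        if started_here:
            self.in_transaction = True
            self.log.append("started implicit transaction")

        self.log.append(f"analyzed {kind} inside a transaction")

        if started_here:
            self.in_transaction = False
            self.log.append("committed implicit transaction")
```

   In this sketch a query arriving outside a transaction block gets a
   transaction of its own, while BEGIN and ROLLBACK never reach the
   analysis step.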

   The query tree created by the transformation process is structurally
   similar to the raw parse tree in most places, but it has many
   differences in detail. For example, a FuncCall node in the parse tree
   represents something that looks syntactically like a function call.
   This might be transformed to either a FuncExpr or Aggref node depending
   on whether the referenced name turns out to be an ordinary function or
   an aggregate function. Also, information about the actual data types of
   columns and expression results is added to the query tree.
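
   The FuncCall example can be sketched in Python as follows. The
   FUNCTIONS dictionary is a hypothetical catalog excerpt and
   transform_func_call is an illustrative name. Real function resolution
   also considers argument types, which this sketch ignores.

```python
# Hypothetical catalog excerpt: function name -> (is_aggregate, result type).
FUNCTIONS = {
    "upper": (False, "text"),
    "count": (True, "bigint"),
}


def transform_func_call(func_call, functions):
    """Resolve a FuncCall-like dict into a FuncExpr-like or Aggref-like dict."""
    name = func_call["funcname"]
    if name not in functions:
        raise LookupError(f"function {name} does not exist")
    is_aggregate, result_type = functions[name]

    # The syntax is identical; only the catalog tells the two cases apart.
    node = "Aggref" if is_aggregate else "FuncExpr"
    return {
        "node": node,
        "funcname": name,
        "args": func_call["args"],
        "resulttype": result_type,  # type information added during analysis
    }
```

   Both upper(name) and count(id) arrive as the same kind of node; the
   result differs only because of what the lookup returns.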