28.3. Write-Ahead Logging (WAL)

Write-Ahead Logging (WAL) is a standard method for ensuring data
integrity. A detailed description can be found in most (if not all)
books about transaction processing. Briefly, WAL's central concept is
that changes to data files (where tables and indexes reside) must be
written only after those changes have been logged, that is, after WAL
records describing the changes have been flushed to permanent storage.
If we follow this procedure, we do not need to flush data pages to disk
on every transaction commit, because we know that in the event of a
crash we will be able to recover the database using the log: any
changes that have not been applied to the data pages can be redone from
the WAL records. (This is roll-forward recovery, also known as REDO.)
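
The write-ahead rule and REDO recovery can be illustrated with a small
sketch. It is a toy model, not the server's actual implementation: the
TinyStore class, the JSON record format, and the file names are all
assumptions made up for this example.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store following the write-ahead rule (hypothetical)."""

    def __init__(self, directory):
        self.wal_path = os.path.join(directory, "wal.log")
        self.data_path = os.path.join(directory, "data.json")
        self.pages = {}  # in-memory stand-in for data pages; lost on a crash

    def commit(self, key, value):
        # Log first: append the WAL record and flush it to permanent storage.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"key": key, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # Only then change the data. The data file is NOT flushed here.
        self.pages[key] = value

    def checkpoint(self):
        # Occasionally write the data pages out, independent of commits.
        tmp_path = self.data_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as data:
            json.dump(self.pages, data)
            data.flush()
            os.fsync(data.fileno())
        os.replace(tmp_path, self.data_path)

    def recover(self):
        # Load whatever reached the data file, then redo the logged changes.
        # This toy replays the whole log; replay is idempotent here because
        # each record simply sets a key to a value.
        if os.path.exists(self.data_path):
            with open(self.data_path, encoding="utf-8") as data:
                self.pages = json.load(data)
        if os.path.exists(self.wal_path):
            with open(self.wal_path, encoding="utf-8") as wal:
                for line in wal:
                    record = json.loads(line)
                    self.pages[record["key"]] = record["value"]


with tempfile.TemporaryDirectory() as directory:
    store = TinyStore(directory)
    store.commit("a", 1)
    store.checkpoint()
    store.commit("b", 2)            # logged, but never written to data.json
    restarted = TinyStore(directory)  # simulate a crash: memory is gone
    restarted.recover()
    recovered = dict(restarted.pages)
```

In this sketch the change to "b" exists only in the log at the time of
the simulated crash, yet it is present again after recover() because it
is redone from the WAL records.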

Because WAL restores database file contents after a crash, journaled
file systems are not necessary for reliable storage of the data files
or WAL files. In fact, journaling overhead can reduce performance,
especially if journaling causes file system data to be flushed to disk.
Fortunately, data flushing during journaling can often be disabled with
a file system mount option, e.g., data=writeback on a Linux ext3 file
system. Journaled file systems do improve boot speed after a crash.

Using WAL results in a significantly reduced number of disk writes,
because only the WAL file needs to be flushed to disk to guarantee that
a transaction is committed, rather than every data file changed by the
transaction. The WAL file is written sequentially, and so the cost of
syncing the WAL is much less than the cost of flushing the data pages.
This is especially true for servers handling many small transactions
touching different parts of the data store. Furthermore, when the
server is processing many small concurrent transactions, one fsync of
the WAL file may suffice to commit many transactions.
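
The last point, one fsync covering several commits, can be sketched as
follows. This is only an illustration under simplifying assumptions:
the GroupCommitLog class and the "COMMIT n" record text are
hypothetical, and a real server coordinates concurrent sessions rather
than collecting records in a list.

```python
import os
import tempfile


class GroupCommitLog:
    """Toy log that makes a batch of commit records durable with one fsync."""

    def __init__(self, path):
        self.file = open(path, "ab")
        self.pending = []      # commit records waiting to be flushed
        self.fsync_count = 0   # how many fsync calls were issued

    def add(self, record):
        # A transaction asks to commit; nothing is durable yet.
        self.pending.append(record)

    def flush(self):
        # Write all waiting records sequentially, then sync the file once.
        if not self.pending:
            return 0
        data = "".join(record + "\n" for record in self.pending)
        self.file.write(data.encode("utf-8"))
        self.file.flush()
        os.fsync(self.file.fileno())
        self.fsync_count += 1
        flushed = len(self.pending)
        self.pending.clear()
        return flushed

    def close(self):
        self.file.close()


with tempfile.TemporaryDirectory() as directory:
    log = GroupCommitLog(os.path.join(directory, "wal.log"))
    for txid in (101, 102, 103):
        log.add(f"COMMIT {txid}")
    committed = log.flush()   # three transactions become durable together
    fsyncs = log.fsync_count
    log.close()
```

Here three transactions are committed at the cost of a single fsync;
flushing each transaction's changed data files separately would need
many more, and more scattered, writes.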

WAL also makes it possible to support on-line backup and point-in-time
recovery, as described in Section 25.3. By archiving the WAL data we
can support reverting to any time instant covered by the available WAL
data: we simply install a prior physical backup of the database, and
replay the WAL just as far as the desired time. What's more, the
physical backup doesn't have to be an instantaneous snapshot of the
database state: if it is made over some period of time, then replaying
the WAL for that period will fix any internal inconsistencies.
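
The replay idea can be sketched in a few lines. Everything here is an
assumption for illustration: the recover_to function, the
(time, key, value) record layout, and the sample data are hypothetical,
and the sketch only handles target times at or after the end of the
backup.

```python
def recover_to(base_backup, wal_records, target_time):
    """Replay archived WAL records over a backup, stopping at target_time.

    wal_records must be ordered by time and must start no later than the
    moment the backup began.
    """
    state = dict(base_backup)  # "install" the physical backup
    for time, key, value in wal_records:
        if time > target_time:
            break              # stop at the desired point in time
        state[key] = value     # redo the logged change
    return state


# WAL archived from the start of the backup onward.
archived_wal = [
    (3, "x", 30),
    (4, "y", 40),
    (5, "x", 50),
    (6, "y", 60),
]

# A backup copied while changes were in progress: "x" was copied before
# time 3 (old value 10), "y" after time 4 (new value 40). No single
# instant ever had this combination.
fuzzy_backup = {"x": 10, "y": 40}

state_at_5 = recover_to(fuzzy_backup, archived_wal, target_time=5)
state_at_6 = recover_to(fuzzy_backup, archived_wal, target_time=6)
```

Replaying the records for the backup period overwrites the stale value
of "x", so the inconsistent copy ends up matching the true state at the
chosen time.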