2 27.5. Dynamic Tracing #
4 27.5.1. Compiling for Dynamic Tracing
5 27.5.2. Built-in Probes
7 27.5.4. Defining New Probes
9 PostgreSQL provides facilities to support dynamic tracing of the
10 database server. This allows an external utility to be called at
11 specific points in the code and thereby trace execution.
13 A number of probes or trace points are already inserted into the source
14 code. These probes are intended to be used by database developers and
15 administrators. By default the probes are not compiled into PostgreSQL;
16 the user needs to explicitly tell the configure script to make the
19 Currently, the DTrace utility is supported, which, at the time of this
20 writing, is available on Solaris, macOS, FreeBSD, NetBSD, and Oracle
21 Linux. The SystemTap project for Linux provides a DTrace equivalent and
22 can also be used. Supporting other dynamic tracing utilities is
23 theoretically possible by changing the definitions for the macros in
24 src/include/utils/probes.h.
26 27.5.1. Compiling for Dynamic Tracing #
28 By default, probes are not available, so you will need to explicitly
29 tell the configure script to make the probes available in PostgreSQL.
30 To include DTrace support specify --enable-dtrace to configure. See
31 Section 17.3.3.6 for further information.
33 27.5.2. Built-in Probes #
35 A number of standard probes are provided in the source code, as shown
36 in Table 27.49; Table 27.50 shows the types used in the probes. More
37 probes can certainly be added to enhance PostgreSQL's observability.
39 Table 27.49. Built-in DTrace Probes
40 Name Parameters Description
41 transaction-start (LocalTransactionId) Probe that fires at the start of
42 a new transaction. arg0 is the transaction ID.
43 transaction-commit (LocalTransactionId) Probe that fires when a
44 transaction completes successfully. arg0 is the transaction ID.
45 transaction-abort (LocalTransactionId) Probe that fires when a
46 transaction completes unsuccessfully. arg0 is the transaction ID.
47 query-start (const char *) Probe that fires when the processing of a
48 query is started. arg0 is the query string.
49 query-done (const char *) Probe that fires when the processing of a
50 query is complete. arg0 is the query string.
51 query-parse-start (const char *) Probe that fires when the parsing of a
52 query is started. arg0 is the query string.
53 query-parse-done (const char *) Probe that fires when the parsing of a
54 query is complete. arg0 is the query string.
55 query-rewrite-start (const char *) Probe that fires when the rewriting
56 of a query is started. arg0 is the query string.
57 query-rewrite-done (const char *) Probe that fires when the rewriting
58 of a query is complete. arg0 is the query string.
59 query-plan-start () Probe that fires when the planning of a query is
61 query-plan-done () Probe that fires when the planning of a query is
63 query-execute-start () Probe that fires when the execution of a query
65 query-execute-done () Probe that fires when the execution of a query is
67 statement-status (const char *) Probe that fires anytime the server
68 process updates its pg_stat_activity.status. arg0 is the new status
70 checkpoint-start (int) Probe that fires when a checkpoint is started.
71 arg0 holds the bitwise flags used to distinguish different checkpoint
72 types, such as shutdown, immediate or force.
73 checkpoint-done (int, int, int, int, int) Probe that fires when a
74 checkpoint is complete. (The probes listed next fire in sequence during
75 checkpoint processing.) arg0 is the number of buffers written. arg1 is
76 the total number of buffers. arg2, arg3 and arg4 contain the number of
77 WAL files added, removed and recycled respectively.
78 clog-checkpoint-start (bool) Probe that fires when the CLOG portion of
79 a checkpoint is started. arg0 is true for normal checkpoint, false for
81 clog-checkpoint-done (bool) Probe that fires when the CLOG portion of a
82 checkpoint is complete. arg0 has the same meaning as for
83 clog-checkpoint-start.
84 subtrans-checkpoint-start (bool) Probe that fires when the SUBTRANS
85 portion of a checkpoint is started. arg0 is true for normal checkpoint,
86 false for shutdown checkpoint.
87 subtrans-checkpoint-done (bool) Probe that fires when the SUBTRANS
88 portion of a checkpoint is complete. arg0 has the same meaning as for
89 subtrans-checkpoint-start.
90 multixact-checkpoint-start (bool) Probe that fires when the MultiXact
91 portion of a checkpoint is started. arg0 is true for normal checkpoint,
92 false for shutdown checkpoint.
93 multixact-checkpoint-done (bool) Probe that fires when the MultiXact
94 portion of a checkpoint is complete. arg0 has the same meaning as for
95 multixact-checkpoint-start.
96 buffer-checkpoint-start (int) Probe that fires when the buffer-writing
97 portion of a checkpoint is started. arg0 holds the bitwise flags used
98 to distinguish different checkpoint types, such as shutdown, immediate
100 buffer-sync-start (int, int) Probe that fires when we begin to write
101 dirty buffers during checkpoint (after identifying which buffers must
102 be written). arg0 is the total number of buffers. arg1 is the number
103 that are currently dirty and need to be written.
104 buffer-sync-written (int) Probe that fires after each buffer is written
105 during checkpoint. arg0 is the ID number of the buffer.
106 buffer-sync-done (int, int, int) Probe that fires when all dirty
107 buffers have been written. arg0 is the total number of buffers. arg1 is
108 the number of buffers actually written by the checkpoint process. arg2
109 is the number that were expected to be written (arg1 of
110 buffer-sync-start); any difference reflects other processes flushing
111 buffers during the checkpoint.
112 buffer-checkpoint-sync-start () Probe that fires after dirty buffers
113 have been written to the kernel, and before starting to issue fsync
115 buffer-checkpoint-done () Probe that fires when syncing of buffers to
117 twophase-checkpoint-start () Probe that fires when the two-phase
118 portion of a checkpoint is started.
119 twophase-checkpoint-done () Probe that fires when the two-phase portion
120 of a checkpoint is complete.
121 buffer-extend-start (ForkNumber, BlockNumber, Oid, Oid, Oid, int,
122 unsigned int) Probe that fires when a relation extension starts. arg0
123 contains the fork to be extended. arg1, arg2, and arg3 contain the
124 tablespace, database, and relation OIDs identifying the relation. arg4
125 is the ID of the backend which created the temporary relation for a
126 local buffer, or INVALID_PROC_NUMBER (-1) for a shared buffer. arg5 is
127 the number of blocks the caller would like to extend by.
128 buffer-extend-done (ForkNumber, BlockNumber, Oid, Oid, Oid, int,
129 unsigned int, BlockNumber) Probe that fires when a relation extension
130 is complete. arg0 contains the fork to be extended. arg1, arg2, and
131 arg3 contain the tablespace, database, and relation OIDs identifying
132 the relation. arg4 is the ID of the backend which created the temporary
133 relation for a local buffer, or INVALID_PROC_NUMBER (-1) for a shared
134 buffer. arg5 is the number of blocks the relation was extended by, this
135 can be less than the number in the buffer-extend-start due to resource
136 constraints. arg6 contains the BlockNumber of the first new block.
137 buffer-read-start (ForkNumber, BlockNumber, Oid, Oid, Oid, int) Probe
138 that fires when a buffer read is started. arg0 and arg1 contain the
139 fork and block numbers of the page. arg2, arg3, and arg4 contain the
140 tablespace, database, and relation OIDs identifying the relation. arg5
141 is the ID of the backend which created the temporary relation for a
142 local buffer, or INVALID_PROC_NUMBER (-1) for a shared buffer.
143 buffer-read-done (ForkNumber, BlockNumber, Oid, Oid, Oid, int, bool)
144 Probe that fires when a buffer read is complete. arg0 and arg1 contain
145 the fork and block numbers of the page. arg2, arg3, and arg4 contain
146 the tablespace, database, and relation OIDs identifying the relation.
147 arg5 is the ID of the backend which created the temporary relation for
148 a local buffer, or INVALID_PROC_NUMBER (-1) for a shared buffer. arg6
149 is true if the buffer was found in the pool, false if not.
150 buffer-flush-start (ForkNumber, BlockNumber, Oid, Oid, Oid) Probe that
151 fires before issuing any write request for a shared buffer. arg0 and
152 arg1 contain the fork and block numbers of the page. arg2, arg3, and
153 arg4 contain the tablespace, database, and relation OIDs identifying
155 buffer-flush-done (ForkNumber, BlockNumber, Oid, Oid, Oid) Probe that
156 fires when a write request is complete. (Note that this just reflects
157 the time to pass the data to the kernel; it's typically not actually
158 been written to disk yet.) The arguments are the same as for
160 wal-buffer-write-dirty-start () Probe that fires when a server process
161 begins to write a dirty WAL buffer because no more WAL buffer space is
162 available. (If this happens often, it implies that wal_buffers is too
164 wal-buffer-write-dirty-done () Probe that fires when a dirty WAL buffer
166 wal-insert (unsigned char, unsigned char) Probe that fires when a WAL
167 record is inserted. arg0 is the resource manager (rmid) for the record.
168 arg1 contains the info flags.
169 wal-switch () Probe that fires when a WAL segment switch is requested.
170 smgr-md-read-start (ForkNumber, BlockNumber, Oid, Oid, Oid, int) Probe
171 that fires when beginning to read a block from a relation. arg0 and
172 arg1 contain the fork and block numbers of the page. arg2, arg3, and
173 arg4 contain the tablespace, database, and relation OIDs identifying
174 the relation. arg5 is the ID of the backend which created the temporary
175 relation for a local buffer, or INVALID_PROC_NUMBER (-1) for a shared
177 smgr-md-read-done (ForkNumber, BlockNumber, Oid, Oid, Oid, int, int,
178 int) Probe that fires when a block read is complete. arg0 and arg1
179 contain the fork and block numbers of the page. arg2, arg3, and arg4
180 contain the tablespace, database, and relation OIDs identifying the
181 relation. arg5 is the ID of the backend which created the temporary
182 relation for a local buffer, or INVALID_PROC_NUMBER (-1) for a shared
183 buffer. arg6 is the number of bytes actually read, while arg7 is the
184 number requested (if these are different it indicates a short read).
185 smgr-md-write-start (ForkNumber, BlockNumber, Oid, Oid, Oid, int) Probe
186 that fires when beginning to write a block to a relation. arg0 and arg1
187 contain the fork and block numbers of the page. arg2, arg3, and arg4
188 contain the tablespace, database, and relation OIDs identifying the
189 relation. arg5 is the ID of the backend which created the temporary
190 relation for a local buffer, or INVALID_PROC_NUMBER (-1) for a shared
192 smgr-md-write-done (ForkNumber, BlockNumber, Oid, Oid, Oid, int, int,
193 int) Probe that fires when a block write is complete. arg0 and arg1
194 contain the fork and block numbers of the page. arg2, arg3, and arg4
195 contain the tablespace, database, and relation OIDs identifying the
196 relation. arg5 is the ID of the backend which created the temporary
197 relation for a local buffer, or INVALID_PROC_NUMBER (-1) for a shared
198 buffer. arg6 is the number of bytes actually written, while arg7 is the
199 number requested (if these are different it indicates a short write).
200 sort-start (int, bool, int, int, bool, int) Probe that fires when a
201 sort operation is started. arg0 indicates heap, index or datum sort.
202 arg1 is true for unique-value enforcement. arg2 is the number of key
203 columns. arg3 is the number of kilobytes of work memory allowed. arg4
204 is true if random access to the sort result is required. arg5 indicates
205 serial when 0, parallel worker when 1, or parallel leader when 2.
206 sort-done (bool, long) Probe that fires when a sort is complete. arg0
207 is true for external sort, false for internal sort. arg1 is the number
208 of disk blocks used for an external sort, or kilobytes of memory used
209 for an internal sort.
210 lwlock-acquire (char *, LWLockMode) Probe that fires when an LWLock has
211 been acquired. arg0 is the LWLock's tranche. arg1 is the requested lock
212 mode, either exclusive or shared.
213 lwlock-release (char *) Probe that fires when an LWLock has been
214 released (but note that any released waiters have not yet been
215 awakened). arg0 is the LWLock's tranche.
216 lwlock-wait-start (char *, LWLockMode) Probe that fires when an LWLock
217 was not immediately available and a server process has begun to wait
218 for the lock to become available. arg0 is the LWLock's tranche. arg1 is
219 the requested lock mode, either exclusive or shared.
220 lwlock-wait-done (char *, LWLockMode) Probe that fires when a server
221 process has been released from its wait for an LWLock (it does not
222 actually have the lock yet). arg0 is the LWLock's tranche. arg1 is the
223 requested lock mode, either exclusive or shared.
224 lwlock-condacquire (char *, LWLockMode) Probe that fires when an LWLock
225 was successfully acquired when the caller specified no waiting. arg0 is
226 the LWLock's tranche. arg1 is the requested lock mode, either exclusive
228 lwlock-condacquire-fail (char *, LWLockMode) Probe that fires when an
229 LWLock was not successfully acquired when the caller specified no
230 waiting. arg0 is the LWLock's tranche. arg1 is the requested lock mode,
231 either exclusive or shared.
232 lock-wait-start (unsigned int, unsigned int, unsigned int, unsigned
233 int, unsigned int, LOCKMODE) Probe that fires when a request for a
234 heavyweight lock (lmgr lock) has begun to wait because the lock is not
235 available. arg0 through arg3 are the tag fields identifying the object
236 being locked. arg4 indicates the type of object being locked. arg5
237 indicates the lock type being requested.
238 lock-wait-done (unsigned int, unsigned int, unsigned int, unsigned int,
239 unsigned int, LOCKMODE) Probe that fires when a request for a
240 heavyweight lock (lmgr lock) has finished waiting (i.e., has acquired
241 the lock). The arguments are the same as for lock-wait-start.
242 deadlock-found () Probe that fires when a deadlock is found by the
245 Table 27.50. Defined Types Used in Probe Parameters
247 LocalTransactionId unsigned int
250 BlockNumber unsigned int
255 27.5.3. Using Probes #
257 The example below shows a DTrace script for analyzing transaction
258 counts in the system, as an alternative to snapshotting
259 pg_stat_database before and after a performance test:
260 #!/usr/sbin/dtrace -qs
262 postgresql$1:::transaction-start
264 @start["Start"] = count();
265 self->ts = timestamp;
268 postgresql$1:::transaction-abort
270 @abort["Abort"] = count();
273 postgresql$1:::transaction-commit
276 @commit["Commit"] = count();
277 @time["Total time (ns)"] = sum(timestamp - self->ts);
281 When executed, the example D script gives output such as:
282 # ./txn_count.d `pgrep -n postgres` or ./txn_count.d <PID>
287 Total time (ns) 2312105013
291 SystemTap uses a different notation for trace scripts than DTrace does,
292 even though the underlying trace points are compatible. One point worth
293 noting is that at this writing, SystemTap scripts must reference probe
294 names using double underscores in place of hyphens. This is expected to
295 be fixed in future SystemTap releases.
297 You should remember that DTrace scripts need to be carefully written
298 and debugged, otherwise the trace information collected might be
299 meaningless. In most cases where problems are found it is the
300 instrumentation that is at fault, not the underlying system. When
301 discussing information found using dynamic tracing, be sure to enclose
302 the script used to allow that too to be checked and discussed.
304 27.5.4. Defining New Probes #
306 New probes can be defined within the code wherever the developer
307 desires, though this will require a recompilation. Below are the steps
308 for inserting new probes:
309 1. Decide on probe names and data to be made available through the
311 2. Add the probe definitions to src/backend/utils/probes.d
312 3. Include pg_trace.h if it is not already present in the module(s)
313 containing the probe points, and insert TRACE_POSTGRESQL probe
314 macros at the desired locations in the source code
315 4. Recompile and verify that the new probes are available
317 Example: Here is an example of how you would add a probe to trace all
318 new transactions by transaction ID.
319 1. Decide that the probe will be named transaction-start and requires
320 a parameter of type LocalTransactionId
321 2. Add the probe definition to src/backend/utils/probes.d:
322 probe transaction__start(LocalTransactionId);
324 Note the use of the double underline in the probe name. In a DTrace
325 script using the probe, the double underline needs to be replaced
326 with a hyphen, so transaction-start is the name to document for
328 3. At compile time, transaction__start is converted to a macro called
329 TRACE_POSTGRESQL_TRANSACTION_START (notice the underscores are
330 single here), which is available by including pg_trace.h. Add the
331 macro call to the appropriate location in the source code. In this
332 case, it looks like the following:
333 TRACE_POSTGRESQL_TRANSACTION_START(vxid.localTransactionId);
335 4. After recompiling and running the new binary, check that your newly
336 added probe is available by executing the following DTrace command.
337 You should see similar output:
338 # dtrace -ln transaction-start
339 ID PROVIDER MODULE FUNCTION NAME
340 18705 postgresql49878 postgres StartTransactionCommand transaction-start
341 18755 postgresql49877 postgres StartTransactionCommand transaction-start
342 18805 postgresql49876 postgres StartTransactionCommand transaction-start
343 18855 postgresql49875 postgres StartTransactionCommand transaction-start
344 18986 postgresql49873 postgres StartTransactionCommand transaction-start
346 There are a few things to be careful about when adding trace macros to
348 * You should take care that the data types specified for a probe's
349 parameters match the data types of the variables used in the macro.
350 Otherwise, you will get compilation errors.
351 * On most platforms, if PostgreSQL is built with --enable-dtrace, the
352 arguments to a trace macro will be evaluated whenever control
353 passes through the macro, even if no tracing is being done. This is
354 usually not worth worrying about if you are just reporting the
355 values of a few local variables. But beware of putting expensive
356 function calls into the arguments. If you need to do that, consider
357 protecting the macro with a check to see if the trace is actually
359 if (TRACE_POSTGRESQL_TRANSACTION_START_ENABLED())
360 TRACE_POSTGRESQL_TRANSACTION_START(some_function(...));
362 Each trace macro has a corresponding ENABLED macro.