pgbadger PGSQLPhriday

Posted on Jul 6, 2023

This week, #PGSQLPhriday is hosted by Alicja Kucharczyk. Every month, one community member proposes a new subject to this monthly blogging event and let the world (or your family/friends/neighbors if you prefer) know all about pgBadger. It's a tool to analyze your PostgreSQL logs and present you a nice web report.

If you like some history, it has been developed by Gilles Darold since more than 11 years, as v1.0 came along on June the 10th in 2012. At this time, pgfouine was the main log analyzer and the complete Perl rewrite was greatly performance influenced. In V4, it started to have its current look, by embarking the Bootstrap library and fonts.

This little caterpillar:

2023-07-05 12:04:33.245 CEST [734] LOG:  starting PostgreSQL 15.1 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2), 64-bit
2023-07-05 12:04:33.245 CEST [734] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-07-05 12:04:33.245 CEST [734] LOG:  listening on IPv6 address "::", port 5432
2023-07-05 12:04:33.246 CEST [734] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-07-05 12:04:33.249 CEST [734] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-07-05 12:04:33.259 CEST [760] LOG:  database system was interrupted; last known up at 2023-05-13 15:27:30 CEST
2023-07-05 12:04:33.467 CEST [760] LOG:  database system was not properly shut down; automatic recovery in progress
2023-07-05 12:04:33.471 CEST [760] LOG:  redo starts at 0/4D0B43B8
2023-07-05 12:04:33.471 CEST [760] LOG:  invalid record length at 0/4D0B44A0: wanted 24, got 0
2023-07-05 12:04:33.471 CEST [760] LOG:  redo done at 0/4D0B4468 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-07-05 12:04:33.473 CEST [758] LOG:  checkpoint starting: end-of-recovery immediate wait
2023-07-05 12:04:33.478 CEST [758] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.005 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB
2023-07-05 12:04:33.484 CEST [734] LOG:  database system is ready to accept connections

is then transformed into this beautiful butterfly.

As a stand-alone Perl script, it is available on all good platforms and comes with a truck-load of command-line parameters (93 according to the documentation) and features. The report will include information about:

  • statistics;
  • events;
  • sessions;
  • queries: slowest queries ,most frequent waiting ones,

You can parse remote files, auto-generate incremental reports, generate Tsung files, … I usually have a very simple use case and I most commonly use it this way:

  pgbadger --jobs 4 --format stderr myfile.log --outfile 20230707_pgbadger_myinstance.html

It is a tremendously useful script that has saved countless DBAs out there, including myself. If everyone one of us bought Gilles a coffee for this, we should keep him overly caffeinated for many years to come. Why is the tool useful ?

  • It's very good at giving you a high level overview of how your instance behaved. It's even better if you have a baseline to compare;
  • Your log files can be millions of lines long… grep will help but will you search for all relevant information ? You can't possibly do so for all PostgreSQL errors but pgbadger will present them nicely for you;
  • It can complement your monitoring as a metrology solution, as it has access to useful data your monitoring tool is usually not handling;
  • It is easy to share the huge amount of data it contains with someone else. Sharing the full Postgres logs, your complete monitoring dashboards… might not be feasible/allowed.
  • As it is visual, it's easier to share information with less technical people who will greatly prefer graphs over wide and numerous lines of logs.

Is it perfect and the cure to climate change ? Unfortunately no:

  • If you have a performance issue and need pgBadger to track the queries, you want to catch them all in your logs, meaning you now have an even bigger performance problem (and possibly a disk one);
  • I have had little usage of some of the pie-charts;
  • I would sometimes like to mix some graphs together, like what is the histogram of session times for a specific user. SQL and width_bucket came to the rescue to answer this question.
  • Knowing how queries changed (calls, duration) throughout the day would be useful;

But please add it to your toolbet and complement it with proper monitoring and extra tools. And don't forget, it will only be as useful as what it can parse.

If you want to know more about pgbadger or caterpillars: