Day 3: Memory and cache management / Backup and restore

Get to [2021-training-notes-day-2]. Get to [2021-training-notes-day-4].

Today, we will discuss:

vaccuuming
logical backup
persistence (esp during crash)

The visibility info is in the table, the index does not know about it.

Vaccuum:

scan table, identify rows you want to get rid of
scan index, remove corresponding index entries
return to heap and remove offending rows

Illustration to remember:

postgres=# SELECT ctid,xmin,xmax,* FROM accounts;
 ctid  | xmin | xmax | id | name |  amount
-------+------+------+----+------+----------
 (0,5) |  490 |    0 |  1 | brie |  9800.00
 (0,6) |  491 |    0 |  2 | plop | 10200.00
(2 rows)

postgres=# BEGIN;
BEGIN
postgres=*#  UPDATE accounts SET amount = amount - 100 WHERE id = 1;
UPDATE 1
postgres=*# UPDATE accounts SET amount = amount + 100 WHERE id = 2;
UPDATE 1
postgres=*# COMMIT;
COMMIT
postgres=# SELECT ctid,xmin,xmax,* FROM accounts;
 ctid  | xmin | xmax | id | name |  amount
-------+------+------+----+------+----------
 (0,7) |  492 |    0 |  1 | brie |  9700.00
 (0,8) |  492 |    0 |  2 | plop | 10300.00
(2 rows)

Run VACUUM (VERBOSE) accounts; for more info:

postgres=# VACUUM (VERBOSE) accounts;
INFO:  vacuuming "public.accounts"
INFO:  "accounts": found 6 removable, 2 nonremovable row versions in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 493
There were 4 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  vacuuming "pg_toast.pg_toast_16384"
INFO:  "pg_toast_16384": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 493
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

No table row can be bigger than 8K because everything is in 8K blocks and a row should be in the same row.

OK, I have columns bigger than 8K? Enter :bread: toast technique.

When a table is created with at least one column with a data type of variable width/size, a toast table is created to store overlength variables.

When a row is added:

if tabl row is less than 2000 bytes, stored as is.
if larger, large values compressed. if compressed row is less than 200, stored compressed.
if compressed row is bigger than 2000 bytes, chunked and stored in toast table. main table contains reference to toast table.

The toast stuff happens behind the scenes. This is called detoasting.

postgres=# CREATE TABLE vactest (id integer); CREATE TABLE postgres=# SELECT * FROM generate_series(1, 10);

generate_series() is a set-returning function, a table function. It behaves like a table:

postgres=# SELECT * FROM generate_series(1, 4);
 generate_series
-----------------
               1
               2
               3
               4
(4 rows)

So cool!

INSERT INTO vactest SELECT * FROM generate_series(1,100000);

Modular arithmetic is useful:

DELETE FROM vactest WHERE id % 2 = 0;

Vacuuming Example

Do this:

postgres=# BEGIN;
BEGIN
postgres=*# COMMIT;
COMMIT
postgres=# BEGIN ISOLATION LEVEL REPEATABLE READ;
BEGIN
postgres=*# SELECT * FROM accounts;
 id | name |  amount
----+------+----------
  1 | brie |  9700.00
  2 | plop | 10300.00
(2 rows)

Delete 50,000 rows and then observe that vacuum can’t get rid of the old row versions. (They might be needed by the other transaction, although the tables are unrelated.)

postgres=# DELETE FROM vactest WHERE id % 2 = 0;
DELETE 50000
postgres=# VACUUM (VERBOSE) vactest;
INFO:  vacuuming "public.vactest"
INFO:  "vactest": found 0 removable, 100000 nonremovable row versions in 443 out of 443 pages
DETAIL:  50000 dead row versions cannot be removed yet, oldest xmin: 495
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

Once we commit the transaction, vacuum can do its thing.

postgres=# VACUUM (VERBOSE) vactest;
INFO:  vacuuming "public.vactest"
INFO:  "vactest": removed 50000 row versions in 443 pages
INFO:  "vactest": found 50000 removable, 50000 nonremovable row versions in 443 out of 443 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 498
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

Long-running transactions in PostgreSQL are doubly bad:

prevent other transactions
prevents vacuum

Auto Vacuum

The autovacuum launcher runs in the background. It awakes periodically, checks for whether any tables need to be vacuumed and sends a worker to perform the vacuuming if that’s the case. Vacuum is designed to not create problems: no problem with concurrent data modifications. Vacuum won’t block this. It’s still resource-intensive (lots of reads, possibly some writes, CPU time). Auto vacuum is designed to be gentle.

READ about Routine Vacuuming

Vacuum removes the data but doesn’t shrink: there’s just more free space in the 8K block. The empty space is not fragmented. Empty space is always in one spot and ready for reuse with the next INSERT or UPDATE.

Vacuum isn’t for freeing up disk space.

Configuring Autovacuum

Always keep autovacuum on.

Reduce resource usage by autovacuum

Modify autovacuum_max_workers to address load due to autovacuum. Additionally, autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit also help to make sure autovacuum is not too intensive.

When will autovacuum run?

This formula determines whether autovacuum runs: autovacuum_threshold + autovacuum_scale_factor * number of rows < number of dead rows.

By default: if 20% of a table is dead rows, autovacuum runs.

READ Visualizing and Tuning Postgres Autovacuum

:bar_chart: AUTO ANALYZE

ANALYZE is part of autovacuum and has similar bits that can be toggled.

ANALYZE is much less resource-intensive than autovacuum. Make sure it’s running often enough and you’re good. The default is usually good except for things like:

Each row you add has a newer timestamp. The most recent rows will be out-of-date. You have an inaccuracy on teh rows you need the most if you are querying the more recent data. This is when you might want ANALYZE to run more often. Recall, you can do this on a per-table basis.

Taking a step back: this seems obvious but it’s really important to understand your data, your application and how they are used to make the best decisions.

Index Only Scan

Imagine a query like SELECT id FROM tab WHERE id < 1000. If we have an index based on the id: we don’t even need to go to the table and bear the pain of fetching rows from the table. That info is right in the index.

…but…

It’s not that simple. The index doesn’t know about the visibility information of an entry in the index. (The xmin and xmax stuff.)

Because we have to fetch the table row to get the xmin and xmax info, doing an index only scan is not possible. orhard (we’ll see…)

To solve this problem, PostgreSQL introduces the visibility map:

2 bits for each 8K block, one bit is “all visible” when set to 1, it means that all rows in that 8K block are visible for all transactions.

The index only scan finds an index entry, which tells it info about the block. It looks to the visibility map which is in memory and checks the “all visible” block. This is how we increase the efficiency of index only scans.

The vacuum process creates and maintains the visibility map. For efficient index only scans, make sure that vacuum runs frequently enough.

TRY using pg_visibility to get more information https://www.postgresql.org/docs/current/pgvisibility.html

postgres=# CREATE TABLE test2 (LIKE accounts) WITH  (autovacuum_enabled = off);
CREATE TABLE
postgres=# INSERT INTO test2 SELECT * FROM accounts;
INSERT 0 2
postgres=# ANALYZE test2;
ANALYZE
postgres=# CREATE INDEX ON test2 (id);
CREATE INDEX

Vacuum makes the table emptier but not smaller.

What if I want to shrink the table?

Use VACUUM (FULL) vactest;.

Note that VACUUM (FULL) is completely different from VACUUM. It makes a copy, removse the old stuff and moves the new table in place of the original.

:warning: Running VACUUM (FULL) means downtime: it’s in an access-exclusive mode. Consider this to be an emergency measure. Don’t run it out of cron. :wink:

Because it makes a copy, don’t use VACUUM (FULL) when you are on the verge of out of disk space and looking to free some up.

Q: What happens if you run out of disk space during a VACUUM (FULL)?
A: You’ll get an I/O error, which will terminate the statement and roll it back. The new file is deleted and you’ll be back to how it looked beforehand. One exception: if PostgreSQL runs out of space creating a transactio nlog file, it will crash. We’ll learn more later. A crash encountered during a VACUUM (FULL) will leave you with orphaned files which are hard to identify. Dump and restore becomes your best option.

:grinning_face_with_one_large_and_one_small_eye: Squeeze

Our tables are too big for us to do VACUUM (FULL) so we use repack. Laurenz recommends pg_squeeze, which is nicer, entirely in the database (no client).

pg_squeeze - builds compacted version in the background, grabs an access exclusive lock where possible, catches up on updates and shoves the new compacted version into place. Read more about pg_squeeze. This should be avoided if possible. Don’t design a situation where you need to do this.

Other things to do instead:

Give autovacuum more memory.
Consider HOT updates so you don’t need to run vacuum so frequently.
READ about TOAST on the PostgreSQL wiki

12:04 Break! 12:14 Back!

Is my tabel bloated? Do I need to take action?

Is it growing inordinately?
What does pg_stat_user_tables say?

SELECT * FROM pg_stat_user_tables WHERE relname = 'vactest';
\x

The \x turns on Extended display, which is beautiful:

You might have lots of dead tuples but vacuum just ran.

Do you want the full truth? :question:

Use pgstattuple:

postgres=# CREATE EXTENSION pgstattuple;
CREATE EXTENSION
Time: 8.281 ms
postgres=# SELECT * FROM pgstattuple('vactest');
-[ RECORD 1 ]------+---------
table_len          | 21749760
tuple_count        | 150000
tuple_len          | 4200000
tuple_percent      | 19.31
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 14474892
free_percent       | 66.55

Time: 16.186 ms

Run VACUUM vactest; and then SELECT * FROM pgstattuple('vactest'); again.

Note that pgstattuple is expensive. Consider SELECT * FROM pgstattuple_approx('vactest'); instead. This takes a sample of the table and gives you a guess because pgstattuple will be using a sequential scan. (Don’t run it every 10 minutes; maybe daily.)

There are other tools that aim to estimate bloat. They are less expensive but still check in with pgstattuple.

Logical Backup

Our friend pg_dump comes in:

# pg_dump -d database_name > backup.sql
# psql -d copy -f backup.sql

This is simple. A few downsides:

Requires a per-database approach
Misses global info (like users)
Configuration files

Use pg_dumpall to dump the entire cluster (database instance). Intended for use with upgrades.

Both pg_dump and pg_dumpall are client-side tools. Run pg_dump somewhere other than the database machine. (I mean, you can keep your backups on the server. You do you…)

pg_dumpall > hey.sql

Use pg_dumpall -g to get just the global info. This may be a good complement to a single database backup with pg_dump.

Q: How consistent is a dump? Data could be modified during this time.
A: Thankfully, pg_dump uses a read-only repeatable read transaction. No matter how long it takes, pg_dump will always get a consistent snapshot.

Table-wise Restore

I want to restore “less” than the entire database.

Dump just a table: pg_dump -t accounts course

Use -s to get the schema. There are lots of options to pg_dump worth exploring.

Use pg_dump -F c -f course.dmp course to tell pg_dump to use a custom format. It’s a binary format. Use pg_restore tow ork with this file.

postgres@4b6f50d57295:~$ pg_dump -F c -f howdy.dmp copy
postgres@4b6f50d57295:~$ file howdy.dmp
howdy.dmp: PostgreSQL custom database dump - v1.14-0

Use something like pg_restore -d copy howdy.dmp.

Ooooh! You can restore a .dmp to a file!!!!!

You can generate a .sql from a custom dump. This is so cool:

pg_restore -f - howdy.dmp  | less

With the custom format dump, the data are compressed.

Maybe always use the custom format and convert to .sql if you need to. With a custom format, you can restore only part of what you dmped.

pg_restore -f - -t accounts course.dmp

To only restore the schema: pg_restore -f - -s course.dmp

Alternately, you can use the directory format: pg_dump _F d -f course.dir course

We have looked at a few formats for backups:

.sql –> What we are all used to.
custom format –> can’t use parallel jobs
directory format –> can use parallel jobs, possibly the coolest format, IMO.
tar/tile format –> no advantages over the others

The directory format enables the -j flag which lets you have multiple resotre processes.

postgres@4b6f50d57295:~$ pg_dump -F d -f backup copy
postgres@4b6f50d57295:~$ tree backup/
backup/
|-- 2965.dat.gz
|-- 2966.dat.gz
|-- 2967.dat.gz
|-- 2969.dat.gz
|-- 2970.dat.gz
`-- toc.dat

0 directories, 6 files

So, pg_dump is great: simple, good but there’s one decisive disadvantage:

It only captures one point in time. (You lose everything between when you took the backup and when the bad thing happened.)

How PostgreSQL Handles Persistence

This is useful and background so we can understand how alternatives methods of backing up PostgreSQL work.

Below the line: persistence world, disk Above the line: memory and processes

There are data files on disk that persist the tables. 8K blocks in files contaiining data.

Above the line we have a cache (shared buffers in Postgres). Used to cache blocks for all databases, shared by all db processes. We also have a worker backend process working on behalf of a client connection to execute queries.

We want to read.
Block is copied up to shared buffers, read and returned.

A contradiction, conflicting interests:

We want to cache writes but we also want Durability.

A simple (but poor approach) write all involved tables when I COMMIT;. Works but poor performance.

How do we resolve this?

There’s a single good architecture that goes by other names.

Below the line, we have Write Ahead Log (WAL). This file (on disk) is written. tosequentially, contains a log of all changes that have happened. All DBs have this (also caled transaction log, binary log, redo log).

You must always put in the log and then write. This is low-level information (not the SQL statement), write what it takes to modify the files form before to after.

Using WAL

backend writes to WAL buffers (in memory)
backend is then free to modify the block
WAL writer takes info from WAL buffers to Write Ahead Log (This could come above step 2.) a. Confirm WAL info is persisted, then COMMIT can proceed
Way after COMMIT;, the WAL buffer is written down to disk ( in the data files, not the WAL). The checkpointer and background writer take care of this.

:warning: Writing to the Write Ahead Log does not necessarily mean that the info has been written to disk because PostgreSQL uses buffered I/O; the info could be with the kernel. We use fsync at COMMIT; time to force WAL down to disk.

Way after COMMIT;, the WAL buffer is written down to disk ( in the data files, not the WAL). The checkpointer and background writer take care of this.

A checkpiont is a point. intime whena ll dirty info in shared buffers is writen down. todisk and cleaned. This is marked in the WAL so PostgreSQL knows anything older than this sequence number has already been writen to disk. You only need to replay everything that came after the latest checkpoint. (This is how crash recovery works.)

Most non-WAL activities are written to disk by the checkpointer. After checkpointer has completed, buffers are clean. AS time passes, buffers get dirty.

Clean buffers are vital for good performance. Dirty buffers mean the backend must write dirty buffers down and then read into. aclean buffer in order to get its query handled. OMG. We introduce the background writer for this.

background writer looks around in dirty buffers, cleans them, and aims. tomake sure we have enough clean buffers in between checkpoitns so no one has to wait for a clean buffer.

An overeager background writer means mroe writes than are necessary: no thanks!

The background writer goes front to back through the entire shared buffer, slowly, over time. It shouldn’t do any more work than is required to make sure buffer is clean when needed. (The background writer is not overly important. )

Docs: Background Writer

### Terminology

We’ll use page, block and buffer interchangeably but we’ll always mean those 8K blocks. When on disk, it’s a block; when in memory, a page or buffer.

12:59 Break! 13:17 Back!

## Data Directory Backups

The name of a table isn’t the real name, the object ID is the right identifier. (The table name can be changed).

 SELECT oid FROM pg_class WHERE relname = 'ts';

Try SELECT oid, relfilenode FROM pg_class WHERE relname = 'ts';

A full vaccuum will cause a change.

 postgres=# SELECT oid,datname FROM pg_database;
  oid  |  datname
-------+-----------
 13395 | postgres
     1 | template1
 13394 | template0
 16470 | copy
(4 rows)

$ pwd && ls
/var/lib/postgresql/13/main/base
1  13394  13395  16470  pgsql_tmp

The _vm files inside 16470 are the visibility maps.

:warning: You can’t just copy around the contents of that data directory and expect the righ things to happen.

The pg_control file contains metadata about the database. It’s useful for crash recovery. The pg_controldata command is used to read pg_control.

Ack, I don’t have pg_controldata but this. iswaht would work:

pg_controldata -D /var/lib/postgresql/13/main/
-bash: pg_controldata: command not found

Let’s look. atthe write ahead log in /var/lib/postgresql/13/main/pg_wal. It’s a bunch of files, not just one. One of those files is active, you can identify it by looking for the most recent timestamp.

There are other WALs for future use, these are reserves to improve performance. When the active WAL is full, we switch to the next one and write to it.

If a WAL segment that are younger than the latest checkpoint, it is then useless. At the end of a checkpoint, we delete unneeded WAL segments. The database will crash if it hits an I/O error writing the next WAL segment.

In production, consider having pg_wal on a different filesystem. Whiel database is down, move the data and symlink to the new place.

Default WAL size is 16 MB. If you have lots of data modification, you’ll. havelots of WAL switches. Not expensive but you may wish to increase the size of WAL segments.

Use pg_waldump to take a look. So freaking cool!

 root@4b6f50d57295:~# /usr/lib/postgresql/13/bin/pg_waldump /var/lib/postgresql/13/main/pg_wal/00000001000000000000002C  | tail
pg_waldump: fatal: error in WAL record at 0/2C774E98: invalid record length at 0/2C774ED0: wanted 24, got 0
rmgr: Heap2       len (rec/tot):     56/    56, tx:          0, lsn: 0/2C774C40, prev 0/2C774C00, desc: CLEAN remxid 0, blkref #0: rel 1663/16470/1255 blk 44
rmgr: Heap2       len (rec/tot):     56/    56, tx:          0, lsn: 0/2C774C78, prev 0/2C774C40, desc: CLEAN remxid 0, blkref #0: rel 1663/16470/1255 blk 67
rmgr: Heap2       len (rec/tot):     56/    56, tx:          0, lsn: 0/2C774CB0, prev 0/2C774C78, desc: CLEAN remxid 0, blkref #0: rel 1663/16470/1255 blk 71
rmgr: Heap2       len (rec/tot):     62/    62, tx:          0, lsn: 0/2C774CE8, prev 0/2C774CB0, desc: CLEAN remxid 0, blkref #0: rel 1663/16470/1255 blk 76
rmgr: Heap2       len (rec/tot):     58/    58, tx:          0, lsn: 0/2C774D28, prev 0/2C774CE8, desc: CLEAN remxid 0, blkref #0: rel 1663/16470/1255 blk 77
rmgr: Heap2       len (rec/tot):     70/    70, tx:          0, lsn: 0/2C774D68, prev 0/2C774D28, desc: CLEAN remxid 0, blkref #0: rel 1663/16470/1255 blk 80
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/2C774DB0, prev 0/2C774D68, desc: RUNNING_XACTS nextXid 590 latestCompletedXid 589 oldestRunningXid 590
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/2C774DE8, prev 0/2C774DB0, desc: RUNNING_XACTS nextXid 590 latestCompletedXid 589 oldestRunningXid 590
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/2C774E20, prev 0/2C774DE8, desc: CHECKPOINT_ONLINE redo 0/2C774DE8; tli 1; prev tli 1; fpw true; xid 0:590; oid 24576; multi 1; offset 0; oldest xid 478 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 590; online
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/2C774E98, prev 0/2C774E20, desc: RUNNING_XACTS nextXid 590 latestCompletedXid 589 oldestRunningXid 590

An online backup creates a checkpoint from which you recover. WAL segments are useful for recovering from an old checkpoint. You can recover everything but you don’t need to.

Let’s get started.

Archive transaction logs.

#wal_level = replica                    # minimal, replica, or logical

minimal is enough for crash recovery.

Do not turn off fsync. Yes, your database is faster but you are toast (not in the good way) if it crashes. :bread:

Let’s try something :microscope::

postgres@4b6f50d57295:~$ time psql  -d copy  -q -f insert.sql

real    0m12.576s
user    0m0.433s
sys     0m0.909s

This takes forever because fsync runs after each comit. PostgreSQL is running in auto-commit mode so it’s writing after every commit.

insert.sql is just 30,001 INSERT statements. Put BEGIN; at the beginning and COMMIT; at. theend and observe the improvement:

:~$ time psql  -d copy  -q -f insert.sql

real    0m3.836s
user    0m0.478s
sys     0m0.872s

Much better! :tada:

Let’s look at synchronous_commit to see what impact it has. With it off, not every commit syncs to disk. Safe to use because syncs will occur and WAL is still synced/flushed to disk occasionally (just not automatically withe very commit).

Let’s set it

# grep synchronous_commit /etc/postgresql/13/main/postgresql.conf
synchronous_commit = off                # synchronization level;

Let’s use pg_ctl reload to tell the daemon to check for a new config.

OK, I removed the single transaction from insert.sql and the performance with synchronous_commit is much better:

time psql  -d copy  -q -f insert.sql

real    0m4.033s
user    0m0.365s
sys     0m0.660s

However, we don’t want to lose data so let’s turn synchronous_commit = on again. Let’s consider commit_delay and commit_siblings instead.

Check for other siblings transactions and sync them together. This means fewer syncs. Commits will take longer (it won’t return until the flush. isdone). It’s tough to tune commit_delay and commit_siblings well so it won’t be as :rocket: blaxingly fast as with synchronsou_commit = off but still better.

Don’t change full_page_writes = on. Partial pages are difficult to recover from. These are called torn pages. You may wish to compress the full-page writes: IDK, wanna trade CPU time for I/O? You must make a decision somewhere.

Checkpoints :checkmark:

If more than max_wal_size has been written since the last checkpiont, PostgreSQL triggers a checkpoint. Additionally, checkpoint_timeout.

See checkpoint_completion_target. PostgreSQL can spread checkpoint process out. With the default of 0.5, every checkpoint should be done halfway before the next checkpoint. It’s customary to increase this parameter and with 14, it will be increased to 0.9. You’ll have checkpoint activity much more often (up until 90% of the way to the next checkpoint). If you have checkpoitns every 10 minutes:

old style it can’t go more than 5 minutes
new style, up to 9 minutes

I hope that helps!

13:51 Break! 14:03 Back!

Great filesystem-level backup:

Stop server
Copy data directory
Start server

But I don’t want to stop the server!

Fair!

Let’s configure PostgreSQL to

Turn archive_mode = on, this requires restarting the database.

archive_command is nice but scary. You just put any command there and go.

archive_mode = on               # enables archiving; off, on, or always
                                # (change requires restart)
#archive_command = ''           # command to use to archive a logfile segment
                                # placeholders: %p = path of file to archive
                                #               %f = file name only
                                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

archive_command = 'cp %p /var/lib/postgresql/walarchive/%f'             # command to use to archive a logfile segment

PostgreSQL only checks the return code. You could add /bin/true and PG would be like, yeah, great job, archiving completed.

Archival failure does not mean that the segment is deleted. PG will keep trying, back off, try again, it won’t just skip and try for the next one. Why? We need an unbroken stream of WAL segments.

Activate Extended display for a single command with:

SELECT * FROM pg_stat_archiver  \gx

Yep, instead of the ;.

TODO Take screenshots from when I broek and fixed the WAL archiver stuff.

pg_basebackup -c fast -D backup

Copy files from server to client. Cool but also….OK. Some things will be pissing like postmaster.pid (I don’t care about the pid from the server). The results:

In addition, we have some new files like backup/bbackup_manifest. This is used by pg_verifybackup. We also have backup/backup_label.

We could just start up from the contents of pg_basebackup but…..things will change while pg_basebackup is running. Don’t do it, please, for you, not for me. :cat:

Anything after the WAL LOCATION and CHECKPOINT LOCATION – could be consistent, might not be. That’s why we then recover things (with WAL) until we get to the point. Thankfully, we can choose a specific point of time and restore to it. So, 3 hours after the pg_basebackup ran or…..5 hours after.

The CHECKPOINT LOCATION tells us where in the WAL the checkpoint is located.

We recover from the WAL LOCATION.

Q: When is the earliest point that we can safely stop recovering?
A: The time the online backup process ended. (You’d recover from the backup and then from the WAL for things that were backed up at 12:04.) You. cancontinue recovering after that point but you don’t have to.

^^^ This was tricky.

pg_dump is granular but pg_basebackup: it’s all or nothing for backups and restores. Sorry!

We can specify restore_command to tell PG what command to use to restore.

With no Recovery Target info, it’ll keep recovering until either there is no file or it encounters garbage. Let’s specify a Recovery Target.

Setting it to immediate says make sure the backup is consistent. Huh?

Set recovery_target_time = 2020-09-01 14:20:00 to tell it the timestamp to recover up to.

Set recovery_target_action. If you set it to pause, it’ll bring the DB up to recovery mode so you can check the recovery looks OK. Otherwise, set it to promote to just kick it into production.

recovery.signal –> is it gone? Everything is OK?

The recovery process will rename backup_label to backup_label.old.

Cool! I did the rm too soon so….here’s how the recovery looked on his machine.

pg_dump vs pg_basebackup

pg_dump is more flexible, has to do a sequential scan, do a bunch of stuff that’s slower than pg_basebackup

pg_basebackup – generally faster than pg_dump, larger backup [bloat, indexes (in pg_dump an index is a CREATE INDEX statement but with pg_basebackup, is the whole actual index)]. single-threaded, no good wit huge databases (10TB).

Parallel Universes

Restore to a certain point, then new incarnation of database is created before the bad thing you want to recover from happened. This is how you get timelines 1 and 2.

The timeline is part of the name of the WAL segments.

Observe:

Say TL2 was created too soon. TL3 recovers later on:

Each recovery attempts creates a new timeline. There is always only one current timeline. When recovering, always follow the path to a new timeline. With current, stay on the current timeline. Or specify a timeline ID (follow each fork to et to this timeline). You shouldn’t need to worry too much about this unless you are in a really bad place. Time to call or hire a DBA? :telephone:

The .history files describe the history of a particular timeline.

There was a branch at this point and. wecreated timeline 2:

/Users/brie/Desktop/Screen Shot 2021-09-01 at 2.47.52 PM.png

There are existing non-core tools for automating elements of this. Laurenz recommends pgBackRest. See more at pgbackrest.org. You have a dedicated pgBackRest server, tell postgresql.conf a command to talk to that server. It handles retention as well.

Add this above where it belongs:

pg_stat_user_tables has statistical data that accumulate over time. It is cheap to query. pgstattuple takes a momemtary snapshot of the table and scans it. That is something different, and it is expensive. But of course nobody keeps you form joining both in a query to present a combined result.

Laurenz Albe

TEST YOUR BACKUPS. OMG.

Q: A customer experiences corruption when running a SELECT. They don’t have a backup. What options are available?
A: It depends. First, make a faithful copy of the files and experiment on the copy. Find someone. whoknows a lot about PostgreSQL. It won’t be fun. Hire an expert or cut your losses. :scissors:

We probably won’t cover Replication. A few words:

Replication is basically just recovery. The differences of note:
- It doesn’t stop recovering, it just waits for more.
- WAL archivse are not restored, they are fetched straight from the mains erver.

Say PostgreSQL won’t start because you are out of disk space. The correct answer is to increase disk space.

Before major version 10, pg_wal used to be pg_xlog (transaction log). sacct used to be called clog for commit log.

People would delete those things because people would think they were just unnecessary log files and delete them. YIKES! :scared:

hello, world. | notes.carranza.engineer