Previous: [2021-training-notes-day-3]. Next: [2021-training-notes-day-5].
When recovering, WAL is retrieved from the archive. You could do replication that way but it would be super annoying. Instead, enter streaming replication.
Replication is recovery (of a type) that never ends.
Streaming replication introduces two processes: the WAL sender (on the primary) and the WAL receiver (on the standby).
When something hits the WAL buffers, it goes to the WAL sender, is received by the WAL receiver, and is then written to the WAL on the standby. First to the WAL, and then to the data – remember.
The startup process that helps with recovery replays the data into the standby. This is a fundamentally async process.
The standby acts as a client to the primary. PG has two protocols, both on 5432: the normal query protocol and the replication protocol.
Set up the standby with pg_basebackup. The config on the primary and the secondary should be as similar as possible.
pg_createcluster 13 standby
rm -rf /var/lib/postgresql/13/standby/
pg_basebackup -D 13/standby # replace the removed data directory with a base backup taken from the primary
Modify postgresql.conf.
max_wal_senders –> more than 0.
wal_level cannot be minimal; it must be replica or logical.
Configure primary_conninfo = '' – this is a connection string.
With the streaming replication protocol, you connect to the cluster, no database needed.
touch 13/standby/standby.signal
The standby.signal file means: replicate and stay in recovery mode until told to promote.
systemctl start postgresql@13-standby
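Pulling the steps above together, a minimal sketch of the settings involved (hostname and user here are assumptions, not from the training):
# primary, postgresql.conf
max_wal_senders = 10          # anything more than 0
wal_level = replica           # replica or logical, not minimal
# standby, postgresql.conf (connection details are made up)
primary_conninfo = 'host=primary.example.com port=5432 user=postgres'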
Observe walsender and walreceiver processes:


These are columns of pg_stat_replication on the primary (see the query below):
sent_lsn –> last position sent to the standby
write_lsn –> position written to disk by the standby
flush_lsn –> position flushed to disk – persisted and definitely safe
replay_lsn –> position in the WAL that has already been replayed and is visible on the standby server
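A quick way to watch these from the primary (a sketch):
SELECT application_name, sent_lsn, write_lsn, flush_lsn, replay_lsn FROM pg_stat_replication;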

Anatomy of a WAL segment file name (24 hex digits) vs. an LSN:
first 8 hex digits of the file name –> the timeline
second 8 hex digits –> a number that corresponds to the part before the slash in the _lsn
first two hex digits after the slash in the _lsn are the last two digits of the WAL segment name
remaining six digits in the _lsn are the offset within the segment.
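A worked example, assuming timeline 1 and the default 16 MB segment size (the LSN value is made up):
SELECT pg_walfile_name('0/3000148');
-- 000000010000000000000003  (timeline 00000001, high part 00000000, segment 03)
SELECT pg_walfile_name_offset('0/3000148');
-- same file name, plus offset 328 = 0x000148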
SELECT pg_current_wal_lsn(); to get the current LSN.
:watch: psql has watch!
SELECT * FROM whatever;
\watch 1
(\watch re-runs the query in the buffer every N seconds.)
Measure replication lag in bytes, not time. Time is unreliable for this purpose.
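One way to get the lag in bytes, using pg_wal_lsn_diff() on the primary (a sketch):
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;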
docker run --name postgres_training \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -d postgres:13
Stop the standby.
There are checkpoints (actually called restart points) on a recovering database. This makes an interrupted recovery less annoying: resume recovery from that point rather than from the beginning. This is clearly essential for replication (it needs to recover from where it left off, not from the beginning, where possible).
CREATE TABLE accounts (id bigint PRIMARY KEY, name text NOT NULL, amount numeric(15,2) NOT NULL);
If you delete a WAL segment the secondary still needs…please don’t. Rebuilding the secondary may become your only option.
The restore_command cannot be changed without restarting the server.
Use wal_keep_size to retain otherwise-unneeded WAL so it’s still available for the standby; this makes things easier if the standby falls too far behind. This is usually unnecessary. If standby downtime is longer than what wal_keep_size accounts for, you’ll still have trouble. Enter the replication slot. It marks a position in the WAL stream – otherwise the primary doesn’t have any memory of the standby (it’s essentially just a client).
The primary will not discard any WAL newer than the position marked by the replication slot. The replication slot is an indicator of how much WAL the secondary has received.
SELECT pg_create_physical_replication_slot('slot');
Consider max_slot_wal_keep_size –> caps how much WAL a replication slot may retain, to avoid disk space piling up. This will tell PostgreSQL to nuke that WAL even if the standby has not caught up.
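To make use of the slot, the standby has to point at it by name, and on the primary you can watch slot state (a sketch; the slot name comes from the command above):
# standby, postgresql.conf
primary_slot_name = 'slot'
-- primary: how far back is each slot holding WAL?
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;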
With synchronous replication, COMMIT does not return until the synchronous standby has received the WAL. Doing this across continents: everything will take forever. It only works when the primary and the standby are close together in the network. The other downside of synchronous replication: the availability of the entire system goes down, because a failure on the standby means a failure on the primary.
Use synchronous replication if you cannot afford to lose a single transaction (i.e., a failover happening before the standby has received the commit). Most people use async.
The application_name field can be used to tell the PG server who I (the client) am. This is handy for replicas – well-behaved applications set this name, and by convention it is used to name the standby server (in synchronous_standby_names). You can set it in the connection string.
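A sketch of that convention (host and names are assumptions): the standby announces itself via application_name and the primary lists it as its synchronous standby:
# standby, in primary_conninfo
primary_conninfo = 'host=primary.example.com user=postgres application_name=standby1'
# primary, postgresql.conf
synchronous_standby_names = 'standby1'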
There are a range of options for synchronous_commit when it’s on.
on – remote flush, waits for standby to write to WAL file and flush to disk
remote_apply –> wait until standby has replayed information and that info is now visible.
If you query the standby immediately after making a change, the change may not be visible yet with the default setting of synchronous_commit. You want remote_apply for that.
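synchronous_commit can also be set per session or per transaction, so only the work that needs read-your-writes on the standby pays the price (a sketch):
SET synchronous_commit = remote_apply;   -- commits in this session wait for replay on the standby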
12:01 Break! 12:12 Back!
Good use cases for streaming replication: high availability, and offloading read-only queries (reports) to the standby.
Could you run reads against the standby and get read-your-writes with remote_apply levels? Yeah, it would work, but…enter the replication conflict. Example: VACUUM on the primary during a long-running query on the secondary. max_standby_streaming_delay governs how long the standby waits before canceling such a query and resuming replay of WAL. The default value is 30s. Consider your report that runs on this replicated database – please please please don’t cancel it 30 seconds in. You can make the delay indefinite with -1.
Primary goal: replication doesn’t lag too much.
hot_standby –> setting this to off disallows queries during recovery.
The name of the game here is preventing conflicts.
hot_standby_feedback – the standby reports its oldest running query back to the primary, so VACUUM won’t remove rows that query still needs. No more conflicts due to VACUUM. It’s not on by default because a very long-running query on the replica can cause bloat on the primary server.
hot_standby_feedback + remote_apply –> Please don’t, says LA. You’ll have lots of waiting around all the time.
You can have an intentionally delayed standby server. This is helpful to recover from logical problems, assuming you notice the problem within the intentional delay period. Possible approach for quick recovery.
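The delay is configured on the standby with recovery_min_apply_delay; the value here is just an example:
recovery_min_apply_delay = '1h'   # replay WAL only once it is an hour old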
Every second, tell Postgres to sleep for 10 seconds.
SELECT pg_sleep(10);
\watch 1
Temporarily suspend replication: SELECT pg_wal_replay_pause();
resume with SELECT pg_wal_replay_resume();
There is no cluster-ware in PostgreSQL but the tools to make it happen are available. Here’s a manual failover:
#promote_trigger_file = ''   # file name whose presence ends recovery
Or promote with a psql command: SELECT pg_promote();
Point clients at the new primary with HAProxy or similar. Alternately: this can be done with connection strings – if I can’t get to the first, talk to the second.
Reattach the old primary afterwards with pg_rewind.
psql 'host=/var/run/postgresql,/var/run/postgresql port=5432,5433 user=postgres'
Connect with a string like the above and observe the output of \conninfo to see which server you are connected to. That approach is cool, but it makes it possible to wind up connected to the standby – and why would you want that? It’s read-only. Enter target_session_attrs, an addition to the connection string:
psql 'host=/var/run/postgresql,/var/run/postgresql port=5432,5433 user=postgres target_session_attrs=read-write'
With the above, you’ll only connect to a database that is read-write. The clients do the handover. Any client that links with libpq can do this. Possible with different syntax in JDBC.
A few ways to stop the server:
pg_ctl stop – there are three shutdown modes: smart, fast, immediate. The default is fast.
fast – all transactions are cancelled, client sessions are interrupted, a checkpoint is made, then it goes down. Default.
smart – no new client connections, existing connections are not interrupted. Once the last client connection closes, the DB shuts down. Not used too often.
immediate – evil mode. Go down as quickly as you can, no cleanup. Crash the server. Crash recovery will be required on startup. Bad for production, fun for testing. No fsync.
pg_ctl stop -m immediate -D /etc/postgresql
Run pg_rewind locally on the failed primary.
Running pg_rewind:
pg_rewind --target-pgdata=13/main --source-server='port=5433 user=postgres'

Replication slots exist only on the primary; they are not replicated. You’ll need to create them again when failing over to a new primary!
12:55 Break! 13:10 Back!
Patroni: what we use and what they recommend for HA and cluster-ware.
Written by Zalando for their own purposes; it took off and is now highly recommended.
The result is a shared-nothing architecture, which means there are no shared components. Good for reducing single points of failure.
Patroni runs on both the primary and the secondary. Patroni handles starting and stopping PostgreSQL for you. It does the failover and the configs. No more postgresql.conf editing. You tell Patroni what to do and Patroni does it for you.
Use etcd [or Consul (if you’re GitLab)]. There is a third box: three instances of etcd, implementing the Raft protocol, which requires an odd number of nodes.
Patroni uses the key-value store (provided by etcd or whatever). See diagram. If it can’t talk to the etcd running on itself, that’s OK, it’ll talk to another.
One Patroni is the Leader (primary) and the others are Followers (secondaries). They vote regularly. Config is distributed across all instances in the key-value store.
etcd doesn’t need to run on the same machine.
The solo etcd host can be a little EC2 instance or whatever. It just needs a fast disk so it isn’t seen as nonresponsive.

View users:
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------------------+-----------
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
Think of it as root. There’s nothing preventing access to the underlying filesystem. Applications should not ever connect as postgres.
There is no big difference between users and user groups: users can log in to the system and groups do not. Regardless, users and groups are called roles. Call CREATE ROLE to create a user.
postgres=# \h CREATE ROLE
Command: CREATE ROLE
Description: define a new database role
Syntax:
CREATE ROLE name [ [ WITH ] option [ ... ] ]
where option can be:
SUPERUSER | NOSUPERUSER
| CREATEDB | NOCREATEDB
| CREATEROLE | NOCREATEROLE
| INHERIT | NOINHERIT
| LOGIN | NOLOGIN
| REPLICATION | NOREPLICATION
| BYPASSRLS | NOBYPASSRLS
| CONNECTION LIMIT connlimit
| [ ENCRYPTED ] PASSWORD 'password' | PASSWORD NULL
| VALID UNTIL 'timestamp'
| IN ROLE role_name [, ...]
| IN GROUP role_name [, ...]
| ROLE role_name [, ...]
| ADMIN role_name [, ...]
| USER role_name [, ...]
| SYSID uid
URL: https://www.postgresql.org/docs/13/sql-createrole.html
CREATE ROLE plop LOGIN PASSWORD 'cutecats';
Don’t do this – the password ends up in plain text in the command history and possibly the server log.
Instead, use \password plop after CREATE ROLE plop LOGIN;
Get all roles:
postgres=# \duS
List of roles
Role name | Attributes | Member of
---------------------------+------------------------------------------------------------+--------------------------------------------------------------
pg_execute_server_program | Cannot login | {}
pg_monitor | Cannot login | {pg_read_all_settings,pg_read_all_stats,pg_stat_scan_tables}
pg_read_all_settings | Cannot login | {}
pg_read_all_stats | Cannot login | {}
pg_read_server_files | Cannot login | {}
pg_signal_backend | Cannot login | {}
pg_stat_scan_tables | Cannot login | {}
pg_write_server_files | Cannot login | {}
plop | | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
The Cannot login entries are predefined system roles (groups). Grant membership in these instead of granting superuser.
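For example, giving plop monitoring rights without superuser (a sketch):
GRANT pg_monitor TO plop;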
Experiment with pg_authid to get more info:
postgres=# SELECT rolname, rolpassword FROM pg_authid WHERE rolname LIKE 'plop';
rolname | rolpassword
---------+-------------------------------------
plop | md5edd3bc15c6025ae3a4570fdf0061e20b
(1 row)
MD5. Sad. :cry: :sad:
With 14, scram-sha-256 will be the default. :tada:
Every user can see the pg_ catalog tables. People don’t like it. There are exceptions like pg_authid, which can only be read by a superuser.
listen_addresses = 'localhost'
That’s the server interface. It accepts a comma-separated list of server interfaces. With the above, it’s only listening on the loopback interface (TCP connections are only possible from localhost –> no remote access).
Update to listen_addresses = '*' to listen on all available interfaces, requires a restart.
When a client connects, the entries in pg_hba.conf are read top-to-bottom. The first matching entry is chosen and determines how the client must authenticate.
local –> socket connections; host –> TCP. There are dedicated entries for replication.
The HBA file has these columns: connection type, database, user, client address, and authentication method.
This shows us why psql doesn’t need a password on localhost over a socket but does over TCP:
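A sketch of the kind of entries behind that behaviour (the exact defaults depend on the packaging; these mimic the Debian layout):
# TYPE  DATABASE  USER      ADDRESS       METHOD
local   all       postgres                peer
host    all       all       127.0.0.1/32  md5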

This is the bit of PostgreSQL I never fully grokked.
You can have an authentication method of reject to specifically reject someone. Also, no matching entries means that a rejection will occur.
trust –> come on in! No checking.
peer –> come on in! Only for local connections and only when the database user name matches the local OS user name. (postgres can run psql no problem but root can not.)
md5 and scram-sha-256 –> challenge/response. The client hashes the password with a known salt to prevent the password from being transmitted.
Users are shared by all databases, but not every user can connect to every database; pg_hba.conf permits control of this. Password authentication is second-rate among the available options: GSSAPI, PAM, RADIUS, LDAP, all the good stuff!
Every object in PostgreSQL has an owner (the role that created it). The owner can do anything with the object; other users must be granted access. However, only the owner can ALTER or DROP it, regardless of GRANTs. Permissions are stored on the object.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+------------+------------+-----------------------
postgres | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
template0 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +
| | | | | postgres=CTc/postgres
PostgreSQL’s permission system is inspired by Unix file system permissions.
To that end, a schema is like a directory (a place where a file, i.e. a table, could go). No nested schemas. Each database object is always in some schema.
cutecats=# \dn+ myapp
List of schemas
Name | Owner | Access privileges | Description
-------+----------+-------------------+-------------
myapp | postgres | |
(1 row)
Nothing shown in Access privileges means it has the defaults. Sad.
Schema privs:
GRANT USAGE ON SCHEMA myapp TO plop;
cutecats=# \dn+ myapp
List of schemas
Name | Owner | Access privileges | Description
-------+----------+----------------------+-------------
myapp | postgres | postgres=UC/postgres+|
| | plop=U/postgres |
(1 row)
After the slash is who granted the privilege, the bit before the = is who got the privilege, and the bit between = and / is the privilege itself:
plop=U/postgres
The user postgres granted the user plop the U (USAGE) privilege.
Now, let’s look at the public schema:
cutecats=# \dn+ public
List of schemas
Name | Owner | Access privileges | Description
--------+----------+----------------------+------------------------
public | postgres | postgres=UC/postgres+| standard public schema
| | =UC/postgres |
(1 row)
The entry with nothing before the = refers to PUBLIC: by default every user has USAGE and CREATE on the public schema.
One of the first things you do after creating a database should be to change this:
cutecats=# REVOKE CREATE ON SCHEMA public FROM PUBLIC;
REVOKE
cutecats=# \dn+ public
List of schemas
Name | Owner | Access privileges | Description
--------+----------+----------------------+------------------------
public | postgres | postgres=UC/postgres+| standard public schema
| | =U/postgres |
(1 row)
Do the same thing on template1 so new DBs won’t have the bad lame public schema.
\c template1
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
Database Cluster –> Databases –> Schema
Each database has its own schemas.
What MySQL calls a database is a schema elsewhere. WHAT?
13:58 Break! 14:11 Back!
There are also system schemas. View with \dnS.
pg_catalog –> preexisting system types, operators, system functions, and catalog tables all live here.
pg_toast –> TOAST tables for overlength attributes.
pg_temp_2 and pg_toast_temp_2 –> implementation details.
information_schema –> interesting for the developer: a database-independent, standard-conforming way to access database metadata.
Use this, not that:
THAT: template1=# SELECT * FROM pg_attribute \gx
THIS: template1=# SELECT * FROM information_schema.columns;
How do unqualified names end up in the public schema? The search_path parameter.
template1=# SHOW search_path;
-[ RECORD 1 ]----------------
search_path | "$user", public
Time: 0.283 ms
Relative path: SELECT * FROM accounts; Absolute path: SELECT * FROM public.accounts;
If you do SELECT * FROM accounts;, it will use the first matching table in a schema on the search_path.
Available options when querying (sketch of the second below):
Always schema-qualify names, so the search_path never gets in your way.
Use unqualified names and make sure the search_path is right.
This is answering a lot of questions! The most useful day so far.
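A sketch of the second option, pinning the path for a role (names reuse earlier examples; the exact value is an assumption):
ALTER ROLE plop SET search_path = myapp, public;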
GRANT { { SELECT | INSERT | UPDATE | DELETE | TRUNCATE | REFERENCES | TRIGGER }
REFERENCES –> allows someone to create a foreign key that points to your table.
TRIGGER –> don’t use; it lets someone else create a trigger on your table.
We have a two-user architecture.
A three-user architecture:
This three-user architecture would make migrations more difficult.
To achieve row-level security, a policy must be defined.
ALTER TABLE rls ENABLE ROW LEVEL SECURITY;
CREATE POLICY coolstuff ON rls FOR SELECT TO joe USING (id % 2 = 0);
The user joe will only see rows for which the USING () expression above is true; the filter is applied to every query joe runs against the table.

Superusers and the table owner are not subject to row-level security.
There is a BYPASSRLS option when creating a role.
You can also turn row security off:
SHOW row_security;
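A quick sketch of the policy in action (assumes the rls table and the joe role from above, and that joe has SELECT rights):
GRANT SELECT ON rls TO joe;
SET ROLE joe;
SELECT * FROM rls;   -- only rows with an even id come back
RESET ROLE;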
Minor upgrades are easy: always do them. Install new software, restart the database. Happiness! pg_dump is not required (do it anyway, probably).
Due diligence: read the release notes.
The simplest method:
pg_dumpall –> initdb –> psql. This leaves out configuration, remember! (Sketch below.)
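A sketch of that path with made-up ports (old cluster on 5432, new on 5433):
pg_dumpall -p 5432 > all.sql       # dump roles, databases, and data from the old cluster
psql -p 5433 -f all.sql postgres   # restore into the freshly initdb'ed new cluster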
You can just copy pg_hba.conf around between versions. This is not advised with postgresql.conf:
LA recommends comparing old and new, looking the settings up, and configuring the new one. It’s an opportunity to review your config.
Use the newer pg_dump where possible.
Another way of upgrading…
pg_upgrade –> introduced in 8.x
pg_upgrade extracts metadata from the old database (only the catalog tables), then copies the data files as-is. Faster.
It only works if the storage format is the same.
Things that influence how data is stored must also be the same (e.g. data checksums, -k during initdb) –> you would have to use pg_dump and restore in that case.
This approach can still be painfully slow. Enter pg_upgrade -?.
With -k, --link passed to pg_upgrade, it hard links instead of copying files to the new cluster. Up to you whether you want to do this, but with this approach you don’t have a known-good old DB to go back to – the old server cannot be started. Falling back from a pg_upgrade --link upgrade requires recovering from a restore or failing over to a replica or something.
-k, --link link instead of copying files to new cluster
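A sketch of a full invocation, with Debian-style paths that are assumptions here:
pg_upgrade \
  -b /usr/lib/postgresql/13/bin  -B /usr/lib/postgresql/14/bin \
  -d /var/lib/postgresql/13/main -D /var/lib/postgresql/14/main \
  --link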

The statistics are not moved over. Instead, you get an analyze_new_cluster.sh script to help guide the process for you.
So, pg_upgrade is super fast (never more than 2-3 minutes, he says, but huh???).
Anyway, say you don’t even want those 2-3 minutes:
Logical replication –> ships logical information to the standby, reverse engineering the WAL to figure out the right thing to do. Usually for replicating part of a database. Useful if physical differences are OK. Unlike streaming replication, it works across PostgreSQL versions.
This option reduces downtime as much as possible, but at the cost of simplicity.
smart shutdown mode –> rarely used?