Disk space usage in a PostgreSQL cluster
The following PostgreSQL data is stored on the cluster disks:
- temporary files;
- data files;
- transaction logs (WALs);
- logical replication slots;
- DBMS logs;
- system files required for PostgreSQL to function.
Part of the disk space is reserved for service needs:
- for the file system — approximately 4% of the disk volume;
- for the operating system, service components and logs — approximately 8 GB.
The reserved portion of the disk space is not available to host databases. Keep this in mind when selecting a cluster configuration.
You can monitor disk usage with disk fullness notifications and metrics. For more information about metrics, see Monitoring cluster, nodes and databases PostgreSQL.
When the amount of disk space used grows, you can check:
- the size of temporary files;
- the consumers of logical replication slots;
- the number of "dead" tuples.
When the cluster disk is 95% or more full, the cluster switches to the DISK_FULL status and becomes read-only. This prevents the cluster from locking up completely or becoming corrupted due to lack of free space. To return the cluster to read-write mode, clear the disk, or scale the cluster and select a configuration with a larger disk than the previous one.
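To see how much of the used space each database accounts for, you can query the standard pg_database_size function (the output sizes below are illustrative):

```sql
-- Report the on-disk size of every database in the cluster,
-- largest first, in a human-readable format.
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
```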
Disk fullness notifications
Disk fullness notifications are sent to the Account Owner's email and to users subscribed to the "Services and Services" notification category. Notifications are sent when the disk is 80% and 95% full.
View the size of temporary files
Temporary files can be used for sorting, hashing, and temporarily storing query results. To see the total size of temporary files per database, run an SQL query against the pg_stat_database view:
SELECT datname, temp_files AS "Temporary files", temp_bytes AS "Size of temporary files"
FROM pg_stat_database;
Example output:
 datname  | temp_files | temp_bytes
----------+------------+------------
 mydb     |          2 |      16384
 postgres |          1 |       8192
Here:
- datname — database name;
- temp_files — number of temporary files in this database;
- temp_bytes — size of temporary files in bytes.
The temp_files and temp_bytes counters accumulate over the lifetime of the cluster and are reset only after restoring from a backup or after a crash. Use these fields to track changes in the total size of temporary files over time.
The size of the temporary files used by a particular query can be obtained using the EXPLAIN ANALYZE command.
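For example, a sort that does not fit in memory spills to temporary files, and EXPLAIN ANALYZE reports their size in the plan. The table and column names below are hypothetical:

```sql
-- Run the query and show the actual execution plan, including any
-- temporary files the executor had to write to disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test ORDER BY col1;

-- Look for a line like the following in the plan output:
--   Sort Method: external merge  Disk: 11224kB
-- "external merge" means the sort spilled to temporary files of the
-- reported size; an in-memory sort shows "quicksort" instead.
```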
Check consumers of logical replication slots
Logical replication slots are used to continuously replicate data from one database to another. A logical replication slot must always have a consumer. If there is no consumer, the WAL files retained for the slot will keep growing.
Learn more about managing logical replication slots in the PostgreSQL Subscription documentation article.
To check if logical replication slots have consumers, submit a SQL query to the pg_replication_slots view:
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replicationslotlag,
       active
FROM pg_replication_slots;
Example output:
slot_name | replicationslotlag | active
-----------------+--------------------+--------
myslot1 | 129 GB | f
myslot2 | 704 MB | t
myslot3 | 624 MB | t
Here:
- slot_name — name of the logical replication slot;
- replicationslotlag — the size of WAL files that will not be automatically deleted at checkpoints and that the consumer of the logical replication slot can still read;
- active — boolean value indicating whether the logical replication slot is in use: f — the slot has no consumer; t — the slot has a consumer.
Starting with PostgreSQL 13, you can limit the maximum size of WAL files retained by slots using the max_slot_wal_keep_size parameter. Note that with this parameter set, WAL segments may be deleted before the consumer reads the changes from the logical replication slot, which invalidates the slot.
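If a slot is inactive and you are certain it is no longer needed, dropping it releases the retained WAL. A minimal sketch, assuming the slot myslot1 from the example output above has no consumer (active = f):

```sql
-- Drop an unused logical replication slot so its retained WAL
-- can be removed at the next checkpoint. This is irreversible:
-- the consumer will not be able to resume from this slot.
SELECT pg_drop_replication_slot('myslot1');
```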
"Dead" motorcades
When you update rows in a table (UPDATE) or delete them (DELETE), the tuples are not physically removed from disk; instead, new versions of them are created. The old versions are called "dead" tuples. This versioning implements MVCC (Multi-Version Concurrency Control): even after rows have been changed in one transaction, other active transactions may continue to see the old versions. "Dead" tuples can be deleted only when all transactions that can still see them are closed.
"Dead" tuples will also be formed on the replica if you are using logical replication.
Check for "dead" tuples
If there are many "dead" tuples, they can take up a significant amount of disk space. To check the number of "dead" tuples, you can use the PostgreSQL extension pgstattuple or the pg_stat_all_tables view.
An example SQL query to the pg_stat_all_tables view:
SELECT * FROM pg_stat_all_tables WHERE relname='test';
Example output:
-[ RECORD 1 ]----------+------------------------------
relid | 16395
schemaname | public
relname | test
seq_scan | 3
seq_tup_read | 5280041
idx_scan |
idx_tup_fetch |
n_tup_ins | 2000000
n_tup_upd | 0
n_tup_del | 3639911
n_tup_hot_upd | 0
n_live_tup | 1635941
n_dead_tup | 1999952
n_mod_since_analyze | 3999952
last_vacuum |
last_autovacuum | 2023-02-16 04:49:52.399546+00
last_analyze | 2023-02-09 09:44:56.208889+00
last_autoanalyze | 2023-02-16 04:50:22.581935+00
vacuum_count | 0
autovacuum_count | 1
analyze_count | 1
autoanalyze_count | 1
Here n_dead_tup is the number of "dead" tuples.
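The n_dead_tup value in pg_stat_all_tables is an estimate maintained by the statistics collector. For an exact count and the disk space the "dead" tuples occupy, you can use the pgstattuple extension mentioned above, which scans the table. A sketch against the same test table:

```sql
-- pgstattuple must be added to the database before its functions can be used.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Scan the table and report exact dead-tuple statistics.
SELECT dead_tuple_count,
       pg_size_pretty(dead_tuple_len) AS dead_tuple_size,
       round(dead_tuple_percent::numeric, 2) AS dead_tuple_percent
FROM pgstattuple('test');
```

Note that pgstattuple reads the whole table, so run it during periods of low load on large tables.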
Comparison of methods for removing "dead" tuples
To remove "dead" tuples, you can use the standard VACUUM, VACUUM FULL commands or repack tables and indexes using the pg_repack extension.
Learn more about the VACUUM and VACUUM FULL commands in the Routine Vacuuming article of the PostgreSQL documentation.
Learn more about the pg_repack extension and its functions in the pg_repack documentation.
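To illustrate the difference between the two standard commands, using the test table from the example above:

```sql
-- VACUUM marks dead-tuple space as reusable within the table without
-- blocking reads and writes; the table file usually does not shrink.
VACUUM (VERBOSE) test;

-- VACUUM FULL rewrites the table into a new, compacted file and returns
-- the freed space to the operating system, but it takes an ACCESS
-- EXCLUSIVE lock on the table for the duration of the rewrite.
VACUUM FULL test;
```

Because of the exclusive lock, VACUUM FULL blocks all queries against the table, which is why pg_repack is often preferred on busy clusters.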
Repack tables and indexes with pg_repack
Run pg_repack during periods of minimal load on the cluster, as pg_repack creates additional load. For more information on monitoring cluster state, see the Monitoring cluster, nodes, and databases PostgreSQL instruction.
To work with the pg_repack extension, a client of the same name is used. The client is installed on the host from which you connect to the PostgreSQL cluster.
1. Add the pg_repack extension to the database.

2. Determine the version of the pg_repack extension:

   2.1. Connect to the cluster. When connecting, specify the name of the database to which you added the pg_repack extension.

   2.2. Determine the version of the extension:

   SELECT
       extname AS extension,
       extversion AS version,
       extnamespace::regnamespace AS schema
   FROM pg_extension
   WHERE extname = 'pg_repack';

3. Download and install the pg_repack client of the same version as the extension.

4. Make sure that there is enough free disk space. A complete repack of tables requires free disk space approximately twice the size of the tables and indexes being processed. For example, if the total size of the tables and indexes being processed is 1 GB, an additional 2 GB of disk space is required.

5. Make sure that the table being processed has a primary key (PRIMARY KEY) or a unique index (UNIQUE INDEX).

6. Repack the tables and indexes in the database:

   pg_repack -k -h <host> -p <port> \
   -U <user> \
   -d <database_name> \
   -t <table_name> \
   -i <index_name>

   Specify:
   - <host> — DNS address of the node;
   - <port> — port for connection;
   - <user> — database user name;
   - <database_name> — database name;
   - optional: -t <table_name>, where <table_name> is the table name. Use this parameter if you want to repack a single table. To repack multiple tables, specify one -t parameter per table;
   - optional: -i <index_name>, where <index_name> is the index name. Use this parameter if you want to repack a single index. To repack multiple indexes, specify one -i parameter per index. If the index belongs to a table that you have already specified with the -t parameter, you do not need to specify it separately with -i — the index will be repacked automatically together with the table.

   If you do not specify the -t and -i parameters, pg_repack will repack all tables and indexes in the specified database.
Clear the disk
We do not recommend cleaning up disk space with a DELETE FROM <table> WHERE ... query. On large tables, such a query can generate large intermediate data sets that are written to disk. The remaining free disk space may then run out completely, causing problems with PostgreSQL and requiring manual recovery.
Open a transaction with transaction_read_only = no and remove unnecessary data using one of the following commands:
- DROP TABLE — completely deletes the table: data, structure, indexes, constraints, triggers.

  BEGIN;
  SET transaction_read_only = no;
  DROP TABLE table_name;
  COMMIT;

- TRUNCATE TABLE — removes all rows from the table. Works faster than DELETE.

  BEGIN;
  SET transaction_read_only = no;
  TRUNCATE TABLE table_name;
  COMMIT;

- DELETE — deletes the rows that match the WHERE condition.
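The DELETE variant follows the same transaction pattern as the examples above; the table name and WHERE condition below are hypothetical. Note that DELETE only marks the rows as "dead" tuples, so the disk space is actually reclaimed after a subsequent vacuum:

```sql
BEGIN;
SET transaction_read_only = no;
-- Delete only the rows matching the condition; the space they occupied
-- becomes reusable after VACUUM processes the resulting dead tuples.
DELETE FROM table_name WHERE created_at < '2023-01-01';
COMMIT;
```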