Disk space usage in a PostgreSQL cluster

The following PostgreSQL data is stored on the cluster disks:

  • temporary files;
  • data files;
  • transaction logs (WALs);
  • logical replication slots;
  • DBMS logs;
  • system files required for PostgreSQL to function.

You can monitor disk usage with metrics. Learn more about metrics in the Monitoring Cluster, Nodes, and Databases PostgreSQL instructions.

When the amount of used disk space grows, you can check:

  • the size of temporary files;
  • the consumers of logical replication slots;
  • the number of "dead" tuples.

If the cluster disk is 80% full, a notification appears in the dashboard and is sent by email to the Account Owner and to the users subscribed to the "Services and Services" notification category.

If the cluster disk is 95% full or more, the cluster switches to the DISK_FULL status and becomes read-only. To make the cluster readable and writable again, free up disk space or scale the cluster and select a configuration with a larger disk.
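
To see which databases take up the most space, you can run a query like the following (a minimal sketch using the standard pg_database_size function; database names will differ in your cluster):

```sql
-- List databases with their on-disk size, largest first.
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
```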

View the size of temporary files

Temporary files can be used for sorting, hashing, and temporarily storing query results. To see the total size of temporary files in a database, run a SQL query against the pg_stat_database view:

SELECT datname, temp_files, temp_bytes
FROM pg_stat_database;

Example output:

datname  | temp_files | temp_bytes
---------+------------+-----------
mydb     |          2 |      16384
postgres |          1 |       8192

Here:

  • datname — database name;
  • temp_files — number of temporary files in this database;
  • temp_bytes — size of temporary files in bytes.

The temp_files and temp_bytes fields account for all temporary files created since the cluster was created. The data is only reset after a restore from a backup or after a crash. Use the values of these fields to track changes in the total size of temporary files.

The amount of temporary file space used by a particular query can be obtained with the EXPLAIN ANALYZE command.
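
For example (a sketch; big_table and some_column are hypothetical names), a sort that does not fit into work_mem spills to disk and shows up in the query plan:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM big_table ORDER BY some_column;
-- In the output, look for a line such as:
--   Sort Method: external merge  Disk: 102400kB
-- which shows how much temporary file space the sort used.
```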

Check consumers of logical replication slots

Logical replication slots are used to continuously replicate data from one database to another. A logical replication slot must always have a consumer. If a slot has no consumer, the WAL files retained for it accumulate and grow in size.

Learn more about managing logical replication slots in the PostgreSQL Subscription documentation article.

To check whether logical replication slots have consumers, run a SQL query against the pg_replication_slots view:

SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS replicationslotlag,
       active
FROM pg_replication_slots;

Example output:

slot_name | replicationslotlag | active
----------+--------------------+-------
myslot1   | 129 GB             | f
myslot2   | 704 MB             | t
myslot3   | 624 MB             | t

Here:

  • slot_name — name of the logical replication slot;
  • replicationslotlag — the size of the WAL files that are retained for the slot's consumer and will not be automatically deleted at checkpoints;
  • active — boolean value indicates whether the logical replication slot is in use:
    • f — slot does not have a consumer;
    • t — slot has a consumer.

If you have PostgreSQL version 13 or higher, you can limit the maximum size of the WAL files retained by a slot using the max_slot_wal_keep_size parameter. Note that if you use this parameter, the transaction log may be deleted before the consumer reads the changes from the logical replication slot.
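
If a slot has no consumer and is no longer needed, you can remove it so that the WAL files retained for it can be deleted. A minimal sketch, using the inactive slot myslot1 from the example output above:

```sql
-- Dropping the slot releases the WAL files retained for it.
SELECT pg_drop_replication_slot('myslot1');
```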

"Dead" motorcades

When you update rows in a table (UPDATE) or delete them (DELETE), the tuples are not actually removed from disk; instead, new versions are created. The old versions are called "dead" tuples. This versioning is required to implement MVCC (Multi-Version Concurrency Control): even though rows are changed in one transaction, other active transactions may still need to see the old versions of those rows. "Dead" tuples can only be removed once all active transactions that may need them are closed.

"Dead" tuples will also be formed on the replica if you are using logical replication.

Check for "dead" tuples

If there are many "dead" tuples, they can take up a significant amount of disk space. To check the number of "dead" tuples, you can use the pgstattuple PostgreSQL extension or the pg_stat_all_tables view.

An example SQL query to the pg_stat_all_tables view:

SELECT * FROM pg_stat_all_tables WHERE relname='test';

Example output:

-[ RECORD 1 ]----------+------------------------------
relid               | 16395
schemaname          | public
relname             | test
seq_scan            | 3
seq_tup_read        | 5280041
idx_scan            |
idx_tup_fetch       |
n_tup_ins           | 2000000
n_tup_upd           | 0
n_tup_del           | 3639911
n_tup_hot_upd       | 0
n_live_tup          | 1635941
n_dead_tup          | 1999952
n_mod_since_analyze | 3999952
last_vacuum         |
last_autovacuum     | 2023-02-16 04:49:52.399546+00
last_analyze        | 2023-02-09 09:44:56.208889+00
last_autoanalyze    | 2023-02-16 04:50:22.581935+00
vacuum_count        | 0
autovacuum_count    | 1
analyze_count       | 1
autoanalyze_count   | 1

Here n_dead_tup is the number of "dead" tuples.
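
The pgstattuple extension reports the same information with byte-level detail. A sketch for the test table from the example above (assumes the pgstattuple extension is available in the cluster):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Count and total size of "dead" tuples in the table.
SELECT dead_tuple_count,
       pg_size_pretty(dead_tuple_len) AS dead_tuple_size
FROM pgstattuple('test');
```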

Comparison of methods for removing "dead" tuples

To remove "dead" tuples, you can use the standard VACUUM, VACUUM FULL commands or repack tables and indexes using the pg_repack extension.

                           | VACUUM | VACUUM FULL | pg_repack
---------------------------+--------+-------------+------------------------------------------------
Clears "dead" tuples       | yes    | yes         | yes
Reduces file size          | no     | yes         | yes
Completely locks the table | no     | yes         | only for two short periods at the beginning and
                           |        |             | end of the repack

Learn more about the VACUUM and VACUUM FULL commands in the Routine Vacuuming article of the PostgreSQL documentation.
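
As a hedged illustration, both standard commands can be run directly against a table (here the test table from the earlier example):

```sql
-- Marks "dead" tuples as reusable space; does not shrink the data files.
VACUUM (VERBOSE) test;

-- Rewrites the table into new, compacted files; takes an exclusive lock.
VACUUM FULL test;
```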

Learn more about the pg_repack extension and its functions in the pg_repack documentation.

Repack tables and indexes with pg_repack

For your information

Run pg_repack during periods of minimal load on the cluster because pg_repack creates additional load. For more information on monitoring cluster health, see the Monitoring Cluster, Nodes, and Databases PostgreSQL instruction.

The pg_repack extension uses a client of the same name. The client is installed on the host from which you connect to the PostgreSQL cluster.

  1. Add the extension pg_repack to the database.

  2. Determine the version of the pg_repack extension:

    2.1. Connect to the cluster. When connecting, specify the name of the database to which you added the pg_repack extension.

    2.2. Determine the version of the extension:

    SELECT
    extname AS extension,
    extversion AS version,
    extnamespace::regnamespace AS schema
    FROM pg_extension
    WHERE extname = 'pg_repack';
  3. Download and install the pg_repack client of the same version as the extension.

  4. Make sure there is enough free disk space. A complete table repack requires free disk space approximately twice the size of the tables and indexes being processed. For example, if the total size of the tables and indexes being processed is 1 GB, an additional 2 GB of disk space will be required.

  5. Make sure that the table being processed has a primary key (PRIMARY KEY) or a unique index (UNIQUE INDEX).
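
     You can verify this with a catalog query (a sketch; replace test with your table name):

     ```sql
     -- List primary-key and unique indexes on the table.
     SELECT indexrelid::regclass AS index_name,
            indisprimary,
            indisunique
     FROM pg_index
     WHERE indrelid = 'test'::regclass
       AND (indisprimary OR indisunique);
     ```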

  6. Repack the tables and indexes in the database:

    pg_repack -k -h <host> -p <port> \
    -U <user> \
    -d <database_name> \
    -t <table_name> \
    -i <index_name>

    Specify:

    • <host> — DNS address of the node;
    • <port> — port for connection;
    • <user> — database user name;
    • <database_name> — database name;
    • optional: -t <table_name>, where <table_name> is the table name. Use this parameter to repack a single table. To repack multiple tables, specify one -t parameter for each table;
    • optional: -i <index_name>, where <index_name> is the index name. Use this parameter to repack a single index. To repack multiple indexes, specify one -i parameter for each index. If the index belongs to a table you have already specified with the -t parameter, you do not need to list it separately with -i: the index is repacked automatically together with the table.

    If you do not specify the -t and -i parameters, pg_repack will repack all tables and indexes in the specified database.

Clear the disk

For your information

We do not recommend using the query DELETE FROM table WHERE ... to free up disk space. On large tables, this query can generate a large volume of data on disk (old row versions and temporary files). The remaining free disk space may run out completely, leading to problems with PostgreSQL and the need to restore its operation manually.

Open a transaction with transaction_read_only = no and remove the unnecessary data using one of the following queries:

  • DROP TABLE — deletes the table entirely: its data and structure (privileges, indexes, constraints, triggers). Use it to completely delete a table together with its data and structure:

    BEGIN;
    SET transaction_read_only = no;
    DROP TABLE table_name;
    COMMIT;
  • TRUNCATE TABLE — deletes the contents of the table while preserving its structure. It is faster than deleting all rows with DELETE. Use it to delete all rows of a table while keeping the table structure:

    BEGIN;
    SET transaction_read_only = no;
    TRUNCATE TABLE table_name;
    COMMIT;
  • DELETE — use it to delete specific rows.
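
    A sketch of such a transaction (table_name and <condition> are placeholders); note that DELETE only marks the rows as "dead", so run VACUUM afterwards so the space can be reclaimed:

    BEGIN;
    SET transaction_read_only = no;
    DELETE FROM table_name WHERE <condition>;
    COMMIT;

    VACUUM table_name;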