Monitoring of PostgreSQL cluster, nodes, and databases

In Managed Databases PostgreSQL you can monitor the state of the cluster.

To assess the overall state of the cluster, check its status.

For a more detailed analysis, some metrics can be viewed as charts in the control panel:

- cluster node metrics;
- database metrics;
- connection pooler metrics.

A complete set of available metrics can be exported in Prometheus format.

When analyzing charts, keep in mind that the time in the control panel corresponds to the time zone of your device and does not depend on the region where the cluster is located.

:::note
For example, you have created a cluster in Tashkent, in the uz-1 pool. Tashkent is in the UTC+5 time zone. On the device from which you logged into the control panel, the UTC+3 time zone is set. The time on the metrics charts will be displayed in UTC+3.
:::
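
The conversion described in the note can be reproduced with a short sketch. It assumes fixed UTC offsets (UTC+5 for the cluster region, UTC+3 for the device) and a made-up event time; the function name is illustrative and is not part of the control panel.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets taken from the note above (assumption: no daylight saving shifts).
CLUSTER_TZ = timezone(timedelta(hours=5))  # Tashkent, UTC+5
DEVICE_TZ = timezone(timedelta(hours=3))   # time zone set on the device, UTC+3


def to_device_time(cluster_local: datetime) -> datetime:
    """Convert a naive cluster-local timestamp to the time shown on the device."""
    return cluster_local.replace(tzinfo=CLUSTER_TZ).astimezone(DEVICE_TZ)


# Hypothetical event: 14:00 local time in the cluster region.
event = datetime(2024, 1, 15, 14, 0)
shown = to_device_time(event)
print(shown.strftime("%H:%M"))  # 12:00, the time displayed on the charts
```

An event that happened at 14:00 in the cluster region therefore appears at 12:00 on the charts.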

View cluster status

1. In the Dashboard, on the top menu, click Products and select Managed Databases.
2. Open the Active tab.
3. Check the status in the cluster row.

ACTIVE	Cluster is available
CREATING	Cluster is being created
UPDATING	Cluster is being updated
RESIZING	Cluster is being scaled
ERROR	An error has occurred; create a ticket
DISK FULL	Disk is full; the cluster is in read-only mode. For the cluster to work in read-write mode, clean up the disk or scale the cluster and select a configuration with a larger disk size
DEGRADED	Some cluster nodes are unavailable
DELETING	Cluster is being deleted
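
If you process statuses in your own tooling, the table above can be kept as a lookup. The sketch below is only an illustration: the grouping of statuses into "needs attention" is an assumption made here, not something the service defines, and the function name is hypothetical.

```python
# Statuses and descriptions copied from the table above.
CLUSTER_STATUSES = {
    "ACTIVE": "Cluster is available",
    "CREATING": "Cluster is being created",
    "UPDATING": "Cluster is being updated",
    "RESIZING": "Cluster is being scaled",
    "ERROR": "An error has occurred; create a ticket",
    "DISK FULL": "Disk is full; the cluster is in read-only mode",
    "DEGRADED": "Some cluster nodes are unavailable",
    "DELETING": "Cluster is being deleted",
}

# Assumption: these statuses call for action from the user.
NEEDS_ATTENTION = {"ERROR", "DISK FULL", "DEGRADED"}


def describe(status: str) -> str:
    """Return a one-line summary for a status string."""
    text = CLUSTER_STATUSES.get(status, "Unknown status")
    flag = " (needs attention)" if status in NEEDS_ATTENTION else ""
    return f"{status}: {text}{flag}"


print(describe("ACTIVE"))
print(describe("DISK FULL"))
```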

View cluster node status

1. In the Dashboard, on the top menu, click Products and select Managed Databases.
2. Open the Active tab.
3. Open the cluster page → Monitoring tab.
4. In the Cluster monitoring block, click Cluster nodes.
5. Select the nodes whose metrics you want to view.
6. View the available cluster node metrics.

Cluster node metrics in the control panel

Memory	Occupied memory excluding operating system cache and buffers, as a percentage or in gigabytes
vCPU	Percentage of cluster node core utilization
CPU iowait	Percentage of time the processor spent waiting for input/output
Volume	Used disk space in percent or gigabytes. It accounts for the part of disk space reserved for service needs and unavailable for hosting databases. For more information about disk space reservation, see the instructions Using disk space in cluster PostgreSQL
Load Average	Average system load over a period of time. Shows how many processes are being handled by the cluster cores. The indicator is shown as three values: for 1, 5, and 15 minutes. These values should not exceed the number of cores on the node
OOM	Number of processes that failed with an `Out of Memory` error due to insufficient RAM
Disk load	Data read and write speed in KB/s or the number of read and write operations per second
Network load	Number of bits or packets sent and received via the network interface
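
The Load Average rule from the table (the values should not exceed the number of cores on the node) can be checked as follows. This is a minimal sketch; the function name and the sample readings are hypothetical.

```python
def overloaded_windows(load1: float, load5: float, load15: float, cores: int) -> list[int]:
    """Return the load-average windows (in minutes) whose value exceeds the core count."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    windows = {1: load1, 5: load5, 15: load15}
    return [minutes for minutes, load in windows.items() if load > cores]


# Hypothetical readings for a node with 4 cores.
print(overloaded_windows(5.2, 3.9, 2.1, cores=4))  # [1]
```

A short spike shows up only in the 1-minute value; if the 15-minute value is also above the core count, the node has been overloaded for a sustained period.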

View database status

1. In the Dashboard, on the top menu, click Products and select Managed Databases.
2. Open the Active tab.
3. Open the cluster page → Monitoring tab.
4. In the Cluster monitoring block, click Databases.
5. Select the nodes whose metrics you want to view.
6. View the available database metrics.

Database metrics in the control panel

Statistics file size	Total size of the statistics file in kilobytes
Cache hit	Percentage of data in the query that was read from the cache — the ratio of `blks_hit` to the sum of `blks_hit` and `blks_read`
Row operations	Number of rows affected by operations in the selected database per second: `tup_deleted` is the number of rows deleted by operations per second; `tup_fetched` is the number of rows extracted by operations per second; `tup_inserted` is the number of rows inserted by operations per second; `tup_returned` is the number of rows returned by operations per second; `tup_updated` is the number of rows updated by operations per second
Locks	Number of locks in each database of the cluster
Deadlocks	Number of deadlocks in each database
Transactions	Number of transactions per second in each database of the cluster
Connections	Number of connections to each database of the cluster and the total number of connections to all databases
Temporary file size	Total size of temporary files in kilobytes
WAL file size	Total size of WAL files in megabytes
Execution time of the longest query	Execution time of the longest query in each database of the cluster over a period of time
Database size	Total size of the selected database in megabytes
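
The Cache hit formula from the table, the ratio of `blks_hit` to the sum of `blks_hit` and `blks_read`, can be written out as a short function. The handling of the case with no block requests is an assumption made for this sketch, and the sample counters are made up.

```python
def cache_hit_percent(blks_hit: int, blks_read: int) -> float:
    """Cache hit as a percentage: blks_hit / (blks_hit + blks_read) * 100."""
    total = blks_hit + blks_read
    if total == 0:
        # Assumption: report 0 when no blocks were requested at all.
        return 0.0
    return 100.0 * blks_hit / total


# Hypothetical counters: 9500 blocks served from cache, 500 read from disk.
print(cache_hit_percent(9_500, 500))  # 95.0
```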

View connection pooler status

1. In the Dashboard, on the top menu, click Products and select Managed Databases.
2. Open the Active tab.
3. Open the cluster page → Monitoring tab.
4. In the Cluster monitoring block, click Connection pooler.
5. Select the nodes whose metrics you want to view.
6. View the available connection pooler metrics.

Connection pooler metrics in the control panel

Maximum client wait time in queue	Maximum client wait time in queue in the selected database in seconds
Wait time for response from server	Wait time for response from node in the selected database in seconds
Active server connections	Number of server connections associated with clients in the selected database
Client connections to the pool	Number of client connections to the pool in the selected database: `pools_client_active_connections` is the number of client connections associated with server connections or idle without requests; `pools_client_waiting_connections` is the number of client connections where a request has been sent but there is no connection to the node yet
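
One way to read the pooler metrics together is to treat waiting clients combined with a long maximum wait as a sign that requests are queueing. The sketch below assumes a hypothetical one-second limit and hypothetical field names; pick a limit that fits your workload.

```python
from dataclasses import dataclass


@dataclass
class PoolSample:
    """One reading of the connection pooler metrics for a database."""
    client_active: int        # pools_client_active_connections
    client_waiting: int       # pools_client_waiting_connections
    max_wait_seconds: float   # maximum client wait time in queue


def is_queueing(sample: PoolSample, max_wait_limit: float = 1.0) -> bool:
    """True if clients are waiting and the longest wait exceeds the limit."""
    return sample.client_waiting > 0 and sample.max_wait_seconds > max_wait_limit


# Hypothetical readings.
print(is_queueing(PoolSample(client_active=40, client_waiting=6, max_wait_seconds=2.5)))
print(is_queueing(PoolSample(client_active=12, client_waiting=0, max_wait_seconds=0.0)))
```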

Export metrics in Prometheus format

Historical information for clusters is not available — metrics are requested only in real time. The list of all metrics that are supported in Managed Databases and their description can be viewed in the Metrics in Prometheus format table.

1. Get token.
2. Get metrics in Prometheus format.

1. Get token

The token provides access to metrics for all clusters in a project within a single pool.

1. In the Dashboard, on the top menu, click Products and select Managed Databases.
2. Open the Active tab.
3. Open the cluster page → Monitoring tab.
4. In the Prometheus Tokens block, click Create Token. The token will be generated automatically.
5. Copy the token. To do this, click in the token line.

2. Get metrics in Prometheus format


Add to the Prometheus configuration file:
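```
scrape_configs:
  - job_name: get-metrics-from-dbaas
    # Request metrics once a minute
    scrape_interval: 1m
    static_configs:
      - targets:
        # Managed Databases API domain, without https:// and /v1
        - '<domain>'
    scheme: https
    # The token is sent in the Authorization header as a Bearer token
    authorization:
      type: Bearer
      credentials: <token>
```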
Specify:
- `<domain>`: the Managed Databases API domain. This is the part of the URL used to access the API, without `https://` and `/v1`, for example `uz-1.dbaas.api.servercore.com`. The URL depends on the region and pool; you can check it in the list of URLs;
- `<token>`: the token you copied in step 5 when getting the token.
In your browser, open the page where the metrics in Prometheus format will be available:
```
http://<ip_address>:9090/targets
```
Specify `<ip_address>`: the IP address of the host where Prometheus is installed.
Configure monitoring and alerts for database clusters yourself.
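
To check the token outside Prometheus, the scrape settings above can be mirrored in a plain HTTPS request. This is a sketch under one assumption: the configuration does not set `metrics_path`, so Prometheus uses its default path `/metrics`, and the sketch does the same. The function name is hypothetical, and the token value is a placeholder.

```python
import urllib.request


def build_metrics_request(domain: str, token: str) -> urllib.request.Request:
    """Build a request equivalent to the scrape configuration above."""
    # Assumption: the default Prometheus metrics_path (/metrics) applies.
    url = f"https://{domain}/metrics"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder values; substitute your own domain and token.
request = build_metrics_request("uz-1.dbaas.api.servercore.com", "<token>")
print(request.full_url)

# To send the request and read the response body:
# body = urllib.request.urlopen(request, timeout=10).read().decode()
```

The request is only built here, not sent, so the sketch runs without network access or a valid token.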

Metrics in Prometheus format

Metrics in Prometheus format are provided for all clusters. A specific cluster can be found by the database cluster ID in the `ds_id` label.
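
Selecting one cluster by its `ds_id` label could look like the sketch below. It parses sample lines of the Prometheus text format with a simplified regular expression, which is enough for illustration but is not a full parser. The function name, cluster IDs, and values are made up.

```python
import re

# One sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def samples_for_cluster(text: str, ds_id: str):
    """Yield (metric name, labels, value) for samples whose ds_id label matches."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        if labels.get("ds_id") == ds_id:
            yield match.group("name"), labels, float(match.group("value"))


# Hypothetical scrape output.
sample_text = """\
# TYPE dbaas_cpu gauge
dbaas_cpu{ds_id="aaa-111",ds_name="prod"} 12.5
dbaas_cpu{ds_id="bbb-222",ds_name="stage"} 3.0
"""
rows = list(samples_for_cluster(sample_text, "aaa-111"))
print(rows)
```

In Prometheus itself the same selection is a label matcher in the query, for example `dbaas_cpu{ds_id="aaa-111"}`.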

The first table below lists infrastructure level metrics; the second lists application level metrics.

dbaas_memory_percent	Occupied memory excluding operating system cache and buffers (RAM) as a percentage
dbaas_memory_bytes	Occupied memory excluding operating system cache and buffers (RAM) in bytes
dbaas_oom_count	Number of processes that failed with an `Out of Memory` error due to insufficient RAM
dbaas_cpu	vCPU utilization on database cluster nodes as a percentage
dbaas_cpu_iowait	Input/output wait time as a percentage
dbaas_disk_percent	Used disk space in percent. It accounts for the part of disk space reserved for service needs and unavailable for hosting databases. For more information about disk space reservation, see the instructions Using disk space in cluster PostgreSQL
dbaas_disk_bytes	Used disk space in bytes. It accounts for the part of disk space reserved for service needs and unavailable for hosting databases. For more information about disk space reservation, see the instructions Using disk space in cluster PostgreSQL
dbaas_disk_read_iops	Number of read operations per second
dbaas_disk_write_iops	Number of write operations per second
dbaas_disk_read_bytes	Disk read speed in bytes per second
dbaas_disk_write_bytes	Disk write speed in bytes per second
dbaas_node_load1	Average system load over one minute. Shows how many processes are being handled by the cluster cores
dbaas_node_load5	Average system load over five minutes. Shows how many processes are being handled by the cluster cores
dbaas_node_load15	Average system load over 15 minutes. Shows how many processes are being handled by the cluster cores
dbaas_network_receive_bytes	Number of bytes received via the network interface
dbaas_network_transmit_bytes	Number of bytes sent via the network interface
dbaas_network_receive_packets	Number of packets received via the network interface per second
dbaas_network_transmit_packets	Number of packets sent via the network interface per second
dbaas_role	Node role: `0` means the role is unknown; `1` means master; `2` means replica
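
The numeric values of `dbaas_role` can be decoded in client code, for example when building a dashboard label. This is a minimal sketch; the enum and function names are hypothetical, and mapping unexpected values to "unknown" is an assumption.

```python
from enum import IntEnum


class NodeRole(IntEnum):
    """Values of the dbaas_role metric, as listed in the table above."""
    UNKNOWN = 0
    MASTER = 1
    REPLICA = 2


def decode_role(value: float) -> NodeRole:
    """Map a dbaas_role sample value to a role; unexpected values become UNKNOWN."""
    try:
        return NodeRole(int(value))
    except (ValueError, OverflowError):
        return NodeRole.UNKNOWN


print(decode_role(1.0).name)  # MASTER
print(decode_role(2.0).name)  # REPLICA
```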

dbaas_connections	Number of active connections to the PostgreSQL process. For example, you can use the labels `ds_name` (database cluster name) and `datname` (database name).
dbaas_total_connections	Total number of established connections to the PostgreSQL process
dbaas_max_tx_duration	Execution time of the longest query in seconds
dbaas_xact_commit_rollback	Number of transactions per second in each cluster database. For example, you can use the labels `ds_name` (database cluster name) and `datname` (database name).
dbaas_tup_deleted	Number of rows deleted by queries in the database per second
dbaas_tup_fetched	Number of rows extracted by queries in the database per second
dbaas_tup_inserted	Number of rows inserted by queries in the database per second
dbaas_tup_returned	Number of rows returned by queries in the database per second
dbaas_tup_updated	Number of rows changed by queries in the database per second
dbaas_xact_commit	Number of committed transactions per second in the database
dbaas_xact_rollback	Number of transactions per second in the database for which a rollback was performed
dbaas_cache_hit_ratio	Percentage of data in the query that was read from the cache — the ratio of `blks_hit` to the sum of `blks_hit` and `blks_read`
dbaas_deadlocks	Number of deadlocks per second in each database. For example, you can use the labels `ds_name` (database cluster name) and `datname` (database name).
dbaas_locks	Number of locks per second in each database of the cluster. For example, you can use the labels `ds_name` (database cluster name) and `datname` (database name).
dbaas_pg_pgss_query_texts_size_bytes	Size of the statistics file from `pg_stat_statements` in bytes
dbaas_pg_total_wals_size_bytes	Size of the directory with WAL files in bytes
dbaas_pg_tmp_size_bytes	Temporary file size of PostgreSQL in bytes
dbaas_databases_size_bytes	Total database size in bytes. For example, you can use the label `datname` — database name
dbaas_pg_trx_max_age	Number of transactions performed after the last freeze using the VACUUM FREEZE or AUTOVACUUM operation
dbaas_pg_trx_percent_before_vacuum_freeze	Shows how close the age of the earliest transaction in the database is to the threshold after which PostgreSQL forcibly starts the VACUUM FREEZE operation. The threshold is determined by the `autovacuum_freeze_max_age` parameter
dbaas_pg_trx_percent_before_wraparound_risk	Shows how close the age of the earliest transaction in the database is to the threshold after which transaction ID wraparound is possible
dbaas_pgbouncer_pools_client_maxwait_seconds	Maximum client wait time in queue in seconds
dbaas_pgbouncer_pools_client_waiting_connections	Number of client connections where a request has been sent but there is no connection to the node yet
dbaas_pgbouncer_stats_client_wait_seconds_total	Wait time for response from node in microseconds
dbaas_pgbouncer_pools_client_active_connections	Number of client connections associated with server connections or idle without requests. For example, you can use the following labels: `ds_id` — database cluster ID; `data_base` — database name.
dbaas_pgbouncer_pools_server_active_connections	Number of server connections associated with clients
dbaas_pg_replication_slot_active	Replication slot status: `0` means the slot is not in use. The slot has no consumer, so data is not transmitted to the receiving database, accumulates in the replication slot, and occupies additional disk space; `1` means the slot is in use. The slot has a consumer, and data is transmitted to the receiving database
dbaas_pg_replication_slot_lag	Size of accumulated WAL files in megabytes. Shows how much transactional information needs to be processed by the receiving database to catch up to the source database
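
The two replication slot metrics are most useful together: a slot that is not in use and keeps accumulating WAL files takes up disk space. The sketch below flags such slots. The 1024 MB limit, the slot names, and the readings are hypothetical; choose a limit based on your disk size.

```python
def stale_slots(slots: dict[str, tuple[int, float]], lag_limit_mb: float = 1024.0) -> list[str]:
    """Return names of unused slots whose accumulated WAL exceeds the limit.

    Each value is (dbaas_pg_replication_slot_active, dbaas_pg_replication_slot_lag in MB).
    """
    return [
        name
        for name, (active, lag_mb) in slots.items()
        if active == 0 and lag_mb > lag_limit_mb
    ]


# Hypothetical readings for three slots.
readings = {"slot_a": (1, 12.0), "slot_b": (0, 4096.0), "slot_c": (0, 5.0)}
print(stale_slots(readings))  # ['slot_b']
```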