Autoscaling in a Managed Kubernetes cluster

For your information

Autoscaling is unavailable for:

node groups with GPUs without drivers;
node groups on dedicated servers.

In a Managed Kubernetes cluster, you can use Cluster Autoscaler to autoscale node groups. It helps optimize cluster resource utilization—the number of nodes in a group will automatically decrease or increase depending on the cluster load. Cluster Autoscaler is installed automatically when a cluster is created; to start using it, you just need to enable it.

When using Cluster Autoscaler, keep the recommendations in mind.

For pod autoscaling in Managed Kubernetes, Metrics Server is used.

Recommendations

For optimal autoscaling performance, follow these recommendations:

do not use more than one autoscaling tool at the same time;
make sure the project has quotas for vCPU, RAM, GPU, and disk capacity to create the maximum number of nodes in the group;
specify resource requests in the manifests for pods. For more information, see Resource Management for Pods and Containers in the Kubernetes documentation;
configure PodDisruptionBudget for pods that cannot tolerate interruptions. This will help avoid downtime during migration between nodes;
do not manually modify node resources through the control panel. Cluster Autoscaler will not take these changes into account;
check that nodes in the group have the same configuration and labels.
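
As an illustration of the two manifest-related recommendations above, here is a minimal sketch of a workload that declares resource requests and is covered by a PodDisruptionBudget. The names web and web-pdb, the app: web label, the image, and all numeric values are placeholders chosen for this example, not values taken from the service. With three replicas and minAvailable: 2, one pod at a time can still be evicted, so the budget limits disruption without blocking pod migration entirely.

```yaml
# Hypothetical workload: names, labels, image, and values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:stable
          resources:
            # Requests are what the autoscaler analyzes when it evaluates load.
            requests:
              cpu: 250m
              memory: 256Mi
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  # At least two of the three replicas must stay available during evictions.
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```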

Autoscaling using Cluster Autoscaler

Cluster Autoscaler does not need to be installed in a cluster—it is installed automatically when a cluster is created. To use Cluster Autoscaler in a cluster, enable node group autoscaling. After autoscaling is enabled, default settings are used, but you can configure Cluster Autoscaler for each node group.

Operating principle

Cluster Autoscaler works with existing node groups and pre-selected configurations. If a node group is in the ACTIVE status, Cluster Autoscaler checks every 10 seconds for pods in the PENDING status and analyzes the load—pod requests for vCPU, RAM, and GPU. Depending on the check results, nodes are added or deleted. During this process, the node group switches to PENDING_SCALE_UP or PENDING_SCALE_DOWN. The cluster status during autoscaling is ACTIVE. For more information about cluster statuses, see the View cluster status instruction.

The minimum and maximum number of nodes in a group can be set when enabling autoscaling. Cluster Autoscaler changes the number of nodes only within these limits. If at least two working nodes remain in other node groups of the cluster, you can configure autoscaling to zero nodes.

Adding a node

If there are pods in the PENDING status and the cluster lacks free resources to place them, the required number of nodes is added to the cluster. In a cluster with Kubernetes version 1.28 and higher, Cluster Autoscaler will work across several groups at once and distribute nodes evenly.

note

For example, suppose you have two node groups with autoscaling enabled. The cluster load increases, and four nodes need to be added. Two new nodes will be created in each node group at the same time.

In a cluster with Kubernetes version 1.27 or lower, nodes are added one at a time per check cycle.

Deleting a node

If there are no pods in the PENDING status, Cluster Autoscaler checks the amount of resources requested by the pods.

If the requested resource amount for pods on a node is less than 50% of the node's resources, Cluster Autoscaler marks the node as unnecessary. If the resource request amount on the node does not increase within 10 minutes, Cluster Autoscaler checks whether the pods can be migrated to other nodes.

Cluster Autoscaler will not move pods and, consequently, will not delete a node if one of the following conditions is met:

pods use PodDisruptionBudget;
pods in the kube-system namespace do not have a PodDisruptionBudget;
pods are not managed by a controller such as a Deployment, ReplicaSet, or StatefulSet;
pods use local storage;
there are no resources on other nodes for the pod requests;
there is a mismatch of nodeSelector, affinity and anti-affinity rules, or other parameters.

To allow such pods to be moved, add the following annotation to them:

cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
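
The annotation belongs in the pod metadata. For pods created by a controller, that is the metadata of the pod template. A minimal sketch, assuming a hypothetical Deployment named cache whose pods use local storage through an emptyDir volume (the name, labels, image, and mount path are placeholders):

```yaml
# Hypothetical workload: names, labels, image, and paths are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cache
  template:
    metadata:
      labels:
        app: cache
      annotations:
        # Tells Cluster Autoscaler that these pods may be evicted.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      containers:
        - name: cache
          image: redis:7
          volumeMounts:
            - name: scratch
              mountPath: /data
      volumes:
        # Local storage: its contents are lost when the pod is evicted.
        - name: scratch
          emptyDir: {}
```

Use the annotation only for pods whose local data can be safely lost, because the contents of an emptyDir volume do not survive eviction.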

If there are no restrictions, the pods will be migrated, and the underutilized nodes will be removed. Nodes are removed one by one per check cycle.

Autoscaling to zero nodes

In a node group, you can configure autoscaling to zero nodes—all nodes in the group are removed when the load is low. The node group card with all settings is not removed. When the load increases, nodes can be added to this group again.

Autoscaling to zero nodes works only if at least two working nodes remain in other node groups of the cluster. Working nodes must remain in the cluster to host system components necessary for cluster operation.

note

For example, autoscaling to zero nodes will not work if the cluster has:

two node groups with one working node in each;
one node group with two working nodes.

When there are no nodes in a group, you do not pay for unused resources.

Enable autoscaling using Cluster Autoscaler

For your information

If you set the minimum number of nodes in a group higher than the current number, the group will not grow to the lower bound immediately: it scales only after pods appear in the PENDING status. The same applies to the upper bound: if the current number of nodes is greater than the maximum, removal starts only after the pods are checked.

You can enable autoscaling using Cluster Autoscaler in the control panel, via the Managed Kubernetes API or via Terraform.

In the control panel, on the top menu, click Products and select Managed Kubernetes.
Open the cluster page → Cluster Resources tab.
In the menu of the node group, select Change number of nodes.
In the Number of nodes field, open the With autoscaling tab.
Set the minimum and maximum number of nodes in the group—the node count will only change within this range. For fault-tolerant operation of system components, we recommend using at least two working nodes in the cluster. Nodes can be in different groups.
Click Save.

Configure Cluster Autoscaler

You can configure Cluster Autoscaler separately for each node group.

Parameters, their descriptions, and default values can be found in the Cluster Autoscaler parameters table. If you do not specify a parameter in the manifest, the default value will be used.

Example manifest:

apiVersion: v1
kind: ConfigMap
metadata:
    name: cluster-autoscaler-nodegroup-options
    namespace: kube-system
data:
    config.yaml: |
        150da0a9-6ea6-4148-892b-965282e195b0:
          scaleDownUtilizationThreshold: 0.55
          scaleDownUnneededTime: 7m
          zeroOrMaxNodeScaling: true
        e3dc24ca-df9d-429c-bcd5-be85f8d28710:
          scaleDownGpuUtilizationThreshold: 0.25
          ignoreDaemonSetsUtilization: true

Here, 150da0a9-6ea6-4148-892b-965282e195b0 and e3dc24ca-df9d-429c-bcd5-be85f8d28710 are the unique identifiers (UUIDs) of the node groups in the cluster. To view them in the control panel, click Products in the top menu, select Managed Kubernetes, open the Kubernetes section and then the cluster page, and copy the UUID shown above the node group card, next to the pool segment.

Cluster Autoscaler parameters

Parameter	Description	Default value
scaleDownUtilizationThreshold	vCPU and RAM utilization threshold below which the system can delete a node. For example, with a value of `0.5`, a node becomes a candidate for deletion when it uses less than 50% of its vCPU and RAM	0.5
scaleDownGpuUtilizationThreshold	GPU utilization threshold below which the system can delete a node. For example, with a value of `0.5`, a node becomes a candidate for deletion when it uses less than 50% of its GPU	0.5
scaleDownUnneededTime	Waiting time before deleting a low-load node. The system does not delete a node immediately when the node load drops — it waits for the specified time to ensure that the load reduction is stable	10m
scaleDownUnreadyTime	Waiting time before deleting a node in the `NotReady` status. The system does not keep a `NotReady` node in the cluster indefinitely: it waits for the specified time to make sure that the node is stuck and will not recover, and then deletes it	20m
maxNodeProvisionTime	Waiting time for adding a new node. If an error occurs and the node is not added within the specified time, the system will restart the node addition process	15m
zeroOrMaxNodeScaling	Allows the number of nodes to be automatically changed only to zero or the maximum you have set. Useful if you need the system to deploy all nodes in a group at once when load appears, and remove all nodes when there is no load	false
ignoreDaemonSetsUtilization	Whether to ignore DaemonSet pods (system services) when the system decides whether to reduce the number of nodes in a group. If the value is `true`, DaemonSet pods are ignored	false
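
As a further illustration, the sketch below overrides the two timing parameters from the table that the example manifest above does not use. It follows the same ConfigMap layout as that manifest. The UUID 00000000-0000-0000-0000-000000000000 is a placeholder for the UUID of your node group, and the values 10m and 20m are arbitrary examples rather than recommendations.

```yaml
# The UUID below is a placeholder: replace it with the UUID of your node group.
# The values are examples only. Parameters that are omitted keep their defaults.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-nodegroup-options
  namespace: kube-system
data:
  config.yaml: |
    00000000-0000-0000-0000-000000000000:
      scaleDownUnreadyTime: 10m
      maxNodeProvisionTime: 20m
```

With these values, a node that stays in the `NotReady` status would be deleted after 10 minutes instead of 20, and a node that fails to join would be retried after 20 minutes instead of 15.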