Autoscaling in a Managed Kubernetes cluster

For your information

Autoscaling is unavailable:

  • for node groups with GPU without drivers;
  • for node groups on dedicated servers.

In a Managed Kubernetes cluster, you can use Cluster Autoscaler to autoscale node groups. It helps use cluster resources efficiently: depending on the cluster load, the number of nodes in a group automatically increases or decreases. Cluster Autoscaler is installed automatically when the cluster is created; you only need to enable it.

When using Cluster Autoscaler, keep the recommendations in mind.

For pod autoscaling in Managed Kubernetes, Metrics Server is used.

Recommendations

For optimal autoscaling performance, we recommend the following:

  • do not use more than one autoscaling tool at the same time;
  • make sure the project has quotas for vCPU, RAM, GPU, and disk capacity to create the maximum number of nodes in the group;
  • specify resource requests in the pod manifests. For more information, see the Resource Management for Pods and Containers article in the Kubernetes documentation;
  • configure PodDisruptionBudget for pods that cannot be stopped. This helps avoid downtime when pods are moved between nodes;
  • do not manually modify node resources through the control panel. Cluster Autoscaler will not take these changes into account;
  • check that nodes in the group have the same configuration and labels.
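
As an illustration of the two manifest-related recommendations above, here is a minimal sketch of a Deployment that declares resource requests, together with a PodDisruptionBudget for the same pods. The names, labels, image, request sizes, and replica counts are hypothetical placeholders and are not taken from this documentation; adjust them to your workload.

```yaml
# Hypothetical workload: names, labels, image, and sizes are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: nginx:1.27
          resources:
            # Requests are what Cluster Autoscaler analyzes when it
            # decides whether nodes must be added or can be deleted.
            requests:
              cpu: 500m
              memory: 256Mi
---
# Keeps at least two of the three replicas running while pods
# are moved between nodes.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: example-app
```

With `minAvailable: 2` and three replicas, one pod at a time can still be evicted, so the budget protects availability without blocking node deletion entirely.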

Autoscaling using Cluster Autoscaler

Cluster Autoscaler does not need to be installed in the cluster — it is installed automatically when the cluster is created. To use Cluster Autoscaler in a cluster, enable autoscaling of the node group. After enabling autoscaling, default settings are used, but you can configure Cluster Autoscaler for each node group.

Operating principle

Cluster Autoscaler works with existing node groups and pre-selected configurations. If a node group is in the ACTIVE status, Cluster Autoscaler checks every 10 seconds whether there are pods in the PENDING status and analyzes the load, that is, the pod requests for vCPU, RAM, and GPU. Depending on the check results, nodes are added or deleted. While this happens, the node group status changes to PENDING_SCALE_UP or PENDING_SCALE_DOWN. The cluster status during autoscaling remains ACTIVE. For more about cluster statuses, see the View cluster status instruction.

The minimum and maximum number of nodes in a group can be set when enabling autoscaling — Cluster Autoscaler will only change the number of nodes within these limits. If at least two working nodes remain in other node groups of the cluster, you can configure autoscaling to zero nodes.

Adding a node

If there are pods in the PENDING status and the cluster does not have enough free resources to accommodate them, the required number of nodes is added to the cluster. In a cluster with Kubernetes version 1.28 or higher, Cluster Autoscaler can add nodes to several groups at once and distributes the new nodes evenly between them.

note

For example, you have two node groups with autoscaling enabled. The cluster load has increased and requires four additional nodes. Two new nodes will be created in each node group simultaneously.

In a cluster with Kubernetes version 1.27 or lower, nodes are added one at a time per check cycle.

Deleting a node

If there are no pods in the PENDING status, Cluster Autoscaler checks the amount of resources requested by the pods.

If the amount of resources requested by the pods on a node is less than 50% of the node's resources, Cluster Autoscaler marks the node as unneeded. If the requests on the node do not increase within 10 minutes, Cluster Autoscaler checks whether the pods can be moved to other nodes.
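
A short worked example may make the 50% threshold concrete. The node size and request totals below are hypothetical, and the calculation assumes that the share is evaluated as the sum of pod requests divided by the node's resources, separately for vCPU and RAM, as the text above describes. For a node with 4 vCPU and 16 GB of RAM whose pods request 1.5 vCPU and 6 GB in total:

```latex
u_{\text{vCPU}} = \frac{1.5}{4} = 0.375, \qquad
u_{\text{RAM}} = \frac{6}{16} = 0.375
```

Both values are below 0.5, so under these assumptions the node would be marked as unneeded. If the pods requested 2.5 vCPU instead, the vCPU share would be 0.625 and the node would be kept.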

Cluster Autoscaler will not move pods and, consequently, will not delete a node if one of the following conditions is met:

  • pods are protected by a PodDisruptionBudget;
  • pods in the kube-system namespace have no PodDisruptionBudget;
  • pods are created without a controller — for example, Deployment, ReplicaSet, StatefulSet;
  • pods use local storage;
  • there are no resources on other nodes for the pod requests;
  • there is a mismatch of nodeSelector, affinity and anti-affinity rules, or other parameters.

You can allow such pods to be moved — to do this, add the following annotation:

cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
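
The annotation is set on the pod itself, so for pods managed by a controller it goes into the pod template metadata rather than the metadata of the controller object. The sketch below shows this for a hypothetical Deployment whose pod uses local storage through an emptyDir volume; the names, labels, and image are placeholders.

```yaml
# Hypothetical workload: names, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-cache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-cache
  template:
    metadata:
      labels:
        app: example-cache
      annotations:
        # Placed on the pod template so that every pod created by
        # the Deployment carries the annotation.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      containers:
        - name: cache
          image: redis:7
          volumeMounts:
            - name: scratch
              mountPath: /data
      volumes:
        # Local storage: the data is lost when the pod is moved.
        - name: scratch
          emptyDir: {}
```

Use the annotation only for pods whose local data can safely be lost when they are moved to another node.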

If there are no restrictions, the pods will be moved, and low-load nodes will be deleted. Nodes are deleted one at a time per check cycle.

Autoscaling to zero nodes

In a node group, you can configure autoscaling to zero nodes — at low load, all nodes in the group are deleted. The node group card with all settings is not deleted. When the load increases, nodes can be added to this node group again.

Autoscaling to zero nodes works only if at least two working nodes remain in other node groups of the cluster. Working nodes must remain in the cluster to host the system components required for the cluster to function.

note

For example, autoscaling to zero nodes will not work if the cluster has:

  • two node groups with one working node in each;
  • one node group with two working nodes.

When there are no nodes in a group, you do not pay for unused resources.

Enable autoscaling using Cluster Autoscaler

For your information

If you set the minimum number of nodes in a group higher than the current number, the group will not grow to the lower limit immediately. The node group scales only after pods appear in the PENDING status. The same applies to the upper limit: if the current number of nodes is greater than the maximum, nodes are deleted only after Cluster Autoscaler checks the pods.

You can enable autoscaling using Cluster Autoscaler in the control panel, via the Managed Kubernetes API or via Terraform.

  1. In the control panel, on the top menu, click Products and select Managed Kubernetes.
  2. Open the cluster page → Cluster Resources tab.
  3. In the menu of the node group, select Change number of nodes.
  4. In the Number of nodes field, open the With autoscaling tab.
  5. Set the minimum and maximum number of nodes in the group — the number of nodes will only change within this range. For fail-safe operation of system components, we recommend having at least two working nodes in the cluster; nodes can be located in different groups.
  6. Click Save.

Configure Cluster Autoscaler

You can configure Cluster Autoscaler separately for each node group.

Parameters, their descriptions, and default values can be viewed in the Cluster Autoscaler parameters table. If you do not specify a parameter in the manifest, the default value will be used.

Example manifest:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-nodegroup-options
  namespace: kube-system
data:
  config.yaml: |
    150da0a9-6ea6-4148-892b-965282e195b0:
      scaleDownUtilizationThreshold: 0.55
      scaleDownUnneededTime: 7m
      zeroOrMaxNodeScaling: true
    e3dc24ca-df9d-429c-bcd5-be85f8d28710:
      scaleDownGpuUtilizationThreshold: 0.25
      ignoreDaemonSetsUtilization: true

Here, 150da0a9-6ea6-4148-892b-965282e195b0 and e3dc24ca-df9d-429c-bcd5-be85f8d28710 are the unique identifiers (UUIDs) of the node groups in the cluster. You can view them in the control panel: on the top menu, click Products and select Managed Kubernetes ⟶ Kubernetes section ⟶ cluster page ⟶ copy the UUID above the node group card, next to the pool segment.

Cluster Autoscaler parameters

| Parameter | Description | Default value |
| --- | --- | --- |
| scaleDownUtilizationThreshold | Minimum vCPU and RAM load of a node at which the system can delete the node. If the node uses less than the specified share of vCPU and RAM, for example, less than 50% with a value of 0.5, the system deletes the node | 0.5 |
| scaleDownGpuUtilizationThreshold | Minimum GPU load at which the system can delete a node. If the node uses less than the specified share of GPU, for example, less than 50% with a value of 0.5, the system deletes the node | 0.5 |
| scaleDownUnneededTime | Waiting time before deleting a low-load node. The system does not delete a node immediately when the node load drops. It waits for the specified time to make sure that the load reduction is stable | 10m |
| scaleDownUnreadyTime | Waiting time before deleting a node in the NotReady status. The system does not leave a NotReady node in the cluster. It waits for the specified time to make sure that the node has frozen and will not recover, and then deletes it | 20m |
| maxNodeProvisionTime | Waiting time for adding a new node. If an error occurs and the node is not added within the specified time, the system restarts the node addition process | 15m |
| zeroOrMaxNodeScaling | Allows the number of nodes to be automatically changed only to zero or to the maximum you have set. Useful if you need the system to deploy all nodes in a group at once when load appears and to remove all nodes when there is no load | false |
| ignoreDaemonSetsUtilization | Allows system services (DaemonSets) to be ignored when the system determines whether to reduce the number of nodes in a group. If the value is true, system services are ignored | false |
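
As a starting point for your own configuration, the sketch below lists every parameter from the table with its default value for a single node group. The key `<node-group-uuid>` is a placeholder, not a real identifier: replace it with the UUID of your node group. The ConfigMap name and namespace are the ones used in the example manifest earlier in this section.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-nodegroup-options
  namespace: kube-system
# All values below are the defaults from the parameters table.
# Change only the ones you need; omitted parameters keep their defaults.
data:
  config.yaml: |
    <node-group-uuid>:
      scaleDownUtilizationThreshold: 0.5
      scaleDownGpuUtilizationThreshold: 0.5
      scaleDownUnneededTime: 10m
      scaleDownUnreadyTime: 20m
      maxNodeProvisionTime: 15m
      zeroOrMaxNodeScaling: false
      ignoreDaemonSetsUtilization: false
```

Because the ConfigMap has a fixed name, settings for several node groups belong in the same `config.yaml` value, one UUID key per group, as in the example manifest.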