Revert to Lower-Cost Node
Cloud service provider relevance: AWS Kubernetes
In addition to scaling up, scaling down, and other optimization processes (e.g., Revert to Reserved Capacity or Savings Plans, and Revert to Spot), Ocean runs the Revert to Lower-Cost Node process. This process targets nodes whose compute resources are underutilized but that cannot be removed from the cluster by scaling down.
Scaling down is not always possible. For example, anti-affinity rules can require certain pods to run on different nodes. Even if a node becomes underutilized because other pods on it have finished running, Ocean cannot scale it down, because moving its remaining pods would violate the anti-affinity rule.
Another example is a configuration that requires a minimum number of nodes at the cluster or virtual node group (VNG) level. Either scenario can leave cluster nodes with unused resources that cannot be scaled down.
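The two blocking scenarios can be sketched as a single check. This is a minimal sketch under stated assumptions, not Ocean's implementation: the `Pod`, `Node`, and `scale_down_blocked` names are hypothetical, "anti-affinity" is modeled as a simple group label whose pods may never share a node, and CPU/memory capacity on the remaining nodes is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Pod:
    name: str
    # Hypothetical field: pods that share a group must never run on the same node
    # (a simplified stand-in for a required pod anti-affinity rule).
    anti_affinity_group: Optional[str] = None


@dataclass
class Node:
    name: str
    vng: str
    pods: List[Pod] = field(default_factory=list)


def scale_down_blocked(node: Node, nodes: List[Node], vng_min_size: Dict[str, int]) -> bool:
    """Return True if removing `node` would violate a VNG minimum or an anti-affinity rule."""
    # Minimum-size constraint: removing the node would drop its VNG below the configured minimum.
    vng_count = sum(1 for n in nodes if n.vng == node.vng)
    if vng_count - 1 < vng_min_size.get(node.vng, 0):
        return True

    # Anti-affinity constraint: each grouped pod needs another node that does not
    # already run a pod from the same group. Resource capacity is ignored here.
    others = [n for n in nodes if n.name != node.name]
    for pod in node.pods:
        if pod.anti_affinity_group is None:
            continue
        has_free_node = any(
            all(p.anti_affinity_group != pod.anti_affinity_group for p in n.pods)
            for n in others
        )
        if not has_free_node:
            return True
    return False
```

For example, with three nodes that each run one replica of a `web` group, none of them can be removed by this check, however idle they are, because every other node already hosts a `web` pod.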
To address these cases, the Revert to Lower-Cost Node process analyzes the nodes in the cluster and looks for underutilized nodes that Ocean could not scale down. Ocean then proactively replaces them with cheaper nodes when a lower-cost instance type is available.
How It Works
Ocean continuously scans the cluster’s node utilization. The Revert to Lower-Cost Node process is applied when all of the following conditions are met:
- Cluster Orientation:
  - Balanced Orientation (default):
    - No scaling occurred in the specific virtual node group in the last 25 minutes (no scale-up or scale-down event).
    - CPU and memory usage is below 50%, or GPU utilization is below 50%.
  - Cost Orientation:
    - No scaling occurred in the specific virtual node group in the last 20 minutes (no scale-up or scale-down event).
    - CPU and memory usage is below 60%, or GPU utilization is below 60%.
  - Cheapest Orientation:
    - No scaling occurred in the specific virtual node group in the last 15 minutes (no scale-up or scale-down event).
    - CPU and memory usage is below 70%, or GPU utilization is below 70%.
- The node has been underutilized for at least 10 minutes.
- The node is a spot instance.
- No replacement is in progress in the relevant virtual node group.
- An instance type smaller than the running one is configured.
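The conditions above can be expressed as one eligibility predicate. This is a minimal sketch, not Ocean's code: `NodeState`, `eligible_for_lower_cost`, and the orientation keys are hypothetical names, and "CPU and memory usage is below X%" is read here as both CPU and memory being below the threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Per-orientation thresholds from the list above:
# (minutes without a scaling event in the VNG, utilization limit in percent).
ORIENTATION_THRESHOLDS = {
    "balanced": (25, 50),
    "cost": (20, 60),
    "cheapest": (15, 70),
}
MIN_UNDERUTILIZED_MINUTES = 10


@dataclass
class NodeState:
    orientation: str                 # "balanced", "cost", or "cheapest"
    minutes_since_vng_scaling: float
    cpu_pct: float
    mem_pct: float
    gpu_pct: Optional[float]         # None for nodes without GPUs
    minutes_underutilized: float
    lifecycle: str                   # "spot" or "on-demand"
    vng_replacement_in_progress: bool
    smaller_type_configured: bool


def eligible_for_lower_cost(node: NodeState) -> bool:
    """Return True if every condition for a lower-cost replacement holds."""
    quiet_minutes, util_limit = ORIENTATION_THRESHOLDS[node.orientation]
    # Assumption: CPU *and* memory must both be under the limit, or the GPU alone.
    cpu_mem_low = node.cpu_pct < util_limit and node.mem_pct < util_limit
    gpu_low = node.gpu_pct is not None and node.gpu_pct < util_limit
    return (
        node.minutes_since_vng_scaling >= quiet_minutes
        and (cpu_mem_low or gpu_low)
        and node.minutes_underutilized >= MIN_UNDERUTILIZED_MINUTES
        and node.lifecycle == "spot"
        and not node.vng_replacement_in_progress
        and node.smaller_type_configured
    )
```

A node at 55% CPU fails under the balanced orientation (limit 50%) but passes under the cost orientation (limit 60%), assuming the other conditions hold.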
Ocean then replaces the relevant nodes in the virtual node group one at a time: each time the process is triggered, it replaces at most one instance per virtual node group. Nodes in different virtual node groups can be replaced simultaneously.
- If the cluster is set to utilize Reserved Instances (RIs), the autoscaler tries to launch RIs first.
- If no spot instance is available and a smaller on-demand (OD) instance is also cheaper, Ocean tries to replace the node with that on-demand instance.
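The replacement rules above (one node per virtual node group per trigger, RIs first when enabled, then spot, then a cheaper on-demand fallback) can be sketched as follows. This is an illustrative sketch only: `Offer`, `choose_replacement`, and `pick_one_per_vng` are hypothetical names, "smaller" is assumed to mean no more vCPUs or memory than the current node, the same smaller-and-cheaper filter is assumed to apply to RIs, and checking that the node's pods still fit is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Offer:
    instance_type: str
    lifecycle: str        # "reserved", "spot", or "on-demand"
    vcpus: int
    memory_gib: float
    hourly_price: float   # effective price you assign (e.g., near zero for an unused RI)
    available: bool


def choose_replacement(current: Offer, offers: List[Offer], utilize_ri: bool) -> Optional[Offer]:
    """Pick a smaller, cheaper offer: RIs first (if enabled), then spot, then on-demand."""

    def smaller_and_cheaper(o: Offer) -> bool:
        # Assumption: "smaller" means no more vCPUs or memory, and not identical in both.
        smaller = (
            o.vcpus <= current.vcpus
            and o.memory_gib <= current.memory_gib
            and (o.vcpus, o.memory_gib) != (current.vcpus, current.memory_gib)
        )
        return smaller and o.hourly_price < current.hourly_price

    tiers = (["reserved"] if utilize_ri else []) + ["spot", "on-demand"]
    for lifecycle in tiers:
        candidates = [
            o for o in offers
            if o.lifecycle == lifecycle and o.available and smaller_and_cheaper(o)
        ]
        if candidates:
            return min(candidates, key=lambda o: o.hourly_price)
    return None


def pick_one_per_vng(eligible: Iterable[Tuple[str, str]]) -> List[str]:
    """From (vng, node_name) pairs, keep at most one node per VNG for this trigger."""
    chosen: Dict[str, str] = {}
    for vng, node_name in eligible:
        chosen.setdefault(vng, node_name)  # first eligible node wins; dicts keep insertion order
    return list(chosen.values())
```

Trying lifecycles in tiers, rather than ranking all offers by price, mirrors the stated preference order: an on-demand instance is only considered when no suitable spot offer exists.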
Ocean does not replace nodes with a restricted scale-down configuration (at either the pod or the virtual node group level), or nodes whose replacement would violate a pod disruption budget.
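These exclusions can be sketched as a pre-check before any replacement. It is a simplified sketch, not Ocean's logic: `replacement_blocked` and the pod dictionaries are hypothetical, the label key `spotinst.io/restrict-scale-down` is the pod label Ocean documents for restricting scale-down to the best of my knowledge (verify it against current docs), and `disruptions_allowed` is assumed to be filled from each Kubernetes PodDisruptionBudget's `status.disruptionsAllowed`. Requiring one allowed disruption per affected pod is a deliberately conservative simplification.

```python
from collections import Counter
from typing import Dict, List

# Assumed Ocean pod label for restricting scale-down (verify against current docs).
RESTRICT_LABEL = "spotinst.io/restrict-scale-down"


def replacement_blocked(pods: List[dict], vng_restrict_scale_down: bool,
                        disruptions_allowed: Dict[str, int]) -> bool:
    """Return True if the node running `pods` must not be replaced.

    Each pod is a hypothetical dict such as {"labels": {...}, "pdb": "web"}.
    """
    # Restriction at the virtual node group level.
    if vng_restrict_scale_down:
        return True
    # Restriction at the pod level.
    if any(p.get("labels", {}).get(RESTRICT_LABEL) == "true" for p in pods):
        return True
    # PDB check: evicting every covered pod must stay within disruptionsAllowed.
    needed = Counter(p["pdb"] for p in pods if p.get("pdb"))
    return any(count > disruptions_allowed.get(pdb, 0) for pdb, count in needed.items())
```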
The proactive cost optimization process runs in addition to Ocean's existing optimization processes, such as:
- Revert to RI or Savings Plan process: Ocean constantly monitors your account's available RIs or Savings Plans (when the strategy.utilizeReservedInstances or utilizeCommitments flag is enabled). If an Ocean-monitored node runs as spot or on-demand, Ocean tries to replace it with an available RI or Savings Plan node.
- Revert to Spot process: If a node was launched as on-demand because no spot capacity was available in the market, Ocean keeps scanning the market and reverts the node to spot as soon as capacity becomes available.
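How these two revert processes relate to a node can be sketched as a small routing function. This is a hypothetical illustration, not Ocean's behavior: `revert_target` and its parameters are invented names, and checking RI/Savings Plan capacity before spot is an assumption based on the statement that the autoscaler tries RIs first when they are enabled.

```python
from typing import Optional


def revert_target(lifecycle: str, utilize_ri_or_sp: bool, ri_or_sp_available: bool,
                  launched_od_for_lack_of_spot: bool, spot_available: bool) -> Optional[str]:
    """Return which revert process would apply to a node, or None."""
    # Revert to RI / Savings Plan: only when the flag is enabled and capacity is free.
    # Assumption: this is checked before Revert to Spot.
    if utilize_ri_or_sp and ri_or_sp_available and lifecycle in ("spot", "on-demand"):
        return "reserved-or-savings-plan"
    # Revert to Spot: an on-demand fallback node goes back to spot once spot is available.
    if lifecycle == "on-demand" and launched_od_for_lack_of_spot and spot_available:
        return "spot"
    return None
```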