Labels and Taints
To make scheduling more efficient and compatible with Kubernetes, Ocean supports the Kubernetes constraint mechanisms for scheduling pods:
- Node Selector: Constrains pods to nodes with particular labels.
- Node Affinity: Constrains the nodes a pod is eligible to be scheduled on, based on node labels. Spot supports hard and soft affinity (requiredDuringSchedulingIgnoredDuringExecution, preferredDuringSchedulingIgnoredDuringExecution).
- Pod Affinity and Pod Anti-Affinity: Schedules a pod based on which other pods are running on a node.
- Pod Port Restrictions: Validates that each pod has its required ports available on the machine.
- Well-Known Labels.
Spot Labels
Spot labels let you adjust Ocean's default scaling behavior. Add them to your pods to control the node termination process or lifecycle.
spotinst.io/azure-premium-storage
The AKS scheduler does not guarantee that pods requiring premium storage will schedule on nodes that support premium storage disks.
The Spot Ocean label spotinst.io/azure-premium-storage is injected into every node in a node pool that supports premium storage.
We recommend using the spotinst.io/azure-premium-storage label on pods that require premium storage disks.
This ensures those pods are scheduled on nodes that meet their storage requirements.
For more information, see Azure premium storage.
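For example, you can steer a pod to premium-storage-capable nodes with a nodeSelector on this label. The sketch below assumes the label value on qualifying nodes is "true" (verify the exact value on your own nodes); the pod and container names are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: premium-storage-pod
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:2.0
  nodeSelector:
    spotinst.io/azure-premium-storage: "true" # assumed value; check your node labels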
spotinst.io/restrict-scale-down
Some workloads tolerate spot instance replacements less than others. To minimize node replacements for these workloads while maintaining spot instance cost savings, use the spotinst.io/restrict-scale-down label (set to true) to block proactive scaling down of instances for bin packing purposes. This will leave the instance running as long as possible. The instance will be replaced only if it becomes unhealthy or if forced by a cloud provider interruption.
spotinst.io/node-lifecycle
Ocean uses the spotinst.io/node-lifecycle label key to indicate a node's lifecycle. The label is applied to all Ocean-managed nodes, with a value of od (on-demand) or spot.
This label is useful for workloads that are not resilient to spot instance interruptions and must run on on-demand instances at all times.
By applying node affinity to the spotinst.io/node-lifecycle label with the value od, you can make sure that these workloads are scheduled only on on-demand instances.
spotinst.io/node-lifecycle:spot affinity is not supported. Unless spotinst.io/node-lifecycle:od affinity is applied, Ocean continues to try to provide excess compute capacity (spot instances) for all workloads in the cluster.
spotinst.io/gpu-type
This label creates direct affinity to specific types of GPU hardware, freeing you from the need to explicitly set and manage a list of VMs that contain the required hardware. Ocean automatically matches the relevant VMs (AWS and GCP) for workloads whose affinity rules use this label. Valid label values are:
- nvidia-tesla-v100
- nvidia-tesla-p100
- nvidia-tesla-k80
- nvidia-tesla-p4
- nvidia-tesla-t4
- nvidia-tesla-a100 (AWS only)
- nvidia-tesla-m60
- amd-radeon-v520
- nvidia-tesla-t4g
- nvidia-tesla-a10
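As an illustration, a pod that requires a T4 GPU could express that requirement with node affinity on this label. The pod and container names below are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: spotinst.io/gpu-type
            operator: In
            values:
            - nvidia-tesla-t4
  containers:
  - name: gpu-workload
    image: registry.k8s.io/pause:2.0
    resources:
      limits:
        nvidia.com/gpu: 1 # request one GPU device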
Don't add Spot labels under the virtual node group (launch specification) node labels section. Add these labels to the pod configuration only.
Instance Types Labels
Format: aws.spot.io/instance-<object>, for example, aws.spot.io/instance-category
Apply these labels in a workload's constraints (nodeSelector, node affinity, etc.) to target instance type properties. For example, you can constrain workloads to run on any instance from the M6, M7, or R7 families without manually listing every instance type in each family.
The instance labels are as follows:
- aws.spot.io/instance-category: Reflects the category of the instance (for example, c).
- aws.spot.io/instance-family: Reflects the family of the instance (for example, c5a).
- aws.spot.io/instance-generation: Reflects the generation of the instance (for example, 5).
- aws.spot.io/instance-hypervisor: Reflects the hypervisor the instance uses (for example, nitro).
- aws.spot.io/instance-cpu: Reflects the instance's vCPU count (for example, 2).
- aws.spot.io/instance-memory: Reflects the instance's memory in MiB (for example, 4096).
With these constraints in place, Ocean launches only nodes whose instance types match the labels required by the pending pods.
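For example, the following sketch (placeholder pod and container names) constrains a pod to 5th- or 6th-generation, C-category instances using node affinity:

apiVersion: v1
kind: Pod
metadata:
  name: instance-constrained
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: aws.spot.io/instance-category
            operator: In
            values:
            - c
          - key: aws.spot.io/instance-generation
            operator: In
            values:
            - "5" # label values are strings, so quote numeric values
            - "6"
  containers:
  - name: instance-constrained
    image: registry.k8s.io/pause:2.0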
Examples
Using restrict scale-down label:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        spotinst.io/restrict-scale-down: "true"
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "2Gi"
            cpu: "2"
          limits:
            memory: "4Gi"
            cpu: "4"
Using od nodeSelector:
apiVersion: v1
kind: Pod
metadata:
  name: with-node-selector
spec:
  containers:
  - name: with-node-selector
    image: registry.k8s.io/pause:2.0
    imagePullPolicy: IfNotPresent
  nodeSelector:
    spotinst.io/node-lifecycle: od
Using od nodeAffinity:
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: spotinst.io/node-lifecycle
            operator: In
            values:
            - od
  containers:
  - name: with-node-affinity
    image: registry.k8s.io/pause:2.0
Startup Taints
Cloud service provider relevance: AWS Kubernetes
Startup taints are temporary taints applied to a node during its initialization phase. The Ocean autoscaler recognizes these taints and avoids scaling up additional nodes for pending pods that match this node, anticipating the taint's removal. Once the startup taint is removed, any pod without a matching toleration can be scheduled on the node without requiring new nodes to be launched.
When to Use Startup Taints
Use startup taints to make sure a specific pod deploys to a node before other pods. Once that initialization pod is ready or has completed a defined procedure (such as configuring networking), other pods can be scheduled on the node.
Example - Cilium: Cilium recommends applying a taint such as node.cilium.io/agent-not-ready=true:NoExecute to prevent other pods from starting before Cilium has configured networking on the node.
Only the initialization pod will have a toleration for this taint. Once the node is ready, the application running in the initialization pod removes the taint from the node.
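As a sketch, such an initialization pod (in practice, the Cilium agent) is the only one carrying a toleration for the taint; the pod and container names here are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: network-init
spec:
  tolerations:
  - key: node.cilium.io/agent-not-ready # the startup taint from the Cilium example
    operator: Equal
    value: "true"
    effect: NoExecute
  containers:
  - name: network-init
    image: registry.k8s.io/pause:2.0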
If a startup taint has not been removed from a specific node by the end of the cluster's grace period, a new node is launched for any pending pods. The grace period starts when a node is created; its default is 5 minutes, and you can configure it in the cluster under cluster.strategy.gracePeriod.
Configure Startup Taints in the Spot API
AWS Kubernetes only
Prerequisite: Ocean controller version v2.0.68 or above.
Configure Ocean to consider your startup taints using the startupTaints attribute at the Ocean cluster and virtual node group levels.
- Cluster: under cluster.compute.launchSpecification
- Virtual node group: under launchSpec
You must also set the startup taint as a regular taint in the userData for the cluster or virtual node group, because Ocean does not add or remove configured startup taints itself.
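For reference, a cluster-level configuration could look like the following sketch. It assumes startup taints use the same key/value/effect shape as regular Ocean taints (shown as YAML for readability; the Spot API accepts the equivalent JSON), and it reuses the Cilium taint from the example above:

cluster:
  compute:
    launchSpecification:
      startupTaints:
      # Assumed shape: same key/value/effect fields as regular taints.
      - key: node.cilium.io/agent-not-ready
        value: "true"
        effect: NoExecute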