Skip to main content

PodDisruptionBudgets and Cluster maintenance

Introduction

PodDisruptionBudget(PDB) is a feature which will help you run highly available applications and limit the disruption to your application during nodes rollout and pods rescheduling, such as for upgrades or routine maintenance work on the Kubernetes nodes.

Refer to the kubernetes documentation at pod disruptions, for detailed explainations.

Defining PodDisruptionBudget for your application:

You can configure PDB by defining the number or percentage of pods that can go down with maxUnavailable and must stay up with minAvailable. PDB applies the budget by using a selector and currently supports ReplicaSet, StatefulSet, and Deployment.

Example PDB definitions using minAvailable

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-pdp
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app-name

If your application is defined with 3 deployment replicas, then during node rollout, the cluster will make sure you have 2 available and hence 1 pod is rescheduled at one time.

Setting minAvailable to 100% or same as pod number

If your deployment replicas is set to 1 and have a PodDisruptionBudget defined with minAvailable: 100% or minAvailable: 1, this will ensure none of your pods will ever be disrupted. This will halt any node rollout process by Cloud Platform team such as draining a node or doing a cluster upgrade.

Example PDB definitions using minAvailable:1

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: helloworld-pdp
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: helloworld-rubyapp

Example of deployment for the above PDB

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld-rubyapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: helloworld-rubyapp
  template:
    metadata:
      labels:
        app: helloworld-rubyapp
    spec:
      containers:
      - name: rubyapp
        image: ministryofjustice/cloud-platform-helloworld-ruby:1.1
        ports:
        - containerPort: 4567

PDB with 100% availability and cluster maintenance operations

Cloud Platform team runs a recycle-node pipeline on every weekday between 8am and 10am where the oldest worker node is gracefully replaced. This is to ensure the pods are not stuck around forever in one node and things like namespace quota and limit range changes have the desired effects and not holding reservations for the old values.

If you have a deployment set as above with an unreasonable/aggressive PDB, the recycle-node pipeline tries to evict the pod gracefully from the node, but after several unsuccessful retries it will evict the pod brutally which may bring downtime to your app.

The team also perform regular kubernetes upgrades to mitigate security vulnerability and to benefit new features. Hence, ensure your configuration for replicas and PodDisruptionBudget allow the Cloud Platform team to manage the nodes.

This page was last reviewed on 11 March 2024. It needs to be reviewed again on 11 September 2024 by the page owner #cloud-platform .
This page was set to be reviewed before 11 September 2024 by the page owner #cloud-platform. This might mean the content is out of date.