Deploying to the Cloud Platform
Declarative, not Imperative
Deploying to Kubernetes is done declaratively rather than imperatively. Instead of defining a list of instructions to be carried out (e.g. “install these gems, then run this install script”), you tell the cluster what state it should be in, and leave it to the cluster to do whatever is necessary to get from its current state to the state you want.
By ‘state’ here, we usually mean something like “there should be 4 pods, each running an instance of this docker image, listening on port 3000, with a service called ‘rails-app’ distributing inbound traffic to them.”
Much of the configuration of your service on the Cloud Platform will be done via Kubernetes Deployment objects. You can learn more about deployments, and see an example, here.
Use image tags
You specify the docker images which comprise your application like this:
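As a minimal sketch, a container entry with a tagged image could look like the following. The registry, repository and container names are hypothetical placeholders; only the 0.8.1 tag and port 3000 come from the text.

```yaml
# Fragment of a Deployment's pod template (spec.template.spec).
# The registry, repository and container names are hypothetical.
containers:
  - name: rails-app
    image: registry.example.com/my-team/rails-app:0.8.1   # explicit tag after the colon
    ports:
      - containerPort: 3000
```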
0.8.1 above refers to the tag of the named docker image. This is often a semantic version number, as above, but it could also be the hash of a specific commit in the underlying software repo.
It is possible to omit the tag altogether:
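For illustration, an untagged image reference might look like this sketch. The image name is a hypothetical placeholder.

```yaml
# Fragment of a Deployment's pod template (spec.template.spec).
# Hypothetical image name; with no tag given, the reference resolves to :latest.
containers:
  - name: rails-app
    image: registry.example.com/my-team/rails-app
```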
If you do this, you will get the latest version of the image (latest is the default tag, used whenever no tag is specified; it normally points to the most recent image pushed to the repository without an explicit tag).
We strongly advise against deploying the latest version of any image. Doing this makes it difficult or impossible to reproduce a known state, for example if you wanted to roll back a deployment to an earlier version.
Always specify explicit tagged versions of your docker images.
Kubernetes makes it very easy to deploy your services in a high-availability configuration. All you need to do is set the value of replicas in your deployments to a value higher than 1. For production services, we recommend 4 as a sensible number of replicas.
This means that, in the event that a worker node dies, taking one of your replicas with it, the remaining replicas will handle all application traffic, so that there is no downtime for your service while the missing replica is replaced.
Kubernetes will automatically try to schedule your replicas on different worker nodes, to minimise the impact of a node outage on all the services running in the cluster.
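A high-availability deployment along these lines might be sketched as follows. The names, namespace and image are hypothetical placeholders; the replica count is the recommended value from the text.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails-app            # hypothetical name
  namespace: my-namespace    # hypothetical namespace
spec:
  replicas: 4                # recommended number for production services
  selector:
    matchLabels:
      app: rails-app
  template:
    metadata:
      labels:
        app: rails-app       # must match the selector above
    spec:
      containers:
        - name: rails-app
          image: registry.example.com/my-team/rails-app:0.8.1   # hypothetical image
          ports:
            - containerPort: 3000
```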
Running multiple replicas is usually sensible for things like web application servers, where you just want to be sure that an instance of your app is always available to service web requests. But for some types of workload, such as background job processing, it might make sense to ensure that you have zero or one instances, rather than one or more.
An example of this would be a job processing tasks which must be handled in First In, First Out (FIFO) order - running multiple replicas in this case could mean tasks get processed out of order. For workloads like this, consider a Recreate deployment strategy.
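A minimal sketch of such a configuration, assuming a single-worker deployment, could look like this. With Recreate, the existing pod is terminated before its replacement is created, so two workers never run side by side (at the cost of a short gap during each deploy).

```yaml
# Fragment of a Deployment spec for a hypothetical FIFO job worker.
spec:
  replicas: 1          # never more than one worker
  strategy:
    type: Recreate     # old pod is terminated before the new one is created
```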
Zero Downtime Deploys
This is another feature you get ‘for free’ from Kubernetes.
Your deployments have a strategy section, which could look something like this:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 100%
    maxUnavailable: 50%
RollingUpdate (the default deployment strategy) means that when your deployment is updated, or your pods need to be moved to another node, the cluster will create new instances before terminating the old ones. maxSurge: 100% means that a complete additional copy of your deployment will be created before any of your old pods are terminated, and maxUnavailable: 50% means the cluster will allow at most half of your pods to be unavailable (e.g. if the new version of your service fails to deploy, for some reason).
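As a worked example, assume a deployment with 4 replicas (an illustrative figure) and the strategy values above. Kubernetes rounds a percentage maxSurge up and a percentage maxUnavailable down, which gives these bounds during a rollout:

```latex
\[
\begin{aligned}
\text{maximum pods during rollout} &= 4 + \lceil 100\% \times 4 \rceil = 8 \\
\text{minimum available pods} &= 4 - \lfloor 50\% \times 4 \rfloor = 2
\end{aligned}
\]
```

So the cluster needs spare capacity for up to 4 extra pods while the rollout is in progress.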
To redeploy your application with zero downtime, all you need to do is create an updated version of your deployment and apply it to the cluster, which will then take care of launching new pods and deleting old ones.
Horizontal Pod Autoscaling (HPA)
The Horizontal Pod Autoscaler is a built-in Kubernetes feature that allows you to horizontally scale applications based on one or more monitored metrics, such as CPU or memory usage.
The HorizontalPodAutoscaler manifest sets the minimum and maximum number of pods, and the metric thresholds at which pods should be created or removed.
The Horizontal Pod Autoscaler can ensure that critical applications are elastic: they can scale out to meet increasing demand and scale back in to make best use of resources. It is also a useful way to save money, by automatically scaling down non-production workloads when they are not in use, such as overnight or at the weekend.
The HPA calculates the number of replicas from the ratio between the desired metric value and the current metric value. Details on how the algorithm works are explained here.
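For reference, the calculation described in the Kubernetes documentation can be written as follows. The numbers in the worked line are illustrative assumptions, not measurements.

```latex
\[
\text{desiredReplicas} =
\left\lceil \text{currentReplicas} \times
\frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]

% Illustrative case: 2 replicas averaging 90% CPU against a 50% target
\[
\left\lceil 2 \times \frac{90}{50} \right\rceil = \lceil 3.6 \rceil = 4
\]
```

The result is then clamped to the range set by minReplicas and maxReplicas.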
Important aspects of the HorizontalPodAutoscaler to be aware of:
- Resource requests - The HPA measures utilisation as a percentage of each container's resource request, so requests must be set (explicitly, or through your namespace defaults), otherwise the HPA has no value to measure against.
- 15 seconds - The HPA controller checks the value of the metric used every 15 seconds per pod.
- 3 minutes - The HPA scales up pods if the metric threshold has been continually exceeded for 3 minutes.
- 5 minutes - The HPA scales down pods if the metric threshold has not been exceeded for 5 minutes.
To create the horizontal pod autoscaler for your deployment, create a file called hpa-myapp.yaml similar to this:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-name
  namespace: my-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-name
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 95
The above manifest will ensure a minimum of 1 replica is available for the deployment called my-app-name, but will increase replicas up to 5 when the CPU utilization is above 95%.
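The autoscaling/v1 API only supports a CPU target. If you want to scale on more than one metric, such as CPU and memory, a sketch using the autoscaling/v2 API might look like the following. This assumes your cluster serves autoscaling/v2, and the memory target is a hypothetical value chosen for illustration.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-name
  namespace: my-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-name
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 95   # same CPU target as the v1 manifest
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # hypothetical memory target
```

When several metrics are listed, the HPA works out a replica count for each one and uses the largest.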
The right value for targetCPUUtilizationPercentage depends on the resource requests and limits set up in your namespace and on how much your pods actually consume.
This limitrange file defines the defaults we assign to new namespaces.
- Check which defaults are set for your namespace.
- Check how much your pods actually consume by running kubectl top pods -n <namespace>.
If your namespace has defaultRequest.cpu: 10m and your pods consume 8m without any traffic, then the usual CPU utilisation is 80%. The pods do not need scaling until they reach 95% of the defaultRequest.cpu, which is already reserved for each of the containers. Hence targetCPUUtilizationPercentage can be set to 95%.
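The arithmetic behind that choice, using only the figures quoted above, can be written out as:

```latex
\[
\begin{aligned}
\text{idle utilisation} &= \frac{8\text{m}}{10\text{m}} \times 100\% = 80\% \\
\text{scale-up threshold} &= 95\% \times 10\text{m} = 9.5\text{m per pod}
\end{aligned}
\]
```

A target below the idle utilisation (below 80% here) would make the HPA scale up even when there is no traffic, so the target needs to sit above it.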
Run the following command to apply:
$ kubectl apply -f hpa-myapp.yaml -n my-namespace
horizontalpodautoscaler.autoscaling/hpa-myapp autoscaled
Run the following to describe the status of the pod autoscaler:
$ kubectl describe hpa -n my-namespace my-app-name
Name:                                                  my-app-name
Namespace:                                             my-namespace
Labels:                                                app=my-app-name
Annotations:                                           <none>
CreationTimestamp:                                     Tue, 01 Jun 2019 23:35:22 +0100
Reference:                                             Deployment/my-app-name
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  10% (1m) / 50%
Min replicas:                                          1
Max replicas:                                          5
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
In the output above, current usage is 10% of the CPU request. If CPU consumption exceeds the 50% threshold for more than 3 minutes, the deployment will scale up.
It is also possible to use custom metrics with the horizontal pod autoscaler. However, this requires a bit more input from both the development team and the Cloud Platform team. If this is something you may be interested in, please speak to a member of the team on #ask-cloud-platform.
Click here for the official Kubernetes documentation on the horizontal pod autoscaler walkthrough.