Security Controls on the Cloud Platform
The purpose of this page is to explain some of the security issues which often come up during “code of connection” processes, when connecting to external systems.
The Cloud Platform (CP) is hosted in Amazon AWS, and consists of two kubernetes clusters:
- manager - an EKS cluster which runs shared services such as monitoring and CI/CD pipelines
- live - a EKS “application” cluster, managed by EKS, which runs all hosted services.
…plus additional AWS managed services such as database instances, S3 buckets, etc.
The clusters and all the AWS resources that services use are hosted in a single VPC, which is only accessible to the outside world via either https connections, or via the kubernetes API and SSH through a bastion host (only the CP team have access to these).
Isolation of services
Services hosted in the CP run in one or more namespaces. Each namespace is isolated from the others by kubernetes NetworkPolicy configurations. Service teams can choose to allow traffic between specific namespaces (e.g. if their service has multiple components, each in its own namespace), but by default no inter-namespace network traffic is permitted.
In ‘traditional’ AWS hosting environments, it is common to create some infrastructure and managed resources (e.g. some EC2 instances, an RDS instance, and a load-balancer), and use security groups to segregate this set of infrastructure from other components in the same VPC.
It is not possible to use security groups in this way in the CP.
The kubernetes cluster has a pool of worker nodes (EC2 instances) which are entirely interchangeable from the point of view of the services hosted in the cluster. Services do not know, and should not care, which worker nodes are hosting them at any given moment. So, network connections to, for example, an RDS instance could come from any of the worker nodes in the cluster.
Security groups work at the EC2 instance level. i.e. requests to an RDS instance from a specific set of EC2 instances would be allowed, and others would be blocked. Since we can’t block requests from any of the worker nodes, security groups cannot be used.
As described above, the nature of the CP means that, at the network level, anything in the CP can attempt to connect to any AWS resource belonging to any service hosted in the CP.
Such resources are secured by credentials, potentially including encryption keys, which are usually made available to services via kubernetes secrets in the namespaces in which the services run.
Cloud Platform recommends using Amazon ECR to store container images. Images are scanned for vulnerabilities using Amazon ECR Image Scanning.
We also have Trivy-Operator running in the CP, which scans images for vulnerabilities and reports them to the Cloud Platform Vulnerability Dashboard. At present, we don’t take any action on the vulnerabilities reported, but this is something we plan to address in the future.