
Cloud Platform Metrics and Dashboards

This guide describes the metrics and dashboards available for checking the health of the platform and understanding its behaviour.


Cloud Platform uses several Grafana dashboards to monitor the health of the platform. These dashboards are available to all Cloud Platform users.

Some of the key dashboards are:

To monitor the resource usage of the cluster:

Cluster health:

Namespace level usage:

Metrics and Alerts

Cloud Platform uses Prometheus to collect metrics from the cluster and Alertmanager to send alerts to Slack channels. The following alerts are set up to monitor the health of the cluster:

The platform also uses Pingdom as a black-box monitoring tool to check the cluster from outside, and PagerDuty to send alerts to Slack and to the person on call.

All alerts triggered by Alertmanager and PagerDuty are sent to either the #high-priority-alarms or the #lower-priority-alarms channel, depending on the severity of the alert.
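As an illustration of severity-based routing, the Python sketch below maps an alert's severity to one of the two channels. It is a hypothetical model, not the platform's actual configuration. It assumes alerts carry a `severity` label and that only the value `critical` counts as high priority. The label name, the label values and the example alert names are all assumptions made for this sketch.

```python
# Hypothetical sketch: route an alert to a Slack channel based on its severity.
# ASSUMPTION: alerts carry a "severity" label, and only "critical" is high priority.
HIGH_PRIORITY_CHANNEL = "#high-priority-alarms"
LOWER_PRIORITY_CHANNEL = "#lower-priority-alarms"

# Assumed set of severity values treated as high priority (illustrative only).
HIGH_SEVERITIES = {"critical"}


def channel_for(alert: dict) -> str:
    """Return the channel an alert would go to, judged by its severity label."""
    severity = alert.get("labels", {}).get("severity", "")
    if severity.lower() in HIGH_SEVERITIES:
        return HIGH_PRIORITY_CHANNEL
    # Anything else, including alerts with no severity label, goes to the lower-priority channel.
    return LOWER_PRIORITY_CHANNEL


if __name__ == "__main__":
    # Hypothetical example alerts; these names are not real platform alerts.
    alerts = [
        {"labels": {"alertname": "ExampleNodeDown", "severity": "critical"}},
        {"labels": {"alertname": "ExampleDiskFilling", "severity": "warning"}},
        {"labels": {"alertname": "ExampleNoSeverity"}},
    ]
    for alert in alerts:
        print(alert["labels"]["alertname"], "->", channel_for(alert))
```

In Alertmanager itself, this split would normally be declared in configuration rather than code. Child routes match on a label and point at different receivers, for example receivers that use `slack_configs`.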

Cloud Platform hosted services and their costs

The Cloud Platform team has created a dashboard to report on the number of services hosted on the platform and their costs.

Cloud Platform hosted services:

Cloud Platform costs per namespace:

Cloud Platform deployments:

This page was last reviewed on 12 March 2024. It needs to be reviewed again on 12 September 2024 by the page owner #cloud-platform.