
Getting application metrics into Prometheus

Overview

This guide will walk you through the steps to export metrics from your application into the Cloud Platform Prometheus. By exporting these metrics into Prometheus you can create useful observability tools like Grafana dashboards and triggered alerts on things like crashing pods and failed deployments. To do that, Prometheus needs to be able to scrape data from a /metrics endpoint, which is created by a Prometheus client library. Once you have a /metrics endpoint you can create a ServiceMonitor to connect the Cloud Platform Prometheus to your endpoint and store data for querying.

The example application in this document will be the Ruby reference app, utilising the Ruby prometheus-client gem. If you’re following along in another language, Prometheus offers several client libraries to get you started. At the end you should have a working /metrics endpoint that displays your site’s response time, which we can use to query the application latency in the Cloud Platform Prometheus.

The application latency metric is quite basic but our intention is to get you started.

Assumptions

To keep this document short we will assume you already have an application up and running in a namespace on the Cloud Platform. If not, please see Deploying a multi-container application to the Cloud Platform.

Changing the application code

We need to add the Prometheus Ruby client library via a gem to give us our /metrics endpoint.

First, add the gem to your Gemfile and install with bundler.

gem 'prometheus-client'

Next, we need to amend the config.ru file and include the two rack middlewares required by the prometheus-client.

require_relative 'config/environment'
require 'prometheus/middleware/collector'
require 'prometheus/middleware/exporter'

use Prometheus::Middleware::Collector
use Prometheus::Middleware::Exporter

run Rails.application
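To make the role of the two middlewares more concrete, here is a minimal plain-Ruby sketch of the idea behind a collector: wrap the application, time each request, and keep running totals that can be rendered as text. It is not the prometheus-client implementation; TimingMiddleware and the demo_ metric names are hypothetical and exist only for illustration.

```ruby
# Illustrative stand-in for a collector middleware.
# TimingMiddleware and the demo_* metric names are hypothetical,
# not part of the prometheus-client gem.
class TimingMiddleware
  attr_reader :count, :sum

  def initialize(app)
    @app = app
    @count = 0   # number of requests observed so far
    @sum = 0.0   # total seconds spent serving them
  end

  # Rack-style entry point: time the wrapped app and pass its response through.
  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @app.call(env)
  ensure
    @sum += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @count += 1
  end

  # Render the running totals as plain "name value" lines.
  def render
    "demo_request_duration_seconds_sum #{@sum}\n" \
      "demo_request_duration_seconds_count #{@count}\n"
  end
end

# A trivial Rack-style app: a callable returning [status, headers, body].
app = ->(_env) { [200, { 'content-type' => 'text/plain' }, ['ok']] }
middleware = TimingMiddleware.new(app)
3.times { middleware.call({}) }
puts middleware.render
```

In the real gem the collector records into a shared registry and the exporter serves that registry at /metrics, so the two responsibilities shown together here are split across the two middlewares.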

If you’re running this locally, you’ll now be able to query your /metrics endpoint and see some metrics data. If the request fails or returns no metrics, the middleware has not been loaded; check that the gem is installed and that config.ru includes both middlewares.

curl localhost:3000/metrics
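If you would rather check the response programmatically than by eye, the sketch below parses metrics text into a hash. The sample text is made up to resemble what the endpoint might return; the exact series and labels you see depend on your application and gem version, and parse_metrics is a hypothetical helper, not part of any library.

```ruby
# Parse Prometheus text-format output into { "name{labels}" => Float }.
# Comment lines (# HELP, # TYPE) and blank lines are skipped.
# Assumes samples carry no trailing timestamp.
def parse_metrics(text)
  text.each_line.with_object({}) do |line, samples|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    # The value is the last space-separated field on the line.
    series, _, value = line.rpartition(' ')
    samples[series] = Float(value)
  end
end

# Made-up sample resembling the output of `curl localhost:3000/metrics`.
SAMPLE = <<~METRICS
  # TYPE http_server_requests_total counter
  # HELP http_server_requests_total Total number of HTTP requests.
  http_server_requests_total{code="200",method="get",path="/"} 4.0
  http_server_request_duration_seconds_sum{method="get",path="/"} 0.25
  http_server_request_duration_seconds_count{method="get",path="/"} 4.0
METRICS

parse_metrics(SAMPLE).each { |series, value| puts "#{series} => #{value}" }
```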

Build, tag and push your application changes to your code repository and deploy the latest version into your Cloud Platform namespace. Confirm your /metrics endpoint is now accessible from your application’s URL.

curl https://myapp.cloud-platform/metrics

Add Service endpoint and ServiceMonitor

We need to expose the metrics endpoint with a Service and tell the Cloud Platform Prometheus to scrape the endpoint with a ServiceMonitor object in Kubernetes. A ServiceMonitor is a custom resource, defined by a custom resource definition (CRD), that allows Prometheus scrape configuration to be generated automatically for the Services it selects.

In this example, we’re using the same port to expose both our application and metrics endpoint so we’ll need to query our existing Service for the current port name. However, if you’re exposing a different port you’ll need to either amend your current Service or create a new one.

Let’s find out our current port name by running:

kubectl -n <namespace> get svc rails-app-service -o=jsonpath='{.spec.ports[0].name}'

In this example the command returns http, which is the name of the port we’re exposing.

Create and apply your service monitor <application>-serviceMonitor.yaml, as below:

   apiVersion: monitoring.coreos.com/v1
   kind: ServiceMonitor
   metadata:
     name: rails-app-service
   spec:
     selector:
       matchLabels:
         app: rails-app-service
     endpoints:
     - port: http # this is the port name you grabbed from your running service
       interval: 15s

This will tell Prometheus to go and scrape that endpoint every 15 seconds and store any exposed metrics.
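A ServiceMonitor only produces scrape targets when its matchLabels match the labels on the Service and its endpoint port matches a named Service port. The sketch below checks both using Ruby’s standard YAML library. The Service manifest is an assumption about what rails-app-service might look like (its labels and port number are not taken from this guide), and service_monitor_problems is a hypothetical helper.

```ruby
require 'yaml'

# Assumed Service manifest; labels and port number are illustrative.
SERVICE_YAML = <<~MANIFEST
  apiVersion: v1
  kind: Service
  metadata:
    name: rails-app-service
    labels:
      app: rails-app-service
  spec:
    ports:
      - name: http
        port: 3000
        targetPort: 3000
    selector:
      app: rails-app
MANIFEST

# The ServiceMonitor from this guide.
MONITOR_YAML = <<~MANIFEST
  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: rails-app-service
  spec:
    selector:
      matchLabels:
        app: rails-app-service
    endpoints:
      - port: http
        interval: 15s
MANIFEST

# Returns a list of mismatches that would stop the ServiceMonitor from
# selecting the Service; an empty list means the two line up.
def service_monitor_problems(service, monitor)
  problems = []
  labels = service.dig('metadata', 'labels') || {}
  wanted = monitor.dig('spec', 'selector', 'matchLabels') || {}
  wanted.each do |key, value|
    problems << "Service is missing label #{key}: #{value}" unless labels[key] == value
  end

  port_names = (service.dig('spec', 'ports') || []).map { |port| port['name'] }
  (monitor.dig('spec', 'endpoints') || []).each do |endpoint|
    next if port_names.include?(endpoint['port'])

    problems << "Service has no port named #{endpoint['port']}"
  end
  problems
end

problems = service_monitor_problems(YAML.safe_load(SERVICE_YAML), YAML.safe_load(MONITOR_YAML))
puts problems.empty? ? 'ServiceMonitor matches the Service' : problems
```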

Add a NetworkPolicy resource

The Prometheus server is in the ‘monitoring’ namespace, but by default, any network connections from outside your application’s namespace will be blocked. So, to allow Prometheus to scrape your application’s /metrics endpoint, we need to add a network policy to allow connections from the monitoring namespace.

Create and apply a new resource <application>-networkPolicy.yaml, as below:

   kind: NetworkPolicy
   apiVersion: networking.k8s.io/v1
   metadata:
     name: allow-prometheus-scraping
     namespace: my-app-namespace
   spec:
     podSelector:
       matchLabels:
         app: rails-app
     policyTypes:
     - Ingress
     ingress:
     - from:
       - namespaceSelector:
           matchLabels:
             component: monitoring

Querying metrics

We can now query the metrics scraped from our /metrics endpoint using the Cloud Platform Prometheus.

Head to Cloud Platform Prometheus and use the following PromQL query to view the application latency (remembering to change the namespace value):

http_server_request_duration_seconds_sum{namespace="my-namespace"}

The output will be something like:

[Image of Prometheus output]
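Note that a _sum series is a running total of seconds spent serving requests, so it only ever grows. To turn it into an average latency you divide the increase in the _sum series by the increase in the matching _count series over the same window; in PromQL this is commonly written as rate(..._sum[5m]) / rate(..._count[5m]). The arithmetic can be sketched in Ruby as follows, using made-up scrape values and ignoring counter resets. Scrape and mean_latency are hypothetical names for illustration.

```ruby
# One scrape of the two cumulative series (values below are made up).
Scrape = Struct.new(:sum_seconds, :request_count)

# Mean request duration in seconds between two scrapes,
# or nil if no requests arrived in between.
def mean_latency(earlier, later)
  requests = later.request_count - earlier.request_count
  return nil if requests.zero?

  (later.sum_seconds - earlier.sum_seconds) / requests
end

first  = Scrape.new(12.0, 100)
second = Scrape.new(15.0, 120)

# 3.0 extra seconds spread over 20 extra requests.
puts mean_latency(first, second)
```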

Example in full

If you’d like to see the changes I’ve made to the cloud-platform-multi-container-demo-app, please see this PR.

Applications configured to use multiple processes

If you’re using a pre-forking web server (like unicorn or puma for Ruby, or gunicorn for Python) and have it configured to use multiple processes, then you need to use a Prometheus client library that supports exporting metrics from multiple processes. Not all the official clients do that. If you don’t use a library which supports this, then requests to /metrics could be served by any of the processes, which would mean Prometheus sees inconsistent data on each scrape. The prometheus-client library we used in the example above supports multi-process metrics, but the values from each process need to be aggregated to report coherent total numbers. For more information on this please read this article.
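The inconsistency is easy to see with a toy model. In the sketch below each worker process holds its own request count; a scrape answered by a single worker reports only that worker’s number, whereas an aggregated view reports the total. The process IDs and counts are made up, and this models the problem only, not how any particular client library stores or merges its data.

```ruby
# Request counts held by each worker process (PIDs and values are made up).
WORKER_COUNTS = { 101 => 40, 102 => 35, 103 => 25 }.freeze

# What a scrape sees if the worker that answers reports only its own count.
def single_worker_view(counts, pid)
  counts.fetch(pid)
end

# What a scrape should see: the total across all workers.
def aggregated_view(counts)
  counts.values.sum
end

puts single_worker_view(WORKER_COUNTS, 101) # one worker's partial count
puts single_worker_view(WORKER_COUNTS, 102) # a different partial count on the next scrape
puts aggregated_view(WORKER_COUNTS)         # the coherent total
```

Because a counter is expected to only ever increase, a value that jumps between partial counts from scrape to scrape can also look like a series of counter resets to Prometheus.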

More information on Service Monitors

CoreOS Blog on Prometheus Operator and ServiceMonitor

CoreOS README on Custom Resource Definitions

Example ServiceMonitors

This page was last reviewed on 10 August 2021. It needs to be reviewed again on 10 November 2021.