
Prometheus Metrics Scraping for Google Cloud Monitoring

Has this ever happened to you? You develop your backend app, prepare lovely metrics for it, export them as Prometheus metrics because your backend friends told you it's an industry standard, and now you can’t see those metrics in Google Cloud Monitoring (referred to simply as “the monitoring” in this text). You thought it would just work. Google, in most cases, likes tools adopted by the CNCF. Why shouldn't it work?

Unless you deploy Prometheus yourself or use GKE, you are probably out of luck. And even on GKE, the history of Prometheus support isn't a pretty one. Just a quick overview:

  • First, you could use the Prometheus sidecar. The orchestration wasn’t simple, but once it was done, you just recycled your setup, probably written in Helm and you were done with it. Now deprecated!
  • Second, we had Cloud Monitoring with a resource called PodMonitor. The only thing you had to do was to enable Workloads in Cloud Monitoring components. It just worked, it was easy, … and now it’s deprecated!
  • Third Time's the Charm: now we have a PodMonitoring Kubernetes resource. Did you notice that “ing” at the end? Yeah, nah, me neither. It took me a few hours of migration because the name of the old resource was PodMonitor. You can enable the resource by setting up Managed Service for Prometheus in the GKE cluster. The funny thing is that it’s still beta.

Now you can pick from two deprecated solutions and one in beta. Lovely. It’s June 2022, and I expect a new tool for Prometheus in GKE in the next 2-3 months. Is it going to be here for a while? Probably not.

And what about Cloud Run? AppEngine? Or just a plain GCE instance? To be honest and fair to Google, after struggling with the GKE setup, I didn’t research any more. Shame on me.

Ok, so what is this text even about?

I know. I just wanted to emphasize how hard it is to use GCP. The main point of this text is that I wanted to scrape Prometheus metrics, push them into Google Cloud Monitoring, and not waste several hours on setting up tools that will get deprecated sooner or later.

Google actually has a tool to do that for go-metrics. Its main goal is to take the metrics and push them to Google Cloud Monitoring. The name is simply go-metrics-stackdriver. Of course, it has a large disclaimer saying it is not an officially supported Google product.

I used the tool to investigate how to send a Prometheus histogram to the monitoring. The problem is that there is no histogram type there, only a distribution. Luckily, you can easily transform a histogram into a distribution with explicit buckets. That means the bucket boundaries are not generated from a linear or exponential formula but are given directly as a list of floats.

See the snippet from go-metrics-stackdriver:

Value: &monitoringpb.TypedValue_DistributionValue{
    DistributionValue: &distributionpb.Distribution{
        BucketOptions: &distributionpb.Distribution_BucketOptions{
            Options: &distributionpb.Distribution_BucketOptions_ExplicitBuckets{
                ExplicitBuckets: &distributionpb.Distribution_BucketOptions_Explicit{
                    Bounds: v.buckets,
                },
            },
        },
        BucketCounts: v.counts,
        Count:        count,
    },
},

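Bounds contains the bucket boundaries and BucketCounts contains the number of points in each bucket. Keep in mind that Prometheus reports cumulative bucket counts, while the distribution expects a separate count per bucket, so the counts have to be converted on the way. Confusing data types (like Distribution_BucketOptions_Explicit, …) come straight from the Google API. That is nice, because the naming convention works in all the SDKs. Once you understand the API, you understand the SDK.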
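To make the histogram-to-distribution step less abstract, here is a minimal sketch of the count conversion. It is not taken from go-metrics-stackdriver or from my scraper; the function name cumulativeToPerBucket and the sample numbers are made up for illustration. It assumes the finite Prometheus upper bounds are used as Bounds, so the result has one extra entry for the overflow bucket. One caveat: Prometheus upper bounds are inclusive while distribution buckets exclude their upper bound, so values sitting exactly on a boundary may land one bucket off.

```go
package main

import "fmt"

// cumulativeToPerBucket is a hypothetical helper for this sketch.
// It takes Prometheus-style cumulative counts (one per finite upper bound)
// and the total sample count, and returns per-bucket counts.
// The result has len(cumulative)+1 entries; the last one is the overflow bucket.
func cumulativeToPerBucket(cumulative []uint64, total uint64) ([]int64, error) {
	counts := make([]int64, 0, len(cumulative)+1)
	var prev uint64
	for i, c := range cumulative {
		if c < prev {
			return nil, fmt.Errorf("bucket %d: cumulative count %d is below previous %d", i, c, prev)
		}
		counts = append(counts, int64(c-prev))
		prev = c
	}
	if total < prev {
		return nil, fmt.Errorf("total %d is below last cumulative count %d", total, prev)
	}
	// Everything above the last finite bound goes to the overflow bucket.
	counts = append(counts, int64(total-prev))
	return counts, nil
}

func main() {
	// Assumed example data: buckets le=0.1, le=0.5, le=1 and 10 samples in total.
	bounds := []float64{0.1, 0.5, 1}
	cumulative := []uint64{2, 5, 9}

	counts, err := cumulativeToPerBucket(cumulative, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(bounds, counts)
}
```

With the assumed data, the cumulative counts 2, 5, 9 and a total of 10 turn into the per-bucket counts 2, 3, 4, 1. The resulting slice is what would go into BucketCounts, next to the bounds in Bounds.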

Gauge and Counter metrics are very similar to those in Google Cloud Monitoring, so it was fairly easy to prepare the transformation for them. Scraping the Prometheus metrics was also relatively simple thanks to the expfmt package from the Prometheus common repository.
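If you have never looked at what a scraper actually has to read, the following sketch may help. Note that it does not use expfmt: it is a deliberately simplified, standard-library-only stand-in that handles just the basic "name{labels} value [timestamp]" lines of the Prometheus text format. The sample type and the parseSamples function are hypothetical names for this example, and for real use the expfmt package is the better choice.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample is a hypothetical type for this sketch: one parsed metric line.
type sample struct {
	Series string  // metric name, including any {label="value"} part
	Value  float64 // sample value; an optional timestamp is ignored
}

// parseSamples reads text in the Prometheus exposition format.
// Blank lines and comment lines (# HELP, # TYPE) are skipped.
func parseSamples(text string) ([]sample, error) {
	var out []sample
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The series ends at the closing brace when labels are present,
		// otherwise at the first whitespace character.
		end := strings.LastIndex(line, "}") + 1
		if end == 0 {
			end = strings.IndexAny(line, " \t")
			if end < 0 {
				return nil, fmt.Errorf("no value in line %q", line)
			}
		}
		fields := strings.Fields(line[end:])
		if len(fields) == 0 {
			return nil, fmt.Errorf("no value in line %q", line)
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in line %q: %w", line, err)
		}
		out = append(out, sample{Series: strings.TrimSpace(line[:end]), Value: v})
	}
	return out, sc.Err()
}

func main() {
	// Assumed example payload, as a /metrics endpoint might return it.
	text := "# TYPE up gauge\nup 1\nhttp_requests_total{code=\"200\"} 1027\n"

	samples, err := parseSamples(text)
	if err != nil {
		panic(err)
	}
	for _, s := range samples {
		fmt.Printf("%s = %g\n", s.Series, s.Value)
	}
}
```

The sketch keeps the labels glued to the metric name and ignores metric types entirely, which is exactly the kind of work a proper parser takes off your hands.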

The code of my scraper is available in the project repository. I am not a Go pro (also, I am not a developer). You are more than welcome to drop some smart comments about my lack of best practices. I do appreciate it!

The module

As always, I prepared a Terraform module for this. It accepts the scraping configuration and submits the metrics to the project in which the module is executed. Let’s take a look at an example:

module "metrics_push" {
  source     = "../" # use correct module path
  project_id = var.project
  scrape_jobs = {
    testing = {
      schedule = "* * * * *"
      endpoint = "https://example.com/metrics"
    }
  }
  region = var.region
}

Each key in the scrape_jobs map creates a Google Cloud Scheduler job. The job calls a Cloud Function containing the Go code for scraping. The function scrapes the metrics from the given HTTP endpoint and sends the data to the monitoring. The submitted metric type has the following form:

Type:   fmt.Sprintf("custom.googleapis.com/%s/%s", *config["SERVICE"], *v.Name),

SERVICE is a key from the scrape_jobs map. In our case, it is testing. And *v.Name is the name of a metric scraped from the Prometheus metrics endpoint.

Prometheus metrics scraping: Summary

I will not tell you this is the right way to work with Prometheus. I started to prepare this because I wanted to debug missing data in the monitoring. In the end, it also helped me use Prometheus on Cloud Run.

There are obviously issues which should be solved:

  • The scheduler is limited to cron, so samples can be scraped once a minute at most.
  • Services usually do not expose their metrics endpoints to the outside world. E.g. for services running in GCE, a Serverless VPC Access connector has to be used for the Cloud Function.
  • And I am sure you will find more of them yourselves.

One other thing which I actually like about GCP: once I create a tool for something which I think is missing, it will appear as a service the very next day. You may have the feeling my time was wasted, but, in a way, I am always happy to use something with an SLA. It is much more satisfying to blame somebody other than me, especially if it’s a monitoring tool.

Anyway, if you ever need to scrape Prometheus metrics and push them into Google Cloud Monitoring, look at the project. I do appreciate any feedback. The Terraform module is available in the registry. Like & subscribe.

Martin Beránek
DevOps Team Lead
Martin spent the last few years working as an architect of cloud solutions. His main focus ever since he joined Ackee has been implementing procedures to speed up the whole development process.
