
SysEng coding challenge: Prometheus exporter

Thanks for trying our coding challenge for systems engineers.

Prerequisites

You need to be able to run Docker on your development computer to work on this challenge. Please run

docker run -it -p 8080:8080 beorn7/syseng-challenge

Then run curl http://localhost:8080/stats. The output should look similar to the following:

{
  "requestCounters": {
    "200": 65221,
    "404": 14066,
    "500": 12618
  },
  "requestRates": {
    "200": 100,
    "404": 1
  },
  "duration": {
    "count": 91905,
    "sum": 4484.3037570333245,
    "average": 0.024613801985478054
  }
}

If you aren't familiar with the Prometheus monitoring and alerting system yet, you should get acquainted with it. A good starting point is Brian Brazil's very concise talk at FOSDEM 2016.

The Challenge

Imagine the little binary you have started above is an instance of a microservice that runs replicated, with hundreds of instances, on a computing cluster. For the sake of the challenge, we are not interested in what the service actually does. We are only interested in its metrics, and we want to simulate monitoring it with Prometheus. Thankfully, the service provides metrics in JSON format via its /stats endpoint, as you have seen above.

The requestCounters tell you how often each HTTP status code has been served during the lifetime of the binary. The requestRates tell you the same but only for the last second, i.e. they give you the current QPS. The duration tells you how many requests have been served in total during the lifetime of the binary (count) and how much total time those requests have taken in seconds (sum). The average is the time in seconds a request has taken, averaged over the last second.
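The following Python sketch makes the field semantics concrete. It parses the sample payload shown above and compares the two duration figures. sum/count gives the mean over the binary's whole lifetime, whereas average covers only the last second, so the two need not agree.

It also checks that, in this particular sample, the per-code counters add up to duration.count. That equality is an observation about the sample, not a documented guarantee of the service. The function names are hypothetical.

```python
import json

# The sample /stats payload from above, with the same values.
SAMPLE = """
{
  "requestCounters": {"200": 65221, "404": 14066, "500": 12618},
  "requestRates": {"200": 100, "404": 1},
  "duration": {"count": 91905, "sum": 4484.3037570333245, "average": 0.024613801985478054}
}
"""


def lifetime_mean_seconds(stats: dict) -> float:
    """Mean request duration over the whole lifetime of the binary (sum / count)."""
    duration = stats["duration"]
    if duration["count"] == 0:
        return 0.0  # no requests served yet; avoid division by zero
    return duration["sum"] / duration["count"]


def counters_match_count(stats: dict) -> bool:
    """True if the per-status-code counters add up to duration.count.

    This holds for the sample above; that the service always guarantees it is an assumption.
    """
    return sum(stats["requestCounters"].values()) == stats["duration"]["count"]


stats = json.loads(SAMPLE)
print(f"lifetime mean:       {lifetime_mean_seconds(stats):.6f} s")
print(f"last-second average: {stats['duration']['average']:.6f} s")
print("counters add up to count:", counters_match_count(stats))
```

For the sample, the lifetime mean works out to roughly 0.0488 s, about twice the last-second average of roughly 0.0246 s. This illustrates why the two figures should not be confused.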

Unfortunately, Prometheus cannot ingest JSON directly but requires its own exposition format. Usually, you would instrument your microservice so that it exposes metrics in a Prometheus-compatible format directly. Let's imagine that this direct instrumentation is, for some reason, not feasible in this case. (In reality, this situation often arises when monitoring third-party software that does not happen to be instrumented for Prometheus specifically.) The usual solution is to write a so-called exporter: a little glue program that retrieves metrics from a third-party system and exposes them in the Prometheus way.

Your task is to write, in a language of your choice, such an exporter for the simulated microservice running in your Docker container right now.
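As one possible starting point, here is a minimal Python sketch of such an exporter that uses only the standard library. On every scrape of /metrics, it fetches /stats from the service and translates the result into the Prometheus text exposition format.

Everything specific here is an assumption made for illustration:

- the metric names, all prefixed syseng_;
- the listen port 9123;
- the upstream URL;
- the choice to expose only the lifetime counters, plus duration count and sum as a summary without quantiles.

requestRates and average are dropped on the reasoning that Prometheus can derive rates and averages itself from the cumulative values.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Assumptions for this sketch: where the service runs and where the exporter listens.
UPSTREAM_URL = "http://localhost:8080/stats"
LISTEN_PORT = 9123  # hypothetical port choice


def fetch_stats(url: str = UPSTREAM_URL, timeout: float = 2.0) -> dict:
    """Fetch and decode the JSON from the service's /stats endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def render_metrics(stats: Optional[dict]) -> str:
    """Translate a /stats payload into the Prometheus text exposition format.

    Passing None (upstream unreachable or malformed) yields only syseng_up 0.
    """
    lines = [
        "# HELP syseng_up Whether the last fetch of /stats succeeded.",
        "# TYPE syseng_up gauge",
        f"syseng_up {0 if stats is None else 1}",
    ]
    if stats is None:
        return "\n".join(lines) + "\n"

    lines += [
        "# HELP syseng_http_requests_total Requests served, by HTTP status code.",
        "# TYPE syseng_http_requests_total counter",
    ]
    # Sort by status code so the output is deterministic.
    for code, value in sorted(stats["requestCounters"].items()):
        lines.append(f'syseng_http_requests_total{{code="{code}"}} {value}')

    duration = stats["duration"]
    lines += [
        "# HELP syseng_http_request_duration_seconds Time spent serving requests.",
        "# TYPE syseng_http_request_duration_seconds summary",
        f"syseng_http_request_duration_seconds_count {duration['count']}",
        f"syseng_http_request_duration_seconds_sum {duration['sum']}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        try:
            body = render_metrics(fetch_stats())
        except (OSError, ValueError, KeyError):  # network errors, timeouts, bad JSON or shape
            body = render_metrics(None)
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def main() -> None:
    """Serve /metrics on LISTEN_PORT until interrupted."""
    HTTPServer(("", LISTEN_PORT), MetricsHandler).serve_forever()
```

Calling main() starts the server. With the challenge container running, curl http://localhost:9123/metrics should then return the translated metrics.

When the fetch fails, the sketch still answers the scrape and reports the failure through the hypothetical syseng_up gauge. This keeps "the exporter is down" distinguishable from "the service is down". Many exporters follow this pattern, but it is a design choice, not a requirement.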

It might also be helpful to start a Prometheus server, scrape your exporter, and explore what your metrics make possible. The simulated microservice instance loops through a number of scenarios over the course of about 15 minutes.
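When exploring lifetime counters in Prometheus, graphs of current traffic come from functions such as rate(). These turn a series of cumulative samples into a per-second rate.

The sketch below is a deliberately simplified Python illustration of that core idea, including the handling of counter resets (for example, when the binary restarts). It is not Prometheus's actual algorithm, which additionally extrapolates to the edges of the range window. The scrape timestamps and values are made up.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter over the given samples.

    A drop in value is treated as a counter reset: the counter is assumed to have
    restarted from zero, so the new value itself counts as the increase since the reset.
    Unlike Prometheus's rate(), no extrapolation to the window boundaries is done.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / elapsed


# Hypothetical scrapes of a request counter every 15 s, with a restart after t=30.
scrapes = [(0.0, 1000.0), (15.0, 2500.0), (30.0, 4000.0), (45.0, 1500.0)]
print(simple_rate(scrapes))  # prints 100.0
```

This is one argument for exporting cumulative counters rather than relying on a last-second gauge like requestRates. Sampled once per scrape interval, such a gauge only shows what the most recent second happened to look like. Counters let Prometheus compute rates over any window without losing the requests served between scrapes.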

Bonus questions

Optionally, you may answer the following questions. Thinking about them might also help you solve the coding challenge in a meaningful way. Keep answers short. It's really just about sketching out a few ideas. If we invite you for on-site interviews, we will have plenty of time to discuss them in detail.

  1. What are good ways of deploying hundreds of instances of our simulated service? How would you deploy your exporter? And how would you configure Prometheus to monitor them all?
  2. What graphs about the service would you plot in a dashboard builder like Grafana? Ideally, you can come up with PromQL expressions for them.
  3. What would you alert on? What would be the urgency of the various alerts? Again, it would be great if you could formulate alerting conditions with PromQL.
  4. If you were in control of the microservice, which exported metrics would you add or modify next?