Richard Hillmann f971847235 comment out docker-compose 9 years ago
vendor save with vendor 9 years ago
.gitignore init project 9 years ago
Dockerfile save update 9 years ago
INSTRUCTIONS.md init project 9 years ago
Makefile save with vendor 9 years ago
README.md update readme 9 years ago
docker-compose.yml comment out docker-compose 9 years ago
exporter.go fix Readme 9 years ago
exporter_test.go update makefile to work with busybox 9 years ago
prometheus.yml save update 9 years ago
test.yml fix Readme 9 years ago

README.md

Challenge: Exporter for Prometheus

This is a Prometheus metrics exporter for the SoundCloud challenge.

Requirements

  • Docker, of course ;-)
  • A working Go 1.8 installation (older versions should work as well)

Build

Build the exporter binary:

make

Build the Docker image with the tag challenge/exporter:

make docker

Run

Start the service and the exporter locally in Docker:

docker run -d -p 8080:8080 --name challenge beorn7/syseng-challenge
docker run -d -p 8081:8080 --link challenge --name challenge_exporter challenge/exporter --endpoint http://challenge:8080/stats

# Stop and remove both containers when done
docker stop challenge_exporter
docker rm challenge_exporter
docker stop challenge
docker rm challenge

Or with docker-compose:

docker-compose up

Metrics should now be available on http://localhost:8081/metrics

Bonus

  1. What are good ways of deploying hundreds of instances of our simulated service? How would you deploy your exporter? And how would you configure Prometheus to monitor them all?

    This is fairly easy with Kubernetes: run the exporter alongside the app in the same pod, managed by a ReplicationController.

    Note: This config is only a proof of concept and not fully tested:

     apiVersion: v1
     kind: ReplicationController
     metadata:
       name: replicatedapp
     spec:
       replicas: 100
       selector:
         app: exportedapp
       template:
         metadata:
           name: podapp
           annotations:
             prometheus.io/scrape: "true"
             prometheus.io/port: "8081"
           labels:
             app: exportedapp
         spec:
           containers:
           - name: challenge
             image: beorn7/syseng-challenge
             ports:
             - containerPort: 8080
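           - name: exporter
             # Image tag as built by "make docker"
             image: challenge/exporter
             # Containers in a pod share localhost, so the app is reachable there
             args: ["--endpoint", "http://localhost:8080/stats"]
             # Assumes the exporter is set up to listen on 8081 inside the pod
             ports:
             - containerPort: 8081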
    

    Then let Prometheus find the pods through its Kubernetes service discovery:

    - job_name: kube-app
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        # Keep only pods annotated with prometheus.io/scrape: "true"
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        # Point the scrape address at the port from the prometheus.io/port annotation
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        # Expose the pod name as the "pod" label
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: pod
    

    DNS-based discovery may be an alternative, for example with CoreDNS.
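
The address relabel rule in the scrape config above is the least obvious part, so here is a small Go sketch of what it does. It assumes Prometheus' documented relabel behaviour (source labels joined with `;`, regex anchored at both ends, label left unchanged when nothing matches); the function name `rewriteAddress` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Prometheus anchors relabel regexes at both ends, so the pattern from the
// scrape config is wrapped in ^(?:...)$ here.
var addressRule = regexp.MustCompile(`^(?:([^:]+)(?::\d+)?;(\d+))$`)

// rewriteAddress mimics the "replace" relabel step: the two source labels are
// joined with ';', matched against the regex, and on a match the new
// __address__ is built from "$1:$2". Without a match the address is returned
// unchanged.
func rewriteAddress(address, annotatedPort string) string {
	joined := address + ";" + annotatedPort
	m := addressRule.FindStringSubmatchIndex(joined)
	if m == nil {
		return address
	}
	return string(addressRule.ExpandString(nil, "${1}:${2}", joined, m))
}

func main() {
	// Discovered address with the app port, annotation asks for 8081.
	fmt.Println(rewriteAddress("10.0.0.5:8080", "8081"))
	// Discovered address without any port.
	fmt.Println(rewriteAddress("10.0.0.5", "8081"))
	// Missing annotation: no match, address stays as discovered.
	fmt.Println(rewriteAddress("10.0.0.5:8080", ""))
}
```

The first two calls yield `10.0.0.5:8081`; the third keeps `10.0.0.5:8080`, which is why the `keep` rule on the scrape annotation should come first.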

  2. What graphs about the service would you plot in a dashboard builder like Grafana?

    As a rule, graph only what requires attention. It makes little sense to put metrics on a dashboard that nobody ever needs to act on. Less is more.

    Assuming we run a fleet of this service and monitor every instance, it makes sense to graph aggregates over groups rather than individual instances.

    Graph Request rates per code (QPS):

    sum(app_request_rates) by (code)
    

    Graph Highest latencies:

    max(app_duration_avg)
    

    Singlestat Running instances:

    count_scalar(app_up == 1)
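
For readers less familiar with PromQL aggregation, the three queries above can be sketched in Go over in-memory samples. This only illustrates the semantics and is not how Prometheus evaluates queries; the `sample` type and the function names are made up for this example.

```go
package main

import "fmt"

// sample is one time series value at a single point in time.
// The field names are illustrative.
type sample struct {
	instance string
	code     string
	value    float64
}

// sumByCode mirrors sum(app_request_rates) by (code): all instances are
// collapsed, one result per status code remains.
func sumByCode(samples []sample) map[string]float64 {
	out := map[string]float64{}
	for _, s := range samples {
		out[s.code] += s.value
	}
	return out
}

// maxValue mirrors max(app_duration_avg); ok is false for an empty input.
func maxValue(samples []sample) (highest float64, ok bool) {
	for i, s := range samples {
		if i == 0 || s.value > highest {
			highest = s.value
		}
	}
	return highest, len(samples) > 0
}

// countUp mirrors counting the series that pass the app_up == 1 filter.
func countUp(samples []sample) int {
	n := 0
	for _, s := range samples {
		if s.value == 1 {
			n++
		}
	}
	return n
}

func main() {
	rates := []sample{
		{"a", "200", 10}, {"b", "200", 5}, {"a", "500", 2.5},
	}
	fmt.Println(sumByCode(rates))

	durations := []sample{{"a", "", 0.12}, {"b", "", 0.48}}
	fmt.Println(maxValue(durations))

	up := []sample{{"a", "", 1}, {"b", "", 0}, {"c", "", 1}}
	fmt.Println(countUp(up))
}
```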
    
  3. What would you alert on? What would be the urgency of the various alerts?

    High: Too few app instances are up to handle all requests.

    Medium/High: Request times are too high (the priority depends on the latency).

    Medium: Too many bad/failed requests (5xx codes) in comparison to succeeded ones (2xx).

  4. If you were in control of the microservice, which exported metrics would you add or modify next?

    Depends a little bit on the service, but probably these will be useful:

    • CPU/RAM utilization, and probably network throughput.
    • Average request duration per code and method.
    • Request rates per code and method.

    In general, collect more metrics than you need at the moment. The more you have, the more likely it is that an issue can be debugged with a metric that is not actively monitored.