docker-compose up
```

Metrics should now be available on `http://localhost:8081/metrics`

Bonus
=====

1. What are good ways of deploying hundreds of instances of our simulated
service? How would you deploy your exporter? And how would you configure
Prometheus to monitor them all?

Pretty easy with *Kubernetes*.
Just run the exporter alongside the app in the same pod, managed by a ReplicationController:
```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: replicated-app        # object names must be lowercase (DNS-1123)
spec:
  replicas: 100
  selector:
    app: exportedApp
  template:
    metadata:
      # Pod names are generated from the controller name, so none is set here.
      annotations:
        # Annotation values must be strings, hence the quotes.
        prometheus.io/scrape: "true"
        prometheus.io/port: "8081"
      labels:
        app: exportedApp
    spec:
      containers:
        - name: challenge
          image: beorn7/syseng-challenge
          ports:
            - containerPort: 8080
        # Containers in a pod share localhost, so the exporter can reach the app on port 8080.
        - name: exporter
          image: exporter
          ports:
            - containerPort: 8081
```

Just use the Kubernetes service discovery in Prometheus:
```yaml
scrape_configs:
  - job_name: kube-app
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only keep pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Point the target at the port given in the prometheus.io/port annotation.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Attach the pod name to every scraped series.
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: podApp
```

DNS-based service discovery may be an alternative, for example with CoreDNS.
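
A minimal `dns_sd_configs` sketch of that alternative, assuming the exporters are published under a hypothetical A record `exporter.app.example.internal` (for example, a headless Kubernetes Service resolved by CoreDNS returns one A record per pod). A records carry no port, so `port` must be set explicitly:

```yaml
scrape_configs:
  - job_name: app-dns
    dns_sd_configs:
      # Hypothetical record name; use whatever name lists your exporters.
      - names:
          - exporter.app.example.internal
        type: A            # A records only return IP addresses...
        port: 8081         # ...so the exporter port is supplied here
        refresh_interval: 30s
```

With SRV records (`type: SRV`) the port would come from DNS instead of the config.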

2. What graphs about the service would you plot in a dashboard builder like Grafana?

Usually, graph everything that may require attention.
It does not make sense to monitor metrics or graphs that nobody ever needs to act on. Less is more.

Assuming we run a fleet of instances of this service and monitor all of them, it makes sense to graph aggregates across the fleet.

*Graph* Request rates per code (QPS):
```
sum(app_request_rates) by (code)
```

*Graph* Highest latencies:
```
max(app_duration_avg)
```

*Singlestat* Running instances:
```
count(app_up == 1)
```

3. What would you alert on? What would be the urgency of the various alerts?

- High: too few instances are up to handle all requests.
- Medium/High: request latency is too high (the priority depends on how high).
- Medium: too many failed requests (5xx codes) compared to successful ones (2xx codes).
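
These alerts could be expressed as Prometheus alerting rules roughly like the sketch below. It reuses the metric names from the queries above; the alert names, all thresholds, the `severity` label values, and the assumption that `code` holds values such as `"200"` and `"500"` are illustrative placeholders, not values from the service:

```yaml
groups:
  - name: app-alerts
    rules:
      # High: fewer healthy instances than needed (80 of 100 is a placeholder).
      # absent() also fires if no instance reports app_up == 1 at all.
      - alert: TooFewAppInstancesUp
        expr: count(app_up == 1) < 80 or absent(app_up == 1)
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Too few app instances are up"
      # Medium/High: latency. The unit of app_duration_avg is not known here,
      # so both thresholds are placeholders; the higher one escalates.
      - alert: AppLatencyHigh
        expr: max(app_duration_avg) > 500
        for: 10m
        labels:
          severity: warning
      - alert: AppLatencyCritical
        expr: max(app_duration_avg) > 2000
        for: 5m
        labels:
          severity: critical
      # Medium: ratio of 5xx to 2xx request rates above 5% (placeholder).
      - alert: AppErrorRatioHigh
        expr: |
          sum(app_request_rates{code=~"5.."})
            / sum(app_request_rates{code=~"2.."}) > 0.05
        for: 10m
        labels:
          severity: warning
```

The `for:` clauses keep short spikes from paging anyone, in line with only alerting on what needs action.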

4. If you were in control of the microservice, which exported metrics would you
add or modify next?

It depends a little on the service, but these would probably be useful:
- CPU/RAM utilization, and probably network throughput.
- Average request duration per code and method.
- Request rates per code and method.
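
As a sketch of how such metrics could be consumed: if the service exported its request durations as a histogram labelled by `code` and `method` (the metric name `app_request_duration_seconds` and the `method` label are hypothetical here), the per-code/method averages and rates could be precomputed with recording rules:

```yaml
groups:
  - name: app-recording
    rules:
      # Average duration per code and method: time spent divided by request count.
      - record: code_method:app_request_duration_seconds:avg5m
        expr: |
          sum by (code, method) (rate(app_request_duration_seconds_sum[5m]))
            /
          sum by (code, method) (rate(app_request_duration_seconds_count[5m]))
      # Request rate (QPS) per code and method, taken from the histogram's _count.
      - record: code_method:app_requests:rate5m
        expr: sum by (code, method) (rate(app_request_duration_seconds_count[5m]))
      # 99th percentile latency per method, only possible because of the buckets.
      - record: method:app_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (method, le) (rate(app_request_duration_seconds_bucket[5m])))
```

A single histogram covers both the duration and the rate bullet points, since its `_count` series already counts requests.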

In general, collect more metrics than you need at the moment.
The more metrics you have, the better the chance that an issue can be debugged with a metric that is not actively monitored.