service? How would you deploy your exporter? And how would you configure
Prometheus to monitor them all?

Pretty easy with **kubernetes**.

Just run the exporter alongside the app in a pod with a ReplicationController:

_Note: Config is just a proof of concept, not fully tested:_

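A minimal sketch of such a setup, assuming the app listens on port 8080 and a sidecar exporter exposes its metrics on port 9101 (the image names, ports, and labels here are placeholders, not the actual proof-of-concept config):

```
apiVersion: v1
kind: ReplicationController
metadata:
  name: app
spec:
  replicas: 3
  selector:
    app: app
  template:
    metadata:
      labels:
        app: app
      annotations:
        # Hint for annotation-based discovery (see the scrape config below).
        prometheus.io/scrape: "true"
        prometheus.io/port: "9101"
    spec:
      containers:
        # The service itself.
        - name: app
          image: example/app:latest
          ports:
            - containerPort: 8080
        # Exporter sidecar: exposes the app's stats as Prometheus metrics.
        - name: exporter
          image: example/app-exporter:latest
          ports:
            - containerPort: 9101
```

Because the exporter lives in the same pod, it can reach the app over localhost, and every replica automatically gets its own exporter.
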
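To let Prometheus monitor them all, one common pattern is annotation-based Kubernetes service discovery; a rough sketch (the job name and the prometheus.io/* annotation scheme are assumptions, and details vary per setup):

```
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Scrape the port named in the prometheus.io/port annotation.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

With this in place, scaling the ReplicationController up or down changes the scrape targets automatically, with no further Prometheus configuration.
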
Assuming we have a fleet of this service and monitor all of them, it makes sense to graph them in aggregated groups.

**Graph** Request rates per code (QPS):

```
sum(app_request_rates) by (code)
```

**Graph** Highest latencies:

```
max(app_duration_avg)
```

**Singlestat** Running instances:

```
count_scalar(app_up == 1)
```

3. What would you alert on? What would be the urgency of the various alerts?

High: Too few apps are up (to handle all requests)

Middle/High: Request times are too high (priority depends on latency)

Middle: Too many bad/failed requests (5xx codes) in comparison to successful (2xx) ones

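As a sketch, these three alerts could look roughly like the following in the YAML rule-file format of newer Prometheus versions; the expressions reuse the metrics from the graphs above, while the thresholds, `for` durations, and severity labels are placeholders that would need tuning per service:

```
groups:
  - name: app.rules
    rules:
      # High: too few instances left to serve the traffic.
      - alert: AppTooFewInstancesUp
        expr: count(app_up == 1) < 3
        for: 5m
        labels:
          severity: high
      # Middle/High: latencies are too high; a second rule with a higher
      # threshold could escalate the severity.
      - alert: AppHighLatency
        expr: max(app_duration_avg) > 0.5
        for: 10m
        labels:
          severity: middle
      # Middle: too many 5xx responses compared to 2xx responses.
      - alert: AppTooManyErrors
        expr: sum(app_request_rates{code=~"5.."}) / sum(app_request_rates{code=~"2.."}) > 0.05
        for: 10m
        labels:
          severity: middle
```
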
4. If you were in control of the microservice, which exported metrics would you
add or modify next?

It depends a little on the service, but these would probably be useful:

- CPU/RAM utilization, and probably network throughput.
- Average request duration per code and method.
- Request rates per code and method (example queries are sketched below).

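If the last two metrics were exported as raw counters and histograms rather than only pre-computed averages and rates, per-code/per-method breakdowns and latency percentiles could be derived directly in PromQL; for example (metric names such as `app_requests_total` and `app_request_duration_seconds` are hypothetical):

```
sum(rate(app_requests_total[5m])) by (code, method)

histogram_quantile(0.95, sum(rate(app_request_duration_seconds_bucket[5m])) by (le, code, method))
```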