
Monitoring

Prometheus is a metrics-based tool for operational monitoring of computer systems, i.e. it’s useful for:

At the end of the day all monitoring systems are data processing pipelines.

Monitoring is about events, such as receiving an HTTP request, entering a function, a user logging in, or requesting more memory from the kernel. All events also have context, such as the IP address a request comes from or the call stack of a function.

Keeping all the context for every event all the time is impractical. Ways to reduce the amount of data to something workable:
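One such reduction is the metrics approach that Prometheus takes: count events and keep only a small, chosen part of their context. A minimal Python sketch, assuming made-up events and field names (nothing here is a Prometheus API):

```python
from collections import Counter

# Hypothetical events; each one carries its full context.
events = [
    {"type": "http_request", "path": "/login", "ip": "10.0.0.1", "user": "alice"},
    {"type": "http_request", "path": "/login", "ip": "10.0.0.2", "user": "bob"},
    {"type": "http_request", "path": "/home", "ip": "10.0.0.1", "user": "alice"},
]


def to_metric(events, keep=("path",)):
    """Count events, keeping only the chosen context fields as labels."""
    counts = Counter()
    for event in events:
        # Everything not listed in `keep` (IP, user, ...) is discarded.
        counts[tuple((k, event[k]) for k in keep)] += 1
    return dict(counts)


http_requests_total = to_metric(events)
for labels, count in http_requests_total.items():
    print(dict(labels), count)
```

The trade-off is visible in the result: the counts are cheap to store and query, but you can no longer tell which user or IP made a given request.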

Architecture and Components

                      +-------------+
                      |             |
                      | EC2, K8s,   |
                      | Consul, etc.|
                      +------^------+
                             |
                     +--------------------------------+
 +-------------+     |       |                        |
 | Application |     | +-----------+       Prometheus |
 |             |     | |           |                  |
 |    +--------+     | | Service   |                  |
 |    |Client  |     | | Discovery |                  |  +--------------+
 |    |Library <---+ | |           |                  |  | Email, PD,   |
 +-------------+   | | +-----------+                  |  | Slack, etc.  |
                   | |       |                        |  +------^-------+
                   | |       |                        |         |
                   | | +-----v-----+      +---------+ |  +--------------+
                   | | |           |      |         | |  |              |
 +-------------+   +---+ Scraping  |      | Rules & +----> Alertmanager |
 |  Exporter   <---+ | |           |      | alerts  | |  |              |
 +-------------+     | +-----------+      +--^------+ |  +--------------+
        |            |       |               |   |    |
        |            | +-----v-------------------v--+ |
 +------v------+     | |                            | |  +--------------+
 |             |     | |          Storage           +----> Dashboards   |
 | 3rd Party   |     | |                            | |  +--------------+
 | Application |     | |                            | |
 |             |     | +----------------------------+ |
 +-------------+     |                                |
                     +--------------------------------+
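To make the "Client Library" and "Exporter" boxes more concrete: what Prometheus scrapes from them over HTTP is plain text in the Prometheus exposition format. Below is a rough Python sketch of rendering that text by hand. The function name and the myapp_requests_total metric are made up for illustration, and label-value escaping is left out; in practice a client library does this for you.

```python
def render_metric(name, metric_type, help_text, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    Note: label values are not escaped in this sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        selector = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical application metric with one label.
text = render_metric(
    "myapp_requests_total",
    "counter",
    "Requests handled.",
    [({"path": "/login"}, 2), ({"path": "/home"}, 1)],
)
print(text)
```

Prometheus pulls this text on every scrape, so the application only has to keep the current values in memory.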

Node exporter

PromQL

https://prometheus.io/docs/prometheus/latest/querying/basics/

If you enter the up query into the expression browser and hit “Execute”, you get:

up{instance="localhost:9090",job="prometheus"}          1
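The same query can be run without the browser through the HTTP API (the /api/v1/query endpoint). The sketch below only builds the URL and parses a response. The base URL is an assumption (a local Prometheus on port 9090), and the response body is a hand-written sample in the API's JSON shape with a made-up timestamp, so no server is needed to try it.

```python
import json
from urllib.parse import urlencode


def query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body):
    """Return (labels, value) pairs from an instant-query JSON response."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise ValueError("query failed")
    # Each result holds the series labels and a [timestamp, "value"] pair.
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


# Hand-written sample response; the timestamp is invented.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{
            "metric": {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"},
            "value": [1700000000.0, "1"],
        }],
    },
})

url = query_url("http://localhost:9090", "up")
series = parse_instant_vector(sample_body)
print(url)
print(series)
```

Sample values arrive as strings in the JSON, which is why the sketch converts them with float().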

Metrics Types and Aggregations

Gauge

Total filesystem size on each machine (the node_filesystem_size_bytes metric comes from the Node exporter):

sum(node_filesystem_size_bytes) without(device, fstype, mountpoint)
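What sum ... without(...) does can be sketched in Python: drop the listed labels, then add up all series that end up with the same remaining labels. The sample series and their values below are made up, and the sketch ignores the metric name (PromQL drops it from the result as well).

```python
from collections import defaultdict


def sum_without(samples, drop):
    """Sum values of series that share all labels except those in `drop`."""
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        totals[key] += value
    return dict(totals)


# Hypothetical node_filesystem_size_bytes series: two filesystems on host-a, one on host-b.
samples = [
    ({"instance": "host-a", "device": "sda1", "fstype": "ext4", "mountpoint": "/"}, 100e9),
    ({"instance": "host-a", "device": "sdb1", "fstype": "xfs", "mountpoint": "/data"}, 50e9),
    ({"instance": "host-b", "device": "sda1", "fstype": "ext4", "mountpoint": "/"}, 200e9),
]

per_machine = sum_without(samples, drop={"device", "fstype", "mountpoint"})
for labels, total in per_machine.items():
    print(dict(labels), total)
```

Using without rather than by means any other labels on the series (job, for example) are preserved automatically.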

Counter

How many samples Prometheus is ingesting per second, averaged over one minute:

rate(prometheus_tsdb_head_samples_appended_total[1m])
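To get a feel for what rate computes, here is a deliberately simplified Python sketch: the total increase of the counter inside the window divided by the elapsed time, with a drop in value treated as a counter reset. The real rate function also extrapolates to the edges of the range window, which this sketch skips, and the sample data is made up.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    samples is a list of (timestamp_seconds, counter_value), oldest first.
    Returns None when there are too few samples to compute a rate.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. a restart): count up from zero.
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed


# Hypothetical scrapes 30 seconds apart.
steady = simple_rate([(0, 0), (30, 300), (60, 600)])
with_reset = simple_rate([(0, 100), (30, 400), (60, 50)])
print(steady, with_reset)
```

The reset handling is the reason counters are queried through rate rather than by looking at raw values: a restart does not show up as a negative spike.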

The output of rate is a gauge, so it can be aggregated like any other gauge, e.g. to get total bytes received per machine per second:

sum(rate(node_network_receive_bytes_total[5m])) without(device)

Selectors

You will almost always want to filter by the job label (which identifies the application type), e.g.:

process_resident_memory_bytes{job="kubelet"}

Matchers

=  (equal)           --> job="node"
!= (not equal)       --> job!="node"
=~ (regex match)     --> job=~"n.*"  # fully anchored, RE2
!~ (regex not match) --> job!~"n.*"
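"Fully anchored" is the part that tends to surprise people: the regex has to match the whole label value, not a substring. A small Python sketch of the four matchers follows. It uses Python's re module as a stand-in for RE2 (the two differ in some syntax), and the matches helper is made up for illustration.

```python
import re


def matches(labels, name, op, pattern):
    """Evaluate one label matcher against a series' labels."""
    # A missing label behaves like an empty string.
    value = labels.get(name, "")
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        # fullmatch mirrors the fully anchored behaviour: ^(pattern)$
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown operator: {op}")


print(matches({"job": "node"}, "job", "=~", "n.*"))
print(matches({"job": "kubelet-node"}, "job", "=~", "n.*"))
print(matches({"job": "kubelet-node"}, "job", "=~", ".*n.*"))
```

To match a substring you have to add .* on both sides yourself, as in the last call.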

Query examples

https://prometheus.io/docs/prometheus/latest/querying/examples/

Labels

There are two types of labels, although PromQL does not distinguish between them:

1) Instrumentation labels

2) Target labels

Kubernetes

You can run Prometheus in K8s and monitor K8s objects in two ways:

  1. Standard K8s objects (kinds) such as ConfigMap, Deployment and Service, plus access permissions so that Prometheus can access (monitor) K8s objects (sample manifest).
  2. Prometheus Operator, which uses the custom resource definition (CRD) feature of K8s to define custom K8s objects (such as Prometheus and PrometheusRule).

Prometheus (P8s) can discover targets to monitor by using the K8s API. These types of K8s service discovery are currently available in P8s:

Tips and tricks

Useful metrics

Sources