# Kubernetes metrics to watch for capacity management

Nice and easy metrics

## Information about Node

# Memory
Use bytes (IEC) and NOT bytes (SI); the difference:
The unit MiB was defined by the International Electrotechnical Commission (IEC)
1 mebibyte (MiB) = 1.048576 megabytes (MB)
50Mi = 50 mebibytes
50M = 50 megabytes
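
A quick sketch of the practical difference, reusing the node memory capacity metric shown further below:

# node memory capacity in MiB (IEC): divide by 1024^2
kube_node_status_capacity{resource="memory"} / (1024*1024)
# node memory capacity in MB (SI): divide by 10^6
kube_node_status_capacity{resource="memory"} / 1000000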

# CPU
500m = 500 millicores = 0.5 core
1000m = 1000 millicores = 1 CPU or 1 core
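
As a sketch, container CPU requests (a metric covered in the Pod section below) are exposed in cores, so multiplying by 1000 expresses them in millicores:

# CPU requests per container, converted from cores to millicores
kube_pod_container_resource_requests{resource="cpu"} * 1000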

# CPU capacity <cores>
kube_node_status_capacity{resource="cpu"}

# Memory capacity <bytes>
kube_node_status_capacity{resource="memory"}
# Memory capacity in GiB
kube_node_status_capacity{resource="memory"}/(1024*1024*1024)

# Seconds the CPU spent in each mode: idle, user, system, etc.
node_cpu_seconds_total

## Information about Pod

# Basic info about each container in a pod (image, container_id, etc.):
kube_pod_container_info

# The resource requests/limits set by a container:
kube_pod_container_resource_requests
kube_pod_container_resource_limits

# Current working set (set of memory pages touched recently by the threads in the process) in bytes; this is the value the OOM killer watches:
container_memory_working_set_bytes

# Cumulative CPU time (user time + system time) consumed in seconds:
container_cpu_usage_seconds_total

## PromQL query syntax

<metric_name>[{<label_1="value_1">,<label_N="value_N">}] <metric_value>
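For example, a single sample in this notation (label and sample values are made up):

node_cpu_seconds_total{mode="idle", instance="node1:9100"} 12345.67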

# label matchers
=  exact match
!= negative equality matcher: select time series whose label value is not equal to the given string
=~ regex matcher: select labels that regex-match
!~ negative regex matcher: select labels that do not regex-match

# label values that start with..., i.e. mountpoint label starts with `/run`
{mountpoint=~"/run.*"}

# label value does not start with..., i.e. exclude mountpoint labels that start with `/run`
{mountpoint!~"/run.*"}

# multiple label values, i.e. all time series of the metric for both the web and node jobs
node_disk_read_bytes_total{job=~"web|node"}

# rate vs irate functions
rate(v range-vector) calculates the per-second average rate of increase of the time series in the range vector.
Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.
rate should only be used with counters and native histograms.

irate(v range-vector) calculates the per-second instant rate of increase of the time series in the range vector.
Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.
irate calculates the rate only between the LAST TWO data points in an interval.
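
A side-by-side sketch on the same counter from the Node section above:

# smooth 5-minute average rate, good for alerting and dashboards
rate(node_cpu_seconds_total{mode="user"}[5m])
# rate from only the last two samples in the window, good for volatile, fast-moving graphs
irate(node_cpu_seconds_total{mode="user"}[5m])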

# Range vector: a set of time series containing a range of data points over a specified time range for each time series
node_boot_time_seconds[5m]

# Instant vector: a set of time series containing a single sample for each time series, all sharing the same timestamp. Applying rate over a range vector yields an instant vector: rate(range_vector) => instant_vector
rate(container_cpu_usage_seconds_total[5m])

# aggregation operators: aggregate the elements of an instant vector, resulting in a new instant vector with fewer elements
sum (calculate sum over dimensions)
count (count number of elements in the vector)
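
Two quick sketches using metrics that appear elsewhere in these notes:

# total CPU usage across all containers in the cluster
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
# total number of pods known to kube-state-metrics
count(kube_pod_info)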

# by clause: allows choosing which labels to aggregate along, i.e. group by (label) (metric)
group by (node,internal_ip) (kube_node_info)

# 99% quantile of the metric 
histogram_quantile(φ scalar, b instant-vector) calculates the φ-quantile (0 ≤ φ ≤ 1) from the buckets b of a classic histogram
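
For example, the 99th-percentile request latency from the histogram metric used later in these notes:

# seconds within which 99% of requests completed over the last 5 minutes
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))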

## PromQL query 101

  • Every metric has by default these 2 labels: job and instance
  • Quantiles are typically applied to latency metrics like http_request_duration_seconds. Quantiles don't make sense directly on metrics with the _total suffix, e.g. http_requests_total, because these are plain counters (a total count with no buckets). If you do need a quantile, apply histogram_quantile to the histogram's _bucket submetric, e.g. histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
  • Submetrics for a histogram metric: _count [total number of observations], _bucket [number of observations for a specific bucket], _sum [sum of all observations]
  • Ignoring certain labels when matching, i.e. when dividing 2 metrics where one has a label that the other does not have, simply ignore the class label of node_filesystem_size_bytes: node_filesystem_avail_bytes / ignoring(class) node_filesystem_size_bytes
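  • Some queries below also join labels from one metric onto another with on() and group_left; a minimal sketch of the pattern, using metrics that appear later in these notes:
# attach the human-readable nodename from node_uname_info to node_load1
# node_uname_info has value 1, so the multiplication keeps the load value
node_load1 * on(instance) group_left(nodename) node_uname_info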

## PromQL queries

  • Available Capacity per Node: CPU [cores]
sum by (node) (kube_node_status_capacity{resource="cpu",unit="core"})
sum by (node) (kube_node_status_capacity{resource="cpu",unit="core", instance=~".*:8080",service="prometheus-systematic-kube-state-metrics"})
# cores per node, counted from node-exporter's per-CPU series
count(node_cpu_seconds_total{mode="user"}) by (instance)
  • CPU usage percentage: subtract the idle percentage from 100
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle",instance=~".*:9100"}[5m])) * 100)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle",instance=~".*:9100",service="prometheus-systematic-prometheus-node-exporter"}[5m]) * 100) * on(instance) group_left(nodename) node_uname_info) 
  • Node load: a measure of CPU saturation that averages the number of running and runnable processes at any one moment
# calculate saturation: 1-minute load average per CPU core for each instance
node_load1 / on(instance) group_left() count(node_cpu_seconds_total{mode="user"}) by (instance)
  • Gauge/stat for the number of pods in each namespace:
sum(kube_pod_info) by (namespace)
# pods in Failed phase per namespace
sum by (namespace) (kube_pod_status_phase{phase="Failed"})
  • Average Duration: average request duration (in seconds) over the past 5 minutes
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
  • Containers without Memory/CPU limits per namespace:
# without CPU limits
sum by (namespace)(count by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))

# without Memory limits
sum by (namespace)(count by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="memory"}))
  • Containers whose CPU usage is close to limits:
(sum by (namespace,pod,container)(rate(container_cpu_usage_seconds_total{container!=""}[5m])) / sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"})) > 0.8
  • Containers whose Memory usage is close to limits:
(sum by (namespace,pod,container)(container_memory_usage_bytes{container!=""}) / sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="memory"})) > 0.8
  • Top 10 containers without limits using CPU:
topk(10,sum by (namespace,pod,container)(rate(container_cpu_usage_seconds_total{container!=""}[5m])) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))
  • Top 10 containers without limits using Memory:
topk(10,sum by (namespace,pod,container)(container_memory_usage_bytes{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="memory"}))
  • Memory requests and limits:
# pod == label_values(pod)
kube_pod_container_resource_requests{pod="$pod", resource="memory"}
kube_pod_container_resource_limits{pod="$pod", resource="memory"}
  • Memory usage:
# pod == label_values(pod)
container_memory_working_set_bytes{name!~"POD",pod="$pod"}
Note: avoid scenarios where one node is overloaded while another node in the cluster is underutilized.
kubectl top shows usage, not allocation; allocation (requests) is what causes the "insufficient CPU" scheduling problem.
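
A sketch for spotting that, assuming kube-state-metrics (as used throughout these notes): compare CPU requested by pods on each node against the node's allocatable CPU.

# fraction of each node's allocatable CPU already claimed by requests
sum by (node) (kube_pod_container_resource_requests{resource="cpu"}) / sum by (node) (kube_node_status_allocatable{resource="cpu"})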
