# Kubernetes metrics to watch for capacity management

Nice and easy metrics

## Information about Node

# Memory
Use bytes (IEC) and NOT bytes (SI); the difference:
The unit MiB was defined by the International Electrotechnical Commission (IEC)
1 mebibyte (MiB) = 1.048576 megabytes (MB)
50Mi = 50 mebibytes
50M = 50 megabytes
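
A quick sketch of the practical difference, reusing the node memory capacity metric shown further below:

# node memory capacity in MiB (IEC): divide by 1024^2
kube_node_status_capacity{resource="memory"} / (1024*1024)
# node memory capacity in MB (SI): divide by 10^6
kube_node_status_capacity{resource="memory"} / 1000000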

# CPU
500m = 500 millicores = 0.5 core
1000m = 1000 millicores = 1 CPU or 1 core
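
As a sketch, container CPU requests (a metric covered in the Pod section below) are exposed in cores, so multiplying by 1000 expresses them in millicores:

# CPU requests per container, converted from cores to millicores
kube_pod_container_resource_requests{resource="cpu"} * 1000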

# CPU capacity <cores>
kube_node_status_capacity{resource="cpu"}

# Memory capacity <bytes>
kube_node_status_capacity{resource="memory"}
# Memory capacity in GiB
kube_node_status_capacity{resource="memory"}/(1024*1024*1024)

# Seconds the CPU spent in each mode: idle, user, system, etc.
node_cpu_seconds_total

## Information about Pod

# Basic info about each container in a pod (image, container_id, etc.):
kube_pod_container_info

# The resource requests/limits set by a container:
kube_pod_container_resource_requests
kube_pod_container_resource_limits

# Current working set (set of memory pages touched recently by the threads in the process) in bytes; this is the value the OOM killer watches:
container_memory_working_set_bytes

# Cumulative CPU time (user time + system time) consumed in seconds:
container_cpu_usage_seconds_total

## PromQL query syntax

<metric_name>[{<label_1="value_1">,<label_N="value_N">}] <metric_value>
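For example, a single sample in this notation (label and sample values are made up):

node_cpu_seconds_total{mode="idle", instance="node1:9100"} 12345.67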

# label matchers
=  exact match
!= negative equality matcher: select time series whose label value is not equal to the given string
=~ regex matcher: select labels that regex-match
!~ negative regex matcher: select labels that do not regex-match

# label values that start with..., i.e. mountpoint label starts with `/run`
{mountpoint=~"/run.*"}

# label value does not start with..., i.e. exclude mountpoint labels that start with `/run`
{mountpoint!~"/run.*"}

# multiple label values, i.e. all time series of the metric for both the web and node jobs
node_disk_read_bytes_total{job=~"web|node"}

# rate vs irate functions
rate(v range-vector) calculates the per-second average rate of increase of the time series in the range vector.
Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.
rate should only be used with counters and native histograms.

irate(v range-vector) calculates the per-second instant rate of increase of the time series in the range vector.
Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.
irate calculates the rate only between the LAST TWO data points in an interval.
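
A side-by-side sketch on the same counter from the Node section above:

# smooth 5-minute average rate, good for alerting and dashboards
rate(node_cpu_seconds_total{mode="user"}[5m])
# rate from only the last two samples in the window, good for volatile, fast-moving graphs
irate(node_cpu_seconds_total{mode="user"}[5m])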

# Range vector: a set of time series containing a range of data points over a specified time range for each time series
node_boot_time_seconds[5m]

# Instant vector: a set of time series containing a single sample for each time series, all sharing the same timestamp. Applying rate over a range vector yields an instant vector: rate(range_vector) => instant_vector
rate(container_cpu_usage_seconds_total[5m])

# aggregation operators: aggregate the elements of an instant vector, resulting in a new instant vector with fewer elements
sum (calculate sum over dimensions)
count (count number of elements in the vector)
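
Two quick sketches using metrics that appear elsewhere in these notes:

# total CPU usage across all containers in the cluster
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
# total number of pods known to kube-state-metrics
count(kube_pod_info)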

# by clause: allows choosing which labels to aggregate along, i.e. group by (label) (metric)
group by (node,internal_ip) (kube_node_info)

# 99% quantile of the metric 
histogram_quantile(φ scalar, b instant-vector) calculates the φ-quantile (0 ≤ φ ≤ 1) from the buckets b of a classic histogram
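
For example, the 99th-percentile request latency from the histogram metric used later in these notes:

# seconds within which 99% of requests completed over the last 5 minutes
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))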

## PromQL query 101

  • Every metric has by default these 2 labels: job and instance
  • Quantiles are typically applied to latency metrics like http_request_duration_seconds. Quantiles don't make sense directly on metrics with the _total suffix, e.g. http_requests_total, because these are plain counters (a total count with no buckets). If you do need a quantile, apply histogram_quantile to the histogram's _bucket submetric, e.g. histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
  • Submetrics for a histogram metric: _count [total number of observations], _bucket [number of observations for a specific bucket], _sum [sum of all observations]
  • Ignoring certain labels when matching, i.e. when dividing 2 metrics where one has a label that the other does not have, simply ignore the class label of node_filesystem_size_bytes: node_filesystem_avail_bytes / ignoring(class) node_filesystem_size_bytes
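  • Some queries below also join labels from one metric onto another with on() and group_left; a minimal sketch of the pattern, using metrics that appear later in these notes:
# attach the human-readable nodename from node_uname_info to node_load1
# node_uname_info has value 1, so the multiplication keeps the load value
node_load1 * on(instance) group_left(nodename) node_uname_info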

## PromQL queries

  • Available Capacity per Node: CPU [cores]
sum by (node) (kube_node_status_capacity{resource="cpu",unit="core"})
sum by (node) (kube_node_status_capacity{resource="cpu",unit="core", instance=~".*:8080",service="prometheus-systematic-kube-state-metrics"})
# cores per node, counted from node-exporter's per-CPU series
count(node_cpu_seconds_total{mode="user"}) by (instance)
  • CPU usage percentage: subtract the idle percentage from 100
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle",instance=~".*:9100"}[5m])) * 100)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle",instance=~".*:9100",service="prometheus-systematic-prometheus-node-exporter"}[5m]) * 100) * on(instance) group_left(nodename) node_uname_info) 
  • Node load: a measure of CPU saturation that averages the number of running and runnable processes at any one moment
# calculate saturation: 1-minute load average per CPU core for each instance
node_load1 / on(instance) group_left() count(node_cpu_seconds_total{mode="user"}) by (instance)
  • Gauge/stat for the number of pods in each namespace:
sum(kube_pod_info) by (namespace)
# pods in Failed phase per namespace
sum by (namespace) (kube_pod_status_phase{phase="Failed"})
  • Average Duration: average request duration (in seconds) over the past 5 minutes
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])
  • Containers without Memory/CPU limits per namespace:
# without CPU limits
sum by (namespace)(count by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))

# without Memory limits
sum by (namespace)(count by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="memory"}))
  • Containers whose CPU usage is close to limits:
(sum by (namespace,pod,container)(rate(container_cpu_usage_seconds_total{container!=""}[5m])) / sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"})) > 0.8
  • Containers whose Memory usage is close to limits:
(sum by (namespace,pod,container)(container_memory_usage_bytes{container!=""}) / sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="memory"})) > 0.8
  • Top 10 containers without limits using CPU:
topk(10,sum by (namespace,pod,container)(rate(container_cpu_usage_seconds_total{container!=""}[5m])) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))
  • Top 10 containers without limits using Memory:
topk(10,sum by (namespace,pod,container)(container_memory_usage_bytes{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="memory"}))
  • Memory requests and limits:
# pod == label_values(pod)
kube_pod_container_resource_requests{pod="$pod", resource="memory"}
kube_pod_container_resource_limits{pod="$pod", resource="memory"}
  • Memory usage:
# pod == label_values(pod)
container_memory_working_set_bytes{name!~"POD",pod="$pod"}
Note: avoid scenarios where one node is overloaded while another node in the cluster is underutilized.
kubectl top shows usage, not allocation; allocation (requests) is what causes the "insufficient CPU" scheduling problem.
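
A sketch for spotting that, assuming kube-state-metrics (as used throughout these notes): compare CPU requested by pods on each node against the node's allocatable CPU.

# fraction of each node's allocatable CPU already claimed by requests
sum by (node) (kube_pod_container_resource_requests{resource="cpu"}) / sum by (node) (kube_node_status_allocatable{resource="cpu"})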
