Meet Murre. Murre is an on-demand, scalable source of container resource metrics for Kubernetes. Murre fetches CPU & memory resource metrics directly from the kubelet on each K8s Node and enriches the resources with the relevant K8s requests and limits from each PodSpec.

Minimalism. Yeah, big word I know, but bear with me here. Basically, it is all about owning only what adds value and meaning to your life, and removing the rest. It’s about removing the clutter and using your time and energy for the things that remain, since we only have a certain amount of energy, time, and space in our lives.

The best software developers are minimalists at heart. Minimalism does not mean writing less code, but writing code that counts: code with an elegant, tight structure that does one thing well. Minimalism, in this sense, means designing systems that use the least hardware and software resources possible, because you can. Because that’s how you believe things should be.

Like many developers, we apply this approach to everything we design and build.

Our journey to building Murre started when we wanted to expose K8s node resources and usage metrics for our customers. As a company building a comprehensive Kubernetes application monitoring solution, the need was clear. We wanted to monitor and export the resources that are part of the cluster’s infrastructure layer: CPU, memory, and disk usage on each node, aggregated by container.

Sounds simple, right? Thing is, as a monitoring platform embedded in the heart of the Kubernetes cluster, we always carry the banner of minimalism. We aim to be as lightweight as possible wherever we can, which also means we prefer not to install any third-party tools on the cluster if we don’t absolutely have to.

So we set course to get our hands on the K8s node metrics, BUT without having to install the famous Metrics Server.

The traditional way of monitoring K8s metrics

Very often we see customers struggling to monitor the physical layer that has been abstracted away by K8s (you know, the same old CPU, memory, disk, etc.). This becomes especially important when you have a noisy-neighbor effect, which means your pod might be using its expected resource allocation but still be starved of resources.

The Kubernetes ecosystem includes a few complementary add-ons for aggregating and exposing monitoring data from your Kubernetes cluster. The Metrics Server is one of these useful add-ons.

The Kubernetes Metrics Server is a cluster-wide aggregator of resource usage data. It collects metrics like CPU or memory consumption for containers or nodes, from the Summary API, exposed by Kubelet on each node. However, it is an add-on which means it’s not part of the out-of-the-box K8s and not deployed by default in standard managed Kubernetes platforms.

The Metrics Server is intended to be scalable and resource-efficient, requesting about 100m (0.1 core) of CPU and 200 MiB of memory for a typical cluster of 100 nodes. It stores only near-real-time metrics in memory, which supports ad-hoc checks of CPU or memory usage, or periodic querying by a monitoring service that retains data over longer timespans.

Though designed to scale well with your K8s cluster, it doesn’t come free of issues. Installing a third-party service in your cluster means you have to maintain and troubleshoot it from time to time. It can crash, take up more resources than you intended, or fail to do its job. The Metrics Server known issues page is just a glimpse into what it means to take responsibility for another ever-running service in your already busy cluster.

And there’s one more point that concerned us. At groundcover we’ve built a distributed observability solution that can scale with the cluster and make decisions at the edge. For example, we want to sample the HTTP span that caused a CPU spike in a relevant pod.

Going the Metrics Server route would (beyond the need to install and run it) also mean each of our edge units (running on each node) would have to query the central Metrics Server regularly to create such a trigger. That would force our distributed architecture to rely on a centralized point and would make things hard to scale.

Taking it to ultra-light

groundcover is all about offering a lightweight and frictionless approach to Kubernetes application monitoring, so we started looking for other ways to get the job done.

First, we started by measuring resources ourselves, running Linux OS tools like top, ps, etc. directly on each node in the cluster. That solved one piece of the puzzle: resource monitoring that didn’t require any prior installation or maintenance. However, it introduced an efficiency issue. The outputs of tools like top and ps required parsing. They are also Linux tools, not K8s tools, so we needed a second layer that made sense of per-process resources and turned them into the container resources our customers know and understand.

But, like any good answer to a problem, ours had been staring at us all along.

The Metrics Server is built in a centralized fashion, one per cluster. So how does it get all the metrics it needs from all the different nodes? A quick look at the code reveals the clear answer: by simply querying the Kubelet running on each node. Suddenly it all made sense, since K8s itself must also measure node resources for the Kubelet workflow to operate!

Kubelet who?

Kubelet is a process that runs on each node of a Kubernetes cluster and creates, destroys, or updates pods and their containers on that node when instructed to do so. Basically, the Kubelet is the primary “node agent” that runs on each node, and it works in terms of PodSpecs (a YAML or JSON object that describes a pod). It is responsible for taking PodSpecs that are provided through various mechanisms (primarily through the apiserver) and ensuring that the containers described in those PodSpecs are running and healthy.

In Kubernetes, scheduling, preemption and eviction are an important part of the cluster’s life. Scheduling refers to making sure that pods are matched to nodes so that the Kubelet can run them. Preemption is the process of terminating pods with lower priority so that pods with higher priority can be scheduled on nodes, and eviction is the process of terminating one or more pods on nodes.

The Kubelet plays a critical role in these scenarios. Take node-pressure eviction, for example. The Kubelet constantly monitors node resources such as memory, disk space, and filesystem inodes. When consumption of one of these resources reaches a certain threshold, the Kubelet starts evicting pods in order to free up the resource.

This is exactly why the Kubelet must constantly use Kubernetes resource metrics to do its job, and expose these metrics to other services that might need them as well.

Great. So if the Kubelet is already exposing this API to the Metrics Server, it means we can use this API ourselves.

That’s two birds with one stone! We can query this API directly without deploying a Metrics Server on the cluster, and we can also query it from inside each node, without ever having to leave the node.

Oh, but it’s not documented anywhere… So we had to dig deeper.

A deep dive into the Kubelet source code

We started to study the Kubelet sources to figure out how it obtains the K8s resource metrics and how it exposes this data as APIs downstream. What we found is that not only does the Kubelet measure the resource usage on each cluster node, it also exposes the data as Prometheus-formatted metrics, which we love!

The Kubelet API is not documented, but from its sources we found the endpoints. There are more endpoints that are not used for metrics / stats, but these are out of scope for this research. There’s definitely more gold there to be uncovered, but we’re going to focus on K8s metrics for now.

You can find some of the endpoint path definitions in pkg/kubelet/server/server.go:

const (
	metricsPath         = "/metrics"
	cadvisorMetricsPath = "/metrics/cadvisor"
	resourceMetricsPath = "/metrics/resource"
	statsPath           = "/stats/" // base path for /stats/summary
)

To reach the Kubelet API and see the data:

# fetch metrics/cadvisor of <node-name>; kubectl will take care of auth
# you can get all your node names using: kubectl get nodes
kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics/cadvisor
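
The same request can also be made from code. Here is a minimal Go sketch, assuming `kubectl proxy` is running locally on its default address (127.0.0.1:8001) so that authentication is still handled for us. The helper names `kubeletProxyURL` and `fetchKubeletEndpoint` and the node name `my-node` are made up for illustration and are not taken from Murre or the Kubelet.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// kubeletProxyURL builds the API-server proxy URL for a Kubelet endpoint on one node.
// base is the address of the API server (or of a local `kubectl proxy`).
func kubeletProxyURL(base, node, endpoint string) string {
	return fmt.Sprintf("%s/api/v1/nodes/%s/proxy/%s",
		strings.TrimRight(base, "/"),
		url.PathEscape(node),
		strings.TrimLeft(endpoint, "/"))
}

// fetchKubeletEndpoint performs the GET request and returns the raw response body.
func fetchKubeletEndpoint(client *http.Client, base, node, endpoint string) (string, error) {
	resp, err := client.Get(kubeletProxyURL(base, node, endpoint))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	// Assumptions: `kubectl proxy` is listening on its default port 8001,
	// and "my-node" has been replaced with a real node name.
	body, err := fetchKubeletEndpoint(client, "http://127.0.0.1:8001", "my-node", "/metrics/cadvisor")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(body)
}
```

Going through the API server proxy keeps the example short; a process running on the node itself would talk to the local Kubelet directly and would need to handle authentication on its own.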

Some of Kubelet’s endpoints return JSON and some export actual metrics. Here are a few worth noting:

• /metrics/cadvisor – This endpoint exposes metrics originating from cAdvisor and covers container resource consumption, such as:
– CPU
– Memory
– File System
– Network

• /metrics/resource – This endpoint also lists container resources (CPU, memory) like the cadvisor endpoint, but also gives us pod-level and node-level resources.

• /stats/summary – This endpoint provides aggregated resource consumption data in JSON format. By default it describes all resources (CPU, memory, file system, and network, like the cadvisor endpoint), but you can pass only_cpu_and_memory=true as a request parameter to get only CPU and memory data if that’s what you’re after.

Here’s how the response structure of the endpoint looks:

type Summary struct {
	// Overall node stats.
	Node NodeStats `json:"node"`
	// Per-pod stats.
	Pods []PodStats `json:"pods"`
}

// NodeStats holds node-level unprocessed sample stats.
type NodeStats struct {
	// Reference to the measured Node.
	NodeName string `json:"nodeName"`
	// Stats of system daemons tracked as raw containers.
	// The system containers are named according to the SystemContainer* constants.
	SystemContainers []ContainerStats `json:"systemContainers,omitempty" patchStrategy:"merge" patchMergeKey:"name"`
	// The time at which data collection for the node-scoped (i.e. aggregate) stats was (re)started.
	StartTime metav1.Time `json:"startTime"`
	// Stats pertaining to CPU resources.
	CPU *CPUStats `json:"cpu,omitempty"`
	// Stats pertaining to memory (RAM) resources.
	Memory *MemoryStats `json:"memory,omitempty"`
	// Stats pertaining to network resources.
	Network *NetworkStats `json:"network,omitempty"`
	// Stats pertaining to total usage of filesystem resources on the rootfs used by node k8s components.
	// NodeFs.Used is the total bytes used on the filesystem.
	Fs *FsStats `json:"fs,omitempty"`
	// Stats about the underlying container runtime.
	Runtime *RuntimeStats `json:"runtime,omitempty"`
	// Stats about the rlimit of system.
	Rlimit *RlimitStats `json:"rlimit,omitempty"`
}

// PodStats holds pod-level unprocessed sample stats.
type PodStats struct {
	// Reference to the measured Pod.
	PodRef PodReference `json:"podRef"`
	// The time at which data collection for the pod-scoped (e.g. network) stats was (re)started.
	StartTime metav1.Time `json:"startTime"`
	// Stats of containers in the measured pod.
	Containers []ContainerStats `json:"containers" patchStrategy:"merge" patchMergeKey:"name"`
	// Stats pertaining to CPU resources consumed by pod cgroup (which includes all containers' resource usage and pod overhead).
	CPU *CPUStats `json:"cpu,omitempty"`
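	// Stats pertaining to memory (RAM) resources consumed by pod cgroup (which includes all containers' resource usage and pod overhead).
	Memory *MemoryStats `json:"memory,omitempty"`
	// Remaining fields omitted.
}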

• /pods – This endpoint provides information about the pods running on the node, with full pod specs and status. This data can also be fetched using the K8s client podLister interface. While useful, this specific data is irrelevant to our cause of resource metrics monitoring.

• /metrics – This endpoint exposes metrics related to Kubelet’s own internal statistics. It’s good to know it’s out there, but like the previous one it will not be used for our needs.
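
To make the /stats/summary structure more concrete, here is a small Go sketch that decodes a response and flattens it into one row per container. The trimmed-down types mirror only the JSON fields the sketch reads, the function name `flattenSummary` is hypothetical, and `sampleSummary` is hand-written illustrative data rather than real Kubelet output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// summary mirrors only the parts of the Summary response this sketch reads.
type summary struct {
	Node struct {
		NodeName string `json:"nodeName"`
	} `json:"node"`
	Pods []podStats `json:"pods"`
}

type podStats struct {
	PodRef struct {
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
	} `json:"podRef"`
	Containers []containerStats `json:"containers"`
}

type containerStats struct {
	Name string `json:"name"`
	CPU  *struct {
		UsageNanoCores *uint64 `json:"usageNanoCores,omitempty"`
	} `json:"cpu,omitempty"`
	Memory *struct {
		WorkingSetBytes *uint64 `json:"workingSetBytes,omitempty"`
	} `json:"memory,omitempty"`
}

// containerUsage is a flattened row, one per container.
type containerUsage struct {
	Namespace, Pod, Container string
	CPUMillicores             float64
	MemoryBytes               uint64
}

// flattenSummary decodes the JSON and walks pods -> containers.
// Missing CPU or memory stats are left as zero.
func flattenSummary(raw []byte) ([]containerUsage, error) {
	var s summary
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	var rows []containerUsage
	for _, p := range s.Pods {
		for _, c := range p.Containers {
			row := containerUsage{Namespace: p.PodRef.Namespace, Pod: p.PodRef.Name, Container: c.Name}
			if c.CPU != nil && c.CPU.UsageNanoCores != nil {
				row.CPUMillicores = float64(*c.CPU.UsageNanoCores) / 1e6 // 1 millicore = 1e6 nanocores
			}
			if c.Memory != nil && c.Memory.WorkingSetBytes != nil {
				row.MemoryBytes = *c.Memory.WorkingSetBytes
			}
			rows = append(rows, row)
		}
	}
	return rows, nil
}

// sampleSummary is illustrative data only.
const sampleSummary = `{
  "node": {"nodeName": "node-a"},
  "pods": [{
    "podRef": {"name": "web-0", "namespace": "default"},
    "containers": [{
      "name": "app",
      "cpu": {"usageNanoCores": 250000000},
      "memory": {"workingSetBytes": 104857600}
    }]
  }]
}`

func main() {
	rows, err := flattenSummary([]byte(sampleSummary))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, r := range rows {
		fmt.Printf("%s/%s/%s cpu=%.0fm mem=%dB\n", r.Namespace, r.Pod, r.Container, r.CPUMillicores, r.MemoryBytes)
	}
}
```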

And then there was Murre

The Common Murre is an interesting bird. It breeds in high-density colonies and makes no nest; its single egg is incubated on a bare rock ledge on a cliff. Minimalist? Murres take their nesting skills to the extreme. No wonder we liked that name 🙂

After we figured out how to get those metrics without storing them, we thought we could help solve this problem, and many more like it, with the same pattern. Introducing Murre.

Murre is an OSS tool that basically helps you get your containers’ CPU & memory metrics without the need to install anything on the cluster. It works the same way as metrics-server, just without storing the metrics. With Murre you can filter by a specific namespace, pod or even container name to focus on the exact metrics you want.

Murre utilizes three different APIs to get the data needed to craft these metrics:

  1. NodeList – to discover and maintain the list of all available nodes.
  2. /metrics/cadvisor – to get the actual usage metrics. This API is queried on each and every node, as described earlier.
  3. PodList – to enrich the data with K8s resource requests and limits for each and every container.
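
As a rough illustration of the second step, the Go sketch below pulls per-container memory usage out of /metrics/cadvisor output. It is not Murre's actual code: the parser is deliberately simplified (it ignores escaped quotes in label values), the names `parseSamples` and `workingSetByContainer` are hypothetical, and `sampleCadvisor` is hand-written data shaped like the Prometheus text format.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sample is one parsed line: metric name, labels, and value.
type sample struct {
	Metric string
	Labels map[string]string
	Value  float64
}

var labelRe = regexp.MustCompile(`(\w+)="([^"]*)"`)

// parseSamples is a simplified parser for the Prometheus text format.
// It skips comment lines and lines without labels, and ignores timestamps.
func parseSamples(text string) []sample {
	var out []sample
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		open, end := strings.Index(line, "{"), strings.LastIndex(line, "}")
		if open < 0 || end < open {
			continue
		}
		fields := strings.Fields(line[end+1:]) // value, then an optional timestamp
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		labels := map[string]string{}
		for _, m := range labelRe.FindAllStringSubmatch(line[open+1:end], -1) {
			labels[m[1]] = m[2]
		}
		out = append(out, sample{Metric: line[:open], Labels: labels, Value: v})
	}
	return out
}

// workingSetByContainer keeps container_memory_working_set_bytes samples and
// keys them by namespace/pod/container. Samples with an empty container label
// (aggregates above the container level) are skipped.
func workingSetByContainer(samples []sample) map[string]float64 {
	out := map[string]float64{}
	for _, s := range samples {
		if s.Metric != "container_memory_working_set_bytes" || s.Labels["container"] == "" {
			continue
		}
		key := s.Labels["namespace"] + "/" + s.Labels["pod"] + "/" + s.Labels["container"]
		out[key] = s.Value
	}
	return out
}

// sampleCadvisor is illustrative data only.
const sampleCadvisor = `# TYPE container_memory_working_set_bytes gauge
container_memory_working_set_bytes{container="app",namespace="default",pod="web-0"} 1.048576e+08 1660000000000
container_memory_working_set_bytes{container="",namespace="default",pod="web-0"} 1.1e+08 1660000000000
container_cpu_usage_seconds_total{container="app",cpu="total",namespace="default",pod="web-0"} 12.5 1660000000000
`

func main() {
	for key, b := range workingSetByContainer(parseSamples(sampleCadvisor)) {
		fmt.Printf("%s working set: %.0f bytes\n", key, b)
	}
}
```

Note that container_cpu_usage_seconds_total is a cumulative counter, so turning it into a CPU usage rate requires two scrapes and a division by the elapsed time; the sketch leaves that out.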

Here’s how Murre’s processing flow looks:

enrich data and display to user
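
The enrichment step at the end of this flow can be sketched as a simple join between measured usage and the requests and limits declared in the PodSpec. The Go code below is an illustration under assumptions, not Murre's implementation: all type and function names are hypothetical, values are plain numbers rather than Kubernetes quantities, and a limit of 0 is taken to mean "not set".

```go
package main

import (
	"fmt"
	"sort"
)

// containerKey identifies a container across both data sources.
type containerKey struct{ Namespace, Pod, Container string }

// usage is what the node reports for a container.
type usage struct {
	CPUMillicores float64
	MemoryBytes   float64
}

// spec holds requests and limits from the PodSpec; 0 means "not set".
type spec struct {
	CPURequestMilli, CPULimitMilli float64
	MemRequestBytes, MemLimitBytes float64
}

// enriched is one output row: usage joined with its spec.
type enriched struct {
	containerKey
	usage
	spec
	CPUPctOfLimit float64 // -1 when no limit is set
	MemPctOfLimit float64 // -1 when no limit is set
}

// pct returns used as a percentage of limit, or -1 if there is no limit.
func pct(used, limit float64) float64 {
	if limit <= 0 {
		return -1
	}
	return used / limit * 100
}

// enrich joins usage with specs by container key. Containers without a
// matching spec get the zero spec, so their percentages come out as -1.
func enrich(usages map[containerKey]usage, specs map[containerKey]spec) []enriched {
	rows := make([]enriched, 0, len(usages))
	for k, u := range usages {
		s := specs[k]
		rows = append(rows, enriched{
			containerKey:  k,
			usage:         u,
			spec:          s,
			CPUPctOfLimit: pct(u.CPUMillicores, s.CPULimitMilli),
			MemPctOfLimit: pct(u.MemoryBytes, s.MemLimitBytes),
		})
	}
	// Sort for a stable display order.
	sort.Slice(rows, func(i, j int) bool {
		a, b := rows[i].containerKey, rows[j].containerKey
		if a.Namespace != b.Namespace {
			return a.Namespace < b.Namespace
		}
		if a.Pod != b.Pod {
			return a.Pod < b.Pod
		}
		return a.Container < b.Container
	})
	return rows
}

func main() {
	key := containerKey{"default", "web-0", "app"}
	rows := enrich(
		map[containerKey]usage{key: {CPUMillicores: 250, MemoryBytes: 104857600}},
		map[containerKey]spec{key: {CPULimitMilli: 500, MemLimitBytes: 209715200}},
	)
	for _, r := range rows {
		fmt.Printf("%s/%s/%s cpu=%.0fm (%.0f%% of limit) mem=%.0fB (%.0f%% of limit)\n",
			r.Namespace, r.Pod, r.Container, r.CPUMillicores, r.CPUPctOfLimit, r.MemoryBytes, r.MemPctOfLimit)
	}
}
```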

And voilà!

Murre is a useful CLI tool that any team using Kubernetes can leverage.

Feel free to think big, open issues or add capabilities yourself. The sky is (literally) the limit for this special bird.

Questions about Murre? Join our community Slack and get in touch with the creators!

 

 

Guest post originally published on groundcover’s blog by Yechezkel Rabinovich, Co-Founder and CTO at groundcover
Source CNCF
