Not enough resources? How to manage CPU and RAM!
When our apps or systems are running, we often get alerts about CPU or RAM running low. Besides just adding more resources, are there any other ways to squeeze out a bit more efficiency?
This article will focus on two critical resources when running containers: CPU and RAM. We’ll provide some explanations to help you better understand how to configure or fine-tune these resources:
- Overview
- CPU Recommendations
- RAM Recommendations
- KRR
- How to Limit CPU Resources in Kubernetes
- Tips for Removing Limits
- Conclusion
1. Overview
Containers primarily use two key resources: CPU and Memory. For each resource, you can configure two properties:
- Request: The guaranteed amount of resources that the container can use. If a node cannot meet the request, the container will not be scheduled on that node.
- Limit: The maximum amount of resources the container is allowed to use.
CPU is measured in millicores, where 1 millicore equals 1/1000 of a CPU.
250m = 250/1000 = 1/4 (equivalent to using one-quarter of a CPU core)
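A minimal sketch of how these properties appear in a Pod spec (the name, image, and values here are illustrative, not a recommendation):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app        # hypothetical name
spec:
  containers:
  - name: demo
    image: nginx
    resources:
      requests:
        cpu: "250m"     # guaranteed: one quarter of a core
        memory: "256Mi"
      limits:
        cpu: "500m"     # hard cap: half a core
        memory: "256Mi"
```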
※ Difference Between CPU and RAM Resources
- CPU is a compressible resource: its allocation can be adjusted dynamically. Even when demand exceeds the Limit, tasks still execute — just more slowly — through throttling and queuing. Because CPU is flexible, strict Limit values are, in theory, unnecessary.
- Memory is a non-compressible resource: if a process exceeds the container's memory Limit, the kernel terminates it and the Pod restarts (an OOMKill).
The kube-scheduler evaluates each Pod’s resource requirements and checks if a node can meet those needs. The factors considered include CPU, memory, storage, and network bandwidth. Only nodes that can fulfill the Pod’s requirements are eligible for scheduling.
In many cases, a Limit prevents full utilization of available resources: it acts as a hard cap, so the Pod cannot consume more than the specified amount even when the node has spare capacity.
2. CPU Recommendations
Key points:
- Avoid using CPU limits to restrict usage.
- Focus on setting CPU requests for resource planning.
- Consider using KRR (Kubernetes Resource Recommender) to optimize resource utilization.
Example:
When a Pod is configured with a CPU Limit of 2000m and a Request of 1000m, the Pod will throttle itself at the hard cap even if the node's overall usage is below 50%. This causes artificial latency and performance issues while node resources sit idle.
In most cases, setting CPU limits for Pods in Kubernetes does not provide significant benefits. In fact, many CPU resource issues in Kubernetes arise due to the use of limits.
3. RAM Recommendations
Key points regarding RAM:
- Always set a Memory Limit.
- Always set a Memory Request.
- Consider setting the Memory Request equal to the Limit.
Since memory is non-compressible, these practices keep Pod memory usage predictable and avoid unexpected out-of-memory (OOM) kills.
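A sketch of these recommendations in a container spec (the sizes are illustrative; note the CPU Limit is deliberately omitted, per the CPU recommendations above):

```yaml
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    memory: "512Mi"   # Request == Limit: predictable memory, no surprise OOMKills
```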
4. KRR
※ What is KRR?
KRR (Kubernetes Resource Recommender) is a tool that helps optimize resource usage in Kubernetes. It gathers data from Prometheus and suggests the best CPU and memory Requests and Limits for your Pods, making sure you’re not wasting resources.
※ Key Features of KRR:
- Agentless: No need to install additional agents.
- Integrated with Prometheus: uses your cluster's existing Prometheus for data collection.
- Scalable: Easily scales with your cluster.
- Customizable: Supports custom resources and metrics.
※ Resource Utilization:
Based on current usage in a Kubernetes cluster, the resource utilization looks like this:
- Average unused CPU: 69%
- Average unused Memory: 18%
- Containers without CPU limits: 58%
- Containers without Memory limits: 49%
This means that by optimizing with KRR, there’s potential to save up to 69% of CPU resources.
※ How KRR Works:
KRR collects data using the following two PromQL queries:
- CPU Usage:
※ PromQL
sum(irate(container_cpu_usage_seconds_total{namespace="{object.namespace}", pod="{pod}", container="{object.container}"}[{step}]))
- Memory Usage:
※ PromQL
sum(container_memory_working_set_bytes{job="kubelet", metrics_path="/metrics/cadvisor", image!="" , namespace="{object.namespace}", pod="{pod}", container="{object.container}"})
These queries help gather real-time CPU and memory usage data for pods, which is then used to recommend optimized resource settings.
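As a sketch, substituting concrete label values into a query template of this shape could look like the following (the namespace, pod, and container names are made up, and this is an illustration of the template style rather than KRR's actual code):

```python
# Template mirroring the CPU-usage PromQL query above; {{ }} escape to
# literal braces, {name} placeholders are filled in by str.format().
CPU_QUERY = (
    'sum(irate(container_cpu_usage_seconds_total{{'
    'namespace="{namespace}", pod="{pod}", container="{container}"}}[{step}]))'
)

def build_cpu_query(namespace: str, pod: str, container: str, step: str = "5m") -> str:
    """Return the CPU-usage PromQL query for one container."""
    return CPU_QUERY.format(namespace=namespace, pod=pod, container=container, step=step)

print(build_cpu_query("default", "demo-app-123", "demo"))
```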
※ Default Recommendation Algorithm:
- CPU: set the Request to the 99th percentile of observed usage and set no Limit. The Request covers 99% of cases, and for the remaining 1% the absence of a Limit lets the Pod burst and use any available CPU on the node, including CPU requested by other Pods that haven't used it yet.
- Memory: The recommendation is based on the maximum memory usage observed over the past week, with an additional 5% buffer added.
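The two rules above can be sketched roughly as follows (the function names and sample data are illustrative; KRR's actual implementation lives in the repository linked below):

```python
def recommend_cpu_request(cpu_samples: list[float]) -> float:
    """CPU Request ~ 99th percentile of observed usage; no CPU Limit is suggested."""
    ranked = sorted(cpu_samples)
    idx = min(len(ranked) - 1, round(0.99 * (len(ranked) - 1)))
    return ranked[idx]

def recommend_memory(memory_samples: list[float], buffer: float = 0.05) -> float:
    """Memory ~ maximum observed usage over the window plus a 5% buffer."""
    return max(memory_samples) * (1 + buffer)

# Illustrative one-week samples (cores, then bytes):
cpu = [0.10, 0.12, 0.25, 0.30, 0.90, 0.15]
mem = [200e6, 220e6, 400e6, 350e6]
print(recommend_cpu_request(cpu))  # near the top of observed usage
print(recommend_memory(mem))       # max sample plus 5%
```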
You can refer to the following official link for more details:
https://github.com/robusta-dev/krr
This will provide all the necessary information and documentation related to KRR, including setup, usage, and advanced configurations.
5. How to Limit CPU Resources in Kubernetes
Kubernetes enforces CPU limits for Pods through the CFS quota mechanism.
CFS (Completely Fair Scheduler) allocates CPU based on time and uses the following two cgroup files:
- cpu.cfs_quota_us: This defines the total available run time for a Pod within a specific period (in microseconds).
- cpu.cfs_period_us: This specifies the length of one period (in microseconds).
Together, these settings help control how much CPU time a Pod can use over a defined interval.
# cat /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_period_us
100000
=> 100000 us = 100 ms
# cat /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
-1
=> no limit
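The mapping between a Kubernetes CPU limit and these files is simple: quota_us = limit_in_cores × period_us. A small helper illustrating the arithmetic (hypothetical, not Kubernetes source code):

```python
def cpu_limit_to_quota_us(limit_cores: float, period_us: int = 100_000) -> int:
    """Translate a CPU limit (in cores) into a CFS quota in microseconds.

    With the default 100 ms period, a 0.5-core limit means the cgroup may
    run for 50 ms of CPU time per period; a quota of -1 means "no limit".
    """
    return int(limit_cores * period_us)

print(cpu_limit_to_quota_us(0.5))   # 50 ms of runtime per 100 ms period
print(cpu_limit_to_quota_us(2.0))   # 200 ms, spendable across multiple cores
```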
※ Example
When an application runs under a CPU restriction and requires 300 ms of processing time, the theoretical timeline plays out as follows:
If we set a 0.5 CPU Limit on the Pod, the application gets 50 ms of runtime in every 100 ms period. A request that needs 300 ms of CPU time therefore runs in six 50 ms slices, waiting out the remainder of each period, and completes after 550 ms instead of 300 ms. The Limit itself slows the task down, even when the CPU is otherwise idle.
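The 550 ms figure can be checked with a small simulation of CFS throttling (simplified to a single thread; a sketch, not the kernel's actual scheduler):

```python
def completion_time_ms(work_ms: float, quota_ms: float, period_ms: float = 100.0) -> float:
    """Wall-clock time to finish `work_ms` of CPU work when throttled to
    `quota_ms` of runtime per `period_ms` period (single-threaded)."""
    elapsed = 0.0
    remaining = work_ms
    while remaining > 0:
        run = min(quota_ms, remaining)
        remaining -= run
        if remaining > 0:
            elapsed += period_ms   # throttled for the rest of this period
        else:
            elapsed += run         # finished mid-period
    return elapsed

print(completion_time_ms(300, 50))   # 550.0: six 50 ms slices, five throttled gaps
print(completion_time_ms(300, 100))  # 300.0: quota never binds, no throttling
```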
6. Tips for Removing Limits
By default, a container with no CPU Limit can burst up to the node's full capacity. Removing the CPU Limit is recommended for latency-sensitive Pods, but not for all Pods, as removing every limit could lead to performance instability across the cluster.
7. Conclusion
This article focuses more on CPU because it’s easier to tweak than memory. Usually, the CPU assigned to each Pod doesn’t need to be fully used — it spikes and then drops. But setting strict CPU limits can actually stop Kubernetes from managing resources well.
Aside from using tools like Grafana and Prometheus for monitoring, we also talked about KRR, which can help give resource management advice from a different angle.