The Basic Principles of the Kubernetes GC Mechanism
Recently, I’ve been spending time understanding how Kubernetes garbage collection and cleanup mechanisms work. While we typically don’t need any special configuration, a basic understanding of how several common resource types are reclaimed is useful, whether for debugging or simply for knowing where resources go. In this article, I’ve combined knowledge from the official documentation and other articles to cover the aspects I wanted to understand, and I’m sharing them here.
This article will explain the following:
- Basic Introduction
- Unused Containers and Images
- Terminated Pods
- Completed Jobs
- PVs Marked for Deletion
- Conclusion
1. Basic Introduction
In Kubernetes, there’s a handy feature called garbage collection (GC) that helps keep things tidy by getting rid of unused objects. It’s like doing a cleanup to free up resources. Additionally, there’s pod eviction, which kicks pods off a node when it’s running low on resources, until things are back in balance.
Simply put, these mechanisms keep Kubernetes healthy and efficient by removing anything that’s no longer in use, so resources like memory and disk space aren’t taken up unnecessarily.
2. Unused Containers and Images
The kubelet regularly checks for and removes any images not associated with a pod, and it also reclaims disk space by removing unused containers from nodes. It inspects unused images every five minutes and unused containers every minute (these intervals are fixed in the kubelet).
It’s advised to avoid using any external garbage collection tools as they may disrupt kubelet behavior and inadvertently remove containers that should be present.
If you want to tune this mechanism, you’ll need to adjust the kubelet’s settings, either in its configuration file (KubeletConfiguration) or via its command-line flags, such as the following (a configuration sketch follows the two lists):
(1) For containers:
- MaxPerPodContainer: the maximum number of dead containers that can be retained per pod. Default is 1.
- MaxContainers: the maximum number of dead containers allowed on a node. Default is -1 (no limit).
- MinAge: the minimum age at which a dead container can be garbage collected. Default is 0 (eligible as soon as it dies).
(2) For images:
- HighThresholdPercent: image garbage collection begins when disk usage exceeds this value. Default is 85%.
- LowThresholdPercent: image garbage collection frees images until disk usage falls back below this value. Default is 80%.
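As a concrete example, here is a minimal KubeletConfiguration sketch with the image thresholds set explicitly. The image fields below are the actual KubeletConfiguration field names; the container limits, by contrast, are only exposed as (deprecated) kubelet command-line flags, noted in the comments:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Image GC: start reclaiming above the high threshold,
# and keep reclaiming until usage drops below the low threshold.
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80
imageMinimumGCAge: 2m   # never delete images younger than this
# Container GC limits map to deprecated kubelet flags, not config fields:
#   --maximum-dead-containers-per-container=1
#   --maximum-dead-containers=-1
#   --minimum-container-ttl-duration=0s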
※ The container recycling process can be divided into three stages: first, cleaning up application containers; then, cleaning up sandbox containers; and finally, cleaning up the logs that correspond to the pods. The three stages operate as follows:
Step 1. Clean up evictable containers
Step 2. Clean up evictable sandboxes
Step 3. Clean up the log directories of all pods
※ The process for image recycling is as follows:
Step 1. Obtain the disk status through the kubelet.
Step 2. Calculate the usage percentage (from total space and available space) based on the disk status data.
Step 3. Check whether it exceeds the configured threshold (--image-gc-high-threshold int32, default: 85).
Step 4. Calculate how much space needs to be freed, based on the configured target (--image-gc-low-threshold int32, default: 80) and the current usage.
Step 5. Execute the recycling action to reclaim the space computed in step 4. If it cannot free enough space, an error is written to the log.
3. Terminated Pods
PodGC, a controller in kube-controller-manager, deletes terminated pods (in the Succeeded or Failed phase) once their number exceeds a threshold (the --terminated-pod-gc-threshold flag, default 12500). Additionally, PodGC cleans up pods that meet any of the following conditions (see the sketch after this list for tuning the threshold):
- Pods bound to nodes that no longer exist (orphan pods).
- Pods that terminated unexpectedly.
- Unscheduled pods that are in the process of terminating.
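If the default threshold doesn’t fit your cluster, it can be changed on the kube-controller-manager. Below is a minimal sketch assuming a kubeadm-style static pod manifest (the file path, image version, and threshold value are illustrative):
# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - name: kube-controller-manager
    image: registry.k8s.io/kube-controller-manager:v1.28.0   # illustrative version
    command:
    - kube-controller-manager
    - --terminated-pod-gc-threshold=1000   # start GC once 1000 terminated pods accumulate
    # ...other flags omitted...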
4. Completed Jobs
The TTL-after-finished controller is used to clean up finished Jobs. Whenever a Job finishes (its condition becomes Complete or Failed), a countdown starts from the value of its ttlSecondsAfterFinished field. When the timer expires, the Job and its dependent pods are automatically deleted:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100   # <<< delete the Job 100s after it finishes
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
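Note that if ttlSecondsAfterFinished is left unset, the TTL-after-finished controller does not touch the Job at all, and setting it to 0 makes the Job eligible for deletion as soon as it finishes.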
5. PVs Marked for Deletion
For PVs, Kubernetes has a protection mechanism (Storage Object in Use Protection) in place to prevent data loss on storage volumes that are still in use. With it enabled, if a user deletes a PVC that is still being used by a pod, Kubernetes will not delete it immediately; it waits until no pods are using the resource before completing the deletion. During this period, the PVC’s status shows as “Terminating”, and internally the PVC carries the finalizer kubernetes.io/pvc-protection, as in the excerpt below.
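For illustration, this is roughly what the relevant part of such a PVC looks like while it is protected (trimmed to the relevant fields; the name and timestamp are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim                              # illustrative
  deletionTimestamp: "2024-01-01T00:00:00Z"   # set when deletion is requested
  finalizers:
  - kubernetes.io/pvc-protection              # blocks removal while pods still use the claim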
In practice, whether the PV bound to a PVC is also deleted after the PVC is deleted via the API depends on the reclaim policy set for the PV.
Generally, the lifecycle of PVCs and PVs is as follows:
Step 1. Provisioning: persistent storage is provisioned, in one of two ways: static or dynamic (via a StorageClass).
Step 2. Binding: users create PVCs that bind to PVs.
Step 3. Using: pods mount the claim and store data on it.
Step 4. Releasing: the user deletes the PVC, at which point the PV’s status becomes “Released”. Whether the PV is actually deleted depends on the defined reclaim policy.
Step 5. Recycling: actions are taken according to the policy, as follows (a PV sketch with an explicit reclaim policy appears after the list):
- Retain: when the PVC is deleted, the PV’s status becomes “Released”, but the PV cannot be directly reused by other pods because the previous data remains on it. To reuse the storage, an administrator can delete the PV and create a new one from the original definition.
- Delete: deleting the PVC also deletes the PV, along with the externally stored data.
- Recycle: deprecated in favor of dynamic provisioning; no longer recommended.
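For reference, here is a minimal PV sketch with the reclaim policy set explicitly (the name, capacity, and hostPath backend are illustrative):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv                # illustrative
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the volume and its data after the claim is released
  hostPath:
    path: /mnt/data               # illustrative; use a real storage backend in production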
6. Conclusion
Kubernetes is a platform that brings many kinds of resources together, and the objects running on it need cleanup mechanisms to keep the system stable and reliable. A good cleanup mechanism is itself a form of platform optimization, and understanding the common recycling mechanisms makes it easier to reason about what has happened to a given object.
In practice, finding effective recycling parameters requires long-term monitoring and observation of your application services and cluster behavior. When in doubt, it’s best to start with the default values and adjust based on what you observe.