
Fixing a Kubelet Memory Leak in Kubernetes 1.36

A subtle Go context lifecycle bug in the kubelet can quietly consume node memory and bypass standard pod limits.

Emeka Okafor
Security Editor · Jul 1, 2026 · 5 min read

When a Kubernetes node starts misbehaving, the standard playbook is predictable. You check kubectl top pods, look for containers terminated with an OOMKilled status (exit code 137), and scan for application-level memory leaks. But when a node running Kubernetes 1.36 begins throwing MemoryPressure alerts and evicting healthy workloads while application metrics remain flat, the threat model shifts. The leak is not coming from your workloads. It is coming from the orchestrator itself.

A subtle Go context lifecycle bug inside the kubelet in Kubernetes 1.36 can quietly consume node memory. This leak is particularly dangerous because it bypasses standard pod-level resource limits. It manifests as a slow, relentless climb in the kubelet's resident set size (RSS), eventually destabilizing the entire node. For platform engineers running high-churn environments or tight node allocations, understanding how to diagnose and mitigate this system-level leak is critical.

The Symptoms: Why Standard Metrics Fail

In a typical memory leak scenario, a rogue application container slowly eats up memory until it hits its limit and the kernel OOM killer steps in. But when the kubelet leaks memory, the failure mode is different.

Because the kubelet runs as a system daemon outside the pod cgroup hierarchy, its memory growth does not show up in kubectl top pods. The applications running on the node appear perfectly healthy, operating well within their memory limits. However, as the kubelet process expands, the node's available memory drops.

Once available memory falls below the eviction threshold, the kubelet declares a MemoryPressure node condition. At this point, the control plane taints the node with node.kubernetes.io/memory-pressure, preventing new BestEffort pods from being scheduled. If the pressure persists, the kubelet begins evicting existing pods to protect node stability.

If you suspect a system-level leak, you have to bypass the Kubernetes API and look at the node directly. Running htop or top on the node and sorting by memory usage will reveal the kubelet process itself steadily climbing up the memory rankings. While a quick systemctl restart kubelet temporarily mitigates the issue by resetting the heap, the leak will return unless the root cause is addressed.

Dumping the Heap: The pprof Methodology

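To diagnose a system-level leak in a Go-based daemon like the kubelet, you need to inspect its heap memory. The Kubernetes API server can proxy requests to the kubelet's Go pprof endpoints, allowing you to capture a heap profile remotely without installing debugging tools on the host node. This only works if the kubelet's debugging and profiling handlers are enabled and your account is allowed to use the nodes/proxy subresource.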
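For readers who have not worked with pprof before, the sketch below shows how a Go daemon typically exposes these endpoints and what a heap request returns. It uses only the standard net/http/pprof package against a throwaway local test server. The helper names (newDebugMux, fetchHeapProfile, fetchFromLocalServer, isGzip) are hypothetical, and this is not the kubelet's actual handler wiring.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux registers the standard pprof index handler, which also serves
// named profiles such as /debug/pprof/heap.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	return mux
}

// fetchHeapProfile downloads the binary heap profile from baseURL.
func fetchHeapProfile(baseURL string) ([]byte, error) {
	resp, err := http.Get(baseURL + "/debug/pprof/heap?debug=0")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// fetchFromLocalServer starts a temporary local server (a stand-in for a
// real daemon) and fetches its heap profile.
func fetchFromLocalServer() ([]byte, error) {
	srv := httptest.NewServer(newDebugMux())
	defer srv.Close()
	return fetchHeapProfile(srv.URL)
}

// isGzip reports whether data starts with the gzip magic bytes, which is the
// container format pprof uses for binary profiles.
func isGzip(data []byte) bool {
	return len(data) >= 2 && data[0] == 0x1f && data[1] == 0x8b
}

func main() {
	data, err := fetchFromLocalServer()
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Printf("fetched %d bytes, gzip-compressed: %v\n", len(data), isGzip(data))
}
```

The bytes returned here have the same format as the file that go tool pprof reads, which is why a profile pulled through the API server proxy can be analyzed offline.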

You can pull the raw heap profile directly using kubectl:

kubectl get --raw "/api/v1/nodes/${NODE}/proxy/debug/pprof/heap?debug=0" > "kubelet_pprof_heap.pb.gz"

Once you have the profile, you can use the standard Go toolchain to analyze it. Looking at the in-use object count is often the fastest way to spot a leak, as leaks typically involve the accumulation of thousands of tiny, un-garbage-collected objects.

Running go tool pprof sorted by object count reveals a striking pattern:

go tool pprof -top -sample_index=inuse_objects kubelet_pprof_heap.pb.gz

In an affected Kubernetes 1.36 node, the output shows an astronomical number of active contexts:

flat  flat%   sum%    cum   cum%
642456 45.52% 45.52% 918672 65.09% context.(*cancelCtx).propagateCancel
380137 26.93% 72.45% 380195 26.94% context.withCancel (inline)
276216 19.57% 92.02% 276216 19.57% context.(*cancelCtx).Done
     0     0% 97.00% 380195 26.94% k8s.io/kubernetes/pkg/kubelet.(*podWorkers).startPodSync
     0     0% 97.00% 907283 64.28% k8s.io/kubernetes/pkg/kubelet/volumemanager.(*volumeManager).WaitForAttachAndMount

Analyzing the same profile by total heap memory usage confirms that these contexts are not just numerous, they are consuming the majority of the kubelet's memory:

go tool pprof -top -sample_index=inuse_space kubelet_pprof_heap.pb.gz

flat  flat%   sum%    cum   cum%
86.33MB 51.28% 51.28% 115.83MB 68.80% context.(*cancelCtx).propagateCancel
29.50MB 17.52% 68.80%  29.50MB 17.52% context.(*cancelCtx).Done
29.00MB 17.23% 86.03%  30.54MB 18.14% context.withCancel (inline)
     0     0% 91.48% 103.02MB 61.19% k8s.io/kubernetes/pkg/kubelet/volumemanager.(*volumeManager).WaitForAttachAndMount

Nearly a million live objects allocated beneath context.(*cancelCtx).propagateCancel, holding over 115 MB of heap on a small node, are a clear indicator of a context leak. The cumulative columns attribute most of that memory to startPodSync and the volume manager's WaitForAttachAndMount call path.

The Root Cause: Context Lifecycles in Go

To understand why this leak occurs, we have to look at how Go manages context lifecycles.

In Go, contexts form a tree structure. When you create a cancelable child context using context.WithCancel, context.WithTimeout, or context.WithDeadline, the parent context stores a reference to the child. This registration allows a cancellation signal to propagate down the tree.

However, this parent-to-child reference remains active until either the child context is canceled or the parent context itself is canceled. If you spawn a child context and fail to call its associated cancel function, the child context cannot be garbage collected, even if it goes out of scope in your code. It remains pinned in memory by the parent context.
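The retention described above can be observed with a small experiment. The sketch below is an illustration under stated assumptions, not kubelet code: it creates many children under one long-lived parent, once without calling cancel and once with, and compares how many heap objects survive a forced garbage collection. The names liveObjects and retainedAfter are hypothetical, and the exact counts vary by Go version and platform.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
)

// liveObjects forces a garbage collection and returns the number of heap
// objects that are still allocated afterwards.
func liveObjects() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapObjects
}

// retainedAfter creates n child contexts under a single long-lived parent and
// returns roughly how many heap objects remain reachable once the loop ends.
// When callCancel is false, the children are abandoned without being canceled.
func retainedAfter(n int, callCancel bool) uint64 {
	parent, stop := context.WithCancel(context.Background())
	defer stop()

	before := liveObjects()
	for i := 0; i < n; i++ {
		_, cancel := context.WithCancel(parent)
		if callCancel {
			cancel() // removes the child from the parent's bookkeeping
		}
	}
	after := liveObjects()

	// Keep the parent reachable until the measurement is done.
	runtime.KeepAlive(parent)
	if after < before {
		return 0
	}
	return after - before
}

func main() {
	const n = 100000
	fmt.Println("objects retained without cancel:", retainedAfter(n, false))
	fmt.Println("objects retained with cancel:   ", retainedAfter(n, true))
}
```

Without cancel, the retained count should stay at or above the number of children, because the parent still references every one of them. With cancel, it should drop to roughly zero. Canceling the parent at the end releases everything in both cases, which is why the leak only hurts when the parent is long-lived.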

In the Kubernetes 1.36 kubelet, the pod worker loop (startPodSync) regularly spawns contexts to manage the lifecycle of pod synchronization operations, including volume mounting via WaitForAttachAndMount. A minor logic error in this loop allowed child contexts to be abandoned without their cancel functions being invoked. Because the parent context of these sync operations is long-lived, every single pod synchronization iteration leaked a context.

Over days or weeks, especially on nodes with high pod churn or frequent sync iterations, these leaked contexts pile up. This is distinct from other historical kubelet leaks, such as the container garbage collection leak where references to deleted containers were retained in pkg/kubelet/container/container_gc.go. In this case, the leak is purely a Go runtime reference retention issue caused by un-canceled contexts.

The Developer Angle: Mitigation and Prevention

For platform teams running Kubernetes 1.36, this leak represents a real operational risk, especially on smaller nodes or in environments with high container turnover.

Short-Term Mitigations

If you are stuck on an unpatched version of 1.36, you can manage the leak using a few operational workarounds:

  1. Automated Kubelet Restarts: Since the kubelet is stateless with respect to running containers, restarting it normally does not interrupt active pods. You can set up a cron job or a systemd timer to restart the kubelet periodically, staggered across nodes so that they do not all restart at once:
    systemctl restart kubelet
    
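  2. Proactive Monitoring: Set up alerts in Prometheus to track the resident set size of the kubelet process. A steady, linear increase in kubelet memory usage over time, independent of pod count, is a strong signal to trigger a rolling restart. The kubelet's own metrics endpoint reports its RSS through the standard Go process collector (the job label here is an assumption and depends on your scrape configuration):
    process_resident_memory_bytes{job="kubelet"}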
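One way to turn "a steady, linear increase" into a number is to fit a least-squares line through recent RSS samples and alert on its slope. The sketch below assumes the samples have already been collected from your monitoring system. The Sample type, growthPerHour, and syntheticSamples are hypothetical names introduced for illustration, and any alert threshold would need tuning for your node sizes.

```go
package main

import (
	"fmt"
	"time"
)

// Sample is one observation of the kubelet's resident set size.
type Sample struct {
	At    time.Time
	Bytes float64
}

// growthPerHour fits a least-squares line through the samples and returns its
// slope in bytes per hour. ok is false when a slope cannot be computed
// (fewer than two samples, or all samples share one timestamp).
func growthPerHour(samples []Sample) (slope float64, ok bool) {
	if len(samples) < 2 {
		return 0, false
	}
	t0 := samples[0].At
	var sumT, sumY float64
	for _, s := range samples {
		sumT += s.At.Sub(t0).Hours()
		sumY += s.Bytes
	}
	n := float64(len(samples))
	meanT, meanY := sumT/n, sumY/n

	var num, den float64
	for _, s := range samples {
		dt := s.At.Sub(t0).Hours() - meanT
		num += dt * (s.Bytes - meanY)
		den += dt * dt
	}
	if den == 0 {
		return 0, false
	}
	return num / den, true
}

// syntheticSamples builds n hourly samples that start at startBytes and grow
// by stepBytes each hour. It exists only to exercise growthPerHour.
func syntheticSamples(startBytes, stepBytes float64, n int) []Sample {
	base := time.Date(2026, time.July, 1, 0, 0, 0, 0, time.UTC)
	out := make([]Sample, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, Sample{
			At:    base.Add(time.Duration(i) * time.Hour),
			Bytes: startBytes + stepBytes*float64(i),
		})
	}
	return out
}

func main() {
	slope, ok := growthPerHour(syntheticSamples(100e6, 10e6, 24))
	fmt.Printf("slope: %.0f bytes/hour (valid: %v)\n", slope, ok)
}
```

A slope that stays positive across several days while pod count is flat matches the leak signature better than any single high reading, which can simply reflect a burst of pod activity.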
    

Writing Leak-Free Go Contexts

For developers building controllers, operators, or custom Kubernetes components, this bug serves as a reminder of Go context best practices. The golden rule of Go contexts is that every call to WithCancel, WithTimeout, or WithDeadline must be paired with a call to its cancel function, typically via a defer statement:

func processItem(ctx context.Context) {
    // Always capture the cancel function
    childCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel() // Ensures resources are released when processItem returns

    doWork(childCtx)
}

Even when a timeout is set, calling cancel() is still necessary. If the work finishes before the deadline, cancel() stops the timer and removes the child from its parent right away; without it, both stay alive until the deadline fires. Note that defer runs only when the enclosing function returns, so inside a high-frequency loop you should either call cancel() at the end of each iteration or move the loop body into its own function. Skipping this is an easy way to introduce a slow, hard-to-debug memory leak that can bring down an entire node.
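For loops specifically, a common pattern is to give each iteration its own function so that defer cancel() fires once per item. The sketch below is a minimal illustration with hypothetical names (syncOne, syncAll, countReleased). It is not the kubelet's fix, only one idiomatic way to structure such a loop.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// syncOne processes a single item under its own timeout. Because the context
// is created and released inside this function, defer cancel() runs as soon
// as the item is done rather than when the whole loop finishes.
func syncOne(parent context.Context, item string, work func(context.Context, string) error) error {
	ctx, cancel := context.WithTimeout(parent, 30*time.Second)
	defer cancel()
	return work(ctx, item)
}

// syncAll processes every item in order and stops at the first error.
func syncAll(parent context.Context, items []string, work func(context.Context, string) error) error {
	for _, item := range items {
		if err := syncOne(parent, item, work); err != nil {
			return fmt.Errorf("sync %q: %w", item, err)
		}
	}
	return nil
}

// countReleased runs syncAll and reports how many per-item contexts had
// already been canceled by the time the loop returned.
func countReleased(items []string) (int, error) {
	var seen []context.Context
	err := syncAll(context.Background(), items, func(ctx context.Context, _ string) error {
		seen = append(seen, ctx)
		return ctx.Err() // nil while the item is still being processed
	})

	released := 0
	for _, ctx := range seen {
		if ctx.Err() == context.Canceled {
			released++
		}
	}
	return released, err
}

func main() {
	items := []string{"pod-a", "pod-b", "pod-c"}
	released, err := countReleased(items)
	fmt.Printf("released %d of %d contexts (err: %v)\n", released, len(items), err)
}
```

Writing defer cancel() directly inside the for body would compile, but every deferred call would wait until the surrounding function returned, so a long-running loop would still accumulate live contexts.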

Sources & further reading

  1. Fixing a kubelet memory leak in Kubernetes 1.36 — heyoncall.com
  2. Kubernetes Memory Leaks: Detection, Impact & Fixes — groundcover.com
  3. How to Diagnose Kubernetes Container Memory Leaks Using Memory Profiling Tools — oneuptime.com
  4. Troubleshooting memory-related issues in Kubernetes | Coroot — coroot.com
  5. Memory leak in container garbage collection · Issue #131905 · kubernetes/kubernetes — github.com
Written by Emeka Okafor · Security Editor

Emeka has spent over a decade tracking threat actors, vulnerability disclosures, and the evolving landscape of application security, bringing a sharp continent-spanning perspective to his reporting. He's known for translating dense CVE advisories into clear, actionable context that developers and security teams alike actually read.
