This guide provides a step-by-step process to diagnose and resolve high memory usage that causes NextGen Gateway pods to crash in a Kubernetes environment. It includes commands to check pod status, identify memory-related issues, and implement solutions to stabilize the pod.
Verify Memory Usage if a Pod Crashes Due to a Memory Issue
To verify memory usage in Kubernetes pods, make sure that the metrics server is enabled in the Kubernetes cluster. The `kubectl top` command retrieves snapshots of resource utilization for pods or nodes in your cluster.
Use the following command to verify pod memory usage.
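A typical invocation looks like the following; `<namespace>` is a placeholder for the namespace where the NextGen Gateway runs.

```bash
# Show current CPU and memory usage for all pods in the namespace
kubectl top pod -n <namespace>
```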
Use the following command to verify node memory usage.
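For example:

```bash
# Show current CPU and memory usage for each node in the cluster
kubectl top node
```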
NextGen Gateway Pod Crashes Due to High Memory Usage
The NextGen Gateway pod in a Kubernetes cluster crashes due to high memory usage.
Possible Causes
When a pod exceeds its memory limit, Kubernetes automatically kills the process to protect the node’s stability, resulting in an “OOMKilled” (Out of Memory Killed) error. This is particularly critical for the NextGen Gateway, as it can affect the stability and monitoring capabilities of the OpsRamp platform.
Troubleshooting Steps
Follow these steps to diagnose and fix memory issues for the NextGen Gateway pod:
- Check the status of the Kubernetes objects to determine whether the pods are running (see the commands after this list).
- Gather detailed information about the pod with `kubectl describe`; the output includes the status, restart count, and the reason for any previous restarts (an example follows this list).
- Look for memory-related termination reasons in the pod’s event logs.
- Confirm the memory issue by checking the container’s exit code in the pod description (an illustrative sample follows this list).
- If the exit code is 137, the pod is crashing due to a memory issue.
- Fix the memory issue:
- Decrease the load on the NextGen Gateway by limiting the number of metrics collected.
- Adjust the memory limits for the NextGen Gateway accordingly (see the example after this list).
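The following commands sketch the status and inspection steps above. The pod name and namespace are placeholders; substitute the values from your own deployment.

```bash
# Check whether the gateway pod is running and how many times it has restarted
kubectl get pods -n <namespace>

# Inspect the pod in detail; the previous termination reason appears under "Last State"
kubectl describe pod <nextgen-gateway-pod-name> -n <namespace>
```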
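When the crash is memory related, the `kubectl describe pod` output typically contains a terminated container state similar to the illustrative snippet below (the exact layout varies by Kubernetes version):

```
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
```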
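One way to raise the memory limits is shown below. The deployment name and the 2Gi/4Gi values are placeholders, not OpsRamp-recommended settings; adjust them to your sizing guidelines. If the gateway is managed through a Helm chart, prefer setting the equivalent values in the chart's values file so the change survives upgrades.

```bash
# Increase the memory request and limit on the gateway Deployment (placeholder values)
kubectl set resources deployment <nextgen-gateway-deployment> -n <namespace> \
  --requests=memory=2Gi --limits=memory=4Gi

# Watch the pod restart with the new resource settings
kubectl get pods -n <namespace> -w
```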