
# Common Error Codes in Kubernetes Operations


When troubleshooting K8s issues, three commands cover most of the ground (a minimal triage sketch follows the list):

  1. `kubectl describe pod/node <name>`: check the resource's Events to identify the root cause.
  2. `kubectl logs <pod-name>`: check application logs to resolve program-level issues.
  3. `kubectl get <resource-type>`: check the status of resources.
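
As a rough illustration of that order of operations, here is one possible triage pass over a misbehaving Pod; the Pod and namespace names are placeholders:

```bash
# Hypothetical names; substitute your own.
POD=my-app-7d9f8
NS=default

# 1. Events: why is the Pod in its current state?
kubectl describe pod "$POD" -n "$NS"

# 2. Logs: what did the application itself report?
#    --previous shows the output of the last crashed container.
kubectl logs "$POD" -n "$NS" --previous

# 3. Status at a glance for everything in the namespace.
kubectl get pods -n "$NS" -o wide
```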

## Layer 1: Pod Status Codes

| Status | Core Reason | Core Troubleshooting Steps |
| --- | --- | --- |
| `Pending` | Cannot be scheduled: the scheduler cannot find a suitable node. | 1. `kubectl describe pod <name>` and check Events for the specific reason:<br>- Insufficient cpu/memory (not enough free resources).<br>- Taints/tolerations mismatch.<br>- Affinity/anti-affinity rule mismatch.<br>- PVC not bound (the PersistentVolumeClaim is not ready). |
| `ImagePullBackOff` / `ErrImagePull` | Image pull failed: the kubelet cannot pull the container image from the registry. | 1. `kubectl describe pod <name>` and check Events for the specific reason:<br>- Incorrect image name or tag (check the YAML).<br>- Private registry authentication failed (check `imagePullSecrets`).<br>- Network issue (log in to the node and test with `docker`/`crictl pull`). |
| `CrashLoopBackOff` | Container is crashing repeatedly: the container exits right after starting, and the kubelet keeps restarting it. | 1. `kubectl logs <pod-name> --previous` (the logs of the previous crash; extremely important).<br>2. `kubectl logs <pod-name>` (the current logs).<br>3. Based on the logs, investigate application bugs, configuration errors, or out-of-memory issues. |
| `RunContainerError` | Container runtime error: the configuration is correct, but the underlying container runtime (e.g., containerd) cannot start the container. | 1. `kubectl describe pod <name>`; Events will show `RunContainerError`.<br>2. SSH into the node and use `journalctl -u containerd` (or `docker`) to check the runtime logs for lower-level error messages. |
| `CreateContainerConfigError` | Container configuration error: something the container needs at creation time (e.g., a ConfigMap or Secret) is missing or malformed. | 1. `kubectl describe pod <name>`; Events will state clearly which resource is missing or has a format error. |
| `Running` (but Ready `0/1`) | Readiness probe failed: the Pod is running, but it is not ready to receive traffic. | 1. `kubectl describe pod <name>`; Events will record `Readiness probe failed`.<br>2. Check the readinessProbe configuration (initial delay, timeout), or whether a downstream service the application depends on is failing. |
| `Terminating` (stuck) | Pod cannot terminate properly: usually a finalizer is blocking deletion, or a volume cannot be unmounted. | 1. `kubectl describe pod <name>` and check Events for storage-related errors such as `FailedDetachVolume`.<br>2. `kubectl edit pod <name>` and check `metadata.finalizers`; a finalizer added by a controller may not have been cleaned up (see the sketch after this table). |
| `Unknown` | Status is unknown: typically the node controller cannot communicate with the kubelet on the Pod's node. | 1. This is almost equivalent to the node being `NotReady`. Immediately check the health of the Pod's host node (see Layer 4). |
| Job `Failed: BackoffLimitExceeded` | Job retry limit exceeded: the Pods created by the Job kept failing, and after the retry limit was reached the Job was marked failed. | 1. `kubectl get pods -l job-name=<job-name>` to find the failed Pods created by the Job.<br>2. `kubectl logs <failed-pod-name>` to view the logs and identify the root cause of the failure. |
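
For the stuck `Terminating` case above, one possible inspect-then-unblock flow looks like this; the Pod name is a placeholder, and clearing finalizers is a last resort, to be used only once you understand why the owning controller did not clean them up itself:

```bash
POD=stuck-pod   # hypothetical name
NS=default

# Which finalizers are holding the Pod?
kubectl get pod "$POD" -n "$NS" -o jsonpath='{.metadata.finalizers}'

# Any storage errors such as FailedDetachVolume in the Events?
kubectl describe pod "$POD" -n "$NS"

# Last resort: clear the finalizers so deletion can complete.
kubectl patch pod "$POD" -n "$NS" --type=merge \
  -p '{"metadata":{"finalizers":null}}'
```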

## Layer 2: Container Exit Codes

| Exit Code | Meaning | Core Troubleshooting Steps |
| --- | --- | --- |
| 1 | General application error. | 1. Check the application logs: `kubectl logs <pod-name> --previous`. |
| 126 / 127 | Command not executable / command not found. | 1. Check the Dockerfile (`chmod +x`) and the command path in your YAML. |
| 137 | OOMKilled (out of memory; killed with SIGKILL). | 1. `kubectl describe pod <name>` to confirm `Reason: OOMKilled` (see the sketch after this table).<br>2. Increase `resources.limits.memory`. |
| 139 | Segmentation fault (SIGSEGV): a code bug. | 1. Notify the developers to debug the code. |
| 143 | Graceful termination (SIGTERM): normal behavior. | 1. Occurs during Pod deletion or updates; no action needed. |
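
The exit code of the last terminated container can also be read straight from the Pod's status; a small sketch, with the Pod name as a placeholder:

```bash
# Print each container's last exit code and reason,
# e.g. "app: exit=137 reason=OOMKilled".
JSONPATH='{range .status.containerStatuses[*]}{.name}{": exit="}{.lastState.terminated.exitCode}{" reason="}{.lastState.terminated.reason}{"\n"}{end}'
kubectl get pod my-app-7d9f8 -o jsonpath="$JSONPATH"
```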

## Layer 3: Network Status Codes and Errors

| Error / Status | Core Reason | Core Troubleshooting Steps |
| --- | --- | --- |
| Endpoints are empty | The Service selector does not match any Pods. | 1. `kubectl describe svc <name>` to check the selector.<br>2. `kubectl get pods --show-labels` to compare against the Pods' labels (see the sketch after this table). |
| HTTP 502 / 503 / 504 | Ingress gateway error / service unavailable / timeout. | 1. Check Endpoints and Pod health across the board (`CrashLoopBackOff`, `0/1` Ready).<br>2. For 504: check Pod logs and resource usage (`kubectl top pod`) to determine whether the application is slow to respond. |
| HTTP 499 | Client closed request: a non-standard Nginx status code. In short, the client gave up waiting because the backend took too long to respond. | 1. Check backend response time: use `kubectl logs <ingress-controller-pod>` to identify which endpoint (URL) frequently returns 499 and whether its `request_time` is too long.<br>2. Check client timeout settings: confirm whether the caller (browser, app, or another microservice) sets a very short request timeout.<br>3. Investigate application performance bottlenecks: look for slow database queries or slow calls to third-party services in the corresponding service's code. |
| Connection refused | The network path is clear, but no process is listening on the target Pod's port. | 1. `kubectl exec -it <pod-name> -- netstat -tulnp` to confirm the application is listening on the correct port.<br>2. Check the application's startup logs for port-binding errors. |
| Connection timed out | Packets are being dropped along the path, usually a NetworkPolicy or firewall issue. | 1. `kubectl get networkpolicy -A` to confirm whether a policy is blocking the traffic.<br>2. Check node security groups or the underlying network firewall. |
| No route to host | Typically an issue with the inter-node network (CNI). | 1. Check that the CNI plugin's Pods (`calico-node`, the flannel DaemonSet, etc.) are running correctly on all nodes. |
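
For the empty-Endpoints case above, one possible selector-vs-labels check plus an in-cluster connectivity probe; the Service name, namespace, and test image are assumptions:

```bash
SVC=my-service   # hypothetical name
NS=default

# 1. Does the Service have any endpoints at all?
kubectl get endpoints "$SVC" -n "$NS"

# 2. Which labels does the Service select on?
kubectl get svc "$SVC" -n "$NS" -o jsonpath='{.spec.selector}'

# 3. Do any Pods actually carry matching labels?
kubectl get pods -n "$NS" --show-labels

# 4. Probe the Service from inside the cluster with a throwaway Pod.
kubectl run net-test --rm -it --image=busybox --restart=Never -- \
  wget -qO- -T 5 http://"$SVC"."$NS".svc.cluster.local
```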

## Layer 4: Node Status Codes

| Status | Core Reason | Core Troubleshooting Steps |
| --- | --- | --- |
| `NotReady` | Node lost contact: communication between the kubelet and the API server is interrupted. | 1. SSH into the node and check, in order: `kubelet`, `containerd`, `df -h`, and `free -m` (see the sketch after this table). |
| `SchedulingDisabled` | Scheduling is disabled: the node has been cordoned, so no new Pods will be scheduled on it. | 1. This is an administrative action, not a failure. Use `kubectl uncordon <node-name>` to resume scheduling. |
| `MemoryPressure` | Available memory on the node is too low. | 1. The node may start evicting Pods. Log in to the node and use `top` to find the memory hogs. |
| `DiskPressure` | Disk space on the node is insufficient. | 1. Log in to the node, use `df -h` to locate the full partition, and clean up images, containers, and logs. |
| `PIDPressure` | The node is running out of process IDs. | 1. Log in to the node and check for fork bombs or applications creating too many threads/processes. |
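
The pressure conditions in this table can be read off a node's status before logging in; a small sketch, with the node name as a placeholder:

```bash
NODE=worker-1   # hypothetical name

# Summarize the node's conditions:
# Ready, MemoryPressure, DiskPressure, PIDPressure, ...
kubectl get node "$NODE" \
  -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\n"}{end}'

# Once on the node, the usual order of checks:
#   systemctl status kubelet containerd
#   journalctl -u kubelet --since "10 min ago"
#   df -h && free -m
```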

## Layer 5: Storage Status Codes

| Status / Event | Core Reason | Core Troubleshooting Steps |
| --- | --- | --- |
| PVC: `Pending` | The PVC cannot bind to a PV. | 1. `kubectl describe pvc <name>` and check Events to see whether it is a PV mismatch or a StorageClass issue (see the sketch after this table). |
| Pod Event: `FailedMount` | Volume mount failed. | 1. `kubectl describe pod <name>`; Events will give detailed reasons, such as NFS permissions or cloud-disk status. |
| Pod Event: `FailedDetachVolume` | Volume detach failed: usually the underlying storage (e.g., a cloud disk) is busy or unhealthy. | 1. This issue causes Pods to get stuck in the `Terminating` state.<br>2. Check the CSI plugin logs or the cloud provider's console for the volume's status. |
| App log: `Read-only file system` | The file system is read-only: the Pod hits an error when writing to a PV. | 1. `kubectl exec -it <pod-name> -- mount` to view mount information and confirm whether the mount option is `ro` (read-only).<br>2. The storage backend itself may have failed and switched into a read-only protective mode. |
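
For a `Pending` PVC, a minimal first pass over the binding chain might look like this; the claim name is a placeholder:

```bash
PVC=data-claim   # hypothetical name
NS=default

# Events usually name the cause: no matching PV,
# missing or misspelled StorageClass, ...
kubectl describe pvc "$PVC" -n "$NS"

# Is the requested StorageClass present (and is one marked default)?
kubectl get storageclass

# List PVs to spot capacity or accessMode mismatches.
kubectl get pv
```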