API server request timed out
Production Risk
Operators and controllers cannot read or write cluster state; reconciliation loops fail.
A 504 Gateway Timeout from the Kubernetes API server means the request took too long to complete and was terminated server-side. This can occur during heavy LIST operations on large resource collections, when etcd is slow or under load, or when aggregation-layer backends (such as metrics-server) are unresponsive. It is distinct from client-side timeouts, where the client gives up before the server responds.
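To make that distinction concrete when triaging, one approach is to pattern-match the error text a failed call produced. A minimal sketch, assuming bash; the `classify_timeout` helper and its categories are illustrative, not part of kubectl:

```shell
# classify_timeout: rough triage of an API error string.
# Heuristic sketch; the function name and categories are illustrative.
classify_timeout() {
  case "$1" in
    *"etcdserver: request timed out"*) echo "server-side: etcd is slow" ;;
    *"504"*|*"Gateway Timeout"*)       echo "server-side: request terminated by the API server" ;;
    *"context deadline exceeded"*)     echo "client-side: local timeout expired" ;;
    *)                                 echo "unknown" ;;
  esac
}

# usage (against a live cluster):
#   classify_timeout "$(kubectl get pods --all-namespaces 2>&1)"
```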
- etcd cluster is slow or under high write/read load
- LIST operation on a large resource collection (many pods, events) exceeds the server timeout
- Aggregated API server (e.g., metrics-server) is unresponsive
- API server is under CPU or memory pressure
kubectl or API calls intermittently return 504; more frequent during etcd maintenance or large namespace operations.
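Because the failures are intermittent, scripts that wrap kubectl can retry with exponential backoff instead of failing on the first 504. A minimal sketch, assuming bash; the `retry` helper is hypothetical, not a kubectl feature:

```shell
# retry: run a command up to N times, doubling the pause between attempts.
# Hypothetical helper; tune the attempt count and delays for your environment.
retry() {
  local attempts=$1; shift
  local delay=1
  local i
  for i in $(seq 1 "$attempts"); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    delay=$((delay * 2))
  done
  return 1
}

# usage (against a live cluster):
#   retry 5 kubectl get pods --all-namespaces
```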
kubectl get pods --all-namespaces
# list aggregated APIServices that are not Available
kubectl get apiservices | grep -v True
expected output
Error from server: etcdserver: request timed out
Fix 1
Check etcd health
WHEN 504s are frequent and cluster-wide
# If using etcdctl
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  endpoint health
Why this works
Checking etcd health identifies whether the backing store is the source of the API timeouts; if an endpoint is unhealthy or slow to respond, the problem must be addressed at the etcd layer rather than at the API server.
Fix 2
Use field selectors and label selectors to reduce response size
WHEN Large LIST requests are timing out
# Reduce the scope of LIST calls
kubectl get pods -n specific-namespace --field-selector status.phase=Running
kubectl get pods -l app=myapp
Why this works
Server-side filtering shrinks the response the API server must build, serialize, and send, keeping large LIST requests within the server timeout.
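When a single cluster-wide LIST still times out even with selectors, splitting the work into smaller requests (per namespace, or via kubectl's `--chunk-size` pagination flag) keeps each call small. A sketch of the batching idea, assuming bash; `in_batches` and `list_pods` are illustrative helpers, not kubectl features:

```shell
# in_batches: invoke a command once per fixed-size batch of arguments,
# so no single call has to cover the whole cluster. Illustrative helper.
in_batches() {
  local size=$1; shift
  local cmd=$1; shift
  local batch=()
  local item
  for item in "$@"; do
    batch+=("$item")
    if [ "${#batch[@]}" -eq "$size" ]; then
      "$cmd" "${batch[@]}"
      batch=()
    fi
  done
  if [ "${#batch[@]}" -gt 0 ]; then
    "$cmd" "${batch[@]}"
  fi
}

# usage (against a live cluster): list pods two namespaces at a time
#   list_pods() { local ns; for ns in "$@"; do kubectl get pods -n "$ns"; done; }
#   in_batches 2 list_pods $(kubectl get ns -o name | cut -d/ -f2)
```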
Kubernetes Documentation