Alert Context
critical

PodCrashLooping

Hash: a1b2c3d4
Kubernetes Labels
alertname: PodCrashLooping
namespace: prod-payment
pod: payment-service-deployment-7f8c9b-x2z
severity: critical
cluster: k8s-eu-west-1
Annotations
Summary: Pod is crashing continuously
Description: Pod payment-service-deployment-7f8c9b-x2z in namespace prod-payment has restarted 15 times in the last 10 minutes.
Investigation Engine
RCA GENERATED

Root Cause Analysis

The technical investigation shows that the payment-service pod enters CrashLoopBackOff after exceeding its memory limit (OOMKilled).

  1. Symptom: Alertmanager reported repeated restarts (15 in 10 minutes).
  2. Kubernetes evidence: The namespace events report an out-of-memory kill of the main container (event reason OOMKilling, meaning the container was OOMKilled).
  3. Application evidence (logs): The logs of the previous container instance (previous=true) show a Java exception, java.lang.OutOfMemoryError: Java heap space, raised while processing an unusually large billing batch.

Conclusion: The current memory limit (512Mi) is insufficient for the monthly billing job.
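
The correlation behind this conclusion can be sketched in code. The following is a minimal illustration, not the investigation engine's actual logic: the class name OomEvidence, its method names, and the verdict labels are hypothetical. It assumes the event reasons and log lines have already been extracted from the k8s_get_events and k8s_get_pod_logs outputs, and it treats a memory verdict as confirmed only when both signals agree.

```java
import java.util.List;

// Hypothetical helper: names and verdict labels are assumptions for illustration.
final class OomEvidence {

    /** True if any event reason signals an out-of-memory kill. */
    static boolean hasOomEvent(List<String> eventReasons) {
        for (String reason : eventReasons) {
            // "OOMKilling" is the reason seen in the event trace; "OOMKilled" is
            // the container termination reason quoted in the analysis.
            if (reason.equals("OOMKilling") || reason.equals("OOMKilled")) {
                return true;
            }
        }
        return false;
    }

    /** True if any log line reports JVM heap exhaustion. */
    static boolean hasHeapExhaustion(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains("java.lang.OutOfMemoryError: Java heap space")) {
                return true;
            }
        }
        return false;
    }

    /** Combines both signals into a coarse verdict. */
    static String classify(List<String> eventReasons, List<String> logLines) {
        boolean oomEvent = hasOomEvent(eventReasons);
        boolean heapExhausted = hasHeapExhaustion(logLines);
        if (oomEvent && heapExhausted) {
            return "MEMORY_LIMIT_EXCEEDED";
        }
        if (oomEvent || heapExhausted) {
            return "MEMORY_SUSPECTED";
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        // Values copied from the execution trace of this alert.
        List<String> reasons = List.of("BackOff", "OOMKilling");
        List<String> logs = List.of(
            "2026-05-12 21:45:12 [INFO] Loading 50,000 invoices into memory",
            "2026-05-12 21:45:14 [ERROR] Exception in thread \"main\" "
                + "java.lang.OutOfMemoryError: Java heap space");
        System.out.println(classify(reasons, logs));
    }
}
```

Requiring both signals is a deliberate choice in this sketch: a BackOff event alone says only that the container keeps failing, not why.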

Standard Operating Procedure (SOP)

Remediation Procedure (SOP)

  1. Immediate mitigation: Temporarily raise resources.limits.memory to 1Gi with kubectl:

    kubectl set resources deployment payment-service-deployment -c payment-service --limits=memory=1Gi -n prod-payment
  2. Long-term actions:

    • Update the Helm manifest (values.yaml) to make this limit permanent.
    • Optimize the batch processing code so that it paginates its work (processing invoices in chunks).
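
The chunking action can be illustrated with a minimal sketch. The service's real code is not shown in this report, so everything here is assumed: the PagedBatch class, the PageSource interface, and the page size of 1000 are hypothetical. The idea is to fetch and process one page of invoices at a time instead of loading all 50,000 into memory at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: class, interface, and parameter names are assumptions.
final class PagedBatch {

    /** Assumed data source: returns at most `limit` items starting at `offset`. */
    interface PageSource<T> {
        List<T> fetch(int offset, int limit);
    }

    /**
     * Processes all items page by page, so only one page is held at a time.
     * Returns the number of items processed.
     */
    static <T> int processInPages(PageSource<T> source, int pageSize,
                                  Consumer<List<T>> handler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int processed = 0;
        while (true) {
            List<T> page = source.fetch(processed, pageSize);
            if (page.isEmpty()) {
                break; // nothing left to process
            }
            handler.accept(page);
            processed += page.size();
            if (page.size() < pageSize) {
                break; // short page: the source is exhausted
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the invoice store, for demonstration only.
        List<Integer> invoiceIds = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            invoiceIds.add(i);
        }
        PageSource<Integer> source = (offset, limit) -> invoiceIds.subList(
            Math.min(offset, invoiceIds.size()),
            Math.min(offset + limit, invoiceIds.size()));

        int total = processInPages(source, 1000,
            page -> System.out.println("processing " + page.size() + " invoices"));
        System.out.println("total: " + total);
    }
}
```

In a real service the PageSource would typically wrap a paginated database query, and peak heap usage would then depend on the page size rather than on the total batch size.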
Execution Trace
2 spans
k8s_get_events
Query Params
{
  "namespace": "prod-payment"
}
StdOut / Output
[
  {
    "type": "Warning",
    "reason": "BackOff",
    "message": "Back-off restarting failed container",
    "object": "Pod/payment-service-deployment-7f8c9b-x2z"
  },
  {
    "type": "Warning",
    "reason": "OOMKilling",
    "message": "Memory cgroup out of memory: Killed process 1 (java)",
    "object": "Pod/payment-service-deployment-7f8c9b-x2z"
  }
]
k8s_get_pod_logs
Query Params
{
  "name": "payment-service-deployment-7f8c9b-x2z",
  "namespace": "prod-payment",
  "previous": true
}
StdOut / Output
2026-05-12 21:45:10 [INFO] Starting billing batch processing...
2026-05-12 21:45:12 [INFO] Loading 50,000 invoices into memory
2026-05-12 21:45:14 [ERROR] Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
2026-05-12 21:45:14 [ERROR] Unhandled exception caught, shutting down JVM.