Why Kubernetes Logging Is Different
Kubernetes generates a volume and variety of log data that overwhelms teams who approach it with traditional logging strategies. An enterprise cluster running a hundred services can produce millions of log lines per hour. The question is not how to collect everything. It is how to collect the right things.
The cost of comprehensive Kubernetes logging, if done naively, can exceed the cost of the workloads being protected. Getting the strategy right before you deploy is worth the investment.
What to Collect
- Kubernetes API server audit logs: every API call, who made it, and what changed
- Container runtime telemetry: process execution, network connections, and file system changes inside containers
- Node-level logs: host activity, including system calls, that container-level tooling cannot see
- Application logs: structured where possible, routed to a central destination
What to Ignore
The most common Kubernetes logging mistake is collecting everything and filtering later. Container stdout at full verbosity, health check logs from load balancers, and routine scheduler decisions generate massive volumes with minimal security value.
Define your logging policy before deployment. Every log source should have an explicit justification tied to a security use case. If you cannot articulate why you are collecting a log, do not collect it.
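One way to enforce that rule is to keep the policy as data and reject any source without a stated use case. The following is a minimal sketch under assumptions: the `LogSource` type, the `split_policy` helper, and the source names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogSource:
    """One entry in a hypothetical logging policy."""
    name: str
    use_case: str  # the security use case that justifies collection


def split_policy(sources):
    """Separate justified sources from those with no stated use case."""
    keep, drop = [], []
    for source in sources:
        # A blank or whitespace-only justification counts as none at all.
        (keep if source.use_case.strip() else drop).append(source.name)
    return keep, drop


# Illustrative entries only; the names are assumptions, not a standard.
POLICY = [
    LogSource("apiserver-audit", "privilege escalation via API server"),
    LogSource("node-syscalls", "container escape"),
    LogSource("lb-health-checks", ""),
]

keep, drop = split_policy(POLICY)
print("collect:", keep)
print("do not collect:", drop)
```

Running a check like this in review or CI turns "justify every source" from a guideline into a gate.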
Detection Use Cases That Require Kubernetes Logs
- Privilege escalation via API server: requires audit logs that capture the verb and resource of each request
- Container escape: requires node-level syscall data, not just container logs
- Lateral movement via service accounts: requires audit logs plus network flow data
- Supply chain compromise: requires image provenance data alongside runtime logs
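To make the first use case concrete, the sketch below scans JSON audit events for writes to RBAC objects. It assumes the audit backend writes one JSON event per line and relies on the `verb`, `user.username`, and `objectRef.resource` fields of the audit event format. The watchlist and function names are hypothetical, and a real rule set would cover more than this.

```python
import json

# Assumed watchlist: write verbs against RBAC objects. Tune for your cluster.
WRITE_VERBS = {"create", "update", "patch"}
RBAC_RESOURCES = {"roles", "clusterroles", "rolebindings", "clusterrolebindings"}


def is_rbac_change(event):
    """Return True if an audit event records a write to an RBAC resource."""
    ref = event.get("objectRef") or {}
    return event.get("verb") in WRITE_VERBS and ref.get("resource") in RBAC_RESOURCES


def flag_rbac_changes(lines):
    """Yield (username, verb, resource) for each matching JSON audit line."""
    for line in lines:
        event = json.loads(line)
        if is_rbac_change(event):
            user = (event.get("user") or {}).get("username", "unknown")
            yield user, event["verb"], event["objectRef"]["resource"]


# Hand-written sample lines for illustration, not captured from a real cluster.
SAMPLE = [
    '{"verb": "get", "user": {"username": "alice"}, "objectRef": {"resource": "pods"}}',
    '{"verb": "create", "user": {"username": "system:serviceaccount:dev:ci"}, '
    '"objectRef": {"resource": "clusterrolebindings"}}',
]

for hit in flag_rbac_changes(SAMPLE):
    print(hit)
```

The same shape extends to the other use cases, but only if the corresponding data source (syscalls, network flows, image provenance) is actually being collected.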
The Retention Question
Retaining every audit log from a busy cluster for as long as a mature security program would want is expensive. Tier your retention: high-fidelity retention for API calls involving privileged operations, compressed retention for routine activity, and aggregated metrics for the rest.
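A tiering rule can be as simple as a function from audit event to tier. The sketch below is one possible mapping, not a recommendation: the tier names, the set of resources treated as privileged, and the decision to send routine reads to metrics only are all assumptions to replace with your own policy.

```python
from collections import Counter

# Assumed definition of "privileged": any access to RBAC objects or secrets,
# plus any pod exec. Replace with your own policy.
SENSITIVE_RESOURCES = {"secrets", "roles", "clusterroles",
                       "rolebindings", "clusterrolebindings"}
READ_VERBS = {"get", "list", "watch"}


def retention_tier(event):
    """Map one audit event (as a dict) to a hypothetical retention tier."""
    ref = event.get("objectRef") or {}
    if ref.get("subresource") == "exec":
        return "high_fidelity"
    if ref.get("resource") in SENSITIVE_RESOURCES:
        return "high_fidelity"
    if event.get("verb") not in READ_VERBS:
        return "compressed"    # routine writes: keep, but compress
    return "metrics_only"      # routine reads: count, then discard


# Hand-written sample events for illustration.
events = [
    {"verb": "create", "objectRef": {"resource": "clusterrolebindings"}},
    {"verb": "create", "objectRef": {"resource": "pods", "subresource": "exec"}},
    {"verb": "update", "objectRef": {"resource": "deployments"}},
    {"verb": "list", "objectRef": {"resource": "pods"}},
]

tiers = Counter(retention_tier(e) for e in events)
print(tiers)
```

Counting events per tier on a day of real traffic gives a quick estimate of how much volume each tier would hold before you commit to storage.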
Define your retention policy in terms of investigation requirements, not storage budgets. Work backward from the question: if we were breached six months ago, what logs would we need to reconstruct the attack path? That answer defines your retention floor.
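Working backward can be done mechanically: list the investigations you want to be able to run, note how far back each must reach and which sources it needs, and take the longest lookback per source. The sketch below assumes hypothetical scenarios and source names; 180 days stands in for "breached six months ago".

```python
def retention_floor(scenarios):
    """Return the minimum retention in days per log source.

    Each scenario is (lookback_days, [log sources needed to reconstruct it]).
    A source's floor is the longest lookback of any scenario that needs it.
    """
    floor = {}
    for lookback_days, sources in scenarios:
        for source in sources:
            floor[source] = max(floor.get(source, 0), lookback_days)
    return floor


# Hypothetical scenarios; the numbers are placeholders, not guidance.
SCENARIOS = [
    (180, ["apiserver-audit", "network-flows"]),
    (30, ["node-syscalls", "apiserver-audit"]),
]

floors = retention_floor(SCENARIOS)
print(floors)
```

Storage budget then becomes a constraint on how you meet each floor (compression, sampling, cheaper tiers), not on the floor itself.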