Kubernetes for Agentic AI: Best Practices for Security and Observability

Agentic AI workloads are shipping to production on Kubernetes faster than the standards to secure them. Many teams deploying autonomous, tool-calling agents as containerized microservices do so without a shared baseline for securing or monitoring those containers.
The CNCF AI Technical Community Group recently published a comprehensive article on cloud-native agentic standards, marking the first attempt to define best practices for such deployments. Boris Kurktchiev, co-author of this post, is one of the group’s elected leaders and a contributor to the CNCF article.
In this post, we will highlight 18 Kubernetes best practices outlined in the article. Each of these practices is focused on building the container security, observability, availability, and fault tolerance necessary to support agentic workloads.
Kubernetes security, observability, and availability best practices
Before layering on agent-specific controls, teams should first implement fundamental container best practices. These recommendations are not specific to agentic use cases; they apply to any modern containerized or serverless environment.
1. Enforce least privilege for each container
Enforce the principle of least privilege to grant only the minimal permissions required for the container to operate. This requires configuring user controls, network policies, security contexts, and access control to minimize the attack surface.
Least privilege has long been a container security best practice, but the introduction of agentic workloads makes it more important than ever. In fact, Teleport's latest research found that organizations with least-privileged AI systems experienced 4.5x fewer incidents than those with over-privileged AI systems. The same survey also revealed that 70% of organizations currently grant their AI systems more access than a human in the same role would receive.
Securing Kubernetes is an ongoing process that covers image assurance, network segmentation, least-privilege access controls, and continuous monitoring. Any production security checklist should include enforcing least privilege as well as implementing role-based access control (RBAC) and applying granular pod security or admission policies.
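As one illustration, least-privilege RBAC for an agent might look like the following sketch, where the Role grants the agent's ServiceAccount read-only access to a single resource type (all names here are hypothetical):

```yaml
# Hypothetical example: scope an agent's ServiceAccount to read-only
# access on ConfigMaps in its own namespace, and nothing else.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: agent-readonly        # hypothetical name
  namespace: agents           # hypothetical namespace
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list"]      # no write, delete, or escalation verbs
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: agent-readonly-binding
  namespace: agents
subjects:
- kind: ServiceAccount
  name: agent-sa              # the agent's dedicated service account
  namespace: agents
roleRef:
  kind: Role
  name: agent-readonly
  apiGroup: rbac.authorization.k8s.io
```

Using a namespaced Role rather than a ClusterRole keeps the grant scoped to one namespace, so a compromised agent cannot enumerate resources cluster-wide.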
2. Hide unnecessary dependencies and information
Every additional binary or library in the container expands the attack surface that a compromised agent could exploit. Avoid exposing unnecessary dependencies outside the container by packaging only what is needed, using multi-stage builds to minimize image size, and ensuring that no build tools, credentials, or secrets leak into the final image.
Our research found that 67% of organizations report high reliance on static credentials like API keys and long-lived tokens. This reliance is correlated with a 20-percentage-point increase in AI-related incidents. Every credential that leaks into a container image becomes a persistent, stealable path into the environment.
3. Source (and verify) trusted container images
Use secure container images from official, trusted repositories and scan images for vulnerabilities regularly. Sign and verify images to ensure integrity and provenance, and add OCI-compliant annotations to document metadata, including source, version, authorship, scan status, and signature information.
Image integrity is a recurring theme across the industry. Scanning base images regularly is widely regarded as the first step in securing any Kubernetes environment, reinforced by continuous vulnerability assessments of containers throughout the build phase. Teleport's Sigstore integration restricts access to signed container images only.
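One way to enforce signature verification at admission time is with a policy engine such as Kyverno; a minimal sketch might look like the following (the registry pattern and public key are placeholders):

```yaml
# Sketch of a Kyverno ClusterPolicy that blocks pods whose images
# from a given registry lack a valid Cosign signature.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures   # hypothetical name
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-signature
    match:
      any:
      - resources:
          kinds: ["Pod"]
    verifyImages:
    - imageReferences:
      - "registry.example.com/*"  # placeholder registry
      attestors:
      - entries:
        - keys:
            publicKeys: |-
              -----BEGIN PUBLIC KEY-----
              ...
              -----END PUBLIC KEY-----
```

With `validationFailureAction: Enforce`, unsigned images matching the reference pattern are rejected at admission rather than merely flagged.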
4. Run containers as non-root users
Define a non-root user in the Dockerfile and configure the runtime accordingly. This limits the blast radius in the event of a security breach.
Setting runAsNonRoot: true in the security context prevents containers from running as the root user and reduces the risk of privilege escalation. This is a straightforward configuration change that meaningfully constrains what a compromised container can do.
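A hardened security context along these lines might look like the following sketch (pod name, UID, and image are hypothetical):

```yaml
# Hypothetical pod spec: run as a fixed unprivileged user and strip
# the capabilities a compromised container could abuse.
apiVersion: v1
kind: Pod
metadata:
  name: agent-pod
spec:
  securityContext:
    runAsNonRoot: true            # reject images that run as UID 0
    runAsUser: 10001              # arbitrary unprivileged UID
  containers:
  - name: agent
    image: registry.example.com/agent:1.0   # placeholder image
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]             # add back only what is required
```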
5. Update or use distroless images
Continuously update base images to include the latest security patches. Use distroless images where possible to reduce the number of components that can carry vulnerabilities.
Distroless images are container images stripped of unnecessary components such as package managers, shells, or even the underlying operating system distribution, reducing vulnerabilities and enforcing secure software delivery. Standard base images ship with hundreds of utilities that the application does not need, but that an attacker who compromises the container can use to escalate access. Distroless images remove those tools entirely.
6. Monitor and log container activity
Monitor runtime behavior, resource usage, filesystem access, network activity, and system calls to detect anomalies or security incidents early.
Our research found that 43% of organizations report AI making infrastructure changes at least monthly without human review, with 7% reporting this frequency as unknown. A practical Kubernetes security model must include observability and audit logs across container environments (including those with agentic workloads) in order to create feedback loops that surface anomalous behavior before it escalates.
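On the API-server side, an audit policy is one building block for this kind of trail. A minimal sketch, supplied to the kube-apiserver via the `--audit-policy-file` flag, might look like:

```yaml
# Sketch of a Kubernetes audit policy: record full request/response
# bodies for sensitive resources an agent may touch, and metadata
# for everything else.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
- level: Metadata
```

Rules are evaluated in order, so the catch-all `Metadata` rule applies only to requests not matched above it.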
7. Consolidate observability with a standard MELT stack
Use a standard Metrics, Events, Logs, and Traces (MELT) stack to consolidate Kubernetes observability data, improving the system's explainability and debuggability. MELT brings these signals together in a coherent model, providing a complete picture of system behavior.
Without a shared model, teams fall into predictable patterns: metrics absorb high-cardinality labels to explain behavior they were not designed for, and logs become the default debugging tool for performance issues. A unified MELT framework, achievable through cluster activity logging and identity-traceable audit trails, prevents these anti-patterns and keeps observability data actionable.
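In practice, a common way to consolidate MELT signals is an OpenTelemetry Collector that receives all four signal types over OTLP and fans them out to a single backend. A minimal sketch (the backend endpoint is a placeholder):

```yaml
# Sketch of an OpenTelemetry Collector config routing traces,
# metrics, and logs through one pipeline model.
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch:
exporters:
  otlphttp:
    endpoint: https://telemetry.example.com   # placeholder backend
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```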
8. Incorporate network observability
Collect network flow logs (e.g., source/destination IPs, ports, protocols, and packet counts) for security, performance monitoring, and troubleshooting.
Traces capture how requests move across Kubernetes components and applications, linking latency, timing, and relationships between operations. Network flow logs serve a complementary function, capturing which services are communicating and how much data is moving between them.
9. Monitor resource usage across nodes and containers
Track resource-specific and network metrics for system robustness, including:
- Disk usage on nodes and persistent volumes to prevent outages caused by storage exhaustion.
- CPU and GPU usage at the node and container levels to detect bottlenecks and prepare for potential in-place pod-resizing overhead.
- Control plane and node health.
Looking ahead to 2026, the most important data management tasks include sampling, smart retention policies, adaptive data ingestion paths, and storage for long-term SLO analysis. Resource monitoring forms the foundation for all of these downstream decisions.
10. Instrument workloads with application-level metrics
Expose application-level and business-critical metrics in addition to system-level metrics. For agentic workloads, this means capturing metrics that go beyond standard resource utilization to reflect how well the workload performs its assigned tasks.
Unlike traditional applications, LLMs present unique observability challenges, including token-based pricing models, variable inference times, and the need to monitor both technical metrics and model quality metrics. Application-level instrumentation is what bridges the gap between infrastructure health and actual workload performance.
11. Configure alerting based on SLO and SLA thresholds
Set up alerting based on SLO and SLA thresholds to ensure service-level objectives are being met.
Observability provides technical telemetry, but it also guides how teams reason about user impact. SLO-based alerting ties observability data directly to the outcomes that matter.
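With the Prometheus Operator, for example, an SLO-derived alert can be expressed as a PrometheusRule. The sketch below assumes a 99.9% availability SLO; the metric and label names are placeholders for your own instrumentation:

```yaml
# Sketch: alert when the 30-minute error ratio exceeds the error
# budget implied by a 99.9% availability SLO.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: agent-slo-alerts        # hypothetical name
spec:
  groups:
  - name: availability
    rules:
    - alert: HighErrorRate
      expr: |
        sum(rate(http_requests_total{code=~"5.."}[30m]))
          / sum(rate(http_requests_total[30m])) > 0.001
      for: 5m
      labels:
        severity: page
```

The `for: 5m` clause requires the condition to hold before firing, which filters out momentary spikes that do not threaten the SLO.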
12. Implement cost observability
Implement cost observability to support GPU and LLM benchmarking. For teams running inference workloads at scale, cost tracking should be treated as a first-class metric alongside latency and error rates.
The economics of inference are a growing concern. Organizations deploying large language models are discovering that inference costs, not training costs, dominate their operational spend. But without cost observability baked into the monitoring stack, teams may only discover cost overruns after they have occurred.
13. Secure observability pipelines
Secure observability pipelines to prevent tampering with agent audit trails. If an agent's behavior is being evaluated for compliance, safety, or debugging, the integrity of the telemetry data matters.
14. Set data retention and aggregation policies
Set up data retention and aggregation policies. Agentic workloads can generate more telemetry than conventional services because of the volume of tool calls, model interactions, and multi-step reasoning traces. Retention policies should account for that.
The focus has shifted from "collect everything" to "collect what helps us deliver business outcomes." For agentic workloads, calibrating retention policies is essential to avoid both data loss and unsustainable storage costs.
15. Set pod resource limits and requests
Define CPU, GPU, and memory limits in pod specs to prevent noisy-neighbor issues and ensure container stability.
The LimitRanger and ResourceQuota admission controllers prevent resource overcommitment, and users can define custom security contexts and enforce them at the pod level. Without these explicit resource boundaries, a single workload can consume unbounded resources and impact other workloads on the same node.
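At the container level, the boundaries might look like this sketch (values are illustrative, and the GPU line assumes a cluster with the NVIDIA device plugin installed):

```yaml
# Sketch of container resource boundaries in a pod spec.
    resources:
      requests:
        cpu: "500m"             # guaranteed scheduling floor
        memory: "1Gi"
      limits:
        cpu: "2"                # hard ceiling enforced at runtime
        memory: "2Gi"
        nvidia.com/gpu: "1"     # extended resources are set via limits
```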
16. Use PodDisruptionBudgets
Enforce a minimum pod availability during voluntary disruptions, such as upgrades or node drains, using PodDisruptionBudgets, which limit the number of pods that can be concurrently down in a replicated application.
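A PodDisruptionBudget for such a workload might look like the following (name and label are hypothetical):

```yaml
# Sketch: keep at least two replicas running during voluntary
# disruptions such as node drains or upgrades.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: agent-pdb
spec:
  minAvailable: 2               # alternatively, use maxUnavailable
  selector:
    matchLabels:
      app: agent                # hypothetical app label
```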
17. Distribute replicas across failure domains
Use Pod Anti-Affinity or Topology Spread Constraints to distribute pod replicas across nodes or zones, minimizing the impact of failures at the node or zone level. The Kubernetes topology spread constraints documentation explains how these constraints govern the distribution of pods across failure domains, such as regions, zones, and nodes, to achieve high availability and efficient resource utilization.
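A topology spread constraint in a Deployment's pod template might be sketched as follows (names and image are placeholders):

```yaml
# Sketch: spread three replicas evenly across availability zones.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent                   # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      topologySpreadConstraints:
      - maxSkew: 1              # zones may differ by at most one pod
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: agent
      containers:
      - name: agent
        image: registry.example.com/agent:1.0   # placeholder image
```

`DoNotSchedule` makes the constraint hard; `ScheduleAnyway` would instead treat it as a scheduling preference.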
18. Scale workloads dynamically with Horizontal Pod Autoscaler
Use the Horizontal Pod Autoscaler (HPA) to scale workloads based on CPU, memory, or custom metrics such as request volume. The article notes, however, that these general availability practices cover smart load balancing for inference models but do not extend to more comprehensive MCP, agent-to-agent, or LLM tooling scenarios, which require their own resilience patterns.
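A CPU-based HPA for such a workload might be sketched like this (names are hypothetical):

```yaml
# Sketch: scale a Deployment between 2 and 10 replicas, targeting
# 70% average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: agent-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: agent               # hypothetical deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

For inference workloads, a custom metric such as queue depth or request volume often tracks real demand better than CPU alone.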
Securing containers is only the first layer
The practices above secure the Kubernetes containers in which agents run. Still, they do not address the identity, access, and governance challenges that arise when those agents authenticate to infrastructure, invoke tools, access sensitive data, and act on behalf of users.
These challenges are also what the Teleport Agentic Identity Framework is built to help solve. The Framework includes standards-driven primitives, SDKs, reference architectures, and integration patterns to define agent identity, policy-governed access to tools and data, controls over LLM usage, and end-to-end auditability for production agentic systems.
Read the full article, Cloud-native agentic standards, on the CNCF website.
About the authors
Boris Kurktchiev is a Field CTO at Teleport known for his expertise in Zero Trust identity solutions for cloud and AI, as well as his contributions to the CNCF's Cloud Native AI working group.
Jack Pitts is a writer at Teleport with many years of experience covering emerging cybersecurity, DevOps, and AI/ML topics.
Related articles:
→ 2026: The Top AI Infrastructure Risks and Identity Gaps
→ How to Prevent Prompt Injection in AI Agents
→ How to Deploy Teleport on Kubernetes
→ From Zero Trust to SPIFFE: How to Secure Microservices with Istio and Teleport

