From Zero Trust to Agent Trust


Alexander Klizhentas, CTO @ Teleport

Diana Jovin, CMO @ Teleport

Published July 2, 2026

EXECUTIVE OVERVIEW

AI agents operating at scale have characteristics with no equivalent in purely human or software environments, and zero trust, as currently conceived, is necessary but insufficient to control or contain them. This paper explores what needs to change in order to preserve trust as agents are deployed in enterprise infrastructure.


Introduction

The journey of zero trust into the field of cybersecurity has a few commonly cited milestones: Kindervag’s “No More Chewy Centers…” paper in 2010, which observed that it is impossible to reliably differentiate between trusted and untrusted interfaces at the network perimeter; Google’s BeyondCorp in 2014, built in response to the “Operation Aurora” cyberattacks on Google and other large technology companies, and one of the first practical implementations of zero trust; and NIST’s 2020 zero trust architecture publication (SP 800-207), which arrived as COVID made implementing zero trust at scale an urgent priority.

Across these two decades, zero trust has proven durable not only as a defensive posture but as a business model enabler: the architecture that makes location-independent, scalable digital products and services possible. The companies that have made this investment in depth, whether in response to breach or in pursuit of opportunity, are among the most resilient today.

However, agentic workloads today require us to redefine the core underlying principles. AI agents operating at scale have characteristics with no equivalent in purely human or software environments, and zero trust, as currently conceived, is necessary but insufficient to control or contain them. This paper explores what needs to change in order to preserve trust as agents are deployed in enterprise infrastructure.

The Agentic Future

Consider a future where swarms of agents, numbering in the thousands or tens of thousands, are doing work autonomously. How do we know that they are converging on an optimal solution or outcome without breaking the guidelines of desired or necessary operation? For example, an agent may make a locally rational decision within its permission scope, but multiplied across ten thousand agents working in parallel, a production system's security posture may end up being degraded through a cascade of individually authorized actions that no single audit log would flag as suspicious. In this way, a swarm optimizing locally rational objectives can produce outcomes that are globally destructive. This is not because any single agent exceeded its permissions, but rather because there is no framework that governs the emergent behavior of thousands of authorized actions converging at once.

Further, with the introduction of sophisticated frontier models, companies now need to consider how to defend against AI-accelerated attacks, in which autonomous agents, operating continuously and at speed, eliminate the friction that once constrained how fast vulnerabilities could be found and exploited. In addition, companies need to guard against the “lethal trifecta,” as outlined by Simon Willison: an agent whose tools together give it access to private data, exposure to untrusted content that an attacker may control, and the ability to communicate externally. Combined, these three capabilities set up the conditions for data exfiltration by an adversary.
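As a rough illustration of the lethal trifecta, the sketch below checks whether the tools granted to a single agent jointly cover all three capabilities. The `Tool` class, its flags, and the tool names are hypothetical and exist only for this example; how a real platform labels tool capabilities is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical description of one tool granted to an agent."""
    name: str
    reads_private_data: bool = False
    ingests_untrusted_content: bool = False
    communicates_externally: bool = False


def has_lethal_trifecta(tools: list[Tool]) -> bool:
    """Return True when the combined tool set covers all three capabilities."""
    return (
        any(t.reads_private_data for t in tools)
        and any(t.ingests_untrusted_content for t in tools)
        and any(t.communicates_externally for t in tools)
    )


# Illustrative tool grants (names are assumptions, not real products).
crm = Tool("crm_reader", reads_private_data=True)
inbox = Tool("email_inbox", ingests_untrusted_content=True)
webhook = Tool("http_post", communicates_externally=True)

print(has_lethal_trifecta([crm, inbox]))           # False: only two of three
print(has_lethal_trifecta([crm, inbox, webhook]))  # True: all three present
```

The check is deliberately made on the combined grant rather than per tool, because no single tool in the example is dangerous on its own.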

The Evolution of Core Concepts

Zero trust, as a security philosophy, is built on three principles widely adopted across the practitioner community, as articulated in Microsoft's Zero Trust framework. Together, these three principles replace the implicit trust of a perimeter security model with continuous, explicit verification at every access point across the enterprise:

Zero Trust Principles

Verify explicitly

Every access request must be authenticated and authorized using all available data points — identity, location, device health, service or workload, data classification, and anomalies — rather than assuming trust based on network location.

Use least privilege access

User permissions are limited to the minimum necessary to complete a given task, reducing the blast radius of any single compromised identity or credential.

Assume breach

No environment is assumed to be fully secure; systems must be designed to contain damage, minimize blast radius, and detect anomalies as if compromise were inevitable.

These three principles have proven remarkably effective for human and machine interactions, whether on premises, in the cloud, or across microservices. However, they were shaped by foundational assumptions about the speed, scale, and determinism of the actors they govern.

In contrast, the combination of speed, scale, and behavioral unpredictability that agents bring, individually and collectively, allows dangerous failure modes to occur even when every agent acts within a valid permission set. To accommodate the unique properties of agentic behavior, we need to extend these principles for the agentic world:

Agentic Trust Principles

Enforce continuously (extends: Verify explicitly)

Every agent must have a unique identity that is explicitly attestable to a human or platform, and also operate within a trusted runtime that architecturally enforces its operational access, execution and external communication boundaries. This ensures that every action is constrained by the agent’s environment, not only by the permissions it is granted. These two together prevent impersonation, anonymity, and data exfiltration. Agents should instantiate within the trusted runtime with zero initial privileges.

Bound collective autonomy (extends: Use least privilege access)

This principle extends least privilege to govern collective behavior, meaning that a swarm cannot act autonomously to escape the intent of individually granted privilege. Actions that are individually authorized but can be collectively destructive, such as modifying security configurations, initiating external API calls at scale, or altering data pipelines, require escalation to governed review before execution. Without it, collective behavior becomes vulnerable to attacks mounted across agentic swarms in order to create escape chains, or to generate drift from the original objective.

Assume misalignment (extends: Assume breach)

Agentic environments require continuous monitoring in order to flag behavioral misalignment from baseline objectives and, when appropriate, intervene in real time. Enterprises must design for the inevitability that agents will become misaligned from their intended behavior. Misalignment may result from adversarial methods (e.g., prompt injection, memory poisoning, goal hijacking, reward hacking, supply chain poisoning) or from exploitation of vulnerabilities that emerge when the lethal trifecta is activated. It may also be unintentional, resulting from misgeneralization or context shift over time.

[Figure: Extending the Zero Trust Model to Agent Trust]

The following section examines the architectural requirements introduced by each of these extensions.

VERIFY EXPLICITLY → ENFORCE CONTINUOUSLY

Agentic Trust Principle #1

Principle #1 addresses the question of how to ensure that agents can act only within their intended boundaries.

One of the most consequential changes for agentic trust is the need for trusted, ephemeral runtimes. The human and machine analogue is device trust, which enforces that requests originate only from designated and verified hardware devices, such as a registered company laptop or a field device. The challenge with non-deterministic actors is that they are not readily constrained; if they can reach a resource that helps them achieve their task, they will. As a result, even experimenting with agents in sensitive environments becomes fraught. This is where isolated runtimes are essential: an agent contained in an isolated virtual machine (VM) is bounded by what is accessible within its runtime environment. Further, making these trusted runtimes ephemeral ensures that state or memory corruption can’t persist after a task completes, narrowing the blast radius in both scope and time.

Consider an agent authorized to query a customer database to generate a churn risk report. It is granted read access to the relevant tables and completes the task correctly. But because it operates in a persistent runtime with broad network visibility, it retains its session context after the task completes. A subsequent prompt injection embedded in a customer record causes it to exfiltrate a subset of records to an external endpoint — not during the authorized task, but after it, using credentials and network access that were never revoked. The agent was verified at instantiation. It was never verified again. What was missing was a runtime that bounded what the agent could reach, and an ephemeral execution context that ceased to exist the moment the task was done. No persistent runtime means no persistent blast radius.
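The scenario above can be sketched in a few lines. This is a minimal illustration, not a real runtime: `TaskSession`, `ephemeral_session`, and the host names are hypothetical, and a production runtime would enforce these boundaries at the VM and network layer rather than in application code. The point it shows is that access is checked on every call and that the session stops working the moment the task ends.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class TaskSession:
    """Hypothetical execution context that exists only for one task."""
    task: str
    allowed_hosts: frozenset
    active: bool = True

    def connect(self, host: str) -> str:
        # Checked on every call, not only when the agent is instantiated.
        if not self.active:
            raise PermissionError("session ended: access revoked")
        if host not in self.allowed_hosts:
            raise PermissionError(f"{host} is outside the runtime boundary")
        return f"connected to {host}"


@contextmanager
def ephemeral_session(task: str, allowed_hosts: set):
    session = TaskSession(task, frozenset(allowed_hosts))
    try:
        yield session
    finally:
        session.active = False  # nothing remains usable after the task


with ephemeral_session("churn-report", {"customers-db.internal"}) as s:
    print(s.connect("customers-db.internal"))   # inside task and boundary
    try:
        s.connect("attacker.example")           # outside the boundary
    except PermissionError as err:
        print(err)

try:
    s.connect("customers-db.internal")          # after the task completed
except PermissionError as err:
    print(err)
```

In this sketch, the late exfiltration attempt from the scenario fails twice over: the external endpoint was never reachable, and the session no longer exists once the report is done.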

The following five requirements define the operational harness that satisfies this principle:

REQUIREMENT: Cryptographic identity attestable to human or platform grantor
WHAT IT IS: Every agent must carry a verifiable identity that proves what it is and who or what authorized it.
HOW IT'S ACHIEVED: Implementable with open workload identity standards, such as SPIFFE, that provide cryptographic proof of agent provenance.
WHY IT'S NEEDED: Eliminates anonymous actors, untraceable permissions, and static credentials that can be shared, stolen, or lost.

REQUIREMENT: Delegated identity
WHAT IT IS: Every agent's permission scope must be traceable to a human or system grantor, and those permissions must be strictly bounded by what the grantor is authorized to delegate.
HOW IT'S ACHIEVED: SPIFFE-based trust hierarchies with explicit delegation chains, and just-in-time protocols that scope agent permissions to the task with the right boundaries.
WHY IT'S NEEDED: Prevents privilege escalation through agentic intermediaries.

REQUIREMENT: Zero standing privileges for infrastructure with just-in-time authorization
WHAT IT IS: No agent should have any standing access to infrastructure.
HOW IT'S ACHIEVED: Just-in-time access requests, with short-lived, least privilege permissions that expire automatically when the task completes.
WHY IT'S NEEDED: Reduces blast radius and lateral movement.

REQUIREMENT: Trusted ephemeral runtime
WHAT IT IS: A bounded execution environment that constrains what an agent can access and decide during operation (a more complete alternative to sandbox environments, which focus primarily on isolation).
HOW IT'S ACHIEVED: Purpose-built, short-lived agentic runtimes that have full control over file system, device, and network boundaries.
WHY IT'S NEEDED: Enforces autonomy boundaries through architecture, with an execution environment where security is built in, not added through operational levers.

REQUIREMENT: Runtime audit
WHAT IT IS: Structured, tamper-evident data about agents' actions, decisions, and resource interactions during execution.
HOW IT'S ACHIEVED: Capture and recording of identity, privileges, API and tool calls, LLM prompts and responses, and reasoning as a structured, tamper-evident session log.
WHY IT'S NEEDED: Makes behavioral monitoring, audit, and forensic investigation possible.
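One common way to make a session log tamper-evident is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below is an assumption about how the runtime audit requirement could be met, not a description of any particular product; `AuditLog` and the event fields are hypothetical.

```python
import hashlib
import json


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with the canonicalized event."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


class AuditLog:
    """Hypothetical append-only session log with a hash chain."""
    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list = []

    def append(self, event: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append({"event": event, "hash": _digest(prev, event)})

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed entry breaks it."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["hash"] != _digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append({"agent": "report-agent", "tool": "sql.query", "table": "customers"})
log.append({"agent": "report-agent", "tool": "report.write", "rows": 120})
print(log.verify())   # True

log.entries[0]["event"]["table"] = "payments"   # simulate tampering
print(log.verify())   # False
```

A chain like this only makes tampering detectable. A real deployment would also need to store the latest hash somewhere the agent cannot write, which is outside the scope of this sketch.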

Continuous enforcement, however, only addresses the question of what an agent is and whether it is contained through operational controls. It does not address what an agent is permitted to decide on its own. An agent with perfect identity attestation and continuous behavioral monitoring can still take thousands of individually authorized actions that, in aggregate, produce outcomes that no one sanctioned.

That is the problem Principle #2 addresses: bounding what agents or a collective set of agents can decide autonomously, and when an action requires escalation to explicit authorization before it can proceed.

USE LEAST PRIVILEGE → BOUND COLLECTIVE AUTONOMY

Agentic Trust Principle #2

Consider a fleet of agents, each independently authorized to optimize query performance across a distributed database cluster. Each agent monitors its local shard, identifies slow queries, and adds indexes to improve response time, which is a routine, individually authorized action. No agent communicates with another. No agent exceeds its permission scope. But because each agent is responding to the same global load pattern, they converge on the same solution simultaneously: each adds indexes to the same underlying tables across every shard. The write amplification from the coordinated index creation saturates I/O across the cluster. By the time a human reviews the performance logs, the database is in a degraded state and the optimization that caused it is already replicated everywhere. No single agent made a consequential decision. The consequence emerged from what they decided in aggregate, without coordination, without awareness of each other, and without any mechanism to ask whether the collective action was safe to proceed.

To address this, Principle #2 introduces the idea of bounded collective autonomy: the architectural enforcement of what an agent or collection of agents can decide unilaterally, and when a decision requires authorization before it can proceed.
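To make the index-creation scenario concrete, here is a minimal sketch of a budget shared by a group of agents. The `GroupBudget` class, the action name, and the limit of three are all hypothetical values chosen for illustration. Each request is individually authorized, but once the group has used its collective allowance, further requests are escalated rather than executed.

```python
from dataclasses import dataclass, field


@dataclass
class GroupBudget:
    """Hypothetical shared budget for one action type across an agent group."""
    action: str
    limit: int
    granted_to: list = field(default_factory=list)

    def request(self, agent_id: str) -> str:
        if len(self.granted_to) >= self.limit:
            # Individually authorized, but the group is over its budget.
            return "escalate"
        self.granted_to.append(agent_id)
        return "allow"


budget = GroupBudget(action="create_index", limit=3)
decisions = [budget.request(f"agent-{i}") for i in range(5)]
print(decisions)  # ['allow', 'allow', 'allow', 'escalate', 'escalate']
```

The agents in this sketch still do not coordinate with one another. The coordination lives in the shared budget, which is the only component that sees the aggregate.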

The following three requirements define the operational harness that satisfies this principle:

REQUIREMENT: Decision boundaries
WHAT IT IS: Explicit rules that define which categories of action an agent may take unilaterally and which require escalation, human review, or multi-agent consensus before execution.
HOW IT'S ACHIEVED: Runtime policy engine that evaluates action type, scope, and potential impact before permitting execution.
WHY IT'S NEEDED: Prevents an individual agent from taking a unilateral, consequential action.

REQUIREMENT: Multi-agent consensus
WHAT IT IS: A governance mechanism to require agreement across multiple agents or human review before execution proceeds, for decisions that exceed the autonomy boundary of a single agent.
HOW IT'S ACHIEVED: Meta agents that evaluate the actions of multi-agent workloads against the stated goal to detect adversarial, misaligned, or potentially destructive behavior.
WHY IT'S NEEDED: Prevents any group of agents from taking unilateral, consequential actions individually or in aggregate.

REQUIREMENT: Multi-agent constraints
WHAT IT IS: A governance mechanism for the global scope of the permissions and actions the group of agents is allowed, for collective actions that exceed the autonomy boundary of a single agent.
HOW IT'S ACHIEVED: Agent group permissions, rate limits and budgets governing the total side effect a defined group of agents is allowed to cause.
WHY IT'S NEEDED: Prevents any group of agents from taking unilateral, consequential actions in aggregate.
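The decision boundaries requirement describes a policy engine that looks at action type, scope, and potential impact before execution. The sketch below shows one possible shape for such a check. The action kinds, the resource threshold, and the use of reversibility as a proxy for impact are assumptions made for this example, not rules taken from any real policy engine.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    kind: str          # action type, e.g. "read" or "modify_security_config"
    resources: int     # scope: how many resources the action touches
    reversible: bool   # impact proxy: can the action be undone?


# Hypothetical policy values chosen only for illustration.
DENIED_KINDS = {"disable_audit_log"}
ESCALATE_KINDS = {"modify_security_config", "alter_data_pipeline"}
MAX_UNILATERAL_RESOURCES = 10


def evaluate(action: Action) -> Decision:
    """Decide whether an agent may take this action unilaterally."""
    if action.kind in DENIED_KINDS:
        return Decision.DENY
    if action.kind in ESCALATE_KINDS:
        return Decision.ESCALATE
    if action.resources > MAX_UNILATERAL_RESOURCES or not action.reversible:
        return Decision.ESCALATE
    return Decision.ALLOW


print(evaluate(Action("read", resources=1, reversible=True)))
print(evaluate(Action("modify_security_config", resources=1, reversible=True)))
print(evaluate(Action("read", resources=500, reversible=True)))
```

Escalation here returns a decision rather than raising an error, on the assumption that the caller routes escalated actions to human review or a consensus step.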

Bounded collective autonomy, however, addresses the question of how to ensure that agents are operating as designed, within intended parameters and objectives. It does not address the scenario where an agent's behavior diverges from the original objective, either due to adversarial interference or due to unintended consequences. That is the problem Principle #3 addresses: not what agents are allowed to do, but whether what they are doing is consistent with what was intended.

ASSUME BREACH → ASSUME MISALIGNMENT

Agentic Trust Principle #3

Consider an agent deployed to assist with customer data analysis. Over the course of its session, it processes a document containing carefully embedded instructions, indistinguishable from legitimate content, that tell it that a senior administrator has pre-authorized the export of the full customer dataset to an external endpoint for compliance review. This request is in the scope of permissions it has been granted, and the agent has no way to distinguish this instruction from a legitimate one. It proceeds. In this instance, the agent has not exceeded its permissions or violated a decision boundary. The agent drifted, through manipulation.

Misalignment from objective is the concern Principle #3 addresses: whether an agent is still operating toward its original objective. As stated earlier, not all drift is the consequence of adversarial manipulation; it may also result from accumulated knowledge, retained memory, incomplete data, or misinterpretation of the goal. Drift may also result from distributed adversarial attacks that attempt to manipulate an agent fleet.
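The export scenario above passes every permission check, so a permission check alone cannot catch it. As a simplified sketch, the code below compares each tool call against a declared baseline as well as against the permission set. `Baseline`, `ToolCall`, and the host names are hypothetical, and matching on tool and destination is a stand-in for whatever behavioral signals a real monitor would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Hypothetical declared baseline recorded when the agent is deployed."""
    objective: str
    expected_tools: frozenset
    expected_destinations: frozenset


@dataclass(frozen=True)
class ToolCall:
    tool: str
    destination: str


def check(call: ToolCall, permitted_tools: set, baseline: Baseline) -> str:
    if call.tool not in permitted_tools:
        return "denied"        # ordinary permission failure
    if call.tool not in baseline.expected_tools:
        return "misaligned"    # permitted, but not part of the objective
    if call.destination not in baseline.expected_destinations:
        return "misaligned"    # permitted tool, unexpected destination
    return "consistent"


permitted = {"query", "export"}
baseline = Baseline(
    objective="customer data analysis",
    expected_tools=frozenset({"query", "export"}),
    expected_destinations=frozenset({"reports.internal"}),
)

print(check(ToolCall("export", "reports.internal"), permitted, baseline))
# consistent
print(check(ToolCall("export", "compliance-review.example"), permitted, baseline))
# misaligned
```

The second call is the one from the scenario: the export tool is permitted, yet the destination is inconsistent with the declared objective, so it is flagged without any permission boundary being crossed.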

The following four requirements define the operational harness that satisfies this principle:

REQUIREMENT: Attestation of agent objective with behavioral baseline
WHAT IT IS: A tamper-evident record of what an agent was originally instructed to do must be preserved and remain available for forensic investigation.
HOW IT'S ACHIEVED: Runtime session initialization that hashes and immutably records the agent's objective, permission scope, and tool authorization at the moment of deployment.
WHY IT'S NEEDED: Provides a forensically sound baseline for post-incident investigation and audit.

REQUIREMENT: Runtime behavioral monitoring
WHAT IT IS: Evaluation of agent behavior against its declared baseline.
HOW IT'S ACHIEVED: Continuous recording of the baseline alongside the agent's identity and permission scope, during execution.
WHY IT'S NEEDED: Provides telemetry to detect misalignment.

REQUIREMENT: Adversarial drift detection
WHAT IT IS: Identification of manipulation of the agent's context or objective by threat actors.
HOW IT'S ACHIEVED: Prompt sanitization at the input layer, tamper-evident memory stores that detect unauthorized modification, cryptographic attestation of tool and API provenance to detect supply chain compromise, and behavioral monitoring that flags intent sequences inconsistent with the declared objective regardless of whether any permission boundary was crossed.
WHY IT'S NEEDED: Detects misalignment induced by adversarial causes.

REQUIREMENT: Incident response when behavioral misalignment is detected
WHAT IT IS: Real-time intervention that isolates or terminates the agent, preserves the execution context and audit trail, and initiates forensic investigation.
HOW IT'S ACHIEVED: Automated response playbooks triggered by behavioral misalignment signals.
WHY IT'S NEEDED: Triggers real-time intervention when misalignment occurs.
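The first requirement above, hashing the agent's objective, permission scope, and tool authorization at deployment, can be sketched as follows. The function name, field names, and sample values are hypothetical. The sketch covers only the hashing step and assumes the resulting digest is then written to storage the agent cannot modify.

```python
import hashlib
import json


def attest(objective: str, permission_scope: list, tools: list) -> str:
    """Return a SHA-256 digest over a canonical form of the deployment record."""
    record = {
        "objective": objective,
        # Sorted so that ordering differences do not change the digest.
        "permission_scope": sorted(permission_scope),
        "tools": sorted(tools),
    }
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


at_deploy = attest(
    "Generate churn risk report",
    ["customers:read", "reports:write"],
    ["sql.query", "report.write"],
)
later = attest(
    "Generate churn risk report",
    ["reports:write", "customers:read"],   # same scope, different order
    ["report.write", "sql.query"],
)
altered = attest(
    "Export full customer dataset",        # objective changed after deployment
    ["customers:read", "reports:write"],
    ["sql.query", "report.write"],
)

print(at_deploy == later)     # True
print(at_deploy == altered)   # False
```

Canonicalizing before hashing is a design choice in this sketch: it keeps harmless differences such as list order from looking like tampering, while any change to the objective, scope, or tool list yields a different digest.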

Taken together, these requirements define what governance of misalignment means in practice for agentic environments. Tooling in this area is still largely emerging but is developing rapidly. As attestation of agent objectives and drift detection evolve, enterprises will develop a more mature playbook for defending against the most sophisticated class of agentic attacks, where the threat actor never touches a permission boundary.

“Assume misalignment” reframes the security posture for agentic environments in a fundamental way. Zero trust’s “Assume breach” tells us to design for the inevitability of intrusion. “Assume misalignment” tells us to treat behavioral consistency with declared objectives as a security primitive.

GETTING STARTED

The Importance of a Strong Zero Trust Foundation

Agent trust cannot be implemented on top of an immature zero trust foundation. Before investing in agentic workloads and their governing principles, enterprises must ensure that they have a resilient zero trust foundation governing identity, access control, policy enforcement, and session governance for humans and machines.

To prepare for an agentic world, companies should establish a unified identity layer within a zero trust architecture that treats all identities as first-class actors. This foundation is necessary for two critical reasons:

First, companies cannot properly write and implement policy that includes interactions between humans, machines, their laptops, data sources, and agents, if these identities are managed in siloed systems or if some are anonymous or shared. Agents act on behalf of humans, inherit permissions from both human and machine grantors, and interface with sensitive corporate data repositories, and these all must be first-class identities that are unique and have context.

Second, without a unified identity layer, it is impossible to trace back the activity of the agents to the original goal. Agentic activity crosses identity and resource boundaries, and these must all be understood together in order to implement the monitoring and governance methods that are required.
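The tracing argument can be illustrated with a small sketch in which humans, machines, and agents share one directory. The `Identity` class, the `delegated_by` field, and the sample identities are hypothetical. The lookup fails as soon as any link in the chain is missing from the directory, which is what happens in practice when that identity is managed in a separate silo.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    """Hypothetical record in a unified identity directory."""
    id: str
    kind: str                          # "human", "machine", or "agent"
    delegated_by: Optional[str] = None


def trace_to_root(identity_id: str, directory: dict) -> list:
    """Walk the delegation chain from an actor back to its original grantor."""
    chain = []
    seen = set()
    current = identity_id
    while current is not None:
        if current in seen:
            raise ValueError("delegation cycle detected")
        if current not in directory:
            raise LookupError(f"unknown identity: {current}")
        seen.add(current)
        chain.append(current)
        current = directory[current].delegated_by
    return chain


directory = {
    "alice": Identity("alice", "human"),
    "ci-runner": Identity("ci-runner", "machine", delegated_by="alice"),
    "report-agent": Identity("report-agent", "agent", delegated_by="ci-runner"),
}

print(trace_to_root("report-agent", directory))
# ['report-agent', 'ci-runner', 'alice']
```

If `ci-runner` lived in a different system and were absent from `directory`, the same call would raise `LookupError`, and the agent's activity could not be tied back to a human grantor.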

CONCLUSION

From Zero Trust to Agent Trust

Zero trust emerged from the recognition that the perimeter as a security architecture was failing, that the assumption of trust inside the network was a vulnerability more than a convenience. The companies that have made this investment, whether motivated by security breach or by market opportunity, are among the most resilient today.

The transition to agent trust follows the same logic. The assumption that the actors operating within a zero trust architecture are human, bounded, and verifiable at the point of access falls short for agents. Agents are not humans. They are not predictable. And failure modes such as behavioral drift within authorized scope, or swarms of locally rational decisions producing globally destructive outcomes, have no harness in the security architectures we built to govern human and machine actors.

In order to make this transition, companies should consider the following sequence of actions:

STEP 1:

Evaluate zero trust maturity across the full infrastructure stack.

Agent trust is not a layer that can be added to an immature zero trust foundation. Defense in depth requires that each layer be sound before the next is built. Before extending zero trust principles to agents, enterprises must assess their zero trust investment for identity, access control, policy enforcement, and session governance for humans and machines.

STEP 2:

Build out foundational capabilities that are available today.

Across the three principles, several requirements are achievable now and should be baseline priorities for preparing for any production agentic deployment. These include: cryptographic agent identity via SPIFFE; zero standing privileges by default for infrastructure access with just-in-time credential issuance; unified identity governance that covers agents alongside humans and machines; delegated identity with permissions that cannot exceed those of the human or agent issuing them; trusted runtime environments that enforce the boundaries of autonomy through architecture; and behavioral baseline establishment at agent deployment.
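The rule that delegated permissions cannot exceed those of the issuer reduces to a subset check. The sketch below is a minimal illustration under that reading; the `delegate` function and the permission strings are hypothetical and do not represent SPIFFE or any specific product API.

```python
def delegate(grantor_scope: frozenset, requested: set) -> frozenset:
    """Grant the requested permissions only if the grantor holds all of them."""
    excess = requested - grantor_scope
    if excess:
        raise PermissionError(
            f"cannot delegate beyond grantor scope: {sorted(excess)}"
        )
    return frozenset(requested)


human_scope = frozenset({"db:read", "db:write"})

# The human delegates a narrower scope to an agent.
agent_scope = delegate(human_scope, {"db:read"})
print(sorted(agent_scope))   # ['db:read']

# The agent cannot hand a sub-agent more than it holds itself.
try:
    delegate(agent_scope, {"db:read", "db:write"})
except PermissionError as err:
    print(err)
```

Because each hop is checked against the immediate grantor, scope can only stay the same or shrink along a delegation chain in this sketch.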

STEP 3:

Build the model and roadmap for capabilities as they emerge.

Advanced capabilities, such as misalignment detection, decision boundaries enforced at the execution layer, multi-agent consensus governance, and attestation of agent objective, are emerging. They are not available today at production scale. But the enterprises that will deploy them effectively when they arrive are the ones that have laid the foundation in steps 1 and 2. As these capabilities and purpose-built platforms mature, they can be layered on top of a robust foundation.

These steps exist for one reason: to make agentic AI trustworthy at scale.
