GPU nodes join and leave clusters in minutes. Training jobs run for hours or weeks and terminate. Your identity and access controls weren't built for infrastructure this dynamic. Teleport's unified identity layer secures every engineer, GPU node, training workload, and AI agent across your AI infrastructure.

THE PROBLEM
Training pipelines run on hardcoded tokens embedded in config files. Service accounts carry access to model registries across every cluster — and when engineers leave or jobs terminate, the credentials outlive them both.
An engineer at a major AI lab is reported to have spent nearly a year exfiltrating hundreds of millions in AI trade secrets using their standing access.
Modern AI infrastructure has outgrown the identity and access models built to govern it. The result is fragmented access, siloed logs, and no unified way to see who or what is touching your crown jewels.

When you're operating GPU clusters across bare metal, multiple clouds, and co-location facilities, the infrastructure scales faster than the credentials securing it can be governed. Teleport eliminates standing privileges and the credentials that can be shared, lost, hardcoded, or stolen.
Every engineer, service, workload, and AI agent authenticates without passwords, SSH keys, or API tokens — so there are no static credentials across your GPU infrastructure to steal, share, or rotate.
Engineers request elevated access to production GPU nodes and model registries — including break-glass access during incidents — and it expires automatically when the task closes.
Every session is attributed to a cryptographic identity, giving you one complete record across every GPU node, training pipeline, and model registry your team touches.
AI-NATIVE
Teams spinning up inference services, agentic workflows, and data pipelines as fast as the product demands — where credential sprawl and standing privileges accumulate across every service faster than any small team can track.
GPU INFRASTRUCTURE MANAGEMENT
AI workloads running across bare metal, multiple clouds, and co-location facilities simultaneously — each environment with its own siloed identity and access tools.
LARGE-SCALE MODEL TRAINING
Teams running foundation model training across thousands of GPU nodes — where hundreds of jobs run simultaneously, each spinning up and terminating on its own schedule.
Static credentials protecting your model weights
GPU nodes, model training pipelines, and inference services authenticate with shared service accounts and hardcoded tokens distributed across config files, container images, and CI/CD pipelines. These credentials never expire and carry more access than any single task requires. A compromised token provides standing access to your model weights — with no record of which engineer or service used it.
Engineers and automated pipelines get exactly the access the task requires, for exactly as long as it takes. When someone leaves or a job terminates, there is nothing to revoke because nothing persists. Teleport eliminates the static credentials and standing privileges putting your critical infrastructure and model IP at risk.
Disconnected clusters are hard to reach and harder to govern
GPU clusters span bare metal, multiple clouds, and co-location facilities — and they don't all look the same from an access perspective. Many operate on private networks that can only egress, making standard VPNs and bastions architecturally incompatible. Every environment has its own credentials and access model, with no unified way to reach, govern, or audit access across all of them.
Teleport's reverse tunnel proxy bridges disconnected and airgapped clusters without inbound firewall rules: the agent dials out, and engineers connect through a single proxy regardless of cloud, bare metal, or co-location. One unified identity layer spans every cluster, node, and service, however many you run.
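To make the outbound-only pattern concrete, here is a minimal sketch using plain TCP sockets on localhost. It is not Teleport's implementation: the function names, the `nvidia-smi` command string, and the `handled:` reply format are all hypothetical. It only illustrates that the cluster-side agent opens the connection and the proxy then reuses it to send work inward.

```python
import socket
import threading
from typing import List, Tuple


def run_proxy(server: socket.socket, command: bytes, results: List[bytes]) -> None:
    """Proxy side: accept the agent's outbound connection, then reuse it to send a command."""
    conn, _ = server.accept()
    with conn:
        conn.sendall(command + b"\n")
        results.append(conn.makefile("rb").readline().strip())


def run_agent(proxy_addr: Tuple[str, int]) -> None:
    """Agent side: dial OUT to the proxy. The agent never listens for inbound connections."""
    with socket.create_connection(proxy_addr) as conn:
        request = conn.makefile("rb").readline().strip()
        # A real agent would act on the request; this toy one only acknowledges it.
        conn.sendall(b"handled:" + request + b"\n")


def demo() -> bytes:
    # The proxy is the only party with a listening port (an ephemeral one here).
    server = socket.create_server(("127.0.0.1", 0))
    results: List[bytes] = []
    proxy = threading.Thread(target=run_proxy, args=(server, b"nvidia-smi", results))
    proxy.start()
    run_agent(server.getsockname())
    proxy.join()
    server.close()
    return results[0]


if __name__ == "__main__":
    print(demo())
```

The only listening socket belongs to the proxy, which is why the cluster needs no inbound firewall rule in this arrangement.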
GPU nodes cycle faster than credentials can keep up
GPU nodes are pulled from clusters, reimaged, and returned constantly — for maintenance, hardware fixes, and capacity rebalancing. Static credentials were never designed for infrastructure that changes this fast. Managing them manually at this cadence means teams are always either chasing stale access or scrambling to reissue credentials before the node comes back online.
Teleport issues short-lived certificates at runtime, so the node is immediately reachable with the right identity and access. When it's reimaged, the certificate expires, and a fresh one is issued on rejoin. Machine identity is established, maintained, and expired automatically: no SSH keys to distribute, no credentials to rotate, no manual intervention required.
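The lifecycle described above can be modeled in a few lines. This is a toy sketch, not Teleport's certificate format or API: `NodeCert`, `issue_cert`, the node name `gpu-node-17`, and the one-hour TTL are all assumptions chosen for illustration.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class NodeCert:
    node_name: str
    serial: str          # random per issuance, so a rejoined node never reuses an identity
    not_after: datetime  # hard expiry; nothing has to be revoked or rotated by hand

    def is_valid(self, now: datetime) -> bool:
        return now < self.not_after


def issue_cert(node_name: str, now: datetime, ttl: timedelta = timedelta(hours=1)) -> NodeCert:
    """Issue a fresh short-lived credential (hypothetical one-hour default TTL)."""
    return NodeCert(node_name, secrets.token_hex(8), now + ttl)


boot = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
first = issue_cert("gpu-node-17", boot)

# The node is pulled, reimaged, and rejoins two hours later.
rejoin = boot + timedelta(hours=2)
second = issue_cert("gpu-node-17", rejoin)
```

By the time the node rejoins, the first certificate has already lapsed on its own, and the second carries a new serial.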
The job ends. The access doesn't.
Every training job and CI/CD pipeline authenticates with service accounts and hardcoded tokens that carry access long after the workload terminates. A job that ran last Tuesday still has a service account with standing access to your model registry. The credentials accumulate — and nobody knows what still has access to what.
Teleport issues cryptographic identity to every training job and CI/CD pipeline at runtime — scoped to exactly what the task requires and expired when it terminates. Service accounts don't accumulate. Tokens don't persist between runs. Every workload gets its own identity, issued when it acts and gone when it's done.
Third-party access means shared credentials and no audit trail
Researchers, contractors, and hardware vendors need access to GPU infrastructure, but onboarding them to your corporate identity provider is slow, and the alternative is shared SSH keys or static VPN credentials. There is no record of what they accessed, what commands they ran, or whether their credentials have been shared further down the chain.
Researchers, contractors, and vendors are onboarded in minutes without touching your corporate identity provider. Each gets a cryptographic identity scoped to what the engagement requires — and nothing more. Every session is recorded and every command attributed to a verified identity. When the engagement ends, so do the privileges.
No unified audit trail across your AI infrastructure
When an auditor asks who accessed your training infrastructure and what they did, the answer requires stitching together logs from GPU clusters, cloud providers, bastion hosts, and identity tools — a process that takes weeks and still produces incomplete evidence. SOC 2, FedRAMP, and ISO 27001 auditors are beginning to ask the same questions about model access that they've been asking about database access for years.
Teleport records every session with command-level logging tied to a cryptographically verifiable identity — across every GPU cluster, Kubernetes service, database, and model registry. AI-generated timelines reconstruct incidents in minutes, tracing the full identity chain from login to resource access across systems.
Model weights move through your infrastructure from training to deployment. Teleport's unified identity layer follows them — securing every engineer, node, pipeline, and service that touches them along the way. When an engineer needs access to a GPU cluster, Teleport authenticates them via their identity provider, issues a short-lived X.509 certificate limited to the minimum required role, and logs the full session at the command level. The certificate expires automatically when the task is complete.
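As a rough sketch of that flow (request, role check, time-limited grant, audit entry), assuming a hypothetical `AccessBroker` with an in-memory policy table. None of these names come from Teleport; the role strings and the one-hour TTL are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Grant:
    user: str
    role: str
    expires_at: datetime


@dataclass
class AccessBroker:
    # Hypothetical policy table: which roles each verified user may request.
    allowed_roles: Dict[str, Set[str]]
    audit_log: List[Tuple[datetime, str, str, str]] = field(default_factory=list)

    def request(self, user: str, role: str, now: datetime, ttl: timedelta) -> Grant:
        """Grant a role for a limited time, or deny and record the attempt."""
        if role not in self.allowed_roles.get(user, set()):
            self.audit_log.append((now, user, role, "denied"))
            raise PermissionError(f"{user} may not request {role}")
        self.audit_log.append((now, user, role, "granted"))
        return Grant(user, role, now + ttl)

    def authorize(self, grant: Grant, now: datetime) -> bool:
        """Check a grant at use time; expired grants simply stop working."""
        allowed = now < grant.expires_at
        outcome = "allowed" if allowed else "expired"
        self.audit_log.append((now, grant.user, grant.role, outcome))
        return allowed


start = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
broker = AccessBroker({"alice": {"gpu-cluster-debug"}})
grant = broker.request("alice", "gpu-cluster-debug", start, ttl=timedelta(hours=1))
```

Every decision, including denials and expired attempts, lands in the same log, which is the property the unified audit trail depends on.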
Unify access across GPU clusters, bare metal nodes, Kubernetes services, model registries, and databases — through a single proxy with one audit trail.
Just-in-time access with auto-expiring privileges. Approvals via existing ITSM or collaboration tools. No engineer or pipeline retains access to infrastructure after the task closes.
Short-lived certificates for engineers, GPU nodes, training jobs, and AI agents. No SSH keys, hardcoded tokens, or shared service accounts that can leak, be shared, or be stolen — for any identity type.
Session recording with AI-generated summaries. Every action, every node, every identity — stored for compliance evidence and incident investigation.
Regulatory requirements
SOC 2 · ISO 27001
Every session is attributed to a cryptographically verifiable identity. Structured audit logs across GPU clusters, Kubernetes services, databases, and model registries reduce audit prep time by up to 80% — giving auditors a complete record of who accessed your training infrastructure, what they did, and when their access expired.
FEDRAMP · HIPAA
For organizations operating under FedRAMP or handling protected health information, Teleport supports FIPS 140-3 endpoints, SCIM provisioning, and MFA enforcement across your entire infrastructure. Every access request is task-based, time-limited, and automatically expired — so your compliance posture keeps pace with your audit requirements.
GDPR · DATA RESIDENCY
For organizations operating under GDPR or regional data residency requirements, Teleport supports fully self-hosted deployment inside your own VPC or data center — including airgapped environments — with no SaaS dependency. Your session recordings, audit logs, and access data never leave your infrastructure.

Teleport allows us to comply with the regulatory hurdles that come with running an international stock exchange. The use of bastion hosts, integration with our identity service and auditing capabilities give us a compliant way to access our internal infrastructure.
Brendan Germain
Systems Reliability Engineer
DOCS, GUIDES & DEEP DIVES
Does Teleport work with airgapped GPU clusters?
Yes. Teleport's reverse tunnel proxy allows GPU clusters on egress-only private networks to connect outbound to the Teleport proxy — without opening inbound firewall rules. Engineers connect through the proxy regardless of whether the cluster is on AWS, bare metal, or a co-location facility with no inbound reachability.
How does Teleport handle ephemeral GPU nodes that get reimaged constantly?
Teleport issues short-lived certificates automatically via cloud-init when a node boots. When the node is reimaged, the certificate expires. When it rejoins, a fresh certificate is issued automatically. Engineers never touch a credential; machine identity follows the node lifecycle without any manual intervention.
Can Teleport secure machine and workload identity for training jobs and CI/CD pipelines?
Yes. Teleport's Machine & Workload Identity issues cryptographic SPIFFE/SVID identities to every training job and CI/CD pipeline at runtime — scoped to exactly what the task requires and expired when it terminates. No hardcoded tokens, no shared service accounts, nothing that persists between runs.
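For readers unfamiliar with SPIFFE, a workload identity is named by a URI of the form `spiffe://<trust-domain>/<path>`. The sketch below builds and checks such an ID for a training job. The trust domain `example.org`, the path layout, and both helper functions are hypothetical, and the validation is deliberately simplified compared with the full SPIFFE ID rules.

```python
from typing import Tuple
from urllib.parse import urlparse


def workload_spiffe_id(trust_domain: str, *path_segments: str) -> str:
    """Build a SPIFFE ID for a workload (the path layout here is an assumption)."""
    if not path_segments:
        raise ValueError("a workload SPIFFE ID needs at least one path segment")
    return f"spiffe://{trust_domain}/" + "/".join(path_segments)


def parse_spiffe_id(spiffe_id: str) -> Tuple[str, str]:
    """Return (trust_domain, path), rejecting anything that is not a plain spiffe:// URI."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc or parsed.query or parsed.fragment:
        raise ValueError(f"not a valid SPIFFE ID: {spiffe_id!r}")
    return parsed.netloc, parsed.path


job_id = workload_spiffe_id("example.org", "training", "job-42")
trust_domain, path = parse_spiffe_id(job_id)
```

Because each job gets its own path, policy can be written against the identity of a single run rather than a shared service account.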
How does Teleport secure AI agents and MCP servers running on GPU infrastructure?
Teleport treats AI agents as distinct identities — issuing short-lived credentials and governing them using the same policy and access control framework used for human and machine identities. Teleport governs both developer access to MCP servers and LLM-to-MCP server queries through a single identity control layer, with full audit logging of every prompt, query, and tool call.
Can Teleport be deployed fully self-hosted inside our VPC?
Yes. Teleport supports fully self-hosted deployment inside your own VPC or data center, including airgapped environments, with no SaaS dependency. Your session recordings, audit logs, and access data never leave your infrastructure.
How does Teleport help with FedRAMP, SOC 2, and ISO 27001 compliance?
Teleport provides a complete, attributable audit trail for every session — across GPU clusters, Kubernetes services, databases, and model registries — tied to a cryptographically verifiable identity. FIPS 140-2 endpoints, SCIM provisioning, and MFA enforcement support FedRAMP and regulated AI workloads. AI-generated timelines reconstruct incidents in minutes, reducing audit prep time by up to 80%.