
Secrets are Dead: Why Machine and Workload Identities are the Future of Cloud Security
Static secrets like API keys, tokens, and passwords have become a major security liability in modern cloud environments. These credentials introduce significant security risks, are difficult to manage at scale, and create compliance headaches. The future of cloud security lies in dynamic, cryptographic machine and workload identities, eliminating static secrets and enforcing zero-trust authentication across your infrastructure.
Join us for this webinar as we explore how machine and workload identities improve security, simplify access management, and ensure compliance. We’ll show how Teleport uses SPIFFE to achieve these things while showcasing real-world examples of how organizations are using short-lived certificates, automated identity issuance, and granular access controls to eliminate credential-based risks.
In this webinar, we’ll break down why traditional secrets management is no longer enough and how organizations are adopting workload identities to secure applications and services at scale. You’ll see real-world examples of dynamic identity issuance in action and gain insights into how to strengthen security while reducing operational complexity. We’ll explore best practices for securing workloads across hybrid and multi-cloud environments.
Who Should Attend?
- Security architects and engineers looking to modernize identity-based security
- DevOps and platform teams managing cloud-native applications and services
- Engineering leaders and compliance professionals concerned with reducing credential-related risks
Key Takeaways:
- Why secrets are obsolete – Understand the risks of traditional API keys and static credentials
- The power of workload identities – Learn how dynamic identity issuance strengthens security and simplifies management
- Real-world implementation – See how organizations are leveraging workload identities to secure applications and services
Transcript - Secrets are Dead: Why Machine and Workload Identities are the Future of Cloud Security
Introduction
Eddie: Okay. I think we've got a good number of people who have joined us. So let's go ahead and get started. I'm really excited to be here today. I'm happy that you've taken the time out of your day to join us. I've got a few housekeeping tips to go over before we actually get started. Please notice that in the upper right-hand corner, there is a chat tab, a docs tab, and a Q&A tab. Chat tab, if you want to chat, feel free to chat a message. We have some resources in the Docs tab that you can download. And then if you have any questions, feel free, throughout the webinar, to type your question in, and we'll do our best to get those answered. Within about 24 hours, we'll send out a link to the recording, so you would be able to share this with colleagues or rewatch if you'd like. Again, my name is Eddie Glenn. I'm the Director of Product Marketing here at Teleport. Just a little bit about myself. I started off my career doing software development. Then that moved into DevOps and cybersecurity. So this is a topic that's near and dear to me, and I'm always excited to talk about it, but even more excited to have Dave join on this webinar. And Dave, why don't you tell everyone a little bit about yourself?
Dave: Yeah. Hi. I'm Dave Sudia. I'm a Senior Product Engineer here at Teleport, and I focus specifically on the Machine & Workload Identity product. And I started out my career as a quality engineer, moved into DevOps and platform. I was one of the first people to really fully move things into Kubernetes, particularly in the SMB sort of area. And now I work at Teleport because I wish I could have bought it back then, and I really love the product.
Why secrets are the enemy
Eddie: Awesome. Thanks, Dave. Okay. So we've got quite a few things to cover today. These are the high-level topics, and let's just go ahead and get started. So if you could move to the next slide, Dave. So I think the reason we're here is that static secrets are everywhere. And when we talk about static secrets, obviously, passwords is one type of static secret, but there are others — SSH private keys, API keys, cloud access keys, database credentials, JWT tokens, CI/CD secrets, signing keys. So there's lots of static secrets when we talk about secrets. And this is the part that I think is really what motivates us to want to be here today, is that these secrets are stored throughout an organization. We see them in source code repositories unencrypted, so visible. We see them stored in CI/CD pipeline build scripts, configuration files. They've been found in public clouds, test scripts.
Eddie: Obviously, we try to manage them with using things like secret vaults. People have them on their servers, their laptops, maybe a sticky note on the monitor, which in all these places, that might end up being the most secure place for a static secret because you can't get to it electronically. Containers and VMs. So they're really stored all over the place. And that causes a huge problem for us personally, for our companies, and the industry. And just to give you an example of the magnitude of the problem, this has been going on for a long time. The problem has been going on for a long time. And I just thought I'd take the last five years and kind of walk through some of the more serious incidents. But in 2020, secrets were everywhere. We had them stored, hard-coded in code bases. We shared them with our colleagues. If you need to get to machine X, well, here's the SSH private key.
Eddie: And there were a couple of very well-known incidents that occurred because of this practice. We've all heard of SolarWinds. We've talked about them for a number of years. T-Mobile is another one that I guess I had forgotten about, but this was also a pretty major breach that involved secrets. I think it was API keys. And if we...
Dave: T-Mobile has got me credit monitoring twice, so I'm very aware of the T-Mobile breaches.
Eddie: We look at 2021, secrets management starts to go mainstream. There's now more secrets managers out there to help address this problem, things like HashiCorp Vault, AWS Secrets Manager, but there continues to be a rise in secrets-related breaches. And Codecov comes up a lot. They had a secrets breach that caused quite a bit of problems. And if you look at 2022, the theme just goes on. Now we're seeing people targeting CI/CD pipelines. Codecov again, Travis CI were two very well-known incidents. And then there were incidents surrounding just the entire software supply chain. So it happened in 2022, then what's going on? 2023, the same thing's still happening. So solutions like secrets vaults didn't necessarily work — better hygiene didn't necessarily work.
Eddie: The number of incidents continue to grow. And the last one here on the list, the SAP Kubernetes secrets leak, was pretty major because one of the things that happened as a result of that is that one of the customers that were impacted by this disclosed that a major factor in their bankruptcy was the secrets leak. So a technical issue, like a secrets leak, quickly becomes a business problem. And we'll talk more about what those business problems are. But this is probably one of the more significant business problems where an action on a company causes one of your customers to go bankrupt. So that's 2024. And then if we look at 2025 — and this is where at Teleport we're saying, "Secrets need to die. They are dead. We need to do something better, something different."
OWASP top 10 CI/CD security risks
Eddie: And we start seeing more broad adoption of things like workload identity. We're going to really talk a lot about that later. We'll talk about machine identity. But the breaches are still happening because static secrets are so widely used. Let's move on to the next slide. And I thought this would be really interesting just to remind everyone of — but there are organizations out there that have identified that secrets are primary security risks. So if we look at the top 10 list from this OWASP, 4 of them are related to improper hygiene or the use of secrets. And then if we look at the next slide, there's another top 10 list where this is around non-human identities. Again, long list, static secrets are a problem for those. So I hope before I start to turn this over to Dave to where we can talk about alternatives that I just really nail home that long-lived static secrets are dangerous — even if we use tools like secrets managers, there's going to be the ability to breach those — and that these problems have impact on the business.
Eddie: So I know it's really hard if you're an engineering manager to go to a CISO or CTO and say, "Hey, I need to address this technical problem." If they don't necessarily understand the technical problem, it's much easier to explain it in terms they might understand, and that is the business risk they have. So it's going to be an increased cost for the organization. Maybe it's fees, maybe it's paying for those credit monitoring services, the cost of lost revenue, an increased chance of breaches and compromises. We see that all the time. And then the other aspect that maybe we don't talk about as much is an inability to meet compliance requirements. And because of these breaches, more government regulatory agencies are increasing the requirements for compliance and audit. So these are all business-related issues that I think management and senior management would definitely understand. It helps explain why managing secrets properly is good for the business.
Dave: Yeah. And the one thing, to piggyback on that, Eddie, with increased costs is also just the pure friction and just time for engineers. There's been some studies showing that developers spend about three hours a year per service that they're setting up creating secrets, managing secrets. There's another one that shows that ops and security teams spend about 30 minutes per day per person going through and managing these. If you think about all the amount of time that you have to spend rotating, provisioning, all of that, that's time that could be better spent building.
Eddie: Exactly. Yeah. That's a great point, Dave. So just so there's no question, static secrets are dead. There's a number of reasons for that. They're not designed really for a cloud-native world. In the aspect that we're dealing with ephemeral dynamic environments where you're constantly having to create and manage those secrets, they're often over-permissioned and overshared and impossible to rotate fast enough. They're not scalable. So yes, you might be able to rotate secrets and manage secrets for five virtual machines, but what happens when you have 2,000 or 5,000 servers? It becomes impossible. And then it's just not one secret, for instance. It could be hundreds of secrets. And then, obviously, there's the compliance and audit challenges. So we're going to take a minute, and I'd love to get some feedback from you guys. So we've got our first poll. And in just a second, we'll bring that poll up and go to the polls tab, and then let's see how you answer.
[silence]
Eddie: So we'll give it a few more seconds. Not terribly surprised, but infrastructure-as-code pipelines, yes, we see that a lot, as well as CI/CD pipelines. The source code — that's an interesting one because that's where some of the most embarrassing breaches have occurred, right, Dave? That we find secrets stored in public repositories.
Dave: Yeah. I think that's gotten better because you've got a lot of tools out there now that will scan your source code and tell you. I think that's one of the — and now you've got vendors that are scanning your source code. I've worked at a couple of places where we've suddenly lost Twilio access because Twilio noticed we published a key. And so, yeah, I'm glad to see only a couple of people putting it there. There's a lot of people — I'd say it looks like not the majority, but the plurality is in other places. If you don't mind — if you're someone who put “Other” and don't mind putting an example of “Other” into the chat, that'd be really useful, so I can speak to some of the use cases.
A modern approach for a modern problem
Eddie: One thing I wanted to mention about laptops and servers: I worked for a company a few years ago, and we had a customer that thought they had about 3,000 SSH private keys stored in their network. And we did a scan for them, and it turned out it was more like 5 million. So that's really the danger, especially with SSH keys: you don't necessarily know where they are. And then, obviously, you can't remove them when they need to be killed. Okay, great. Well, thank you, everyone.
Dave: Thank you, Marcus and Vesco, for those examples too. Yeah. All right. Let me share my screen again. All right. So I'll kind of take it from here. So what is the alternative, right? We need to replace static secrets with some kind of identity system, in particular for machine and workload identities. If you're a current user of Teleport, you're already sort of familiar with this concept as we apply it to humans, right? Every human gets a short-lived certificate identity based off of them identifying themselves to Teleport, and we need to extend that. So to have a full architecture for that, you need a couple of layers. You need a layer where you actually do the identification. I'm going to start at the bottom, right? You have to have metadata about the thing you are trying to identify — and I'm going to go kind of in-depth about how you do this for machines — and issue it some kind of cryptographic identity.
Dave: Using that identity then, there needs to be a way to connect both with authentication and authorization in a very secure way so that the communication happening between machines is protected and there's really full zero trust. You're not relying on the network level. You want to be able to enforce policy for all that, and you want to be able to see audit logs. You want to be able to make sure that things are in compliance around review and everything more on the security team side. And the points of those layers are, again, kind of down here, starting at the bottom. These two layers get rid of anonymous computing. And the critical thing about that is — I think one of the interesting conversations we've been having at Teleport is around the word “trust”, right? And you want a zero-trust environment, but in a zero-trust environment, what you're functionally doing is identifying that you should trust something, right? You don't trust it until you have verified that identity.
Dave: And so we want to create a situation where you can trust everything in your system based on you knowing that it's that thing, rather than because it's on a particular network or because it has an API key, something like that. At the middle layer there, the point of this is to ensure that no one and nothing, in this case, CI/CD, IaC systems, that sort of thing, can do work or can access something unless there's a unit of work to be completed. So in a human application, that would be just-in-time access, right, and something like access requests. For machines, it's really about making sure that the identity issued to something is only good for as long as it needs to do work, and we'll look at how that can work in a moment. And then at the top layer, it's that compliance piece of seeing, "Okay. My security posture is what it's supposed to be. Things have not changed," right? Access is happening as it's supposed to. And that, again, kind of goes to that audit level.
What are machines and workloads?
Dave: So with that, let's talk about machine and workload identities, how they function, what they are, and then we'll kind of see some examples. I want to break a little bit in here just because I've talked with people a lot about this. And so I'm going to give you my definitions for what I'm going to use in this webinar just because in Teleport, we have a sense of what a workload identity is. And then I'll interview people about machine and workload identity, and they'll go, "Oh, yeah, workloads, it's machines." And so just to kind of provide some specific definitions for this time. We're going to talk about machines as being computing infrastructure that runs workloads. This would be VMs. This would be bare metal racks that you've got. It could be a container, although the container might kind of go to the workload. Again, there's some fuzziness and blur across these.
Dave: But the main thing is that machines tend to have a fixed identity. They're around usually for a while. They can be ephemeral, but even when they're ephemeral, like a VM, maybe a spot instance, they're sort of a thing that is running other things. So another example of a machine, in this case, honestly, might be the Kubernetes control plane, right, but not necessarily the applications running inside of it. And then on the workload side, here, we're talking about applications, services, and processes that are running on those machines. It could be an IaC process that's running in CI. It could be a microservice running in a Kubernetes cluster that needs to speak to another microservice, could be a serverless function, etc. Again, these are processes that are running across machines. So you might have your own definitions. These are the ones we're going to work with for today. So let's talk about the trouble with static secrets and, in the end, with secret managers and stuff. And a lot of folks who answered “Other” were talking about vaults, were talking about password managers.
Dave: You have an application that needs a secret to authenticate with other resources. It needs an API token, it needs database credentials, whatever, right? And so you store those secrets in your secret manager or your vault system, whatever. But to get those secrets from the manager, the application has to authenticate. And to do that, it needs a token. So you store that token maybe in a Kubernetes secret that the application can then use to authenticate with vault. But now that secret could be exfiltrated and used to connect with vault, which could allow you to exfiltrate more, pivot, whatever. So it ends up being turtles all the way down. Every time you need to get to the next secret, you need some secret to authenticate. So the bottom turtle is a verified identity for every piece of hardware, every process, every application that is proven with some kind of document. You have to get to a point where you can issue a cryptographic identity to something without it having its own secret to receive that identity.
Problems with human identity
Dave: And let's talk about this in a human context, just to kind of ground it, right? So you have a speakeasy back in the '20s, and you open the door. And in the US, right, alcohol was banned at this point, but there were a lot of underground clubs that would serve alcohol. And so they'd say, "What's the password?" And if you knew the password, you could get in. And the problem there being that if a police officer knew the password, now you're busted, right? So if you think about it now, you might go to a nightclub and you would be on the list. But what's the person who has the list going to do? What's the bouncer at the door going to do? They're going to check some kind of identity document for you, and you have one. You probably have several. Some things about that document are that it was issued by a trusted authority, and before they gave you that document, they verified information about you before issuing it. There was metadata about you. Maybe you provided a birth certificate or another form of identity, like a driver's license, right? Maybe you used biometrics, you provided fingerprints or something, trusted ID from another government, which we could maybe think of as Federation.
Dave: I was kind of doing a little research for this, and to get a passport in Canada, for example, you can provide a passport from some other country. Some critical components of this identity document are that others can use information in it to verify your identity. There's probably a photo in it, and they can hold up the photo, right, and see, yes, you seem to be the person that this is. Where you can go and what you can do can be governed by that document. Are you allowed to travel to this place? Are you allowed to drive a vehicle, right? All of these things. So there's authentication and authorization determinations made through this identity document. So how do we approach machine and workload identities? We do it the same way. You get an identity document. And an identity document for a service, for a machine, well, we already have this great standard. It's X.509 certificates, right?
Dave: They're cryptographic. They can have TTLs, so they can be short-lived. They can be revoked. They have broad support and compatibility. There is support for federation, sort of cross-governments. In this case, it would be cross-CA issuers. Issuance and renewal can be automated. And we already do this in a lot of different places. Everyone probably has a PKI system set up, or they're using some other sort of key issuance system. But what we want to do is create a system where we can validate the identity of a machine and issue it that certificate based on some sort of metadata. Now, again, this kind of exists in several places. If I want to go get a Let's Encrypt certificate, one of the things it'll do is reach out to my domain and make sure that I have some kind of metadata on my domain that proves that I own this domain, and then it'll allow me to issue the certificate for Let's Encrypt, right? So this is not brand new, but I think it's the extent to which we can do this now that is newer.
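To make the point about TTLs and automated renewal concrete, here is a minimal Python sketch. It is an illustration only, not how Teleport or any particular PKI handles renewal: the `needs_renewal` helper and the renew-at-half-lifetime threshold are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_before: datetime, not_after: datetime,
                  now: datetime, fraction: float = 0.5) -> bool:
    """Return True once `fraction` of the certificate lifetime has elapsed.

    Hypothetical policy: renewing well before expiry leaves room for retries.
    """
    lifetime = not_after - not_before
    return now >= not_before + lifetime * fraction


# A one-hour certificate (the validity window is an assumption for the example).
issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=1)

fresh = needs_renewal(issued, expires, issued + timedelta(minutes=10))  # early in the lifetime
due = needs_renewal(issued, expires, issued + timedelta(minutes=45))    # past the halfway mark
```

An agent running next to the workload could evaluate a check like this on a timer and request a new certificate whenever it returns true, so nobody has to rotate anything by hand.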
Dave: So for machines, what could that look like? It can look like OIDC tokens. If I want to know that something is running on a particular GitLab CI run, I can reach out to GitLab and ask on a domain, sort of similar to the Let's Encrypt example, ask for a token, and get back a whole bunch of metadata about that run, right? We have cloud metadata. We have the assumed role on an EC2 instance that we can validate through APIs available in AWS. For bare metal hardware, you have a TPM. You actually have a cryptographic certificate that is only available on that piece of hardware that you could verify. In Kubernetes, you have JSON Web Key Sets, right? You can grab the JWKS from the cluster, grab the public set of that, and we can validate that and know that, oh, this is this Kubernetes cluster. At the workload level, the process level, you've got Kubernetes Pod metadata, like service accounts or labels. Similar in Docker and Podman. On Linux, you have the user and group and process IDs. You can grab systemd unit properties. So there's all of this available information that we can grab about a process and say, "Ah, cool. We've got validation." The service brought its driver's license, right, or its birth certificate, essentially, so that we can issue an identity to this.
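The idea of issuing an identity based on verified metadata can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Teleport's implementation: `JoinRule`, `IssuedIdentity`, `attest`, the metadata keys, and the `example.com` trust domain are all hypothetical, and the metadata is assumed to have already been verified against the platform (for example, by validating a signed token).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class JoinRule:
    """Hypothetical rule: metadata a workload must present to receive an identity."""
    required: dict      # e.g. {"namespace": "shop", "service_account": "checkout"}
    spiffe_id: str      # identity issued when the rule matches
    ttl: timedelta      # short lifetime of the issued identity


@dataclass(frozen=True)
class IssuedIdentity:
    spiffe_id: str
    expires_at: datetime


def attest(metadata: dict, rules: list, now: datetime) -> Optional[IssuedIdentity]:
    """Return an identity if the (already verified) metadata satisfies a rule, else None."""
    for rule in rules:
        if all(metadata.get(key) == value for key, value in rule.required.items()):
            return IssuedIdentity(rule.spiffe_id, now + rule.ttl)
    return None


RULES = [
    JoinRule(
        required={"namespace": "shop", "service_account": "checkout"},
        spiffe_id="spiffe://example.com/shop/checkout",
        ttl=timedelta(minutes=30),
    )
]

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
ok = attest({"namespace": "shop", "service_account": "checkout", "pod": "checkout-7d9"}, RULES, now)
denied = attest({"namespace": "shop", "service_account": "cart"}, RULES, now)
```

The workload never holds a secret of its own here: it is identified by facts about where and how it runs, and the identity it gets back expires on its own.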
Machine and workload identity - results
Dave: And the result of doing that is now every machine and workload has a trusted, cryptographic identity. You have real zero trust in your system because every single thing that has that cryptographic identity, you can actually trust it, right? You verify it, but you're verifying that thing, not just some token. You can start to write authorization rules based on identity rather than on tokens and passwords and keys. You, as an example, could say, "If you're trying to keep your data in the EU — your EU data in the EU, and your US data in the US, if a service says, 'I am the European checkout service,' then the European cart service can validate, 'Ooh, yeah, I can send this cart over to this checkout service.'" But if it reaches out and tries to create a TLS connection and the certificate comes back that it's the US checkout service, it can be like, "Oh, actually, no, I got routed incorrectly. I'm not going to create this connection and send this information." And we know that it's that thing because of how we validate it around the metadata.
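The EU and US checkout example can be sketched as an identity-based authorization check. This is a minimal Python illustration, not a Teleport API: the `ALLOWED_PEERS` table, the `may_connect` function, and the `example.com` IDs are assumptions, and the peer ID is assumed to come from a certificate that was already verified during the TLS handshake.

```python
# Hypothetical policy: which peer identities each service may send data to.
ALLOWED_PEERS = {
    "spiffe://example.com/eu/cart": {"spiffe://example.com/eu/checkout"},
    "spiffe://example.com/us/cart": {"spiffe://example.com/us/checkout"},
}


def may_connect(local_id: str, peer_id: str) -> bool:
    """Allow the connection only if the peer's verified identity is on the allow list."""
    return peer_id in ALLOWED_PEERS.get(local_id, set())


# The EU cart service reaches the EU checkout service: allowed.
same_region = may_connect("spiffe://example.com/eu/cart", "spiffe://example.com/eu/checkout")

# The EU cart service was routed to the US checkout service: refused.
misrouted = may_connect("spiffe://example.com/eu/cart", "spiffe://example.com/us/checkout")
```

The decision depends only on who the peer has proven itself to be, not on which network it sits in or which token it happens to hold.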
Dave: So if you have trust, and again, not trust in the system, but trust in each individual thing in your system, you don't have to worry about access paths anymore, right? If you try to map out in your head every access path, every place that an attacker could pivot, in any kind of modern infrastructure, it's just going to completely blow out. It's not possible. And I think a fantastic example of this is our CEO shared recently that he found out that he still had admin access at some startup that he was at like 12 or 15 years ago. And this was, I think, a company that was pretty good at this kind of stuff, and it just missed this access path, right? You can't enumerate everything anymore. And so to handle this now, you have to have these revocable identities. And you can automate all of this, right? You don't have to go around and manually rotate these things. You can have a system where it just goes out there.
Teleport’s approach and demo
Dave: So what are we doing? We're establishing trustworthy computing. We're getting rid of over-privileged services by having that authorization capability. You're reducing your overhead, you're improving your time-to-market because if you don't have to go manually manage and set up all these things — and I'd say beyond the manual management, I was just talking with someone earlier today where they have an entire system where teams can requisition an AWS account. The account gets set up for them, the credentials to manage that account get injected directly into a CI system, but that CI credential is static and long-lived. It has to be rotated. Just maintaining all the systems to do that takes time. You have interoperability across infrastructure, which I'll talk about in a second here. But essentially, a lot of these standards have been adopted by all the major clouds, and you have all these other benefits. You can audit it. With that interoperability, you can build integrations. Everything's on open standards, right, and we get rid of secrets in these environments.
Teleport Machine & Workload Identity
Dave: So now I'm going to dive a little bit more into what that physically and operationally looks like within Teleport. And in case there are people on the call who have no familiarity with Teleport, I'm just going to do a couple of quick slides that go over our overall architecture, and then I'll really dive into the machine and workload use cases here. So within Teleport, you kind of have a couple of concepts. You have things that you want to protect: cloud consoles, databases, Kubernetes clusters, applications. There's a ton of things you can protect. We don't even have all the cards for every single thing in the product right now. On the left side, you have users, humans and machines, that need to access those resources. So you have a CI system that needs to push new infrastructure into a cloud or write things into a database, write logs. You have humans that need to get into an SSH server or they need to do analytics in a database or reach an application, etc.
Solutions for all NHI needs
Dave: And then in the middle, there's value-add stuff here where all of that goes through encrypted tunnels. All of this is based on those short-lived X.509 certificate identities. And then because of that, we can audit things, we can provide access policy, role-based access control, all of that. So the core thing here is resources to protect humans and machines that want to access them. And we're now going to dive into the machines and the resources. And we feel, at this point, that we really provide a solution for all of those machine use cases. So we sort of have two branches of this. One is zero trust access for machines that goes through that tunnel system that I was kind of showing on the last slide, that I'm going to dive into more on a slide or two from now. It has the authorization. It has full audit logging. But we ran into the situation where people had machine use cases that they needed more flexibility for, right?
Dave: It was trying to access something that you can't protect with Teleport, or maybe it just needed to do something like write some logs into an S3 bucket. Or you don't want to run every single process call in your entire infrastructure through Teleport. You don't want every single microservice making a call to the next microservice running through our proxy, but you do want to have those really secure identities for those things. So we now have these more flexible identities for machines that are based on SPIFFE, which I'll dive into in a little bit, which is an open standard. And there, we kind of step back: we don't provide all the proxying and the authorization, but we just become the identity issuer. And now you can make direct connections, and it enables a lot of new use cases because these major clouds have adopted that standard. AWS, which is an example we'll go through in the demo, has a new thing called Roles Anywhere where you can trade one of these certificate identities for a role in AWS without having to keep a key pair somewhere to validate that role or have a user for it.
About SPIFFE
Dave: So a bit more detail on SPIFFE. I'm not going to super deep-dive into this, but it is the Secure Production Identity Framework For Everyone. It is a CNCF graduated project, and it is the specification for this workload identity standard. They have a reference implementation of it as well called SPIRE, but SPIFFE is the thing that everybody needs to meet so that we can have this interoperability. And it's got really broad adoption: the cloud providers, as I mentioned before; there are client libraries for code that you can bring in to very quickly and easily adopt this into services and microservices; there are proxies that are out in the community that now adopt this. So if you have really old code or a mainframe or something, you can still benefit from this. As for the structure of these things, the very top level of this specification is that you have a SPIFFE ID, which gives you a standard way of encoding a workload's identity into a URI. So kind of referencing what I was mentioning before, it has a `spiffe://` prefix.
Dave: Then comes the domain of the thing that is issuing it. And then after that, everything in the URI after the domain, the path component, is really something that you get to structure and match to your domains and namespaces. And then the other core thing that is produced is something called a SPIFFE Verifiable Identity Document, or SVID. Usually, this is just an X.509 certificate. It just has a specific structure within it. And one of the things that is within it is that SPIFFE ID so that someone receiving that certificate can validate, "Oh, yes, this is for this ID. It's this service. I should receive requests from this service and allow that." In addition to X.509, you can issue JWTs in this format. There are benefits and drawbacks to that that are beyond the scope of this webinar. So we're going to do another quick poll just mainly because this is very cutting edge, and there are a lot of people who are doing this, but a lot of people who are not.
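The SPIFFE ID structure described above (a `spiffe://` scheme, then the issuing trust domain, then a path you define) can be illustrated with a short Python sketch. It uses only the standard library and is not a complete validator for the SPIFFE specification: the `parse_spiffe_id` helper and the `example.com` ID are hypothetical.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(value: str) -> tuple:
    """Split a SPIFFE ID into (trust_domain, path); raise ValueError if malformed.

    Simplified check for illustration: it does not enforce every rule in the spec.
    """
    parts = urlsplit(value)
    if parts.scheme != "spiffe":
        raise ValueError("scheme must be 'spiffe'")
    if not parts.netloc:
        raise ValueError("missing trust domain")
    if parts.query or parts.fragment:
        raise ValueError("query and fragment are not allowed")
    return parts.netloc, parts.path


# Trust domain of the issuer, then a path structured however the organization likes.
trust_domain, path = parse_spiffe_id("spiffe://example.com/eu/checkout")
```

A service receiving an X.509 SVID would read this URI from the certificate and make its allow or deny decision based on the trust domain and path.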
Dave: And really quick, Zach asked the question of, he's got a hard stop at 11:30, and we are going to provide a recording of this, yes. So I'm going to stop sharing my screen and we'll do the next poll.
[silence]
Functional architecture of Teleport
Dave: So we've got two folks using it in production, which is really cool. Someone evaluating it. People who are already considering it and planning to use it in the future. Okay. Great. So I will definitely spend some time on the SPIFFE part of the demo here just to show everyone, especially for the folks who are planning to use it in the future, but maybe haven't seen it in action or gone deep on it. Okay. I'll just give this five more seconds and then we'll go. Oh, no, we'll go. Okay. So the actual functional architecture of Teleport, and this is how we'll kind of get down into the Machine ID case, right? This is at the human level. You have someone, they sign in, they get a certificate identity from this auth service. There is a local proxy that gets set up, and then all of their requests and access to infrastructure goes through this proxy, and the proxy reaches out to or is reached out to by these agents that run in front of your infrastructure.
Dave: You have a database agent that could connect to one or multiple databases. If it's an SSH server, the agent runs on the server to provide the SSH engine, and those agents reach out through a reverse tunnel to the Teleport Proxy. And the reason that's important is it allows you to completely close those resources off from any kind of public access. You can close port 22. You can completely close the inbound rules in the security group on that subnet. It is locked down, right, and then you reach in with the certificate identity from a person. For Machine ID, that looks very similar. In this case, we have a different binary. It's called `tbot`. It runs on the machine next to whatever process you're running. That process can use outputs from `tbot`. It can use database credentials. It can use SSH configurations, etc.
Dave: Those requests go through the proxy service, and then those are proxied to whatever resource you're accessing. Although, again, that resource on the back end is connecting through a reverse tunnel, it's completely closed off. And one of the things that we found, I'll just expand on this a bit more, is that we have a lot of people using Teleport for their humans, and that's great. It gives them all the authorization. But then they've got some CI system that's reaching out to the same thing, and so they have to leave port 22 open. They have to have another set of credentials for their database. And so that's a problem because it kills the whole promise of this architecture, right, which is that you can completely close these things off. And so one of the big reasons we started in on Machine ID systems is because we want, again, there to be an identity for every single thing in your system so that you have this completely closed network and you have that true zero trust.
Dave: For workload identity and that SPIFFE compatible piece, the architecture is a little bit different in that, again, the Auth Server is issuing certificate — or the `tbot` binary is making certificate signing requests out to the Auth Server. It's getting these certificates, and it provides them to either the workload or whatever process needs to use it. But from there, we're not going through the Teleport Proxy. Again, at this point now, they can use these certificates to communicate directly with each other across clouds, potentially, or out to third-party cloud providers. And there are some people doing this in a very impressive way. I was speaking with someone recently who they run in AWS, Azure, Google, and on-prem, and they have true workload identity running for every single process and service that they have. And the way they do that is they provision Azure service account files, google-service.json’s, and AWS key pairs, and they distribute all three of those to all four environments for every service. And that is amazing.
Dave: And I would love to give them all that time back by just issuing this one certificate that can be used in all four places. So real quick, I just want to look at this question to see if I can address it now. Being short-lived credentials, SVIDs need renewal, will you cover how you protect the secrets used to refresh the SVIDs? Scenario where the refresh tokens are compromised. Yeah. So I'll just talk about that really briefly, which is that there are no secrets used to refresh the SVIDs. Again, our goal is to remove all secrets from your system. So the way that the SVIDs are refreshed is by using that metadata, using that attestation about the machines and the workloads. We'll get into a very specific example in the demo, but that's the point, is that there are no tokens, right? Because otherwise, there'd be another turtle. And then how do you protect access to that token? So it's all based on that metadata. It's all based on that attestation, some other verifiable thing about the workload that allows us to provide that document.
Dave: There is a very niche case for us in particular where you can join in a machine using a token, but it should be short-lived. It should be 5, 10, 15 minutes, the maximum time it can be used to provision some infrastructure. And after that token is used once, the bot running on that instance receives a certificate identity. It then uses to renew, and you have a lot of control over how that renewal is allowed. But no secrets is the goal here. So then the whole feature set really looks like this. You've got a Teleport control plane. It is talking to `tbot` in either case. But on one side, you can provide things out to some sort of client or SDK. It goes through the proxy and then reaches out to these resources, or it just receives that workload identity, and then that can be used wherever a workload identity can be used, which is lots of places. So let's look at this in action now.
Demo
Eddie: Hey, Dave, just before you start on the demo, I wanted to remind everyone, if you do have questions, please enter them into the Q&A tab, and then we'll do our best to address those at the end.
Dave: Yeah. Thank you. Okay. So I am in Teleport's desktop application called Connect. There are lots of different ways you can do this. You can do this from the CLI. You can do it in a web portal. Just for the simplicity of showing this user specifically, I'm coming from our desktop app called Connect. And I'm actually going to do the human thing here where I'm going to reach out to this VM. And within this VM, then I'm the Ubuntu user and I have Ansible installed on this machine. So again, this is a use case, I think, you'd normally see or more commonly see now in some kind of automated [inaudible] job where Ansible's running, or you'd see it in a CI system running infrastructure-as-code, but just to kind of be able to get in and run it manually here. And this is running on a machine. I think this is a part I normally skip, but I'm going to go over it very specifically since that question came up of how this all runs.
Dave: So I have this — this annoys me to no end that we call these “join tokens” because it's really not a token, but it's basically a definition of how something is allowed to reach out to Teleport and validate itself. So in this case, this is an AWS IAM join token, and it is specifically for a bot called `dev-ansible`. And it has to come from this account ID, and the EC2 instance running that account actually has to be — you can wildcard this in certain places, but it actually doesn't even have to come from this. It has to be this specific instance. It is this instance ID that is the only thing allowed to run this bot. And so this is how we're attesting to this specific runner — is using this metadata. There is no token on that machine, but we know we can validate with AWS that it's this machine, and so that's the one that is allowed to run.
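The matching idea behind that join definition can be sketched in a few lines of Python. This is an illustration of the concept only, not Teleport's implementation: the field names `aws_account` and `aws_arn`, the account ID, and the instance ARN are assumptions made up for the example. Each rule lists the attested properties a machine must present, with optional wildcards.

```python
from fnmatch import fnmatchcase


def is_allowed(attested: dict, allow_rules: list) -> bool:
    """Return True if the attested metadata satisfies at least one rule.

    Every field in a rule must match; patterns may use * as a wildcard.
    An empty rule never matches, so it cannot admit everything by accident.
    """
    for rule in allow_rules:
        if rule and all(
            fnmatchcase(attested.get(field, ""), pattern)
            for field, pattern in rule.items()
        ):
            return True
    return False


# Hypothetical rule: one account, one specific instance, no wildcard.
rules = [{
    "aws_account": "111111111111",
    "aws_arn": "arn:aws:sts::111111111111:assumed-role/ansible-runner/i-0123456789abcdef0",
}]

# Metadata as it might be attested by the cloud provider (values invented).
this_instance = {
    "aws_account": "111111111111",
    "aws_arn": "arn:aws:sts::111111111111:assumed-role/ansible-runner/i-0123456789abcdef0",
}
other_instance = dict(this_instance, aws_arn=this_instance["aws_arn"][:-1] + "1")
```

With these values, `is_allowed(this_instance, rules)` is true and `is_allowed(other_instance, rules)` is false; replacing the instance ID in the rule with `*` would admit any instance that assumes the same role.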
Dave: So with that bit of context, on this machine, I am running that bot service, and it is set to renew this identity every 20 minutes that it's outputting. And it is putting that out to this directory, and this is something that's in a config file. The key thing here is that it is running an agent. This is our highly scaled version of our SSH provider. And so it can also just put out a straight SSH config that tells Ansible to use a specific binary that runs through our proxy, but in this case, it's using a binary that's going to run through a multiplex system. But there's an SSH config here that says, "Use this binary, run it through Teleport or through the `tbot`." And then if I look at Ansible, Ansible is set up to use that SSH config and go through Teleport.
Dave: So now I'm going to run — I can type. I'm going to run Ansible, and it's going to go out, and it's going to make these SSH connections through Teleport. And again, the resources it's reaching out to are connecting through a reverse tunnel, and they have their own certificate identities. We now have this very well attested, machine-level certificate identity, again, through the metadata about the machine itself. That bot has a role that allows it to access certain things. So let me find that one. So in this case, the only thing that this bot is allowed to access are these dev servers that I just had it run against. So you can control, again, that role-based authorization component through Teleport as well because it's running through the proxy.
Dave: And so that instance, the bot running there, is allowed to reach only dev resources. And then the really interesting thing here is that because everything's running through the proxy, we also have this really comprehensive audit log around everything that has happened. So we can see when that `bot-dev-ansible` is creating sessions, joining them, running commands. We get this very detailed data about this. All of this can be shipped out to your SIEM so that you can collate it with the rest of your security logs and run analysis on it. We're recording these sessions. If we're seeing a command, we can see the exact command that was run on that system, and there's just a whole lot of really juicy, valuable stuff within this audit log that you can run things on.
Dave: So that's kind of the Machine ID side. I see we're starting to run low on time, and I want to leave time to do Q&A. So the next kind of piece of this is that Workload ID piece. Let me come back here, and I'm going to show an example of workload identity in that sort of cross-cloud scenario. And again, this can be used for direct workload-to-workload communication, but what we're seeing is people are generally finding the most utility in that multicloud, hybrid cloud use case. You want to get rid of the key pairs in all your CI/CD systems, but you still need to have access into these clouds or across them. So the way this functions, and this is very similar in all the clouds, is that you would take the public CA that Teleport generates for its SPIFFE CA and bring it in and put it into AWS as a trust anchor within IAM. And then you create this profile for workload identity, and you basically say, "This role belongs to this profile." I'm not going to get super deep into this.
Dave: The last resource you create is the actual role that you want something to be able to get in as. So in this case, the permissions on this role are S3 access, but it now has this trust relationship tab. And in the trust relationship, we say, if something comes in with this SPIFFE ID, so kind of jumping back to that URI concept earlier, mine is the domain of my Teleport cluster, and then again, this part is very open. I've made a very simple example here of just an S3 writer service, right? And if a certificate with this SPIFFE ID comes in and it has been issued by this trust anchor that we've already registered with AWS, then it should be granted this role that allows it to do S3 access. So if I come back to my dev user here — and let me just get a terminal, here we go, within this — then I can run — and again, this is something that in an automated scenario would be run.
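The trust relationship described here can be sketched as a policy document built in Python. This assumes the AWS feature in use is IAM Roles Anywhere, which exposes the certificate's URI SAN to the policy as the `aws:PrincipalTag/x509SAN/URI` condition key. The SPIFFE ID and the trust anchor ARN below are hypothetical placeholders, and the exact policy shown on screen in the demo may differ.

```python
import json


def roles_anywhere_trust_policy(spiffe_id: str, trust_anchor_arn: str) -> dict:
    """Build a role trust policy that admits one SPIFFE ID from one trust anchor."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "rolesanywhere.amazonaws.com"},
            "Action": ["sts:AssumeRole", "sts:TagSession", "sts:SetSourceIdentity"],
            "Condition": {
                # The certificate must carry exactly this SPIFFE ID as its URI SAN.
                "StringEquals": {"aws:PrincipalTag/x509SAN/URI": spiffe_id},
                # The certificate must chain to the registered trust anchor.
                "ArnEquals": {"aws:SourceArn": trust_anchor_arn},
            },
        }],
    }


# Hypothetical values for illustration only.
policy = roles_anywhere_trust_policy(
    "spiffe://example.teleport.sh/svc/s3-writer",
    "arn:aws:rolesanywhere:us-east-1:111111111111:trust-anchor/00000000-0000-0000-0000-000000000000",
)
policy_json = json.dumps(policy, indent=2)
```

The permissions on the role (S3 access in the demo) live in a separate policy; this document only decides who may assume the role.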
Dave: You'd run the `tbot` binary. It would be something very similar here where it would output a short-lived certificate. You can run it in one shot. Thanks, Lexi, for that advice. But I'm going to do it as the Teleport user binary here, `tsh`, and I'm going to ask it to output some workload identity certificates specifically in X.509. I want it to be this workload identity which maps into that SPIFFE ID we're just looking at. I want it to only be valid for the next 5 minutes. The total max time here can be configured, so you can say, "No one can issue this one for longer than 10 minutes," or whatever. And the general advice here would be, in a CI system, that if you have a CI job that takes 10 minutes, make the certificate for 12, right? And at that point, even if that certificate is exfiltrated, it's not good for very long. They've not really exfiltrated something super valuable that they can use to pivot effectively. So I'm going to issue those certificates.
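The sizing advice above (a 10-minute job gets a 12-minute certificate, subject to a configured maximum) amounts to a small calculation. The sketch below is plain Python arithmetic, not a Teleport API; the two-minute margin and the 15-minute cap are assumed defaults chosen for the example.

```python
from datetime import timedelta


def certificate_ttl(
    job_duration: timedelta,
    margin: timedelta = timedelta(minutes=2),    # assumed safety margin
    max_ttl: timedelta = timedelta(minutes=15),  # assumed configured ceiling
) -> timedelta:
    """Request just enough lifetime for the job, never more than the ceiling."""
    return min(job_duration + margin, max_ttl)


ttl = certificate_ttl(timedelta(minutes=10))
```

Note that this sketch caps silently: if the ceiling is shorter than the job, the certificate would expire before the job finishes, so a real pipeline would want to fail early in that case.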
Dave: And then I have this AWS profile. And the profile is very long, so I'm just going to kind of skim over it, but getting the core concepts here is that I'm using this AWS binary called `aws_signing_helper`. I am telling it to use the credential process. I want it to use this public certificate that I just issued in that last command. I'm also telling it where the private key is. And then to jump to the end here, I'm saying map to that role that we were just looking at in the AWS console, that profile, and that trust anchor. So it's going to take all that and ship it to AWS. And if I then run AWS CLI, use that profile, and then list my S3 buckets — I have not logged into AWS, it's just taking those keys, sending them off, and here I have my list of S3 buckets. And so you can take that and kind of extrapolate it out, right?
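For readers who want to see the shape of such a profile, the following Python sketch renders an AWS CLI config section that uses `aws_signing_helper credential-process` as its `credential_process`. The profile name, file paths, and ARNs are hypothetical placeholders; the real profile in the demo is longer and its exact contents are not shown in the transcript.

```python
import configparser
import io
import shlex


def render_profile(name, cert_path, key_path, trust_anchor_arn, profile_arn, role_arn):
    """Render an AWS CLI profile that obtains credentials from an X.509 certificate."""
    command = shlex.join([
        "aws_signing_helper", "credential-process",
        "--certificate", cert_path,            # public certificate just issued
        "--private-key", key_path,             # matching private key
        "--trust-anchor-arn", trust_anchor_arn,
        "--profile-arn", profile_arn,
        "--role-arn", role_arn,
    ])
    config = configparser.ConfigParser(interpolation=None)
    config[f"profile {name}"] = {"credential_process": command}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical paths and ARNs for illustration only.
profile_text = render_profile(
    "teleport-s3",
    "/opt/workload-id/svid.pem",
    "/opt/workload-id/svid_key.pem",
    "arn:aws:rolesanywhere:us-east-1:111111111111:trust-anchor/00000000-0000-0000-0000-000000000000",
    "arn:aws:rolesanywhere:us-east-1:111111111111:profile/00000000-0000-0000-0000-000000000001",
    "arn:aws:iam::111111111111:role/s3-writer",
)
```

Appending `profile_text` to `~/.aws/config` would let a command such as `aws s3 ls --profile teleport-s3` exchange the certificate for temporary role credentials without any stored key pair.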
Dave: If I have a CI system that is shipping logs to S3 at the end of every run, if I have a Terraform system that needs to directly provision resources into AWS, it's going to be the same functionality. We set this up as the profile credential, we trade out for that role, and then that automated process can do all of that. But there is no key pair, there is no long-lived secret. And in this case, it's not even needing to renew, right? In a CI use case, you're just doing it as a one-shot, time-based system. And so you don't even have to worry about that piece. But in a renewal situation and a long-lived thing where it's providing a certificate for a workload, there, `tbot` just runs constantly. `tbot` is constantly checking back in with the Teleport cluster and reauthenticating itself as the machine, essentially, as the node. And then every time there's a request for that identity from the process, from the workload, it is revalidating that workload metadata to ensure, yes, I'm allowed to provide this identity to that process.
Dave: For that side, we don't have as comprehensive of the audit logging and all the commands that are run because it's not running through the proxy. It's a direct connection. What we do have is —
[silence]
Dave: Should be pretty close to the top. Yeah, here. So we have records of when these SVID certificates are issued, who they were issued to, and just all of the metadata about that specific certificate. So this can all still be shipped to your SIEM. You can still get — you can still really track and audit the issuance and locations and patterns of these certificate issuances. We just don't have all of the logs coming out of the service-to-service communication. So that is the overview. And real quick, I'm just going to talk over some best practices, especially for those of you who responded that you are thinking of, or you're looking at adopting these. We've done this with some very large organizations. I'm not going to spend a ton of time on this. You can come back and read it on the recording, but the main thing is starting with a small scope.
Recommended best practices
Dave: One of the reasons we see people using the cross or hybrid cloud use case first is that it's very easy to be like, "All right. We're not going to try to roll this out to every single service in our system." And so it's a very tightly scoped place to do this, right? We can just say, "Hey, let's just take these production CI things and secure them. That's the most important." And again, jumping back, if you're using Teleport today for humans, you should be using it for machines too. That's really how you're going to get the most benefit out of the security model, especially if they're accessing the same things that machines are accessing. So with that, I'll pass it back to Eddie. Thank you very much.
Conclusion and Q&A
Eddie: Thanks, Dave. So just to close up before we switch over to Q&A, we have a couple of resources that you can download today. One's a data sheet on Teleport Machine & Workload Identity, and the other is a solution brief on how it works. And next slide, please. The technical part we talked about was some of the high-level business considerations that I addressed. Teleport Machine & Workload Identity helps decrease business risk. It helps lower your costs. One of the bigger areas is just the time it takes to manually rotate secrets and then, obviously, to pass audits. And with that, I'm going to mention that we're going to show a survey, and we really appreciate you taking a few minutes to answer the survey questions while we get to some of your Q&A questions. So with that, if we can pop up the survey. Then, Dave, I'm going to read through the questions that have come in so far. First one is, I use GitLab CI and Azure DevOps. Can I use Teleport Machine Identities with both of those?
Dave: Yeah. That's a great question. So we do have native support for GitLab CI. And again, when I talk about this native support, this is the support for grabbing that metadata that we use to validate the process, the machine. And so with GitLab, we can reach out to the OIDC endpoint, get that metadata. Azure DevOps — we do not have direct native support for at the moment. They have a much more complex system of authenticating. But it is on our roadmap. And if it's a really critical use case for you, I'd love for you to reach out just because the more critical use cases we know about, the faster I can push it up the roadmap. But yeah. So it's definitely one of those things where we build to people's most immediate needs. And so even if there's some other system that you're using that I haven't mentioned, then we probably can build support for it.
Eddie: Okay. Great. Next question is, organization — actually, I need to paraphrase this a little bit. TLS certificates for apps are owned by our PKI team. So I guess this is a larger organization. How does Teleport work with that PKI?
Dave: Yeah. Thank you. That's been a really interesting space for us now because we've primarily been adopted by infrastructure teams, by platform and DevOps teams. And so for a bit of a deep dive on this, the way that a Teleport certificate infrastructure works is that Teleport Auth Server is the root CA for all of these different services. And we run different CAs for each one for both compatibility and blast radius reasons, right? So there's a database service CA, there's a user CA, there's a SPIFFE CA. And so within that, it's a very self-contained system, and we haven't really gotten a lot of pushback from people. But as we've entered the workload identity space and we're now issuing certificates that can be used in an entire organization system across applications, IaC, all of this stuff, we found that the security teams have gotten more involved, and the security teams are going, "Hey, wait —" or in your organization, whoever runs PKI, right, and going, "Actually, we have a whole system for this already, and it needs to fit within our system." And so a thing that we have very recently shipped is the ability for the SPIFFE CA to actually work as an intermediate CA within your larger PKI system so that any of the compliance and just operational requirements you have from the larger security and compliance organization — we can fit in with and their concerns can be met.
Eddie: Great. Thank you. Okay. There was a question, and I think I can answer this one, Dave. Is there any intention to pursue FedRAMP? We already support FedRAMP. We have customers that have achieved FedRAMP compliance using Teleport. It's a great question. Another question, and actually, I'm going to combine these just for the sake of time. The question is, just like secrets and password keys, you have to rotate your X.509 certificates. So we already use a tool that rotates PKI information. How is what Teleport's offering better?
Dave: Yeah. I mean, so you may already have a robust certificate rotation system. If so, good. My compliments. I think on the machine side, that machine zero trust access side, the advantages of Teleport specifically are getting that reverse tunnel system, that secure connection. I think also the ability to just — I think our rotation system is easier to use. You basically just set a TTL on the configuration on the agent running there, and it does it on a regular basis. If there's a disconnection and it needs to reauthenticate, it has very robust methods for doing that. I could kind of go deep on that, but we've got five minutes. So reach out to me afterwards if you'd like, and I'd be happy to talk to you about it more. There was another component of what you asked.
Eddie: So one of the things I was going to add to that, Dave, is that a lot of older solutions or different solutions are focused more on IT use cases. And one of the things that sets Teleport apart is that we were designed to support specific infrastructure use cases. So in many ways, we've optimized around those use cases. And then there —
Dave: Yeah. For sure.
Eddie: Yeah. And then just one —
Dave: And I'll piggyback on that and just say, yes, I think if all you're doing is SSH, those other systems probably work very well. If you need to access cloud consoles and 17 different kinds of databases and applications and everything else, then, yeah, there's a lot of other functionality we have.
Eddie: Okay. Great. And I think that is the basic question.
Dave: There's one more here from Mario about AI trends. So yeah, I think there's a lot of — there's a lot of discussion right now around AI agents and how they access things. And I think that thinking about identities for those is something that we're doing. And I'm sure we're going to have another webinar soon on what our thinking is on that.
Eddie: All right. And then there was a chat comment that came in, and I don't know if it came in before I answered the question, but I just wanted to reiterate that we do support FedRAMP now.
Dave: Yeah. So, okay, I want to clarify this because I think maybe part of the confusion is coming from this. So Teleport offers a SaaS. We offer a hosted version of it, but we also offer self-hosted and even air-gapped versions of Teleport that you would run. And so Teleport, ourselves, are not FedRAMP. We don't run in AWS GovCloud. If you are trying to achieve FedRAMP, we can absolutely enable that, but you would run the self-hosted or air-gapped versions, and we have a lot of experience helping customers reach FedRAMP by running the self-hosted version. And Eric said he's understood, so awesome. Glad I could help clear that up.
Eddie: Okay. We're at the end of the time. I wanted to thank everyone for your attention today, and I hope you got something valuable out of this. If you have any questions, please reach out to us. And as we mentioned earlier, within about 24 hours, we'll be sending out a link to the recording. So with that, we're going to close off the webinar, and thanks, Dave, for your great demo.
Dave: Yeah. Last thing would be if you've got direct questions for me, just join our community Slack. I'm in there, easy to find, and happy to just get direct questions as well. Thanks. Thanks for watching.
Eddie: Yeah. Bye.
