Teleport Workload Identity with SPIFFE: Achieving Zero Trust in Modern Infrastructure
Teleport Access Platform generates cryptographic identity for users, machines, devices, and resources, creating a single source of truth for which users and machines are accessing what in your modern infrastructure. Now, engineers can also generate identities specific to workloads and services, enabling your full modern infrastructure stack to operate with zero trust authentication.
With Teleport Workload Identity, we are implementing the SPIFFE standard from the Cloud Native Computing Foundation. Teleport can now issue a SPIFFE ID for each of your workloads so they have an identity within Teleport in the same way that users and machines do. Workloads can use these IDs to authenticate to each other using mTLS, so each workload can reach only the workloads and services it is authorized to access.
Join Noah Stride, Software Engineer at Teleport, and Dave Sudia, Product Engineer at Teleport, as we:
- Introduce Workload Identity and why you will want to implement it in your system
- Give an overview of the SPIFFE standard and why implementations are rapidly expanding
- Demonstrate Teleport Workload Identity in action
- Show what else is coming for Workload Identity
This session is appropriate for anyone interested in learning about Workload Identity, as we will cover the topic from the basics up. Join us and learn more about this new frontier of cloud security!
Key topics on Teleport Workload Identity with SPIFFE: Achieving Zero Trust in Modern Infrastructure
- Teleport Workload Identity provides cryptographic identities for applications, enabling zero-trust authentication between workloads without shared secrets. It implements the SPIFFE (Secure Production Identity Framework for Everyone) standard from the Cloud Native Computing Foundation.
- SPIFFE IDs are standardized URIs that encode a workload's identity, including the trust domain and a flexible workload identifier.
- SPIFFE Verifiable Identity Documents (SVIDs), typically X.509 certificates or JWTs, prove the workload's identity and are issued by the trust domain's certificate authority.
- Workloads retrieve SVIDs and trust bundles from a local SPIFFE agent via a Unix domain socket, using attestation methods like Unix user IDs and groups.
- Teleport Workload Identity removes the need to manage long-lived secrets, enables mutual TLS, and provides identities for authorization policies. It integrates with SPIFFE-compatible platforms like AWS IAM Roles Anywhere and Envoy Proxy.
- Teleport's implementation is open-source, audits issuance of SVIDs, and can authenticate across cloud providers.
Expanding your knowledge on Teleport Workload Identity with SPIFFE: Achieving Zero Trust in Modern Infrastructure
- Teleport Kubernetes Access Guide
- SAML Identity Provider Access
Transcript
Introduction
Dave: All right. Well, we'll start with a good solid grounding in Teleport since there are so many brand-new folks. So this is Teleport Workload ID with SPIFFE: Achieving Zero Trust in Modern Infrastructure. I'm Dave Sudia, a senior product engineer here. I sit between the product folks and the engineers like Noah here. I represent the user, try things out first, and do demos and things like this. And with me is Noah Stride.
Noah: Hey, yeah, I mean, as David said, I'm Noah. And I'm the team leader of all things kind of Machine ID and Workload ID over here at Teleport. I've been here for around two years, basically as long as we've had a Machine ID feature on the platform.
Overview of Teleport
Dave: Cool. So here's a quick overview of today. We're going to do an overview of Teleport, especially since there are some new folks here. We're going to talk about what Workload Identity is and what SPIFFE is. I'll show you a demo of this new feature in the Teleport platform, and we'll talk about the future a little bit. Then we'll have some time for Q&A. The Teleport platform is really broken into three major pieces: the access component, the identity component, and the policy component. Both Machine ID, which Noah was just mentioning, and Workload Identity fall under the access component of the platform. All of it is really about how we get people and machines to talk to other machines in a completely secure way, and in a way that isn't frustrating for developers, where security gets in the way of productivity. Machine ID lives over here on the left side, where we're talking about how machines talk to other resources that Teleport governs. Workload Identity is a brand-new feature that sits toward the right side of this, but it's almost its own whole new thing, which we'll be going into. I hate to give you two polls in a row, but I do want to get a sense from everyone of where you're at with Workload Identity in that space as well. So let me stop sharing here so we can push that second poll.
[silence]
What is Workload Identity?
Dave: All right. I've got some folks who could use it with another SPIFFE-compatible service already, inside of Kubernetes, or direct app to app. Quite a few folks just need to learn more about it, and that's what today is all about. We'll give about five more seconds on this. For those of you who need to learn a bit more, you're in the right place. And for those of you who want to see what it can do right now, you're also in the right place. Let me go back to sharing my screen. Is it going to come up? Hold on. There we go. Okay. Cool. So let's talk about what Workload Identity is. At its core, using Workload Identity is about hardening your infrastructure security. Think about the old way of doing this, or the current way in most places. Requests are authenticated one way. You're sending in an API token, some kind of key, or a JSON Web Token. You could almost think of it as two-way, in the sense that the thing calling the upstream service gets a TLS certificate back and confirms a domain. But the domain is really all you're confirming. And with all of those things, you need secrets. Those secrets are generally stored in the environment somewhere, maybe as an environment variable, maybe as a Kubernetes secret. Maybe they're stored in a secrets manager, but the irony is that you need a secret to get to your secret. You have to authenticate to Vault to say, "I should be able to access these secrets."
Hardening infrastructure security
Dave: And so you still need some way to manage that secret. Maybe that's the one sitting in the environment. The phrase here is "turtles all the way down." I've got a link here to the SPIFFE book, Solving the Bottom Turtle, which is excellent, and which we'll talk about in a second. For those of you who want to go deeper into this slide and the next one, I highly recommend that book. We're going to pull a lot of content from it today, and it's a much deeper reference than we can cover in a webinar. But the core problem we're trying to solve is not having long-lived shared secrets everywhere, and in the current scenario you pretty much have to have them. So what's the more secure way? Moving toward a zero trust environment, where requests are authenticated in both directions every time: every application that receives or sends a call asks, "Did I send this to the right place? Am I receiving it from someone I accept calls from?" You have mutual TLS. But more importantly, authentication is based on identity, and identity is obtained without another secret.
Dave: You have to get down to the bottom turtle in the stack and say, "I am this app, and I need these credentials," without providing some kind of secret. So let's break down the vocabulary. I'm just going to pull up the Chat as well so I can see if there are any clarifications I can make as I go, but we're going to do Q&A at the end. A Workload is a long- or short-lived application that interacts with other applications through API calls, network calls, socket calls, or RPCs: something that is sending requests or receiving them. Identity is who your application is. The metaphor is that I'm me. I'm Dave. I am a being that has this inherent identity. For an application, it might be the cart service in the EU region, or some other important set of aspects about the application. But if I want to get on a plane, I need to provide some kind of identity document: my driver's license or my passport. To get that, I provide information about myself to an authority that says, "Yes, you are Dave, and we're going to grant you this passport." Then I use that passport to verify and authenticate myself with people or systems. That is an identity document, and we're going to break these down even more in an upcoming section. The core point is that Workloads inherently have an identity, but we need some way to prove it.
What is Workload Identity used for?
Dave: What is Workload Identity used for? The main thing we'd like it to be used for is replacing API tokens, long-lived certificates, and other manually managed, long-lived forms of app-to-app authentication. We want to get those out of your environments, because people getting access to those credentials is a core cause of breaches. We want to implement mutual TLS between applications, and we want to provide identities that can be used for authorization. An identity for an application goes beyond a key. It moves you from "any bearer of this key is allowed to take these actions" to "I am the cart service in the EU region, and I'm not going to accept requests from a payment service or a shopping service in the US region." Or, "I am an application in a given tenant, and I want to make sure all requests come only from other applications within this tenant," so we eliminate any possibility of cross-tenant requests. It's a really powerful thing to have a true identity for an application rather than just some kind of identifying secret.
Dave: What's the value of it? The folks at SPIFFE have done some math and found that it saves approximately three developer hours per software component that requires a secret. Having been a platform engineer, I think that's even a little low. Every time we launched something that had to have a secret, it wasn't just about the production environment. It's the dev environment, the staging environment, and maybe someone's personal dev environment that's not the shared one. You have to have all of this lined up and make sure you didn't accidentally put the production secret in staging. There's a lot of time in this, and if you add it all up and multiply it by the compensation of your engineers, it's a lot of value. It makes it easier to manage a zero trust environment, and it makes creating one more seamless. If developers don't have to manage all of these secrets, because identity is provided to them, the reduced setup time means ease of use for developers and for platform folks as well. Workloads become secure by default. You don't need to think about it, because everything is just using it. It helps with your compliance certification, and you've got a stronger base for policy enforcement because of those identities. And from us, it's part of the Teleport platform.
Dave: So if you're already running Teleport, which many people in the poll were, there's no extra infrastructure to manage or maintain. It's just a feature of tbot, which we'll get to in a moment for those of you who haven't used it; tbot is the basis of Machine ID. It's just there. You can turn it on and start using it. On that note, how is it different from Machine ID? For people who aren't familiar with it, Machine ID is how Teleport gives machines access to resources. Teleport started with, "I want this human to be able to access this database with an authenticated identity." Then we went to, "I want my CI system to be able to access that database with an authenticated identity." That's what Machine ID does. It allows a process running on a machine to access a resource. The connection runs through the Teleport Proxy Service, which provides the audit log and all of the other things we do when we make connections for you. So with Machine ID you get more of the puzzle. You have authentication and authorization, because you can control the connections coming through with our role system. You get auditing. You get locking. You get the fine-grained access control we provide.
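As a back-of-the-envelope illustration of the arithmetic Dave describes, the sketch below multiplies the SPIFFE estimate of roughly three developer hours per secret-requiring component by a number of environments and an hourly cost. The component count, environment count, and hourly cost are hypothetical inputs, and repeating the effort once per environment is an assumption drawn from Dave's comment, not part of the SPIFFE estimate.

```python
def estimated_savings(components: int,
                      environments: int,
                      hours_per_component: float = 3.0,
                      hourly_cost: float = 100.0) -> float:
    """Estimate the cost of hand-managing secrets that workload identity avoids.

    hours_per_component: the ~3 hour SPIFFE estimate quoted in the talk.
    environments: ASSUMPTION that the effort repeats per environment
                  (dev, staging, production, ...).
    hourly_cost: hypothetical fully loaded cost of one engineer hour.
    """
    hours = components * environments * hours_per_component
    return hours * hourly_cost


if __name__ == "__main__":
    # Hypothetical organization: 40 components, 3 environments, $100/hour.
    print(estimated_savings(components=40, environments=3))  # 36000.0
```

The point of the sketch is only that the total scales with both the number of components and the number of environments, which is why Dave suspects the per-component figure understates the savings.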
How is Workload ID different from Teleport Machine ID?
Dave: Workload Identity, or Workload ID, is a little different. The primary difference is that it's the first thing we've shipped that doesn't go through the Teleport Proxy Service. We provide an identity to your applications that you can use with any SPIFFE-compatible endpoint. That endpoint will probably be another workload that is also getting its SPIFFE ID from Teleport, but it could be AWS IAM Roles Anywhere or anything else that's compatible. So it's a bit more like Legos: we're giving you the building blocks, and you build the system. The trade-off is that, since it doesn't go through the Proxy Service, you don't get the built-in add-ons I described above. But there's also less load on the Proxy Service and no additional latency. Your applications still communicate directly with each other, which is critical if you're going to implement this at scale across your entire organization. That's really the critical piece here. And with that, to get more into what SPIFFE is, for those of you who aren't familiar, I'm going to pass it to Noah.
What is SPIFFE?
Noah: Okay. Great. So you've probably heard of SPIFFE before, especially if you've been looking into this Workload Identity space. Or, if you've been paying attention to the first half of this presentation, you'll have heard Dave say it quite a lot. SPIFFE is the standard that we have built Teleport's Workload Identity platform on top of. It stands for the Secure Production Identity Framework For Everyone. Try saying that twice fast. It's a set of open-source specifications that set out rules and methods for identifying software systems securely. It covers a bunch of different topics. The ones most interesting to us for this presentation are that it tells you how to structure an identity, how to encode that identity into a verifiable document, and the steps you should take when verifying these documents to make sure they're legitimate. It also specifies how Workloads should retrieve their identity and the information they need to verify another Workload's identity. At the very core of the SPIFFE specification are the rules around what an identity is. The SPIFFE ID is a standardized way of encoding a Workload's identity into a string, and it takes the format of a URI. The scheme element of the URI is always `spiffe`, indicating that we're talking about a SPIFFE ID. The host section is perhaps different from what you'd be familiar with. Rather than being a hostname you can resolve, it identifies the trust domain this identity belongs to. We'll cover trust domains a bit more later, but basically, all Workloads within a trust domain can verify one another. Typically, this is going to be your organization, or, if you're a particularly large enterprise, you might have multiple trust domains within your organization.
Noah: Finally, the path section of the URI is known as the Workload identifier. It uniquely identifies an individual Workload or group of Workloads within the trust domain. This part is what's really cool about SPIFFE: they've left it really flexible. It's up to the trust domain operator to determine how they want to structure it. If you're a business with regional compliance needs, you may well find it useful to put the region (in this case, EU) directly into the Workload's identity. That allows you to create rules that say, "Hey, I don't want to send sensitive personal information outside of the EU." Now, it's all well and good having this SPIFFE ID URI, but if a Workload connects to another Workload and just says, "Hey, I'm this Workload," there's not much reason to trust it. We need some sort of verifiable identity document, like the passports Dave mentioned earlier, so that Workloads can trust they're legitimately talking to the Workload they think they're talking to. This is where the SVID comes in: the SPIFFE Verifiable Identity Document. Typically, this is some kind of cryptographic document that is signed by the root of the trust domain and contains the Workload's SPIFFE ID.
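To make the SPIFFE ID format concrete, here is a minimal, standard-library-only Python sketch of a parser. It assumes the character rules from the SPIFFE ID specification (a lowercase trust domain of letters, digits, dots, dashes, and underscores; path segments drawn from the same set plus uppercase letters; no port, query, fragment, or trailing slash). The names `parse_spiffe_id`, `SpiffeId`, and `example.com` are illustrative only and are not part of any Teleport or SPIFFE library; production code should use a maintained SPIFFE library.

```python
import re
from dataclasses import dataclass

# Character rules assumed from the SPIFFE ID specification.
_TRUST_DOMAIN_RE = re.compile(r"[a-z0-9._-]+")
_SEGMENT_RE = re.compile(r"[A-Za-z0-9._-]+")


@dataclass(frozen=True)
class SpiffeId:
    trust_domain: str  # the URI "host" part, e.g. "example.com"
    path: str          # the workload identifier, e.g. "/eu/cart" ("" if absent)

    def __str__(self) -> str:
        return f"spiffe://{self.trust_domain}{self.path}"


def parse_spiffe_id(raw: str) -> SpiffeId:
    """Split a SPIFFE ID into its trust domain and workload identifier."""
    prefix = "spiffe://"
    if not raw.startswith(prefix):
        raise ValueError("scheme must be 'spiffe'")
    trust_domain, sep, rest = raw[len(prefix):].partition("/")
    # Rejects empty domains, uppercase letters, ports (":") and userinfo ("@").
    if not _TRUST_DOMAIN_RE.fullmatch(trust_domain):
        raise ValueError(f"invalid trust domain: {trust_domain!r}")
    path = ""
    if sep:
        for segment in rest.split("/"):
            # Empty segments catch "//" and trailing slashes; "?" and "#"
            # fail the regex, so queries and fragments are rejected too.
            if segment in (".", "..") or not _SEGMENT_RE.fullmatch(segment):
                raise ValueError(f"invalid path segment: {segment!r}")
        path = "/" + rest
    return SpiffeId(trust_domain, path)


if __name__ == "__main__":
    sid = parse_spiffe_id("spiffe://example.com/eu/cart")
    print(sid.trust_domain, sid.path)  # example.com /eu/cart
```

Splitting the two parts apart is what lets later checks treat the trust domain (who vouches for this workload) separately from the workload identifier (which workload it is).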
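The region rule Noah describes can be sketched as an authorization check on the workload identifier. This assumes a hypothetical naming convention in which the first path segment is the region (for example `spiffe://example.com/eu/cart`) and a hypothetical trust domain `example.com`; SPIFFE itself leaves the path structure entirely up to the trust domain operator, and this is not a full SPIFFE ID validator.

```python
from urllib.parse import urlsplit

# Hypothetical values for this sketch only.
OUR_TRUST_DOMAIN = "example.com"


def region_of(spiffe_id: str) -> str:
    """Return the region segment, assuming IDs look like spiffe://td/<region>/..."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        raise ValueError("SPIFFE ID has no workload identifier")
    return segments[0]


def may_send_personal_data(destination_id: str) -> bool:
    """Allow sensitive personal data only to EU workloads in our trust domain."""
    region = region_of(destination_id)  # raises on non-SPIFFE input
    return urlsplit(destination_id).netloc == OUR_TRUST_DOMAIN and region == "eu"


if __name__ == "__main__":
    print(may_send_personal_data("spiffe://example.com/eu/payments"))  # True
    print(may_send_personal_data("spiffe://example.com/us/payments"))  # False
```

Because the rule keys off the verified identity rather than a network location, it keeps working if the destination moves hosts, as long as its SPIFFE ID stays the same.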
Secure Verifiable Identity Document (SVID)
Noah: The SPIFFE specification sets out multiple formats of SVID. The first and most popular is the X.509-SVID. Here, the SPIFFE ID is encoded into a URI SAN inside an X.509 certificate, which is then signed by the CA of the trust domain. X.509-SVIDs are pretty useful because they can be used for mTLS, where both the client and the server present their SVID. Normally, the server needs to verify that, "Hey, this client is allowed to do something." But with sensitive information, you also want to prevent man-in-the-middle attacks. What if your server has been replaced by some malicious server, and now your client is sending confidential information to it? This is where mTLS is great, because the client also checks that the server it's talking to is the right server. X.509-SVIDs are probably the most common type out there, but they don't work if there's a TLS-terminating load balancer between the two Workloads, because that interferes with the direct TLS connection between them.
Noah: This is where JWT-SVIDs come in. Here, the SPIFFE ID is encoded into a short-lived JSON Web Token, which is also signed by the CA of the trust domain. The SPIFFE specification doesn't set out rules about how this JWT should be presented, but you can do the normal thing: present it in an Authorization header like any other bearer token. These are great for situations where X.509-SVIDs can't be used, but they have a couple of issues. For example, if a bad actor got hold of one of these JWT-SVIDs, they could use it, usually in a replay attack, for as long as the token lives. That's why they're usually short-lived, valid for something like 5 or 10 minutes, because our Workloads can always use SPIFFE to get a fresh one when they need it. They're also a bit more awkward when it comes to mutual trust. There are plenty of accepted practices for how a client presents a token to a server, but not many for how a server would present a token to the client.
Noah: Next, it's hard to talk about SVIDs without talking about trust bundles. A trust bundle is the information a Workload needs to verify an SVID that another Workload has presented to it. Realistically, this is usually just the certificate of the trust domain's certificate authority, along with the public keys used to sign JWT-SVIDs, because with those you can verify anything the trust domain has signed, such as X.509-SVIDs or JWT-SVIDs. The only time this is slightly different is when you have federation between trust domains. When you federate two trust domains, you're effectively saying that Workloads in one trust domain should be able to verify the identity of Workloads in the other. In this case, the trust bundle also includes the CAs from the other trust domains that your trust domain trusts. This is really useful when your organization has multiple trust domains because it's so large, but one team's Workload needs to talk to another team's Workload in a completely different part of the organization. There's also no reason this couldn't extend between organizations. Nowadays, there are plenty of cases where a Workload owned by one company needs to submit information to a Workload owned by a completely different company.
Noah: Finally, that brings us to the Workload API. We now have an identity document and a structure for identities, but we haven't talked about how a Workload gets them. The SPIFFE specification sets out a standardized gRPC API. This is usually exposed by an agent that runs close to the Workload, often on the same host, via a Unix domain socket, which avoids going over the network with a TCP socket. A number of off-the-shelf libraries exist for integrating your Workload with SPIFFE in languages such as Java, Go, and Python. But how does the agent know which Workloads are meant to get which SPIFFE IDs and SVIDs? We call this Workload attestation. What's great about this process is that, often, the Workload doesn't need to present a credential at all, which solves the bottom turtle problem. If the Workload had to present a credential to get its identity, how would it securely get that credential? We'd end up in the same place. Instead, the agent can use a process called Unix Workload attestation. When the Workload connects, the agent asks the Linux kernel who is on the other end of the socket, and the kernel returns information such as the process ID and the user and groups the process runs as. The process ID can then be used for further lookups against the system, perhaps to find out which Kubernetes container this relates to. All this information can then be compared to a set of rules, and the agent can decide whether or not to issue the SVID.
Noah: Now, I've put all of that into a diagram to show how it fits together. Here we have two Workloads: our bar service and our foo service. Each is hosted on a completely different host; these could be VMs, physical machines, or even containers. The Workload, upon starting up, reaches out to the agent at the well-known location of the Unix domain socket and says, "Hey, I need my X.509-SVID and my trust bundle." The agent performs the Workload attestation process. It looks at the Workload and says, "Hey, you're running as this specific user on the system, so I'm configured to give you this SPIFFE ID." It then goes to the certificate authority of the trust domain and says, "Hey, I've got this Workload talking to me. I need an X.509 certificate with that SPIFFE ID in it. Please give that to me." Once it receives that, it forwards it to the Workload along with the trust bundle. Over on the other service, the exact same process has taken place, except it has spoken to the agent running on its own host. When it comes time for the bar Workload to talk to the foo Workload, it opens an mTLS connection. The bar Workload checks that it's actually talking to the foo Workload, and vice versa. This avoids sending any sensitive information to the wrong party. Or, on the server side, foo can say, "Hey, I only want to allow the bar service to complete certain actions." As well as this authentication, another benefit of mTLS is that everything is encrypted over the wire, so eavesdroppers can't see what's going on.
Noah: So why did we choose SPIFFE as the basis for Teleport Workload ID? There are a bunch of reasons. The first is that SPIFFE is a graduated CNCF project. That means it has a certain amount of maturity, and we know it's supported by enough organizations that it has a future ahead of it. As we've already seen, the SPIFFE specification is really flexible, which means we're not locking ourselves into supporting only a few narrow use cases. Over the past two or three years, SPIFFE has seen massive growth in adoption, along with the entire concept of Workload Identity. We can see this in its now widespread compatibility with existing platforms and tools, for example Ghostunnel, Envoy Proxy, or even AWS through AWS IAM Roles Anywhere. If your platforms or tools don't support SPIFFE, there are great SDKs and client libraries that let you integrate SPIFFE directly into your Workload. Finally, the X.509-SVID is probably the most common kind of identity document used with SPIFFE, and X.509 certificates are something we already use heavily across the rest of the Teleport platform. That gave us confidence that we knew how to implement this securely and safely. With that said, Dave will now give you a demo.
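The URI SAN rule can be sketched in Python against the parsed certificate dict that the standard library's `ssl.SSLSocket.getpeercert()` returns after a verified handshake. The rule enforced here (exactly one URI SAN, and it must be a `spiffe://` URI) follows the X.509-SVID specification; the function name is hypothetical. This only reads the identity: verifying the certificate chain against the trust bundle is assumed to have already happened during the TLS handshake.

```python
from typing import Any, Mapping


def spiffe_id_from_peercert(cert: Mapping[str, Any]) -> str:
    """Extract the SPIFFE ID from a peer certificate in getpeercert() dict form.

    An X.509-SVID must carry exactly one URI SAN, and that URI is the
    workload's SPIFFE ID. DNS or other SAN types may also be present.
    """
    uris = [value for kind, value in cert.get("subjectAltName", ()) if kind == "URI"]
    if len(uris) != 1:
        raise ValueError(f"expected exactly one URI SAN, found {len(uris)}")
    if not uris[0].startswith("spiffe://"):
        raise ValueError("URI SAN is not a SPIFFE ID")
    return uris[0]


if __name__ == "__main__":
    # Shape of the "subjectAltName" entry as produced by getpeercert().
    cert = {"subjectAltName": (("DNS", "cart.internal"),
                               ("URI", "spiffe://example.com/eu/cart"))}
    print(spiffe_id_from_peercert(cert))  # spiffe://example.com/eu/cart
```

Rejecting certificates with zero or several URI SANs matters because an ambiguous identity could otherwise be read differently by different verifiers.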
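The JWT-SVID flow can be sketched with only the standard library: the token travels as a bearer token, and the receiver checks the `sub` (SPIFFE ID), `aud` (intended audience), and `exp` (expiry) claims that the JWT-SVID specification requires. This sketch deliberately does NOT verify the signature, which a real verifier must do first using the JWT signing keys in the trust bundle; the function names and the `payments` audience are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def check_jwt_svid_claims(token: str, audience: str,
                          now: Optional[float] = None) -> str:
    """Return the SPIFFE ID (sub claim) of a JWT-SVID after basic claim checks.

    WARNING: this sketch does NOT verify the signature. A real verifier must
    first check it against the JWT signing keys in the trust bundle.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has three dot-separated parts")
    claims = json.loads(_b64url_decode(parts[1]))
    now = time.time() if now is None else now
    if claims.get("exp", 0) <= now:
        raise ValueError("token has expired")  # short lifetimes limit replay
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    if audience not in audiences:
        raise ValueError("token was not issued for this audience")
    sub = claims.get("sub", "")
    if not sub.startswith("spiffe://"):
        raise ValueError("sub claim is not a SPIFFE ID")
    return sub


def bearer_header(token: str) -> dict:
    # Presented like any other bearer token.
    return {"Authorization": f"Bearer {token}"}
```

Checking the audience as well as the expiry narrows replay further: a token stolen from one service cannot be presented to a different service that expects a different audience.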
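Trust bundles and federation can be modelled as a map from trust domain to the CAs that vouch for it. In this hypothetical sketch, PEM strings stand in for real certificates, and the verifier picks the bundle matching the trust domain in the peer's SPIFFE ID before verifying the chain; the class and method names are illustrative, not a real SPIFFE or Teleport API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrustBundleSet:
    """Hypothetical in-memory view of trust bundles, keyed by trust domain."""
    own_domain: str
    bundles: Dict[str, List[str]] = field(default_factory=dict)

    def federate_with(self, trust_domain: str, ca_pems: List[str]) -> None:
        # Federation: also trust another domain's CAs for that domain's SVIDs.
        self.bundles[trust_domain] = list(ca_pems)

    def authorities_for(self, peer_spiffe_id: str) -> List[str]:
        """Return the CAs to verify a peer's SVID with, based on its trust domain."""
        if not peer_spiffe_id.startswith("spiffe://"):
            raise ValueError("not a SPIFFE ID")
        trust_domain = peer_spiffe_id[len("spiffe://"):].split("/", 1)[0]
        if trust_domain not in self.bundles:
            raise LookupError(f"no trust bundle for {trust_domain!r}: not federated")
        return self.bundles[trust_domain]


if __name__ == "__main__":
    bundles = TrustBundleSet("example.com", {"example.com": ["<our CA PEM>"]})
    bundles.federate_with("partner.org", ["<partner CA PEM>"])
    print(bundles.authorities_for("spiffe://partner.org/orders"))
```

Keying CAs by trust domain means a partner's CA can only vouch for identities in the partner's own trust domain, not for impersonations of your workloads.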
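The first half of Unix Workload attestation, asking the kernel who is on the other end of the socket, can be sketched in Python with the Linux-specific `SO_PEERCRED` socket option, which returns the peer's process ID, user ID, and group ID. A real agent would call this on each accepted connection from its listening Workload API socket; here a connected socket pair inside one process stands in for that, so the kernel reports this process's own credentials. Comparing the result against rules is left out of this sketch.

```python
import os
import socket
import struct


def peer_credentials(conn: socket.socket) -> tuple:
    """Ask the Linux kernel who is on the other end of a Unix domain socket.

    Returns (pid, uid, gid) from SO_PEERCRED, which is Linux-specific.
    """
    size = struct.calcsize("3i")  # struct ucred: pid, uid, gid
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    pid, uid, gid = struct.unpack("3i", raw)
    return pid, uid, gid


if __name__ == "__main__":
    # Both ends belong to this process, so the kernel reports our own IDs.
    agent_end, workload_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    print(peer_credentials(agent_end) == (os.getpid(), os.geteuid(), os.getegid()))
    agent_end.close()
    workload_end.close()
```

The workload presents nothing here: the kernel, not the caller, supplies the facts, which is why this step does not need a bootstrap secret.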
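The two ends of the mTLS connection in the diagram can be sketched with Python's standard `ssl` module. Both sides require a peer certificate and verify its chain against the trust bundle. The client turns off hostname checking because SVIDs identify workloads by SPIFFE ID rather than by DNS name, so the caller is expected to check the peer's URI SAN after the handshake. The file names in the usage line are hypothetical, and the SVID, key, and bundle files are assumed to have been obtained from the agent beforehand.

```python
import ssl
from typing import Optional


def _load(ctx: ssl.SSLContext, svid: Optional[str], key: Optional[str],
          bundle: Optional[str]) -> ssl.SSLContext:
    if svid and key:
        ctx.load_cert_chain(svid, key)      # our own X.509-SVID and private key
    if bundle:
        ctx.load_verify_locations(bundle)   # trust bundle: the CAs we accept
    return ctx


def mtls_server_context(svid: Optional[str] = None, key: Optional[str] = None,
                        bundle: Optional[str] = None) -> ssl.SSLContext:
    """Server side (e.g. foo): demand and verify a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # the "mutual" part of mTLS
    return _load(ctx, svid, key, bundle)


def mtls_client_context(svid: Optional[str] = None, key: Optional[str] = None,
                        bundle: Optional[str] = None) -> ssl.SSLContext:
    """Client side (e.g. bar): present our SVID and verify the server's chain."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # SVIDs carry a SPIFFE ID in a URI SAN, not a hostname, so the DNS check
    # is turned off; the chain is still verified, and the caller must check
    # the peer's SPIFFE ID after the handshake.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_REQUIRED
    return _load(ctx, svid, key, bundle)


# Usage with hypothetical file names:
# ctx = mtls_client_context("bar-svid.pem", "bar-key.pem", "bundle.pem")
```

Disabling the hostname check is only safe because the SPIFFE ID check replaces it; skipping both would accept any certificate the trust domain ever issued.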
Workload ID demo
Dave: Thank you very much. So what I'm going to look at first is a Teleport instance that we have for demos, where I've filtered our resources down to just my Workload ID demo backends. One of the things I have here is an app protected by Teleport App Access. For those of you unfamiliar with App Access, you can take HTTP or HTTPS applications that maybe don't have a strong login, or that just serve up a frontend (I'm a platform engineer, so I tend to think of Grafana and Jaeger), put them behind Teleport, and then use our role system to protect them. So I'm going to come into here. Basically, I'm taking the diagram Noah just shared and making it a bit more real. This is a live view of a running system. I'll break down the system, and then we'll go and actually break it, as opposed to just breaking it down. Also, as you spoke, Noah, all I could think of was: how many SVIDs could SPIFFE specify if SPIFFE could specify SVIDs? There are too many noises involved in this. [laughter] Too many SPIFFEs.
Dave: So we have two VMs here running in AWS. One of them is running a Node app that serves up this page we're looking at. Node is one of the languages that currently doesn't have a great officially supported SDK for the Workload API, so it sends its requests to Ghostunnel, which Noah just mentioned. Ghostunnel is a SPIFFE-compatible proxy. It's really fast and lightweight, and it can be used in client or server mode. Here, I'm using it in client mode: the Node app sends requests to it, and it forwards them to this backend down here. As in Noah's diagram, there is an instance of tbot running on the same VM. This tbot is configured to issue SPIFFE IDs and SVIDs only for this ID here. Our trust domain is the domain of our Teleport instance, and the path is Workload ID demo, demo web. That is configured right here within a role. You can make a role for a specific bot, a specific tbot instance that's running, and say, "You're allowed to issue SVIDs for a given path." I'm doing this in Terraform, so I can template the roles more easily, which is why the path is a little more abstracted. But basically, here are the roles I'm creating.
Dave: You can also say, within tbot's configuration, that SVIDs should only be issued to processes running with a certain user ID or group ID. That's handled in tbot's configuration file here. Down here, we say, "All right. You're going to run a SPIFFE Workload API. You're going to listen on this socket. You're going to issue SVIDs for a given path, and for that path, the process must run as this user ID and group ID." On these machines, I've configured specific users for running these applications: Ghostunnel, the Node app, and the backend app. tbot is configured to match those specific user IDs and group IDs, which lets you really lock it down. My Node app and my Ghostunnel are actually running as different users. So even if someone got onto the box, modified my Node code to stick in some kind of hacky SPIFFE Workload API call, and restarted the process, they still couldn't get an SVID. Those requests come down here to the backend. This one's SPIFFE ID is `demo-backend-1`. It is authorized to accept requests only from `demo-web`, in the same way that the web side is only allowed to send requests to, and verifies that it's talking to, `demo-backend-1` or `demo-backend-2`. And this backend runs as user ID (UID) and group ID (GID) 3000. So that's the breakdown of the running system and how it all gets configured.
Dave: But let's go break it, just to show that it works. I'm going to SSH, using Teleport, into my `demo-backend-1` server. I'll open up the service file, and we can see that I'm passing in the approved client SPIFFE ID, which is `demo-web`. I'm just going to change it to `demo-app`, so now it won't accept requests from `demo-web`. I'll reload the daemon and restart the service. If I come back to my page here and refresh, it's broken, because the backend has said, "Wait. You're `demo-web`. I only accept requests from `demo-app`. I'm no longer going to accept this." It cancels the handshake between Ghostunnel and the backend service, so the request is no longer valid, and Ghostunnel passes a 500 back up to my Node app. In the same vein, I could go into the tbot config and change the user ID and group ID. At that point, tbot would stop providing SVIDs to the backend, and we'd see the same breakage for a different reason. And that's just a quick demo. You can see the code for all of this in the Workload ID demo repository under github.com/asteroid-earth, which is our demo home organization. I'll paste that into the Chat as well if people want to go see it. And I'm going to pass it back to Noah.
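The per-path UID/GID restriction Dave configures in tbot can be modelled as a rule table. This is a hypothetical Python sketch of the matching logic only; it is not tbot's configuration format or implementation, and the paths and the web user's IDs are made up (only the backend's UID/GID 3000 comes from the demo). A caller's UID and GID would come from the kernel, as in Unix Workload attestation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class UnixRule:
    """Hypothetical rule: a field left as None matches any value."""
    uid: Optional[int] = None
    gid: Optional[int] = None

    def matches(self, uid: int, gid: int) -> bool:
        return ((self.uid is None or self.uid == uid)
                and (self.gid is None or self.gid == gid))


@dataclass(frozen=True)
class SvidEntry:
    path: str                   # workload identifier to issue
    rules: Sequence[UnixRule]   # any single matching rule is enough


def svids_for(entries: Sequence[SvidEntry], uid: int, gid: int) -> List[str]:
    """Return the SVID paths a caller with this UID/GID may receive.

    An entry with no rules issues to nobody here (fail closed); that is a
    choice made for this sketch, not a statement about tbot's defaults.
    """
    return [e.path for e in entries if any(r.matches(uid, gid) for r in e.rules)]


if __name__ == "__main__":
    entries = [
        SvidEntry("/demo-web", (UnixRule(uid=3001, gid=3001),)),   # hypothetical IDs
        SvidEntry("/demo-backend-1", (UnixRule(uid=3000, gid=3000),)),
    ]
    print(svids_for(entries, 3000, 3000))  # ['/demo-backend-1']
    print(svids_for(entries, 1000, 1000))  # []
```

This is why running Ghostunnel and the Node app as separate users matters: modified code in the Node process still runs under a UID that no rule grants an SVID to.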
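The breakage in the demo follows from both sides of the connection enforcing an allowlist of SPIFFE IDs. The hypothetical Python sketch below simulates that: either peer rejecting the other's identity aborts the mTLS handshake, and the client-side proxy reports an error upstream, which in the demo surfaced as a 500. The trust domain and full ID strings are made up, since the real ones are not shown here.

```python
# Hypothetical SPIFFE IDs modelled on the demo; the real trust domain and
# path prefix are not shown here.
TD = "spiffe://example.teleport.sh"
WEB = f"{TD}/demo-web"
BACKEND_1 = f"{TD}/demo-backend-1"
BACKEND_2 = f"{TD}/demo-backend-2"
APP = f"{TD}/demo-app"


def request_status(client_id: str, server_id: str,
                   server_allows: frozenset, client_expects: frozenset) -> int:
    """Simulate one request from the web side to a backend.

    Either peer rejecting the other's SPIFFE ID aborts the mTLS handshake,
    and the client-side proxy reports an error upstream (a 500 in the demo).
    """
    client_ok = server_id in client_expects   # web verifies the backend
    server_ok = client_id in server_allows    # backend verifies the caller
    return 200 if client_ok and server_ok else 500


if __name__ == "__main__":
    expects = frozenset({BACKEND_1, BACKEND_2})
    print(request_status(WEB, BACKEND_1, frozenset({WEB}), expects))  # 200
    print(request_status(WEB, BACKEND_1, frozenset({APP}), expects))  # 500
```

Changing a single allowlisted ID is enough to cut off a caller, with no secrets to rotate.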
The future of Teleport Workload ID
Noah: Okay, awesome. So we've seen what the whole Workload Identity space is about, and, thanks to Dave's demo, we've seen a little of how Teleport Workload ID works. That said, Workload ID is a relatively new feature of the Teleport product, and we're constantly looking for design partners who can help us set the direction going forward. If you're a Teleport customer and you're already using another Workload Identity platform like SPIRE, or you're interested in deploying one in the next few months, quarters, or years, please get in touch. We'd love to book a chat with you and our product team so we can learn more about what you want to use it for and what your most important requirements are. That way, we can get those on our roadmap and make sure we're building a product that works for everyone. You can reach out to us for a conversation at [email protected]. Okay. Great. I think that's the end of the main content of the webinar, so we can go to the Q&A now if any questions have been raised.
Live Q&A
Dave: Yeah. And right before we do that, our wonderful behind-the-scenes producer, Lexi, is going to push out a survey that we would very much appreciate you filling out. You can do that while we go through the Q&A. So the first one is: “Any idea if this can be integrated with a Kubernetes service mesh like Istio?” Noah?
Noah: Yeah, there are a couple of different options there, really. At the moment, we've been looking into Cilium and Istio. At the end of the day, Istio just uses Envoy Proxy under the hood, and Envoy supports receiving these certificates through its Secret Discovery Service (SDS). You can then also configure filters through Istio and Envoy to set up rules around who's allowed to talk to what.
Dave: Oh, and I just realized that the default sorting is most recent first, so I'm going to jump down to the bottom so that the people who asked their questions first get them answered first. So, “Are agents a single point of failure?” is another question here.
Noah: Yeah, I suppose that's technically true. This individual agent that's running on your host is a single point of failure. But these agents are also fairly simple, and there's no reason why you couldn't run multiple agents and have your workloads attempt to receive their SPIFFE IDs and SVIDs from the two different sockets that are presented to them.
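Noah's suggestion of running two agents can be sketched as a client that tries each Workload API socket in turn and uses the first one that answers. This is a minimal Python illustration under stated assumptions: the socket paths are hypothetical, and `fetch` stands in for a real SPIFFE Workload API client, which is not shown.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllAgentsFailed(Exception):
    """Raised when no agent socket could serve an SVID."""


def fetch_with_failover(socket_paths: Sequence[str], fetch: Callable[[str], T]) -> T:
    """Try each agent socket in order and return the first successful result.

    `fetch` is assumed to connect to the Workload API at the given socket path
    and raise OSError (e.g. ConnectionRefusedError, FileNotFoundError) when the
    agent behind it is unavailable.
    """
    errors: List[str] = []
    for path in socket_paths:
        try:
            return fetch(path)
        except OSError as exc:
            errors.append(f"{path}: {exc}")
    raise AllAgentsFailed("; ".join(errors))


# Hypothetical socket paths for two independent tbot agents on the same host.
SOCKETS = ["/opt/machine-id/agent-a.sock", "/opt/machine-id/agent-b.sock"]
```

Ordering the list gives a primary/secondary arrangement; a workload could just as well shuffle it to spread load across both agents.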
Dave: Yeah. The next is: “How does tbot authenticate to the SPIFFE CA?” And I can actually answer that one, so I look like a smart person. tbot uses our Machine ID system, and you can go read our docs on this, because it's not Workload ID specific; it's Machine ID. tbot could be running on a CI system or on on-prem hardware, and depending on where it's running, we have multiple join methods that allow the bot to authenticate. In the demo environment I was just showing, all the nodes it was running on had specific AWS instance profiles, so there's an ARN. You can configure Teleport to accept authentication via that ARN, down to the specific instance ID or up to the profile group level. That's AWS specific. We have similar methods for Azure and GCP, and there are also other methods. But that's how tbot authenticates to Teleport.
Noah: Yeah. And what's really awesome about all of these different join methods, which we call our delegated join methods, is that in a lot of cases we avoid the use of a long-lived static secret that tbot would need in order to authenticate. Instead, it's all about trusting some other platform, whether that's Kubernetes or, as Dave said, GCP, AWS, or Azure, and saying, "Hey, we're going to let you swap one of those identities for access to this bot in Teleport."
Dave: Almost like we solved the bottom turtle before we adopted Workload Identity.
Noah: Yeah. [laughter]
Dave: “Is Workload ID a Teleport Enterprise-only feature?”
Noah: Nope. It's fully available in the open source version. I think the last of the features are coming out in the release that lands tomorrow or around the end of the week, but yeah, it's fully in the open source version.
Dave: Yeah, 15.4, I think, will have everything we've shown here today, very stable. “Configuring tbot looks pretty complex. For an org like ours with thousands of workloads, it would be too daunting a task. Is there something I'm missing?” Go ahead, Noah.
Noah: I was going to say that's quite a good question. One of the things Dave has shown, which can be a great tool for doing this sort of thing, is infrastructure as code tools like Terraform. In addition, looking into the future and at our roadmap, we'd love to do remote configuration models or ClickOps models. The Teleport web UI already has a bunch of really awesome ClickOps click-through wizards, and I'm pretty sure that down the road we'd want to add one of those wizards for all of this SPIFFE Workload ID stuff.
Dave: Yeah. In the end, I think you're going to be heavily relying on templating and, ideally, auto-generation of some kind of infrastructure as code (IaC) to get all of this out. And that's not much different from setting up all of those workloads in the first place. Also, I only showed one example here, but a single tbot can provide identities to many, many things. Especially with Kubernetes, you're going to have tons of pods running on a given node, and one tbot could be authenticating all of those pods, ideally through some kind of templated system. I was very specific in the demo to show how much you can lock it down, but you can also be a little more general. So I think that's a key thing to keep in mind as well. Next: “Can the bot verify a request comes from a specific Kubernetes workload?”
Noah: Okay, there are a couple of different ways of looking at this. If we're talking about the workload attestation process, that's not something we've got a native integration for just yet. But building on the Unix workload attestation we completed recently, with its process ID, user ID, and group ID lockdown, what we can do is take that process ID, speak to the local kubelet, and ask, "Hey, which container and which pod is this process ID associated with?" Once we've implemented that, a Kubernetes workload will be able to secretlessly fetch its identity from this local agent. That describes an implementation model where you'd have one tbot agent running on every one of your Kubernetes nodes, servicing all of the workloads running on that node. The other thing to keep in mind is that you could achieve the same thing today by running tbot as a sidecar to the Kubernetes workload that requires an identity, because we already have a rich and powerful Kubernetes join method where you can say, "Hey, this tbot should only be able to join and use its roles in Teleport if it's coming from this Kubernetes service account." You could then give that single tbot, linked with that service account, the right to issue the SPIFFE ID for your workload. In that case you wouldn't necessarily need any workload attestation between the workload container and the tbot container within that pod, because you could lock the Unix domain socket down and share it only within the pod. Assuming you know it's just your code running, further workload attestation wouldn't be necessary.
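The Unix attestation step Noah mentions starts from the kernel's view of who is on the other end of the Workload API socket. On Linux, a server can read the peer's PID, UID, and GID from a connected Unix domain socket with `SO_PEERCRED`. The Python sketch below shows that lookup plus a simple UID/GID rule check. It is Linux-only, the `UnixRule` shape is hypothetical rather than tbot's configuration format, and the kubelet lookup by PID is not shown.

```python
import socket
import struct
from dataclasses import dataclass
from typing import Optional

# Layout of struct ucred { pid_t pid; uid_t uid; gid_t gid; } on Linux.
_UCRED = struct.Struct("iII")


@dataclass(frozen=True)
class PeerCreds:
    pid: int
    uid: int
    gid: int


@dataclass(frozen=True)
class UnixRule:
    """Hypothetical attestation rule: None means 'any value'."""
    uid: Optional[int] = None
    gid: Optional[int] = None


def peer_credentials(conn: socket.socket) -> PeerCreds:
    """Ask the kernel who is connected on the other end of a Unix socket."""
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, _UCRED.size)
    pid, uid, gid = _UCRED.unpack(raw)
    return PeerCreds(pid, uid, gid)


def matches(creds: PeerCreds, rule: UnixRule) -> bool:
    """True if the attested UID/GID satisfy every field the rule pins down."""
    return ((rule.uid is None or rule.uid == creds.uid)
            and (rule.gid is None or rule.gid == creds.gid))
```

The credentials come from the kernel rather than from anything the client sends, which is why they can be trusted for attestation. In the sidecar model Noah describes, this check matters less, because only containers in the same pod can reach the shared socket.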
Dave: Yeah. There are a couple of questions here. I think that one is covered, but there's another one that says: “The demo showed Unix IDs and talked about Unix sockets. What about Windows? Is it supported?” I'll capture a couple of things here. Even what Noah was just mentioning around Kubernetes workload attestation doesn't exist yet, and a Windows implementation doesn't exist yet either. If those are things that would be important to you, please email us. That's definitely something where we'd like to bring you in, potentially as a design partner, to talk about what we build next. Going through: “Is this part of the base Teleport Enterprise license, or an additional cost like the Identity product?”
Noah: Yeah, I can answer that one. It's not an additional cost like the Identity product or the Policy product. At the moment, it depends which era of Teleport Enterprise license contracting you're on. But if you're on one of the most recent contracts, those tbots count as a Teleport Protected Resource, so there'll be a small monthly cost associated with each bot on an Enterprise license. My best advice would be to reach out to your account manager at Teleport, and we can get you all the precise details. And if that pricing model doesn't work for you, we might be able to make some adjustments there.
Dave: “Would this be a fit to integrate with GitHub? This is pretty broad, but we have GitHub Advanced Security enabled, and I'm looking for a way to have break-glass access approval workflows that repo admins can assume to elevate past the normal restrictions.” I'm not sure. I'm not familiar with GitHub Advanced Security, so I think that's one we'll have to get back to you on. “What is AWS Roles Anywhere, and how does Workload ID work with it?” Do you want to take that, or should I?
Noah: Yeah, I'll take it. AWS Roles Anywhere is a feature of the AWS platform where you can say, "Hey, I want to allow X.509 certificates to be used to authenticate with AWS." This is really useful in terms of SPIFFE, because you can configure AWS using Roles Anywhere to say, "Hey, these SPIFFE IDs that belong to my workloads should be able to assume these roles in AWS." It moves away from the previous model that I know a lot of people used, where you would export a long-lived AWS API key and store it on your machine. Maybe you'd rotate it every year if you're lucky, maybe every six months, and if someone managed to get hold of it, they would have access to your AWS account. Now, with SPIFFE and AWS Roles Anywhere, you can have short-lived certificates that are constantly rotating and use them to lock down your access to AWS.
Dave: And to tie that to the other question, whoever asked the GitHub Advanced Security question, email us. Noah has built out a feature where you can issue yourself a SPIFFE ID and an SVID as a one-time, short-lived thing via the command line. If you're looking for a way to give people the ability to assume an identity for a short time when interacting with an API like that, I'm not promising you anything, but it sounds interesting, and I'd love to talk to you more about it. Next one: “What auditing does Teleport provide for Workload ID?”
Noah: Yeah, Dave, do you still have your demo environment up? You could pull up the audit logs for this.
Dave: I do. Let me see how crazy the audit logs are.
Noah: But yeah, just to give an overview: in your Teleport cluster, you'll see all the information about each SVID that's been issued and which bot issued it. And if you're also collecting the logs from your tbot agents, those give you the precise information about the workload that requested that ID, so you'll be able to see the process ID, the user ID, and the group ID. All of that is tied together by the unique ID in each of these X.509 certificates.
Dave: Yeah. So you can see a specific bot joining the cluster via a given method, and you can see certificates being issued. The one thing you don't get here that you generally get through Teleport is an audit log of every single action taken, because that would require all of your traffic to pass through the Teleport Proxy Service, which we don't want to do. So we get actions taken, but not individual transactions.
Noah: If you pop open the Details tab as well, you can see there's actually a bunch more information. Yeah, there we go. Cool.
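Noah describes joining the cluster-side audit event for an issued SVID with the tbot agent's own log entry through an identifier carried in the certificate. The Python sketch below assumes that identifier is the certificate serial number and uses hypothetical record shapes; the real Teleport audit event and tbot log fields may be named and structured differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class IssuanceEvent:
    """Hypothetical cluster-side audit record for one issued SVID."""
    serial: str
    bot: str
    spiffe_id: str


@dataclass(frozen=True)
class AgentRecord:
    """Hypothetical tbot log record describing the requesting workload."""
    serial: str
    pid: int
    uid: int
    gid: int


def correlate(events: List[IssuanceEvent],
              records: List[AgentRecord]) -> List[Tuple[IssuanceEvent, Optional[AgentRecord]]]:
    """Pair each issuance event with the agent record sharing its serial.

    Events with no matching agent record (for example, because agent logs
    were not collected) are paired with None rather than dropped.
    """
    by_serial: Dict[str, AgentRecord] = {r.serial: r for r in records}
    return [(e, by_serial.get(e.serial)) for e in events]
```

Keeping unmatched events visible, rather than silently dropping them, makes gaps in agent log collection easy to spot.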
Dave: Yeah. “Can we use SPIFFE to authenticate distributed workloads, for example, authenticate a workload running in GCP to a server running in AWS?” Yes. That's the brilliant thing about using an open standard like this: it isn't tied to any given platform. You can create a SPIFFE ID in one place and authenticate it in another. And I think that's a strength of Teleport in general. If you're all in on one cloud, you can probably use the tools within that cloud. But if you're crossing clouds, you need something that's cloud-independent and can authenticate things in multiple places, and Teleport is great at that. It's one of the things that makes it really strong here as well. I'll give it another minute or two to see if there are any last questions. Otherwise, thank you so much, everyone, for joining us. Again, please fill out that survey before you click Exit, if you can. It's super helpful for us to get information on how well we did and what we can improve in the future. Thanks for being here. All right, I think that's it. Thanks, everyone. Thanks, Noah.
Noah: Thank you, everyone.
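Because a SPIFFE ID is just a URI naming a trust domain and a path, a verifier in one cloud can check an identity minted for a workload in another without any cloud-specific machinery, as Dave notes for the GCP-to-AWS case. The Python sketch below parses an ID roughly following the SPIFFE ID specification's character rules (it omits details such as length limits); the trust domain and paths are hypothetical.

```python
import re
from dataclasses import dataclass

# Trust domains: lowercase letters, digits, dots, dashes, underscores.
_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")
# Path segments: letters, digits, dots, dashes, underscores.
_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


@dataclass(frozen=True)
class SpiffeID:
    trust_domain: str
    path: str  # "" or "/seg1/seg2/..."

    def __str__(self) -> str:
        return f"spiffe://{self.trust_domain}{self.path}"


def parse_spiffe_id(raw: str) -> SpiffeID:
    """Split spiffe://<trust-domain>/<path> and reject malformed IDs."""
    prefix = "spiffe://"
    if not raw.startswith(prefix):
        raise ValueError("scheme must be spiffe://")
    trust_domain, sep, rest = raw[len(prefix):].partition("/")
    if not _TRUST_DOMAIN.fullmatch(trust_domain):
        # Also rejects ports, user info, and uppercase characters.
        raise ValueError(f"invalid trust domain: {trust_domain!r}")
    if not sep:
        return SpiffeID(trust_domain, "")
    for seg in rest.split("/"):
        # Empty segments (including a trailing slash), "." and ".." are invalid;
        # so are query strings and fragments, since "?" and "#" fail the regex.
        if seg in ("", ".", "..") or not _SEGMENT.fullmatch(seg):
            raise ValueError(f"invalid path segment: {seg!r}")
    return SpiffeID(trust_domain, "/" + rest)


def is_member_of(spiffe_id: SpiffeID, trust_domain: str) -> bool:
    """A verifier in any cloud can scope trust to a single trust domain."""
    return spiffe_id.trust_domain == trust_domain
```

For example, a server in AWS could parse the ID presented by a GCP workload and accept it only if `is_member_of(sid, "example.teleport.sh")` holds, before applying any finer-grained path check.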