Securely Deploy Kubernetes Clusters with Teleport Machine ID and GitHub Actions
Securely Deploy Kubernetes Clusters with Teleport Machine ID and GitHub Actions - overview
Current approaches to managing machine identity for infrastructure like Kubernetes clusters and CI/CD workflows rely on outdated security mechanisms such as passwords, shared secrets, and other manual processes that are error-prone and increase the risk of a breach.
This session focuses on the vulnerabilities and downsides of using shared, long-lived secrets to access key pieces of infrastructure programmatically from GitHub Actions, and on how you can eliminate those secrets using Teleport Machine ID. As GitHub Actions has matured as a product and more and more companies rely on it for their CI/CD workflows, an exposed secret in the repository can be the difference between a team efficiently testing and deploying code and an infrastructure breach of nightmarish proportions.
In this session, we demonstrate:
- Production-like Teleport GitHub Actions integration highlighting Teleport’s rich audit logging, Kubernetes management, and secure access capabilities.
- Workflows that apply Kubernetes manifest updates directly to the cluster upon commit to your repo.
- Security best practices assigning individual identities to humans and worker nodes without the use of any static credentials.
Leave shared secrets and passwords behind with Teleport Machine ID, allowing your DevOps engineering team to sleep easier at night by replacing GitHub secret management and reducing your organization's attack surface.
Key topics on Securely Deploy Kubernetes Clusters with Teleport Machine ID and GitHub Actions
- Six million secrets leaked in 2021, double that of 2020.
- As long as long-lived credentials exist, humans will make a mistake eventually, and when they do, if it's not properly handled, there could be huge repercussions.
- Teleport Machine ID extends identity-based access to IT infrastructure and applications.
- Teleport Machine ID is the easiest way to issue, renew and manage X.509 and SSH certificates for microservices, CI/CD automation, databases, Kubernetes clusters, servers and all other forms of machine-to-machine access.
- GitHub Actions is a popular CI/CD platform that works as a part of the larger GitHub ecosystem.
- Teleport, with the help of Machine ID, allows for GitHub Actions to securely interact with Teleport protected resources without the need for long-lived credentials.
Expanding your knowledge on Securely Deploy Kubernetes Clusters with Teleport Machine ID and GitHub Actions
- Teleport Machine ID
- Teleport Passwordless
- Teleport Server Access
- Teleport Application Access
- Teleport Kubernetes Access
- Teleport Database Access
- Teleport Desktop Access
Learn more about Securely Deploy Kubernetes Clusters with Teleport Machine ID and GitHub Actions
- Teleport Labs
- Contribute on GitHub
- Join our Slack community
- Participate in our discussions
- Why Teleport
- Get started with Teleport
Transcript - Securely Deploy Kubernetes Clusters with Teleport Machine ID and GitHub Actions
Kenneth 00:00:06.243 Hello, everybody. Thank you so much for coming to this little webinar. We've got a great session for you today. We're going to talk about how to securely deploy your Kubernetes clusters using Teleport Machine ID and GitHub Actions. This is a really cool feature that we're really excited about: in Teleport 11, we added GitHub Actions support to Teleport's Machine ID feature. We're going to go over what that means and how we can use it in a more production-like environment. My name is Kenneth Dumez. I'm a developer relations engineer here at Teleport. Our friend Noah was supposed to join us as well, but unfortunately he couldn't make it today. So you're stuck with me. But it's going to be a good time, don't worry. As for the format, we're going to go over what Teleport is, what Machine ID is, and how you can leverage it in your environment, and then we're going to do a little demo. In the meantime, if you have any questions, please feel free to throw them in the chat. We also have Kateryna here, who's going to help manage that aspect. So yeah, let's dive in. I'm going to go ahead and share my screen here. Let's see if I can get the right one on the first try. I think that should do it. Yes, I see my screen. That's perfect. Great. Let me just pull up my speaker notes here. Awesome.
Why you should never use static shared secrets in GitHub Actions
Kenneth 00:01:52.719 Yeah, so we're going to talk about why you should never use static shared secrets in GitHub Actions, and about some alternatives. I hope I can teach you a couple of things about securing your automated workflows, how the landscape looks right now, and why it's probably a bad idea to use long-lived static credentials in your various CI/CD flows. Today, we're going to focus especially on GitHub Actions. You're probably familiar with these two logos; if not the one on the right, certainly the one on the left. You've got to love that strange little Octocat guy the GitHub folks have conjured up. The logo on the right, if you're not familiar, is for their CI/CD solution, GitHub Actions. GitHub Actions is great because it lets you centralize all of your integration, development, and testing workflows in the same place you keep the code you're testing. So there's no need for separate DevOps repos where you store all of your repository information, and you get this nifty little UI where you can see all of your test runs. You can click into them. It's super intuitive and really easy to use. And I'm certainly not alone in my opinion of this tool. This is data from HG Insights showing the adoption of GitHub Actions by companies over the last year. As the product has matured, its user base has grown wildly, and it continues to grow. And this only tracks enterprise organizations; it doesn't account for the thousands of open-source projects that rely on GitHub Actions for their CI/CD needs. I remember when it first came out and had a ton of bugs, but now it's a super mature product, and more companies are taking notice.
Kenneth 00:03:31.936 And if you've seen any of my other talks, you know I love the GitGuardian State of Secrets Sprawl report (kind of a mouthful). This is the most recent number from their 2022 report, looking back on the past year. I love this report because it really illustrates how big the problem with secrets on GitHub is. You would think by now that we as an industry would start adapting our practices and being a bit more careful with how we manage credentials. But no, the problem is actually getting worse. Six million. Six million secrets leaked in 2021, double that of 2020. Part of this has to do with the increasing number of companies moving their infrastructure from traditional on-prem setups to the cloud. With more cloud resources, of course, there are going to be more credentials required to access them: access tokens, API keys, long-lived passwords, you name it. All just ready to be leaked. All ready to be stolen. And frankly, most organizations are just not equipped to deal with these leaks. Another quote from the report: "On average in 2021, a typical company with 400 developers and four AppSec engineers would discover 1,050 unique secrets leaked upon scanning its repositories and commits." That's a lot of secrets. And each of those secrets typically isn't leaked just once: on average, each individual secret appeared 13 times in different places across the codebase. 13 times. Accounting for all of that duplication, a single AppSec engineer needs to handle, on average, 3,413 secrets annually. Simply put, this is not sustainable.
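The per-engineer figure follows from the other two numbers quoted from the report. A minimal Go sketch of that arithmetic, assuming the 3,413 is simply total occurrences split evenly across the four AppSec engineers and rounded (the helper name secretsPerEngineer is illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// secretsPerEngineer estimates how many leaked-secret occurrences each AppSec
// engineer must handle per year: unique secrets times average occurrences,
// split evenly across the security team and rounded to a whole secret.
func secretsPerEngineer(uniqueSecrets, occurrencesEach, engineers int) int {
	total := float64(uniqueSecrets * occurrencesEach)
	return int(math.Round(total / float64(engineers)))
}

func main() {
	// Figures quoted from the GitGuardian report in the talk.
	fmt.Println(secretsPerEngineer(1050, 13, 4)) // prints 3413
}
```

That is 13,650 occurrences divided by 4 engineers, or 3,412.5, which rounds to the 3,413 quoted above.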
Solutions and problems
Kenneth 00:05:14.004 So a couple of solutions come up, and each has its own problems. One purported solution is simply to use GitHub's encrypted secrets. These are pretty good. Secrets are encrypted on the client side and then decrypted at runtime, so the secret can be injected into the workflow, and GitHub does use a mechanism that attempts to redact any secrets that appear in run logs. However, because there are multiple ways secret values can be mutated and transformed, accidental exposure does happen. Another problem is dynamic access. For example, say you were using a private key to generate a signed JWT to access a web API. Unless you register that JWT as a secret in GitHub, it won't be redacted and can be exposed. Another issue is chain of custody. Any user with write access to your repository, for example, has read access to all secrets configured in your repo. This makes it difficult to audit and keep track of who is accessing which resource at what time, and who is doing what with your various secrets. And of course, this becomes increasingly challenging at scale. There's also the issue of duplication. In an ideal world, the secrets you use in your GitHub Actions workflows would live only in GitHub's encrypted secrets store and nowhere else. However, a common setup that I have personally seen in the past (and I'm sure many of you have as well) is that these secrets get duplicated and stored in a password vault as well as in the GitHub repository. This is useful for engineers, because if you want to access a resource manually, you have those credentials at hand. The problem, though, is that now these credentials are floating around in several places, which makes rotation more difficult while also expanding the attack surface available to malicious actors.
The more places those secrets are stored, the less secure they are, leading to more chances for mistakes and compromising developer efficiency whenever secrets are added, removed, or rotated.
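To make the dynamic-access problem concrete, here is a deliberately simplified Go model of log masking. It is not GitHub's implementation; redact and deriveToken are hypothetical helpers, and the key value is made up. The registered secret is masked, but a credential computed from it at runtime (an HMAC standing in for a signed JWT) passes through untouched, because nothing registered it.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// redact masks every literal occurrence of each registered secret.
// A simplified model of log masking, not GitHub's actual mechanism.
func redact(line string, registered []string) string {
	for _, s := range registered {
		if s != "" {
			line = strings.ReplaceAll(line, s, "***")
		}
	}
	return line
}

// deriveToken stands in for any credential computed from a secret at runtime,
// such as a signed JWT: here, a hex-encoded HMAC-SHA256 of a payload.
func deriveToken(secret, payload string) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(payload))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	signingKey := "example-signing-key" // hypothetical registered secret
	registered := []string{signingKey}

	// The registered secret itself is masked.
	fmt.Println(redact("key="+signingKey, registered)) // prints key=***

	// A credential derived from it was never registered, so it is printed in full.
	token := deriveToken(signingKey, `{"sub":"ci"}`)
	fmt.Println(redact("auth="+token, registered))
}
```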
Kenneth 00:07:23.536 Another avenue is to say, "Okay, we know that secrets are probably going to leak at some point, so we should constantly monitor our repositories for these creds so we can respond to leaks as quickly as possible." These tools are great and not mutually exclusive with using encrypted secrets when you can, but they're just not quite enough. They're more of a reactive solution that you can use for damage control, rather than preventing the problem at the source, which is always the end goal. You want to prevent the problem before it happens. They also often require manual intervention. When a secret leaks, a security engineer may get pinged, have to put down dinner with their family, and then go rotate that cred in the password vault and in the 13 places it appears in the codebase. Again, these tools are great and a good addition to pretty much any repo setup, but they just don't do enough, and they certainly aren't enough by themselves.
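Scanning tools mostly work by pattern matching on content that has already been committed, which is why they are reactive. A minimal Go sketch of one such rule, assuming the widely documented AWS access key ID shape (AKIA followed by 16 uppercase letters or digits); real scanners combine many rules with entropy checks and history scanning, and scan and Finding are hypothetical names:

```go
package main

import (
	"fmt"
	"regexp"
)

// awsKeyID matches the documented shape of an AWS access key ID.
// This is one illustrative rule, not a complete scanner.
var awsKeyID = regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)

// Finding records where a suspected secret was seen (1-based line number).
type Finding struct {
	Line  int
	Match string
}

// scan reports every suspected AWS access key ID in the given lines.
func scan(lines []string) []Finding {
	var findings []Finding
	for i, l := range lines {
		for _, m := range awsKeyID.FindAllString(l, -1) {
			findings = append(findings, Finding{Line: i + 1, Match: m})
		}
	}
	return findings
}

func main() {
	src := []string{
		"region = us-east-1",
		"aws_access_key_id = AKIAIOSFODNN7EXAMPLE", // AWS's documented example key ID
	}
	for _, f := range scan(src) {
		fmt.Printf("line %d: %s\n", f.Line, f.Match) // prints line 2: AKIAIOSFODNN7EXAMPLE
	}
}
```

By the time a finding appears, the key is already in the repository, so someone still has to rotate it everywhere it was copied.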
Remove the long-lived credentials
Kenneth 00:08:16.125 It's not all doom and gloom, though. So what can we do? Well, what if we just remove the credentials themselves? Keeping long-lived credentials safe is hard. The reality is that as long as they exist, no matter how good everyone's intentions to follow security best practices, humans are human. They'll make a mistake eventually, and when they do (not if), if it's not properly handled, there could be huge repercussions. You might stop 99 out of 100 of these leaks, maybe 999 out of 1,000. But eventually, one of those secrets is going to make it into a Pastebin file somewhere on the internet that some kid in Brussels is going to sell to buy some NFTs, or whatever it is hacker teens in Brussels do. One way to prevent this and eliminate these long-lived credentials is Teleport Machine ID, specifically for GitHub Actions. Teleport Machine ID provides secure machine-to-machine communication based on X.509 and SSH certificates while removing static and shared credentials from applications, microservices, and code. And in Teleport 11, we added support for GitHub Actions workflows.
Teleport Machine ID for GitHub Actions
Kenneth 00:09:39.672 So here's what that means. The idea is that you give your machines rights: you give an identity to all your microservices, your CI/CD automation, and your service accounts in the form of a short-lived X.509 certificate. Think of the cert as a driver's license for a piece of automation, with Teleport acting as the DMV that issues these certificates. This way, you eliminate shared credentials entirely, and you prevent human errors from escalating into a full-blown cyberattack with certificate-based policy enforcement. You also unify your access policies for people and machines, reducing operational overhead and increasing security and compliance. And you minimize the blast radius by tying those certificates to specific pieces of automation like microservices, bots, or even GitHub repos. So even if those certificates are exfiltrated, there's not much an attacker can do with them, and they can't leverage them to scale up attacks. It's also important to keep it simple. One of the things our founders always like to say is that the most secure solution also has to be the easiest; otherwise, people will find workarounds. If it's challenging to use, people won't do it. So you have to work with your engineers and not complicate their existing workflows. Teleport does that by simplifying certificate management for IT infrastructure and applications, fitting into engineering workflows rather than complicating them.
Kenneth 00:11:23.131 There are a few pieces that are critical for making machine-to-machine access work: authentication, authorization, connectivity, and audit. They're all necessary for securing machine-to-machine access, and I'm going to tell you how Teleport Machine ID provides each of them. First, authentication. Teleport Machine ID generates an identity for the microservice in the form of a short-lived X.509 cert, as I said, and ties that identity to a role managed by Teleport. So you can leverage your existing RBAC roles with your various pieces of automation, assigning the same RBAC controls to your automation and bots that you assign to your engineers. Second, authorization. Teleport automatically approves or denies access requests to a range of resources, like servers, databases, Kubernetes clusters, microservices, and CI/CD systems. Once a piece of automation presents its certificate, Teleport validates that cert and makes sure the automation has valid access to the requested infrastructure. Third, connectivity. We're going to come back to this diagram in a second, but Teleport also establishes the connection between the microservice and the requested resource using a reverse proxy tunnel from the Teleport server directly to the resource. And finally, audit. This is extremely important. You need a rich audit trail for every piece of automation and every piece of infrastructure you have, so you know who's doing what within your system. This matters for compliance and diagnostics, and it's especially useful at scale: once you have lots of different pieces of automation, it can be tricky to keep a bird's-eye view of what's happening, where, and who's doing what.
Machine ID architecture
Kenneth 00:13:39.546 This is a high-level architecture diagram. With Teleport Machine ID, instead of managing access using long-lived credentials, you can join each infrastructure resource to your Teleport cluster and use automated short-lived certificates. There are no credentials to manage, and there's a rich audit log of everything happening in your CI/CD environments. This diagram shows how Teleport Machine ID can interact with a Kubernetes cluster. The worker node refreshes its credentials on a cadence, getting a new kubeconfig from the Teleport host and renewing its access in an automated, secure fashion. That's done by the tbot agent, which runs on the automation node (in this case, a worker node). tbot reaches out to Teleport and grabs new certificates on a cadence, so it continuously refreshes with no downtime, and it's completely configurable. Teleport then supplies a kubeconfig to the worker node, allowing it to access the Kubernetes cluster securely. All the while, the traffic passes through the Teleport cluster, and everything is audited and handled securely.
Kenneth 00:15:08.666 All right, so let's see it in action. I have to change my screen sharing. This is going to be the trickier part, because I'm going to be sharing my entire screen. Let's see. All right, great. So here's how we have this set up. This is the Teleport web UI. I've authenticated with my GitHub login to log into this Teleport cluster. What we have here is a Kubernetes cluster called cookie, which is managed by Teleport. We can also see where it's hosted, because that host is managed by Teleport too: it's this little VM called k8s host. If we connect to it, we're inside the host, and we can run commands like kubectl get pods to see all of our various pods. As we can see, we have this colormatic namespace, which runs a little Go web application that we built. This is what we're going to use to demonstrate a production-like use case for Machine ID: a Kubernetes cluster with a pod serving up this webpage.
Kenneth 00:17:05.715 As a little bonus, we can also show you how to use Application Access. We have this application registered in Teleport. The Kubernetes cluster is hosting this Go web app on localhost, and we can still access it from outside the host using Teleport: Teleport sets up that reverse proxy tunnel into the host so we can reach the app externally. So here's our little web app. It's called “Colormatic”. It's very simple. It has a configurable background color, and it shows which pod is serving it. It also shows which GitHub Actions run produced the container image for the application. We'll get into that in a second. Oh, did I just pause the screen share? I hope not. Okay, I think it's good. If we go to VS Code here, we can see the deployment YAML for our Kubernetes cluster. It's very simple. We have our metadata, our colormatic namespace, and replicas set to 1; we only have one. And this is the interesting part: this is the container image that we build using GitHub Actions. You can see it's hosted in the GitHub Container registry, and this is the image our Kubernetes cluster is actually serving. And here, of course, is our little Go app. Again, it's very simple: we're just hosting it on localhost, and we access it from outside via Teleport Application Access. Here are the constants we're going to change to see this in action. Pink is great, but I would like to see blue, so we're going to change this to colorBlue. I'm going to go ahead and save this.
Kenneth 00:19:50.143 What's going to happen is that we have this GitHub Actions workflow, which builds and pushes our Docker image. It builds our Go application image, which is our little web app, and pushes it to the GitHub Container registry. Then it deploys it to our Kubernetes cluster using Teleport and kubectl. The interesting part here is this teleport-actions action. This is a public GitHub Action that Teleport published, and you can use it yourself right now. It's very easy: you just specify a few settings, and it interacts with your Teleport cluster and authorizes your GitHub Actions runner as a bot. In other words, the little VM that runs your GitHub Action gets authorized to use Teleport and can access whatever resources the bot's assigned role allows. First, you have your proxy, which is just the public address of your Teleport cluster. Then you have your token, which is used to authenticate the bot to Teleport. This is the YAML definition of that resource. Again, it's very simple. You'll notice it has a very long expiration date, and it lives inside the Teleport host cluster. It's actually okay to have a long-lived token here, because it's tied to an individual GitHub repo: we can see that this is the only repository allowed to use this token. So even in a worst-case scenario, where someone somehow breaks into your Teleport cluster and obtains that token, there's not a lot they can do with it, because they would also need an authorized GitHub user with access to this repository. From there, it's a pretty convoluted hack, so while it's possible, it's very unlikely and not very useful to an attacker. This is a lot more secure than something like a GKE API key, which grants general access and can be used from anywhere. This token is much more specific and much more locked down.
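The repository binding is what keeps this long-lived token low-risk. As far as we understand, Teleport's GitHub join method relies on the OIDC ID token that GitHub issues to each workflow run, whose claims include repository and ref. The Go sketch below only illustrates the idea of comparing those claims against an allow list; AllowRule, permitJoin, and the repository names are hypothetical, and verifying the token's signature (which must happen first) is deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Claims is a subset of the claims in a GitHub Actions OIDC token.
type Claims struct {
	Repository string `json:"repository"`
	Ref        string `json:"ref"`
	Actor      string `json:"actor"`
}

// AllowRule is a hypothetical join rule. Repository is required;
// an empty Ref matches any branch or tag.
type AllowRule struct {
	Repository string
	Ref        string
}

func (r AllowRule) matches(c Claims) bool {
	if r.Repository == "" || r.Repository != c.Repository {
		return false // a rule without a repository must never match everything
	}
	return r.Ref == "" || r.Ref == c.Ref
}

// permitJoin assumes the token's signature was ALREADY verified against
// GitHub's published signing keys; unverified claims must never be trusted.
func permitJoin(c Claims, rules []AllowRule) bool {
	for _, r := range rules {
		if r.matches(c) {
			return true
		}
	}
	return false // deny by default
}

// mustClaims parses a JSON claims payload, panicking on malformed input.
func mustClaims(s string) Claims {
	var c Claims
	if err := json.Unmarshal([]byte(s), &c); err != nil {
		panic(err)
	}
	return c
}

func main() {
	rules := []AllowRule{{Repository: "example-org/colormatic"}} // hypothetical repository

	fromRepo := mustClaims(`{"repository":"example-org/colormatic","ref":"refs/heads/main"}`)
	fromFork := mustClaims(`{"repository":"someone/colormatic-fork","ref":"refs/heads/main"}`)

	fmt.Println(permitJoin(fromRepo, rules)) // true
	fmt.Println(permitJoin(fromFork, rules)) // false
}
```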
Kenneth 00:22:32.193 Then you have the certificate time to live. We set this to 10 minutes because our GitHub Action only runs for about two minutes, I think. It only needs the certificate for that long, and then the certificate expires and becomes completely useless. Then you have the name of your Kubernetes cluster, which for us is cookie, of course. And that's it. So we've made our change, and our configured color is now set to blue. As you can see (whoops, wrong tab), remember, it's pink, so we're going to change that. In our terminal, we're going to add and commit this change. We can see our change here, pink to blue. Beautiful. We're going to go ahead and commit it, and we're going to push. Now, if we go to our repository, we have our little repository. I'll link it later so you can fork it and try this out for yourself. We can see our actions running here, triggered by the commit to main. Right now, it's building and pushing our Docker container image, which is, again, our little Go web app, to the GitHub Container registry.
Kenneth 00:24:25.049 Great. Now we're going to deploy this web app to our Kubernetes cluster. We're installing kubectl and Teleport. What's great about Teleport is that it's super lightweight and really extensible: installing it in the GitHub Actions runner took only nine seconds. Now we're doing our authorization. As you can see, we successfully registered the bot and generated a new identity. This is just a warning that our time to live is short, which is fine. It then starts watching for certificate rotations. We won't see any rotations right now, because our time to live is so short. But with a longer-running bot, it would keep watching for those rotations, and Teleport would continuously issue new certs as the time to live runs out. We can see that it renewed the certificate and got its identity for cookie, our Kubernetes cluster. Now, if we refresh our web page, we see that it changed to blue. Just by pushing a change to our Go app, using Teleport and GitHub Actions, we saw that change reflected automatically through our CI/CD flow.
Kenneth 00:26:06.330 If we go back into our Kubernetes host, we can get our pods again, and we can see that this pod is brand new. We can describe it and take a look at what's going on. You can see our container image here, and you can see that it was created very recently. The last thing I wanted to show you is that everything the bots do is recorded and audited in Teleport, in one central location. You can see all of the different commands the bot ran, and you can see the bot joining. You can view all of these in rich detail: the verbs, exactly when each command was run, the user that ran it, which Kubernetes users they were using, the Kubernetes groups the bot belongs to, and what identity they were assuming. This is a very rich audit log, and it really helps with Kubernetes auditing, which can be difficult and challenging to manage. This centralizes it all in one location. These logs are also easy to pipe into third-party tools for continuous monitoring, and you can format them however you want, depending on your needs. And that's pretty much it. We walked you through the whole workflow.
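Because the audit log is structured, forwarding it to a third-party tool is mostly a matter of parsing and reformatting. A minimal Go sketch, assuming JSON events shaped roughly like Teleport's Kubernetes request events; the field names and the sample event are assumptions for illustration, not a verified schema, so check the event format of your Teleport version before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// KubeRequest models a Kubernetes request audit event.
// The JSON field names are assumptions, not a verified Teleport schema.
type KubeRequest struct {
	Event       string    `json:"event"`
	Time        time.Time `json:"time"`
	User        string    `json:"user"`
	Verb        string    `json:"verb"`
	RequestPath string    `json:"request_path"`
	Cluster     string    `json:"kubernetes_cluster"`
	KubeGroups  []string  `json:"kubernetes_groups"`
}

// summarize turns one JSON audit event into a single line suitable for
// forwarding to a log pipeline or SIEM.
func summarize(raw []byte) (string, error) {
	var ev KubeRequest
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "", err
	}
	return fmt.Sprintf("%s %s %s %s %s on %s groups=%s",
		ev.Time.UTC().Format(time.RFC3339), ev.Event, ev.User, ev.Verb,
		ev.RequestPath, ev.Cluster, strings.Join(ev.KubeGroups, ",")), nil
}

func main() {
	// Hypothetical sample event.
	raw := []byte(`{"event":"kube.request","time":"2022-10-20T17:04:05Z",
		"user":"bot-github-demo","verb":"GET",
		"request_path":"/api/v1/namespaces/colormatic/pods",
		"kubernetes_cluster":"cookie","kubernetes_groups":["colormatic-deployers"]}`)
	line, err := summarize(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```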
Kenneth 00:28:07.229 And yeah, let's see if we have some questions. Is it possible to run Teleport in an air-gapped environment?
Kenneth 00:28:18.700 Yes, it is possible to run Teleport in an air-gapped environment. The resources only need to be able to talk to the Teleport cluster itself, so they don't need any public egress from your AWS cloud or whatever on-prem setup you have. As long as those infrastructure resources can talk to the Teleport proxy, they don't need any external internet communication.
Kenneth 00:28:53.205 How does Teleport Machine ID compare to an Argo CD GitOps approach using HashiCorp Vault dynamic secrets?
Kenneth 00:29:03.113 Teleport's approach is a bit different because Teleport acts as its own certificate authority. HashiCorp Vault's dynamic secrets are similar, but they don't have the same level of granularity, with the identity baked into the X.509 certificate. They're a bit less secure in that way, because less information about the microservice, bot, or whatever piece of automation you use is tied directly into that dynamic secret.
Kenneth 00:29:41.120 Restrict commands. I'm not sure what that question means. How can we control commands in Teleport?
Kenneth 00:29:51.359 I'm not quite sure what you mean by that, Alexi, if you want to elaborate a little bit.
Kenneth 00:30:02.307 Oh, how can you restrict which commands the bot is able to run?
Kenneth 00:30:09.408 That's a good question. So again, it's all based on RBAC. So what you do is you assign a role to the bot or the piece of automation that you need. And those roles will have certain permissions attached to them. So it would be the same way that you assign RBAC roles to an engineer. You only want them to be able to access certain pieces of infrastructure. And it's all extremely granular. So again, the whole idea is treating your machines the same way that you want to treat your engineers.
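The role-based restriction can be sketched in a few lines of Go. This is a hypothetical model, not Teleport's role format: Role, Allowed, and the role contents are illustrative. The point is only that a bot's role names exactly which clusters, namespaces, and verbs it may use, and everything else is denied.

```go
package main

import "fmt"

// Role is a hypothetical, simplified role. Teleport's real role resource is
// richer; this only captures the deny-by-default idea.
type Role struct {
	Name       string
	Clusters   []string // Kubernetes clusters the role may reach
	Namespaces []string // namespaces within those clusters
	Verbs      []string // Kubernetes verbs such as "get" or "patch"
}

// matchAny reports whether v is in list, treating "*" as a wildcard.
func matchAny(list []string, v string) bool {
	for _, item := range list {
		if item == "*" || item == v {
			return true
		}
	}
	return false
}

// Allowed reports whether any of the identity's roles permits the request.
func Allowed(roles []Role, cluster, namespace, verb string) bool {
	for _, r := range roles {
		if matchAny(r.Clusters, cluster) && matchAny(r.Namespaces, namespace) && matchAny(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	deployer := Role{
		Name:       "colormatic-deployer", // hypothetical bot role
		Clusters:   []string{"cookie"},
		Namespaces: []string{"colormatic"},
		Verbs:      []string{"get", "list", "create", "patch"},
	}
	roles := []Role{deployer}

	fmt.Println(Allowed(roles, "cookie", "colormatic", "patch"))  // true
	fmt.Println(Allowed(roles, "cookie", "colormatic", "delete")) // false: verb not granted
	fmt.Println(Allowed(roles, "cookie", "kube-system", "get"))   // false: namespace not granted
}
```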
Kenneth 00:30:46.518 Are the short-lived X.509 creds managed by SPIFFE/SPIRE under Teleport's hood?
Kenneth 00:30:53.566 I'm not super familiar with SPIFFE or SPIRE. But again, the way it works is that Teleport acts as its own certificate authority, issuing these certificates and checking their validity. I'm not sure exactly what the underlying technology is or the specific way it does that, but I can find out and reach back out to you by email with the answer.
Kenneth 00:31:24.908 Thanks, Chris, for the comment, and thanks for joining. We're going to give it one last call for questions, so feel free to throw them in the chat. All right. Well, thank you so much. This was a lot of fun. I'm sad Noah couldn't join us, but I hope I was enough for you. We'll see you at our next webinar. This will also be posted as a recording, so if you want to watch it back, we'll send it out to you. Thank you so much again for joining. Have a great rest of your day.