Secure RBAC/SSO for Kubernetes with Teleport OSS and GitHub Teams - overview
The promise of elastic scale and cloud native has driven the demand for K8s, but developers now have the harder task of building applications in a secure manner. This talk will focus on best practices and potential pitfalls for securing K8s for the engineering team by using the K8s API server and control plane. Join us for a how-to on implementing a robust Role-Based Access Control (RBAC) tied into the corporate SSO/Identity provider using GitHub Teams and open source software.
Key topics on Secure RBAC/SSO for Kubernetes with Teleport OSS and GitHub Teams
- According to NIST, an attack surface is the set of points on the boundary of a system, a system element, or an environment where an attacker can try to enter, cause an effect on, or extract data from that system, system element, or environment.
- An organization has different technology layers and needs to perform an attack surface analysis for each layer to determine how to best mitigate potential threats and vulnerabilities in that layer.
- When working with Kubernetes, you will likely work with multiple clusters and might therefore want to centralize access.
- RBAC is a method of regulating access to computer or network resources based on the roles of individual users within your organization.
- Kubernetes RBAC Role and ClusterRole contain rules that represent a set of permissions.
- A Role in Kubernetes always sets permissions within a particular namespace. A ClusterRole, on the other hand, is a non-namespaced resource.
- Open-source Teleport can map GitHub teams to groups and users that exist within Kubernetes clusters.
Expanding your knowledge on Secure RBAC/SSO for Kubernetes with Teleport OSS and GitHub Teams
- Teleport Database Access
- Get started in 10 minutes
- Database Access with Self-Hosted PostgreSQL
- Teleport Quick Start
- Teleport Unified Access Platform
- Teleport Kubernetes Access
Introduction - Secure RBAC/SSO for Kubernetes with Teleport OSS and GitHub Teams
(The transcript of the session)
Micah: 00:00:00.999 All right. That looks good. Again, thank you for showing up today for our webinar, Security and Access for Kubernetes with Jonathon Canada, our Sales Engineer. Jonathon has a lot of experience and a lot of certifications in the industry. And he’s going to go over a workflow on how to use RBAC and SSO for Kubernetes using open-source Teleport and GitHub Teams. So, Jonathon, take it away.
Jonathon: 00:00:29.093 Thanks, Micah. Hi, everyone. As Micah mentioned, I’m Jonathon Canada. I’m a sales engineer here at Teleport. In this webinar, I’m going to talk about best practices for controlling access to Kubernetes clusters using role-based access control. So native Kubernetes RBAC capabilities. And then tying those roles to SSO identities using open-source Teleport. So everything I’m going to discuss today is all open-source. It’s all free tools. And if any questions come up, please feel free to put them in the Q&A section of the Zoom session, and we’ll definitely get to them at the end.
Jonathon: 00:01:21.429 Here’s the agenda for today’s webinar. So I’ll first start with a high-level overview of security discussing attack surfaces, security architecture, zero trust concepts, and tying those into how you might approach securing Kubernetes workloads. I’ll then move into speaking about using a unified access platform or an access gateway for accessing infrastructure. I’ll then discuss native RBAC capabilities within Kubernetes. I’ll then move into an overview of my demo environment, followed by an actual demo. And finally, I’ll answer any questions that have been added to the Q&A chat.
Security overview
Jonathon: 00:02:10.049 In this section, I’ll define a few key security concepts so that we have a common set of definitions to work from. The first is an attack surface. According to NIST, an attack surface is the set of points on the boundary of a system, a system element, or an environment where an attacker can try to enter, cause an effect on, or extract data from that system, system element, or environment. So in other words, an attack surface describes all of the vectors for exploitation.
Jonathon: 00:02:49.934 And this slide discusses the different technology layers you’ll find within an organization. As you look at these layers, you’ll want to perform an attack surface analysis for each one to determine how you can best mitigate potential threats and vulnerabilities within it. Here are some questions to ask for each one. For networks and infrastructure, how are users and devices accessing your networks and servers? Is SSH being used? Applications. What applications are running in your network? Who has access to those? How are those being secured as well as the underlying operating system and host? How is data being protected? Endpoints. How do you control which users are authorized to access different parts of your network and different parts of your infrastructure? Then, lastly, cloud. Are there open S3 buckets? Are API keys being shared or not rotated? And if you look at Kubernetes, a Kubernetes deployment can cover all of these layers. So it’s critical to think about how you can properly secure your Kubernetes API server, the applications running in your clusters, the infrastructure the clusters are running on, and the endpoints your users are accessing the Kubernetes clusters from.
Jonathon: 00:04:34.114 The last high-level security concept I’ll mention is zero trust. Zero trust is a model that was developed in response to NIST's request for feedback to a document called Developing a Framework to Improve Critical Infrastructure Cybersecurity. Now, according to the analysts at Forrester, the three main concepts of zero trust are: ensure all resources are accessed securely regardless of location; adopt a least-privilege strategy and strictly enforce access control; and, lastly, inspect and log all traffic. Now, here is how Teleport can fulfill each of those concepts at a high level. Teleport uses end-to-end TLS encryption. Open-source Teleport can map GitHub teams to groups and users that exist within Kubernetes clusters. So RBAC within Kubernetes. And then lastly, with Teleport, all kubectl requests are fully logged and even kubectl exec sessions themselves are recorded.
Unified Access Platform / access gateway
Jonathon: 00:05:50.374 So now I’ll discuss using a unified access platform. And when you talk Kubernetes, the odds are that you will end up with multiple clusters. You might have a development cluster, a staging cluster, a production cluster, etc. You might have clusters per region. But in order to deal with all of those different clusters, you really want to centralize the access. So you want to pipe all of your developer access to your clusters through a single chokepoint so that you can enforce your policies there, like a proxy or a gateway. And that gateway is a great place to attach to your SSO identities any requests and any modifications. That way you get accountability and user attribution from everyone in your organization about who does what and when to your production and staging environments. So you really want to make sure that the gateway only allows SSO users through and records their identity with their requests.
Jonathon: 00:07:15.576 The other thing is, even though you’re using Kubernetes and it hides a lot of the underlying hardware and infrastructure, as I mentioned in that layer slide, all of that is still there. The servers are still there. There are applications in your clusters. There are still operating systems. And most likely, you still manage everything under Kubernetes. So the worker nodes, control plane, access via SSH or something similar. So you want to remember all those different parts that go into a Kubernetes deployment. So you should do all of the same enforcement that you do for Kubernetes as you do for SSH or internal applications. That is, attach SSO identities to your access control. And when using SSO identities, you should map users and groups that exist in your identity provider to users and groups that you create within your Kubernetes clusters. And groups and users within Kubernetes can be created using native Kubernetes role-based access control, which is what I will get into now.
Role-based access control (RBAC) with Kubernetes
Jonathon: 00:08:32.485 So role-based access control, or RBAC, is a method of regulating access to computer or network resources based on the roles of individual users within your organization. So this means use least privilege and grant access based on a user’s role within your organization. Now, a Kubernetes RBAC Role or ClusterRole contains rules that represent a set of permissions. Those permissions are purely additive, so there are no deny rules. Now, the difference between a Role and a ClusterRole: a Role in Kubernetes always sets permissions within a particular namespace. So when you create a Role, you have to specify the namespace it belongs in. A ClusterRole, on the other hand, is a non-namespaced resource. So the resources have different names, Role versus ClusterRole, because a Kubernetes object always has to be either namespaced or not namespaced. It can’t be both. If you want to define a role within a namespace, use a Role. If you want to define a role cluster-wide, use a ClusterRole. And in my demo, I’ll demonstrate using a Kubernetes Role to limit a group to a single namespace. So I won’t be using a ClusterRole in my demo.
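To make the "purely additive, namespaced" behavior concrete, here is a minimal Python sketch of how such a check could be modeled. It is an illustration only, not Kubernetes' actual authorizer: the class names, the `is_allowed` helper, the role name `dev-access`, and the sample rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One permission rule: API groups, resources, and verbs it grants."""
    api_groups: tuple
    resources: tuple
    verbs: tuple

    def allows(self, api_group: str, resource: str, verb: str) -> bool:
        def matches(allowed: tuple, value: str) -> bool:
            # "*" is treated as a wildcard, as in Kubernetes rules.
            return "*" in allowed or value in allowed

        return (matches(self.api_groups, api_group)
                and matches(self.resources, resource)
                and matches(self.verbs, verb))


@dataclass(frozen=True)
class Role:
    """A Role always belongs to exactly one namespace."""
    name: str
    namespace: str
    rules: tuple


@dataclass(frozen=True)
class RoleBinding:
    """Binds a group (the subject) to a Role within a namespace."""
    namespace: str
    group: str
    role: Role


def is_allowed(bindings, groups, namespace, api_group, resource, verb) -> bool:
    """Additive check: allow if ANY bound rule grants it; there are no deny rules."""
    for binding in bindings:
        if binding.group not in groups or binding.namespace != namespace:
            continue
        if any(rule.allows(api_group, resource, verb) for rule in binding.role.rules):
            return True
    return False  # nothing granted the request, so it is refused by default


# Hypothetical role: read-only access to pods in the "dev" namespace.
dev_access = Role("dev-access", "dev", (Rule(("",), ("pods",), ("get", "list")),))
bindings = [RoleBinding("dev", "devs", dev_access)]

print(is_allowed(bindings, {"devs"}, "dev", "", "pods", "get"))
print(is_allowed(bindings, {"devs"}, "default", "", "pods", "get"))
print(is_allowed(bindings, {"devs"}, "dev", "", "pods", "delete"))
```

The first call prints True; the other two print False, because no rule grants access outside the dev namespace or the delete verb. A request is refused only because nothing allowed it, never because a rule denied it.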
Demo environment overview
Jonathon: 00:10:10.671 As far as my demo goes, I’ll first provide an overview of my demo environment. So I’ve created within GitHub an organization that I’ve called kubernetes-org. I’ve created two GitHub users. An admin user and a dev user. I then created two GitHub teams. An admin team and a dev team. My admin user is a member of both teams. The admin team and the dev team. Whereas my dev user is only a user within this dev team. I’ve also limited within Kubernetes that the dev team can only be part of the dev namespace. They’re not going to be able to do any actions outside of that one dev namespace. I’ve set up two Kubernetes clusters, two internal applications. I’ve enabled SSH on one of my two Kubernetes clusters. And I’m using Let’s Encrypt for signed HTTPS certificates on my Teleport proxy. So again, everything I’m doing here is completely freely available in open source.
Jonathon: 00:11:25.008 So these are my two users. Again, the admin user, dev user. They want to access some resources down here, whether it’s an SSH server, Kubernetes cluster, an internal application. From their perspective, they will never directly access any of these. So that’s one added layer of protection: you can now keep your Kubernetes API server hidden from the public. From these users’ perspective, they go through the proxy before they access anything down here. And how they go about accessing these different items is they can use tsh, which is Teleport’s CLI tool. They can use kubectl, normal kubectl with Kubernetes, or they can use a web UI that’s served from Teleport proxy.
Jonathon: 00:12:20.374 So for my demo, I have Teleport proxy and Teleport authentication service both running on a single EC2 instance. For production deployment, I would advise separating these two components and making each one of these highly available. So you might have a load balancer in front of a group of proxies. A load balancer in front of a group of authentication servers. And the role of the authentication server is to store all the logs that Teleport is generating. So those logs are a trail of who is doing what, when. And it’s all tied back to their identity. The other log is session recordings, so SSH sessions and kubectl exec sessions. And then the authentication server is also the certificate authority.
Jonathon: 00:13:15.296 So when a user goes to authenticate to access something down here, what I have set up is this SSO integration. I created an OAuth app within my GitHub organization. And based on each of these users’ team membership within my GitHub organization, they will be able to access something accordingly. So for the admin user, they’re going to be able to access everything. My developer user is going to be limited to only being able to SSH as an Ubuntu user, and they will be limited to the dev namespace within my Kubernetes clusters. So when either one of these successfully authenticates, they will receive a short-lived SSH certificate that they can use for accessing SSH servers, and they will also receive a short-lived kubeconfig that they will use for interacting with Kubernetes. And again, everything they do is tied back to their identity, as it is within GitHub, for this example.
Demo
Jonathon: 00:14:25.962 So I will now switch over to my demo environment. So the first thing that I will show here, first of all, this is my local laptop. So I’m not remotely accessing anything right now. How I’ve set up the integration with my GitHub organization and my Kubernetes clusters is using a GitHub authentication connector. So if I take a look at that in here, the part that I will highlight is this teams_to_logins piece. So what this is saying here is, "Here's my organization, kubernetes-org. Here’s a team within that organization." So anybody who is a member of this team called admins, they will be allowed to SSH as Root, as Ubuntu. And here’s where they get mapped to any Kubernetes groups that I’ve created. So in this case, they’re getting mapped to the “system:masters” group, which is an admin group within Kubernetes clusters. So they’re going to be able to do essentially anything.
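The idea behind this team mapping can be sketched in a few lines of Python. This illustrates the concept only, not Teleport's implementation or its connector file format: the `resolve` helper is hypothetical, the lowercase login names are an assumption, and the entry for the devs team reflects the limits described for the dev user in the demo environment overview.

```python
# Each entry maps one GitHub team to the SSH logins and Kubernetes groups
# its members receive. Values follow the demo as described in the talk;
# the exact spelling of the logins is an assumption.
TEAM_MAPPINGS = [
    {
        "organization": "kubernetes-org",
        "team": "admins",
        "logins": ["root", "ubuntu"],
        "kube_groups": ["system:masters"],
    },
    {
        "organization": "kubernetes-org",
        "team": "devs",
        "logins": ["ubuntu"],
        "kube_groups": ["devs"],
    },
]


def resolve(memberships):
    """Return (logins, kube_groups) for a set of (organization, team) pairs.

    Hypothetical helper: grants are the union over every matching team,
    and a user with no matching team receives nothing.
    """
    logins, kube_groups = set(), set()
    for mapping in TEAM_MAPPINGS:
        if (mapping["organization"], mapping["team"]) in memberships:
            logins.update(mapping["logins"])
            kube_groups.update(mapping["kube_groups"])
    return sorted(logins), sorted(kube_groups)


dev_user = {("kubernetes-org", "devs")}
admin_user = {("kubernetes-org", "admins"), ("kubernetes-org", "devs")}

print(resolve(dev_user))
print(resolve(admin_user))
print(resolve({("some-other-org", "admins")}))
```

The dev user resolves to the single login ubuntu and the single group devs. A user whose teams match no mapping resolves to two empty lists, which is the least-privilege default the talk recommends.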
Jonathon: 00:15:39.922 Whereas here, I’ve created this mapping also within the kubernetes-org in GitHub. This team I’ve created called devs, somebody who’s a member of this team within GitHub, they can only SSH as this Ubuntu user. And I’m mapping them so they only can be part of the devs group within Kubernetes. And I’ll show you that the devs group can only be part of the dev namespace in my clusters. So if I show you the role that I created for my two clusters, it’s pretty straightforward this one. This is saying this is in the dev namespace. This is the name of this role. And here are the rules. So somebody who has this role is going to be able to do all these verbs on these resources in API groups. So essentially everything within this dev namespace. And I’ve also created a role binding, which is going to bind a subject to that role. So in my case, the subject is a kind of group. And the group name I’m creating is devs also within the dev namespace. And this role binding is referencing this dev access role that I just showed you. So this is how I’m creating this devs group, and then also mapping it to the role so that somebody in the devs group can only do things within the dev namespace.
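The exact manifests from the demo are not reproduced in the transcript, so the following is only a sketch of what a Role and RoleBinding of this shape could look like, built as Python dictionaries and printed as JSON. The object names `dev-access` and `dev-access-binding`, and the wildcard rule, are assumptions; the field names follow the Kubernetes `rbac.authorization.k8s.io/v1` API.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"

# Role: permissions that exist only inside the "dev" namespace.
role = {
    "apiVersion": f"{RBAC_API_GROUP}/v1",
    "kind": "Role",
    "metadata": {"name": "dev-access", "namespace": "dev"},
    # Assumed rule: every verb on every resource in the core and "apps" groups.
    "rules": [
        {"apiGroups": ["", "apps"], "resources": ["*"], "verbs": ["*"]},
    ],
}

# RoleBinding: binds the "devs" group (the subject) to the Role above.
role_binding = {
    "apiVersion": f"{RBAC_API_GROUP}/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "dev-access-binding", "namespace": "dev"},
    "subjects": [
        {"kind": "Group", "name": "devs", "apiGroup": RBAC_API_GROUP},
    ],
    "roleRef": {
        "kind": "Role",
        "name": role["metadata"]["name"],
        "apiGroup": RBAC_API_GROUP,
    },
}

# Wrap both objects in a List so they can be written to a single file.
manifest = {"apiVersion": "v1", "kind": "List", "items": [role, role_binding]}
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied with kubectl apply -f. Because both objects sit in the dev namespace, anyone mapped to the devs group gets these permissions there and nowhere else.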
Jonathon: 00:17:18.258 So now to actually log into my Teleport cluster to show you my Kubernetes clusters and everything that Teleport can do for me here. I’m going to use tsh, which is that CLI tool used by Teleport. So I’m going to tsh login to this proxy that I have. So this is its URL. And this proxy is being served from that EC2 instance that I mentioned a moment ago. What I’m going to do here, so by default, it’s going to open my Chrome browser. But I have two other browsers open. I have a Safari browser. This is where I’m signed in as my admin user in this GitHub organization. Then, I also have a Firefox browser open, where I’m logged in as this dev user who’s also part of that Kubernetes organization.
Jonathon: 00:18:17.760 So what I’m going to do is copy and paste that link that Teleport gave me. So it allowed me to successfully login via GitHub because I was already logged into my GitHub here as this user. So if I return back to my terminal window, I can see I’ve successfully logged in as this user. So that’s my admin username. This is the cluster. I get these SSH users that I can log in as. And here’s that “system:masters” group that I get to be a part of. So now if I do tsh kube ls, I can see that there are two Kubernetes clusters that are part of this Teleport deployment. So if I want to switch between the two, you can see right now I’m logged into cluster1. If I do tsh kube login cluster2, it switches me over. And if I do any kubectl requests, like get pods, so all the usual stuff. And if I kubectl exec into one of these pods, I'll do that here, that's the pod I'm going to go into with a bash shell.
Jonathon: 00:19:50.181 So now that I'm in here, I'm in this pod. I can do any admin stuff I want to do. Might jump around. Maybe do top. Fix what I need to fix. When I’m done, I can exit out of this. And so what I’ll show you now is if I go to the UI, so my Teleport UI and I’m going to do it in the Safari browser window because this is where I’m logged in as my admin user. And here, if I go to the login page, so this is the same proxy I was just interacting with in my terminal window, teleport.gravitational.io. So I’m going to log in using my GitHub team. So I enter here. First of all, these are two servers that I’ve enabled SSH access on. This one is that cluster1 Kubernetes deployment I showed you when I first logged in via tsh. If I wanted to SSH into this, I could click Connect. Choose one of these two users to SSH as. And this other server here — this is my actual proxy that I’m on right now where this UI’s being served from.
Jonathon: 00:21:08.961 But what I want to show you here is an activity in the audit logs. So built into Teleport is this robust audit log feature. So I can see even Kubernetes requests if I click the details on this. So I can see lots of information. IP addresses. I can see it was some request on this cluster. I can see that it was a get on pods. And everything in here is all tied back to this user within my SSO. So within my GitHub teams. So I no longer have to guess who might be SSH'ing or who might be using kubectl against a cluster. It’s all tied back here. And so even though there’s robust auditing, you can also set up third-party integrations. So you could have your logs be sent to something like Splunk or Elastic. And I’ll next show you what it might look like to use an ELK Stack.
Jonathon: 00:22:19.192 But first, I’m going to click this Session Player button. And so this is the full session. When I used kubectl exec a moment ago, this is everything that just occurred. So this feature, I mean, auditors love this. I’ve also known a lot of developers, myself included, who love this when they have to go back and try to configure something that maybe they have not configured for a while, maybe they’ve forgotten some steps. You can come back in here, watch one of these, copy any commands that are occurring, and then use this to reconfigure something. So again, the two audit log types that Teleport produces are these ‘who’s doing what’-type audit logs and then also the session recordings, like I just showed here.
Jonathon: 00:23:17.014 I mentioned that I’ll show you an ELK Stack that I’ve deployed here. So I have this internal ELK Stack that I’ve deployed that I’m using to aggregate all of my logs. So if I come in here, the only way to access this is via Teleport. So this is in a private subnet. If I go to discover, I can come in here. Use queries to search for specific items. So in this case, what I’m looking for are any kube requests or any session starts. And so I can start building out queries and start building out alerts on anything I want to. The other application that I’ll show you while I’m in here is this kube UI.
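The kind of query described here, finding kube requests and session starts and tying them back to a user, can be sketched in Python against a plain stream of JSON events. Everything in this example is an assumption made for illustration: the sample events are invented, and the event type strings and field names (`event`, `user`, `verb`) stand in for whatever the real audit log schema provides.

```python
import json
from collections import Counter

# Invented sample events, one JSON object per line. Real audit events
# carry many more fields than shown here.
SAMPLE_LOG = """\
{"event": "kube.request", "user": "admin-user", "verb": "GET"}
{"event": "session.start", "user": "admin-user"}
{"event": "user.login", "user": "dev-user"}
{"event": "kube.request", "user": "dev-user", "verb": "GET"}
"""

WANTED_TYPES = {"kube.request", "session.start"}


def select_events(lines, wanted_types):
    """Parse JSON lines and keep only events whose type is in wanted_types."""
    selected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("event") in wanted_types:
            selected.append(event)
    return selected


matches = select_events(SAMPLE_LOG.splitlines(), WANTED_TYPES)
per_user = Counter(event["user"] for event in matches)

print(len(matches))
print(dict(per_user))
```

Counting matches per user is the starting point for an alert: because every event carries the SSO identity, a threshold can be set per person rather than per shared account.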
Jonathon: 00:24:10.324 So you might have applications that you’ve deployed in your Kubernetes cluster. So what you can do here is you can expose those applications via this Teleport UI. So this Kubernetes dashboard I’ve deployed into one of my clusters, and it’s only accessible via Teleport. So I would have to use that same SSO login to actually access something like this. So you can get creative. Imagine all kinds of different applications you might deploy in Kubernetes and then expose via Teleport. Maybe Jenkins, Grafana, all that stuff. And so if I also go back to my audit logs, you can see that even opening these applications is generating audit logs. So I can see app session started. I view the details. So I can see it was this Elastic, this ELK Stack. And again, everything is tied back to this user’s identity.
Jonathon: 00:25:20.560 The last thing I’ll show as part of this demo is, so again, I mentioned that I created this integration with my GitHub organization and the teams within that GitHub organization. So I first showed you just now that somebody who is part of the admins team is able to SSH as Root and Ubuntu. They get to be part of the “system:masters” group. And it’s because of this mapping I set up here. I’ll now show you somebody who’s in the devs team and how they can only SSH as Ubuntu. They can only be part of the Kubernetes devs group. So let me log out as that admin user. I’m going to log back into my proxy here, and I’m going to copy and paste this into my Firefox browser because this is where I’m signed in as this dev user. So if I paste that in here, successful login.
Jonathon: 00:26:31.245 Come back here. You can see that I’ve logged in as that dev user. I can only SSH as Ubuntu. I’m part of this devs Kubernetes group. So if I try to do something in the default namespace, it’s going to fail. Because this user, they’re only allowed to do things within the dev namespace. But if I now do kubectl get pods on the dev namespace, now I can actually get stuff here. So to summarize here, I created an EC2 instance. Deployed a Teleport proxy onto that EC2 instance. And I used two different Kubernetes clusters that were deployed behind my Teleport deployment. Added two applications. One application was an ELK Stack. One was a Kubernetes UI that I had deployed within a Kubernetes cluster. And I showed you how you can map teams within your GitHub organization to any groups that you create within your Kubernetes cluster. So now let me switch back to my slide deck here.
Webinar summary
Jonathon: 00:27:57.194 So to summarize what was covered throughout this webinar. Remember that Kubernetes expands the attack surface of your environment. So if you introduce a new layer, you have to make some other layer less relevant. And again, those layers I’m referring to are things like network layer, cloud layer, end-user hosts, all of those. You should also turn SSH off for the majority of your engineering team. Having both present increases the probability of you getting compromised. But if you do have SSH access enabled in your Kubernetes cluster, just be sure to apply role-based access control and synchronize the two so that they have the same authentication gateway and the same access gateway. And access to all of your different environments — dev, prod, etc. — should all be controlled through the same gateway for access and for authentication. Then, finally, role-based access control tied to your SSO identities should also be used, and be sure to regularly inspect and audit all access. So thank you for attending today’s webinar. For next steps, here are a few links I recommend viewing. I highly recommend watching this webinar, Best Practices for Auditing Kubernetes. This one covers best practices for auditing, logging, generating alerts in Kubernetes.