Most Common Kubernetes Security Misconfigurations and How to Address Them
Most Common Kubernetes Security Misconfigurations and How to Address Them - overview
There are hundreds of public cases detailing how companies leaked sensitive user data accidentally due to misconfiguration issues. Any data breach can have catastrophic effects on the business and on the individuals whose data is compromised.
Along with misconfiguration, people within organizations are also targets. Attackers start with phishing campaigns to get access to the network. If a user's authentication isn't handled properly, it can result in long-term damage to your business. However, the risk of exposing data can be minimized through security scanners or reduced by providing strong centralized access controls.
Ben and Anaïs discuss misconfigurations, what they are and how to identify and fix them with Trivy, Aqua's open source security scanner for Kubernetes. Next, we showcase how to deploy identity-based infrastructure access using open-source Teleport Access Plane, moving to a secretless security model to make the secure way the easiest way of accessing infrastructure.
This webinar is aimed at DevOps engineers and developers who want to better understand misconfigurations and anyone who would like to learn how to set up more secure access through Teleport.
Key topics on Most Common Kubernetes Security Misconfigurations and How to Address Them
- Misconfiguration security scans are part of the landscape of cloud-native security scanning and also non-cloud-native scanning.
- Trivy is a comprehensive and versatile open source security scanner.
- Trivy has scanners that look for security issues and targets where it can find those issues.
- Trivy scans container images in private and public registries, local filesystems, Git repositories, and container formats such as tar archives and Podman.
- Tracee is an eBPF-based Linux runtime security and forensic tool and can be used in addition to Trivy.
- There are benefits to using Teleport with Trivy, for example to detect authorization or authentication misconfigurations.
Expanding your knowledge on Most Common Kubernetes Security Misconfigurations and How to Address Them
- Teleport Passwordless
- Teleport Machine ID
- Teleport Server Access
- Teleport Application Access
- Teleport Kubernetes Access
- Teleport Database Access
- Teleport Desktop Access
Learn more about Most Common Kubernetes Security Misconfigurations and How to Address Them
- Teleport Labs
- Kubernetes distributions
- Contribute on GitHub
- Join our Slack community
- Participate in our discussions
Introduction - Most Common Kubernetes Security Misconfigurations and How to Address Them
(The transcript of the session)
Ben: 00:00:11.255 Okay. Hi, everyone. I see people can now join. We're going to give it four minutes until we kick it off. So just hang on in there, and we'll get started at 9 o'clock.
Anaïs: 00:00:36.589 Does your mascot have a name?
Ben: 00:00:39.866 It's a good question. Yeah. For people who are joining, this is Pam, a mascot.
Anaïs: 00:00:49.646 The story of Pam.
Ben: 00:00:51.413 Well, Pam, historically, we said Teleport is Privileged Access Management but sort of a new way of doing it. And so there's sort of a throwback to the past of PAM, although we are not that closely aligned with privilege access management, but we have lots of people sort of coming in using Teleport for access management. And Pam's kind of a cute name. [laughter] And for anyone who was at our conference, we had Pam in a suit.
Anaïs: 00:01:18.379 Pam in a suit.
Ben: 00:01:19.771 Yeah. So there's a real-life Pam. So if you come to any of our events in person, you might be lucky enough — or a large conference, you can meet Pam and get a selfie with her.
Anaïs: 00:01:29.059 She reminds me a bit of Pepper. You know, from IBM, the robot?
Ben: 00:01:34.156 I don't think I've seen that one.
Anaïs: 00:01:35.754 You should check it out. It's a rolling-around tight-size robot that has a huge tablet, and it has small fingers and can dance. [laughter]
Ben: 00:01:49.970 Perfect.
Anaïs: 00:01:50.421 Pepper.
Ben: 00:01:50.494 That's what we need.
Anaïs: 00:01:51.925 Yeah. [laughter] [inaudible].
Ben: 00:01:56.228 We need more dancing robots. Hi, everyone. As you sort of join in, we're just waiting for people to join. We'll start in, I guess, like two more minutes.
Ben: 00:03:19.582 Okay. One more minute and we'll kick things off. The chat's available. Colin, thanks for your message. And we'll also be taking Q&A throughout the presentation. So if you have any questions as they come up, we'll be keeping an eye on the chat and the Q&A. So feel free to ask us questions as we go along. I'm happy to answer your questions.
Introducing the speakers
Ben: 00:04:05.295 Okay. It's 9:00 on the dot, so I think now's probably a good time to kick things off. Yes. Good morning, good afternoon, everyone. Thanks for joining today's webinar. I'm really excited to be here with Anaïs from Aqua. And today, we're going to be talking about the range of things, like Most Common Kubernetes Security Misconfigurations and How to Address Them. And as I've been preparing this webinar, I've done multiple deep dives into Trivy. It's been a great tool, and I'm really excited to not only find some of my misconfigurations but also learn how to fix them. So I'll do a quick intro, and then I'll pass over to Anaïs. My name's Ben Arent. I'm a developer relations manager at Teleport. And yeah. I'm really excited to have everyone here today.
Anaïs: 00:05:00.775 Hi, everybody. It's really great to be here. My name is Anaïs. Yeah. I'm the open source developer advocate at Aqua Security. And you might hear in the background some coughing. That's my poor, sick four-month-old puppy. So I apologize for the background noise. So I'm here to talk about Trivy misconfigurations. [laughter] Go to next slide. Oh, that's a puppy. [laughter] Just —
Ben: 00:05:29.021 It's a great way to start the webinar.
Misconfiguration security scans
Anaïs: 00:05:29.274 — a picture to [crosstalk]. Puppy pictures. Yeah. So apologies for any background noise. Let's hope it's going to stay that way. Next slide. Thank you. Yeah. I just want to hear from everybody joining, when you think of misconfigurations, just maybe put in the chat, what comes to your mind? What are you thinking of? It could be a word, a buzzword in the space, such as supply chain security or anything that you're working with, any tools you're working with that might come to your mind when you're thinking about misconfiguration. A lot of times there's not an official definition of what a term is, but it's whatever applies to multiple different things. So I'm just curious, for those who are joining today, what you think of when you hear misconfigurations. Yeah. I'm going to keep an eye on the —
Ben: 00:06:26.859 We have opened the world —
Anaïs: 00:06:27.381 — chat as well.
Ben: 00:06:27.945 — [crosstalk] shouldn't be.
Anaïs: 00:06:30.110 Oh, I will get to that. That's a good sentence. Next slide. I'm going to keep reading the chat in between. Yeah. So misconfigurations are part of — or misconfiguration scans. Hopefully, not misconfigurations. Misconfiguration security scans are part of the landscape of cloud-native security scanning and also non-cloud-native scanning. But here is an overview that I'd like to show of the different open-source security scanners that we have in the cloud-native space. So the most popular ones tend to be vulnerability scanners. We also have compliance scans that are talked about a lot. Like in CNCF, KubeCon presentations, you will hear lots of people talk about compliance scans, compliance frameworks. Those are basically frameworks that look at how you should configure your resources, how your resources are supposed to run in your infrastructure. And then you can run scans to compare your running workloads with those frameworks and see how compliant is your infrastructure. So compliance scans kind of relate to misconfigurations as well. But then we also have a whole different, separate family of misconfiguration scanners. And misconfiguration scanners can — anything related to your configurations, anything that defines, how is your infrastructure supposed to run, and how is it supposed to interact with other resources? So rather than defining the application, it's like parts of the application, but it's not defining or not interacting with the application directly. It's more like to do with — how is it running, and how is it in your infrastructure?
Anaïs: 00:08:10.604 Now, in this talk, we're going to focus on Trivy because it's the open-source tool from Aqua. When I say open-source tools, you might realize that there are some not mentioned in this graphic because I don't consider them open-source. When I say open source, it's like, in this case, you don't send us any data. You don't sign up to those scanners versus there are other scanners where you sign up, where you kind of send us that project information. And the thing is it's particularly important for security scanners because to perform a security scan of your cluster, of your running workloads, that needs lots and lots of permissions, so. And that's something just to keep in mind that we'll also go into more detail later on, that if you run a scan from within your cluster on your running workloads, you will give that scan a lot of permissions within your cluster. So you want to know exactly, what is it doing? And with some of the scanners, it's not really clear how much permission it actually has and how it's integrating with [inaudible] tool. Next slide. [laughter]
Trivy and Tracee
Anaïs: 00:09:13.416 So Aqua open-source projects. We have Tracee, which is an eBPF-based Linux runtime security and forensic tool. It can be used as well in addition to Trivy. It's just not directly related to security scanning, as the two projects are fairly independent. Now, Trivy is our all-in-one security scanner. It started off with vulnerability scanning, then moved to misconfiguration scans. And now there's a whole suite of different security scans. So that's why we say it's the all-in-one scanner. Next slide. [laughter] Here's an overview of everything that Trivy does. It's not just Trivy as the project that does all of the different features, but it's actually also using other tools under the hood. So, for example, to perform misconfiguration scans, Trivy can scan your Terraform, your Kubernetes manifests, CloudFormation, Helm charts, also Kustomize, your Docker file, and also, since the summer, your cloud account for misconfigurations.
Why Trivy is the all-in-one scanner
Anaïs: 00:10:12.685 Now, in the case of Terraform, Trivy is using another tool called tfsec that's fairly popular in the space for its misconfiguration scanning. So it just basically acts as this umbrella across other projects. So you can have this unified experience across your different scans. If you want to do misconfiguration scans and vulnerability scans, it's the same kind of process. Trivy can be used as a CLI tool either locally, or you can integrate it with steps in your CI/CD pipeline. Now, we have lots of different steps available that you can just kind of copy and paste into your pipeline for more automated scanning. However, a lot of the misconfiguration scans will likely happen locally on your machine as you configure your resources as well as within your running infrastructure. Trivy also has some additional features that I'm not going to go — I'm not going to go into SBOMs or attestation or anything related. But the in-cluster scanning, we will look at that a little bit because as part of in-cluster scanning, we also do misconfiguration scans in the cluster. Next slide. Thank you.
Scan targets of Trivy
Anaïs: 00:11:19.557 So here's just a brief overview, a list of different scan targets that Trivy can scan. So basically, a scan target is any resource that you want to scan for security issues, whether that's misconfiguration issues, exposed secrets within your workload or vulnerabilities. Next slide. Thank you. So how do we actually go about identifying misconfigurations? What is the process for that? When do you do it? How do you get started? So once you start to package up your application to create a containerized workload, to create a container image that you can use and other people can use, that's when you get started to configure your cloud-native workloads, also for your Kubernetes deployments, right? So you have to define a Docker file. How is your application actually supposed to run? How is it packaged up? What other resources, what other third-party libraries might you be using? And similar. And that's when you can get started not only to scan for vulnerabilities but also to perform misconfiguration scans. And the earlier you get started across your development life cycle with misconfiguration scans, the easier it will be to mitigate security issues further down the line.
The process of performing misconfiguration scans
Anaïs: 00:12:28.608 Now, once you have your container image, that's when you want to package it or configure it with Kubernetes manifests, or if you have lots of Kubernetes manifests that relate to your application, you might want to use a templating tool, such as Helm or Kustomize. And then once you go to your deployment, you might want to use a tool such as Terraform to deploy your application. Now, each of these different tools across your stack will need different configurations, different ways of how they run your workloads in the end. And at each stage, you can obviously introduce misconfigurations. So it's really about this continuous process of performing the misconfiguration scans and then improving your configurations, your workloads over time. Next slide.
Ben: 00:13:12.274 I think there's actually a question here for container images. Are the scans done at build time or at runtime?
Anaïs: 00:13:19.584 You can do both. So basically, once you define the Docker file, what you want to get started with is scanning your base image that you're using or any third-party resources that you're using within that container image. Now, then when you use a CI/CD pipeline to build your container image, you want to maybe perform every time a scan of your Docker file and then, once the container image is packaged up, scan that. So that's basically at build time. And then once you deploy the container image to your infrastructure, use it within your tools, that's when you want to have a tool such as the Trivy operator that performs continuous scanning. So it's really at both stages you want to have scans because of — I'm going to go into the reasons in the later slides as well why you also want to have then scanning of your container images and your configurations at runtime. Next slide. Great question. Thank you.
Learning about common misconfigurations
Anaïs: 00:14:22.770 So how do you learn about misconfigurations? We've covered what kind of tools you're using and when you should do the scanning, but ultimately, how do you learn about common misconfigurations, and how do you mitigate those? So there are different ways that I would recommend people to learn about misconfigurations. If you're just getting started in the cloud-native space, or you have been in the cloud-native space but maybe not so much focused on the security aspects of your containerized workloads, then I would just highly suggest you check out the resources that are out there by organizations, such as the frameworks that I mentioned earlier. We have the CIS and the NSA frameworks that are very popular, that allow you to perform the compliance scans. Then companies such as Aqua, Snyk, Sysdig, and Teleport are publishing very valuable resources on an ongoing basis. Also, as soon as there's a new vulnerability coming out, you will find a blog post that details the vulnerability, how to identify it in your workloads with security scanners such as Trivy, and then how do you actually mitigate it? How do you go about it?
Anaïs: 00:15:30.373 And then, obviously, following security professionals in this space, using security scanners, such as Trivy, I'm going to show you as well how you can use the scanner itself [inaudible] to learn more about the misconfigurations as well. And through that, you build up your experience. It's really not an either/or. I think getting started with security in the cloud-native space is really about a process because, ultimately, it doesn't really help you to look up common misconfigurations and then go one by one through your workloads and see if they are there. That's not going to be very effective. You're probably going to forget, or most of the common misconfigurations might not necessarily matter to your workloads, some of them. So it really depends on what tools you're using and how you deploy them and so on. So I don't think it's something that you would have to study such as how you would study other areas in the space. Next slide.
The value of scanning misconfigurations using Trivy
Anaïs: 00:16:28.670 So let's look at some of the common — or let's look first, why does it matter for us to look at misconfigurations with a tool such as Trivy? Can you click on that link? This is a very cool overview, a kind of repository of 2021 cloud misconfigurations that resulted in data loss in different companies, the impact that it had, and why the data was exposed, for example. And you have some less critical, some more critical issues. Ideally, or why you want to use misconfiguration scanners, is because you want to be the one identifying any exposed endpoint, any exposed data before somebody else does it so you can ensure that nobody's accessing, for example, the data in your S3 buckets. As you can see, there are lots of AWS services listed here. It's very common to misconfigure your AWS resources, your services that you're using for AWS because it's very difficult to configure those properly. So misconfiguration scanners, such as Trivy, can then also tell you if there are any things that you could configure differently in your cloud account, yeah, to basically highlight anything that's not right there versus you having to click through your account or using the AWS console. Yeah. So this is a really interesting overview of why it matters to look at misconfigurations. Okay. Next slide. [laughter] Thank you.
Docker file common misconfigurations
Anaïs: 00:17:59.112 So let's look first at Docker file common misconfigurations. And I know this is not necessarily like — you can use containers without using Kubernetes in one way or another, but ultimately, that's where you want to get started looking at your misconfigurations because, like mentioned, anything that you have in your container image that's not like it's supposed to be will trickle further down the line and will make your infrastructure and wherever you use those container images less secure. So one of the things — here are five misconfigurations that are commonly made and that can easily be prevented. The first one is using deprecated fields in your Docker file. They're deprecated for a reason: either they haven't been used like they are supposed to be used, and that's kind of a security issue, or they have been deprecated for other reasons. So don't use deprecated fields. The second is that the user is not specified. If you don't specify a user, the root user will be used for the container. So you want to make sure that you specify the user that this container image is supposed to run as. Then there's mounting host files, basically, any operating system files into the container, giving the container access to host paths. Unprotected secrets are very common. People use environment variables for their secrets and similar, not really managing them properly in a secure way. Generally, it's an anti-pattern to use environment variables for secrets. And then exposing the wrong port is another thing that's posing — or it's an endpoint that shouldn't be exposed. Next slide.
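As a rough illustration of the Docker file checks described above, here is a minimal Python sketch. It is not how Trivy implements its checks: the function name, the secret keywords, and the choice of port 22 as the "wrong port" are all assumptions made for this example, and host mounts are left out because they are set when the container is run rather than in the Docker file.

```python
import re

# Hypothetical keyword list; a real scanner uses far more complete rules.
SECRET_HINTS = ("PASSWORD", "SECRET", "TOKEN", "API_KEY")


def check_dockerfile(text):
    """Return a list of findings for a Docker file passed in as a string."""
    findings = []
    instructions = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, _, rest = line.partition(" ")
        instructions.append((keyword.upper(), rest.strip()))

    # MAINTAINER is a deprecated instruction (LABEL replaces it).
    if any(k == "MAINTAINER" for k, _ in instructions):
        findings.append("deprecated instruction MAINTAINER used")

    # Without a USER instruction the container runs as root.
    users = [arg for k, arg in instructions if k == "USER"]
    if not users or users[-1].split(":")[0] in ("root", "0"):
        findings.append("container runs as root (no non-root USER set)")

    # Secrets set through ENV end up stored in the image.
    for k, arg in instructions:
        if k == "ENV" and any(hint in arg.upper() for hint in SECRET_HINTS):
            findings.append("possible secret in ENV: " + re.split(r"[ =]", arg)[0])

    # Assumption for this sketch: SSH (22) should not be exposed.
    for k, arg in instructions:
        if k == "EXPOSE" and "22" in [p.split("/")[0] for p in arg.split()]:
            findings.append("port 22 exposed")
    return findings


SAMPLE = """\
FROM ubuntu:22.04
MAINTAINER someone@example.com
ENV DB_PASSWORD=hunter2
EXPOSE 22 8080
CMD ["sleep", "3600"]
"""

for finding in check_dockerfile(SAMPLE):
    print(finding)
```

A file that sets a non-root USER and avoids the other patterns, such as one containing only `FROM alpine:3.19` and `USER app`, produces no findings with this sketch.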
Anaïs: 00:19:35.538 So here is just a Docker file that I'm going to scan at the end of my part of the presentation and just show you how Trivy will detail the different misconfigurations in that Docker file. Next slide. [laughter] So looking at YAML manifests and common misconfigurations there, one of them is using the default namespace. Ideally, you want to define a namespace in your Kubernetes manifest. Namespaces are there for a reason because they allow you to have logical separation between your different workloads, your different running workloads, so they can't necessarily affect other workloads in another namespace. Now, if you only set the namespace upon installation, then maybe the next person will not use that same namespace or similar in an upgrade of the application, for example. Another thing is running privileged containers or not specifying resource requests, which can result in your infrastructure going funky. There are lots of different examples in that case. Also, if you define the image pull policy, like of your container image [inaudible], it can happen that Kubernetes keeps pulling that container image for various reasons and is causing outages or your application to go down and similar.
Anaïs: 00:20:52.891 Then, again, sharing host processes and exposing public endpoints. There are actually millions of Kubernetes cluster endpoints exposed because lots of times people are using local clusters and accidentally expose the endpoint of the cluster. Also, I think by default in AWS, if you spin up a container, the endpoint is public. You will still have to have the right permissions within AWS, but the endpoint itself is still public. So here's just from Target. In 2018, they were running over 20,000 lines of configuration files. And if you think about that scale, if companies run so many configuration files, the likelihood of having misconfigurations is just a given. Next slide. I think that was the point, yeah, when I'm actually going to share my screen. I'm going to show you how you can perform the scanning. So I hope you can see my terminal.
Ben: 00:21:49.360 Yeah. We can see it.
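The manifest misconfigurations listed before the demo (default namespace, privileged containers, missing resource requests, sharing host processes) can be sketched as simple checks over a parsed manifest. This is only an illustration under stated assumptions: the manifest is written as a Python dict rather than loaded from YAML, the function name and the sample Deployment are hypothetical, and real scanners such as Trivy ship many more checks.

```python
def check_workload(manifest):
    """Return findings for a Deployment-style manifest given as a dict."""
    findings = []

    # A missing namespace means the workload lands in "default".
    metadata = manifest.get("metadata", {})
    if metadata.get("namespace", "default") == "default":
        findings.append("uses the default namespace")

    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})

    # hostPID lets the pod see processes on the node.
    if pod_spec.get("hostPID"):
        findings.append("shares the host process namespace (hostPID)")

    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if container.get("securityContext", {}).get("privileged"):
            findings.append(name + ": runs privileged")
        if not container.get("resources", {}).get("requests"):
            findings.append(name + ": no resource requests")
    return findings


# Hypothetical manifest used only for this example.
DEPLOYMENT = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "demo"},
    "spec": {"template": {"spec": {
        "hostPID": True,
        "containers": [{
            "name": "web",
            "image": "nginx:1.25",
            "securityContext": {"privileged": True},
        }],
    }}},
}

for finding in check_workload(DEPLOYMENT):
    print(finding)
```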
Misconfiguration scanning demo
Anaïs: 00:21:50.153 My dog is taking apart my chair. That's the noise in the background, apologies. [laughter] So once you get started with Trivy, there are lots of different installation options. With the CLI, you will have all of the different scans that you can perform. So here are the different scans. In our case, we are going to focus on configuration, misconfiguration scans, the Trivy config command. And here's just a list of other ways that you can use Trivy to scan other resources. So I'm here in a Trivy demo repository that I've set up. I can share, after I speak, the demo link. But you can see lots of different examples of Trivy, including the misconfiguration scanning. Now, in this repository, I have a bad-infrastructure directory. And within that, I have bad infrastructure related to Docker, Kubernetes and then Terraform. And I can use Trivy config to go ahead and scan, for example, the Docker file that's in this directory, in the Docker directory. So Trivy config. And then it's going to spit out a list of misconfigurations within that Docker file. And that's a Docker file that you've seen earlier. So it starts with using the latest tag in the base image. That's one of the things that you will often learn first when you get started with containers — that you shouldn't use the latest tag for your images.
Anaïs: 00:23:21.488 Now, in total, Trivy did 22 tests on this Docker file. And of those, five were failures, and it's going to break it down by the severity of the failures. Now, what we can do is — in this case, the list of misconfigurations is fairly easy to navigate and to go over the misconfigurations one by one. But what happens if you have a huge YAML manifest, let's say, or a larger file, and you're provided with maybe 100 misconfigurations. What you can do instead then in that case is break down the different — for the severity, for example, of those misconfigurations. So we can specify severity and say high in this case — I don't think we had any critical ones. And then we just see the three high misconfigurations within that Docker file. And then you can break that down further with the different tags just to make sure that you can filter the different number of misconfigurations, and you can address them one by one.
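The severity filtering shown in the demo can also be applied to a saved report. The sketch below assumes a JSON report shaped like Trivy's JSON output, with a top-level "Results" list whose entries carry a "Misconfigurations" list with "ID", "Title" and "Severity" fields; the check IDs, titles and the helper name are made up for illustration, and the layout should be verified against a real report before relying on it.

```python
import json

# Hand-written sample that mirrors the demo: five failures, three of them HIGH.
SAMPLE_REPORT = json.loads("""
{
  "Results": [
    {
      "Target": "Dockerfile",
      "Misconfigurations": [
        {"ID": "EXAMPLE-001", "Title": "latest tag used", "Severity": "MEDIUM"},
        {"ID": "EXAMPLE-002", "Title": "root user", "Severity": "HIGH"},
        {"ID": "EXAMPLE-003", "Title": "secret in ENV", "Severity": "HIGH"},
        {"ID": "EXAMPLE-004", "Title": "port 22 exposed", "Severity": "HIGH"},
        {"ID": "EXAMPLE-005", "Title": "no HEALTHCHECK", "Severity": "LOW"}
      ]
    }
  ]
}
""")


def filter_by_severity(report, severities):
    """Return (target, id, severity) tuples for findings at the given severities."""
    wanted = {s.upper() for s in severities}
    selected = []
    for result in report.get("Results", []):
        # The list may be absent or null when a target has no findings.
        for finding in result.get("Misconfigurations") or []:
            if finding.get("Severity", "UNKNOWN").upper() in wanted:
                selected.append(
                    (result.get("Target"), finding.get("ID"), finding.get("Severity"))
                )
    return selected


for target, check_id, severity in filter_by_severity(SAMPLE_REPORT, ["high"]):
    print(target, check_id, severity)
```

Filtering after the fact like this is useful when one stored report feeds several consumers, for example a dashboard that shows everything and a pipeline gate that only fails on high and critical findings.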
Anaïs: 00:24:26.672 Next, we go ahead, and we can scan also our Kubernetes manifest. And it's pretty much the same result. You will have a list in this case of different misconfigurations. If you perform vulnerability scans with Trivy, you will have a nice table, nice database. So this is basically the list of different misconfigurations in that Kubernetes manifest. Now, at this point, given how large this list is, I would probably rewrite it completely, this manifest, and then rescan it. Now, with Trivy, you could go ahead and use trivy kubernetes to scan your running workload, so your running container images within your cluster. Now, what I, however, do instead is I use the Trivy operator within my cluster. So I have here just a basic kind cluster. And within that kind cluster, I have the Trivy operator installed. And because it's a kind cluster, it keeps breaking. So let's just kick that pod.
Anaïs: 00:25:31.498 So I have here the Trivy operator installed within the trivy-system namespace. And the Trivy operator, it's just basically running within the cluster and looking out for, well, any new resources that it has to scan. And then it performs scans on those new resources. For example, for any new container image, it will perform vulnerability scans. If there are any new workloads, it will perform misconfiguration scans on those workloads. The thing is, if I have an application here, so for example, for my app namespace, I could go ahead, and I could change the configurations right through the cluster — right? — which is not something that people should do, to modify resources directly in the cluster. Now, if I change anything within the configurations to have more critical misconfigurations within those, then Trivy will pick those up in the next misconfiguration scan. And that's why you want to have static scanning, but also scanning of your running workloads. Yeah. That's it. That's what I want to share.
Ben: 00:26:31.756 Anaïs, I think there's some questions. One question from Jason, he says, "In that list you had failures in the tests running, what does it mean for — is that just things that didn't pass?"
Anaïs: 00:26:44.922 So it's basically checking — so Trivy has a list of common misconfigurations, like from across the industry, basically, best practices, also from the information that we get from the Aqua Security research team. Basically, here's the list of misconfigurations that shouldn't happen within your Kubernetes manifest. And Trivy will basically look at your Kubernetes manifest and compare it with the database of common misconfigurations and then perform checks. So if Trivy is performing 20 checks, then those are like the 20 common misconfigurations that could be present in that YAML manifest. But of those, if there are like five failures, then those are the actual misconfigurations you have in that YAML manifest. There were 20 tests performed, but five of those failed. Five of those are now the misconfigurations that you have present, whereas the other 15 that passed are not part of it. Does that make sense?
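The relationship between tests, successes and failures in that answer can be sketched as follows: every check in the catalogue is run against the configuration, and only the ones that fail are reported as misconfigurations. The check names and the flat config dict are hypothetical simplifications for this example and are not Trivy's data model.

```python
def run_checks(config, checks):
    """Run every check; a check returns True when the config passes it."""
    failures = [name for name, passes in checks.items() if not passes(config)]
    return {
        "tests": len(checks),
        "successes": len(checks) - len(failures),
        "failures": failures,
    }


# Hypothetical catalogue of checks, keyed by a readable name.
CHECKS = {
    "non-root user set": lambda c: bool(c.get("user")),
    "image tag pinned": lambda c: c.get("image_tag") not in (None, "latest"),
    "not privileged": lambda c: not c.get("privileged"),
    "namespace set": lambda c: c.get("namespace") not in (None, "default"),
}

# Hypothetical workload settings to evaluate.
CONFIG = {"user": None, "image_tag": "latest", "privileged": False, "namespace": "app"}

summary = run_checks(CONFIG, CHECKS)
print("Tests:", summary["tests"],
      "Successes:", summary["successes"],
      "Failures:", len(summary["failures"]))
for name in summary["failures"]:
    print("FAILED:", name)
```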
Ben: 00:27:43.605 Yeah. That makes sense. We have another question. "Does Trivy have a management console GUI in addition to the CLI?"
Anaïs: 00:27:51.608 So that's on our roadmap. For now, you can easily integrate — and we have some tutorials also on the Aqua open-source YouTube channel on how you can set up Grafana and Prometheus with the Trivy operator, and then you have a really fancy dashboard. So that's kind of your alternative to a GUI. And then you can also set up alerts as you would do for your observability stack and similar.
Ben: 00:28:17.048 So suddenly, if you had like 10x amount of misconfigurations, something bad's probably happening. [laughter] All right. We have another question. It says, "Will Trivy also find any misconfiguration that can lead to containers being able to access something it shouldn't?" So sort of an — I guess this is in the realm of privileged containers but maybe around namespaces and networking [crosstalk].
Anaïs: 00:28:39.825 So generally, yes. Well, generally, yes. But Trivy — it's more like the possibility of something happening — right? — versus Tracee, for example, will pick up any events in your cluster as they happen. So, for example, if somebody is actually — if a container is actually trying to access something it probably shouldn't, that's when Tracee would notify you versus Trivy, it's just more the possibility based on your configuration of something going wrong. So it really depends on — it depends on what the process is. Yeah. Because the thing is if you have a misconfiguration, it doesn't mean that — it could either translate into several things going wrong of that container image maybe doing lots of things, or through the container images, maybe lots of things could be done wrong. But not one particular thing. So you can't really look at one specific event or one specific action versus the misconfiguration that might relate to or —
Ben: 00:29:45.132 A high level.
Anaïs: 00:29:46.466 Exactly. An action might relate to multiple misconfigurations.
Ben: 00:29:51.491 Yeah. I think that's a fair answer. Another one actually, I had this question myself. When I was doing this, I tried running Trivy on GKE Autopilot, and it kind of locked down the control plane node. How does Trivy work for managed Kubernetes providers, such as Autopilot or EKS, where you don't have direct access to the control plane node?
Anaïs: 00:30:13.163 I haven't tried it yet on Autopilot. That's interesting. You mean, in GKE, it's just going to —
Ben: 00:30:23.247 I think it works with EKS because you have a system:masters role. But I believe when I tried running it on Autopilot, I think I probably needed to define my service account, as what permissions Trivy hits, and they may not always be available on heavily locked-down managed Kubernetes clusters.
Anaïs: 00:30:41.312 That's the thing because, ultimately, any — that's the thing that — you have to keep that in mind, again, when you're choosing a security scanner. Security scanners need basically access to everything in your cluster. You can tell Trivy to just scan specific namespaces and then only give it access to scan specific namespaces. That's something you could do upon installation. But ultimately, in terms of resources within that namespace, it will likely have access to everything. And that's why you want to be so careful when you're using a security scanner that's from outside the cluster or that's sending data somewhere else because most of them have access to everything. Just iterating on that.
Ben: 00:31:17.186 Yeah. That's a good answer. Okay. I think that's kind of good for questions. Let me share my screen.
Anaïs: 00:31:24.629 [crosstalk]. [laughter]
Ben: 00:31:27.348 And I know we have a — let me resize mine. Yeah. My terminal there will be switching between. So do you want to see our last summary, Anaïs, for your slides?
Anaïs: 00:31:43.917 Our last summary.
Ben: 00:31:45.339 I think this was it.
Anaïs: 00:31:46.446 New security scanning. They are not as difficult as they seem to be. [laughter] So yeah. Through those scans, through misconfiguration scans, you can learn a lot about how Kubernetes actually works and why you would want to have certain configurations and setups and things like that. So yeah. Use it as soon as possible. [laughter]
Ben: 00:32:10.512 Okay. Great. And then I think, hey, we had a segue into my slide, which is Many Misconfigurations Around Access Control, which I can segue now. And so thanks, Anaïs. Be online to answer questions at the end as well. And so just to reiterate, the Red Hat State of Kubernetes security report from this year also said that most security instances, 46%, were a result of misconfigurations, much more than vulnerabilities, which is sort of interesting. It's not zero-day that gets you. It's someone leaving the front door open by accident. And one thing that we found at Teleport is often when you start a vanilla cluster, maybe deploy MicroK8s, or use one of the shelf's Kubernetes distributions, it makes it very easy to get started. And by making it very easy to get started, there might be some defaults which may not be great. And we recently made a video on securing a default Kubernetes cluster, five tips. You can scan both these QR codes to get the videos and also be in the show notes below.
Ben: 00:33:25.927 And I think this is sort of a segue into the question: is it a default, or is it a misconfiguration? One classic example of this is that when you have access to the control plane, you will often have access to the cluster-admin role through the system:masters group. And the system:masters group is sort of a super admin which can do pretty much everything in your cluster. And you will often need a role with this many permissions to run security scanning tools. And I think this is why it's important to have an open-source security scanner, because you do give it a lot of privileged access to your cluster. There's actually a great blog post from the Aqua team on why you shouldn't use system:masters. And this outlines some of the other options for using service accounts and really limiting who has access to those roles.
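[Editor's sketch] One way to act on the system:masters point is to list which subjects end up with cluster-wide admin rights. The Python below is a minimal sketch, not anything shown in the webinar. It assumes the input is the parsed `items` list from `kubectl get clusterrolebindings -o json`; the sample bindings (including the `ci-admin` and `viewers` names) are hypothetical.

```python
from typing import Dict, List

# Groups and roles treated as "super admin" for this sketch.
RISKY_GROUPS = {"system:masters"}
ADMIN_ROLE = "cluster-admin"


def find_risky_bindings(bindings: List[Dict]) -> List[str]:
    """Return one line per subject that is in a risky group or bound to cluster-admin."""
    findings = []
    for binding in bindings:
        name = binding.get("metadata", {}).get("name", "<unnamed>")
        role = binding.get("roleRef", {}).get("name", "")
        for subject in binding.get("subjects") or []:
            in_risky_group = (
                subject.get("kind") == "Group"
                and subject.get("name") in RISKY_GROUPS
            )
            if in_risky_group or role == ADMIN_ROLE:
                findings.append(
                    f"{name}: {subject.get('kind')} {subject.get('name')} -> {role}"
                )
    return findings


# Hypothetical sample data in the shape of ClusterRoleBinding objects.
SAMPLE_BINDINGS = [
    {
        "metadata": {"name": "cluster-admin"},
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
        "subjects": [{"kind": "Group", "name": "system:masters"}],
    },
    {
        "metadata": {"name": "ci-admin"},
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
        "subjects": [{"kind": "ServiceAccount", "name": "github-actions"}],
    },
    {
        "metadata": {"name": "viewers"},
        "roleRef": {"kind": "ClusterRole", "name": "view"},
        "subjects": [{"kind": "Group", "name": "developers"}],
    },
]

if __name__ == "__main__":
    for line in find_risky_bindings(SAMPLE_BINDINGS):
        print(line)
```

A list like this only shows who holds broad rights; deciding which bindings to replace with narrower roles is still a manual review.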
Authorization vs authentication misconfigurations
Ben: 00:34:15.282 And in Teleport, we see two types: authentication and authorization misconfigurations. AuthN is getting access to the API, and AuthZ is what people can do with the API. And so a common one you saw: I think Anaïs said there's something like 900,000, maybe a million Kubernetes APIs on the public Internet. And some of these also allow anonymous auth, and it's obviously not great to allow anonymous users to interact with your API. And then also just not securing it. You want to have a firewall. You maybe want to limit the IP addresses. You may not want to put it on the public Internet, as an example. And then what you do once people have access to the API is follow the principle of least privilege. And this is around creating service accounts and users which can only perform a limited set of actions within the cluster. For example, not using system:masters. And then not auditing service accounts. Along with human users, you might have multiple other systems. It could be CI/CD. It could be other scripts. You might create long-lived service account tokens for them. And that has other problems, and I'll dive into this. And then another example is container runtime security. I think Anaïs covered this, like running root containers. And lastly, not keeping track of logs, so not knowing what people are doing within your cluster.
Ben: 00:35:49.808 And for people who are sort of new to Kubernetes, there is a whole sea of different authentication methods that you can use, going from tokens to authentication proxies to passwords. One of the best and most secure ways is using certificate-based authentication, X.509, but there are a few caveats around using certificates for authentication. I would highly recommend looking at OpenID Connect and also service account tokens for short-lived tokens, and also tying it back to identity. And then you have to make sure that everything is well-maintained, so making sure you have user accounts and service accounts for the people who have access to the cluster.
How to run Trivy
Ben: 00:36:39.042 So how do you run a tool like Trivy? I think we kind of already saw this demo from Anaïs. She downloaded Trivy. She already had the kubeconfig. You run Trivy. This is a great way to get started. And within the Kubernetes command, there are a few other options. So you can pass in different kubeconfigs, or you can pass in the namespace to reduce the scan. But how do you get these kubeconfigs? So depending upon your provider and how you have Kubernetes installed, there's a range of tools that you use. So if you're running something locally, like Rancher or MicroK8s, they have tools to export your kubeconfig locally. If you're using a cloud hosting provider, you'll be quite familiar with commands such as the EKS kubeconfig export. And at least I know for the EKS one, this exported certificate is valid for 24 hours, whatever authentication process you have with AWS.
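[Editor's sketch] The options mentioned here (a report type, a different kubeconfig, a namespace) can be pictured as a small command builder. This is a hedged Python sketch that only assembles an argument list and does not run anything. The flag spellings (`--report`, `--kubeconfig`, `--namespace`) and the trailing `cluster` target follow Trivy releases from around the time of this webinar and are assumptions here; they have changed in later releases, so confirm them with `trivy k8s --help` for your version.

```python
from typing import List, Optional


def trivy_k8s_command(
    report: str = "summary",
    kubeconfig: Optional[str] = None,
    namespace: Optional[str] = None,
    target: Optional[str] = "cluster",
) -> List[str]:
    """Build (but do not execute) an argument list for a Trivy Kubernetes scan.

    Flag names are assumptions based on older Trivy releases.
    """
    if report not in ("summary", "all"):
        raise ValueError(f"unsupported report type: {report}")
    cmd = ["trivy", "k8s", "--report", report]
    if kubeconfig:
        cmd += ["--kubeconfig", kubeconfig]  # scan with a specific kubeconfig
    if namespace:
        cmd += ["--namespace", namespace]  # narrow the scan to one namespace
    if target:
        cmd.append(target)  # older releases expect a target such as "cluster"
    return cmd


if __name__ == "__main__":
    print(" ".join(trivy_k8s_command()))
    print(" ".join(trivy_k8s_command(report="all", namespace="default")))
```

Passing `target=None` drops the trailing target for releases that no longer accept it.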
Potential problems with kubeconfigs
Ben: 00:37:41.145 But there are some key problems with kubeconfigs. kubeconfigs can't be revoked that easily. So in the case of the 24-hour one from EKS, that kubeconfig is pretty much valid for the 24 hours. I don't believe you can reduce the time. And you have to be careful that sometimes you can get a very broad range of permissions, which can lead to dangerous operations, especially if you're switching between multiple clusters. I run tests all the time, so I run multiple clusters, and it's about knowing what my current context is. Am I doing this in my staging cluster or my production cluster? And then lastly, there's knowing who's interacting with the API.
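[Editor's sketch] The "which context am I in?" problem can be guarded against before a script runs anything destructive. The Python below is a minimal sketch that assumes the kubeconfig has already been parsed into a dict (kubeconfig files are YAML, so a YAML parser would be needed in practice). The cluster names `staging` and `production` and the user name are hypothetical.

```python
from typing import Dict, FrozenSet


class UnsafeContextError(RuntimeError):
    """Raised when the current context points at a protected cluster."""


def current_cluster(kubeconfig: Dict) -> str:
    """Return the cluster name referenced by the kubeconfig's current-context."""
    current = kubeconfig.get("current-context")
    for entry in kubeconfig.get("contexts", []):
        if entry.get("name") == current:
            return entry["context"]["cluster"]
    raise KeyError(f"current-context {current!r} not found in contexts")


def require_non_protected(
    kubeconfig: Dict, protected: FrozenSet[str] = frozenset({"production"})
) -> str:
    """Return the current cluster name, or raise if it is a protected cluster."""
    cluster = current_cluster(kubeconfig)
    if cluster in protected:
        raise UnsafeContextError(f"refusing to run against {cluster!r}")
    return cluster


# Hypothetical kubeconfig with two contexts, currently pointing at staging.
DEMO_KUBECONFIG = {
    "current-context": "staging-admin",
    "contexts": [
        {"name": "staging-admin", "context": {"cluster": "staging", "user": "ben"}},
        {"name": "prod-admin", "context": {"cluster": "production", "user": "ben"}},
    ],
}

if __name__ == "__main__":
    print("safe to run against:", require_non_protected(DEMO_KUBECONFIG))
```

A check like this does not replace scoped permissions; it only catches the case where the wrong context is active by accident.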
Ben: 00:38:21.798 So this is where I'm going to introduce Teleport. Teleport provides identity-native access for engineers and machines, and here it's how I get access to the kubeconfig. So I'm going to show you the experience of using Trivy with Teleport and bring my terminal here. Let me make this a bit bigger. And this is sort of how I start my day. So I'm going to log into my Teleport cluster. And I have a couple of Kubernetes clusters. You can see this one's trying with Autopilot. And I need to log in to my Kubernetes host. And so I can now get pods. I can, let's say, exec. I can just go about my day interacting with the Kubernetes API. And since I have the local kubeconfig, I can now also run Trivy. And so I'm going to run the Kubernetes scan, using the k8s command with the summary report, on the MicroK8s cluster that I have here. And it's just running. I believe this is downloading the —
Anaïs: 00:39:44.573 If you haven't used Trivy within like 6 hours, it will re-download the database. So every 6 hours, if you're not — you can also define a different time period, just mentioning it. Because especially for the K8s command, what happens is — it's doing the secret scanning, which takes a lot of time. I mean, among all the other scans. But you could also turn off some of the secret scans, but just mentioning.
Ben: 00:40:15.820 Okay. Thank you, Anaïs. That was very helpful. You can see on my screen I have the vulnerabilities. You can see I have a few critical ones here, on my ingress controller and my shell pod. And so now we can go: is it --report all to get the more detailed response?
Anaïs: 00:40:43.528 Yeah. I think so.
Ben: 00:40:44.817 While that runs: Teleport also comes with a web UI. Under the hood, you can see that we have the activity. So all these Kubernetes requests here are the requests that Trivy is making on my behalf. So you can see it is looking for cluster role bindings. This one's looking at cluster roles. And so all of this activity is also audited along with the API. And another benefit of using Teleport is that it also records kubectl execs. So you can see this was the kubectl exec that I performed. And for engineers and developers, your day-to-day flow is very smooth. You can go about your work with just a few changes to how you're getting your kubeconfigs. But from a security and also compliance perspective, it makes it very easy to see what's happening to your cluster, and you also have an audit log of all of the Kubernetes commands that you're running. And so you can see here my MicroK8s cluster has a lot of misconfigurations. And actually, one of my favorite additions from Aqua, which I don't think Anaïs went into, is that there's always a page that you can go to, and it gives you more detailed information to jump off and learn more about the impact of different vulnerabilities, which I found super helpful.
Ben: 00:42:16.287 And so just to summarize, I don't think you saw me going through the GitHub authentication since I was already logged in, but all of my users are backed by GitHub SSO, and I had already authenticated for the day. I can do tsh status. You can see I have access for 8 hours for my certificate. And then you can also see I have the not-great system:masters group, but this is also something that can be customized on a per-role basis through the Teleport role mapping. Roles are YAML files, but I'll show you the UI here. There are options to add Kubernetes groups and Kubernetes users and then also use labels for role-based access control.
Ben: 00:43:06.471 So yeah. Demo using Trivy locally: I've done this. So just to summarize, Teleport consolidates the connectivity, the authentication, the authorization, and the audit, and I showed all four of those. So this was running it locally, which is great to get started. One thing that you might want to do, between the operator and running locally, is run it in, let's say, a CI/CD service such as GitHub Actions. And I think your option one is that you create a service account, you export a long-lived service account token, and then you upload it to GitHub Actions. And then the second option is that we get the same flow. You register your cluster in Teleport. You create a GitHub Actions bound token, and then you use GitHub Actions. So for this flow, I've done the same thing, but instead of myself being the user accessing my Teleport cluster and my kubeconfig, I'm going to let GitHub Actions run and obtain the certificates for me using a new addition, which launched last year, called Machine ID. And what's kind of great about this addition is that it's also completely secretless. Let's rerun this. I can run it live. And so I'm going to run this job here. And this job is going to just run the Trivy summary command, and it's going to fetch some binaries. We have our own GitHub action. Let's see how quick it's going to be. All right. Maybe I'll show you a complete one.
Ben: 00:44:48.751 So if I come back... oh, I think I'd have to let it run. So this is just configuring things. It's going to create a few different things. I'll show you the credentials afterwards. As this runs, I will come back and show you. So the old way, we talked about this. You do a kubectl create serviceaccount for GitHub Actions. You create the service account and you export it. You have long-lived kubeconfigs, which are bad. It's also impossible to rotate X.509 certificates within Kubernetes without breaking a lot of other things. So I really recommend using short-lived credentials. And then also, there's not a huge amount of visibility into these credentials. If you're looking to use Teleport Machine ID, the process flow is the same. You create a new token. You have a join method, which is GitHub. And so there's actually no secret needed for this one. The token isn't a secret. We have GitHub Actions, and it runs in CI/CD. So let's see where we are.
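[Editor's sketch] The difference between long-lived and short-lived credentials can be checked from the token itself. Service account tokens are JWTs, and this minimal Python sketch reads the `exp` claim to classify a token. It assumes that a token without an `exp` claim never expires (which is how legacy secret-based service account tokens behave), it does not verify signatures, and the one-hour threshold and the demo token are hypothetical.

```python
import base64
import json
import time
from typing import Dict, Optional


def token_claims(token: str) -> Dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def classify_token(
    token: str, now: Optional[float] = None, max_lifetime_seconds: int = 3600
) -> str:
    """Classify a token by how long it remains valid."""
    if now is None:
        now = time.time()
    exp = token_claims(token).get("exp")
    if exp is None:
        return "long-lived (no expiry claim)"
    remaining = exp - now
    if remaining <= 0:
        return "expired"
    if remaining > max_lifetime_seconds:
        return "long-lived"
    return "short-lived"


def make_demo_token(claims: Dict) -> str:
    """Build an unsigned, demo-only JWT-shaped string for illustration."""
    def encode(obj: Dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{encode({'alg': 'none'})}.{encode(claims)}.demo-signature"


if __name__ == "__main__":
    now = time.time()
    print(classify_token(make_demo_token({"sub": "ci", "exp": now + 600})))
    print(classify_token(make_demo_token({"sub": "legacy-ci"})))
```

This only inspects what a token claims about itself; it says nothing about whether the cluster would still accept it.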
Old way vs new way
Ben: 00:45:57.538 Okay. So now we're running. I think we run into the same thing. We're downloading the database here. But if I come up, let's see. You can see I have access to that shell demo that I showed you. And then the same as with all human actions, these actions are also recorded in Teleport itself. So you can see the GitHub Actions bot made this request. So there's full audit visibility into the user accessing it. Oh, and then we see it has scanned. And the formatting is a bit weird, but it's the same results. I have a lot of critical issues that I should resolve in my Kubernetes cluster. And so there are some benefits around how it rotates and how it retains the kubeconfigs. It's also secretless. There are no long-lived tokens that you have to worry about rotating. And if you run this on Machines, it also supports our certificate authority rotation as well, which makes that very smooth.
Ben: 00:47:05.772 So I think the third way, which Anaïs already covered, was the Trivy operator. This is a great way to install it and have constant visibility into what's happening. And then last up, fixing the misconfigurations. I think I mentioned --report all. This is a great way to see links to the vulnerability database, which links out to other places. It's very helpful. I think actually someone asked this in the chat. "Can you view the logs?" I guess there's the --output flag to save the results to a file. And it's similar to when, let's say, you have lots of tests that are failing. I think the best way to start is to create a baseline of how many security issues you have. You might check out the Prometheus dashboard, track them, make sure that you're hitting the ones that are key to your business, and try to resolve the most important ones that are relevant to you, but also keep track to make sure that you're not deviating further from the baseline.
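[Editor's sketch] The baseline idea can be reduced to counting findings per severity and flagging any severity that got worse. The Python below is a minimal sketch under a loud assumption: findings are represented as simple dicts with a `severity` key, which is a hypothetical shape for illustration and not Trivy's report schema.

```python
from collections import Counter
from typing import Dict, Iterable

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW")


def count_by_severity(findings: Iterable[Dict]) -> Counter:
    """Count findings per severity level."""
    return Counter(f["severity"] for f in findings)


def regressions(baseline: Counter, current: Counter) -> Dict[str, int]:
    """Return how many findings were added per severity relative to the baseline."""
    return {
        sev: current[sev] - baseline[sev]
        for sev in SEVERITIES
        if current[sev] > baseline[sev]
    }


# Hypothetical findings from two scans of the same cluster.
BASELINE_FINDINGS = [
    {"id": "A", "severity": "CRITICAL"},
    {"id": "B", "severity": "CRITICAL"},
    {"id": "C", "severity": "HIGH"},
]
CURRENT_FINDINGS = BASELINE_FINDINGS + [
    {"id": "D", "severity": "CRITICAL"},
    {"id": "E", "severity": "LOW"},
]

if __name__ == "__main__":
    worse = regressions(
        count_by_severity(BASELINE_FINDINGS), count_by_severity(CURRENT_FINDINGS)
    )
    print("new findings since baseline:", worse)
```

A CI job could fail when `regressions` returns anything for the severities the team cares about, while the existing backlog is worked down separately.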
Protecting internal dashboards
Ben: 00:48:10.354 Oh, and then I have a bonus one. So I think prior to this webinar, Anaïs shared this tweet. What do people see as critical in these misconfigurations, and what should you avoid? And one thing that we see a lot is people exposing dashboards. So it could be the Kubernetes dashboard. It could be any internal dashboard that also inherits a very privileged role within Kubernetes. Anaïs, do you have any other favorite ones that you saw from your tweet?
Anaïs: 00:48:38.736 I mean, it's really just give your containers access to everything in your workload, to make your life easy and expose endpoints, so other people can manage your resources effectively, things like that.
Ben: 00:48:52.258 Yeah. And so this is like a bonus one. So along with accessing the Kubernetes API, you can also use Teleport to access internal applications. And this means that you don't have to put these applications on the Internet. You can host them within Teleport, launch it, and run it as a sidecar to get access to these internal apps. So here's an example wiki. You see there's a URL that's publicly available, but it's protected. You have to go through the authentication. So if you do have dashboards that you want to provide to a wider team, this is a very easy way to give them access without having to open up an extra ingress gateway that you then have to manage, firewall, and limit.
Next steps and Q&A
Ben: 00:49:38.414 So I think next steps are... I have a typo here, so these ones are the wrong way around. We have Trivy, aquasec/trivy, and then gravitational/teleport, the two places where you can download and try them. So this kind of brings us to the end of the webinar. I know we've been answering questions as we've gone. Let me look through the remaining ones. So feel free to ask us questions, and we'll go through them now. I think Roy asked about a static password file. In the case of Teleport Machine ID, there are tokens, but the tokens aren't secrets. In the way in which you create them, you limit which GitHub Actions can join. And it's a join method for GitHub Actions specifically. We also support CircleCI, GitLab, and AWS IAM token join methods. I think we had another question. "How can we see the K8s scanning results?" I think that's mainly through the CLI output. We can save it to a file. Any other methods for the results?
Anaïs: 00:51:04.460 So the K8s scanning in the cluster will produce CRDs, Kubernetes custom resource definitions, which are basically YAML reports. And then you could get the YAML reports out of the cluster as an alternative, or visualize them through Grafana or another tool.
Ben: 00:51:26.017 Okay. It looks like there are no more questions here. I'll give it a couple of minutes, since there's always a last-minute question. So feel free to type them in. All right. Well, if there are no more questions from anybody, feel free to reach out to both me and Anaïs. What's your best contact information, Anaïs, if people want to get in touch with you and ask more about Trivy?
Anaïs: 00:51:54.789 Usually, it's Twitter. But with Twitter being in flux, you can always join the Aqua Security Slack. So if you look for slack.aquasec.com, that will lead you to our open-source Slack. If you have any questions, we have the channels there as well. Yeah. [laughter]
Ben: 00:52:16.744 Yeah. And the same for Teleport, we have a community Slack. It's goteleport.com/slack, and that will take you to our Slack channel, although you can probably also open GitHub Issues if you have any issues, which actually helped me when I was preparing this webinar. So it's very helpful. So thank you everyone for joining today. Do you have any last closing thoughts or comments, Anaïs?
Anaïs: 00:52:43.646 This was great. Yeah. It's great to see more people joining this webinar. Yeah.
Ben: 00:52:49.881 Oh, and I have one last question around, "Does Teleport integrate with CircleCI?" Yes. It does. And I think this actually just came out in 11.3. And so we're looking for people to try it out. I'm happy to give you a demo about that as well. Okay. Great. All right. We're going to call it —
Anaïs: 00:53:12.285 Thank you.
Ben: 00:53:12.383 — everybody. Have a great rest of your day, rest of your evening. Thank you so much for joining today.
Join The Community