
Overview

In a hybrid work setup, workers should be able to authenticate themselves securely in the virtual environment. However, identity theft and technologies like deepfakes mean that securing identities remains a major challenge. Organizations want to ensure their identities are safe and hack-proof. Join industry-leading practitioners and experts to learn how to protect identities, with topics including:

  • Implementing identities in the cloud
  • The impact of zero trust architecture on identity
  • The future of passwords and biometrics

Zero Trust - Replacing Depth with Logic | Identity & Access Management eSummit 2022

(transcript)

Blake: 00:00:00.000 Thank you for joining us. I'm Blake. I work with Teleport as a solutions engineer. We provide access to all of your infrastructure, anywhere with a zero trust solution, and that's what we're going to be talking about today is Zero Trust and how you can replace depth with logic, both on a general scale, and we'll take a look at Teleport as well. Now we're going to be talking about, again, how access is handled today, taking a look at a couple of case studies, talk about my own experience having worked in DevOps, and then dive into how things can be solved going forward. Now, as far as how things are handled today, in a typical legacy enterprise environment, we're relying on password vaults, shared keys, VPNs, and that typically comes with an ununified auditing and traceability aspect to that. By that, I mean as you're moving between Kubernetes to SSH into your Linux environment to accessing your database, being able to maintain an end-to-end trace of what somebody did throughout their work session is something that you typically can't see in any normal access solution. And then different solutions for different platforms is often a big problem, especially when we're looking at multi-cloud environments that everybody's working in today, whether AWS or GCP or Azure, as well as private cloud or self-hosted. Being able to address all of your infrastructure from a single point is definitely key when you're considering how you can unify access in a safe way.

What's wrong with the legacy access solutions?

Blake: 00:01:28.084 So what's wrong with the legacy access solutions? Well, I can tell you just from my own experience, I was DevOps at JPMorgan Chase for several years as an engineer within the hedge-fund industry. And then even before that, I was at General Growth Properties, a real estate investment trust. And everywhere we were relying on shared passwords. And typically how that would work is I'd have to get a ticket number, go to a password vault, pull out my password, and access my infrastructure. Now, that whole process — to get the ticket, to meet all the ticket requirements, to go and get the password, and meet the documentation requirements — that would take several minutes — 5, 10 minutes sometimes depending on what kind of ticket I needed to make. So if somebody had already pulled out that password on my team, I might just ping them, ask them for the password, and save me time so I can get into the infrastructure and fix the problem sooner which was an innocent ask, I guess, on my part. But it's an easily exploited loophole that somebody might be building within your just-in-time access solutions. Now, keys are also commonly targeted vulnerability. Sometimes up to 35% of SSH traffic in the world is actually just scanning for keys. Now, keys are something that you can harden, but again, they are something that once compromised are not easily revoked. Now, maintaining complete traceability is not likely with legacy access solutions, which is something I just covered. But yeah, again, moving between Kubernetes and seeing what somebody is doing in an exec session, and then following them to the next stop, which might be an SSH session to a Linux host, and then to the next stop to Windows server, a database. Maintaining that end-to-end traceability of what somebody is doing throughout their workday is not easy to accomplish. And that is something that somebody — that is something that a lot of progressive enterprises are definitely desiring today.

Equifax 2017

Blake: 00:03:24.197 Now, typical solutions today are going to lean more on the side of security when you're trying to balance out security and agility. And with that, you're definitely addressing your legal risk, but it's leaving the door open for reputational and financial harm. And that can really kind of metastasize in a number of ways. But if we take a look at Equifax in 2017, definitely some irreparable financial and reputational harm was done here. And this was all done because an admin/admin username and password were being used for a dispute portal Equifax had, and that provided public-facing customer data. And behind that, there was a lack of auditing and session traceability. No multi-factor authentication. Customer names, addresses, Social Security numbers, a lot of personally identifiable information was compromised. Ultimately, $700 million in fines were levied against Equifax, and it was pretty significant. And this worked as a case study to not only update your password beyond the default of admin/admin, but also increase complexity in general. Implement your multi-factor authentication. Find solid, robust auditing tools. But regardless of how strong your password is, a password is always going to be a vulnerability. So rather than just changing admin to something with two symbols and a capital and a number, what if we could replace it, so that nothing could be brute-forced, and use an easily revocable certificate instead? And we can take a look at that in just a minute.

Colonial Pipeline 2021

Blake: 00:05:16.161 But I also wanted to talk about Colonial Pipeline back last year, 2021. And this was just a compromised password that was found on the Dark Web, which is very, very common nowadays, and a VPN connection that didn't require multi-factor. So this Russian hacker was able to buy the password, I assume, for very cheap, and was able to make a connection to the Colonial Pipeline network. There was no additional layer of multi-factor. There was no additional layer of security. So it's kind of once you get in, you're in. And what did that result in? That resulted in gasoline shortages, a surge in prices, and a five-day supply chain disruption for Colonial. And that came with a lot of reputational harm, a lot of financial harm, both for Colonial and for the world. But this, again, proved a couple of things here. This proved that any password is potentially compromised. They did their best to investigate. They couldn't find the source of the leak. They went back and reviewed his e-mails and chats, couldn't find any instance of social engineering, and they did determine that his password was plenty complex. So in the end, investigators were not able to find a root cause as to how the exploit of the password happened. So again, there's just an inherent vulnerability anytime you're going to be using passwords for anything. And just to kind of circle around to current events, I mean, with the Russian hacking, obviously there's a lot of traffic coming from over there right now as it relates to ransomware. Being able to step up to a Zero Trust approach to your access, especially as it relates to critical infrastructure, is really key at this point in time. If you aren't protected, you're leaving the door open, which is where Teleport can step in. Now, replacing passwords with short-lived certificates is fantastic, but these certificates, on top of just replacing passwords, are revocable. They are able to be used across multiple protocols. 
So as it relates to Kubernetes, SSH into Linux and Windows server access, database access, application access, many different AWS integrations, we remove passwords from all of these different scenarios.

How Teleport works

Blake: 00:07:41.406 And the way Teleport does this is you're going to start over here on the left, interfacing with Teleport in one of a couple of ways. There's tsh. That's our command-line utility. There's also going to be a GUI client we're going to take a look at. And then there's Teleport Terminal, and that's going to be a new product that's going to be released relatively soon for more of a unified Kubernetes and database experience. And we'll take a look at where that will come in in a minute as well. But after you reach out to the Teleport proxy, it's going to reach out to the Teleport authentication service, either authenticating locally or reaching out to your SSO provider. Based on those group claims from your OIDC or SAML provider, one or multiple Teleport roles will be applied. So each role has a time to live based on the principle of least privilege. The lowest time to live is going to be applied to the X.509 TLS certificate that gets issued. And that certificate can be leveraged for SSH into your Linux hosts, your Windows servers, Kubernetes, databases, as well as applications and those AWS integrations I had mentioned. Now, with the certificate, one extra benefit, especially as it relates to traceability, is there's a lot of metadata about the user, and that's going to be the certificate that's leveraged across the entire Teleport session. So again, for that end-to-end traceability and finding out where your user went, it's really unbeatable. Now, as far as just-in-time access, up here we support a few different solutions out of the box, like Slack, Mattermost, Jira, PagerDuty, GitLab, and e-mail. And then it's just very, very customizable beyond that. So Slack is definitely our more common integration. So I, as a user, have access to request specific roles. You, as a manager, have access to approve requests for specific roles. 
And if it's ever appropriate, we can require multiple levels of management to approve an access request before that elevation is granted. So I as a user can go ahead and generate the request. A notification will get sent off to a Slack channel. I, as a manager, can go ahead and click on it, hop in and approve. Now with PagerDuty, it's a little bit more comprehensive where you can schedule out people's shifts. So as people work on-call, as people are working on the weekends, it will automatically approve the elevated access request, at least whatever level you think is appropriate. Now, at my last workplace, I would have to go to ServiceNow, create a ServiceNow ticket, go to my password vault, supply the ticket, get the password, and then access my infrastructure. Now that same kind of workflow can be accomplished here. I did build out that integration where I provided a ServiceNow ticket to Teleport. It goes and checks to make sure it's within SLA, it meets priority requirements, and then it'll automatically approve. And if it doesn't meet requirements, it'll automatically deny, or if it doesn't sense a ticket number, it'll just skip and wait for manual processing. And I'm only saying that just to kind of illustrate how customizable it is and really show that whatever your access workflow is currently, we're very likely to be able to support it.
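The ticket-driven workflow described above (approve when the ticket meets SLA and priority requirements, deny when it does not, and leave the request for manual review when no ticket number is found) can be sketched roughly as follows. This is not the integration shown in the talk: the `Ticket` fields, the `INC` number format, the priority threshold, and the choice to leave unknown tickets pending are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class Decision(Enum):
    APPROVED = "approved"
    DENIED = "denied"
    PENDING = "pending"  # left for a human approver


@dataclass
class Ticket:
    number: str
    priority: int      # hypothetical scale: 1 is the most urgent
    within_sla: bool


# Hypothetical ticket-number format, used only for this sketch.
TICKET_PATTERN = re.compile(r"\b(INC\d{7})\b")


def decide(reason: str, tickets: Mapping[str, Ticket], max_priority: int = 2) -> Decision:
    """Map the free-text reason on an access request to a decision."""
    match = TICKET_PATTERN.search(reason)
    if match is None:
        # No ticket number detected: skip and wait for manual processing.
        return Decision.PENDING
    ticket = tickets.get(match.group(1))
    if ticket is None:
        # Assumption: a ticket we cannot look up is also left for a human.
        return Decision.PENDING
    if ticket.within_sla and ticket.priority <= max_priority:
        return Decision.APPROVED
    return Decision.DENIED
```

In a real deployment the `tickets` mapping would be replaced by a lookup against the ticketing system, and the decision would be sent back to Teleport by whatever plugin handles access requests.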

Teleport demo

Blake: 00:10:41.988 All right, I'm going to go ahead and log in with Okta. I will go in as my Admin account here. All right. So one thing that's going to be pretty consistent throughout your Teleport experience is this label-based access, and that really helps us break down that access as granularly as possible, especially when we're using the RBAC controls, and we'll take a look at that in a minute. But these labels can be simple, like environment equals prod, or they can be a little bit more dynamic, like hostname equals the output of the command hostname. We also have integrations with AWS that we're able to help you build out where you can automatically pull in EC2 tags and turn those into labels. So as you onboard your instances, it makes permissioning really, really easy and seamless, especially when you consider that we also have EC2 Auto Discovery, again, where you're onboarding EC2 instances and they'll automatically be spun up with Teleport and connected to your cluster. So both are things to consider, and we're going to circle back and take a closer look at SSH in a minute. And this is applications. So this is going to be covering a lot of our AWS integrations. So AWS console access and CLI access, and then, more on like a base layer, it is just proxying public web traffic to your private instances. Now for apps like Grafana or any other JSON web token enabled app, if there's a like-for-like username, so if carlos.admin, for example, exists on both sides, we go ahead and pass that JSON web token over to Grafana for seamless login. And with apps like Jenkins that natively have a reverse tunnel proxy, or other apps with similar integrations, we can support those as well for seamless integration. Now with AWS, what we have here is we've deployed an EC2 instance within my VPC. That EC2 instance has a role that says it can assume a whole suite of roles. Now Teleport takes that access, breaks it down, and federates it across Teleport roles. 
So from Teleport, I'm able to access the AWS console and just assume any of these roles directly through my browser. Or I can also do the same from the AWS CLI. So any interaction with the AWS STS API, I'd be able to do through Teleport as well.
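To make the static versus dynamic label idea from the start of this segment concrete, here is a minimal sketch in which a static label is a fixed key/value pair and a dynamic label takes its value from the output of a command. The function name `resolve_labels` and its arguments are hypothetical; this does not show how Teleport itself evaluates labels.

```python
import subprocess
from typing import Dict, List


def resolve_labels(static: Dict[str, str], commands: Dict[str, List[str]]) -> Dict[str, str]:
    """Combine fixed labels with labels whose values come from command output."""
    labels = dict(static)
    for name, argv in commands.items():
        # Run the command without a shell and use its trimmed stdout as the value.
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        labels[name] = result.stdout.strip()
    return labels
```

Calling `resolve_labels({"env": "prod"}, {"hostname": ["hostname"]})` on a host where the `hostname` command is available would give an `env` label of `prod` plus a `hostname` label holding whatever that command prints.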

Blake: 00:12:55.540 So let's go ahead and click here. Let's hop right in. And there we are. We're in our AWS console there. Now, as far as user administration, none needs to be done for carlos.admin on the AWS side. That's just all going to be handled through those IAM roles, Teleport, and your SSO provider. Now, if we move on to Kubernetes, like I mentioned, Kubernetes and database are both going to be kind of a less unified experience just for the short term. Right now, we are releasing Teleport Terminal, and that's coming out, I think, very soon, kind of within the next couple of weeks or month. But for right now within the GUI, you can see what you have access to. It's going to provide us some information on how to log in. And we'll do that in just a minute and take a look at how to access both database and Kubernetes. Now here's our desktop access. The way that we provide passwordless access to Windows servers is going to be with an LDAPS connection, leveraging a service account, and publishing the Teleport CA to the [inaudible] container. And then we're taking all the user data during login and virtualizing that with a smart card. So when we go in here.

Blake: 00:14:13.951 All right. So now in here is app support. As you see at the top, I have clipboard sharing disabled. That is just to make sure that my user cannot export any data. I've determined that this is going to be a secured terminal. So actually, in this case, I think I have, let's see, MySQL Workbench in here. Actually, one second. Let's hop back out and we're going to log in as a different user. But I think I have it set up over here. So let's log in as administrator. All right. So one thing I just wanted to kind of address is an additional use case here. So we are logged in. Oops. Let's go ahead and close that out. We are logged in as administrator on this Windows host. Again, I've described this as a secure terminal. And the reason that we're doing that is, well, we locked down clipboard sharing, so people accessing the database can't export that data, because this looks like personally identifiable information that includes dates of birth, grades, and gender. So this isn't necessarily information that we would want somebody to be able to export, back up, or copy and paste out of here. So we've disabled clipboard sharing. We do have session recording turned on, and we'll take a look at that as well. And then I also have this connection here. Let's see if we go back over here. This connection here, as you can see, is also going through a Teleport cluster. This cluster is exposed to my cluster privately. So the only way I'm able to access this database is by logging into the secured server without clipboard sharing and with session recording enabled. And that really ensures a maximum level of traceability and accountability as your users are accessing the most secure data in your environments.

Blake: 00:16:23.819 Now, if we keep going on here, we can take a look at our activity here, and we're going to circle back. So we have active sessions, we have session recordings, we have audit logs. We'll take a look at all those in just a minute. And then as far as roles, just to kind of give you an idea, we can hop right in here. Let's go to our access role. So the access role is kind of showing you how the label-based access works. A lot of this is all wildcarded for easy use on my end right now. But we have our app labels, our database labels, names, users, Kubernetes groups, labels, users, SSH logins, and the node labels that will determine what the user has access to as far as SSH-ing. We do have enhanced recording, which, as I mentioned earlier, is going to be that kernel-level logging. So as far as being able to capture those commands that users are running within their session, have them stored in a JSON format, and export them to Splunk, or Datadog, or Elasticsearch, and really build out actionable alerting on that data, you'd be able to do that with enhanced recording. And then down here at the bottom, finally, we have our max session time to live. That's the time to live that gets applied to my X.509 TLS certificate. So let's see if we hop over here. Let me hide this real quick. Here we go. All right. We're going to hop back in over here, and we're going to take a look at SSH as well as some of the access controls here. So if we hop in, we're going to hop into a different browser here, and log in as a contractor, and walk through the whole access control workflow.
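The label-based access in the role above can be pictured as a selector on the role being compared against the labels on a resource, with `*` standing in for "anything". The sketch below is a simplified model under assumptions, not Teleport's actual matching code: the function name, the handling of an empty selector (treated as granting nothing), and the rule that a wildcard key is only honored with a wildcard value are choices made for illustration.

```python
from typing import Dict, List, Union

LabelSelector = Dict[str, Union[str, List[str]]]


def labels_match(selector: LabelSelector, resource_labels: Dict[str, str]) -> bool:
    """Return True when every entry in the role's selector matches the resource."""
    if not selector:
        # Assumption for this sketch: a role with no labels grants nothing.
        return False
    for key, allowed in selector.items():
        values = [allowed] if isinstance(allowed, str) else list(allowed)
        if key == "*":
            # A wildcard key is only accepted together with a wildcard value.
            if "*" not in values:
                return False
            continue
        if key not in resource_labels:
            return False
        if "*" not in values and resource_labels[key] not in values:
            return False
    return True
```

With this model, a fully wildcarded role such as `{"*": "*"}` matches every node, while `{"env": ["prod", "staging"]}` only matches nodes whose `env` label is one of those two values.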

[silence]

Blake: 00:18:20.055 Okay, cool. So, whoops, I logged in as the wrong user.

[silence]

Blake: 00:18:44.375 Okay. So here we are. We're logged in as Joe, the contractor. Behind here I still have Carlos, the admin logged in. And we're going to pull up my command line because we're going to do a side-by-side here in just a second. But over here is Joe, the contractor. I'm going to go over to my access request and take a look at these. I'm going to go ahead and just send two access requests just to show you the difference and how they might behave. So here we go. Send this one that just says test. All right. Then we're going to put in actually a couple here with ticket numbers just to show you how this might work. And then let me grab one other ticket number here.

[silence]

Blake: 00:19:40.411 All right. So now as we can see here, we've just sent these on. If I go ahead and refresh here, we can see that I have this one is approved, this one's denied, and this one's pending. So 8111 is a ticket that does not meet priority requirements, so Teleport automatically denied the ticket. Now the one ending in 005 does meet SLA and priority requirements for this level of role that was requested. So it went ahead and automatically approved that access request. And then it didn't sense a ticket number in this request reason, so it left it for manual processing. Now over here with our Slack integration, the way that you can kind of see how this would work is you can see how the statuses are updated as they get updated. So you can see what it said here, "Automatically approved," because it meets SLA and priority requirements, and the other one was denied. Now if we go up here we can see the other one with the reason of test that's still in pending, and it is still pending here. We can hop in as Carlos and go ahead and approve, and again apply a justification if we'd like, and go ahead and submit that. Now if we go over here now we can assume this role. We get this nice little eight-hour countdown, and this is based on the time to live within the admin role. Now I have the same access as Carlos does.
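The eight-hour countdown comes from the role's time to live, and, as described earlier in the talk, when several roles apply, the lowest time to live is the one applied to the issued certificate. A minimal sketch of that arithmetic follows; the role names and TTL values are made up for the example, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable


def effective_ttl(role_ttls: Dict[str, timedelta], assigned: Iterable[str]) -> timedelta:
    """Pick the shortest time to live among the roles a user holds."""
    ttls = [role_ttls[name] for name in assigned]
    if not ttls:
        raise ValueError("at least one role is required")
    return min(ttls)


def remaining(expiry: datetime, now: datetime) -> timedelta:
    """Time left on the certificate, never negative."""
    return max(expiry - now, timedelta(0))


# Example values only: an 'access' role at 12 hours and an 'admin' role at 8 hours.
roles = {"access": timedelta(hours=12), "admin": timedelta(hours=8)}
issued = datetime(2022, 1, 1, 9, 0, tzinfo=timezone.utc)
expiry = issued + effective_ttl(roles, ["access", "admin"])
print(remaining(expiry, issued + timedelta(minutes=2)))  # 7:58:00
```

Because the certificate simply expires, nothing has to be rotated or cleaned up when the elevated access is no longer needed.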

Blake: 00:21:06.066 Okay. So now that we have Joe logged in, I'm going to go over to the command line and we're going to do a quick SSH demo. So tsh status is going to show me who's logged in. So it looks like Carlos is logged in over here. I've got 7 hours, 58 minutes left. So now tsh ssh, let's hop in as root over on this host, and we're going to start typing a command. So we'll leave our ls there and we're going to go over here as Joe, the contractor. Now as long as Joe has access to the same user and the same host as Carlos, which we know he does, and he has access to active sessions, we can actually hop right on over here and take a look at what's going on. So if we hop in and join the session, we can see this ls that Carlos has left off. So now over here on the right, as Joe, the contractor, I can continue to finish the command line, and then Carlos can also just pick up right where Joe's leaving off as well. So it's really nice for onboarding, shadowing, and also for bringing on maybe external vendors. If you're bringing in anybody that you don't completely trust or know, and you need to provide them access to your infrastructure, you can actually enforce session moderation with that elevated access request. So as somebody logs into your infrastructure, somebody else's eyes have to be on the glass and watching what's going on. And again, that's just giving you that extra layer of safety and accountability. But now, if we go and exit here, we can go ahead and take a look at how this is all recorded. Let's see, it's under activity, audit log. And what we see here, if we go down a little bit, we can see Carlos' certificate get issued. So let's start there. This is all that information, that metadata that's tied to Carlos' certificate, all the information that we would need to know. And this is everything that's going to, again, add just an additional layer of traceability as you're moving between your various sessions.

Blake: 00:23:17.074 But we have that certificate that got issued. We can see when everybody got connected, when Carlos started the session, when Joe joined it. We can see when the session ended and we see a session uploaded here. These are all logs that are stored in JSON format, so like I was mentioning earlier, if you needed to send this off to maybe like Datadog or Elasticsearch, Splunk, these are logs that you would be able to ship off and build out actionable alerting or data analytics on. Now, again, if you turned on enhanced logging, all the commands that I would have run would have been recorded here as well. But in the absence of that, we do have session recordings. So with session recordings, we can see Carlos and Joe right here on this host and we can hop right in. It's going to be this really nice copy-and-pasteable playback, which matters because I know from my time in DevOps that a lot of my time was spent building up knowledge base articles and runbooks documenting major incidents. So rather than relying on screenshots or transcribing, this saves a lot of time to be able to just copy and paste as you need to. Now, if we go back, we're going to take a look at Kubernetes and then Database next. So let's hop over here to the command line and we're going to go tsh kube ls. And that's my cluster that I have access to. So if we go into tsh kube login, we'll just hop into that development cluster there. And what that did there was copy down and merge my kubeconfig locally. Now, at this point, if I wanted to go either from kubectl or from Lens, I can hop right in and leverage my TLS certificate in addition to my local kubeconfig, and get working on my clusters. And again, because this is something that is copied down and merged, and not replacing your kubeconfig on a day-to-day basis, as an engineer, I'm only going to need to do my tsh login, get my certificate, and as long as I already have that kubeconfig, I can hop right into my preexisting tools. 
But at this point, we can hop right in and go kubectl get pods. Whoops, I think we need to put in the namespace. Yeah.
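Since the audit events mentioned above are stored as JSON, building a simple alert on them mostly comes down to parsing each event and filtering on its fields. The sketch below assumes one JSON object per line; the field names (`event`, `user`, `login`, `time`) and the event name `session.start` are modeled loosely on what the demo shows and should be treated as assumptions to check against your own log output.

```python
import json
from typing import Dict, Iterable, List


def parse_events(lines: Iterable[str]) -> List[Dict]:
    """Parse newline-delimited JSON audit events, skipping blank lines."""
    events = []
    for line in lines:
        line = line.strip()
        if line:
            events.append(json.loads(line))
    return events


def root_session_alerts(events: Iterable[Dict]) -> List[str]:
    """Flag every session that was started with the 'root' login."""
    alerts = []
    for event in events:
        if event.get("event") == "session.start" and event.get("login") == "root":
            user = event.get("user", "unknown user")
            when = event.get("time", "unknown time")
            alerts.append(f"{user} opened a root session at {when}")
    return alerts
```

In practice this kind of rule would usually live in the SIEM the logs are shipped to (Splunk, Datadog, Elasticsearch) instead of in a standalone script, but the filtering logic is the same.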

Blake: 00:25:44.946 All right. And then we're going to just pop in and do a quick exec here. So kubectl -n. Whoa, couple too many T's there, and then exec -it, and let's hop into this pod here. All right, just running a quick, easy command, and we'll go ahead and exit there, and then we'll take a look at how that got recorded as well. So if we go over here, go to our audit logs, we're going to see any interaction with the Kubernetes API is going to be recorded in the audit log. So right here are the get pods calls, the first one without the namespace, the second one with the namespace. Rather than kubectl, it's written as an API response, because what we're recording is any interaction with the Kubernetes API. So we see get pods here. And again, this would be something that would be shipped off to your team to build alerts on. And then over here is our exec. So, again, just like SSH, we don't see any additional audit logs for what happened in that exec session, but we do have a session recording. So we can see Carlos was on this cluster and exec'd into this pod. And if we go back and play it, we're going to have that same nice playback that we had for SSH for our Kubernetes exec sessions as well. So, again, very, very nice as far as end-to-end traceability, and figuring out what happened during any individual session. So now let's hop back over here, we'll take a quick look. Well, actually, I'll just hop into Lens real quick and kind of show you how this is all set up. But as you see here, I have all these different clusters that are all getting proxied through my Teleport cluster. If we hop right in, we can take a look at our pods. And again, I just want to show you this to show you that your preexisting tools do work. There isn't a ton of information for me to look at here, but you can see that my nodes and my pods are working.

Blake: 00:27:52.900 Whoops. But yeah, you can see that my nodes are up. You can see a little bit of information about my cluster just based on the little bit that I have set up here in my demo environment. Now, let me go ahead and close that. Let's go on to database here. So tsh db ls, like I mentioned, database and Kubernetes are going to be more of a unified experience. So that same look and feel that we have for the GUI over here is what Teleport terminal is going to feel like. Except you're just going to be able to access database and Kubernetes from the same pane of glass as all the rest of the features. Now as far as accessing from the command line though, we're going to go tsh db login --db-name and call up the schema here. All right. So what happened there was we actually copied down and merged a — or not a kube [inaudible]. Sorry, I'm still stuck in the last one. We copied down an SSL Cert, so with tsh db config, this is all the information I would need to set up any SSL-enabled IDE for my MySQL. So in my case, I'm using MySQL Workbench and that's down here. We can take a look at that in just a minute. But at this point, I can hop in and do tsh db connect, and we're going to call up the same stuff here. So let's go like that, and we can just run a quick Select, so. And let's go ahead and exit there just to show you how it gets logged real quick. So if we go over to audit logs, we're going to see Carlos running this select query on that schema on this database. If we open it up, see a little bit more information. I can see the address to my RDS instance. I can see the database user that was used, a little more information about who I am, what I did. Again, these are all things that we shipped out to your SIEM. Now one other thing to note on database access is just like EC2 auto discovery for AWS, we do have RDS auto discovery as well. So as you're spinning up databases, we can automatically add those to your Teleport cluster. 
So it really makes onboarding easy.

Conclusion

Blake: 00:30:09.172 I think we covered everything here. So we've covered SSH. We took a look at the auditing. We took a look at the session recordings, and how elevated access requests work. And then we took a look at applications, how JSON web tokens can get you seamless authentication, how you can use Teleport to get you into your AWS environment both from the console and the command line. We did our Kubernetes access, and took a look at how interactions with the API are audited as well as the exec sessions, and the same for database. One thing that I guess we can circle back on real quick is going to be our desktop session recordings. So now if we go down to here, what you're seeing here is our playback. And to limit the storage requirements needed, we're actually just capturing PNGs at every input. So whether it's moving a mouse around, whether it's typing, it's going to capture a single PNG at those different points, and it kind of compiles it all together, and gives you kind of a nice, almost stop-motion playback of what happened during that session. So again, whether it's making sure that you have an eye on how your data is being handled when somebody is accessing a secure database, or just making sure your Windows servers are secure, this is really going to add that extra layer of traceability and accountability to the environment, as well as eliminating passwords, which is really the key here. Now I think we've covered everything here. Thank you for your time. You're here. You can reach me for any kind of technical advice on Teleport, or if you just want to get in touch and chat about cybersecurity, I always love to talk. My name is, or sorry, my e-mail address is [email protected]. There's my LinkedIn address. If you'd like to reach out to us and schedule a demo or talk about sales, you can always reach out to us at [email protected]. Again, thank you for your time, and I hope I said something of interest to you guys, and have a great day.
