Unshackling Productivity: Access Control for Modern DevOps

Published: December 16, 2024

Engineers hate security processes that throw off their rhythm. As modern, ephemeral, and highly scalable infrastructure becomes the norm, your engineers feel the pain more acutely. They need fast, frequent, and secure access to the resources they need when they need it. This webinar explores the bottlenecks created by applying legacy access controls to modern infrastructure and illustrates three case studies of how real-world companies broke through the access barriers to make their engineers happier and more productive.

Key Takeaways:

1. Death by a million paper cuts - Replacing VPNs, bastion servers, shared credentials, and secrets vaults.

2. Deadlines don't care about access controls - How just-in-time access can help you launch products faster.

3. DevOps efficiency and lifecycle management - How to onboard and decommission employee (and machine) access with ease.

Transcript - Unshackling Productivity Access Control for Modern DevOps

Thip: Good morning, good afternoon, or good evening, depending on where you are tuning in from today. And thank you for joining this TechStrong Learning Experience with Teleport. My name is Thip, and I'd like to welcome you to our TechStrong Learning Program. This session will be available afterward on demand. We will also be sending you a link to the recording after we conclude. Please send any questions that you have to the Q&A panel on the right side of your screen, and join us in the chat by telling us where you're tuning in from, share your experiences, your comments, your understanding of the matter. We want to hear it all from you and engage with you. Any questions that we're not able to get to, we will follow up with you shortly after the program concludes. We also have handouts available for downloading in the Resources section. Let's kick off today's topic, Unshackling Productivity: Access Control for Modern DevOps in Three Acts. Thank you, everyone, for joining us today. We have Francesco Garofalo and Ben Arent with us today. Can you guys take it from here?

Francesco: Yep, thanks, Thip. So hey, everyone, thanks all for joining today. A little bit about me, I'm Francesco Garofalo, and I work on product growth here at Teleport. Maybe before we dive in, I'll give a brief introduction on my background and then, Ben, maybe you can as well. So a little bit about me, prior to joining Teleport, I was a platform engineer at an e-commerce startup called Wish, where I experienced kind of firsthand the challenges of managing developer access to infrastructure at scale. I've also been on the startup side of things, having co-founded and run a travel deal site, where I saw how even small teams can get bogged down by poor access management practices. So today's topic is particularly close to home for me because I've lived through both sides of this problem, both trying to implement proper access controls at scale and then also dealing with the productivity impact of manual access management on a small team. Ben, do you want to give a little bit about your background?

Ben: Yeah. Thank you, Francesco. Hi. My name's Ben Arent and I'm a Director of Product here at Teleport. I've been at Teleport a long time now, coming up to six years next year, and I've seen the evolution of the platform. And it's been interesting to work with many small companies, fast-growing organizations, to protect and secure their infrastructure. I'm going to be here for primarily the Q&A and also moderating the chat. So if you have any questions as they come up, I'm more than happy to answer them. All right. Back to you.

Webinar agenda

Francesco: Thanks, Ben. Yep. So maybe I'll just set the agenda here. So in this session, we're really going to focus on exploring three key areas. First, we'll look at the current state of infrastructure access and why the traditional approaches that we're using are kind of holding teams back. Then we'll examine how modern solutions like Teleport can actually improve productivity rather than slowing it down. And then finally, we'll look at some real-world examples of how teams are actually transforming their access management to allow their developers and enable them instead of blocking them. So with that, let's start by looking at some of the challenges that modern DevOps teams are dealing with today.

Infrastructure challenges

Francesco: So here, I'm just going to walk you through five major infrastructure challenges we're seeing across the industry. The first is managing ephemeral infrastructure at scale. The reality of modern infra is that we're constantly managing thousands of containers running across multiple regions. And that means that the traditional way of handling access simply doesn't work when your infrastructure might only exist for minutes or hours at a time. Second, as teams expand and companies go global, global access bottlenecks suddenly start becoming too common. And with developers spread across different time zones, coordinating access kind of turns into a constant back and forth that slows everyone down. Manual access management is also a problem that scales surprisingly quickly as well. Teams often start small with simple processes, but as they grow, managing permissions, Access Requests start consuming more and more hours of engineering time every week. And then you have compliance and audit requirements that add another layer of complexity here. What starts as a simple, "Who has access to what?" question ends up turning into a major project every time you need to demonstrate security controls to customers or auditors. And finally, there's obviously the quintessential security versus speed trade-off, right? This one's particularly frustrating because it feels like we shouldn't have to choose between moving fast and staying secure. But with much of the existing tooling that exists today, that's exactly a choice that our teams are kind of forced to make. So these are just the five challenges that we've identified, but maybe let me show you how these play out in practice with the real-world example that we dealt with recently here at Teleport.

Real-world example: ExtraHop’s challenge

Francesco: So I'm going to go over a recent engagement we had with a company called ExtraHop. For background, they're a network detection and response company that provides cloud-hosted security software to their customers. What I think is particularly interesting about their case is that this is a classic conflict between maintaining SLAs and security. So being a security company, they have to meet some pretty stringent SLAs for their customers. They're running cloud-native detection software, so they need to be highly available, and guaranteed uptime is pretty much table stakes for competing in this industry. But I think this is what makes their infra requirements pretty interesting. Since they're handling sensitive customer data, every customer needs to be isolated. So multi-tenant nodes are basically a non-starter for this type of product. Each customer session needs its own dedicated node. You need to start it up, run the session, stop it, delete it, no reuse allowed. And this, for them, scaled to thousands of ephemeral environments. We're talking about a half dozen Kubernetes clusters with hundreds of nodes each, spinning up and down dozens of times throughout the day to maintain those SLAs.

Francesco: Now, if you look at their setup here, they had the usual bastion host routing traffic into their VPC with Kubernetes clusters, pretty standard. But here's where we relate back to the pain points we discussed. Their infrastructure team started using some shortcuts to move quickly between all these nodes. You probably all know how this goes. Shared secrets, hard-coded credentials, things that work fine with a small team, but become like a real headache as you scale. And because of this, their security team started raising two big concerns, one being that those shared secrets are floating around, and two, the fact that they had minimal visibility into who was actually accessing what. And then obviously, managing access for team members or offboarding employees became a real pain point across all these environments. And I would bet that what I'm describing here probably sounds familiar to a lot of you. You've likely seen similar patterns in your own setups. So now let's take a look at a modern approach to solving these challenges.

Francesco: So what do modern access requirements actually mean for DevOps teams? In our opinion, these five core requirements are table stakes for addressing the pain points we just walked through. So first, you need identity-based access as a foundation. When ExtraHop and other teams try to scale with shared secrets, they'll inevitably hit a wall. Moving to identity-based access isn't just more secure. It actually helps eliminate a ton of overhead from managing shared SSH keys and static credentials. Dynamic discovery also becomes essential when you're running the kinds of ephemeral environments we just discussed. So your access layer needs to auto-detect resources as they spin up and spin down, because trying to keep track of everything manually just really doesn't scale. Then you also need to build security directly into your workflows from the start. So for us, that means Zero Trust by default. Teams that get this right end up spending their engineering hours building features instead of patching security gaps down the line. And also, cloud-native scale isn't optional anymore. Obviously, your access solution needs to be able to handle everything from a single dev environment to thousands of distributed nodes without becoming a bottleneck. And last, protocol unification drives some of the biggest productivity gains in my opinion. So instead of jumping between different things like SSH, kubectl, Database Access, web apps, your devs can be more productive by working through a single interface.
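The dynamic discovery requirement described above amounts to reconciling what the access layer has enrolled against what actually exists right now. The following is a minimal Python sketch of that idea only. The `Resource` record, the `reconcile` function, and the resource names are hypothetical and are not Teleport's data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical record for one piece of infrastructure."""
    name: str
    kind: str  # e.g. "node", "kube_cluster", "db"


def reconcile(
    enrolled: set[Resource], discovered: set[Resource]
) -> tuple[set[Resource], set[Resource]]:
    """Return (to_enroll, to_remove) so enrollment matches what exists."""
    to_enroll = discovered - enrolled  # resources that just spun up
    to_remove = enrolled - discovered  # resources that were torn down
    return to_enroll, to_remove


# One discovery pass: node-a was deleted, node-c is new.
enrolled = {Resource("node-a", "node"), Resource("node-b", "node")}
discovered = {Resource("node-b", "node"), Resource("node-c", "node")}
to_enroll, to_remove = reconcile(enrolled, discovered)
```

Running a pass like this on a schedule is what removes the manual bookkeeping for nodes that only live for minutes or hours.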

Teleport’s approach

Francesco: So with these requirements in mind, let's actually look at how Teleport puts this into practice. So first up is that unified access that I just discussed. Looking at the CLI here, you can see that a single `tsh login` command gets you in. That's it. No more juggling different credentials or connection methods for different resources. Whether you need SSH, Kubernetes access, or databases, it all happens through this one entry point. You'll notice that we're using familiar CLI commands, so there's no need to reinvent the wheel here. If you know how to use SSH or kubectl, you already know how to use Teleport. The difference is you're doing this all through a single authenticated session. And then for auto-discovery and resource access, it's pretty straightforward. Just run `tsh ls`, and you get a complete view of everything you have access to. You can set up Teleport to automatically detect resources as they come online, and you can access them either through the CLI or through our web UI. And because everything goes through this single entry point, you can actually get comprehensive audit logging automatically. And every session, every command, every resource access — it's all tracked. And this is obviously great for compliance, but I think where it really shines for us devs is really in troubleshooting. When you need to figure out exactly what changed in prod during that incident last night, you can just go back to your audit logs. And I always find that looking at the architecture is helpful for understanding how this all pieces together. So let me show you how this all works under the hood.

Francesco: At the center of the design, you've got your proxy and auth services. The auth service really acts as your CA and serves as the foundation for our identity-based access model. So instead of managing static credentials, it issues short-lived certificates for both users and resources. And I think the real power of this design comes from how our agents connect back. So notice how the resources on both sides connect back to the cluster through reverse tunnels. Through these connections, the agents can run multiple services, including automatic resource discovery. This means that as your infrastructure grows and new databases, Kubernetes clusters, or SSH nodes come online, they can be automatically detected and enrolled in your access controls. And the proxy service here is what allows for that unified access plane that we just discussed. So it handles all of your protocols through a single entry point. And because everything flows through this architecture, you get comprehensive audit logging automatically out of the box. Obviously, this design is built for cloud-native scale. Your resources can live anywhere, and certificate-based authentication guarantees that Zero Trust principles are maintained even as you scale to thousands of nodes.
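To make the short-lived credential idea concrete, here is a toy Python sketch. It deliberately swaps real X.509 or SSH certificates for an HMAC-signed payload so it can run with the standard library alone. `CA_KEY`, `issue`, and `is_valid` are hypothetical names, and nothing here reflects how Teleport's auth service is implemented. The point is only that a signed credential with a built-in expiry stops working by itself, with no revocation step.

```python
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone

# Demo-only shared secret (an assumption). A real CA signs with a private key.
CA_KEY = b"demo-only-secret"


def issue(identity: str, ttl: timedelta, now: datetime) -> dict:
    """Sign an identity together with an expiry time."""
    payload = {"identity": identity, "expires": (now + ttl).isoformat()}
    body = json.dumps(payload, sort_keys=True).encode()
    sig = hmac.new(CA_KEY, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}


def is_valid(cred: dict, now: datetime) -> bool:
    """Accept only credentials that are untampered and not yet expired."""
    body = json.dumps(cred["payload"], sort_keys=True).encode()
    expected = hmac.new(CA_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, cred["sig"]):
        return False
    return now < datetime.fromisoformat(cred["payload"]["expires"])


t0 = datetime(2024, 12, 16, 12, 0, tzinfo=timezone.utc)
cred = issue("alice", timedelta(hours=8), t0)
```

Checking `is_valid(cred, t0 + timedelta(hours=9))` returns `False` because the credential has expired, which is the property that replaces long-lived static keys.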

Francesco: So to tie all of this together, these are the core components that power everything we just walked through. The proxy service handles that unified access. Role-based access controls give you the granular control you need. And then auto-discovery manages the ephemeral resources, and the short-lived certificates eliminate the need for static credentials. And then SSO integration connects all of this to your existing identity stack. And because we know that you're not managing anything manually, let's look at how this integrates with some of your existing workflows. So for infrastructure management, we integrate directly with Terraform. This means that your access policies, roles, user configurations all become part of your infrastructure code, all version-controlled, reviewable, automated. And with Teleport's auto-discovery, your access controls automatically scale with your infrastructure. For CI/CD pipelines, we've built Machine ID, which is a way to give your automation secure programmatic access through short-lived certificates. And once configured, your pipelines get identity-based access that works across all of your environments. This means that your existing CI tools can connect through identities that you control, making access both secure and auditable. And ultimately, what this means is that you and your team can focus on shipping code while Teleport handles the access layer. Your automation will keep running, your security improves, and access management becomes part of your normal infrastructure workflow rather than a separate operational task.

Francesco: So we've covered how Teleport integrates with infrastructure and CI/CD pipelines, but what does this actually look like as part of your engineering team's day-to-day workflows? So as we've discussed, auto-discovery means that you and your team aren't manually registering every new database, Kubernetes service, or cloud resource. Everything just shows up in that unified access plane we discussed. On the developer side, I think this is where Access Requests come in. When you and your team need elevated access, maybe you need to debug a production issue or deploy a sensitive update, they request it through Teleport. Those requests can then flow into your existing tools like Slack or Jira, where approvals can happen right where your team already works. Access is time-bound, so it automatically expires, and you get that full audit trail without adding any operational overhead. So yeah, this is basically how we're turning access management from a bottleneck into an integrated part of your and your team's workflows. Your engineers get the access they need through their normal tools with all the security controls built in. And now that we've covered how Teleport works and how it integrates into your existing workflows, maybe let's just kind of wrap this up by taking a look at the actual impact that this has in production environments. And then I'll go over exactly how you can get started by implementing Teleport with your own teams.
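The Access Request flow described above (request, approve in a chat or ticketing tool, automatic expiry) can be pictured as a small state machine. This is a minimal Python sketch under stated assumptions: the `AccessRequest` class, its field names, and the rule that requesters cannot approve themselves are all illustrative choices, not Teleport's actual behavior or schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class AccessRequest:
    """Hypothetical just-in-time request for an elevated role."""
    requester: str
    role: str
    reason: str
    ttl: timedelta
    state: str = "pending"  # pending -> approved
    approved_by: Optional[str] = None
    expires_at: Optional[datetime] = None

    def approve(self, reviewer: str, now: datetime) -> None:
        # Assumed policy: a second person must approve.
        if reviewer == self.requester:
            raise ValueError("requesters cannot approve their own request")
        if self.state != "pending":
            raise ValueError(f"request already {self.state}")
        self.state = "approved"
        self.approved_by = reviewer
        self.expires_at = now + self.ttl  # the clock starts at approval

    def is_active(self, now: datetime) -> bool:
        """Access exists only between approval and expiry."""
        return (
            self.state == "approved"
            and self.expires_at is not None
            and now < self.expires_at
        )


t0 = datetime(2024, 12, 16, 9, 0, tzinfo=timezone.utc)
req = AccessRequest("alice", "prod-db-admin", "debug incident", timedelta(hours=1))
req.approve("bob", t0)
```

Because `is_active` compares against `expires_at` on every check, nobody has to remember to revoke the role afterwards.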

Teleport’s Impact: ExtraHop

Francesco: So earlier, we took a look at some of the challenges that ExtraHop was facing with managing access across their infrastructure. Now let's take a look at some of the specific impact they achieved by using Teleport. So first, they were able to consolidate all of their Kubernetes and SSH access through a single proxy. What this means in practice is that they now have one entry point for all their access controls instead of managing access through their bastion hosts. And they were also able to completely eliminate shared secrets by moving to identity-based access tied to their Okta. And even better, they did this without having to make any major configuration changes directly on their nodes. So basically, their infrastructure team doesn't have to touch hundreds of node configs or manage authorized keys files anymore. They can just use Teleport. And for auditability, they now have complete session logging in S3 and audit trails for every user action. And while I know this sounds more like a security team win, for us on the infrastructure side it means we can actually track down what changed during an incident without digging through scattered logs. So in my opinion, these improvements really enabled their engineering team to maintain the speed they needed for their SLAs while giving them proper access controls that actually work for their scale. And if you want to dig deeper into their implementation architecture, you can check out the full case study using the QR code on the right. I'll pause here for five seconds before moving forward.

Teleport’s Impact: TigerGraph

Francesco: Cool. So here's another example of access control challenges with a company called TigerGraph. For context, they run a distributed graph database platform handling some pretty serious workloads. We're talking about things like fraud detection and anti-money laundering for major banks, processing several terabytes of data for these customers at pretty high speeds. So their situation was pretty similar to ExtraHop. Given their customer base, they have pretty stringent security requirements with each customer instance needing complete isolation. As they grew, their support teams spanned the US, UK, China, and they obviously need to access and troubleshoot these environments on demand. And so that growth led to many of the same issues we discussed, right? Shared private keys that were actually going through a single point of failure, an engineering manager here in the US. I think we all know how that story goes. So after looking for a solution and implementing identity-based access with Teleport, things for them changed pretty quickly. Now their global support teams have role-based access to customer instances, and there are no more key management headaches or single points of access. Plus, with proper audit logging, they're meeting their SOC 2 and PCI requirements without adding any operational overhead.

Francesco: So maybe, yeah, let's wrap up with how to get started with Teleport. Let's maybe summarize what we covered in terms of scale and coverage. So starting with resource coverage, we support over 150 different resource types. Beyond core infrastructure needs like SSH servers and Kubernetes, this extends to databases, web applications, essentially any resource your teams need access to. Also, multi-cloud support is built into the foundation in Teleport. Whether you're running on AWS, GCP, Azure, Private Cloud, or a combination of these, Teleport will work across all of them without requiring special configs for each environment. And then for teams managing infrastructure as code, like we said, everything in Teleport integrates with your existing workflows. So your access controls just become another part of your infrastructure definitions. And underpinning all of this is zero-standing privileges, which is really just a fancy way of saying no permanent access. Everything is just-in-time, role-based, and automatically revoked when not needed. So this gives you and your team the security and controls you need without creating additional friction for your engineers. And I think the key takeaway here is that Teleport is built to handle real-world infrastructure at scale. Whether you're managing a handful of resources or thousands of them, whether you're on a single cloud or spread across multiple providers, Teleport's architecture will scale with your needs.

Francesco: And now to wrap this up, let's take a quick look at how you can actually get started with Teleport, and then I will open it up for questions. So I'll go ahead and share links for our trial signup and open source code in the next slide so you can try Teleport yourself. But to get started with Teleport, here's how I'd recommend approaching it. Start by taking a zero-trust approach and start with your crown jewels, your highest impact resources. Typically, these would be production databases or Kubernetes clusters that your team accesses frequently. In my opinion, this gives you immediate value where it matters most. Next, connect your identity provider so you can use your existing auth system rather than managing a separate set of credentials. Then you'd want to deploy the Teleport proxy service to serve as your unified access point. And once that's up and running, I'd go ahead and turn on the auto-discovery that we discussed. So this way, your resources will be automatically detected and enrolled in Teleport, making access management fairly automated going forward. And last, in my opinion, Teleport truly shines when your whole team's gaining access through the platform. So once you're familiar with the setup, start rolling it out to your broader team. By this point, you should feel pretty comfortable with Teleport and be ready to expand across additional engineers, teams, and resource types. And with that, I think we're ready to wrap up with some Q&A.

Q&A session

Thip: Thank you so much, Francesco, for that insightful presentation. You've covered some incredibly valuable points, and I'm sure our audience has found it as informative and engaging as I have. Before we open the floor to questions, I'd like to reintroduce Ben Arent, who is Director of Product at Teleport, and will be joining us to answer any questions. Please feel free to type it into the Q&A chat box on the right, and we'll address as many as we can. So let's go ahead and dive into your thoughts and queries.

Ben: Yeah, I can take this first one here. Thank you, Francesco, for a great presentation. And we can maybe do this together. So I think this question is — what's the easiest way to migrate from an existing solution, whether it's SSH keys, shared credentials, without breaking everyone's workflow? I'm just kind of interested here. What would you recommend, and I'll give my answer.

Francesco: Yeah. Sure. So personally, I think taking a phased approach that prioritizes maintaining your regular day-to-day patterns and productivity is probably the best approach. So I'd probably start small. Maybe in this example, like we talked about ExtraHop, they began with a small pilot project using a single team and a single type of access. I think it was just SSH access. In my opinion, this kind of lets you validate the approach with a contained team and environment. And then you can keep your existing systems running in parallel. So Teleport is designed to work alongside those environments, and teams can gradually transition while using familiar tools and workflows. And then roll things out in phases. So start with your most frequently accessed resources, begin with SSH, then move on to databases, maybe Cloud IAM. And then that pattern probably matches what we've seen with other customers. But I think the key is really just letting devs keep using their familiar tools and commands while Teleport handles the security improvements. But yeah, would love to hear your opinion as well.

Ben: Yeah. I think there's a few phases. One thing that we didn't talk about is some of the security basics. So on the backend of Teleport is an X.509 certificate authority specifically for infrastructure access, which is also OpenSSH compatible. And when we talk about SSH keys, we're talking about public-private key cryptography where people create their keys and they upload them. By using Teleport for SSH access, for example, instead of using public-private keys, it issues short-lived certificates. And these are compatible with OpenSSH. And yeah, to your point, you can run the Teleport agent on those hosts and run both. And we even have a tool in Teleport Policy, which is one of the products we didn't cover today, which can also discover authorized keys on hosts. And so we can also help teams think about, "Hey, are there shadow access patterns? How are you thinking about other access?" I'd say another nice benefit with Teleport as well is by using Teleport, you can just whitelist the Teleport server so you don't have to worry about opening up remote IPs. Prior to Teleport, I would have to whitelist my home IP and my office IP. And this opens potential security headaches too. And so by using Teleport, everything kind of goes through reverse tunnels, and so you get a lot of security benefits.

Francesco: Awesome. Yeah.

Ben: All right. I think we have another question here. If you do have any questions, there's a Q&A tab, feel free to send them in. But I have another question here, which is — how does Teleport consolidate audit logs across Kubernetes, databases, and Cloud Access?

Francesco: Yeah. So basically, we have a unified audit trail. Everything is consolidated through a central gateway. We capture things for databases: SQL queries, connection events, user attribution. For Kubernetes, cluster access and pod operations. For SSH, full session recordings with playback. So everything is stored as structured JSON events. Those can be exported directly to your [inaudible] tools, and you can use built-in search through AWS Athena to query that. I think the key benefit here is that every action ties back to the user's true identity. So when you're investigating an incident, you have one place to see everything a developer accessed across all of your infrastructure. Anything you'd add to that?
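Since the audit trail is described as structured JSON events tied to a user identity, an incident timeline is essentially a filter and a sort. The Python sketch below illustrates that with plain standard-library code rather than Athena. The event kinds and field names (`kind`, `user`, `resource`, `time`, `query`) are invented for illustration and should not be read as Teleport's actual event schema.

```python
from __future__ import annotations

import json

# Hypothetical newline-delimited JSON events, one per line.
RAW_EVENTS = """
{"kind": "kube.exec", "user": "alice", "resource": "prod-cluster", "time": "2024-12-15T23:15:00Z"}
{"kind": "db.query", "user": "bob", "resource": "orders-db", "time": "2024-12-15T23:12:00Z", "query": "SELECT 1"}
{"kind": "ssh.session.start", "user": "alice", "resource": "node-1", "time": "2024-12-15T23:10:00Z"}
"""


def events_for_user(raw: str, user: str) -> list[dict]:
    """Parse the events and return one user's actions in time order."""
    events = [json.loads(line) for line in raw.splitlines() if line.strip()]
    matching = [e for e in events if e["user"] == user]
    # ISO 8601 timestamps in the same zone sort correctly as strings.
    return sorted(matching, key=lambda e: e["time"])


timeline = events_for_user(RAW_EVENTS, "alice")
```

The same per-identity view spans SSH, Kubernetes, and database activity because every event carries the same user field.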

Ben: No, I think that's a great overview. I think another addition that's unique on Teleport, since everything is very deep in the protocol level, if you look at our database reporting, it's not just that the user is connected to the database. It's the query that they ran. And we actually get more low-level protocol information about which query they ran, which can be very helpful for figuring out which — it always ties back to the identity of the user or the machine, and you get reporting too. And I think we didn't really touch on it, but you can also use Teleport for your machine or service integration as well and get that same visibility.

Ben: I think we have a question here, which is — let me answer this one. What are the relevant challenges to take into account when team consolidates integrating, developing, and automating? Oh no, hold on. I think I saw my Q&A's gone back. I think I accidentally published one. Okay. How does it work in terms of on-prem setups? Does it require any agents to be installed on all your services? So I'll take this one. So Teleport — we have both a cloud-hosted edition, which I believe is here for the signup. But as you see here, we are an open-source company where you can also run this binary on-premise. So you can run both the proxy and the auth server on-premise. And we even have a FIPS mode. So we have people running in sort of highly regulated compliant environments, which are more or less air-gapped. And so we both support on-prem and cloud. Agents, we mostly recommend that you install the Teleport agent or the helm chart for Kubernetes. It makes setup much easier. But we do provide sort of agentless ways for AWS Systems Manager. And so there's a few workarounds if you don't want to install an agent on hosts. But by installing the agent, you also get a lot more visibility into what actions are happening on those hosts.

Francesco: Hey, I see a comment here that maybe might be relevant for you given that you've spent so much time here at Teleport, but what are the relevant challenges to take into account when teams are considering integrating, developing, and automating the new system?

Ben: I mean — I think this kind of goes into any kind of change management when you're acquiring any new tools. Most of our larger customers — I'd say people who do really well with Teleport — have a large amount of end users and a large amount of infrastructure. Nothing is relatively simple in dealing with multiple accounts, multiple labeling. And so we have sort of a proof of value process that our team sort of sits down. They work out the success criteria. They work out what's the basics you need to get done. And then maybe just like some house cleaning, "Hey, is there like a certain area of your databases that you think, 'Oh, everyone's logging in as one user, and how would you think about using roles and different permissions?'" Maybe your data science team only needs read-only. And so sort of planning this out and then mapping those to RBAC and labels within Teleport — starting there and sort of getting the value of the visibility and the security. And generally, once we have been successful in a smaller part of an organization, most other parts say, "Hey, this has been a great addition in improving our infrastructure access security." And then we sort of roll out into other areas of the business.
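The planning exercise described above, deciding which teams need which permissions and mapping them to roles and labels, can be sketched as a label-matching check. This Python example is illustrative only: the `ROLES` dictionary, the label keys, and the `allowed` function are hypothetical and far simpler than a real RBAC engine.

```python
# Hypothetical roles: each grants some actions on resources whose labels match.
ROLES = {
    "data-science": {"labels": {"env": "prod", "type": "db"}, "actions": {"read"}},
    "dba": {"labels": {"type": "db"}, "actions": {"read", "write"}},
}


def allowed(role: dict, resource_labels: dict, action: str) -> bool:
    """Grant access only if every role label matches and the action is listed."""
    labels_match = all(
        resource_labels.get(key) == value for key, value in role["labels"].items()
    )
    return labels_match and action in role["actions"]


orders_db = {"env": "prod", "type": "db", "team": "payments"}
can_read = allowed(ROLES["data-science"], orders_db, "read")
can_write = allowed(ROLES["data-science"], orders_db, "write")
```

Here the data science role can read the production database but cannot write to it, which replaces the "everyone logs in as one user" pattern with permissions per team.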

Francesco: Yeah. So basically, that same phased approach that we just discussed.

Ben: Yeah. Yeah. Kind of like tabletop it. I think it's good to start with your staging, but sometimes it's good to start with a really sensitive production system first because that's kind of what we're seeing. A lot of identity compromises. Engineers have longstanding access to sensitive data or sensitive systems and hackers know that. And so sometimes you can live with your staging being kind of insecure, but maybe it's better to start with your production systems first and kind of work back from there.

Francesco: Yeah.

Ben: All right. What else do we have coming in here? Okay. So we talked about Access Requests and zero-standing privilege. How does it work with other plugins? You want to give a start of that, Francesco?

Francesco: Sure. Yeah. So I mean, Teleport integrates directly with tools like Slack and PagerDuty. So you can kind of streamline these just-in-time access approvals. At a high level, like for Slack, yeah, I think Access Requests automatically post to your designated channels. Reviewers can approve or deny directly from Slack. Each request will show who needs access, what role they're requesting and why. The workflow is pretty quick. So one click to approve and the requester can get immediate access. Again, for tools like PagerDuty, you can basically tie into your on-call rotation. So on-call teams will receive requests through the PagerDuty app and can approve or deny on the go. There's even, I think, an auto-approval feature for team members who are already on-call, which hopefully should help reduce friction during incident response. And then, obviously, keep in mind that these integrations keep a detailed audit log of who requested access, who approved it, what actions were taken. So the permissions are also time-bound and they automatically expire. So no need to manually revoke access later.

Ben: Yeah. I think all the concepts of Just-in-Time Access and Access Requests go to the principle of zero-standing access or the principle of least privilege, giving people just the right amount of access at the right time. And it's a bit like using YubiKeys for your MFA token. If you have a phishing-resistant hardware token, it becomes much harder to phish your employees. Same here. If you have zero-standing privilege, it becomes much harder to get access. And also, in addition to Just-in-Time Access Requests, we also have, for very sensitive systems, Moderated Sessions. So you can say, "Hey, we work a lot with people in the crypto space. And if you're on a box that has hot wallets, the chance of insider threat trying to drain it could be very alluring." And so you can even have multiple people on a session, and if they see someone doing something kind of nefarious, they can terminate that session. And then this can be good for regulated environments as well. All right. So I know we're coming up here. I think that's most of the questions that have come in. Let me see the chat. If there's anything else, we're happy to take those questions.

Thip: I think that's all that we have for now.

Ben: Thank you, everybody.

Thip: Thank you.

Francesco: Yeah. Thanks, everyone.

Thip: Before we close out, please click the survey in the Resources section as your input is invaluable to us. Your feedback goes a long way here. So please do let us know about that or any other closing or final thoughts that you've got. Once again, you will receive an email in your inbox shortly after we conclude today's session with a link to view it. I would like to thank Francesco and Ben for taking the time to join us. It's been a pleasure having you. Thanks for joining this TechStrong Learning Experience. Looking forward to seeing you all again. Have a great day.

Francesco: Thanks, Thip. Thanks, everyone.
