Reducing Complexity to Increase Infrastructure Security
Reducing Complexity to Increase Infrastructure Security - overview
Many traditional methods of delivering secure access to infrastructure resources such as Linux & Windows servers, databases and Kubernetes clusters involve configuring multiple, complex technologies such as VPNs, secret vaults and privileged access management systems. These systems increase IT complexity and unnecessarily reduce security as savvy engineers find workarounds to the friction they impose. But it doesn't have to be this way: modern approaches to infrastructure access can be both easier to use and more secure. In this webinar, you will learn how consolidating the essential elements of infrastructure access (connectivity, authentication, authorization and audit) into a single platform yields not just productivity gains but also improved security and compliance.
Key topics on Reducing Complexity to Increase Infrastructure Security
- You need to securely deliver the right resources to the right users
- Traditional methods are overly complex and require significant maintenance
- The four criteria of a good infrastructure access solution are: connectivity, authentication, authorization, and audit.
- There's complexity involved in trying to deliver these four criteria.
- Teleport is the simplest solution that delivers these criteria.
Expanding your knowledge on Reducing Complexity to Increase Infrastructure Security
- Teleport Server Access
- Teleport Application Access
- Teleport Kubernetes Access Guide
- Teleport Database Access
Learn more about Reducing Complexity to Increase Infrastructure Security
Introduction - Reducing Complexity to Increase Infrastructure Security
(The transcript of the session)
Cody 00:00:02.267 Good morning, good afternoon, or good evening, depending on where you are in the world, and welcome to today's DevOps.com and Security Boulevard webinar, brought to you by Techstrong and Teleport. My name is Cody J. Brown, and I'm the host of Techstrong Learning. We have an exciting presentation ahead of us, but before we begin, I have just a few housekeeping notes. First, today's session is being recorded, so if you miss any of the session or you'd like to share it with a friend, the session will be available on-demand after today's webinar concludes. I'll point your attention to the right side of your screen where you'll find a couple of tabs. The first one I'll point you to is the Q&A tab. This is where we want you to send in any questions that you have for Colin today. And the chat tab is where we want you to just engage with each other and engage with us. Use this to tell us where you're from or just however you want to contribute to today's conversation. And finally, at the end of today's webinar, we will have a giveaway for four $25 Amazon gift cards, so stick around to see if you're one of our four winners. Onto our presentation today. Our topic is: reducing complexity to increase infrastructure security. And I'm joined today by Colin Wood, solutions engineer at Teleport. It's my pleasure at this point in time to turn the floor over to Colin to get us started. Colin, thank you so much for being here with us today.
Defining the problem: cloud-native security requirements
Colin 00:01:25.105 Thanks so much, Cody. Welcome, everyone. My name's Colin. I am a Solutions Engineer at Teleport as Cody mentioned. And today, I would like to chat with you for a bit about something that we see daily in our lives in technology, which is the need to reduce complexity. So without further ado, I'll dive right in. So let's start by defining the problem. So companies today essentially face the problem that they have employees, and they have resources to which those employees need access. Not too difficult to understand that, but you need to securely deliver the right resources to the right users, and generally speaking, the difficulty is that traditional methods are overly complex and require significant maintenance to deliver on this promise. And this was hard enough when we were in an on-prem world where you had everything in your data center, but security in a cloud-native world is very different. First, especially during this pandemic, work from home puts additional pressure on the corporate network. Second, employees at different levels need different access, and they need to be able to connect from anywhere. And finally, our infrastructure is increasingly complex.
Colin 00:03:04.006 It's not that we just have VMs. Now we have VMs, now we have Kubernetes clusters. We have Windows machines. We have Linux boxes. We have all these different resource types that we need to be able to access. And to address this, what organizations typically do is add more and more security devices from different vendors, and therefore more policies, which isn't necessarily better when it comes to network security. We can really end up subtracting by adding because what we end up with are too many surfaces to defend, too many tools to configure and maintain, and too little information about who is accessing what. Finally, when we talk about the problem, many of these traditional solutions exacerbate tension that exists between infrastructure engineering teams and security folks. So your people want seamless access to the infrastructure resources they need, but the security teams need to be able to make sure that those resources are protected and that the compliance requirements are being met. So before we take a look at a few examples of how complexity can reduce security and how by reducing the complexity, we can increase security, it's a good idea to define what success looks like.
Essential elements of infrastructure access
Colin 00:04:43.036 That is — what should a good infrastructure access solution look like? What are we striving to achieve? So we would say there are four criteria to a good access solution. The first is obviously connectivity. You need to be able to connect to the resources you need. But if access can come from anywhere and compute can happen anywhere, we need to be able to securely connect to any resource on the planet, regardless of network boundaries. So here I am, working remotely, I need to be able to connect to all of the various compute resources and application resources that I need during the day, regardless of where they're located. So whatever solution we implement needs to be able to deliver on this. So that's the first requirement. The second is authentication. We obviously can't just let anyone connect to any resource — we need to authenticate people. But we need that authentication to be identity-based, and we need it to tie into your corporate identity provider so we can know exactly who is accessing what resource. We don't want to be managing users in multiple places. We don't want to add users into the database and then add users into each application, perhaps with different usernames. We want onboarding and offboarding to happen in one place only, right, our corporate identity provider or single sign-on.
Colin 00:06:16.422 So now that we've got connectivity, we've got authentication, we know who's accessing things, we need authorization. We need now to be able to control who gets access to what resources. Okay? We need fine-grained access controls so that a senior SRE and an intern don't have the same privileges, and we need to make sure those privileges are automatically enforced. So now we're connected, we're authenticated, we are authorized for the right resources. Finally, the last requirement is robust audit. So in order to solve our access problem, we also need to know who accessed what resource, when did they access it, and what did they do. Without unifying these four elements, our systems will be vulnerable to attack. The productivity of our team will be compromised. We just won't have achieved our goal in reality. And generally what happens in the status quo is that in an effort to build a strong defense posture, teams mix and match technologies to try to achieve these four goals, and they create complex security sprawl. So then they end up with too many products, too many interfaces spread across their enterprise. So let's look at the complexity involved in trying to deliver these four criteria for a couple of straightforward examples using sort of traditional methods.
Traditional server access
Colin 00:07:50.590 So the first we'll look at is traditional server access, so we can think of SSH. So just for a fun exercise, I did a quick how-to search on how to best secure SSH access. And this yields an ever-increasing number of steps and best practices: 14 key management practices, 20 OpenSSH server best security practices, etc. And you can imagine if you open any of these links, what you get is a list like this. This is just a sampling, of course. These certainly aren't things I'm recommending, but these are the types of things that you find in those lists, right? So this list would be really cumbersome to implement for a small team only accessing a few servers. But imagine at the scale most organizations operate at today, with auto-scaling infrastructure, this can become really, really impossible and unwieldy to manage. Imagine user bases with tens of thousands of users, tens of thousands of systems. And if we go through the list, a lot of these items in and of themselves are really problematic. So, for example, if we start with use SSH keys, certainly they're better than passwords, but not much better. They're hard to brute force, but they also never expire. They don't include any metadata and they have no identity built into them. In reality, they're just sort of half of a math equation, and the only thing that can identify them is the file name once they've been distributed onto the servers.
Colin 00:09:36.920 If we look at number two, a common thing is to use the "AllowUsers" setting. So to do this at scale, that means every time someone joins and leaves, we have to update every system, which can be tough. Even if you have a config management solution, Ansible or Chef or something like that, that makes this easy for you, you're still touching every node every time someone joins or leaves or you're redeploying immutable infrastructure, etc. So hardening the SSH config, so that's something most organizations can do as long as you're willing to diligently stay up to date. Number four: regularly rotate the keys. This is a really difficult one. Again, it requires touching every system. It requires your users all generating new keys. It's really a painful experience. And number five is similar. The search that takes place when a user or an engineer leaves an organization to find all their keys and clean them up can be really exhausting to do on a regular basis. As you can see, I won't go through all the items on the list, but what we see is there's a lot of complexity that comes in here. I mean, even just to wrap it with number six and seven, you have to have someone on staff who's always aware of all the newest vulnerabilities and best practices for key size, for algorithms that are acceptable, etc. Right?
Colin 00:11:17.483 So if we achieve all this, we're accepting. So even if we have systems in place that do all this, what we're accepting is let's say you have one new engineer join, that means they have to generate a new key pair according to your minimum standards and name it according to your naming standards. Then you need to distribute the public key to only the appropriate systems. Then you need to update all those config and you need to maintain a hardened bastion host or VPN, those types of things. Right? So, of course, this is just an example. This is not the only way. There are some products that then will manage this process for you, but let's assume you're able to get all this right and you get it automated and it scales well. This already very complex solution only really delivers on connectivity. It doesn't address identity-based authentication because it's not linked to your single sign-on. It doesn't address authorization because really doing all this doesn't help you do anything about roles and permissions. And it doesn't deliver any audit capability. You have to find a solution to aggregate your logs as well. Right? So I'm sure I can repeat the same quick how-to search for Linux user management and server log aggregation and I could complete all those steps as well, but in reality, what we're getting a picture of is just a massive amount of complexity to not necessarily deliver on a highly secure solution.
Colin 00:12:52.835 And even in our attempts to do this, what we see is that the status quo really is broken because we all know that we need to rotate our SSH keys and we need to know whose keys are where. And it should shock us, but it probably doesn't. Over 90% of respondents to a large survey reported that they lacked a complete and accurate inventory of SSH keys, which is a huge security vulnerability. And nearly two out of three cybersecurity professionals state they don't actively rotate their SSH keys. So in our attempt to defend the network and critical access from cyber threats, what we've done is we've fallen into the trap of bolting on more and more security layers and policies, but the result is just that we've increased the level of complexity within the environment to the point where we have actually created risks. We can't even keep track of all the keys, let alone the hardening, patching, and other necessary tasks. We'll take a quick look at a second example. So that was server access. What about application access? So if you have an internal web application, how are we going to secure access to the web application?
Traditional application access
Colin 00:14:15.327 So again, we could do another quick search. And if we did that, we would see that we've got lots to do as well. So don't make your internal applications accessible on the internet, use TLS, don't use shared accounts, require a VPN or a gateway, use password managers for your users, control all access to APIs, etc. So what we're seeing is right away we see we need to manage adding and removing users from the VPN, manage adding and removing users from each app. To secure the apps with TLS, we're either going to need to use an internal CA and distribute the root cert for that to our end-users and maintain the CA. Or we could use public certs and some sort of split DNS, etc., but it's not easy. And in the end, when we do all this, all we're really gaining from those four requirements we've got is connectivity. Authentication to all our apps won't be linked to a central identity necessarily. Certainly, there are solutions for that that then we can bolt on as well, and then those are solutions that we have to manage separately. We haven't touched authorization, so that is now we have to manage authorization for each of these apps and individually add the appropriate users to the appropriate apps.
Colin 00:15:55.901 And again, we're still nowhere when it comes to audit capabilities. So it shouldn't surprise us then that it's the same story everywhere we will look. So we took a quick look at server access and application access. And just to get connectivity, we saw an incredible amount of complexity. So I won't belabor the point by going through more examples, but needless to say, the story is the same for Kubernetes access, database access, Windows access. To achieve our four criteria for successful access, we're going to continue increasing complexity by layering on more and more technologies, which means managing more and more users in more systems and more and more configurations. So the topic of the talk today was really about reducing complexity. So what's the different approach? How else can we do this? So if we take Bruce Schneier's word for it, and of course I do, then complexity is the worst enemy of security. And what we need to do is reduce the complexity to increase our security. So if we think about how we would design a solution from the ground up to do that, we should return to the four essential elements of access to do that.
The Solution: Teleport Access Plane
Colin 00:17:28.905 So we'll say if we wanted to design the simplest solution which would deliver this, it needs to have those four capabilities. Right? We need to be able to securely connect to any resource on the planet, regardless of network boundaries. We need identity-based authentication that ties into our corporate identity provider so we know exactly who is accessing what resource. We need fine-grained access control so that a senior SRE and an intern do not have the same privileges and that those privileges are automatically enforced. And we need robust audit: who accessed what resources, when did they access it, what did they do? And we need this in a simple solution that allows us to manage all of our resource types to reduce complexity. Okay? So Teleport's solution to that is the Teleport Access Plane. So the Teleport Access Plane is an open-source solution which consolidates those four essential infrastructure access capabilities: connectivity, authentication, authorization, audit. It's more secure, less complex, while delivering engineering and developer productivity. So let's see what makes it work. So in reality, what makes it work is also what we think you should look for in any solution.
Colin 00:18:53.945 So the first is instead of using SSH keys, instead of using usernames and passwords, we should start with short-lived certificates. So that should be our credential of choice. They have identity built in — they expire. So if you think back to the SSH example, if we leverage certificates, SSH certificates instead of SSH keys, we don't need to worry about our users generating secure keys anymore that meet our corporate standards. We don't need to rotate keys because certificates expire, so there are no keys to rotate. There's no key distribution anymore because we just use the trust in the CA. And there's no keys to clean up because, of course, there are no keys, it's certificates, and they expire. Certificates really are superior and reduce complexity, so they're the most important starting point for whatever solution you're going to implement to solve this problem. Okay? Let's also when we're using the certificates, instead of adding users to the system, let's centralize the identity and the SSO provider. So in order to generate these certificates, let's pull identity info from your identity provider. So let's not add another place where we have to maintain user accounts. Users should be added and removed in one place only, that is your identity provider.
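To make the contrast with static SSH keys concrete, here is a minimal Python sketch of a credential that carries identity and expires on its own. It is a toy model under stated assumptions, not Teleport's implementation: the `Certificate` class, the `issue_certificate` helper, the 8-hour lifetime, and the example identity are all hypothetical, and no real cryptographic signing takes place.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Certificate:
    """Toy stand-in for a signed certificate (no real cryptography here)."""
    identity: str        # who the credential belongs to, e.g. taken from the SSO
    roles: tuple         # metadata carried inside the credential
    issued_at: datetime
    ttl: timedelta

    @property
    def expires_at(self) -> datetime:
        return self.issued_at + self.ttl

    def is_valid(self, now: datetime) -> bool:
        # Valid only inside its own lifetime, so there is nothing to rotate or clean up.
        return self.issued_at <= now < self.expires_at


def issue_certificate(identity, roles, now, ttl=timedelta(hours=8)):
    """Hypothetical issuer; in a real system a certificate authority would sign this."""
    return Certificate(identity=identity, roles=tuple(roles), issued_at=now, ttl=ttl)


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
cert = issue_certificate("alice@example.com", ["sre"], now=start)

print(cert.is_valid(start + timedelta(hours=1)))  # True: inside the 8-hour lifetime
print(cert.is_valid(start + timedelta(hours=9)))  # False: it expired on its own
```

The design point is that revocation by default comes from the lifetime itself: once `ttl` has passed, the credential is useless without anyone touching a server.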
Colin 00:20:26.567 Whether that's Okta, whether that's Azure AD, whatever it might be, that should be the place you go to add and remove users. So when a user joins, you add them in your SSO. When they leave, you remove them in the same place. No key cleanup, no removing them from databases, Kubernetes clusters, anything like that. It's simplicity first. The next thing about the certificates is there's metadata in them. So we can use this to provide rules such as developers can never have access to production data or contractors should only have access to XYZ project in GCP. So role-based access is going to let us solve a lot of problems. It also lets us stick to what we said about authorization, which is we can now assign roles in one place as well, the SSO. So not only when you join do I add you in the SSO and when you leave I remove you from the SSO, I assign your role there. So I say you are a senior SRE or you are a database administrator. When you join, we add them to the SSO, we assign your role, and then the solution should take care of delivering all the access you need solely based on that. Okay? So the next question is we've delivered on connectivity with the access plane, we've got our authorization, we've got our role-based access. Now we need visibility and control into what's happening on these.
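The role rules described above ("developers can never have access to production data", "contractors should only have access to XYZ project") can be sketched as a small lookup. This is an illustrative Python sketch only: the role names, the `ROLE_RULES` table, and the `env`/`project` labels are assumptions made for the example and do not reflect Teleport's actual role format.

```python
# Hypothetical role definitions; names and labels are invented for illustration.
ROLE_RULES = {
    "developer": {"allow_env": {"dev", "staging"}},
    "senior-sre": {"allow_env": {"dev", "staging", "production"}},
    "contractor": {"allow_env": {"dev"}, "allow_project": {"xyz"}},
}


def is_allowed(roles, resource):
    """Return True if any of the user's roles permits the resource."""
    for role in roles:
        rule = ROLE_RULES.get(role)
        if rule is None:
            continue  # unknown roles grant nothing
        if resource["env"] not in rule["allow_env"]:
            continue
        projects = rule.get("allow_project")
        if projects is not None and resource["project"] not in projects:
            continue
        return True
    return False


prod_db = {"env": "production", "project": "billing"}
xyz_dev = {"env": "dev", "project": "xyz"}
billing_dev = {"env": "dev", "project": "billing"}

print(is_allowed(["developer"], prod_db))       # False: developers never reach production
print(is_allowed(["senior-sre"], prod_db))      # True
print(is_allowed(["contractor"], xyz_dev))      # True: contractor on the xyz project
print(is_allowed(["contractor"], billing_dev))  # False: wrong project
```

Because the roles arrive as metadata on the credential, the only place a person's access changes is the role assignment in the identity provider.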
Complete session view
Colin 00:22:13.239 We need to include an audit layer. But by basing things through an access plane through a proxy, what we can do is we can get a complete audit log of exactly what's happening. So run a command on a Linux box, it will be in the audit log. So if I run a sudo command, I'll have that in my audit log. If you run a query in a database, it'll be in the audit log. Okay? If you run kubectl, you'll see that you ran get on pods in the audit log, all through this single solution. So again, low complexity, but really delivering high visibility. Okay? So this solution is going to work really, really well for least-privilege situations, but what if we need to work outside our least-privilege role? So in most of the places I've worked in the past, what that looks like is I join the organization, I'm given a very nice tight set of least privilege permissions, and then when I need to do something extra, I request that and they grant it. Usually, however, it doesn't always get taken away. So as time passes and I need to access more things and I help out on different projects, my privileges that start as least privilege just sort of grow and grow and grow over time, which creates a really big security risk.
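The audit idea at the start of this passage, where every protocol passes through one proxy and so lands in one log, can be sketched as follows. This is a simplified Python illustration, not Teleport's audit format: the `AuditingProxy` class, its field names, and the sample users and targets are hypothetical, and the sketch only records events rather than forwarding them.

```python
import json
from datetime import datetime, timezone


class AuditingProxy:
    """Toy proxy: every action is recorded at the single point it passes through."""

    def __init__(self):
        self.events = []

    def run(self, user, protocol, target, action, now=None):
        event = {
            "time": (now or datetime.now(timezone.utc)).isoformat(),
            "user": user,
            "protocol": protocol,
            "target": target,
            "action": action,
        }
        self.events.append(event)
        # A real proxy would forward the request here; this sketch only records it.
        return event

    def actions_by(self, user):
        return [e["action"] for e in self.events if e["user"] == user]


proxy = AuditingProxy()
proxy.run("alice", "ssh", "web-01", "sudo systemctl restart nginx")
proxy.run("alice", "kubernetes", "prod-cluster", "kubectl get pods")
proxy.run("bob", "postgres", "orders-db", "SELECT count(*) FROM orders")

# One query answers "what did alice do?" across SSH and Kubernetes alike.
print(json.dumps(proxy.actions_by("alice"), indent=2))
```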
Colin 00:23:48.488 But certificates make solving this problem easy because now instead of placing my keys somewhere or changing my account on a database to be a different role, things that have to be sort of undone after, I just grant a new short-lived certificate that's got those permissions and it expires on its own. So now we can restrict access as much as possible by default. So when I join, I do. I get that nice least privilege. But with the system, I can then make it easy to temporarily grant escalated privileges when the need arises. And because we use short-lived certificates, they just expire and I'm back by default to my least privilege model. If my role changes, it should be changed where everyone's role gets changed within the organization, in the SSO. So if I have a permanent role change, we change that in the SSO, and then when I authenticate, I'll have that new role as well. So what this means is your security folks really get to operate on nice least privilege zero trust principles, but at the same time, your developers, engineers, SREs don't have to jump through hoops to request additional access they need. The admins can quickly and easily approve/deny requests without even leaving their existing workflows because, of course, the access requests integrate with things like Slack, Jira, PagerDuty, Mattermost, etc. So we get a really nice ability to deliver just-in-time access.
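The just-in-time pattern described here, least privilege by default plus temporary grants that lapse on their own, can be sketched in a few lines of Python. The `AccessState` class, the `approve_request` method, the one-hour default, and the role names are hypothetical choices for this example; a real deployment would involve a reviewer and a newly issued short-lived certificate rather than an in-memory list.

```python
from datetime import datetime, timedelta, timezone


class AccessState:
    """Tracks a user's standing roles plus temporary, expiring grants."""

    def __init__(self, base_roles):
        self.base_roles = set(base_roles)
        self.grants = []  # list of (role, expires_at) pairs

    def approve_request(self, role, now, ttl=timedelta(hours=1)):
        # Hypothetical approval step; in practice a reviewer would approve this.
        self.grants.append((role, now + ttl))

    def effective_roles(self, now):
        # Expired grants are simply ignored, so nobody has to remember to revoke them.
        temporary = {role for role, expires_at in self.grants if now < expires_at}
        return self.base_roles | temporary


t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
alice = AccessState(base_roles=["developer"])
alice.approve_request("prod-db-admin", now=t0)

print(sorted(alice.effective_roles(t0 + timedelta(minutes=30))))  # ['developer', 'prod-db-admin']
print(sorted(alice.effective_roles(t0 + timedelta(hours=2))))     # ['developer']
```

This is what prevents the privilege creep described earlier in the talk: the escalated role disappears when its lifetime ends, and the standing role set never grows.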
What simplicity means
Colin 00:25:32.497 So when we talk about cybersecurity, we often stay rooted in all the moving parts of the puzzle without really getting to the solution’s essence, which is simplicity. So in the complex environments we work in though, what really does simplicity mean? Resources are now accessible across multiple private/public clouds or containers with various users accessing them. In that environment, the natural reaction has been to deploy a continually growing variety of technologies and solutions. We're going to throw money and people at the problem, but in reality, simplicity is key. We want to shrink the attack surface so that security is closest to the applications, devices, and data, no matter where they live. In that way, security becomes an intrinsic part of the process rather than an add-on after. So our solution for this is the open-source Teleport Access Plane. This solution is zero trust, certificate-based, provides advanced access workflows, and gives complete visibility into access and behavior. It's not only more secure, but simpler. It brings all the requirements we were looking for into your infrastructure at the protocol level.
Gaining visibility through Teleport
Colin 00:26:58.797 So, for example, Teleport Server Access gives you the visibility into all SSH sessions down to the kernel level. Teleport Kubernetes Access can enforce second-factor authentication for all kubectl exec commands. Teleport's Database Access can enforce table-level roles and show every query that hits your Postgres, MySQL, or MongoDB databases, for example. Teleport Application Access can provide a VPN replacement for internal applications like your AWS Management Console, Jenkins, Grafana, Kibana, etc. And finally, Teleport Desktop Access gives you modern passwordless access for all your Windows servers and desktops, including complete session recordings. That's the overview of what I suggest is the best way to reduce complexity to increase infrastructure security when it comes to infrastructure access. I'd like to open it for questions right now if there have been any questions. Okay. I see a question about implementation. So this type of solution can be implemented both as a SaaS trial. So it's a SaaS offering. So we do have a SaaS offering and you can sign up for trials.
Colin 00:28:34.425 It also can be delivered on-prem fully within your own infrastructure. So you can deploy this fully within your own infrastructure. And for getting started, we do have things like Helm charts, CloudFormation templates, Terraform plans, a Terraform provider that will make all this easy to get started with. I hope that answers that question. I see another question here. How can I get help when trying out Teleport? So we do have a community Slack. So you can visit goteleport.com, and that will take you (there are links on the home page) to the community Slack which will allow you to interact with myself, my colleagues, and the rest of the Teleport team and Teleport community users to get help with your Teleport implementation. I hope that answers that question. Okay. There's a pricing question in here asking how Teleport stands against, I won't name the other organization, but against what I would call a traditional PAM solution. The best way I can answer that is to really reach out and talk to the Teleport sales staff.
Colin 00:30:10.755 So I'm a solutions engineer, so I don't really talk about pricing generally, but that's something we'd be happy to do. We feel our value is excellent when stacked up against any competition. But again, you can feel free to reach out to me directly and we can chat about this, or feel free to reach out to Teleport's [email protected]. That was a good question. Thank you. Okay. Great question. Is there a security configuration guide or benchmark for how to implement Teleport? Absolutely. So first, we're open source. So if you go to github.com, you can find us at github.com/gravitational/teleport, and there you'll find an examples folder which will give you all sorts of templates. But the best place to start really is at goteleport.com in our docs, our documentation section. And there's a getting started guide right there which will lead you through implementing open-source Teleport from start to finish: deploying it in a Kubernetes cluster, deploying it as a Linux service, high availability, single node for testing. And then the other thing I'd encourage you to do if you're interested, is reach out. And we're happy to engage you in a no-cost proof of value where we work with you. And on our proofs of value, I recently wrote a blog post about what makes a successful proof of value. And one of the most important things is the collaborative nature of it, and we're happy to work with you through that. I hope that answers the question.
Colin 00:32:27.813 So it doesn't look like there's any other questions coming through right at the moment. So what I would encourage everyone to do who's interested in reducing the complexity of your infrastructure access while at the same time improving security is to reach out. And whether that's via our community Slack, whether that's by visiting our homepage, visiting us on LinkedIn, whatever method you think works best for you. And we're happy to walk you through all the various options for trying it out, whether that be the open-source version or a trial of the enterprise software. Thanks very much, everybody. I appreciate you taking the time out of your day to attend today's talk. And thank you to our hosts. Oh, one last question popping through, which I'll quickly address. The average time to implement Teleport at the enterprise level. This is a great question, and the answer is a little complex because it depends. The average time is quite short. A POV usually takes a couple of weeks at most. A couple of weeks on average. And a full implementation is measured in a small number of months with us working alongside, but a big factor in that is definitely the level of automation that you have in your infrastructure as code and in your configuration management types of things. So if you have 10,000 servers and you're managing each by hand as if it's a pet, it can take a little longer. But if you have nice tight DevOps infrastructure-as-code pipelines, it's very fast to implement. Thanks for that last question and I'll turn it back over. There we go.
Cody 00:34:36.949 All right. I'm doing just one last search to make sure we're not missing anyone. I'll go ahead and read out our Amazon gift card winners, and any final questions that get sent in during that time, we'll address those afterwards. So our four $25 Amazon gift card winners are Amet E., Verne W., John K., and Shay G. So congratulations to the four of you. You should receive your gift card via email. If you don't see that email, keep an eye on your spam folder. One last look at our questions, and I think that about wraps it up. Colin, is there anything that you'd like to say before we officially close out?
Colin 00:35:24.899 No, just thank you to you and thank you to Techstrong Learning for hosting.
Cody 00:35:31.719 Perfect. Well, thank you so much Colin for taking the time to be with us and for putting together the slide deck. A quick reminder to our audience that today's session was recorded. Following this webinar, you'll receive an email with the link to access the recording on-demand. You can also find it living on the DevOps and Security Boulevard websites. Just visit devops.com/webinars or securityboulevard.com/webinars and look in the on-demand section. Now onto the — I'd like to thank Teleport for sponsoring today's webinar, and my final thanks goes to you, our audience for being with us for the entirety of today's program. Thank you so much for being with us. Be sure to fill out the post-webinar survey, and have an excellent day.