Using Infra-as-Code, Not Jira Tickets to Pass Audits (Conf42 | DevSecOps | 2021)
Jira tickets are often seen as a necessary evil in order to satisfy compliance audits. However, infrastructure-as-code can replace tickets while providing real security benefits. Learn how Teleport utilizes Terraform to make developers, auditors and the security team happy!
Travis: 00:00:00.705 Hi. I’m Travis Gary, IT Director here at Teleport. Today I’m going to tell you how you can get rid of Jira tickets in your organization by embracing infrastructure-as-code. I’m going to give you some practical examples, as well as explain why you should do this and the general philosophies behind it. So let’s get started. Terraform, not Jira Tickets: pass your compliance audits the DevOps way and actually improve your security too. So, a quick disclaimer. This advice primarily applies to a SOC 2 Type II certification. That’s the only one where I can 100% guarantee this advice works, because that’s the one certification that Teleport has. But I can tell you I believe it most likely works for PCI and ISO 27001 as well, because the overlap is about 90%. I should note that NIST for government certification and SOX for public companies are much stricter, but having gone through some of those processes, I do believe these same lessons can be applied. I should also note there’s a little bit of trash-talking of ticketing systems here, but they do serve an important purpose, and that purpose is primarily for support teams, and I’ll go into a little more detail about that later. And finally, if you’re really embracing infrastructure as code, you have to remember that IaC pipelines have really powerful API keys, and you have to do a lot of work to secure those. There are lots of ways to exfiltrate API keys. And if you’re replacing dangerous console access with infrastructure as code, you have to make sure that you have a way to protect those API keys. We’re not going to cover that in this talk, but there are lots of resources you can look up for that.
Why Jira tickets became common practice
Travis: 00:01:45.351 All right. Let’s jump into this. So first, why are we doing this? It’s mainly because hackers don’t care about your change management process. No amount of Jira tickets is going to actually improve your security. So we have to think about why did we even start doing tickets in the first place. If they’re not helping security, why are we doing it? I mean, isn’t the point of doing compliance audits to actually have your organization’s security be improved? So how did we end up with this process that we do this via tickets? And there are some good reasons, but they’re a bit dated. So let’s look at some of the common reasons that people decide to start using a ticketing system. So often people say, “Well, we use Jira for planning, so we should also use it for change management.” You’ll hear, “Because IT or systems work is considered a service organization.” Or finally, you might hear from some older IT directors that, “We follow ITIL philosophies.” And we’ll dive a little bit into what that means if you haven’t heard of it before.
Travis: 00:02:53.145 So the first one, jumping in. Teams will say, “We already use Jira, so that’s just the way it is.” But GitHub is a fully-featured tool for both planning and change management. And GitHub has improved a lot on the planning front, especially recently. You can assign your issues to project boards and have Kanban for planning sprints and doing Agile just like you would in Jira, but it’s even better because they’re innately linked with pull requests. You can use tags for doing automation. They’re also great for auditing, where you query different types of tickets for different views. It’s great for release management, because you can set up milestones. All this is really built-in. And if you work for a large enterprise, they might have done a lot of integrations that link a lot of the Jira functionality to GitHub, but why do we need two systems? That’s just more work to integrate. We should stay native, using the tools where developers already are, and that’s GitHub.
Travis: 00:03:56.258 So the next part is change management. Change management often happens in Jira because that’s maybe where your businesspeople are but not your developers. But change management works way better in GitHub. A pull request is a change request, but it’s better than a Jira change request because you can’t just change what it is once you ship it. When you merge that, you approve the exact code or the exact infrastructure change described there. Which is very different than if you do an older form of change management where you are describing a change in a Jira ticket and then someone has to manually perform that change. Now, if they’re manually performing that change, they could make a mistake. Or if their credentials got stolen by a hacker, any change could be made with those credentials. Versus doing it the GitOps way. Only what’s approved via Git is what ultimately happens.
Travis: 00:04:56.573 So in that scenario, if you compromise a developer account, you’d have to submit a PR and then convince someone else to code review it. And it would also have to pass your automatic tests. And this is a huge improvement that we all should be very familiar with. In the DevOps world, automatic testing has already started to really replace QA departments. And these other DevOps lessons can replace kind of all of the other needs for this manual change management. Things like if you work within an ITIL organization, they might say, “Oh, you need a rollback plan in order to ship your change.” And of course, with GitHub, rollback is as simple as a revert. And finally, to make your auditors happy, GitHub is the best audit trail. We can see exactly what happened. Who approved it, who reverted it. Everything is right there. It doesn’t have the kind of reporting that a lot of people really like Jira for, but it has a very full-featured API. And you can write some easy scripts to pull out those kind of CSV files you need to make your auditors happy.
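As a rough illustration of the kind of "easy script" mentioned above, here is a minimal Python sketch that turns pull request data into a CSV audit trail. It assumes the records were already fetched and are shaped like GitHub REST API pull request and review objects (`number`, `title`, `user.login`, `merged_at`, and a review `state` of `APPROVED`). The sample data, function names, and column layout are hypothetical and not taken from the talk.

```python
import csv
import io


def audit_rows(pulls, reviews_by_pr):
    """Build one audit row per merged pull request."""
    rows = []
    for pr in pulls:
        if not pr.get("merged_at"):
            continue  # closed without merging: no change was shipped
        reviews = reviews_by_pr.get(pr["number"], [])
        approvers = sorted(
            {r["user"]["login"] for r in reviews if r["state"] == "APPROVED"}
        )
        rows.append(
            {
                "pr": pr["number"],
                "title": pr["title"],
                "author": pr["user"]["login"],
                "approved_by": ";".join(approvers),
                "merged_at": pr["merged_at"],
            }
        )
    return rows


def to_csv(rows):
    """Serialize audit rows to CSV text for the auditor."""
    fields = ["pr", "title", "author", "approved_by", "merged_at"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data, shaped like the API responses.
SAMPLE_PULLS = [
    {"number": 12, "title": "Add Salesforce admin", "user": {"login": "alice"},
     "merged_at": "2021-06-01T10:00:00Z"},
    {"number": 13, "title": "Abandoned change", "user": {"login": "bob"},
     "merged_at": None},
]
SAMPLE_REVIEWS = {12: [{"user": {"login": "carol"}, "state": "APPROVED"}]}

if __name__ == "__main__":
    print(to_csv(audit_rows(SAMPLE_PULLS, SAMPLE_REVIEWS)))
```

Unmerged pull requests are skipped on purpose, since an auditor usually cares about changes that actually shipped. A real script would also need to page through the API and handle authentication.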
The new IT world
Travis: 00:06:01.845 Let’s take a look at the next reason people often say that we can’t use GitHub and we need Jira. Often, it’s because they say that IT is a service organization. And remember, I mentioned that ticketing queues are for service organizations. But that’s the IT world of old. The new IT, systems, or platform DevOps way of thinking is that they’re a platform team. We want to make tools that enable developers to do their jobs better. We don’t want to do the work for developers and make those changes for them. We want to give them the tools so that it happens automatically. It’s all about automation here. Every team that you include in the process slows the process down dramatically. I think we’ve all probably worked in an organization where you have to go through a request process to make changes, and you put in a request to IT on Monday. They finally get to it on Tuesday. Then they’ve got to send it out for approval. Then it needs to get reviewed by the Change Management Board or Change Advisory Board. And before you know it, you’ve wasted an entire week just waiting for the changes to get approved. And those things are really wholly unnecessary. The most we should do is have two people, preferably on the same team. Two devs. This is just like the code review process that you’re used to. We can apply the same concept to all sorts of places that we used to do change request tickets for.
Travis: 00:07:34.676 Now, I should note that some very strict compliance requirements that you might find in NIST or SOX do require approval from an independent second party. And this is often an application owner. If you need to access that application, you need to ask the application owner, not someone on your team, whether that’s okay. So it still fulfills the plus two rule. Using code reviews on GitHub and CODEOWNERS files, you can actually make that process happen pretty automatically. There are also some other access management tools on the market that help with access approvals and things like that outside of Jira, quickly and easily, rather than having to do it in a ticket-based workflow. And finally, the reason IT needs to not be a service organization is that requests just don’t scale. You can’t have an effective dev team and follow the right DevOps philosophies, where we want to ship code fast and automate things, if there’s a manual process in the loop. So if you have to rely on an IT team to, say, complete a DNS request for you and get that approved, it’s just not going to scale.
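The independent-approval rule can be stated in a few lines of Python. This is only a sketch of the rule itself. On GitHub, a CODEOWNERS file combined with a branch protection setting that requires code owner review is the native way to get this behavior. The owner set and function names below are hypothetical.

```python
def independent_approvers(author, approvers, owners):
    """Return approvers who are listed owners and are not the change author."""
    return sorted({a for a in approvers if a in owners and a != author})


def may_merge(author, approvers, owners):
    """A change needs at least one approval from an owner other than its author."""
    return len(independent_approvers(author, approvers, owners)) >= 1


# Hypothetical owners of the application being changed.
SALESFORCE_OWNERS = {"dana", "erin"}

if __name__ == "__main__":
    # An owner cannot approve their own change; a non-owner approval does not count.
    print(may_merge("alice", ["dana"], SALESFORCE_OWNERS))
    print(may_merge("dana", ["dana"], SALESFORCE_OWNERS))
    print(may_merge("alice", ["bob"], SALESFORCE_OWNERS))
```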
Travis: 00:08:53.225 Because if you’re shipping fast or you have developers all over the world, suddenly you need a really responsive 24-7 365 support desk. And that’s just way too much to ask from a lot of small companies. Building a global team is a lot of work. I know quite personally it’s hard work, and it’s also stressful for those teams. And it’s a bit of a fool’s errand of trying to develop this, especially at a small company, versus instead investing your time in creating tools and behaving more like a platform team. And that allows developers to self-serve, solve issues by themselves, within their own time zone, hopefully with another co-worker in that same time zone. And that’s what’s going to scale, and that’s what’s going to allow your organization to have a competitive advantage. And it’s only when you make your IT teams start developing or behaving like a DevOps team, rather than being a service organization. And that’s what really allows you to ditch the ticket queues that we’re all so familiar with that service organizations rely on.
ITIL — a philosophy based on the past
Travis: 00:10:00.632 So the final one. ITIL. Now, when you hear this word, I want you to think of a dead dinosaur because that’s what ITIL is. It’s a philosophy from the past, from when IT people were racking servers and running bare metal compute. That’s not the case anymore, and we need to let it go. To give a history lesson for folks who hopefully have had the luck of not working in an ITIL-based organization: ITIL was created to manage processes that are really manual and error-prone. So if you’re having to rack servers, you have to talk to a lot of teams. You have to talk to finance and procurement. You have to plan where it’s going to go in the rack. And doing rollbacks is not as easy as just saying, “Oh, we’re going to revert in Git and it’s fine.” No. A rollback is a lot of work, where it could take hours to move servers around, to reimage servers, to change configuration.
Travis: 00:11:01.553 That’s what this was developed for. It was developed for another era. So trying to hold on to this is not going to help your developer teams — it’s just going to slow them down. So it’s time that you need to stop following IT philosophies from the pre-cloud era. It’s a different world now. We actually can deploy servers with the click of a button. We can roll back entire data centers’ worth of infrastructure just by doing a revert and watching Terraform taint and rebuild all the infrastructure you need to run a modern app. So we need to make sure that the entire rest of the tech stack that you have is as sophisticated as deploying your infrastructure would be with Terraform or other infrastructure as code tools.
Applying IaC lessons learned
Travis: 00:11:46.707 So let’s talk about actually applying some of these lessons. A lot of people are familiar with using IaC systems like Terraform to deploy AWS infrastructure, set up GCP, or make those kinds of changes. But it’s also really helpful for other parts of your tech stack. A lot of people don’t think about the SaaS apps that are really controlling this. So this includes things like GitHub and Okta. A lot of times those systems are still controlled by sysadmins who are manually pointing and clicking within the console to make changes. But when you think about how powerful those systems are, GitHub controls everything if you follow the GitOps philosophy. And if you’re doing proper access controls with RBAC or the newer ABAC, attribute-based access controls, then your directory system like Okta controls the access to everything. So if we don’t let people manually deploy servers via the AWS console, why do we let people manually make changes in the GitHub console or the Okta console, which are arguably more powerful and dangerous because they control all the systems?
Travis: 00:13:00.303 So let’s think about this. If you use GitHub to manage your infrastructure, then a compromised GitHub admin owns your infrastructure. So it’s of critical importance that we get rid of GitHub admins. But if we’re getting rid of GitHub admins, then how do we do the admin work? You’ve probably figured this out. It’s Terraform. We’re going to use IaC. So you can Terraform your GitHub instance on GitHub itself. You want to apply these principles to all the things in your tech stack, and this includes Terraform Cloud itself. You can actually apply these lessons to the same systems they’re managing, and you should. So we’re going to look at a really short, practical example that we did here at Teleport about terraforming Okta. We’re going to apply attribute-based access directory rules via Terraform to eliminate Jira tickets for what’s a really common thing in an IT department: handling access requests. So this is just three easy steps. And you can apply this same concept to a lot of different systems.
Some practical examples
Travis: 00:14:09.866 So let’s take a look. We’re actually going to have some code in this discussion. First, you want to understand what the schema is here for the relationship between the users and groups. So first, you need to create a directory group for every single app. I prefix these with app- and then the system name. So we might have one that is app-GitHub or app-Salesforce. That group is assigned to an Okta application, and that lets users in through the front door. That authorizes them, so that they can authenticate via the Okta Directory, hopefully with SAML and not a password, to go log into the app. And ideally, that login should have no entitlements. That should be a basic read-only role. The least privileged user that people want. And then for all the other users, we should create roles for each of those.
Travis: 00:15:11.084 So in our code example here, we have our basic group for Salesforce, and we’re writing in here some attribute-based rules to decide who should go in the group. So we’re looking at the user profile and looking at what the department field is to decide who should get access to Salesforce. And we say, “Okay. It’s the sales team or the marketing team,” in this simple example. And then for the bigger role entitlements, like who’s a Salesforce admin, we can then — again, we can use things like other attributes that you could say, “Hey, you’re in the IT department and you’re a manager.” Things like that. Or I wanted to call up this example because there often are weird exceptions — we can’t always use attributes. Sometimes we can just name names here and keep it easy. So if we wanted to add a new Salesforce admin, we could create a pull request and add a new person right here and have someone approve it.
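The slide code is not part of this transcript. As a stand-in, here is a minimal Python sketch of the membership logic being described: a department-based rule for basic Salesforce access, plus an admin rule that combines attributes with a list of named exceptions. The class names, field names, department values, and the example email address are all hypothetical. The real rules would live in Terraform and be evaluated by Okta.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    email: str
    department: str
    is_manager: bool = False


@dataclass
class GroupRule:
    group: str
    departments: set = field(default_factory=set)   # attribute-based part
    managers_only: bool = False
    named_users: set = field(default_factory=set)   # explicit exceptions

    def matches(self, user):
        """Named exceptions always match; otherwise check the attributes."""
        if user.email in self.named_users:
            return True
        if user.department not in self.departments:
            return False
        return user.is_manager or not self.managers_only


RULES = [
    GroupRule("app-salesforce", departments={"Sales", "Marketing"}),
    GroupRule("app-salesforce-admin", departments={"IT"}, managers_only=True,
              named_users={"pat@example.com"}),
]


def groups_for(user, rules=RULES):
    """Return the sorted group names a user is placed in."""
    return sorted(r.group for r in rules if r.matches(user))


if __name__ == "__main__":
    print(groups_for(User("sam@example.com", "Sales")))
    print(groups_for(User("pat@example.com", "Finance")))
```

Adding a new admin is then a one-line change to `named_users`, which is exactly the kind of change a pull request can carry and a reviewer can approve.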
Travis: 00:16:08.454 And I should note that you should make these groups and roles even for systems that don’t support the automatic provisioning of roles. So Salesforce does. I can actually assign the admin role to the two of us because we’re in that group, but not all systems do. But you need to still create these because it’s that important placeholder for change management. Otherwise, you would need to create a Jira ticket to keep track of this. So we have to keep track of it here. And it’s an important form of future-proofing that eventually this system might support automatic role provisioning or you might decide that it’s important enough for a critical system to write your own integration to make that happen.
Travis: 00:16:56.469 A lot of good systems, like AWS and Salesforce, and Teleport too, support this kind of setup where you can map groups within Okta to certain roles and then assign those roles to the people within that group. And you can see the Terraform code here is quite simple. It’s just a quick loop through all the different apps in here that creates the groups and then the associated group rule that uses the attribute-based access controls we described to put the people in the group. We mentioned you want to do this anyway, even when you don’t have an automation for it. And that way, we’ve created a request, approval, and audit system that lives entirely in Git, and we’ve eliminated the need for all access requests in Jira.
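The Terraform loop itself is not reproduced in this transcript. The following Python sketch mirrors its shape under stated assumptions: one `app-` group and one group rule generated per entry in an app catalog. The catalog contents, the dictionary layout, and the helper names are hypothetical. The expression strings are modeled on Okta's expression language and should be checked against Okta's documentation before use.

```python
import json

# Hypothetical app catalog: app name -> departments that get basic access.
APPS = {
    "github": ["Engineering"],
    "salesforce": ["Sales", "Marketing"],
}


def department_expression(departments):
    """Build a rule expression string that matches any listed department."""
    clauses = ['user.department=="{}"'.format(d) for d in departments]
    return " or ".join(clauses)


def build_resources(apps):
    """Return one group and one group rule per app, keyed by group name."""
    resources = {}
    for app, departments in sorted(apps.items()):
        group = "app-" + app
        resources[group] = {
            "group": {"name": group, "description": "Access to " + app},
            "rule": {
                "name": group + "-rule",
                "assign_to": group,
                "expression": department_expression(departments),
            },
        }
    return resources


if __name__ == "__main__":
    print(json.dumps(build_resources(APPS), indent=2))
```

Granting a department access to an app becomes a change to the catalog, so the request, the approval, and the audit record all live in the same pull request.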
Travis: 00:17:51.437 So the next step is, once you do that, you want to remove the ability for admins to manage those groups within the console. And this is a DevOps lesson that a lot of people do in AWS. When you reach this happy DevOps nirvana, you actually take away console access from developers because they need to make the changes via Terraform. So in this case we’ll actually remove just the permission group admin from all the groups that are not managed by Terraform. And we should manage if we can, 100% of your groups in Terraform. But if that’s not realistic for you, you can at least do the ones that control some sort of access-based permissions. Because you don’t want to give the permission to, say, an IT helpdesk associate — that they should not be able to decide who gets AWS admin in your SaaS app.
Travis: 00:18:44.620 So step three. You want to alert on any changes made outside Terraform. This is to make sure that nobody was able to circumvent your IaC process. And this is important in proving to your auditors that this was the only way that changes were made, and it’s also a great way to do security investigations if a hacker was able to find their way around your process. So you want to connect Okta to your SIEM, your security information and event management platform. If you don’t have one, and they’re quite expensive, you can actually hack it together using Okta webhooks and Zapier, which are a really cheap low-code solution. So what you want to do is write an alert to fire any time a group change is made by anyone other than the Terraform service user. That covers the case where someone was able to log in by any other means because, for some reason, there was a misconfigured thing. You can also check the metadata on that. Did the request come from the IP we expected from Terraform Cloud? Or maybe someone stole our Terraform service user credentials, and they were able to use them elsewhere. So the SIEM really helps make sure that no one got around the process. Now, you should still do an occasional audit process, going through your logs on a quarterly or annual basis to make sure that nothing slipped through the cracks, in case you missed an alert on something that was maybe an unauthorized change that was not made through Terraform.
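The alert logic described here can be sketched in Python as a filter over log events. The event dictionaries below are simplified and hypothetical: a real Okta System Log event nests the actor and client IP in sub-objects, and a real SIEM would express this as a query or detection rule. The service user name and the expected IP are placeholders.

```python
TERRAFORM_USER = "terraform@example.com"   # hypothetical service user
EXPECTED_IPS = {"203.0.113.10"}            # hypothetical pipeline egress IP


def alert_reason(event):
    """Return why a log event should alert, or None if it looks legitimate."""
    if not event["eventType"].startswith("group."):
        return None  # only group changes are in scope here
    if event["actor"] != TERRAFORM_USER:
        return "group change made outside Terraform"
    if event["ip"] not in EXPECTED_IPS:
        return "Terraform credentials used from unexpected IP"
    return None


def alerts(events):
    """Pair the actor of each suspicious event with the reason it fired."""
    found = []
    for event in events:
        reason = alert_reason(event)
        if reason is not None:
            found.append((event["actor"], reason))
    return found


# Hypothetical, flattened sample events.
SAMPLE_EVENTS = [
    {"eventType": "group.user_membership.add",
     "actor": "helpdesk@example.com", "ip": "198.51.100.7"},
    {"eventType": "group.user_membership.add",
     "actor": TERRAFORM_USER, "ip": "203.0.113.10"},
    {"eventType": "group.user_membership.remove",
     "actor": TERRAFORM_USER, "ip": "198.51.100.7"},
    {"eventType": "user.session.start",
     "actor": "helpdesk@example.com", "ip": "198.51.100.7"},
]

if __name__ == "__main__":
    for actor, reason in alerts(SAMPLE_EVENTS):
        print(actor, "->", reason)
```

The second check matters as much as the first: a change made by the right service user from the wrong network location suggests stolen credentials rather than a normal pipeline run.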
Travis: 00:20:16.547 So finally, any good loop has a step N. And you want to repeat this process until you reach 100% Terraform coverage. So you want to keep doing this for other resources you’d have in Okta. Your authentication policies, your application setup. Everything you can until you’ve reached 100% code coverage. And at that point, you get to the really cool thing of removing console access entirely. And at that point, you can create what’s called a breakglass user. So of course, if Terraform or the [inaudible] process breaks down, there’s an incident, and Terraform is down, you need a way to get in. And what you can do is create that service user that is your super admin. And we use 1Password as our password store, and I highly recommend it, especially now that in their really recent release, you can also connect it to your SIEM.
Travis: 00:21:09.521 And so we set up an alert that if the breakglass service users’ credentials are accessed, that creates an incident. Because the only reason we should ever be using those is during an incident. And if someone is using them outside of an incident, they’re either breaking the rules or they’re a hacker that’s trying to compromise your system and you want to know that fast. So that’s kind of the process. And if you reach that 100% coverage, you don’t need change management tickets at all for any admin functions within that platform. And you can apply these same lessons to other important systems in your tech stack like GitHub. Get rid of all your GitHub admins. They are so powerful and dangerous. You can do it to Terraform itself. You can do all sorts of SaaS apps and keep applying these lessons. And as you do, your ticket count will reduce. So you can’t just throw out Jira right now, immediately. You have to kind of slowly carve away at it and reduce that ticket number as you increase your code coverage.
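As a small sketch of the breakglass check, the Python below flags any access to the breakglass credentials that falls outside a known incident window. The function names and the idea of representing incidents as start and end timestamps are assumptions for illustration. In practice this would be a SIEM rule fed by the password manager's audit events.

```python
from datetime import datetime


def in_incident(when, incidents):
    """True if `when` falls inside any (start, end) incident window."""
    return any(start <= when <= end for start, end in incidents)


def breakglass_alerts(access_times, incidents):
    """Return credential accesses that happened outside every incident window."""
    return [t for t in access_times if not in_incident(t, incidents)]


if __name__ == "__main__":
    # Hypothetical incident window and two credential accesses.
    incidents = [(datetime(2021, 6, 1, 9), datetime(2021, 6, 1, 12))]
    accesses = [datetime(2021, 6, 1, 10), datetime(2021, 6, 2, 3)]
    print(breakglass_alerts(accesses, incidents))
```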
Tickets only for changes made outside of code
Travis: 00:22:12.302 So if you’re going to remember one lesson from this whole thing, it’s that tickets are only for changes made outside of code. No changes outside code, no tickets. So remember that we developed tickets for service organizations and for these older philosophies where we have lots of manual processes, because manual processes are very error-prone. So you have to come up with these systems to track manual processes, to come up with plans to make sure you don’t make mistakes. But when you do things in code, we no longer need to do that. GitOps has paved the way to remove all those manual processes. So you want to do this completely up and down your stack, including managing the SaaS [inaudible] in the realm of IT that is traditionally still done with Jira tickets.
Travis: 00:23:04.467 So I hope this talk helps empower you at your organization to realize that you can apply these lessons. Not only is it going to make life easier for your developers, but you’re going to be more agile. You’ll be able to work across many time zones remotely. You’ll be able to get tickets done quicker because you don’t have to interact with as many teams. You’re going to be more secure because only the changes that actually happen in GitHub are what’s happening in your system, and it becomes very hard to circumvent that process. And finally, your IT teams are going to be a lot happier actually working as engineers, writing code, building systems and platforms, rather than responding to ticket queues and behaving like a service organization. So this is really a win-win-win for everybody. It does require some upfront investment, but I promise you, it’s worth it. It’s drastically improved our process. And we can’t wait to expand our code coverage to more and more systems because we’re already seeing the benefits. The time that we no longer have to spend handling access requests, we’re now able to spend automating more systems, writing more IaC, writing more tests, improving our SIEM alerts, and all the other things that we enjoy doing as engineers rather than responding to ticket queues. So thanks for tuning in today, and I hope you can apply these lessons at your organization.