In this second episode of Access Control, a podcast providing practical security advice for startups, Ben Arent chats with Dave Mangot, Principal at Mangoteque, a consultancy focused on helping companies become better at delivering software. Dave is prolific in the DevOps space and has helped improve the lives of thousands of IT Professionals through his best-selling video course, Mastering DevOps.
Key Topics on Access Control Podcast: Episode 2 – Talking DevOps with Dave Mangot
- Not just developers and operations, but the entire business, needs to deliver value to customers.
- DevOps is a movement — a way of looking at delivering software or delivering anything else.
- Security is a huge, important part of delivering software — not building it in, early on, risks losing customers later when issues arise.
- Efficiently increasing feedback loops and continual experimentation, to ensure testing prior to deployment, is a win for business goals.
Expanding Your Knowledge on Access Control Podcast: Episode 2 – Talking DevOps with Dave Mangot
- Video: There is no such thing as DevSecOps
- Teleport Database Access
- Teleport Quick Start
- Teleport Access Plane
Ben: Welcome to Access Control, a podcast providing practical security advice for startups, advice from people who’ve been there. Each episode we’ll interview a leader in their field and learn best practices and practical tips for securing your org. For today’s episode, I’ll be talking to Dave Mangot, principal at Mangoteque, a consultancy focused on helping companies become better at delivering software. Dave is prolific in the DevOps space and has helped improve the lives of thousands of IT professionals through his best-selling video course Mastering DevOps. Hi, Dave. Thanks for joining us today.
Dave: Hey, man. Thanks for having me.
Getting Involved in the DevOps Movement
Ben: Hey, to get started, can you tell me how you got started in DevOps?
Dave: I guess I got involved in DevOps before anybody started calling it DevOps. I started my career as a software developer. And then when I moved on into — I guess we’d call it operations or sysadmin or whatever you want to call it back then, I was using software to solve my infrastructure problems which is sort of the Google definition of SRE. And so I loved doing automation. I loved monitoring. I loved metrics. I loved lots of things that were sort of core to DevOps. And then I was really lucky enough to be at Salesforce in 2012, that kind of time. At that point, the DevOps movement itself, like what people are now calling DevOps, was really sort of in its accelerating phase.
Dave: Famously, I guess, everyone says the DevOps movement kicked off either when Patrick Debois and Andrew Shafer were trying to get together at an agile conference to talk about agile infrastructure, or in 2009, with Paul Hammond and John Allspaw giving the talk at Velocity about, I think it was 10 Deploys a Day, Dev and Ops Cooperation at Flickr. So I started kind of getting really interested right around then, 2010, 2011. I went to one of the first DevOpsDays events and then spoke at Velocity the year after and just really got lucky enough to be at Salesforce when they were ready to embark on their DevOps transformation. And so I got to lead the DevOps transformation at Salesforce. I don’t know how I wound up in that situation, but I got lucky and I’ve sort of been very involved in the movement ever since.
Security as an Aspect of Quality
Ben: We’ve started to see different spin-offs of DevOps. We’ve talked about DevOps starting, but now we have things like DevSecOps. And I’ll provide a recording in the show notes below, but you recently presented at DevSecOps Rockies saying that there’s no such thing as DevSecOps. Can you just tell the listeners what you mean by there’s no such thing as DevSecOps?
Dave: Gene Kim has this great thing on the IT Revolution website. It’s also familiar to anyone who’s read The Phoenix Project. The First Way of DevOps is the systems-thinking idea. How do we deliver value to our customers? And at this point, it’s kind of silly to even think that it would just be developers and operations that are delivering value to customers. It’s an entire business that’s required to deliver value to customers. So if we’re going to talk about quality, engineering needs to be involved and marketing needs to be involved. I mean, how many things can we jam into the name DevOps? That doesn’t make sense, because is DevSecOps one way of looking at it, but DevMarOps is different and DevLegalOps is another? It just gets to be kind of silly. And so I understand where the name DevOps came from at the beginning. But at this point, I just look at DevOps like a brand name.
Dave: And if you watched the talk, it’s like Kleenex, or it’s like Xerox or any of those others. Like, it’s just a brand name. Who cares? DevOps is a movement. It’s a way of looking at delivering software or delivering anything. I’m working with a company on vaccine administration, and we’re applying DevOps principles. So it’s not even just software. So this idea of calling out a specific thing out of a brand name — you can use Kleenex to wipe your eyes or wipe your nose. You don’t call it something different. It’s all the same. And so security is a huge, important part of delivering software. I even say in the talk, if we’re all talking about shifting left in security, it would be SecDevOps anyways. It wouldn’t be DevSecOps. Instead of getting into these nomenclatures, my opinion is just let’s just call it DevOps, and then let’s focus on what are really the important things that we need to focus on in order to be good at delivering software, including security.
Ben: People say shift left, meaning security as sort of earlier in the process, as opposed to an afterthought once stuff is shipped.
Dave: Yeah. And there’s a couple of things to that. If you read the Continuous Delivery book by Jez Humble and Dave Farley, they make a pretty big point in there about how expensive it is to fix things when they’re in production. And so if you shift left with security, you’re bringing security much earlier in the process which turns out to be cheaper. And that’s important. Right? And so their example in the book has a lot more to do with the idea that, “If I’m a developer and I write code, and I run my unit test, and then I find out that I have a problem, I can just go back and fix it,” because “Oh, I wrote that code 15 minutes ago.”
Dave: If it’s two months later and the code’s in production, and now we have to go back and fix it, that’s a problem because I don’t remember what I wrote two and a half months ago. That was a long time ago. My life has gone on since then. To fix it it’s going to be a problem. And then it’s in production, so we might have to get an emergency authorization from a vice president. It becomes so much more complicated than just doing that early. And that also goes back to one of the things I talked about in the talk is you want to build quality in. You don’t want to go in at the end and try to bolt quality onto something after it’s already been built. I mean, imagine if Toyota or somebody who is manufacturing widgets tries to say like, “Now we’re going to put the quality in after the assembly line is complete.” You can’t do that. That doesn’t make sense. And so Deming, W. Edwards Deming, came up with this idea of quality has to be built-in.
Dave: I was lucky enough to work with Taher Elgamal. He was the CTO for security, or some CISO or some name like that, at Salesforce. I don’t remember the title. But he’s well known as the father of SSL. He invented that when he was at Netscape in the late ‘90s. And he always said security is an aspect of quality. And I thought that was just brilliant because if you take what Deming said about how you have to build quality in and you take what Taher said about security is an aspect of quality, there’s your shift left. It’s right there. We have to get security involved as early as we can in the process so that quality and hence security is built into the process and built into the product that we’re developing, especially in software. It’s something that we can certainly do. Security, being a partner early in the process and not having that adversarial relationship that sometimes security teams are known for — I think really will improve the quality of the product. But by extension, that will really improve the security because nobody likes to run around when there’s a fire.
Well-Planned Security to Retain Customers
Ben: You’re launching a new service. You have the developers, sort of DevOps, and security and then you all sort of work together, “Oh well, these are the sort of deployment strategies,” and it’s much easier to change earlier on and iterate than it is to try and move VPCs after development has shipped, whatever they have.
Dave: Yeah. The worst thing you could do, right, is to go back to your customer and say, “We built it this way, but it turns out that there’s security problems with that. Now we’re going to ask you to make a bunch of changes to the way that you’re using our product on your side in order for our stuff to be secure.” And I’ve worked with clients that have built hundreds of VPNs from the client’s sites to their installation, and then they were like, “Oh, we should change something about the way that these VPNs work.” It’s like, “Oh God. Oh God.” You don’t want to be calling up hundreds of customers and being like, “So you’re busy running your business. I’m going to need you to take a whole bunch of time out of the things that you’re already working on and change a bunch of stuff because we didn’t think this through well enough in the begin —” that’s a great way to lose customers, fantastically lose customers.
Dave: But yeah, you want to get those groups together in the beginning. And obviously, not everybody has all those resources. Right? Your suggestion was like, “If I’m building something new — like if you’re a start-up, do you have a dedicated security person?” I don’t know. Maybe you do. Maybe you don’t. And certainly it would depend on what you’re building, how important you feel like that’s somebody to bring on. I talk with clients all the time, and they’re like, “Oh well, we did this because we had to get to market, and we weren’t thinking about these other things.” And I’m like, “That’s a perfectly valid strategy. You don’t have the resources of some megacorporation at that point. So do the best you can and then move on and get better,” and that’s the point. That’s a huge part of DevOps, right, is this Kaizen principle get better over time.
Dave: When it’s important that you hire a security person, then hire a security person. But even before you hire that security person, you can think about, “How do we make sure that we are following best practices? If I’m going to use containers, and who is not using containers nowadays, where does that container come from? Do we just allow developers to randomly pull containers down off the internet?” “Maybe, if that fits our risk profile.” Security is mostly about risk. Maybe not. Maybe we’re going to say, “Hey, for the next month or whatever, everyone, use this image because we’ve made sure that all the dependencies in here are all up to snuff, and no one else is going to touch it, and it’s coming out of our Docker or whatever container registry.” So if you’re on AWS, it’s not that hard to get your own container registry. But then you know at least where this thing is coming from. And if there are vulnerabilities or whatever, you can work around that rather than, “Hey, we’ve got 15 different containers that we don’t know where the heck they came from because someone needed this, so they pulled that down, and someone needed that, so they pulled that down.” And all of a sudden some CVE comes out: “Oh boy, we’ve got to figure this out now.” And again, you don’t want to run around with things on fire. It’s just not fun.
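Editor's sketch: the "know where your containers come from" idea can be checked mechanically. The Python below is a minimal illustration, not anything described in the episode. The registry name `registry.example.internal` and the helper names are hypothetical, and the registry-detection rule follows Docker's usual convention that the first path component is a registry only when it looks like a host name.

```python
# Hypothetical allowlist: the registry name below is a placeholder, not a real endpoint.
APPROVED_REGISTRIES = {"registry.example.internal"}

DEFAULT_REGISTRY = "docker.io"  # what Docker assumes when a reference names no registry


def registry_of(image_ref: str) -> str:
    """Return the registry host portion of a container image reference."""
    first, sep, _rest = image_ref.partition("/")
    # The first path component is treated as a registry only if it looks like
    # a host (contains "." or ":") or is exactly "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return DEFAULT_REGISTRY


def unapproved_images(image_refs, approved=APPROVED_REGISTRIES):
    """List the image references that do not come from an approved registry."""
    return [ref for ref in image_refs if registry_of(ref) not in approved]


if __name__ == "__main__":
    images = [
        "registry.example.internal/base/python:3.12",
        "python:3.12",
        "ghcr.io/someone/tool:latest",
    ]
    for ref in unapproved_images(images):
        print(f"not from an approved registry: {ref}")
```

A check like this could run in CI against the image references in a repository, so that an unvetted image is flagged at review time rather than discovered when a vulnerability is announced.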
Ben: Yeah. I know Google creates these sorts of hardened containers, and I know there are sort of stripped-down base images. Do you have any tips for how you’d start picking which base image to go with? Or does that depend upon where you are in your journey?
Dave: It does depend on where you are in your journey. I get asked, by clients, all the time like, “Which tool should I pick for this? Which tool should I pick for that?” Number one, I don’t really care. I don’t mean that in a flippant like I-don’t-care-about-the-customer way, but that’s not going to be the thing that’s going to be the biggest deal, like which tool you pick. What we learn from the DevOps movement — and Nicole Forsgren is great about giving talks about this — is architecture matters, technology, not so much. And so it’s really about being deliberate about having these processes in place to either vet the images, or like in the case of — she talks a lot, I think, about continuous delivery. Having these practices and these processes in place is what’s the most important. Which tool you pick is sort of —
Ben: Yeah. We switch them out and pick the right tool at the right time.
Dave: If you have a good process, you can switch them out. That’s really important, right, this composable idea. Ultimately, my advice to them is, “Always pick the tool or whatever that fits your company’s culture the best.” If you’re 100% all-in on AWS or Azure or whatever, maybe the Google tools aren’t the right tools. Maybe the Google tools are built in such a way that they’re opinionated about the way that Google does things on GCP. That could be for anything. And so I sort of recommend to people like, “Look at your company culture, look at the way that you work now, and then try to figure out which of these tools is going to fit your culture best,” because that’s where you’re going to have the most success because you’re going to have the lowest barrier to entry, the least amount of friction. You don’t want to change your culture to fit around the tool. Right? You want to have a tool that fits into the way that you work so that everything flows smoothly because that’s ultimately what we’re looking at here, right, is flow. Right? We talked about it in the beginning when you and I started talking about this idea the First Way and the systems thinking. We want flow of artifacts, of work, of all these things to make it easy because that’s going to be where we’re going to get the best results from quality aspect, from security aspect, from all those other kinds of things.
Increasing Feedback Loops and Continual Experimentation
Ben: That touches something else you talked about, which is increasing feedback loops and continual experimentation. So if you can experiment more, quicker, and without a big overhead to the company, it’s a better win for the business goals that you need.
Dave: You just touched on the Second and Third Ways of DevOps. This idea of flow or whatever, and getting these artifacts out there, has huge benefits for security. If you’re looking at — we just used as our example, right, for the containers like, “Oh, I discovered a vulnerability in a container. Now we have to get new containers out there.” Do you want that to be like an all-hands-on-deck firefight, or do you want that to be something that’s just normal? I mean, the answer is kind of obvious. If I discover a security vulnerability in my PHP code and we have to patch that and get that out there, do you want that to be a normal thing, or do you want that to be an all-hands-on-deck firefight? I mean, it’s such a core concept about this idea of being able to ship software regularly, easily, not having a lot of gates in the way, but still with high quality. Right? So I should have unit tests or integration tests that are testing for security things so that it’s not required for everyone to stop the whole world in order to get those things out there.
Dave: At the same time, I need to make sure that I can get software out there so that if I do discover that PHP vulnerability, then the developer’s like, “Oh, I fixed this.” Or maybe it’s not even a PHP vulnerability. Maybe it’s something that they wrote in the code that’s a vulnerability. Maybe they didn’t check input, and now we’re vulnerable to SQL injection or whatever. The developers should be like, “I can handle this, not a problem. Here’s the fix.” Everyone agrees on the fix, code review, blah, blah, blah, shipped, boom, out in production. That’s great. That’s what you want. You want that for your regular software lifecycle, and you want that for your security lifecycle.
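Editor's sketch: one way to make a security fix part of the normal test suite is a regression test for the injection case Dave mentions. The Python below is a minimal illustration using the standard library's `sqlite3` module. The `users` table, the sample names, and the function names are all hypothetical.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str):
    """Look up a user with a parameterized query, so input is never parsed as SQL."""
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchall()


def make_test_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


def test_injection_attempt_returns_nothing():
    conn = make_test_db()
    # A classic injection payload; with string concatenation it would match every row.
    assert find_user(conn, "' OR '1'='1") == []
    # Legitimate lookups still work.
    assert find_user(conn, "alice") == [(1, "alice")]


if __name__ == "__main__":
    test_injection_attempt_returns_nothing()
    print("security regression test passed")
```

Because the test runs with every build, the fix ships through the same pipeline as any other change and the vulnerability cannot quietly return.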
Dave: And it’s the same thing: if I’m going to invest in making sure that this is a container that’s blessed, I should put some work into that Packer or scripting or whatever I’m going to use. That’s not an endorsement of HashiCorp even though they make great tools. But invest in that so that, “Oh yeah, there’s a CVE that came out for this thing. We have to upgrade the container or whatever.” Fine. We upgrade the container. We lay our software down on it. We ship that out through our testing environments. Everything’s fine now. Now we go do our blue-green deployment or however we like to do our rollouts. And now that container’s out in production because this is normal. This is like the way that we work. And for the DevOps Three Ways or whatever, that Third Way includes the idea that repetition and practice lead to mastery. If this is repetitive for me, if I can just ship a brand new updated container whenever I want, then it becomes pretty easy. You can extend that Third Way in security into other things.
DevOps Game Days Explained
Dave: One of the things I like to talk about in that talk is game days. So game days are a great way of getting that feedback, which is the Second Way of DevOps. So just take the example you and I were just talking about. I have a CVE that got released for some container image that I’m using. In a Game Day, we could make it theoretical. Don’t wait until the actual CVE is out there. Say, “Hey, we want to rev our container. How hard is that? Let’s do it, boom, now. Let’s do it on a Tuesday afternoon.” Everyone gets together, and we go through the exercise of updating a container in production. That is the repetition and practice that lead to mastery. That is a “Hey, I’m going to simulate a situation.” When I get into the actual situation, I already know, “Hey, we discovered that this part of the process sucks,” or “This part of the process takes too long,” so we can fix those things ahead of time instead of finding out that this part of the process sucks or this part of the thing takes too long when we’re actually vulnerable, and we’re trying to get that thing out. That’s a horrible time to find out about that. This is why companies do disaster-recovery exercises, right, because you don’t want to find out that your failover to some other data center is broken. The game day idea is the stuff that Google and Amazon and people like that were pioneering back in the day. But that’s what they were doing. They were talking about failing over data centers and things like that. So those things can apply to security just as well because you don’t want to find out that your process for X or Y is sort of busted.
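Editor's sketch: a game day is easier to learn from if each step is timed, so the retrospective can start from "which part took too long?" The Python below is a minimal illustration. The step names are hypothetical and the `time.sleep` calls stand in for real build, test, and deploy work.

```python
import time
from contextlib import contextmanager

step_durations = {}  # step name -> elapsed seconds, filled in as the exercise runs


@contextmanager
def timed_step(name: str):
    """Record how long one step of the exercise takes, even if it fails."""
    start = time.monotonic()
    try:
        yield
    finally:
        step_durations[name] = time.monotonic() - start


def slowest_steps(durations: dict, top: int = 3):
    """Return the longest-running steps first, as retrospective candidates."""
    return sorted(durations.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # Placeholder steps; in a real exercise these would wrap the actual commands.
    with timed_step("rebuild image"):
        time.sleep(0.02)
    with timed_step("run test suite"):
        time.sleep(0.05)
    with timed_step("blue-green rollout"):
        time.sleep(0.01)
    for name, seconds in slowest_steps(step_durations):
        print(f"{name}: {seconds:.2f}s")
```

Keeping the numbers from each exercise also shows whether the process is actually getting faster with practice.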
Ben: And I think you touched on an important point there, which is that the process can also include people. Maybe the DevOps team hasn’t worked with the security people on something, and there might be communication and logistical issues that go beyond just infrastructure as code and just technology.
Dave: Yeah. I mean, I’m a huge fan of agile. And in agile we have the idea of retrospectives. Run the Game Day. To your point, figure out if the DevOps team is working well with the security team. And if not, even if they are, don’t just do retrospectives for things that go badly. You should also do retrospectives for things that go well. But sit down afterwards — everyone’s got a different name for this. But sit down afterwards and be like, “Hey, what went well? What didn’t go well? What could we improve?” because any of those investments are going to be things that pay off for you much better over time.
The Largest Blind Spot for Companies
Ben: Mangoteque, you provide a range of services. One is a sort of assessment of the current state of a company. What’s the biggest blind spot you see in companies you’ve worked with?
Dave: “It depends” is the right answer to everything. Right? It really depends on the size of the company. But what I’ve seen, and it’s awesome because it kind of goes back to one of the things we were just talking about, is that people don’t invest in the delivery of software as much as they probably could. And so our example, that you and I just talked about, was like, “Hey, I discovered we’re vulnerable to some SQL injection attack. Now I have to get the software out.” For a lot of people, that’s hard. For a lot of companies, getting software out into production is hard, and that’s not a good thing. You don’t want to be doing that. Eric Ries wrote this book called The Lean Startup. It’s pretty good. I like it, whatever. But it’s pretty good in that I don’t even think it applies just to start-ups. I think it applies to lots of companies. And one of the things he makes this big point about is this idea of being able to run experiments, and an experiment for him is releasing code. And he’s like, “You should be able to run experiments very often because you can get that feedback from your customers about what works and what doesn’t.” And so for companies that are struggling to release software (and by struggling, I mean “Hey, we release once a month,” or “We release once a quarter,” or whatever), you don’t get that feedback very often. Let’s say you release once a month...
Ben: [crosstalk] —
Dave: —you get feedback once a month.
Ben: That’s 12 times a year.
Dave: Well, are you going to beat my company? My company releases three times a day. So I’m getting feedback three times a day. You’re getting feedback once a month. Who’s going to do better in the marketplace? It’s not really fair. I’ll just crush you. So people have to be good at delivering software. And that includes the quality and all those other aspects, including security. Because if I release a bunch of garbage riddled with security holes three times a day and my competitors release one really good release a month, I’m not doing myself any favors. Those things are important. I think companies really focus a lot on, “Oh, we’re using Golang,” or “We’re using Java,” or “We’re using Python.” And they don’t spend a lot of time thinking about, “How do we make sure that we’re releasing very often, consistently, and with high quality?” That’s what I work with companies on. So I want to make sure that they’re able to do that, so they can ship those fixes, and they can have a lot of confidence in the things that they’re shipping.
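Editor's sketch: the gap between the two release cadences in this exchange is easy to put a number on. The Python below is a rough illustration only. The figure of 250 working days per year is an assumption, not something stated in the episode, and the function name is hypothetical.

```python
WORKING_DAYS_PER_YEAR = 250  # assumption: releases happen only on working days
MONTHS_PER_YEAR = 12


def feedback_cycles_per_year(releases_per_day: float = 0.0,
                             releases_per_month: float = 0.0) -> float:
    """Count how many times a year a team hears back from customers about a release."""
    return (releases_per_day * WORKING_DAYS_PER_YEAR
            + releases_per_month * MONTHS_PER_YEAR)


if __name__ == "__main__":
    fast = feedback_cycles_per_year(releases_per_day=3)    # three releases a day
    slow = feedback_cycles_per_year(releases_per_month=1)  # one release a month
    print(f"fast team: {fast:.0f} feedback cycles a year")
    print(f"slow team: {slow:.0f} feedback cycles a year")
    print(f"ratio: {fast / slow:.1f}x")
```

Under that assumption, the team releasing three times a day gets 750 chances a year to learn from customers, against 12 for the monthly team.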
What Systems Thinking Means
Ben: You kind of mentioned earlier that it’s not really your tools, it’s more architecture. What does architecture look like for shipping software faster?
Dave: Architecture meaning the systems thinking view, right, the approach to what we’re doing. So that architecture is going to be — you can call it your software delivery architecture. It’s less about whether or not you’re running on AWS and a lot more about, “Do I have my unit tests? Are my unit tests going to give the developer feedback quickly? Do I have integration tests? Do I have tests that run once a day, whether someone checks in code or not?” But then there’s a whole bunch of other aspects to it. “What’s my monitoring look like? How do I know when I release something whether that was the thing that broke the camel’s back? Maybe I was right near the edge on some queue, and now that we changed something in the software that queue’s going to explode? How do I know? How do I know that it exploded? Do I have monitoring on the queue side? Do I have monitoring on the CPU that —” and then there’s all these questions that come in. “Well, if something’s going wrong and we don’t know what it is, how quickly can we instrument that? How quickly can we get that feedback?” And then it keeps going.
Dave: And then it goes into, “What happens when something blows up. What does our incident response look like?” And obviously, incident response can be a technical thing, a deep technical thing, let’s say a developer thing, or incident response could also apply to security. Right? I mean, in security, we have to practice incident response as well because that’s the nature of the work. At some point, you’re going to have an incident and you don’t want to be making that up as you go along. You want to have a documented procedure. “This is what we do. We create a Slack room. We name it this thing. We have this person is going to take on this role. This person is going to communicate with the executives.” I mean, there’s all kinds of things that we can do to get really good at those things as well.
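Editor's sketch: a "documented procedure" for incidents can be as small as a structure that fixes the channel naming convention and the roles before anything goes wrong. The Python below is a minimal illustration. The `inc` prefix, the role names, and the example people are hypothetical, and no chat tool's API is called.

```python
from dataclasses import dataclass, field
from datetime import date


def default_roles() -> dict:
    """Roles to fill at the start of every incident (names are illustrative)."""
    return {
        "incident_commander": None,  # coordinates the response
        "executive_liaison": None,   # the one person who updates the executives
    }


@dataclass
class IncidentPlan:
    """A procedure written down ahead of time, not invented mid-incident."""
    channel_prefix: str = "inc"  # hypothetical naming convention
    roles: dict = field(default_factory=default_roles)

    def channel_name(self, day: date, slug: str) -> str:
        """Build a predictable channel name such as inc-20210309-db-latency."""
        return f"{self.channel_prefix}-{day:%Y%m%d}-{slug}"

    def unassigned_roles(self) -> list:
        """List roles nobody has taken yet, in their declared order."""
        return [role for role, person in self.roles.items() if person is None]


if __name__ == "__main__":
    plan = IncidentPlan()
    print(plan.channel_name(date(2021, 3, 9), "db-latency"))
    plan.roles["incident_commander"] = "dana"
    print("still unassigned:", plan.unassigned_roles())
```

Rehearsing with a structure like this during a game day shows quickly whether everyone knows which role they would take.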
Tips for Monitoring Security Incidents
Ben: Kind of a good segue into my next question — what are your tips for monitoring security incidents. You’ve covered a lot of it. So it’s all around — you can also include it as part of your Game Days. You run through all this other stuff as part of your Game Day to be like, “How do we communicate? What do we share?” Do you have any other tips for security incidents?
Dave: So I like to hang out once a week with the folks associated with the Lund University in Sweden, like Safety Science Graduate Program, because they’re all about incidents. For them, it’s not just security incidents. It’s nuclear power plant accidents and aircraft accidents, all kinds of stuff. But I think the funny — you’ll probably enjoy this. One of the things that they love to talk about in terms of tips for incident responses is, “Don’t let the executives participate.” It’s such a weird thing to talk about. But—
Ben: And that’s because the most important person in the room sort of extracts what the root cause is.
Dave: The HiPPO, the highest paid person’s opinion or something like that. It’s really about the fact that people don’t feel free to propose ideas or solutions or things like that when the executives are there because it’s — the executives are there. “Well, I don’t want to sound dumb in front of the executives because this is my career. This is my job.” Who wants to sound dumb in front of an executive? And to your point, the executives are like, “Well, maybe we should try turning off all traffic to the internet,” and then you have to sit there and get into an argument with the executive. “Well, that’s not going to solve anything because this that and the —” they’re not helping. Right? It’s the people — and this is something we learn from DevOps. It’s the people who are closest to the information who have the most information. Executives are not close to the information, not the relevant information. They’re close to the information they get from the board. They’re close to the information that maybe they get from the other executives, but they’re not close to the “How is that access control list configured?” If an executive knows how your ACLs were reconfigured, that’s a serious problem in your organization. They’re not contributing at that point, number one. And number two, they’re actually inhibiting that free flow of information and that free flow of ideas.
Dave: And so there’s a famous study that Google did internally. If you go to their re:Work website, I think they have a whole bunch of stuff about it. Duena Blomstrom actually has a whole company about this. They found the highest-performing teams were the ones that had the most psychological safety. It wasn’t who had the most experience or the most senior leadership or who had the most PhDs. It was the ones who had psychological safety. And if you have that psychological safety in an incident where you feel like you can propose any kind of idea, “Maybe they’re doing this,” or “Maybe we should inspect the TCP packets, see if we can identify a signature, and then reach out to people that we know at other companies like ours and see if they’ve seen these signatures, or they’re blocking this stuff, WAF,” or whatever, if you can have those kinds of conversations and feel really safe proposing those ideas, then you’re going to be a higher-performing team, which is going to mean shorter incidents, probably less incidents, all kinds of things like that. So having the executives there is not useful. In fact, it’s actually harmful and counterproductive to the very thing that the executives want, which is how do we get this incident finished as quickly as possible and protect our reputation and the customer damage and all the other stuff? It’s pretty funny that’s the advice to give to people.
Ben: Have you seen any tips for making these sort of spaces in which there’s no dumb questions?
Dave: Like I said, there’s a whole company that this is what they focus on. As an engineering leader having led global SRE organization of lots of teams, it’s allowing people to really — it’s allowing people to understand the other people as people. Right? Because one of the things I always made a big deal about with my team, I hold the team accountable. I don’t hold people accountable. That means that the team is responsible for supporting the other people on the team. If somebody came to me and said, “Oh well, we would have delivered this thing on time, but Ben didn’t get the thing in on time.” “Well, why didn’t Ben get the thing in on time,” and they’ll be like, “Well, we told him to.” That’s not enough.
Ben: Yeah. It’s not a team. [crosstalk] —
Dave: You don’t tell somebody on your team what to do. You say, “Hey, it looks like you’re not going to get this thing in on time. What can we do to help you?” or “Is there something we can clarify for you? Are there resources that you need, or what is it?” Being on a team, where we support each other is a huge step toward psychological safety and then also understanding who these people are and where they come from even on a distributed team. We used to fly people in once a quarter or a few times a year, whatever happened to be depending on the teams, so people would spend time with each other, so that when you go back to your Slack room and you haven’t been with somebody in six weeks or two months. “Oh, such and such, something, or other,” you, “Oh yeah. No, Dave wouldn’t mean it like that. He probably meant it like this,” because you know them. You know them as a person.
Ben: Yeah. The sarcasm is sometimes lost in text-based communication.
Dave: Yeah. Those things are huge. And obviously, you can go look at the re:Work website from Google and look up psychological safety, and I’m sure there’s way more stuff in there even. But for me, as an engineering leader, when I was running those organizations, that was something I really emphasized because architecture matters, technology doesn’t. If you have a good architecture of people trusting each other and things like that, then you’re going to come up with the right answers more often than not.
Ben: So you mentioned flying in these remote people. As teams grow, you might have people around the world, distributed. Do you have any best practices for Follow the Sun on 24/7 support?
Dave: This kind of comes back to your team idea, the idea of teams that we were discussing. I think one of the most important things is if you’re going to do something like that, Follow the Sun or whatever, make sure that the people who are doing that are on the same team because you really want people’s interests to be aligned. Otherwise, if you don’t do that, then it becomes handing off instead of, let’s say —
Ben: It’s like the US team [inaudible] the UK team as opposed to the ops team.
Dave: Yeah. Like, let’s say it’s midnight my time, and I know that someone’s going to take over at midnight. The bleeping thing starts bleeping. Do I jump on that, jump in on that? Or do I just wait five minutes? And then it’s going to be Ben’s problem. And so, “Hey, now it’s Ben’s problem, and I could go to sleep.” That’s a problem if the incentives aren’t aligned, if people aren’t all being held accountable for the same stuff. Because “Hey, that might not even count against my downtime numbers because now it’s the UK team’s problem. When they finally fix it, then we’ll look at whatever. So what do I care?” Kind of a contrived example. Really, you want those people to be on the same team, so they work together so that, “I will jump in on that problem, and I’ll start trying to fix it.” And then when Ben comes on five minutes later, he’s like, “Hey, I noticed this thing is happening. How can I help? Like, what do you need me to do?” Because this way we’re not handing off. In lean, handoffs are waste. In this case, I don’t even know if it would really be waste so much as, in this case, you want this collaboration. You hear it in so many DevOps talks: the secret to DevOps is collaboration. But yeah, you want that collaboration. You want people to work together. And that makes those sorts of situations flow a lot better because everybody is trying to reach the same goal.
Running a Security Incident Retro and Communicating It Externally
Ben: And we sort of touched on this earlier. But let’s say the incident was a security incident. And we’ve talked about the internal communication, not bringing in executives. What about communicating it externally? How do you work out, as a team and with the executives, “Is this something for us to report to our customers or not?” How do teams decide that?
Dave: It’s funny because you and I were talking about retrospectives or learning reviews or post-mortems or whatever the word is. John Allspaw has a great thing where he talks about post-mortem analysis. And there’s an internal post-mortem where we talk about what really happened. And then there’s an external post-mortem, which is more marketing than anything else. Because in the internal post-mortem, you’re going to get into the details, and those details are going to be a lot of really important things that you can learn from that incident. I think it was John Allspaw also who said something to the effect, “An incident is something you’ve already paid for. If you’ve already paid for it, you might as well get the most learning out of it that you possibly can because you don’t want to go through all that and not learn the most that you possibly can.” And in your internal review or whatever, you’re going to want to talk about what everybody’s perspective was during the incident. What were the things they were seeing? What were the things they were thinking about? Why did they go look at the database? Or why did they go look at the WAF logs? What was the thing that made them decide to go do that?
Dave: Because ultimately what you’re trying to do there is build an understanding of people’s mental models, what the shared mental models were between the different people. Obviously, this doesn’t apply just to CPU load problems or database problems. This is going to apply to security stuff, too. “When we were trying to figure out where the intrusion was coming from or how they got into our supply chain or whatever, why did you choose to do that?” Maybe there’s something that somebody else on your team didn’t even understand about the way a system worked. Building those mental models as a shared mental model are the things that are going to enable your team to have resiliency.
Dave: John makes a big distinction between resiliency and robustness. And robustness would be being able to respond to things that you already anticipated were going to happen, whereas resiliency is your team’s ability to respond to things that they didn’t expect to happen or didn’t know were coming, things like that. The teams that are very resilient are going to have shorter times to recover from any of these incidents because there’s a lot of shared understanding around the team. So there’s not a lot of relying on Ben to show up. “Oh God, thank God Ben’s here because he’s the only one that knows about X.” That’s bad because Ben might want to go on vacation or sleep or eat dinner or all kinds of other things that Ben might want to do. And you don’t want your ability to recover to be solely based on Ben. Oh, it’s fun to pick on you.
Dave: For what you’re going to share with the customers, you’re not going to talk about Dave’s Mental Model when you’re talking to your customers. They don’t know who Dave is. They don’t know anything about Dave’s Mental Models. That’s not going to be important to them. What they want to hear, “We’re aware of the problem. These are the things that we’re going to put in place in order to make sure that this same thing doesn’t happen again.” And maybe you want to make an allusion there, “And we’re going to do these other things, review the last four incidents once a quarter to make sure that everybody understands what happened.” You can say stuff like that, too, but that’s more towards building resiliency than it is to John’s idea of building robustness.
Ben: Yeah. Ultimately, it’s going to happen. It’s only a matter of when. So you have to be ready and have these skills in place?
Dave: We’re very superstitious about that in the ops world. Right? It’s like people, like, “Oh, we haven’t had an outage in three months,” and you’re like, “Ah, don’t say that out loud,” because it’s a matter of when. It’s not a matter of if. I’ve talked to clients or even potential clients, “Oh, we’ve never had an outage.”
Ben: Because at any time —
Dave: I’m like, “You’re the first company in the history of the world that’s never had an outage. So congratulations. That’s pretty cool,” because there’s no way. It’s impossible. That doesn’t happen.
The Three Things Startups Should Focus On
Ben: Startups often have limited resources. And I guess startups can be a two-person company, or we see these sort of IPO companies, which are still kind of classed as startups. This is a bit of an open-ended question. But if you were to have three things they should focus on, what do you think they should be, and why?
Dave: We talked about it — a bunch of them already. Right?
Ben: Or any of the highlights of what we’ve discussed.
Dave: The first one would be speed of delivery. Right? Low lead time for changes. If you look at the — and I have an entire assessment product for software delivery that’s built on Accelerate or DORA, I don’t know what everybody likes to call it. But there are four key metrics that the Accelerate book talks about, that the 2019 State of DevOps Report talks about. Lead time for changes is one of their four key indicators of high-performing software delivery teams. That’s huge. We talked about it. Like, I discover a SQL injection attack. I need to be able to fix that, and I need to be able to get that fix out there. That is probably number one. And then number two to focus on would be quality, which dovetails perfectly into that, in that security is an aspect of quality. I don’t want to release junk out there. Those two things, I think, are the foundations of DevOps: speed and quality. I think those are kind of the number one and number two. And then I think the third thing is, well, this is bringing it back to what you and I were talking about when we talked about shared mental models. Right? It’s communication. I want to make sure that I have understanding within my team, first of all, but then also understanding between my team and other teams. And this is where lots of DevOps conversations spiral out into people talking about silos and all that other fun stuff.
Dave: I give a talk a few times around the world called The Cognitive Neuroscience of Empathy, You’re a DevOps Natural. And it’s really one of the things that we know from neuroscience is that we treat people who we consider to be in our in-group differently than we treat people that we consider to be in an out-group. We have different mental models of what those people are like versus — people in our group are like. So if I’m on the — let’s pick on the security team this time. If I’m on the security team, but I don’t spend any time getting to understand what the ops people are doing, or if I’m on the ops team and I don’t spend a lot of time understanding what pain the developers have, then we’re just not going to be as effective. What we need to do is reach out to these other groups and spend time with them. Maybe we do brown bag lunches together. One of the things I love to do with my engineers, the SREs, was pair them up on a project with a developer and, “You’re going to be responsible, you two, for getting this new storage system that we’re testing out tested. You’re going to help them spin up the things. You’re going to set the block sizes correctly,” whatever the thing happens to be. And they’re going to ship code, and they’re going to run tests and you’re going to help them run the test or whatever.
Dave: Getting people to work together across teams is a great way of breaking down that in-group out-group wall. Because now, just like you and I talked about earlier — about getting to know people and flying in and spending time together, we’re breaking down these walls. We’re getting an understanding of those other people with those shared mental models and all the other stuff. That communication, I think, is a good third leg to this [inaudible] for three things to focus on. Because it sounds like super hippies like shocker or whatever you want to call it. In the tech world, we’re so used — “What’s the metric? How many transactions per second?” Like that’s all the things we care about. Ask anyone who’s ever been an engineering manager, the tech stuff is easy, the people stuff is hard. This stuff is — not only is it not something that you measure, but it’s also something that’s incredibly important. If we want to have high-performing organizations, whether that’s for security response or development or anything else like that.
Recommended Resources to Learn More
Ben: Yeah. Just to wrap it up, you being, obviously, a prolific speaker, sharing, educating DevOps practitioners, which sites and resources would you recommend for people to learn more about everything we discussed? Obviously, I’ll share a range in the show notes. Anything else you’d like to add?
Dave: Sure. Obviously, blog.mangoteque.com is the best place. I do write a bunch of stuff, but I really love the IT Revolution books. Gene Kim started that publishing company, I guess you would call it. The Phoenix Project, The Unicorn Project, Project to Product, Agile Conversations, there’s just so many excellent books that IT Revolution puts out. The Accelerate book that we talked about, it’s from IT Rev. I would definitely check out their website. Kind of anything that they publish is gold, I guess I would say. Another thing is find your community. I’m on a few Slacks. Like, there’s Slacks for CTOs. There’s Slacks for SREs. There’s Slacks for security. Find your community and get involved because the worst thing that you could do is sit in isolation and not have exposure to other companies, the way that they do things, other ways of thinking about problems, people that you can ask questions to.
Ben: Any tips for finding these Slacks?
Dave: I guess ask your —
Ben: Ask your CTO if they’re in one.
Dave: Yeah. I was about to say, Twilio has some ad that says, “Ask your developer.” I would ask people in your community. For security people, the ISSA would be one great resource. Find your local ISSA chapter. See what’s going on in there. Ask people in that community where they go, who they talk to. DevOpsDays kind of follows a similar pattern where it’s all very local. So like, you can get involved with the people who are in your community. Right?
Ben: The BSides to RSA. It’s always the practitioners versus the vendors and CIOs, and you get the real content at the BSides instead of the —
Dave: Oh yeah. That is a great example. I was like, “Yeah. RSA is great,” but at RSA it’s not local people that you get involved with. Those relationships will be really important to you even over the course of your career. “I want to move on to some other company or whatever. I’m bored here. They’re not challenging me enough. I’m going to call Ben and be like, ‘Hey, what do you have going on over there? Because I’m looking around for my next role.’” And being plugged into those communities is a huge part of that.
Ben: Well, thank you, Dave, for joining us today. It was a pleasure. Thanks for listening to our first episode. If you have any suggestions for guests or topics, please send an email to [email protected]. This podcast was created by Teleport. Teleport allows engineers and security professionals to unify access for SSH servers, Kubernetes clusters, web applications, and databases across all environments. To learn more, visit us at goteleport.com.