In this tenth episode of Access Control, a podcast providing practical security advice for startups, Developer Relations Engineer at Teleport Ben Arent chats with Max Burkhardt. Ben recently came across Max’s work via the Figma blog, in which Max used off-the-shelf AWS tech to secure a collection of internal web apps at Figma. After Ben shared the post internally, Teleport’s IT director mentioned he had worked with Max at Airbnb, so we had to chat with Max. Max is currently a security engineer at Figma.
Key topics on Access Control Podcast: Episode 10 - Protecting Internal Apps at Figma
- In hyper growth companies, hyper growth itself is one of the key assets that needs to be protected.
- It’s important not to draw too many lines between security roles in different subfields (security engineering, data security, production security) since there are increasingly crossover points between infrastructure security and application security in the cloud age.
- There are differences in how B2B and B2C companies think about scale and about compliance.
- The desire to have nicely designed, effective internal web applications (such as a web UI to support various operations) is definitely growing. Figma decided to invest time in this area and built a really well-structured, effective approach early on.
- Some functionality works best as a command line tool, and in certain cases, it’s the right approach.
- Figma uses AWS for most of its cloud infrastructure, and uses Okta for employee authentication and authorization.
- Application load balancers (ALBs) are powerful reverse proxies that Amazon provides as a service, basically giving you an API to configure them.
Expanding your knowledge on Access Control Podcast: Episode 10 - Protecting Internal Apps at Figma
- Inside Figma: securing internal web apps
- Teleport Application Access
- Teleport Quick Start
- Teleport Access Plane
Ben: 00:00:00.981 Welcome to Access Control, a podcast providing practical security advice for startups. Advice from people who’ve been there. Each episode, we’ll interview a leader in their field and learn best practices and practical tips for securing your org. For today’s episode, I’ll be talking to Max Burkhardt. I recently came across Max’s work for the Figma blog, in which Max used off-the-shelf AWS tech to secure a collection of internal web applications for Figma. After sharing the blog post internally, our IT director mentioned that he’d worked with Max previously at Airbnb and said Max would be a great addition to the podcast. Max is currently a security engineer at Figma. Hi, Max. Thanks for joining us today.
Max: 00:00:37.384 Hi, there. A pleasure to be here.
Current and last role
Ben: 00:00:38.480 So to kick it off, can you tell me a bit about your current and last role?
Max: 00:00:41.375 Yeah. So I’m a security engineer at Figma. I tend to have a little bit of a focus on infrastructure security here, but being a security engineer at a small startup, everyone on the security team (there are now five of us) wears a lot of hats, and wherever the business’s need is greatest, we go there. So it’s a bit of a mix of infrastructure, application security, corporate security, anything you can possibly think of. I came here from Airbnb, where I worked more on the AppSec side of things, that being a little bit of a bigger team with a little more opportunity for specialization. Throughout my time working on sort of the defensive side of things, it’s been fun to see these lines get a little more blurry and infrastructure, AppSec, all kind of coming together into a sort of unified practice.
Security at hyper growth startups
Ben: 00:01:25.309 Both Figma and Airbnb, I’d categorize as hyper growth startups, probably going from like 10 to 100 to 1,000 employees within a couple of years. How are security challenges different in these types of startups?
Max: 00:01:40.907 I think the first thing you really have to come to grips with is that that hyper growth is one of the key assets you need to protect. It’s not just the company’s ability to protect the data it’s holding — it’s the company’s ability to continue growing at that rate and to really achieve the success that it has the potential to achieve. And so any sort of security projects or initiatives or policies that could really negatively harm that growth could be fatal for your team and your security program. It’s about thinking, well, what’s the best thing for our users? But also what’s the best thing for maintaining the pace at which we can grow? And then also think — realizing that the assumptions you have about how security works at a place or how the business model works, or what is the most important thing to protect right now, that’s going to change, and your team is going to have to flex with it. Protect the growth and realize that the ground underneath you is going to move a little bit — try to enjoy that. Be along for the ride.
Ben: 00:02:46.164 Yeah. And as the ground — I guess it changes beneath you and from up above. Do you find you’d get more pressure from the same — more engineers joining, or more CIOs, CISOs giving top-down recommendations?
Max: 00:02:58.319 In my experience, it’s always been much more on the bottom-up side of things, right? If you have one leader who has certain goals or plans for the organization or the security program, you know that’s somebody you can talk to and build a strategy with or have a dialog. And it’s sort of something that you can manage. But if you have 50 engineers start one Monday and then 50 more the Monday after that, and 50 more the Monday after that, and you get this mass that maybe doesn’t agree with how you limit access or believes that there’s — they can get more work done faster if they kind of find new and innovative ways to change the infrastructure. It’s not something you can go and have a meeting with somebody and tell them what to do, right? There are difficulties in these sort of hyper-growth scenarios where you need to be able to have a security program that scales to a ton of new people joining constantly and is able to integrate the best ideas they have and bring them in, but also sort of gently guide folks away from things with sharp edges, right? How can you make it so that when you have all these people joining who are going to be coming from all over the place — experienced employees, new grads, people from radically different security cultures — how can you make sure that you’re integrating them into your security program in a way that’s efficient?
Ben: 00:04:14.800 I can also imagine that in these hyper growth companies, things change so quickly that you might still have some projects from the early days still hanging around. And it can be a lot to sort of update and keep those maintained.
Max: 00:04:26.331 Yes, tech debt is a universal constant, right? But in my experience, actually, you always think that your tech debt is the worst. But in these sort of younger, more bottom-up companies, I actually think that the tech debt is generally getting better and better, because something that is built today quickly, in three years it might be tech debt again. But still, the foundations that it was built with today are probably more stable than they were like 5 or 10 years ago, right? We look at companies that have...
Ben: 00:04:54.949 It’s probably built with more automation.
Max: 00:04:56.496 Yeah, and if we look at a company that has tech debt from 10 or 15 years ago, we’re talking about a server that’s sitting under someone’s desk that is running some sort of critical component, right? And that still exists at a lot of companies, including ones in Silicon Valley. That sort of stuff is a lot harder to deal with than an EC2 instance that isn’t particularly well-managed. Tech debt is always going to be there, and it’s always going to be frustrating. But as we kind of improve those foundations that we’re building stuff on, I think it is getting a little bit better.
Security roles in three subfields
Ben: 00:05:23.504 You’ve worked in a range of security roles in different subfields. I believe you can describe the differences between three of them. I think you mentioned security engineering, data security, and production security. What did you want to kick it off with? Security engineering?
Max: 00:05:41.955 I try not to think about these fields that differently. I think that my general approach is to have this model where, at the end of the day, there’s some data that you’re trying to protect, whether that is rows in a database or content in a web page in the case of AppSec or data on laptops or in local databases, something like that. And then you’re working in some system which has some set of rules. And this might be, how does the web browser security model work? How does the same-origin policy affect the security of that content you’re trying to secure? Or it could be rules like, how does AWS allow you to secure your content or govern access to it? Then there are all of the people who are going to be working on that system, and those are your engineers or your support folks, or could be many other people.
Ben: 00:06:30.646 So you kind of have these three components, which is like, where is the data? How do you understand where it is? What is the model that you have to work in? And how do you make sure that as that system evolves, it gets more secure as opposed to less?
Max: 00:06:44.537 Yeah, yeah. I try not to draw too many lines between them, because I think that also we’re seeing more and more crossover points where things like infrastructure security and application security can get much closer in sort of the cloud age, right? Now, instead of building out a new service by purchasing hardware and then setting up a database on it and then setting up all the other stuff, you’re just making AWS API calls and suddenly infrastructure appears. And so there is almost this abstract component to it: it’s all just REST under the hood, and that is what’s creating your infrastructure. I find that sort of abstraction useful: okay, what’s the data? What is the series of security assumptions or invariants that we have set up in order to protect that? And how are we making sure that those security assumptions remain valid? And so that kind of leads you into the feedback loop of what do we think is the protection? How do we test that that is working correctly? And then how do we make sure that stays true as long as we possibly can?
Ben: 00:07:42.454 Do you have any recommendations for sort of testing and verifying that these assumptions are true?
Max: 00:07:47.371 This is an area where you can really sort of sabotage yourself by making things too manual. I think that there’s a blend of techniques that you have to bring in that address different levels of validation. So first of all, any sort of constant or automatic validation that something continues to work correctly can be pretty useful. And if you could express that in code, that’s better. So a great example here is, if you have an AWS S3 bucket, or a GCP object bucket or anything like that where you have some sort of hosting that is somewhat at risk of being publicly exposed, periodically checking that it’s not publicly exposed and having some sort of alert on that that’s totally automatic is probably worth your engineering time. That’s kind of the base layer. My team also does what we call “breaker week,” which is where basically once a quarter or so we will think internally about what our expectations are for the infrastructure and what we think those assumptions are, and try and test those ourselves. And this is an interesting exercise.
Ben: 00:08:49.271 You’re sort of your own internal threat actors.
Max: 00:08:51.287 Yeah, I wouldn’t really call it internal threat actors, because we’re not necessarily doing the whole attacker simulation thing. I think there’s definitely a place for that in some security programs. For a lot of small startups, I think, in general, it’s not too useful. But really just thinking, of all the things that I know about, whatever should fire when I do this, or if I do this misconfiguration, I should be blocked in this way. Kind of like, try those things out and make sure the results match what you expect. And that is in addition to occasionally having external folks come in and do something like a pen test or some sort of offensive assessment. I think doing sort of contractor work to get outsiders to look at your stuff is really, really critical, because they don’t have those assumptions that you have about how things work. But in the same way, your own team will know some of the tricky intricacies that it would take a pen tester eight weeks to find. And you might know, oh, if I can just test this one thing or if I can just somehow affect this one setting, then I could just do this really big compromise. And so there are benefits to each of those, yeah.
Max: 00:09:55.699 Fundamentally, it’s a lot like testing code, right? You have your sort of automated layer, which would be like your unit testing in CI. Things that are running all the time. They’re lightweight. They’re just a constant. Then there’s sort of your integration testing, which is how your team thinks about how do we make sure that these things are valid? And then occasional more in-depth tests, which might be a pen test or a more focused engagement. Maybe there are bug bounty hunters or something like that to really bring in some new thoughts about how you might compromise the security of your system.
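The automated base layer Max describes (periodically checking that a storage bucket is not publicly exposed, and alerting if it is) might look something like the sketch below. This is not Figma’s tooling. It assumes the two inputs are dictionaries shaped like the responses of the S3 `GetPublicAccessBlock` and `GetBucketAcl` API calls; the function name `audit_bucket` and the finding strings are hypothetical, and bucket policies are deliberately not examined here.

```python
# Sketch only: inspects settings that were already fetched elsewhere.
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTH_USERS_URI = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
BLOCK_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def audit_bucket(public_access_block, acl):
    """Return a list of human-readable findings; an empty list means no findings."""
    config = (public_access_block or {}).get("PublicAccessBlockConfiguration", {})
    findings = []

    # Every Block Public Access flag that is off is worth reporting.
    for flag in BLOCK_FLAGS:
        if not config.get(flag, False):
            findings.append(f"{flag} is not enabled")

    # Public ACL grants only take effect when IgnorePublicAcls is off.
    if not config.get("IgnorePublicAcls", False):
        for grant in (acl or {}).get("Grants", []):
            uri = grant.get("Grantee", {}).get("URI")
            if uri in (ALL_USERS_URI, AUTH_USERS_URI):
                permission = grant.get("Permission", "UNKNOWN")
                findings.append(f"ACL grants {permission} to {uri}")

    return findings
```

A scheduled job could call this for every bucket and send any non-empty result to whatever alerting channel the team already uses.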
How security challenges differ across startups
Ben: 00:10:25.464 Yeah, no, that’s a great answer. So obviously you’ve worked in a range of different startups. How do you think security challenges differ between startups, or do you think they’re all kind of generic based on company size?
Max: 00:10:36.628 I don’t think they’re generic. One of the big differences that I’ve seen, at least, is that my last company was much more focused towards consumers, whereas Figma is primarily working with businesses, or at least commercially it is. And this changes a little bit how you think about scale, and it also changes how you think about compliance. Yeah, I think that I had an overly pessimistic view of compliance coming into the security industry, because the compliance situation is famous for box-checking and having these controls that are not valuable and that sort of thing. And that’s definitely a risk. I think that in working at a company that’s more business-focused, I have come to realize there are ways in which you can do compliance that are truly helpful to your security program. It’s very easy as a technically minded security engineer to be like, okay, well, we’re just going to go and build all this stuff. And fitting a specific compliance regime is not something that we care about, etc. And then your company scales a lot and things change, and then you end up talking to some engineer and they’re like, oh, I didn’t realize that we had to have 2FA on connecting to servers. And you think like, wait a minute, of course you do. We’ve always had that. And then you realize you built all this technical stuff to add 2FA once, but you never made a policy about, oh, you should have 2FA. And now when you have an engineering organization that’s building things in a million directions at once and the security team isn’t going to be involved with every single project, suddenly your lack of any documentation about what you would expect to be the security baselines in an environment means that you have this really big gap.
Max: 00:12:18.714 Certainly, working more on the B2B side has taught me a little bit about the value of having a compliance program that’s well thought out and kind of gives you the structure in which to keep building security products in a way that’s sustainable. And I think that’s kind of cool.
Ben: 00:12:34.055 Do you go through any formal compliance programs like SOC 2?
Max: 00:12:39.950 Yeah, so Figma has SOC 2 and ISO 27001, and we’re continuing to evaluate the compliance landscape for what’s valuable there.
Ben: 00:12:50.241 And so how does that compare to your own internal compliance recommendations?
Max: 00:12:55.805 I think that generally these systems are pretty broad. And so it’s really sort of a matching exercise: taking these things that we all agree are good ideas, like having an asset inventory or making sure that code is well reviewed and well tested before shipping it, and then sort of matching that up to existing compliance regimes. Your internal compliance should in many ways be stricter than what the public compliance regimes are, because you should be holding yourself to a high standard. But compliance systems also tend to have a lot of very strict requirements around documentation or writing processes for things. There are cases where you’re going to need to do a little extra work to fit with those sort of public standards, which is just sort of work that you have to deal with.
Ben: 00:13:45.117 Many people would think Figma is just a web app for sharing sort of squares. Why is it important that Figma has such strong security?
Max: 00:13:54.894 The way that I think about it is that in many ways, Figma is about taking something that used to happen on people’s desktops, building designs or mockups with [inaudible] client local tools and taking that to the cloud. And so in many ways, we’re doing to this creative process what Google Docs did to Microsoft Word, right? And that leap to the cloud can be scary in many cases. You’re talking about all of the designs for your company’s next application. With FigJam, which is our new collaborative whiteboarding product, we’re talking about the brainstorms that your team has over what it’s building for the next two years, or basically all of this internal conversation. In many cases, it’s rather private intellectual property or designs, plans, what’s happening next. That’s all stuff that is really well-suited to the Figma platform. So making sure that that is as well-protected as possible is really critical for us, because that sort of confidence is something that customers absolutely deserve, and it’s something that they should have, is the knowledge that when they make something in Figma, it’s going to be safe and only the people who should be allowed to see it will be allowed to see it.
Ben: 00:15:02.007 I have some friends at Apple, and there are always the infamous secret cloths that they have to put over their hardware prototypes.
Max: 00:15:07.959 Yeah, the secrecy of one’s intellectual property varies per customer, but it’s kind of best to assume that everyone wants to keep it under wraps until it’s ready to go.
Ben: 00:15:17.081 Yeah.
Max: 00:15:17.916 So that’s something that we strive to provide.
Ben: 00:15:19.793 Shifting gears, I know in your resume you talk about minimizing toil, and we sort of touched a bit on CI/CD for checks. How else do you think about minimizing toil and, especially as you grow, being more efficient?
Max: 00:15:34.629 The framing that I like to think about this in is that when we are looking at tasks that you might have to do, whether it is triaging CVEs or handling HackerOne bug bounty reports or something, there are certain things that robots or code are really good at doing. And then there are other things that humans are sort of required to do. And trying to think carefully about the split between those two categories can highlight some improvements in how you do processes. A really classic, high-toil task is doing some sort of patch triage, right? Periodically making sure that the underlying infrastructure that runs your application is well patched. If you have to have somebody who goes and reads every CVE entry and then looks at every place that it is on a server and makes a big Excel spreadsheet, that’s an extremely high-toil task, and it’s the sort of thing that machines are really good at doing. That’s something where you should be investing in automation. Trying to avoid doing things twice is another good way to look at this, I think. Like, how can we make it so that humans are doing creative and interesting work that they do once and then the answer for that is saved?
Max: 00:16:49.239 The final thing that I like to think about, and I think this is just sort of like a Figma values thing, is that I think there’s a lot of value you can get out of trying to make certain processes that have to be manual a little bit more fun. An example of this is processing vulnerability reports, where sometimes there are human elements. If you are reading a CVE, you need to decide, okay, does the severity associated with the CVE in the database truly match our internal assessment of that? Because we might use a particular package in a different way or there’s some variation there. And so that sort of needs a human to look at it briefly and then determine, what should we do here? For our last maker week at Figma, which is a sort of week to explore and try making new things, I built a plugin so that given a list of CVEs, it would populate a Figma file with a bunch of these little cards, and it would auto-generate little bug images for each one. And then team members could collaboratively go and stamp on these little cards whether or not they thought it was important to be fixed. And then the tool will extract that information and then produce machine [inaudible] reports that could then be fed into our patching automations. It’s nice. How can we make something that is like this annoying process a little more Figma-y and fun, and something that might be almost like a team activity, as opposed to this horrible drudgery that somebody has to do?
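As a rough illustration of the part of patch triage that machines are good at, the sketch below matches a list of advisories against a package inventory instead of building a spreadsheet by hand. Everything in it is an assumption for illustration: the data shapes, the function names `parse_version` and `affected_hosts`, the placeholder CVE identifier in the usage comment, and the simplification that versions are purely numeric and dot-separated. Deciding whether a match actually matters is still left to a human.

```python
def parse_version(text):
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def affected_hosts(advisories, inventory):
    """List every (advisory, host) pair where the installed version
    is older than the version that contains the fix."""
    report = []
    for advisory in advisories:
        fixed = parse_version(advisory["fixed_in"])
        for host, packages in inventory.items():
            installed = packages.get(advisory["package"])
            if installed is not None and parse_version(installed) < fixed:
                report.append(
                    {
                        "cve": advisory["cve"],
                        "host": host,
                        "package": advisory["package"],
                        "installed": installed,
                        "fixed_in": advisory["fixed_in"],
                    }
                )
    return report


# Hypothetical usage with placeholder data:
# advisories = [{"cve": "CVE-0000-0001", "package": "examplelib", "fixed_in": "1.2.10"}]
# inventory = {"web-1": {"examplelib": "1.2.9"}}
# affected_hosts(advisories, inventory)
```

The resulting rows are the kind of saved, machine-readable answer that a human reviewer would then only need to look at once.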
Ben: 00:18:18.744 At the beginning, we were talking about patching systems, and that’s sort of similar drudgery. What’s your current process for this? I know it’s always a constant battle, but how much do you update? Do you use AMIs, do you use Packer, do you have any specifics?
Max: 00:18:34.849 We are switching to containers. That was our solution. Stop having instances, and re-spin everything every day or whatever it is. So, yeah, we’ve jumped wholeheartedly into the container lifestyle. I mean, the big difference there is that containers have to be patched like anything else, but the validation process is so much smoother, whereas on an instance fleet, it can be a challenging operation to upgrade a package on some percentage of the fleet and then see how that’s doing and identify errors or whatever it is. That is something that might be a scary operation. In a containerized world, patching a package is just like pushing any other code, right? You have a new image, it goes out. There’s a blue-green deploy. You can analyze the same metrics before and after. You can analyze log messages and see if things are working. And as soon as you have confidence, you roll it out. And so it’s not that the issue went away, since containers still need to be patched, but the development tools around that process are now so much better with the containerized world that it’s not much of a concern for us. So I think it’s such a good validation. Good DevOps means good security, right? The ability to have good introspection into your infrastructure and think about things in sort of this automated way or this stateless way is so powerful.
Ben: 00:19:57.030 Do you have better options for smaller machine images? Hardened ones that are better practice than just getting a...
Max: 00:20:03.653 That’s right.
Ben: 00:20:04.280 ...Ubuntu off the internet.
Max: 00:20:05.492 And there’s support for variety, right? Trying to maintain standard gold images of something among EC2 instances can be a real damper on eng productivity. Say you have all this support for a Ruby web app because that’s what you used to have, and then a new team is like: “Hey, we could do this way better if we were writing Go or Rust.” Supporting that, if you were trying to stick with this kind of stable gold image approach, can be pretty challenging. With Docker containers, things get a lot easier, right? Everyone can kind of make those language tradeoffs on their own time.
Securing internal web apps
Ben: 00:20:38.213 You were invited here because you’re great at securing internal web applications. Does this sort of come up as a problem at Figma?
Max: 00:20:46.808 The desire to have nicely designed, effective internal web applications is definitely growing. As the team grew, we needed more tools that would support various operations, and we needed more sort of paved road capabilities so that instead of telling an engineer like, “Oh, to do this, you SSH in here and then like, run this thing and use this bash script,” can we just have a nice web UI that has the right buttons, right? That’s an investment, and it’s something that is a process, but we saw that this is something that we’re going to want more of. There is no way that as the engineering body scaled, it would want less of these things. And so we figured this is an area where we can invest some time and build a really well-structured, effective approach early on.
Max: 00:21:37.002 A long time ago, the initial approach that had been taken to build some of this was using mutual TLS certificates, so engineers would have a client certificate on their machine, and this would allow them to connect to this private web app in the Figma infrastructure. And that was actually, I think, a pretty good move for a very small startup that just had a small number of employees, didn’t have a corporate VPN, and wanted to make sure that the connections to these apps were secure. But the problem is that that really didn’t scale with the engineering body, right? Distributing these certs was hard. Client cert support in browsers is pretty good, but tends to have some bugs, and it didn’t give us the flexibility that we wanted. That’s why we kind of undertook this project of how can we have an infrastructure that allows us to sort of keep scaling up these internal applications, integrates with all the right [inaudible] providers, and provides a bunch of other benefits.
Max: 00:22:29.039 A lot of what we did when starting out — the Figma security team has been growing a lot recently and is still relatively new. A lot of what we’ve done is thinking about, okay, what are the things that are going to have the most leverage as the rest of the company scales? What can we build now that’s going to keep being useful and is going to kind of get ahead of some of these trickier security problems that we’ve all seen at more established companies?
Internal apps in the context of a high bar for UI/UX and product experience
Ben: 00:22:53.831 Yeah. And I can imagine Figma has a high bar for sort of UI, UX product experience, so I imagine that probably crosses over into your internal applications.
Max: 00:23:03.299 Definitely. Trying to make it smooth, and having the security not be something that you notice, was pretty critical to us. It’s been well-validated in that there have been a few cases where people have been surprised when they lose access to something, because maybe their account got locked out by accident or they’re accessing from a new device. And they just never knew that there was actually authentication on this web app because it was all just so transparently handled with redirects and long-lived sessions and stuff like that. And they just assumed that it was always open to them. And then it turns out, yeah, no, we are actually protecting it with Okta or an ALB or something like that.
Ben: 00:23:42.595 So Okta is your identity provider for these apps.
Max: 00:23:45.134 We use Okta, though really we’re just using Okta as a SAML provider. There’s nothing that’s that Okta-specific. I guess one thing that is very Okta-specific is that we’re utilizing Okta’s Device Trust feature. It’s a nice, easy on-ramp into getting some better validation around the identities of the machines that are accessing our systems.
Ben: 00:24:03.503 And then do you want to talk about sort of —
Max: 00:24:04.797 As opposed to —
Ben: 00:24:06.200 — the primitives of AWS that you use to make this?
Max: 00:24:08.888 Yeah, sure. So sort of the core one was the application load balancer, ALB. And ALBs are pretty powerful reverse proxies that Amazon provides as a service, basically giving you an API to configure them. And crucially, they have an action called Authenticate, which allows an ALB to use OpenID Connect, or OIDC, in order to plug into some sort of identity system and authenticate a session before sending its traffic on to the backend. So that’s sort of the core primitive that we’re using here. We then used Cognito, which is another AWS service that’s built around user management, to plug into our identity provider. And so that provided the SAML-to-OIDC bridge. It also allows us to create sort of on-the-fly or temporary users if we want to have a user that can access a web app without necessarily being in our Okta.
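For readers who want to picture the ALB side of this, the sketch below builds the ordered list of listener actions: authenticate through a Cognito user pool first, then forward to the backend. It is an illustration under assumptions, not Figma’s configuration. The dictionary keys follow the `Actions` structure of the Elastic Load Balancing v2 API (the `authenticate-cognito` action type), while the function name `build_listener_actions`, the placeholder ARNs in the usage note, and the eight-hour default (borrowed from the session length Max mentions later in the conversation) are hypothetical choices.

```python
def build_listener_actions(user_pool_arn, client_id, pool_domain,
                           target_group_arn, session_hours=8):
    """Return ALB listener actions: authenticate via Cognito, then forward."""
    return [
        {
            "Type": "authenticate-cognito",
            "Order": 1,  # must run before the forward action
            "AuthenticateCognitoConfig": {
                "UserPoolArn": user_pool_arn,
                "UserPoolClientId": client_id,
                "UserPoolDomain": pool_domain,
                # The API expresses session lifetime in seconds.
                "SessionTimeout": session_hours * 3600,
                # Send unauthenticated browsers into the login flow.
                "OnUnauthenticatedRequest": "authenticate",
            },
        },
        {
            "Type": "forward",
            "Order": 2,
            "TargetGroupArn": target_group_arn,
        },
    ]
```

A list like this could be passed as the `Actions` argument when creating or modifying a listener rule; the Cognito user pool would separately be configured with the SAML identity provider.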
Ben: 00:25:05.084 And what’s the use case for those temporary users?
Max: 00:25:06.395 A good example would be sharing something with a third party.
Ben: 00:25:10.149 A contractor or a —
Max: 00:25:11.317 Sometimes a contractor. Another good example would be, let’s say, that we’re building out a new product feature that a particular customer asked for. We have a really early alpha version running in a dev environment, and we might want to say like, “Hey, customer, we have this very early thing running here. Do you want to hop on this dev environment and see how it works and see if this fits your requirements?” The code isn’t really ready to go to production, so it can’t be shared too widely. But we want to give brief access to one of these sort of reusable dev environments for this third party to use. So stuff like that.
Ben: 00:25:45.832 That’s a great use case.
Max: 00:25:46.617 We’re looking into some cases where we want to test preproduction websites on a giant list of random devices, right? And it’s understandable if the employee doesn’t want to log into Okta on 30 Android phones, right? Let’s just provision them with something like these temporary credentials that are just for this testing web app and kind of give them access to that. So that’s been sort of a benefit as well. The final thing, which I think is sort of the part of the project I had most fun with, was setting up sort of automated CLI authentication plugins for this system. This is all built around protecting web APIs, and our V1 was about protecting web apps that you interact with in the browser. Fundamentally, if you’re on an infrastructure team, it’s not going to be long before somebody says, like: “Hey, can I control this with a CLI tool?” I don’t want to open a browser to do management of my service or something like that. Can I just make API requests? And at first glance, this is actually kind of hard to pull off with this architecture because there are a lot of things in there that CLI tools are bad at. Handling redirects and doing WebAuthn when somebody has to authenticate to Okta, it’s sort of awkward. But using an ALB feature that allows it to call a Lambda function, we’re able to sort of put this standard API onto every one of these ALBs that allows us to have a CLI authentication page that basically, when you browse to it, will reflect cookies down to a local web server that then the CLI tool can use to attach to future requests and then authenticate through the ALB.
Ben: 00:27:17.215 For people who have built internal CLI tooling, is it just sort of a small feature they add into it?
Max: 00:27:23.439 Yes, it’s a single method call, basically. You call it, and then when a user uses that tool, they will see a web browser pop up that says, like, “Hey, your CLI tool is trying to authenticate you to the service. Do you want to proceed?” assuming that you have an active session. And if you don’t, you’ll get redirected to an auth flow. Click okay, and then that one method call in your CLI tool returns a set of cookies that you can attach.
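The local half of that flow, a small web server on the developer’s machine that waits for the authentication page to hand over cookies, could be sketched roughly as follows. The details are assumptions rather than a description of Figma’s tool: the hand-off is modeled here as a JSON `POST` of cookie names to values, and the names `start_receiver`, `wait_for_cookies`, and `cookie_header` are hypothetical. A real implementation would also need some way to verify that the request genuinely came from the expected authentication page, which this sketch omits.

```python
import http.server
import json


class _CookieReceiver(http.server.BaseHTTPRequestHandler):
    """Accepts one JSON POST of {cookie_name: cookie_value} pairs."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", "0"))
        self.server.received_cookies = json.loads(self.rfile.read(length))
        self.send_response(204)  # nothing to send back
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the CLI output quiet


def start_receiver():
    """Bind to localhost on an OS-chosen free port."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _CookieReceiver)
    server.received_cookies = None
    return server


def wait_for_cookies(server):
    """Block until the browser page has delivered the cookies."""
    while server.received_cookies is None:
        server.handle_request()
    server.server_close()
    return server.received_cookies


def cookie_header(cookies):
    """Format cookies as a Cookie header value for later API requests."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())
```

A CLI tool would start the receiver, open the browser to the authentication page with the chosen port (for example via `server.server_address[1]`), call `wait_for_cookies`, and then attach `cookie_header(...)` to its requests through the ALB.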
Ben: 00:27:48.404 How long is the duration of your sessions?
Max: 00:27:50.469 It depends on the application. I think we generally use it for around eight hours. Yeah, it’s generally one business day for most of these things. The joke is always like, well, after eight hours, you should go home anyway. We don’t want you working too much longer than that.
Ben: 00:28:04.878 Probably the next thing is — do people also try and use this for automated tasks? Or do you have another flow for making sure that people’s scripts and various things have a different authentication token?
Max: 00:28:15.919 Generally for something that’s purely automated, we would push them to some sort of other solution. I found that trying to meld the authentication needs of humans and machines can be really challenging. When we’re looking at automated stuff, we generally are pushing teams toward, okay, well, this should probably be a service or some sort of scheduled task that runs in AWS, as opposed to something that is going through the public internet through these ALBs to accomplish something. Generally, we’ve pushed people away from that, and I think that’s actually somewhat of a benefit. Creating this system that — it all works really nicely, assuming that you have an Okta credential and an MFA session that is fresh, right? And so somebody would have to do a lot of work in order to make some sort of long-lived access to these apps. And so it really discourages people from the script-running-under-their-desk kind of approach to doing things. They’re more likely to think like: “Oh, maybe I should run this in the server and then we can set something up for that.”
Ben: 00:29:15.950 What do you do with access logs?
Max: 00:29:18.212 ALB has some pretty nice basic things set up there. You can just deliver logs to an S3 bucket. At that point, it’s kind of up to you about what tools you want to use. We use a tool called Panther pretty heavily here.
Ben: 00:29:30.707 Oh yeah, we use it too.
Max: 00:29:31.636 It’s nice because Panther is fairly straightforward in what it provides, right? It’s basically a way to efficiently schedule running Python detections across a lot of log sources and cloud security resources. For a team of engineers like ours, that’s perfect, right? We want the full power of Python to be able to go write arbitrarily complex detections and systems for identifying bad behavior. And so it sort of fits naturally in there because Panther can do things like process access logs, if we wish.
Ben: 00:30:00.516 Is there any specific bad behavior that you’re on the lookout for?
Max: 00:30:04.274 Not specific. I think that the thing that we’re sort of relying on here is that we are generally collecting data from users who are out somewhere on the public internet, and these users are generally touching a lot of different web applications at once, right? Like a user signs in in the morning, they probably talk to Gmail. They probably might use an internal app, they might access Zendesk, etc. We can collect all of those logs and see patterns between them and also see anomalies in there. It’s not that we’re necessarily looking for like, oh, somebody has sent us a null byte in a request. And that’s clearly an attack, right? These things are on the public internet. They’re getting scanner traffic all the time, but we can make smarter decisions around like, okay, we know that this user was, like, using Gmail on this IP address like two minutes ago. And they also tend to be in this particular area. And then suddenly we saw this request to an internal application from somewhere totally different. That might be something that’s interesting to look into. Continuing to get those kinds of signals can help you build a model.
Advice for startups building security teams
Ben: 00:31:10.866 Over probably your past experience, you’ve helped build a lot of security teams. Do you have any advice for people who are startups that are building security teams?
Max: 00:31:19.157 There’s two big pieces of advice, I guess. The first is that security ideology is pretty key. As a business leader who’s looking to invest in a security team, you need an idea of what sort of security team you want. And that should be one of the key things that you’re hiring a security leader to come and implement, right? There’s a lot of different approaches to how one might build a security program that are out there in the tech world. Many of them are valid for different sorts of situations, like if you are a contractor working on highly-classified secret stuff for businesses with very strong compliance requirements, that’s probably going to require a very different security culture than if you have a really fast-moving social media startup. Mixing a security person who came from one of these into a different environment can be pretty harmful. And so really think about what you envision as the best type of security team, and make sure that the security leader that you’re hiring is someone who aligns with that and can build a team and culture around that, and also be on the same page as the executives about what they want.
Max: 00:32:29.168 The other thing that I like to see when I am sort of evaluating companies that are trying to start security teams is startups that are opening more than one security position. I have seen a lot of cases where a startup will be getting off the ground and they say, okay, we just need a strong security engineer who can kind of bootstrap things and be an IC while also thinking about what’s kind of next for the program. And as an individual contributor, I see that a little bit as a red flag. This company clearly has security needs. They’ve scaled to the point where security is a concern, but they aren’t necessarily ready to invest in having like a full program.
Ben: 00:33:11.124 Or maybe they didn’t have the education of — well, it’s almost like you hire one UX designer. Expect them to be an illustrator, a brander, everything else.
Max: 00:33:19.209 So starting with opening two positions, at least, is a good way to go, right? It’s like show that there is a desire for a team, and also give the first few people that you hire some folks who are kind of on their side from the very beginning, right? Especially as an IC who’s interviewing at a company and might be interested in joining. It can be sort of concerning if you’re joining as the first security hire and you don’t know, are you going to be fighting uphill battles every day? Is the rest of the engineering team aligned that security’s important and this is something that you need to achieve? Or are you just going to kind of be on your own? And so starting a small security team, but clearly a team, I think is a good way to show this is something that’s a priority for the company. You’re going in and you’re going to have support. And also that just gives you a little more bandwidth, so that somebody can be thinking about hiring and growing the team and somebody can be thinking about like that bug bounty report that came in yesterday. And there’s a little more bandwidth for all the interrupts that are going to come up.
Ben: 00:34:19.422 For a startup that is thinking about a security team, or obviously if you’re applying to these jobs, when do you think it’s the right time to hire and start building the security team?
Max: 00:34:29.690 I think it’s really dependent on the business. I think that there are a lot of businesses where for a long time, the goal is about finding what the product is and really building the core experience that’s going to make your company an actual company. Trying to invest in security too early, if that has a risk of slowing down that process, can be really bad. I think there are certain things, sort of external factors, that can mean you need a security team very quickly. Compliance can be one of these, right? You find that you’re selling to businesses and those businesses have more stringent security requirements for suppliers. You may just need a security team in order to close deals. And the other one, of course, is if you find that suddenly your product is growing quite successful and you have a lot of people’s sensitive data, right? You have a, I would say, a moral obligation to protect that. That can sometimes be an external factor, too, right? You didn’t know that you were going to go viral, but you did. And now you have a lot of stuff that people might be interested in stealing from you, and that’s a time that means you need to get on it. But I don’t think that there’s any sort of ratio that can really be applied here.
Max: 00:35:38.648 I guess one thing I would say is that if you are a security company, you should have a security team from day one. I think that that is a — it sounds obvious, but I think there are some examples where —
Thoughts on when to buy off-the-shelf tooling vs. building
Ben: 00:35:51.180 Drink their own champagne, as it were. Thinking about growing a team, I think it’s also at what point do you purchase solutions. This is the eternal question of when do you buy something vs. build it? And sort of what’s your thought process on purchasing a solution vs. building it?
Max: 00:36:06.598 When you can identify a product that is going to remove a lot of that toil or where it can do something that it’s clear a lot of other companies or entities have needed, then it makes a lot of sense to buy, right? When it’s a product that is solving a sort of generic or sort of classic problem, then it’s likely that you can save yourself a lot of work and not reinvent the wheel by purchasing that. I think the place where you have to be careful is if you are looking at a product and finding that you’re going to need to be doing a ton of custom work or really investing technically in a particular solution, because you don’t know how that’s going to evolve, or how your business is going to evolve, or how their pricing structure is going to evolve. That sort of lock-in can be really scary. I think the Panther example is a good one because one of the challenges that product solves is: how do you process a lot of logs all at once? And that’s something that obviously everyone has to do, and it makes sense that there’s sort of this component that’s reusable there. But the detections themselves are just Python. That’s not really a lock-in item, right? A Panther detection that is scanning a CloudTrail log can work on any CloudTrail log even without Panther. It felt like it was a good middle ground of, okay, this is going to automate a lot of stuff and save us from having to write like a log analysis engine, while still we’re not really locked into Panther with like a custom query language or something like — it’s just Python at the end of the day.
Ben: 00:37:37.689 Yeah, it is interesting, but I think another elephant in the room, which — maybe it’s not a big lock-in which would be harmful, but I guess HCL, while it’s its own domain-specific language, it’s pretty generic enough that if you really wanted to switch, you could do it.
Max: 00:37:53.200 Yeah, that’s a tough one. I mean, I write a lot of Terraform every single day, and I have a lot of feelings about it, I guess. My general opinion there is that some sort of infrastructure-as-code automation platform is absolutely necessary. Which one you pick is going to have some impact, of course, but you’ve got to pick one of them. Something that I like fantasizing about, which is not true for reasons I understand: I thought it would be really cool if one of the infrastructure providers like Azure, GCP, or Amazon could offer some sort of mode or approach where you can only use infrastructure-as-code to configure a system. A lot of the challenges that you —
Ben: 00:38:39.598 Yeah, and get zero ClickOps.
Max: 00:38:41.249 Something like that, but especially a lot of the challenges that you encounter with systems like Terraform or GCP or CloudFormation or something like that —
Ben: 00:38:49.794 It’s like state verification.
Max: 00:38:50.662 It’s state verification, and it’s the fact that it’s an unclean interface in between what’s the code and what is actually applied. And how do you translate it into the API calls? And if a cloud provider could take the lead and say, like, actually, no. The only API is this infrastructure-as-code declarative thing. Actually build the backend with infrastructure-as-code in mind. I think that’d be so powerful. You’d no longer have to deal with all these questions about like, okay, well, what happens if in the middle of your Terraform [inaudible] you hit a quota? And what are the implications of that and how do we deal with unclean state? If you could solve that problem, I think that’d be really, really cool. I don’t think it will ever happen, because I know why that’s challenging, but it’s still something that I like to fantasize about.
Ben: 00:39:33.109 I know there’s weird quirks. I was setting up a demo environment yesterday and I was like, “Oh, I need a domain name for this endpoint,” and have a domain name fresh accounts. And as the easy way, each domain name has to be registered for 30 days or something, you can’t easily cycle and you can get subdomains and do subdomains, but you need something to start off.
Max: 00:39:53.636 Yeah, the bootstrap problem is tough, but it is rewarding when you are able to extract a lot of that stuff into reusable modules and reusable states, so that you can say, I want a new Figma that is clean now, and you can get there. It just — it is a lot of work and it is a lot of discipline among an infrastructure organization to build things in [inaudible] code and reusable principles.
Ben: 00:40:17.050 How much access do you give to the — AWS customer, the AWS console for engineers and developers?
Max: 00:40:23.565 It just depends on sort of what you need to do your job. We use AWS SSO to manage roles, which has been really great because we don’t have to really worry about long-lived access keys. Everyone is just Okta authenticated if they have WebAuthn. We require, right, device [inaudible] authen for all of our AWS authentication and that’s really powerful. And then depending on what sort of stuff your job demands, we will assign associated AWS permissions. Something that I’m really looking to do soon is get to a point where we can actually, with policy, ban certain changes in the console and make it so there’s certain things that have to be applied through Terraform or other infrastructure-as-code mechanisms, especially for cases where we see resources where it’s easy to make mistakes at the beginning. I think a classic example would be spinning up an RDS instance without encryption enabled, right? That’s not something you can change later. You’ve got to do it right the first time, and so having things at the front that can say like, “Actually, you just are not allowed to spin that up,” can save everyone a lot of time.
Ben: 00:41:27.331 Yeah. I know Travis was chasing me down the other day because I had a — one without encryption, but it was a very quick Panther alert and it was [crosstalk] —
Max: 00:41:33.700 Panther has been helpful there, but yeah. It’s still like, “I thought you did that 15 minutes ago and sorry, you’ll need to do it again.”
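An alert like the one discussed here could be written as a short Python detection over CloudTrail events. The sketch below follows the Panther convention of a `rule(event)` function that returns `True` to alert plus a `title(event)` function for the alert text, but nothing in it depends on Panther. It assumes that CloudTrail records the RDS request parameter as `storageEncrypted`, and it ignores cases such as Aurora, where encryption is set on the cluster rather than the instance.

```python
def rule(event):
    """True for a CloudTrail CreateDBInstance call that did not request storage encryption."""
    if event.get("eventSource") != "rds.amazonaws.com":
        return False
    if event.get("eventName") != "CreateDBInstance":
        return False
    params = event.get("requestParameters") or {}
    # Assumption: a missing or false `storageEncrypted` means unencrypted storage.
    return not params.get("storageEncrypted", False)


def title(event):
    """Human-readable alert title naming who made the call."""
    who = (event.get("userIdentity") or {}).get("arn", "unknown principal")
    return f"Unencrypted RDS instance created by {who}"
```

Because the detection is a plain function over a dictionary, it can be run against any CloudTrail log, with or without a log analysis platform around it.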
Ben: 00:41:41.275 I was speaking to another guest and he sort of saw that as kind of half of shift left, which could be a huge topic. Engineers generally want to do the right thing, so give them the alerts early, and it’s much easier to change it within that 15 minutes. But it took eight minutes to fire it up, to go back and change it again.
Max: 00:41:57.137 Yeah. And unfortunately, here is my long list of requests to AWS. It would be great if we could provide that context in something like a service control policy as well, right? I can write an SCP that will probably ban people from creating an RDS instance that’s not encrypted, but when they hit the create button, they’re just going to get access denied, and then they’re going to file an IT ticket like, “Hey, I need access to create RDS instances,” and IT will be like, “You have access. I don’t know why it’s not working.” We just need a custom message that says your access was denied because you didn’t click the encrypted box. That was what it was.
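The kind of service control policy Max mentions might look like the following, built here as a Python dictionary and printed as JSON. This is a hedged sketch rather than a policy taken from Figma: the function name and `Sid` are made up for the example, and it relies on the `rds:StorageEncrypted` condition key to match create requests that leave encryption off. As Max notes, a denied user would still only see a generic access-denied error.

```python
import json


def deny_unencrypted_rds_scp():
    """Build an SCP document that denies creating RDS instances without storage encryption."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedRdsInstances",  # illustrative name
                "Effect": "Deny",
                "Action": "rds:CreateDBInstance",
                "Resource": "*",
                # Deny only when the request asks for unencrypted storage.
                "Condition": {"Bool": {"rds:StorageEncrypted": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_unencrypted_rds_scp(), indent=2))
```

Generating the document in code keeps it reviewable and easy to feed into whatever infrastructure-as-code tooling attaches policies to the organization.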
Cybersecurity advice for the class of 2020
Ben: 00:42:29.805 Changing gears a bit. I was reading your LinkedIn prior to recording this, and I saw that you were a member or founding member of Berk Elite, which is the — super cool name. It’s a student-led cyber security club. Just tell me more, and I’d love to know some of the advice for the class of 2020.
Max: 00:42:48.354 I went to UC Berkeley. Go Bears. I was super interested in security from the beginning, found a few like-minded folks who also wanted to get more into security than necessarily we could get through normal class work. UC Berkeley has a really great security class, but there is just one of them, and it tends to fill up with juniors and seniors. So as freshmen or sophomores, we were really looking for more. It sort of grew organically out of connections. People knew somebody who was a hacker or wanted to get more into that. And so we ended up building this team, and we focused on participating in the National Collegiate Cyber Defense Competition, which is a pretty cool event, and it’s definitely a sort of a wild experience. I could talk about it for a long time, but basically it is a game in which you are trying to defend a network against a professional red team that’s trying to break into it and step in. And everything’s already really misconfigured and messed up. And you and your team of seven other people are working furiously over the next 24 hours to try and fix it up and kick out the attackers and move on.
Ben: 00:43:55.920 So it’s kind of like a capture the flag, but —
Max: 00:43:57.604 It’s less like a capture the flag and more — because there aren’t really flags, it’s more that you’re trying to keep services up and protect data. So you lose points if your services go down, you lose points if attackers are able to steal certain types of information, and you can also gain points by doing certain upgrades and challenges throughout the time. It’s actually a pretty complex set of rules. I actually think it’s like other technical competitions like Formula One, where, well, it sounds simple. It’s like you’re trying to race a car. But there’s a bunch of these very specific technical regulations in order to kind of maintain the game aspect. But that does end up making it fun and exciting, and that was a pretty cool thing that we were able to do.
Max: 00:44:34.700 The team has continued on since I graduated, and I think it pivoted to focus a little bit more on the CTF side of things. My takeaway was that those sorts of student led organizations are really fun and actually really valuable in terms of not just the technical things you learn, but sort of the organizational and sort of interpersonal things about what does it take to get a bunch of people excited about security and then fundraise and then schedule practices and then figure out how to get your team all over the country to do these competitions. There’s a lot of cool, valuable stuff there doing it all among students at school.
Ben: 00:45:08.233 I guess that goes back to your “never join a security engineer team of one”.
Max: 00:45:11.765 Yeah, I mean, having backup is great. Right? It’s always really good to have your team of folks, because that — it’s not just bandwidth, it’s also a diversity of thought and perspective and background that is going to be what actually gets you there, right? Everyone has seen such interesting and different configurations and systems and organizations and dysfunctions throughout their time that you’re really going to get a lot better results if you can have a few solid folks who can collaborate on that.
Ben: 00:45:44.279 And I think you said you learned a lot through working in IT for UC [crosstalk].
Max: 00:45:47.831 Yeah. Yeah. So I was lucky enough to — UC Berkeley has a really strong student IT program where I think — yeah, starting in my sophomore year, I was working on a team doing security stuff for student affairs, which was like a large branch of the university administration. And this is a little different than many of my colleagues who basically were doing extracurriculars related to research and sort of more — a more academic path. My path was a lot more I would — I guess I would say practical. And yes, it involved more like dragging servers around from time to time and kind of getting your hands dirty. But it also meant that I was getting very valid job experience from day one that was not just about — again, it’s not just the technical stuff, but it was things like what’s the experience like of having weekly team meetings, building roadmaps, and understanding how office politics work and working with the high-up administrators and how to kind of navigate that and that sort of thing. And I was super glad that I was able to make a bunch of mistakes back then when I was a senior in college and thought I was really smart as opposed to a little bit later in the workforce. So that sort of exposure to IT work, I think, was really, really valuable in kind of getting me ready for a career in security otherwise. And so if IT is a career path or job area that is more accessible to you and you have your sights set on security at some point, I think IT is a really, really awesome stepping-stone. And you might even like it. It’s an interesting problem, and I think there’s a lot of opportunity for people to do really useful and effective stuff in that space.
Ben: 00:47:31.671 It’s changed a lot in the last decade or so.
Max: 00:47:34.682 Yeah.
Ben: 00:47:35.133 IT.
Max: 00:47:35.532 Absolutely.
Ben: 00:47:36.028 [crosstalk] an endpoint. To sort of close up, do you have any advice for the class of 2020?
Max: 00:47:40.907 The biggest advice I would give is it’s more than just a fun competition to do while you’re in school. Security is a really awesome path to be on because the need is not going away. And the challenges you get to work on are varied and interesting and always changing. And so it’s just a really exciting field to be a part of. It’s not as stressful as it might sound, and there’s a lot of wins along the way as well.
Ben: 00:48:06.292 And then do you have any last closing thoughts?
Max: 00:48:08.252 Keep ripping at that toil. Find ways to reuse stuff because continuously reinventing the wheel in security over the last 20 years — I don’t think has gotten us very far. Build secure building blocks that everyone can collaborate on.
Ben: 00:48:23.453 Awesome. Well, thank you, Max.
Max: 00:48:25.491 Thank you.