
In this episode of the Access Control Podcast, Ben Arent chats with Ben Burkert and Chris Stolt, the founders of Anchor Security. Anchor provides developer-friendly private CAs for internal TLS. Ben and Chris share insights from their extensive experience working at companies like Heroku, Cloudflare, GitHub, and Engine Yard.

Key Discussion Points:

  • Ben and Chris discuss their motivation for starting Anchor, stemming from frustrating experiences with certificate management and outages caused by expired certs throughout their careers.
  • The evolution of web cryptography is covered, from the early days of SSL to the modern era ushered in by events like the Firesheep exploit, Heartbleed vulnerability, and the emergence of Let's Encrypt.
  • Ben and Chris explain the benefits of using an internal PKI and private CAs rather than public CAs for back-end infrastructure. Private CAs enable shorter certificate lifetimes, protect information about internal infrastructure, and allow customized issuance flows.
  • To help improve the developer experience with local TLS, Anchor launched lcl.host, which gives developers an easy workflow for using real certificates during local development.
  • Security best practices are discussed, including using name constraints to limit certificate scope, employing a multi-layered security approach, and practicing key rotation and disaster recovery scenarios.
  • Advice is given for teams new to PKI and MTLS, emphasizing the importance of hands-on experimentation in dev environments to build understanding.

Ben A: 00:00:00.070 Welcome to Access Control, a podcast providing practical security advice for fast-growing organizations, advice from people who've been there. In each episode, we'll interview a leader in their field and learn best practices and practical tips for securing your org. For this episode, I'll be chatting with Ben Burkert and Chris Stolt. Ben and Chris are the founders of Anchor Security. Anchor provides developer-friendly private CAs for internal TLS. Ben and Chris have over two decades of experience working on some of the developer tool heavy hitters, from Heroku, Cloudflare, GitHub, and Engine Yard. Anchor recently launched lcl.host, helping devs have a better localhost setup and helping teams get closer to dev/prod parity. We plan to discuss why it's important, what CAs are, and the history of MTLS and PKI. I think this is my first other Ben on the podcast, so it's great to have you. Do you want to just give a quick introduction to yourself and then sort of what the seed of starting Anchor was?

Chris S: 00:00:58.879 Yeah. Sure. This is Chris, I'll kick us off. So, Chris Stolt: I formerly ran-- as both Bens know-- customer success and support at Heroku, and had a stint at Engine Yard before that and General Electric before that. One of the things that interested me in starting Anchor was a problem that plagued me through my career, through all those different jobs and companies: running into issues with certs and cert management, trying to set up certificates either locally or for production, certs expiring and causing outages. And that either sucks as an engineer or someone in ops who has to go in and fix it, or it sucks when you're in a customer-facing role and you have to go explain to customers why the platform is down again, and it's down again because of a cert that expired. And that's really not a good position to be in, because you can't say there was some anomaly here. It was like: we just forgot about this thing, and now you have to suffer as a result of it. I was really interested in solving this problem of certificates and encryption because it's plagued me time and time again in my career. And I don't want other people to have to go through that.

Ben A: 00:02:10.751 And how about yourself, Ben?

Ben B: 00:02:12.879 Same as Chris. I've been an individual contributor in my career, and everywhere I went I dealt with problems with internal CAs, external CAs, public CAs. I've definitely been woken up multiple times because of expired certs. I've definitely been woken up a year to the day after previously having an expired cert. And nothing angers me or annoys me more than getting woken up because I forgot to solve a problem a year ago and the tooling's not good enough to solve that problem. Finally, I got annoyed enough with the loss of sleep to say, "Something needs to be done about this." I started thinking about it, talking with Chris about it. And Let's Encrypt came on the scene about - I don't know - 7 years ago and really changed how people provision certs for the public web. They created ACME, which is this awesome protocol for getting certificates. I saw that, and I was mostly doing stuff with back-end encryption at the time-- ACME was awesome but just wasn't really applicable to what I was doing-- and I was really hoping that there was something that would bring ACME to the back end and internal TLS. So Chris and I started working on it, and that's what we're doing.

Ben A: 00:03:23.727 And I know we've been throwing around a lot of acronyms already. We've had certs, talked about PKI, CAs, MTLS. You touched on it a little there: when we're talking about certs expiring, what exactly are we talking about expiring?

Ben B: 00:03:40.999 Certificates have a lifetime to them. They have a start date and an end date. When you go past that end date, that's what's called an expiry, or it's when the certificate expires. It doesn't matter if you're Google or my own personal domain, everyone gets certificates for a certain length of time. And that certificate just will not work. Browsers won't trust it. Clients won't trust it once you go past that point in time where it expires. So it's really important that you stay on top of certificate rotation. Otherwise, you get these outages, and you kind of hear about them all the time. There is a rather famous space company that had a certificate expire in space, and that's a really hard thing to deal with, is rotating a certificate in satellites. So it's not like it's confined to just a small set of companies doing back-end hosting stuff. It's like everyone has to deal with rotating their certificates and that sort of stuff.
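The expiry check Ben describes is mechanical enough to sketch. This is a minimal illustration using Python's standard library; `ssl.cert_time_to_seconds` parses the `notAfter` timestamp format that `ssl.getpeercert()` returns:

```python
import ssl
from datetime import datetime, timezone

def is_expired(not_after: str) -> bool:
    """True once a certificate's notAfter timestamp has passed.

    `not_after` uses the format found in ssl.getpeercert(),
    e.g. "Mar  1 12:00:00 2025 GMT".
    """
    expiry = ssl.cert_time_to_seconds(not_after)
    return datetime.now(timezone.utc).timestamp() > expiry

# A cert that lapsed in 2020 is rejected by every client, no matter who issued it.
print(is_expired("Jan  1 00:00:00 2020 GMT"))  # True
```

Clients perform exactly this comparison (plus chain and hostname validation) on every connection, which is why an expired cert causes an immediate, total outage rather than a gradual degradation.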

Ben A: 00:04:37.440 I think you mentioned Let's Encrypt really changed it, because previously, you could basically buy longer certificates. You could buy, if I remember, 1-, 3-, 5-year, even 10-year certs. And so you could really punt the problem down the road to some degree. But I think Let's Encrypt really revolutionized it by making-- is it every day Let's Encrypt renews certs, or is it every week?

Ben B: 00:04:58.568 They give you a cert for 90 days, and you're allowed to renew it as quickly as you want, but they do have API limits, so I'm not sure if you could renew it every day. That's a good question. But you certainly can't renew it every couple of minutes.

Ben A: 00:05:13.508 Yeah. You get kind of rate limited. And so people are probably quite familiar with Certbot, the client available from the EFF, for issuing Let's Encrypt certificates.

Chris S: 00:05:22.924 Yeah. And the other thing with the short windows with Let's Encrypt is it's fine if you're like, "Hey, I want to maybe renew this one certificate once a day or something." But if you have a larger infrastructure and you have many services and servers running, and each of those is making those API requests multiple times a day, that's when you start to run into those API limits as you scale.

Ben B: 00:05:50.617 Yeah. The history of how long certificates are valid for is kind of an interesting evolution over time. There's a group called the CA/Browser Forum, or CAB Forum, which kind of sets the rules for CAs and, among other things, how long they're allowed to issue certificates for. Let's Encrypt standardized on 90 days. But before that, 1 year was kind of the normal time that CAs would issue certificates for. But that could be up to 10 years. You saw certs in the wild with 5-year expirations. Since the advent of ACME, allowing a lot of this to be automated, it became possible for the CAB Forum, the kind of regulators, to say, "Okay. Let's drop the length of certificate time." So now operating systems and browsers won't accept certificates that are issued for more than-- I think it's something like 400 days now is kind of the max that you can get a certificate for. I think we can expect that to keep going down, because there's really no good reason for certificates to live for many years. The shorter you can make that window, the more reasonable that window, just the better security for everybody.

Chris S: 00:07:01.192 Yeah. Again, if you think about, "Oh, I have the cert, and it's valid for multiple years," without necessarily the ability to do a revocation on that cert, if you have some kind of compromise, a key leak, an employee that had access to the key that no longer works for the company, something of that nature, you just have to kind of hope for the best until that cert expires. And that can be a really long time where you have this potential period of vulnerability. Having that long window is nice. You don't have to worry about the cert expiring and then an outage because of that, but it kind of puts you at risk. So yeah. Getting those windows short and tight and having the ability to change and rotate those certs consistently and reliably is sort of an end goal for the public web as well as what we're working on with the internal TLS.

Ben A: 00:07:52.236 Yeah. Definitely. And I think it's also an evolution too. I've been around long enough to remember buying what we then called SSL certs for a fee, then TLS, and the changes in TLS itself, like TLS 1.2 and 1.3, all these different standards. From a protocol level, I don't know if you can give a little background on the history of SSL to TLS and where we are now in the underlying technology of transport layer security.

Ben B: 00:08:23.362 I kind of think about it in three phases. And really, we're talking about all of PKI and X.509 and all those things, because TLS is kind of like an umbrella that wraps up all these various different things. There's the pre-web period where public key encryption was invented. You got RSA created, Diffie-Hellman, the algorithms that allow for public key cryptography on the web. Those started in the '70s and '80s. You saw X.509 proposed and starting to be standardized. And then in the mid '90s, you had the browser wars, and you saw SSL come out of Netscape. And actually, SSL was renamed to TLS because Microsoft, I believe, didn't want to just adopt the marketing that Netscape was using for this technology. So they kind of said, "All right. We're going to adopt this, but we're going to call it TLS." So SSL, TLS, same thing.

Chris S: 00:09:24.906 Yeah. We're all in on this tech. We just don't want to use your wording. We need to have our own wording. We need to put our own brand on it.

Ben B: 00:09:31.975 Exactly. Yeah. As soon as SSL comes around, it's immediately useful, but it starts to just kind of instantly ossify. And you really don't see many improvements to SSL and TLS. It enables encryption on the web, so we can now start charging for products with credit cards on the web, which is great. But the security just kind of remains the same until around 2010, 2011. You have two things happen. Firesheep was the first one. That was an interesting-- I wouldn't even call it an exploit. It was a browser plugin that somebody wrote that would allow you to sniff all of the HTTP traffic happening on the network that you're sitting on. So you go to a coffee shop, start this browser plugin, and then see what everybody else in the coffee shop was doing, working on. It really kind of overnight drove the demand to move to HTTPS everywhere.

Chris S: 00:10:26.605 Yeah. And just to chime in on that real quick, it was more than just, "Oh, I can see what websites people are going to." You can read the data that they're sending-- the packets that they're sending to those sites. So on a website that didn't have encryption on its login page, you could sniff someone's password, etc.

Ben B: 00:10:44.868 Exactly. And then the second big thing to happen was Heartbleed, which was a vulnerability discovered in OpenSSL's code base that allowed you to send a specially crafted message to an SSL server and get back a portion of that server's memory. Soon after it was discovered, people figured out how to read the private keys out of servers just from that little ability to read memory. One of the big things it did was require everyone to rotate all of their SSL/TLS certificates practically overnight. And so you have these two big, dramatic events for operators and people running websites. It kicked into gear this re-examination of TLS and SSL. And that's when we started seeing improvements being made and some focus being put into making these things better. So around that time, you started seeing focus on improving TLS. That's when TLS 1.3 got started. Let's Encrypt started a little bit after that. We're in the modern period now where things are getting better. There's a lot of activity around TLS and PKIs and CAs. And so it's kind of the same old technology, but we've now got renewed focus and interest and development on it.

Chris S: 00:12:02.413 A couple other details there too, right? Firesheep happened. And around that time, it was pretty common for a lot of websites to say, "Hey, because of overhead reasons," be it true or not, "we're not going to encrypt our entire site, just the parts with sensitive data traveling back and forth. So a login page, maybe somewhere where we accept a credit card, but all these other pages, we're just going to run over HTTP, and that's fine." And once Firesheep happened, there was a big effort to really push like, "Hey, phones, browsers, computers, networks these days are all fast enough. There's no reason not to encrypt your entire site." So there was much wider adoption of TLS, which is the underpinning for HTTPS, running across entire websites. So leading into Heartbleed, there was already really big adoption of this technology. And then with Heartbleed happening, Let's Encrypt came out and said, "Hey, if everyone's doing this and we really want to push encryption on the open public web, we need automation." And that led to the things Ben's talking about with ACME, which Let's Encrypt created-- it's an automation protocol for creating and provisioning and delivering certs. So that all slowly coalesced into this direction and to where we are now, with some of the newer technologies that we're able to leverage to, as mentioned, bring some of that stuff to the back-end infrastructure.

Ben B: 00:13:31.515 And also, Let's Encrypt made certificates free. Previously, it would cost you 100 bucks to get a TLS/SSL certificate. And so that was another huge driver in seeing adoption.

Chris S: 00:13:43.723 And one of the other amazing things that Let's Encrypt did, this is really widespread now. There weren't a lot of hosting platforms at the time. Heroku was one of the bigger ones. But it allowed hosting platforms like Heroku and all the ones that have come since to say, "Hey, we want to deliver secure websites for our customers when they deploy them. We can now have an integration as a platform with Let's Encrypt to get certificates. So if you deploy your app to Heroku or whatever platform, point your own domain name at it, we can give you a certificate for free right out of the box with zero configuration." And that was a really big step forward in helping encrypt and secure the public web.

Ben A: 00:14:26.581 Yeah. A lot's happened. I mean, I'm an early enough Heroku user that I remember paying $9 a month for the add-on for certificates. But I guess now we've come a long way in that period of time.

Chris S: 00:14:37.969 Yeah. That was pre-Let's Encrypt; there were reasons why. And as soon as we had the integration at Heroku, we were able to just be like, "Yep. This is free now."

Ben A: 00:14:50.437 And so we've talked a lot about, I guess, someone in a coffee shop surfing a website, the checkout page. It makes a lot of sense of why you'd have encryption and the web of trust, verifying that the site is who it says it is. What are some reasons that people might want to think about internal PKI as opposed to these external PKI structures such as Let's Encrypt or other CAs available?

Ben B: 00:15:16.884 Yeah. That's a great question. The public CAs, the web PKI CAs, they're great. They do some amazing stuff. You can use them to secure your back-end infrastructure as well so that if you've got services architecture and you've got your front-end service that's accepting web requests and then it kind of makes a couple back-end API calls, you can make sure that those back-end API calls are encrypted. They're going over HTTPS. You can use public CAs to get certificates to secure that traffic. It presents some challenges. We mentioned earlier that there can be low API limits. ACME itself, it's a great protocol. It defines a workflow for provisioning a certificate. That workflow can be on the slow side because it's intended for CAs that don't necessarily know who the end user is that's requesting a certificate. They need that end user to prove that they own the domain for the certificate that they're requesting.

Ben B: 00:16:16.996 So I can't just go to Let's Encrypt and say, "Hey, give me a google.com certificate. And trust me, I won't do anything bad with it." There's a whole portion of the protocol that deals with how end users prove that they own the domains that they're requesting certificates for. And usually they have to do something like publish a DNS record that proves that they have control of DNS, or respond to an HTTP request. There are a couple of different ways of doing it. They're all kind of slow, because the workflow ends up working like: I go to Let's Encrypt. I say, "Hey, give me a certificate for benburkert.com." They'll say, "Okay. You need to prove it. Publish this DNS record." I'll then publish that record. Let's Encrypt will then poll my DNS records until that shows up. And then at that point, they can issue me a certificate. So that can take anywhere from a couple of seconds to a couple of minutes. And when we're talking about multi-minute workflows, you can't really move that sort of workflow into a container or a service startup time. So you can't really get fast certificates from a public CA.
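The validation step Ben walks through can be sketched as a small polling loop. Everything here is illustrative: `lookup` stands in for a real DNS TXT query (in practice a resolver library), and the `zone` dict simulates the published challenge record a CA's validation server would wait for:

```python
import time

def wait_for_txt_record(lookup, name, expected, timeout=300.0, interval=5.0):
    """Poll DNS until the ACME challenge TXT record is visible,
    roughly as a CA's validation server would before issuing."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if expected in lookup(name):
            return True  # record published; the CA can now issue
        time.sleep(interval)
    return False  # challenge never appeared; issuance fails

# Simulated DNS zone standing in for real published records.
zone = {"_acme-challenge.benburkert.com": ["tOkEn-dIgEsT"]}
found = wait_for_txt_record(lambda n: zone.get(n, []),
                            "_acme-challenge.benburkert.com",
                            "tOkEn-dIgEsT", timeout=1.0, interval=0.1)
print(found)  # True
```

The timeout and interval are what make this workflow "multi-minute" in the worst case: DNS propagation delay sits between publishing the record and the CA observing it, which is exactly why this loop is a poor fit for container startup time.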

Ben B: 00:17:16.503 There's another kind of concern with public CAs: whenever you provision one of these certificates, it goes into what's effectively a public ledger. It goes into a public log called Certificate Transparency. And that's a great program that started-- I think it started around 2015 or 2016. What it does is it allows all the CAs to make sure that everyone's issuing certificates properly and no one's cheating and creating certificates for, let's say, Gmail and not telling everybody about it. So you don't have issues with certificates for very high-profile sites getting leaked or created by a CA accidentally. But that means that if your back-end services are using public CA certificates, that could potentially leak information about your back-end systems. Or if you're using, let's say, a subdomain per customer, you're effectively publishing your customer list by provisioning a certificate per customer. There are some great reasons to use public CAs. They mostly come down to: clients are really easy, you don't have to configure them, and provisioning is great because you have ACME. And so that's kind of where Anchor comes in. We give you that nice public CA experience without the downsides of those public CAs.

Ben A: 00:18:33.495 And then I think the third type, which we've not really touched on yet, is sort of the developer experience. And I think where this is most relevant - I'm thinking about front-end developers - is that now you have CSPs, which are Content Security Policies. You have certain things, like even GraphQL endpoints, which all require HTTPS. And so some front-end tooling has become so locked down that it requires HTTPS to work. And so developer workflows also have to match what production looks like, even just to get the job done. I don't know if you can talk about your experiences helping increase developer productivity.

Chris S: 00:19:11.794 Yeah. Totally. Things like CORS errors, mixed content, some of the things you're mentioning. You can develop locally without worrying about that, without triggering some of those issues, especially if you're working over localhost. The browsers are sort of set up to say, "Hey, if this domain name's over localhost, we're going to fudge a few things and give you this sort of fake secure context. We're going to approximate it to let a bunch of things work." So you don't necessarily trip all the warnings until you push that code up into production or staging, somewhere where you're using a real domain name, and then these issues can pop up. They're primarily plaguing front-end developers. And they're hard to troubleshoot, because if you want to troubleshoot properly, you're like, "Well, why didn't I discover this in the testing that I did in my dev environment?" And that's because of this approximated secure context. So stepping back into the dev environment and saying, "Well, let's set up a real cert here"-- that's historically been pretty hard. Originally, you had to do a big, long, complicated OpenSSL command at the command line to generate the certificate signing request and then get the certificate and then do a bunch of other steps to just get the server to actually boot up and read that certificate. Then some tools came out more recently, like mkcert, that allow you to more easily get a certificate, but you still have all the complications of setting it up inside of your app. And you might need to run this multiple times as certs expire and all of this stuff. And the flip side of that is, using those tools, you'll end up with what's called a self-signed certificate, which means that the thing that signed the certificate is you. By default, your browser and your systems aren't going to trust it. So then there are all these other steps to go through to change that.

Chris S: 00:21:01.042 Finally, once you do that and you have a domain name, you have to go edit something like /etc/hosts to point everything locally. And then later on, if you forget that you edited /etc/hosts, you end up hitting issues. Most developers have been there and done that with /etc/hosts. It's a pain. So there's a lot. When Ben and I have been working on Anchor, we're really thinking about, how do we make a really good developer experience? Especially having collectively spent time at Heroku and some places that have been known to have good developer experiences, like [Cordas?]. And so we're like, "Hey, can we take all of this complexity and boil it down to a single command?" And that's where we recently built and then launched lcl.host. So if you install the Anchor toolchain using Homebrew, or you can download it directly off of GitHub, you can run a single command from your application called anchor lcl. It'll walk you through a really quick workflow where you don't really have to give any input or make any decisions. It'll give you a fully qualified domain name, a subdomain of lcl.host, like projectname.lcl.host. Those subdomains all point to localhost via global DNS entries, so you don't have to edit /etc/hosts. At the end of that command, you'll have a cert that you can boot your app up with. And then we give you a very short instruction set and an open-source, Anchor-provided package for your application that will allow your application to work with ACME to provision that certificate from Anchor and boot up automatically over HTTPS locally with a cert, with minimal configuration-- bringing that tight, smooth, easy developer experience to developers that might want to just run their app over HTTPS locally.
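The "approximated secure context" Chris describes follows rules roughly like these. This is a simplified sketch of the browser behavior (in the spirit of the W3C "potentially trustworthy origin" definition), not any particular browser's exact logic:

```python
def is_potentially_trustworthy(scheme: str, host: str) -> bool:
    """Rough sketch of why http://localhost gets a pass in browsers
    while the same app on a real domain does not."""
    if scheme in ("https", "wss", "file"):
        return True
    # Browsers special-case loopback, so plain http works in dev...
    return host in ("localhost", "127.0.0.1", "::1") or host.endswith(".localhost")

# ...which is exactly why mixed-content and secure-cookie issues only
# surface once the code reaches staging on a real hostname.
print(is_potentially_trustworthy("http", "localhost"))  # True
print(is_potentially_trustworthy("http", "myapp.dev"))  # False
```

Running a real cert on a real-looking name like projectname.lcl.host during development removes this divergence, so HTTPS-only failures show up on the laptop instead of in staging.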

Ben A: 00:22:52.804 Yeah. Actually, I asked our development team, and of the 10 people that replied to me, I think there were like 12 different ways of doing it. So I think even just standardization helps-- lots of wacky, weird OpenSSL commands, and people are like, "I've been using this for the last decade. This is my trusted thing."

Ben B: 00:23:11.367 Oh, yeah. I have that just stashed somewhere that, yeah, has the right set of flags to spit out a certificate in just the right way.

Ben A: 00:23:19.621 So I can imagine it probably helps a lot, just having a standard one flow. You don't have to worry about it or forget like, "Oh, what was I kind of doing for my setup?"

Ben B: 00:23:27.533 For sure.

Ben A: 00:23:28.133 Yeah, I've definitely run into the /etc/hosts problem before, forgetting that I've changed things.

Chris S: 00:23:32.642 Yeah. It's a pain when you're like, "Why doesn't it work? It works on my phone. It doesn't work on my system. What is happening here?"

Ben A: 00:23:38.145 Yeah. Always works on my machine. It's the easiest excuse of any developer. So kind of going back to, I guess, certificates and CAs in general and Anchor: how does Anchor ensure the security and integrity of the certificates it issues? And also, what measures do you have in place in case of misuse or unauthorized access?

Ben B: 00:24:00.016 We have a few approaches to that. First off, we're different than a public CA in a big way: we build an entire CA chain just for your account, as opposed to using a common CA for all of the users. And actually, that's per environment or realm as well. So your staging environment will use an entirely separate CA than your production environment, which means that your production systems will not trust the certificates that are issued by your staging infrastructure, or issued for your staging infrastructure. So that gives you built-in isolation that matches how you already draw boundaries around your own infrastructure. The other big way we've designed this to be secure is at the X.509 layer. There are some parts of X.509 that aren't used all that commonly by public CAs. There's a set of constraints called name constraints that allow you to set up what's basically a namespace for the names that the CA is allowed to issue certificates for.

Ben B: 00:25:05.580 So if you use like mycorp.it as your internal domain for all of your internal DNS inside your infrastructure, you can actually create a CA that's only allowed to issue certificates that are subdomains of mycorp.it. And then you can kind of layer that so that you can have a sub CA or an intermediate CA. It's kind of like the next level down in the CA chain. You can have one of these intermediates that's only allowed to issue certificates for staging.mycorp.it. And then you can kind of keep going with that to where you can say only this service is allowed-- the CA is only allowed to issue certificates for this service. So you can really get fine-grained scoped access kind of built in at the X.509 layer. And since it's at the X.509 layer, that means that the clients that are connecting to these services, they're going to be validating. They're going to be performing the same sorts of validations and checking these name constraints whenever they connect to a server. Even if there was a situation where a signing key for a CA leaked from Anchor and someone used that key to mint a bad certificate - maybe it's for gmail.com - that certificate wouldn't be trusted by any clients because it wouldn't fall under this name constraint, under this kind of set of approved names. You can really use these to kind of lock down the settings or the rules and access inside your network.

Chris S: 00:26:38.003 And limit the blast radius of any potential compromises, things of that nature.
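The check Ben describes is the dNSName name-constraint rule clients apply (RFC 5280, section 4.2.1.10). Here is a simplified sketch of that client-side logic, handling only permitted DNS subtrees:

```python
def permitted_by_name_constraints(hostname: str, permitted: list[str]) -> bool:
    """A dNSName satisfies a constraint if it equals the constrained
    domain or sits anywhere beneath it (simplified from RFC 5280)."""
    hostname = hostname.lower().rstrip(".")
    for base in permitted:
        base = base.lower().strip(".")
        if hostname == base or hostname.endswith("." + base):
            return True
    return False

# A CA constrained to mycorp.it can never mint a trusted gmail.com cert:
print(permitted_by_name_constraints("api.staging.mycorp.it", ["mycorp.it"]))  # True
print(permitted_by_name_constraints("gmail.com", ["mycorp.it"]))              # False
```

Because every validating client repeats this check, a leaked intermediate key scoped to mycorp.it stays contained: nothing it signs outside that namespace will be trusted anywhere.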

Ben A: 00:26:43.749 Do you have best practices on naming or how to think about intermediate certs and root certs? It seems quite open-ended. People can build up, which is, I guess, both a blessing and a curse. Do you have any best practices that you'd like to recommend to people?

Ben B: 00:26:57.107 The way we think about it is that you can think of an intermediate as a template for the leaf certs that it's going to issue. We think about it as: there's the root CA certificate. We call that an Anchor CA cert. And then that will issue an intermediate CA certificate. We call that a sub-CA certificate. And that's really the template for the next layer down, which is the leaf certificate. Those are the certificates that go onto your servers. And so when you're creating one of these intermediates, you're doing that because you've got a new service that you want to add encryption to. We'll make sure that we provision a sub-CA, that intermediate, for that service. And we'll make sure that we lock it down at that time, so that there's some homogeneity to the leaf certificates that are all issued by that same intermediate. They'll all have the same expiration amount of time. They'll all live for maybe four weeks. Or they'll all have the MTLS extension, the client-certificate extended key usage, enabled. So when it comes to internal TLS, you can really use these constraints combined with intermediates to build out templating. And it has this really nice property when combined with ACME, because we can issue API tokens for your ACME workflow, and we can tie those tokens to these intermediates. So the token inherits all of the ACL, RBAC rules that you've built into this intermediate. And it just kind of falls out rather naturally.
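One way to picture the "intermediate as template" idea is as a policy object that stamps out homogeneous leaf certs. The class and field names below are hypothetical, purely to illustrate the concept; they are not Anchor's actual data model:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class SubCATemplate:
    """Hypothetical sketch: every leaf issued under this intermediate
    shares the same name scope, lifetime, and usage flags."""
    permitted_suffix: str     # name constraint, e.g. "staging.mycorp.it"
    leaf_lifetime: timedelta  # e.g. four weeks
    client_auth: bool         # enable the mTLS client-auth extended key usage

    def issue_leaf(self, hostname: str) -> dict:
        # Enforce the name constraint at issuance time (clients re-check it too).
        if hostname != self.permitted_suffix and \
           not hostname.endswith("." + self.permitted_suffix):
            raise ValueError(f"{hostname} is outside {self.permitted_suffix}")
        now = datetime.now(timezone.utc)
        usages = ["serverAuth"] + (["clientAuth"] if self.client_auth else [])
        return {"subject": hostname,
                "not_before": now,
                "not_after": now + self.leaf_lifetime,
                "extended_key_usage": usages}

tmpl = SubCATemplate("staging.mycorp.it", timedelta(weeks=4), client_auth=True)
leaf = tmpl.issue_leaf("billing.staging.mycorp.it")  # four-week mTLS-capable cert
```

An ACME token bound to this intermediate could then only ever mint certs matching the template, which is how the ACL/RBAC behavior Ben mentions "falls out" of the chain structure.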

Chris S: 00:28:31.759 Yeah. And also, generally, regardless of what services you're using, generally try to avoid getting wildcard certificates. That kind of is-- any compromise there can be very big. So try and scope things down to as narrow as it makes sense for whatever the use case is. Maybe provision multiple certs for various different services rather than having one cert to rule them all, so to speak.

Ben A: 00:29:00.011 Makes sense.

Chris S: 00:29:00.717 And things like ACME and some of the toolchain even that Anchor provides helps make that possible and a lot easier and more approachable.

Ben A: 00:29:09.195 Yeah, I think that's always the thing. I mean, we see this with AWS IAM. People get so frustrated with it. You do the star-star wildcard, and then suddenly you realize you've actually given too much access, because people get so frustrated with the tooling that they don't figure out how to set the right ACLs on the permissions. Since we're talking about AWS IAM, how do you think of Anchor playing with, let's say, AWS private CAs? And other cloud providers have some form of CA as a service. How does this play in with cloud ecosystems or people on-premise?

Ben B: 00:29:42.638 I think the big advantage of those is the kind of security context that they give you for storing root keys. They have great HSM support built in. Amazon has their KMS product, and it integrates with their CA product. So you get FIPS-compliant key storage, I believe. That's great for the kind of thing that keeps you up at night: leaking that root key, losing the full chain of trust. And those products focus a lot on, I would say, the security engineer's use case, or maybe just the interfaces that they're used to. They tend to not think about developer problems with certificates and what a developer wants out of a CA. I believe Google's CA product has added ACME support, and there are some automation improvements coming along. But there's not really a dev experience, a DevX kind of thought, behind them.

Ben A: 00:30:40.838 That's very friendly. Yeah.

Chris S: 00:30:42.112 Yeah. They're kind of focused on like, "Hey, you still have to deal directly with certificates and keys and how to manage them," and we're kind of taking a different approach. And if you think about that stuff from a developer standpoint-- and when I say keys here, because keys can mean a lot of different things given the context, I'm talking about the encryption keys or the signing keys, if it's a cert authority, and these are the things that they're making you manage. We're trying to think of this in terms of, "Hey, what does a developer know? API keys," a.k.a. access tokens. I have this API key, very easy for me to roll. This is the thing that gives me access to this API. What's that API? It's an ACME endpoint. So I can offload a lot of this key management stuff from myself as a developer. I do not have to rely on my ops team or my security team to manage it and make sure all that stuff's in place. I can just go to a service, get an access token, or an API key (in the ACME protocol these are called EAB tokens), use them in my environment like I do with all my other access tokens. And then I have access to this endpoint that provisions certs with the constraints that have been previously set up.

Ben A: 00:31:55.531 Yeah. I think it's sort of the shared responsibility model that Amazon has. Turns out if you actually want to manage it, there's a lot of responsibility on you. You kind of take a lot of the responsibility off. And also the organization problems, I guess dealing with teams, and you can kind of-- it's a good developer experience, but also, it's more secure as well, which I guess is a win-win.

Ben B: 00:32:14.600 Yeah, they're definitely great at the securing-keys part. And we do want to leverage that as well. We do want to make it easy to use your keys in your KMS or in your HSM. One thing we've thought a lot about is, how do we allow people to keep their root keys while still providing this nice ACME interface to actually provisioning leaf certificates, end-user certificates. And that's one of the reasons we've built this structure where you have intermediate CAs. That means that you can keep your root key offline in your HSM. And we'll keep these intermediate CAs online so that the ACME services will always be available to you. And then when it's time to rotate those or do maintenance, you can bring your root key online, do the maintenance, and then lock it back up in the HSM. So you get a hybrid where you get this nice ability to use an ACME service without all of the concerns of root key exposure, those sorts of very serious, very important security concerns.

Chris S: 00:33:20.574 Yeah, exactly. And if you think about over the past decade, decade and a half, as cloud has developed from an early thing into a mature thing, into a widely adopted thing, one of the biggest criticisms is, "Well, with all the cloud stuff, we don't own the things that we should own. We're now renting space," etc., etc. And so this approach that Ben's talking about allows the end user to still own the entire cert chain. They own the key. We don't have access to it. We don't see it.

Ben B: 00:33:51.010 Yeah. We don't see your root key.

Chris S: 00:33:52.145 While still being able to provide them the automation, that high level of automation and experience that you want to get out of something like this. So it's kind of like that best of both the worlds where it's like, well, it's cloud-based, but you still actually maintain and retain ownership.

Ben A: 00:34:08.306 Maintain the key. Yeah. Yeah, it's definitely something that we could probably see more of. I think there was one version of Slack maybe a couple of years ago for the enterprise where you could hold the encryption key for Slack. And if you wanted to rotate it, you could, which is sort of an interesting concept. People can't read your messages, or you could change the encryption key, which is sort of interesting. Going down to the SaaS provider as an end user, being kind of responsible for it. And we talked a little bit about protecting and securing intermediates or the root key. And I think we've kind of alluded to this, that there can be security incidents where keys get leaked. And often when that happens, people want to do revocation. And I talked to Filippo, who maintains the Go SSH library. He has an interesting stance on revocation for short-lived certificates. What's sort of Anchor's take on revoking certificates?

Ben B: 00:35:01.703 I don't know what Filippo's take is off the top of my head, but I would expect it to be pretty similar to ours. We see revocation as kind of a best effort, but really, the past 20, 30 years have shown that it's just not as effective as keeping renewal windows and certificate lifetimes as short as you can. So we fully support CRLs, OCSP, all those things you need for a fully functional CA. But we really recommend short-lived certificates, locking down these certificate namespaces with name constraints, using some of these tools that are available in X.509 that aren't available if you're using a public CA.
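
As a rough sketch of those two recommendations together, assuming the `openssl` CLI (names and lifetimes are illustrative), a private CA can carry a critical `nameConstraints` extension so it can only issue within one internal namespace, and leaves can be issued for just a day:

```shell
# CA that may only issue names under .internal.example.com.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout ca.key -out ca.crt -subj "/CN=Constrained Internal CA" \
  -addext "basicConstraints=critical,CA:TRUE" \
  -addext "nameConstraints=critical,permitted;DNS:.internal.example.com"

# One-day leaf inside the permitted subtree.
openssl req -newkey rsa:2048 -nodes -keyout db.key -out db.csr \
  -subj "/CN=db.internal.example.com"
printf "subjectAltName=DNS:db.internal.example.com\n" > db_ext.cnf
openssl x509 -req -in db.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -days 1 -extfile db_ext.cnf -out db.crt
openssl verify -CAfile ca.crt db.crt   # inside the constraint: OK

# A name outside the subtree can still be signed, but path validation
# should reject it with a "permitted subtree violation".
openssl req -newkey rsa:2048 -nodes -keyout bad.key -out bad.csr \
  -subj "/CN=www.example.org"
openssl x509 -req -in bad.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -days 1 -out bad.crt
openssl verify -CAfile ca.crt bad.crt || echo "rejected as expected"
```

A public CA can't give you this: the constraint is baked into the CA cert itself, so even a compromised issuance pipeline can't mint a trusted cert for a name outside your namespace.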

Ben A: 00:35:43.899 And that's because of the blast radius: if it's a one-day or one-week certificate, worst-case scenario, if something was to happen, it's only exposed for that short period of time for the short-lived cert.

Ben B: 00:35:56.392 Exactly.

Chris S: 00:35:57.188 Yeah. One of the things it's worth considering - right? - is if the cert's expired, nothing's going to trust it anymore. That's pretty widely accepted, right? Unless you're completely just not validating certs or something. But if you're revoking, there's nothing inherent in the cert that marks it as revoked. It's a different service, so to speak, that says, "Hey, don't trust this thing anymore." So now you have to hope that whatever clients are connecting to it and getting that cert, that those clients are smart enough to go check the revocation list or the OCSP endpoints to say, "Is this cert still valid? Should I still trust it?" Even though it's still technically valid because it hasn't expired yet. And the distributed nature of this means that it's not really possible to just go out and say, "Well, we know every cert that's running everywhere has been revoked." And if you think about a key compromise type scenario, you might not actually have access to where that cert's running anymore. So that short-lived certificate makes it sort of a non-issue versus hoping that everybody is following best practices across the board here.
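
Chris's point, that expiry is enforced by the cert itself while revocation relies on out-of-band checks, is easy to see with the `openssl` CLI (assumed available; the cert here is illustrative). `-checkend` asks whether a cert expires within N seconds, which doubles as a simple renewal check:

```shell
# A one-day cert: its validity window travels inside the cert, so
# every validating client enforces expiry with no extra lookups.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout short.key -out short.crt -subj "/CN=short.internal.example.com"

# Valid right now?
openssl x509 -in short.crt -noout -checkend 0

# Will it still be valid in two days? (It won't: renew before then.)
if openssl x509 -in short.crt -noout -checkend 172800; then
  echo "good for at least two more days"
else
  echo "expires within two days -- renew now"
fi
```

Revocation, by contrast, would require every client to fetch a CRL or query OCSP before trusting the still-unexpired cert.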

Ben A: 00:37:03.821 And then for teams that are new to PKI and MTLS, what are some initial steps for people who haven't fully adopted sort of an internal CA to get started? And what other resources would you recommend about learning more about this topic?

Chris S: 00:37:18.511 Well, maybe, Ben, dive into that a little bit. But before we do that, I want to take a step back and talk really quickly about an overarching approach to how you think about security. So maybe a little more philosophical or theoretical in terms of what you should do here. It's to always approach security as a multi-layer security effort. So you don't want to have just one security thing in place. As multiple things have shown over the years, it could be a key leak, but it could just be, "Hey, there was a bug in this protocol that opened things up." We've seen this in Postgres core. We've seen it in OpenSSL. We've seen it in browsers and systems. And we've even seen it baked in with Spectre and Meltdown, where it's baked into the chips themselves. So these things can happen all over the place. Take a multi-layered security approach. Don't just say, "Hey, all of our stuff from the client over the public web to the server is encrypted." Also, let's think about, how do we secure things on the back end? Well, we want to put sensitive stuff behind a firewall and put things over a VPN in a secure network so that the network and the pipes themselves are encrypted. We want to make sure that the data packets that we're putting on those wires inside of those encrypted tunnels are also encrypted. And then your data when it's at rest and it's stored on a disk somewhere, it doesn't matter that that's in a secure vault somewhere. Make sure that's also encrypted at rest. So you're protected on many layers. And yeah. I guess with that as a backdrop, Ben, do you want to dive in specifically about how to approach PKI and TLS as one of the layers?

Ben B: 00:39:07.263 Yeah, I think if you're trying to-- if you're a developer and you're trying to either understand this at a protocol level or at just how you use this in an application to kind of make things secure, maybe you've got something you're trying to secure. You have something in mind. I think it's good to not be so concerned about the actual security of the certs and keys that you're working with immediately and just get familiar with just kind of how certificates work, what public key encryption is all about, and don't think so much about, "Well, how do I generate a key securely? And then what do I do with that key material?" And there's so much surface area to working with this stuff. And then you add the security context, and it seems to complicate things even further. I think as a developer, you want to start really digging into how this security stuff works.

Ben B: 00:40:08.104 I say just don't even think about kind of the operational security of stuff. Just figure out how to get some certificates, get some keys, and then start trying to plug it into your application. Start figuring out how this stuff works. You can use tools like WireGuard to look at the traffic that's happening. You can kind of peek behind what's going on and the HTTPS traffic and really start understanding things at kind of a building block level. And then you'll have a good idea about when it comes to dealing with keys and the sensitivity of them. That'll kind of maybe feel a little bit easier or less overwhelming once you have a strong fundamental understanding. I don't know if this is the best analogy, but it kind of reminds me of learning Git. Git is just so overwhelming the first couple of times you use it. But once you really kind of understand the internal data structures of Git, it really starts to click why the command line is kind of as not straightforward as it is. And I think a lot of this security stuff, once you can kind of have a mental model of what's happening, it really simplifies things as a developer.

Chris S: 00:41:21.607 One quick point of clarification, Ben, you mentioned using WireGuard to sniff and monitor.

Ben B: 00:41:28.603 Sorry. I meant Wireshark. Yeah.

Chris S: 00:41:31.444 So if you want to view traffic on your local network, use Wireshark. WireGuard, also awesome for setting up point-to-point VPNs.

Ben A: 00:41:39.107 And then anything from yourself, Chris, for any tips of learning more about-- or maybe another question would be-- let's say I have a development team, actually my own development team, and we have like 10 different ways of setting up local certificates. How can I kind of convince my team that using a tool like lcl.host would be a good option to make people more productive?

Chris S: 00:42:07.375 To answer the first one-- so I'll just turn it into two parts. First, similar to what Ben was saying, I think one of the best ways to do it is just start doing this stuff, start playing with it, using it. Don't try and figure out how encryption works by deploying encryption to production. It's going to be hard. That's not the place where you want to do some experiments. You want to do that locally in your dev environment. You want to be able to change things and see what happens. So set up encryption locally. I think lcl.host is a very fast and easy way to do it. It removes all the barriers of all the different things you have to learn. And then you can just start diving in from there once you have it set up. And once you have a developer or two on your team doing that, I think what you'll find is that, hey, we've set up our application now so that it can serve HTTPS traffic. Even if the thing that's terminating the connection from the browser or the client on the public web at the edge is already terminating TLS, everything on the back end, you still want to have that be encrypted, so the app itself can serve this TLS traffic.

Chris S: 00:43:13.656 Once you follow the lcl.host setup guide, your application is configured to work with ACME, and it will automatically grab a certificate when you boot it up. This is a really straightforward way. I guess I would challenge people to think about, what's the better way of like, "Hey, I have an application. It's already baked into the code base. There's a new developer joining the team or existing developers. I can just download the repository. I can provision the EAB tokens, the ACME tokens that you need to connect to ACME, fire up the app. And it just has encryption, and it's the same across the board for everybody." So you're not like, "Well, this was a nuanced way for developer one versus a nuanced way for developer two and how they set it up, and they're running into different issues and things like that." And then from there, as you deploy things out into stage and production, your app's already set up and configured, if you need TLS in stage and production, to work with ACME and directly provision certificates so you can get full end-to-end encryption.

Ben B: 00:44:14.077 Yeah. A lot of these tools that developers use are really focused on how to set up certificates in their immediate environment. How do I get a certificate in this folder right now? lcl.host is really focused on applications. It's focused on making sure that your application server is going to get a certificate when it starts up. Your team, everybody's got their own favorite way of getting a certificate onto their development machine. lcl.host is more focused on, okay, how do we take this application and get it ready for ACME so that it doesn't matter if it's on my machine or anybody else's machine, it's still going to do the same ACME workflow to get a certificate? So it's much more app-centric than individual developer tool-centric.

Chris S: 00:44:56.286 Yeah. Totally. Thanks for bringing that up. And I think the way to think about this - right? - is what do developers care about? Is it like, oh, they want to know everything and dive into encryption and how to manage and set that up? No. It's like, "Hey, I'm building this application."

Ben A: 00:45:10.931 It sounds like a yak shave to me.

Chris S: 00:45:12.584 Yeah. It's like I'm building this application. This is the thing I know about is the app. So let's take an app-centric approach and center everything around the thing that you're interacting with and building.

Ben A: 00:45:24.342 Cool. Let's have one closing question. I know we have lots of practical tips already. What's maybe one other sort of practical tip for teams looking to encrypt the infrastructure?

Ben B: 00:45:35.569 I think my practical tip is-- I think it helps to take an adversarial approach to thinking about this stuff, meaning zero trust architecture is big right now. Chris talked a little bit about how you need multi-layer security. You can't just add a VPN and then expect everything behind that VPN to be secure. As we're moving into a world where you really need security at different layers, figure out what you're trying to secure. Is it data in a database? Is it an event stream or some sort of Kafka job queue? I don't know, something like that. Figure out what business value you're trying to secure. And then think about things as an adversary and how they would try to get at that very valuable data. Maybe that means thinking about, okay, would they start with getting into the VPN? And then where can they go once they have VPN access? Or maybe it's some sort of CORS issue or some sort of-- they can get in through the front door with the API. And once they have API access, can they somehow get root access on the box? And then once they have root access on one box, where can they go from there? So I think it helps to really put yourself in the shoes of this theoretical actor who's trying to get at this data. And then you can design for a more realistic or just a better scenario as opposed to going the other direction.

Chris S: 00:47:04.499 Yeah. And one thing I would add to what Ben just said, a little bit orthogonal to what he's saying, is what was made famous by Netflix: Chaos Monkey. So rip a thing out of the infrastructure. And I don't think you have to necessarily go to that extreme, but think about it in terms of not just, how do I get at this, which is what Ben was saying, but what happens if this key leaks? We would have to rotate it. Can I rotate a key without taking down our infrastructure? Do I know how to rotate a key? Can I do it quickly without causing a massive outage or data loss?

Ben B: 00:47:39.098 Absolutely.

Ben A: 00:47:39.844 It's a bit like test your backups before you--

Chris S: 00:47:42.313 Yeah. Test your backups. Can I renew a certificate? All of these things, the first time that you are rebuilding something from a backup, you don't want that to be right after a massive outage. And you're doing it live for the first time with a bunch of real live customer data. That's scary. And yeah. That's not when you want to be doing that. So practice those things. Practice makes perfect.

Ben B: 00:48:06.421 Otherwise, you'll get woken up in the middle of the night because a certificate expired.

Chris S: 00:48:10.226 Exactly. And you're going to have to be doing it then. And that's not when everyone's thinking clearly - right? - in the middle of the night where they're thinking through all the edge cases. You want that to be like muscle memory when it happens. Just be like, "Oh, we already know we can do all of this stuff." So yeah. Practice it. Think through the scenarios where something could happen and what would you have to do? Can you roll your entire fleet? Things of that nature.

Ben A: 00:48:32.301 Yeah, I like that you closed the loop there, Ben, on being woken up by the certificate expiring. I actually think, throwing back maybe a decade ago, with my ops guy, we were trying to figure out the correct way of concatenating two intermediate certificates that you got from DigiCert back in the day to actually make them work, and then trying to reboot the server to keep things going. Always fun times.

Chris S: 00:48:55.999 Yeah. As I say, it's always DNS except for when it's an expired cert. [laughter]

Ben A: 00:49:01.462 All right. I think that's a great place to end. Thank you, Chris. Thank you, Ben, for joining us today.

Ben B: 00:49:07.231 Awesome. Thanks, Ben.

Chris S: 00:49:08.255 Yeah. Thanks for having us on.

Ben A: 00:49:12.117 This podcast is brought to you by Teleport. Teleport is the easiest, most secure way to access all your infrastructure. The open-source Teleport Access Plane consolidates connectivity, authentication, authorization, and auditing into a single platform. By consolidating all aspects of infrastructure access, Teleport reduces attack surface area, cuts operational overhead, easily enforces compliance, and improves engineering productivity. Learn more at goteleport.com or find us on GitHub, github.com/gravitational/teleport.
