Overview
For this 19th episode of Access Control Podcast, a podcast providing practical security advice for startups, Teleport's Director of Developer Relations, Ben Arent, chats with Nikhil Jha. Nikhil is the site manager of the Berkeley Open Computing Facility (OCF). The OCF is an all-volunteer, student-run organization that provides access to compute resources and has been around since 1983. This episode dives into the role and services of the OCF and how it uses open-source Teleport.
Access Control Podcast: Episode 19 - University Access Control
- OCF is dedicated to free computing for all University of California, Berkeley students, faculty, and staff.
- OCF members of today are the DevOps purchasing decision makers of tomorrow. Alumni work at every major tech company, and often transfer the key technologies of OCF's tech stack to their future Fortune 500 employers, most notably at Yelp and Facebook.
- OCF chose Teleport when UC Berkeley decided that SSH would no longer be allowed if the SSH access was password only.
- OCF is migrating from its legacy tech stack, which runs on plain SSH, to Teleport, and found deploying Teleport really simple.
- The future of OCF can take two paths: providing more technical services and providing more user-facing services.
- The guest's practical advice for university labs and startups seeking to enhance access control is twofold: firstly, ensure the implementation of access control measures from the outset, and secondly, invest effort in effectively segmenting access control.
Expanding your knowledge on Access Control Podcast: Episode 19 - University Access Control
- Teleport Application Access
- Teleport Machine ID
- Teleport Getting Started
- Teleport Access Platform
- Why Teleport
- Get started with Teleport
Transcript
Ben: 00:00:00.532 [music] Welcome to Access Control, a podcast providing practical security advice for startups, advice from people who've been there. Each episode, we'll interview a leader in their field and learn best practices and practical tips for securing your org. For today's episode, I'm chatting with Nikhil. Nikhil is the site manager of the Berkeley Open Computing Facility. The OCF is a student-run organization and provides access to compute resources and has been around since 1983. I recorded this podcast in person and I'm still getting my on-site recording dialed in. Welcome, Nikhil. Thanks for joining us today. Can you tell me a little bit about yourself?
Introducing the guest and OCF
Nikhil: 00:00:33.642 Hi, I'm Nikhil. I am the site manager for the Open Computing Facility, so that kind of means I decide the technical direction of the organization, what services we provide, and more importantly I guess for my role, is how we provide those services, what compute resources, and how do we keep all those compute resources secure and accessible. And I've been bouncing back and forth between this role and the general manager role, which is more of an administrative — dealing with the school bureaucracy and things like that.
Ben: 00:01:00.257 And how long has the OCF been around?
Nikhil: 00:01:02.519 The OCF has been around since, I think, 1983, somewhere in that range. So it's been around for quite a while, the early ages of the internet.
Ben: 00:01:09.930 Obviously, we're in Berkeley and you're in the computer science program. We were just talking before this — there's some prominent computer scientists in Berkeley. I don't know if you know of any projects or interesting things that have happened?
Nikhil: 00:01:20.477 Yeah, OCF is actually where the GNU Image Manipulation Program was created, and also vi was created at Berkeley, so quite a few notable software projects that have their roots kind of starting here. An anecdote that I like sharing is that the file format for the GNU Image Manipulation Program is .xcf and that's because the OCF was actually created out of something called the Experimental Computing Facility, which was XCF.
Ben: 00:01:44.869 Oh, very cool. So what did people do before? I mean, I guess in the '80s, there wasn't much computing resources available or servers. Was that why it was created?
Nikhil: 00:01:51.837 Yeah, pretty much. People needed a place to check their email, if email was even a thing there, or have access to a bulletin board or whatever, and they didn't necessarily have a computer at home or even in their pocket. There's an email chain that we have saved somewhere in the OCF where they were threatening to shut the OCF down because of a lack of space and there was just an outpour of support where it was, "Oh, I cannot check my email. Bad things are going to happen if the OCF is closed."
Ben: 00:02:16.866 And it's an all-student-run and orchestrated body?
Nikhil: 00:02:20.523 Yeah, we get funding from the university, as well as the ASUC, which is our student government. And we use those funds — entirely student-run. There's no paid staff involved here. We maintain all the infrastructure and provide useful services.
OCF running mirrors for popular software in the San Francisco Bay Area
Ben: 00:02:35.056 And I think I became aware of it — I was doing a weekend Raspberry Pi project, and I saw that the mirror that it had picked up was one of the OCF's mirrors.
Nikhil: 00:02:45.309 Yeah, mirrors are, funnily enough, one of our largest recruiting pipelines, kind of because students who come to Berkeley are already using Linux, and when they look through the list of mirrors for their distribution, sometimes they see ocf.berkeley.edu and they're like, "Oh, what's this?" And they'll come check us out.
How people are automatically opted into the nearest geo mirror
Ben: 00:03:02.863 And then for people who sort of aren't that familiar, can you just sort of describe what the service — or what mirrors do?
Nikhil: 00:03:09.374 Mirrors basically — people write open-source software, and they need to distribute binary versions of those open-source software so people can use them without having to compile everything. And the way that that distribution happens is they compile it once and upload it to their server, but there are many people around the world who need access to the software. So a bunch of people donate their bandwidth, basically, by making a copy of the upstream mirror, and then — or the upstream place where all the binaries are located, and then people who want the software instead of going to — all go to one server, they'll go to their nearest mirror, which for people in the Bay Area is often the OCF.
Ben: 00:03:45.514 And do you have your own IP space as well?
Nikhil: 00:03:48.088 Yeah, we do. It's a fairly small /24 v4, and then we have a /48 v6.
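To put those prefix sizes in perspective, here is a minimal sketch using Python's standard `ipaddress` module. The prefixes `192.0.2.0/24` and `2001:db8::/48` are reserved documentation ranges used as stand-ins here; they are assumptions for illustration, not the OCF's actual allocations.

```python
import ipaddress

# Documentation-only prefixes (assumed stand-ins, not the OCF's real ranges).
v4 = ipaddress.ip_network("192.0.2.0/24")
v6 = ipaddress.ip_network("2001:db8::/48")

# A /24 leaves 8 host bits; a /48 leaves 80 host bits.
v4_addresses = v4.num_addresses
v6_addresses = v6.num_addresses

# Number of standard /64 subnets that fit inside a /48.
v6_subnets_64 = 2 ** (64 - v6.prefixlen)

print(v4_addresses, v6_subnets_64)
```

So the IPv4 block is small, while the IPv6 block leaves room for tens of thousands of separate /64 networks.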
Ben: 00:03:54.484 That's pretty cool. And then bandwidth — this is on the university network?
Nikhil: 00:03:57.870 Yeah, so this is a fairly recent project where our main uplink is actually only gigabit because we like keeping our servers in our computer lab, which is an active student union space where there are a bunch of students walking around and stuff. So that only has gigabit uplink, but recently, we managed to get our servers into the campus data center where we take advantage of their 10-gig uplink.
Core services run by OCF
Ben: 00:04:17.812 Great facility to have. You run lots of open-source software. I know your GitHub, OCF — you have all of your projects there. Can you say what are some of the core services that you run?
Nikhil: 00:04:27.050 Most of the software that we write these days is dogfooding for our own infrastructure, so not a whole lot of use outside of that. Some of the projects that I'm particularly excited about are Transpire, which is our Kubernetes helper library. Basically, what it lets you do is you can plug in your own CI system, you plug in your own secrets management system, and then some config generation system, which you can either do entirely in Transpire through a Python domain-specific language, or you can plug in existing Helm charts and other config generation things that exist into it transparently, and it just combines all of these and makes sure that the state of a Git repository is the state of your Kubernetes cluster. And as far as I know, there's nobody that actually bothers to combine all the different possible things and gives you a nice packaged all-of-your-config-goes-in-one-place type thing. So that's pretty cool.
Ben: 00:05:17.926 With Argo CD pull, but Argo misses secrets and that aspect of it.
Nikhil: 00:05:22.315 Yeah. Argo CD solves the one specific problem. Actually, the way we have it set up right now is that for CD, we actually shell out kind of to Argo CD and that takes care of diffing the cluster and making sure the YAML that we have in the Git repository is the same as what's in the cluster. But kind of the magic of Transpire is that you can just replace that with whatever you want. And you can have an entirely different system for config management or an entirely different system for CD. And in theory, it all works, but we are currently the only user, so. [laughter]
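The idea of making the cluster match the Git repository can be sketched as a simple diff between desired and live state. This is only an illustration under simplified assumptions: it is not Transpire's or Argo CD's API, and the `(kind, name)` key, the `diff_state` function, and the sample manifests are all hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]    # (kind, name): a simplified, hypothetical resource identity
State = Dict[Key, dict]  # resource identity -> manifest body


def diff_state(desired: State, live: State) -> Dict[str, List[Key]]:
    """Return which resources must be created, updated, or deleted
    so that the live cluster matches what is committed in Git."""
    create = sorted(k for k in desired if k not in live)
    delete = sorted(k for k in live if k not in desired)
    update = sorted(k for k in desired if k in live and desired[k] != live[k])
    return {"create": create, "update": update, "delete": delete}


# Hypothetical example data standing in for rendered manifests and cluster state.
desired = {
    ("Deployment", "web"): {"replicas": 3},
    ("Service", "web"): {"port": 80},
}
live = {
    ("Deployment", "web"): {"replicas": 2},
    ("ConfigMap", "old"): {"data": {}},
}

plan = diff_state(desired, live)
print(plan)
```

A real CD tool would then apply the plan and repeat the comparison continuously; the sketch only shows the comparison step.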
Ben: 00:05:49.081 But it's available, and it's on GitHub.
Nikhil: 00:05:50.532 Yep.
Why did you pick open-source Teleport to solve your problem?
Ben: 00:05:51.038 And then I know Teleport is another project that you run for accessing infrastructure. Can you talk about why you initially reached out to use Teleport to access infrastructure?
Nikhil: 00:06:00.054 Actually, it started when UC Berkeley decided that SSH would no longer be allowed if the SSH access was password only. It must be password plus 2FA or SSH keys. And we have a large user base of people who probably wouldn't know how to set up an SSH key, even if we give them documentation. I mean, they're not going to run ssh-keygen, and then we also need to build some system for adding SSH keys. And either way, two-factor authentication is probably what we want.
Ben: 00:06:28.484 And then what were these people doing on these boxes?
Nikhil: 00:06:30.916 Yeah, people need to access these boxes to access their files or update their website, for example. So yeah, there's a lot of use on this, and a lot of people picking really bad passwords and a lot of emails from UC Berkeley security saying, "Your website is compromised," and we're like, "We know. Well, it's not our website, it's our users' websites because they use bad passwords and have outdated WordPress," and so on.
Ben: 00:06:53.971 So that's a service that you provide if anyone wants — it's kind of like a virtual hosting, I guess you can kind of [crosstalk]?
Nikhil: 00:06:57.997 Yeah, it's virtual hosting. It's pretty much your typical web stack web host. 99% of people just do WordPress, but we also offer actual containerized application hosting for people who need that.
Ben: 00:07:13.445 People who were on the journey of replacing — or you were saying passwords aren't good, and then you're like, "What are the other options?" How did sort of Teleport solve that access problem?
Nikhil: 00:07:22.369 So Teleport kind of just has two-factor out of the box because we can plug it into our authentication system, and our authentication system already has 2FA. That was very nice out of the box. Other features that turned out really nice, though they were not the reason that we initially wanted Teleport, were the audit logging capabilities. We haven't needed them for security yet, but just because, "Oh, what did I type in last week?" That is something that you can do by looking at an audit log. Another feature that we really like is collaboration. Whenever we're working on infrastructure, having multiple people access the same shell without messing with tmux shenanigans is actually also very nice.
Infrastructure concerns
Ben: 00:07:56.752 And then what are some of the other infrastructure concerns you have?
Nikhil: 00:08:00.186 I think I kind of alluded to this slightly earlier. People having outdated WordPress. We need to make sure that they don't get compromised websites, but at the same time, we want to give them control over their own WordPress. So it's kind of a fine line between, "Oh, we're just going to automatically update and break your website," and also, "We need to give you some control."
Ben: 00:08:17.684 For when you run mirrors, are there any concerns of software supply chain attacks or people trying to upload a malicious package to the Berkeley mirror, for example?
Nikhil: 00:08:26.613 Yeah, I would say there are. Our mirrors are actually isolated from the rest of our infrastructure, kind of by design. I guess now they're all in Teleport, so Teleport is a single point of failure here.
Top security concerns
Ben: 00:08:37.268 And then any other concerns that you have from a security perspective? Well actually, who's responsible for security?
Nikhil: 00:08:43.061 I mean, the short answer is we are.
Ben: 00:08:44.786 It's a collective team?
Nikhil: 00:08:46.202 Yeah, the long answer is it doesn't matter a whole lot if one of our Linux boxes is now running untrusted code for a little bit. We very much do not rely on trust for the user code for any of our other infrastructure. But also, these are running on UC Berkeley IPs, and those have trust associated with them, especially for accessing academic papers. Journals trust the Berkeley IP space. So we have some tooling there to disallow and detect that stuff so that they don't get mad at us.
Ben: 00:09:16.838 Yeah, because it has a strong sort of domain authority in the IP space that you kind of already have. So I guess it seems you have quite an open playground in which people can experiment, run stuff, kind of like when you go to university. It's a place to experiment and maybe break some stuff, but within the guardrails, it keeps stuff pretty secure.
Nikhil: 00:09:37.013 Yeah, I think that's a pretty valuable thing to maintain. And it's one of the reasons that we haven't just — I mean, the ideal situation is, "Oh, we just manage everything for everyone, and you click a button and you get a website and we don't allow you to run any of your own code." And that's probably a lot better from a security perspective, but that's not really the problem we're trying to solve.
Ben: 00:09:53.880 Yeah, yeah, because you've got to learn on your LAMP stack. You can get a long way with some very questionable PHP.
Nikhil: 00:09:59.819 A lot of people are writing very questionable PHP and hosting it at this, yeah. [laughter]
Migration from legacy tech stack to Teleport and benefits gained from the transition
Ben: 00:10:04.163 You also were going through a migration process from a legacy stack to Teleport, and I think this was also to do with the mirror. Can you talk about the benefits you've gained during that transition from your legacy stack to the new one?
Nikhil: 00:10:16.253 Deploying Teleport was actually really simple, so that wasn't an issue. The hard part here is that we didn't already have two-factor in all the user accounts, so getting, I guess, 65,000 active accounts and many hundreds of thousands more active accounts going back — or hundreds of thousands inactive accounts, getting them all in 2FA or finding a way to disallow that is the actual hard problem. And I guess where mirrors come in, as to why Teleport was useful, is the mirrors box now runs on a different subnet from our primary subnet, and a lot of our infrastructure is hard coded to assume that machines will be on our primary subnet, and that includes all of our SSH infrastructure and we don't need to worry about that when Teleport is the SSH infrastructure.
Ben: 00:10:59.699 So a little bit of network isolation.
Nikhil: 00:11:01.967 Yep.
Ben: 00:11:02.653 Just to get a little bit deeper in with the migrating people from a second factor or inactive accounts. I think this is an interesting problem that many sort of organizations face, and I think we still have seen the phases of — people start with maybe SMS, then it's short token, then it's YubiKey, and sometimes it's opt-in or maybe it's a biometric like Touch ID. For the inactive accounts, do you have a grace period or what is sort of the enrollment to these different technologies and the pros and cons of them?
Nikhil: 00:11:33.638 So we have trusted endpoints, so those are our actual physical machines. We figure chances are nothing bad is really going to happen if someone is physically in the lab and is logging in on a computer, so we're not bothering with 2FA for there, for now. YubiKeys are also never going to happen because we provide the service for free with a very, very tiny budget and we couldn't give people YubiKeys and they're also not going to buy their own. SMS is also out of the picture for the same reason. We have to pay Twilio or some company to send out those two-factor codes. So kind of what we've settled on is just your typical Google Authenticator or email-based 2FA, which is almost not 2FA because you can also reset your password so it's kind of 1FA again. But the benefit with that, in our specific case, is all the emails are Berkeley emails and those emails have a 2FA thing associated with them already via Duo, which the university pays for and we don't have to worry about. So it's kind of 2FA, and we'll take it for now.
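Authenticator apps of the kind mentioned here typically generate time-based one-time passwords (TOTP, RFC 6238). The following is a minimal sketch of that algorithm using only the Python standard library. It is not how the OCF or Teleport implements 2FA; the function names `totp` and `verify` and the one-step drift window are assumptions chosen for illustration.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time password using HMAC-SHA1."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as defined in RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, submitted: str, unix_time: int,
           step: int = 30, window: int = 1) -> bool:
    """Accept codes from adjacent time steps to tolerate small clock drift."""
    for i in range(-window, window + 1):
        t = unix_time + i * step
        if t < 0:
            continue  # skip time steps before the Unix epoch
        if hmac.compare_digest(totp(secret, t, step), submitted):
            return True
    return False


# Shared secret from the RFC 6238 test vectors (ASCII "12345678901234567890").
secret = b"12345678901234567890"
print(totp(secret, 59, digits=8))
```

In practice the secret is usually exchanged as a Base32 string inside a QR code, and the server stores it per user; the sketch leaves out enrollment and storage.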
Ben: 00:12:33.766 And do alumni get access too, or is it just current students?
Nikhil: 00:12:37.036 Yes, alumni also get access. So it's an entire other set of challenges because they're not actively subscribed and log in once every four or five years and they'll be confused when things are different, but we'll deal with that problem when we get there.
Ben: 00:12:50.591 On the point of the computers in the room being heavily trusted, I know Coinbase for a while had an SSH room and they had a heavily protected room that you'd go into that had super admin privileges with cameras.
Nikhil: 00:13:02.870 That's interesting, yeah.
Ben: 00:13:04.837 All these different combinations of how you get access to systems, and different physical security too can also help increase it. I mean, there's always the case that someone can always walk to a server rack and get access to it.
Nikhil: 00:13:16.819 Yeah. I mean, there's nothing extremely high security that we're protecting here. It's just random files and websites.
How OCF prioritizes which technology services to offer
Ben: 00:13:23.466 So how do you prioritize some of the technology services that you offer to the community and sort of what factors influence the decisions about what you're going to support next?
Nikhil: 00:13:31.636 So I guess historically, the purpose of the OCF was, "Here's a place where you can come check your email, come visit messaging boards and whatnot," and those days are long gone now. Everyone has multiple computers. Both of us have computers on our wrists right now and at least one in a pocket, one on the table. These days I think the kind of technological niche of the OCF is kind of a playground for students to get exposed to technology even if they're — not necessarily because they're CS majors, just because you are going to have to use many computers in your life and understanding how they work on a deeper level, even if it's just playing around with a computer, is probably very valuable. So that's one of the technical directions that we go in, and the other technical direction that we want to go in is providing more useful services. The era of you needing a website is probably almost over. I think most people are on platforms like Facebook or Yelp or whatever and they don't actually need to make a website, so website hosting is increasingly less relevant even. Finding useful services — for a while we hosted Mastodon. We actually shut that down the week before Elon Musk bought Twitter, which was extremely unfortunate timing. We might bring it back, also chat through Matrix. Things that people can actually use even if they're not that popular.
Ben: 00:14:50.291 You say you have a high-performance computing infrastructure to — dedicated people to run AI or CUDA jobs?
Nikhil: 00:14:56.769 Yeah. Most of these servers that are high-performance computing are actually not our AI labs at Berkeley. They have their own racks upon racks of DGXs or whatever they use for AI stuff. The people that we serve with our HPC infrastructure are the physics lab that kind of is just doing their first AI project or a bio lab. Everyone needs some kind of AI thing these days. Also unsurprisingly, a lot of the people at the OCF who are volunteer staff here are CS majors, so we know a lot about how CS courses are run and we try to help them out when possible. One of the things that they're running into right now is that they also need 2FA and they don't have it, so everyone has to connect to a VPN and then do password SSH from there and it's a whole thing because the VPN software doesn't work on Linux, and yeah, it's terrible.
Future role of the OCF and new initiatives on the horizon
Ben: 00:15:47.220 Yeah, I can imagine doing these multiple hoops to sort of just get your assignment done. It can be, I'm sure, frustrating. How do you see the role of the OCF evolving in the future and sort of what new initiatives and projects are you excited about?
Nikhil: 00:16:01.997 So I think this could go one of two ways, and the correct answer is it depends/both. The first way is — we start providing more technical services. We could theoretically provide virtual private server hosting for people who just want a Linux box and they want to do stuff on it, also more advanced container hosting, the kind of stuff basically, that a cloud provider would typically provide except for free to students. The other direction is more user-facing services. Kind of the stuff that I talked about earlier, like Mastodon or Matrix or things like that. So yeah, the correct answer is probably a little bit of both depending on how hardware donations go for certain companies that are considering giving us hardware. Again, we don't have a very large budget, so we rely on things like that. Depending on how those go, we can probably just do all of the above.
Ben: 00:16:45.928 Yeah. So it's student-led as to what is sort of impactful and what is sort of interesting for the students to run.
Nikhil: 00:16:52.613 One of the reasons that we do things is for the people who are running it, me and the other volunteer staff, to get exposure to computer infrastructure, so sometimes even if something is way overkill and just really complicated, we might as well do it because of learning.
What’s next after Berkeley
Ben: 00:17:05.737 Yeah, it's a good time to do it. And so after your time at Berkeley, what are you sort of looking to pursue next?
Nikhil: 00:17:12.099 I think I'm interested in computer infrastructure, and I will do something in that space. Past that, I'm not sure. Maybe more school, I don't know.
Ben: 00:17:20.701 Yeah, it's an interesting compute infrastructure. I think we're having this reemergence a bit in the industry. So we had the movement of consolidating on VMs and then we have the cloud and now we're sort of seeing the reemergence back into people's data centers, back to edge, back to smaller embedded devices. I mean, a self-driving car is a supercomputer on wheels with all those sensors attached, so the concept of what is infrastructure is a much broader term than the big blue IBM in your basement.
Nikhil: 00:17:51.289 Right. Yeah. Pretty much anything constitutes infrastructure and I think that's pretty exciting.
Ben: 00:17:56.046 Yeah, yeah. Definitely, I agree.
Nikhil: 00:17:57.937 As a company. [laughter]
Practical advice to other university labs and startups to improve access control
Ben: 00:17:58.495 Right, provides access to infrastructure. We always try to wrap up the podcast with one last question. What's one piece of practical advice you'd give to other university labs and startups to improve access controls?
Nikhil: 00:18:09.489 I have almost a joke answer for this question which is actually have access control to begin with. I think for a lot of lower end, your typical small lab of 10 people or groups like us, kind of the ACL is you have all the permissions or you don't have any permissions, and I think there's definitely something to be gained by putting a tiny bit of effort into segmenting access control in that way. For example, you could have undergrads with less privileges than graduate students, or even past that, not all graduate students need full root access, maybe you request it on a per-time basis, that kind of thing.
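The segmentation described here, with baseline permissions per role plus elevated access requested for a limited time, can be sketched roughly as follows. The role names, permission strings, and the `User` class are hypothetical and only illustrate the idea; they do not reflect any particular product's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Set

# Hypothetical roles and permissions, chosen only for illustration.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "undergrad": {"read"},
    "grad": {"read", "write"},
}


@dataclass
class User:
    name: str
    role: str
    # Temporary grants: permission -> expiry time.
    grants: Dict[str, datetime] = field(default_factory=dict)

    def grant(self, permission: str, now: datetime, duration: timedelta) -> None:
        """Give a time-limited permission, e.g. after an approved request."""
        self.grants[permission] = now + duration

    def can(self, permission: str, now: datetime) -> bool:
        """Allow if the role includes the permission or an unexpired grant exists."""
        if permission in ROLE_PERMISSIONS.get(self.role, set()):
            return True
        expiry = self.grants.get(permission)
        return expiry is not None and now < expiry


now = datetime(2023, 1, 1, 12, 0)
alice = User("alice", "grad")
before = alice.can("root", now)                          # no root by default
alice.grant("root", now, timedelta(hours=1))             # approved for one hour
during = alice.can("root", now + timedelta(minutes=30))  # inside the window
after = alice.can("root", now + timedelta(hours=2))      # grant has expired
print(before, during, after)
```

Keeping root out of every role and granting it only through expiring requests means that forgotten privileges lapse on their own.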
Ben: 00:18:50.188 Yeah, yeah, just some basic access control is a good place to start. [music] This podcast is brought to you by Teleport. Teleport is the easiest, most secure way to access all your infrastructure. The open-source Teleport access plane consolidates connectivity, authentication, authorization, and auditing into a single platform. By consolidating all aspects of infrastructure access, Teleport reduces attack surface area, cuts operational overhead, easily enforces compliance, and improves engineering productivity. Learn more at goteleport.com or find us on GitHub, github.com/gravitational/teleport.