thredUP is one of the world's largest online resale platforms for women's and kids' apparel, shoes, and accessories. With a mission to inspire a new generation to think secondhand first, the company has spent the past 10+ years reinventing resale. By building a marketplace and infrastructure now poised to power the $15 billion resale economy, thredUP is changing the way consumers shop and ushering in a more sustainable future for the fashion industry. Millions of consumers rely on thredUP as the easiest way to sell their clothes and shop over 35,000 brands at up to 90% off estimated retail price. Some of the world's leading brands and retailers are also leveraging thredUP's Resale-as-a-Service to deliver customized, scalable resale experiences to their customers.
thredUP, founded just a year after AWS in 2009, has lived through the transition to cloud, going from monolithic to microservices before it was cool. In 2017, that migration included ditching handcrafted servers for Kubernetes orchestration. One year later, thredUP was entirely on Kubernetes, seeing massive reductions in costs and deployment times. To keep a close eye on resources, the infrastructure team built an in-house solution to grant access through the powerful Kubernetes RBAC API. As the company grew, the infrastructure team found themselves dedicating more and more hours just to keeping their tool functional. That's where Teleport came in.
When thredUP deployed their services on Kubernetes, the default method to access development environments shifted to kubectl. While the built-in RBAC API was definitely an improvement, the infrastructure team needed a new way for engineers to access Kubernetes clusters. Wanting to keep to security best practices, they built an in-house service that kept them in control, programmatically creating client kubeconfigs from AWS IAM roles.
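To make "programmatically creating client kubeconfigs from AWS IAM roles" concrete, here is a minimal Python sketch of what such a generator might look like. It is not thredUP's actual tool: the function names, the cluster URL, and the role ARN are hypothetical, and the sketch assumes tokens are issued by the `aws-iam-authenticator` CLI through the Kubernetes exec credential plugin.

```python
import json


def build_kubeconfig(cluster_name, server_url, ca_data_b64, user_name, role_arn):
    """Build a kubeconfig dict whose user authenticates by assuming an IAM role."""
    context_name = f"{user_name}@{cluster_name}"
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": cluster_name,
            "cluster": {
                "server": server_url,
                "certificate-authority-data": ca_data_b64,
            },
        }],
        "users": [{
            "name": user_name,
            "user": {
                # Assumption: tokens come from the aws-iam-authenticator CLI,
                # invoked by kubectl as an exec credential plugin.
                "exec": {
                    "apiVersion": "client.authentication.k8s.io/v1beta1",
                    "command": "aws-iam-authenticator",
                    "args": ["token", "-i", cluster_name, "-r", role_arn],
                },
            },
        }],
        "contexts": [{
            "name": context_name,
            "context": {"cluster": cluster_name, "user": user_name},
        }],
        "current-context": context_name,
    }


def render_kubeconfig(config):
    """Serialize to JSON, which kubectl accepts because JSON is valid YAML."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    config = build_kubeconfig(
        cluster_name="dev",
        server_url="https://k8s.example.internal",
        ca_data_b64="Q0E=",
        user_name="jane",
        role_arn="arn:aws:iam::123456789012:role/dev-engineer",
    )
    print(render_kubeconfig(config))
```

A central service that owns this step controls every kubeconfig in circulation, which is also why each new engineer and each credential rotation creates work for the team that runs it.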
A user would access a Kubernetes cluster by following these steps:
This custom solution let the infrastructure team fully manage all kubeconfigs, maintaining visibility and control over user activity. But as thredUP continued scaling up, bottlenecks started to appear.
The limited scope of the custom solution meant the SRE team was spending significant time onboarding engineers and troubleshooting issues.
Going one step further, thredUP security policy dictated that client-side AWS access keys be rotated regularly. Eventually, the infrastructure team had to set aside dedicated time just to maintain and troubleshoot the tool. It did a great job of tightening auth but was eating into developer time.
An upcoming SOC 2 assessment forced the team to address its growing problem. Their access controls would likely pass muster, but they didn’t want to leave anything to chance. Specifically, they were looking for even finer access controls, out-of-the-box setup, and increased visibility.
Teleport allows for instant access to a Kubernetes cluster through single sign-on (SSO) by mapping user attributes from an identity provider directly to the ClusterRole Kubernetes objects that scope permissions. In other words, when a thredUP employee requests access to a Kubernetes cluster, her upstream group and role from OneLogin have already been translated into rules that the RBAC API can interpret. Within Teleport, the Proxy Service reads identity attributes through SAML or OAuth/OIDC and translates them into Teleport Roles, which are, in turn, mapped to Kubernetes Subjects, like Jane in Figure 3. By using Teleport as their authentication gateway, thredUP removed IAM roles from the equation altogether, simplifying RBAC to an SSO workflow.
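The two-step translation described above (identity-provider attributes to Teleport roles, then Teleport roles to Kubernetes subjects) can be sketched in a few lines of Python. This is a conceptual model only, not Teleport's implementation: the rule table, role names, and group names are made up for illustration, and the `name`/`value`/`roles` shape only loosely mirrors how an attribute-to-role mapping is typically written.

```python
# Hypothetical mapping from identity-provider attributes to Teleport roles.
ATTRIBUTES_TO_ROLES = [
    {"name": "groups", "value": "engineering", "roles": ["dev"]},
    {"name": "groups", "value": "sre", "roles": ["admin"]},
]

# Hypothetical mapping from Teleport roles to Kubernetes groups, which
# RoleBindings / ClusterRoleBindings in the cluster would then reference.
ROLE_KUBE_GROUPS = {
    "dev": ["developers"],
    "admin": ["system:masters"],
}


def resolve_roles(idp_attributes, rules=ATTRIBUTES_TO_ROLES):
    """Return the sorted Teleport roles granted by a user's IdP attributes."""
    granted = set()
    for rule in rules:
        values = idp_attributes.get(rule["name"], [])
        if rule["value"] in values:
            granted.update(rule["roles"])
    return sorted(granted)


def resolve_kube_groups(roles, role_groups=ROLE_KUBE_GROUPS):
    """Return the sorted Kubernetes groups implied by a set of Teleport roles."""
    groups = set()
    for role in roles:
        groups.update(role_groups.get(role, []))
    return sorted(groups)


if __name__ == "__main__":
    # Attributes as they might arrive in a SAML assertion for "Jane".
    jane = {"groups": ["engineering"]}
    roles = resolve_roles(jane)
    print(roles, resolve_kube_groups(roles))
```

Because the lookup is driven entirely by attributes the identity provider already holds, adding or removing someone from a group upstream changes their cluster permissions without anyone touching a kubeconfig.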
Not only did the SSO-to-Kubernetes integration eliminate much of the maintenance work, but employees could also be onboarded much more quickly. Just as new hires can use basic workplace tools on day one, Teleport gives them immediate access to infrastructure resources.
Infrastructure security best practices call for centralizing audit logging and monitoring: analytics are only as powerful as the data fed into them. thredUP's internal solution gave them a good look at who might have been inside a node, but not what they might have done. When something breaks, pinging an SRE to hunt down a lead is suboptimal. They needed audit logging and session recording.
The Teleport auth server keeps an audit log of various Kubernetes events (Figure 2). With this, thredUP can not only gather metadata like logins and session starts, but also capture and replay anything that is echoed in their terminal (Figure 4). Audit logs, bundled in JSON, can be easily shipped off to a SIEM or logging tool. Now, problems can be easily triangulated by searching through a history of
With our SOC 2 audit, and likely future compliance requirements, using Teleport for high fidelity record-keeping bolstered our risk assessment and response competency.
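As a rough illustration of why JSON-formatted audit logs are easy to search or forward to a SIEM, the Python sketch below filters newline-delimited JSON events by type and user. The sample events are fabricated for the example, and the field names (`event`, `user`, `time`) are assumptions about the log shape rather than a documented schema.

```python
import json


def filter_events(lines, event_type=None, user=None):
    """Return parsed audit events matching the given event type and/or user."""
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        event = json.loads(line)
        if event_type is not None and event.get("event") != event_type:
            continue
        if user is not None and event.get("user") != user:
            continue
        matches.append(event)
    return matches


# Fabricated sample records, one JSON object per line.
SAMPLE_LOG = [
    '{"event": "user.login", "user": "jane", "time": "2021-01-01T10:00:00Z"}',
    '{"event": "session.start", "user": "jane", "time": "2021-01-01T10:01:00Z"}',
    '{"event": "session.start", "user": "sam", "time": "2021-01-01T11:30:00Z"}',
]

if __name__ == "__main__":
    for event in filter_events(SAMPLE_LOG, event_type="session.start", user="jane"):
        print(event["time"], event["user"], event["event"])
```

The same filter works whether the lines come from a local file or a log pipeline, since each record is self-describing.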
The thredUP infrastructure team has seen tremendous utility from using Teleport:
thredUP continues to see added value from each new Teleport release, from access workflows that allow administrators to grant real-time privileged access through Slack to the Kubernetes enhancements that allow kubectl events to be logged. thredUP is now rolling out a Database Access solution following its success with Kubernetes and SSH Access.