Teleport Blog - How We Built Machine ID - Apr 8, 2022
How We Built Machine ID
The DevOps workflow is all about automation driven by machine-to-machine access. To maintain the automated DevOps pipeline, engineers configure service accounts with credentials such as passwords, API tokens, and certificates. The issue is that engineers often fall into the insecure practice of creating long-lived credentials for service accounts to facilitate automation and reduce manual intervention. This is risky because a compromised long-lived credential gives an adversary access for as long as the credential remains valid.
The Teleport Access Platform already provides an automated way to provision short-lived certificates for all infrastructure access requirements, including SSH, Windows, Kubernetes clusters, databases and web applications. However, this automated certificate management was only available to users and servers enrolled directly with Teleport. With Teleport 9 and Machine ID, we bring an entirely automated way to provision machine-to-machine access with short-lived certificates, even for user accounts and server resources that are not directly managed by Teleport. Think of EFF's certbot for infrastructure access management at scale.
This blog post will explain how we've implemented Machine ID.
How does Teleport Machine ID work?
The core functionality of Teleport Machine ID is a certificate renewal bot that communicates with the Teleport cluster and securely renews access certificates in an automated fashion. The tool that facilitates this feature is implemented as a lightweight agent, tbot. Below are important points on how tbot works.
To handle certificate issuance and renewals, 'tbot' must first be authenticated with the Teleport cluster. This can be achieved with the following two methods:
- Using AWS IAM join: A secure joining method based on AWS IAM identity verification, which does not require provisioning security tokens to join Teleport clusters. This method works similarly to our existing AWS Node joining method. A tbot joined to the Teleport cluster using this method performs continuous re-authentication to receive new short-lived certificates.
- Using one-time join tokens: During the initial setup, administrators need to manually authenticate tbot with the Teleport cluster using one-time join tokens. Upon successful authentication, tbot will receive a short-lived renewable certificate. Using this renewable certificate, tbot will generate secondary end-user certificates (access certificates) and place them in a directory accessible by programs such as SSH clients, API clients, database clients, etc.
tbot supports and initiates certificate renewal in the following scenarios:
- When a set fraction of the certificate's TTL has elapsed and the certificate is nearing expiry.
- When user and/or host CA rotation is taking place.
- When a manual certificate renewal is requested.
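The triggers listed above can be sketched as a single decision function. This is a minimal illustration and not tbot's actual logic: the `should_renew` name, the `renew_fraction` parameter, and its default of 0.5 are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def should_renew(issued_at, ttl, now, renew_fraction=0.5,
                 ca_rotation_in_progress=False, manual_request=False):
    """Return True if any renewal trigger applies (hypothetical helper)."""
    # A manual request or an ongoing CA rotation forces a renewal immediately.
    if manual_request or ca_rotation_in_progress:
        return True
    # Otherwise renew once the chosen fraction of the TTL has elapsed.
    return (now - issued_at) >= ttl * renew_fraction


if __name__ == "__main__":
    issued = datetime(2022, 4, 8, 9, 0, tzinfo=timezone.utc)
    print(should_renew(issued, timedelta(hours=1), issued + timedelta(minutes=45)))
```

Renewing well before expiry, rather than at expiry, leaves room for retries if the cluster is briefly unreachable.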
Security considerations for Machine ID
Because tbot auto-renews certificates by design, a certificate stolen by a rogue service account or from a compromised machine can be used to renew certificates indefinitely, bypassing the security benefits of a short TTL window. To counter this, we have incorporated the following primary security measures.
Locking compromised tbot
In any situation where an administrator suspects a compromised tbot, Teleport allows tbot locking (i.e. session locking), preventing the compromised tbot from performing any certificate renewal requests, thus containing a threat quickly.
Separating renewable and access certificates
To prevent a situation where a compromised application could steal a renewable certificate and use it to renew certificates indefinitely, tbot manages two distinct sets of certificates: access certificates and renewable certificates. Access certificates can only be used for authenticating with servers (e.g., OpenSSH server, CI server) and are re-issued rather than renewed. Renewable certificates are only used by tbot to authenticate with the cluster. By restricting the application to the non-renewable access certificates (which can be implemented with file access control in Linux), we reduce the chance of the renewable certificate being stolen while still facilitating access through a separate access certificate.
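One way to picture the file access control mentioned above is to keep the two kinds of certificate in separate directories with different permission bits. The sketch below is illustrative only: the directory names, file names, and modes are assumptions and not tbot's real on-disk layout, and in practice the access directory would also need its group set to the application's group, which requires appropriate privileges and is omitted here.

```python
import os
import tempfile


def write_private(path, data, mode):
    """Create `path`, write `data`, and set exactly `mode` permission bits."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.chmod(path, mode)  # enforce the mode regardless of the process umask


def lay_out_certificates(base_dir):
    """Place renewable and access certificates in separate directories."""
    renewable_dir = os.path.join(base_dir, "renewable")  # bot user only
    access_dir = os.path.join(base_dir, "access")        # bot user + app group
    os.makedirs(renewable_dir, exist_ok=True)
    os.makedirs(access_dir, exist_ok=True)
    os.chmod(renewable_dir, 0o700)
    os.chmod(access_dir, 0o750)
    # Placeholder bytes stand in for real certificate material.
    write_private(os.path.join(renewable_dir, "cert.pem"), b"renewable", 0o600)
    write_private(os.path.join(access_dir, "cert.pem"), b"access", 0o640)
    return renewable_dir, access_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(lay_out_certificates(base))
```

With this layout, an application running under a different user can read the access certificate through group membership but cannot open the renewable one.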
Certificate generation counters to detect compromised certificate renewals
It is challenging to detect compromised certificates. Besides separating renewable and access certificates, we have also implemented certificate generation counters that help us detect and lock out compromised tbot certificates. The certificate generation counter increments each time a certificate is renewed. The counter value is attached to the certificate as a certificate extension and stored in Teleport’s backend. If a renewable certificate is compromised and renewed, the generation counter will not match when the next renewal takes place and Teleport will lock out the bot. For this reason, a shorter renewal period is more secure: it narrows the window in which a stolen certificate can be used before the mismatch is detected.
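The generation check can be sketched with a toy in-memory model. This is an assumption-laden illustration and not Teleport's implementation: `RenewalAuthority`, `BotLockedError`, and the dictionary-based "certificate" are hypothetical stand-ins for the cluster backend and the real certificate extension.

```python
class BotLockedError(Exception):
    """Raised when a renewal is refused because the bot is locked."""


class RenewalAuthority:
    """Toy stand-in for the cluster side of certificate renewal."""

    def __init__(self):
        self.generations = {}  # bot name -> generation of the latest certificate
        self.locked = set()    # bots that may no longer renew

    def issue_initial(self, bot):
        self.generations[bot] = 1
        return {"bot": bot, "generation": 1}

    def renew(self, cert):
        bot = cert["bot"]
        if bot in self.locked:
            raise BotLockedError(f"{bot} is locked")
        # The presented counter must match the stored one; a mismatch means
        # some other holder of the certificate has already renewed it.
        if cert["generation"] != self.generations[bot]:
            self.locked.add(bot)
            raise BotLockedError(f"generation mismatch for {bot}")
        self.generations[bot] += 1
        return {"bot": bot, "generation": self.generations[bot]}


if __name__ == "__main__":
    authority = RenewalAuthority()
    original = authority.issue_initial("ci-bot")
    stolen = dict(original)      # an attacker copies the certificate
    authority.renew(original)    # the legitimate bot renews first
    try:
        authority.renew(stolen)  # the stale copy no longer matches
    except BotLockedError as err:
        print("locked:", err)
```

The check is symmetric: whichever party renews second presents a stale counter, so the conflict surfaces no later than one renewal period after the theft.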
Given the sensitivity of the renewal process, we have added tbot audit logging, which captures events in the following cases:
- When a new tbot user is created
- When a new renewable certificate is issued for the first time
- When a certificate is renewed
- When tbot is removed (tctl bots rm)
- When a certificate generation counter conflict is detected (certificate possibly compromised)
Because tbot is assigned a user and role (applying RBAC to bots), most of the audit events are emitted under that user and role. For example:
- tctl bots add emits
- tctl bots rm emits
- There are no specific audit events for the initial certificate or for renewal; we just emit user.update whenever a new certificate is generated, which happens in both cases.
- Locks emit lock.created, just as user/role/etc. locking works against human users
Where does Machine ID fit in modern cloud infrastructure?
The primary use case for Teleport Machine ID is to facilitate scoped and short-lived certificate-based access for machine-to-machine communications. This means Teleport Machine ID can be used to protect service account access to CI/CD pipelines, API server access, automated database access, and automated access by remote configuration management tools such as Ansible, Chef, Puppet, etc.
By assigning a short-lived certificate and supporting an automated certificate renewal process, Machine ID helps to enforce just-in-time access and just-enough-privilege principles for machine-to-machine access. For example, suppose a service account needs to access the CI/CD server between 9 AM and 6 PM. In that case, administrators can create a short-lived certificate that renews every 6 hours, then automate role assignment during working hours and revocation during off-hours.
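The working-hours example can be sketched as a small scheduling decision. This is a hypothetical helper for illustration only: the `desired_action` name and the idea of a periodic scheduler tick are assumptions, and the actual role assignment or revocation would be done through whatever automation the administrator already uses.

```python
from datetime import datetime, time

# Working-hours window taken from the example in the text (9 AM to 6 PM).
WORK_START = time(9, 0)
WORK_END = time(18, 0)


def desired_action(now, role_assigned):
    """Return "assign", "revoke", or "none" for one scheduler tick."""
    in_hours = WORK_START <= now.time() < WORK_END
    if in_hours and not role_assigned:
        return "assign"
    if not in_hours and role_assigned:
        return "revoke"
    return "none"


if __name__ == "__main__":
    print(desired_action(datetime(2022, 4, 8, 9, 0), role_assigned=False))
```

Because the certificate itself is short-lived, a missed revocation is bounded by the remaining TTL rather than lasting indefinitely.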
Additionally, all the security features of Teleport privileged access management can now be enforced on machine-to-machine access including RBAC and audits.
What's next for Machine ID?
With Teleport Machine ID, we are a step closer to consolidating privileged access management requirements for both user-to-machine and machine-to-machine access. Teleport 9 is available from our downloads page. Read the documentation and watch an introductory video on Machine ID to get started. Join the Slack channel where Teleport users and developers hang out for community support.