Teleport Blog - Oct 10, 2024
How to Use Teleport Machine ID and GitHub Actions to Deploy to Kubernetes Without Shared Secrets
We are living in the era of Kubernetes. It is hard to find anyone who has not heard of it and in all likelihood you are using it, too. And if you are using Kubernetes, it is probably also safe to assume that you are using CI/CD to deploy your applications into it. However, as CI/CD and Kubernetes have grown in popularity, the number of bad actors looking to exploit weaknesses in them has grown too. It is critical that modern security techniques are used to keep your Kubernetes clusters, and the sensitive data and services within them, safe from attack.
In this blog post we will dive into why CI/CD pipelines are such appealing targets, the vulnerabilities that bad actors try to exploit in CI/CD pipelines and how tools like Teleport Machine ID can be used to prevent and detect exploitation.
The Problems with CI/CD and Kubernetes
Before we can look at how to strengthen the security of CI/CD pipelines that deploy to Kubernetes, we first need to understand why they are such an attractive target and why they pose such a significant risk.
To deploy to a Kubernetes cluster, the CI/CD system needs some kind of credential. To be useful, this credential requires a high level of privilege - the ability to create, read, update, and delete resources within your Kubernetes cluster. This is often a higher level of privilege than might be afforded to your engineers on a day-to-day basis.
In the wrong hands, this level of privilege can be used to wreak havoc. Not only could this privilege be used to extract secrets from your cluster, it could be used to deploy malicious services into your cluster intended to disrupt operations or intercept sensitive customer information. It is easy to see why this would be so appealing to a bad actor!
To make the credential available to the CI/CD system, it could be locked away in some form of secrets manager. This is certainly better than committing the credential unencrypted into a git repository, but still has its flaws. Even with the best secrets manager, there are still plenty of opportunities for this secret to be stolen:
- The laptop of the engineer who first generated the credential could have been riddled with spyware - meaning the secret was exfiltrated before it was even put into the secrets manager!
- A tool used in your CI pipeline could suffer a supply chain attack. This isn’t hypothetical - in 2021, Codecov suffered an attack that inserted malicious code into the tooling they provided for customers to run in their CI/CD pipelines. This code uploaded the environment variables within the pipeline to a server controlled by a bad actor - exfiltrating the secrets of thousands of organisations.
- An attacker could infiltrate your logging system - where the secret has been unknowingly included as part of the CI run logs.
You can mitigate these risks in a variety of ways, but ultimately, the core of the problem is the long-lived and shared nature of the secret. Even if these secrets are rotated on a weekly or monthly basis, an attacker has plenty of time to make use of them before they expire.
Fortunately, there’s a solution. Many modern CI providers have begun to issue short-lived identities to the workloads that they run. These identities are typically represented with JWTs that include claims that identify the CI run itself, including details such as the branch it is running against and which user triggered it. The tokens are signed by the CI provider, allowing third parties to verify they are legitimate and trust the details contained within. This provides a foundation for an authentication strategy based on public key cryptography rather than long-lived shared secrets.
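To make the shape of these tokens concrete, here is a minimal Python sketch that builds a dummy token and reads its claims back. The claim values, the placeholder signature and the helper names are assumptions made for this illustration. The sketch deliberately does not verify the signature, so it should never be used to make trust decisions.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as used by JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the stripped padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_claims_unverified(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))


# Illustrative claims, shaped like those a CI provider might issue.
example_claims = {
    "repository": "example-org/example-repo",
    "ref": "refs/heads/main",
    "actor": "example-user",
}
example_header = {"alg": "RS256", "typ": "JWT"}

# A JWT is three base64url segments joined by dots: header.payload.signature
example_token = ".".join([
    b64url_encode(json.dumps(example_header).encode()),
    b64url_encode(json.dumps(example_claims).encode()),
    b64url_encode(b"placeholder-signature"),
])

decoded = decode_claims_unverified(example_token)
print(decoded["repository"], decoded["ref"])
```

Anyone holding the token can read its claims, which is why the signature, rather than secrecy, is what makes them trustworthy.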
Securing CI/CD Pipelines with Teleport Machine ID
Let us take a look at how Teleport Machine ID can be used to securely access a Kubernetes cluster from GitHub Actions. We will see how the implementation removes the use of long-lived secrets and provides additional benefits in terms of auditing capabilities and fine-grained access control.
For our example, we will be taking a GitHub repository with a basic set of Kubernetes manifests in it. Our goal will be to apply these to our Kubernetes cluster on each push to the main branch.
Before we get started, you will first need to enroll the Kubernetes cluster into your Teleport cluster. This can be done using the Teleport UI through the “Enroll New Resource” wizard or by following the steps in our documentation.
Configuring Kubernetes RBAC
Before configuring Teleport, we will need to configure our Kubernetes cluster itself.
First, we will create a Role which will be used to grant our CI/CD system the ability to create and update resources within the cluster. We will follow the principle of least privilege and ensure that this Role only grants access to modify the kinds of resource that we expect it to need to modify, and use a Role rather than a ClusterRole to scope these permissions to a specific namespace.
Create ci-role.yaml and apply it using kubectl apply -f ./ci-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: github-actions-blog-demo
  name: github-ci-deploy
rules:
  - apiGroups:
      # Pods belong to the core API group, which is written as "".
      # Deployments belong to the "apps" group.
      - ""
      - apps
    # Restrict the role to Pods and Deployments, since that's what our CI
    # system is updating. If your CI needs to create and modify other kinds,
    # then this list should be expanded.
    resources:
      - pods
      - deployments
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
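As a rough illustration of how Kubernetes evaluates rules like these, the following Python sketch models a simplified version of RBAC matching. It is not the real authorizer: wildcards, resourceNames and subresources are ignored, and the rules and function names are hypothetical. It does show the key point that a rule only applies when the API group, the resource and the verb all match.

```python
def rule_allows(rule: dict, api_group: str, resource: str, verb: str) -> bool:
    """A rule matches only if the API group, resource and verb all match."""
    return (
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
    )


def role_allows(rules: list, api_group: str, resource: str, verb: str) -> bool:
    """RBAC is additive: a request is allowed if any rule matches it."""
    return any(rule_allows(r, api_group, resource, verb) for r in rules)


# Hypothetical rules for illustration. Deployments live in the "apps" API
# group, while Pods live in the core group, which is written as "".
rules = [
    {
        "apiGroups": ["apps"],
        "resources": ["deployments"],
        "verbs": ["get", "create", "update", "patch"],
    },
    {
        "apiGroups": [""],
        "resources": ["pods"],
        "verbs": ["get", "list", "watch"],
    },
]

print(role_allows(rules, "apps", "deployments", "patch"))  # True
print(role_allows(rules, "", "pods", "get"))               # True
print(role_allows(rules, "apps", "pods", "get"))           # False: Pods are not in "apps"
print(role_allows(rules, "", "secrets", "get"))            # False: never granted
```

Anything not explicitly granted is denied, which is what makes a narrowly scoped Role an effective way to limit what a compromised pipeline could do.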
Next, we will need to grant this Role to our CI/CD workflow. Later, we will be configuring Teleport to impersonate the github-ci group when forwarding requests from our CI/CD workflow. We can use a RoleBinding to configure Kubernetes to grant the Role to this Group.
Create ci-role-binding.yaml and apply it using kubectl apply -f ./ci-role-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: github-ci-deploy
  namespace: github-actions-blog-demo
subjects:
  - kind: Group
    # This will match the `kubernetes_groups` in the Teleport role
    # we will create later.
    name: github-ci
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  # This should match the name of the Role we just created.
  name: github-ci-deploy
  apiGroup: rbac.authorization.k8s.io
Creating a Teleport Role
Now we will move on to creating a role within Teleport. This will be used to grant our CI job access to specific Kubernetes clusters within Teleport, and to specify which groups will be impersonated when proxying requests to the Kubernetes API.
Teleport also allows you to further restrict access to Kubernetes kinds, namespaces or even a resource with a specific name. This restriction is applied on top of the grants made in the Kubernetes Role. Whilst we aren’t using this feature in our example, this can be useful for applying a policy to a group of Kubernetes clusters without needing to manage the RBAC within each of those individual clusters.
Create role.yaml and apply it using tctl create -f ./role.yaml
kind: role
version: v7
metadata:
  name: github-actions-blog-demo
spec:
  allow:
    # Grant access to all the Kubernetes clusters enrolled into
    # the Teleport cluster.
    kubernetes_labels:
      '*': '*'
    # Specify which group should be impersonated when proxying requests
    # to the Kubernetes cluster. This should match our RoleBinding configured
    # inside Kubernetes.
    kubernetes_groups:
      - github-ci
    # Specify which resources can be accessed through Teleport. In this example,
    # we're using wildcards, but this can be used to apply additional restrictions
    # on the resources granted by the Kubernetes Role.
    kubernetes_resources:
      - kind: "*"
        namespace: "*"
        name: "*"
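The idea that Teleport's resource restrictions sit on top of Kubernetes RBAC can be sketched as a simple intersection. The Python below is a simplified model under stated assumptions, not Teleport's implementation: glob matching via fnmatch stands in for Teleport's own matching logic, the kind string "deployment" is a placeholder, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def teleport_allows(resources: list, kind: str, namespace: str, name: str) -> bool:
    """True if any kubernetes_resources entry matches the request."""
    return any(
        fnmatchcase(kind, entry["kind"])
        and fnmatchcase(namespace, entry["namespace"])
        and fnmatchcase(name, entry["name"])
        for entry in resources
    )


def effective_access(resources: list, kubernetes_rbac_allows: bool,
                     kind: str, namespace: str, name: str) -> bool:
    """A request must pass BOTH Kubernetes RBAC and the Teleport role."""
    return kubernetes_rbac_allows and teleport_allows(resources, kind, namespace, name)


# The wildcard entry used in this post, and a hypothetical tighter variant
# that only permits resources in a single namespace.
wildcard = [{"kind": "*", "namespace": "*", "name": "*"}]
restricted = [{"kind": "*", "namespace": "github-actions-blog-demo", "name": "*"}]

print(effective_access(wildcard, True, "deployment", "kube-system", "web"))    # True
print(effective_access(restricted, True, "deployment", "kube-system", "web"))  # False
print(effective_access(restricted, True, "deployment", "github-actions-blog-demo", "web"))  # True
print(effective_access(wildcard, False, "deployment", "kube-system", "web"))   # False
```

The last case shows that a permissive Teleport role never widens what the Kubernetes Role grants. It can only narrow it.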
Creating a Bot and Join Token
Next, we will create the Bot and Join Token in Teleport that will be used by our CI job.
A Bot is a special kind of user intended to represent a machine’s identity within Teleport. Bots differ from human users in a number of key ways, most notably in how they authenticate.
Humans typically log in using something like a Passkey or via an SSO provider with SAML. However, these methods are not well suited to machines, as they require some degree of interaction.
Instead, machines authenticate to Teleport using the joining process. This allows them to exchange a short-lived identity document signed by the platform they are running on for a Teleport certificate. These identity documents contain a variety of claims that specifically identify the workload, and since the document is signed, Teleport is able to trust these claims. This process eliminates the need for the problematic long-lived shared secrets we discussed in the introduction.
For example, a GitHub Actions ID Token includes details such as which CI job is running, the repository it is running in, and the branch it is running against. This information can then be used to determine whether or not the CI job should be allowed to authenticate, and is included in audit logs to allow actions to be traced back to a specific CI run.
The Join Token resource specifies rules for the joining process and the bot it should grant access to. In our case, we will allow a GitHub Actions Workflow running in our repository and against the main branch to join.
Create token.yaml and apply it using tctl create -f ./token.yaml
kind: token
version: v2
metadata:
  name: github-actions-blog-demo
spec:
  roles: [Bot]
  join_method: github
  # This is the name of the Bot that the join token will grant access to.
  bot_name: github-actions-blog-demo
  github:
    allow:
      # Access will only be granted to a GitHub Actions workflow
      # running in the `strideynet/machine-id-github-actions-kubernetes-demo`
      # repository.
      - repository: strideynet/machine-id-github-actions-kubernetes-demo
        ref_type: branch
        # Limit authentication only to GitHub Actions runs against the main
        # branch.
        ref: refs/heads/main
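The way allow rules are checked against a token's claims can be sketched in a few lines of Python. This is a simplified model that assumes a join succeeds when every field of at least one allow rule equals the corresponding claim. The function names are hypothetical, and real evaluation happens inside the Teleport Auth Service after the token's signature has been verified.

```python
def rule_matches(rule: dict, claims: dict) -> bool:
    """Every field set on the rule must equal the corresponding claim."""
    return all(claims.get(field) == expected for field, expected in rule.items())


def join_allowed(allow_rules: list, claims: dict) -> bool:
    """The join is allowed if at least one allow rule matches."""
    return any(rule_matches(rule, claims) for rule in allow_rules)


# Mirrors the allow rule in token.yaml.
allow_rules = [{
    "repository": "strideynet/machine-id-github-actions-kubernetes-demo",
    "ref_type": "branch",
    "ref": "refs/heads/main",
}]

# Claims as they might appear for a push to main, plus two variations.
main_push = {
    "repository": "strideynet/machine-id-github-actions-kubernetes-demo",
    "ref_type": "branch",
    "ref": "refs/heads/main",
    "actor": "strideynet",
}
feature_push = dict(main_push, ref="refs/heads/feature")
other_repo = dict(main_push, repository="someone-else/unrelated-repo")

print(join_allowed(allow_rules, main_push))     # True
print(join_allowed(allow_rules, feature_push))  # False: wrong branch
print(join_allowed(allow_rules, other_repo))    # False: wrong repository
```

Claims that the rule does not mention, such as the actor here, play no part in the decision, so the more fields a rule sets, the narrower the set of workflows that can join.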
Now, we can create our Bot, specifying the role and the join token we have created:
$ tctl bots add github-actions-blog-demo --token github-actions-blog-demo --roles github-actions-blog-demo
Creating a GitHub Actions workflow
Finally, we can create our GitHub Actions workflow. We will want this to run on each push to our main branch and deploy the Kubernetes manifest that is located in manifests/deployment.yaml.
Teleport provides a number of off-the-shelf GitHub Actions to simplify using Teleport in GitHub Actions workflows.
The first action we will use is teleport-actions/setup@v1, which installs the Teleport binaries within the environment of the CI run, allowing them to be invoked by later steps.
The second action we will use is teleport-actions/auth-k8s@v2. This action uses the Machine ID agent, tbot, to authenticate to Teleport and generate a kubectl configuration file. This configuration file uses the short-lived credentials produced by tbot to connect to Kubernetes clusters protected by Teleport.
Create .github/workflows/deploy.yaml:
name: "Deploy!"
on:
  push:
    branches:
      - main
jobs:
  deploy-to-kubernetes:
    name: Deploy Kubernetes manifests using Teleport Machine ID and Kubectl
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Install Kubectl
        uses: azure/setup-kubectl@v4
      - name: Install Teleport
        uses: teleport-actions/setup@v1
        with:
          version: 16.4.0
      - name: Authenticate with Teleport
        # https://github.com/teleport-actions/auth-k8s
        uses: teleport-actions/auth-k8s@v2
        with:
          # Specify the publicly accessible address of your Teleport proxy.
          proxy: example.teleport.sh:443
          # Specify the name of the join token for your bot.
          token: github-actions-blog-demo
          # Specify the length of time that the generated credentials should be
          # valid for. This is optional and defaults to "1h".
          # Here we've limited it to 10m as this CI job doesn't need longer.
          certificate-ttl: 10m
          # Specify the name of the Kubernetes cluster the credentials will be
          # generated for.
          kubernetes-cluster: my-cluster
          # Enable submission of anonymous usage telemetry to Teleport.
          # See https://goteleport.com/docs/reference/machine-id/telemetry/ for
          # more information.
          anonymous-telemetry: 1
      - run: kubectl apply -f ./manifests/deployment.yaml
Testing It Out
With everything in place, we can now test that our workflow functions correctly. Commit your changes to the main branch and push this to GitHub. This should trigger the workflow to run.
Open the Actions tab in your repository to see the status of your deployment - with any luck, it should succeed! It is also worth inspecting your Kubernetes cluster to confirm that the manifests deployed by the workflow are correct.
One of the other advantages of using Teleport is the audit log. Access to resources is recorded to the Teleport audit log, which can then be shipped to your log management or SIEM tool to detect and alert on unusual or suspicious activity. This plays a vital role in quickly reacting to any potential breach.
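As a sketch of the kind of detection rule this enables, the Python below scans exported audit events for bot joins that failed or that came from an unexpected branch. The field names (event, success, attributes.ref) follow the "Bot Joined" event examined later in this post. The sample events, the expected-ref policy and the function name are invented for illustration, and a real deployment would express this rule in its SIEM's own query language.

```python
import json

# Hypothetical policy: this bot should only ever join from the main branch.
EXPECTED_REF = "refs/heads/main"


def suspicious_joins(events: list) -> list:
    """Return bot.join events that failed or came from an unexpected ref."""
    flagged = []
    for event in events:
        if event.get("event") != "bot.join":
            continue
        ref = event.get("attributes", {}).get("ref")
        if not event.get("success", False) or ref != EXPECTED_REF:
            flagged.append(event)
    return flagged


# Invented sample data: one JSON audit event per line.
exported = """
{"event": "bot.join", "success": true, "bot_name": "github-actions-blog-demo", "attributes": {"ref": "refs/heads/main"}}
{"event": "bot.join", "success": false, "bot_name": "github-actions-blog-demo", "attributes": {"ref": "refs/heads/main"}}
{"event": "bot.join", "success": true, "bot_name": "github-actions-blog-demo", "attributes": {"ref": "refs/heads/experiment"}}
"""

events = [json.loads(line) for line in exported.strip().splitlines()]
flagged = suspicious_joins(events)
print(len(flagged))  # 2
```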
Now, let us take a look at the audit log entries for the deployment. Log into Teleport, and browse to Access Management and then Audit Log.
You should be able to see multiple audit log events related to the CI/CD deployment that just occurred. For example, you will see the “Bot Joined” event that relates to the bot’s initial authentication, and you will see “Kubernetes Request” events for each of the requests it made to the Kubernetes API.
Clicking the “details” button will reveal the JSON body of the audit event. This contains all the details that Teleport has captured. Now, let us dive into the “Bot Joined” event.
{
  "addr.remote": "4.227.115.136",
  "attributes": {
    "actor": "strideynet",
    "actor_id": "16336790",
    "base_ref": "",
    "environment": "",
    "event_name": "push",
    "head_ref": "",
    "job_workflow_ref": "strideynet/machine-id-github-actions-kubernetes-demo/.github/workflows/deploy.yaml@refs/heads/main",
    "ref": "refs/heads/main",
    "ref_type": "branch",
    "repository": "strideynet/machine-id-github-actions-kubernetes-demo",
    "repository_id": "851158547",
    "repository_owner": "strideynet",
    "repository_owner_id": "16336790",
    "repository_visibility": "public",
    "run_attempt": "1",
    "run_id": "10772591440",
    "run_number": "9",
    "sha": "6027506141d4b441b05ef3c99ffcee74f1ad4365",
    "sub": "repo:strideynet/machine-id-github-actions-kubernetes-demo:ref:refs/heads/main",
    "workflow": "Deploy!"
  },
  "bot_instance_id": "615788b8-fc2b-4de9-ac0b-b34ebc64e7dc",
  "bot_name": "github-actions-blog-demo",
  "cluster_name": "noah.teleport.sh",
  "code": "TJ001I",
  "ei": 0,
  "event": "bot.join",
  "method": "github",
  "success": true,
  "time": "2024-09-09T11:51:33.582Z",
  "token_name": "github-actions-blog-demo",
  "uid": "b653642e-0123-4996-a1c5-b73b605c326a",
  "user_name": "bot-github-actions-blog-demo"
}
We can see several key pieces of information here that allow us to track this back to a specific run of our CI/CD pipeline. We can see which commit it ran against, the user who triggered the run, and the run ID. This gives us the information we need to analyse any unexpected behaviour - whether caused by a bad actor or merely a misconfiguration.
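To show how these fields can be used in practice, the following Python sketch turns a "Bot Joined" event into links to the corresponding GitHub Actions run and commit. The event here is trimmed to the fields the sketch needs, and the helper names are hypothetical. The URL layouts are the standard GitHub paths for workflow runs and commits.

```python
import json


def run_url(audit_event: dict) -> str:
    """Link to the GitHub Actions run that triggered the join."""
    attrs = audit_event["attributes"]
    return f"https://github.com/{attrs['repository']}/actions/runs/{attrs['run_id']}"


def commit_url(audit_event: dict) -> str:
    """Link to the commit the workflow ran against."""
    attrs = audit_event["attributes"]
    return f"https://github.com/{attrs['repository']}/commit/{attrs['sha']}"


# The "Bot Joined" event from above, trimmed to the relevant fields.
event = json.loads("""
{
  "attributes": {
    "actor": "strideynet",
    "repository": "strideynet/machine-id-github-actions-kubernetes-demo",
    "run_id": "10772591440",
    "sha": "6027506141d4b441b05ef3c99ffcee74f1ad4365"
  },
  "event": "bot.join"
}
""")

print(run_url(event))
print(commit_url(event))
```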
How It Works
Now that we’ve set it up, let us explore why this works and how this replaces long-lived secrets.
As discussed earlier, GitHub Actions issues a short-lived OpenID Connect (OIDC) token to each CI/CD run. These tokens are JWTs containing claims that identify the specific run, such as which repository it resides in, which branch it is running against and which workflow is running.
Public-key cryptography is then used to produce a signature over these claims. This allows any third party with knowledge of the issuer’s public key to validate that the JWT, and the claims within, are legitimate. The third party can then trust and act on the value of these claims.
It is common practice for the issuers to publish their public keys. In the case of GitHub Actions, they are published to https://token.actions.githubusercontent.com/.well-known/jwks.
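The principle of signing with a private key and verifying with a public one can be demonstrated with a toy example. The Python sketch below uses textbook RSA with a deliberately tiny key and no padding, so it is insecure and is not how GitHub signs tokens in practice (real tokens use a standardised scheme with proper padding and much larger keys). The key values, claim contents and function names are all assumptions chosen for illustration. It requires Python 3.8 or later for the modular inverse.

```python
import hashlib
import json

# Toy key built from two known Mersenne primes. Far too small for real use.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to the issuer


def digest(claims: dict) -> int:
    """Hash a canonical JSON encoding of the claims, reduced to fit the modulus."""
    encoded = json.dumps(claims, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(encoded).digest(), "big") % N


def sign(claims: dict) -> int:
    """Issuer side: requires the private exponent D."""
    return pow(digest(claims), D, N)


def verify(claims: dict, signature: int) -> bool:
    """Verifier side: needs only the public values N and E."""
    return pow(signature, E, N) == digest(claims)


claims = {"repository": "example-org/example-repo", "ref": "refs/heads/main"}
signature = sign(claims)

# An attacker who alters the claims cannot produce a matching signature
# without the private key.
tampered = dict(claims, ref="refs/heads/attacker-branch")

print(verify(claims, signature))    # True
print(verify(tampered, signature))  # False
```

Because verification needs only the public values, the issuer can publish them freely, and no secret ever has to be shared with the party doing the verifying.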
Teleport can be configured via a join token to allow authentication using one of these GitHub Actions ID tokens. During the join process, the Bot submits its ID token. The Teleport Auth Service can then verify this ID token using the public key published by GitHub, and then validate the claims within the token against the rules configured within a join token. If the token passes the rules, then a Teleport X.509 certificate is issued to the bot.
A Teleport X.509 certificate allows connections to be made to protected resources through the Proxy Service. This directs the connection to the Teleport Agent responsible for that resource via a Reverse Tunnel, which enables connectivity in situations where the client is not able to directly connect to the resource (for example, the firewall rules do not allow ingress traffic).
The Teleport Agent verifies that the X.509 certificate was issued by the Auth Service, and then ensures that it contains an identity that has been granted access to the resource. As the Agent manages the connection, it is also then able to record sessions and submit audit events for actions taken using Teleport.
Overview
It is clear that long-lived secrets are a critical weakness in CI/CD pipelines and in an era of more sophisticated attacks, alternatives like short-lived OIDC ID tokens and federation of trust should be explored. In this blog post, you’ve seen how Teleport Machine ID can help you leverage these new techniques and earn additional benefits such as detailed audit logging and fine-grained access control.
You can view the full source code for the example given in this blog at https://github.com/strideynet/machine-id-github-actions-kubernetes-demo