Mar 8, 2023

Getting Rid of Shared Secrets: The Major Design Flaw of All CI Systems

by Noah Stride

If you’re a developer, devops or security engineer whose continuous integration (CI) systems rely on shared secrets for access management, you probably know firsthand the security risks that shared secrets present. And you probably know what it means to say, as Microsoft CISO Bret Arsenault put it: “Hackers don’t break in, they log in.” As for how to eliminate those risks, let’s first look closely at how breaches caused by stolen shared secrets happen and then consider an open source tool developers are using to address this problem.

The serious security liabilities of shared secrets

CI systems and CD pipelines, such as GitHub Actions and CircleCI, rely on secrets (API keys and SSH keypairs) to publish images, run builds and check out code. This makes them a treasure chest for hackers, who hunt for the secrets in CI systems in order to pivot to production and steal customer data. This points to the inherent risk in shared secrets. After all, if secrets can be shared between two parties, they can ultimately be shared with the world.

PaaS like Heroku and Vercel suffer from the same issue. Take, for example, the popular serverless website platform Vercel, which offers to store private keys in a built-in vault and export them as environment variables, as shown in the GitHub discussion below:

[Image: tettoffensive tweet]
[Image: mcsdevv followup]

As you see, it’s as simple as:

$ cat firebase-private-prod.key | vc env add FIREBASE_PRIVATE_KEY production

And just like that, we added a serious security liability with one line of code.

Global impact of stolen secrets

Your fellow developers and security team would certainly agree — exfiltration, a fancy word for stealing, is not a theoretical threat. The potential for security breaches, once limited to security expert communities, has become mainstream knowledge.

In January 2023, CircleCI started the year with a warning to its customers, who were returning from their holiday vacations, to rotate all secrets.

Not so long ago, hackers stole OAuth user tokens from Travis-CI and Heroku applications and used them to steal code and secrets from private GitHub repos.

Once hackers steal the secrets, they pivot to all other systems, expanding the scope of the attack.

Here’s how it usually works:

[Image: exploiting CI/CD]

The only long-term solution is to replace the shared secrets granting broad access to your infrastructure with scoped, limited credentials:

[Image: mcsdevv followup]

How scoped and limited are we talking? In the rest of this article, we will explore how far we can go with replacing long-term secrets with short-lived, scoped certificates in GitHub Actions. At the very end, we will wrap up by showcasing our software that codifies the lessons learned.

Laying out the plan

Traditionally, engineers generated a long-lived SSH private/public keypair and stored this within the secrets store of their CI provider, where it can be accessed by their workflows.

Since this keypair is stored in the CI platform’s secrets manager, this gives an attacker a new option: targeting the platform itself. This has become more common in recent years as the number of credentials stored in CI platforms makes them a lucrative target.

If exfiltrated, this long-lived credential gives the attacker months, or even years, to explore your systems.

To fix this situation, let’s make the CI runner’s credentials short-lived by using certificates. This solution also lets us get rid of the secrets manager.

Our CI runner will submit its public key and proof of identity to get a signed short-lived certificate from a certificate authority (CA). This not only lets us issue a short-lived credential, but also means that no private keys are ever transmitted over the network.

Step 1. Proving worker’s identity

First, let’s make the CI runner identify itself to get its public key signed by the certificate authority.

We will use OpenID Connect (OIDC), a standard protocol adopted by many CI platforms, including GitLab, GitHub and CircleCI.

  • OIDC is widely known for SSO but has recently become a standard for providing identities to workloads.
  • OIDC gives every workload a special ID Token. The ID Token is a JSON blob with key value pairs that is signed by the CI platform, the issuing party.

Some platforms make the ID token directly accessible to the workload via environment variables. GitHub takes a different approach, providing a URL and a bearer token via environment variables.

Using this bearer token, an HTTP request can be made to the provided URL, and the response contains an ID token for the workload. This approach allows the workflow to use query parameters to customize certain claims contained within the ID token, such as “aud”. Short for “audience”, this claim specifies the intended relying party of an ID token, and prevents ID tokens intended for one party from being used with another.
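
As a rough sketch of that request, the Python below builds and sends it using only the standard library. GitHub exposes the URL and bearer token as ACTIONS_ID_TOKEN_REQUEST_URL and ACTIONS_ID_TOKEN_REQUEST_TOKEN (set when the workflow is granted the id-token: write permission) and returns the token in the "value" field of a JSON response. The helper names and the audience value are illustrative assumptions, not part of any GitHub SDK.

```python
import json
import os
import urllib.parse
import urllib.request


def build_id_token_request(request_url: str, bearer_token: str, audience: str) -> urllib.request.Request:
    """Build the HTTP request for an ID token with a custom "aud" claim."""
    parts = urllib.parse.urlsplit(request_url)
    # The URL provided by GitHub already carries query parameters, so the
    # audience is appended to them rather than replacing them.
    query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    query.append(("audience", audience))
    url = urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {bearer_token}"})


def fetch_id_token(audience: str) -> str:
    """Fetch an ID token from inside a GitHub Actions job (needs network access)."""
    request = build_id_token_request(
        os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"],
        os.environ["ACTIONS_ID_TOKEN_REQUEST_TOKEN"],
        audience,
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["value"]
```

Inside a workflow, fetch_id_token("https://ca.example.com") would return the encoded JWT, where the URL is a placeholder for whatever string identifies your certificate authority.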

Here is an example of what an encoded JWT looks like (this one is a generic sample for illustration, not a token issued by GitHub):

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaXNzIjoiaHR0cHM6Ly9naXRodWIuY29tIiwiaWF0IjoxNTE2MjM5MDIyLCJleHAiOjE1MTYyMzk4MjIsImF1ZCI6Imh0dHBzOi8vZ2l0aHViLmNvbSJ9.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

This token consists of three base64url-encoded parts separated by dots: a JSON header that provides the signature algorithm and token type, a JSON payload of key-value pairs, and a signature.

The header and payload of a token issued by GitHub Actions, once base64-decoded, look like this:

{
  "typ": "JWT",
  "alg": "RS256",
  "x5t": "example-thumbprint",
  "kid": "example-key-id"
}
{
  "jti": "example-id",
  "sub": "repo:octo-org/octo-repo:environment:prod",
  "environment": "prod",
  "aud": "https://github.com/octo-org",
  "ref": "refs/heads/main",
  "repository": "octo-org/octo-repo",
  "repository_owner": "octo-org",
  "actor": "octocat",
  "workflow": "example-workflow",
  "event_name": "workflow_dispatch",
  "ref_type": "branch",
  "iss": "https://token.actions.githubusercontent.com",
  "nbf": 1632492967,
  "exp": 1632493867,
  "iat": 1632493567
}
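
To inspect a token like this yourself, the header and payload can be decoded with a few lines of standard-library Python. This is a minimal sketch for debugging only: it does not check the signature, so its output must never be used to make an access decision. The function names are illustrative.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # JWT segments use URL-safe base64 with the trailing "=" padding removed,
    # so the padding is restored before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_unverified(token: str) -> tuple[dict, dict]:
    """Return (header, claims) WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))
```

Calling decode_unverified(token) on an ID token returns two dictionaries shaped like the JSON objects shown above.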

To use this token to authenticate against another system (known as the “relying party”), that system must first be configured to trust the issuer.

JWTs are not perfect as someone can still steal and reuse them. However, they are only valid for a brief period of time (time bound).

Our CI runner will present its public key to our certificate authority. To prove its identity, it will send the JWT issued by GitHub. The CA will then validate the token and sign the certificate.

In our case, the issuing party will be GitHub Actions, and the relying party will be our CA.

Step 2. Exchanging the identity for the certificate

As a second step, let’s exchange the CI runner's identity for a signed certificate.

First, the workflow will need to generate an SSH key pair. The private key will never leave the workflow run environment and is ephemeral; each new run will generate a new keypair.

The workflow submits the public part of the key pair, along with the ID token, to the certificate authority using a remote procedure call.
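
The client side of this exchange can be sketched as follows. The ssh-keygen flags are standard OpenSSH options, but the request body is purely hypothetical: the field names "public_key" and "id_token" are made up for illustration, and a real CA would define its own RPC format.

```python
import json


def keygen_command(private_key_path: str) -> list[str]:
    """Command line that creates a fresh, passphrase-less Ed25519 key pair.

    ssh-keygen writes the private key to private_key_path and the public key
    to private_key_path + ".pub". A new pair is generated on every run.
    """
    return ["ssh-keygen", "-t", "ed25519", "-N", "", "-q", "-f", private_key_path]


def build_signing_request(public_key: str, id_token: str) -> bytes:
    """Serialize a hypothetical signing request for the CA.

    Only the public key and the ID token are sent; the private key stays
    inside the workflow run environment.
    """
    body = {"public_key": public_key.strip(), "id_token": id_token}
    return json.dumps(body).encode("utf-8")
```

In a workflow, the command would be run with subprocess.run(keygen_command(path), check=True), after which the contents of the .pub file are passed to build_signing_request.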

Validating the identity

The CA must validate the ID token’s cryptographic signature. As with any cryptographic work, it’s safer to use a battle-hardened library for this purpose. At Teleport, we use our own fork of https://github.com/coreos/go-oidc.

Now, once we verify the signature, we can take a look at the content. Our CA must evaluate the key value pairs (called claims) of the ID token against a set of rules to determine if the token comes from the right workflow.

First, we will check some standard JWT claims:

  • aud: short for audience. Indicates the intended recipient of the ID token. When requesting the token from GitHub, this value should be set to a string that identifies your certificate authority service. Typically, the URL of the service is used. Your service should ensure it only accepts tokens where this value identifies it.
  • exp: indicates the time that the token expires. Tokens should not be accepted if their expiry is in the past.
  • nbf: stands for “not before”. Tokens should not be accepted if their nbf is in the future.
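
The checks above can be sketched in Python as follows. This assumes the signature has already been verified by a proper OIDC library. The issuer check reflects the requirement that the relying party trusts a specific issuer, and the 30-second leeway for clock skew is an assumption, not a value from GitHub.

```python
import time


class ClaimError(Exception):
    """Raised when a token's claims do not satisfy the CA's requirements."""


def validate_standard_claims(claims, expected_audience, expected_issuer, now=None, leeway=30):
    """Check iss, aud, exp and nbf on an already signature-verified token."""
    now = time.time() if now is None else now

    if claims.get("iss") != expected_issuer:
        raise ClaimError("token was not issued by the trusted issuer")

    # "aud" may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience not in audiences:
        raise ClaimError("token is intended for a different relying party")

    exp = claims.get("exp")
    if exp is None or now > exp + leeway:
        raise ClaimError("token has expired")

    nbf = claims.get("nbf")
    if nbf is not None and now < nbf - leeway:
        raise ClaimError("token is not valid yet")
```

The function returns nothing on success and raises ClaimError on the first failed check, so the CA can refuse to sign as soon as any claim is wrong.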

GitHub Actions can associate a workflow with an environment, which represents a specific deployment target such as ‘production’. When a workflow uses an environment, GitHub includes it as the environment claim in the ID token issued to the CI runner.

Each environment can have a number of rules, for example, requiring that the workflow is triggered by a specific branch or approved by an admin. This makes the environment claim useful for granting a number of workflows access to your resources.
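
One simple way to express such rules on the CA side is an allow list of claim values, sketched below. The rule format is an assumption made for illustration. The claim names (repository, environment) are the ones GitHub includes in its ID tokens.

```python
def matches_rule(claims: dict, rule: dict) -> bool:
    # Every key in the rule must be present in the token with exactly the
    # expected value. An empty rule matches nothing, so it can never act as
    # an accidental wildcard.
    return bool(rule) and all(claims.get(key) == expected for key, expected in rule.items())


def is_allowed(claims: dict, rules: list) -> bool:
    # Deny by default: a token is accepted only if at least one rule matches.
    return any(matches_rule(claims, rule) for rule in rules)
```

For example, the rule {"repository": "octo-org/octo-repo", "environment": "prod"} accepts only tokens issued to workflows in that repository that run in the prod environment.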

Signing certs

Once our CA has evaluated the claims in the ID token, including the environment claim, against its rules and verified that the token belongs to your workload, it can sign the submitted public key and respond with a short-lived certificate.

This process requires some knowledge of cryptography, but there are well-tested open source libraries available in most languages. As users of Go, we used golang.org/x/crypto/ssh to issue SSH certificates.

We set two fields in each SSH certificate: the expiry date and the principals.

The expiry time should be as short as possible while still giving the workflow enough time to complete, for example, half an hour.

The SSH certificate principals field controls the Linux logins a certificate grants access to.
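
For readers who want to experiment without writing Go, the same two fields can be set with the stock ssh-keygen tool. The sketch below is a stand-in for the library-based signing described above, not the implementation used at Teleport. It only builds the command line, and the function name and paths are illustrative.

```python
def build_sign_command(ca_key_path, public_key_path, key_id, principals, validity="+30m"):
    """Build an ssh-keygen command that signs a user public key.

    -s  CA private key used for signing
    -I  key identity, recorded in server logs when the certificate is used
    -n  comma-separated principals (the logins the certificate is valid for)
    -V  validity interval; "+30m" means valid for 30 minutes from now
    """
    if not principals:
        # An empty principals list is almost certainly a mistake.
        raise ValueError("refusing to sign a certificate with no principals")
    return [
        "ssh-keygen",
        "-s", ca_key_path,
        "-I", key_id,
        "-n", ",".join(principals),
        "-V", validity,
        public_key_path,
    ]
```

Running the resulting command against ci_key.pub writes the certificate next to it as ci_key-cert.pub.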

Configuring an OpenSSH server to trust a certificate authority

With our workflow having received the signed SSH certificate from the CA, we need to configure the SSH servers to trust certificates signed by the CA:

TrustedUserCAKeys /etc/ssh/ca_user_key.pub

Using SSH certificates

Once our servers trust certificates signed by the CA, we can connect with vanilla SSH:

ssh -o CertificateFile=path/to/cert.pub -i path/to/key root@host.example.com

Some missing parts

We have replaced long-lived secrets with short-lived certs, which is a massive improvement. However, no plan is bulletproof. Let’s take a look at a few remaining issues.

The workflow’s certificate and private key can be stolen if an attacker gets access to the workflow environment. Since this certificate has a short expiry, the attacker's access to the system is limited to minutes. We can further limit the usefulness of these stolen credentials by using a technique known as “IP pinning”.

To implement IP pinning, we can add a critical option called “source-address” to the SSH certificate, with the value set to the IP address of the client exchanging its ID token. When an OpenSSH server receives a certificate including this option, it checks that the source address of the connection matches the value. This makes a certificate and private key stolen from the workflow environment far less useful, because they only work from the pinned address.
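
A small sketch of both halves of IP pinning follows: producing the option value when signing, and the kind of address check the server performs. The second function only illustrates the idea and is not OpenSSH’s actual code. With ssh-keygen, the option string would be passed using the -O flag.

```python
import ipaddress


def source_address_option(client_ip: str) -> str:
    """Certificate option that pins the certificate to a single client address."""
    # ip_network() on a bare address yields a one-host network (/32 or /128).
    return f"source-address={ipaddress.ip_network(client_ip)}"


def connection_allowed(peer_ip: str, source_address: str) -> bool:
    """Illustrative check: is the connecting address inside any pinned network?"""
    peer = ipaddress.ip_address(peer_ip)
    return any(
        peer in ipaddress.ip_network(cidr.strip())
        for cidr in source_address.split(",")
    )
```

A certificate pinned to 203.0.113.7/32 is accepted from that runner, while the same certificate presented from any other address is rejected.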

To make it even harder to steal the private key, we can use special security devices called Trusted Platform Modules (TPMs). A TPM is a device that can generate, store and enable the use of private keys without them being exposed to the host. If a machine with a TPM is taken over by an attacker, the private keys cannot be exfiltrated. Unfortunately, TPMs are not available in SaaS-hosted CI runners, but you can configure a self-hosted runner and provide it with a TPM.

An open source alternative: Machine ID

In this post, we’ve reduced the attack surface of CI/CD systems using OIDC and short-lived SSH certificates. We’ve covered how these workflow runs can identify themselves, and get signed short-lived SSH certificates.

We have focused on SSH, but we can use the same process to get HTTPS working with X.509 client certificates.

To be honest, this machinery is pretty hard to build by yourself.

No worries, we’ve got you covered with our open source tool, Machine ID, which uses the same principles we’ve described in this article, adds auditing and RBAC, and supports SSH, databases, Kubernetes and HTTP out of the box. As a service that programmatically issues and renews short-lived certificates to any service account, Machine ID can help eliminate the risk and pain of managing shared secrets, providing a secure method for protecting continuous delivery pipelines.

Give it a go, and if you have any questions, reach out to us in our Community Slack channel.
