
Standalone Kubernetes Operator

This guide explains how to run the Teleport Kubernetes Operator against any remote Teleport cluster. If your Teleport cluster is deployed using the teleport-cluster Helm chart, you might want to follow the guide for Helm-deployed clusters instead.

Prerequisites

  • A running Teleport cluster version 15.4.22 or above. If you want to get started with Teleport, sign up for a free trial or set up a demo environment.

  • The tctl admin tool and tsh client tool.

    On Teleport Enterprise, you must use the Enterprise version of tctl, which you can download from your Teleport account workspace. Otherwise, visit Installation for instructions on downloading tctl and tsh for Teleport Community Edition.

  • A Kubernetes cluster. You must be able to create and read Namespace, ServiceAccount, Deployment, Secret, Role, RoleBinding, and CustomResourceDefinition resources.
  • Helm
  • kubectl
  • a Teleport cluster running at least version 15.

Validate Kubernetes connectivity by running the following command:

$ kubectl cluster-info
# Kubernetes control plane is running at https://127.0.0.1:6443
# CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
# Metrics-server is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:metrics-server:https/proxy
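
The permissions listed in the prerequisites can also be checked up front. The following is a minimal sketch, not part of the official procedure: check_operator_permissions is a hypothetical helper name, and it relies on kubectl auth can-i printing "yes" when the action is allowed. Namespaced resources are checked against the namespace of the current kubectl context.

```shell
# check_operator_permissions is a hypothetical helper, not a Teleport or kubectl command.
# It asks the API server whether the current user may create each resource type
# listed in the prerequisites, and reports the ones that are denied.
check_operator_permissions() {
  missing=0
  for resource in namespaces serviceaccounts deployments secrets roles rolebindings customresourcedefinitions; do
    # "kubectl auth can-i" prints "yes" when the action is allowed.
    if [ "$(kubectl auth can-i create "$resource" 2>/dev/null)" != "yes" ]; then
      echo "cannot create $resource" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_operator_permissions prints one line per denied resource type and returns a non-zero status if any are denied.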
tip

Users wanting to experiment locally with the operator can use minikube to start a local Kubernetes cluster:

$ minikube start

Step 1/4. Create the operator role

In this step we create the role the operator uses to interact with Teleport resources.

Download and apply the operator role manifest:

$ curl -L https://raw.githubusercontent.com/gravitational/teleport/v15.4.22/integrations/operator/hack/fixture-operator-role.yaml -o operator-role.yaml
$ tctl create -f operator-role.yaml
note

If you upgrade the operator to a new version that adds support for new Teleport resources, you will need to re-apply the operator role manifest. This will grant the operator access to the new resources.

Step 2/4. Create the operator join token

The join token is used by the operator on each startup to join the Teleport cluster and retrieve its client certificates.

To establish trust between the connecting operator and Teleport, we are delegating the authentication to Kubernetes. Kubernetes has its own internal CA which signs the ServiceAccount tokens that are mounted in the pods. In the following setup, Teleport will trust SA tokens signed by Kubernetes to join the cluster.

  1. Retrieve the Kubernetes JWKS (the keys Teleport can use to validate Kubernetes SA tokens):
    $ export JWKS="$(kubectl get --raw /openid/v1/jwks)"
  2. Create the token manifest that allows the ServiceAccount teleport-operator from the namespace teleport-iac to join the cluster as the operator:
    $ cat <<EOF > operator-token.yaml
    kind: token
    version: v2
    metadata:
      name: operator-bot
    spec:
      roles: [Bot]
      # bot_name will match the name of the bot created later in this guide.
      bot_name: operator
      join_method: kubernetes
      kubernetes:
        type: static_jwks
        static_jwks:
          jwks: |
            $JWKS
        allow:
        - service_account: "teleport-iac:teleport-operator" # namespace:serviceaccount
    EOF
  3. Then, apply the token manifest:
    $ tctl create -f operator-token.yaml
  4. Finally, retrieve the Teleport cluster name that will be required to use the token:
    $ export CLUSTER_NAME="$(tctl status | awk '/Cluster/ {print $2}')"
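
Because the manifest is rendered through a heredoc, it can be worth confirming that the JWKS was actually substituted before running tctl create. The following is a minimal sketch under that assumption: check_token_manifest is a hypothetical helper, and it only looks for the "keys" member that a JWKS document contains. It does not validate the keys themselves.

```shell
# check_token_manifest is a hypothetical helper, not a Teleport command.
# It succeeds only if the rendered manifest contains an expanded JWKS document
# and no leftover literal "$JWKS" placeholder.
check_token_manifest() {
  manifest="$1"
  if [ ! -s "$manifest" ]; then
    echo "manifest $manifest is missing or empty" >&2
    return 1
  fi
  # A literal $JWKS means the variable was never expanded by the shell.
  if grep -qF '$JWKS' "$manifest"; then
    echo "JWKS placeholder was not expanded" >&2
    return 1
  fi
  # A JWKS document is a JSON object with a top-level "keys" array.
  if ! grep -qF '"keys"' "$manifest"; then
    echo "no JWKS keys found; was JWKS exported before writing the file?" >&2
    return 1
  fi
  echo "manifest looks complete"
}
```

For example, check_token_manifest operator-token.yaml can be run between steps 2 and 3.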

Step 3/4. Create the operator bot

In Teleport, a bot is a resource allowing a machine to access Teleport. Create a bot for the operator with the following command:

$ tctl bots add operator --token operator-bot --roles operator

Step 4/4. Deploy the operator in the Kubernetes cluster

At this point, you can configure and run the operator:

Set up the Teleport Helm repository.

Allow Helm to install charts that are hosted in the Teleport Helm repository:

$ helm repo add teleport https://charts.releases.teleport.dev

Update the cache of charts from the remote repository so you can upgrade to all available releases:

$ helm repo update
  1. Retrieve the version of your Teleport cluster:
    $ export TELEPORT_VERSION="$(tsh version | awk '/Proxy[[:space:]]version/ {print $3}')"
    $ echo "$TELEPORT_VERSION"
  2. Create the Kubernetes namespace that will contain both the operator Pods and the CustomResources to configure Teleport:
    $ kubectl create namespace teleport-iac
  3. Apply the strictest Pod Security Standard on the namespace:
    $ kubectl label namespace teleport-iac 'pod-security.kubernetes.io/enforce=restricted'
  4. Deploy the operator with Helm:
    $ helm install teleport-operator teleport/teleport-operator \
        -n teleport-iac \
        --version "$TELEPORT_VERSION" \
        --set teleportAddress=teleport.example.com:443 \
        --set "teleportClusterName=$CLUSTER_NAME" \
        --set token=operator-bot
  5. Validate that the operator is running properly (the operator might take a few seconds to start):
    $ kubectl get pods -n teleport-iac
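
If the output of tsh version contains no "Proxy version" line, the awk command in step 1 prints nothing and TELEPORT_VERSION ends up empty. The following is a minimal sketch of a stricter extraction: proxy_version is a hypothetical helper that reuses the same awk expression and fails when no version-like value is found.

```shell
# proxy_version is a hypothetical helper, not part of tsh.
# It reads `tsh version` output on stdin and prints the proxy version,
# or fails with an error if no version-like value is present.
proxy_version() {
  version="$(awk '/Proxy[[:space:]]version/ {print $3}')"
  case "$version" in
    [0-9]*.[0-9]*.[0-9]*) printf '%s\n' "$version" ;;
    *) echo "could not find a proxy version in the input" >&2; return 1 ;;
  esac
}

# Possible usage (the assignment is separate from export so a failure is not masked):
#   TELEPORT_VERSION="$(tsh version | proxy_version)" && export TELEPORT_VERSION
```

Keeping the assignment and the export as two commands matters here, because export on its own returns success even when the command substitution fails.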

Next steps

Follow the user and role IaC guide to use your newly deployed Teleport Kubernetes Operator to create Teleport users and grant them roles.

Helm Chart parameters are documented in the teleport-operator Helm chart reference.

Troubleshooting

The CustomResources (CRs) are not reconciled

The Teleport Operator watches for new resources or changes in Kubernetes. When a change happens, it triggers the reconciliation loop. This loop is in charge of validating the resource, checking if it already exists in Teleport and making calls to the Teleport API to create/update/delete the resource. The reconciliation loop also adds a status field on the Kubernetes resource.

If an error happens and the reconciliation loop is not successful, an item in status.conditions will describe what went wrong. This allows users to diagnose errors by inspecting Kubernetes resources with kubectl:

$ kubectl describe teleportusers myuser

For example, if a user has been granted a nonexistent role, the status will look like this:

apiVersion: resources.teleport.dev/v2
kind: TeleportUser
# [...]
status:
  conditions:
  - lastTransitionTime: "2022-07-25T16:15:52Z"
    message: Teleport resource has the Kubernetes origin label.
    reason: OriginLabelMatching
    status: "True"
    type: TeleportResourceOwned
  - lastTransitionTime: "2022-07-25T17:08:58Z"
    message: 'Teleport returned the error: role my-non-existing-role is not found'
    reason: TeleportError
    status: "False"
    type: SuccessfullyReconciled

Here SuccessfullyReconciled is False and the error is role my-non-existing-role is not found.
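
When only the reconciliation outcome is of interest, the relevant condition can be extracted with a kubectl JSONPath filter instead of reading the full describe output. The following is a minimal sketch: reconcile_condition is a hypothetical wrapper, and the namespace argument is an assumption (the describe command above uses the current namespace).

```shell
# reconcile_condition is a hypothetical wrapper around kubectl.
# Usage: reconcile_condition <resource-kind> <name> <namespace>
# Prints "<status>: <message>" for the SuccessfullyReconciled condition.
reconcile_condition() {
  kubectl get "$1" "$2" -n "$3" \
    -o jsonpath='{.status.conditions[?(@.type=="SuccessfullyReconciled")].status}{": "}{.status.conditions[?(@.type=="SuccessfullyReconciled")].message}{"\n"}'
}
```

For the example above, reconcile_condition teleportuser myuser teleport-iac would report the False status together with the error message returned by Teleport.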

If the status is not present or does not give sufficient information to solve the issue, check the operator logs as described in the next section.

The CR doesn't have a status

  1. Check if the CR is in the same namespace as the operator. The operator only watches for resources in its own namespace.

  2. Check if the operator pods are running and healthy:

    $ kubectl get pods -n "$OPERATOR_NAMESPACE"
  3. Check the operator logs:

    $ kubectl logs deploy/<OPERATOR_DEPLOYMENT_NAME> -n "$OPERATOR_NAMESPACE"
    note

    In case of multi-replica deployments, only one operator instance is running the reconciliation loop. This operator is called the leader and is the only one producing reconciliation logs. The other operator instances are waiting with the following log:

    leaderelection.go:248] attempting to acquire leader lease teleport/431e83f4.teleport.dev...

    To diagnose reconciliation issues, you will have to inspect all pods to find the one reconciling the resources.
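
Inspecting every pod by hand can be tedious with several replicas. The following is a minimal sketch that narrows the search, assuming that a waiting replica's most recent log line is the lease message shown above. find_reconciling_pods is a hypothetical helper, and it considers every pod in the namespace, not only operator pods.

```shell
# find_reconciling_pods is a hypothetical helper built on kubectl.
# Usage: find_reconciling_pods <namespace>
# Prints the pods whose latest log line is NOT the "waiting for lease" message,
# which makes them candidates for being the leader.
find_reconciling_pods() {
  namespace="$1"
  for pod in $(kubectl get pods -n "$namespace" -o name); do
    last_line="$(kubectl logs "$pod" -n "$namespace" --tail=1)"
    case "$last_line" in
      *"attempting to acquire leader lease"*) ;;  # standby replica, skip it
      *) echo "$pod" ;;
    esac
  done
}
```

The names it prints can be passed directly to kubectl logs, for example kubectl logs pod/<name> -n "$OPERATOR_NAMESPACE".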

I cannot delete the Kubernetes CR

The operator protects Kubernetes CRs from deletion with a finalizer. It will not allow the CR to be deleted until the Teleport resource is deleted as well. This is a safety measure to avoid leaving dangling resources that could grant unintended access.

Teleport can refuse to delete a resource for several reasons. The most frequent one is that another resource depends on it. For example, you cannot delete a role if it is still assigned to a user.

If this happens, the operator will report the error sent by Teleport in its log.

To resolve this lock, you can either:

  • resolve the dependency issue so the resource gets successfully deleted in Teleport. In the role example, this would imply removing any mention of the role from the various users who had it.

  • patch the Kubernetes CR to remove the finalizers. This tells Kubernetes to stop waiting for the operator to complete the deletion and to remove the CR. If you do this, the CR will be removed but the Teleport resource will remain. The operator will never attempt to remove it again.

    For example, if the role is named my-role:

    $ kubectl patch TeleportRole my-role -p '{"metadata":{"finalizers":null}}' --type=merge
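
Since removing finalizers leaves the Teleport resource behind, one cautious variant is to patch only CRs that Kubernetes has already marked for deletion. The following is a minimal sketch under that assumption: remove_finalizers_if_deleting is a hypothetical helper, and the namespace argument is an addition not present in the command above.

```shell
# remove_finalizers_if_deleting is a hypothetical helper built on kubectl.
# Usage: remove_finalizers_if_deleting <kind> <name> <namespace>
# It removes the finalizers only when the CR already has a deletionTimestamp,
# that is, when a delete was requested and is blocked by the finalizer.
remove_finalizers_if_deleting() {
  kind="$1"
  name="$2"
  namespace="$3"
  deleted_at="$(kubectl get "$kind" "$name" -n "$namespace" -o jsonpath='{.metadata.deletionTimestamp}')"
  if [ -z "$deleted_at" ]; then
    echo "$kind/$name is not being deleted; leaving finalizers in place" >&2
    return 1
  fi
  kubectl patch "$kind" "$name" -n "$namespace" -p '{"metadata":{"finalizers":null}}' --type=merge
}
```

For the role example, this would be called as remove_finalizers_if_deleting TeleportRole my-role teleport-iac.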