Kubernetes Access Troubleshooting
This page describes common issues with Kubernetes and how to resolve them.
An agent failed to join a cluster due to "no authorities for hostname"
Symptoms
The agent can't rejoin the Teleport cluster after restart and reports an error similar to:
ssh: handshake failed: ssh: no authorities for hostname
Explanation
Teleport uses certificate authorities (CAs) to sign certificates for each component. When the component joins the cluster for the first time, it receives a certificate signed by the cluster's CA and stores it in its state directory. When the component restarts, it uses the certificate stored in its state directory to join the cluster again.
This error occurs when the component tries to rejoin the Teleport cluster, but the cluster's CA has changed and the component's certificate is no longer valid.
It can happen when the cluster's CA is rotated or when the cluster is recreated or renamed.
Resolution
The agent's state needs to be reset, so it can request a new certificate and join the cluster again.
The process for deleting the agent's state depends on whether the agent is running inside or outside of Kubernetes.
Agents running outside of Kubernetes (standalone)
If the agent is running outside of Kubernetes, the state directory is located
at /var/lib/teleport/proc
by default. You can delete the state directory with
the following command:
sudo rm -rf /var/lib/teleport/proc
And then restart the agent:
sudo systemctl restart teleport
Agents running in Kubernetes (teleport-kube-agent
)
Starting in Teleport 11, the teleport-kube-agent
pod's state is stored in a
Kubernetes Secret - name:{pod-name}-state
- existing in the installation namespace.
To delete the state, follow the steps below:
# Get the secrets for the teleport-kube-agent pods
$ kubectl get secret -o name -n teleport-agent | grep "state"
teleport-agent-0-state
teleport-agent-1-state
# Delete the secrets
$ kubectl delete secret -n teleport-agent teleport-agent-0-state teleport-agent-1-state
If you're mounting /var/lib/teleport
into the container, please clean the
contents of /var/lib/teleport/proc
inside the container and then restart
the container.
Once the state is deleted, restart each agent pod.
Unable to connect to GKE Autopilot clusters
GKE Warden authz [denied by user-impersonation-limitation]: impersonating system identities are not allowed
Symptoms
After configuring a GKE Autopilot cluster in Teleport, all attempts to retrieve Kubernetes Objects fail with an error similar to:
Or the following:
GKE Autopilot denies requests that impersonate "system:masters" group
This issue only affects GKE Autopilot clusters. If you're using a standard GKE cluster, this issue doesn't apply to you.
Explanation
Unlike the standard Kubernetes cluster, Autopilot clusters forbid requests to impersonate system identities. This is a security feature that prevents users from gaining access to the cluster's control plane and performing administrative actions if they can impersonate users or groups. Learn more about GKE Autopilot security.
Teleport uses impersonation to retrieve Kubernetes objects
on behalf of the user. This is done by sending a request to the Kubernetes API server
with the user's identity in the Impersonate-User
header and all the Kubernetes
Groups they can use in the Impersonate-Group
header.
Since system:masters
is a built-in Kubernetes group in all clusters, it's usual
for administrators to use it to gain administrative access to the cluster's control
plane. However, Autopilot clusters forbid impersonating this group and require that
another group is used instead.
Resolution
Per the description above, the solution is to configure a different group for
impersonation in Teleport. This can be done by setting the role's
kubernetes_group
parameter to a Group that the Autopilot cluster allows to
impersonate.
The teleport-kube-agent
chart configures a Kubernetes Group with the same access
levels as the system:masters
group when it detects the target cluster is a GKE
cluster.
This group is named, by default, cluster-admin
, but the name can be changed
by setting the adminClusterRoleBinding.name
parameter.
The Kubernetes Group isn't created automatically when installing the chart on
non-GKE clusters, so don't change the kubernetes_groups
parameter to
cluster-admin
unless you created the group manually or installed the chart with
adminClusterRoleBinding.create
parameter set to true
.
It's important to note that the group must be configured in the cluster before
it can be used for impersonation. If the group is not configured, the impersonation
request will fail with a 403 Forbidden
error and the user will not be able to
access the cluster.
When you opt to continue using the system:masters
group for impersonation in
non-Autopilot clusters, you must ensure that the Teleport roles that grant
system:masters
access can't be used to access GKE Autopilot clusters.
As an example, a user with the following role can impersonate the system:masters
group in any Kubernetes cluster:
kind: role
version: v7
metadata:
name: k8s-admin
spec:
allow:
kubernetes_labels:
'*': '*'
kubernetes_groups: ["system:masters"]
The wildcard *
in kubernetes_labels
allows the user to access any Kubernetes
cluster in the Teleport cluster. To prevent the user from accessing GKE Autopilot
clusters with that role, you can install the teleport-kube-agent
chart with
a label that identifies the cluster as an Autopilot cluster. For example:
$ PROXY_ADDR=teleport.example.com:443
$ CLUSTER=cookie
# Create the values.yaml file
$ cat > values.yaml << EOF
authToken: "${TOKEN}"
proxyAddr: "${PROXY_ADDR}"
roles: "kube,app,discovery"
joinParams:
method: "token"
tokenName: "${TOKEN}"
kubeClusterName: "${CLUSTER}"
labels:
"type" : "autopilot"
EOF
# Install the helm chart with the values.yaml setting
$ helm install teleport-agent teleport/teleport-kube-agent \
-f values.yaml \
--create-namespace \
--namespace=teleport-agent \
--version 14.3.33
Make sure that the Teleport agent pod is running. You should see one Teleport agent pod pod with a single ready container:
$ kubectl -n teleport-agent get pods
NAME READY STATUS RESTARTS AGE
teleport-agent-0 1/1 Running 0 32s
Now that the cluster is labeled, you can split the k8s-admin
role into two
roles: one that allows access to all non-Autopilot clusters and another that only
allows access to Autopilot clusters.
kind: role
version: v7
metadata:
name: k8s-admin-non-gke-autopilot
spec:
allow:
kubernetes_labels_expression: 'labels["type"] != "autopilot"'
kubernetes_groups: ["system:masters"]
---
kind: role
version: v7
metadata:
name: k8s-admin-gke-autopilot
spec:
allow:
kubernetes_labels_expression: 'labels["type"] == "autopilot"'
kubernetes_groups: ["cluster-admin"]
Once the roles are created, you can assign them to users as usual, but to be effective immediately, they must logout and login again.
Unable to exec into a Pod with kubectl 1.30+
pods "<pod_name>" is forbidden: User "<user>" cannot get resource "pods/exec" in API group "" in the namespace "<namespace>"
Symptoms
After upgrading kubectl
to version 1.30 or later, attempts to exec into a Pod
fail with an error similar to:
pods "<pod_name>" is forbidden: User "<user>" cannot get resource "pods/exec" in API group "" in the namespace "<namespace>"
Explanation
Starting with Kubernetes 1.30, the kubectl exec
command switched from using
the SPDY protocol to the WebSocket protocol for communicating with the Kubernetes
API server. This change was made to enhance the performance and reliability of
the kubectl exec
command, as the SPDY protocol was deprecated.
Although WebSocket was intended to be a drop-in replacement for SPDY, it introduced
a breaking change for Kubernetes clusters where RBAC was configured to restrict
access to the pods/exec
resource using only the create
verb. Previously,
the SPDY protocol allowed creating a connection using either
GET
(mapped to get
in Kubernetes RBAC) or
POST
(mapped to create
in Kubernetes RBAC).
With the WebSocket protocol, the kubectl exec
command always uses the GET
method to create a connection. This means that if the RBAC policy permits only
the create
verb, the connection will be denied.
Resolution
To resolve this issue, update the RBAC policy to allow the get
verb for
the pods/exec
(sub)resource. This can be done by modifying the ClusterRole
or Role
that grants access to the user.
For example, if you have a ClusterRole
that grants access to the pods/exec
resource with the create
verb, update it to allow the get
verb as well:
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["create", "get"]
Once the ClusterRole
is updated, you should be able to exec into Pods using
kubectl
version 1.30 or later.