Troubleshooting
In this guide, we will explain how to address issues or unexpected behavior in your Teleport cluster.
You can use these steps to get more visibility into the teleport process so you can troubleshoot the Auth Service, Proxy Service, and Teleport agent services such as the Application Service and Database Service.
Prerequisites
- A running Teleport cluster version 17.0.0-dev or above. If you want to get started with Teleport, sign up for a free trial or set up a demo environment.
- The tctl admin tool and tsh client tool. Visit Installation for instructions on downloading tctl and tsh.
- To check that you can connect to your Teleport cluster, sign in with tsh login, then verify that you can run tctl commands using your current credentials. For example:
$ tsh login --proxy=teleport.example.com [email protected]
$ tctl status
# Cluster teleport.example.com
# Version 17.0.0-dev
# CA pin sha256:abdc1245efgh5678abdc1245efgh5678abdc1245efgh5678abdc1245efgh5678
If you can connect to the cluster and run the tctl status command, you can use your current credentials to run subsequent tctl commands from your workstation. If you host your own Teleport cluster, you can also run tctl commands on the computer that hosts the Teleport Auth Service for full permissions.
Step 1/3. Enable verbose logging
To change log levels in Teleport, you can use either of the following methods:
- Debug Service: Allows on-the-fly log level adjustments without restarting the instance, which is ideal for troubleshooting sessions.
- Updating configuration: Involves updating the Teleport configuration file and restarting the instance.
Debug Service
The Teleport Debug Service allows administrators to dynamically manage log levels without restarting the instance. The service, enabled by default, ensures local-only access and must be consumed from inside the same instance.
To change the instance log level, use the teleport debug set-log-level command:
$ teleport debug set-log-level DEBUG
Changed log level from "INFO" to "DEBUG".
$ kubectl -n teleport exec my-pod -- teleport debug set-log-level DEBUG
Changed log level from "INFO" to "DEBUG".
If you're unsure of the current log level, you can retrieve it using teleport debug get-log-level.
After troubleshooting, remember to restore the previous log level to avoid generating unnecessary logs.
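The change-then-restore workflow described above can be wrapped in a small shell function so the level is reset even if you forget. This is a minimal sketch, not an official Teleport tool: with_debug_logging and the RESTORE_LEVEL variable are hypothetical names, and the function assumes the level to restore is INFO unless you set RESTORE_LEVEL yourself.

```shell
#!/usr/bin/env bash
# Sketch: run one command while the local teleport instance logs at DEBUG,
# then set the log level back afterwards.
# Assumption: the previous level was INFO unless RESTORE_LEVEL says otherwise.
with_debug_logging() {
  local restore_level="${RESTORE_LEVEL:-INFO}"
  local status

  # Raise the log level through the Debug Service; stop if that fails.
  teleport debug set-log-level DEBUG || return 1

  # Run the command given as arguments and remember its exit status.
  "$@"
  status=$?

  # Restore the level whether or not the command succeeded.
  teleport debug set-log-level "$restore_level"
  return "$status"
}
```

For example, `with_debug_logging sleep 300` would keep DEBUG logging on for five minutes while you reproduce a problem, then switch back.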
If your Teleport configuration file is not at the default path (/etc/teleport.yaml), you must pass its location to the CLI command using the -c/--config flag.
Updating configuration
To diagnose problems, you can configure the teleport process to run with verbose logging enabled by passing it the -d flag. teleport will write logs to stderr.
Alternatively, you can set the log level from the Teleport configuration file:
teleport:
  log:
    severity: DEBUG
Restart the teleport process to apply the modified log level. Debug logs include the file and line number of the code that emitted the log, so you can investigate (or report) what a teleport process was doing before it ran into problems. The following example was printed while joining a server to a cluster, then terminating the teleport process on the server:
DEBU [NODE:PROX] Agent connected to proxy: [aee1241f-0f6f-460e-8149-23c38709e46d.tele.example.com aee1241f-0f6f-460e-8149-23c38709e46d teleport-proxy-us-west-2-6db8db844c-ftmg9.tele.example.com teleport-proxy-us-west-2-6db8db844c-ftmg9 localhost 127.0.0.1 ::1 tele.example.com 100.92.90.42 remote.kube.proxy.teleport.cluster.local]. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:414
DEBU [NODE:PROX] Changing state connecting -> connected. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:210
DEBU [NODE:PROX] Discovery request channel opened: teleport-discovery. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:526
DEBU [NODE:PROX] handleDiscovery requests channel. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:544
DEBU [NODE:PROX] Pool is closing agent. leaseID:2 target:tele.example.com:11106 reversetunnel/agentpool.go:238
DEBU [NODE:PROX] Pool is closing agent. leaseID:3 target:tele.example.com:11106 reversetunnel/agentpool.go:238
It is not recommended to run Teleport in production with verbose logging as it generates a substantial amount of data.
Step 2/3. Generate a debug dump
The teleport binary is a Go program. Go programs assign work to CPU threads using an abstraction called a goroutine. You can get a goroutine dump of a running teleport process by sending it a USR1 signal.
This is especially useful for troubleshooting a teleport process that appears stuck, since you can see which goroutine is blocked and why. For example, goroutines often communicate using channels, and a goroutine dump indicates whether a goroutine is waiting to send or receive on a channel.
To generate a goroutine dump, send a USR1 signal to a teleport process:
$ kill -USR1 $(pidof teleport)
Teleport will print the debug information to stderr. Here is what you will see in the logs:
INFO [PROC:1] Got signal "user defined signal 1", logging diagnostic info to stderr. service/signals.go:99
Runtime stats
goroutines: 64
OS threads: 10
GOMAXPROCS: 2
num CPU: 2
...
goroutines: 84
...
Goroutines
goroutine 1 [running]:
runtime/pprof.writeGoroutineStacks(0x3c2ffc0, 0xc0001a8010, 0xc001011a38, 0x4bcfb3)
/usr/local/go/src/runtime/pprof/pprof.go:693 +0x9f
...
You can print a goroutine dump without enabling verbose logging.
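If more than one teleport process runs on the host, or none is running, the one-line kill command above can be hard to interpret. The following is a minimal sketch under stated assumptions: signal_teleport is a hypothetical helper name, and it relies on pidof being available on the host. It reports each PID it signals and fails clearly when no process is found.

```shell
#!/usr/bin/env bash
# Sketch: send USR1 to every running teleport process, or to the PIDs given
# as the first argument, and report what was signaled.
signal_teleport() {
  local pids pid

  # Use the PIDs passed in, or look them up with pidof.
  pids="${1:-$(pidof teleport)}"
  if [ -z "$pids" ]; then
    echo "no teleport process found" >&2
    return 1
  fi

  # Word splitting of $pids is intentional: pidof prints a space-separated list.
  for pid in $pids; do
    echo "sending USR1 to PID $pid"
    kill -USR1 "$pid" || return 1
  done
}
```

The dump itself still goes to the stderr of each teleport process, so look for it wherever that stream is collected on your host.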
Step 3/3. Ask for help
Once you have collected verbose logs and a goroutine dump from your teleport binary, you can use this information to get help from the Teleport community and Support team.
Collect your Teleport version
Determine the version of the teleport process you are investigating.
$ teleport version
Teleport v8.3.7 git:v8.3.7-0-ga8d066935 go1.17.3
You can also collect the versions of the Teleport Auth Service, Proxy Service, and client tools to rule out version compatibility issues.
To see the version of the Auth Service and Proxy Service, run the following command:
$ tctl status
Cluster mytenant.teleport.sh
Version 16.4.3
Host CA never updated
User CA never updated
Jwt CA never updated
CA pin sha256:abdc1245efgh5678abdc1245efgh5678abdc1245efgh5678abdc1245efgh5678
Get the versions of your client tools:
$ tctl version
Teleport v9.0.4 git: go1.18
$ tsh version
Teleport v9.0.4 git: go1.18
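To include all of the version information above in a single request, you could gather it with one helper. This is a minimal sketch: collect_versions is a hypothetical function name, and it only runs the four commands already shown in this section, labeling each output with the command that produced it.

```shell
#!/usr/bin/env bash
# Sketch: print the output of each version command under a "$ command" header
# so the result can be pasted into a support request.
collect_versions() {
  local cmd
  for cmd in "teleport version" "tctl status" "tctl version" "tsh version"; do
    echo "\$ $cmd"
    # Word splitting of $cmd is intentional: it separates the tool name from
    # its subcommand. A failing command is noted instead of stopping the loop.
    $cmd 2>&1 || echo "(command failed: $cmd)"
    echo
  done
}
```

For example, `collect_versions > teleport-versions.txt` writes the combined output to a file. Review the file before sharing it, since tctl status includes your cluster name.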
Pose your question
- Commercial Teleport Editions: If you have a question or need assistance, please submit a request through the Teleport support portal.
- Teleport Community Edition: If you need help, please ask on our community forum. You can also open an issue on GitHub. For more information about Enterprise features, reach out to the Teleport sales team. You can also sign up for a free trial of Teleport Enterprise.
Further reading
This guide showed how to investigate issues with the teleport process. To see how you can monitor more general health and performance data from your Teleport cluster, read our Teleport Diagnostics guides.
For additional sources of Teleport support, please see the Teleport Support and Education Center.
Common Issues
teleport.cluster.local
It is common to see references to teleport.cluster.local within logs and errors in Teleport. This is a special value that is used within Teleport for two purposes, and seeing it within your logs is not necessarily an indication that anything is incorrect.
Firstly, Teleport uses this value within certificates (as a DNS Subject Alternative Name) issued to the Auth and Proxy Service. Teleport clients can then use this value to validate the service's certificates during the TLS handshake regardless of the service address as long as the client already has a copy of the cluster's certificate authorities. This is important as there are often multiple different ways that a client can connect to the Auth Service and these are not always via the same address.
Secondly, this value is used by clients as part of the URL when making gRPC or HTTP requests to the Teleport API. This is because the Teleport API client uses special logic to open the connection to the Auth Service to make the request, rather than connecting to a single address as a typical client may do. This special logic is necessary for the client to be able to support connecting to a list of Auth Services or to be able to connect to the Auth Service through a tunnel via the Proxy Service. This means that teleport.cluster.local appears in log messages that show the URL of a request made to the Auth Service, and does not explicitly indicate that something is misconfigured.
ssh: overflow reading version string and/or 502: Bad Gateway errors
You must ensure that your reverse proxy is communicating with Teleport using HTTPS. When running Teleport in Kubernetes and using nginx as an ingress, this requires adding an annotation to the chart values:
annotations:
  ingress:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
Deploying Teleport behind Cloudflare, whether using its proxy ("orange-clouding") or tunnels (cloudflared), should work with Teleport version 15.1 or higher. See the TLS Routing FAQ for more details.