Troubleshooting

In this guide, we will explain how to address issues or unexpected behavior in your Teleport cluster.

You can use these steps to get more visibility into the teleport process so you can troubleshoot the Auth Service, Proxy Service, and Teleport agent services such as the Application Service and Database Service.

Prerequisites

  • A running Teleport cluster version 14.3.33 or above. If you want to get started with Teleport, sign up for a free trial or set up a demo environment.

  • The tctl admin tool and tsh client tool.

    Visit Installation for instructions on downloading tctl and tsh.

  • To check that you can connect to your Teleport cluster, sign in with tsh login, then verify that you can run tctl commands using your current credentials. tctl is supported on macOS and Linux machines. For example:
    $ tsh login --proxy=teleport.example.com --user=email@example.com
    $ tctl status
    # Cluster teleport.example.com
    # Version 14.3.33
    # CA pin sha256:abdc1245efgh5678abdc1245efgh5678abdc1245efgh5678abdc1245efgh5678
    If you can connect to the cluster and run the tctl status command, you can use your current credentials to run subsequent tctl commands from your workstation. If you host your own Teleport cluster, you can also run tctl commands on the computer that hosts the Teleport Auth Service for full permissions.

Step 1/3. Enable verbose logging

To diagnose problems, you can configure the teleport process to run with verbose logging enabled by passing it the -d flag. teleport will write logs to stderr.

Alternatively, you can set the log level from the Teleport configuration file:

teleport:
  log:
    severity: DEBUG

Restart the teleport process to apply the modified log level. Logs will resemble the following (these logs were printed while joining a server to a cluster, then terminating the teleport process on the server):

DEBU [NODE:PROX] Agent connected to proxy: [aee1241f-0f6f-460e-8149-23c38709e46d.tele.example.com aee1241f-0f6f-460e-8149-23c38709e46d teleport-proxy-us-west-2-6db8db844c-ftmg9.tele.example.com teleport-proxy-us-west-2-6db8db844c-ftmg9 localhost 127.0.0.1 ::1 tele.example.com 100.92.90.42 remote.kube.proxy.teleport.cluster.local]. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:414
DEBU [NODE:PROX] Changing state connecting -> connected. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:210
DEBU [NODE:PROX] Discovery request channel opened: teleport-discovery. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:526
DEBU [NODE:PROX] handleDiscovery requests channel. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:544
DEBU [NODE:PROX] Pool is closing agent. leaseID:2 target:tele.example.com:11106 reversetunnel/agentpool.go:238
DEBU [NODE:PROX] Pool is closing agent. leaseID:3 target:tele.example.com:11106 reversetunnel/agentpool.go:238

Debug logs include the file and line number of the code that emitted the log, so you can investigate (or report) what a teleport process was doing before it ran into problems.
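Because the source location is the last field of each debug line, a short shell helper can show which code paths log most often. The following is a minimal sketch rather than a Teleport feature: the function name count_log_sources and the temporary sample file are illustrative assumptions, and the script assumes the text log format shown above.

```shell
#!/usr/bin/env bash
# Sketch: tally which source locations emitted DEBU lines.
# Assumes the text log format shown above, where the last
# whitespace-separated field is the emitting file and line number.

# count_log_sources FILE
# Prints "<count> <file:line>" per location, most frequent first.
count_log_sources() {
  awk '$1 == "DEBU" && $NF ~ /\.go:[0-9]+$/ { print $NF }' "$1" \
    | sort | uniq -c | sort -rn | sed 's/^ *//'
}

# Sample input: three lines copied from the excerpt above into a temp file.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
DEBU [NODE:PROX] Changing state connecting -> connected. leaseID:4 target:tele.example.com:11106 reversetunnel/agent.go:210
DEBU [NODE:PROX] Pool is closing agent. leaseID:2 target:tele.example.com:11106 reversetunnel/agentpool.go:238
DEBU [NODE:PROX] Pool is closing agent. leaseID:3 target:tele.example.com:11106 reversetunnel/agentpool.go:238
EOF

count_log_sources "$sample_log"
```

For the sample input, reversetunnel/agentpool.go:238 is listed first because it appears twice. To use the helper on real output, pass it the file where you captured the stderr of the teleport process.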

warning

It is not recommended to run Teleport in production with verbose logging as it generates a substantial amount of data.

Step 2/3. Generate a debug dump

The teleport binary is a Go program. Go programs assign work to CPU threads using an abstraction called a goroutine. You can get a goroutine dump of a running teleport process by sending it a USR1 signal.

This is especially useful for troubleshooting a teleport process that appears stuck, since you can see which goroutine is blocked and why. For example, goroutines often communicate using channels, and a goroutine dump indicates whether a goroutine is waiting to send or receive on a channel.

To generate a goroutine dump, send a USR1 signal to a teleport process:

$ kill -USR1 $(pidof teleport)

Teleport will print the debug information to stderr. Here is what you will see in the logs:

INFO [PROC:1]    Got signal "user defined signal 1", logging diagnostic info to stderr. service/signals.go:99
Runtime stats
goroutines: 64
OS threads: 10
GOMAXPROCS: 2
num CPU: 2
...
goroutines: 84
...
Goroutines
goroutine 1 [running]:
runtime/pprof.writeGoroutineStacks(0x3c2ffc0, 0xc0001a8010, 0xc001011a38, 0x4bcfb3)
/usr/local/go/src/runtime/pprof/pprof.go:693 +0x9f
...

tip

You can print a goroutine dump without enabling verbose logging.
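A full dump can contain many goroutines, so it can help to summarize their states before reading individual stacks. The sketch below counts goroutines by the state shown in each "goroutine N [state]:" header. The function name summarize_goroutine_states is an illustrative assumption, and every sample header other than the first is hypothetical data written in the Go runtime's dump format.

```shell
#!/usr/bin/env bash
# Sketch: count goroutines by state in a saved goroutine dump.
# Each goroutine starts with a header such as
#   goroutine 18 [chan receive, 5 minutes]:
# The state is the text after "[" up to the first "," or "]".

# summarize_goroutine_states FILE
# Prints "<count> <state>" per state, most frequent first.
summarize_goroutine_states() {
  sed -n 's/^goroutine [0-9][0-9]* \[\([^],]*\).*/\1/p' "$1" \
    | sort | uniq -c | sort -rn | sed 's/^ *//'
}

# Sample input: the first goroutine comes from the excerpt above;
# goroutines 18, 20, and 33 are hypothetical.
sample_dump="$(mktemp)"
cat > "$sample_dump" <<'EOF'
goroutine 1 [running]:
runtime/pprof.writeGoroutineStacks(0x3c2ffc0, 0xc0001a8010, 0xc001011a38, 0x4bcfb3)
/usr/local/go/src/runtime/pprof/pprof.go:693 +0x9f

goroutine 18 [chan receive, 5 minutes]:

goroutine 20 [select]:

goroutine 33 [chan receive]:
EOF

summarize_goroutine_states "$sample_dump"
```

A large count of goroutines in a state such as chan receive or chan send, particularly with long wait times, is a reasonable place to start looking when a process appears stuck.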

Step 3/3. Ask for help

Once you have collected verbose logs and a goroutine dump from your teleport binary, you can use this information to get help from the Teleport community and Support team.

Collect your Teleport version

Determine the version of the teleport process you are investigating.

$ teleport version
Teleport v8.3.7 git:v8.3.7-0-ga8d066935 go1.17.3

You can also collect the versions of the Teleport Auth Service, Proxy Service, and client tools to rule out version compatibility issues.

To see the version of the Auth Service and Proxy Service, run the following command:

$ tctl status
Cluster mytenant.teleport.sh
Version 16.4.3
Host CA never updated
User CA never updated
Jwt CA never updated
CA pin sha256:abdc1245efgh5678abdc1245efgh5678abdc1245efgh5678abdc1245efgh5678

Get the versions of your client tools:

$ tctl version
Teleport v9.0.4 git: go1.18
$ tsh version
Teleport v9.0.4 git: go1.18
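To compare versions quickly, you can extract the major version from each "Teleport vX.Y.Z" line and compute the difference. This is a minimal sketch: the function names major_version and version_gap are illustrative assumptions, and this guide does not state how large a gap is acceptable, so check the result against Teleport's version compatibility documentation.

```shell
#!/usr/bin/env bash
# Sketch: compare the major versions reported by two Teleport binaries.

# major_version "Teleport v9.0.4 git: go1.18"
# Prints the major version number, or nothing if the text does not match.
major_version() {
  printf '%s\n' "$1" | sed -n 's/^Teleport v\([0-9][0-9]*\)\..*/\1/p'
}

# version_gap VERSION_OUTPUT_A VERSION_OUTPUT_B
# Prints the absolute difference between the two major versions.
# Runs in a subshell so its variables do not leak into the caller.
version_gap() (
  a="$(major_version "$1")"
  b="$(major_version "$2")"
  if [ -z "$a" ] || [ -z "$b" ]; then
    echo "could not parse a Teleport version" >&2
    exit 1
  fi
  if [ "$a" -ge "$b" ]; then
    echo $(( a - b ))
  else
    echo $(( b - a ))
  fi
)

# Example using the sample outputs shown in this section.
version_gap "Teleport v8.3.7 git:v8.3.7-0-ga8d066935 go1.17.3" \
            "Teleport v9.0.4 git: go1.18"

# Against live binaries (requires teleport and tsh on PATH):
# version_gap "$(teleport version)" "$(tsh version)"
```

The tctl status output uses a different format ("Version 16.4.3"), so this sketch does not parse it; compare that value by eye or adapt the pattern.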

Pose your question

If you have a question or need assistance, please submit a request through the Teleport support portal.

Further reading

This guide showed how to investigate issues with the teleport process. To see how you can monitor more general health and performance data from your Teleport cluster, read our Teleport Diagnostics guides.

For additional sources of Teleport support, please see the Teleport Support and Education Center.

Common Issues

teleport.cluster.local

References to teleport.cluster.local are common in Teleport logs and errors. Teleport uses this special value for two purposes, and seeing it in your logs does not necessarily indicate a problem.

Firstly, Teleport uses this value within certificates (as a DNS Subject Alternative Name) issued to the Auth and Proxy Service. Teleport clients can then use this value to validate the service's certificates during the TLS handshake regardless of the service address as long as the client already has a copy of the cluster's certificate authorities. This is important as there are often multiple different ways that a client can connect to the Auth Service and these are not always via the same address.

Secondly, this value is used by clients as part of the URL when making gRPC or HTTP requests to the Teleport API. This is because the Teleport API client uses special logic to open the connection to the Auth Service to make the request, rather than connecting to a single address as a typical client may do. This special logic is necessary for the client to be able to support connecting to a list of Auth Services or to be able to connect to the Auth Service through a tunnel via the Proxy Service. This means that teleport.cluster.local appears in log messages that show the URL of a request made to the Auth Service, and does not explicitly indicate that something is misconfigured.

ssh: overflow reading version string and/or 502: Bad Gateway errors

Teleport version 13.0+

Support for TLS routing behind layer 7 (HTTP/HTTPS) load balancers and reverse proxies is available starting from Teleport 13.0. Please ensure your Teleport cluster and Teleport clients are up to date. If the problem persists, please submit a GitHub issue.

You must ensure that your reverse proxy is communicating with Teleport using HTTPS. When running Teleport in Kubernetes and using nginx as an ingress, this requires adding an annotation to the chart values:

annotations:
  ingress:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"

Deploying Teleport behind Cloudflare, whether using its proxy ("orange-clouding") or tunnels (cloudflared), should work with Teleport version 15.1 or higher. See the TLS Routing FAQ for more details.

Prior to Teleport version 13.0

Prior to Teleport version 13.0, using Teleport's TLS routing mode behind a layer 7 (HTTP/HTTPS) proxy is not supported, due to these proxies terminating TLS themselves and then rewriting their requests to the upstream service, stripping the additional SNI/ALPN parts of the request in the process.

For older versions, in order for ALPN to work correctly, the Teleport Proxy Service must terminate TLS itself.

Broadly, this means that prior to Teleport version 13.0, Teleport's TLS routing functionality is incompatible with:

  • AWS ALBs (Application Load Balancers)
  • AWS NLBs (Network Load Balancers), when using a TLS listener and a public ACM (AWS Certificate Manager) certificate
  • Commonly used HTTP reverse proxies including nginx, Apache, Caddy, Traefik, HAProxy and many others
  • Cloudflare tunnels in their default configuration

Deploying Teleport in TLS routing mode behind an HTTP proxy will result in a Teleport Web UI experience that seems to work perfectly, but the use of tsh, tctl and attempting to join remote Teleport services to the cluster will fail with errors like ssh: overflow reading version string and EOF. A functioning Teleport Web UI is not always an indication of a correctly configured Teleport cluster.
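When you suspect a layer 7 proxy is interfering, a quick first check is to search captured client or agent output for the error strings named above. The following is a rough sketch: the function name count_l7_symptoms is an illustrative assumption, and the sample lines are hypothetical, not real Teleport output. EOF is left out of the search because it is too generic to be a useful signal on its own.

```shell
#!/usr/bin/env bash
# Sketch: count lines that contain error strings associated with
# running TLS routing behind a layer 7 proxy.

# count_l7_symptoms FILE
# Prints the number of lines containing either error string.
count_l7_symptoms() {
  # grep -c exits non-zero when the count is 0, so ignore its status.
  grep -c -F \
    -e 'ssh: overflow reading version string' \
    -e '502: Bad Gateway' "$1" || true
}

# Hypothetical sample output saved from a failing client session.
sample_output="$(mktemp)"
cat > "$sample_output" <<'EOF'
ERROR: ssh: overflow reading version string
ERROR: 502: Bad Gateway
INFO an unrelated line
EOF

count_l7_symptoms "$sample_output"
```

A non-zero count only suggests where to look. Connecting directly to Teleport's web port without the proxy is still the way to confirm the cause.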

If in doubt, remove all load balancers/proxies from the equation and connect Teleport clients or agent processes directly to Teleport's web port to isolate the issue.

To use Teleport behind a reverse proxy prior to Teleport version 13.0, you should either:

  • use a layer 4 (TCP) proxy which forwards TCP streams directly to Teleport (which will in turn handle TLS termination itself)
  • disable Teleport's TLS routing mode by adding version: v1 to your config file and removing proxy_listener_mode: multiplex

To generate an example v1 config file, run teleport configure --version=v1 --public-addr=teleport.example.com:443, replacing the public address with your own domain.

If disabling TLS routing, consult the list of default ports to use for connecting different Teleport services.