Metrics

Teleport exposes metrics for all of its components, helping you get insight into the state of your cluster. This guide explains the metrics that you can collect from your Teleport cluster.

Enabling metrics

Teleport's diagnostic HTTP endpoints are disabled by default. You can enable them in either of the following ways.

Command-line flag: start a Teleport instance with the --diag-addr flag set to the local address where the diagnostic endpoint will listen:

sudo teleport start --diag-addr=127.0.0.1:3000

Configuration file: edit the Teleport instance's configuration file (/etc/teleport.yaml by default) to include the following:

teleport:
    diag_addr: 127.0.0.1:3000

Verify that Teleport is now serving the diagnostics endpoint:

curl http://127.0.0.1:3000

With the diagnostic address set, Teleport serves the metrics it tracks at http://127.0.0.1:3000/metrics, in a format compatible with Prometheus collectors.
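As a rough illustration of what a collector does with this endpoint, the following Python sketch fetches the metrics and parses the Prometheus text format into a dictionary. The function names and the SAMPLE payload are hypothetical and exist only for this example. The parser assumes that samples carry no trailing timestamp, and it is not a substitute for a real Prometheus client.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text-format lines into {sample name (with labels): value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field. This sketch
        # assumes there is no optional timestamp after the value.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def fetch_metrics(url="http://127.0.0.1:3000/metrics"):
    """Fetch and parse the endpoint. Requires a running Teleport instance."""
    with urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical payload so the parser can be tried without a running instance.
SAMPLE = """\
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 42
user_login_total 7
"""

if __name__ == "__main__":
    print(parse_metrics(SAMPLE))
```

In practice you would normally point a Prometheus server at the endpoint rather than parse it by hand. The sketch is only meant to show the shape of the data.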

Note: Teleport Cloud does not expose monitoring endpoints for the Auth Service and Proxy Service.

The following metrics are available:

Auth Service and backends

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| audit_failed_disk_monitoring | counter | Teleport Audit Log | Number of times disk monitoring failed. |
| audit_failed_emit_events | counter | Teleport Audit Log | Number of times emitting audit events failed. |
| audit_percentage_disk_space_used | gauge | Teleport Audit Log | Percentage of disk space used. |
| audit_server_open_files | gauge | Teleport Audit Log | Number of open audit files. |
| auth_generate_requests_throttled_total | counter | Teleport Auth | Number of throttled requests to generate new server keys. |
| auth_generate_requests_total | counter | Teleport Auth | Number of requests to generate new server keys. |
| auth_generate_requests | gauge | Teleport Auth | Number of current generate requests. |
| auth_generate_seconds | histogram | Teleport Auth | Latency for generate requests. |
| backend_batch_read_requests_total | counter | cache | Number of batch read requests to the backend. |
| backend_batch_read_seconds | histogram | cache | Latency for batch read operations. |
| backend_batch_write_requests_total | counter | cache | Number of batch write requests to the backend. |
| backend_batch_write_seconds | histogram | cache | Latency for backend batch write operations. |
| backend_read_requests_total | counter | cache | Number of read requests to the backend. |
| backend_read_seconds | histogram | cache | Latency for read operations. |
| backend_requests | counter | cache | Number of write requests to the backend. |
| backend_write_seconds | histogram | cache | Latency for backend write operations. |
| cluster_name_not_found_total | counter | Teleport Auth | Number of times a cluster was not found. |
| dynamo_requests_total | counter | DynamoDB | Number of requests to the DynamoDB API. |
| dynamo_requests | counter | DynamoDB | Number of failed requests to the DynamoDB API. |
| dynamo_requests_seconds | histogram | DynamoDB | Latency of DynamoDB API requests. |
| etcd_backend_batch_read_requests | counter | etcd | Number of batch read requests to the etcd database. |
| etcd_backend_batch_read_seconds | histogram | etcd | Latency for etcd batch read operations. |
| etcd_backend_read_requests | counter | etcd | Number of read requests to the etcd database. |
| etcd_backend_read_seconds | histogram | etcd | Latency for etcd read operations. |
| etcd_backend_tx_requests | counter | etcd | Number of transaction requests to the database. |
| etcd_backend_tx_seconds | histogram | etcd | Latency for etcd transaction operations. |
| etcd_backend_write_requests | counter | etcd | Number of write requests to the database. |
| etcd_backend_write_seconds | histogram | etcd | Latency for etcd write operations. |
| firestore_events_backend_batch_read_requests | counter | GCP Cloud Firestore | Number of batch read requests to Cloud Firestore events. |
| firestore_events_backend_batch_read_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events batch read operations. |
| firestore_events_backend_batch_write_requests | counter | GCP Cloud Firestore | Number of batch write requests to Cloud Firestore events. |
| firestore_events_backend_batch_write_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events batch write operations. |
| firestore_events_backend_write_requests | counter | GCP Cloud Firestore | Number of write requests to Cloud Firestore events. |
| firestore_events_backend_write_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events write operations. |
| gcs_event_storage_downloads_seconds | histogram | GCP GCS | Latency for GCS download operations. |
| gcs_event_storage_downloads | counter | GCP GCS | Number of downloads from the GCS backend. |
| gcs_event_storage_uploads_seconds | histogram | GCP GCS | Latency for GCS upload operations. |
| gcs_event_storage_uploads | counter | GCP GCS | Number of uploads to the GCS backend. |
| grpc_server_started_total | counter | Teleport Auth | Total number of RPCs started on the server. |
| grpc_server_handled_total | counter | Teleport Auth | Total number of RPCs completed on the server, regardless of success or failure. |
| grpc_server_msg_received_total | counter | Teleport Auth | Total number of RPC stream messages received on the server. |
| grpc_server_msg_sent_total | counter | Teleport Auth | Total number of gRPC stream messages sent by the server. |
| heartbeat_missed_total | counter | Teleport Auth | Number of times the Auth Service did not receive a heartbeat from a Node. |
| heartbeat_connections_received_total | counter | Teleport Auth | Number of times the Auth Service received a heartbeat connection. |
| s3_requests_total | counter | Amazon S3 | Total number of requests to the S3 API. |
| s3_requests | counter | Amazon S3 | Number of requests to the S3 API by result. |
| s3_requests_seconds | histogram | Amazon S3 | Request latency for the S3 API. |
| teleport_audit_emit_events | counter | Teleport Audit Log | Number of audit events emitted. |
| teleport_connected_resources | gauge | Teleport Auth | Number and type of resources connected via keepalives. |
| teleport_registered_servers | gauge | Teleport Auth | The number of Teleport services that are connected to an Auth Service instance grouped by version. |
| user_login_total | counter | Teleport Auth | Number of user logins. |
| watcher_event_sizes | histogram | cache | Overall size of events emitted. |
| watcher_events | histogram | cache | Per resource size of events emitted. |
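Several of the metrics above are histograms. In the Prometheus text format, a histogram named X is exposed as the series X_bucket, X_sum, and X_count, so a mean latency can be derived by dividing the sum by the count. The sketch below shows that arithmetic. It assumes the samples are already in a dictionary and have been aggregated across labels. The helper name and the sample values are hypothetical.

```python
def mean_latency_seconds(samples, metric):
    """Return <metric>_sum / <metric>_count, or None if nothing was observed."""
    count = samples.get(metric + "_count", 0.0)
    if count == 0:
        # No observations yet, so a mean is undefined.
        return None
    return samples[metric + "_sum"] / count


if __name__ == "__main__":
    # Hypothetical values: 300 reads taking 1.5 seconds in total.
    samples = {
        "backend_read_seconds_sum": 1.5,
        "backend_read_seconds_count": 300.0,
    }
    print(mean_latency_seconds(samples, "backend_read_seconds"))
```

A mean hides outliers. For percentiles, the bucket series are usually queried from Prometheus itself rather than computed by hand.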

Enhanced Session Recording / BPF

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| bpf_lost_command_events | counter | BPF | Number of lost command events. |
| bpf_lost_disk_events | counter | BPF | Number of lost disk events. |
| bpf_lost_network_events | counter | BPF | Number of lost network events. |

Proxy Service

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| failed_connect_to_node_attempts_total | counter | Teleport Proxy | Number of failed SSH connection attempts to a Node. Use with teleport_connect_to_node_attempts_total to get the failure rate. |
| failed_login_attempts_total | counter | Teleport Proxy | Number of failed tsh login or tsh ssh attempts. |
| grpc_client_started_total | counter | Teleport Proxy | Total number of RPCs started on the client. |
| grpc_client_handled_total | counter | Teleport Proxy | Total number of RPCs completed on the client, regardless of success or failure. |
| grpc_client_msg_received_total | counter | Teleport Proxy | Total number of RPC stream messages received on the client. |
| grpc_client_msg_sent_total | counter | Teleport Proxy | Total number of gRPC stream messages sent by the client. |
| proxy_connection_limit_exceeded_total | counter | Teleport Proxy | Number of connections that exceeded the Proxy Service connection limit. |
| proxy_ssh_sessions_total | gauge | Teleport Proxy | Number of active sessions through this Proxy Service instance. |
| proxy_missing_ssh_tunnels | gauge | Teleport Proxy | Number of missing SSH tunnels. Used to debug if Nodes have discovered all Proxy Service instances. |
| remote_clusters | gauge | Teleport Proxy | Number of inbound connections from leaf clusters. |
| teleport_connect_to_node_attempts_total | counter | Teleport Proxy | Number of SSH connection attempts to a Node. Use with failed_connect_to_node_attempts_total to get the failure rate. |
| teleport_reverse_tunnels_connected | gauge | Teleport Proxy | Number of reverse SSH tunnels connected to the Teleport Proxy Service by Teleport instances. |
| trusted_clusters | gauge | Teleport Proxy | Number of outbound connections to leaf clusters. |
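The failure rate mentioned for failed_connect_to_node_attempts_total and teleport_connect_to_node_attempts_total can be sketched as follows. Because both metrics are counters that only grow, this example compares two scrapes and divides the increase in failures by the increase in attempts. The function name and the sample values are hypothetical, and the samples are assumed to be plain dictionaries of already-aggregated values.

```python
FAILED = "failed_connect_to_node_attempts_total"
ATTEMPTS = "teleport_connect_to_node_attempts_total"


def connect_failure_rate(earlier, later):
    """Fraction of SSH connection attempts that failed between two scrapes.

    Returns None when there were no new attempts or when a counter went
    backwards, which usually means the process restarted in between.
    """
    attempts = later[ATTEMPTS] - earlier[ATTEMPTS]
    failed = later[FAILED] - earlier[FAILED]
    if attempts <= 0 or failed < 0:
        return None
    return failed / attempts


if __name__ == "__main__":
    # Hypothetical scrapes taken some time apart.
    earlier = {FAILED: 10.0, ATTEMPTS: 200.0}
    later = {FAILED: 13.0, ATTEMPTS: 260.0}
    print(connect_failure_rate(earlier, later))
```

Working from deltas rather than raw totals keeps a burst of recent failures from being diluted by a long history of successful connections.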

Teleport Nodes

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| user_max_concurrent_sessions_hit_total | counter | Teleport Node | Number of times a user exceeded their concurrent session limit. |

All Teleport instances

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| certificate_mismatch_total | counter | Teleport | Number of SSH server login failures due to a certificate mismatch. |
| reversetunnel_connected_proxies | gauge | Teleport | Number of known proxies being sought. |
| rx | counter | Teleport | Number of bytes received during an SSH connection. |
| server_interactive_sessions_total | gauge | Teleport | Number of active sessions. |
| teleport_build_info | gauge | Teleport | Provides build information of Teleport including gitref (git describe --long --tags), Go version, and Teleport version. The value of this gauge will always be 1. |
| teleport_cache_events | counter | Teleport | Number of events received by a Teleport service cache. Teleport's Auth Service, Proxy Service, and other services cache incoming events related to their service. |
| teleport_cache_stale_events | counter | Teleport | Number of stale events received by a Teleport service cache. A high percentage of stale events can indicate a degraded backend. |
| tx | counter | Teleport | Number of bytes transmitted during an SSH connection. |
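Since the value of teleport_build_info is always 1, the useful information lives in the labels of the sample. The sketch below extracts a label set from a single text-format sample line. The label names gitref, goversion, and version and all values in the example line are assumptions made for illustration, so check the output of your own instance. Escape sequences inside label values are left as-is.

```python
import re

# Matches name="value" pairs, allowing backslash-escaped characters in values.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_labels(sample_line):
    """Return the labels of one Prometheus text-format sample line as a dict."""
    start = sample_line.find("{")
    end = sample_line.rfind("}")
    if start == -1 or end == -1:
        # The sample has no label set.
        return {}
    return dict(LABEL_RE.findall(sample_line[start + 1:end]))


if __name__ == "__main__":
    # Hypothetical sample line; label names and values are placeholders.
    line = (
        'teleport_build_info{gitref="v0.0.0-0-gabcdef0",'
        'goversion="go1.0",version="0.0.0"} 1'
    )
    print(parse_labels(line))
```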

Golang runtime metrics

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| go_gc_duration_seconds | summary | Internal Golang | A summary of GC invocation durations. |
| go_goroutines | gauge | Internal Golang | Number of goroutines that currently exist. |
| go_info | gauge | Internal Golang | Information about the Go environment. |
| go_memstats_alloc_bytes_total | counter | Internal Golang | Total number of bytes allocated, even if freed. |
| go_memstats_alloc_bytes | gauge | Internal Golang | Number of bytes allocated and still in use. |
| go_memstats_buck_hash_sys_bytes | gauge | Internal Golang | Number of bytes used by the profiling bucket hash table. |
| go_memstats_frees_total | counter | Internal Golang | Total number of frees. |
| go_memstats_gc_cpu_fraction | gauge | Internal Golang | The fraction of this program's available CPU time used by the GC since the program started. |
| go_memstats_gc_sys_bytes | gauge | Internal Golang | Number of bytes used for garbage collection system metadata. |
| go_memstats_heap_alloc_bytes | gauge | Internal Golang | Number of heap bytes allocated and still in use. |
| go_memstats_heap_idle_bytes | gauge | Internal Golang | Number of heap bytes waiting to be used. |
| go_memstats_heap_inuse_bytes | gauge | Internal Golang | Number of heap bytes that are in use. |
| go_memstats_heap_objects | gauge | Internal Golang | Number of allocated objects. |
| go_memstats_heap_released_bytes | gauge | Internal Golang | Number of heap bytes released to the OS. |
| go_memstats_heap_sys_bytes | gauge | Internal Golang | Number of heap bytes obtained from the system. |
| go_memstats_last_gc_time_seconds | gauge | Internal Golang | Number of seconds since the Unix epoch of the last garbage collection. |
| go_memstats_lookups_total | counter | Internal Golang | Total number of pointer lookups. |
| go_memstats_mallocs_total | counter | Internal Golang | Total number of mallocs. |
| go_memstats_mcache_inuse_bytes | gauge | Internal Golang | Number of bytes in use by mcache structures. |
| go_memstats_mcache_sys_bytes | gauge | Internal Golang | Number of bytes used for mcache structures obtained from system. |
| go_memstats_mspan_inuse_bytes | gauge | Internal Golang | Number of bytes in use by mspan structures. |
| go_memstats_mspan_sys_bytes | gauge | Internal Golang | Number of bytes used for mspan structures obtained from system. |
| go_memstats_next_gc_bytes | gauge | Internal Golang | Number of heap bytes when the next garbage collection will take place. |
| go_memstats_other_sys_bytes | gauge | Internal Golang | Number of bytes used for other system allocations. |
| go_memstats_stack_inuse_bytes | gauge | Internal Golang | Number of bytes in use by the stack allocator. |
| go_memstats_stack_sys_bytes | gauge | Internal Golang | Number of bytes obtained from the system for stack allocator. |
| go_memstats_sys_bytes | gauge | Internal Golang | Number of bytes obtained from the system. |
| go_threads | gauge | Internal Golang | Number of OS threads created. |
| process_cpu_seconds_total | counter | Internal Golang | Total user and system CPU time spent in seconds. |
| process_max_fds | gauge | Internal Golang | Maximum number of open file descriptors. |
| process_open_fds | gauge | Internal Golang | Number of open file descriptors. |
| process_resident_memory_bytes | gauge | Internal Golang | Resident memory size in bytes. |
| process_start_time_seconds | gauge | Internal Golang | Start time of the process since the Unix epoch in seconds. |
| process_virtual_memory_bytes | gauge | Internal Golang | Virtual memory size in bytes. |
| process_virtual_memory_max_bytes | gauge | Internal Golang | Maximum amount of virtual memory available in bytes. |
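Some of these gauges are timestamps rather than quantities. For example, process_start_time_seconds is expressed in seconds since the Unix epoch, so process uptime is the current time minus that value. A minimal sketch of the conversion follows. The function name and the example values are hypothetical.

```python
import time


def uptime_seconds(process_start_time_seconds, now=None):
    """Return how long the process has been running, in seconds.

    `now` defaults to the current Unix time. It can be passed explicitly
    to make the calculation reproducible.
    """
    if now is None:
        now = time.time()
    # Clamp at zero in case of clock skew between the hosts involved.
    return max(0.0, now - process_start_time_seconds)


if __name__ == "__main__":
    # Hypothetical values: started at t=1000, observed at t=4600.
    print(uptime_seconds(1000.0, now=4600.0))
```

The same subtraction applies to go_memstats_last_gc_time_seconds to find the time elapsed since the last garbage collection.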

Prometheus

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| promhttp_metric_handler_requests_in_flight | gauge | prometheus | Current number of scrapes being served. |
| promhttp_metric_handler_requests_total | counter | prometheus | Total number of scrapes by HTTP status code. |