Teleport Metrics

Teleport Cloud does not expose monitoring endpoints for the Auth Service and Proxy Service.

Teleport metrics are intended for performance monitoring. If you'd like to monitor Teleport usage, consider using our Event Handler plugin to push Audit Events into your preferred log aggregation system (Elastic, Splunk, Sumo Logic, etc.).

The following metrics are available:

Auth Service and backends

| Name | Type | Component | Description |
|---|---|---|---|
| audit_failed_disk_monitoring | counter | Teleport Audit Log | Number of times disk monitoring failed. |
| audit_failed_emit_events | counter | Teleport Audit Log | Number of times emitting audit events failed. |
| audit_percentage_disk_space_used | gauge | Teleport Audit Log | Percentage of disk space used. |
| audit_server_open_files | gauge | Teleport Audit Log | Number of open audit files. |
| auth_generate_requests_throttled_total | counter | Teleport Auth | Number of throttled requests to generate new server keys. |
| auth_generate_requests_total | counter | Teleport Auth | Number of requests to generate new server keys. |
| auth_generate_requests | gauge | Teleport Auth | Number of current generate requests. |
| auth_generate_seconds | histogram | Teleport Auth | Latency for generate requests. |
| backend_batch_read_requests_total | counter | cache | Number of batch read requests to the backend. |
| backend_batch_read_seconds | histogram | cache | Latency for batch read operations. |
| backend_batch_write_requests_total | counter | cache | Number of batch write requests to the backend. |
| backend_batch_write_seconds | histogram | cache | Latency for backend batch write operations. |
| backend_read_requests_total | counter | cache | Number of read requests to the backend. |
| backend_read_seconds | histogram | cache | Latency for read operations. |
| backend_requests | counter | cache | Number of write requests to the backend. |
| backend_write_seconds | histogram | cache | Latency for backend write operations. |
| cluster_name_not_found_total | counter | Teleport Auth | Number of times a cluster was not found. |
| dynamo_requests_total | counter | DynamoDB | Total number of requests to the DynamoDB API. |
| dynamo_requests | counter | DynamoDB | Total number of requests to the DynamoDB API grouped by result. |
| dynamo_requests_seconds | histogram | DynamoDB | Latency of DynamoDB API requests. |
| etcd_backend_batch_read_requests | counter | etcd | Number of batch read requests to the etcd database. |
| etcd_backend_batch_read_seconds | histogram | etcd | Latency for etcd batch read operations. |
| etcd_backend_read_requests | counter | etcd | Number of read requests to the etcd database. |
| etcd_backend_read_seconds | histogram | etcd | Latency for etcd read operations. |
| etcd_backend_tx_requests | counter | etcd | Number of transaction requests to the database. |
| etcd_backend_tx_seconds | histogram | etcd | Latency for etcd transaction operations. |
| etcd_backend_write_requests | counter | etcd | Number of write requests to the database. |
| etcd_backend_write_seconds | histogram | etcd | Latency for etcd write operations. |
| firestore_events_backend_batch_read_requests | counter | GCP Cloud Firestore | Number of batch read requests to Cloud Firestore events. |
| firestore_events_backend_batch_read_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events batch read operations. |
| firestore_events_backend_batch_write_requests | counter | GCP Cloud Firestore | Number of batch write requests to Cloud Firestore events. |
| firestore_events_backend_batch_write_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events batch write operations. |
| firestore_events_backend_write_requests | counter | GCP Cloud Firestore | Number of write requests to Cloud Firestore events. |
| firestore_events_backend_write_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events write operations. |
| gcs_event_storage_downloads_seconds | histogram | GCP GCS | Latency for GCS download operations. |
| gcs_event_storage_downloads | counter | GCP GCS | Number of downloads from the GCS backend. |
| gcs_event_storage_uploads_seconds | histogram | GCP GCS | Latency for GCS upload operations. |
| gcs_event_storage_uploads | counter | GCP GCS | Number of uploads to the GCS backend. |
| grpc_server_started_total | counter | Teleport Auth | Total number of RPCs started on the server. |
| grpc_server_handled_total | counter | Teleport Auth | Total number of RPCs completed on the server, regardless of success or failure. |
| grpc_server_msg_received_total | counter | Teleport Auth | Total number of RPC stream messages received on the server. |
| grpc_server_msg_sent_total | counter | Teleport Auth | Total number of gRPC stream messages sent by the server. |
| heartbeat_connections_received_total | counter | Teleport Auth | Number of times the Auth Service received a heartbeat connection. |
| s3_requests_total | counter | Amazon S3 | Total number of requests to the S3 API. |
| s3_requests | counter | Amazon S3 | Total number of requests to the S3 API grouped by result. |
| s3_requests_seconds | histogram | Amazon S3 | Request latency for the S3 API. |
| teleport_audit_emit_events | counter | Teleport Audit Log | Number of audit events emitted. |
| teleport_audit_parquetlog_batch_processing_seconds | histogram | Teleport Audit Log | Duration of processing a single batch of events in the Parquet-format audit log. |
| teleport_audit_parquetlog_s3_flush_seconds | histogram | Teleport Audit Log | Duration of flushing Parquet files to S3 in the Parquet-format audit log. |
| teleport_audit_parquetlog_delete_events_seconds | histogram | Teleport Audit Log | Duration of deleting events from SQS in the Parquet-format audit log. |
| teleport_audit_parquetlog_batch_size | histogram | Teleport Audit Log | Overall size of events in a single batch in the Parquet-format audit log. |
| teleport_audit_parquetlog_batch_count | counter | Teleport Audit Log | Total number of events in a single batch in the Parquet-format audit log. |
| teleport_audit_parquetlog_last_processed_timestamp | gauge | Teleport Audit Log | Timestamp of the most recent processing run in the Parquet-format audit log. |
| teleport_audit_parquetlog_age_oldest_processed_message | gauge | Teleport Audit Log | Age of the oldest processed event in the Parquet-format audit log. |
| teleport_audit_parquetlog_errors_from_collect_count | counter | Teleport Audit Log | Number of collect failures in the Parquet-format audit log. |
| teleport_connected_resources | gauge | Teleport Auth | Number and type of resources connected via keepalives. |
| teleport_registered_servers | gauge | Teleport Auth | The number of Teleport services that are connected to an Auth Service instance grouped by version. |
| user_login_total | counter | Teleport Auth | Number of user logins. |
| teleport_migrations | gauge | Teleport Auth | Tracks for each migration if it is active (1) or not (0). |
| watcher_event_sizes | histogram | cache | Overall size of events emitted. |
| watcher_events | histogram | cache | Per resource size of events emitted. |

Enhanced Session Recording / BPF

| Name | Type | Component | Description |
|---|---|---|---|
| bpf_lost_command_events | counter | BPF | Number of lost command events. |
| bpf_lost_disk_events | counter | BPF | Number of lost disk events. |
| bpf_lost_network_events | counter | BPF | Number of lost network events. |

Proxy Service

| Name | Type | Component | Description |
|---|---|---|---|
| failed_connect_to_node_attempts_total | counter | Teleport Proxy | Number of failed SSH connection attempts to the SSH Service. Use with teleport_connect_to_node_attempts_total to get the failure rate. |
| failed_login_attempts_total | counter | Teleport Proxy | Number of failed tsh login or tsh ssh login attempts. |
| grpc_client_started_total | counter | Teleport Proxy | Total number of RPCs started on the client. |
| grpc_client_handled_total | counter | Teleport Proxy | Total number of RPCs completed on the client, regardless of success or failure. |
| grpc_client_msg_received_total | counter | Teleport Proxy | Total number of RPC stream messages received on the client. |
| grpc_client_msg_sent_total | counter | Teleport Proxy | Total number of gRPC stream messages sent by the client. |
| proxy_connection_limit_exceeded_total | counter | Teleport Proxy | Number of connections that exceeded the Proxy Service connection limit. |
| proxy_peer_client_dial_error_total | counter | Teleport Proxy | Total number of errors encountered dialing peer Proxy Service instances. |
| proxy_peer_server_connections | gauge | Teleport Proxy | Number of currently open connections to peer Proxy Service instances. |
| proxy_peer_client_rpc | gauge | Teleport Proxy | Number of current client RPC requests. |
| proxy_peer_client_rpc_total | counter | Teleport Proxy | Total number of client RPC requests. |
| proxy_peer_client_rpc_duration_seconds | histogram | Teleport Proxy | Duration in seconds of RPCs sent by the client. |
| proxy_peer_client_message_sent_size | histogram | Teleport Proxy | Size of messages sent by the client. |
| proxy_peer_client_message_received_size | histogram | Teleport Proxy | Size of messages received by the client. |
| proxy_peer_server_connections | gauge | Teleport Proxy | Number of currently open connections to peer Proxy Service clients. |
| proxy_peer_server_rpc | gauge | Teleport Proxy | Number of current server RPC requests. |
| proxy_peer_server_rpc_total | counter | Teleport Proxy | Total number of server RPC requests. |
| proxy_peer_server_rpc_duration_seconds | histogram | Teleport Proxy | Duration in seconds of RPCs sent by the server. |
| proxy_peer_server_message_sent_size | histogram | Teleport Proxy | Size of messages sent by the server. |
| proxy_peer_server_message_received_size | histogram | Teleport Proxy | Size of messages received by the server. |
| proxy_ssh_sessions_total | gauge | Teleport Proxy | Number of active sessions through this Proxy Service instance. |
| proxy_missing_ssh_tunnels | gauge | Teleport Proxy | Number of missing SSH tunnels. Used to debug if Teleport instances have discovered all Proxy Service instances. |
| remote_clusters | gauge | Teleport Proxy | Number of inbound connections from leaf clusters. |
| teleport_connect_to_node_attempts_total | counter | Teleport Proxy | Number of SSH connection attempts to an SSH Service. Use with failed_connect_to_node_attempts_total to get the failure rate. |
| teleport_reverse_tunnels_connected | gauge | Teleport Proxy | Number of reverse SSH tunnels connected to the Teleport Proxy Service by Teleport instances. |
| trusted_clusters | gauge | Teleport Proxy | Number of outbound connections to leaf clusters. |
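
The failure rate mentioned for failed_connect_to_node_attempts_total and teleport_connect_to_node_attempts_total is the ratio of the two counters. The following is a minimal sketch, assuming the metrics have already been scraped in the Prometheus text exposition format; the helper names and the sample values are hypothetical and not part of Teleport.

```python
def parse_samples(exposition_text):
    """Sum sample values per metric name from Prometheus text-format output."""
    totals = {}
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Labeled sample: name{label="value",...} value
            name = line[: line.index("{")]
            value = line[line.rindex("}") + 1 :].split()[0]
        else:
            # Unlabeled sample: name value
            name, value = line.split()[:2]
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def connect_failure_rate(totals):
    """Fraction of SSH connection attempts that failed (0.0 when there were none)."""
    attempts = totals.get("teleport_connect_to_node_attempts_total", 0.0)
    failed = totals.get("failed_connect_to_node_attempts_total", 0.0)
    return failed / attempts if attempts else 0.0


# Hypothetical scrape output, not taken from a real cluster.
SAMPLE = """\
# TYPE failed_connect_to_node_attempts_total counter
failed_connect_to_node_attempts_total 3
# TYPE teleport_connect_to_node_attempts_total counter
teleport_connect_to_node_attempts_total 120
"""

if __name__ == "__main__":
    print(f"failure rate: {connect_failure_rate(parse_samples(SAMPLE)):.3f}")
```

Because both metrics are counters, this ratio covers the whole lifetime of the process. For a recent-window failure rate, one option is to apply the same division to the increase of each counter between two scrapes.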

Teleport SSH Service

| Name | Type | Component | Description |
|---|---|---|---|
| user_max_concurrent_sessions_hit_total | counter | Teleport SSH | Number of times a user exceeded their concurrent session limit. |

All Teleport instances

| Name | Type | Component | Description |
|---|---|---|---|
| certificate_mismatch_total | counter | Teleport | Number of SSH server login failures due to a certificate mismatch. |
| reversetunnel_connected_proxies | gauge | Teleport | Number of known proxies being sought. |
| rx | counter | Teleport | Number of bytes received during an SSH connection. |
| server_interactive_sessions_total | gauge | Teleport | Number of active sessions. |
| teleport_build_info | gauge | Teleport | Provides build information of Teleport including gitref (git describe --long --tags), Go version, and Teleport version. The value of this gauge will always be 1. |
| teleport_cache_events | counter | Teleport | Number of events received by a Teleport service cache. Teleport's Auth Service, Proxy Service, and other services cache incoming events related to their service. |
| teleport_cache_stale_events | counter | Teleport | Number of stale events received by a Teleport service cache. A high percentage of stale events can indicate a degraded backend. |
| tx | counter | Teleport | Number of bytes transmitted during an SSH connection. |
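
The percentage of stale events mentioned for teleport_cache_stale_events can be estimated over a time window from two consecutive scrapes. This is a rough sketch, assuming that stale events are measured against teleport_cache_events and that each scrape has been reduced to a dictionary of counter values; the function names and the numbers are hypothetical.

```python
def counter_delta(previous, current):
    """Increase of a counter between two scrapes, tolerating a process restart."""
    # Counters restart from zero when the process restarts; in that case
    # the current value is the best available estimate of the increase.
    return current if current < previous else current - previous


def stale_percentage(prev_scrape, curr_scrape):
    """Percentage of cache events in the window that were stale."""
    events = counter_delta(
        prev_scrape["teleport_cache_events"],
        curr_scrape["teleport_cache_events"],
    )
    stale = counter_delta(
        prev_scrape["teleport_cache_stale_events"],
        curr_scrape["teleport_cache_stale_events"],
    )
    if events == 0:
        return 0.0  # no events in the window, so nothing to report
    return 100.0 * stale / events


# Hypothetical counter values from two scrapes of the same instance.
earlier = {"teleport_cache_events": 1000, "teleport_cache_stale_events": 10}
later = {"teleport_cache_events": 1500, "teleport_cache_stale_events": 60}

if __name__ == "__main__":
    print(f"stale events: {stale_percentage(earlier, later):.1f}%")
```

What counts as a high percentage depends on the deployment, so any alerting threshold built on this value is a judgment call rather than something this page specifies.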

Golang runtime metrics

| Name | Type | Component | Description |
|---|---|---|---|
| go_gc_duration_seconds | summary | Internal Golang | A summary of GC invocation durations. |
| go_goroutines | gauge | Internal Golang | Number of goroutines that currently exist. |
| go_info | gauge | Internal Golang | Information about the Go environment. |
| go_memstats_alloc_bytes_total | counter | Internal Golang | Total number of bytes allocated, even if freed. |
| go_memstats_alloc_bytes | gauge | Internal Golang | Number of bytes allocated and still in use. |
| go_memstats_buck_hash_sys_bytes | gauge | Internal Golang | Number of bytes used by the profiling bucket hash table. |
| go_memstats_frees_total | counter | Internal Golang | Total number of frees. |
| go_memstats_gc_cpu_fraction | gauge | Internal Golang | The fraction of this program's available CPU time used by the GC since the program started. |
| go_memstats_gc_sys_bytes | gauge | Internal Golang | Number of bytes used for garbage collection system metadata. |
| go_memstats_heap_alloc_bytes | gauge | Internal Golang | Number of heap bytes allocated and still in use. |
| go_memstats_heap_idle_bytes | gauge | Internal Golang | Number of heap bytes waiting to be used. |
| go_memstats_heap_inuse_bytes | gauge | Internal Golang | Number of heap bytes that are in use. |
| go_memstats_heap_objects | gauge | Internal Golang | Number of allocated objects. |
| go_memstats_heap_released_bytes | gauge | Internal Golang | Number of heap bytes released to the OS. |
| go_memstats_heap_sys_bytes | gauge | Internal Golang | Number of heap bytes obtained from the system. |
| go_memstats_last_gc_time_seconds | gauge | Internal Golang | Number of seconds since the Unix epoch of the last garbage collection. |
| go_memstats_lookups_total | counter | Internal Golang | Total number of pointer lookups. |
| go_memstats_mallocs_total | counter | Internal Golang | Total number of mallocs. |
| go_memstats_mcache_inuse_bytes | gauge | Internal Golang | Number of bytes in use by mcache structures. |
| go_memstats_mcache_sys_bytes | gauge | Internal Golang | Number of bytes used for mcache structures obtained from system. |
| go_memstats_mspan_inuse_bytes | gauge | Internal Golang | Number of bytes in use by mspan structures. |
| go_memstats_mspan_sys_bytes | gauge | Internal Golang | Number of bytes used for mspan structures obtained from system. |
| go_memstats_next_gc_bytes | gauge | Internal Golang | Number of heap bytes when the next garbage collection will take place. |
| go_memstats_other_sys_bytes | gauge | Internal Golang | Number of bytes used for other system allocations. |
| go_memstats_stack_inuse_bytes | gauge | Internal Golang | Number of bytes in use by the stack allocator. |
| go_memstats_stack_sys_bytes | gauge | Internal Golang | Number of bytes obtained from the system for stack allocator. |
| go_memstats_sys_bytes | gauge | Internal Golang | Number of bytes obtained from the system. |
| go_threads | gauge | Internal Golang | Number of OS threads created. |
| process_cpu_seconds_total | counter | Internal Golang | Total user and system CPU time spent in seconds. |
| process_max_fds | gauge | Internal Golang | Maximum number of open file descriptors. |
| process_open_fds | gauge | Internal Golang | Number of open file descriptors. |
| process_resident_memory_bytes | gauge | Internal Golang | Resident memory size in bytes. |
| process_start_time_seconds | gauge | Internal Golang | Start time of the process since the Unix epoch in seconds. |
| process_virtual_memory_bytes | gauge | Internal Golang | Virtual memory size in bytes. |
| process_virtual_memory_max_bytes | gauge | Internal Golang | Maximum amount of virtual memory available in bytes. |

Prometheus

| Name | Type | Component | Description |
|---|---|---|---|
| promhttp_metric_handler_requests_in_flight | gauge | prometheus | Current number of scrapes being served. |
| promhttp_metric_handler_requests_total | counter | prometheus | Total number of scrapes by HTTP status code. |