Teleport Metrics

Teleport Cloud does not expose monitoring endpoints for the Auth Service and Proxy Service.

Teleport metrics are intended for performance monitoring. If you'd like to monitor Teleport usage, consider using our Event Handler plugin to push audit events into your preferred log aggregation system (Elastic, Splunk, Sumo Logic, etc.).
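The metrics below are served in the Prometheus text exposition format. As a minimal sketch for ad-hoc inspection without a full Prometheus server, the Python code below downloads and parses that format using only the standard library. It assumes a self-hosted instance started with a diagnostic address, for example teleport start --diag-addr=127.0.0.1:3000, so that metrics are available at http://127.0.0.1:3000/metrics; treat that address as an assumption to adjust for your deployment. The parser is deliberately simplified: it skips HELP and TYPE comments and does not unescape label values.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+\d+)?$'
)
# One label pair; label values may contain escaped quotes (left escaped here).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text format into {(name, frozenset(labels)): value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        labels = frozenset(LABEL_RE.findall(match.group("labels") or ""))
        samples[(match.group("name"), labels)] = float(match.group("value"))
    return samples


def fetch_metrics(url="http://127.0.0.1:3000/metrics"):
    """Download and parse metrics; the default URL is an assumed diag address."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Each key pairs a metric name with its set of labels, so an unlabeled gauge such as process_state is read as samples[("process_state", frozenset())].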

The following metrics are available:

Auth Service and backends

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| audit_failed_disk_monitoring | counter | Teleport Audit Log | Number of times disk monitoring failed. |
| audit_failed_emit_events | counter | Teleport Audit Log | Number of times emitting audit events failed. |
| audit_percentage_disk_space_used | gauge | Teleport Audit Log | Percentage of disk space used. |
| audit_server_open_files | gauge | Teleport Audit Log | Number of open audit files. |
| auth_generate_requests_throttled_total | counter | Teleport Auth | Number of throttled requests to generate new server keys. |
| auth_generate_requests_total | counter | Teleport Auth | Number of requests to generate new server keys. |
| auth_generate_requests | gauge | Teleport Auth | Number of current generate requests. |
| auth_generate_seconds | histogram | Teleport Auth | Latency for generate requests. |
| backend_batch_read_requests_total | counter | cache | Number of batch read requests to the backend. |
| backend_batch_read_seconds | histogram | cache | Latency for batch read operations. |
| backend_batch_write_requests_total | counter | cache | Number of batch write requests to the backend. |
| backend_batch_write_seconds | histogram | cache | Latency for backend batch write operations. |
| backend_read_requests_total | counter | cache | Number of read requests to the backend. |
| backend_read_seconds | histogram | cache | Latency for read operations. |
| backend_requests | counter | cache | Number of requests to the backend (reads, writes, and keepalives). |
| backend_write_requests_total | counter | cache | Number of write requests to the backend. |
| backend_write_seconds | histogram | cache | Latency for backend write operations. |
| cluster_name_not_found_total | counter | Teleport Auth | Number of times a cluster was not found. |
| dynamo_requests_total | counter | DynamoDB | Total number of requests to the DynamoDB API. |
| dynamo_requests | counter | DynamoDB | Total number of requests to the DynamoDB API, grouped by result. |
| dynamo_requests_seconds | histogram | DynamoDB | Latency of DynamoDB API requests. |
| etcd_backend_batch_read_requests | counter | etcd | Number of batch read requests to the etcd database. |
| etcd_backend_batch_read_seconds | histogram | etcd | Latency for etcd batch read operations. |
| etcd_backend_read_requests | counter | etcd | Number of read requests to the etcd database. |
| etcd_backend_read_seconds | histogram | etcd | Latency for etcd read operations. |
| etcd_backend_tx_requests | counter | etcd | Number of transaction requests to the database. |
| etcd_backend_tx_seconds | histogram | etcd | Latency for etcd transaction operations. |
| etcd_backend_write_requests | counter | etcd | Number of write requests to the database. |
| etcd_backend_write_seconds | histogram | etcd | Latency for etcd write operations. |
| teleport_etcd_events | counter | etcd | Total number of etcd events processed. |
| teleport_etcd_event_backpressure | counter | etcd | Total number of times event processing encountered backpressure. |
| firestore_events_backend_batch_read_requests | counter | GCP Cloud Firestore | Number of batch read requests to Cloud Firestore events. |
| firestore_events_backend_batch_read_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events batch read operations. |
| firestore_events_backend_batch_write_requests | counter | GCP Cloud Firestore | Number of batch write requests to Cloud Firestore events. |
| firestore_events_backend_batch_write_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events batch write operations. |
| firestore_events_backend_write_requests | counter | GCP Cloud Firestore | Number of write requests to Cloud Firestore events. |
| firestore_events_backend_write_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events write operations. |
| gcs_event_storage_downloads_seconds | histogram | GCP GCS | Latency for GCS download operations. |
| gcs_event_storage_downloads | counter | GCP GCS | Number of downloads from the GCS backend. |
| gcs_event_storage_uploads_seconds | histogram | GCP GCS | Latency for GCS upload operations. |
| gcs_event_storage_uploads | counter | GCP GCS | Number of uploads to the GCS backend. |
| grpc_server_started_total | counter | Teleport Auth | Total number of RPCs started on the server. |
| grpc_server_handled_total | counter | Teleport Auth | Total number of RPCs completed on the server, regardless of success or failure. |
| grpc_server_msg_received_total | counter | Teleport Auth | Total number of RPC stream messages received on the server. |
| grpc_server_msg_sent_total | counter | Teleport Auth | Total number of gRPC stream messages sent by the server. |
| heartbeat_connections_received_total | counter | Teleport Auth | Number of times the Auth Service received a heartbeat connection. |
| s3_requests_total | counter | Amazon S3 | Total number of requests to the S3 API. |
| s3_requests | counter | Amazon S3 | Total number of requests to the S3 API, grouped by result. |
| s3_requests_seconds | histogram | Amazon S3 | Request latency for the S3 API. |
| teleport_audit_emit_events | counter | Teleport Audit Log | Number of audit events emitted. |
| teleport_audit_parquetlog_batch_processing_seconds | histogram | Teleport Audit Log | Duration of processing a single batch of events in the Parquet-format audit log. |
| teleport_audit_parquetlog_s3_flush_seconds | histogram | Teleport Audit Log | Duration of flushing Parquet files to S3 in the Parquet-format audit log. |
| teleport_audit_parquetlog_delete_events_seconds | histogram | Teleport Audit Log | Duration of deleting events from SQS in the Parquet-format audit log. |
| teleport_audit_parquetlog_batch_size | histogram | Teleport Audit Log | Overall size of events in a single batch in the Parquet-format audit log. |
| teleport_audit_parquetlog_batch_count | counter | Teleport Audit Log | Total number of events in a single batch in the Parquet-format audit log. |
| teleport_audit_parquetlog_last_processed_timestamp | gauge | Teleport Audit Log | Timestamp of the last processing run in the Parquet-format audit log. |
| teleport_audit_parquetlog_age_oldest_processed_message | gauge | Teleport Audit Log | Age of the oldest event in the Parquet-format audit log. |
| teleport_audit_parquetlog_errors_from_collect_count | counter | Teleport Audit Log | Number of collect failures in the Parquet-format audit log. |
| teleport_connected_resources | gauge | Teleport Auth | Number and type of resources connected via keepalives. |
| teleport_registered_servers | gauge | Teleport Auth | Number of Teleport services connected to an Auth Service instance, grouped by version. |
| teleport_registered_servers_by_install_methods | gauge | Teleport Auth | Number of Teleport services connected to an Auth Service instance, grouped by install method. |
| user_login_total | counter | Teleport Auth | Number of user logins. |
| teleport_migrations | gauge | Teleport Auth | Tracks whether each migration is active (1) or not (0). |
| watcher_event_sizes | histogram | cache | Overall size of events emitted. |
| watcher_events | histogram | cache | Per-resource size of events emitted. |
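Several metrics above, such as auth_generate_seconds and backend_write_seconds, are Prometheus histograms, exposed as cumulative _bucket series labeled with an upper bound (le). The sketch below estimates a quantile from one snapshot of those buckets by linear interpolation inside the bucket that contains the requested rank. It follows the general approach of PromQL's histogram_quantile function in simplified form: it works on raw cumulative counts rather than rates and assumes non-negative bucket bounds. The bucket values used with it are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs; one pair must
    use math.inf as its upper bound (the "+Inf" bucket).
    """
    buckets = sorted(buckets)
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("histogram must include a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Rank falls in the open-ended bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound  # empty bucket, nothing to interpolate
            # Linear interpolation between the bucket's lower and upper bounds.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, with buckets [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)], the 95th percentile falls halfway through the 0.5 to 1.0 bucket, so the estimate is 0.75 seconds.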

Enhanced Session Recording / BPF

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| bpf_lost_command_events | counter | BPF | Number of lost command events. |
| bpf_lost_disk_events | counter | BPF | Number of lost disk events. |
| bpf_lost_network_events | counter | BPF | Number of lost network events. |
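The BPF counters above only grow while a process runs, so an alert usually watches how much they increased over a window rather than their absolute value, and counters restart from zero when a Teleport process restarts. The sketch below computes the increase across successive scraped values and treats any drop as a reset. It is a simplified version of what PromQL's increase() computes, without its extrapolation to the window boundaries; lost_events_alert and its threshold are hypothetical names for illustration.

```python
def counter_increase(values):
    """Return the total increase of a counter across successive scraped values.

    A value lower than its predecessor is treated as a counter reset (for
    example, after a process restart), so the new value itself counts as the
    increase since the reset.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def lost_events_alert(values, threshold=0):
    """Hypothetical check: True if lost BPF events grew by more than threshold."""
    return counter_increase(values) > threshold
```

For the readings [0, 5, 7, 2, 4], the drop from 7 to 2 is treated as a reset, giving a total increase of 11.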

Proxy Service

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| failed_connect_to_node_attempts_total | counter | Teleport Proxy | Number of failed SSH connection attempts to the SSH Service. Use with teleport_connect_to_node_attempts_total to get the failure rate. |
| failed_login_attempts_total | counter | Teleport Proxy | Number of failed tsh login or tsh ssh logins. |
| grpc_client_started_total | counter | Teleport Proxy | Total number of RPCs started on the client. |
| grpc_client_handled_total | counter | Teleport Proxy | Total number of RPCs completed on the client, regardless of success or failure. |
| grpc_client_msg_received_total | counter | Teleport Proxy | Total number of RPC stream messages received on the client. |
| grpc_client_msg_sent_total | counter | Teleport Proxy | Total number of gRPC stream messages sent by the client. |
| proxy_connection_limit_exceeded_total | counter | Teleport Proxy | Number of connections that exceeded the Proxy Service connection limit. |
| proxy_peer_client_dial_error_total | counter | Teleport Proxy | Total number of errors encountered dialing peer Proxy Service instances. |
| proxy_peer_client_connections | gauge | Teleport Proxy | Number of currently open connections to peer Proxy Service instances. |
| proxy_peer_client_rpc | gauge | Teleport Proxy | Number of current client RPC requests. |
| proxy_peer_client_rpc_total | counter | Teleport Proxy | Total number of client RPC requests. |
| proxy_peer_client_rpc_duration_seconds | histogram | Teleport Proxy | Duration in seconds of RPCs sent by the client. |
| proxy_peer_client_message_sent_size | histogram | Teleport Proxy | Size of messages sent by the client. |
| proxy_peer_client_message_received_size | histogram | Teleport Proxy | Size of messages received by the client. |
| proxy_peer_server_connections | gauge | Teleport Proxy | Number of currently open connections from peer Proxy Service clients. |
| proxy_peer_server_rpc | gauge | Teleport Proxy | Number of current server RPC requests. |
| proxy_peer_server_rpc_total | counter | Teleport Proxy | Total number of server RPC requests. |
| proxy_peer_server_rpc_duration_seconds | histogram | Teleport Proxy | Duration in seconds of RPCs sent by the server. |
| proxy_peer_server_message_sent_size | histogram | Teleport Proxy | Size of messages sent by the server. |
| proxy_peer_server_message_received_size | histogram | Teleport Proxy | Size of messages received by the server. |
| proxy_ssh_sessions_total | gauge | Teleport Proxy | Number of active sessions through this Proxy Service instance. |
| proxy_missing_ssh_tunnels | gauge | Teleport Proxy | Number of missing SSH tunnels. Used to debug whether Teleport instances have discovered all Proxy Service instances. |
| remote_clusters | gauge | Teleport Proxy | Number of inbound connections from leaf clusters. |
| teleport_connect_to_node_attempts_total | counter | Teleport Proxy | Number of SSH connection attempts to an SSH Service. Use with failed_connect_to_node_attempts_total to get the failure rate. |
| teleport_reverse_tunnels_connected | gauge | Teleport Proxy | Number of reverse SSH tunnels connected to the Teleport Proxy Service by Teleport instances. |
| trusted_clusters | gauge | Teleport Proxy | Number of outbound connections to leaf clusters. |
| teleport_proxy_db_connection_setup_time_seconds | histogram | Teleport Proxy | Time to establish a connection to the Database Service from the Proxy Service. |
| teleport_proxy_db_connection_dial_attempts_total | counter | Teleport Proxy | Number of dial attempts made from the Proxy Service to the Database Service. |
| teleport_proxy_db_connection_dial_failures_total | counter | Teleport Proxy | Number of failed dial attempts made from the Proxy Service to the Database Service. |
| teleport_proxy_db_attempted_servers_total | histogram | Teleport Proxy | Number of servers processed during a connection attempt to the Database Service from the Proxy Service. |
| teleport_proxy_db_connection_tls_config_time_seconds | histogram | Teleport Proxy | Time to fetch the TLS configuration for the connection to the Database Service from the Proxy Service. |
| teleport_proxy_db_active_connections_total | gauge | Teleport Proxy | Number of currently active connections to the Database Service from the Proxy Service. |
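The descriptions of failed_connect_to_node_attempts_total and teleport_connect_to_node_attempts_total suggest combining them into a failure rate. In Prometheus this is typically one rate() divided by another; the Python sketch below shows the same arithmetic on two raw scrapes taken at the start and end of a window. It is a minimal illustration that does not handle counter resets between the two readings.

```python
def failure_rate(attempts_before, attempts_after, failed_before, failed_after):
    """Fraction of SSH connection attempts that failed between two scrapes.

    attempts_* are readings of teleport_connect_to_node_attempts_total and
    failed_* are readings of failed_connect_to_node_attempts_total.
    Returns None when no attempts were made in the window.
    """
    attempts = attempts_after - attempts_before
    failed = failed_after - failed_before
    if attempts <= 0:
        return None  # avoid dividing by zero when the node saw no traffic
    return failed / attempts
```

For example, if attempts grew from 100 to 200 while failures grew from 5 to 15, then 10 of 100 attempts failed, a failure rate of 0.1 (10%).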

Database Service

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_db_messages_from_client_total | counter | Teleport Database Service | Number of messages (packets) received from the DB client. |
| teleport_db_messages_from_server_total | counter | Teleport Database Service | Number of messages (packets) received from the DB server. |
| teleport_db_method_call_count_total | counter | Teleport Database Service | Number of times a DB method was called. |
| teleport_db_method_call_latency_seconds | histogram | Teleport Database Service | Call latency for DB method calls. |
| teleport_db_initialized_connections_total | counter | Teleport Database Service | Number of initialized DB connections. |
| teleport_db_active_connections_total | gauge | Teleport Database Service | Number of active DB connections. |
| teleport_db_connection_durations_seconds | histogram | Teleport Database Service | Duration of DB connections. |
| teleport_db_connection_setup_time_seconds | histogram | Teleport Database Service | Initial time to set up a DB connection, before any requests are handled. |
| teleport_db_errors_total | counter | Teleport Database Service | Number of synthetic DB errors sent to the client. |
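For Database Service histograms such as teleport_db_connection_setup_time_seconds, the average over a window is often enough for a dashboard. This sketch assumes the standard _sum and _count series that Prometheus client libraries export for every histogram: the change in _sum divided by the change in _count is the mean of the observations recorded between two scrapes. The function name and readings are illustrative only.

```python
def histogram_mean(sum_before, sum_after, count_before, count_after):
    """Average observed value between two scrapes of a histogram.

    sum_* are readings of <name>_sum and count_* are readings of
    <name>_count. Returns None if nothing was observed in the window.
    """
    observations = count_after - count_before
    if observations <= 0:
        return None  # no new observations, so no meaningful average
    return (sum_after - sum_before) / observations
```

For example, if _sum grew from 10.0 to 16.0 seconds while _count grew from 100 to 120, the 20 new connections took 0.3 seconds on average to set up.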

Kubernetes Access

The following tables identify all metrics available in the Proxy Service when Kubernetes access is enabled.

Client

The following table identifies all metrics available when the service connects to upstream servers. For the Proxy Service, the upstream server can be a kubernetes_service instance, or a Kubernetes cluster if the Proxy Service is running in legacy mode.

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_kubernetes_client_in_flight_requests | gauge | Teleport Kubernetes Proxy | In-flight requests waiting for the upstream response. |
| teleport_kubernetes_client_requests_total | counter | Teleport Kubernetes Proxy | Total number of requests sent to the upstream Teleport proxy, kube_service, or Kubernetes cluster servers. |
| teleport_kubernetes_client_tls_duration_seconds | histogram | Teleport Kubernetes Proxy | Latency distribution of TLS handshakes. |
| teleport_kubernetes_client_got_conn_duration_seconds | histogram | Teleport Kubernetes Proxy | Latency distribution of the time to dial the upstream server, using a reverse tunnel or direct dialer. |
| teleport_kubernetes_client_first_byte_response_duration_seconds | histogram | Teleport Kubernetes Proxy | Latency distribution of the time to receive the first response byte from the upstream server. |
| teleport_kubernetes_client_request_duration_seconds | histogram | Teleport Kubernetes Proxy | Latency distribution of the upstream request time. |

Server

The following table identifies all metrics available for incoming connections.

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_kubernetes_server_in_flight_requests | gauge | Teleport Kubernetes Proxy | In-flight requests currently handled by the server. |
| teleport_kubernetes_server_api_requests_total | counter | Teleport Kubernetes Proxy | Total number of requests handled by the server. |
| teleport_kubernetes_server_request_duration_seconds | histogram | Teleport Kubernetes Proxy | Latency distribution of the total request time. |
| teleport_kubernetes_server_response_size_bytes | histogram | Teleport Kubernetes Proxy | Distribution of the response size. |
| teleport_kubernetes_server_exec_in_flight_sessions | gauge | Teleport Kubernetes Proxy | Number of active kubectl exec sessions. |
| teleport_kubernetes_server_exec_sessions_total | counter | Teleport Kubernetes Proxy | Total number of kubectl exec sessions. |
| teleport_kubernetes_server_portforward_in_flight_sessions | gauge | Teleport Kubernetes Proxy | Number of active kubectl port-forward sessions. |
| teleport_kubernetes_server_portforward_sessions_total | counter | Teleport Kubernetes Proxy | Total number of kubectl port-forward sessions. |
| teleport_kubernetes_server_join_in_flight_sessions | gauge | Teleport Kubernetes Proxy | Number of active joining sessions. |
| teleport_kubernetes_server_join_sessions_total | counter | Teleport Kubernetes Proxy | Total number of joining sessions. |
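Counters such as teleport_kubernetes_server_api_requests_total are usually graphed as a per-second rate. The sketch below derives an average rate from timestamped scrapes, given as (unix_time, value) pairs, and treats any decrease as a counter reset. It approximates PromQL's rate() without its boundary extrapolation; the function name and sample readings are hypothetical.

```python
def per_second_rate(samples):
    """Average per-second rate of a counter from (unix_time, value) samples.

    Drops in value are treated as counter resets. Returns None if there are
    fewer than two samples or the samples span no time.
    """
    if len(samples) < 2:
        return None
    samples = sorted(samples)  # order by timestamp
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed
```

For example, readings of 100, 160, and 220 requests taken 30 seconds apart give a rate of 2 requests per second.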

Teleport SSH Service

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| user_max_concurrent_sessions_hit_total | counter | Teleport SSH | Number of times a user exceeded their concurrent session limit. |

Teleport Kubernetes Service

The following table identifies all metrics available when the service connects to upstream servers. In the case of kubernetes_service, the upstream server is always a Kubernetes cluster.

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_kubernetes_client_in_flight_requests | gauge | Teleport Kubernetes Service | In-flight requests waiting for the upstream response. |
| teleport_kubernetes_client_requests_total | counter | Teleport Kubernetes Service | Total number of requests sent to the upstream Teleport proxy, kube_service, or Kubernetes cluster servers. |
| teleport_kubernetes_client_tls_duration_seconds | histogram | Teleport Kubernetes Service | Latency distribution of TLS handshakes. |
| teleport_kubernetes_client_got_conn_duration_seconds | histogram | Teleport Kubernetes Service | Latency distribution of the time to dial the upstream server, using a reverse tunnel or direct dialer. |
| teleport_kubernetes_client_first_byte_response_duration_seconds | histogram | Teleport Kubernetes Service | Latency distribution of the time to receive the first response byte from the upstream server. |
| teleport_kubernetes_client_request_duration_seconds | histogram | Teleport Kubernetes Service | Latency distribution of the upstream request time. |

The following table identifies all metrics available for incoming connections.

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| teleport_kubernetes_server_in_flight_requests | gauge | Teleport Kubernetes Service | In-flight requests currently handled by the server. |
| teleport_kubernetes_server_api_requests_total | counter | Teleport Kubernetes Service | Total number of requests handled by the server. |
| teleport_kubernetes_server_request_duration_seconds | histogram | Teleport Kubernetes Service | Latency distribution of the total request time. |
| teleport_kubernetes_server_response_size_bytes | histogram | Teleport Kubernetes Service | Distribution of the response size. |
| teleport_kubernetes_server_exec_in_flight_sessions | gauge | Teleport Kubernetes Service | Number of active kubectl exec sessions. |
| teleport_kubernetes_server_exec_sessions_total | counter | Teleport Kubernetes Service | Total number of kubectl exec sessions. |
| teleport_kubernetes_server_portforward_in_flight_sessions | gauge | Teleport Kubernetes Service | Number of active kubectl port-forward sessions. |
| teleport_kubernetes_server_portforward_sessions_total | counter | Teleport Kubernetes Service | Total number of kubectl port-forward sessions. |
| teleport_kubernetes_server_join_in_flight_sessions | gauge | Teleport Kubernetes Service | Number of active joining sessions. |
| teleport_kubernetes_server_join_sessions_total | counter | Teleport Kubernetes Service | Total number of joining sessions. |

All Teleport instances

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| process_state | gauge | Teleport | State of the Teleport process: 0 = ok, 1 = recovering, 2 = degraded, 3 = starting. |
| certificate_mismatch_total | counter | Teleport | Number of SSH server login failures due to a certificate mismatch. |
| rx | counter | Teleport | Number of bytes received during an SSH connection. |
| server_interactive_sessions_total | gauge | Teleport | Number of active sessions. |
| teleport_build_info | gauge | Teleport | Provides build information about Teleport, including the gitref (git describe --long --tags), Go version, and Teleport version. The value of this gauge is always 1. |
| teleport_cache_events | counter | Teleport | Number of events received by a Teleport service cache. Teleport's Auth Service, Proxy Service, and other services cache incoming events related to their service. |
| teleport_cache_stale_events | counter | Teleport | Number of stale events received by a Teleport service cache. A high percentage of stale events can indicate a degraded backend. |
| tx | counter | Teleport | Number of bytes transmitted during an SSH connection. |
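Two of the metrics above need a little interpretation before they are useful on a dashboard: process_state encodes its state as a number, and teleport_cache_stale_events is most meaningful as a share of teleport_cache_events. The sketch below maps the documented state codes to names and computes the stale-event ratio from the increase of both counters over the same window. The function names are hypothetical, and any alert threshold on the ratio is your own choice rather than a documented Teleport value.

```python
# State codes as documented for the process_state gauge.
PROCESS_STATES = {0: "ok", 1: "recovering", 2: "degraded", 3: "starting"}


def describe_process_state(value):
    """Map a process_state gauge value to its documented state name."""
    return PROCESS_STATES.get(int(value), "unknown")


def stale_event_ratio(cache_events, stale_events):
    """Fraction of cache events that were stale over the same window.

    Both arguments are counter increases (for example, from two scrapes).
    Returns 0.0 when no cache events were received.
    """
    if cache_events <= 0:
        return 0.0
    return stale_events / cache_events
```

For example, 50 stale events out of 200 cache events in a window gives a ratio of 0.25, which would be worth investigating alongside backend latency metrics.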

Go runtime metrics

These metrics are surfaced by the Go runtime and are not specific to Teleport.

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| go_gc_duration_seconds | summary | Internal Go | A summary of GC invocation durations. |
| go_goroutines | gauge | Internal Go | Number of goroutines that currently exist. |
| go_info | gauge | Internal Go | Information about the Go environment. |
| go_memstats_alloc_bytes_total | counter | Internal Go | Total number of bytes allocated, even if freed. |
| go_memstats_alloc_bytes | gauge | Internal Go | Number of bytes allocated and still in use. |
| go_memstats_buck_hash_sys_bytes | gauge | Internal Go | Number of bytes used by the profiling bucket hash table. |
| go_memstats_frees_total | counter | Internal Go | Total number of frees. |
| go_memstats_gc_cpu_fraction | gauge | Internal Go | The fraction of this program's available CPU time used by the GC since the program started. |
| go_memstats_gc_sys_bytes | gauge | Internal Go | Number of bytes used for garbage collection system metadata. |
| go_memstats_heap_alloc_bytes | gauge | Internal Go | Number of heap bytes allocated and still in use. |
| go_memstats_heap_idle_bytes | gauge | Internal Go | Number of heap bytes waiting to be used. |
| go_memstats_heap_inuse_bytes | gauge | Internal Go | Number of heap bytes that are in use. |
| go_memstats_heap_objects | gauge | Internal Go | Number of allocated objects. |
| go_memstats_heap_released_bytes | gauge | Internal Go | Number of heap bytes released to the OS. |
| go_memstats_heap_sys_bytes | gauge | Internal Go | Number of heap bytes obtained from the system. |
| go_memstats_last_gc_time_seconds | gauge | Internal Go | Number of seconds since the Unix epoch of the last garbage collection. |
| go_memstats_lookups_total | counter | Internal Go | Total number of pointer lookups. |
| go_memstats_mallocs_total | counter | Internal Go | Total number of mallocs. |
| go_memstats_mcache_inuse_bytes | gauge | Internal Go | Number of bytes in use by mcache structures. |
| go_memstats_mcache_sys_bytes | gauge | Internal Go | Number of bytes used for mcache structures obtained from the system. |
| go_memstats_mspan_inuse_bytes | gauge | Internal Go | Number of bytes in use by mspan structures. |
| go_memstats_mspan_sys_bytes | gauge | Internal Go | Number of bytes used for mspan structures obtained from the system. |
| go_memstats_next_gc_bytes | gauge | Internal Go | Number of heap bytes when the next garbage collection will take place. |
| go_memstats_other_sys_bytes | gauge | Internal Go | Number of bytes used for other system allocations. |
| go_memstats_stack_inuse_bytes | gauge | Internal Go | Number of bytes in use by the stack allocator. |
| go_memstats_stack_sys_bytes | gauge | Internal Go | Number of bytes obtained from the system for the stack allocator. |
| go_memstats_sys_bytes | gauge | Internal Go | Number of bytes obtained from the system. |
| go_threads | gauge | Internal Go | Number of OS threads created. |
| process_cpu_seconds_total | counter | Internal Go | Total user and system CPU time spent in seconds. |
| process_max_fds | gauge | Internal Go | Maximum number of open file descriptors. |
| process_open_fds | gauge | Internal Go | Number of open file descriptors. |
| process_resident_memory_bytes | gauge | Internal Go | Resident memory size in bytes. |
| process_start_time_seconds | gauge | Internal Go | Start time of the process since the Unix epoch in seconds. |
| process_virtual_memory_bytes | gauge | Internal Go | Virtual memory size in bytes. |
| process_virtual_memory_max_bytes | gauge | Internal Go | Maximum amount of virtual memory available in bytes. |
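Because these are generic Go runtime and process metrics, a few derived values are often more useful than the raw numbers. The sketch below computes file descriptor utilization from process_open_fds and process_max_fds, and process uptime from process_start_time_seconds. The function names are hypothetical, and any alerting threshold (for example, paging as utilization approaches 1.0) is a choice for your environment, not a Teleport recommendation.

```python
import time


def fd_utilization(open_fds, max_fds):
    """Fraction of the file descriptor limit in use.

    Uses process_open_fds and process_max_fds; returns None if no positive
    limit is reported.
    """
    if max_fds <= 0:
        return None
    return open_fds / max_fds


def uptime_seconds(start_time_seconds, now=None):
    """Seconds since the process started, from process_start_time_seconds."""
    if now is None:
        now = time.time()  # current Unix time in seconds
    return now - start_time_seconds
```

For example, 512 open descriptors against a limit of 1024 is a utilization of 0.5.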

Prometheus

| Name | Type | Component | Description |
| --- | --- | --- | --- |
| promhttp_metric_handler_requests_in_flight | gauge | prometheus | Current number of scrapes being served. |
| promhttp_metric_handler_requests_total | counter | prometheus | Total number of scrapes by HTTP status code. |