
Teleport Cloud does not expose monitoring endpoints for the Auth Service and Proxy Service.
Teleport metrics are intended for performance monitoring. If you'd like to monitor Teleport usage, consider using our Event Handler plugin to push audit events into your preferred log aggregation system (Elastic, Splunk, Sumo Logic, etc.).
- Audit Events and Records
- Forwarding events with Fluentd
- Monitor Teleport Audit Events with the Elastic Stack
The following metrics are available:
Auth Service and backends
Name | Type | Component | Description |
---|---|---|---|
audit_failed_disk_monitoring | counter | Teleport Audit Log | Number of times disk monitoring failed. |
audit_failed_emit_events | counter | Teleport Audit Log | Number of times emitting audit events failed. |
audit_percentage_disk_space_used | gauge | Teleport Audit Log | Percentage of disk space used. |
audit_server_open_files | gauge | Teleport Audit Log | Number of open audit files. |
auth_generate_requests_throttled_total | counter | Teleport Auth | Number of throttled requests to generate new server keys. |
auth_generate_requests_total | counter | Teleport Auth | Number of requests to generate new server keys. |
auth_generate_requests | gauge | Teleport Auth | Number of current generate requests. |
auth_generate_seconds | histogram | Teleport Auth | Latency for generate requests. |
backend_batch_read_requests_total | counter | cache | Number of batch read requests to the backend. |
backend_batch_read_seconds | histogram | cache | Latency for batch read operations. |
backend_batch_write_requests_total | counter | cache | Number of batch write requests to the backend. |
backend_batch_write_seconds | histogram | cache | Latency for backend batch write operations. |
backend_read_requests_total | counter | cache | Number of read requests to the backend. |
backend_read_seconds | histogram | cache | Latency for read operations. |
backend_requests | counter | cache | Number of write requests to the backend. |
backend_write_seconds | histogram | cache | Latency for backend write operations. |
cluster_name_not_found_total | counter | Teleport Auth | Number of times a cluster was not found. |
dynamo_requests_total | counter | DynamoDB | Total number of requests to the DynamoDB API. |
dynamo_requests | counter | DynamoDB | Total number of requests to the DynamoDB API grouped by result. |
dynamo_requests_seconds | histogram | DynamoDB | Latency of DynamoDB API requests. |
etcd_backend_batch_read_requests | counter | etcd | Number of batch read requests to the etcd database. |
etcd_backend_batch_read_seconds | histogram | etcd | Latency for etcd batch read operations. |
etcd_backend_read_requests | counter | etcd | Number of read requests to the etcd database. |
etcd_backend_read_seconds | histogram | etcd | Latency for etcd read operations. |
etcd_backend_tx_requests | counter | etcd | Number of transaction requests to the database. |
etcd_backend_tx_seconds | histogram | etcd | Latency for etcd transaction operations. |
etcd_backend_write_requests | counter | etcd | Number of write requests to the database. |
etcd_backend_write_seconds | histogram | etcd | Latency for etcd write operations. |
firestore_events_backend_batch_read_requests | counter | GCP Cloud Firestore | Number of batch read requests to Cloud Firestore events. |
firestore_events_backend_batch_read_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events batch read operations. |
firestore_events_backend_batch_write_requests | counter | GCP Cloud Firestore | Number of batch write requests to Cloud Firestore events. |
firestore_events_backend_batch_write_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events batch write operations. |
firestore_events_backend_write_requests | counter | GCP Cloud Firestore | Number of write requests to Cloud Firestore events. |
firestore_events_backend_write_seconds | histogram | GCP Cloud Firestore | Latency for Cloud Firestore events write operations. |
gcs_event_storage_downloads_seconds | histogram | GCP GCS | Latency for GCS download operations. |
gcs_event_storage_downloads | counter | GCP GCS | Number of downloads from the GCS backend. |
gcs_event_storage_uploads_seconds | histogram | GCP GCS | Latency for GCS upload operations. |
gcs_event_storage_uploads | counter | GCP GCS | Number of uploads to the GCS backend. |
grpc_server_started_total | counter | Teleport Auth | Total number of RPCs started on the server. |
grpc_server_handled_total | counter | Teleport Auth | Total number of RPCs completed on the server, regardless of success or failure. |
grpc_server_msg_received_total | counter | Teleport Auth | Total number of RPC stream messages received on the server. |
grpc_server_msg_sent_total | counter | Teleport Auth | Total number of gRPC stream messages sent by the server. |
heartbeat_connections_received_total | counter | Teleport Auth | Number of times the Auth Service received a heartbeat connection. |
s3_requests_total | counter | Amazon S3 | Total number of requests to the S3 API. |
s3_requests | counter | Amazon S3 | Total number of requests to the S3 API grouped by result. |
s3_requests_seconds | histogram | Amazon S3 | Request latency for the S3 API. |
teleport_audit_emit_events | counter | Teleport Audit Log | Number of audit events emitted. |
teleport_audit_parquetlog_batch_processing_seconds | histogram | Teleport Audit Log | Duration of processing a single batch of events in the Parquet-format audit log. |
teleport_audit_parquetlog_s3_flush_seconds | histogram | Teleport Audit Log | Duration of flushing Parquet files to S3 in the Parquet-format audit log. |
teleport_audit_parquetlog_delete_events_seconds | histogram | Teleport Audit Log | Duration of deleting events from SQS in the Parquet-format audit log. |
teleport_audit_parquetlog_batch_size | histogram | Teleport Audit Log | Overall size of events in a single batch in the Parquet-format audit log. |
teleport_audit_parquetlog_batch_count | counter | Teleport Audit Log | Total number of events in a single batch in the Parquet-format audit log. |
teleport_audit_parquetlog_last_processed_timestamp | gauge | Teleport Audit Log | Time of the last processing in the Parquet-format audit log. |
teleport_audit_parquetlog_age_oldest_processed_message | gauge | Teleport Audit Log | Age of the oldest processed event in the Parquet-format audit log. |
teleport_audit_parquetlog_errors_from_collect_count | counter | Teleport Audit Log | Number of collect failures in the Parquet-format audit log. |
teleport_connected_resources | gauge | Teleport Auth | Number and type of resources connected via keepalives. |
teleport_registered_servers | gauge | Teleport Auth | The number of Teleport services that are connected to an Auth Service instance grouped by version. |
user_login_total | counter | Teleport Auth | Number of user logins. |
teleport_migrations | gauge | Teleport Auth | Tracks for each migration if it is active (1) or not (0). |
watcher_event_sizes | histogram | cache | Overall size of events emitted. |
watcher_events | histogram | cache | Per resource size of events emitted. |
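
Several of the metrics above, such as auth_generate_seconds and backend_read_seconds, are histograms. Prometheus exposes a histogram as cumulative bucket counts keyed by an upper bound (the `le` label), so a latency percentile has to be estimated from those buckets. The sketch below shows one way to do that with linear interpolation inside a bucket, similar in spirit to what PromQL's `histogram_quantile` does. The function name `estimate_quantile` and the bucket values in the usage note are hypothetical and chosen for illustration only.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) of a Prometheus-style histogram.

    `buckets` is a list of (upper_bound, cumulative_count) pairs, one per
    `le` bucket. The largest bound is expected to be math.inf.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet

    rank = q * total  # position of the requested quantile among observations
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies in the overflow bucket, which has no
                # finite upper bound: fall back to the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, with illustrative buckets `[(0.1, 10), (0.5, 30), (1.0, 40), (math.inf, 40)]`, `estimate_quantile(0.5, ...)` interpolates inside the 0.1 to 0.5 second bucket. The accuracy of the estimate depends on how finely the bucket bounds are spaced around the quantile of interest.
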
Enhanced Session Recording / BPF
Name | Type | Component | Description |
---|---|---|---|
bpf_lost_command_events | counter | BPF | Number of lost command events. |
bpf_lost_disk_events | counter | BPF | Number of lost disk events. |
bpf_lost_network_events | counter | BPF | Number of lost network events. |
Proxy Service
Name | Type | Component | Description |
---|---|---|---|
failed_connect_to_node_attempts_total | counter | Teleport Proxy | Number of failed SSH connection attempts to the SSH Service. Use with teleport_connect_to_node_attempts_total to get the failure rate. |
failed_login_attempts_total | counter | Teleport Proxy | Number of failed tsh login or tsh ssh login attempts. |
grpc_client_started_total | counter | Teleport Proxy | Total number of RPCs started on the client. |
grpc_client_handled_total | counter | Teleport Proxy | Total number of RPCs completed on the client, regardless of success or failure. |
grpc_client_msg_received_total | counter | Teleport Proxy | Total number of RPC stream messages received on the client. |
grpc_client_msg_sent_total | counter | Teleport Proxy | Total number of gRPC stream messages sent by the client. |
proxy_connection_limit_exceeded_total | counter | Teleport Proxy | Number of connections that exceeded the Proxy Service connection limit. |
proxy_peer_client_dial_error_total | counter | Teleport Proxy | Total number of errors encountered dialing peer Proxy Service instances. |
proxy_peer_server_connections | gauge | Teleport Proxy | Number of currently open connections to peer Proxy Service instances. |
proxy_peer_client_rpc | gauge | Teleport Proxy | Number of current client RPC requests. |
proxy_peer_client_rpc_total | counter | Teleport Proxy | Total number of client RPC requests. |
proxy_peer_client_rpc_duration_seconds | histogram | Teleport Proxy | Duration in seconds of RPCs sent by the client. |
proxy_peer_client_message_sent_size | histogram | Teleport Proxy | Size of messages sent by the client. |
proxy_peer_client_message_received_size | histogram | Teleport Proxy | Size of messages received by the client. |
proxy_peer_server_connections | gauge | Teleport Proxy | Number of currently open connections to peer Proxy Service clients. |
proxy_peer_server_rpc | gauge | Teleport Proxy | Number of current server RPC requests. |
proxy_peer_server_rpc_total | counter | Teleport Proxy | Total number of server RPC requests. |
proxy_peer_server_rpc_duration_seconds | histogram | Teleport Proxy | Duration in seconds of RPCs sent by the server. |
proxy_peer_server_message_sent_size | histogram | Teleport Proxy | Size of messages sent by the server. |
proxy_peer_server_message_received_size | histogram | Teleport Proxy | Size of messages received by the server. |
proxy_ssh_sessions_total | gauge | Teleport Proxy | Number of active sessions through this Proxy Service instance. |
proxy_missing_ssh_tunnels | gauge | Teleport Proxy | Number of missing SSH tunnels. Used to debug if Teleport instances have discovered all Proxy Service instances. |
remote_clusters | gauge | Teleport Proxy | Number of inbound connections from leaf clusters. |
teleport_connect_to_node_attempts_total | counter | Teleport Proxy | Number of SSH connection attempts to the SSH Service. Use with failed_connect_to_node_attempts_total to get the failure rate. |
teleport_reverse_tunnels_connected | gauge | Teleport Proxy | Number of reverse SSH tunnels connected to the Teleport Proxy Service by Teleport instances. |
trusted_clusters | gauge | Teleport Proxy | Number of outbound connections to leaf clusters. |
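
The table above suggests combining failed_connect_to_node_attempts_total with teleport_connect_to_node_attempts_total to get a failure rate. Because both are counters that only grow, the rate over a time window comes from the increase of each counter between two scrapes, not from the raw values. The following is a minimal sketch of that calculation. The helper names `counter_increase` and `connect_failure_rate` are hypothetical, and the reset handling assumes a counter restarts from zero when the process restarts.

```python
def counter_increase(previous, current):
    """Return how much a counter grew between two scrapes.

    If the current value is lower than the previous one, the counter is
    assumed to have been reset (for example by a process restart), and the
    current value is taken as the increase since the reset.
    """
    if current < previous:
        return current
    return current - previous


def connect_failure_rate(prev_attempts, cur_attempts, prev_failed, cur_failed):
    """Fraction of SSH connection attempts that failed between two scrapes.

    The *_attempts arguments are samples of
    teleport_connect_to_node_attempts_total; the *_failed arguments are
    samples of failed_connect_to_node_attempts_total.
    Returns None when there were no attempts in the window.
    """
    attempts = counter_increase(prev_attempts, cur_attempts)
    failed = counter_increase(prev_failed, cur_failed)
    if attempts == 0:
        return None  # the rate is undefined without any attempts
    return failed / attempts
```

With illustrative samples, `connect_failure_rate(100, 150, 4, 9)` describes a window with 50 attempts and 5 failures. In a Prometheus setup the same ratio would normally be expressed as a query over the two counters; the Python version only makes the arithmetic explicit.
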
Teleport SSH Service
Name | Type | Component | Description |
---|---|---|---|
user_max_concurrent_sessions_hit_total | counter | Teleport SSH | Number of times a user exceeded their concurrent session limit. |
All Teleport instances
Name | Type | Component | Description |
---|---|---|---|
certificate_mismatch_total | counter | Teleport | Number of SSH server login failures due to a certificate mismatch. |
reversetunnel_connected_proxies | gauge | Teleport | Number of known proxies being sought. |
rx | counter | Teleport | Number of bytes received during an SSH connection. |
server_interactive_sessions_total | gauge | Teleport | Number of active sessions. |
teleport_build_info | gauge | Teleport | Provides build information of Teleport including gitref (git describe --long --tags), Go version, and Teleport version. The value of this gauge will always be 1. |
teleport_cache_events | counter | Teleport | Number of events received by a Teleport service cache. Teleport's Auth Service, Proxy Service, and other services cache incoming events related to their service. |
teleport_cache_stale_events | counter | Teleport | Number of stale events received by a Teleport service cache. A high percentage of stale events can indicate a degraded backend. |
tx | counter | Teleport | Number of bytes transmitted during an SSH connection. |
Golang runtime metrics
Name | Type | Component | Description |
---|---|---|---|
go_gc_duration_seconds | summary | Internal Golang | A summary of GC invocation durations. |
go_goroutines | gauge | Internal Golang | Number of goroutines that currently exist. |
go_info | gauge | Internal Golang | Information about the Go environment. |
go_memstats_alloc_bytes_total | counter | Internal Golang | Total number of bytes allocated, even if freed. |
go_memstats_alloc_bytes | gauge | Internal Golang | Number of bytes allocated and still in use. |
go_memstats_buck_hash_sys_bytes | gauge | Internal Golang | Number of bytes used by the profiling bucket hash table. |
go_memstats_frees_total | counter | Internal Golang | Total number of frees. |
go_memstats_gc_cpu_fraction | gauge | Internal Golang | The fraction of this program's available CPU time used by the GC since the program started. |
go_memstats_gc_sys_bytes | gauge | Internal Golang | Number of bytes used for garbage collection system metadata. |
go_memstats_heap_alloc_bytes | gauge | Internal Golang | Number of heap bytes allocated and still in use. |
go_memstats_heap_idle_bytes | gauge | Internal Golang | Number of heap bytes waiting to be used. |
go_memstats_heap_inuse_bytes | gauge | Internal Golang | Number of heap bytes that are in use. |
go_memstats_heap_objects | gauge | Internal Golang | Number of allocated objects. |
go_memstats_heap_released_bytes | gauge | Internal Golang | Number of heap bytes released to the OS. |
go_memstats_heap_sys_bytes | gauge | Internal Golang | Number of heap bytes obtained from the system. |
go_memstats_last_gc_time_seconds | gauge | Internal Golang | Number of seconds since the Unix epoch of the last garbage collection. |
go_memstats_lookups_total | counter | Internal Golang | Total number of pointer lookups. |
go_memstats_mallocs_total | counter | Internal Golang | Total number of mallocs. |
go_memstats_mcache_inuse_bytes | gauge | Internal Golang | Number of bytes in use by mcache structures. |
go_memstats_mcache_sys_bytes | gauge | Internal Golang | Number of bytes used for mcache structures obtained from system. |
go_memstats_mspan_inuse_bytes | gauge | Internal Golang | Number of bytes in use by mspan structures. |
go_memstats_mspan_sys_bytes | gauge | Internal Golang | Number of bytes used for mspan structures obtained from system. |
go_memstats_next_gc_bytes | gauge | Internal Golang | Number of heap bytes at which the next garbage collection will take place. |
go_memstats_other_sys_bytes | gauge | Internal Golang | Number of bytes used for other system allocations. |
go_memstats_stack_inuse_bytes | gauge | Internal Golang | Number of bytes in use by the stack allocator. |
go_memstats_stack_sys_bytes | gauge | Internal Golang | Number of bytes obtained from the system for stack allocator. |
go_memstats_sys_bytes | gauge | Internal Golang | Number of bytes obtained from the system. |
go_threads | gauge | Internal Golang | Number of OS threads created. |
process_cpu_seconds_total | counter | Internal Golang | Total user and system CPU time spent in seconds. |
process_max_fds | gauge | Internal Golang | Maximum number of open file descriptors. |
process_open_fds | gauge | Internal Golang | Number of open file descriptors. |
process_resident_memory_bytes | gauge | Internal Golang | Resident memory size in bytes. |
process_start_time_seconds | gauge | Internal Golang | Start time of the process since the Unix epoch in seconds. |
process_virtual_memory_bytes | gauge | Internal Golang | Virtual memory size in bytes. |
process_virtual_memory_max_bytes | gauge | Internal Golang | Maximum amount of virtual memory available in bytes. |
Prometheus
Name | Type | Component | Description |
---|---|---|---|
promhttp_metric_handler_requests_in_flight | gauge | prometheus | Current number of scrapes being served. |
promhttp_metric_handler_requests_total | counter | prometheus | Total number of scrapes by HTTP status code. |
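
The metrics listed on this page are served in the Prometheus text exposition format, where each sample is a line of the form `name{labels} value` and lines starting with `#` carry help and type metadata. The sketch below parses such text into (name, labels, value) tuples, which can be handy for ad hoc checks without a full Prometheus server. It is a simplified reader, not a complete implementation of the format: the `parse_metrics` helper is hypothetical, escaped characters in label values are kept as-is, and the values in `EXAMPLE` are made up for illustration.

```python
import re

# metric name, optional {label="value",...} block, value, optional timestamp
_SAMPLE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# a single label="value" pair; backslash escapes are matched but not decoded
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text-format output into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative input only; real output contains many more series.
EXAMPLE = """\
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 42
promhttp_metric_handler_requests_total{code="200"} 17
"""

for name, labels, value in parse_metrics(EXAMPLE):
    print(name, labels, value)
```

For production use, an existing Prometheus client library or a Prometheus server is a better fit than a hand-written parser, since the full format also covers escaping rules and histogram and summary series.
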