Scaling

Available for: Open Source, Enterprise

This section explains the recommended configuration settings for large-scale self-hosted deployments of Teleport.

Teleport Team takes care of this setup for you so you can provide secure access to your infrastructure right away.

Get started with a free trial of Teleport Team.

Prerequisites

  • Teleport v14.0.1 Open Source or Enterprise.

Hardware recommendations

Set up Teleport with a High Availability configuration.

Scenario | Max Recommended Count | Proxy | Auth Server | AWS Instance Types
Teleport SSH Nodes connected to Auth Service | 10,000 | 2x 4 vCPUs, 8GB RAM | 2x 8 vCPUs, 16GB RAM | m4.2xlarge
Teleport SSH Nodes connected to Auth Service | 50,000 | 2x 4 vCPUs, 16GB RAM | 2x 8 vCPUs, 16GB RAM | m4.2xlarge
Teleport SSH Nodes connected to Proxy Service through reverse tunnels | 10,000 | 2x 4 vCPUs, 8GB RAM | 2x 8 vCPUs, 16+GB RAM | m4.2xlarge

Auth and Proxy configuration

Raise Teleport's connection limits from the default of 15,000 to 65,000:

# Teleport Auth and Proxy
teleport:
  connection_limits:
    # Raise the maximum number of simultaneous connections from the default of 15000.
    max_connections: 65000
    max_users: 1000

Agent configuration

Agents cache roles and other configuration locally to make access-control decisions quickly. By default, agents are fairly aggressive in trying to re-initialize their caches if they lose connectivity to the Auth Service. In very large clusters, this can contribute to a "thundering herd" effect, where control plane elements experience excess load immediately after a restart. Setting the max_backoff parameter to a value in the 8-16 minute range can help mitigate this effect:

teleport:
  cache:
    enabled: yes
    # Spread cache re-initialization over a longer window to avoid
    # a thundering herd after a restart.
    max_backoff: 12m

Kernel parameters

Tweak Teleport's systemd unit parameters to allow a higher number of open files:

[Service]
LimitNOFILE=65536
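
One way to apply this without editing the packaged unit file is a systemd drop-in override. The sketch below assumes the service is named teleport.service, as it is in the official packages:

# Create a drop-in override for the Teleport unit.
sudo mkdir -p /etc/systemd/system/teleport.service.d
sudo tee /etc/systemd/system/teleport.service.d/override.conf <<'EOF'
[Service]
LimitNOFILE=65536
EOF

# Reload systemd and restart Teleport so the new limit takes effect.
sudo systemctl daemon-reload
sudo systemctl restart teleport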

Verify that Teleport's process has high enough file limits:

cat /proc/$(pidof teleport)/limits

Limit            Soft Limit  Hard Limit  Units
Max open files   65536       65536       files

DynamoDB configuration

When using Teleport with DynamoDB, we recommend using on-demand provisioning. This allows DynamoDB to scale with cluster load.

For customers that cannot use on-demand provisioning, we recommend at least 250 WCU and 100 RCU for clusters of 10,000 nodes.
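
For reference, on-demand capacity is selected through the billing_mode field of the DynamoDB storage backend in teleport.yaml on the Auth Service. This is a minimal sketch; the region and table_name values are placeholders:

teleport:
  storage:
    type: dynamodb
    # Placeholder region and table name; substitute your own.
    region: us-east-1
    table_name: teleport-backend
    # pay_per_request enables on-demand provisioning.
    billing_mode: pay_per_request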

etcd

When using Teleport with etcd, we recommend the following; a sketch of the corresponding backend configuration appears after the list.

  • For performance, use the fastest SSDs available and ensure low-latency network connectivity between etcd peers. See the etcd Hardware recommendations guide for more details.
  • For debugging, ingest etcd's Prometheus metrics and visualize them over time using a dashboard. See the etcd Metrics guide for more details.
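
As a minimal sketch, the etcd backend is configured in the storage section of teleport.yaml on the Auth Service. The peer address and TLS paths below are placeholders matching the etcdctl command that follows:

teleport:
  storage:
    type: etcd
    # Placeholder peer address and TLS paths; substitute your own.
    peers: ["https://127.0.0.1:2379"]
    tls_ca_file: /path/to/ca.cert
    tls_cert_file: /path/to/cert
    tls_key_file: /path/to/key.pem
    prefix: teleport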

During an incident, we may ask you to run etcdctl. Verify ahead of time that you can run the following command successfully:

etcdctl \
  --write-out=table \
  --cacert=/path/to/ca.cert \
  --cert=/path/to/cert \
  --key=/path/to/key.pem \
  --endpoints=127.0.0.1:2379 \
  endpoint status