Deploying a High Availability Teleport Cluster

When deploying Teleport in production, you should design your deployment to ensure that users can continue to access infrastructure during an incident in your Teleport cluster. You should also make it possible to scale your Auth Service and Proxy Service as you register more users and resources with Teleport.

In this guide, we will explain the components of a high-availability Teleport deployment.

Teleport Cloud takes care of this setup for you so you can provide secure access to your infrastructure right away.

Get started with a free trial of Teleport Cloud.

Overview

A high-availability Teleport cluster revolves around a group of redundant teleport processes, each of which runs the Auth Service and Proxy Service, plus the infrastructure required to support them.

This includes:

  • A Layer 4 load balancer to direct traffic from users and services to an available teleport process.
  • A cluster state backend. This is a key-value store for cluster state and audit events that all Auth Service instances can access. This requires permissions for Auth Service instances to manage records within the key-value store.
  • A session recording backend. This is an object storage service where the Auth Service uploads session recordings. The session recording backend requires permissions for teleport instances to manage objects within the storage service.
  • A TLS credential provisioning system. You need a way to obtain TLS credentials from a certificate authority like Let's Encrypt (or an internal public key infrastructure), provision teleport instances with them, and renew them periodically.
  • A DNS service you can use to create records for the Teleport Proxy Service. If you are using Let's Encrypt for TLS credentials, the TLS credential provisioner will need to manage DNS records to demonstrate control over your domain name.

[Diagram of a high-availability Teleport architecture]

Layer 4 load balancer

The load balancer forwards traffic from users and services to an available Teleport instance. This must not terminate TLS, and must transparently forward the TCP traffic it receives. In other words, this must be a Layer 4 load balancer, not a Layer 7 (e.g., HTTP) load balancer.

We recommend configuring your load balancer to route traffic across multiple zones (if using a cloud provider) or data centers (if using an on-premise solution) to ensure availability.

TLS Routing

Your load balancer configuration depends on whether you will enable TLS Routing in your Teleport cluster.

With TLS Routing, the Teleport Proxy Service uses application-layer protocol negotiation (ALPN) to handle all communication with users and services via the HTTPS port, regardless of protocol. Without TLS Routing, the Proxy Service listens on separate ports for each protocol.

The advantage of TLS Routing is its simplicity: you can expose only a single port on the load balancer to the public internet.

The disadvantage of TLS Routing is that it is impossible to implement Layer 7 load balancing for HTTPS traffic, since traffic that reaches the HTTPS port can use any supported protocol.

The approach we describe in this guide uses only a Layer 4 load balancer to minimize the infrastructure you will deploy, but users that require a separate load balancer for HTTPS traffic should disable TLS Routing.

Configuring the load balancer

Configure the load balancer to forward traffic from the following ports on the load balancer to the corresponding port on an available Teleport instance. The configuration depends on whether you will enable TLS Routing.

If you enable TLS Routing, only one port is required:

| Port | Description |
|------|-------------|
| 443  | ALPN port for TLS Routing. |

If you do not enable TLS Routing, these ports are required:

| Port | Description |
|------|-------------|
| 3023 | SSH port that clients connect to. |
| 3024 | SSH port used to create reverse SSH tunnels from behind-firewall environments. |
| 443  | HTTPS connections to authenticate tsh users into the cluster. The same connection is used to serve the Web UI. |

You can leave these ports closed if you are not using their corresponding services:

| Port | Description |
|------|-------------|
| 3026 | HTTPS Kubernetes proxy. |
| 3036 | MySQL port. |
| 5432 | Postgres port. |

Cluster state backend

The Teleport Auth Service stores cluster state (such as dynamic configuration resources) and audit events as key/value pairs. In high-availability deployments, you must configure the Auth Service to manage this data in a key-value store that runs outside of your cluster of Teleport instances.

The Auth Service supports the following backends for cluster state and audit events:

  • Amazon DynamoDB
  • Google Cloud Firestore
  • etcd (cluster state only)

For Amazon DynamoDB and Google Cloud Firestore, your Teleport configuration (which we will describe in more detail in the Configuration section) names a table or collection where Teleport stores cluster state and audit events.

The Teleport Auth Service manages the creation of any required DynamoDB tables or Firestore collections itself, and does not require them to exist in advance.

The Auth Service can also store cluster state in self-hosted etcd deployments. In this case, Teleport uses namespaces within item keys to identify cluster state data.
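As an illustration, an etcd-backed cluster state configuration might look like the following sketch. The peer addresses, credential paths, and prefix here are hypothetical, and the Backends Reference remains the authoritative source for these fields:

```yaml
# Illustrative etcd backend configuration; the host names and file paths
# are hypothetical. etcd stores cluster state only, so audit events and
# session recordings must use other backends.
teleport:
  storage:
    type: etcd
    peers:
      - "https://etcd-0.example.com:2379"
      - "https://etcd-1.example.com:2379"
    tls_key_file: /var/lib/teleport/etcd-key.pem
    tls_cert_file: /var/lib/teleport/etcd-cert.pem
    tls_ca_file: /var/lib/teleport/etcd-ca.pem
    # Namespace prefix under which Teleport stores its keys.
    prefix: /teleport
```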

Required permissions

In your cloud provider's RBAC solution (e.g., AWS or Google Cloud IAM), your Auth Service instances need permissions to read from and write to your chosen key/value store, as well as to create tables and collections (if your key/value store supports them).

Session recording backend

High-availability Teleport deployments use an object storage service for persisting session recordings. The Teleport Auth Service supports two object storage services:

  • Amazon S3 (or an S3-compatible object store)
  • Google Cloud Storage

In your Teleport configuration (described in the Configuration section), you must name a bucket within Google Cloud Storage or an Amazon S3-compatible service to use for managing session recordings. The Teleport Auth Service creates this bucket, so to prevent unexpected behavior, you should not create it in advance.

Required permissions

In your cloud provider's RBAC solution, your Auth Service instances need permissions to get buckets as well as to create, get, list, and update objects. Since this setup lets Teleport create buckets for you, you should also assign Auth Service instances permissions to create buckets.

In Google Cloud Storage, Auth Service instances also need permissions to delete objects.

TLS credential provisioning

High-availability Teleport deployments require a system to fetch TLS credentials from a certificate authority like Let's Encrypt, AWS Certificate Manager, Digicert, or a trusted internal authority. The system must then provision Teleport Proxy Service instances with these credentials and renew them periodically.

If you are running a single instance of the Teleport Auth Service and Proxy Service, you can configure this instance to fetch credentials for itself from Let's Encrypt using the ACME ALPN-01 challenge, where Teleport demonstrates that it controls the ALPN server at the HTTPS address of your Teleport Proxy Service. Teleport also fetches a separate certificate for each application you have registered with Teleport, e.g., grafana.teleport.example.com.

For high-availability deployments that use Let's Encrypt to supply TLS credentials to Teleport instances running behind a load balancer, you will need to use the ACME DNS-01 challenge to demonstrate domain name ownership to Let's Encrypt. In this challenge, your TLS credential provisioning system creates a DNS TXT record with a value expected by Let's Encrypt.

In the configuration we are demonstrating in this guide, each Teleport Proxy Service instance expects TLS credentials for HTTPS to be available at the file paths /etc/teleport-tls/tls.key (private key) and /etc/teleport-tls/tls.crt (certificate).
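On Kubernetes, one common way to implement such a system is cert-manager with a DNS-01 solver. The sketch below assumes cert-manager is installed and that a ClusterIssuer named letsencrypt-dns01 already exists with a DNS-01 solver for your zone; these names are illustrative and not part of Teleport itself:

```yaml
# Illustrative cert-manager Certificate; the namespace, secret name, and
# issuer name are hypothetical. cert-manager writes the renewed credential
# to a Secret containing tls.key and tls.crt keys, which you can mount
# into the Proxy Service pods at /etc/teleport-tls.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: teleport-tls
  namespace: teleport
spec:
  secretName: teleport-tls
  dnsNames:
    - teleport.example.com
    - "*.teleport.example.com"
  issuerRef:
    name: letsencrypt-dns01
    kind: ClusterIssuer
```

With this approach, renewal is automatic: cert-manager re-runs the DNS-01 challenge before expiry and updates the Secret in place.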

DNS service

Set up a DNS zone where you can create records for Teleport, e.g., an Amazon Route 53 hosted zone or Google Cloud DNS zone.

Teleport Proxy Service records

Users and services must be able to reach the Teleport Proxy Service in order to connect to your Teleport cluster. Since a high availability setup runs Teleport instances behind a load balancer, you must create a DNS record that points to the load balancer.

Depending on how your infrastructure's DNS is organized, this will be one of the following, assuming your domain is example.com:

| Record Type | Domain Name | Value |
|-------------|-------------|-------|
| A | teleport.example.com | The IP address of your load balancer |
| CNAME | teleport.example.com | The domain name of your load balancer |

Registering applications with Teleport

Teleport assigns a subdomain to each application you have connected to Teleport (e.g., grafana.teleport.example.com), so you will need to ensure that a DNS record exists for each application-specific subdomain so clients can access your applications via Teleport.

You should create either a separate DNS record for each subdomain or a single record with a wildcard subdomain such as *.teleport.example.com.

Create one of the following wildcard DNS records so you can register any application with Teleport:

| Record Type | Domain Name | Value |
|-------------|-------------|-------|
| A | *.teleport.example.com | The IP address of your load balancer |
| CNAME | *.teleport.example.com | The domain name of your load balancer |

Required permissions

If you are using Let's Encrypt to provide TLS credentials to your Teleport instances, the TLS credential system we mentioned earlier needs permissions to manage DNS records in order to satisfy Let's Encrypt's DNS-01 challenge.

If you are using cloud-managed solutions, you should use your cloud provider's RBAC system (e.g., AWS IAM) to grant a role to the Proxy Service to manage DNS records.

Teleport instances

Run the Teleport Auth Service and Proxy Service as a scalable group of compute resources, for example, a Kubernetes Deployment or AWS Auto Scaling group. This requires running the teleport binary on each Kubernetes pod or virtual machine in your group.

You should deploy your Teleport instances across multiple zones (if using a cloud provider) or data centers (if using an on-premise solution) to ensure availability.

In the Configuration section, we will show you how to configure each binary for high availability.

Open ports

Ensure that, on each Teleport instance, the following ports allow traffic from the load balancer. The Proxy Service uses these ports to communicate with Teleport users and services.

As with your load balancer configuration, the ports you should open on your Teleport instances depend on whether you will enable TLS Routing:

If you enable TLS Routing, only one port is required:

| Port | Description |
|------|-------------|
| 443  | ALPN port for TLS Routing. |

If you do not enable TLS Routing, these ports are required:

| Port | Description |
|------|-------------|
| 3023 | SSH port that clients connect to. |
| 3024 | SSH port used to create reverse SSH tunnels from behind-firewall environments. |
| 443  | HTTPS connections to authenticate tsh users into the cluster. The same connection is used to serve the Web UI. |

You can leave these ports closed if you are not using their corresponding services:

| Port | Description |
|------|-------------|
| 3026 | HTTPS Kubernetes proxy. |
| 3036 | MySQL port. |
| 5432 | Postgres port. |

This is the same table of ports you used to configure the load balancer.

License file

If you are deploying Teleport Enterprise, you need to download a license file and make it available to your Teleport Auth Service instances.

To obtain your license file, visit the Teleport customer dashboard and log in. Click "DOWNLOAD LICENSE KEY". You will see your current Teleport Enterprise account permissions and the option to download your license file:

[License File modal]

The license file must be available to each Teleport Auth Service instance at /var/lib/teleport/license.pem.

Configuration

Create a configuration file and provide it to each of your Teleport instances at /etc/teleport.yaml. We will explain the required configuration fields for a high-availability Teleport deployment below. These are the minimum requirements, and when planning your high-availability deployment, you will want to follow a more specific deployment guide for your environment.

storage

The first configuration section to write is the storage section, which configures the cluster state backend and session recording backend for the Teleport Auth Service:

version: v3
teleport:
  storage:
    # ...

Consult our Backends Reference for the configuration fields you should set in the storage section.
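For example, a DynamoDB-backed storage section might look like the sketch below. The region, table names, and bucket name are illustrative, and the Backends Reference remains the authoritative source for these fields:

```yaml
# Illustrative AWS-backed storage configuration; the names are hypothetical.
version: v3
teleport:
  storage:
    type: dynamodb
    region: us-east-1
    # Table for cluster state; Teleport creates it if it does not exist.
    table_name: teleport-cluster-state
    # Audit events go to a separate DynamoDB table.
    audit_events_uri: ["dynamodb://teleport-audit-events"]
    # Session recordings go to an S3 bucket.
    audit_sessions_uri: "s3://teleport-session-recordings"
```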

auth_service and proxy_service

The auth_service and proxy_service sections configure the Auth Service and Proxy Service, which we will run together on each Teleport instance. The configuration will depend on whether you are enabling TLS Routing in your cluster:

To enable TLS Routing in your Teleport cluster, add the following to your Teleport configuration:

version: v3
teleport:
  storage:
  # ...
auth_service:
  enabled: true
  cluster_name: "mycluster.example.com"
  # Remove this if not using Teleport Enterprise
  license_file: "/var/lib/license/license.pem"
proxy_service:
  enabled: true
  public_addr: "mycluster.example.com:443"
  https_keypairs:
  - key_file: /etc/teleport-tls/tls.key
    cert_file: /etc/teleport-tls/tls.crt

This configuration has no fields specific to TLS Routing: in configuration version v2 and later, including the v3 configuration we are using here, TLS Routing is enabled by default.

To disable TLS Routing in your Teleport cluster, add the following to your Teleport configuration:

version: v3
teleport:
  storage:
  # ...
auth_service:
  proxy_listener_mode: separate
  enabled: true
  cluster_name: "mycluster.example.com"
  # Remove this if not using Teleport Enterprise
  license_file: "/var/lib/license/license.pem"
proxy_service:
  enabled: true
  listen_addr: 0.0.0.0:3023
  tunnel_listen_addr: 0.0.0.0:3024
  public_addr: "mycluster.example.com:443"
  https_keypairs:
  - key_file: /etc/teleport-tls/tls.key
    cert_file: /etc/teleport-tls/tls.crt

This configuration assigns auth_service.proxy_listener_mode to separate to disable TLS Routing. It also explicitly assigns an SSH port (listen_addr) and reverse tunnel port (tunnel_listen_addr) for the Proxy Service.

The auth_service and proxy_service configurations above have the following required settings for a high-availability Teleport deployment:

  • In the auth_service section, we have enabled the Teleport Auth Service (enabled) and instructed it to find an Enterprise license file at /var/lib/license/license.pem (license_file). Remove the license_file field if you are deploying the open source edition of Teleport.
  • In the proxy_service section, we have enabled the Teleport Proxy Service (enabled) and instructed it to find its TLS credentials in the /etc/teleport-tls directory (https_keypairs).

ssh_service

You can disable the SSH Service on each Teleport instance by adding the following to each instance's configuration file:

version: v3
teleport:
  storage:
  # ...
auth_service:
# ...
proxy_service:
# ...
ssh_service:
  enabled: false

This is suitable for deploying Teleport on Kubernetes, where the teleport pod should not have direct access to the underlying node.

If you are deploying Teleport on a cluster of virtual machines, set enabled to true to run the SSH Service and enable secure access to each host.

Next steps

Refine your plan

Now that you know the general principles behind a high-availability Teleport deployment, read about how to design your own deployment on Kubernetes or a cluster of virtual machines in your cloud of choice:

Ensure high performance

You should also get familiar with how to ensure that your Teleport deployment is performing as expected:

Deploy Teleport services

Once your high-availability Teleport deployment is up and running, you can add resources by launching Teleport services. You can run these services in a separate network from your Teleport cluster.

To get started, read about registering: