
Storage Backends

A Teleport cluster stores different types of data in different locations. By default everything is stored in a local directory on the Auth Service host.

For self-hosted Teleport deployments, you can configure Teleport to integrate with other storage types based on the nature of the stored data (size, read/write ratio, mutability, etc.).

| Data type | Description | Supported storage backends |
| --- | --- | --- |
| core cluster state | Cluster configuration (e.g. users, roles, auth connectors) and identity (e.g. certificate authorities, registered nodes, trusted clusters). | Local directory (SQLite), etcd, PostgreSQL, Amazon DynamoDB, GCP Firestore, CockroachDB |
| audit events | JSON-encoded events from the audit log (e.g. user logins, RBAC changes) | Local directory, PostgreSQL, CockroachDB, Amazon DynamoDB, GCP Firestore |
| session recordings | Raw terminal recordings of interactive user sessions | Local directory, AWS S3 (and any S3-compatible product), GCP Cloud Storage, Azure Blob Storage |
| Teleport instance state | ID and credentials of a non-Auth Teleport instance (e.g. node, proxy) | Local directory, Kubernetes Secret |

Cluster state

Cluster state is stored in a central storage location configured by the Auth Service. The cluster state includes:

  • Agent and Proxy Service membership information, including offline/online status.
  • List of active sessions.
  • List of locally stored users.
  • RBAC configuration (roles and permissions).
  • Dynamic configuration.

There are two ways to achieve High Availability. The first is to "outsource" this function to the infrastructure, for example by using highly available network-based disk volumes (similar to AWS EBS) and by migrating a failed VM to a new host. In this scenario, there's nothing Teleport-specific to be done.

If High Availability cannot be provided by the infrastructure (perhaps you're running Teleport on a bare metal cluster), you can still configure Teleport to run in a highly available fashion.

Teleport Enterprise Cloud takes care of this setup for you so you can provide secure access to your infrastructure right away.


Auth Service State

To run multiple instances of the Teleport Auth Service, you must first switch to one of the high-availability storage backends listed below.

Once you have a high-availability storage backend and multiple instances of the Auth Service running, you'll need to create a load balancer that evenly distributes traffic across all Auth Service instances and gives all components that need to communicate with the Auth Service a single point of entry. Use the address of the load balancer in the auth_server field when configuring other components of Teleport.

Configure your load balancer to use Layer 4 (TCP) load balancing, round-robin load balancing, and a 300-second idle timeout.

NOTE

With multiple instances of the Auth Service running, special attention needs to be paid to keeping their configuration identical. Settings like cluster_name, tokens, storage, etc. must be the same.
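The note above can be turned into a quick pre-deployment check. The following is a minimal sketch, assuming each Auth Service instance's teleport.yaml has already been parsed into a Python dict (for example with a YAML library, not shown). The function names are hypothetical, and the dotted paths auth_service.cluster_name and auth_service.tokens are our assumption about where those settings live; extend MUST_MATCH with any other settings you need to keep in sync.

```python
from typing import Any

# Settings that must be identical on every Auth Service instance, written as
# dotted paths into the parsed teleport.yaml. The auth_service.* locations are
# an assumption; extend this tuple with other settings you keep in sync.
MUST_MATCH = ("auth_service.cluster_name", "auth_service.tokens", "teleport.storage")

# Sentinel for "setting not present", so a missing key differs from any value.
_MISSING = object()


def lookup(config: dict[str, Any], dotted: str) -> Any:
    """Return the value at a dotted path, or _MISSING if any key is absent."""
    node: Any = config
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def mismatched_settings(configs: dict[str, dict[str, Any]]) -> list[str]:
    """Return the MUST_MATCH paths whose values differ between instances."""
    mismatched = []
    for path in MUST_MATCH:
        values = [lookup(cfg, path) for cfg in configs.values()]
        if any(value != values[0] for value in values[1:]):
            mismatched.append(path)
    return mismatched


# Hypothetical example: two Auth Service instances with different storage types.
auth_1 = {"auth_service": {"cluster_name": "example"}, "teleport": {"storage": {"type": "etcd"}}}
auth_2 = {"auth_service": {"cluster_name": "example"}, "teleport": {"storage": {"type": "dynamodb"}}}
print(mismatched_settings({"auth-1": auth_1, "auth-2": auth_2}))  # ['teleport.storage']
```

Comparing parsed values rather than raw file text means that differences in comments or key order do not trigger false alarms.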

Proxy Service State

The Teleport Proxy Service is stateless, which makes running multiple instances trivial.

If using the default configuration, configure your load balancer to forward port 3080 to the servers that run the Teleport Proxy Service. If you have configured your Proxy Service to not use TLS Routing and/or are using non-default ports, you will need to configure your load balancer to forward the ports you specified for listen_addr, tunnel_listen_addr, and web_listen_addr in teleport.yaml.
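A small sketch of the port selection described above, assuming the proxy_service section of teleport.yaml has been parsed into a dict and that you already know whether TLS Routing is enabled. The tls_routing argument is a hypothetical input, not a teleport.yaml field, and the addresses in the example are illustrative.

```python
def port_of(addr: str) -> int:
    """Extract the port from a host:port listen address, e.g. '0.0.0.0:3080'."""
    _, sep, port = addr.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"listen address {addr!r} has no port")
    return int(port)


def ports_to_forward(proxy_service: dict[str, str], tls_routing: bool) -> list[int]:
    """Ports the load balancer should forward to Proxy Service instances.

    tls_routing is a hypothetical input: pass True if your Proxy Service uses
    TLS Routing, in which case all traffic arrives on web_listen_addr.
    """
    if tls_routing:
        return [port_of(proxy_service["web_listen_addr"])]
    keys = ("listen_addr", "tunnel_listen_addr", "web_listen_addr")
    return sorted({port_of(proxy_service[key]) for key in keys})


# Illustrative addresses; use the values from your own teleport.yaml.
proxy = {
    "listen_addr": "0.0.0.0:3023",
    "tunnel_listen_addr": "0.0.0.0:3024",
    "web_listen_addr": "0.0.0.0:3080",
}
print(ports_to_forward(proxy, tls_routing=True))   # [3080]
print(ports_to_forward(proxy, tls_routing=False))  # [3023, 3024, 3080]
```

Splitting on the last colon keeps IPv6 listen addresses such as [::]:3080 working.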

Configure your load balancer to use Layer 4 (TCP) load balancing, round-robin load balancing, and a 300-second idle timeout.

NOTE

If you terminate TLS at your load balancer with your own certificate for web_listen_addr, you'll need to run Teleport with --insecure-no-tls.

If your load balancer supports HTTP health checks, configure it to hit the /readyz diagnostics endpoint on machines running Teleport. This endpoint must be enabled by using the --diag-addr flag to teleport start:

teleport start --diag-addr=0.0.0.0:3000

The /readyz endpoint will reply {"status":"ok"} if the Teleport service is running without problems. For the load balancer health checks to succeed, the diagnostics address must listen on an interface of the Proxy Service host that the load balancer can reach. Only do this on the Proxy Service instances, and make sure port 3000 is reachable by the load balancers but not exposed to the public internet. For other services, keep the diagnostics endpoint on the 127.0.0.1 loopback interface.
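If you want to test the endpoint from a script before wiring up the load balancer, the check can be sketched with the Python standard library. This assumes the diagnostics address from the command above, and it assumes that a ready instance answers with HTTP 200; anything else, including connection errors, counts as not ready. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def is_ready(body: bytes) -> bool:
    """True if a /readyz response body is a JSON object with status "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(url: str, timeout: float = 2.0) -> bool:
    """Fetch a /readyz URL; any error or non-200 response counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_ready(resp.read())
    except OSError:  # URLError, HTTPError, timeouts, and refused connections
        return False
```

For example, probe("http://proxy-1.example.com:3000/readyz"), where proxy-1.example.com is a hypothetical Proxy Service host, returns True only when that instance answers with {"status":"ok"}.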

Below, we cover how to use the etcd, PostgreSQL, DynamoDB, and Firestore storage backends to make Teleport highly available.

Etcd

Teleport can use etcd as a storage backend to achieve highly available deployments. You must take steps to protect access to etcd in this configuration because that is where Teleport secrets like keys and user records will be stored.

IMPORTANT

etcd can currently only be used to store Teleport's internal database in a highly available way. This allows you to run multiple Auth Service instances in your cluster for a High Availability deployment, but unlike DynamoDB or Firestore, etcd will not also store Teleport audit events for you. etcd is not designed to handle large volumes of time series data like audit events.

To configure Teleport for using etcd as a storage backend:

  • Make sure you are using etcd version 3.3 or newer.
  • Follow etcd's cluster hardware recommendations. In particular, leverage SSD or high-performance virtualized block device storage for best performance.
  • Install etcd and configure peer and client TLS authentication using the etcd security guide.
  • Configure all Teleport Auth Service instances to use etcd in the "storage" section of the config file as shown below.
  • Deploy several Auth Service instances connected to etcd backend.
  • Deploy several Proxy Service instances whose auth_server setting points to the Auth Service, for example through the load balancer described above.

teleport:
  storage:
     type: etcd

     # List of etcd peers to connect to:
     peers: ["https://172.17.0.1:4001", "https://172.17.0.2:4001"]

     # Required path to TLS client certificate and key files to connect to etcd.
     #
     # To create these, follow
     # https://coreos.com/os/docs/latest/generate-self-signed-certificates.html
     # or use the etcd-provided script
     # https://github.com/etcd-io/etcd/tree/master/hack/tls-setup.
     tls_cert_file: /var/lib/teleport/etcd-cert.pem
     tls_key_file: /var/lib/teleport/etcd-key.pem

     # Optional file with trusted CA authority
     # file to authenticate etcd nodes
     #
     # If you used the script above to generate the client TLS certificate,
     # this CA certificate should be one of the other generated files
     tls_ca_file: /var/lib/teleport/etcd-ca.pem

     # Alternative password-based authentication, if not using TLS client
     # certificate.
     #
     # See https://etcd.io/docs/v3.4.0/op-guide/authentication/ for setting
     # up a new user.
     username: username
     password_file: /mnt/secrets/etcd-pass

     # etcd key (location) under which Teleport stores its state.
     # Make sure it ends with a '/'!
     prefix: /teleport/

     # NOT RECOMMENDED: enables insecure etcd mode in which self-signed
     # certificate will be accepted
     insecure: false

     # Optionally sets the limit on the client message size.
     # This is usually used to increase the default which is 2MiB
     # (1.5MiB server's default + gRPC overhead bytes).
     # Make sure this does not exceed the value for the etcd
     # server specified with `--max-request-bytes` (1.5MiB by default).
     # Keep the two values in sync.
     #
     # See https://etcd.io/docs/v3.4.0/dev-guide/limit/ for details
     #
     # This bumps the size to 15MiB as an example:
     etcd_max_client_msg_size_bytes: 15728640
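The constraints called out in the comments above (the prefix must end with '/', insecure mode is not recommended, and the client message limit must not exceed the etcd server's --max-request-bytes) are easy to check before rolling out a config. A minimal sketch, assuming the storage section has been parsed into a dict; check_etcd_storage is a hypothetical helper, and the server limit is passed in because it is configured on the etcd side. It also shows where 15728640 comes from: 15 × 1024 × 1024.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def check_etcd_storage(storage: dict, server_max_request_bytes: int) -> list[str]:
    """Return problems found in a parsed etcd storage section.

    server_max_request_bytes is the etcd server's --max-request-bytes value,
    which lives in the etcd configuration rather than in teleport.yaml.
    """
    problems = []
    prefix = storage.get("prefix")
    if prefix is not None and not prefix.endswith("/"):
        problems.append(f"prefix {prefix!r} must end with '/'")
    if storage.get("insecure", False):
        problems.append("insecure mode accepts self-signed certificates (not recommended)")
    client_max = storage.get("etcd_max_client_msg_size_bytes")
    if client_max is not None and client_max > server_max_request_bytes:
        problems.append(
            f"etcd_max_client_msg_size_bytes ({client_max}) exceeds the etcd "
            f"server's --max-request-bytes ({server_max_request_bytes})"
        )
    return problems


storage = {"type": "etcd", "prefix": "/teleport/", "etcd_max_client_msg_size_bytes": 15 * MIB}
print(15 * MIB)  # 15728640, the value in the example above
print(check_etcd_storage(storage, server_max_request_bytes=15 * MIB))  # []
```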

PostgreSQL

PostgreSQL cluster state and audit log storage is available starting from Teleport 13.3.

Teleport can use PostgreSQL as a storage backend to achieve high availability. You must take steps to protect access to PostgreSQL in this configuration because that is where Teleport secrets like keys and user records will be stored. The PostgreSQL backend supports two types of Teleport data:

  • Cluster state
  • Audit log events

The PostgreSQL backend requires PostgreSQL 13 or later and, for the cluster state only, the wal2json logical decoding plugin. The plugin is available in packages for all stable versions in the PostgreSQL Apt and Yum repositories for Debian- and RPM-based Linux distributions respectively, or it can be compiled by following the instructions in its repository. Azure Database for PostgreSQL comes with the plugin preinstalled, so no extra steps are needed there.

Note

CockroachDB can be used as a PostgreSQL drop-in replacement to store audit events (requires Teleport version >= 15.4.2).

Teleport can store the cluster state in CockroachDB, but this requires CockroachDB-specific configuration. See the CockroachDB backend section for more details.

Teleport needs separate databases for the cluster state and the audit log. It will attempt to create them if it has permission to do so, and it will set up the database schemas as needed, so we recommend giving the Teleport database user ownership of both databases.

The PostgreSQL backend for cluster state relies on logical decoding to get a stream of changes from the database. Because of that, the wal_level parameter must be set to logical, and max_replication_slots must be at least the number of Teleport Auth Service instances you'll be running (a higher number is recommended, to account for network conditions).

The Teleport Auth Service needs to be able to create a replication slot when starting and when reestablishing a new connection to the PostgreSQL cluster, and any long-running transaction will prevent that. It's therefore only advisable to store the Teleport cluster state on a shared PostgreSQL cluster if the other workloads on the cluster only consist of short-lived transactions.

wal_level can only be set at server start, so it should be set in postgresql.conf:

# the default value for wal_level is replica
wal_level = logical

# the default value for max_replication_slots is 10
max_replication_slots = 10
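A minimal sketch for checking these two settings against the number of Auth Service instances you plan to run, assuming you have the contents of postgresql.conf as text (on a running server the same values can be read with SHOW wal_level; and SHOW max_replication_slots;). parse_conf and check_postgres_settings are hypothetical helpers; the parser is simplified and only understands key = value lines, treating everything after # as a comment.

```python
def parse_conf(text: str) -> dict[str, str]:
    """Parse 'key = value' lines from postgresql.conf (simplified).

    Assumption: values never contain '#', and the 'key value' form without
    an equals sign is not used.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip().strip("'")
    return settings


def check_postgres_settings(settings: dict[str, str], auth_instances: int) -> list[str]:
    """Check the settings the cluster state backend relies on."""
    problems = []
    # Defaults as documented above: wal_level=replica, max_replication_slots=10.
    if settings.get("wal_level", "replica") != "logical":
        problems.append("wal_level must be 'logical' (requires a server restart)")
    slots = int(settings.get("max_replication_slots", "10"))
    if slots < auth_instances:
        problems.append(
            f"max_replication_slots={slots} is lower than the {auth_instances} "
            "Auth Service instances; a higher value is recommended"
        )
    return problems


conf = """
# the default value for wal_level is replica
wal_level = logical

# the default value for max_replication_slots is 10
max_replication_slots = 10
"""
settings = parse_conf(conf)
print(check_postgres_settings(settings, auth_instances=3))  # []
```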

In addition, the database user must have the initiating replication role attribute. In the psql shell:

postgres=# CREATE USER new_user WITH REPLICATION;
CREATE ROLE

postgres=# ALTER ROLE existing_user WITH LOGIN REPLICATION;
ALTER ROLE

Replication permissions allow essentially full read access to the entire cluster (with a physical replication connection) or to all databases that the user can connect to. If the PostgreSQL cluster is shared between Teleport and other applications, we recommend preventing the Teleport user from opening replication connections and from connecting to databases other than the ones used for Teleport.

For convenience, Teleport will attempt to grant itself the initiating replication role attribute. This accommodates managed services (such as Azure Database for PostgreSQL) that can create superuser accounts through their API, but you should only rely on it if the entire PostgreSQL cluster is dedicated to Teleport.

To configure Teleport to use PostgreSQL:

  • Configure all Teleport Auth Service instances to use the PostgreSQL backend in the storage section of teleport.yaml as shown below.
  • Deploy several Auth Service instances connected to the PostgreSQL storage backend.
  • Deploy several Proxy Service nodes.
  • Make sure that the Proxy Service instances and all Teleport agent services that connect directly to the Auth Service have the auth_server configuration setting populated with the address of a load balancer for Auth Service instances.

PgBouncer

Teleport must connect directly to the PostgreSQL server. PgBouncer is incompatible with the Teleport PostgreSQL storage backend.

teleport:
  storage:
    type: postgresql

    # conn_string is a libpq-compatible connection string (see
    # https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING);
    # pool_max_conns is an additional parameter that determines the maximum
    # number of connections in the connection pool used for the cluster state
    # database (the change feed uses an additional connection), defaulting to
    # a value that depends on the number of available CPUs.
    #
    # If your certificates are not stored at the default ~/.postgresql
    # location, you will need to specify them with the sslcert, sslkey, and
    # sslrootcert parameters.
    conn_string: postgresql://user_name@database-address/teleport_backend?sslmode=verify-full&pool_max_conns=20

    # In certain managed environments it can be necessary or convenient to
    # use a different user or different settings for the connection used
    # to set up and make use of logical decoding. If specified, Teleport
    # will use the connection string in change_feed_conn_string for that,
    # instead of the one in conn_string. Available in Teleport 13.4 and later.
    change_feed_conn_string: postgresql://replication_user_name@database-address/teleport_backend?sslmode=verify-full

    # An audit_events_uri with a scheme of postgresql:// will use the
    # PostgreSQL backend for audit log storage; the URI is a libpq-compatible
    # connection string just like the cluster state conn_string, but cannot be
    # specified as key=value pairs. It's possible to specify completely
    # different PostgreSQL clusters for cluster state and audit log.
    #
    # If your certificates are not stored at the default ~/.postgresql
    # location, you will need to specify them with the sslcert, sslkey, and
    # sslrootcert parameters.
    audit_events_uri:
      - postgresql://user_name@database-address/teleport_audit?sslmode=verify-full
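If you generate teleport.yaml from a script, building the URI with a URL encoder avoids quoting mistakes. A minimal sketch; postgres_uri is a hypothetical helper, and the host and database names mirror the placeholders above. Parameter values are percent-encoded, so a value such as sslcert=/path/to/client.crt is emitted as %2Fpath%2Fto%2Fclient.crt, which URI-form connection string parsers decode back to the original path.

```python
from urllib.parse import quote, urlencode


def postgres_uri(user: str, host: str, database: str, **params: object) -> str:
    """Build a postgresql:// URI; keyword arguments become the query string.

    host may include a port, e.g. "localhost:5432". Values are percent-encoded.
    """
    uri = f"postgresql://{quote(user, safe='')}@{host}/{quote(database, safe='')}"
    query = urlencode(params)
    return f"{uri}?{query}" if query else uri


print(postgres_uri("user_name", "database-address", "teleport_backend",
                   sslmode="verify-full", pool_max_conns=20))
# postgresql://user_name@database-address/teleport_backend?sslmode=verify-full&pool_max_conns=20
```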

Audit log events are periodically deleted after a default retention period of 8766 hours (one year). To select a different retention period, or to disable the cleanup entirely, specify the retention_period or disable_cleanup parameter in the fragment of the URI:

teleport:
  storage:
    audit_events_uri:
      - postgresql://user_name@database-address/teleport_audit?sslmode=verify-full#disable_cleanup=false&retention_period=2160h
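The fragment is separate from the query string, so it can be read with standard URL parsing. This sketch shows how the parameters above map to an effective cleanup setting. audit_cleanup_settings is hypothetical, parse_duration only handles h, m, and s units (a subset of Go-style durations), and the boolean handling is simplified. It also makes the arithmetic explicit: 2160h is 90 days, and the 8766h default is 365.25 days.

```python
import re
from datetime import timedelta
from urllib.parse import parse_qsl, urlsplit

DEFAULT_RETENTION = timedelta(hours=8766)  # documented default: one year


def parse_duration(text: str) -> timedelta:
    """Parse durations such as '2160h' or '1h30m' (h, m and s units only)."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def audit_cleanup_settings(uri: str) -> tuple[bool, timedelta]:
    """Return (cleanup_enabled, retention) from an audit_events_uri fragment."""
    options = dict(parse_qsl(urlsplit(uri).fragment))
    # Simplification: only the string "true" (any case) disables cleanup.
    cleanup_enabled = options.get("disable_cleanup", "false").lower() != "true"
    raw = options.get("retention_period")
    retention = parse_duration(raw) if raw is not None else DEFAULT_RETENTION
    return cleanup_enabled, retention


uri = ("postgresql://user_name@database-address/teleport_audit"
       "?sslmode=verify-full#disable_cleanup=false&retention_period=2160h")
enabled, retention = audit_cleanup_settings(uri)
print(enabled, retention.days)  # True 90
```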

Authentication

We strongly recommend using client certificates to authenticate Teleport to PostgreSQL, as well as enforcing the use of TLS and verifying the server certificate on the client side.

You will need to update your pg_hba.conf file to include the following lines to ensure that connections from Teleport use client certificates. See "The pg_hba.conf File" in the PostgreSQL documentation for more details.

# TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
hostssl teleport        all             ::/0                    cert
hostssl teleport        all             0.0.0.0/0               cert

If the use of passwords is unavoidable, we recommend configuring them in the ~/.pgpass file rather than storing them in Teleport's configuration file.

Azure AD authentication

If you are running Teleport on Azure, Teleport can make use of Azure AD authentication to connect to an Azure Database for PostgreSQL server without having to manage any secrets:

teleport:
  storage:
    type: postgresql

    conn_string: postgresql://[email protected]/teleport_backend?sslmode=verify-full&pool_max_conns=20
    auth_mode: azure

    audit_events_uri:
      - postgresql://[email protected]/teleport_audit?sslmode=verify-full#auth_mode=azure

When auth_mode is set to azure, Teleport will automatically use the credentials available to it to fetch short-lived tokens, which it uses as database passwords. The database user must be configured to allow connections using Azure AD.

Teleport will make use of the Azure AD credentials specified by environment variables, Azure AD Workload Identity credentials, or managed identity credentials.

Google Cloud IAM authentication

If you are running Teleport on Google Cloud, Teleport can use IAM authentication to connect to a GCP Cloud SQL for PostgreSQL instance without having to manage any secrets:

teleport:
  storage:
    type: postgresql
    auth_mode: gcp-cloudsql

    # GCP connection name has the format <project>:<location>:<instance>.
    gcp_connection_name: project:location:instance

    # The type of IP address to use for connecting to the Cloud SQL instance. Valid options are:
    # - "" (default to "public")
    # - "public"
    # - "private"
    # - "psc" (for Private Service Connect)
    gcp_ip_type: public

    # Leave host and port empty as they are not required.
    conn_string: postgresql://[email protected]@/teleport_backend

    audit_events_uri:
      - postgresql://[email protected]@/teleport_audit#auth_mode=gcp-cloudsql&gcp_connection_name=project:location:instance&gcp_ip_type=public
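A sketch of validating the two Cloud SQL settings before deploying, based on the formats given in the comments above. parse_cloudsql_settings is hypothetical. It splits from the right because, as we understand it, domain-scoped project IDs (such as example.com:my-project) can themselves contain a colon.

```python
VALID_IP_TYPES = {"", "public", "private", "psc"}


def parse_cloudsql_settings(connection_name: str, ip_type: str = "") -> dict[str, str]:
    """Validate gcp_connection_name and gcp_ip_type as described above."""
    # rsplit keeps a domain-scoped project ID such as "example.com:my-project"
    # in one piece (an assumption about such project IDs).
    parts = connection_name.rsplit(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <project>:<location>:<instance>, got {connection_name!r}")
    if ip_type not in VALID_IP_TYPES:
        raise ValueError(f"gcp_ip_type must be one of {sorted(VALID_IP_TYPES)}, got {ip_type!r}")
    project, location, instance = parts
    # An empty gcp_ip_type defaults to "public".
    return {"project": project, "location": location, "instance": instance,
            "ip_type": ip_type or "public"}


print(parse_cloudsql_settings("project:location:instance", "public"))
# {'project': 'project', 'location': 'location', 'instance': 'instance', 'ip_type': 'public'}
```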

To enable IAM authentication and logical replication for Cloud SQL, make sure the cloudsql.iam_authentication and cloudsql.logical_decoding flags are set to on for the Cloud SQL instance. The database user must also have the REPLICATION role attribute to use the logical decoding features. See "Set up logical replication and decoding" in the Cloud SQL documentation for more details.

In order for Teleport to use the Cloud SQL Go Connector with IAM authentication, the service account of the target database user must be assigned the "Cloud SQL Client" (roles/cloudsql.client) and "Cloud SQL Instance User" (roles/cloudsql.instanceUser) roles.

Teleport will make use of the credentials specified through the GOOGLE_APPLICATION_CREDENTIALS environment variable, Workload Identity Federation with service account impersonation, or service account credentials attached to VMs.

If the service account used in the PostgreSQL connection string is different from the service account of the default credentials, Teleport will impersonate the service account used in the connection string as a Service Account Token Creator using the default credentials.

Development

If you are not ready to connect Teleport to a production instance of PostgreSQL, you can use the following instructions to set up a throwaway instance of PostgreSQL using Docker.

First copy the following script to disk and run it to generate the CA, client certificate, and server certificate used by Teleport and PostgreSQL to establish a secure mutually authenticated connection:

#!/bin/bash
# Stop on the first failing command, unset variable, or failed pipeline stage.
set -euo pipefail

# Create the certs directory.
mkdir -p ./certs
cd certs/

# Create CA key and self-signed certificate.
openssl genpkey -algorithm RSA -out ca.key
openssl req -x509 -new -key ca.key -out ca.crt -subj "/CN=root"

# Function to create certificates.
create_certificate() {
    local name="$1"
    local dns_name="$2"

    openssl genpkey \
        -algorithm RSA \
        -out "${name}.key"
    openssl req -new \
        -key "${name}.key" \
        -out "${name}.csr" \
        -subj "/CN=${dns_name}"
    openssl x509 -req \
        -in "${name}.csr" \
        -CA ca.crt \
        -CAkey ca.key \
        -out "${name}.crt" \
        -extfile <(printf "subjectAltName=DNS:%s" "${dns_name}") \
        -CAcreateserial

    chmod 0600 "${name}.key"
}

# Create client certificate with SAN.
create_certificate "client" "teleport"

# Create server certificate with SAN.
create_certificate "server" "localhost"

echo "Certificates and keys generated successfully."

Next, create a Dockerfile using the official PostgreSQL Docker image and add wal2json to it:

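FROM postgres:15.0
# Install wal2json in a single layer and remove the apt package lists afterwards.
RUN apt-get update \
    && apt-get install -y --no-install-recommends postgresql-15-wal2json \
    && rm -rf /var/lib/apt/lists/*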

Create an init.sql file that creates the Teleport user when the container initializes its database (the official image only runs these scripts when the data directory is empty):

CREATE USER teleport WITH REPLICATION CREATEDB;

Create a pg_hba.conf file to enforce certificate-based authentication for connections to PostgreSQL:

# TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
local   all             all                                     trust
hostssl all             all             ::/0                    cert
hostssl all             all             0.0.0.0/0               cert

Create a postgresql.conf file that configures the WAL level and certificates used for authentication:

listen_addresses = '*'
port = 5432
max_connections = 20
shared_buffers = 128MB
temp_buffers = 8MB
work_mem = 4MB

wal_level=logical
max_replication_slots=10

ssl=on
ssl_ca_file='/certs/ca.crt'
ssl_cert_file='/certs/server.crt'
ssl_key_file='/certs/server.key'

Start the PostgreSQL container with the following command:

docker run --rm --name postgres \
    -e POSTGRES_DB=db \
    -e POSTGRES_USER=user \
    -e POSTGRES_PASSWORD=password \
    -v $(pwd)/data:/var/lib/postgresql/data \
    -v $(pwd)/certs:/certs \
    -v $(pwd)/postgresql.conf:/etc/postgresql/postgresql.conf \
    -v $(pwd)/pg_hba.conf:/etc/postgresql/pg_hba.conf \
    -v $(pwd)/init.sql:/docker-entrypoint-initdb.d/init.sql \
    -p 5432:5432 \
    $(docker build -q .) \
    postgres \
    -c hba_file=/etc/postgresql/pg_hba.conf \
    -c config_file=/etc/postgresql/postgresql.conf

Lastly, update the storage section in teleport.yaml to use PostgreSQL and start Teleport:

teleport:
  storage:
    type: postgresql
    conn_string: "postgresql://teleport@localhost:5432/teleport_backend?sslcert=/path/to/certs/client.crt&sslkey=/path/to/certs/client.key&sslrootcert=/path/to/certs/ca.crt&sslmode=verify-full&pool_max_conns=20"

S3 (Session Recordings)

Teleport supports using S3 as a backend for both session recordings and audit logs. S3 cannot be used as the cluster state backend. This section covers the use of S3 as a session recording backend. For information on using S3 for audit logs, see the Athena section.

S3 buckets must have versioning enabled, which ensures that a session log cannot be permanently altered or deleted. Teleport will always look at the oldest version of a recording.

Authenticating to AWS

The Teleport Auth Service must be able to read AWS credentials in order to authenticate to S3.

Grant the Teleport Auth Service access to credentials that it can use to authenticate to AWS.

  • If you are running the Teleport Auth Service on an EC2 instance, you may use the EC2 Instance Metadata Service method
  • If you are running the Teleport Auth Service in Kubernetes, you can use IAM Roles for Service Accounts (IRSA)
  • Otherwise, you must use environment variables

Teleport will detect when it is running on an EC2 instance and use the Instance Metadata Service to fetch credentials.

The EC2 instance should be configured to use an EC2 instance profile. For more information, see: Using Instance Profiles.

Refer to IAM Roles for Service Accounts (IRSA) to set up an OIDC provider in AWS and configure an AWS IAM role that allows the pod's service account to assume the role.

Teleport's built-in AWS client reads credentials from the following environment variables:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_DEFAULT_REGION

When you start the Teleport Auth Service, the service reads environment variables from a file at the path /etc/default/teleport. Obtain these credentials from your organization. Ensure that /etc/default/teleport has the following content, replacing the values of each variable:

AWS_ACCESS_KEY_ID=00000000000000000000
AWS_SECRET_ACCESS_KEY=0000000000000000000000000000000000000000
AWS_DEFAULT_REGION=<YOUR_REGION>

Teleport's AWS client loads credentials from different sources in the following order:

  • Environment Variables
  • Shared credentials file
  • Shared configuration file (Teleport always enables shared configuration)
  • EC2 Instance Metadata (credentials only)
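The lookup order above can be modeled as a simple first-match chain. This is a deliberately simplified, hypothetical model for reasoning about which source wins, not the AWS SDK's actual implementation: it ignores sources not listed above (such as IRSA web identity tokens) and error cases, and it assumes the shared files are searched for the profile named by AWS_PROFILE, falling back to default.

```python
def credential_source(env: dict[str, str], credentials_file_profiles: set[str],
                      config_file_profiles: set[str], on_ec2: bool) -> str:
    """Return the first source, in the order listed above, that has credentials.

    Simplified model: the real SDK has more sources and edge cases.
    """
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    # The shared files are looked up by profile name; AWS_PROFILE selects it.
    profile = env.get("AWS_PROFILE", "default")
    if profile in credentials_file_profiles:
        return f"shared credentials file (profile {profile!r})"
    if profile in config_file_profiles:
        return f"shared configuration file (profile {profile!r})"
    if on_ec2:
        return "EC2 Instance Metadata"
    raise LookupError("no AWS credentials found")


print(credential_source({"AWS_PROFILE": "teleport"}, set(), {"teleport"}, on_ec2=True))
# shared configuration file (profile 'teleport')
```

The model makes one practical consequence visible: exported AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values take precedence over any profile you configure.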

If you provide AWS credentials via a shared credentials file or shared configuration file, you will need to run the Teleport Auth Service with the AWS_PROFILE environment variable set to the name of your profile of choice.

If you have a specific use case that the instructions above do not account for, consult the documentation for the AWS SDK for Go for a detailed description of credential loading behavior.

Configuring the S3 backend

Below is an example of how to configure the Teleport Auth Service to store the recorded sessions in an S3 bucket.

teleport:
  storage:
      # The region setting sets the default AWS region for all AWS services
      # Teleport may consume (DynamoDB, S3)
      region: us-east-1

      # Path to S3 bucket to store the recorded sessions in.
      audit_sessions_uri: "s3://Example_TELEPORT_S3_BUCKET/records"

      # Teleport obtains AWS credentials through the standard provider chain,
      # e.g. an assumed IAM role or ~/.aws/credentials in the home folder.

You can add optional query parameters to the S3 URL. The Teleport Auth Service reads these parameters to configure its interactions with S3:

s3://bucket/path?region=us-east-1&endpoint=mys3.example.com&insecure=false&disablesse=false&acl=private&use_fips_endpoint=true

  • region=us-east-1 - set the Amazon region to use.

  • endpoint=mys3.example.com - connect to a custom S3 endpoint. Optional.

  • insecure=true - set to true or false. If true, TLS will be disabled. Default value is false.

  • disablesse=true - set to true or false. The Auth Service checks this value before uploading an object to an S3 bucket.

    If this is false, the Auth Service will set the server-side encryption configuration of the upload to use AWS Key Management Service and, if sse_kms_key is set, configure the upload to use this key.

    If this value is true, the Auth Service will not set an explicit server-side encryption configuration for the object upload, meaning that the upload will use the bucket-level server-side encryption configuration.

  • sse_kms_key=kms_key_id - If set to a valid AWS KMS CMK key ID, all objects uploaded to S3 will be encrypted with this key (as long as disablesse is false). Details can be found below.

  • acl=private - set the canned ACL to use. Must be one of the predefined ACL values.

  • use_fips_endpoint=true - Configure S3 FIPS endpoints
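A sketch of how these query parameters can be read with the standard library, using the example URI above. parse_s3_uri is hypothetical, and treating absent boolean parameters as false is an assumption (the text only states the default for insecure). It encodes the rule that sse_kms_key only applies when disablesse is false.

```python
from urllib.parse import parse_qs, urlsplit


def parse_s3_uri(uri: str) -> dict:
    """Split an audit_sessions_uri into bucket, path and the options above."""
    parts = urlsplit(uri)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # parse_qs returns lists; keep the last value for each parameter.
    params = {key: values[-1] for key, values in parse_qs(parts.query).items()}

    def flag(name: str) -> bool:
        # Assumption: absent boolean parameters are treated as false.
        return params.get(name, "false").lower() == "true"

    disable_sse = flag("disablesse")
    return {
        "bucket": parts.netloc,
        "path": parts.path,
        "region": params.get("region"),
        "endpoint": params.get("endpoint"),
        "insecure": flag("insecure"),
        "disable_sse": disable_sse,
        # sse_kms_key only takes effect when disablesse is false.
        "kms_key_for_uploads": None if disable_sse else params.get("sse_kms_key"),
        "acl": params.get("acl"),
        "use_fips_endpoint": flag("use_fips_endpoint"),
    }


settings = parse_s3_uri(
    "s3://bucket/path?region=us-east-1&endpoint=mys3.example.com&insecure=false"
    "&disablesse=false&acl=private&use_fips_endpoint=true"
)
print(settings["bucket"], settings["region"], settings["use_fips_endpoint"])  # bucket us-east-1 True
```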

S3 IAM policy

On startup, the Teleport Auth Service checks whether the S3 bucket you have configured for session recording storage exists. If it does not, the Auth Service attempts to create and configure the bucket.

The IAM permissions that the Auth Service requires to manage its session recording bucket depend on whether you create the bucket yourself or let the Auth Service create and configure it for you.


If you will create the bucket yourself, use the following policy. Replace these values in the policy example below:

| Placeholder value | Replace with |
| --- | --- |
| your-sessions-bucket | Name to use for the Teleport S3 session recording bucket |
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BucketActions",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucketVersions",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucket",
                "s3:GetEncryptionConfiguration",
                "s3:GetBucketVersioning"
            ],
            "Resource": "arn:aws:s3:::your-sessions-bucket"
        },
        {
            "Sid": "ObjectActions",
            "Effect": "Allow",
            "Action": [
                "s3:GetObjectVersion",
                "s3:GetObjectRetention",
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload"
            ],
            "Resource": "arn:aws:s3:::your-sessions-bucket/*"
        }
    ]
}

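If you want the Auth Service to create and configure the bucket, use the following policy instead. Replace these values in the policy example below:

| Placeholder value | Replace with |
| --- | --- |
| your-sessions-bucket | Name to use for the Teleport S3 session recording bucket |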
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BucketActions",
            "Effect": "Allow",
            "Action": [
                "s3:PutEncryptionConfiguration",
                "s3:PutBucketVersioning",
                "s3:ListBucketVersions",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucket",
                "s3:GetEncryptionConfiguration",
                "s3:GetBucketVersioning",
                "s3:CreateBucket"
            ],
            "Resource": "arn:aws:s3:::your-sessions-bucket"
        },
        {
            "Sid": "ObjectActions",
            "Effect": "Allow",
            "Action": [
                "s3:GetObjectVersion",
                "s3:GetObjectRetention",
                "s3:*Object",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload"
            ],
            "Resource": "arn:aws:s3:::your-sessions-bucket/*"
        }
    ]
}
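Because the two policies differ only in the bucket-creation actions and the object action list, you can generate whichever one you need from the bucket name. A minimal sketch; session_bucket_policy is hypothetical, the action lists are copied from the two examples above, and the order of actions within each statement may differ from those examples without changing their meaning.

```python
import json

# Action lists copied from the two example policies above.
BUCKET_ACTIONS = [
    "s3:ListBucketVersions",
    "s3:ListBucketMultipartUploads",
    "s3:ListBucket",
    "s3:GetEncryptionConfiguration",
    "s3:GetBucketVersioning",
]
CREATE_BUCKET_ACTIONS = ["s3:PutEncryptionConfiguration", "s3:PutBucketVersioning", "s3:CreateBucket"]
COMMON_OBJECT_ACTIONS = [
    "s3:GetObjectVersion",
    "s3:GetObjectRetention",
    "s3:ListMultipartUploadParts",
    "s3:AbortMultipartUpload",
]
OBJECT_ACTIONS_EXISTING = ["s3:GetObject", "s3:PutObject"]  # bucket you created
OBJECT_ACTIONS_CREATE = ["s3:*Object"]  # bucket the Auth Service creates


def session_bucket_policy(bucket: str, teleport_creates_bucket: bool) -> dict:
    """Return the IAM policy for the session recording bucket."""
    bucket_actions = BUCKET_ACTIONS + (CREATE_BUCKET_ACTIONS if teleport_creates_bucket else [])
    object_actions = COMMON_OBJECT_ACTIONS + (
        OBJECT_ACTIONS_CREATE if teleport_creates_bucket else OBJECT_ACTIONS_EXISTING
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "BucketActions", "Effect": "Allow", "Action": bucket_actions,
             "Resource": f"arn:aws:s3:::{bucket}"},
            {"Sid": "ObjectActions", "Effect": "Allow", "Action": object_actions,
             "Resource": f"arn:aws:s3:::{bucket}/*"},
        ],
    }


print(json.dumps(session_bucket_policy("your-sessions-bucket", teleport_creates_bucket=False), indent=4))
```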

S3 Server Side Encryption

Teleport supports using a custom AWS KMS Customer Managed Key for encrypting objects uploaded to S3. By restricting access to the key, you can control who can read objects such as session recordings independently of who has read access to the bucket.

The sse_kms_key parameter above can be set to any valid KMS CMK ID corresponding to a symmetric standard spec KMS key. Example template KMS key policies are provided below for common usage cases. IAM users do not have access to any key by default. Permissions have to be explicitly granted in the policy.

Encryption/Decryption

This policy allows an IAM user to encrypt and decrypt objects, so that a cluster's Auth Service can both write and play back session recordings.

Replace [iam-key-admin-arn] with the IAM ARN of the user(s) that should have administrative key access, and [auth-node-iam-arn] with the IAM ARN of the user that the Teleport Auth Service instances are using.

{
  "Id": "Teleport Encryption and Decryption",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Teleport CMK Admin",
      "Effect": "Allow",
      "Principal": {
        "AWS": "[iam-key-admin-arn]"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Teleport CMK Auth",
      "Effect": "Allow",
      "Principal": {
        "AWS": "[auth-node-iam-arn]"
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    }
  ]
}

Encryption/Decryption with separate clusters

This policy allows specifying separate IAM users for encryption and decryption. This can be used to set up a multi cluster configuration where the main cluster cannot play back session recordings but only write them. A separate cluster authenticating as a different IAM user with decryption access can be used for playing back the session recordings.

Replace [iam-key-admin-arn] with the IAM ARN of the user(s) that should have administrative key access, [auth-node-write-arn] with the IAM ARN of the user that the Auth Service instances of the main, write-only cluster are using, and [auth-node-read-arn] with the IAM ARN of the user used by the read-only cluster.

For this to work, the second cluster has to be connected to the same audit log as the main cluster, since that is how it detects session recordings.

{
  "Id": "Teleport Separate Encryption and Decryption",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Teleport CMK Admin",
      "Effect": "Allow",
      "Principal": {
        "AWS": "[iam-key-admin-arn]"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Teleport CMK Auth Encrypt",
      "Effect": "Allow",
      "Principal": {
        "AWS": "[auth-node-write-arn]"
      },
      "Action": [
        "kms:Encrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Teleport CMK Auth Decrypt",
      "Effect": "Allow",
      "Principal": {
        "AWS": "[auth-node-read-arn]"
      },
      "Action": [
        "kms:Decrypt",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    }
  ]
}
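The effect of splitting the actions can be illustrated with a simplified sketch that checks which KMS actions each placeholder principal in the policy above may perform. The is_allowed helper is hypothetical and deliberately incomplete: it ignores Deny statements, conditions, resources, and the rest of AWS's real policy evaluation logic.

```python
from fnmatch import fnmatchcase

# Trimmed copy of the separate encryption/decryption policy (placeholder ARNs).
POLICY = {
    "Statement": [
        {"Sid": "Teleport CMK Admin", "Effect": "Allow",
         "Principal": {"AWS": "[iam-key-admin-arn]"},
         "Action": "kms:*", "Resource": "*"},
        {"Sid": "Teleport CMK Auth Encrypt", "Effect": "Allow",
         "Principal": {"AWS": "[auth-node-write-arn]"},
         "Action": ["kms:Encrypt", "kms:ReEncrypt*",
                    "kms:GenerateDataKey*", "kms:DescribeKey"],
         "Resource": "*"},
        {"Sid": "Teleport CMK Auth Decrypt", "Effect": "Allow",
         "Principal": {"AWS": "[auth-node-read-arn]"},
         "Action": ["kms:Decrypt", "kms:DescribeKey"],
         "Resource": "*"},
    ]
}


def is_allowed(policy: dict, principal: str, action: str) -> bool:
    """Return True if some Allow statement for `principal` matches `action`.

    Simplified: only Allow statements, exact principal match, and
    wildcard action matching are considered.
    """
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        if stmt["Principal"].get("AWS") != principal:
            continue
        actions = stmt["Action"]
        if isinstance(actions, str):
            actions = [actions]
        # IAM action names are case-insensitive, so compare in lower case.
        if any(fnmatchcase(action.lower(), p.lower()) for p in actions):
            return True
    return False
```

With this model, the write-only identity can generate data keys but not decrypt, while the read-only identity can decrypt but not encrypt.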

ACL example: transferring object ownership

If you are uploading from AWS account A to a bucket owned by AWS account B and want account B to own the uploaded objects, you can take one of two approaches.

Without ACLs

If ACLs are disabled, object ownership will be set to Bucket owner enforced and no action will be needed.

With ACLs

  • Set object ownership to Bucket owner preferred (under Permissions in the management console).
  • Add acl=bucket-owner-full-control to audit_sessions_uri.
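If you generate teleport.yaml with a script, a helper like the following could add the canned ACL to an existing audit_sessions_uri value. This is a minimal sketch: the with_bucket_owner_acl function and the example-bucket name are hypothetical, and the helper only edits the URI string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_bucket_owner_acl(sessions_uri: str) -> str:
    """Return sessions_uri with acl=bucket-owner-full-control set.

    Other query parameters are preserved; an existing acl value is replaced.
    """
    parts = urlsplit(sessions_uri)
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "acl"]
    params.append(("acl", "bucket-owner-full-control"))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

For example, s3://example-bucket/records becomes s3://example-bucket/records?acl=bucket-owner-full-control.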

To enforce the ownership transfer, configure the bucket policy on B's bucket to allow only uploads that include the bucket-owner-full-control canned ACL.

{
    "Version": "2012-10-17",
    "Id": "[id]",
    "Statement": [
        {
            "Sid": "[sid]",
            "Effect": "Allow",
            "Principal": {
                "AWS": "[ARN of account A]"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::BucketName/*",
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        }
    ]
}

For more information, see the AWS Documentation.

DynamoDB

If you are running Teleport on AWS, you can use DynamoDB as a storage backend to achieve High Availability. The DynamoDB backend supports two types of Teleport data:

  • Cluster state
  • Audit log events

Teleport uses DynamoDB and DynamoDB Streams endpoints for its storage backend management.

DynamoDB cannot store session recordings. We recommend storing them in AWS S3, as shown above.

Authenticating to AWS

The Teleport Auth Service must be able to read AWS credentials in order to authenticate to DynamoDB.

Grant the Teleport Auth Service access to credentials that it can use to authenticate to AWS.

  • If you are running the Teleport Auth Service on an EC2 instance, you may use the EC2 Instance Metadata Service method
  • If you are running the Teleport Auth Service in Kubernetes, you can use IAM Roles for Service Accounts (IRSA)
  • Otherwise, you must use environment variables

If you use the EC2 Instance Metadata Service method, Teleport will detect when it is running on an EC2 instance and use the Instance Metadata Service to fetch credentials.

The EC2 instance should be configured to use an EC2 instance profile. For more information, see: Using Instance Profiles.

If you use IRSA, refer to IAM Roles for Service Accounts (IRSA) to set up an OIDC provider in AWS and configure an AWS IAM role that allows the pod's service account to assume the role.

With the environment variable method, Teleport's built-in AWS client reads credentials from the following environment variables:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_DEFAULT_REGION

When you start the Teleport Auth Service, the service reads environment variables from a file at the path /etc/default/teleport. Obtain these credentials from your organization. Ensure that /etc/default/teleport has the following content, replacing the values of each variable:

AWS_ACCESS_KEY_ID=00000000000000000000
AWS_SECRET_ACCESS_KEY=0000000000000000000000000000000000000000
AWS_DEFAULT_REGION=<YOUR_REGION>
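To catch a missing or unedited variable before starting the service, you could check the file with a small script. This is a hypothetical sketch: it understands only plain KEY=VALUE lines (no quoting or export prefixes), it is not how the file is loaded at service startup, and it treats a value still wrapped in angle brackets, such as <YOUR_REGION>, as an unreplaced placeholder.

```python
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_vars(env: dict[str, str]) -> list[str]:
    """Return required variables that are absent, empty, or still placeholders."""
    return [
        name for name in REQUIRED_VARS
        if not env.get(name) or env[name].startswith("<")
    ]
```

You might run it as missing_vars(parse_env_file(open("/etc/default/teleport").read())) and expect an empty list.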

Teleport's AWS client loads credentials from different sources in the following order:

  • Environment Variables
  • Shared credentials file
  • Shared configuration file (Teleport always enables shared configuration)
  • EC2 Instance Metadata (credentials only)
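The lookup order above behaves like a first-match chain. The following sketch models only that ordering; it is not the AWS SDK for Go's implementation, and the source functions used with it are hypothetical stand-ins for each credential source.

```python
from typing import Callable, Optional

# A credential source returns a credentials mapping, or None if it has none.
CredentialSource = Callable[[], Optional[dict]]


def resolve_credentials(chain: list[tuple[str, CredentialSource]]) -> tuple[str, dict]:
    """Return (source name, credentials) from the first source that has any.

    `chain` must already be in priority order: environment variables, shared
    credentials file, shared configuration file, EC2 instance metadata.
    """
    for name, source in chain:
        credentials = source()
        if credentials:
            return name, credentials
    raise LookupError("no AWS credentials found in any source")
```

For example, if no credentials are set in the environment, the chain falls through to the shared credentials file, and instance metadata is consulted only when every earlier source is empty.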

If you provide AWS credentials via a shared credentials file or shared configuration file, you must run the Teleport Auth Service with the AWS_PROFILE environment variable set to the name of your chosen profile.

If you have a specific use case that the instructions above do not account for, consult the documentation for the AWS SDK for Go for a detailed description of credential loading behavior.

The IAM role that the Teleport Auth Service authenticates as must have the policies specified in the next section.

IAM policies

Make sure that the IAM role assigned to Teleport is configured with sufficient access to DynamoDB.

On startup, the Teleport Auth Service checks whether the DynamoDB table you have specified in its configuration file exists. If the table does not exist, the Auth Service attempts to create one.

The IAM permissions that the Auth Service requires to manage DynamoDB tables depend on whether you expect to create a table yourself or enable the Auth Service to create and configure one for you:

If you choose to manage DynamoDB tables yourself, you must take the following steps, which we will explain in more detail below:

  • Create a cluster state table.
  • Create an audit event table.
  • Create an IAM policy and attach it to the Teleport Auth Service's IAM identity.

Create a cluster state table

The cluster state table must have the following attribute definitions:

Name | Type
HashKey | S
FullPath | S

The table must also have the following key schema elements:

Name | Type
HashKey | HASH
FullPath | RANGE
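As an illustration, the cluster state table schema above maps onto DynamoDB CreateTable parameters as follows. This sketch assumes you create the table with boto3, for example boto3.client("dynamodb").create_table(**cluster_state_table_spec("teleport-helm-backend")). The helper name is hypothetical, on-demand billing is an assumed choice, and the sketch covers only the attributes and keys listed above, not streams, TTL, or backups.

```python
def cluster_state_table_spec(table_name: str) -> dict:
    """Keyword arguments for a DynamoDB CreateTable call (boto3 parameter names)."""
    return {
        "TableName": table_name,
        # Both key attributes are strings ("S").
        "AttributeDefinitions": [
            {"AttributeName": "HashKey", "AttributeType": "S"},
            {"AttributeName": "FullPath", "AttributeType": "S"},
        ],
        # HashKey is the partition key and FullPath is the sort key.
        "KeySchema": [
            {"AttributeName": "HashKey", "KeyType": "HASH"},
            {"AttributeName": "FullPath", "KeyType": "RANGE"},
        ],
        # Assumed: on-demand capacity, matching billing_mode "pay_per_request".
        "BillingMode": "PAY_PER_REQUEST",
    }
```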

Create an audit event table

The audit event table must have the following attribute definitions:

Name | Type
SessionID | S
EventIndex | N
CreatedAtDate | S
CreatedAt | N

The table must also have the following key schema elements:

Name | Type
CreatedAtDate | HASH
CreatedAt | RANGE

Create and attach an IAM policy

Create the following IAM policy and attach it to the Teleport Auth Service's IAM identity.

You'll need to replace these values in the policy example below:

Placeholder value | Replace with
us-west-2 | AWS region
1234567890 | AWS account ID
teleport-helm-backend | DynamoDB table name to use for the Teleport backend
teleport-helm-events | DynamoDB table name to use for the Teleport audit log (must be different from the backend table)
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ClusterStateStorage",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchWriteItem",
                "dynamodb:UpdateTimeToLive",
                "dynamodb:PutItem",
                "dynamodb:DeleteItem",
                "dynamodb:Scan",
                "dynamodb:Query",
                "dynamodb:DescribeStream",
                "dynamodb:UpdateItem",
                "dynamodb:DescribeTimeToLive",
                "dynamodb:DescribeTable",
                "dynamodb:GetShardIterator",
                "dynamodb:GetItem",
                "dynamodb:ConditionCheckItem",
                "dynamodb:UpdateTable",
                "dynamodb:GetRecords",
                "dynamodb:UpdateContinuousBackups"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-backend",
                "arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-backend/stream/*"
            ]
        },
        {
            "Sid": "ClusterEventsStorage",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchWriteItem",
                "dynamodb:UpdateTimeToLive",
                "dynamodb:PutItem",
                "dynamodb:DescribeTable",
                "dynamodb:DeleteItem",
                "dynamodb:GetItem",
                "dynamodb:Scan",
                "dynamodb:Query",
                "dynamodb:UpdateItem",
                "dynamodb:DescribeTimeToLive",
                "dynamodb:UpdateTable",
                "dynamodb:UpdateContinuousBackups"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-events",
                "arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-events/index/*"
            ]
        }
    ]
}

You can omit the dynamodb:UpdateContinuousBackups permission if you disable continuous backups.
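The Resource entries in these policies follow a fixed ARN pattern, so you can derive them from your region, account ID, and table names. This is a minimal, hypothetical helper: it builds only the Resource lists for the two statements and refuses to use the same table for both.

```python
def dynamodb_policy_resources(region: str, account_id: str,
                              backend_table: str, events_table: str) -> dict:
    """Build the Resource lists for the ClusterStateStorage and ClusterEventsStorage statements."""
    if backend_table == events_table:
        raise ValueError("the backend and audit event tables must be different")
    prefix = f"arn:aws:dynamodb:{region}:{account_id}:table"
    return {
        # Cluster state: the table plus its DynamoDB Streams.
        "ClusterStateStorage": [
            f"{prefix}/{backend_table}",
            f"{prefix}/{backend_table}/stream/*",
        ],
        # Audit events: the table plus its indexes.
        "ClusterEventsStorage": [
            f"{prefix}/{events_table}",
            f"{prefix}/{events_table}/index/*",
        ],
    }
```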

If you want the Auth Service to create and configure the DynamoDB tables for you, attach the following policy instead. It is the same as the policy above except that it also grants dynamodb:CreateTable on both tables, and it uses the same placeholder values (us-west-2, 1234567890, teleport-helm-backend, and teleport-helm-events).
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ClusterStateStorage",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchWriteItem",
                "dynamodb:UpdateTimeToLive",
                "dynamodb:PutItem",
                "dynamodb:DeleteItem",
                "dynamodb:Scan",
                "dynamodb:Query",
                "dynamodb:DescribeStream",
                "dynamodb:UpdateItem",
                "dynamodb:DescribeTimeToLive",
                "dynamodb:CreateTable",
                "dynamodb:DescribeTable",
                "dynamodb:GetShardIterator",
                "dynamodb:GetItem",
                "dynamodb:ConditionCheckItem",
                "dynamodb:UpdateTable",
                "dynamodb:GetRecords",
                "dynamodb:UpdateContinuousBackups"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-backend",
                "arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-backend/stream/*"
            ]
        },
        {
            "Sid": "ClusterEventsStorage",
            "Effect": "Allow",
            "Action": [
                "dynamodb:CreateTable",
                "dynamodb:BatchWriteItem",
                "dynamodb:UpdateTimeToLive",
                "dynamodb:PutItem",
                "dynamodb:DescribeTable",
                "dynamodb:DeleteItem",
                "dynamodb:GetItem",
                "dynamodb:Scan",
                "dynamodb:Query",
                "dynamodb:UpdateItem",
                "dynamodb:DescribeTimeToLive",
                "dynamodb:UpdateTable",
                "dynamodb:UpdateContinuousBackups"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-events",
                "arn:aws:dynamodb:us-west-2:1234567890:table/teleport-helm-events/index/*"
            ]
        }
    ]
}

Configuring the DynamoDB backend

To configure Teleport to use DynamoDB:

  • Configure all Teleport Auth Service instances to use the DynamoDB backend in the "storage" section of teleport.yaml, as shown below.
  • Auth Service instances must be able to reach the DynamoDB and DynamoDB Streams endpoints.
  • Deploy no more than two Auth Service instances connected to the DynamoDB storage backend.
  • Deploy several Proxy Service instances.
  • Make sure that all Teleport resource services have the auth_servers configuration setting populated with the addresses of your cluster's Auth Service instances.

AWS can throttle DynamoDB if more than two processes are reading from the same stream's shard simultaneously, so you must not deploy more than two Auth Service instances that read from a DynamoDB backend. For details on DynamoDB Streams, read the AWS documentation.

teleport:
  storage:
    type: dynamodb
    # Region location of dynamodb instance, https://docs.aws.amazon.com/en_pv/general/latest/gr/rande.html#ddb_region
    region: us-east-1

    # Name of the DynamoDB table. If it does not exist, Teleport will create it.
    table_name: Example_TELEPORT_DYNAMO_TABLE_NAME

    # This setting configures Teleport to send the audit events to three places:
    # To keep a copy in DynamoDB, a copy on a local filesystem, and also output the events to stdout.
    # NOTE: The DynamoDB events table has a different schema to the regular Teleport
    # database table, so attempting to use the same table for both will result in errors.
    # When using highly available storage like DynamoDB, you should make sure that the list always specifies
    # the High Availability storage method first, as this is what the Teleport web UI uses as its source of events to display.
    audit_events_uri:  ['dynamodb://events_table_name', 'file:///var/lib/teleport/audit/events', 'stdout://']

    # This setting configures Teleport to save the recorded sessions in an S3 bucket:
    audit_sessions_uri: s3://Example_TELEPORT_S3_BUCKET/records

    # By default, Teleport stores audit events with an AWS TTL of 1 year.
    # This value can be configured as shown below. If set to 0 seconds, TTL is disabled.
    retention_period: 365d

    # Enables either Pay Per Request or Provisioned billing for the DynamoDB table. Set when Teleport creates the table.
    # Possible values: "pay_per_request" and "provisioned"
    # default: "pay_per_request"
    billing_mode: "pay_per_request"

    # continuous_backups is used to optionally enable continuous backups.
    # default: false
    continuous_backups: true
  • Replace us-east-1 and Example_TELEPORT_DYNAMO_TABLE_NAME with your own settings. Teleport will create the table automatically.
  • Example_TELEPORT_DYNAMO_TABLE_NAME and events_table_name must be different DynamoDB tables. The schema is different for each. Using the same table name for both will result in errors.
  • The audit log settings above are optional. If you specify them, Teleport stores the audit log in DynamoDB and session recordings must be stored in an S3 bucket, i.e., both audit_events_uri and audit_sessions_uri must be present. If they are not set, Teleport defaults to storing the audit log on the local file system of the Auth Service instance, i.e., /var/lib/teleport/log.
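The rules stated above (distinct table names, and the highly available backend listed first in audit_events_uri) can be checked before deployment with a sketch like the following. It is hypothetical: it takes already-parsed values rather than reading teleport.yaml, and HA_SCHEMES assumes only the dynamodb and athena backends described on this page.

```python
from urllib.parse import urlsplit

# Assumption: only the two AWS audit event backends covered on this page.
HA_SCHEMES = {"dynamodb", "athena"}


def check_storage_config(table_name: str, audit_events_uri: list[str]) -> list[str]:
    """Return a list of problems in the storage settings (empty if none found)."""
    problems = []
    if not audit_events_uri:
        return problems  # Audit log settings are optional.
    if urlsplit(audit_events_uri[0]).scheme not in HA_SCHEMES:
        problems.append("list the highly available backend first in audit_events_uri")
    for uri in audit_events_uri:
        parts = urlsplit(uri)
        # The events table has a different schema from the cluster state table.
        if parts.scheme == "dynamodb" and parts.netloc == table_name:
            problems.append("the DynamoDB events table must differ from table_name")
    return problems
```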

The optional query parameters shown below control how Teleport interacts with a DynamoDB endpoint.

dynamodb://events_table_name?region=us-east-1&endpoint=dynamo.example.com&use_fips_endpoint=true

  • region=us-east-1 - set the Amazon region to use.
  • endpoint=dynamo.example.com - connect to a custom DynamoDB endpoint.
  • use_fips_endpoint=true - use DynamoDB FIPS endpoints.
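To show how these parameters are laid out, the following sketch splits such a URI with Python's urllib.parse. It is not Teleport's parser: parse_dynamodb_events_uri is hypothetical, it does not validate region or endpoint values, and it treats any use_fips_endpoint value other than true as false.

```python
from urllib.parse import parse_qs, urlsplit


def parse_dynamodb_events_uri(uri: str) -> dict:
    """Split a dynamodb:// audit events URI into its table name and options."""
    parts = urlsplit(uri)
    if parts.scheme != "dynamodb":
        raise ValueError(f"not a dynamodb URI: {uri}")
    # parse_qs returns a list per key; keep the last value for each.
    options = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    fips = options.get("use_fips_endpoint")
    return {
        "table": parts.netloc,
        "region": options.get("region"),
        "endpoint": options.get("endpoint"),
        # None means "not set", leaving the decision to --fips or AWS_USE_FIPS_ENDPOINT.
        "use_fips_endpoint": None if fips is None else fips.lower() == "true",
    }
```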

DynamoDB Continuous Backups

When setting up DynamoDB, enable backups (for example, by setting continuous_backups: true) so that you can restore cluster state from a past snapshot if needed.

DynamoDB On-Demand

For best performance, we recommend On-Demand mode (billing_mode: "pay_per_request") instead of configuring capacity manually with Provisioned mode. This prevents DynamoDB throttling caused by underestimated or growing usage from affecting Teleport.

Configuring AWS FIPS endpoints

This config option applies to Amazon S3 and Amazon DynamoDB.

Set use_fips_endpoint to true or false. If true, FIPS endpoints are used. If false, standard endpoints are used. If unset, the AWS environment variable AWS_USE_FIPS_ENDPOINT determines which endpoint is used. FIPS endpoints are also used if Teleport is run with the --fips flag.

Config option priority is applied in the following order:

  • Setting the use_fips_endpoint query parameter as shown above
  • Using the --fips flag when running Teleport
  • Using the AWS environment variable
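Read literally, the priority list above resolves as in the following sketch. It is a model under assumptions rather than Teleport's code: use_fips_endpoints is hypothetical, it treats only the string true (case-insensitive) as enabling AWS_USE_FIPS_ENDPOINT, and Teleport's actual handling of combinations such as use_fips_endpoint=false together with --fips may differ.

```python
import os
from typing import Mapping, Optional


def use_fips_endpoints(query_param: Optional[bool], fips_flag: bool,
                       environ: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether to use FIPS endpoints, following the documented priority."""
    environ = os.environ if environ is None else environ
    # 1. An explicit use_fips_endpoint query parameter takes precedence.
    if query_param is not None:
        return query_param
    # 2. Otherwise, running Teleport with --fips selects FIPS endpoints.
    if fips_flag:
        return True
    # 3. Otherwise, fall back to the AWS environment variable.
    return environ.get("AWS_USE_FIPS_ENDPOINT", "").lower() == "true"
```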
A warning about AWS_USE_FIPS_ENDPOINT

Setting this environment variable to true will enable FIPS endpoints for all AWS resource types. Some FIPS endpoints are not supported in certain regions or environments or are only supported in GovCloud.

Athena

The Athena audit log backend is available starting from Teleport v14.0.

If you are running Teleport on AWS, you can use an Athena-based audit log system that manages Parquet files stored on S3 as a storage backend to achieve high availability. The Athena backend supports only one type of Teleport data, audit events.

The Athena audit backend scales better and offers better search than the DynamoDB audit backend.

The Athena audit logs are eventually consistent. It may take up to one minute (depending on the batchMaxInterval setting and event load) until you can view events in the Teleport Web UI.

Infrastructure setup

The Auth Service uses an SQS queue subscribed to an SNS topic for event publishing. A single Auth Service instance reads events in batches from SQS, converts them into Parquet format, and sends the resulting data to S3. During queries, the Athena engine searches for events on S3, reading metadata from a Glue table.
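The batching step can be sketched as a buffer that flushes when either limit is reached: batchMaxItems events, or batchMaxInterval since the oldest buffered event. This is a simplified model under stated assumptions: EventBatcher is hypothetical, timestamps are passed in explicitly, and the real implementation may measure the interval differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventBatcher:
    """Buffer events and decide when to write them out as one Parquet file."""
    max_items: int = 20000        # plays the role of batchMaxItems
    max_interval_s: float = 60.0  # plays the role of batchMaxInterval (1m)
    _events: list = field(default_factory=list, init=False)
    _first_at: Optional[float] = field(default=None, init=False)

    def add(self, event: dict, now: float) -> Optional[list]:
        """Buffer an event; return a batch to write if a limit is reached."""
        if self._first_at is None:
            self._first_at = now
        self._events.append(event)
        if len(self._events) >= self.max_items:
            return self.flush()
        return self.poll(now)

    def poll(self, now: float) -> Optional[list]:
        """Flush if the oldest buffered event has waited max_interval_s."""
        if self._first_at is not None and now - self._first_at >= self.max_interval_s:
            return self.flush()
        return None

    def flush(self) -> Optional[list]:
        """Return all buffered events (or None if empty) and reset the buffer."""
        batch, self._events, self._first_at = self._events, [], None
        return batch or None
```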

You can set up the required infrastructure to support the Athena backend with the following Terraform script:

variable "aws_region" {
  description = "AWS region"
  default     = "us-west-2"
}

variable "sns_topic_name" {
  description = "Name of the SNS topic used for publishing audit events"
}

variable "sqs_queue_name" {
  description = "Name of the SQS queue used for subscription for audit events topic"
}

variable "sqs_dlq_name" {
  description = "Name of the SQS Dead-Letter Queue used for handling unprocessable events"
}

variable "max_receive_count" {
  description = "Number of times a message can be received before it is sent to the DLQ"
  default     = 10
}

variable "kms_key_alias" {
  description = "The alias of a custom KMS key"
}

variable "long_term_bucket_name" {
  description = "Name of the long term storage bucket used for storing audit events"
}

variable "transient_bucket_name" {
  description = "Name of the transient storage bucket used for storing query results and large events payloads"
}

variable "database_name" {
  description = "Name of Glue database"
}

variable "table_name" {
  description = "Name of Glue table"
}

variable "workgroup" {
  description = "Name of Athena workgroup"
}

variable "workgroup_max_scanned_bytes_per_query" {
  description = "Limit per query of max scanned bytes"
  default     = 1073741824 # 1GB
}

# The search_event_limiter variables configure a rate limit on top of the
# search events API to prevent rising costs from aggressive use of the API.
# The current version of the Athena audit logger is not designed for API polling.
# With burst=20, time=1m, and amount=5, you can make 20 requests without any
# throttling; subsequent requests are throttled, and 5 tokens are added back
# to the rate limit bucket every minute.
variable "search_event_limiter_burst" {
  description = "Number of tokens available for rate limit used on top of search event API"
  default     = 20
}

variable "search_event_limiter_time" {
  description = "Duration between the addition of tokens to the bucket for rate limit used on top of search event API"
  default     = "1m"
}

variable "search_event_limiter_amount" {
  description = "Number of tokens added to the bucket during specific interval for rate limit used on top of search event API"
  default     = 5
}

variable "access_monitoring_trusted_relationship_role_arn" {
  description = "AWS Role ARN that will be used to configure trusted relationship between provided role and Access Monitoring role allowing to assume Access Monitoring role by the provided role"
  default     = ""
}

variable "access_monitoring" {
  description = "Enabled Access Monitoring"
  type        = bool
  default     = false
}

variable "access_monitoring_prefix" {
  description = "Prefix for resources created by Access Monitoring"
  default     = ""
}

provider "aws" {
  region = var.aws_region
}

data "aws_caller_identity" "current" {}

resource "aws_kms_key" "audit_key" {
  description         = "KMS key for Athena audit log"
  enable_key_rotation = true
}

resource "aws_kms_key_policy" "audit_key_policy" {
  key_id = aws_kms_key.audit_key.id
  policy = jsonencode({
    Statement = [
      {
        Action = [
          "kms:*"
        ]
        Effect = "Allow"
        Principal = {
          AWS = data.aws_caller_identity.current.account_id
        }
        Resource = "*"
        Sid      = "Default Policy"
      },
      {
        Action = [
          "kms:GenerateDataKey",
          "kms:Decrypt"
        ]
        Effect = "Allow"
        Principal = {
          Service = "sns.amazonaws.com"
        }
        Resource = "*"
        Sid      = "SnsUsage"
        Condition = {
          StringEquals = {
            "aws:SourceAccount" = data.aws_caller_identity.current.account_id
          }
          ArnLike = {
            "aws:SourceArn" : aws_sns_topic.audit_topic.arn
          }
        }
      },
    ]
    Version = "2012-10-17"
  })
}

resource "aws_kms_alias" "audit_key_alias" {
  name          = "alias/${var.kms_key_alias}"
  target_key_id = aws_kms_key.audit_key.key_id
}

resource "aws_sns_topic" "audit_topic" {
  name              = var.sns_topic_name
  kms_master_key_id = aws_kms_key.audit_key.arn
}

resource "aws_sqs_queue" "audit_queue_dlq" {
  name                              = var.sqs_dlq_name
  kms_master_key_id                 = aws_kms_key.audit_key.arn
  kms_data_key_reuse_period_seconds = 300
  message_retention_seconds         = 604800 // 7 days, three days longer than the 4-day SQS default
}

resource "aws_sqs_queue" "audit_queue" {
  name                              = var.sqs_queue_name
  kms_master_key_id                 = aws_kms_key.audit_key.arn
  kms_data_key_reuse_period_seconds = 300

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.audit_queue_dlq.arn
    maxReceiveCount     = var.max_receive_count
  })
}

resource "aws_sns_topic_subscription" "audit_sqs_target" {
  topic_arn            = aws_sns_topic.audit_topic.arn
  protocol             = "sqs"
  endpoint             = aws_sqs_queue.audit_queue.arn
  raw_message_delivery = true
}

data "aws_iam_policy_document" "audit_policy" {
  statement {
    actions = [
      "SQS:SendMessage",
    ]
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["sns.amazonaws.com"]
    }
    resources = [aws_sqs_queue.audit_queue.arn]
    condition {
      test     = "ArnEquals"
      variable = "aws:SourceArn"
      values   = [aws_sns_topic.audit_topic.arn]
    }
  }
}

resource "aws_sqs_queue_policy" "audit_policy" {
  queue_url = aws_sqs_queue.audit_queue.url
  policy    = data.aws_iam_policy_document.audit_policy.json
}

resource "aws_s3_bucket" "long_term_storage" {
  bucket        = var.long_term_bucket_name
  force_destroy = true
  # In production, we recommend enabling object lock to provide deletion protection.
  object_lock_enabled = false
}

resource "aws_s3_bucket_server_side_encryption_configuration" "long_term_storage" {
  bucket = aws_s3_bucket.long_term_storage.id
  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.audit_key.arn
      sse_algorithm     = "aws:kms"
    }
    bucket_key_enabled = true
  }
}

resource "aws_s3_bucket_ownership_controls" "long_term_storage" {
  bucket = aws_s3_bucket.long_term_storage.id
  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}

resource "aws_s3_bucket_versioning" "long_term_storage" {
  bucket = aws_s3_bucket.long_term_storage.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_public_access_block" "long_term_storage" {
  bucket                  = aws_s3_bucket.long_term_storage.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket" "transient_storage" {
  bucket        = var.transient_bucket_name
  force_destroy = true
  # In production, we recommend enabling a lifecycle configuration to clean up transient data.
}

resource "aws_s3_bucket_server_side_encryption_configuration" "transient_storage" {
  bucket = aws_s3_bucket.transient_storage.id
  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.audit_key.arn
      sse_algorithm     = "aws:kms"
    }
    bucket_key_enabled = true
  }
}

resource "aws_s3_bucket_ownership_controls" "transient_storage" {
  bucket = aws_s3_bucket.transient_storage.id
  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}

resource "aws_s3_bucket_versioning" "transient_storage" {
  bucket = aws_s3_bucket.transient_storage.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_public_access_block" "transient_storage" {
  bucket                  = aws_s3_bucket.transient_storage.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_glue_catalog_database" "audit_db" {
  name = var.database_name
}

resource "aws_glue_catalog_table" "audit_table" {
  name          = var.table_name
  database_name = aws_glue_catalog_database.audit_db.name
  table_type    = "EXTERNAL_TABLE"
  parameters = {
    "EXTERNAL"                            = "TRUE",
    "projection.enabled"                  = "true",
    "projection.event_date.type"          = "date",
    "projection.event_date.format"        = "yyyy-MM-dd",
    "projection.event_date.interval"      = "1",
    "projection.event_date.interval.unit" = "DAYS",
    "projection.event_date.range"         = "NOW-4YEARS,NOW",
    "storage.location.template"           = format("s3://%s/events/$${event_date}/", aws_s3_bucket.long_term_storage.bucket)
    "classification"                      = "parquet"
    "parquet.compression"                 = "SNAPPY",
  }
  storage_descriptor {
    location      = format("s3://%s", aws_s3_bucket.long_term_storage.bucket)
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
    ser_de_info {
      name                  = "example"
      parameters            = { "serialization.format" = "1" }
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }
    columns {
      name = "uid"
      type = "string"
    }
    columns {
      name = "session_id"
      type = "string"
    }
    columns {
      name = "event_type"
      type = "string"
    }
    columns {
      name = "event_time"
      type = "timestamp"
    }
    columns {
      name = "event_data"
      type = "string"
    }
    columns {
      name = "user"
      type = "string"
    }
  }
  partition_keys {
    name = "event_date"
    type = "date"
  }
}

resource "aws_athena_workgroup" "workgroup" {
  name          = var.workgroup
  force_destroy = true
  configuration {
    bytes_scanned_cutoff_per_query = var.workgroup_max_scanned_bytes_per_query
    engine_version {
      selected_engine_version = "Athena engine version 3"
    }
    result_configuration {
      output_location = format("s3://%s/results", aws_s3_bucket.transient_storage.bucket)
      encryption_configuration {
        encryption_option = "SSE_KMS"
        kms_key_arn       = aws_kms_key.audit_key.arn
      }
    }
  }
}


output "athena_url" {
  value = format("athena://%s.%s?%s",
    aws_glue_catalog_database.audit_db.name,
    aws_glue_catalog_table.audit_table.name,
    join("&", [
      format("topicArn=%s", aws_sns_topic.audit_topic.arn),
      format("largeEventsS3=s3://%s/large_payloads", aws_s3_bucket.transient_storage.bucket),
      format("locationS3=s3://%s/events", aws_s3_bucket.long_term_storage.bucket),
      format("workgroup=%s", aws_athena_workgroup.workgroup.name),
      format("queueURL=%s", aws_sqs_queue.audit_queue.url),
      format("queryResultsS3=s3://%s/query_results", aws_s3_bucket.transient_storage.bucket),
      format("limiterBurst=%d", var.search_event_limiter_burst),
      format("limiterRefillAmount=%s", var.search_event_limiter_amount),
      format("limiterRefillTime=%s", var.search_event_limiter_time),
    ])
  )
}
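The search_event_limiter_* comment in the script describes a token bucket. The following sketch models that behavior with the default values (burst 20, 5 tokens every minute). It is hypothetical and not Teleport's limiter; in particular, refilling in whole intervals and capping the bucket at the burst size are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """`burst` tokens up front, then `amount` tokens added every `interval_s` seconds."""
    burst: int = 20
    amount: int = 5
    interval_s: float = 60.0
    start: float = 0.0
    _tokens: float = field(default=0.0, init=False)
    _last_refill: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self._tokens = self.burst
        self._last_refill = self.start

    def allow(self, now: float) -> bool:
        """Consume one token if available; return False if the request is throttled."""
        # Add `amount` tokens per full interval elapsed, never exceeding `burst`.
        intervals = int((now - self._last_refill) // self.interval_s)
        if intervals > 0:
            self._tokens = min(self.burst, self._tokens + intervals * self.amount)
            self._last_refill += intervals * self.interval_s
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

In this model, 20 back-to-back searches succeed, the 21st is throttled, and after one minute 5 more searches are allowed.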

Configuring the Athena audit log backend

To configure Teleport to use Athena:

  • Make sure you are using Teleport version 14.0.0 or newer.
  • Prepare the required infrastructure.
  • Specify an Athena URL inside the audit_events_uri array in your Teleport configuration file:
teleport:
  storage:
    # This setting configures Teleport to keep a copy of the audit log in Athena
    # and a copy on a local filesystem, and also to output the events to stdout.
    audit_events_uri:
      # More details about the full Athena URL are shown below.
      - 'athena://database.table?params'
      - 'file:///var/lib/teleport/audit/events'
      - 'stdout://'

Here is an example of an Amazon Athena URL within the audit_events_uri configuration field:

athena://db.table?topicArn=arn:aws:sns:region:account_id:topic_name&largeEventsS3=s3://transient/large_payloads&locationS3=s3://long-term/events&workgroup=workgroup&queueURL=https://sqs.region.amazonaws.com/account_id/queue_name&queryResultsS3=s3://transient/query_results

The URL hostname has the form database.table, naming the Glue database and table that the Athena audit logger uses.

Other parameters are specified as query parameters within the Athena URL.

The following parameters are required:

Parameter name | Example value | Description
topicArn | arn:aws:sns:region:account_id:topic_name | ARN of the SNS topic where events are published
locationS3 | s3://long-term/events | S3 bucket used for long-term storage
largeEventsS3 | s3://transient/large_payloads | S3 bucket used for transient storage of large events
queueURL | https://sqs.region.amazonaws.com/account_id/queue_name | SQS URL used for the subscription to the SNS topic
workgroup | workgroup_name | Athena workgroup used for queries
queryResultsS3 | s3://transient/results | S3 bucket used for transient storage of query results

The following parameters are optional:

Parameter name | Example value | Description
region | us-east-1 | AWS region. If empty, defaults to the region from the AuditConfig or ambient AWS credentials
batchMaxItems | 20000 | Maximum number of events allowed in a single Parquet file (default 20000)
batchMaxInterval | 1m | Maximum interval used to buffer incoming data before creating a Parquet file (default 1m)
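Before adding an Athena URL to audit_events_uri, you can check its shape with a sketch like the following, which splits the database.table hostname and verifies that the required parameters are present. parse_athena_uri is hypothetical; it does not validate parameter values or the optional parameters.

```python
from urllib.parse import parse_qs, urlsplit

REQUIRED_PARAMS = ("topicArn", "locationS3", "largeEventsS3",
                   "queueURL", "workgroup", "queryResultsS3")


def parse_athena_uri(uri: str) -> dict:
    """Parse an athena:// audit events URI and check its required parameters."""
    parts = urlsplit(uri)
    if parts.scheme != "athena":
        raise ValueError(f"not an athena URI: {uri}")
    # The hostname names the Glue database and table as database.table.
    database, sep, table = parts.netloc.partition(".")
    if not sep or not database or not table:
        raise ValueError("hostname must have the form database.table")
    params = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    return {"database": database, "table": table, "params": params}
```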

Authenticating to AWS

The Teleport Auth Service must be able to read AWS credentials in order to authenticate to Athena.

Grant the Teleport Auth Service access to credentials that it can use to authenticate to AWS.

  • If you are running the Teleport Auth Service on an EC2 instance, you may use the EC2 Instance Metadata Service method
  • If you are running the Teleport Auth Service in Kubernetes, you can use IAM Roles for Service Accounts (IRSA)
  • Otherwise, you must use environment variables

If you use the EC2 Instance Metadata Service method, Teleport will detect when it is running on an EC2 instance and use the Instance Metadata Service to fetch credentials.

The EC2 instance should be configured to use an EC2 instance profile. For more information, see: Using Instance Profiles.

If you use IRSA, refer to IAM Roles for Service Accounts (IRSA) to set up an OIDC provider in AWS and configure an AWS IAM role that allows the pod's service account to assume the role.

With the environment variable method, Teleport's built-in AWS client reads credentials from the following environment variables:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_DEFAULT_REGION

When you start the Teleport Auth Service, the service reads environment variables from a file at the path /etc/default/teleport. Obtain these credentials from your organization. Ensure that /etc/default/teleport has the following content, replacing the values of each variable:

AWS_ACCESS_KEY_ID=00000000000000000000
AWS_SECRET_ACCESS_KEY=0000000000000000000000000000000000000000
AWS_DEFAULT_REGION=<YOUR_REGION>

Teleport's AWS client loads credentials from different sources in the following order:

  • Environment Variables
  • Shared credentials file
  • Shared configuration file (Teleport always enables shared configuration)
  • EC2 Instance Metadata (credentials only)

If you provide AWS credentials via a shared credentials file or shared configuration file, you must run the Teleport Auth Service with the AWS_PROFILE environment variable set to the name of your chosen profile.

If you have a specific use case that the instructions above do not account for, consult the documentation for the AWS SDK for Go for a detailed description of credential loading behavior.

The IAM role that the Teleport Auth Service authenticates as must have the policies specified in the next section.

IAM policies

Make sure that the IAM role assigned to Teleport is configured with sufficient access to Athena. Below you can find the IAM permissions that the Auth Service requires to use Athena Audit logs as an audit event backend.

You'll need to replace these values in the policy example below:

Placeholder value | Replace with
eu-central-1      | AWS region
1234567890        | AWS account ID
audit-long-term   | S3 bucket used for long-term storage
audit-transient   | S3 bucket used for transient storage
audit-sqs         | SQS queue name
audit-sns         | SNS topic name
kms_id            | KMS key ID used for server-side encryption of SNS/SQS/S3
audit_db          | Glue database used for audit logs
audit_table       | Glue table used for audit logs
audit_workgroup   | Athena workgroup used for audit logs

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListBucketMultipartUploads",
                "s3:GetBucketLocation",
                "s3:ListBucketVersions",
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::audit-transient",
                "arn:aws:s3:::audit-long-term"
            ],
            "Sid": "AllowListingMultipartUploads"
        },
        {
            "Action": [
                "s3:PutObject",
                "s3:ListMultipartUploadParts",
                "s3:GetObjectVersion",
                "s3:GetObject",
                "s3:DeleteObjectVersion",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::audit-transient/results/*",
                "arn:aws:s3:::audit-transient/large_payloads/*",
                "arn:aws:s3:::audit-long-term/events/*"
            ],
            "Sid": "AllowMultipartAndObjectAccess"
        },
        {
            "Action": "sns:Publish",
            "Effect": "Allow",
            "Resource": "arn:aws:sns:eu-central-1:1234567890:audit-sns",
            "Sid": "AllowPublishSNS"
        },
        {
            "Action": [
                "sqs:ReceiveMessage",
                "sqs:DeleteMessage"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:sqs:eu-central-1:1234567890:audit-sqs",
            "Sid": "AllowReceiveSQS"
        },
        {
            "Action": [
                "glue:GetTable",
                "athena:StartQueryExecution",
                "athena:GetQueryResults",
                "athena:GetQueryExecution"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:glue:eu-central-1:1234567890:table/audit_db/audit_table",
                "arn:aws:glue:eu-central-1:1234567890:database/audit_db",
                "arn:aws:glue:eu-central-1:1234567890:catalog",
                "arn:aws:athena:eu-central-1:1234567890:workgroup/audit_workgroup"
            ],
            "Sid": "AllowAthenaQuery"
        },
        {
            "Action": [
                "kms:GenerateDataKey",
                "kms:Decrypt"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:kms:eu-central-1:1234567890:key/kms_id",
            "Sid": "AllowAthenaKMSUsage"
        }
    ]
}
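Because the policy repeats the region, account ID, and resource names across several ARNs, it can help to generate the Resource values from a single set of inputs. This is a minimal sketch assuming the ARN formats shown in the example policy; athena_policy_resources is a hypothetical helper, and its keys are the Sid values of the statements above.

```python
# Minimal sketch: derive the Resource ARNs used in the policy above from your own
# values, so that (for example) the SNS topic and SQS queue cannot be swapped.
# athena_policy_resources is a hypothetical helper; it does not emit the full policy.
import json


def athena_policy_resources(region: str, account_id: str, long_term_bucket: str,
                            transient_bucket: str, topic: str, queue: str,
                            kms_id: str, glue_db: str, glue_table: str,
                            workgroup: str) -> dict[str, list[str]]:
    prefix = f"{region}:{account_id}"
    return {
        "AllowListingMultipartUploads": [
            f"arn:aws:s3:::{transient_bucket}",
            f"arn:aws:s3:::{long_term_bucket}",
        ],
        "AllowMultipartAndObjectAccess": [
            f"arn:aws:s3:::{transient_bucket}/results/*",
            f"arn:aws:s3:::{transient_bucket}/large_payloads/*",
            f"arn:aws:s3:::{long_term_bucket}/events/*",
        ],
        "AllowPublishSNS": [f"arn:aws:sns:{prefix}:{topic}"],
        "AllowReceiveSQS": [f"arn:aws:sqs:{prefix}:{queue}"],
        "AllowAthenaQuery": [
            f"arn:aws:glue:{prefix}:table/{glue_db}/{glue_table}",
            f"arn:aws:glue:{prefix}:database/{glue_db}",
            f"arn:aws:glue:{prefix}:catalog",
            f"arn:aws:athena:{prefix}:workgroup/{workgroup}",
        ],
        "AllowAthenaKMSUsage": [f"arn:aws:kms:{prefix}:key/{kms_id}"],
    }


if __name__ == "__main__":
    resources = athena_policy_resources(
        "eu-central-1", "1234567890", "audit-long-term", "audit-transient",
        "audit-sns", "audit-sqs", "kms_id", "audit_db", "audit_table", "audit_workgroup")
    print(json.dumps(resources, indent=2))
```

Each list maps onto the Resource field of the statement with the same Sid in the example policy.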

Migration from DynamoDB to the Athena audit logs backend

Tip

Migration is only needed if you used Amazon DynamoDB for audit logs and you want to keep old data.

Migration consists of the following steps:

  1. Set up Athena infrastructure
  2. Dual write to both DynamoDB and Athena, and query from DynamoDB
  3. Migrate old data from DynamoDB to Athena
  4. Dual write to both DynamoDB and Athena, and query from Athena
  5. Disable writing to DynamoDB

In the Teleport storage configuration, audit_events_uri accepts multiple URLs, each of which configures a connection to a different audit logger. If more than one URL is specified, events are written to every audit system, and queries are executed against the first one.
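This behavior can be summarized as "write to all, read from the first". The sketch below is a simplified in-memory model of it; FanoutLog and MemoryLog are hypothetical classes for illustration, not Teleport's internal types.

```python
# Simplified model of multiple audit_events_uri entries: every event is written
# to all configured loggers, and searches go to the first logger only.
# FanoutLog and MemoryLog are hypothetical, not Teleport's internal types.
from dataclasses import dataclass, field


@dataclass
class MemoryLog:
    name: str
    events: list[dict] = field(default_factory=list)

    def emit(self, event: dict) -> None:
        self.events.append(event)

    def search(self) -> list[dict]:
        return list(self.events)


class FanoutLog:
    def __init__(self, loggers: list[MemoryLog]) -> None:
        if not loggers:
            raise ValueError("at least one audit logger is required")
        self.loggers = loggers

    def emit(self, event: dict) -> None:
        # Dual (or multi) write: every backend receives every event.
        for logger in self.loggers:
            logger.emit(event)

    def search(self) -> list[dict]:
        # Queries are served by the first backend in the list.
        return self.loggers[0].search()


if __name__ == "__main__":
    dynamo, athena = MemoryLog("dynamodb"), MemoryLog("athena")
    log = FanoutLog([dynamo, athena])  # step 2: dual write, query from DynamoDB
    log.emit({"event": "user.login"})
    print(len(dynamo.events), len(athena.events))  # prints 1 1
```

Reordering the list changes only which backend answers queries. Events written before dual writing began exist only in DynamoDB, which is why old data has to be migrated before Athena is moved to the first position.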

Tip

If anything goes wrong during migration steps 1-4, roll back to the Amazon DynamoDB solution by making sure its URL is the first value in the audit_events_uri field and removing the Athena URL.

Each of these steps is explained in more detail below.

Dual write to both DynamoDB and Athena, and query from DynamoDB

The second step of migration requires setting the following configuration:

teleport:
  storage:
    audit_events_uri:
    - 'dynamodb://events_table_name'
    - 'athena://db.table?otherQueryParams'

When an Auth Service instance is restarted, you should verify that Parquet files are stored in the S3 bucket specified using the locationS3 parameter.

Migrate old data from DynamoDB to Athena

This step requires using a client machine to export data from Amazon DynamoDB and publish it to the Athena logger. We recommend using a machine such as an EC2 instance with a disk at least twice as large as the Amazon DynamoDB table.

Instructions for how to use the migration tool can be found on GitHub.

You should set exportTime to the time when dual writing began.

We recommend running your first migration with the -dry-run flag because it validates the exported data. If no errors are reported, proceed to a real migration without the -dry-run flag.

Dual write to both DynamoDB and Athena, and query from Athena

Change the order of the audit_events_uri values in your Teleport configuration file:

teleport:
  storage:
    audit_events_uri:
    - 'athena://db.table?otherQueryParams'
    - 'dynamodb://events_table_name'

When the Auth Service is restarted, you should verify that events are visible on the Audit Logs page.

Disable writing to DynamoDB

Disabling writing to DynamoDB means that you won't be able to roll back to DynamoDB without losing data. Dual writing to both Athena and DynamoDB does not have a significant performance impact, and it's recommended to keep dual writing for some time, even if your system already executes queries from Athena.

To disable writing to DynamoDB, remove the DynamoDB URL from the audit_events_uri array.

GCS

Google Cloud Storage (GCS) can be used as storage for recorded sessions. GCS cannot store the audit log or the cluster state. Below is an example of how to configure a Teleport Auth Service to store the recorded sessions in a GCS bucket.

teleport:
  storage:
    # Path to GCS to store the recorded sessions in.
    audit_sessions_uri: 'gs://$BUCKET_NAME/records?projectID=$PROJECT_ID&credentialsPath=$CREDENTIALS_PATH'

We recommend creating a bucket in Dual-Region mode with the Standard storage class to ensure cluster performance and high availability. Replace the following variables in the above example with your own values:

  • $BUCKET_NAME with the name of the desired GCS bucket. If the bucket does not exist, it will be created. Please ensure the following permissions are granted for the given bucket:

    • storage.buckets.get
    • storage.objects.create
    • storage.objects.get
    • storage.objects.list
    • storage.objects.update
    • storage.objects.delete

    storage.objects.delete is required in order to clean up multipart files after they have been assembled into the final blob.

    If the bucket does not exist, please also ensure that the storage.buckets.create permission is granted.

  • $PROJECT_ID with a GCS-enabled GCP project.

  • $CREDENTIALS_PATH with the path to a JSON-formatted GCP credentials file configured for a service account applicable to the project.
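If you script your deployment, the URI and the permission list can be assembled programmatically. A minimal sketch, assuming the query parameter names shown in the example above; gcs_sessions_uri and required_gcs_permissions are hypothetical helpers:

```python
# Minimal sketch: assemble the audit_sessions_uri for GCS and list the bucket
# permissions described above. gcs_sessions_uri and required_gcs_permissions are
# hypothetical helpers; Teleport only reads the final URI string.
from urllib.parse import urlencode

BASE_PERMISSIONS = frozenset({
    "storage.buckets.get",
    "storage.objects.create",
    "storage.objects.get",
    "storage.objects.list",
    "storage.objects.update",
    "storage.objects.delete",  # needed to clean up multipart files
})


def gcs_sessions_uri(bucket: str, project_id: str, credentials_path: str) -> str:
    # safe="/" keeps the credentials path readable instead of encoding "/" as %2F.
    query = urlencode({"projectID": project_id, "credentialsPath": credentials_path}, safe="/")
    return f"gs://{bucket}/records?{query}"


def required_gcs_permissions(bucket_exists: bool) -> frozenset[str]:
    # storage.buckets.create is only needed if Teleport has to create the bucket.
    if bucket_exists:
        return BASE_PERMISSIONS
    return BASE_PERMISSIONS | {"storage.buckets.create"}
```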

Firestore

If you are running Teleport on GCP, you can use Firestore as a storage backend to achieve high availability. The Firestore backend supports two types of Teleport data:

  • Cluster state
  • Audit log events

Firestore cannot store the recorded sessions. You are advised to use Google Cloud Storage (GCS) for that as shown above. To configure Teleport to use Firestore:

  • Configure all Teleport Auth Service instances to use the Firestore backend in the "storage" section of teleport.yaml as shown below.
  • Deploy several Auth Service instances connected to the Firestore storage backend.
  • Deploy several Proxy Service instances.
  • Make sure that all Teleport resource services have the auth_servers configuration setting populated with the addresses of your cluster's Auth Service instances or use a load balancer for Auth Service instances in high availability mode.
teleport:
  storage:
    type: firestore
    # Project ID https://support.google.com/googleapi/answer/7014113?hl=en
    project_id: Example_GCP_Project_Name

    # Name of the Firestore table.
    collection_name: Example_TELEPORT_FIRESTORE_TABLE_NAME

    credentials_path: /var/lib/teleport/gcs_creds

    # This setting configures Teleport to send the audit events to three places:
    # To keep a copy in Firestore, a copy on a local filesystem, and also write the events to stdout.
    # NOTE: The Firestore events table has a different schema to the regular Teleport
    # database table, so attempting to use the same table for both will result in errors.
    # When using highly available storage like Firestore, you should make sure that the list always specifies
    # the High Availability storage method first, as this is what the Teleport web UI uses as its source of events to display.
    audit_events_uri:  ['firestore://Example_TELEPORT_FIRESTORE_EVENTS_TABLE_NAME', 'file:///var/lib/teleport/audit/events', 'stdout://']

    # This setting configures Teleport to save the recorded sessions in GCP storage:
    audit_sessions_uri: gs://Example_TELEPORT_GCS_BUCKET/records
  • Replace Example_GCP_Project_Name and Example_TELEPORT_FIRESTORE_TABLE_NAME with your own settings. Teleport will create the table automatically.
  • Example_TELEPORT_FIRESTORE_TABLE_NAME and Example_TELEPORT_FIRESTORE_EVENTS_TABLE_NAME must be different Firestore tables. The schema is different for each. Using the same table name for both will result in errors.
  • The GCP authentication setting above can be omitted if the machine itself is running on a GCE instance with a Service Account that has access to the Firestore table.
  • The audit log settings above are optional. If they are specified, Teleport will store the audit log in Firestore, and the session recordings must be stored in a GCS bucket, i.e. both audit_events_uri and audit_sessions_uri must be present. If they are not set, Teleport will default to a local filesystem for the audit log, i.e. /var/lib/teleport/log on an Auth Service instance.
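The rules above can be checked before a configuration is rolled out. This is a minimal sketch that operates on the storage section as a plain Python dict (for example, after loading teleport.yaml with a YAML parser of your choice); check_firestore_storage is hypothetical and does not replace Teleport's own validation.

```python
# Minimal sketch of the Firestore constraints listed above, applied to the
# "storage" section as a plain dict. check_firestore_storage is hypothetical;
# Teleport performs its own validation at startup.


def check_firestore_storage(storage: dict) -> list[str]:
    problems = []
    events_uris = storage.get("audit_events_uri", [])
    if isinstance(events_uris, str):  # a single URI may be given as a string
        events_uris = [events_uris]
    for uri in events_uris:
        if uri.startswith("firestore://"):
            # Drop any query parameters to get the events table name.
            events_table = uri[len("firestore://"):].split("?", 1)[0]
            if events_table == storage.get("collection_name"):
                problems.append("events table must differ from collection_name")
    if events_uris and not storage.get("audit_sessions_uri"):
        problems.append("audit_sessions_uri must be set when audit_events_uri is set")
    if events_uris and not events_uris[0].startswith("firestore://"):
        problems.append("list the Firestore events URI first; the web UI reads from it")
    return problems
```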

Azure Blob Storage

Azure Blob Storage for session storage is available starting from Teleport 13.3.

Azure Blob Storage can be used as storage for recorded sessions. Azure Blob Storage cannot store the audit log or the cluster state. Below is an example of how to configure a Teleport Auth Service instance to store the recorded sessions in an Azure Blob Storage storage account.

teleport:
  storage:
    audit_sessions_uri: azblob://account-name.blob.core.windows.net

Teleport uses two containers in the account, whose names default to inprogress and session. The names can be configured with parameters in the URI fragment:

teleport:
  storage:
    audit_sessions_uri: azblob://account-name.blob.core.windows.net#session_container=session_container_name&inprogress_container=inprogress_container_name
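The container names come from the URI fragment, with session and inprogress as defaults. The sketch below shows one way to interpret such a URI with Python's urllib.parse; azblob_containers is a hypothetical helper that mirrors the behavior described above, not Teleport's parser.

```python
# Minimal sketch: read container names from an azblob:// URI fragment, falling
# back to the documented defaults. azblob_containers is hypothetical; it mirrors
# the behavior described above rather than Teleport's own parser.
from urllib.parse import parse_qs, urlsplit

DEFAULTS = {"session_container": "session", "inprogress_container": "inprogress"}


def azblob_containers(uri: str) -> tuple[str, dict[str, str]]:
    parts = urlsplit(uri)
    if parts.scheme != "azblob":
        raise ValueError(f"not an azblob URI: {uri}")
    fragment = parse_qs(parts.fragment)
    # Use the value from the fragment if present, otherwise the default name.
    containers = {key: fragment.get(key, [default])[0] for key, default in DEFAULTS.items()}
    return parts.hostname, containers
```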

Permissions

Teleport needs the following permissions on both containers:

  • Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read
  • Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write
  • Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete (only on the inprogress container)

In addition, Teleport checks whether the containers exist at startup and attempts to create them if it can't confirm that they exist. Granting Teleport Microsoft.Storage/storageAccounts/blobServices/containers/read allows it to perform the check, and Microsoft.Storage/storageAccounts/blobServices/containers/write allows it to create the containers.
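When defining a custom role instead of using a built-in one, it can help to list the actions per container. A minimal sketch, assuming the default container roles session and inprogress; required_azure_actions is a hypothetical helper:

```python
# Minimal sketch: the actions each container needs, per the permission list
# above, plus the optional container-level actions used at startup.
# required_azure_actions is a hypothetical helper for planning a custom role.
BLOBS = "Microsoft.Storage/storageAccounts/blobServices/containers/blobs"
CONTAINERS = "Microsoft.Storage/storageAccounts/blobServices/containers"


def required_azure_actions(role: str, manage_containers: bool = False) -> set[str]:
    if role not in ("session", "inprogress"):
        raise ValueError(f"unknown container role: {role}")
    actions = {f"{BLOBS}/read", f"{BLOBS}/write"}
    if role == "inprogress":
        actions.add(f"{BLOBS}/delete")  # delete is only needed on inprogress
    if manage_containers:
        # Lets Teleport check for (read) and create (write) containers at startup.
        actions |= {f"{CONTAINERS}/read", f"{CONTAINERS}/write"}
    return actions
```

In an Azure role definition, the blobs/... entries are data actions (DataActions), while the containers/read and containers/write entries are control-plane actions (Actions).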

It's highly recommended to set up a time-based retention policy for the session container, as well as a lifecycle management policy, so that recordings are kept in an immutable state for a given period, then deleted. Teleport will not delete recordings automatically.

With a time-based retention policy in place, it's safe to give Teleport the "Storage Blob Data Contributor" role scoped to the containers, instead of having to define a custom role for it.

Authentication

Teleport will make use of the Azure AD credentials specified by environment variables, Azure AD Workload Identity credentials, or managed identity credentials.

SQLite

The Auth Service uses the SQLite backend when no type is specified in the storage section in the Teleport configuration file, or when type is set to sqlite or dir. The SQLite backend is not designed for high throughput, and it's not capable of serving the needs of Teleport's High Availability configurations.

If you are planning to use SQLite as your backend, scale your cluster slowly and monitor the number of warning messages in the Auth Service's logs that say SLOW TRANSACTION, as that's a sign that the cluster has outgrown the capabilities of the SQLite backend.
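One simple way to track this is to count matching log lines over time. A minimal sketch, assuming the Auth Service logs are available as a text file; only the SLOW TRANSACTION text is matched because the surrounding log format depends on your logging configuration:

```python
# Minimal sketch: count Auth Service log lines that contain the SLOW TRANSACTION
# warning. Only the marker text is matched; the rest of the log line format
# depends on your logging configuration.
import sys
from typing import Iterable

MARKER = "SLOW TRANSACTION"


def count_slow_transactions(lines: Iterable[str]) -> int:
    """Return how many log lines contain the SLOW TRANSACTION warning."""
    return sum(1 for line in lines if MARKER in line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_slow.py <path-to-auth-service-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(count_slow_transactions(f))
```

Running it periodically (for example, once per day against that day's log) shows whether the count is growing as the cluster scales.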

As a stopgap measure until it's possible to migrate the cluster to an HA-capable backend, you can configure the SQLite backend to reduce the amount of disk synchronization, in exchange for less resilience against system crashes or power loss. For an explanation of what the options mean, see the official SQLite docs. Regardless of the configuration, we recommend that you take regular backups of your cluster state.

To reduce disk synchronization:

teleport:
  storage:
    type: sqlite
    sync: NORMAL

To disable disk synchronization altogether:

teleport:
  storage:
    type: sqlite
    sync: "OFF"

When running on a filesystem that supports file locks (i.e. a local filesystem, not a networked one) it's possible to also configure the SQLite database to use Write-Ahead Logging (see the official docs on WAL mode) for significantly improved performance without sacrificing reliability:

teleport:
  storage:
    type: sqlite
    sync: NORMAL
    journal: WAL
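To see what these options do at the SQLite level, the sketch below applies the equivalent pragmas with Python's standard sqlite3 module. It is plain SQLite, not Teleport, and assumes that Teleport's sync and journal settings correspond to SQLite's synchronous and journal_mode pragmas; open_tuned is a hypothetical helper.

```python
# Illustration only: plain SQLite via Python's standard sqlite3 module, not Teleport.
# Assumption: Teleport's sync and journal settings correspond to SQLite's
# synchronous and journal_mode pragmas. open_tuned is a hypothetical helper.
import os
import sqlite3
import tempfile

SYNC_MODES = {"OFF", "NORMAL", "FULL", "EXTRA"}
JOURNAL_MODES = {"DELETE", "TRUNCATE", "PERSIST", "MEMORY", "WAL", "OFF"}


def open_tuned(path: str, sync: str = "NORMAL", journal: str = "WAL") -> sqlite3.Connection:
    """Open a SQLite database and apply the synchronous/journal_mode pragmas."""
    sync, journal = sync.upper(), journal.upper()
    # PRAGMA statements cannot take bound parameters, so validate the values instead.
    if sync not in SYNC_MODES or journal not in JOURNAL_MODES:
        raise ValueError(f"unsupported sync={sync!r} or journal={journal!r}")
    conn = sqlite3.connect(path)
    # journal_mode reports the mode actually in effect; WAL needs file locking support.
    mode = conn.execute(f"PRAGMA journal_mode={journal}").fetchone()[0]
    if mode.upper() != journal:
        conn.close()
        raise RuntimeError(f"journal_mode {journal} not applied (got {mode})")
    conn.execute(f"PRAGMA synchronous={sync}")
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_tuned(os.path.join(tmp, "backend.db"))
        print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # prints wal
        print(conn.execute("PRAGMA synchronous").fetchone()[0])   # prints 1 (NORMAL)
        conn.close()
```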

The SQLite backend and other required data will be written to the Teleport data directory. By default, Teleport's data directory is /var/lib/teleport. To modify the location, set the data_dir value within the Teleport configuration file.

teleport:
  data_dir: /var/lib/teleport_data

CockroachDB

Enterprise

Use of the CockroachDB storage backend requires Teleport Enterprise.

Teleport can use CockroachDB as a storage backend to achieve high availability and survive regional failures. You must take steps to protect access to CockroachDB in this configuration because that is where Teleport secrets like keys and user records will be stored.

At a minimum, you must configure CockroachDB to allow Teleport to create tables. Teleport will create the database if given permission to do so, but this is not required if the database already exists.

CREATE DATABASE database_name;
CREATE USER database_user;
GRANT CREATE ON DATABASE database_name TO database_user;

You must also enable change feeds in CockroachDB's cluster settings. Teleport will configure this setting itself if granted SYSTEM MODIFYCLUSTERSETTING.

SET CLUSTER SETTING kv.rangefeed.enabled = true;

There are several ways to deploy and configure CockroachDB, the details of which are not in scope for this guide. To learn about deploying CockroachDB, see CockroachDB's deployment options. To learn about how to configure multi-region survival goals, see multi-region survival goals.

To configure Teleport to use CockroachDB as a storage backend:

  • Configure all Teleport Auth Service instances to use the CockroachDB backend in the storage section of teleport.yaml as shown below.
  • Deploy several Auth Service instances connected to the CockroachDB storage backend.
  • Deploy several Proxy Service instances.
  • Make sure that the Proxy Service instances and all Teleport agent services that connect directly to the Auth Service have the auth_server configuration setting populated with the address of a load balancer for Auth Service instances.
teleport:
  storage:
    type: cockroachdb

    # conn_string is a required parameter. It is a PostgreSQL connection string used
    # to connect to CockroachDB using the PostgreSQL wire protocol. Client
    # parameters may be specified using the URL. For a detailed list of available
    # parameters see https://www.cockroachlabs.com/docs/stable/connection-parameter
    #
    # If your certificates are not stored at the default ~/.postgresql
    # location, you will need to specify them with the sslcert, sslkey, and
    # sslrootcert parameters.
    #
    # pool_max_conns is an additional parameter that determines the maximum
    # number of connections in the connection pool used for the cluster state
    # database (the change feed uses an additional connection), defaulting to
    # a value that depends on the number of available CPUs.
    conn_string: postgresql://user_name@database-address/teleport_backend?sslmode=verify-full&pool_max_conns=20

    # change_feed_conn_string is an optional parameter. When unspecified Teleport
    # will default to using the same value specified for conn_string. It may be used
    # to configure Teleport to use a different user or connection parameters when
    # establishing a change feed connection.
    #
    # If your certificates are not stored at the default ~/.postgresql
    # location, you will need to specify them with the sslcert, sslkey, and
    # sslrootcert parameters.
    change_feed_conn_string: postgresql://user_name@database-address/teleport_backend?sslmode=verify-full

    # ttl_job_cron is an optional parameter which configures the interval at which CockroachDB will expire backend
    # items based on their time to live. By default this is configured to run every
    # 20 minutes. This is used by Teleport to clean up old resources that are no longer
    # connected to or needed by Teleport. Note that configuring this to run more
    # frequently may have performance implications for CockroachDB.
    ttl_job_cron: '*/20 * * * *'
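The comments above describe what conn_string contains and when Teleport falls back to it for the change feed connection. The sketch below illustrates both with Python's urllib.parse; describe_conn_string and change_feed_conn are hypothetical helpers, and Teleport performs its own parsing.

```python
# Minimal sketch: inspect a CockroachDB conn_string and resolve the change feed
# connection string as described above (fall back to conn_string when
# change_feed_conn_string is unset). Both helpers are hypothetical.
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def describe_conn_string(conn_string: str) -> dict:
    parts = urlsplit(conn_string)
    # Keep the last value if a query parameter is repeated.
    params = {k: v[-1] for k, v in parse_qs(parts.query).items()}
    return {
        "user": parts.username,
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),
        "sslmode": params.get("sslmode"),
        "pool_max_conns": int(params["pool_max_conns"]) if "pool_max_conns" in params else None,
    }


def change_feed_conn(conn_string: str, change_feed_conn_string: Optional[str] = None) -> str:
    # When change_feed_conn_string is unspecified, the same value as conn_string is used.
    return change_feed_conn_string or conn_string
```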