# Troubleshooting Server Auto-Discovery

Common issues and diagnostic steps for server auto-discovery.

The `tctl discovery nodes` command lists Teleport's recent attempts to enroll cloud instances and reports whether each succeeded. It reads enrollment audit events emitted by the Discovery Service (`ssm.run` for AWS, `azure.run` for Azure).

Use `--last` to widen the audit-event lookback window past the default one hour. The flag accepts Go-style durations (for example `--last=30m`, `--last=24h`); widen it when investigating failures older than the default window.
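As a rough illustration of what a Go-style duration looks like, the sketch below parses a simplified subset of the syntax accepted by Go's `time.ParseDuration` (unsigned values only). The helper name `parse_go_duration` is hypothetical and is not part of Teleport; the sketch assumes `--last` follows Go's standard syntax, which has no day unit, so a two-day window would be written as `48h`.

```python
import re
from datetime import timedelta

# Units accepted by Go's time.ParseDuration, expressed in seconds.
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
# One number followed by one unit, for example "30m" or "1.5h".
_TOKEN = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_go_duration(text: str) -> timedelta:
    """Parse a duration such as '30m', '24h', or '1h30m' (hypothetical helper)."""
    pos, seconds = 0, 0.0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            # Covers unknown units such as "1d" and bare numbers such as "30".
            raise ValueError(f"invalid duration: {text!r}")
        seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return timedelta(seconds=seconds)


print(parse_go_duration("24h"))
```

Under this assumption, a value such as `1d` is rejected rather than silently interpreted, which is worth keeping in mind when scripting around `--last`.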

## AWS EC2

### Inspect instance enrollment status

The `tctl discovery nodes` command lists Teleport's recent attempts to enroll AWS EC2 instances and reports whether each succeeded. It reads `ssm.run` audit events emitted by the Discovery Service.

```
$ tctl discovery nodes --cloud=aws
$ tctl discovery nodes --cloud=aws --failures-only
$ tctl discovery nodes --cloud=aws --last=24h
$ tctl discovery nodes --cloud=aws --format=json
```

Sample output:

```
Cloud Account      Region       Instance ID         Time                 Status                 Details
----- ------------ ------------ ------------------- -------------------- ---------------------- -------------------------------------------------
AWS   123456789012 eu-central-1 i-0000000000aaaaaa1 2026-04-02T15:16:03Z Online
AWS   123456789012 eu-central-1 i-0000000000bbbbbb2 2026-04-02T15:00:24Z Failed (exit code=-1)  SSM Agent lost connection
AWS   123456789012 eu-central-1 i-0000000000cccccc3 2026-04-02T15:00:19Z Failed (exit code=104) SSM Script failure
AWS   123456789012 eu-central-1 i-0000000000dddddd4 2026-04-02T15:00:15Z Installed (offline)    Script output: "Offloading the installation pa...
```

Each row's Status column tells you what stage the instance reached:

- `Online`: the instance joined the cluster.
- `Installed (offline)`: the install script succeeded but the agent isn't currently connected. Either the agent installed but never joined the cluster (bad join token, misconfiguration, or a network problem), or it joined and later went offline.
- `Failed (exit code=N)`: the install script ran and exited non-zero. The Details column shows the script output. See [Installation script exit codes](#installation-script-exit-codes) below for known values.

Use `--last` to widen the audit-event lookback window past the default one hour (e.g. `--last=24h`) when investigating older failures or retries.

Use `--format=json` for machine-readable output that includes the full script output, the originating user task ID, and AWS-specific instance metadata.

See [`tctl discovery nodes`](https://goteleport.com/docs/ver/19.x/reference/cli/tctl.md#tctl-discovery-nodes) for the full flag reference.

If installs show as failed or instances fail to appear, check the command history in AWS Systems Manager -> Node Management -> Run Command. Select the instance ID of the target to review its errors.

### Installation script exit codes

When enrolling an instance into Teleport, the installation script returns a specific exit code when facing well-known issues:

- 100: `bash` binary is missing
- 101: `sudo` binary is missing
- 102: `curl` binary is missing
- 103: `/opt` or `/` does not have the minimum free space required to install Teleport (at least 1250MB)
- 104: host is unable to connect to the Teleport Proxy Service's HTTPS endpoint
- 150: Teleport was installed but the agent failed to join the cluster

You can customize the installation script, as described in the **Use a custom installation script** section above, and implement other checks with their own exit codes.

Whether the exit code comes from a built-in pre-flight check or from your own custom installation script, it appears in the `exit_code` field of `ssm.run` Teleport audit events. It also appears in AWS Systems Manager -> Node Management -> Run Command -> Command history.

Example of the `ssm.run` event with an exit code:

```
{
  "code": "TDS00W",
  "event": "ssm.run",
  "instance_id": "i-0000",
  "region": "<region>",
  "invocation_url": "https://<region>.console.aws.amazon.com/systems-manager/run-command/<uuid>/i-0000",
  "platform_name": "Ubuntu",
  "platform_type": "Linux",
  "platform_version": "24.04",
  "status": "curl is not installed in the instance. Please install all required tools (bash, sudo, curl) and try again.",
  "stderr": "failed to run commands: exit status 102",
  "stdout": "curl is missing\n",
  "exit_code": 102
}
```
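When triaging many events, the exit code list above can be turned into a small lookup. The following is a minimal sketch, not part of Teleport: the helper name `summarize_ssm_run` is hypothetical, it relies only on the `event`, `instance_id`, and `exit_code` fields shown in the example event, and it assumes an exit code of `0` means the script succeeded.

```python
import json

# Exit codes documented in the list above; custom scripts may define others.
KNOWN_EXIT_CODES = {
    100: "bash binary is missing",
    101: "sudo binary is missing",
    102: "curl binary is missing",
    103: "insufficient free space in /opt or / (at least 1250MB required)",
    104: "host cannot reach the Teleport Proxy Service HTTPS endpoint",
    150: "Teleport was installed but the agent failed to join the cluster",
}


def summarize_ssm_run(event_json: str) -> str:
    """Return a one-line summary of an ssm.run audit event (hypothetical helper)."""
    event = json.loads(event_json)
    if event.get("event") != "ssm.run":
        raise ValueError("not an ssm.run event")
    code = event.get("exit_code")
    instance = event.get("instance_id", "<unknown>")
    if code == 0:
        # Assumption: zero means the install script completed successfully.
        return f"{instance}: install script succeeded"
    reason = KNOWN_EXIT_CODES.get(code, "unrecognized exit code; check stderr")
    return f"{instance}: exit code {code}: {reason}"


sample = '{"event": "ssm.run", "instance_id": "i-0000", "exit_code": 102}'
print(summarize_ssm_run(sample))  # i-0000: exit code 102: curl binary is missing
```

Codes outside the table are reported as unrecognized rather than guessed at, since a custom installation script can define its own values.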

### Interpreting exit code `150` (`join failure`)

Exit code `150` means Teleport installed successfully, but join health checks did not complete. The `status` field is intentionally high-level; use `stderr` for root-cause details.

Common `stderr` patterns:

- `ERROR: node did not become ready (join cluster) within <duration>` means Teleport kept polling `/readyz` until the configured join-health timeout elapsed without a successful join.
- `systemd service state: ActiveState="...", SubState="...", Result="..."` is a best-effort snapshot captured at timeout for troubleshooting context.
- `join failure: token is expired or not found; ...` can appear as an additional enriched line when journal logs contain an explicit token-expiry signal.
- `failed to run commands: exit status 150` can appear at the end when AWS SSM's run-shell wrapper reports the installer exit code.

Example `ssm.run` event for exit code `150`:

```
{
  "code": "TDS00W",
  "event": "ssm.run",
  "instance_id": "i-0000",
  "region": "<region>",
  "invocation_url": "https://<region>.console.aws.amazon.com/systems-manager/run-command/<uuid>/i-0000",
  "platform_name": "Ubuntu",
  "platform_type": "Linux",
  "platform_version": "24.04",
  "status": "Teleport was installed successfully but the agent did not become ready within the configured timeout. Check standard error output for join diagnostics.",
  "stderr": "...\nERROR: node did not become ready (join cluster) within <duration>\n\nsystemd service state: ActiveState=\"active\", SubState=\"running\", Result=\"success\"\n\njoin failure: token is expired or not found; systemd service state: ActiveState=\"active\", SubState=\"running\", Result=\"success\"\n\tnode did not become ready (join cluster) within <duration>\n\nJournal output:\n...\nfailed to run commands: exit status 150",
  "stdout": "",
  "exit_code": 150
}
```
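The `stderr` patterns listed above lend themselves to simple text matching. The sketch below is one possible way to label them; the function name `classify_stderr` and the label strings are hypothetical, and the regular expressions are derived only from the message formats quoted in this section, so they may need adjusting if the installer's wording changes.

```python
import re

# Patterns taken from the stderr lines described above; labels are illustrative.
PATTERNS = [
    ("join_timeout",
     re.compile(r"node did not become ready \(join cluster\) within")),
    ("systemd_state",
     re.compile(r'systemd service state: ActiveState="[^"]*", '
                r'SubState="[^"]*", Result="[^"]*"')),
    ("token_expired",
     re.compile(r"join failure: token is expired or not found")),
    ("ssm_wrapper_exit",
     re.compile(r"failed to run commands: exit status \d+")),
]


def classify_stderr(stderr: str) -> list[str]:
    """Return labels of all known patterns found in stderr, in PATTERNS order."""
    return [label for label, pattern in PATTERNS if pattern.search(stderr)]


example = (
    "ERROR: node did not become ready (join cluster) within 5m\n"
    'systemd service state: ActiveState="active", SubState="running", '
    'Result="success"\n'
    "failed to run commands: exit status 150"
)
print(classify_stderr(example))
```

Of these labels, `token_expired` is usually the most actionable, because it points at a specific cause rather than a generic timeout.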

### `cannot unmarshal object into Go struct field`

If you encounter an error similar to the following:

```
invalid format in plugin properties map[destinationPath:/tmp/installTeleport.sh sourceInfo:map[url:https://example.teleport.sh:443/webapi/scripts/installer/preprod-installer] sourceType:HTTP];
error json: cannot unmarshal object into Go struct field DownloadContentPlugin.sourceInfo of type string
```

It is likely that you're running an older SSM Agent version. Upgrade to SSM Agent version 3.1 or later to resolve the error.

### `InvalidInstanceId: Instances [[i-123]] not in a valid state for account 456`

The following problems can cause this error:

- The Discovery Service doesn't have permission to access the managed node.
- AWS Systems Manager Agent (SSM Agent) isn't running. Verify that SSM Agent is running.
- SSM Agent isn't registered with the SSM endpoint. Try reinstalling SSM Agent.
- The discovered instance does not have permission to receive SSM commands. Verify that the instance includes the `AmazonSSMManagedInstanceCore` IAM policy.

See SSM RunCommand error codes and troubleshooting information in AWS documentation for more details:

- <https://docs.aws.amazon.com/systems-manager/latest/userguide/troubleshooting-managed-instances.html>
- <https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_SendCommand.html#API_SendCommand_Errors>

## Azure VM

### Inspect instance enrollment status

The `tctl discovery nodes` command lists Teleport's recent attempts to enroll Azure VMs and reports whether each succeeded. It reads `azure.run` audit events emitted by the Discovery Service.

```
$ tctl discovery nodes --cloud=azure
$ tctl discovery nodes --cloud=azure --failures-only
$ tctl discovery nodes --cloud=azure --last=24h
$ tctl discovery nodes --cloud=azure --format=json
```

Sample output:

```
Cloud Account                              Region  Instance      Time                 Status                 Details
----- ---------------------------------- ------- ------------- -------------------- ---------------------- ------------------------------------------------
Azure abcdef01-2345-6789-abcd-ef01234... EastUS  example-vm-0  2026-04-29T15:25:48Z Online
Azure abcdef01-2345-6789-abcd-ef01234... EastUS  example-vm-1  2026-04-29T15:06:08Z Failed (exit code=103) Enrollment failed. Script output: "insufficien...
Azure abcdef01-2345-6789-abcd-ef01234... EastUS  example-vm-2  2026-04-29T15:05:56Z Failed (API error)     VM agent not available. API error: "PUT https...
Azure abcdef01-2345-6789-abcd-ef01234... EastUS  example-vm-3  2026-04-29T15:05:54Z Failed (exit code=104) Enrollment failed. Script output: "proxy is u...
```

Each row's Status column tells you what stage the instance reached:

- `Online`: the VM joined the cluster.
- `Installed (offline)`: the install script succeeded but the agent isn't currently connected. Either the agent installed but never joined the cluster (bad join token, misconfiguration, or a network problem), or it joined and later went offline.
- `Failed (exit code=N)`: the install script ran and exited non-zero. The Details column shows the script output.
- `Failed (API error)`: the install command never executed because the Azure run command API rejected it, typically a missing `runCommands/write` permission or a VM without the agent installed.
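If you need to post-process the table output, the four Status values above can be parsed with a few string checks. This is a minimal sketch with hypothetical names (`Status`, `parse_status`), based solely on the status strings shown in this section. Parsing human-readable output is brittle, so `--format=json` is likely the better input for real tooling; its schema is not shown here, so the sketch does not assume one.

```python
import re
from typing import NamedTuple, Optional


class Status(NamedTuple):
    stage: str                # "online", "installed_offline", or "failed"
    exit_code: Optional[int]  # set only for "Failed (exit code=N)"
    api_error: bool           # True only for "Failed (API error)"


# Exit codes may be negative, as in the AWS sample row with exit code=-1.
_EXIT_CODE = re.compile(r"^Failed \(exit code=(-?\d+)\)$")


def parse_status(text: str) -> Status:
    """Parse one Status cell from the table output (hypothetical helper)."""
    text = text.strip()
    if text == "Online":
        return Status("online", None, False)
    if text == "Installed (offline)":
        return Status("installed_offline", None, False)
    if text == "Failed (API error)":
        return Status("failed", None, True)
    match = _EXIT_CODE.match(text)
    if match:
        return Status("failed", int(match.group(1)), False)
    raise ValueError(f"unrecognized status: {text!r}")


print(parse_status("Failed (exit code=103)"))
```

Separating `api_error` from `exit_code` mirrors the distinction in the list: an API error means the install command never ran, so there is no script exit code to inspect.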

Use `--last` to widen the audit-event lookback window past the default one hour (e.g. `--last=24h`) when investigating older failures or retries.

Use `--format=json` for machine-readable output that includes the full script output, the originating user task ID, and Azure-specific instance metadata.

See [`tctl discovery nodes`](https://goteleport.com/docs/ver/19.x/reference/cli/tctl.md#tctl-discovery-nodes) for the full flag reference.

### Teleport reports no error but VM does not join

Check your Discovery Service configuration and make sure that the VM you want to discover matches it. In debug mode, Teleport logs the subscription IDs and names of the VMs it discovers.

The Azure run command API does not report the output of commands, so Teleport has no way of knowing if a command succeeded or failed. Run command logs can be found on the targeted VM at `/var/log/azure/run-command-handler/handler.log`.
