Troubleshooting Server Auto-Discovery
Common issues and diagnostic steps for server auto-discovery.
The tctl discovery nodes command lists Teleport's recent attempts to enroll
cloud instances and reports whether each succeeded. It reads enrollment audit
events emitted by the Discovery Service (ssm.run for AWS, azure.run for
Azure).
Use --last to widen the audit-event lookback window past the default one
hour. The flag accepts Go-style durations (for example --last=30m,
--last=24h); widen it when investigating failures older than the default
window.
AWS EC2
Inspect instance enrollment status
The tctl discovery nodes command lists Teleport's recent attempts to enroll
AWS EC2 instances and reports whether each succeeded. It reads ssm.run audit
events emitted by the Discovery Service.
tctl discovery nodes --cloud=aws
tctl discovery nodes --cloud=aws --failures-only
tctl discovery nodes --cloud=aws --last=24h
tctl discovery nodes --cloud=aws --format=json
Sample output:
Cloud Account      Region       Instance ID         Time                 Status                 Details
----- ------------ ------------ ------------------- -------------------- ---------------------- -------------------------------------------------
AWS   123456789012 eu-central-1 i-0000000000aaaaaa1 2026-04-02T15:16:03Z Online
AWS   123456789012 eu-central-1 i-0000000000bbbbbb2 2026-04-02T15:00:24Z Failed (exit code=-1)  SSM Agent lost connection
AWS   123456789012 eu-central-1 i-0000000000cccccc3 2026-04-02T15:00:19Z Failed (exit code=104) SSM Script failure
AWS   123456789012 eu-central-1 i-0000000000dddddd4 2026-04-02T15:00:15Z Installed (offline)    Script output: "Offloading the installation pa...
Each row's Status column tells you what stage the instance reached:
- Online: the instance joined the cluster.
- Installed (offline): the install script succeeded but the agent isn't currently connected. Either the agent installed but never joined the cluster (bad join token, misconfiguration, or network), or it joined and later went offline.
- Failed (exit code=N): the install script ran and exited non-zero. The Details column shows the script output. See Installation script exit codes below for known values.
Use --last to widen the audit-event lookback window past the default one
hour (e.g. --last=24h) when investigating older failures or retries.
Use --format=json for machine-readable output that includes the full script
output, the originating user task ID, and AWS-specific instance metadata.
See tctl discovery nodes for the full flag reference.
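When these checks are scripted, the command line can be assembled from the flags shown above. The following is a minimal Python sketch, not part of Teleport: the helper names `discovery_nodes_args` and `run_discovery_nodes` are hypothetical, the sketch assumes the flags can be combined in a single invocation, and it assumes `tctl` is on the PATH and already authenticated. It returns the decoded JSON as-is rather than assuming any particular field layout.

```python
import json
import subprocess


def discovery_nodes_args(cloud, failures_only=False, last=None, fmt=None):
    """Build the argv for `tctl discovery nodes` using only flags from this guide."""
    args = ["tctl", "discovery", "nodes", f"--cloud={cloud}"]
    if failures_only:
        args.append("--failures-only")
    if last:
        args.append(f"--last={last}")
    if fmt:
        args.append(f"--format={fmt}")
    return args


def run_discovery_nodes(cloud, failures_only=False, last=None):
    """Run tctl with --format=json and decode stdout without assuming a schema."""
    args = discovery_nodes_args(cloud, failures_only, last, fmt="json")
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Only prints the command line; call run_discovery_nodes() where tctl is available.
    print(" ".join(discovery_nodes_args("aws", failures_only=True, last="24h", fmt="json")))
```

Keeping the argument builder separate from the subprocess call makes the command easy to log or review before it runs.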
If installs are reported as failed, or instances fail to appear, check the command history in AWS Systems Manager -> Node Management -> Run Command. Select the instance ID of the target to review its errors.
Installation script exit codes
When enrolling an instance into Teleport, the installation script returns a specific exit code when facing well-known issues:
- 100: bash binary is missing
- 101: sudo binary is missing
- 102: curl binary is missing
- 103: /opt or / do not have the required minimum space to install Teleport; at least 1250MB is required
- 104: host is unable to connect to the Teleport Proxy Service's HTTPS endpoint
- 150: Teleport was installed but the agent failed to join the cluster
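For scripted triage, the codes listed above can be kept in a small lookup table. This is an illustrative Python sketch only: the names `KNOWN_EXIT_CODES` and `describe_exit_code` are hypothetical, and any code not in the list is reported as unknown, since it may come from a custom installation script.

```python
# Exit codes documented in the list above. Anything else is treated as
# unknown (for example, a code defined by a custom installation script).
KNOWN_EXIT_CODES = {
    100: "bash binary is missing",
    101: "sudo binary is missing",
    102: "curl binary is missing",
    103: "/opt or / lacks the minimum free space (at least 1250MB)",
    104: "host cannot reach the Teleport Proxy Service HTTPS endpoint",
    150: "Teleport was installed but the agent failed to join the cluster",
}


def describe_exit_code(code):
    """Return a short explanation for an installation script exit code."""
    return KNOWN_EXIT_CODES.get(code, f"unknown exit code {code}")


if __name__ == "__main__":
    for code in (102, 150, 42):
        print(code, "->", describe_exit_code(code))
```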
You can customize the installation script, as described in the Use a custom installation script section above, and implement other checks with their own exit codes.
Whether the exit code comes from a built-in pre-flight check or from your own custom installation script, it appears in the exit_code field of ssm.run Teleport audit events.
It also appears in AWS Systems Manager -> Node Management -> Run Command -> Command history.
Example of the ssm.run event with an exit code:
{
"code": "TDS00W",
"event": "ssm.run",
"instance_id": "i-0000",
"region": "<region>",
"invocation_url": "https://<region>.console.aws.amazon.com/systems-manager/run-command/<uuid>/i-0000",
"platform_name": "Ubuntu",
"platform_type": "Linux",
"platform_version": "24.04",
"status": "curl is not installed in the instance. Please install all required tools (bash, sudo, curl) and try again.",
"stderr": "failed to run commands: exit status 102",
"stdout": "curl is missing\n",
"exit_code": 102
}
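When many instances fail at once, grouping events by exit code shows whether they share a cause. The sketch below is an assumption-laden illustration: it relies only on the event, instance_id, and exit_code fields shown in the example above, the function name `group_by_exit_code` is hypothetical, and the one-JSON-object-per-line input is an assumed export format rather than something Teleport is documented here to produce.

```python
import json
from collections import defaultdict


def group_by_exit_code(events):
    """Map each exit code to the instance IDs that reported it."""
    grouped = defaultdict(list)
    for event in events:
        if event.get("event") != "ssm.run":
            continue  # ignore unrelated audit events
        grouped[event.get("exit_code")].append(event.get("instance_id"))
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical export: one JSON object per line; instance IDs are placeholders.
    lines = [
        '{"event": "ssm.run", "instance_id": "i-0000", "exit_code": 102}',
        '{"event": "ssm.run", "instance_id": "i-0001", "exit_code": 150}',
        '{"event": "ssm.run", "instance_id": "i-0002", "exit_code": 102}',
        '{"event": "session.start"}',
    ]
    print(group_by_exit_code(json.loads(line) for line in lines))
```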
Interpreting exit code 150 (join failure)
Exit code 150 means Teleport installed successfully, but join health checks did not complete.
The status field is intentionally high-level; use stderr for root-cause details.
Common stderr patterns:
- ERROR: node did not become ready (join cluster) within <duration> means Teleport kept polling /readyz until the configured join-health timeout elapsed without a successful join.
- systemd service state: ActiveState="...", SubState="...", Result="..." is a best-effort snapshot captured at timeout for troubleshooting context.
- join failure: token is expired or not found; ... can appear as an additional enriched line when journal logs contain an explicit token-expiry signal.
- failed to run commands: exit status 150 can appear at the end when AWS SSM's run-shell wrapper reports the installer exit code.
Example ssm.run event for exit code 150:
{
"code": "TDS00W",
"event": "ssm.run",
"instance_id": "i-0000",
"region": "<region>",
"invocation_url": "https://<region>.console.aws.amazon.com/systems-manager/run-command/<uuid>/i-0000",
"platform_name": "Ubuntu",
"platform_type": "Linux",
"platform_version": "24.04",
"status": "Teleport was installed successfully but the agent did not become ready within the configured timeout. Check standard error output for join diagnostics.",
"stderr": "...\nERROR: node did not become ready (join cluster) within <duration>\n\nsystemd service state: ActiveState=\"active\", SubState=\"running\", Result=\"success\"\n\njoin failure: token is expired or not found; systemd service state: ActiveState=\"active\", SubState=\"running\", Result=\"success\"\n\tnode did not become ready (join cluster) within <duration>\n\nJournal output:\n...\nfailed to run commands: exit status 150",
"stdout": "",
"exit_code": 150
}
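The stderr patterns described above can be pulled out of an event with a few regular expressions. This is a rough Python sketch, not Teleport code: the function name `diagnose_join_failure` and the result keys are hypothetical, the patterns are copied from the messages in this section, and the 5m0s duration in the sample is a placeholder for <duration>.

```python
import re

# Patterns taken from the stderr lines described in this section.
NOT_READY = re.compile(r"node did not become ready \(join cluster\) within (\S+)")
SYSTEMD = re.compile(r'ActiveState="([^"]*)", SubState="([^"]*)", Result="([^"]*)"')
TOKEN = "join failure: token is expired or not found"


def diagnose_join_failure(stderr):
    """Extract join diagnostics from the stderr of an exit code 150 event."""
    findings = {}
    match = NOT_READY.search(stderr)
    if match:
        findings["timeout"] = match.group(1)
    match = SYSTEMD.search(stderr)
    if match:
        keys = ("ActiveState", "SubState", "Result")
        findings["systemd"] = dict(zip(keys, match.groups()))
    findings["token_expired_or_missing"] = TOKEN in stderr
    return findings


if __name__ == "__main__":
    # Placeholder stderr modeled on the example event; 5m0s stands in for <duration>.
    sample = (
        "ERROR: node did not become ready (join cluster) within 5m0s\n\n"
        'systemd service state: ActiveState="active", SubState="running", Result="success"\n\n'
        "join failure: token is expired or not found; ...\n"
        "failed to run commands: exit status 150"
    )
    print(diagnose_join_failure(sample))
```

A service that is active and running while the token line is present suggests the agent started but could not authenticate, so checking the join token is a reasonable first step.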
cannot unmarshal object into Go struct field
If you encounter an error similar to the following:
invalid format in plugin properties map[destinationPath:/tmp/installTeleport.sh sourceInfo:map[url:https://example.teleport.sh:443/webapi/scripts/installer/preprod-installer] sourceType:HTTP];
error json: cannot unmarshal object into Go struct field DownloadContentPlugin.sourceInfo of type string
You are likely running an older SSM Agent version. Upgrade to SSM Agent version 3.1 or greater to resolve the error.
InvalidInstanceId: Instances [[i-123]] not in a valid state for account 456
The following problems can cause this error:
- The Discovery Service doesn't have permission to access the managed node.
- AWS Systems Manager Agent (SSM Agent) isn't running. Verify that SSM Agent is running.
- SSM Agent isn't registered with the SSM endpoint. Try reinstalling SSM Agent.
- The discovered instance does not have permission to receive SSM commands. Verify that the instance includes the AmazonSSMManagedInstanceCore IAM policy.
See SSM RunCommand error codes and troubleshooting information in AWS documentation for more details:
- https://docs.aws.amazon.com/systems-manager/latest/userguide/troubleshooting-managed-instances.html
- https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_SendCommand.html#API_SendCommand_Errors
Azure VM
Inspect instance enrollment status
The tctl discovery nodes command lists Teleport's recent attempts to enroll
Azure VMs and reports whether each succeeded. It reads azure.run audit
events emitted by the Discovery Service.
tctl discovery nodes --cloud=azure
tctl discovery nodes --cloud=azure --failures-only
tctl discovery nodes --cloud=azure --last=24h
tctl discovery nodes --cloud=azure --format=json
Sample output:
Cloud Account                            Region  Instance      Time                 Status                 Details
----- ---------------------------------- ------- ------------- -------------------- ---------------------- ------------------------------------------------
Azure abcdef01-2345-6789-abcd-ef01234... EastUS  example-vm-0  2026-04-29T15:25:48Z Online
Azure abcdef01-2345-6789-abcd-ef01234... EastUS  example-vm-1  2026-04-29T15:06:08Z Failed (exit code=103) Enrollment failed. Script output: "insufficien...
Azure abcdef01-2345-6789-abcd-ef01234... EastUS  example-vm-2  2026-04-29T15:05:56Z Failed (API error)     VM agent not available. API error: "PUT https...
Azure abcdef01-2345-6789-abcd-ef01234... EastUS  example-vm-3  2026-04-29T15:05:54Z Failed (exit code=104) Enrollment failed. Script output: "proxy is u...
Each row's Status column tells you what stage the instance reached:
- Online: the VM joined the cluster.
- Installed (offline): the install script succeeded but the agent isn't currently connected. Either the agent installed but never joined the cluster (bad join token, misconfiguration, or network), or it joined and later went offline.
- Failed (exit code=N): the install script ran and exited non-zero. The Details column shows the script output.
- Failed (API error): the install command never executed because the Azure run command API rejected it, typically a missing runCommands/write permission or a VM without the agent installed.
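For quick triage of copied table output, the Status values above can be mapped to the stage each one represents. The following Python sketch is illustrative only: the function name `classify_status` and the stage labels are hypothetical, and it matches only the four status forms listed above. For automation, the --format=json output is likely a better source than the human-readable table.

```python
import re

# Matches "Failed (exit code=N)", including negative codes such as -1.
EXIT_CODE = re.compile(r"^Failed \(exit code=(-?\d+)\)$")


def classify_status(status):
    """Map a Status cell to (stage, exit_code); exit_code is None unless reported."""
    status = status.strip()
    if status == "Online":
        return ("joined", None)
    if status == "Installed (offline)":
        return ("installed_not_connected", None)
    if status == "Failed (API error)":
        return ("command_not_executed", None)
    match = EXIT_CODE.match(status)
    if match:
        return ("script_failed", int(match.group(1)))
    return ("unknown", None)


if __name__ == "__main__":
    for cell in ("Online", "Failed (exit code=103)", "Failed (API error)"):
        print(cell, "->", classify_status(cell))
```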
Use --last to widen the audit-event lookback window past the default one
hour (e.g. --last=24h) when investigating older failures or retries.
Use --format=json for machine-readable output that includes the full script
output, the originating user task ID, and Azure-specific instance metadata.
See tctl discovery nodes for the full flag reference.
Teleport reports no error but VM does not join
Check your Discovery Service configuration and make sure that the VM you want to discover matches it. In debug mode, Teleport logs the subscription IDs and names of the VMs it discovers.
The Azure run command API does not report the output of commands,
so Teleport has no way of knowing if a command succeeded or failed. Run command
logs can be found on the targeted VM at
/var/log/azure/run-command-handler/handler.log.
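To inspect that log on the VM, it is usually enough to look at the most recent lines. The sketch below is a generic Python helper under stated assumptions: the function name `tail` is hypothetical, nothing is assumed about the log's line format, and reading the file may require elevated privileges on the VM.

```python
from collections import deque

# Path taken from the paragraph above.
HANDLER_LOG = "/var/log/azure/run-command-handler/handler.log"


def tail(path, count=50):
    """Return the last `count` lines of a text file."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent lines while streaming the file.
        return list(deque(handle, maxlen=count))


if __name__ == "__main__":
    try:
        print("".join(tail(HANDLER_LOG)), end="")
    except OSError as err:
        print(f"cannot read {HANDLER_LOG}: {err}")
```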