SSM send command to EC2 instance Failed
This can happen when the SSM agent isn't installed on the instance you're trying to access. For a list of instances where you can run SSM commands, run:
aws ssm describe-instance-information --output text
From there, you can grab an instance ID and then run the send_command command with that instance.
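For boto3 users, that second step might look roughly like the sketch below. The helper name run_shell_command is hypothetical, ssm_client is assumed to have been created with boto3.client("ssm"), and the AWS-RunShellScript document assumes a Linux instance.

```python
def run_shell_command(ssm_client, instance_id, commands):
    """Send shell commands to one instance and return the SSM command ID.

    ssm_client is assumed to be boto3.client("ssm"); this helper name is
    hypothetical and not part of boto3.
    """
    response = ssm_client.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",  # assumption: a Linux instance
        Parameters={"commands": list(commands)},
    )
    # send_command returns immediately; the ID identifies the pending command
    return response["Command"]["CommandId"]
```

The returned command ID can then be passed to get_command_invocation, together with the instance ID, to read the command's status and output.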
As documented here in AWS' troubleshooting guide, there is a range of possible causes for this error.
The accepted answer, aws ssm describe-instance-information, lists only instances that are available, in a valid state and have the SSM agent installed, so it covers several of the troubleshooting steps in one line (nice ;) ).
If you're using boto3, the same can be achieved with:
ssm.client.describe_instance_information()
I'm not certain whether it checks permissions, but I presume so. If your instance_id is missing from the list, you can ensure correct permissions by following the step-by-step guide here.
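To check whether a particular instance is in that list, something like the following sketch could be used. The function names are hypothetical, and ssm_client is assumed to be boto3.client("ssm"). The response is paginated, so the sketch follows NextToken until every page has been read.

```python
def list_managed_instance_ids(ssm_client):
    """Return the IDs of all instances SSM currently knows about.

    ssm_client is assumed to be boto3.client("ssm"); the function name is
    hypothetical.
    """
    instance_ids = []
    kwargs = {}
    while True:
        page = ssm_client.describe_instance_information(**kwargs)
        for info in page.get("InstanceInformationList", []):
            instance_ids.append(info["InstanceId"])
        token = page.get("NextToken")
        if not token:
            return instance_ids
        # More results remain: request the next page
        kwargs["NextToken"] = token


def is_ssm_managed(ssm_client, instance_id):
    """True if instance_id appears in the SSM instance list."""
    return instance_id in list_managed_instance_ids(ssm_client)
```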
However, there is another cause (last but definitely not least as it's not obvious):
Freshly created instances take a little while to show up in the describe_instance_information list.
This is the case even after waiting for the instance's post-creation checks to complete. So, for example, doing:
# Assumption: ec2.client, ec2.resource and ssm.client are boto3 clients/resources
# held on wrapper objects defined elsewhere
# Key names are the same as the keyword arguments required by boto
params = {
    'ImageId': image_id_to_use,
    'InstanceType': instance_type_to_launch,
    'MinCount': 1,
    'MaxCount': 1,
    'UserData': user_data_script,
    'SecurityGroups': ['your groups'],
    'KeyName': 'yourkeyname',
}
# Run the instance and wait for it to start
reservation = ec2.client.run_instances(**params)
instance = ec2.resource.Instance(reservation['Instances'][0]['InstanceId'])
instance.wait_until_running()
# Also wait for status checks to complete
waiter = ec2.client.get_waiter('instance_status_ok')
waiter.wait(InstanceIds=[instance.id])
# Apply the IAM roles required (this instance will need access to, e.g., S3)
response = ec2.client.associate_iam_instance_profile(
    IamInstanceProfile={
        'Arn': 'your_arn',
        'Name': 'ApplicableRoleEGAdministratorAccess'
    },
    InstanceId=instance.id
)
print('Instance id just created:', instance.id)
print('Instances in the SSM instances list right now:')
print(ssm.client.describe_instance_information()['InstanceInformationList'])
will highlight this problem, if present (it certainly was for me).
This may be due to the time taken to execute the UserData script (see this SO post for a possibly related discussion on waiting for user data to complete), but I can't tell (without more effort than I'm willing to put in!) whether it's that or just the time AWS needs to update its services database.
To solve this, I wrote a short waiter (with a timeout exception to handle other failure modes) that repeatedly called describe_instance_information() until the instance id showed up in the list.
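A minimal sketch of such a waiter is below. It is not the exact code I used: the names wait_for_ssm_registration and SsmRegistrationTimeout are hypothetical, the timeout and interval defaults are arbitrary, and ssm_client is assumed to be boto3.client("ssm"). It narrows the query with the InstanceIds filter, so only the instance of interest is returned.

```python
import time


class SsmRegistrationTimeout(Exception):
    """Raised when the instance never shows up in the SSM instance list."""


def wait_for_ssm_registration(ssm_client, instance_id, timeout=300, interval=5):
    """Poll describe_instance_information until instance_id is listed.

    ssm_client is assumed to be boto3.client("ssm"). The timeout and
    interval (in seconds) are illustrative defaults, not recommended values.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = ssm_client.describe_instance_information(
            Filters=[{"Key": "InstanceIds", "Values": [instance_id]}]
        )
        for info in response.get("InstanceInformationList", []):
            if info.get("InstanceId") == instance_id:
                return info
        # Give up once the deadline passes, so other failure modes
        # (agent not installed, missing permissions) don't loop forever
        if time.monotonic() >= deadline:
            raise SsmRegistrationTimeout(
                f"{instance_id} not registered with SSM after {timeout}s"
            )
        time.sleep(interval)
```

Calling wait_for_ssm_registration(ssm_client, instance.id) after the instance has been created and before send_command blocks until the instance is listed, or raises after the timeout.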