Today I was fighting an AWS bug that seems just ridiculous for a system where ssh is the only way to access the server. With no console login, EC2 users are completely dependent on flawless sshd operation, and the Amazon team did a really bad job preparing the RHEL 6.4 image.
So here are the symptoms: you create an image from a perfectly working system in order to clone it. Then you create a new instance from the AMI, and it seems to work fine too; you can ssh to it with no problem. After a reboot or two it suddenly stops responding to ssh, though the rest of the system seems to work fine (the HTTP server, for instance). The server logs do not show any errors, reboots do not help, and EC2 doesn't provide console access to the server.
Here is the solution:
Well, first of all, check that you're connecting to the right IP address/DNS name. If the instance was stopped and started again, its public IP and DNS name have most likely changed (a plain reboot keeps them, and an Elastic IP survives both). It's an obvious thing, yet it is frequently overlooked. If the destination address is right but ssh is still not responding, follow this procedure:
- Terminate the node. Sorry, it’s a goner in a no-console world
- ssh to the node from which you created the AMI
- Check out the /etc/rc.d/rc.local file. The last three lines of the file are a total mess created by Amazon engineers as an ugly attempt to fix sshd_config. They look approximately like this:
… and this is horrible. This file not only keeps patching sshd_config on every reboot, so the same directives get appended again and again, it also does it wrong, without providing the necessary newline before the first appended line. As a result, sshd_config gets completely screwed up. Remove these three lines (or comment them out, but removing them is better).
- Now open /etc/ssh/sshd_config and fix what was ruined by rc.local. The last two lines of the config should look like this:
Each of these parameters should be mentioned only once, one per line.
- Once the config is fixed, create a new AMI and a new instance from the corrected AMI. It should work fine.
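The two problems described in the steps above (directives appended on every boot, and a missing newline that glues the first appended directive onto the previous line) are easy to reproduce on plain text. Below is a minimal Python sketch that works on the contents of sshd_config as a string: `naive_append` mimics a blind append, `append_once` is an idempotent alternative that adds the missing newline, and `find_problems` checks that each given parameter appears exactly once, on its own line. All three function names are hypothetical, and the directives used (`UseDNS no`, `PasswordAuthentication no`) are illustrative examples, not the actual lines from the Amazon image. The checker assumes a simple config without `Match` blocks, where a keyword could legitimately appear again.

```python
from collections import Counter

# Hypothetical helpers for illustration only; they operate on the text of
# sshd_config and are not part of OpenSSH, RHEL or AWS tooling.


def naive_append(config_text, directives):
    """Mimic a broken boot-time patch: append blindly, no leading newline.

    If config_text does not end with a newline, the first directive ends up
    glued onto the last existing line. Calling this on every boot also
    appends the same directives again and again.
    """
    return config_text + "\n".join(directives) + "\n"


def append_once(config_text, directives):
    """Append each directive only if its keyword is not set yet."""
    present = {
        line.split()[0].lower()
        for line in config_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    missing = [d for d in directives if d.split()[0].lower() not in present]
    if not missing:
        return config_text  # already patched: repeated runs change nothing
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"  # the newline the broken patch forgot
    return config_text + "\n".join(missing) + "\n"


def find_problems(config_text, keywords):
    """Check that each keyword appears exactly once, at the start of its own line."""
    problems = []
    counts = Counter()
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # blank line or comment
        stripped = line.strip()
        rest = stripped[len(tokens[0]):].lower()  # everything after the first token
        for kw in keywords:
            if tokens[0].lower() == kw.lower():  # sshd keywords are case-insensitive
                counts[kw] += 1
            if kw.lower() in rest:  # keyword buried after another directive
                problems.append(f"line {lineno}: {kw} is glued onto another directive")
    for kw in keywords:
        if counts[kw] != 1:
            problems.append(f"{kw} appears {counts[kw]} times, expected exactly once")
    return problems


if __name__ == "__main__":
    base = "Protocol 2\nX11Forwarding yes"  # note: no trailing newline
    directives = ["UseDNS no", "PasswordAuthentication no"]  # illustrative only
    keywords = ["UseDNS", "PasswordAuthentication"]

    broken = naive_append(naive_append(base, directives), directives)  # two boots
    print(find_problems(broken, keywords))  # reports the glued line and the duplicate

    fixed = append_once(append_once(base, directives), directives)
    print(find_problems(fixed, keywords))  # prints []
```

To check a live server you could run the checker as root (sshd_config is typically readable only by root) on the real file, e.g. `find_problems(open("/etc/ssh/sshd_config").read(), ["UseDNS", "PasswordAuthentication"])`, substituting the parameters you actually expect. The checker takes the parameter list as input on purpose: stock configs can legitimately repeat some keywords (for example several `AcceptEnv` lines), so flagging every duplicate would produce false alarms.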
Good luck with AWS; you will need it.