Quick Tip: How to recover when you can’t SSH into your Amazon EC2 instance

Cloud computing is awesome until an error at boot keeps the SSH service from starting. When that happens, there is no out-of-band console (IPMI/DRAC) to let you fix the issue, as there would be with a physical machine.

Recently, in response to the Heartbleed vulnerability in OpenSSL, we patched OpenSSL. After a reboot, we found we couldn’t SSH into the instance because sshd had failed to start:

ssh: connect to host W.X.Y.Z port 22: Connection refused

The system log in the EC2 management console showed that sshd didn’t start because of a library conflict:

Starting sshd: /usr/sbin/sshd: /usr/local/firefox/libnssutil3.so: version `NSSUTIL_3.15' not found (required by /usr/lib64/libssl3.so)
Cannot load /etc/httpd/modules/mod_ldap.so into server: /usr/local/firefox/libnssutil3.so: version `NSSUTIL_3.15' not found (required by /usr/lib64/libssl3.so)
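If you prefer the command line to the management console, the same system log can be pulled with the AWS CLI and filtered for the relevant lines. The sketch below is only an illustration: the helper name `sshd_boot_errors` and the instance ID are made up, and the filter pattern is simply a guess based on the error text shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a console log on stdin and keep only the lines
# that mention sshd or a missing library version.
sshd_boot_errors() {
  grep -E 'sshd|not found'
}

# Example usage (the instance ID is a placeholder, not a real one):
#   aws ec2 get-console-output --instance-id i-0123456789abcdef0 \
#       --query Output --output text | sshd_boot_errors
```

The console output can lag behind the actual boot by a few minutes, so an empty result right after a reboot does not necessarily mean sshd started cleanly.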

Time to debug this by detaching the EBS root volume and attaching it to another instance of a similar type:

  1. Create a similar instance, “instance 2”, in the same availability zone
  2. Stop instance 1
  3. Take a snapshot of your root volume on instance 1 (for safety) and detach the root volume
  4. On the new instance, instance 2, attach the root volume as a secondary volume (see: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-attaching-volume.html)
  5. Mount the volume: $ sudo mount /dev/xvdf /mnt/secondaryvol/ (/dev/xvdf being the device name specified in the management console; create the mount point first with sudo mkdir -p /mnt/secondaryvol/ if it doesn’t exist)
  6. chroot into your secondary volume: $ sudo chroot /mnt/secondaryvol/
  7. DEBUG! (in this case, move the conflicting libraries out of the way: mv /usr/local/firefox/libnssutil3.so /usr/local/firefox/libnssutil3.so.old; mv /usr/local/firefox/libnss3.so /usr/local/firefox/libnss3.so.old)
  8. Type ‘exit’ to leave the chroot
  9. Unmount the volume: $ sudo umount /mnt/secondaryvol/
  10. Detach the volume from instance 2
  11. Attach the volume back to instance 1 as /dev/sda1
  12. Start instance 1
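The steps above can also be scripted. What follows is a rough sketch with the AWS CLI, not the exact procedure we ran: the instance and volume IDs are placeholders, the function names and the `run` dry-run wrapper are invented for illustration, and it assumes a volume attached as /dev/sdf shows up as /dev/xvdf on instance 2 (it may appear under a different name, or as a partition such as /dev/xvdf1, on your system). The two library files are simply moved aside with a .old suffix. By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder IDs and paths (assumptions); override them via the environment.
BROKEN_INSTANCE="${BROKEN_INSTANCE:-i-aaaaaaaaaaaaaaaaa}"   # instance 1
RESCUE_INSTANCE="${RESCUE_INSTANCE:-i-bbbbbbbbbbbbbbbbb}"   # instance 2
ROOT_VOLUME="${ROOT_VOLUME:-vol-ccccccccccccccccc}"
DEVICE="${DEVICE:-/dev/xvdf}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/secondaryvol}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Steps 2-4: stop instance 1, snapshot the root volume, move it to instance 2.
# Run from a machine with AWS credentials configured.
move_volume_to_rescue() {
  run aws ec2 stop-instances --instance-ids "$BROKEN_INSTANCE"
  run aws ec2 wait instance-stopped --instance-ids "$BROKEN_INSTANCE"
  run aws ec2 create-snapshot --volume-id "$ROOT_VOLUME" --description "pre-repair backup"
  run aws ec2 detach-volume --volume-id "$ROOT_VOLUME"
  run aws ec2 wait volume-available --volume-ids "$ROOT_VOLUME"
  run aws ec2 attach-volume --volume-id "$ROOT_VOLUME" \
      --instance-id "$RESCUE_INSTANCE" --device /dev/sdf
}

# Steps 5-9: run on instance 2 itself.
repair_on_rescue() {
  run sudo mkdir -p "$MOUNT_POINT"
  run sudo mount "$DEVICE" "$MOUNT_POINT"
  # Diagnostic only: show which libraries sshd resolves inside the chroot.
  run sudo chroot "$MOUNT_POINT" ldd /usr/sbin/sshd || true
  run sudo chroot "$MOUNT_POINT" mv /usr/local/firefox/libnssutil3.so /usr/local/firefox/libnssutil3.so.old
  run sudo chroot "$MOUNT_POINT" mv /usr/local/firefox/libnss3.so /usr/local/firefox/libnss3.so.old
  run sudo umount "$MOUNT_POINT"
}

# Steps 10-12: give the volume back to instance 1 as its root device and boot.
return_volume_and_start() {
  run aws ec2 detach-volume --volume-id "$ROOT_VOLUME"
  run aws ec2 wait volume-available --volume-ids "$ROOT_VOLUME"
  run aws ec2 attach-volume --volume-id "$ROOT_VOLUME" \
      --instance-id "$BROKEN_INSTANCE" --device /dev/sda1
  run aws ec2 start-instances --instance-ids "$BROKEN_INSTANCE"
}
```

Calling move_volume_to_rescue, then repair_on_rescue on instance 2, then return_volume_and_start mirrors the list above. Keeping the dry-run default makes it easy to review the exact commands before anything touches a real volume.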