Cloud computing is awesome until an instance hits an error at boot and the SSH service doesn’t start. In that case, there is no out-of-band console (IPMI/DRAC) to let you fix the issue, as you would have with a physical machine.

Recently, in response to the Heartbleed vulnerability in OpenSSL, we patched OpenSSL. After a reboot, we found we couldn’t SSH into the instance because sshd had failed to start:

ssh: connect to host W.X.Y.Z port 22: Connection refused

The system log in the EC2 management console showed that sshd didn’t start because of a library conflict:

Starting sshd: /usr/sbin/sshd: /usr/local/firefox/libnssutil3.so: version `NSSUTIL_3.15' not found (required by /usr/lib64/libssl3.so)
Cannot load /etc/httpd/modules/mod_ldap.so into server: /usr/local/firefox/libnssutil3.so: version `NSSUTIL_3.15' not found (required by /usr/lib64/libssl3.so)
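
If you prefer the command line to the management console, the same system log can be fetched with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured. The function names and the error patterns in the filter are hypothetical choices for illustration, not part of the original procedure:

```shell
# Keep only the lines of a console log (read from stdin) that look like
# loader or service-start errors. The patterns are an assumption; adjust
# them to match whatever your distribution prints at boot.
filter_boot_errors() {
    grep -E "not found|Cannot load|FAILED" || true
}

# Fetch the console output of one instance and filter it.
# $1 is the instance ID (placeholder, e.g. i-0123456789abcdef0).
show_boot_errors() {
    aws ec2 get-console-output --instance-id "$1" \
        --query Output --output text | filter_boot_errors
}
```

Calling show_boot_errors with the ID of the broken instance should surface lines like the two above without scrolling through the whole boot log.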

Time to debug this by detaching the EBS root volume and attaching it to a second, similar instance:

  1. Create a similar instance, “instance 2”, in the same availability zone
  2. Stop instance 1
  3. Take a snapshot of your root volume on instance 1 (for safety) and detach the root volume
  4. On the new instance, instance 2, attach the root volume as a secondary volume (see: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-attaching-volume.html)
  5. Mount the volume: $ sudo mount /dev/xvdf /mnt/secondaryvol/ (use the device name you specified in the management console, /dev/xvdf here)
  6. chroot into the secondary volume: $ sudo chroot /mnt/secondaryvol/
  7. DEBUG! (in this case, move the conflicting libraries out of the way: mv /usr/local/firefox/libnssutil3.so /usr/local/firefox/libnssutil3.so.old; mv /usr/local/firefox/libnss3.so /usr/local/firefox/libnss3.so.old)
  8. Type ‘exit’ to leave the chroot
  9. Unmount the volume: $ sudo umount /mnt/secondaryvol/
  10. Detach the volume from instance 2
  11. Attach the volume back to instance 1 as /dev/sda1
  12. Start instance 1
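
The console-side steps above (2 to 4 and 10 to 12) can also be scripted with the AWS CLI. This is a rough sketch rather than a tested recovery tool: the function names, the DRY_RUN switch, and all IDs are hypothetical. It also assumes the volume is attached to instance 2 as /dev/sdf, which many Linux AMIs expose inside the instance as /dev/xvdf.

```shell
# Print the command instead of running it when DRY_RUN=1 (hypothetical
# safety switch, handy for reviewing the sequence first).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 2-4: stop instance 1, snapshot and detach its root volume,
# then attach the volume to instance 2 as a secondary device.
# Arguments: <instance-1-id> <instance-2-id> <root-volume-id>
move_root_to_rescue() {
    broken=$1
    rescue=$2
    volume=$3
    run aws ec2 stop-instances --instance-ids "$broken"
    run aws ec2 wait instance-stopped --instance-ids "$broken"
    run aws ec2 create-snapshot --volume-id "$volume" --description "pre-repair backup"
    run aws ec2 detach-volume --volume-id "$volume"
    run aws ec2 wait volume-available --volume-ids "$volume"
    run aws ec2 attach-volume --volume-id "$volume" --instance-id "$rescue" --device /dev/sdf
}

# Steps 10-12: detach the repaired volume from instance 2, attach it
# back to instance 1 as its root device, and start instance 1.
# Arguments: <instance-1-id> <root-volume-id>
return_root_to_original() {
    broken=$1
    volume=$2
    run aws ec2 detach-volume --volume-id "$volume"
    run aws ec2 wait volume-available --volume-ids "$volume"
    run aws ec2 attach-volume --volume-id "$volume" --instance-id "$broken" --device /dev/sda1
    run aws ec2 start-instances --instance-ids "$broken"
}
```

Run move_root_to_rescue first, do the mount, chroot, and repair by hand on instance 2 (steps 5 to 9), then run return_root_to_original. Setting DRY_RUN=1 prints each aws command so you can check the IDs before touching anything.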