After leaving my computer on for the night, I awoke to a frozen screen. My laptop is powered by Ubuntu 10.10 and has some problems with hibernate at times. So I did a hard power-down and rebooted. Only, when it came back up, I got a blank screen. Oh crap, did my video card die? Disk, are you corrupt?
So it may be that something screwed something up last night via my Hadoop installation, but I'm not sure what. I was also suspicious it could be a disk hardware issue. The following is a log of what I had to do to bring it back up.
1) First, when you install Ubuntu, you get a recovery mode that allows you to bypass all the GNOME stuff. I used this to get to the recovery menu.
I picked "root login with network only" to get a bash shell.
Next I typed in
$ ifconfig eth0 up
$ startx
It logged into the Ubuntu desktop under my root account. I ran
$ touch somefile.txt
Also df, mount and dmesg, to check if there were hardware errors that caused the drive to mount read-only. I found none. At this point I could have run fsck, but it didn't look like a simple corruption, so I decided to skip consistency checking and keep it in my back pocket.
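The read-only check can also be done without eyeballing the mount output. Here is a small sketch that lists any filesystem mounted with the "ro" option. The function name list_readonly_mounts is made up for illustration, and it assumes the usual /proc/mounts layout (device, mount point, type, options).

```shell
#!/bin/sh
# list_readonly_mounts [FILE]
# Print the mount point of every filesystem whose option list contains "ro".
# FILE defaults to /proc/mounts; field 2 is the mount point, field 4 the options.
list_readonly_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}
```

If list_readonly_mounts prints "/" after a crash, the kernel has probably remounted the root filesystem read-only because of errors, and fsck would be the next step. An option like errors=remount-ro is deliberately not counted, since it only describes what would happen on an error.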
2) Next I tried to log in from the bash shell, to see whether the problem was gdm (the login window) or the login itself
$ login jag
No directory, logging in with HOME=/
Cannot execute /bin/dash: Permission denied
Ouch. So it can't seem to find my home directory, I think, which is set in /etc/passwd.
I opened that file and tried three different accounts with three different login shells: /bin/sh, /bin/dash and /bin/bash. All accounts failed.
Now for the clues I have: root login is happy and no user account can log in. Likely permissions. So I checked the /home/user folders and performed a chown -R jag:jag /home/jag
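Rather than opening /etc/passwd in an editor, the home directory and login shell for an account can be pulled out directly. This is only a sketch; passwd_field and account_summary are hypothetical helper names, and the field numbers (6 for home, 7 for shell) follow the standard passwd layout.

```shell
#!/bin/sh
# passwd_field USER FIELD [FILE]
# Print colon-separated field number FIELD of USER's entry (FILE defaults to /etc/passwd).
passwd_field() {
    awk -F: -v user="$1" -v field="$2" '$1 == user { print $field; exit }' "${3:-/etc/passwd}"
}

# account_summary USER [FILE]
# Print the home directory (field 6) and login shell (field 7) on one line.
account_summary() {
    echo "home=$(passwd_field "$1" 6 "${2:-/etc/passwd}") shell=$(passwd_field "$1" 7 "${2:-/etc/passwd}")"
}
```

For example, account_summary jag shows what login will try to cd into and exec. Keep in mind that checking those paths from a root shell can be misleading, because root is not subject to the permission checks a normal user hits.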
3) Desperate, and with 2 hours gone mucking around, I searched and found this:
http://linuxgazette.net/issue52/okopnik.html
Though 10 years old, the flow chart of the Linux login is still relevant. What it told me is that the LOGIN DID work, and the only thing really wrong is permission to the shell /bin/sh.
So I checked the permissions on /bin/sh and its libraries
$ ls -alF /bin/sh
lrwxrwxrwx 1 root root 4 2011-11-03 18:36 /bin/sh -> dash*
$ ls -alF /bin/dash
-rwxr-xr-x 1 root root 105704 2010-06-24 13:02 /bin/dash*
$ ldd /bin/dash
linux-vdso.so.1 => (0x00007fff2c793000)
libc.so.6 => /lib/libc.so.6 (0x00007fbe75206000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbe755ae000)
Everything was 755 or better. Nothing wrong here, it seems. I went through the permissions of all the lib folders: /lib, /usr/lib, /usr/local/lib. All seemed correct and undisturbed.
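The same check of a binary plus every library it links against can be rolled into one loop. A rough sketch, with lib_paths and show_binary_perms as made-up names; it assumes ldd output in the format shown above, where each resolved library appears as an absolute path.

```shell
#!/bin/sh
# lib_paths
# Read ldd output on stdin and print every field that is an absolute path.
lib_paths() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) print $i }'
}

# show_binary_perms BINARY
# List the permissions of BINARY and of each library it links against.
# ls -L follows symlinks, so /bin/sh reports the permissions of dash itself.
show_binary_perms() {
    ls -lL "$1"
    ldd "$1" | lib_paths | while read -r lib; do
        ls -lL "$lib"
    done
}
```

Running show_binary_perms /bin/sh gives the same information as the three commands above in one go. Note that it only looks at the files themselves, not at the directories leading to them.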
4) Now I started wondering: no permission, login working... what about the logs in /var/log?
syslog.log
Feb 11 14:38:38 jsrawan-xps16 x-session-manager[2347]: WARNING: Could not connect to ConsoleKit: Could not get owner of name 'org.freedesktop.ConsoleKit': no such name
Feb 11 14:38:38 jsrawan-xps16 x-session-manager[2347]: WARNING: Could not connect to ConsoleKit: Could not get owner of name 'org.freedesktop.ConsoleKit': no such name
Feb 11 14:38:42 jsrawan-xps16 NetworkManager[1173]: <warn> error requesting auth for org.freedesktop.NetworkManager.use-user-connections: (5) Remote Exception invoking org.freedesktop.PolicyKit1.Authority.CheckAuthorization() on /org/freedesktop/PolicyKit1/Authority at name org.freedesktop.PolicyKit1: org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.PolicyKit1 was not provided by any .service files
Feb 11 14:38:42 jsrawan-xps16 NetworkManager[1173]: <warn> error requesting auth for org.freedesktop.NetworkManager.network-control: (5) Remote Exception invoking org.freedesktop.PolicyKit1.Authority.CheckAuthorization() on /org/freedesktop/PolicyKit1/Authority at name org.freedesktop.PolicyKit1: org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.PolicyKit1 was not provided by any .service files
auth.log
Feb 11 16:08:39 jsrawan-xps16 login[3660]: pam_unix(login:session): session opened for user testuser by jsrawan(uid=0)
Feb 11 16:08:39 jsrawan-xps16 login[3660]: pam_unix(login:session): session closed for user testuser
Okay, the auth log doesn't give any better errors. The syslog doesn't look good, but I have no clue if that is the problem. One thing I realize is that the syslog entries occur on bootup, not PER login. So I rule that out for now, assuming it's a red herring.
5) 3 hours and many missteps and reboots later, I remain confused, but convinced that the only way an executable can't run is corruption or permissions. The former is not possible, because under my root account it's the same problem.
So I did an strace on "login jag". I found that the failure is indeed just /bin/bash
$ strace -s 10000 -vfo login.jag login jag
4135 execve("/bin/bash", ["-bash"], ["TERM=xterm", "LANG=en_US.UTF-8", "HOME=/", "SHELL=/bin/bash", "USER=jsrawan", "LOGNAME=jag", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games", "APPLICATION_ENV=development", "MAIL=/var/mail/jag", "HUSHLOGIN=FALSE"]) = -1 EACCES (Permission denied)
Aha, I realize that /bin/bash is trying to access /home, which is probably fine, and then it will cd to your ~. But /home is usually root-owned, so what did it provide for 'other' permissions? What should the permissions be?
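Okay, to cut a long story short, I ran
$ ls -alF /home /
And found out that the actual DIRECTORY was 600 (rw- --- ---). Without the execute (search) bit on a directory, a normal user cannot traverse it, so the bash login could not launch in that directory, which is a problem. Root is exempt from these checks, which is why the root login stayed happy.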
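In hindsight, the quickest way to catch this is to check every directory on the way to the shell, not just the shell itself. Below is a minimal sketch of that idea. The function name blocked_dirs is made up, it assumes GNU stat (where -c '%a' prints the octal mode), and it only looks at the "other" permission digit, so it ignores ownership, groups and ACLs.

```shell
#!/bin/sh
# blocked_dirs PATH
# Walk from PATH's parent directory up to / and print "MODE DIR" for every
# directory whose "other" permission digit lacks the execute (search) bit.
# A directory that cannot be examined at all is reported with mode "???".
blocked_dirs() {
    case $1 in
        /*) ;;
        *) echo "blocked_dirs: need an absolute path" >&2; return 2 ;;
    esac
    p=$(dirname "$1")
    while :; do
        mode=$(stat -c '%a' "$p" 2>/dev/null) || mode='???'
        other=${mode#"${mode%?}"}      # last character = digit for "other"
        case $other in
            1|3|5|7) ;;                # search bit present, nothing to report
            *) printf '%s %s\n' "$mode" "$p" ;;
        esac
        if [ "$p" = "/" ]; then break; fi
        p=$(dirname "$p")
    done
}
```

Run from the root shell, blocked_dirs /bin/bash should have pointed straight at "600 /" in my case. The namei tool from util-linux, called as namei -l /bin/bash, prints a similar per-component listing if it is installed.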
How bad was it? I read that you can get a manifest file of what the permissions should be. Luckily for me, I can tell that the permissions of files throughout the system seem good (i.e. I didn't run a chown -R or chmod -R). But the root "/" is 600. I logged into another Ubuntu server in the cloud and, lo and behold, ALL ROOT FOLDERS are 755, with the exception of /root and /lost+found, which are 700, and /tmp, which is 1777 (777 plus the sticky bit).
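I did not write down the exact command I used for the repair, so take the following as a sketch of the idea rather than a transcript: set the one wrong directory back to the mode seen on the healthy server, without touching anything below it. The name restore_mode is made up, and it again assumes GNU stat.

```shell
#!/bin/sh
# restore_mode MODE DIR
# Report DIR's current octal mode and change it only if it differs from MODE.
# Deliberately not recursive: only DIR itself is touched.
restore_mode() {
    current=$(stat -c '%a' "$2") || return 1
    if [ "$current" = "$1" ]; then
        echo "$2 already $1"
    else
        chmod "$1" "$2" && echo "$2: $current -> $1"
    fi
}
```

For the situation described here that would be restore_mode 755 / from the root shell. Printing the old mode first leaves a record of what was wrong, which is handy when comparing against a reference machine.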
And there you go. So the moral of the story is: if Linux breaks, 9 times out of 10 it's YOU that broke it. Review your ~/.bash_history to see if you ran something destructive with chmod or chown. And this is not Windows; you do not need to reinstall the OS every time a registry setting goes bad.
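Reviewing the history is a one-liner. A small sketch, with risky_history as a made-up name; it assumes a grep that supports -w (GNU, BSD and busybox grep all do) and simply lists every history line that mentions chmod or chown as a whole word, with its line number.

```shell
#!/bin/sh
# risky_history [FILE]
# Print numbered lines from FILE (default: ~/.bash_history) that contain
# chmod or chown as a whole word.
risky_history() {
    grep -nwE 'chmod|chown' "${1:-$HOME/.bash_history}"
}
```

Commands typed in a root shell end up in root's own history, so risky_history /root/.bash_history is worth a look too. Bash normally writes the history file when the shell exits, so a session that was killed by a hard power-down may be missing.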
Some other things I did that didn't help, but could in other cases:
- Ensured that the passwd file is valid with $ pwconv
- Ran $ passwd -u <name> to ensure the account is not locked out
- Created a new user with useradd and passwd to have a fresh example
- Removed ~/.Xsession in case we had some cached login
- Never change permissions on system files without being 100% sure.
- Ran apt-get upgrade --show-upgraded in case we had some broken packages
- Ran dpkg-reconfigure gdmsetup ubuntu-desktop to check if packages had been deselected
- Ran an ssh login to see if it was isolated to GNOME or was a general login problem.