punkwalrus (punkwalrus) wrote,

Tech - Answer these questions, please

Recently, I have had the pleasure of interviewing System Administrator applicants who were to replace me when I left my current job. During these interviews, I asked some of the following questions (usually based on the skills they listed on their resumes), and almost all of the applicants got them wrong. Most didn't even come close. Now I am wondering if these are too hard. In all these problems, I know what's wrong, because they are based on real events that happened either to me or to someone I worked with. All of them test the most basic, fundamental troubleshooting skills. The only "trick" here is that some of the data I give you is superfluous and not really part of the problem. I am looking for basic stuff here. If I ask, "Joe says the NFS server has stopped responding," I am not expecting you to explain how to tune kernel device drivers for the network card; I am looking for, "I'd look at xxxx logs, restart the service xxxxx, then check for xxxxx in xxxxxx." I only ask that if you check a config file or log, you specify which one.

Comments are screened so as not to give anyone ideas. Real answers will be given later. If you see the answer later, and think, "Aw, man, how was I supposed to know THAT?" don't feel cheated. I am not asking you to actually guess the exact problem; I am testing what you'd do to generally find the problem in the first place. Remember, this is for FUN, and to see how many of my tech friends get close or actually guess the right answer. If none of you get them, I am going to assume these are too hard.

Those who think they might fail because they think they are not skilled are invited to answer anyway. If you totally don't know, I encourage you to make something really funny up. I need a good laugh, and no matter how "wrong" your answer is, I will not deride you in my head.

Hint: A few of these require you to "think outside the box." Assume all these systems are Linux, but I'll accept Sun-based answers, too.

Punkie is Interviewing you. How do you answer these questions?

  1. You are asked to do a kernel patch on a Red Hat 3.0 AS box. There are no errors, but when you reboot the box, the web server does not come up. This happens to be one of the top web servers on your Intranet, and the help desk is flooded with calls from people who can't reach the ticketing system, which has a web interface on this machine connected to a database back end. List the first few things (three or more, in the order you'd personally do them) you would try in order to diagnose and fix the problem in the next 5 minutes.
  2. You are paged in the middle of dinner, and it's one of the developers from the UK, and since it's long distance, you know it's important. He says the primary database server (in the UK) has been hacked, he's seen evidence of it in the system logs, and you're the only guy with root access. He wants you to shut it down and switch to the backup database server. The only problem is you happen to know the backup server has been acting funny recently, with a lot of timeouts, and you haven't had the time to diagnose the problem. You explain your hesitation, and the UK guy says this is really urgent, this is really bad, and he's going to call your boss (who has been really crabby recently), get Interpol involved, and so on. What do you do so the boss doesn't have to be paged?
  3. You recently pushed out an upgrade package that fixed some database problems by tuning the kernel and the connection of some of the raw character devices the database writes to. Out of the 20 boxes that got the update, 2 of them would not reboot. When you go to the screen, it won't even get past the bootloader. Yet all the other boxes are fine. List the steps you would take to find out what happened, and how you might fix an error you found through the diagnosis.
  4. You have a 4 x P4 2.8 GHz processor box with 4GB of RAM, and a huge SCSI RAID 5 array. This thing is screamingly fast. One of the main purposes of this box is to run a PHP message board with a PostgreSQL backend. Recently, you have noticed that during peak times, the box just freezes. When you reboot the box, the last thing the system logs show is massive memory errors, intermixed with some partition problems. List what you would look at to see what might be causing this, and how you would fix it.
  5. BoxA and BoxB are two replicated NFS servers that house some of your developers' code. Recently, some rsync processes between the boxes have been failing. It seems that while BoxA and BoxB can ping each other, you can only ssh from BoxB to BoxA, but not the other way around. Every time you do an 'ssh rsynchbackupuser@boxb' from BoxA, the connection just hangs for a long time before it times out. What would you check to diagnose the problem, and what files would you look at?
  6. A manager from another department comes to your office and states he can't get to a Windows share, and after calling a dozen people, he finally got someone to tell him it's on a Linux box, you're "the Linux Guy," and you better damn well fix it, because he's sick of not having access to it. Your boss comes in during this heated conversation, and promises the guy you'll see to it right away. List how you would diagnose this issue.