• Category Archives SysAdmin
  • [SysAdmin 101] Keeping Config Files Synced

    Posted by Kevin Sonney

    One of the things I got REALLY tired of, when I was setting up new machines without shared home directories, was copying ALL my personalized configurations over and over and over again. So I wrote a script to copy everything over. And then I realized that, somewhere around system number 10, the configuration files I was copying had been tuned differently from the earlier ones, and to keep my settings consistent, I needed to go back and recopy everything…

    …if you’re a developer or a sysadmin, you know where this is going. I checked everything into a version control system – git in this case – and started checking out the files in the new locations and pulling updates on the others. This makes perfect sense: when something gets updated and committed on one box, I can just pull the changes on the others on demand.

    And this REALLY works well for me.
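    The post doesn't say how the checked-out files end up where programs look for them. One common approach, assumed here and not necessarily the one used in this setup, is to keep the checkout in its own directory (for example a hypothetical ~/dotfiles) and symlink each file into the home directory, so that a git pull in the checkout immediately updates the live configuration. A minimal Python sketch of that idea:

```python
from pathlib import Path
from typing import List


def link_dotfiles(repo: Path, home: Path, names: List[str]) -> List[str]:
    """Symlink each named file from a dotfiles checkout into `home`.

    Hypothetical helper: assumes the checkout stores files under their real
    names (".bashrc", ".vimrc", ...). Anything already at the target path is
    moved aside to "<name>.bak" first. Returns the names that were (re)linked.
    """
    linked = []
    for name in names:
        source = (repo / name).resolve()  # absolute path, so the link works from anywhere
        target = home / name
        if target.is_symlink() and target.resolve() == source:
            continue  # already points at the checkout; nothing to do
        target.parent.mkdir(parents=True, exist_ok=True)  # e.g. for ".config/..." entries
        if target.exists() or target.is_symlink():
            # Keep this machine's old copy instead of silently discarding it.
            target.rename(target.with_name(target.name + ".bak"))
        target.symlink_to(source)
        linked.append(name)
    return linked
```

    On a freshly cloned machine, something like link_dotfiles(Path.home() / "dotfiles", Path.home(), [".bashrc", ".vimrc"]) would link those two files and move any pre-existing copies aside. Symlinks rather than copies are the point: copying is exactly what let the configurations drift apart in the first place.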

    I recently added Dropbox to the mix. How so? I moved the “master” git repository into a Dropbox folder. So on the machines where I have Dropbox (e.g., all my home systems, my work desktop, etc.) I no longer have to pull the changes – Dropbox is doing that for me. And for the systems where I can’t use Dropbox (like the VM I have at Lylix or my hosted server out at Dash Systems) I can still check the files in & out like I always did.

    I realize this is probably old news to some of you (and if you have a shared drive for your home directory, you probably don’t need this at all). Be aware that file conflicts can come up if you’re not careful (for example, if two machines change the repository before Dropbox has finished syncing), but in general it’s been the perfect solution for me.

    So, for those who want to know “How I do that” and have a fair grip on the Linux and/or OS X command line, here is the step-by-step :
    (more…)


  • Listen to your gut

    Over the course of my career, I’ve been through several reorganizational “events.” I’ve also reached the point where my gut can tell me roughly where things stand.

    Basically, at this point in my career, if I can’t see it coming, I’m a moron. Most of us have a hindbrain function that recognizes when something is amiss in the workplace, and in a lot of cases, we’ll dismiss it, or project it elsewhere in our lives (our loved ones in particular will take the brunt of this). The hindbrain is a wonderful thing, and if we’d just learn to pay more attention to it, we’d probably have a better idea why the “fight or flight” reflex seems to be kicking in.

    Let me say it this way : If you think something is amiss in your workplace, it probably is. I realize we’re REALLY fucked up as a species, and will take all kinds of abuse at work that we wouldn’t take anywhere else, so we think a lot of what we feel at work is NORMAL. Guess what?

    It’s not. I’ve been in high-stress jobs, I’ve driven myself to a breakdown because of them, and I haven’t seen it all, but I’ve been enough places – both good and bad, or that have transitioned from one to the other – that I can say confidently : If you think it is time to change jobs, change jobs. If you think the company is about to do something that will impact you negatively, it probably is. And if, by any chance, your management team starts to behave differently…

    …well, I use that as a sign to update my resume. And you should too.


  • [SysAdmin 101] The Scotty vs. LaForge SysAdmins

    I found myself summarizing this concept AGAIN on IRC, and figured it was time I actually wrote it down.

    Over the years, I’ve found that there are primarily two kinds of systems administrators[1] : the Scottys and the La Forges. Anyone who is familiar with Star Trek[2] will get the reference. But for those who aren’t, let me explain, and then I’ll talk about the pros and cons of each.

    Scotty[3] is the Systems Administrator who feels the need to make everything into a miracle. Every job is extremely difficult or just plain impossible, but somehow Scotty gets it done. Scotty will tell you it’ll take four hours and won’t be that impressive, then custom-builds a solution in two hours with bells and whistles. Server hardware fails at 3am? Scotty emails you that it won’t be working again until tomorrow, and then has it back up before 8am.

    Scotty, in essence, underpromises and overdelivers, usually with one-off or custom solutions.

    La Forge[4], on the other hand, is the System Administrator who makes everything routine. Everything, from the simplest task to the most difficult, gets done in a timely manner with little or no grandstanding. If La Forge says it’ll take two hours, it takes two hours, and is delivered to spec. La Forge can think on his feet, is cool under pressure, and keeps things running smoothly, even in an emergency. Server down at 4am? The system is redundant, so no one even knows it went down. Need secure access to a customer site? La Forge builds it so that it can be done the exact same way for the next 50 sites.

    La Forge, in essence, delivers to spec and on time, and plans for contingencies.

    There are pros and cons to each. And each will excel in different environments.

    In a small company, a Scotty will do well. Resources are tight, miracles need to happen regularly, and for the first year or so, one-off solutions aren’t a bad thing. The downside to a Scotty is twofold. First, the miracles they’ve worked over the last year or so can only be supported by Scotty. Second, sooner or later there are five (or more) different solutions to achieve the same results, each one better than the last, and all five are in production, with documentation that is sporadic – either over-complete or missing entirely.

    Scotty will either get phased out or leave a small organization when it grows past a certain point. In larger companies, Scotty is either part of an R&D group, or a minor player in a much larger IT department, where these sorts of one-off or customized solutions have a low impact on the company as a whole.

    La Forge, on the other hand, excels in medium to large companies. With a focus on stability, repeatability, and scalability, a La Forge is what a lot of people think of when they hear the words “IT Manager.” La Forge can do the job, has talented staff to delegate to, and has the knowledge to keep big, complex systems running smoothly. La Forge has a plan for almost every contingency, and if they’re hit with something new, you can be sure the solution will be documented, and the exact same solution used the next time it comes up. La Forge will make sure that documentation is complete and up-to-date, and that the organization can go on without them.

    In medium to large companies, La Forge usually rises to either team lead or department manager. In smaller organizations, they often come in after a Scotty has left, or are hired to manage a Scotty (or group of Scottys). A La Forge won’t fit very well in a startup.

    Scottys can become La Forges and vice versa. It’s not an easy transition, and generally a Scotty will transform into a La Forge as time goes on – as the company grows, as they mature as a sysadmin, or as the needs of the organization (big or small) change. If not, a Scotty will move on to the next job (voluntarily or when they are no longer a “good fit” organizationally).

    A La Forge can also turn into a Scotty if the job demands it. But this transition is usually much more painful, and they will do everything they can to re-assert their La Forge tendencies as the job goes on.

    As for me? I’ve been both. And in the end, I MUCH prefer the La Forge style to the Scotty style. I get more sleep, my systems run better, and my customers are happier. Right now, though, I have to be a Scotty/La Forge hybrid, trying to convert a set of systems built by a Scotty into something scalable, manageable, and repeatable. Something the next sysadmin after me can manage, maintain, and grow.

    (Thanks to Scott and Ian for reviewing this post before publication)

    [1] And I do mean primarily – one of the reviewers of this post identified at least two other types. I’ll talk about those in a later blog post.
    [2] Flame wars aside over which series was better, if you haven’t seen or heard of Star Trek, where have you been for the last 40 years?
    [3] Chief Engineer Montgomery Scott
    [4] Chief Engineer Geordi La Forge


  • SysAdmin Wonkery

    So when I took this job, back in May of last year, I knew there would be some challenges. I was taking over after the company had been without a Linux admin for close to 8 months, and the prior systems philosophy had been that each server was a unique and special hand-crafted snowflake, that the client sites were all unique and unpredictable, and that it had to be that way because, well, no two sites or internal uses are alike.

    I call bullshit on all of that.

    I have spent the last 7 months turning that into a scalable, standardized, and maintainable environment :

    And then there’s all the other bits and pieces that go with day-to-day systems administration. But those? Those are the milestones.

    On my list for 2011 :

    • Finish retiring the monitoring scripts that send emails from the remote servers[4]
    • Implement a proper request tracking system
    • Complete a cleanup of the internal wiki or migrate all the pertinent data to a new, clean one
    • Implement a NAS or SAN, depending on what gets approved
    • Server redundancy for the virtual infrastructure[5]

    You can expect to hear about some of that as I go about it this year. And if you have suggestions or comments, I’m always open to hear how others are solving similar problems.

    But for now? I’m off to kill webmin with extreme prejudice on all the company servers.

    [1] To replace, and I am not making this up, a set of SSH tunnels to a set of SSH reverse tunnels
    [2] Hey, you know what’s fun? 20 servers all with apache httpd & tomcat installed, and no two have the same version, configuration file layout, or installation location.
    [3] When I got here, the servers were being named after Star Wars characters and locations. I am not making this up. The only thing more cliche I could come up with was the Matrix, Dilbert, or Star Trek. And I still have to fight the urge to yell “It’s a TRAP!” every time someone complains about server ackbar being problematic.
    [4] Hey, you know what else is fun? Finding out that there is a script running to email you every minute to tell you that the mail relay is unreachable, and that the server can’t send email. *headdesk*
    [5] I like to sleep at night, so I’ve come to grips with[6] the fact that, as of right now, if I lose any ONE of the ESXi servers I inherited, we’re dead in the water until it and all its VMs can be rebuilt.
    [6] Where “I’ve come to grips with” = “I drink heavily to forget”


  • OpenNMS Remote Polling – How I’m doing it

    [Disclaimer: This particular post is really for the technical readers of this blog. I'm just warning you. It's a long one, and not for the faint of heart, or the people who are here for the witty things Ursula and I do. But I figured this is something I'm doing that almost nobody else is doing, and I should share it for others who might find it useful.]

    I work for a company named RadarFind (yes, the website needs love. No, this is NOT part of my job there). You can look at the particulars of what we do at the homepage. In general, what I do is manage customer devices at remote sites. One of these is the primary appliance, which is a Linux server. This server talks to a bunch of other devices that handle the actual tracking and data collection.

    So I’ve got a TON of remote devices, and at any given time I need to know whether they are up or down. If they go down, I need to know it. If they come back up, I need to know it. The original implementer used a shell script on each Linux box to poll the devices and send an alert if a device was up or down. However, this told me NOTHING about the site other than up or down information. What we need is a monitoring solution that not only tells me the up/down state, but also alerts me when it changes AND gathers useful statistics about the site.
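    The original per-box script isn't shown here. As a minimal sketch of the behavior described (know when a device goes down, know when it comes back up), a poller can remember each device's last state and report only the transitions. The probe function and device names below are hypothetical placeholders, not the actual RadarFind setup:

```python
from typing import Callable, Dict, Iterable, List, Tuple


def poll_once(
    devices: Iterable[str],
    is_up: Callable[[str], bool],
    last_state: Dict[str, bool],
) -> List[Tuple[str, str]]:
    """Probe each device once and return only the up/down *transitions*.

    `is_up` is a hypothetical probe (ping, TCP connect, ...) supplied by the
    caller. `last_state` maps device -> last observed state and is updated
    in place between polls. A device seen for the first time is assumed to
    have been up, so it triggers an alert only if it is already down.
    """
    changes = []
    for device in devices:
        up = is_up(device)
        was_up = last_state.get(device, True)
        if up != was_up:
            changes.append((device, "UP" if up else "DOWN"))
        last_state[device] = up
    return changes
```

    Called every minute from cron or a sleep loop, this alerts once when a device drops and once when it returns, instead of repeating the same message every cycle. It still knows nothing about the site beyond up or down, which is exactly the limitation described above.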

    Enter OpenNMS.
    (more…)


  • Why your SysAdmin SHOULD be goofing off

    There is a myth out there that Systems Administrators should be busy, frantic, and overworked. But in truth, SysAdmins should be “goofing off” at work. And that is a good thing. If the SysAdmin in your organization is browsing Slashdot, their Google Reader feeds, or even playing a game, it doesn’t mean they’re not doing their job. Far from it. It means that they’re doing their job perfectly.

    Now, let me explain and clarify that. Because I know some of you are calling “bullshit” on that last sentence. First up, a couple of questions :

    Do you complain when you see firemen behind the station having a cookout and playing basketball? Do you complain when you see a group of EMTs sitting around a table playing cards in the hospital break room? No, you don’t. Why is that?

    If the firemen or EMTs are able to take “down time” on the clock, it means everything is OK. The equipment has been checked and is working, the vehicles are all clean and fueled, and, most importantly, there are no emergencies right now. And we know that when the alert goes off, they’ll drop everything to do their jobs. And once the engines and equipment are back in place, refilled, and repaired (and cleaned if needed) they go back to what they were doing.

    Systems Administrators should be no different, no matter what organization they are working for. But often, they are treated like the accounting staff, or the janitors. If they aren’t doing something RIGHT THIS SECOND, obviously, they aren’t doing their job. Which is the exact opposite of what’s really going on.

    Systems Administrators are somewhat different from their corporate brethren. They spend an inordinate amount of time staying late, coming in early, and working weekends. Their work happens in the middle of the night, and goes largely unseen. The security patch that reduced the spam in the VP’s inbox? That probably happened around 2am. The updates to the web server that allow it to run faster? Saturday or Sunday, when the customers aren’t working. One of my favorites : restoring the production server after a hard drive failure just before end of business for the day. And the list goes on – just ask your nearest sysadmin.

    A Systems Administrator lives a life of interruptions, emergencies, and rapid changes. They have to keep up with the latest changes in technology, software, and computer security. In addition, they’re expected to anticipate the computing needs of the organization before the organization itself knows it needs an upgrade, or expansion, or new technology.

    If your Systems Administrator is “goofing off” at work, it should mean that, like the emergency personnel above, everything is running smoothly, all the patches are applied, and he or she is caught up on the rest of their duties. All is well on the network. But rest assured when the pager goes off, or the monitors indicate a server is about to fail, they’ll jump into action.

    Because that’s what they do. And when all is settled and stable, and quiet, you can be sure they’ll be getting ready for the next emergency. And maybe goofing off. Just a little.

    [Addendum] It’s also worth noting that if your Systems Administrator is ALWAYS dealing with emergencies, or never seems to have down time, something is wrong on a much more fundamental level. This is usually caused by one of the following :

    1. The systems are more complex than the current staff can manage. It’s time to expand the systems staff.
    2. The systems are insufficient to handle the tasks at hand. It’s time to look at the design of the systems.
    3. The systems administrator isn’t doing their job.

    In cases 1 and 2, the sysadmin staff will usually be telling their management that this is the situation. Often, an organization will try the remedies for 1 and 2 (more staff, a redesign) before realizing the problem isn’t the environment, but the admin (case 3). The admin is either slacking off in general or doesn’t know what they’re doing. If you happen to be that admin, let me give you some advice – up your game. Pick up some new skills, find a mentor, swallow your pride and get some additional help…if you value your job and your reputation, do what you have to do to make that system shine.

    Because if you don’t, there’s someone waiting in the wings to solve all the problems in your environment, often at the expense of your budget – and in some cases, your job. Not a good place to be, in any economy.




©2012 Kevin Sonney