A Long Day: Status Update on Yesterday’s Server Issues

Yesterday was admittedly a long day for me. Jim, Martha, and I were all traveling down to Atlanta to prepare for the Domain Incubator conference that Emory University is putting on Friday and Saturday. It’s an exciting opportunity to brainstorm how schools can use services like this (or DIY) to build a Domain of One’s Own-type program, and I particularly like that Atlanta is thinking of it as a regional hub rather than simply attempting to shoehorn their institution into buying into it wholesale. But yesterday I found myself scrambling to fix server issues I had never seen before, and since they persisted for much of the day I wanted to write a short recap of what happened.

Wednesday night I received a report from someone that their site was running really slowly and was sometimes unavailable. I checked on it and could load her website, but I did notice that occasionally it wouldn’t connect. I would refresh and it would come right back. I checked the server she was on to see if there were any load issues or high-traffic sites causing problems. I didn’t see anything abnormal, but I made a few tweaks to some settings, asked her to let me know if it continued, and went to bed. In the morning I woke up to messages not just from that one customer but from a few others reporting spotty availability. What immediately struck me was that the affected customers were spread across multiple servers. Either there was a coordinated attack on all our servers at the same time, or there was a larger issue, perhaps with our data center. As I mentioned, we were getting ready to head out of town, so I had to get my daughter to daycare and get packed up, but as soon as I got into the office I put in a support request to our server provider explaining the issues we were having and asking if there were any network issues we should know about. They replied that they were unaware of any issues at that time.

So I spent the better part of the morning trying every trick in the book I could think of to resolve the issue: rebuild Apache, check MySQL, temporarily suspend a few high-traffic accounts, reboot, reboot again, check the firewall, disable the firewall. Nothing worked, and the issue persisted on both servers we own. I asked the company if we could pay them to look into it, since I had no other ideas left to try and had never seen a spotty connection like this before (many websites were actually fully functional, but most users could not access their dashboards, edit pages, or do any administrative tasks without lots of refreshing and frustration). By this point I had to get on a plane to Atlanta and figured I’d have to keep working on this once I got to the hotel.

Luckily our plane had wifi, so I was able to keep working on things. During the flight (this was about 5:30pm) I finally received an email from the company that said “We have narrowed down the cause of the issue and We will be performing emergency maintenance tonight at 10pm.” Hallelujah! It turned out to be a bad network switch that routes traffic to and from both of our servers (along with many others, I’m sure). I updated everyone on Twitter, and sure enough, when the switch was replaced at 10pm the problem resolved itself.

Obviously no time is a good time for downtime, but I’m especially aware that many final projects are due around this time as schools begin to finish up their Spring semester. Ironically, I was using my own domain for a final project in a graduate program I’m in, and I host it here (eating my own dogfood), so I too was feeling that pain. This was an especially difficult issue because it’s one of the first times we’ve had to look beyond our own servers to our network provider as the cause of the problem (and the company initially did not indicate there was an issue at all). As Reclaim Hosting grows, we will likely be able to find ways to mitigate this in the future by diversifying the companies we use for servers, the locations of those servers, etc. I’ve never compared us to enterprise hosting companies like GoDaddy, Bluehost, Dreamhost, MediaTemple, etc. because I believe what we’re doing here is, and can be, fundamentally different. That comes with some growing pains, of course, but I’m glad to have you all be a part of what we’re doing here, and I look forward to continuing to build Reclaim right alongside you.

Thanks for your support,

Tim Owens

Egypt Calling or, Why Open Rules

I had been procrastinating a bit on the “Reclaim Your Domain” workshop I’ll be running in just about an hour’s time at the Sloan-C Emerging Technologies Conference. This is the first time I’ve workshopped what we’re doing at UMW with Domain of One’s Own, and thanks to the service Tim Owens and I started, appropriately named Reclaim Hosting, it’s been pretty cool. I already talked about the setup of the workshop here.

You see, the workshop will also include virtual participants. In fact, there will be folks joining in from as far away as Egypt, as I learned on Twitter.

So I wanted to be sure I had documentation and a platform that would enable the virtual folks to participate seamlessly.

As a result, I learned how awesome Maha Bali is (I was originally referring to her as Bali because of her domain; I suck). Two days ago Tim and I got the reclaimdemo.com site up and running (admittedly later than promised), and thanks to Maha, who was kind enough to offer testing from afar, I think it might actually work.

But more than testing out the site, and test it out she did, she also wrote a post on her sandbox domain at bali.reclaimdemo.com calling out some key questions I need to address more broadly as part of the Reclaim push:

I feel there is an assumption here that people taking this workshop already buy into the idea of “owning” or “reclaiming” one’s own domain. I am not clear on all the arguments for that yet (need to read and discuss some more) but I definitely do feel like my online presence is distributed and I would like to have it all in one place under my control. I just don’t know if that will complicate my life more than I need

This is the real question I need to get at, whether or not the platform works. I assume folks see the value, and I forget that I have to try to make this whole thing relevant to someone who isn’t necessarily an edtech fanatic like me. I need to step back a bit and start thinking about what a domain and web hosting have offered me on personal, professional, and practical levels as an educator, edtech, father, and, more generally, a person online in the 21st century.

This was truly invaluable feedback, but I am not surprised, because Maha seems to be a truly cool person who embodies the spirit of open collaboration and is a welcome reminder that these networks lead us to real people. This is what happens when you openly and actively engage people online. Openness is like the Force in Star Wars: “a river from which many can drink.” But Maha says all this much better on her blog, which may even be running on her own web hosting with a shiny new domain sometime soon :)