Ansible and Configuration Management at Reclaim Hosting

When we started Reclaim Hosting it was with a lot of hopes and dreams and a single server. Hard to believe we only had a single server to manage for the first ~8 months of Reclaim's existence (and Hippie Hosting was always so small that it only ever needed a single shared server for all customers). Those days, however, are long gone: today Reclaim Hosting manages a fleet of over 100 servers for customers and institutions across the globe. As you can likely imagine, it's been a hell of a learning curve to get to a point where we are comfortable with such a large infrastructure.

Unlike a high-availability setup, where you might have a bunch of servers spinning up and down but never log into a particular one, in web hosting each server is managed individually. SSH keys and a shared password management system go a long way toward alleviating headaches with access, but the biggest hurdle has been configuration management, and I finally feel we're starting to get to a place where I'm comfortable with it (I think it will always be a process). By configuration management I mean keeping track of and managing everything from the versions of PHP installed, to which system binaries like git are installed and available, to software upgrades over time. With an infrastructure this large, currently split across 3 different companies, many different datacenters, and even a few different operating systems and versions, it is inevitable to find that what you thought was running on one server is actually out of date or never got installed at provisioning time.

There are two things I have done thus far to tackle this issue. The first was to take a hard look at our provisioning process. In the past this was completely manual: fire up a server and start working through a list of things that needed to be installed. If I normally installed something but forgot to mention it to Jim, there's a great chance it wouldn't get installed. And if the coffee hasn't kicked in, the whole system by its very nature is prone to user error resulting in an incomplete setup, not to mention it could take anywhere from a few days to a week to complete if other stuff got in the way. It may seem simple, but I created a bash script to automate the deploy process (a rough skeleton is sketched after the list below). There are a few prerequisites that have to be installed before running it (namely git, since the script lives in a private repo, and screen, so we can run it with the session closed), but what used to be a process measured in days now completes the majority of the work in about an hour and a half. Here's everything the script does thus far:

  • Install git, screen, nano, and ImageMagick
  • Install cPanel
  • Run through some first-time setup options for cPanel configuration
  • Compile PHP with all extensions and Apache modules we need
  • Install Installatron and configure settings
  • Install ConfigServer scripts (firewall, mail management, exploit manager) and apply our settings
  • Update php.ini with approved values
  • Install Let's Encrypt
  • Install Bitninja (a distributed firewall product we use)
  • Set up the custom cPanel plugin Reclaim Hosting uses for application icons
  • Configure automatic SSL certificates
  • Reboot server
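
To give a sense of the shape of the thing, here is a stripped-down sketch of what a deploy script like this can look like. This is not the actual Reclaim script (that lives in the private repo); the package list, paths, and order shown here are placeholders, though the cPanel install step follows cPanel's documented one-liner.

    #!/bin/bash
    # Illustrative skeleton only; the real deploy script is private and
    # the values below are examples, not our approved settings.
    set -euo pipefail

    log() { echo "[deploy] $(date '+%F %T') $*"; }

    log "Installing base packages"
    yum -y install nano ImageMagick     # git and screen are prerequisites

    log "Installing cPanel/WHM (this is the long step)"
    cd /home && curl -o latest -L https://securedownloads.cpanel.net/latest
    sh latest

    # Remaining steps follow the same pattern (fetch the vendor installer,
    # run it, apply our settings): Installatron, ConfigServer scripts,
    # Let's Encrypt, Bitninja, the custom cPanel plugin, automatic SSL
    # certificates, and finally a reboot.
    reboot

Because it runs under screen, we can kick it off, detach, and check back in roughly an hour and a half.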

Mostly my process has been working backward from what we used to install and configure by hand and finding the commands that accomplish the same thing (installation is always done from the CLI, but configuration is often done in the GUI, so finding the right way to configure things was a bit more time consuming and sometimes involved editing files directly with sed). There's more that could be done, and again I treat this all as a process we can continue to refine, but this has gone a long way toward making that initial hurdle of setting up servers a "set it and forget it" approach.
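
For example, pushing php.ini values from the command line instead of clicking through WHM looks roughly like this; the path and the specific values below are placeholders, since the php.ini location depends on how PHP was built.

    # Illustrative only: back up, edit in place with sed, then verify.
    PHP_INI=/usr/local/lib/php.ini
    cp "$PHP_INI" "$PHP_INI.bak.$(date +%F)"
    sed -i 's/^upload_max_filesize = .*/upload_max_filesize = 64M/' "$PHP_INI"
    sed -i 's/^max_execution_time = .*/max_execution_time = 120/' "$PHP_INI"
    grep -E '^(upload_max_filesize|max_execution_time)' "$PHP_INI"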

With our deployment process becoming more streamlined, the second piece of the puzzle was to get a handle on the long-term configuration of servers: the various changes and settings we have to manage over time. If Apache is getting an upgrade, it should be tested on a dev server and, once approved, pushed to all servers to avoid any mismatch. In the past that meant opening up a bunch of tabs and getting to work, but that's not a scalable approach. I've taken to learning more about Ansible as a configuration management tool, and it has already saved me countless hours.

Ansible doesn't require any agent to be installed on our machines; it uses SSH to run its commands. Commands are put together in "playbooks," which are nothing more than .yml text files. There are various roles and modules that handle everything from installing software with yum and moving files back and forth to more complex tasks, and people can write their own roles, so there is a lot out there already for popular approaches to configuring and managing servers. At this point you might be thinking, "Well, if it can do all that, why did you write a bash script instead of using Ansible for deploying servers?" and you're not wrong, Walter. Long term it probably does make sense to do that, but Ansible playbooks have a very specific way of doing things that is meant to be replicable across a lot of servers, and frankly it would take a lot of work to rewrite the deployment method that way, so it's a goal but not a major issue in my eyes.
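
For anyone who hasn't seen one, a playbook of the kind I'm describing is short and readable. The sketch below is a made-up example rather than one of our actual playbooks; the group name, file paths, and service name are placeholders, but the yum, copy, and service modules are the stock ones.

    # example-baseline.yml (illustrative; not an actual Reclaim playbook)
    ---
    - hosts: cpanel_servers
      become: yes
      tasks:
        - name: Ensure git is installed
          yum:
            name: git
            state: present

        - name: Push our standard firewall config
          copy:
            src: files/csf.conf
            dest: /etc/csf/csf.conf
            owner: root
            group: root
            mode: '0600'
          notify: restart lfd

      handlers:
        - name: restart lfd
          service:
            name: lfd
            state: restarted

Testing it on a dev box first and then rolling it out fleet-wide is just a matter of running ansible-playbook example-baseline.yml --limit dev, then dropping the --limit once we're happy with the result.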

Now, with Ansible, if I decide I want to roll out HTTP/2 support to our servers, I can write a small playbook that installs it via yum and then run it against the entire fleet. If a server already has the support, Ansible doesn't have to make a change, so there's no harm in running playbooks against a variety of servers that may or may not share a common configuration in order to bring them all up to date. If anything, the biggest challenge is not writing the playbooks (which I actually enjoy); it's keeping the inventory file that holds data for all of our servers up to date. A dream would be to use the Digital Ocean API to dynamically update our inventory in Ansible, so that when a new server is added there it's automatically added to our inventory.
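
As a concrete sketch of that HTTP/2 example (the package name below is an assumption on my part; on cPanel servers the EasyApache 4 module packages are generally named ea-apache24-mod_*, so adjust for your own environment):

    # http2.yml (illustrative)
    ---
    - hosts: all
      become: yes
      tasks:
        - name: Ensure the Apache HTTP/2 module is present
          yum:
            name: ea-apache24-mod_http2
            state: present   # idempotent: no change if already installed

And the inventory file it runs against is just a flat list of groups and hostnames (the names here are made up):

    # inventory/hosts (hypothetical)
    [dev]
    dev1.example.com

    [production]
    server1.example.com
    server2.example.com

Running ansible-playbook -i inventory/hosts http2.yml --check first shows what would change without touching anything, which takes a lot of the fear out of fleet-wide runs. As for the inventory dream, Ansible's community-maintained dynamic inventory scripts include one for DigitalOcean, so that is probably where I would start.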

I'm confident that developing these configuration management processes will help ensure our customers benefit from the latest software and better support, with a standard environment they can count on from one server to the next. And if it saves us time in the process to devote to new development work, even better.

Dr. Reclaimlove or: How I Learned to Stop Worrying and Love Devops.

One of the best things (besides the /giphy function in Slack) about getting some time each month to work for Reclaim Hosting is how it has put tasks at my “traditional” full-time IT job into perspective: it contrasts my full-time IT environment, which is pretty old-fashioned (physical stuff), with an environment that relies heavily on Devops, virtual IT, and cloud administration. Fundamentally, Reclaim is really a model example of how to effectively run a lean startup, manage virtual IT, and stay mostly hands-off, and it’s been a good introspective experiment for someone like myself who still grips precariously upon the edge of physical infrastructure and an old-school IT background.

Devops is kind of a contentious term for many “traditional” (read: mostly this means hands-on) IT people, because it represents a massive shift in the way IT work is defined and performed. Since people, especially IT people, are often prone to some degree of change-averseness (guilty) and paranoia (doubly guilty) about their precious hardware racks, “if my infrastructure goes away then my job will go away” is not an entirely unreasonable conclusion to arrive at. We are looking at a fairly unprecedented degree of change in our industry and at blazing speed. Our jobs, though not the same as they were 10 or 15 years ago, did not start becoming substantially different until about 3-4 years ago. Neckbeard the Elder would probably be a high performer at most existing IT generalist jobs in 2013 and 2014…maybe 2015, too. The next generation of IT generalists (and IT generalists will still exist) will not rely upon Neckbeard the Not-Quite-Elder (that’s us) unless we decide, right now, to acknowledge that these changes are happening and that we will perish if we don’t adapt.

So how do we embrace the changes if we’re in traditional IT and not Devops IT? First, we have to acknowledge what Devops actually means…and since the “real” definition of Devops is still up for some discussion, let’s try to define it in the context of traditional IT work:

Devops is a collection of hands-off methodologies designed to reduce the need for physical infrastructure in favor of virtual, managed infrastructure over a hosted medium.

In other words: “use the cloud, and write some scripts.” Compare that to the Wikipedia definition of Devops:

“DevOps is a software development method that stresses communication, collaboration, integration, automation, and measurement of cooperation between software developers and other information-technology (IT) professionals.”

“Woah woah woah. I’m in IT. I’m not a software developer. I don’t want to have to deal with them.”

This sort of makes it sound like you have to be a software developer in order to be successful in IT, which is not entirely true, but I will stress this: if you want to be successful in IT in 2015+, you need to know something about how to code. Your code doesn’t have to be flashy, and you don’t need to be an expert, but it should be effective and reasonably efficient. As for what “code” to learn, I recommend Bash, Python, or PowerShell (if you are in a Windows-heavy environment). I dabble in all three of these languages, and though I am not terribly good, I understand some of the thought processes that developers go through when iterating on their code, and that helps me “get into the head” of a developer a bit. It’s also a huge opportunity for me, and it can be for you too.

If you’re like me, you are an overworked, overstressed IT admin. I have begun to embrace Devops because it gives me an avenue for working less…if I commit to the avenue. Basically, I don’t really want to do any work. I would rather be doing other things that are more fun, but I still need to have a job so I can do those other fun things. Some of those fun things are actually “work,” but they don’t really feel like “work” to me, and yet I still get paid to do them? Anyway, I can turn this into a win-win using Devops, and I am going to use a very Windows-centric example because it’s the easiest to understand, and because most software developers I know do not use Windows and I am trying to keep something of a line there.

Setting up Active Directory and doing it well is difficult. Setting up AD well using Microsoft Azure is less difficult because I don’t have to worry about unscalable and unstable hardware, vendors, CALs, incredibly obtuse licensing, etc. So what is the opportunity? Setting up AD (in the cloud), integrating it with VMM (in the cloud), and creating a “developer hook” (which could be as simple as a batch file run from the requestor’s desktop) so developers in the correct AD groups can request the creation of dev and staging machines, and have those machines created for them, without me having to really touch AD anymore (not that I could “literally” touch it anyway, since that hardware doesn’t exist in my universe). There is nothing for developers to break, because if the OS gets completely destroyed somehow they can just request a new virtual machine. Microsoft does a lot of the “devops” work for you in this example because of their integration tools, but you could also PowerShell a lot of that work away, intelligently, and then maybe you could even have a real lunch break! By the way, the work you do linking an Azure (or AWS, or Google) VPN to your network? That will not be work software developers are going to be doing in the foreseeable future.

I am not some Microsoft fanboi, but this is a simple example of how “thinking” in a Devops way can be hugely beneficial and not so scary at all, because it illustrates the need for traditional IT expertise alongside development and automation expertise. Few of these opportunities existed even 5 years ago. (OK, so that is a little scary.) If you are reading this blog, you might be thinking “well, he’s preaching to the choir a bit,” but I promise you, based on the things that I have seen and heard, I’m not. If you’re not convinced, look around at what some IT specialists are doing on LinkedIn. IT needs to “get real” on these things soon and start getting its people trained to think in a way that fosters collaboration and automation.

Instead of being wary of Devops, make it work for you, as we are doing at Reclaim. I’m currently deploying a network and server monitoring solution in the cloud for Reclaim (also a “traditional IT” task), and I’m creating an opportunity for myself to script or program away the SNMP configuration of the hosts I’d like to add to the monitoring solution, making it as close to “zero-touch” as possible. In a more advanced environment you could do something like this using Chef or Puppet, but for this task in particular I don’t really need that. I am greatly expanding on my Bash/Shell skills (Dev) while incorporating my security, file transmission, service configuration, and permissions skills (Ops). When the prep work is done, the operator will be able to go through a simple series of conditionals that will copy the SNMP config file over to the machines to be monitored, with no additional input needed from “the IT guy.” This is not scary. Self-service is good service. Eventually we may even get to the point of auto-discovery. But that’s TNG; we’re still in Star Trek. 🙂
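
To make that concrete, the heart of it is not much more than the sketch below. The host list, the config source, and the service commands are placeholders (and on systemd machines the restart would be systemctl instead of the old service/chkconfig pair); the real configs will show up in the follow-up post mentioned at the end.

    #!/bin/bash
    # Rough sketch of the "push snmpd.conf and restart the daemon" step.
    # Paths, the host list, and service handling are illustrative.
    set -euo pipefail

    CONF=./snmpd.conf                # pre-approved SNMP config
    HOSTS_FILE=./monitored-hosts.txt # one hostname per line

    while read -r host; do
      echo "Configuring SNMP on $host"
      scp "$CONF" "root@$host:/etc/snmp/snmpd.conf"
      # -n keeps ssh from swallowing the rest of the host list on stdin
      ssh -n "root@$host" "service snmpd restart && chkconfig snmpd on"
    done < "$HOSTS_FILE"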

Jim and Tim have built the powerful engine of Reclaim Hosting using simple but powerful DevOps methodologies and thought processes. In doing so, they can focus on the customer and not let the hardware get in their way, and that is the essence of an effective business. “Think Devops” can be the essence of your IT infrastructure, your support organization, and even your sanity, if you learn, as I have, how to embrace these changes and live by this mantra. If you can’t commit yet, just start with “think automation.”

I will post more in the coming weeks about the monitoring platform and what we are doing, and I will also post some sample config files, either here or on GitHub, that you can port right into your own Linux machines if you’d like to start experimenting with the SNMP daemon. Until then, happy reclaiming!