Seven posts over three years about (and on) Ghost is not that much in the end (running out of punny titles), but reading over them whiling writing this I realized there’s a lot of learning wrapped up in trying to figure out AWS, Bitnami images, command line, Docker containers, and Cloudron. All stuff I have been trying to focus on more an more, so this side site in many ways lives up to its subtitle: “Letters from the Cloud.” And I came back to it recently because while I blogged about setting up Ghost through Cloudron back in September, my Ghost instance on Reclaim had been terminated when we decided to no longer offer it through Reclaim Hosting. Given my Ghost blogging had been dormant for a while, I totally forgot I was hosting it through Reclaim and it vanished. Luckily I blogged everything on Ghost through the bava, so nothing was lost, and I had backups of all images, etc. So, I used the occasion of things finally slowing down at Reclaim Hosting and my being under the weather to finally get BavaGhost back online, and now it is!
Omeka continues to be a huge draw for a variety of students, faculty, and librarians using Reclaim Hosting. And the good folks at the Roy Rosenzweig Center for History and New Media have been champions of our service from the beginning, and that has made a huge difference for us. One of the issues that has come up regularly is storage for Omeka sites, which by design usually have large archives of documents, images, etc. We tend to keep our storage space for our Student and Faculty plans fairly low (2 GBs and 10 Gbs respectively) because we are trying to keep costs low, and the sales line of “unlimited” storage space for shared hosting is impractical for us. We recently introduced an Organization plan that has 100 GBs for just these instances because the need is there. That said, if you have a lot of resources you might be better off with a service like Amazon’s S3—the backup redundancy is insane and you can’t beat the price.
Over 8 months ago Tim Owens figured out Omeka has the option for pushing all uploaded files to S3 built into their code. It’s just a matter of setting up an Amazon S3 bucket with the right permissions and adding the credentials to your Omeka’s config.ini file to get it running. I was intrigued by the process, but Tim had taken care of it so I knew it was theoretically possible—but never tried it. Yesterday, however, I had the opportunity to help a Reclaimer get this up and running for their Omeka install. With some help from Tim on a couple of details I missed, I got it figured out. The rest of this post will be a step-by-step for setting up S3 storage with a self-hosted Omeka site. Continue reading “Setting Up S3 Storage for Omeka”
This past week marked the 2 year anniversary of Reclaim Hosting and what started as something of an experiment has turned into a successful business and one of the most rewarding things I've been a part of in my professional career. We've come a long way in 2 years but we're learning everyday and one thing on my ever-growing bucket list of bugs and improvements has been our website. Not the design mind you (though I do have lots of plans to give that a makeover), rather the architecture and hosting of it.
When we started the company we had just one lonely server for people's sites and that included our own. Today I manage many servers but in part out of laziness I never did move our site off the same shared hosting server our customers were on. I could justify this for awhile because it's not like we get a ton of traffic (though we're not small), but also I felt at the time it was a good gauge for our service if we were on the same system everyone else was using. Alas, I started to see holes in that methodology when people who had various issues with accessing their account be it firewall issues or otherwise would also no longer be able to get to our site. More and more it seemed like reclaimhosting.com should always be up and always be available regardless of the status of any given server we manage.
Meanwhile for the past year I've dug more and more into Amazon Web Services for server architecture. I helped UMW move UMW Blogs to AWS last summer and in fact the DNS for reclaimhosting.com has been running through Route53 for at least a year now. With a few recent projects being candidates for a multi-layered cloud approach with AWS I thought it was high time we moved the main website for Reclaim Hosting up there as a proof-of-concept and best-practice example of running WordPress in the cloud.
I started with the video below (the audio for it is incredibly quiet but it's worth it) which helped me figure out a few neat tricks for caching, staging environments, and other things I hadn't yet experimented with. The author also shares a lot of information in the notes that point to his GitHub repo for the project.
Today I flipped the DNS and at this point I believe the site is now resolving for most folks on the new setup. Here's a quick laundry list of some of the items at play with our new setup for reclaimhosting.com:
A single EC2 server serves as a staging environment that utilizes the same database as production servers.
The staging environment changes are committed to a private repo on GitHub (mostly just plugin and theme modifications since the database stores most all content)
Our database is using a Multi-AZ (Availability Zone) RDS instance for high availability.
We have an Elastic Load Balancer that is receiving the requests to the site and sending them to production servers.
OpsWorks is setup to deploy changes from the git repo to 4 production EC2 servers.
Each production server is located in a separate datacenter for high availability.
All uploads are sent to S3 storage and served on the website using the CloudFront CDN for faster response times globally.
A Redis object cache stores objects across all production servers in memory on an ElastiCache instance.
The new setup also has me digging a bit into Chef which is used in the deployment process by OpsWorks but not something I previously had any experience with. The upgraded setup also comes with a lot of security enhancements. Logins are only available via white-listed IP to our staging environment and wp-login.php is completely removed during deployment so it's virtually impossible for someone to get in to our site. Security groups (essentially firewalls) are configured in a way that each layer only has the necessary ports open to the necessary servers accessing them. For example the database can only be accessed by the running EC2 servers, the production servers will only access ports 80 and 443 (http/https) from the load balancer, and staging is only accessible via SSH with a private key.
I started out with T2-Micro instances across everything (the cheapest Amazon offers), bringing the total cost of a setup like this to around $60/month running 24/7 with 5 servers, a production database, an in-memory cache instance, and ~1GB of data stored in S3 with CDN caching. At the moment performance seems to be pretty great considering the small size of the production servers.
This is no doubt an incredibly complex environment and not something I would recommend anyone tackle for their personal sites, but for large production instances of WordPress used by your campus or sites that absolutely need to be online at all times and highly-available globally? Absolutely! And heck, if you're unsure how you might accomplish the same hire Reclaim Hosting to help you get setup or host it all for you in the cloud!
Anyway, the image above is a look at a potential setup for a large WordPress Multisite instance on AWS. It has a couple of elements worth discussing in some detail because I want to try and get my head around each of them. The first is a load balancer that runs in its own EC2 instance.
What the load balancer does is direct traffic to the Ec2 instance running the WordPress core files with the least load. So if you have four EC2 instances each running WordPress’s core files, the one with the least usage get the next request. Additionally, if all the instances have too great a load another could, theoretically, be spun up to meet the demand. That’s one of the core ideas behind elastic computing. The load balancer Tim used for UMW Blogs was HAProxy.
As mentioned above, you can setup a series of instances of EC2 on AWS with the core WordPress files save the wp-content directory, which is the only directory folks write to. But you will notice in the fourth instances Tim switched things up. He suggested here that we could have an EC2 instance running Docker that could then run several WordPress instances within it. What’s the difference? And this is where I am still struggling a bit, but from what I understand this allows you to spin up new instances quicker, isolate instances from each other for more security, and upgrade and switch out instances seamlessly. It effectively makes WordPress upgrades in a large environment trivial.
We have yet another EC2 instance that is the Network File Storage, this holds the wp-content files. The uploads, plugins, themes, upgrades, etc. And each of the above instances share this instance. The all write to this, but one of the issues here is that this can be a single point of failure, kinda like the load balancer. So, Tim suggested there is BitTorrent Sync (BTSync), which I still don’t totally understand but sounds awesome. It’s basically technology that synchs files from your computer to spot on the internet, or between spaces in the internet, etc. So, what if we had several bucket where the various instances of WordPress core files were writing the upload files, themes, plugins, etc, and those buckets used BTSync to share between them almost immediately. So then you wouldn’t have a single point of failure, you would have the various instances writing to various buckets of files that would be constantly synching using the technology behind BitTorrent. Far out, right?
Another option, and I think this was before we started talking about BTSync, but not sure if this would be in possible in addition to BTync, is have the blogs.dir folder for a WordPress Multisite that handle all the individual site files uploaded be sent to S3, Amazon’s file storage service.
You get the sense that part of what’s happening when you move an application like WordPress Multisite onto AWS, or some other cloud-based, virtualized server environment, is each element is abstracted out to it basic functions. Core files that are read-only are separate from anything that is written to, whether that be themes, plugins, or uploads. Additionally, the database is also abstracted out, and you can run an EC2 instance on AWS with Docker containers each running MySQL (with a sharDB or HyberDB to further break up load) that can also replicate various writes and calls using BTSync? No single point of failure, and you greatly reduce the load on a WPMS which is completely I’m out of my depth here, but if I I accomplished anything here it might be giving you an insight to my confusion, which is also my excitement about figuring out the possibilities.
I have no idea if this makes sense, and I would really love any feedback from anyone who knows what they are talking about because I’m admittedly writing this to try and understand it. Regardless, it was pretty awesome hearing Tim lay it out because it certainly provides a pretty impressive solution to running large, resource intensive WordPress Multisite instance.
While folks were bouncing ideas around on how to bring the site up again while still struggling with the outage, I mentioned that I could pretty quickly migrate the site over to Amazon Web Services and run it in Docker containers there. The higher-ups gave me the go-ahead and a credit card (very important, heh) and told me to get it setup. The idea was to have it there so we could fail over to the cloud if we were unable to resolve the outage in a reasonable time.
TL;DR – I did, it was easy, and we failed over all external traffic to the cloud. Details below.
He goes on to describe his process in some detail, and it struck me how the shift in IT infrastructure is moving, and also made me wonder how many IT organizations in higher ed are truly rethinking their architecture along these lines. It’s one thing to push your services to a third party vendor that hosts all your stuff, it’s all together different to bring in a team that understands and is prepared to move a university’s infrastructure into a container-based model that can be hosted in the cloud. Not to mention what this might soon mean for personal options, and a robust menu teaching and learning applications heretofore unimaginable. This would make the LAMP environment options Domain of One’s Own offers look like Chucky from Child’s Play
I know Tim and I are looking forward to thinking about what such a container-based architecture might means for an educational hosting environment that is simple, personalized, and expansive. Tim turned me on to Tutum recently, which starts to get at the idea of a personalized cloud across various providers—something Tim Klapdor gets at brilliantly:
MYOS is very much the model the Jon Udell laid out as “hosted life bits” – a number of interconnected services that provide specific functionality, access and affordances across a variety of contexts. Each fits together in a way that allows data to be controlled, managed, connected, shared, published and syndicated. The idea isn’t new, Jon wrote about life bits in 2007, but I think the technology has finally caught up to the idea and it’s now possible to make this a reality in very practical way.
His post on the topic deserves a close reading, and it’s the best conceptual mapping of what we might build I have read yet. I wanna help realize this vision, and I guess I am writing about Duke University’s move to Docker because it suggests this is the route Higher Ed IT will be moving towards anyway (sooner or later—which could be a long later for some ). Seems we might have an opportunity to inform what it might look like for teaching and learning from the ground floor. It’s not a given it will be better, that will depend upon us imagining what exactly a teaching and learning infrastructure might look like. Tim Klapdor has provided one of the most compelling visions to date, building on Jone Udell’s thinking, but that’s just the beginning.
D’Arcy Norman posted this one video that parts of a larger documentary project he started while at UMW for the Reclaim Your Domain Hackathon. I don’t get to see D’Arcy nearly enough, and unfortunately I was running around like the proverbial chicken all weekend trying to organize the hackathon which meant any focused time together was limited. I consider D’Arcy one of my oldest and dearest “edtech” friends, and sitting down with him for 15 minutes allowed me to try and articulate what’s important about the Reclaim movement for me.
This is my fourth Reclaim Your Domain event, the others being the MIT Hackathon March 2013, Atlanta Domain Incubator April 2014, and LA Reclaim Hackathon July 2014. These events have been by far the best professional development I’ve had over the last two years, and much of that is owed to the vision Audrey Watters and Kin Lane turned me onto near on two years ago. I’ve effectively been spending my copious spare time trying to wrap my head around things like Amazon Web Services, GitHub, and APIs. And as I suggest in the video, these are the platforms and technologies I’ve trying to understand so I can translate how they reflect some of the more seismic shifts in how the web works over the last few years. Kin and Audrey are a brilliant one-two punch in this regard, framing the technical, social, political, economic and more. Add to all this the IndieWeb movement, and Reclaim really feels vibrant and full of possibility. So, I want to thank D’Arcy, Andy Rush, David Kernohan, and Grant Potter for taking the time last weekend to try and capture some of it. Big Fan!