The Ghost of bava

This is kind of a record-keeping post; it turns out that when you’ve been blogging for nearly 15 years, posts can be useful for reminding you of what you did years earlier that you presently have no recollection of. It’s my small battle against the ever-creeping memory loss that follows on the heels of balding and additional chins—blog against the dying of the light!

Anyway, I’m trying to keep on top of my various sites, and I recently realized that as a result of extracting this blog out of my long-standing WordPress Multisite in 2018, followed by the move over to Digital Ocean this January, a number of images in posts that were syndicated from bavatuesdays to sites like https://jimgroom.umwblogs.org were breaking. The work of keeping link rot at bay is daunting, but we have the technology. I was able to log in to the UMW Blogs database and run the following SQL query:

UPDATE wp_13_posts SET post_content = replace(post_content, 'http://bavatuesdays.com/files/', 'https://bavatuesdays.com/wp-content/uploads/')

That brought those images back, and it reminds me that I may need to do something similar for ds106.us, given I have a few hundred posts syndicated into that site that probably have broken images now.
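When I get around to ds106 it should be the same trick with a different target: find the posts table that holds the syndicated copies and swap the old file paths for the new ones. Something like the query below, where wp_XX_posts is a placeholder for whichever posts table is actually involved (on a plain single-site install it would just be wp_posts):

UPDATE wp_XX_posts SET post_content = replace(post_content, 'http://bavatuesdays.com/files/', 'https://bavatuesdays.com/wp-content/uploads/');

As always with a blanket find-and-replace, a database backup first is cheap insurance.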

But the other site I discovered had broken images as a result of my various moves was the Ghost instance I’ve kept around since 2014. I initially started that site as a sandbox on AWS in order to get a Bitnami image of Ghost running, which was my first time playing with that space in earnest. That was also the period when Tim and I were trying to convince UMW’s IT department to explore AWS. In fact, we would soon move UMW Blogs to AWS as a proof-of-concept, but also to try and pave the way for hosting more through cloud-based services like Digital Ocean, etc.

It’s also the time when the idea of servers in the “Cloud” seemed amazing, and when new applications running on stacks other than LAMP became real for me. Ghost was one of those. It was the promise of a brave new world, a next-generation sandbox, and it was around that time Tim set up container-based hosting for both Ghost and Discourse through Reclaim Hosting as a bit of an experiment. Both worked quite well and were extremely reliable, but there was not much demand, and in terms of support it relied too heavily on Tim for us to sustain it without a more robust container-based infrastructure. We discontinued both services a while back, and are finally shutting down those servers once and for all. And while we had hopes for Cloudron over the last several years, in the end that’s not a direction we’re planning on pursuing. Folks have many options for hosting applications like JupyterHub and the like, and the cost of container-based hosting remains a big question mark—something I learned quickly when using Kinsta.

Part of what makes Reclaim so attractive is that we can provide excellent support in tandem with an extremely affordable service. It’s a delicate balance to say the least, but we’ve remained lean and investment-free, and as a result have been able to manage it adroitly. We are still convinced that for most folks a $30 per year hosting plan with a free domain will go a long way towards getting them much of what they need when it comes to a web presence. If we were to double or triple that cost by moving to a container-based infrastructure, it would pull us away from our core mission: providing affordable spaces for folks to explore and learn about the web.* What’s more, in light of the current uncertainties we all face, we’re even more committed to keeping costs low and support dialed in.

Ghost in a Shell

So, I’m not sure why this record-keeping post became a manifesto on affordability, but there you have it. All this to say: while we have been removing our Discourse forum application servers, we also decided to use the occasion to migrate the few Ghost instances we currently host to shared hosting, so that we can retire the my.reclaim.domains server that was running them on top of Cloudron. Tim and I spent a morning last week going over his guide for setting up Ghost through our shared hosting on cPanel, and it still works.† The only change is that you now need Node.js version 10 or later for the latest version of Ghost.

He migrated his Ghost blog to our shared hosting, and I did the same for mine (which only has a few posts). He has been blogging on Ghost for several years now, and I have to say I like the software a lot. It's clean and quite elegant, and their mission and transparency are a model! But if you don't have the expertise to install it yourself (whether on cPanel or a VPS), hosting it through them comes at a bit of a cost, with plans starting at about $30 per month. That price point is a non-starter for most folks starting out. What's more, for the cost of a single month of hosting one Ghost site, you can get an entire year of cPanel hosting (domain included) with plenty of room to dig deeper into the various elements of web hosting.

So, I have toyed with the idea of trying to move all my posts over to Ghost, but when I consider the cost, as well as the fact that it has no native way to deal with commenting cleanly, it quickly becomes a non-starter. With over 14,000 comments on this blog, I can't imagine they would be migrated to anything resembling a clean solution that would not result in just that much more link rot. I guess I am still WordPress #4life.


*And while it remains something we are keenly interested in doing, we are not seeing it as an immediate path given the trade-off between investment costs and the idea of per-container costs for certain applications, which would radically change our pricing model.

†He had to help me figure out some issues I ran into as a result of running the commands as root.

Look a(nother) Ghost

Since May of 2014 I have been playing on and off with the blogging platform Ghost. It has been an on-again, off-again affair, and I have never left WordPress for it, but rather use it as a test bed for exploring how Reclaim might host applications outside the LAMP stack—an ongoing theme for us over the last 3 or 4 years. So, I have been marking my progress with running Ghost both here on the bava as well as on my Ghost blog. I talked about the idea of this as the Next Generation Sandbox, experimented with getting Ghost running on AWS using Bitnami, did some feeble terminal work, set up key pairs in AWS, moved to Reclaim’s container-based setup for a kind of multi-site Ghost, set up mail for Ghost, and most recently used Cloudron to set up Ghost.

Seven posts over three years about (and on) Ghost is not that much in the end (I’m running out of punny titles), but reading over them while writing this I realized there’s a lot of learning wrapped up in trying to figure out AWS, Bitnami images, the command line, Docker containers, and Cloudron. All stuff I have been trying to focus on more and more, so this side site in many ways lives up to its subtitle: “Letters from the Cloud.” And I came back to it recently because, while I blogged about setting up Ghost through Cloudron back in September, my Ghost instance on Reclaim had been terminated when we decided to no longer offer it through Reclaim Hosting. Given my Ghost blogging had been dormant for a while, I totally forgot I was hosting it through Reclaim and it vanished. Luckily I blogged everything on Ghost through the bava, so nothing was lost, and I had backups of all images, etc. So, I used the occasion of things finally slowing down at Reclaim Hosting, and my being under the weather, to finally get BavaGhost back online, and now it is!


Setting Up S3 Storage for Omeka

Omeka continues to be a huge draw for a variety of students, faculty, and librarians using Reclaim Hosting. And the good folks at the Roy Rosenzweig Center for History and New Media have been champions of our service from the beginning, which has made a huge difference for us. One of the issues that has come up regularly is storage for Omeka sites, which by design usually have large archives of documents, images, etc. We tend to keep the storage space for our Student and Faculty plans fairly low (2 GB and 10 GB, respectively) because we are trying to keep costs low, and the sales line of “unlimited” storage space for shared hosting is impractical for us. We recently introduced an Organization plan with 100 GB for just these cases because the need is there. That said, if you have a lot of resources you might be better off with a service like Amazon’s S3—the backup redundancy is insane and you can’t beat the price.


Over 8 months ago Tim Owens figured out that Omeka has the option to push all uploaded files to S3 built into its code. It’s just a matter of setting up an Amazon S3 bucket with the right permissions and adding the credentials to your Omeka install’s config.ini file to get it running. I was intrigued by the process, but Tim had taken care of it, so I knew it was theoretically possible—but I had never tried it myself. Yesterday, however, I had the opportunity to help a Reclaimer get this up and running for their Omeka install. With some help from Tim on a couple of details I missed, I got it figured out. The rest of this post will be a step-by-step for setting up S3 storage with a self-hosted Omeka site.
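As a preview, the relevant settings live in the storage section of config.ini and look roughly like the sketch below. The key names follow Omeka’s S3 storage adapter settings as I understand them (check the storage section documented in your version’s config.ini), and the values are obviously placeholders for your own bucket and credentials:

storage.adapter = "Omeka_Storage_Adapter_ZendS3"
storage.adapterOptions.accessKeyId = "YOUR_AWS_ACCESS_KEY"
storage.adapterOptions.secretAccessKey = "YOUR_AWS_SECRET_KEY"
storage.adapterOptions.bucket = "your-omeka-bucket"
; Optional: how long signed URLs stay valid, in minutes. Omit if the bucket contents are public.
storage.adapterOptions.expiration = 10

With expiration set, Omeka serves files through expiring signed URLs, so the bucket itself can stay private; without it, the files in the bucket need to be publicly readable.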

Reclaim Hosting in the Cloud

This past week marked the two-year anniversary of Reclaim Hosting, and what started as something of an experiment has turned into a successful business and one of the most rewarding things I've been a part of in my professional career. We've come a long way in two years, but we're learning every day, and one thing on my ever-growing bucket list of bugs and improvements has been our website. Not the design, mind you (though I do have lots of plans to give that a makeover), but rather the architecture and hosting of it.

When we started the company we had just one lonely server for people's sites, and that included our own. Today I manage many servers, but in part out of laziness I never did move our site off the same shared hosting server our customers were on. I could justify this for a while because it's not like we get a ton of traffic (though we're not small), but also because I felt at the time it was a good gauge for our service if we were on the same system everyone else was using. Alas, I started to see holes in that methodology when people who had issues accessing their account, be it firewall issues or otherwise, would also no longer be able to get to our site. More and more it seemed like reclaimhosting.com should always be up and always be available, regardless of the status of any given server we manage.

Meanwhile, for the past year I've dug more and more into Amazon Web Services for server architecture. I helped UMW move UMW Blogs to AWS last summer, and in fact the DNS for reclaimhosting.com has been running through Route 53 for at least a year now. With a few recent projects being candidates for a multi-layered cloud approach with AWS, I thought it was high time we moved the main website for Reclaim Hosting up there as a proof-of-concept and best-practice example of running WordPress in the cloud.

I started with the video below (the audio is incredibly quiet, but it's worth it), which helped me figure out a few neat tricks for caching, staging environments, and other things I hadn't yet experimented with. The author also shares a lot of information in the notes, which point to his GitHub repo for the project.

Today I flipped the DNS and at this point I believe the site is now resolving for most folks on the new setup. Here's a quick laundry list of some of the items at play with our new setup for reclaimhosting.com:

  • A single EC2 server serves as a staging environment that uses the same database as the production servers.
  • Changes in the staging environment are committed to a private repo on GitHub (mostly just plugin and theme modifications, since the database stores almost all content).
  • Our database is a Multi-AZ (Availability Zone) RDS instance for high availability.
  • An Elastic Load Balancer receives requests to the site and sends them to the production servers.
  • OpsWorks is set up to deploy changes from the git repo to 4 production EC2 servers.
  • Each production server is located in a separate datacenter for high availability.
  • All uploads are sent to S3 storage and served on the website through the CloudFront CDN for faster response times globally.
  • A Redis object cache on an ElastiCache instance stores objects in memory, shared across all production servers.


The new setup also has me digging a bit into Chef, which OpsWorks uses in the deployment process but which I previously had no experience with. The upgraded setup also comes with a lot of security enhancements. Logins to our staging environment are only available from white-listed IPs, and wp-login.php is completely removed during deployment, so it's virtually impossible for someone to get into our site. Security groups (essentially firewalls) are configured so that each layer only has the necessary ports open to the servers that need them. For example, the database can only be accessed by the running EC2 servers, the production servers only accept traffic on ports 80 and 443 (HTTP/HTTPS) from the load balancer, and staging is only accessible via SSH with a private key.
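To give a rough sense of what that layering looks like, here's a sketch of equivalent rules expressed with the AWS CLI. This isn't how our setup was actually configured (and the group IDs and white-listed IP below are placeholders, not real values), but it shows the pattern of each layer only trusting the layer in front of it:

# production servers: only accept web traffic from the load balancer's security group
aws ec2 authorize-security-group-ingress --group-id sg-PROD --protocol tcp --port 80 --source-group sg-ELB
aws ec2 authorize-security-group-ingress --group-id sg-PROD --protocol tcp --port 443 --source-group sg-ELB
# database: only accept MySQL connections from the production servers' security group
aws ec2 authorize-security-group-ingress --group-id sg-DB --protocol tcp --port 3306 --source-group sg-PROD
# staging: only accept SSH from a single white-listed address
aws ec2 authorize-security-group-ingress --group-id sg-STAGING --protocol tcp --port 22 --cidr 203.0.113.10/32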

I started out with t2.micro instances across everything (the cheapest Amazon offers), bringing the total cost of a setup like this to around $60/month running 24/7 with 5 servers, a production database, an in-memory cache instance, and ~1 GB of data stored in S3 with CDN caching. At the moment performance seems to be pretty great considering the small size of the production servers.

This is no doubt an incredibly complex environment and not something I would recommend anyone tackle for their personal sites, but for large production instances of WordPress used by your campus, or sites that absolutely need to be online at all times and highly available globally? Absolutely! And heck, if you're unsure how you might accomplish the same, hire Reclaim Hosting to help you get set up or host it all for you in the cloud!

Abstractions: Running WordPress Multi-Site using AWS, Docker, and BTSync

Heads up: this is not a technical run-through, but more of a conceptual overview. Apologies if you came here looking for a how-to. Hopefully we will have just that in the next few months.

But enough about the past, let’s talk about the future!

[Diagram: AWS WordPress Multisite setup]

This past week Tim Owens and I went down to VCU’s ALT Lab to meet with Tom Woodward, Jon Becker, and Mark Luetke about the work they’re doing with Ram Pages. I already blogged about a couple of plugins they created for making syndication-based course sites dead simple. We also got to talking about some of the ways we have been using Amazon Web Services (AWS) to scale UMW Blogs. At this point Tim took us to school on the whiteboard, explaining a possible setup he has been imagining, which is still fairly experimental.

Don’t let Tim fool you, he is DevOps #4life now. He can be found in his spare time watching presentations about load balancing a site for a billion users or scaling infrastructure for small services like Netflix. I’m becoming more and more interested in infrastructure discussions because they highlight interesting trends in the shifting nature of tech that deeply affects edtech, such as virtualization, containers, and APIs.


Anyway, the image above is a look at a potential setup for a large WordPress Multisite instance on AWS. It has a couple of elements worth discussing in some detail because I want to try and get my head around each of them. The first is a load balancer that runs in its own EC2 instance.

[Diagram: load balancer]

What the load balancer does is direct traffic to the EC2 instance running the WordPress core files with the least load. So if you have four EC2 instances each running WordPress’s core files, the one with the least usage gets the next request. Additionally, if all the instances have too great a load, another could, theoretically, be spun up to meet the demand. That’s one of the core ideas behind elastic computing. The load balancer Tim used for UMW Blogs was HAProxy.
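I don’t have Tim’s actual configuration to share, but a minimal HAProxy setup doing that kind of least-load routing looks roughly like this (the backend addresses are made-up private IPs standing in for the four WordPress instances):

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend www
    bind *:80
    default_backend wordpress

backend wordpress
    # leastconn sends each new request to the server with the fewest active connections
    balance leastconn
    server wp1 10.0.1.11:80 check
    server wp2 10.0.1.12:80 check
    server wp3 10.0.1.13:80 check
    server wp4 10.0.1.14:80 check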


As mentioned above, you can set up a series of EC2 instances on AWS with the core WordPress files, save the wp-content directory, which is the only directory folks write to. But you will notice that in the fourth instance Tim switched things up. He suggested here that we could have an EC2 instance running Docker that could then run several WordPress instances within it. What’s the difference? This is where I am still struggling a bit, but from what I understand this allows you to spin up new instances more quickly, isolate instances from each other for more security, and upgrade and switch out instances seamlessly. It effectively makes WordPress upgrades in a large environment trivial.
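To make that a little more concrete (this is my own rough sketch, not Tim’s actual setup), here is the sort of thing you can do with the official wordpress Docker image: several isolated WordPress containers side by side on one host, each on its own port, where upgrading one just means replacing its container with a newer image while the others keep running. The database hostname and names below are placeholders, and you’d pass the DB credentials the same way:

docker pull wordpress:latest
# two isolated WordPress installs sharing one Docker host
docker run -d --name wp-site1 -p 8081:80 -e WORDPRESS_DB_HOST=db.example.internal -e WORDPRESS_DB_NAME=wp_site1 wordpress:latest
docker run -d --name wp-site2 -p 8082:80 -e WORDPRESS_DB_HOST=db.example.internal -e WORDPRESS_DB_NAME=wp_site2 wordpress:latest
# upgrading site1 is just swapping its container for one built from the newer image
docker rm -f wp-site1
docker run -d --name wp-site1 -p 8081:80 -e WORDPRESS_DB_HOST=db.example.internal -e WORDPRESS_DB_NAME=wp_site1 wordpress:latest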


We have yet another EC2 instance that acts as network file storage; this holds the wp-content files: the uploads, plugins, themes, upgrades, etc. Each of the above instances shares this one. They all write to it, but one of the issues here is that it can be a single point of failure, kinda like the load balancer. So, Tim suggested BitTorrent Sync (BTSync), which I still don’t totally understand but sounds awesome. It’s basically technology that syncs files from your computer to a spot on the internet, or between spots on the internet, etc. So, what if we had several buckets where the various instances of WordPress core files were writing the uploaded files, themes, plugins, etc., and those buckets used BTSync to share files between them almost immediately? Then you wouldn’t have a single point of failure; you would have the various instances writing to various buckets of files that would be constantly syncing using the technology behind BitTorrent. Far out, right?

[Diagram] BTSync provides the ability to immediately copy and sync files across several buckets of the same files that get written to regularly.

Another option, and I think this came up before we started talking about BTSync (though I’m not sure whether it would be possible in addition to BTSync), is to have the blogs.dir folder for a WordPress Multisite, which holds all the files uploaded to individual sites, sent to S3, Amazon’s file storage service.


You get the sense that part of what’s happening when you move an application like WordPress Multisite onto AWS, or some other cloud-based, virtualized server environment, is that each element is abstracted out to its basic functions. Core files that are read-only are separate from anything that is written to, whether that be themes, plugins, or uploads. Additionally, the database is also abstracted out, and you can run an EC2 instance on AWS with Docker containers each running MySQL (with SharDB or HyperDB to further break up the load) that could also replicate various writes and calls using BTSync? No single point of failure, and you greatly reduce the load on a WPMS. I’m completely out of my depth here, but if I accomplished anything it might be giving you an insight into my confusion, which is also my excitement about figuring out the possibilities.
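The database piece is the one part I can picture somewhat concretely. HyperDB, for instance, is a drop-in replacement for WordPress’s database class that can spread reads and writes across multiple servers via a db-config.php along these lines (the hostnames are made up, and this is just an illustration of the general idea, not the setup Tim sketched):

<?php
// db-config.php for HyperDB: a primary that takes writes and a replica that only serves reads
$wpdb->add_database(array(
    'host'     => 'db-primary.example.internal', // hypothetical primary
    'user'     => DB_USER,
    'password' => DB_PASSWORD,
    'name'     => DB_NAME,
    'write'    => 1,
    'read'     => 1,
));
$wpdb->add_database(array(
    'host'     => 'db-replica.example.internal', // hypothetical read replica
    'user'     => DB_USER,
    'password' => DB_PASSWORD,
    'name'     => DB_NAME,
    'write'    => 0,
    'read'     => 1,
));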

I have no idea if this makes sense, and I would really love any feedback from anyone who knows what they are talking about, because I’m admittedly writing this to try and understand it. Regardless, it was pretty awesome hearing Tim lay it out, because it certainly provides a pretty impressive solution to running large, resource-intensive WordPress Multisite instances.

Duke’s Website has Gone Docker

I was excited to see Tony Hirst retweet the news that Duke University’s website is being run in a Docker environment, and it could even be served through Amazon Web Services. Chris Collins, senior Linux admin at Duke, wrote “Using Docker and AWS to Survive an Outage” about the outage they had as a result of DDoS attacks on their main site back in January. I love the way he tells the story:

While folks were bouncing ideas around on how to bring the site up again while still struggling with the outage, I mentioned that I could pretty quickly migrate the site over to Amazon Web Services and run it in Docker containers there. The higher-ups gave me the go-ahead and a credit card (very important, heh) and told me to get it setup.  The idea was to have it there so we could fail over to the cloud if we were unable to resolve the outage in a reasonable time.

TL;DR – I did, it was easy, and we failed over all external traffic to the cloud. Details below.

He goes on to describe his process in some detail, and it struck me how quickly the shift in IT infrastructure is moving, and also made me wonder how many IT organizations in higher ed are truly rethinking their architecture along these lines. It’s one thing to push your services to a third-party vendor that hosts all your stuff; it’s altogether different to bring in a team that understands and is prepared to move a university’s infrastructure into a container-based model that can be hosted in the cloud. Not to mention what this might soon mean for personal options, and a robust menu of teaching and learning applications heretofore unimaginable. This would make the LAMP environment options Domain of One’s Own offers look like Chucky from Child’s Play.

I know Tim and I are looking forward to thinking about what such a container-based architecture might mean for an educational hosting environment that is simple, personalized, and expansive. Tim turned me on to Tutum recently, which starts to get at the idea of a personalized cloud across various providers—something Tim Klapdor gets at brilliantly:

MYOS is very much the model that Jon Udell laid out as “hosted life bits” – a number of interconnected services that provide specific functionality, access and affordances across a variety of contexts. Each fits together in a way that allows data to be controlled, managed, connected, shared, published and syndicated. The idea isn’t new, Jon wrote about life bits in 2007, but I think the technology has finally caught up to the idea and it’s now possible to make this a reality in a very practical way.

His post on the topic deserves a close reading, and it’s the best conceptual mapping of what we might build that I have read yet. I wanna help realize this vision, and I guess I am writing about Duke University’s move to Docker because it suggests this is the route higher ed IT will be moving towards anyway (sooner or later—which could be a long later for some). Seems we might have an opportunity to inform what it might look like for teaching and learning from the ground floor. It’s not a given it will be better; that will depend upon us imagining what exactly a teaching and learning infrastructure might look like. Tim Klapdor has provided one of the most compelling visions to date, building on Jon Udell’s thinking, but that’s just the beginning.

Reclaim and the Translation of EdTech


Jim Groom on Reclaim and the translation of edtech from UCalgary Taylor Institute on Vimeo.

D’Arcy Norman posted this video, which is part of a larger documentary project he started while at UMW for the Reclaim Your Domain Hackathon. I don’t get to see D’Arcy nearly enough, and unfortunately I was running around like the proverbial chicken all weekend trying to organize the hackathon, which meant any focused time together was limited. I consider D’Arcy one of my oldest and dearest “edtech” friends, and sitting down with him for 15 minutes allowed me to try and articulate what’s important about the Reclaim movement for me.

This is my fourth Reclaim Your Domain event, the others being the MIT Hackathon in March 2013, the Atlanta Domain Incubator in April 2014, and the LA Reclaim Hackathon in July 2014. These events have been by far the best professional development I’ve had over the last two years, and much of that is owed to the vision Audrey Watters and Kin Lane turned me on to nearly two years ago. I’ve effectively been spending my copious spare time trying to wrap my head around things like Amazon Web Services, GitHub, and APIs. And as I suggest in the video, these are the platforms and technologies I’ve been trying to understand so I can translate how they reflect some of the more seismic shifts in how the web has worked over the last few years. Kin and Audrey are a brilliant one-two punch in this regard, framing the technical, social, political, economic, and more. Add to all this the IndieWeb movement, and Reclaim really feels vibrant and full of possibility. So, I want to thank D’Arcy, Andy Rush, David Kernohan, and Grant Potter for taking the time last weekend to try and capture some of it. Big Fan!