
My Blog Setup

by Mark Wilkinson · March 07, 2018

When you get a group of bloggers together, it’s inevitable that they’ll start talking about the software they use to run their blogs. Recent conversations about this very topic inspired me to write up the various components that go into my blog.

In this post I am going to discuss the platform I use, my workflow when publishing new posts, and how I get TLS (https://), traffic analytics, a custom domain, search, and hosting on one of the world’s largest providers, all for under $2 a month. I’ll go into detail on the more interesting bits and link to other resources for the more standard setup pieces.

Price List

Here is the breakdown of all the tools I use and what the costs are (in USD):

Tool              Purpose                   Price
Google Domains    Domain Registrar          $12.00/yr ($1.00/month)
Route 53          DNS                       $0.50/month
S3                Blog Hosting              $0.14/month
Cloudfront        CDN/TLS Proxy             $0.13/month
Mailgun           Custom Domain Email       FREE
Git Server        Source Control            FREE
Google Analytics  Traffic Analysis          FREE
Jekyll            Static Site Generator     FREE
Lunr.js           In-Site Search            FREE
Let’s Encrypt     TLS Certificate           FREE
Jenkins           Deploying static content  FREE
TOTAL                                       $1.77/month
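As a quick check on the total, here is a small Ruby sketch that adds up the paid items from the table. Prices are kept in integer cents to avoid floating-point rounding, and the yearly domain fee is spread evenly over twelve months; the MONTHLY_CENTS name is just for illustration.

```ruby
# Paid items from the price list, in US cents per month (free items omitted).
MONTHLY_CENTS = {
  'Google Domains' => 1200 / 12, # $12.00/yr spread over 12 months
  'Route 53' => 50,
  'S3' => 14,
  'Cloudfront' => 13
}.freeze

total = MONTHLY_CENTS.values.sum
puts format('TOTAL $%.2f/month', total / 100.0) # → TOTAL $1.77/month
```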

If you can’t tell from the list, I am a bit of a nerd, and I love trying new technologies. Sure, it would be easier to publish on something like WordPress, but I love the amount of control I have over every part of my blog.

GitHub Pages and Beyond

My blog started its life on GitHub Pages. GitHub Pages is a neat tool that serves a website from your GitHub repository using a Ruby program called Jekyll. Jekyll is a static site generator that uses a templating engine called Liquid. Static site generators programmatically generate a website consisting of static HTML documents that can then be uploaded to practically any web host. This is the opposite of a technology like PHP, which generates each page on the server at request time, often based on user input.
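To make the idea concrete, here is a minimal static site generator sketch in Ruby. It uses the standard library’s ERB instead of Liquid, and the PAGE template and build_site helper are hypothetical, not part of Jekyll. The point is that all rendering happens once, at build time, and the output is plain HTML that any web host can serve.

```ruby
require 'erb'

# Hypothetical page layout; Jekyll uses Liquid templates, ERB stands in here.
PAGE = ERB.new(<<~HTML)
  <html>
  <head><title><%= ERB::Util.html_escape(title) %></title></head>
  <body><%= body %></body>
  </html>
HTML

# Render every post once, at build time, into a filename => HTML map.
# `body` is assumed to already be HTML (e.g. converted from Markdown).
def build_site(posts)
  posts.to_h do |slug, post|
    ["#{slug}.html", PAGE.result_with_hash(title: post[:title], body: post[:body])]
  end
end

site = build_site({ 'hello-world' => { title: 'Hello & Welcome', body: '<p>First post!</p>' } })
site.each { |path, html| puts "#{path}: #{html.bytesize} bytes" }
```

A real generator would then write each entry to disk (for example with File.write) under an output directory such as Jekyll’s _site, ready to upload.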

Static site generators are nice because your blog posts are highly portable. Posts are typically written in a plain-text markup language like Markdown so you can use any text editor to write posts. Here is a quick example of Markdown:

# This is a heading
Here is some intro text

## Here is a sub-heading

* Bulleted list item 1
* Item 2
* Item 3

When I decided to move my blog from GitHub Pages to my own servers, I simply installed Jekyll on my local computer, generated my site, and uploaded the freshly generated website to my own server. Later, when I moved to my current host (an S3 bucket in AWS), all I had to do was upload the same generated files.

AWS S3

S3 is a cheap cloud storage service provided by Amazon as part of Amazon Web Services. S3 storage is organized into “buckets” that you can store almost anything in. In my case I have a publicly readable bucket configured for static website hosting. AWS has some great documentation on setting this up; it really only takes a few clicks to get a static hosting bucket configured.

S3 pricing is based on the amount of data you store and how many GET/PUT requests are made. That might sound like it could get expensive, but here is the pricing at the time I wrote this post:

Storage (first 50 TB/month)    $0.023 per GB
PUT, LIST requests             $0.005 per 1,000 requests
GET requests                   $0.0004 per 1,000 requests
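To see why this stays cheap, here is a rough monthly estimate in Ruby. The traffic numbers are hypothetical, and the rates are S3 Standard’s US-East prices at the time of writing ($0.023 per GB-month of storage, $0.005 per 1,000 PUT/LIST requests, $0.0004 per 1,000 GET requests). The sketch ignores data transfer charges and the free tier, and the rates are passed in so they can be updated when pricing changes.

```ruby
# Per-unit rates in USD (S3 Standard, US-East, first 50 TB tier).
RATES = {
  storage_per_gb: 0.023,
  put_list_per_1k: 0.005,
  get_per_1k: 0.0004
}.freeze

# Rough monthly S3 bill: storage plus request charges (no transfer, no free tier).
def s3_monthly_cost(storage_gb:, put_list_requests:, get_requests:, rates: RATES)
  storage_gb * rates[:storage_per_gb] +
    put_list_requests / 1000.0 * rates[:put_list_per_1k] +
    get_requests / 1000.0 * rates[:get_per_1k]
end

# Hypothetical month: 1 GB stored, 2,000 PUT/LIST requests, 100,000 GETs.
cost = s3_monthly_cost(storage_gb: 1, put_list_requests: 2_000, get_requests: 100_000)
puts format('$%.3f', cost) # → $0.073
```

Because Cloudfront sits in front of the bucket, most page views are answered from edge caches and never become S3 GET requests at all.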

The great thing about hosting from S3 is the reliability. For pennies a month I get to host my small blog on servers managed by some of the most talented engineers in the world, on highly redundant systems (your data is copied to at least three physically separate facilities within a region).

Cloudfront CDN

Cloudfront is a CDN, or Content Distribution Network, run by Amazon. Cloudfront caches the static elements of websites and delivers them to a user’s web browser from one of hundreds of edge locations around the globe. My S3 bucket is located in the US-East region (northern Virginia), but with Cloudfront my site is served from the edge location closest to each visitor. This keeps the site loading quickly no matter where you are accessing it from.
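The caching behavior described above can be modeled with a small Ruby sketch of a single edge location: it answers from its cache while an entry is fresh and goes back to the origin (the S3 bucket) once the entry’s time-to-live has passed. EdgeCache and its parameters are hypothetical; real Cloudfront caching is driven by Cache-Control headers and distribution settings, with a default TTL of 24 hours, which the example uses.

```ruby
# Hypothetical model of a single CDN edge location.
class EdgeCache
  # origin_fetch is called with a path on a cache miss (think: a GET to the S3 bucket).
  def initialize(ttl_seconds:, &origin_fetch)
    @ttl = ttl_seconds
    @origin_fetch = origin_fetch
    @entries = {} # path => [body, time fetched]
  end

  # Returns [body, :hit] while the cached copy is fresh, otherwise refetches
  # from the origin and returns [body, :miss]. `now` is injectable for testing.
  def get(path, now: Time.now)
    body, fetched_at = @entries[path]
    return [body, :hit] if fetched_at && now - fetched_at < @ttl

    body = @origin_fetch.call(path)
    @entries[path] = [body, now]
    [body, :miss]
  end
end

calls = 0
edge = EdgeCache.new(ttl_seconds: 86_400) { |path| calls += 1; "<html>#{path}</html>" }
t0 = Time.at(0)
edge.get('/index.html', now: t0)          # miss: fetched from the origin
edge.get('/index.html', now: t0 + 3_600)  # hit: one hour later
edge.get('/index.html', now: t0 + 90_000) # miss: older than 24 hours
puts calls # → 2
```

One practical consequence: after a deploy, edge locations may keep serving the old copy of a page until its TTL expires, unless you create a Cloudfront invalidation.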

Cloudfront is also what allows me to use HTTPS. The certificate I get from Let’s Encrypt is uploaded to Cloudfront and Cloudfront then uses that certificate to provide HTTPS connections to any clients trying to access my blog.

The process of setting up the S3 bucket, getting a certificate from Let’s Encrypt, and setting up a Cloudfront distribution (the name for a Cloudfront instance) is actually pretty simple. Instead of reinventing the wheel, though, I am just going to point you to this great tutorial: Let’s Encrypt a Static Site on Amazon S3.

TLS, for Great Privacy

I am a huge fan of the current trend on the internet of sites adopting TLS (https://) over plain http. I am not going to go into my reasons because I think another site captures it quite well: https://doesmysiteneedhttps.com/.

When I first looked into getting a TLS certificate, I went with a company called StartCom. Their certificates were free, but the process wasn’t very simple and couldn’t be automated. I used StartCom for about a year before Let’s Encrypt started offering certificates. Let’s Encrypt is an amazing free service that issues TLS certificates through an API (the ACME protocol). With a simple shell script and a scheduled task, you can automate creating and renewing your certificate. I am currently using a pre-built Docker container for this; check it out here.
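The scheduled renewal task boils down to one decision: is the current certificate close enough to expiry to renew? Here is a minimal Ruby sketch of that check. Let’s Encrypt certificates are valid for 90 days; the 30-day renewal window is an assumption (it matches certbot’s default), and the helper names are hypothetical rather than taken from the Docker container mentioned above.

```ruby
require 'openssl'

CERT_LIFETIME_DAYS = 90 # Let's Encrypt certificate lifetime
RENEW_WITHIN_DAYS = 30  # assumed renewal window (certbot's default)
SECONDS_PER_DAY = 24 * 60 * 60

# Expiry time of a PEM-encoded certificate (e.g. the one uploaded to Cloudfront).
def cert_expiry(pem)
  OpenSSL::X509::Certificate.new(pem).not_after
end

# True when fewer than `window_days` days remain before `not_after`.
def renewal_due?(not_after, now: Time.now, window_days: RENEW_WITHIN_DAYS)
  not_after - now < window_days * SECONDS_PER_DAY
end

issued = Time.utc(2018, 3, 7)
expires = issued + CERT_LIFETIME_DAYS * SECONDS_PER_DAY
puts renewal_due?(expires, now: issued + 10 * SECONDS_PER_DAY) # → false (80 days left)
puts renewal_due?(expires, now: issued + 61 * SECONDS_PER_DAY) # → true (29 days left)
```

Run daily from a scheduler, the check only hands off to the renewal tool when it returns true.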

The combination of Let’s Encrypt and Docker has really simplified the process and makes the choice to go with HTTPS a no-brainer. With a single command I can generate a certificate and associate it with my Cloudfront distribution.

lunr.js

Search was a fun one to implement. I had simple yet strict requirements: it must be fast, and it must run entirely on the client side. After some digging I found the lunr.js project. Lunr is a JavaScript full-text search engine. It supports all kinds of great features like field boosting and fuzzy (edit-distance) matching. The best thing about lunr is that it lives in a single JavaScript file and runs completely on the client.
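To illustrate what a client-side full-text engine does with an index, here is a toy inverted index in Ruby with per-field boosting. It is only a conceptual sketch: TinyIndex is hypothetical, lunr’s own API is JavaScript, and its scoring is far more sophisticated than adding up boosted term counts.

```ruby
# Toy inverted index: term => { doc id => accumulated score }.
class TinyIndex
  def initialize(boosts)
    @boosts = boosts # field name => boost, e.g. { title: 10, content: 1 }
    @postings = Hash.new { |hash, term| hash[term] = Hash.new(0) }
  end

  # Tokenize each boosted field and credit the document once per occurrence.
  def add(id, doc)
    @boosts.each do |field, boost|
      tokenize(doc.fetch(field, '')).each { |term| @postings[term][id] += boost }
    end
  end

  # Sum the scores of all query terms; best match first, ties broken by id.
  def search(query)
    scores = Hash.new(0)
    tokenize(query).each do |term|
      @postings.fetch(term, {}).each { |id, score| scores[id] += score }
    end
    scores.sort_by { |id, score| [-score, id] }.map(&:first)
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

idx = TinyIndex.new({ title: 10, content: 1 })
idx.add(1, { title: 'Hosting on S3', content: 'Jekyll builds the site' })
idx.add(2, { title: 'Jekyll tips', content: 'More about Jekyll' })
p idx.search('jekyll') # → [2, 1]
```

The title match outweighs the body match, which is the effect boosting is meant to have.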

Search Index

The easiest thing to mess up when using lunr is the indexing process. Indexing your blog means building an index that includes every post on your blog. If you aren’t careful, this index can get quite large and take a while to load in the browser. My approach involved writing a small custom Jekyll plugin in Ruby and writing only a minimal amount of information to the index.

The plugin is straightforward:

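# Custom Liquid filters used when building the search index.
module Jekyll
    module LunrStrip
        # Replace every non-alphanumeric character with a space, then lowercase.
        def lunr_strip(text)
            text.gsub(/[^0-9a-z]/i, ' ').downcase
        end

        # Blank out any run of 1..min_len characters between word boundaries,
        # which removes words of min_len characters or fewer.
        def eat_small(text, min_len)
            text.gsub(/\b.{1,#{min_len}}\b/i, ' ')
        end
    end
end

# Make the filters available to Liquid templates.
Liquid::Template.register_filter(Jekyll::LunrStrip)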

I saved this file in the _plugins directory under the root of my blog; Jekyll automatically looks there for plugins when building your site. With these filters in place, the search index is regenerated every time I build my blog and stored in /js/search.js. The heavy lifting is done by the following Liquid template snippet:

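---
---
// Assumed setup (the top of search.js is not shown): empty front matter so
// Jekyll runs Liquid on this file, then a lunr builder declaring each field.
var idx = lunr(function () {
  this.ref('id');
  this.field('title');
  this.field('category');
  this.field('content');
  this.field('summary');
{% assign count = 0 %}{% for post in site.posts %}
  this.add({
    title: {{post.title | jsonify}},
    category: {{post.category | jsonify}},
    content: {{post.content | lunr_strip | eat_small: 3 | split: " " | sort | uniq | join: " " | jsonify}},
    summary: {{post.summary | strip_html | jsonify}},
    id: {{count}}
  });{% assign count = count | plus: 1 %}{% endfor %}
});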

The magic happens in the content line. There we take the entire contents of the post and pass it through the custom filters from the plugin: lunr_strip replaces every non-alphanumeric character with a space and lowercases the text, and eat_small: 3 removes words of three characters or fewer. After that we split the remaining text into an array, sort the array, and remove duplicate entries. Finally we join the array back into a single string, JSON-encode it with jsonify, and add it to the search index in search.js (starting around line 10).
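To see what ends up in the index, here is the same filter chain reproduced in plain Ruby. lunr_strip and eat_small are copied from the plugin; index_terms is a hypothetical helper standing in for the rest of the Liquid pipeline (split, sort, uniq, join).

```ruby
# The two filters from the Jekyll plugin, as plain methods.
def lunr_strip(text)
  text.gsub(/[^0-9a-z]/i, ' ').downcase
end

def eat_small(text, min_len)
  text.gsub(/\b.{1,#{min_len}}\b/i, ' ')
end

# Hypothetical helper mirroring the Liquid chain:
#   lunr_strip | eat_small: 3 | split: " " | sort | uniq | join: " "
def index_terms(content, min_len = 3)
  eat_small(lunr_strip(content), min_len).split(' ').sort.uniq.join(' ')
end

puts index_terms('Jekyll builds the site; Jekyll is fast!')
# → builds fast jekyll site
```

Both “the” and “is” disappear, and the repeated “jekyll” collapses into a single entry, which is where the size savings come from.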

This whole process leaves us with a list of unique words for each post, which makes the file much smaller while still allowing accurate searches. As of this writing, the search index for my entire blog weighs in at less than 60 KB. While this search works great, I would love to remove the need for the client to download the search index. In the future I may try using an AWS Lambda function that performs the search on the user’s behalf.

Workflow

My current workflow consists of whatever text editor I feel like using, Git, and Jenkins. Jenkins is a very recent addition to my workflow. Previously I would write posts, build my blog, and then run a script to upload it to S3. My biggest challenge with that setup was that I had to be on my laptop to deploy my blog. If I wanted to make a quick correction from my phone (I currently use the Tig Git client on my iPhone to write posts), I would have to check in my changes and then hop on my laptop to deploy them. Then I discovered Jenkins.

Jenkins

I had heard about Jenkins before, but I had never really seen it in action until recently, when I attended Open Source 101, a local open source software conference. There I attended a Jenkins session by Brent Laster, which was my first exposure to Jenkins pipeline jobs.

With Jenkins pipeline jobs you define the steps of your build process in a Groovy-based domain-specific language. This code, called a Jenkinsfile, is checked in right alongside your blog, and after that it is as simple as pointing Jenkins at your blog repo. Here is the Jenkinsfile that defines my build process:

pipeline {
    agent any

    stages {
        stage('Build') {
            steps {
                echo 'Building..'
                sh('PATH="/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin"; bundle install; bundle exec jekyll build --incremental')
            }
        }
        stage('Prepare for Deployment') {
            steps {
                echo 'Minifying Style Sheets...'
                sh('_deploy/minify.sh _site/public/css/print.css')
                sh('_deploy/minify.sh _site/public/css/style.css')
                sh('_deploy/minify.sh _site/public/css/syntax.css')
            }
        }
        stage('Deploy') {
            steps {
                echo 'Deploying....'
                withCredentials([[$class: 'UsernamePasswordMultiBinding', credentialsId: 'blog-s3',usernameVariable: 'USERNAME', passwordVariable: 'PASSWORD']]) {
                    sh '_deploy/deploy_to_s3.sh $USERNAME $PASSWORD'
                }
            }
        }
    }
}

This code is fairly straightforward. First it makes sure all of Jekyll’s dependencies are installed and builds the site. Then it runs my CSS assets through a custom minify script I wrote, which makes the files smaller and so reduces load time and bandwidth usage. Finally it retrieves my AWS credentials and runs a custom deploy script that copies all of my blog files to my S3 bucket.
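The minify script itself isn’t shown, so as an illustration only, here is a naive CSS minifier in Ruby of the kind such a script might implement: it strips comments and unnecessary whitespace. minify_css is hypothetical and deliberately simple; it would mangle CSS whose strings or url() values contain braces, semicolons, commas, or comment markers, so a real project should use a proper CSS minifier.

```ruby
# Naive CSS minifier (hypothetical stand-in for a minify script).
# Not safe for strings or url() values containing braces, semicolons,
# commas, or comment markers.
def minify_css(css)
  out = css.gsub(%r{/\*.*?\*/}m, '')      # drop /* ... */ comments
  out = out.gsub(/\s+/, ' ')              # collapse whitespace runs
  out = out.gsub(/\s*([{};,>])\s*/, '\1') # trim spaces around punctuation
  out.gsub(';}', '}').strip               # drop each block's final semicolon
end

css = <<~CSS
  /* headings */
  h1 , h2 {
    color: red;
    margin: 0 ;
  }
CSS

puts minify_css(css) # → h1,h2{color: red;margin: 0}
```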

You can read more about writing pipeline jobs via a Jenkinsfile here.

Git

There isn’t much special about my Git setup except for one thing. Jenkins can trigger a build via a simple web request to the Jenkins server, so I added the following to the post-update script in the hooks directory on my Git server. The script makes that web request whenever code is pushed to the master branch of my blog project:

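#!/bin/bash
# post-update receives the updated ref names (e.g. refs/heads/master) as
# arguments; only the first one is checked here.
branch=$(git rev-parse --symbolic --abbrev-ref "$1")

echo "Commit received for branch: ${branch}"
if [[ "${branch}" == "master" ]]; then
	echo "Calling build API..."
	# Trigger the Jenkins "Blog" job remotely; -k skips TLS certificate verification.
	curl -k "https://<api-user>:<api-user-key>@<jenkins-server-address>/job/Blog/build?token=<auth-token>"
fi
# Refresh info/refs for dumb HTTP clients, as in Git's sample post-update hook.
exec git update-server-info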
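For comparison, here is the same trigger logic as a Ruby sketch. It checks every pushed ref rather than only the first, builds the remote-trigger URL for the job, and sends the request with HTTP basic auth. The server address, user, key, and token values are placeholders and the helper names are hypothetical; note that Net::HTTP verifies the server’s TLS certificate by default, unlike curl -k.

```ruby
require 'net/http'
require 'uri'

# post-update hooks receive the updated ref names as arguments.
def master_pushed?(refs)
  refs.include?('refs/heads/master')
end

# Jenkins remote-trigger URL: <jenkins>/job/<job>/build?token=<token>
def build_trigger_uri(jenkins_url, job, token)
  uri = URI("#{jenkins_url.chomp('/')}/job/#{job}/build")
  uri.query = URI.encode_www_form(token: token)
  uri
end

# Sends the trigger request (GET, as the curl call in the hook does).
def trigger_build(uri, user, api_key)
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, api_key)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

# Placeholder values; only runs when invoked as a hook with a master push.
if __FILE__ == $PROGRAM_NAME && master_pushed?(ARGV)
  uri = build_trigger_uri('https://jenkins.example.com', 'Blog', 'AUTH_TOKEN')
  puts trigger_build(uri, 'API_USER', 'API_USER_KEY').code
end
```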

Misc

There are a few things I didn’t really cover in detail, but wanted to call out here:

Final Thoughts

This whole mess of components looks really complex when I type it all up, but it has been an ongoing project that grows as I find new technologies and new challenges. Of all the projects I have worked on, I think setting up my blog best illustrates how I learn about technology. Before this setup I had redundant web servers behind a load balancer running HAProxy, mostly because I wanted to learn how to use HAProxy.

The number of free and cheap technologies out there is truly astounding, and none of this would be possible without open source software and the people curious and driven enough to create these wonderful things. While I know this setup isn’t for everyone, I hope someone finds some of it useful, or that it inspires someone to try hosting their own blog.

