Years ago I built my own CI-CD 🔗 solution for a moderately complex static website.
The solution works, and I learned a great deal from building it, but it came to feel like my own private Rube Goldberg machine 🔗 that nobody but its creator could appreciate. Over the years I’ve grown concerned about its longevity, but before I give it up for a more cloud-native solution I felt it deserved a write-up.
Solution Design
Here’s what the solution looks like from the user’s perspective:
[Sequence diagram: the author pushes to Gitolite, the hook script submits the build job to the Flask+Celery service over HTTP and polls “Done yet?” until it eventually gets a yes, then the reply travels back through Gitolite to the author.]
A contributor submits work in the form of a Git commit. Since the Git server runs the hook script synchronously, the author sees the results inside the “git push” itself:
- Author pushes a Git commit.
- A configurable hook gets triggered (see blue-green.sh below).
- The hook script submits a job via HTTP request to a Flask service running Celery.
- … then polls the Flask+Celery server, again via HTTP, to check for completion.
- Eventually the Celery task status is updated (see worker flow below).
- The reply gets back to Gitolite …
- … and eventually to the user.
For quick builds this was great, but the “in-line” approach became a problem once the site got busy and build times grew. While it appeared synchronous to the user, the job submission was actually queued through Celery and Redis, and the Git hook then polled for the results.
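The real Flask+Celery service is linked further down, but a minimal sketch of the HTTP contract the hook relies on could look something like the following. The /run_build path and the base64-encoded “update” field come straight from the hook script shown later; the /status/&lt;task_id&gt; route, the Redis URLs, and the module layout are illustrative assumptions rather than the actual code.

# app.py -- illustrative sketch only, not the actual server code
import base64
import json

from celery import Celery
from flask import Flask, request, url_for

app = Flask(__name__)
celery = Celery(__name__, broker="redis://localhost:6379/0",
                backend="redis://localhost:6379/0")

@celery.task
def build_task(payload):
    # placeholder: the real task runs the build found in the repository
    # (sketched after the worker-flow list below) and reports which side
    # of the blue-green pair should go live
    return "Enable: green"

@app.route("/run_build", methods=["POST"])
def run_build():
    # the hook posts "update=<base64-encoded JSON>"
    payload = json.loads(base64.b64decode(request.form["update"]))
    task = build_task.delay(payload)
    # reply with the URL the hook should poll for completion
    return url_for("check_status", task_id=task.id, _external=True)

@app.route("/status/<task_id>")
def check_status(task_id):
    result = build_task.AsyncResult(task_id)
    # "PENDING" until the worker finishes, then the task's return value
    return result.result if result.ready() else result.state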
[Sequence diagram: the Celery worker runs the build it finds in the repo and writes status updates to Redis, while the hook script keeps polling the Flask+Celery service, which reads the job status back out of Redis.]
This is what was happening behind the scenes:
- One or more worker processes subscribe to receive task events.
- A task arrives.
- Redis notifies subscribers.
- The Celery worker runs the build it finds inside the repository (e.g. a Bash script that calls webpack or Hugo).
- Status is returned.
- The Git hook script that kicked it off has been polling for results.
- Results trickle in as available.
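Fleshing out the placeholder task from the earlier sketch, the worker side might look roughly like this. The /export/repositories base path echoes the hook script’s comments; the scratch directory, the build.sh entry point, and the hard-coded result are assumptions, since the real task decides at run time which side to enable.

# worker.py -- same Celery app as the sketch above; paths and the
# "build.sh" entry point are assumptions about the repository layout
import subprocess
import tempfile

from celery import Celery

celery = Celery(__name__, broker="redis://localhost:6379/0",
                backend="redis://localhost:6379/0")

@celery.task
def build_task(payload):
    repo = "/export/repositories/" + payload["repository"]
    workdir = tempfile.mkdtemp(prefix="build-")
    # check out the pushed commit into a scratch working tree
    subprocess.run(["git", "clone", repo, workdir], check=True)
    subprocess.run(["git", "checkout", payload["commit"]], cwd=workdir, check=True)
    # run whatever build the repository ships (e.g. a script calling webpack or Hugo)
    subprocess.run(["bash", "./build.sh"], cwd=workdir, check=True)
    # the hook script greps this return value for the "Enable: blue|green" signal
    return "Enable: green"

Running “celery -A worker worker” alongside the Flask app, with Redis in the middle, is enough to reproduce the hand-off described above.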
Example Scripts
I was able to host the entire process using Docker inside a small VM. Using Gitolite 🔗 made it possible to create new repositories based on permissions, and those repos could have default and/or custom git hooks 🔗 .
# Sample Gitolite website.conf
@webmaster = wm-key

repo site-sample site-search site-home
    RW+                      = alice bob charlize don
    -   master               = @webmaster
    -   refs/tags/v[0-9]     = @webmaster
    RW+ contrib/USER/        = @webmaster
    option hook.post-receive = blue-green.sh   # <-- this is the good bit
The hook script could be anything, but in my case I was hosting multiple processes in Rancher 1.2 🔗 , back when it was using the Cattle 🔗 orchestrator. An example script for doing blue-green deployment looked something like this:
#!/usr/bin/env bash
# blue-green.sh
builder_host="http://localhost:5050"
outputurl="${builder_host}/run_build"
sitename="my-site"
VAULT_TOKEN=$(cat /etc/vault_token | head -n 1 | awk '{print $1}')
# Email Credentials are in Hashicorp Vault. Always be careful with credentials!
GMAIL_PASSWD=$(curl -s -X GET \
  -H "X-Vault-Token: ${VAULT_TOKEN}" \
  http://127.0.0.1:8200/v1/secret/gmail_password_rjpw \
  | jq -r .data.value)
# associative array to toggle between blue and green configs
declare -A OTHER_CONFIG
OTHER_CONFIG=([green]=blue [blue]=green)
function update_search_index () {
  # gets the correct Rancher PID for a site and runs a custom data upload script on it
  SEARCH_INSTANCE_NAME=$(rancher ps | grep ${sitename}/searchapi-${LIVE_SIDE} | awk '{print $1}')
  RANCHER_CMD="rancher exec -it ${SEARCH_INSTANCE_NAME} node bulk_load.js"
  RELOAD_RESULT=$(bash -c "${RANCHER_CMD}")
  echo -e "Command: ${RANCHER_CMD}" >> ${TEMPFILE}
}
function toggle_live_config () {
  GOT_RANCHER=$(which rancher)
  echo -e "Rancher path: ${GOT_RANCHER}" >> ${TEMPFILE}
  # get name of the container where ${sitename} web is running
  LIVE_INSTANCE_NAME=$(rancher ps | grep ${sitename}/web-live | awk '{print $1}')
  OFFLINE_INSTANCE_NAME=$(rancher ps | grep ${sitename}/web-offline | awk '{print $1}')
  # tell nginx to reload config (it got updated by the builder)
  if [[ ${LIVE_INSTANCE_NAME} ]]; then
    rancher exec -it ${LIVE_INSTANCE_NAME} nginx -s reload
    rancher exec -it ${OFFLINE_INSTANCE_NAME} nginx -s reload
    echo -e "Live nginx instance ${LIVE_INSTANCE_NAME}" >> ${TEMPFILE}
  else
    echo -e "Unable to find live instance to reload using Rancher PS" >> ${TEMPFILE}
  fi
}
# get inputs from the post-receive hook from Git
while read oldrev newrev ref
do
  # Running on the current host, the repository path (PWD) at runtime is
  # as shown (/export/repositories). We want just the project path
  PROJECT_PATH=${PWD/#\/export\/repositories/\/""}
  # trim inexplicably remaining double slash
  PROJECT_PATH=${PROJECT_PATH/#\/\//""}
  OUTPUT="{\
    \"repository\": \"${PROJECT_PATH}\", \
    \"commit\": \"${newrev}\", \
    \"ref\": \"${ref}\", \
    \"passwd\": \"${GMAIL_PASSWD}\"
  }"
  # serialize as base64 to eliminate special HTTP characters
  BASE64_OUTPUT=$(echo ${OUTPUT} | base64)
  TEMPFILE=$(tempfile)
  echo -e ${OUTPUT} >> ${TEMPFILE}
  # get the URL to check
  CHECK_URL=$(curl -s -XPOST -d "update=${BASE64_OUTPUT}" ${outputurl})
  CHECK_RESULT=$(curl -s -XGET "${CHECK_URL}")
  # loop until you stop getting PENDING
  while [ "${CHECK_RESULT}" == "PENDING" ]; do
    sleep 2
    CHECK_RESULT=$(curl -s -XGET "${CHECK_URL}")
  done
  # get the desired live side from the builder messages
  LIVE_SIDE_REGEX='Enable\: ([[:alpha:]]*)'
  [[ ${CHECK_RESULT} =~ ${LIVE_SIDE_REGEX} ]] && LIVE_SIDE=${BASH_REMATCH[1]}
  # if we get an "Enable: blue|green" signal then we are toggling live and idle,
  # so update the search index and reload the nginx configs
  if [ ! -z "${LIVE_SIDE}" ]; then
    IDLE_SIDE=${OTHER_CONFIG[$LIVE_SIDE]}
    update_search_index
    toggle_live_config
    echo -e "Enabling ${LIVE_SIDE}" >> ${TEMPFILE}
  else
    echo -e "Hook script ended without Enable: blue|green message!" >> ${TEMPFILE}
  fi
done
Flask and Celery
If you are curious about the code for the Flask+Celery server, it has just been posted on GitHub 🔗 . Caveat emptor: don’t be impressed just because it’s free. It works, but as this post has alluded to, there are better solutions today.
Pros and Cons
The most important feature of this build was that it was a completely private solution, yet it still left plenty of room to evolve and had decent prospects for scaling out should that ever have been needed.
The whole assembly was a joy to build thanks to the flexibility offered by Gitolite and the abundant advice available on building a Flask+Celery pipeline. The service itself is stable and fast; the only thing making it seem slow was the synchronous nature of the scripts that were deployed. With different scripts and a more loosely-coupled response channel (e.g. via Slack), that issue would go away.
The downside of this build is that it requires a host, and that host needs configuration. That makes it 1) a cost and 2) a single point of failure. Point 2 can be mitigated with a configuration tool like Ansible, and I may do that if I’m bored, but these days I am leaning toward cloud-native solutions like GitHub Actions 🔗 . If you are committed to a self-hosted solution like this one, I’ve had great success using Gogs (or Gitea) and Drone. That combination is much more like GitHub Actions, and therefore good preparation for thinking in cloud-native terms. If you need them on a single VM (which isn’t very cloud-like, is it?), you can use something like Traefik or Kong to route a single HTTP port to these multiple back-ends.
Upcoming Post
I mention above that the perception of slowness that came from using synchronous operations could easily be corrected by making the trigger more of a “fire and forget” event, and then following up with a response channel via Slack or some other tool. This idea and others will be discussed in Part II.
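To make that a little more concrete, the hook (or the build task itself) would simply post the outcome to a chat channel instead of making “git push” wait around for it. A rough sketch in Python, assuming a Slack incoming webhook whose URL lives alongside the other secrets (the environment variable and the commented call site are hypothetical):

# notify.py -- illustrative only; SLACK_WEBHOOK_URL is a hypothetical secret
import json
import os
import urllib.request

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]

def notify(message: str) -> None:
    # Slack incoming webhooks accept a JSON body with a "text" field
    body = json.dumps({"text": message}).encode()
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL, data=body,
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

# e.g. from the end of the build task, instead of returning a value to be polled for:
# notify(f"{payload['repository']} built from {payload['commit'][:8]} -- Enable: green")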
Banner photo by Kumiko SHIMIZU on Unsplash