GoAccess on Dreamhost

In the most recent reboot of the site, I ripped out all JavaScript, including Google Analytics. To be honest, it wasn’t a big loss since I can’t even remember when I last looked at the Analytics data.

But it’s good to have some understanding of what’s happening, if only to detect and weed out badly behaved crawler bots.

As I have occasionally mentioned here, this site runs on a shared host provided by Dreamhost. I know that sounds awfully outmoded in this day and age of containerizing everything! But I’ve had antrix.net on shared hosting for more than fifteen years now and if it ain’t broke …

Anyway, Dreamhost provides website statistics out of the box, with reports generated from server-side access logs. It works, but it is based on Analog which, true to its origins, looks like something from the 90s.

Could I replace Analog stats with something a bit more modern? Could I do it while ignoring the irony that I’ll be fixing something that isn’t broken?

GoAccess appears to be the new kid on the block of open source web log analyzers. It seemed simple enough to configure and use that I decided to give it a Go.

The rest of this post describes how I set it up on Dreamhost. If you want to jump ahead, here’s the final result: stats.antrix.net

The first step was to compile and install goaccess:

$ mkdir ~/goaccess
$ cd ~/goaccess/
$ wget https://tar.goaccess.io/goaccess-1.4.5.tar.gz
$ tar xvzf goaccess-1.4.5.tar.gz
$ cd goaccess-1.4.5
$ ./configure --prefix=$HOME/goaccess/ --enable-utf8
$ make
$ make install
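
Before moving on, it is worth checking that the binary actually landed under the prefix. The following is only a sketch: the check_goaccess helper name is mine, not part of GoAccess, and the path assumes the --prefix passed to ./configure above.

```shell
#!/bin/bash
# Hypothetical helper: confirm that an executable exists at the given path.
check_goaccess() {
    if [ -x "$1" ]; then
        echo "found: $1"
        return 0
    fi
    echo "missing or not executable: $1" >&2
    return 1
}

# Path assumes the --prefix used in the ./configure step.
if check_goaccess "${HOME}/goaccess/bin/goaccess"; then
    # Print the installed version as a quick smoke test.
    "${HOME}/goaccess/bin/goaccess" --version
fi
```

If the helper reports the binary as missing, the make install step probably wrote somewhere other than $HOME/goaccess/bin.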

Next, I created the stats.antrix.net subdomain using Dreamhost’s domain management UI, making sure to use $HOME/stats.antrix.net/ as the web server root.

After that, I created a simple script that calls goaccess to parse the most recent access log and generate an HTML report in the web server root directory.

$ cat $HOME/goaccess/gen-access-report.sh
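#!/bin/bash

# Make sure the directory for the persisted database exists.
mkdir -p "${HOME}/goaccess/data"

# Parse the most recent access log and regenerate the report. All output goes
# to cronjob.log, so a successful run prints nothing.
"${HOME}/goaccess/bin/goaccess" \
    --db-path "${HOME}/goaccess/data/" --persist --restore \
    --log-format=COMBINED --anonymize-ip --keep-last=90 --real-os \
    "${HOME}/logs/antrix.net/http/access.log.0" \
    -o "${HOME}/stats.antrix.net/index.html" \
    >"${HOME}/goaccess/cronjob.log" 2>&1

if [[ $? -ne 0 ]]; then
    echo "goaccess execution failed"
    echo "========================="
    cat "${HOME}/goaccess/cronjob.log"
    exit 1
fi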

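The script does a few things. It parses the most recent access log in the COMBINED log format and writes an HTML report to the web server root. With --persist and --restore, it saves the parsed data under --db-path and loads it back on the next run, so the report accumulates across daily runs, while --keep-last=90 limits it to the last 90 days. The --anonymize-ip flag masks visitor IP addresses and --real-os displays real OS names. All goaccess output goes to cronjob.log, which the script prints only when goaccess exits with an error.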

Finally, I set up a cron job to run the script once a day.

$ crontab -l
MAILTO="xxxxxxxxxxx@xxxxx.com"
0 7 * * * /bin/bash ${HOME}/goaccess/gen-access-report.sh

If there’s any error, the failure output printed by the script is emailed to the MAILTO address.
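
This works because cron only sends mail when a job produces output, so the script stays silent on success and prints on failure. Here is a minimal sketch of that pattern on its own, using stand-in commands instead of goaccess so the mail path can be tried in isolation. The run_quiet helper is hypothetical, not something from the script above.

```shell
#!/bin/bash
# Hypothetical helper: run a command silently and replay its output only on failure.
# Usage: run_quiet LOGFILE COMMAND [ARGS...]
run_quiet() {
    logfile="$1"
    shift
    if "$@" >"$logfile" 2>&1; then
        return 0
    fi
    echo "command failed: $*"
    cat "$logfile"
    return 1
}

tmp_log="$(mktemp)"
# Succeeds: prints nothing, so cron would send no mail.
run_quiet "$tmp_log" true
# Fails: prints the captured output, which cron would mail to MAILTO.
run_quiet "$tmp_log" sh -c 'echo boom; exit 3'
rm -f "$tmp_log"
```

Temporarily swapping a failing stand-in like this into the cron entry is one way to confirm that the MAILTO address really receives the failure report.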

That’s about it. Relatively painless to set up and it seems to be working well so far: stats.antrix.net