Upgrading the Blog – Software design struggles
It's been a bit, and it's been busy. This one will definitely feel a little more rambly.
A few months back, when I started writing up content, Ghost, the platform I use to run the blog, started not-so-subtly letting me know that the latest version of the software was available. With a catch: it was not compatible with my current database. When I originally spun up the blog, MariaDB was the more performant option, with little to no difference from the original MySQL. MariaDB has been making great strides in continuing to improve performance, but that also means its feature set is going to continue to drift. I saw something a few months back suggesting that the next version of MySQL would be fundamentally different enough that MariaDB would no longer be a drop-in replacement. (I didn't save the link because it didn't seem relevant at the time ☹️).
(drake meme, with ghost logo, MariaDB vs MySQL)
Ghost seems to be relying on certain MySQL features, which don't exist in MariaDB, for some of its newer functionality. Which means, if I want to keep my platform updated and secure, I will have to put in the work to do this migration. Migrating and upgrading also means a chance to change some underlying things and redesign workflows.
Current stack vs New stack
Now, fundamentally, there are no major changes.
The current stack is as follows:
Docker via docker-compose
- Traefik (webserver)
- mariadb (db)
- ghost (website)
- watchtower (updater)
- Matomo (monitoring and analytics)
Now, I liked the stack when I originally used it, but for other projects Traefik felt more like a slog. Matomo never worked properly the way I wanted it to. Everything else has been stable.
The new stack:
Docker via Ansible
- Caddy (webserver)
- mysql (db for ghost)
- ghost (website)
- watchtower (container updater)
- shynet (analytics)
- postgres (db for shynet)
- telegraf/grafana (monitoring)
- giscus (comments)
There are a few new faces involved. Some of this I'll cover in deeper detail in other posts, like using Ansible instead of compose, along with some other new tricks I picked up through this process.
Monitoring via telegraf and grafana is still in flux; I'll eventually narrow down exactly what I want and cover it at a later date.
The second-biggest difference, while the smallest in the amount of work, is the switch from Traefik to Caddy. Don't get me wrong, Traefik works wonderfully, but the config was finicky in my experience. It's designed with a Docker or Kubernetes focus first, which made it hard to implement back when I was running a mix of LXC instances and containers outside the VPS the blog runs on. At the time, there also wasn't a clean way to do DNS challenges for the certs. I think the config has matured a bit since then and some new features have been added to improve QoL.
I chose Caddy over Nginx with certbot, mostly because I want the cert renewal and everything more integrated. Personally, I prefer using nginx as an ingress controller or in a more traditional environment setup. Besides, the Caddyfile makes it really easy to spin up sites with reverse proxies:
blog.jonnagel.us {
    encode gzip
    reverse_proxy $container_name:2368
}
That's it! Just add more blocks as needed for different domains. DNS auth for LetsEncrypt works slightly differently, but not enough to really cover right now. Their documentation is top-notch in my opinion. I have a script that I use for my other VPSes to somewhat automate the DNS setup, with Caddy installed and managed by systemd; I'll share it in a different post. It could easily be modified to work with a Docker environment.
Now, the most exotic thing on that list is probably giscus. Back in 2017, Disqus made it so the only way to remove scripts and ads from their system was a fee, originally $10/month, now $11/month. While that might seem pretty affordable, I don't have nearly enough traffic per month to justify it, or to get a cut of the ad revenue. Ads and injected scripts are known to carry malware, and I personally do not feel comfortable potentially exposing others to that kind of risk.
That's where giscus comes in. It's an interesting app that allows you to leverage GitHub Discussions as a comment system. That means you don't need to build and manage your own database of users, or deal with spam and bots. Authentication is managed by GitHub, making it simple to use. There's a bit of configuration to do to get it set up, but adding it onto the template for the posts was quite simple.
This is all that's needed:
<script src="https://giscus.app/client.js"
data-repo="org/reponame"
data-repo-id="XXXXXXXXXXXX"
data-category="Q&A"
data-category-id="XXXXXXXXXXX"
data-mapping="pathname"
data-strict="0"
data-reactions-enabled="1"
data-emit-metadata="0"
data-input-position="top"
data-theme="preferred_color_scheme"
data-lang="en"
crossorigin="anonymous"
async>
</script>
This will get generated by the app's site, which for the most part walks you through what's needed to get it set up. I just added some additional CSS to get it mostly set up the way I wanted. I might do more configuration later so I don't need as many modifications to get it to look how I want. For now, it looks good enough for my needs.
Now, for the meat of the migration: moving from MariaDB to MySQL. Normally, such a migration should be a piece of cake in theory, but there were two major things I needed to take care of as part of it:
- Collation schema updates
- Managing the additional bloat, or memory, that MySQL uses compared to MariaDB.
The first one was an unfortunate issue. Like I mentioned, the steps to migrate are quite simple (a rough sketch of the commands follows the list):
- Stop Ghost
- Take a mysqldump backup.
- Spin down MariaDB container and delete volume.
- Create new MySQL container and volume, with backup script mounted as part of the initdb.d folder.
- Wait a little bit and start Ghost again.
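Here's roughly what that looks like as commands. This is a sketch rather than my exact deployment: the container, volume, and database names (ghost, mariadb, mysql, blog_mariadb_data, blog_mysql_data) are placeholders, and I'm assuming the root password is available as MYSQL_ROOT_PASSWORD inside the containers.
# Stop Ghost so nothing writes to the database mid-dump
docker stop ghost

# Dump the Ghost database from the running MariaDB container
docker exec mariadb sh -c \
  'mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" --databases ghost' > ghost.sql

# Remove the MariaDB container and its data volume
docker stop mariadb && docker rm mariadb
docker volume rm blog_mariadb_data

# Create the new MySQL container and volume, with the dump mounted
# into /docker-entrypoint-initdb.d so it's imported on first start
docker run -d --name mysql \
  -e MYSQL_ROOT_PASSWORD="$MYSQL_ROOT_PASSWORD" \
  -v blog_mysql_data:/var/lib/mysql \
  -v "$PWD/ghost.sql":/docker-entrypoint-initdb.d/ghost.sql:ro \
  mysql:8

# Give the import a minute or two, then bring Ghost back up
docker start ghost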
Unfortunately, due to the collation changes, I was prompted with errors like the following:
MigrationScriptError: alter table `posts_products` add constraint `posts_products_post_id_foreign` foreign key (`post_id`) references `posts` (`id`) on delete CASCADE - UNKNOWN_CODE_PLEASE_REPORT: Referencing column 'post_id' and referenced column 'id' in foreign key constraint 'posts_products_post_id_foreign' are incompatible.
It took some googling, but I was able to parse the error. MariaDB 10 normally defaults to latin1_swedish_ci, but since the original database setup was done by an initialization script as part of Ghost, the collation was set to utf8mb4_general_ci when I did my initial installation. MySQL 8 changed its default utf8 collation to utf8mb4_0900_ai_ci.
So, what is one to do? Do a fresh install, migrate the posts one by one, and update the dates somehow? Or update the SQL dump to have the correct collation value? I prefer to work smarter rather than harder 🙃, so I went with updating the dump, similar to how Andrej did theirs. Instead of using sed, I leveraged the global replace feature in vi. Afterwards, starting up the database, and consequently Ghost, went smoothly. Or so I thought…
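For reference, the substitution itself is tiny. The first line is the global replace as you'd run it inside vi; the second is the sed equivalent if you'd rather not open a multi-megabyte dump in an editor. Both assume the dump is named ghost.sql and that utf8mb4_general_ci is the only collation that needs swapping.
# Inside vi, replace the old collation everywhere in the dump:
:%s/utf8mb4_general_ci/utf8mb4_0900_ai_ci/g

# Or from the shell with sed, editing the file in place:
sed -i 's/utf8mb4_general_ci/utf8mb4_0900_ai_ci/g' ghost.sql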
For reasons, I have been running the blog on a simple $5/month DigitalOcean droplet, and it's all run fine for 4 years. During the process of upgrading the OS to the latest LTS and deploying the stack, everything started to hang. Looking at htop in a different terminal, I could see that load was low but memory was completely filled. Since there's not normally swap space on droplets, the OOM killer was killing things off (hooray for restart: unless-stopped 🤦). A look at docker stats showed MySQL using nearly 550 MB of memory! That will not do, so my options were to either increase the size of the droplet or try some memory tuning. This time, working smarter meant memory tuning, along with (probably stupidly) putting a memory limit on the container. I had Docker set the memory limit to 512M.
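Setting the limit on an already-running container can be done on the fly, which is handy for trying a value before baking it into the compose/Ansible definition. Container name here is a placeholder for whatever the MySQL container is called:
# Cap the MySQL container at 512M; setting memory-swap to the same
# value keeps it from quietly spilling over into swap
docker update --memory=512m --memory-swap=512m mysql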
It's probably not the most efficient config, but here's what I added to the my.cnf that gets applied:
[mysqld]
connect_timeout = 5
wait_timeout = 600
max_allowed_packet = 16M
thread_cache_size = 0
sort_buffer_size = 32K
bulk_insert_buffer_size = 0
tmp_table_size = 1K
max_heap_table_size = 16K
myisam_recover_options = BACKUP
key_buffer_size = 2M
table_open_cache = 400
myisam_sort_buffer_size = 512M
concurrent_insert = 2
read_buffer_size = 16K
read_rnd_buffer_size = 16K
slow_query_log_file = /var/log/mysql/mariadb-slow.log
long_query_time = 10
binlog_expire_logs_seconds = 259200
max_binlog_size = 100M
default_storage_engine = InnoDB
# you can't just change log file size, requires special procedure
#innodb_log_file_size = 50M
innodb_buffer_pool_size = 64M
innodb_log_buffer_size = 2M
innodb_file_per_table = 1
innodb_open_files = 400
innodb_io_capacity = 400
innodb_flush_method = O_DIRECT
[client]
socket = /var/run/mysqld/mysqld.sock
[mysqldump]
quick
quote-names
max_allowed_packet = 16M
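The file just gets mounted into the official image's config include directory, something along the lines of -v ./my.cnf:/etc/mysql/conf.d/zz-tuning.cnf:ro on the container definition. A quick way to confirm a setting actually took effect (container name is a placeholder again):
# Prompts for the root password, then prints the live value
docker exec -it mysql mysql -u root -p \
  -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size'"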
I'll probably tweak it the longer it runs, but for now it seems to tame the beast enough. Memory appears to max out for the container around 240 MB and when everything is idle it settles down around 70 MB.
The total memory usage now is around 750 MB at its peak.
The only other new thing in the stack is an application called Shynet. It's a modern, privacy-focused analytics tool. It doesn't use cookies, and it respects the DNT setting in browsers. From the GitHub page, here are the exact metrics that it will collect:
* Hits — how many pages on your site were opened/viewed
* Sessions — how many times your site was visited (essentially a collection of hits)
* Page load time — how long the pages on your site take to load
* Bounce rate — the percentage of visitors who left after just one page
* Duration — how long visitors stayed on the site
* Referrers — the links visitors followed to get to your site
* Locations — the relative popularity of all the pages on your site
* Operating system — your visitors' OS (from user agent)
* Browser — your visitors' browser (from user agent)
* Geographic location & network — general location of your visitors (from IP)
* Device type — whether your visitors are using a desktop, tablet, or phone (from user agent)
Looking at how it's configured to work, it uses a combination of an invisible pixel to get the basic information about your session (IP for geolocation, browser type, and OS/device type), and a simple script to track load times and everything else. Now, I personally have it set so that your IP isn't tracked even if you don't have DNT set. I don't need that information; knowing roughly where you might be from and that you checked out my blog is more than enough.
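For what it's worth, the embed Shynet hands you is about as small as the giscus one. It looks something like the following, where the hostname and service ID are purely illustrative; the Shynet dashboard generates the exact snippet for your service:
<!-- illustrative only: replace host and <service-uuid> with what the dashboard gives you -->
<noscript><img src="https://analytics.example.com/ingress/<service-uuid>/pixel.gif" alt=""></noscript>
<script defer src="https://analytics.example.com/ingress/<service-uuid>/script.js"></script>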