Part 2 of Kubernetes in LXD - Working but unstable...

Last weekend I managed to get Kubernetes running in LXD, but not using Kubernetes the Hard Way. I'll explain the journey in a little bit, but first I want to skip ahead to today and say I'm putting the rest of this project on hold temporarily. Not K8s at home in general, just running it in LXC containers.

What's the problem? Honestly, I'm not sure, but if I had to put actual money on a wager I'd say there are weird incompatibilities between the kernel, the OS version, and LXD from snap.

I managed to get Microk8s running in a 3 node cluster. Each node was capped at 2GB of memory and 80% of one core, mostly to try and keep contention between workloads under control later down the line. Node 3 was never happy: constantly hanging and unresponsive, but if I logged into it, it was as happy as the other 2. Backed down to 2 nodes and it was better, but not great. Started fresh and the node seemed happy, but the weirdness I was seeing was still happening and the only fix was to restart microk8s, which isn't exactly a sustainable model in the long run.
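For reference, those caps are just the standard LXD limits keys applied to each container; something along these lines (the container name here is a placeholder, not my actual node name):

# cap the container at 2GB of memory and 80% of a single core
lxc config set kube-1 limits.memory 2GB
lxc config set kube-1 limits.cpu 1
lxc config set kube-1 limits.cpu.allowance 80%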

Where to now?

I'm still going to play with K8s at home on my server, but I'm going to test both K3s and Microk8s and see which is better for my tiny machine (the money right now appears to be on K3s).


The Journey

I started by following this guide, which covers setting up Kubernetes the Hard Way on bare metal, figuring it would be the easiest place to start. It was going smoothly until I hit an issue where I couldn't get etcd to start up or talk to any of the other nodes. I tried setting it up as a single control plane and worker to mitigate that, but etcd still wouldn't start and I couldn't find anything as to why. I thought maybe it was cert related, so I repurposed an ansible script I found that handles that part of the setup, but it made no difference. Instead I took an old laptop I had laying around and managed to get k8s the hard way working with a vagrant setup. Hooray...
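If you're debugging something similar, the first stops are the etcd unit itself and then asking etcd whether it can see its peers. A rough sketch using the cert layout from the Hard Way guide (the endpoint IP is a placeholder and your paths may differ):

# check the systemd unit and the most recent log lines
sudo systemctl status etcd
sudo journalctl -u etcd --no-pager -n 50

# ask etcd directly which members it can see (endpoint IP is a placeholder)
sudo ETCDCTL_API=3 etcdctl member list \
  --endpoints=https://10.240.0.10:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem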

I scrapped the containers and found a git page that described getting K3s up and running in LXD. Between their page and some of the stuff I originally had set up for docker, I was able to get half of the way to a working solution. The problem this time was that I think we were running 2 different versions of both k3s and lxd, with the bigger problem being the k3s version. The latest stable k3s ships a version of containerd that expects to read the kernel modules from disk instead of querying the kernel directly. Unfortunately, with the way the kernel module passthrough works, those files never make it into the container. I didn't play with the profile enough to try and get it working, but that's mostly the reason I'm not giving up on it long term.
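You can see the mismatch from inside a container running under a profile like the one below: the kernel reports the modules as loaded, but the on-disk files containerd wants to read aren't there. A rough illustration (container name is a placeholder; the container shares the host kernel, so uname -r resolves to the same version either way):

# the passed-through modules show up as loaded in the shared kernel
lxc exec kube-1 -- grep br_netfilter /proc/modules

# but the module tree for that kernel is missing inside the container
lxc exec kube-1 -- ls /lib/modules/$(uname -r)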

Then it hit me.

Yes, the lightbulb actually went off.

Microk8s. By design it should just work, no? For those unaware, Microk8s is a flavor of Kubernetes built by Canonical (the fabulous folks behind Ubuntu) to run on pretty much any version of Linux out in the wild, as long as it supports snapd. They've recently started rolling out Multipass, which is like a global version of WSL for everywhere, and which in turn would let you run microk8s on a Mac. Snap programs are self contained and by default ship with all the pieces they need to run. In theory, that means you can sidestep any version compatibility issues and run microk8s inside LXD. The only things it needs to work properly are the apparmor security settings and passthroughs for things like nf_conntrack in the container's profile. And it did work. I was able to spin up a cluster, deploy a dashboard and a small hello world app, and access it using kubectl proxy.

Here's a copy of the profile I used, sans personal bits like cloud-init (which might be worth showing in a future post):

config:
  linux.kernel_modules: ip_tables,ip6_tables,netlink_diag,nf_nat,overlay,br_netfilter
  raw.lxc: |-
    lxc.mount.auto=proc:rw sys:rw
    lxc.apparmor.profile = unconfined
    lxc.cgroup.devices.allow = a
  security.nesting: "true"
  security.privileged: "true"
description: Profile supporting docker in containers
devices:
  aadisable:
    path: /sys/module/nf_conntrack/parameters/hashsize
    source: /sys/module/nf_conntrack/parameters/hashsize
    type: disk
  aadisable2:
    path: /dev/kmsg
    source: /dev/kmsg
    type: disk
  eth0:
    name: eth0
    nictype: macvlan
    parent: macvlan0
    type: nic
  root:
    path: /
    pool: lxd
    type: disk
name: docker
used_by: []
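For completeness, this is roughly the flow from that profile to a running cluster. The container names and image are placeholders, and microk8s add-node prints the exact join command to run on the other node:

# launch the nodes using the profile above (names and image are placeholders)
lxc launch ubuntu:20.04 kube-1 --profile docker
lxc launch ubuntu:20.04 kube-2 --profile docker

# install microk8s in each container and wait for the first node to settle
lxc exec kube-1 -- snap install microk8s --classic
lxc exec kube-2 -- snap install microk8s --classic
lxc exec kube-1 -- microk8s status --wait-ready

# generate a join command on the first node, then run the printed command on the second
lxc exec kube-1 -- microk8s add-node

# enable dns and the dashboard, then reach it through kubectl proxy
lxc exec kube-1 -- microk8s enable dns dashboard
lxc exec kube-1 -- microk8s kubectl proxy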

Back to the hair pulling

At this point I'm back to the beginning of the post. The third node was unstable, so I deleted it. kubectl started complaining that the second node was unreachable via DNS, so I edited /etc/hosts. And finally, the part that made me start questioning my sanity: pods stuck in a Terminating state no matter what you do. Any time I tried to delete or update a deployment, the connected pods would just sit there showing Terminating. I tried to force their removal following this, but the command would just hang after saying it was removing pod XXXX. The only way to get them removed was to restart the cluster. I couldn't find anything in the logs to point me at the cause, and after a full week of playing and messing with the cluster to try and get it into a working state, I need to step away temporarily.
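For reference, the force removal I was trying is the standard grace-period override (the pod name and namespace here are placeholders); even this just hung:

# delete the pod object immediately, skipping the grace period
kubectl delete pod hello-world-6fd84 -n default --grace-period=0 --force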


What's next?

Right now, the next step is to figure out the best option for my current setup outside of LXD. I still have some other methods to play around with to get this working. A couple of options are mounting the troublesome kernel module files directly into the containers (similar to the nf_conntrack trick) and exposing more volatility through the profile to see if that makes it more stable (ironic, I know); there's also Charmed Kubernetes using conjure-up and Juju. Part of me is thinking CDK is probably going to be a jfw situation, except the impression I got is that it might be too resource hungry for my setup.
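The module mount idea would look a lot like the nf_conntrack device in the profile above, just pointed at the host's module tree. A rough sketch (container name is a placeholder, and I haven't confirmed containerd is satisfied with a read-only bind):

# bind the host's kernel module tree into the container, read-only
lxc config device add kube-1 modules disk source=/lib/modules path=/lib/modules readonly=true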

Either way, there will be more information about what comes next.