First thing out of the way… why is a network engineer working on OpenShift?
Well, I currently use Talos Linux for my “production” Kubernetes cluster in my homelab. I have been wanting to try OpenShift for a while, but each time I tried, I couldn’t get it fully working. Thankfully, I found this guide to installing an OKD 4.5 cluster: https://itnext.io/guide-installing-an-okd-4-5-cluster-508a2631cbee. This post covers what I did differently to get it working. I still have a lot to learn about OpenShift.
The install process
The install process can be confusing at first. I took the pfSense VM out of the mix, which made troubleshooting easier, since the UDM Pro I use as my edge firewall/router isn’t great at troubleshooting issues outside of its own ecosystem. I may change that in the future; I’m still thinking about it.
I used my existing DNS server for name resolution. It didn’t make much sense to stand up a separate DNS server that would hold only about 20 records and would have to forward everything else upstream anyway.
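Since there’s no dedicated DNS server in this setup, it’s worth confirming that the existing one actually answers for every name the install needs before booting anything. The following is a minimal Python sketch, not something from the guide: it builds the host names that the OpenShift user-provisioned install docs call for (api, api-int, a wildcard under apps, the bootstrap node, and each control plane and worker node) and tries to resolve each one. The cluster name, base domain, node counts, and the master0/worker0 naming are hypothetical placeholders; swap in whatever your records are actually called.

```python
import socket
from typing import Optional


def expected_records(cluster: str, base_domain: str,
                     masters: int = 3, workers: int = 2) -> list[str]:
    """Build the host names a user-provisioned install expects DNS to resolve.

    The master<N>/worker<N> naming and the default node counts are
    hypothetical placeholders; match them to your own records.
    """
    zone = f"{cluster}.{base_domain}"
    names = [f"api.{zone}", f"api-int.{zone}", f"bootstrap.{zone}"]
    names += [f"master{i}.{zone}" for i in range(masters)]
    names += [f"worker{i}.{zone}" for i in range(workers)]
    # *.apps is a wildcard record, so probe it with an arbitrary label.
    names.append(f"wildcard-test.apps.{zone}")
    return names


def resolve_all(names: list[str]) -> dict[str, Optional[str]]:
    """Map each name to its IPv4 address, or None if it does not resolve."""
    results = {}
    for name in names:
        try:
            results[name] = socket.gethostbyname(name)
        except socket.gaierror:
            results[name] = None
    return results
```

For example, `resolve_all(expected_records("okd", "lab.example.com"))` returns a dict you can scan for `None` values; a `None` means either the record is missing or the machine running the script isn’t using the DNS server you expect. It only checks forward lookups.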
For HAProxy, I mostly followed the guide, but I renamed the records to match my own naming scheme.
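Whatever the names are, the load balancer has to expose the same ports. Per the OpenShift user-provisioned install docs, that’s 6443 (Kubernetes API) and 22623 (Machine Config Server) on the API side, plus 80 and 443 for application ingress. Here’s a small Python sketch that just attempts a TCP connection to each port; the host you pass in (say, a hypothetical `lb.lab.example.com`) stands in for your HAProxy box.

```python
import socket

# Load balancer ports from the OpenShift UPI docs.
LB_PORTS = {
    6443: "Kubernetes API",
    22623: "Machine Config Server",
    80: "Ingress HTTP",
    443: "Ingress HTTPS",
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def check_lb(host: str) -> dict[int, bool]:
    """Try every load balancer port on host and report which ones accept."""
    return {port: port_open(host, port) for port in LB_PORTS}
```

A successful connect only proves HAProxy is listening on that port. In TCP mode it will accept the connection even if no backend behind it is healthy, so this complements the HAProxy stats page rather than replacing it.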
Finally, I got to the installation itself. The bootstrap node installed successfully, so I moved on to the control plane nodes. Once installed, they rebooted a few times and then joined the cluster.
Here’s where it stopped being smooth. One thing I had issues with was the openshift-install application: it would sit at waiting on bootstrap even after the control plane was ready. Eventually I just booted the worker nodes and tried to bootstrap them anyway. The problem was that the worker nodes were trying to contact the bootstrap node, because it was still in the HAProxy config. So I removed the bootstrap node from the config and restarted the service, and the worker nodes began to join.
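Taking the bootstrap node out of HAProxy is a normal step once bootstrapping finishes, since the bootstrap machine only serves the Kubernetes API and the Machine Config Server temporarily. If you’d rather script it than edit by hand, the Python sketch below drops any `server` line for a given name. It assumes the bootstrap entries look like `server bootstrap 10.0.0.5:22623 check`; the server name `bootstrap` and the address are hypothetical, so use whatever your haproxy.cfg calls that node.

```python
import re


def remove_server(cfg_text: str, server_name: str = "bootstrap") -> str:
    """Return haproxy.cfg text with every `server <server_name> ...` line removed.

    Only exact name matches are dropped, so "bootstrap" will not remove a
    line for a server called "bootstrap2". All other lines are kept as-is.
    """
    # Leading indentation, the `server` keyword, the exact name, then whitespace.
    pattern = re.compile(rf"^\s*server\s+{re.escape(server_name)}\s")
    kept = [line for line in cfg_text.splitlines(keepends=True)
            if not pattern.match(line)]
    return "".join(kept)
```

One way to use it: read /etc/haproxy/haproxy.cfg, keep a copy of the original, write the result back, validate it with `haproxy -c -f /etc/haproxy/haproxy.cfg`, and then restart the service.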
Next was getting to the console. Running oc get clusteroperators showed several operators that weren’t coming up. I should have looked into why etcd was complaining that “3 out of 4 members are available”; instead, I started with why the console wasn’t starting. The console couldn’t contact the authentication operator, and digging into that didn’t help much, so I finally looked at the etcd operator. That’s when I remembered the bootstrap node was still powered on. I powered it off, and that fixed it. Once I logged in and confirmed everything from the base install was up, I moved on to the post-install tasks.
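In hindsight, a quicker way to find the real problem would have been to list every unhealthy cluster operator at once instead of chasing the first one I noticed. This Python sketch (mine, not from the guide) parses the JSON from `oc get clusteroperators -o json` and reports any operator whose `Available` condition isn’t `True` or whose `Degraded` condition is `True`, along with that condition’s message. It assumes `oc` is on the PATH and already pointed at the cluster (for example, with the kubeconfig the installer generates); the function names are hypothetical.

```python
import json
import subprocess
from typing import Any


def unhealthy_operators(co_list: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (operator name, condition message) for each cluster operator
    that is not Available or that reports Degraded=True.

    `co_list` is the parsed output of `oc get clusteroperators -o json`.
    """
    problems = []
    for item in co_list.get("items", []):
        name = item["metadata"]["name"]
        # Index status conditions by type (Available, Progressing, Degraded).
        conds = {c["type"]: c for c in item.get("status", {}).get("conditions", [])}
        available = conds.get("Available", {}).get("status") == "True"
        degraded = conds.get("Degraded", {}).get("status") == "True"
        if available and not degraded:
            continue
        # Prefer the Degraded message when present; otherwise use Available's.
        cond = conds["Degraded"] if degraded else conds.get("Available", {})
        problems.append((name, cond.get("message", "(no message)")))
    return problems


def fetch_cluster_operators() -> dict[str, Any]:
    """Run `oc` and return its parsed JSON output (assumes oc is logged in)."""
    result = subprocess.run(
        ["oc", "get", "clusteroperators", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `unhealthy_operators(fetch_cluster_operators())` prints nothing by itself; loop over the returned pairs to see each problem. If etcd’s complaint shows up as a condition message, it lands in the same list as the console and authentication errors instead of being the last thing you look at.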