
@alexellis
Last active September 24, 2024 14:36
K8s on Raspbian
#!/bin/sh
# This installs the base prerequisites up to the point of joining / creating a cluster

# Install Docker and let the pi user run it without sudo
curl -sSL get.docker.com | sh && \
sudo usermod pi -aG docker

# Turn swap off permanently (by default the kubelet refuses to start while swap is enabled)
sudo dphys-swapfile swapoff && \
sudo dphys-swapfile uninstall && \
sudo update-rc.d dphys-swapfile remove

# Add the Kubernetes apt repository and install kubeadm
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - && \
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list && \
sudo apt-get update -q && \
sudo apt-get install -qy kubeadm

# Enable the cpuset and memory cgroups on the kernel command line (kubeadm's preflight checks require them),
# keeping a backup of the original file
echo Adding " cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory" to /boot/cmdline.txt
sudo cp /boot/cmdline.txt /boot/cmdline_backup.txt
orig="$(head -n1 /boot/cmdline.txt) cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory"
echo "$orig" | sudo tee /boot/cmdline.txt
echo Please reboot

Use this to set up quickly:

# curl -sL \
 https://gist.githubusercontent.com/alexellis/fdbc90de7691a1b9edb545c17da2d975/raw/b04f1e9250c61a8ff554bfe3475b6dd050062484/prep.sh \
 | sudo sh
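Running the prep script a second time appends the cgroup flags to /boot/cmdline.txt again. A minimal idempotent sketch of that one step, assuming a POSIX shell; `add_cgroup_flags` is a hypothetical helper name, and it takes the file path as an argument so it can be tried on a copy before touching /boot/cmdline.txt:

```shell
# Hypothetical helper: append the cgroup flags to a cmdline.txt-style file,
# skipping any flag that is already present, so re-running it is harmless.
add_cgroup_flags() {
  file="$1"
  line="$(head -n1 "$file")"
  for flag in cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory; do
    case " $line " in
      *" $flag "*) ;;            # already on the command line, leave it
      *) line="$line $flag" ;;   # missing, append it
    esac
  done
  # cmdline.txt must stay a single line
  printf '%s\n' "$line" > "$file"
}
```

For example, as root: `cp /boot/cmdline.txt /boot/cmdline_backup.txt && add_cgroup_flags /boot/cmdline.txt`, then reboot.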
@cglazner

To get the current flannel manifest (https://raw.githubusercontent.com/coreos/flannel/c5d10c8/Documentation/kube-flannel.yml) to work on 1.12.2 I had to apply the patch suggested here:

kubectl patch daemonset kube-flannel-ds-arm \
      --namespace=kube-system \
      --patch='{"spec":{"template":{"spec":{"tolerations":[{"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"},{"effect":"NoSchedule","operator":"Exists"}]}}}}'
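The same patch, sketched with the JSON spread over several lines so the two tolerations are easier to read (kubectl accepts the multi-line JSON as-is). `FLANNEL_TOLERATIONS_PATCH` and `patch_flannel_tolerations` are hypothetical names; the DaemonSet name is a parameter because it differs between flannel manifests (kube-flannel-ds-arm here, kube-flannel-ds in the v0.10.0-based setup further down the thread):

```shell
# The tolerations patch from the comment above, unchanged in content:
# tolerate the master taint explicitly, plus any NoSchedule taint.
FLANNEL_TOLERATIONS_PATCH='{
  "spec": {"template": {"spec": {"tolerations": [
    {"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"},
    {"effect": "NoSchedule", "operator": "Exists"}
  ]}}}
}'

# Hypothetical wrapper: apply the patch to the flannel DaemonSet named in $1.
patch_flannel_tolerations() {
  kubectl --namespace=kube-system patch daemonset "$1" \
    --patch="$FLANNEL_TOLERATIONS_PATCH"
}
```

Usage: `patch_flannel_tolerations kube-flannel-ds-arm`.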

@Ocramius

Ocramius commented Dec 3, 2018

I just went through the entire process, and as of today (2018-12-03) it is very unstable and fragile.

Recapping for anybody that is spending sleepless nights on it:

  1. followed https://gist.github.com/alexellis/a7b6c8499d9e598a285669596e9cdfa2 - my nodes are called ocramius-k8s-pi-1 (192.168.1.110) and ocramius-k8s-pi-2 (192.168.1.111)

  2. followed the steps above up to (but not including) kubeadm init (note: as I'm writing, I have v1.12.3 installed)

  3. had to downgrade docker-ce to 18.06.0 on all hosts, due to kubernetes/minikube#3323. To do that, I ran:

    curl -sSL get.docker.com | sh && \
    sudo usermod pi -aG docker
    newgrp docker
    apt purge -y docker-ce && apt autoremove -y
    apt install -y docker-ce=18.06.0~ce~3-0~raspbian

    Note that ignoring the preflight checks with --ignore-preflight-errors=SystemVerification won't work, since something changed in how dockerd handles temporary files. Make sure that docker version reports 18.06.0:

    pi@ocramius-k8s-pi-1:/home/pi# docker version
    Client:
     Version:           18.06.0-ce
     API version:       1.38
     Go version:        go1.10.3
     Git commit:        0ffa825
     Built:             Wed Jul 18 19:19:46 2018
     OS/Arch:           linux/arm
     Experimental:      false
    
    Server:
     Engine:
      Version:          18.06.0-ce
      API version:      1.38 (minimum version 1.12)
      Go version:       go1.10.3
      Git commit:       0ffa825
      Built:            Wed Jul 18 19:15:34 2018
      OS/Arch:          linux/arm
      Experimental:     false
    
  4. went with the flannel setup (couldn't get weavenet to work):

    sudo kubeadm init --token-ttl=0 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.1.110
    kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/c5d10c8/Documentation/kube-flannel.yml
    sudo sysctl net.bridge.bridge-nf-call-iptables=1
  5. kubectl get pods --namespace=kube-system will report something like the following:

    pi@ocramius-k8s-pi-1:/home/pi# kubectl get pods --namespace=kube-system
    NAME                                        READY   STATUS              RESTARTS   AGE
    coredns-576cbf47c7-9bp9s                    0/1     ContainerCreating   0          2m29s
    coredns-576cbf47c7-jmgf5                    0/1     ContainerCreating   0          2m29s
    etcd-ocramius-k8s-pi-1                      1/1     Running             0          110s
    kube-apiserver-ocramius-k8s-pi-1            1/1     Running             1          95s
    kube-controller-manager-ocramius-k8s-pi-1   1/1     Running             0          106s
    kube-proxy-t4qc7                            1/1     Running             0          2m29s
    kube-scheduler-ocramius-k8s-pi-1            1/1     Running             0          2m44s
    

    Inspecting the pods that are in ContainerCreating status, you will get something like:

    kubectl describe pods coredns-576cbf47c7-9bp9s --namespace=kube-system
    <snip>
    Events:
     Type     Reason           Age                  From                        Message
     ----     ------           ----                 ----                        -------
     Normal   Scheduled        2m41s                default-scheduler           Successfully assigned kube-system/coredns-576cbf47c7-9bp9s to ocramius-k8s-pi-1
     Warning  NetworkNotReady  1s (x13 over 2m41s)  kubelet, ocramius-k8s-pi-1  network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]
    
  6. applied the patch suggested by @cglazner right above me:

    kubectl patch daemonset kube-flannel-ds-arm \
      --namespace=kube-system \
      --patch='{"spec":{"template":{"spec":{"tolerations":[{"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"},{"effect":"NoSchedule","operator":"Exists"}]}}}}'
  7. system will recover:

    pi@ocramius-k8s-pi-1:/home/pi# kubectl get pods --namespace=kube-system
    NAME                                        READY   STATUS    RESTARTS   AGE
    coredns-576cbf47c7-9bp9s                    1/1     Running   0          31m
    coredns-576cbf47c7-jmgf5                    1/1     Running   0          31m
    etcd-ocramius-k8s-pi-1                      1/1     Running   0          30m
    kube-apiserver-ocramius-k8s-pi-1            1/1     Running   1          30m
    kube-controller-manager-ocramius-k8s-pi-1   1/1     Running   0          30m
    kube-flannel-ds-arm-pznfc                   1/1     Running   0          19m
    kube-proxy-t4qc7                            1/1     Running   0          31m
    kube-scheduler-ocramius-k8s-pi-1            1/1     Running   0          31m
    pi@ocramius-k8s-pi-1:/home/pi# kubectl get nodes
    NAME                STATUS   ROLES    AGE   VERSION
    ocramius-k8s-pi-1   Ready    master   32m   v1.12.3
    
  8. can now join other nodes (in my case ocramius-k8s-pi-2):

    sudo sysctl net.bridge.bridge-nf-call-iptables=1
    sudo kubeadm join 192.168.1.110:6443 --token <snip> --discovery-token-ca-cert-hash sha256:<snip>
  9. verify status:

    pi@ocramius-k8s-pi-1:/home/pi# kubectl get nodes
    NAME                STATUS   ROLES    AGE   VERSION
    ocramius-k8s-pi-1   Ready    master   37m   v1.12.3
    ocramius-k8s-pi-2   Ready    <none>   70s   v1.12.3
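Steps 4 and 8 set net.bridge.bridge-nf-call-iptables with `sysctl`, which only lasts until the next reboot. A minimal sketch of making it persistent, assuming a systemd-based Raspbian (which reads /etc/modules-load.d and /etc/sysctl.d at boot); the helper name and the k8s.conf file names are assumptions, and both paths are parameters so the helper can be tried elsewhere:

```shell
# Hypothetical helper: persist the bridge netfilter setting across reboots.
write_k8s_netfilter_conf() {
  sysctl_conf="${1:-/etc/sysctl.d/k8s.conf}"
  modules_conf="${2:-/etc/modules-load.d/k8s.conf}"
  # load br_netfilter at boot; the net.bridge.* keys only exist once it is loaded
  echo br_netfilter > "$modules_conf"
  # the same setting as in steps 4 and 8, applied on every boot
  echo 'net.bridge.bridge-nf-call-iptables = 1' > "$sysctl_conf"
}
```

Run it as root, then `modprobe br_netfilter && sysctl --system` to apply the settings without rebooting.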
    

@HugoRh

HugoRh commented Dec 10, 2018

Very nice!
Indeed, the patch mentioned by @cglazner did fix the flannel issue.
I'm on k8s 1.13.0 with flannel 0.10-arm:

#kubectl get no
NAME           STATUS                     ROLES    AGE   VERSION
pi-master      Ready,SchedulingDisabled   master   53m   v1.13.0
pi3-slave-01   Ready                      worker   46m   v1.13.0
pi3-slave-02   Ready                      worker   46m   v1.13.0
pi3-slave-03   Ready                      worker   46m   v1.13.0
pi3-slave-04   Ready                      worker   46m   v1.13.0

I used the following commands to create the cluster:

--On Master node--

As root:
kubeadm init --token-ttl=0 --pod-network-cidr=10.244.0.0/16
As my user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

curl -sSL https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml | sed "s/amd64/arm/g" | kubectl create -f -

kubectl -n kube-system patch daemonset kube-flannel-ds \
--patch='{"spec":{"template":{"spec":{"tolerations":[{"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"},{"effect":"NoSchedule","operator":"Exists"}]}}}}'

sudo sysctl net.bridge.bridge-nf-call-iptables=1

--On slave nodes--

kubeadm join 192.168.x.x:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>

--On Master node--

kubectl cordon pi-master
kubectl label node pi3-slave-01 node-role.kubernetes.io/worker=
kubectl label node pi3-slave-02 node-role.kubernetes.io/worker=
kubectl label node pi3-slave-03 node-role.kubernetes.io/worker=
kubectl label node pi3-slave-04 node-role.kubernetes.io/worker=
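The repeated label commands can be folded into a loop. A small sketch with a hypothetical `label_workers` helper; it only wraps the same `kubectl label` call shown above:

```shell
# Hypothetical helper: give every node passed as an argument the "worker" role label.
label_workers() {
  for node in "$@"; do
    kubectl label node "$node" node-role.kubernetes.io/worker=
  done
}
```

Usage: `kubectl cordon pi-master && label_workers pi3-slave-01 pi3-slave-02 pi3-slave-03 pi3-slave-04`.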

ghost commented Dec 11, 2018

Have been playing around with this over the weekend, really enjoying the project!

I hit a block with Kubernetes Dashboard, and realised that I couldn't connect to it via proxy because its service was set as a ClusterIP rather than a NodePort.

* Edit `kubernetes-dashboard` service.
$ kubectl -n kube-system edit service kubernetes-dashboard
* You should see the YAML representation of the service. Change type: **ClusterIP** to type: **NodePort** and save the file.

* Check port on which Dashboard was exposed.
$ kubectl -n kube-system get service kubernetes-dashboard
NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes-dashboard   NodePort   10.108.252.18   <none>        80:30294/TCP   23m
* Create an SSH tunnel to that NodePort (30294 in the output above) to view it in your browser
$ ssh -L 8001:127.0.0.1:30294 <user>@<node-ip>
* Browse [localhost:8001](http://localhost:8001)
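If you'd rather not edit the Service interactively, here is a sketch of the same change as a one-off `kubectl patch`, followed by a `jsonpath` query that prints the assigned NodePort. `expose_dashboard_nodeport` is a hypothetical name, and it assumes the Service is called kubernetes-dashboard in kube-system, as above:

```shell
# Hypothetical helper: switch the dashboard Service to NodePort without an
# editor, then print the node port Kubernetes assigned to it.
expose_dashboard_nodeport() {
  kubectl -n kube-system patch service kubernetes-dashboard \
    -p '{"spec":{"type":"NodePort"}}'
  kubectl -n kube-system get service kubernetes-dashboard \
    -o jsonpath='{.spec.ports[0].nodePort}'
}
```

The printed port is the one to use in the `ssh -L` tunnel.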

Thanks again Alex!

Many thanks, Alex, for a great post, and thanks to Denham: I incorporated your comments into my dashboard config, with help from this post, to get up and running: https://kubecloud.io/kubernetes-dashboard-on-arm-with-rbac-61309310a640 (I had experienced the same RBAC permission issues described there). (from...

$ kubectl create serviceaccount dashboard....

I hope this helps anyone who is getting to grips with a K8s RPi cluster! :D

@b3nw

b3nw commented Dec 18, 2018

A new problem seems to have cropped up with flannel: flannel-io/flannel#1060

vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false E1218 03:12:12.719715 1 main.go:280] Error registering network: failed to configure interface flannel.1: failed to ensure address of interface flannel.1: link has incompatible addresses. Remove additional addresses and try again. &netlink.Vxlan{LinkAttrs:netlink.LinkAttrs{Index:5, MTU:1450, TxQLen:0, Name:"flannel.1", HardwareAddr:net.HardwareAddr{0x2, 0x2a, 0x1c, 0x2f, 0x61, 0x25}, Flags:0x13, RawFlags:0x11043, ParentIndex:0, MasterIndex:0, Namespace:interface {}(nil), Alias:"", Statistics:(*netlink.LinkStatistics)(0x13a320e4), Promisc:0, Xdp:(*netlink.LinkXdp)(0x13812200), EncapType:"ether", Protinfo:(*netlink.Protinfo)(nil), OperState:0x0}, VxlanId:1, VtepDevIndex:2, SrcAddr:net.IP{0xc0, 0xa8, 0x3, 0x32}, Group:net.IP(nil), TTL:0, TOS:0, Learning:false, Proxy:false, RSC:false, L2miss:false, L3miss:false, UDPCSum:true, NoAge:false, GBP:false, Age:300, Limit:0, Port:8472, PortLow:0, PortHigh:0}

To let the pod start, SSH onto the worker and run sudo ip link delete flannel.1; once the pod is recreated, it will start successfully.

@andyburgin

andyburgin commented Dec 23, 2018

Install Kubernetes 1.13.1 on Raspberry Pi Cluster

This comment combines the knowledge of this gist and the many comments above plus the workings of https://github.com/aaronkjones/rpi-k8s-node-prep

Download Raspbian Stretch Lite (2018-11-13, 4.14 kernel) and flash it to the SD cards for your cluster (in my case 5 cards). Once flashed and BEFORE you boot the Pis, set up the networking by mounting each SD card (you can of course just boot the Pis and set up networking on each machine; my personal preference is to do this in advance):

Turn on ssh: sudo touch <boot partition mount point>/ssh

Enable C-Groups: sudo vi <boot partition mount point>/cmdline.txt

dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=PARTUUID=7ee80803-02 rootfstype=ext4 elevator=deadline fsck.repair=yes cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory rootwait quiet init=/usr/lib/raspi-config/init_resize.sh

I'm using wired networking on a 192.168.2.xx subnet, so set up the host entries: sudo vi <rootfs partition mount point>/etc/hosts

 	...add to bottom of file...
	192.168.2.31       node01
	192.168.2.32       node02
	192.168.2.33       node03
	192.168.2.34       node04
	192.168.2.35       node05

sudo vi <rootfs partition mount point>/etc/dhcpcd.conf

	...add to bottom of file...
    interface eth0
	static ip_address=192.168.2.XX/24   <--- change XX = 31,32,33,34,35
	static routers=192.168.2.1
	static domain_name_servers=192.168.2.1
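Since only the last octet changes between cards, these entries can be generated. A sketch assuming the same layout as above (192.168.2.0/24, router and DNS at 192.168.2.1, nodeNN at 192.168.2.(30+NN)); the function names are hypothetical:

```shell
# Hypothetical helpers for the per-card network settings above.
node_ip() {
  echo "192.168.2.$((30 + $1))"
}

# Lines for <rootfs>/etc/hosts (identical on every card)
hosts_entries() {
  for n in 1 2 3 4 5; do
    printf '%s\tnode%02d\n' "$(node_ip "$n")" "$n"
  done
}

# Block for <rootfs>/etc/dhcpcd.conf on node number $1
dhcpcd_entry() {
  ip="$(node_ip "$1")"
  printf '%s\n' \
    'interface eth0' \
    "static ip_address=$ip/24" \
    'static routers=192.168.2.1' \
    'static domain_name_servers=192.168.2.1'
}
```

For example, with node02's card mounted: `dhcpcd_entry 2 | sudo tee -a <rootfs partition mount point>/etc/dhcpcd.conf`.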

Unmount the SD card, then mount each of the other 4 SD cards in turn, repeating the steps above with the appropriate IP address. Once you have set up all 5 SD cards, put them into the Pis and power on the cluster. SSH to each in turn, then complete the configuration and install the software with the steps below.

ssh pi@192.168.2.XX

sudo -i
hostnamectl set-hostname nodeXX   # change to node01/02/03/04/05 as appropriate
apt-get update
apt-get upgrade -y

curl -s https://download.docker.com/linux/raspbian/gpg | sudo apt-key add -
echo "deb [arch=armhf] https://download.docker.com/linux/raspbian stretch edge" | sudo tee /etc/apt/sources.list.d/docker.list
apt-get update -q
apt-get install -y docker-ce=18.06.0~ce~3-0~raspbian --allow-downgrades
echo "docker-ce hold" | sudo dpkg --set-selections
usermod pi -aG docker

dphys-swapfile swapoff
dphys-swapfile uninstall
update-rc.d dphys-swapfile remove

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
apt-get update -q
apt-get install -y kubeadm=1.13.1-00 kubectl=1.13.1-00 kubelet=1.13.1-00

reboot

Setup master node01

ssh pi@192.168.2.31

sudo kubeadm init --token-ttl=0 --apiserver-advertise-address=192.168.2.31 --kubernetes-version v1.13.1

Save the join token and token hash; they will be needed in "Setup slave nodes02-05".

Make a local config for the pi user: log in as pi on node01 and run:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Check that it's working (the DNS pods won't be ready until the overlay network is set up):

kubectl get pods --all-namespaces

Setup Kubernetes overlay networking

kubectl apply -f https://git.io/weave-kube-1.6
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

Setup slave nodes02-05

Join the cluster using the join token and token hash printed when you ran kubeadm init on node01:

sudo kubeadm join 192.168.2.31:6443 --token xxxxxxxxxxxxxxxxxxxxxxxx --discovery-token-ca-cert-hash sha256:yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

Back on the master node01 check the nodes have joined the cluster and that pods are running:

kubectl get nodes
kubectl get pods --all-namespaces

Deploy dashboard

Deploy the TLS-disabled version of the dashboard, first binding its service account to the cluster-admin role:

echo -n 'apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard
  labels:
    k8s-app: kubernetes-dashboard
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system' | kubectl apply -f -

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/alternative/kubernetes-dashboard-arm.yaml

To access the dashboard, start the proxy on node01:
kubectl proxy --address 0.0.0.0 --accept-hosts '.*'

Then from your pc point your browser at:
http://192.168.2.31:8001/api/v1/namespaces/kube-system/services/http:kubernetes-dashboard:/proxy/
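The dashboard URL follows the API server's service proxy path, `/api/v1/namespaces/<namespace>/services/<scheme>:<name>:<port>/proxy/`, with the port name left empty after the trailing colon, as in the URL above. A sketch that builds the same kind of URL for other Services behind the `kubectl proxy` on port 8001; `proxy_url` is a hypothetical helper:

```shell
# Hypothetical helper: build a kubectl-proxy URL for a Service.
# Arguments: proxy host, namespace, service name, optional scheme (default http).
proxy_url() {
  host="$1" namespace="$2" service="$3" scheme="${4:-http}"
  printf 'http://%s:8001/api/v1/namespaces/%s/services/%s:%s:/proxy/\n' \
    "$host" "$namespace" "$scheme" "$service"
}
```

Usage: `proxy_url 192.168.2.31 kube-system kubernetes-dashboard` prints the dashboard URL given above.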

@janpieper

@andyburgin I followed your instructions, but I can't get the master node running...

The kubeadm init [...] did not finish:

[...]
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
        - 'docker ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster

I waited for some time until all pods were "Running":

$ kubectl get pods --all-namespaces

NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
kube-system   etcd-msh-master                      1/1     Running   0          105s
kube-system   kube-apiserver-msh-master            1/1     Running   5          107s
kube-system   kube-controller-manager-msh-master   1/1     Running   0          115s
kube-system   kube-scheduler-msh-master            1/1     Running   0          76s

(is it possible that kube-dns and kube-proxy are missing?)

Then I applied the two weave-net files you mentioned:

kubectl apply -f https://git.io/weave-kube-1.6
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

But the weave-net pod will not become "Running"...

ERROR: logging before flag.Parse: E0120 16:25:32.259195   11085 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:319: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
ERROR: logging before flag.Parse: E0120 16:25:32.267598   11085 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:320: Failed to list *v1.NetworkPolicy: Get https://10.96.0.1:443/apis/networking.k8s.io/v1/networkpolicies?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
ERROR: logging before flag.Parse: E0120 16:25:32.274948   11085 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:318: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout

10.96.0.1 seems to be the kubernetes service IP:

$ kubectl get services
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   17m

@janpieper

Oooookayy... I finally managed to get it working \o/

I wrote a small bash script that waits for /etc/kubernetes/manifests/kube-apiserver.yaml to exist and, as soon as it does, updates failureThreshold (new value: 100) and initialDelaySeconds (new value: 1080). The new values are much bigger than they need to be, but they allowed me to get my master node up and running! Whenever I tried to change these values by hand, the kubeadm init ... command failed.
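A minimal sketch of the approach described above, not the author's actual script: poll until kubeadm writes the kube-apiserver manifest, then raise failureThreshold to 100 and initialDelaySeconds to 1080 with `sed`. The helper names and the `MANIFEST` override are hypothetical, and it assumes GNU sed (`-i`), as shipped with Raspbian:

```shell
# Hypothetical sketch of the workaround above (not the author's script).
# MANIFEST can be overridden, e.g. to try it on a copy of the file.
MANIFEST="${MANIFEST:-/etc/kubernetes/manifests/kube-apiserver.yaml}"

# Raise the API server probe limits in the given manifest.
patch_apiserver_probe() {
  sed -i \
    -e 's/failureThreshold: [0-9]*/failureThreshold: 100/' \
    -e 's/initialDelaySeconds: [0-9]*/initialDelaySeconds: 1080/' \
    "$1"
}

# Poll until kubeadm has written the manifest, then patch it.
wait_and_patch() {
  while [ ! -f "$MANIFEST" ]; do
    sleep 1
  done
  sleep 1  # give kubeadm a moment to finish writing the file
  patch_apiserver_probe "$MANIFEST"
}
```

Run `sudo kubeadm reset` first so a manifest left over from an earlier attempt isn't patched instead, then start `wait_and_patch` as root in a second terminal before running `sudo kubeadm init ...`. The kubelet picks up changes to static Pod manifests and recreates the API server Pod with the relaxed probe.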

@ejeklint

ejeklint commented Feb 3, 2019

I just set up a working cluster but couldn't get the master running on an RPi 2. I moved the SD card over to an RPi 3 and then kubeadm init ran just fine. The worker node seems to run just fine on the RPi 2.

@rnbwkat

rnbwkat commented Feb 4, 2019

Wondering if anyone has gotten helm/tiller working in this configuration?

@GarethOates

@rnbwkat I got it working but I had to specify a different tiller image, one which was compatible with ARM. The command I used was:

helm init --service-account tiller --tiller-image=jessestuart/tiller:v2.9.0

@oprwiz

oprwiz commented Feb 10, 2019

@janpieper, I've run into the "node not found" error. Looking through all the comments, I was going to follow the same steps you did. Which versions of k8s and Docker did you install?

@Jurgen-Allewijn

I tried to set up the cluster following the steps described but still didn't get a successful kubeadm init. I tried different versions of k8s and Docker. Does anybody have the steps to get 1.13.3 working with Docker 18.09.0?

@Jurgen-Allewijn

@janpieper can you share the script?

@sinfloodmusic

sinfloodmusic commented Jul 11, 2019

@janpieper, your steps worked up until the point everyone mentioned. Rather than a script that polls for the config and patches it, I found you can do the same (after the initial failure) by running these commands (lifted from kubernetes/kubeadm#1380):

sudo kubeadm reset
sudo kubeadm init phase certs all
sudo kubeadm init phase kubeconfig all
sudo kubeadm init phase control-plane all --pod-network-cidr 10.244.0.0/16
sudo sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 240/g' /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/failureThreshold: [0-9]/failureThreshold: 18/g'             /etc/kubernetes/manifests/kube-apiserver.yaml
sudo sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 20/g'            /etc/kubernetes/manifests/kube-apiserver.yaml
sudo kubeadm init --v=1 --skip-phases=certs,kubeconfig,control-plane --ignore-preflight-errors=all --pod-network-cidr 10.244.0.0/16

Then I installed flannel.

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.11.0/Documentation/kube-flannel.yml

Something that threw me off was that the shell demo Kubernetes provides works fine (kubectl apply -f https://k8s.io/examples/application/shell-demo.yaml), docs here:
https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/

But deploying nginx from their example fails:
https://kubernetes.io/docs/tasks/run-application/run-stateless-application-deployment/

It turns out the nginx image used there isn't compatible with ARM; once I changed the image to a Pi-supported one (tobi312/rpi-nginx), it worked fine! Thanks to everyone here, I finally got my Pi cluster going.
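A sketch of the change described above: a Deployment in the shape of the stateless nginx example, with only the image swapped for tobi312/rpi-nginx. The function name, Deployment name, labels and replica count are illustrative assumptions:

```shell
# Hypothetical helper: print a Deployment manifest that uses the
# Pi-compatible tobi312/rpi-nginx image instead of the stock nginx image.
nginx_arm_manifest() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: tobi312/rpi-nginx
EOF
}
```

Usage: `nginx_arm_manifest | kubectl apply -f -`, in the same pipe-into-`kubectl apply` style as the dashboard ClusterRoleBinding earlier in the thread.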
