SpoddyCoder/fix-kube-prometheus-stack-on-docker-dekstop-for-mac.md

## fix-kube-prometheus-stack-on-docker-dekstop-for-mac.md

      
Kube-Prometheus-Stack on Docker Desktop for Mac (Sonoma 14.3 + Docker Desktop 4.27)


This can be awkward to set up: by default, Prometheus will complain that 4 targets are down.
prometheus-community/helm-charts#812
The majority of these fixes come from good work by @LanDinh, who went through a similar process on Windows: https://groups.google.com/g/prometheus-users/c/_aI-HySJ-xM/m/kqrL1FYVCQAJ
The underlying cause is a problem with the way kubeadm sets up the default cluster: kubernetes/kubeadm#2388

Initial Setup - reconfigure docker-desktop to expose metrics endpoints


The default docker-desktop setup does not allow Prometheus to scrape metrics from the following kube-system pods:

kube-controller-manager-docker-desktop
etcd-docker-desktop
kube-scheduler-docker-desktop
kube-proxy-{id}


Edit the configmap kube-proxy in the kube-system namespace

metricsBindAddress: 127.0.0.1:10249 change to metricsBindAddress: 0.0.0.0:10249
Manually delete the kube-proxy-{id} pod and let it reprovision
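
The substitution itself can be tried without a cluster. The following is a minimal sketch, assuming the configmap contains the exact string shown above; `patch_metrics_bind` is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: rewrite the kube-proxy metrics bind address in a config read from stdin.
# patch_metrics_bind is a hypothetical helper, not part of kubectl.
patch_metrics_bind() {
    sed 's/metricsBindAddress: 127\.0\.0\.1:10249/metricsBindAddress: 0.0.0.0:10249/'
}

# Dry run on a sample fragment (no cluster needed).
printf '    metricsBindAddress: 127.0.0.1:10249\n' | patch_metrics_bind
```

Against a live cluster, the same filter could sit between `kubectl get configmap kube-proxy -n kube-system -o yaml` and `kubectl replace -f -`, followed by `kubectl delete pod -n kube-system -l k8s-app=kube-proxy` to reprovision the pod.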


Docker-desktop runs inside a virtual machine, so to edit the cluster manifest files we must shell into the host using nsenter:

docker run -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i sh
cd /etc/kubernetes/manifests
vi kube-controller-manager.yaml


kube-controller-manager-docker-desktop

Edit kube-controller-manager.yaml
bind-address=127.0.0.1 change to bind-address=0.0.0.0
Save + exit, wait for the pod to reload automatically


kube-scheduler-docker-desktop

Edit kube-scheduler.yaml
bind-address=127.0.0.1 change to bind-address=0.0.0.0
Save + exit, the pod should reload automatically
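
If you would rather preview the bind-address change before touching the real manifests, something like the following may help. It is only a sketch run against a throwaway file; the sample content is illustrative and not a complete static pod manifest.

```shell
#!/usr/bin/env bash
# Sketch: preview the bind-address change on a throwaway copy of a manifest.
# The sample content below is illustrative, not a full static pod manifest.
sample=$(mktemp)
cat > "$sample" <<'EOF'
    - kube-scheduler
    - --bind-address=127.0.0.1
    - --leader-elect=true
EOF

# Write to a second file rather than using `sed -i`, whose syntax differs
# between macOS (BSD) sed and GNU sed.
patched=$(mktemp)
sed 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' "$sample" > "$patched"

# Show exactly what would change; diff exits non-zero when the files differ.
diff "$sample" "$patched" || true
```

The same expression applies to both kube-controller-manager.yaml and kube-scheduler.yaml, since both use the same flag.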


etcd-docker-desktop

Edit etcd.yaml
listen-metrics-urls=http://127.0.0.1:<port> change to listen-metrics-urls=http://127.0.0.1:<port>,http://<cluster IP>:2381 - the cluster IP here is the node's IP, which you can find in the other settings in that same file
Save + exit - this will make the cluster unresponsive temporarily, wait a minute or two and it should come back with the pod restarted
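
Because this edit appends a URL rather than swapping one value for another, it is easy to apply twice by accident. The sketch below shows one way to keep it idempotent. `add_etcd_metrics_url` is a hypothetical helper and `192.168.65.3` is a placeholder IP; substitute the address from your own etcd.yaml.

```shell
#!/usr/bin/env bash
# Sketch: append a node-IP metrics URL to etcd's listen-metrics-urls flag,
# skipping the edit when the URL is already present.
# add_etcd_metrics_url is a hypothetical helper; port 2381 comes from the text above.
add_etcd_metrics_url() {
    local ip="$1" file="$2"
    if grep -q "http://${ip}:2381" "$file"; then
        return 0    # already patched, nothing to do
    fi
    # `&` in the replacement re-inserts the matched text, so the original
    # 127.0.0.1 URL is kept and the new one is appended after a comma.
    sed "s|--listen-metrics-urls=http://127\.0\.0\.1:[0-9]*|&,http://${ip}:2381|" "$file" > "${file}.tmp" \
        && mv "${file}.tmp" "$file"
}

# Dry run on a sample line; 192.168.65.3 is a placeholder node IP.
demo=$(mktemp)
printf '    - --listen-metrics-urls=http://127.0.0.1:2381\n' > "$demo"
add_etcd_metrics_url 192.168.65.3 "$demo"
add_etcd_metrics_url 192.168.65.3 "$demo"   # second call is a no-op
cat "$demo"
```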


You should see all targets UP in the Prometheus dashboard now :)

Now you can install kube-prometheus-stack helm chart to deploy all the necessary components

https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
e.g. to install the components into the monitoring namespace...
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install --wait \
            --values kube-prom-stack-values.yaml \
            --create-namespace \
            --namespace monitoring \
            prometheus \
            prometheus-community/kube-prometheus-stack \
            --debug \
            --timeout 5m

Small fix for the Grafana dashboard contained in kube-prom-stack-values.yaml...
# fix grafana dashboard: https://github.com/prometheus-community/helm-charts/issues/3800
grafana:
  serviceMonitor: 
    labels:
      release: prometheus

docker-desktop update script


You will need to repeat the steps outlined here every time the docker-desktop cluster is reset :(
A script would be nice, huh? Yvw! :)
Disclaimer: this works for me. No claims made on robustness / portability.

#!/usr/bin/env bash

echo
echo "Updating docker-desktop pods to expose metrics endpoints"
echo "This will involve several kube-system pod restarts" 
echo

echo "Fetching debian image to run nsenter on the docker-desktop host..."
docker pull debian

NODE_IP=$(kubectl get nodes -o wide --no-headers | awk '{print $6}')    # column 6 is INTERNAL-IP
echo "Host Node IP: $NODE_IP"

echo "Updating kube-proxy configmap..."
MOD_YAML="/tmp/modify.yaml"
kubectl get configmap/kube-proxy -n kube-system -o yaml > "$MOD_YAML"
if grep -q "metricsBindAddress: 127.0.0.1:10249" "$MOD_YAML"; then
    sed -i '' 's/metricsBindAddress: 127.0.0.1:10249/metricsBindAddress: 0.0.0.0:10249/g' "$MOD_YAML"     # BSD (macOS) sed needs the empty '' after -i
    kubectl replace -f "$MOD_YAML"     # update the configmap in place rather than delete + create
    echo "Restarting the kube-proxy pod"
    kubectl delete pod -n kube-system -l k8s-app=kube-proxy
    if ! kubectl wait -n kube-system --timeout=3m --for=condition=Ready pod -l k8s-app=kube-proxy; then
        echo "kube-proxy pod did not restart in time, please check the pod logs."
        exit 1
    fi
    echo "kube-proxy pod restarted."
else
    echo "kube-proxy metricsBindAddress already updated, skipping."
fi
rm -f $MOD_YAML

echo "Updating bind-address on kube-controller-manager..."
if kubectl describe pod kube-controller-manager-docker-desktop -n kube-system | grep -q "bind-address=127.0.0.1"; then
    docker run --rm -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i \
        sh -c "sed -i 's/--bind-address=127.0.0.1/--bind-address=0.0.0.0/g' /etc/kubernetes/manifests/kube-controller-manager.yaml"
    echo "Waiting for kube-controller-manager to restart, this can take some time..."
    kubectl wait pod -l component=kube-controller-manager -n kube-system --timeout=3m --for=delete 
    if ! kubectl wait pod -l component=kube-controller-manager -n kube-system --timeout=3m --for=condition=Ready; then
        echo "kube-controller-manager pod did not restart in time, please check the pod logs."
        exit 1
    fi
    echo "kube-controller-manager pod restarted."
else
    echo "kube-controller-manager bind-address already updated, skipping."
fi

echo "Updating bind-address on kube-scheduler"
if kubectl describe pod kube-scheduler-docker-desktop -n kube-system | grep -q "bind-address=127.0.0.1"; then
    docker run --rm -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i \
        sh -c "sed -i 's/--bind-address=127.0.0.1/--bind-address=0.0.0.0/g' /etc/kubernetes/manifests/kube-scheduler.yaml"
    echo "Waiting for kube-scheduler to restart, this can take some time..."
    kubectl wait pod -l component=kube-scheduler -n kube-system --timeout=3m --for=delete
    if ! kubectl wait pod -l component=kube-scheduler -n kube-system --timeout=3m --for=condition=Ready; then
        echo "kube-scheduler pod did not restart in time, please check the pod logs."
        exit 1
    fi
    echo "kube-scheduler pod restarted."
else
    echo "kube-scheduler bind-address already updated, skipping."
fi

echo "Adding node ip to listen-metrics-urls on etcd"
if kubectl describe pod etcd-docker-desktop -n kube-system | grep "listen-metrics-urls" | grep -q "http://${NODE_IP}:2381"; then
    echo "etcd listen-metrics-urls already updated, skipping."
else
    docker run --rm -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i \
        sh -c "sed -i 's|--listen-metrics-urls=http://127.0.0.1:2381|&,http://${NODE_IP}:2381|g' /etc/kubernetes/manifests/etcd.yaml"
    echo "Waiting for etcd to restart, this can take some time..."
    kubectl wait pod -l component=etcd -n kube-system --timeout=3m --for=delete                         # as soon as etcd goes down this will respond with an error from the api server
    sleep 10                                                                                            # so we wait for a few seconds for the api server to reboot & then we can run kubectl commands again
    if ! kubectl wait pod -l component=etcd -n kube-system --timeout=3m --for=condition=Ready; then     # if all gone well this should respond immediately
        echo "etcd pod did not restart in time - this may just be the api server still rebooting, give it a few minutes before panicking."
    fi
fi

echo
echo "Done! You can now deploy the monitoring components."
echo
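
The node IP lookup near the top of the script relies on the column order of `kubectl get nodes -o wide`. A quick way to see what it extracts, without a cluster, is to run the same awk expression over a sample line. The line below is made-up output for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: pull the INTERNAL-IP column out of `kubectl get nodes -o wide --no-headers`.
# The line below is made-up sample output, so no cluster is needed to try it.
sample_line='docker-desktop   Ready   control-plane   10d   v1.29.1   192.168.65.3   <none>   Docker Desktop   6.6.12-linuxkit   docker://25.0.2'

# Columns: NAME STATUS ROLES AGE VERSION INTERNAL-IP ...; awk splits on runs of whitespace.
node_ip=$(printf '%s\n' "$sample_line" | awk '{print $6}')
echo "Host Node IP: $node_ip"
```

An alternative that does not depend on column order is `kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'`. Both approaches assume a single-node cluster, which is what docker-desktop provides.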