- This can be awkward to set up: by default Prometheus will report 4 targets as down.
- prometheus-community/helm-charts#812
- Most of these fixes come from the good work of @LanDinh, who went through a similar process on Windows: https://groups.google.com/g/prometheus-users/c/_aI-HySJ-xM/m/kqrL1FYVCQAJ
- The underlying cause is a problem with the way `kubeadm` sets up the default cluster: kubernetes/kubeadm#2388
- The default setup of docker-desktop does not allow Prometheus to scrape metrics from the following `kube-system` pods:
  - `kube-controller-manager-docker-desktop`
  - `etcd-docker-desktop`
  - `kube-scheduler-docker-desktop`
  - `kube-proxy-{id}`
- Edit the configmap `kube-proxy` in the `kube-system` namespace: change `metricsBindAddress: 127.0.0.1:10249` to `metricsBindAddress: 0.0.0.0:10249`
- Manually delete the `kube-proxy-{id}` pod and let it reprovision
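
The two kube-proxy steps above can also be done without opening an editor. What follows is a minimal sketch rather than a tested procedure: `rewrite_metrics_bind` is a hypothetical helper name, and it assumes the configmap still holds the default `127.0.0.1:10249` value.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the kube-proxy metrics bind address in YAML
# read from stdin so the endpoint listens on all interfaces.
rewrite_metrics_bind() {
  sed 's/metricsBindAddress: 127\.0\.0\.1:10249/metricsBindAddress: 0.0.0.0:10249/'
}

# Demonstration on a sample line; nothing in the cluster is touched.
printf '    metricsBindAddress: 127.0.0.1:10249\n' | rewrite_metrics_bind

# Against a real cluster the pieces could be chained roughly like this
# (left commented out because it modifies the cluster):
#   kubectl get configmap kube-proxy -n kube-system -o yaml \
#     | rewrite_metrics_bind \
#     | kubectl replace -f -
#   kubectl delete pod -n kube-system -l k8s-app=kube-proxy
```

Keeping the substitution in a stdin-to-stdout function makes it easy to try on sample text before pointing it at the cluster.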
- Docker-desktop runs inside a virtual machine, so to edit the cluster manifest files we must shell into the host using `nsenter`:
docker run -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i sh
cd /etc/kubernetes/manifests
vi kube-controller-manager.yaml
kube-controller-manager-docker-desktop
- Edit `kube-controller-manager.yaml`: change `bind-address=127.0.0.1` to `bind-address=0.0.0.0`
- Save + exit, wait for the pod to reload automatically
kube-scheduler-docker-desktop
- Edit `kube-scheduler.yaml`: change `bind-address=127.0.0.1` to `bind-address=0.0.0.0`
- Save + exit, the pod should reload automatically
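
The controller-manager and scheduler edits are the same substitution applied to two different manifests. A minimal sketch of that substitution, assuming the flag appears in the manifest exactly as `--bind-address=127.0.0.1`; the function name `rewrite_bind_address` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: switch --bind-address from loopback to all interfaces
# in a static pod manifest read from stdin.
rewrite_bind_address() {
  sed 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/'
}

# Demonstration on a sample manifest line (no cluster access needed).
printf '    - --bind-address=127.0.0.1\n' | rewrite_bind_address

# From the nsenter shell on the docker-desktop VM, the same expression could
# be applied in place instead of using vi (commented out: it edits the host):
#   sed -i 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' \
#     /etc/kubernetes/manifests/kube-controller-manager.yaml
#   sed -i 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' \
#     /etc/kubernetes/manifests/kube-scheduler.yaml
```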
etcd-docker-desktop
- Edit `etcd.yaml`: change `listen-metrics-urls=http://127.0.0.1:<port>` to `listen-metrics-urls=http://127.0.0.1:<port>,http://<cluster IP>:2381`
  - You can find the cluster IP in the other settings in that same file
- Save + exit. This will make the cluster temporarily unresponsive; wait a minute or two and it should come back with the pod restarted
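
The etcd change appends a second URL rather than replacing the first, which is easy to get wrong by hand. Here is a minimal sketch of the edit as a filter. `add_etcd_metrics_url` is a hypothetical helper, and `192.168.65.4` is only a placeholder for your cluster IP.

```shell
#!/bin/sh
# Hypothetical helper: append http://<ip>:2381 to etcd's --listen-metrics-urls
# flag. Reads a manifest on stdin; $1 is the cluster/node IP.
# The trailing $ anchor only matches a flag that still has a single URL, so
# running the filter twice does not append the address a second time.
add_etcd_metrics_url() {
  sed "s|\(--listen-metrics-urls=http://127\.0\.0\.1:[0-9]*\)\$|\1,http://$1:2381|"
}

# Demonstration on a sample manifest line with a placeholder IP.
printf '    - --listen-metrics-urls=http://127.0.0.1:2381\n' \
  | add_etcd_metrics_url 192.168.65.4
```

Using `|` as the sed delimiter avoids having to escape every `/` in the URLs.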
- You should now see all targets UP in the Prometheus dashboard :)
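
If a target is still down, it can help to know which endpoint Prometheus is trying to reach. The sketch below prints the expected metrics URL per component. The ports are the upstream defaults (2381 matches the etcd step above), `metrics_port` is a hypothetical helper, and `NODE_IP` is an assumed variable for the node's internal IP. The URLs are only meaningful from inside the cluster network, and the `https` endpoints normally require a bearer token.

```shell
#!/bin/sh
# Hypothetical helper: map a component to its default metrics port.
metrics_port() {
  case "$1" in
    kube-proxy)              echo 10249 ;;
    kube-controller-manager) echo 10257 ;;
    kube-scheduler)          echo 10259 ;;
    etcd)                    echo 2381 ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# NODE_IP is an assumption; it falls back to loopback when unset.
NODE_IP="${NODE_IP:-127.0.0.1}"
for c in kube-proxy kube-controller-manager kube-scheduler etcd; do
  # The controller-manager and scheduler serve metrics over TLS.
  case "$c" in
    kube-controller-manager|kube-scheduler) scheme=https ;;
    *) scheme=http ;;
  esac
  echo "$c: $scheme://$NODE_IP:$(metrics_port "$c")/metrics"
done
```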
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
e.g. to install the components into the `monitoring` namespace:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install --wait \
  --values kube-prom-stack-values.yaml \
  --create-namespace \
  --namespace monitoring \
  prometheus \
  prometheus-community/kube-prometheus-stack \
  --debug \
  --timeout 5m
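
Before running the install it may be worth checking that the tools are actually on the PATH. A small sketch; `require_cmd` is a hypothetical helper, and the follow-up commands in the comments need a live cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "missing required command: $1" >&2
    return 1
  }
}

# Pre-flight check for the tools used by the install.
for tool in helm kubectl; do
  require_cmd "$tool" || echo "install $tool before continuing"
done

# After the install, the release can be inspected with:
#   helm status prometheus --namespace monitoring
#   kubectl get pods --namespace monitoring
```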
Small fix for the Grafana dashboard contained in kube-prom-stack-values.yaml
...
# fix grafana dashboard: https://github.com/prometheus-community/helm-charts/issues/3800
grafana:
  serviceMonitor:
    labels:
      release: prometheus
- You will need to repeat the steps outlined here every time the docker-desktop cluster is reset :(
- A script would be nice, huh? Yvw! :)
- Disclaimer: this works for me. No claims made on robustness / portability.
#!/usr/bin/env bash
echo
echo "Updating docker-desktop pods to expose metrics endpoints"
echo "This will involve several kube-system pod restarts"
echo
echo "Fetching debian image to run nsenter on the docker-desktop host..."
docker pull debian
# Column 6 of the wide output is INTERNAL-IP
NODE_IP=$(kubectl get nodes -o wide --no-headers | awk '{print $6}')
echo "Host Node IP: $NODE_IP"
echo "Updating kube-proxy configmap..."
MOD_YAML="/tmp/modify.yaml"
kubectl get configmap/kube-proxy -n kube-system -o yaml > "$MOD_YAML"
if grep -q "metricsBindAddress: 127.0.0.1:10249" "$MOD_YAML"; then
  # BSD/macOS sed syntax: -i takes an explicit (here empty) backup suffix
  sed -i '' 's/metricsBindAddress: 127.0.0.1:10249/metricsBindAddress: 0.0.0.0:10249/g' "$MOD_YAML"
  kubectl delete configmap/kube-proxy -n kube-system
  kubectl create -f "$MOD_YAML"
  echo "Restarting the kube-proxy pod"
  kubectl delete pod -n kube-system -l k8s-app=kube-proxy
  if ! kubectl wait -n kube-system --timeout=3m --for=condition=Ready pod -l k8s-app=kube-proxy; then
    echo "kube-proxy pod did not restart in time, please check the pod logs."
    exit 1
  fi
  echo "kube-proxy pod restarted."
else
  echo "kube-proxy metricsBindAddress already updated, skipping."
fi
rm -f "$MOD_YAML"
echo "Updating bind-address on kube-controller-manager..."
if kubectl describe pod kube-controller-manager-docker-desktop -n kube-system | grep -q "bind-address=127.0.0.1"; then
  docker run -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i \
    sh -c "sed -i 's/--bind-address=127.0.0.1/--bind-address=0.0.0.0/g' /etc/kubernetes/manifests/kube-controller-manager.yaml"
  echo "Waiting for kube-controller-manager to restart, this can take some time..."
  kubectl wait pod -l component=kube-controller-manager -n kube-system --timeout=3m --for=delete
  if ! kubectl wait pod -l component=kube-controller-manager -n kube-system --timeout=3m --for=condition=Ready; then
    echo "kube-controller-manager pod did not restart in time, please check the pod logs."
    exit 1
  fi
  echo "kube-controller-manager pod restarted."
else
  echo "kube-controller-manager bind-address already updated, skipping."
fi
echo "Updating bind-address on kube-scheduler"
if kubectl describe pod kube-scheduler-docker-desktop -n kube-system | grep -q "bind-address=127.0.0.1"; then
  docker run -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i \
    sh -c "sed -i 's/--bind-address=127.0.0.1/--bind-address=0.0.0.0/g' /etc/kubernetes/manifests/kube-scheduler.yaml"
  echo "Waiting for kube-scheduler to restart, this can take some time..."
  kubectl wait pod -l component=kube-scheduler -n kube-system --timeout=3m --for=delete
  if ! kubectl wait pod -l component=kube-scheduler -n kube-system --timeout=3m --for=condition=Ready; then
    echo "kube-scheduler pod did not restart in time, please check the pod logs."
    exit 1
  fi
  echo "kube-scheduler pod restarted."
else
  echo "kube-scheduler bind-address already updated, skipping."
fi
echo "Adding node ip to listen-metrics-urls on etcd"
if kubectl describe pod etcd-docker-desktop -n kube-system | grep "listen-metrics-urls" | grep -q "http://${NODE_IP}:2381"; then
  echo "etcd listen-metrics-urls already updated, skipping."
else
  docker run -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i \
    sh -c "sed -i 's/--listen-metrics-urls=http:\/\/127.0.0.1\:2381/--listen-metrics-urls=http:\/\/127.0.0.1\:2381,http:\/\/${NODE_IP}\:2381/g' /etc/kubernetes/manifests/etcd.yaml"
  echo "Waiting for etcd to restart, this can take some time..."
  # As soon as etcd goes down this will respond with an error from the api server
  kubectl wait pod -l component=etcd -n kube-system --timeout=3m --for=delete
  # Wait a few seconds for the api server to reboot before running kubectl commands again
  sleep 10
  # If all has gone well this should respond immediately
  if ! kubectl wait pod -l component=etcd -n kube-system --timeout=3m --for=condition=Ready; then
    echo "etcd pod did not restart in time - this may just be the api server still rebooting, give it a few minutes before panicking."
  fi
fi
echo
echo "Done! You can now deploy the monitoring components."
echo
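
The `sed -i ''` call in the script is the BSD/macOS form and fails with GNU sed on Linux. If the script needs to run on both, one option is a small wrapper like the sketch below. `sed_inplace` is a hypothetical helper, and the detection relies on GNU sed accepting `--version` while BSD sed rejects it.

```shell
#!/bin/sh
# Hypothetical helper: in-place sed for both GNU and BSD sed.
# $1 = sed expression, $2 = file to edit
sed_inplace() {
  if sed --version >/dev/null 2>&1; then
    sed -i "$1" "$2"       # GNU sed: -i takes no separate argument
  else
    sed -i '' "$1" "$2"    # BSD sed: -i needs a backup suffix, empty here
  fi
}

# Demonstration on a throwaway file.
tmp=$(mktemp)
echo 'metricsBindAddress: 127.0.0.1:10249' > "$tmp"
sed_inplace 's/127\.0\.0\.1:10249/0.0.0.0:10249/' "$tmp"
cat "$tmp"
rm -f "$tmp"
```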