Installing the Prometheus chart and wiring up Grafana with Kubernetes Helm (1/2)
Grafana for monitoring Kubernetes cluster resources usually goes hand in hand with Prometheus.
There are several ways to install them, and the most convenient is through a Helm chart.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update \
&& helm upgrade --install kube-stack-prometheus prometheus-community/kube-prometheus-stack \
-n prom-monitoring --create-namespace \
--set prometheus-node-exporter.hostRootFsMount.enabled=false \
--no-hooks
Installing this way gives you Prometheus and Grafana together.
I consider helm upgrade --install and --no-hooks important here,
because the install fails fairly often with errors like this:
Error: failed pre-install: warning: Hook pre-install kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/job-createSecret.yaml failed: an error on the server ("error trying to reach service: tunnel disconnect") has prevented the request from succeeding (post jobs.batch)
My command-line tooling automatically restarts the installation whenever an error occurs, but because helm install was run together with --create-namespace, the retry kept failing on the already-existing namespace. That is why I switched from install to upgrade --install, and then added --no-hooks on top.
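The retry idea can be sketched like this. do_install below is a stand-in that fails once and then succeeds; in a real script it would be the helm upgrade --install ... --no-hooks command above, which is safe to re-run precisely because it is idempotent:

```shell
#!/bin/sh
# Hedged sketch of an auto-retry wrapper. `do_install` is a stand-in
# for the idempotent `helm upgrade --install ... --no-hooks` command;
# here it simulates one transient failure followed by success.
tries=0
do_install() {
  tries=$((tries + 1))
  [ "$tries" -ge 2 ]    # first call fails, second succeeds
}

attempt=1
max=3
while [ "$attempt" -le "$max" ]; do
  if do_install; then
    echo "install succeeded on attempt $attempt"
    break
  fi
  echo "attempt $attempt failed; retrying" >&2
  attempt=$((attempt + 1))
done
```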
Anyway, the installation goes like this:
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "longhorn" chart repository
...Successfully got an update from the "traefik" chart repository
...Successfully got an update from the "rook-release" chart repository
...Successfully got an update from the "ingress-nginx" chart repository
...Successfully got an update from the "jetstack" chart repository
...Successfully got an update from the "rancher-stable" chart repository
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
Release "kube-stack-prometheus" has been upgraded. Happy Helming!
NAME: kube-stack-prometheus
LAST DEPLOYED: Thu May 19 10:51:21 2022
NAMESPACE: prom-monitoring
STATUS: deployed
REVISION: 2
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace prom-monitoring get pods -l "release=kube-stack-prometheus"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
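Once the pods are up, Grafana can be reached with a port-forward. A hedged sketch — the secret and service names below are what the chart typically derives from the release name kube-stack-prometheus, so verify them with kubectl -n prom-monitoring get svc,secret before relying on them:

```shell
# Assumed names derived from the release name "kube-stack-prometheus";
# check `kubectl -n prom-monitoring get svc,secret` if these differ.

# Grafana admin password (the default user is "admin")
kubectl -n prom-monitoring get secret kube-stack-prometheus-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo

# Forward the Grafana service to localhost:3000
kubectl -n prom-monitoring port-forward svc/kube-stack-prometheus-grafana 3000:80
```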
---
As an aside, removal went like this:
helm uninstall kube-stack-prometheus -n prom-monitoring
kubectl delete namespace prom-monitoring
helm repo remove prometheus-community
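One caveat with the commands above: by design, helm uninstall leaves behind any CRDs a chart installed, so the Prometheus Operator CRDs survive this teardown. A quick way to see what is left (CRDs are cluster-scoped, so no namespace flag is needed):

```shell
# Helm deliberately does not delete CRDs on uninstall, so the
# Prometheus Operator CRDs remain after `helm uninstall`.
kubectl get crd | grep monitoring.coreos.com
```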
Install and removal both take unusually long for a Helm chart, which seems to be because the stack monitors all of the cluster's CRDs.
kubectl get crd -n prom-monitoring
NAME CREATED AT
alertmanagerconfigs.monitoring.coreos.com 2022-05-19T01:25:37Z
alertmanagers.monitoring.coreos.com 2022-05-16T05:39:46Z
apps.catalog.cattle.io 2022-05-16T05:39:53Z
backingimagedatasources.longhorn.io 2022-05-16T09:34:32Z
backingimagemanagers.longhorn.io 2022-05-16T09:34:33Z
backingimages.longhorn.io 2022-05-16T09:34:33Z
backups.longhorn.io 2022-05-16T09:34:33Z
backuptargets.longhorn.io 2022-05-16T09:34:33Z
backupvolumes.longhorn.io 2022-05-16T09:34:33Z
bgpconfigurations.crd.projectcalico.org 2022-05-16T05:36:26Z
bgppeers.crd.projectcalico.org 2022-05-16T05:36:26Z
blockaffinities.crd.projectcalico.org 2022-05-16T05:36:26Z
clusterauthtokens.cluster.cattle.io 2022-05-16T05:36:47Z
clusterinformations.crd.projectcalico.org 2022-05-16T05:36:26Z
clusterrepos.catalog.cattle.io 2022-05-16T05:39:53Z
clusters.management.cattle.io 2022-05-16T05:39:53Z
clusteruserattributes.cluster.cattle.io 2022-05-16T05:36:47Z
engineimages.longhorn.io 2022-05-16T09:34:33Z
engines.longhorn.io 2022-05-16T09:34:33Z
features.management.cattle.io 2022-05-16T05:39:53Z
felixconfigurations.crd.projectcalico.org 2022-05-16T05:36:26Z
globalnetworkpolicies.crd.projectcalico.org 2022-05-16T05:36:26Z
globalnetworksets.crd.projectcalico.org 2022-05-16T05:36:26Z
hostendpoints.crd.projectcalico.org 2022-05-16T05:36:26Z
instancemanagers.longhorn.io 2022-05-16T09:34:33Z
ipamblocks.crd.projectcalico.org 2022-05-16T05:36:26Z
ipamconfigs.crd.projectcalico.org 2022-05-16T05:36:26Z
ipamhandles.crd.projectcalico.org 2022-05-16T05:36:26Z
ippools.crd.projectcalico.org 2022-05-16T05:36:26Z
kubecontrollersconfigurations.crd.projectcalico.org 2022-05-16T05:36:26Z
networkpolicies.crd.projectcalico.org 2022-05-16T05:36:26Z
networksets.crd.projectcalico.org 2022-05-16T05:36:26Z
nodes.longhorn.io 2022-05-16T09:34:33Z
operations.catalog.cattle.io 2022-05-16T05:39:53Z
podmonitors.monitoring.coreos.com 2022-05-19T01:25:37Z
preferences.management.cattle.io 2022-05-16T05:39:53Z
probes.monitoring.coreos.com 2022-05-19T01:25:37Z
prometheuses.monitoring.coreos.com 2022-05-16T05:39:46Z
prometheusrules.monitoring.coreos.com 2022-05-16T05:39:46Z
recurringjobs.longhorn.io 2022-05-16T09:34:33Z
replicas.longhorn.io 2022-05-16T09:34:33Z
servicemonitors.monitoring.coreos.com 2022-05-16T05:39:46Z
settings.longhorn.io 2022-05-16T09:34:33Z
settings.management.cattle.io 2022-05-16T05:39:53Z
sharemanagers.longhorn.io 2022-05-16T09:34:33Z
thanosrulers.monitoring.coreos.com 2022-05-19T01:25:38Z
volumes.longhorn.io 2022-05-16T09:34:32Z
That list is specific to my cluster, of course, but installs and removals do seem to get slower as the cluster accumulates resources.
---
Hahaha — while writing this very post, everything went sideways in real time.
The problem: during chart removal, even after I had deleted the prom-monitoring namespace, kubectl get crd -n prom-monitoring kept returning the list above.
So I just went ahead and did it.
kubectl delete crd --all -n prom-monitoring
warning: deleting cluster-scoped resources, not scoped to the provided namespace
customresourcedefinition.apiextensions.k8s.io "alertmanagerconfigs.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "alertmanagers.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "apps.catalog.cattle.io" deleted
customresourcedefinition.apiextensions.k8s.io "backingimagedatasources.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "backingimagemanagers.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "backingimages.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "backups.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "backuptargets.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "backupvolumes.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "bgpconfigurations.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "bgppeers.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "blockaffinities.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "clusterauthtokens.cluster.cattle.io" deleted
customresourcedefinition.apiextensions.k8s.io "clusterinformations.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "clusterrepos.catalog.cattle.io" deleted
customresourcedefinition.apiextensions.k8s.io "clusters.management.cattle.io" deleted
customresourcedefinition.apiextensions.k8s.io "clusteruserattributes.cluster.cattle.io" deleted
customresourcedefinition.apiextensions.k8s.io "engineimages.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "engines.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "features.management.cattle.io" deleted
customresourcedefinition.apiextensions.k8s.io "felixconfigurations.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "globalnetworkpolicies.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "globalnetworksets.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "hostendpoints.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "instancemanagers.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "ipamblocks.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "ipamconfigs.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "ipamhandles.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "ippools.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "kubecontrollersconfigurations.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "networkpolicies.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "networksets.crd.projectcalico.org" deleted
customresourcedefinition.apiextensions.k8s.io "nodes.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "operations.catalog.cattle.io" deleted
customresourcedefinition.apiextensions.k8s.io "podmonitors.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "preferences.management.cattle.io" deleted
customresourcedefinition.apiextensions.k8s.io "probes.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "prometheusrules.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "recurringjobs.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "replicas.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "settings.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "settings.management.cattle.io" deleted
customresourcedefinition.apiextensions.k8s.io "sharemanagers.longhorn.io" deleted
customresourcedefinition.apiextensions.k8s.io "thanosrulers.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "volumes.longhorn.io" deleted
Hahaha.
Something felt off.
The result?
My perfectly healthy Longhorn install, completely nuked. Hahahaha.
Now the whole cluster has to come back up from scratch. I should have spun up a test cluster to practice Grafana on... I'm a bit sad.
I'll go bring it back up and come right back...
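In hindsight, the root cause is that CRDs are cluster-scoped objects: the -n prom-monitoring flag was silently ignored (exactly what the warning line said), and --all matched every CRD in the cluster, Longhorn's and Calico's included. A safer teardown, roughly following the kube-prometheus-stack README, deletes only this chart's CRDs by full name:

```shell
# Confirm that CRDs are cluster-scoped (so -n does nothing on them):
kubectl api-resources --namespaced=false | grep customresourcedefinitions

# Delete only the CRDs owned by this chart, by full name:
kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd probes.monitoring.coreos.com
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
```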