k3s Adventures (Part 2): A Monitoring System

In the previous post we got k3s up and running and tried deploying an app inside it. But before moving any production projects in, a monitoring and alerting system is a must. So this time, let's put together a simple monitoring stack.

An Introduction to Prometheus

Prometheus is an open-source combination of a monitoring system, an alerting system, and a time-series database.

Since Prometheus joined the CNCF in 2016, it has gradually become the de facto standard for monitoring inside container platforms.
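
As a small taste of the time-series side: once a Prometheus server is running, you can query it over its HTTP API. A minimal sketch, assuming an instance reachable at 127.0.0.1:9090 (up is the built-in series that records whether each scrape target is reachable):

curl -s 'http://127.0.0.1:9090/api/v1/query?query=up'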

Here is a book on Prometheus that I recommend; interested readers can dig into it:

Introduction (yunlzheng.gitbook.io)

Choosing the Basic Components

Whether it's the earlier kube-prometheus or the prometheus-operator that so many people recommend nowadays, both give you one-command installation that works out of the box.

But since we are running on very low-spec machines, such heavyweight components are a poor fit.

I did consider prometheus-operator at first, but in actual testing my little 1c/1g machine simply couldn't handle it...

So in the end I chose to configure these basic components by hand:

  • prometheus (collects the metrics)
  • node-exporter (collects hardware metrics such as CPU and memory)
  • kube-state-metrics (collects k3s-internal metrics, e.g. pods and jobs)
  • grafana (displays the data)

In other words, just four pods are enough to meet the most basic requirements.

And all of them together only take up a few hundred megabytes of memory:
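
If you want to check the footprint yourself once the components below are running, k3s bundles metrics-server by default, so kubectl top should work without any extra setup:

kubectl top pod -n guardian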

Starting the Installation

To install all of this you first need a VPS with at least a 1c/1g spec and k3s already installed. If you haven't installed it yet, see my previous post:

Ehco: k3s Adventures (Part 1): Getting It Running (zhuanlan.zhihu.com)
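
For reference, this is the official one-line k3s install (just a sketch; the previous post walks through it in more detail):

curl -sfL https://get.k3s.io | sh -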

First, create a separate namespace to run these components in:

kubectl create ns guardian

Configuring Prometheus

We need to prepare four config files:

  • prometheus-cfg.yaml (the Prometheus configuration itself)
  • prometheus-deploy.yaml (the Deployment)
  • prometheus-rbac.yaml (the RBAC configuration)
  • prometheus-svc.yaml (the corresponding Service)

prometheus-cfg.yaml:

---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: guardian
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-kubelet'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc.cluster.local:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc.cluster.local:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-kube-state'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
        regex: .*true.*
        action: keep
      - source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
        regex: 'node-exporter;(.*)'
        action: replace
        target_label: nodename
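
The kubernetes-kubelet and kubernetes-cadvisor jobs don't hit the nodes directly; they rewrite __metrics_path__ so the metrics are fetched through the API server's node proxy. A quick sketch of what Prometheus will actually scrape, using plain kubectl (assuming your kubeconfig has node-proxy access, which the default k3s admin config does):

# kubelet metrics for the first node, proxied through the API server
kubectl get --raw "/api/v1/nodes/$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')/proxy/metrics" | head

# container-level (cAdvisor) metrics from the same node
kubectl get --raw "/api/v1/nodes/$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')/proxy/metrics/cadvisor" | head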

prometheus-deploy.yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    name: prometheus
  name: prometheus
  namespace: guardian
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: prom/prometheus:v2.1.0
        imagePullPolicy: Always
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/etc/prometheus"
          name: config-prometheus
      volumes:
      - name: config-prometheus
        configMap:
          name: prometheus-config

prometheus-rbac.yaml:

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: guardian
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: guardian

prometheus-svc.yaml:

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: guardian
spec:
  type: ClusterIP
  ports:
  - port: 9090
    targetPort: 9090
  selector:
    app: prometheus-server

Configuring node-exporter

Since node-exporter does machine-level monitoring, we deploy it as a DaemonSet in k8s, so that every node automatically runs its own node-exporter pod.

We need to prepare two config files:

  • node-exporter-ds.yaml
  • node-exporter-svc.yaml

node-exporter-ds.yaml:

kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: node-exporter
  namespace: guardian
spec:
  selector:
    matchLabels:
      daemon: node-exporter
      grafanak8sapp: "true"
  template:
    metadata:
      name: node-exporter
      labels:
        daemon: node-exporter
        grafanak8sapp: "true"
    spec:
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      containers:
      - name: node-exporter
        image: quay.io/prometheus/node-exporter:v0.15.0
        args:
        - --path.procfs=/proc_host
        - --path.sysfs=/host_sys
        ports:
        - name: node-exporter
          hostPort: 9100
          containerPort: 9100
        volumeMounts:
        - name: sys
          readOnly: true
          mountPath: /host_sys
        - name: proc
          readOnly: true
          mountPath: /proc_host
        imagePullPolicy: IfNotPresent
      restartPolicy: Always
      hostNetwork: true
      hostPID: true

node-exporter-svc.yaml:

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: prometheus-node-exporter
  namespace: guardian
  labels:
    app: prometheus
    component: node-exporter
spec:
  clusterIP: None
  ports:
  - name: prometheus-node-exporter
    port: 9100
    protocol: TCP
  selector:
    # must match the pod labels set by the node-exporter DaemonSet above
    daemon: node-exporter
  type: ClusterIP
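
Because the DaemonSet runs with hostNetwork and hostPort 9100, every node exposes the exporter directly on port 9100. A quick sanity check, run on the node itself (a sketch, assuming the default port):

curl -s http://127.0.0.1:9100/metrics | head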

Configuring kube-state-metrics

kube-state-metrics essentially works by continuously polling the k8s api-server and turning the state of internal k8s resources into metrics. It also needs two config files:

  • kube-state-metrics-deploy.yaml
  • kube-state-metrics-rbac.yaml

kube-state-metrics-deploy.yaml:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: guardian
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
      grafanak8sapp: "true"
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
        grafanak8sapp: "true"
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v1.1.0
        ports:
        - name: http-metrics
          containerPort: 8080
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5

kube-state-metrics-rbac.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: guardian
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: guardian
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["extensions"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: guardian
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: guardian
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
- apiGroups: ["policy"]
  resources:
  - poddisruptionbudgets
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: guardian
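
Once it is running you can take a quick look at what it exports; a sketch using port-forward (kube_pod_status_phase is one of the standard kube-state-metrics series):

kubectl -n guardian port-forward deploy/kube-state-metrics 8080:8080

# in another terminal
curl -s http://127.0.0.1:8080/metrics | grep kube_pod_status_phase | head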

Configuring grafana

This one is an old friend; a single config file is enough:

  • grafana-deploy.yaml

Note that the config file sets two environment variables: GF_SECURITY_ADMIN_PASSWORD (the password for the default admin user) and GF_INSTALL_PLUGINS: grafana-kubernetes-app (the plugin we want installed).

grafana-kubernetes-app is an official Grafana plugin that lets you use k8s itself as a datasource to pull the data you need. The project lives here: https://github.com/grafana/kubernetes-app

grafana-deploy.yaml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: grafana
  namespace: guardian
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-storage
        env:
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: "helloworld"
        - name: GF_INSTALL_PLUGINS
          value: "grafana-kubernetes-app"
      volumes:
      - name: grafana-storage
        emptyDir: {}

Installing All the Components

Finally, all the YAML config files are done. What a slog!

Put all of the config files into a guardian/ folder, then run:

kubectl apply -f guardian/

and you'll see a whole pile of resources come up.
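
To confirm everything is up, the usual status commands will do:

kubectl -n guardian get pods
kubectl -n guardian get deploy,ds,svc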

Configuring & Accessing grafana

Since we haven't exposed grafana as an externally reachable service, we can use the official port-forward mechanism to reach a pod running inside k8s.

Grafana listens on port 3000 by default; the command below forwards local port 3000 to port 3000 on the pod:

kubectl port-forward -n guardian grafana-7bf8d59769-v4ftq 3000:3000
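
Your pod name will have a different hash suffix; a variant that avoids looking it up is to forward to the Deployment instead:

kubectl -n guardian port-forward deploy/grafana 3000:3000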

Then open 127.0.0.1:3000 locally and you're in.

Log in with the username (admin) and the password we configured earlier (helloworld).

Add the in-cluster Prometheus as a datasource:
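
Since grafana and the prometheus Service defined earlier both live in the guardian namespace, the datasource URL can simply point at that Service; with the default cluster DNS it resolves as something like:

http://prometheus.guardian.svc.cluster.local:9090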

Configuring grafana-kubernetes-app

Tick Basic Auth and With CA Cert.

The username and password are the ones configured in your local ~/.kube/config.

The CA certificate is on the k3s master node at /var/lib/rancher/k3s/server/tls/server-ca.crt.
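
A sketch for pulling these values on the master node, assuming a default k3s install (newer k3s releases issue client certificates instead of a basic-auth password, in which case those fields won't appear in the kubeconfig):

# k3s writes its kubeconfig here; the basic-auth credentials (if any) live in the user section
sudo grep -E 'username|password' /etc/rancher/k3s/k3s.yaml

# the cluster CA certificate to paste into the "With CA Cert" field
sudo cat /var/lib/rancher/k3s/server/tls/server-ca.crt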

And with that, all of the components are installed and configured!

A Look at the Results

Monitoring of the cluster itself:

Node monitoring:

Container-level monitoring:

If you don't want to take on the whole k8s/k3s toolchain but still want a monitoring setup, you can take a look at this earlier post of mine:

Ehco: What can a server gathering dust be used for? How about a fancy monitoring system? (zhuanlan.zhihu.com)