dashboard
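Monitored items:
- Status of every node in each k8s cluster, including NotReady and Unschedulable nodes. With the dashboard set to refresh every 30s, node status is visible in near real time, and Grafana alerts can be configured on top of it.
- Statistics on kernel/docker/kubelet version differences across nodes, with filtering support.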
k8s-node-status.png
k8s-node-status2.png
TechStack
Prometheus + Grafana
PromQL
The Grafana data source is Prometheus. Each PromQL query below corresponds to the panel with the same title in the screenshots above.
- NotReady
sum(kube_node_status_condition{condition="Ready",status="unknown",node=~"$node"})
- Unschedulable
sum(kube_node_spec_unschedulable{node=~"$node"})
- Kubelet Version Status
sum(kube_node_info{node=~"$node"}) by (kubelet_version)
- Docker Version Status
sum(kube_node_info{node=~"$node"}) by (container_runtime_version)
- Kernel Version Status
sum(kube_node_info{node=~"$node"}) by (kernel_version)
- K8S Node Status
(kube_node_status_condition{condition="Ready",status="unknown",node=~"$node"}) * on(node) group_right() (sum(kube_node_labels) by (label_FOO_com_role,node) * on(node) group_right(label_FOO_com_role) sum(kube_node_info{container_runtime_version=~"$docker_version",kernel_version=~"$kernel_version",kubelet_version=~"$kubelet_version"}) by (container_runtime_version,kernel_version,kubelet_version,node))
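
To check one of these queries outside Grafana, it can be sent to Prometheus's instant-query endpoint (`/api/v1/query`). The Python sketch below is only an illustration under a few assumptions: the base URL `prometheus.example.com` is hypothetical, the helper names are made up for this example, the version strings in the sample response are invented, and `expand_variables` is a plain string substitution that stands in for Grafana's richer variable interpolation (multi-value variables, for instance, are not handled).

```python
from urllib.parse import urlencode

# Hypothetical Prometheus base URL; replace with your own server.
PROMETHEUS_URL = "http://prometheus.example.com:9090"

# The "Kubelet Version Status" panel query from the list above.
KUBELET_VERSION_QUERY = 'sum(kube_node_info{node=~"$node"}) by (kubelet_version)'


def expand_variables(query: str, variables: dict) -> str:
    """Replace Grafana-style $name placeholders with concrete values.

    Simplified stand-in for Grafana's interpolation, not a reimplementation.
    """
    # Substitute longer names first so "$node_x" is not clobbered by "$node".
    for name in sorted(variables, key=len, reverse=True):
        query = query.replace("$" + name, variables[name])
    return query


def instant_query_url(base_url: str, query: str) -> str:
    """Build a URL for the Prometheus instant-query endpoint."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": query})


def counts_by_label(response: dict, label: str) -> dict:
    """Map each value of `label` to its sample value in an instant-vector response."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    return {
        sample["metric"].get(label, ""): int(float(sample["value"][1]))
        for sample in response["data"]["result"]
    }


if __name__ == "__main__":
    query = expand_variables(KUBELET_VERSION_QUERY, {"node": ".*"})
    print(instant_query_url(PROMETHEUS_URL, query))

    # Invented response in the shape returned by /api/v1/query.
    sample_response = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"kubelet_version": "v-example-a"}, "value": [0, "12"]},
                {"metric": {"kubelet_version": "v-example-b"}, "value": [0, "3"]},
            ],
        },
    }
    print(counts_by_label(sample_response, "kubelet_version"))
```

The same helpers work for the Docker and kernel panels by swapping in the matching query and passing `container_runtime_version` or `kernel_version` as the label.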










