Prometheus Operator with blackbox-exporter: simple probing inside and outside a k8s cluster
Preface
I hadn't touched the Operator in a long while; recently my company assigned two machines for personal use.
Before that, my blog and all my other services ran on a single cloud server, which was starting to strain. With the new machines I quickly built a Kubernetes cluster, migrated the old monitoring over, and merged it into Prometheus Operator. You know how it goes.
After adding the usual host monitoring to Prometheus Operator, I suddenly felt something was missing. Right: black-box monitoring, the legendary blackbox exporter. It's powerful stuff, able to probe network ports and more.
Prometheus Operator, as the name suggests, is a custom controller that automates the management of Prometheus in Kubernetes. For more, see coreos/prometheus-operator.
So how do we use Prometheus Operator to install Prometheus in a Kubernetes cluster and add the Blackbox exporter on top? Let's get started!
Installing Prometheus Operator
Following my earlier post "Installing Prometheus Operator: anyone can do it" and the official coreos/kube-prometheus docs, install Prometheus Operator.
1. Add parameters to the kubelet configuration: vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Add:
--authentication-token-webhook=true
--authorization-mode=Webhook
Then run systemctl daemon-reload && systemctl restart kubelet for the change to take effect.
2. Clone the source and check out the release matching your k8s version (the compatibility matrix is in the GitHub repo)
git clone https://github.com/coreos/kube-prometheus.git
cd kube-prometheus
kubectl version
git branch -a
git checkout origin/release-0.4
3. Install Prometheus Operator
# Create the namespace and CRDs, and then wait for them to be available before creating the remaining resources
kubectl create -f manifests/setup
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl create -f manifests/
4. Verify the installation
kubectl get crd | grep coreos
kubectl get pod -n monitoring
kubectl get svc -n monitoring
With that, Prometheus Operator is installed, and Prometheus itself along with it.
PS: to uninstall Prometheus Operator:
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
A side note:
If you can't reach GitHub, that is, frankly, the first problem an internet professional should fix.
Either way, I've prepared an offline Prometheus Operator package for you here; just download it and apply it directly.
Component version: 0.7.0
Supported Kubernetes versions: 1.19.*, 1.20.*
Installing the Blackbox exporter
1. Create the YAML file blackbox-exporter.yaml
apiVersion: v1
data:
  config.yml: |
    modules:
      http_2xx:
        prober: http
        http:
          method: GET
          preferred_ip_protocol: "ip4"
      http_post_2xx:
        prober: http
        http:
          method: POST
          preferred_ip_protocol: "ip4"
      tcp:
        prober: tcp
      ping:
        prober: icmp
        timeout: 3s
        icmp:
          preferred_ip_protocol: "ip4"
      dns_k8s:
        prober: dns
        timeout: 5s
        dns:
          transport_protocol: "tcp"
          preferred_ip_protocol: "ip4"
          query_name: "kubernetes.default.svc.cluster.local"
          query_type: "A"
kind: ConfigMap
metadata:
  name: blackbox-exporter
  namespace: monitoring
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: blackbox-exporter
    cluster: ali-huabei2-dev
  name: blackbox-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      name: blackbox-exporter
  strategy: {}
  template:
    metadata:
      labels:
        name: blackbox-exporter
        cluster: ali-huabei2-dev
    spec:
      containers:
      - image: prom/blackbox-exporter:v0.16.0
        name: blackbox-exporter
        ports:
        - containerPort: 9115
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        args:
        - --config.file=/etc/blackbox_exporter/config.yml
        - --log.level=info
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
---
apiVersion: v1
kind: Service
metadata:
  #annotations:
  #  service.beta.kubernetes.io/alicloud-loadbalancer-address-type: intranet
  labels:
    name: blackbox-exporter
    cluster: ali-huabei2-dev
  name: blackbox-exporter
  namespace: monitoring
spec:
  #externalTrafficPolicy: Local
  selector:
    name: blackbox-exporter
  ports:
  - name: http-metrics
    port: 9115
    targetPort: 9115
  type: LoadBalancer
2. Apply the YAML file
kubectl apply -f blackbox-exporter.yaml
kubectl get svc -n monitoring
kubectl get deploy -n monitoring
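Once the pod is up you can sanity-check a probe by hand, e.g. curl "http://<svc-ip>:9115/probe?module=http_2xx&target=https://www.vlinux.cn". The response is in the Prometheus text exposition format, and probe_success tells you whether the probe passed. A minimal sketch of parsing such a response (the sample payload below is illustrative, not captured from a real probe):

```python
# Minimal parser for the Prometheus text exposition format returned by
# blackbox-exporter's /probe endpoint. The sample payload is illustrative.
def parse_metrics(text):
    """Return a dict mapping metric name (with any labels) -> float value."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank/HELP/TYPE lines
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
probe_duration_seconds 0.123
probe_http_status_code 200
"""

m = parse_metrics(sample)
print(m["probe_success"])          # 1.0 means the probe succeeded
print(m["probe_http_status_code"])
```

If probe_success is 0, hit /probe with &debug=true appended to see why the probe failed.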
Configuring Blackbox exporter (the wrong way)
Configuring Blackbox exporter in a standalone Prometheus is simple: add the relevant fields under scrape_configs. In Kubernetes, however, the Prometheus configuration works a little differently.
1. Extract the prometheus.yml configuration
kubectl get secrets -n monitoring prometheus-k8s -oyaml | grep prometheus.yaml.gz | awk '{print $2}' | base64 --decode | gzip -d > prometheus.yml
2. Inspect prometheus.yml; an excerpt:
global:
  evaluation_interval: 30s
  scrape_interval: 30s
  external_labels:
    prometheus: monitoring/k8s
    prometheus_replica: $(POD_NAME)
rule_files:
- /etc/prometheus/rules/prometheus-k8s-rulefiles-0/*.yaml
scrape_configs:
- job_name: monitoring/node-exporter/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - monitoring
  scrape_interval: 15s
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_k8s_app
    regex: node-exporter
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: https
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Node;(.*)
    replacement: ${1}
    target_label: node
  - source_labels:
    - __meta_kubernetes_endpoint_address_target_kind
    - __meta_kubernetes_endpoint_address_target_name
    separator: ;
    regex: Pod;(.*)
    replacement: ${1}
    target_label: pod
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: service
  - source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - source_labels:
    - __meta_kubernetes_service_name
    target_label: job
    replacement: ${1}
  - source_labels:
    - __meta_kubernetes_service_label_k8s_app
    target_label: job
    regex: (.+)
    replacement: ${1}
  - target_label: endpoint
    replacement: https
  - source_labels:
    - __meta_kubernetes_pod_node_name
    target_label: instance
    regex: (.*)
    replacement: $1
    action: replace
  - source_labels:
    - __meta_kubernetes_service_label_cluster
    target_label: cluster
    regex: (.*)
    replacement: $1
    action: replace
Here, job_name names the target group, kubernetes_sd_configs sets up Kubernetes service discovery, and relabel_configs controls the labels that finally appear. source_labels are the sample's original labels and target_label is the label that gets written; regex matches against the value and replacement is the value written out, with $1 referring to the first group captured by regex.
3. Add the blackbox exporter job
- job_name: monitoring/blackbox-exporter/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - monitoring
  scrape_interval: 15s
  scheme: http
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_name
    regex: blackbox-exporter
  - source_labels:
    - __meta_kubernetes_service_label_name
    target_label: job
    regex: (.+)
    replacement: ${1}
  - source_labels:
    - __meta_kubernetes_service_label_cluster
    target_label: cluster
    regex: (.*)
    replacement: $1
    action: replace
4. Apply the new configuration
# 1. compress prometheus.yml
cat prometheus.yml | gzip -f | base64 | tr -d "\n"
# 2. copy string
# 3. edit secret
kubectl edit secrets -n monitoring prometheus-k8s
# 4. replace prometheus.yaml.gz
# 5. get the latest config
kubectl get secrets -n monitoring prometheus-k8s -oyaml | grep prometheus.yaml.gz | awk '{print $2}' | base64 --decode | gzip -d | grep blackbox
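The gzip/base64 shell pipeline above is just an encoding round trip on the config bytes. A quick Python sketch of what those two steps do (the config snippet is illustrative):

```python
import base64
import gzip

config = b"scrape_configs:\n- job_name: monitoring/blackbox-exporter/0\n"

# What `gzip -f | base64 | tr -d "\n"` produces for the secret value:
encoded = base64.b64encode(gzip.compress(config)).decode()

# What `base64 --decode | gzip -d` recovers when reading the secret back:
decoded = gzip.decompress(base64.b64decode(encoded))

print(decoded == config)  # True: the round trip is lossless
```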
However, the resulting config contains no blackbox at all; nothing changed! This proves that the Prometheus config is generated automatically and manual edits have no effect. If you had studied Prometheus Operator systematically, you would never attempt the above, because that is simply not how it is designed. Let's look at the correct approach.
Configuring Blackbox exporter (the right way)
In Prometheus Operator, targets are configured through dynamic discovery via ServiceMonitor resources.
1. Create the ServiceMonitor YAML file, blackbox-exporter-sm.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    name: blackbox-exporter
    release: p
  name: blackbox-exporter
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      name: blackbox-exporter
  endpoints:
  - interval: 15s
    port: http-metrics
    path: /probe
    relabelings:
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __meta_kubernetes_service_label_cluster
      targetLabel: cluster
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __param_module
      targetLabel: module
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __param_target
      targetLabel: target
    params:
      module:
      - http_2xx
      target:
      - https://www.vlinux.cn
  - interval: 15s
    port: http-metrics
    path: /probe
    relabelings:
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __meta_kubernetes_service_label_cluster
      targetLabel: cluster
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __param_module
      targetLabel: module
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __param_target
      targetLabel: target
    params:
      module:
      - dns_k8s
      target:
      - 172.31.16.10 # dns ip address
2. Apply it to the cluster: kubectl apply -f blackbox-exporter-sm.yaml
3. Wait a minute or so, then verify: open the Prometheus graph page and query the blackbox-exporter metrics.
{job=~"blackbox-exporter",__name__!~"^go.*"}
The results show that under the http_2xx params, only the first target was actually probed; the remaining targets produced no probe records at all. This experiment demonstrates that target takes effect for only one domain, and extras are ignored. The simplest way to probe multiple sites is to configure multiple endpoints. As for probing N sites with M probe modules each, if you know how to configure that, please leave a comment. Thanks!
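Since each endpoint entry effectively carries one working target, the per-site endpoint blocks are highly repetitive, and one way to keep the ServiceMonitor manageable is to generate them. A sketch (the site list and helper are illustrative; serialize the result with a YAML library and paste it under spec.endpoints):

```python
# Build one ServiceMonitor endpoint entry per probed site, since each
# endpoint's params can carry only a single working target.
def make_endpoint(target, module="http_2xx", port="http-metrics"):
    return {
        "interval": "15s",
        "port": port,
        "path": "/probe",
        "params": {"module": [module], "target": [target]},
    }

# example.com is a hypothetical second site, not from the original post.
sites = ["https://www.vlinux.cn", "https://example.com"]
endpoints = [make_endpoint(s) for s in sites]

print(len(endpoints))                       # 2
print(endpoints[0]["params"]["target"][0])  # https://www.vlinux.cn
```

The shared relabelings block can be appended to each generated entry the same way before dumping to YAML.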
Configuring alerts
If you know Prometheus basics, you know that alerting requires pointing the Prometheus config at an Alertmanager instance and at alerting rules files. So how do you configure alerts for an Operator-deployed Prometheus? You define a PrometheusRule resource carrying the labels prometheus=k8s and role=alert-rules. As an example, let's alert when the DNS service breaks and can no longer resolve kubernetes.default.svc.cluster.local.
1. Inspect the alertmanager configuration
kubectl get secrets -n monitoring alertmanager-main -oyaml | grep "alertmanager.yaml" | awk '{print $2}' | base64 -d
2. Create prometheus-rule-dns.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: dns-alert-rules
  namespace: monitoring
spec:
  groups:
  - name: DNS
    rules:
    - alert: DNSServerError
      annotations:
        summary: No summary
        description: No description
        webhookToken: xxxxxxxxx
      expr: |
        probe_success{module="dns_k8s"} == 0
      for: 1m
      labels:
        severity: critical
        alertTag: k8s
3. Apply the rule: kubectl apply -f prometheus-rule-dns.yaml