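1 Introduction
Prometheus is an open-source monitoring system. It was originally developed at SoundCloud, drawing on ideas from Google's Borgmon, and is now a CNCF project. It pulls metrics over HTTP from exporters running on remote machines and stores them in its local time-series database.
The Prometheus server is written in Go and ships with a built-in web UI; Grafana is commonly used on top of it for dashboards.
To support all kinds of middleware and third-party systems, Prometheus relies on exporters. An exporter can be thought of as a monitoring adapter: it converts data of different kinds and formats into metric types that Prometheus understands.
For example, node_exporter mainly reads system files under /proc and /sys on Linux to obtain the state of the operating system, redis_exporter obtains metrics by sending Redis commands such as INFO, and mysqld_exporter reads MySQL's status and monitoring tables to obtain performance data. Each of them converts this heterogeneous data into the standard Prometheus format and serves it over an HTTP endpoint.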
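To make the adapter idea concrete, the sketch below converts one line in the /proc/loadavg format into the Prometheus text exposition format, which is what an exporter ultimately serves over HTTP. The function name loadavg_to_metrics and the metric names demo_load1, demo_load5 and demo_load15 are made up for illustration; node_exporter's real implementation and metric names differ.

```shell
# loadavg_to_metrics: read one /proc/loadavg-style line on stdin and print
# three gauges in the Prometheus text exposition format.
# Hypothetical helper and metric names, for illustration only.
loadavg_to_metrics() {
  awk '{
    split("1 5 15", win, " ")
    for (i = 1; i <= 3; i++) {
      name = "demo_load" win[i]
      printf "# HELP %s %s-minute load average.\n", name, win[i]
      printf "# TYPE %s gauge\n", name
      printf "%s %s\n", name, $i
    }
  }'
}

# Usage on a Linux host:
#   loadavg_to_metrics < /proc/loadavg
```

Each metric gets a HELP line, a TYPE line and one sample line, which is the same shape you will see on the /metrics pages later in this document.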
2 Deployment
2.1 Deploying Prometheus server from the binary release
1. Download the package
wget https://github.com/prometheus/prometheus/releases/download/v2.53.4/prometheus-2.53.4.linux-amd64.tar.gz
Internal mirror (SVIP):
[root@prometheus-server31 ~]# wget http://192.168.17.253/Resources/Prometheus/softwares/Prometheus_Server/prometheus-2.53.4.linux-amd64.tar.gz
2. Extract the package
[root@prometheus-server31 ~]# tar xf prometheus-2.53.4.linux-amd64.tar.gz -C /usr/local/
[root@prometheus-server31 ~]# ll /usr/local/prometheus-2.53.4.linux-amd64/
total 261324
drwxr-xr-x 4 1001 fwupd-refresh 4096 Mar 18 23:08 ./
drwxr-xr-x 11 root root 4096 May 12 08:55 ../
drwxr-xr-x 2 1001 fwupd-refresh 4096 Mar 18 23:05 console_libraries/
drwxr-xr-x 2 1001 fwupd-refresh 4096 Mar 18 23:05 consoles/
-rw-r--r-- 1 1001 fwupd-refresh 11357 Mar 18 23:05 LICENSE
-rw-r--r-- 1 1001 fwupd-refresh 3773 Mar 18 23:05 NOTICE
-rwxr-xr-x 1 1001 fwupd-refresh 137836884 Mar 18 22:52 prometheus*
-rw-r--r-- 1 1001 fwupd-refresh 934 Mar 18 23:05 prometheus.yml
-rwxr-xr-x 1 1001 fwupd-refresh 129719117 Mar 18 22:52 promtool*
3. Run Prometheus
[root@prometheus-server31 ~]# cd /usr/local/prometheus-2.53.4.linux-amd64/
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ./prometheus
4. Open the Prometheus web UI
http://10.168.10.31:9090
5. Uninstall
[root@prometheus-server31 ~]# rm -rf /usr/local/prometheus-2.53.4.linux-amd64/
[root@prometheus-server31 ~]#
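Running ./prometheus in the foreground, as in the run step above, stops as soon as the terminal closes. A minimal sketch of a systemd unit for the same binary follows. The function name render_prometheus_unit, the unit layout and the data directory are assumptions, not part of the original setup. The flag --web.enable-lifecycle is included because the /-/reload call used later in this document is only served when that flag is set.

```shell
# render_prometheus_unit: print a systemd unit for the binary release.
# $1 = install directory (defaults to the path used in this section).
# Hypothetical helper; unit layout and data directory are assumptions.
render_prometheus_unit() {
  dir="${1:-/usr/local/prometheus-2.53.4.linux-amd64}"
  cat <<EOF
[Unit]
Description=Prometheus server
After=network-online.target

[Service]
ExecStart=${dir}/prometheus \\
  --config.file=${dir}/prometheus.yml \\
  --storage.tsdb.path=${dir}/data \\
  --web.enable-lifecycle
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
#   render_prometheus_unit > /etc/systemd/system/prometheus.service
#   systemctl daemon-reload && systemctl enable --now prometheus
```

The function only prints the unit, so it can be reviewed before it is written to /etc/systemd/system.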
- One-step deployment script for Prometheus server
1. Download the install script
[root@prometheus-server31 ~]# wget http://192.168.17.253/Resources/Prometheus/Scripts/cmy-install-prometheus-server-v2.53.4.tar.gz
2. Extract the archive
[root@prometheus-server31 ~]# tar xf cmy-install-prometheus-server-v2.53.4.tar.gz
3. Install Prometheus
[root@prometheus-server31 ~]# ./install-prometheus-server.sh i
4. Check the web UI
http://10.168.10.31:9090/targets?search=
5. Uninstall the service
[root@prometheus-server31 ~]# ./install-prometheus-server.sh r
2.2 Deploying node_exporter from the binary release
1. Download the package
wget https://github.com/prometheus/node_exporter/releases/download/v1.9.1/node_exporter-1.9.1.linux-amd64.tar.gz
Internal mirror (SVIP):
wget http://192.168.17.253/Resources/Prometheus/softwares/node_exporter/node_exporter-1.9.1.linux-amd64.tar.gz
2. Extract the package
tar xf node_exporter-1.9.1.linux-amd64.tar.gz -C /usr/local/
3. Run node_exporter
cd /usr/local/node_exporter-1.9.1.linux-amd64/
[root@node-exporter41 node_exporter-1.9.1.linux-amd64]#
[root@node-exporter41 node_exporter-1.9.1.linux-amd64]# ll
total 21708
drwxr-xr-x 2 1001 1002 4096 Apr 1 23:23 ./
drwxr-xr-x 11 root root 4096 May 12 09:13 ../
-rw-r--r-- 1 1001 1002 11357 Apr 1 23:23 LICENSE
-rwxr-xr-x 1 1001 1002 22204245 Apr 1 23:19 node_exporter*
-rw-r--r-- 1 1001 1002 463 Apr 1 23:23 NOTICE
[root@node-exporter41 node_exporter-1.9.1.linux-amd64]#
[root@node-exporter41 node_exporter-1.9.1.linux-amd64]# ./node_exporter
4. Open the node_exporter metrics page
http://10.168.10.41:9100/metrics
5. Uninstall the binary deployment of node_exporter
rm -rf /usr/local/node_exporter-1.9.1.linux-amd64/
- One-step deployment script for node_exporter
1. Download the script
wget http://192.168.17.253/Resources/Prometheus/Scripts/cmy-install-node-exporter-v1.9.1.tar.gz
2. Extract the archive
tar xf cmy-install-node-exporter-v1.9.1.tar.gz
3. Install node_exporter
./install-node-exporter.sh i
4. Open the metrics page
http://10.168.10.41:9100/metrics
2.3 Deploying Grafana
- Installing Grafana
Reference:
https://grafana.com/grafana/download/9.5.21
1. Install the packages Grafana depends on
[root@prometheus-server31 ~]# apt-get install -y adduser libfontconfig1 musl
2. Download and install Grafana
[root@prometheus-server31 ~]# wget http://192.168.17.253/Resources/Prometheus/softwares/Grafana/grafana-enterprise_9.5.21_amd64.deb
[root@prometheus-server31 ~]# dpkg -i grafana-enterprise_9.5.21_amd64.deb
If dpkg reports unmet dependencies, run "apt --fix-broken install" and then repeat the dpkg command.
3. Start Grafana
[root@prometheus-server31 ~]# systemctl enable --now grafana-server
[root@prometheus-server31 ~]# ss -ntl | grep 3000
LISTEN 0 4096 *:3000 *:*
4. Open the Grafana web UI
http://10.168.10.31:3000/?
The default username and password are both admin.
Grafana asks you to change the password on first login; this step can be skipped.
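- Configuring the Prometheus data source in Grafana
1. Add a Prometheus data source in Grafana
Omitted here; see the video.
2. Create a Dashboard folder
Omitted here; see the video.
3. Import a third-party Dashboard by ID
1860
4. View the Dashboard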
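The data source from step 1 can also be created without the UI, through Grafana's file-based provisioning. The sketch below prints such a provisioning file. The function name render_grafana_datasource and the data source name "Prometheus" are illustrative choices, and the target path in the usage comment assumes the default layout of the .deb package.

```shell
# render_grafana_datasource: print a Grafana data source provisioning file.
# $1 = Prometheus URL (defaults to the server used in this document).
# Hypothetical helper; the data source name is an arbitrary choice.
render_grafana_datasource() {
  url="${1:-http://10.168.10.31:9090}"
  cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Usage (as root), assuming the default provisioning directory:
#   render_grafana_datasource > /etc/grafana/provisioning/datasources/prometheus.yaml
#   systemctl restart grafana-server
```

Grafana reads provisioning files at startup, hence the restart in the usage comment.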
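3 Prometheus metric types
- Prometheus metric types
1. gauge
A gauge holds the current value of something that can go up or down; what you see is the value at scrape time.
For example, the "node_boot_time_seconds" metric returns the boot time of a node as a Unix timestamp, which is a current value.
For the "go_info" metric, the sample value itself carries little meaning (it is a constant 1); the information you want, such as the Go version, is in the label keys and values.
2. counter
A counter is a monotonically increasing count; it only goes back to zero when the process restarts.
It is usually combined with rate to get a per-second rate such as QPS, for example: rate(prometheus_http_requests_total[1m])
It can also be combined with increase to get the growth over a window, for example: increase(prometheus_http_requests_total[1m])
Average request duration: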
prometheus_http_request_duration_seconds_sum / prometheus_http_request_duration_seconds_count
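The arithmetic behind rate and increase can be sketched with two raw counter samples. This is a simplification under stated assumptions: the real PromQL functions work on all samples in the window and extrapolate to the window edges, which the hypothetical helpers counter_rate and counter_increase below do not.

```shell
# counter_rate: per-second rate between two samples of a counter.
# Args: value1 time1 value2 time2 (times in seconds).
# Simplified: no extrapolation; a decrease is treated as a counter reset,
# so only the value after the reset is counted.
counter_rate() {
  awk -v v1="$1" -v t1="$2" -v v2="$3" -v t2="$4" 'BEGIN {
    delta = (v2 >= v1) ? v2 - v1 : v2
    printf "%.2f\n", delta / (t2 - t1)
  }'
}

# counter_increase: the same delta without dividing by the elapsed time.
# Args: value1 value2
counter_increase() {
  awk -v v1="$1" -v v2="$2" 'BEGIN {
    printf "%d\n", (v2 >= v1) ? v2 - v1 : v2
  }'
}

# Usage: a counter that went from 100 to 160 within 60 seconds
#   counter_rate 100 0 160 60
#   counter_increase 100 160
```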
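3. histogram
A histogram counts observations into configurable buckets and also exposes the sum and the count of all observed values; it is typically used for request durations, response times and similar measurements.
In the previous example, "prometheus_http_request_duration_seconds_sum / prometheus_http_request_duration_seconds_count" gives the average request duration.
That statistic is coarse: "total response time / number of requests" is only a mean, and it does not show whether something failed within a particular period. Suppose the service was largely unresponsive between 12:30 and 12:35 and healthy for the rest of the day. Averaged over 24 hours, five minutes of latency almost disappears, so the formula reports no delay, operators do not notice the problem in time, and users have a poor experience.
With a histogram, Prometheus can instead estimate quantiles over a short time range from the bucket counters, which surfaces such problems quickly. This is done with the histogram_quantile function.
Example: HTTP request latency quantiles (the "0.95" below is the quantile; change it as needed).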
histogram_quantile(0.95,sum(rate(prometheus_http_request_duration_seconds_bucket[1m])) by (le))
histogram_quantile(0.95,sum(rate(prometheus_http_request_duration_seconds_bucket{handler="/api/v1/query"}[5m])) by (le))
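How a quantile is estimated from bucket counters can be sketched in a few lines of awk. This is a simplified illustration, not Prometheus's implementation: it takes cumulative counts for finite buckets only (the +Inf bucket is left out), assumes the lowest bucket starts at 0, and interpolates linearly inside the bucket that contains the requested rank. The function name histogram_quantile_sketch is hypothetical.

```shell
# histogram_quantile_sketch: estimate a quantile from cumulative buckets.
# $1 = quantile between 0 and 1
# stdin = lines of "upper_bound cumulative_count", sorted by upper_bound.
# Simplified: finite buckets only, lowest bucket assumed to start at 0.
histogram_quantile_sketch() {
  awk -v q="$1" '
    { le[NR] = $1; c[NR] = $2 }
    END {
      rank = q * c[NR]            # position of the quantile among all observations
      prev_le = 0; prev_c = 0
      for (i = 1; i <= NR; i++) {
        if (c[i] >= rank) {
          if (c[i] == prev_c) { printf "%.3f\n", le[i]; exit }
          # linear interpolation inside the bucket (prev_le, le[i]]
          printf "%.3f\n", prev_le + (le[i] - prev_le) * (rank - prev_c) / (c[i] - prev_c)
          exit
        }
        prev_le = le[i]; prev_c = c[i]
      }
    }'
}

# Usage: 50 requests took <= 0.1s, 90 took <= 0.5s, 100 took <= 1s
#   printf '0.1 50\n0.5 90\n1 100\n' | histogram_quantile_sketch 0.95
```

In the usage example the 95th of 100 observations falls halfway through the (0.5, 1] bucket, so the estimate lies between those two bounds; the accuracy of the result therefore depends on how the bucket boundaries were chosen.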
For the output format, see:
https://www.cnblogs.com/cmy/p/18522782#二-histogram数据说明
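4. summary
Unlike a histogram, whose quantiles are calculated at query time with histogram_quantile, a summary exposes quantile values that have already been calculated on the client side.
For the output format, see: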
https://www.cnblogs.com/cmy/p/18522782#三-summary数据说明
4 Common Prometheus operations
4.1 Common operators
- A first look at PromQL: common operators
1. Exact match
node_cpu_seconds_total{instance="10.168.10.42:9100",cpu="1"}
2. Regular-expression match
node_cpu_seconds_total{instance="10.168.10.42:9100",cpu="1",mode=~"i.*"}
3. Negative match
node_cpu_seconds_total{instance="10.168.10.42:9100",cpu!="1",mode=~"i.*"}
4. Arithmetic
100/5
10+20
Reference:
https://prometheus.io/docs/prometheus/latest/querying/operators/
4.2 Common functions
- A first look at PromQL: common functions
1. Stress-test node 42
[root@node-exporter42 ~]# apt -y install stress
[root@node-exporter42 ~]# stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M --timeout 20m
2. Calculate CPU usage as a percentage
(1 - sum(increase(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) / sum(increase(node_cpu_seconds_total[1m])) by (instance)) * 100
3. Uptime of each node, in minutes
(time() - node_boot_time_seconds) / 60
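The CPU usage query in step 2 boils down to one line of arithmetic: one minus the share of CPU time spent idle, times 100. A minimal sketch with plain numbers, where the function name cpu_usage_percent is hypothetical and the two arguments stand for the increase of idle CPU seconds and of total CPU seconds over the same window:

```shell
# cpu_usage_percent: mirror of (1 - idle / total) * 100 from the query above.
# Args: idle_seconds_increase total_seconds_increase (same time window)
cpu_usage_percent() {
  awk -v idle="$1" -v total="$2" 'BEGIN {
    printf "%.1f\n", (1 - idle / total) * 100
  }'
}

# Usage: over one minute a single core accumulated 60 CPU seconds,
# 45 of them in idle mode
#   cpu_usage_percent 45 60
```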
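Reference:
https://prometheus.io/docs/prometheus/latest/querying/functions/
Worked examples:
https://www.cnblogs.com/cmy/p/18799074
Two pain points of the Prometheus web UI
- 1. It is ephemeral: queries are not saved, so they are gone after you close and reopen the page. The page is mainly meant for ad hoc debugging.
- 2. It requires PromQL, which is hard on beginners: you need to work through the official documentation and also have solid operating-system fundamentals.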
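5 Monitoring common middleware
- Workflow for monitoring a service with Prometheus
- 1. The monitored side exposes metrics;
- 2. The Prometheus server is configured with the targets to scrape (service discovery);
- 3. Hot-reload the configuration file;
- 4. Check the Prometheus web UI to verify that the configuration took effect;
- 5. Import a dashboard template ID into Grafana;
- 6. View the graphs on the Grafana Dashboard;
- 7. Configure the corresponding alerting rules;
5.1 Monitoring Linux hosts
1. Install node_exporter on the monitored Linux hosts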
[root@node-exporter42 ~]# wget http://192.168.17.253/Resources/Prometheus/Scripts/cmy-install-node-exporter-v1.9.1.tar.gz
[root@node-exporter43 ~]# wget http://192.168.17.253/Resources/Prometheus/Scripts/cmy-install-node-exporter-v1.9.1.tar.gz
2. Extract the archive
[root@node-exporter42 ~]# tar xf cmy-install-node-exporter-v1.9.1.tar.gz
[root@node-exporter43 ~]# tar xf cmy-install-node-exporter-v1.9.1.tar.gz
3. Install node_exporter
[root@node-exporter42 ~]# ./install-node-exporter.sh i
[root@node-exporter43 ~]# ./install-node-exporter.sh i
4. Install Prometheus server
[root@prometheus-server31 ~]# ./install-prometheus-server.sh i
5. Edit the Prometheus server configuration file
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
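...
global:
  scrape_interval: 3s
scrape_configs:
  ...
  - job_name: "cmy-node-exporter"
    static_configs:
      - targets:
          - 10.168.10.41:9100
          - 10.168.10.42:9100
          - 10.168.10.43:9100
6. Hot-reload the configuration file (the /-/reload endpoint is only available when Prometheus was started with --web.enable-lifecycle)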
[root@prometheus-server31 ~]# curl -X POST 10.168.10.31:9090/-/reload
[root@prometheus-server31 ~]#
7. Check the Prometheus web UI
http://10.168.10.31:9090/targets?search=
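Every section that follows repeats the same pattern: add a job with static targets, reload, verify. A small sketch of a helper that prints one such scrape_configs entry is shown below. The function name render_scrape_job is hypothetical, and appending its output to prometheus.yml only yields a valid file if scrape_configs is the last top-level section, so validate the result with promtool before reloading.

```shell
# render_scrape_job: print one scrape_configs entry with static targets.
# Args: job_name target [target ...]
# Hypothetical helper; indentation matches an entry nested under scrape_configs.
render_scrape_job() {
  job="$1"; shift
  printf '  - job_name: "%s"\n' "$job"
  printf '    static_configs:\n'
  printf '      - targets:\n'
  for target in "$@"; do
    printf '          - "%s"\n' "$target"
  done
}

# Usage: append a job, validate the file, then hot-reload.
#   render_scrape_job cmy-node-exporter 10.168.10.41:9100 10.168.10.42:9100 >> prometheus.yml
#   ./promtool check config prometheus.yml
#   curl -X POST http://10.168.10.31:9090/-/reload
```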
5.2 Monitoring Windows
1.1 Download the installer
https://github.com/prometheus-community/windows_exporter/releases/download/v0.30.6/windows_exporter-0.30.6-amd64.exe
1.2 Run it (from a cmd window)
windows_exporter-0.30.6-amd64.exe
1.3 Test access
http://10.168.10.1:9182/metrics
2. Configure the target on the Prometheus server (service discovery)
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: "cmy-windows-exporter"
    static_configs:
      - targets:
          - 10.168.10.200:9182
3. Hot-reload the configuration file
[root@prometheus-server31 ~]# curl -X POST 10.168.10.31:9090/-/reload
4. Check the Prometheus web UI to verify that the configuration took effect
http://10.168.10.31:9090/targets?search=
5. Import dashboard template IDs into Grafana
20763
14694
6. View the graphs on the Grafana Dashboard
Omitted here; see the video.
5.3 Monitoring Redis
2. Download redis_exporter
wget https://github.com/oliver006/redis_exporter/releases/download/v1.71.0/redis_exporter-v1.71.0.linux-amd64.tar.gz
Internal mirror (SVIP):
[root@elk92 ~]# wget http://192.168.17.253/Resources/Prometheus/softwares/redis_exporter/redis_exporter-v1.71.0.linux-amd64.tar.gz
3. Extract the package
[root@elk92 ~]# tar xf redis_exporter-v1.71.0.linux-amd64.tar.gz -C /usr/local/bin/ redis_exporter-v1.71.0.linux-amd64/redis_exporter --strip-components=1
[root@elk92 ~]#
[root@elk92 ~]# ll /usr/local/bin/redis_exporter
-rwxr-xr-x 1 1001 fwupd-refresh 9642168 May 4 13:22 /usr/local/bin/redis_exporter*
[root@elk92 ~]#
4. Run redis_exporter
[root@elk92 ~]# redis_exporter -redis.addr redis://10.168.10.43:6379 -web.telemetry-path /metrics -web.listen-address :9121
5. Open the redis_exporter metrics page
http://10.168.10.92:9121/metrics
6. Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: "cmy-redis-exporter"
    static_configs:
      - targets:
          - 10.168.10.92:9121
7. Hot-reload the configuration file
[root@prometheus-server31 ~]# curl -X POST 10.168.10.31:9090/-/reload
8. Verify that the configuration took effect
http://10.168.10.31:9090/targets?search=
9. Import dashboard IDs into Grafana
11835
14091
14615 # needs a panel plugin that is not installed yet; see 5.3.1
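5.3.1 Installing Grafana plugins
- Installing Grafana plugins
1. Overview
Grafana supports installing third-party plugins.
For example, the following error means that a plugin is missing:
Panel plugin not found: natel-discrete-panel
Default data directory: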
[root@prometheus-server31 ~]# ll /var/lib/grafana/
total 1940
drwxr-xr-x 5 grafana grafana 4096 May 12 14:46 ./
drwxr-xr-x 61 root root 4096 May 12 10:38 ../
drwxr-x--- 3 grafana grafana 4096 May 12 10:38 alerting/
drwx------ 2 grafana grafana 4096 May 12 10:38 csv/
-rw-r----- 1 grafana grafana 1961984 May 12 14:46 grafana.db
drwx------ 2 grafana grafana 4096 May 12 10:38 png/
[root@prometheus-server31 ~]#
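2. Managing Grafana plugins
2.1 List the locally installed plugins
[root@prometheus-server31 ~]# grafana-cli plugins ls
Error: ✗ stat /var/lib/grafana/plugins: no such file or directory
The error only means that the plugins directory does not exist yet, because no plugin has been installed so far.
2.2 Install a specific plugin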
[root@prometheus-server31 ~]# grafana-cli plugins install natel-discrete-panel
✔ Downloaded and extracted natel-discrete-panel v0.1.1 zip successfully to /var/lib/grafana/plugins/natel-discrete-panel
Please restart Grafana after installing or removing plugins. Refer to Grafana documentation for instructions if necessary.
[root@prometheus-server31 ~]#
[root@prometheus-server31 ~]# ll /var/lib/grafana/
total 1944
drwxr-xr-x 6 grafana grafana 4096 May 12 14:49 ./
drwxr-xr-x 61 root root 4096 May 12 10:38 ../
drwxr-x--- 3 grafana grafana 4096 May 12 10:38 alerting/
drwx------ 2 grafana grafana 4096 May 12 10:38 csv/
-rw-r----- 1 grafana grafana 1961984 May 12 14:48 grafana.db
drwxr-xr-x 3 root root 4096 May 12 14:49 plugins/
drwx------ 2 grafana grafana 4096 May 12 10:38 png/
[root@prometheus-server31 ~]#
[root@prometheus-server31 ~]#
[root@prometheus-server31 ~]# ll /var/lib/grafana/plugins/
total 12
drwxr-xr-x 3 root root 4096 May 12 14:49 ./
drwxr-xr-x 6 grafana grafana 4096 May 12 14:49 ../
drwxr-xr-x 4 root root 4096 May 12 14:49 natel-discrete-panel/
[root@prometheus-server31 ~]#
[root@prometheus-server31 ~]#
[root@prometheus-server31 ~]# ll /var/lib/grafana/plugins/natel-discrete-panel/
total 180
drwxr-xr-x 4 root root 4096 May 12 14:49 ./
drwxr-xr-x 3 root root 4096 May 12 14:49 ../
-rw-r--r-- 1 root root 1891 May 12 14:49 CHANGELOG.md
drwxr-xr-x 2 root root 4096 May 12 14:49 img/
-rw-r--r-- 1 root root 1079 May 12 14:49 LICENSE
-rw-r--r-- 1 root root 2650 May 12 14:49 MANIFEST.txt
-rw-r--r-- 1 root root 30629 May 12 14:49 module.js
-rw-r--r-- 1 root root 808 May 12 14:49 module.js.LICENSE.txt
-rw-r--r-- 1 root root 108000 May 12 14:49 module.js.map
drwxr-xr-x 2 root root 4096 May 12 14:49 partials/
-rw-r--r-- 1 root root 1590 May 12 14:49 plugin.json
-rw-r--r-- 1 root root 3699 May 12 14:49 README.md
[root@prometheus-server31 ~]#
2.3 Restart Grafana so that the plugin takes effect
[root@prometheus-server31 ~]# systemctl restart grafana-server.service
[root@prometheus-server31 ~]#
5.4 Monitoring Elasticsearch
1. Check that the ES cluster is healthy
[root@prometheus-server31 ~]# curl https://10.168.10.91:9200/_cat/nodes -u elastic:123456 -k
10.168.10.91 62 61 1 0.05 0.06 0.07 cdfhilmrstw - elk91
10.168.10.93 73 60 1 0.10 0.10 0.08 cdfhilmrstw * elk93
10.168.10.92 76 43 2 0.00 0.00 0.00 cdfhilmrstw - elk92
[root@prometheus-server31 ~]#
2. Download elasticsearch_exporter
wget https://github.com/prometheus-community/elasticsearch_exporter/releases/download/v1.9.0/elasticsearch_exporter-1.9.0.linux-amd64.tar.gz
Internal mirror (SVIP):
[root@elk92 ~]# wget http://192.168.17.253/Resources/Prometheus/softwares/elasticSearch_exporter/elasticsearch_exporter-1.9.0.linux-amd64.tar.gz
3. Extract the package
[root@elk92 ~]# tar xf elasticsearch_exporter-1.9.0.linux-amd64.tar.gz -C /usr/local/bin/ elasticsearch_exporter-1.9.0.linux-amd64/elasticsearch_exporter --strip-components=1
[root@elk92 ~]#
[root@elk92 ~]# ll /usr/local/bin/elasticsearch_exporter
-rwxr-xr-x 1 1001 fwupd-refresh 15069336 Mar 3 18:01 /usr/local/bin/elasticsearch_exporter*
[root@elk92 ~]#
4. Start elasticsearch_exporter
[root@elk92 ~]# elasticsearch_exporter --es.uri="https://elastic:123456@10.168.10.91:9200" --web.listen-address=:9114 --web.telemetry-path="/metrics" --es.ssl-skip-verify
5. Open the elasticsearch_exporter metrics page
http://10.168.10.92:9114/metrics
6. Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: "cmy-es-exporter"
    static_configs:
      - targets:
          - 10.168.10.92:9114
7. Hot-reload the configuration file
[root@prometheus-server31 ~]# curl -X POST 10.168.10.31:9090/-/reload
8. Verify that the ES configuration took effect
http://10.168.10.31:9090/targets?search=
9. Import the dashboard template ID into Grafana
14191
5.5 Monitoring ZooKeeper
1.1 Edit the configuration file
[root@elk91 ~]# vim /usr/local/apache-zookeeper-3.8.4-bin/conf/zoo.cfg
[root@elk91 ~]#
[root@elk91 ~]# tail -5 /usr/local/apache-zookeeper-3.8.4-bin/conf/zoo.cfg
# https://prometheus.io Metrics Exporter
metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
metricsProvider.httpHost=0.0.0.0
metricsProvider.httpPort=7000
metricsProvider.exportJvmInfo=true
[root@elk91 ~]#
1.2 Copy the configuration file to the other nodes
[root@elk91 ~]# scp /usr/local/apache-zookeeper-3.8.4-bin/conf/zoo.cfg 10.168.10.92:/usr/local/apache-zookeeper-3.8.4-bin/conf
[root@elk91 ~]# scp /usr/local/apache-zookeeper-3.8.4-bin/conf/zoo.cfg 10.168.10.93:/usr/local/apache-zookeeper-3.8.4-bin/conf
1.3 Start the ZooKeeper cluster
[root@elk91 ~]# zkServer.sh start
[root@elk92 ~]# zkServer.sh start
[root@elk93 ~]# zkServer.sh start
2. Open the ZooKeeper metrics pages
http://10.168.10.91:7000/metrics
http://10.168.10.92:7000/metrics
http://10.168.10.93:7000/metrics
3. Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
  - job_name: "cmy-zookeeper-exporter"
    static_configs:
      - targets:
          - 10.168.10.91:7000
          - 10.168.10.92:7000
          - 10.168.10.93:7000
4. Hot-reload the configuration file
[root@prometheus-server31 ~]# curl -X POST 10.168.10.31:9090/-/reload
5. Verify that the configuration took effect
http://10.168.10.31:9090/targets?search=
6. Import the dashboard template ID into Grafana
10465
5.6 Monitoring Kafka
5.6.1 Starting the Kafka cluster
1. Start the Kafka cluster
[root@elk91 ~]# kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
[root@elk91 ~]# ss -ntl | grep 9092
LISTEN 0 50 *:9092 *:*
[root@elk91 ~]#
[root@elk92 ~]# kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
[root@elk92 ~]# ss -ntl | grep 9092
LISTEN 0 50 *:9092 *:*
[root@elk92 ~]#
[root@elk93 ~]# kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
[root@elk93 ~]# ss -ntl | grep 9092
LISTEN 0 50 *:9092 *:*
[root@elk93 ~]#
5.6.2 Deploying kafka_exporter
2. Download kafka_exporter
wget https://github.com/danielqsj/kafka_exporter/releases/download/v1.9.0/kafka_exporter-1.9.0.linux-amd64.tar.gz
Internal mirror (SVIP):
[root@elk91 ~]# wget http://192.168.17.253/Resources/Prometheus/softwares/kafka_exporter/kafka_exporter-1.9.0.linux-amd64.tar.gz
3. Extract the package
[root@elk91 ~]# tar xf kafka_exporter-1.9.0.linux-amd64.tar.gz -C /usr/local/bin/ kafka_exporter-1.9.0.linux-amd64/kafka_exporter --strip-components=1
[root@elk91 ~]#
[root@elk91 ~]# ll /usr/local/bin/kafka_exporter
-rwxr-xr-x 1 1001 fwupd-refresh 25099148 Feb 17 11:04 /usr/local/bin/kafka_exporter*
[root@elk91 ~]#
4. Start kafka_exporter
[root@elk91 ~]# kafka_exporter --kafka.version="3.9.0" --kafka.server=10.168.10.92:9092 --web.listen-address=":9308" --web.telemetry-path="/metrics"
5. Open the kafka_exporter metrics page
http://10.168.10.91:9308/metrics
5.6.3 Configuring Prometheus
6. Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: "cmy-kafka-exporter"
    static_configs:
      - targets:
          - 10.168.10.91:9308
7. Hot-reload the configuration file
[root@prometheus-server31 ~]# curl -X POST 10.168.10.31:9090/-/reload
8. Verify that the configuration took effect
http://10.168.10.31:9090/targets?search=
9. Import dashboard template IDs into Grafana
7589
12483
5.7 Monitoring RabbitMQ
5.7.1 Installing RabbitMQ
Install RabbitMQ with docker-compose
Note: RabbitMQ needs to expose two ports here, 5672 (AMQP) and 15672 (management UI and API).
docker-compose.yaml
version: '3'
services:
  rabbitmq:
    image: rabbitmq:3.7.15-management
    container_name: rabbitmq
    restart: always
    volumes:
      - /data/rabbitmq/data:/var/lib/rabbitmq
      - /data/rabbitmq/log:/var/log/rabbitmq
    ports:
      - "5672:5672"
      - "15672:15672"
docker-compose up -d
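5.7.2 Installing the exporter with Docker
docker run -d -p 9419:9419 --name rabbitmq_exporter -e RABBIT_URL=http://rabbitmq:15672 -e RABBIT_USER=guest -e RABBIT_PASSWORD=guest kbudde/rabbitmq-exporter
The hostname rabbitmq in RABBIT_URL resolves only if this container is attached to the same Docker network as the RabbitMQ container; otherwise use the IP address of the host.
Once the exporter is running, it serves its metrics at a path ending in /metrics.
5. Open the rabbitmq_exporter metrics page
http://localhost:9419/metrics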
5.7.3 Configuring Prometheus
6. Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: "cmy-rabbitmq-exporter"
    static_configs:
      - targets:
          - 10.168.10.91:9419
7. Hot-reload the configuration file
[root@prometheus-server31 ~]# curl -X POST 10.168.10.31:9090/-/reload
8. Verify that the configuration took effect
http://10.168.10.31:9090/targets?search=
5.8 Monitoring Docker
5.8.1 Setting up the Docker environment
Reference:
https://github.com/google/cadvisor
1. Set up Docker and load the cAdvisor image (recommended on all of nodes 41-43)
wget http://192.168.17.253/Resources/Docker/softwares/cmy-autoinstall-docker-docker-compose.tar.gz
tar xf cmy-autoinstall-docker-docker-compose.tar.gz
./install-docker.sh i
wget http://192.168.17.253/Resources/Prometheus/images/cAdvisor/cmy-cadvisor-v0.52.1.tar.gz
docker load -i cmy-cadvisor-v0.52.1.tar.gz
2. Import the test image (recommended on all of nodes 41-43)
wget http://192.168.17.253/Resources/Docker/images/Linux/alpine-v3.20.2.tar.gz
docker image load < alpine-v3.20.2.tar.gz
3. Run the test containers
[root@node-exporter41 ~]# docker run -id --name c1 alpine:3.20.2
344a3e936abe90cfb2e2e0e6e5f13e1117a79faa5afb939ae261794d3c5ee2b0
[root@node-exporter41 ~]#
[root@node-exporter41 ~]# docker run -id --name c2 alpine:3.20.2
b2130c8f78f2df06f53d338161f3f9ad6a133c9c6b68ddb011884c788bb1b37d
[root@node-exporter41 ~]#
[root@node-exporter41 ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b2130c8f78f2 alpine:3.20.2 "/bin/sh" 5 seconds ago Up 4 seconds c2
344a3e936abe alpine:3.20.2 "/bin/sh" 8 seconds ago Up 8 seconds c1
[root@node-exporter41 ~]#
[root@node-exporter42 ~]# docker run -id --name c3 alpine:3.20.2
f399c1aafd607bf0c18dff09c1839f923ee9db39b68edf5b216c618a363566a1
[root@node-exporter42 ~]#
[root@node-exporter42 ~]# docker run -id --name c4 alpine:3.20.2
bff22c8d96f731cd44dfa55b60a9dd73d7292add33ea5b82314bf2352db115a7
[root@node-exporter42 ~]#
[root@node-exporter42 ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bff22c8d96f7 alpine:3.20.2 "/bin/sh" 3 seconds ago Up 1 second c4
f399c1aafd60 alpine:3.20.2 "/bin/sh" 6 seconds ago Up 4 seconds c3
[root@node-exporter42 ~]#
[root@node-exporter43 ~]# docker run -id --name c5 alpine:3.20.2
198464e1e9a3c7aefb361c3c7df3bfe8009b5ecd633aa19503321428d404008c
[root@node-exporter43 ~]#
[root@node-exporter43 ~]# docker run -id --name c6 alpine:3.20.2
b8ed9fcec61e017086864f8eb223cd6409d33f144e2fcfbf33acfd09860b0a06
[root@node-exporter43 ~]#
5.8.2 Running cAdvisor
4. Run cAdvisor (recommended on all of nodes 41-43)
docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro \
--network host \
--detach=true \
--name=cadvisor \
--privileged \
--device=/dev/kmsg \
gcr.io/cadvisor/cadvisor-amd64:v0.52.1
5. Open the cAdvisor web UI
http://10.168.10.41:8080/docker/
http://10.168.10.42:8080/docker/
http://10.168.10.43:8080/docker/
[root@node-exporter41 ~]# curl -s http://10.168.10.43:8080/metrics | wc -l
3067
[root@node-exporter41 ~]#
5.8.3 Configuring Prometheus
6. Add the container nodes as Prometheus targets
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: "cmy-docker-cadVisor"
    static_configs:
      - targets:
          - 10.168.10.41:8080
          - 10.168.10.42:8080
          - 10.168.10.43:8080
7. Hot-reload the configuration file
[root@prometheus-server31 ~]# curl -X POST 10.168.10.31:9090/-/reload
8. Verify that the configuration took effect
http://10.168.10.31:9090/targets?search=
9. Import the dashboard template ID into Grafana
10619
Fixes for panels that do not display data correctly:
- 1. Adjust the PromQL query
count(last_over_time(container_last_seen{instance=~"$node:$port",job=~"$job",image!=""}[3s]))
- 2. Value options
Set the 'Calculation' field to 'Last *'.
- 3. Save the Dashboard
If you do not save, all changes are lost when the page is refreshed.
5.9 Monitoring MySQL
5.9.1 Deploying MySQL
1.1 Import the MySQL image
[root@node-exporter43 ~]# wget http://192.168.17.253/Resources/Docker/images/WordPress/cmy-mysql-v8.0.36-oracle.tar.gz
[root@node-exporter43 ~]# docker load < cmy-mysql-v8.0.36-oracle.tar.gz
1.2 Run the MySQL service
[root@node-exporter43 ~]# docker run -d --network host --name mysql-server --restart always -e MYSQL_DATABASE=prometheus -e MYSQL_USER=linux97 -e MYSQL_PASSWORD=cmy -e MYSQL_ALLOW_EMPTY_PASSWORD=yes mysql:8.0.36-oracle --character-set-server=utf8 --collation-server=utf8_bin --default-authentication-plugin=mysql_native_password
1.3 Check the MySQL service
[root@node-exporter43 ~]# docker ps -l
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
16aa74bc9e03 mysql:8.0.36-oracle "docker-entrypoint.s…" 2 seconds ago Up 2 seconds mysql-server
[root@node-exporter43 ~]#
[root@node-exporter43 ~]# ss -ntl | grep 3306
LISTEN 0 151 *:3306 *:*
LISTEN 0 70 *:33060 *:*
[root@node-exporter43 ~]#
1.4 Grant the monitoring privileges to the user
[root@node-exporter43 ~]# docker exec -it mysql-server mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 13
Server version: 8.0.36 MySQL Community Server - GPL
Copyright (c) 2000, 2024, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> SHOW GRANTS FOR linux97;
+---------------------------------------------------------+
| Grants for linux97@% |
+---------------------------------------------------------+
| GRANT USAGE ON *.* TO `linux97`@`%` |
| GRANT ALL PRIVILEGES ON `prometheus`.* TO `linux97`@`%` |
+---------------------------------------------------------+
2 rows in set (0.00 sec)
mysql>
mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO linux97;
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql> SHOW GRANTS FOR linux97;
+-------------------------------------------------------------------+
| Grants for linux97@% |
+-------------------------------------------------------------------+
| GRANT SELECT, PROCESS, REPLICATION CLIENT ON *.* TO `linux97`@`%` |
| GRANT ALL PRIVILEGES ON `prometheus`.* TO `linux97`@`%` |
+-------------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql>
5.9.2 Deploying mysqld_exporter
2. Download mysqld_exporter
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.17.2/mysqld_exporter-0.17.2.linux-amd64.tar.gz
Internal mirror (SVIP):
[root@node-exporter42 ~]# wget http://192.168.17.253/Resources/Prometheus/softwares/mysql_exporter/mysqld_exporter-0.17.2.linux-amd64.tar.gz
3. Extract the package
[root@node-exporter42 ~]# tar xf mysqld_exporter-0.17.2.linux-amd64.tar.gz -C /usr/local/bin/ mysqld_exporter-0.17.2.linux-amd64/mysqld_exporter --strip-components=1
[root@node-exporter42 ~]#
[root@node-exporter42 ~]# ll /usr/local/bin/mysqld_exporter
-rwxr-xr-x 1 1001 1002 18356306 Feb 26 15:16 /usr/local/bin/mysqld_exporter*
[root@node-exporter42 ~]#
4. Run mysqld_exporter to expose the MySQL metrics
[root@node-exporter42 ~]# cat .my.cnf
[client]
host = 10.168.10.43
port = 3306
user = linux97
password = cmy
[root@node-exporter42 ~]#
[root@node-exporter42 ~]# mysqld_exporter --config.my-cnf=/root/.my.cnf
...
time=2025-05-13T02:07:45.898Z level=INFO source=mysqld_exporter.go:239 msg="Starting mysqld_exporter" version="(version=0.17.2, branch=HEAD, revision=e84f4f22f8a11089d5f04ff9bfdc5fc042605773)"
time=2025-05-13T02:07:45.898Z level=INFO source=mysqld_exporter.go:240 msg="Build context" build_context="(go=go1.23.6, platform=linux/amd64, user=root@18b69b4b0fea, date=20250226-07:16:19, tags=unknown)"
time=2025-05-13T02:07:45.898Z level=INFO source=mysqld_exporter.go:252 msg="Scraper enabled" scraper=global_status
time=2025-05-13T02:07:45.898Z level=INFO source=mysqld_exporter.go:252 msg="Scraper enabled" scraper=global_variables
time=2025-05-13T02:07:45.898Z level=INFO source=mysqld_exporter.go:252 msg="Scraper enabled" scraper=slave_status
time=2025-05-13T02:07:45.898Z level=INFO source=mysqld_exporter.go:252 msg="Scraper enabled" scraper=info_schema.innodb_cmp
time=2025-05-13T02:07:45.898Z level=INFO source=mysqld_exporter.go:252 msg="Scraper enabled" scraper=info_schema.innodb_cmpmem
time=2025-05-13T02:07:45.898Z level=INFO source=mysqld_exporter.go:252 msg="Scraper enabled" scraper=info_schema.query_response_time
time=2025-05-13T02:07:45.898Z level=INFO source=tls_config.go:347 msg="Listening on" address=[::]:9104
time=2025-05-13T02:07:45.898Z level=INFO source=tls_config.go:350 msg="TLS is disabled." http2=false address=[::]:9104
5. Verify
[root@node-exporter41 ~]# curl -s http://10.168.10.42:9104/metrics | wc -l
2569
[root@node-exporter41 ~]#
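Note that "wc -l" counts every line of the /metrics page, including the "# HELP" and "# TYPE" comment lines, so the number above is larger than the number of samples. A small sketch of a filter that counts only sample lines follows; the function name count_samples is hypothetical.

```shell
# count_samples: count metric samples in exposition text read from stdin,
# ignoring "# HELP"/"# TYPE" comment lines and blank lines.
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Usage against the exporter from this section:
#   curl -s http://10.168.10.42:9104/metrics | count_samples
```

The same check works for every exporter in this document, since they all serve the same text format.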
5.9.3 Configuring Prometheus
6. Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: "cmy-mysql-exporter"
    static_configs:
      - targets:
          - 10.168.10.42:9104
7. Hot-reload the configuration file
[root@prometheus-server31 ~]# curl -X POST 10.168.10.31:9090/-/reload
8. Verify that the configuration took effect
http://10.168.10.31:9090/targets?search=
9. Import dashboard template IDs into Grafana
14057
17320
5.10 Monitoring MongoDB
5.10.1 Deploying MongoDB
1 Import the MongoDB image
wget http://192.168.17.253/Resources/Prometheus/images/MongoDB/cmy-mongoDB-v8.0.6-noble.tar.gz
docker load -i cmy-mongoDB-v8.0.6-noble.tar.gz
2 Deploy the MongoDB service
[root@node-exporter43 ~]# docker run -d --name mongodb-server --network host mongo:8.0.6-noble
4b0f00dea78bb571c216c344984ced026c1210c94db147fdc9e32f549e3135de
[root@node-exporter43 ~]#
[root@node-exporter43 ~]# docker ps -l
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8179c6077ec8 mongo:8.0.6-noble "docker-entrypoint.s…" 4 seconds ago Up 3 seconds mongodb-server
[root@node-exporter43 ~]#
[root@node-exporter43 ~]# ss -ntl | grep 27017
LISTEN 0 4096 0.0.0.0:27017 0.0.0.0:*
[root@node-exporter43 ~]#
5.10.2 Deploying mongodb_exporter
3 Download mongodb_exporter
wget https://github.com/percona/mongodb_exporter/releases/download/v0.43.1/mongodb_exporter-0.43.1.linux-amd64.tar.gz
Internal mirror (SVIP):
[root@node-exporter42 ~]# wget http://192.168.17.253/Resources/Prometheus/softwares/MongoDB_exporter/mongodb_exporter-0.43.1.linux-amd64.tar.gz
4 Extract the package
[root@node-exporter42 ~]# tar xf mongodb_exporter-0.43.1.linux-amd64.tar.gz -C /usr/local/bin/ mongodb_exporter-0.43.1.linux-amd64/mongodb_exporter --strip-components=1
[root@node-exporter42 ~]#
[root@node-exporter42 ~]# ll /usr/local/bin/mongodb_exporter
-rwxr-xr-x 1 1001 geoclue 20467864 Dec 13 20:10 /usr/local/bin/mongodb_exporter*
[root@node-exporter42 ~]#
5 Run mongodb_exporter
[root@node-exporter42 ~]# mongodb_exporter --mongodb.uri=mongodb://10.168.10.43:27017 --log.level=info --collect-all
time=2025-05-13T02:49:26.332Z level=INFO source=tls_config.go:347 msg="Listening on" address=[::]:9216
time=2025-05-13T02:49:26.332Z level=INFO source=tls_config.go:350 msg="TLS is disabled." http2=false address=[::]:9216
6 Check the mongodb_exporter metrics page
http://10.168.10.42:9216/metrics
[root@node-exporter41 ~]# curl -s http://10.168.10.42:9216/metrics | wc -l
8847
[root@node-exporter41 ~]#
5.10.3 Configuring Prometheus
7 Configure Prometheus to monitor the MongoDB container
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: cmy-linux96-mongodb-exporter
    static_configs:
      - targets:
          - 10.168.10.42:9216
[root@prometheus-server31 ~]#
[root@prometheus-server31 ~]# curl -X POST http://10.168.10.31:9090/-/reload
[root@prometheus-server31 ~]#
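8 Verify that the Prometheus configuration took effect
http://10.168.10.31:9090/targets?search=
You can now query the data; a good metric to start with is mongodb_dbstats_dataSize
9 Import the dashboard template ID into Grafana
16504
Because this MongoDB version is fairly new and the community dashboards for Grafana are not always kept up to date, you may need to build some Dashboards of your own.
Reference: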
https://grafana.com/grafana/dashboards
5.11 Monitoring nginx
Two options:
Option 1: build nginx with the vts module only
Option 2: build nginx with the vts module and also install nginx-vts-exporter
5.11.1 Building nginx from source
1 Build and install nginx
1.1 Install the build tools
apt -y install git wget gcc make zlib1g-dev build-essential libtool openssl libssl-dev
1.2 Clone the nginx-module-vts module
git clone https://gitee.com/jasonyin2020/nginx-module-vts.git
1.3 Download the nginx source package
wget https://nginx.org/download/nginx-1.28.0.tar.gz
1.4 Extract nginx and enter the source directory
tar xf nginx-1.28.0.tar.gz
cd nginx-1.28.0
1.5 Configure the build
./configure --prefix=/cmy/softwares/nginx --with-http_ssl_module --with-http_v2_module --with-http_realip_module --without-http_rewrite_module --with-http_stub_status_module --without-http_gzip_module --with-file-aio --with-stream --with-stream_ssl_module --with-stream_realip_module --add-module=/root/nginx-module-vts
1.6 Compile and install nginx
make -j 2 && make install
1.7 Edit the nginx configuration file
vim /cmy/softwares/nginx/conf/nginx.conf
...
http {
    vhost_traffic_status_zone;
    upstream cmy-promethues {
        server 10.168.10.31:9090;
    }
    ...
    server {
        ...
        location / {
            root html;
            # index index.html index.htm;
            proxy_pass http://cmy-promethues;
        }
        location /status {
            vhost_traffic_status_display;
            vhost_traffic_status_display_format html;
        }
    }
}
1.8 Check the configuration syntax
/cmy/softwares/nginx/sbin/nginx -t
1.9 Start nginx
/cmy/softwares/nginx/sbin/nginx
1.10 Open the nginx status page
http://10.168.10.43/status/format/prometheus
5.11.2 Installing nginx-vts-exporter
2.1 Download nginx-vts-exporter (the release archive and the binary are named nginx-vtx-exporter)
wget https://github.com/sysulq/nginx-vts-exporter/releases/download/v0.10.8/nginx-vtx-exporter_0.10.8_linux_amd64.tar.gz
Internal mirror (SVIP):
wget http://192.168.17.253/Resources/Prometheus/softwares/nginx_exporter/nginx-vtx-exporter_0.10.8_linux_amd64.tar.gz
2.2 Extract the binary into a directory on PATH
[root@node-exporter42 ~]# tar xf nginx-vtx-exporter_0.10.8_linux_amd64.tar.gz -C /usr/local/bin/ nginx-vtx-exporter
[root@node-exporter42 ~]#
[root@node-exporter42 ~]# ll /usr/local/bin/nginx-vtx-exporter
-rwxr-xr-x 1 1001 avahi 7950336 Jul 11 2023 /usr/local/bin/nginx-vtx-exporter*
[root@node-exporter42 ~]#
2.3 Run nginx-vtx-exporter
[root@node-exporter42 ~]# nginx-vtx-exporter -nginx.scrape_uri=http://10.168.10.43/status/format/json
5.11.3 Configuring Prometheus
3.1 Edit the configuration file
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: "cmy-nginx-exporter"
    metrics_path: "/status/format/prometheus"
    static_configs:
      - targets:
          - "10.168.10.43:80"
  - job_name: "cmy-nginx-vts-exporter"
    static_configs:
      - targets:
          - "10.168.10.42:9913"
3.2 Reload the configuration and verify that it took effect
curl -X POST http://10.168.10.31:9090/-/reload
3.3 Import Grafana dashboard templates
9785 (option 1: nginx built with the vts module only)
2949 (option 2: vts module plus nginx-vts-exporter)
5.12 监控tomcat
5.12.1 部署tomcat
- prometheus监控主流的中间件之tomcat
1 部署tomcat-exporter
1.1 导入镜像
[root@node-exporter43 ~]# wget http://192.168.17.253/Resources/Prometheus/images/cmy-tomcat-v9.0.87.tar.gz
[root@node-exporter43 ~]#
[root@node-exporter43 ~]# docker load -i cmy-tomcat-v9.0.87.tar.gz
1.2 Build tomcat-exporter from the Dockerfile
[root@node-exporter43 ~]# git clone https://gitee.com/jasonyin2020/tomcat-exporter.git
[root@node-exporter43 ~]# cd tomcat-exporter/
[root@node-exporter43 tomcat-exporter]#
[root@node-exporter43 tomcat-exporter]# ll
total 44
drwxr-xr-x 5 root root 4096 May 13 11:55 ./
drwx------ 10 root root 4096 May 13 11:55 ../
-rw-r--r-- 1 root root 96 May 13 11:55 build.sh
-rw-r--r-- 1 root root 503 May 13 11:55 Dockerfile
drwxr-xr-x 8 root root 4096 May 13 11:55 .git/
drwxr-xr-x 2 root root 4096 May 13 11:55 libs/
-rw-r--r-- 1 root root 3407 May 13 11:55 metrics.war
drwxr-xr-x 2 root root 4096 May 13 11:55 myapp/
-rw-r--r-- 1 root root 191 May 13 11:55 README.md
-rw-r--r-- 1 root root 7604 May 13 11:55 server.xml
[root@node-exporter43 tomcat-exporter]#
[root@node-exporter43 tomcat-exporter]# bash build.sh
1.3 Run the tomcat image
[root@node-exporter43 tomcat-exporter]# docker run -dp 18080:8080 --name tomcat-server registry.cn-hangzhou.aliyuncs.com/cmy-k8s/tomcat9-app:v1
5643c618db790e12b5ec658c362b3963a3db39914c826d6eef2fe55355f1d5d9
[root@node-exporter43 tomcat-exporter]#
[root@node-exporter43 tomcat-exporter]# docker ps -l
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5643c618db79 registry.cn-hangzhou.aliyuncs.com/cmy-k8s/tomcat9-app:v1 "/usr/local/tomcat/b…" 4 seconds ago Up 4 seconds 8009/tcp, 8443/tcp, 0.0.0.0:18080->8080/tcp, :::18080->8080/tcp tomcat-server
[root@node-exporter43 tomcat-exporter]#
1.4 Access the tomcat application
http://10.168.10.43:18080/metrics/
http://10.168.10.43:18080/myapp/
5.12.2 Configure Prometheus
2 Configure Prometheus to monitor the tomcat application
2.1 Edit the configuration file
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: "cmy-tomcat-exporter"
    static_configs:
      - targets:
          - "10.168.10.43:18080"
2.2 Reload the configuration and verify that it took effect
curl -X POST http://10.168.10.31:9090/-/reload
2.3 Import a Grafana template
Official dashboard support is limited, so search GitHub for a suitable Tomcat monitoring template.
Reference:
https://github.com/nlighten/tomcat_exporter/blob/master/dashboard/example.json
5.13 Monitoring ESXi
5.13.1 Deploy ESXi
See the CSDN blog post "使用Vmware Workstation搭建部署ESXi 7虚拟机" (deploying an ESXi 7 virtual machine with VMware Workstation).
5.13.2 Deploy vmware_exporter
cat docker-compose.yaml
services:
  vmware-exporter:
    image: pryorda/vmware_exporter
    container_name: vmware-exporter
    restart: unless-stopped
    ports:
      - '9272:9272'
    expose:
      - 9272
    environment:
      VSPHERE_HOST: "10.168.10.140"
      VSPHERE_IGNORE_SSL: "True"
      VSPHERE_USER: "root"
      VSPHERE_PASSWORD: "cmyCMY123"
    labels:
      org.label-schema.group: "monitoring"
5.13.3 Configure Prometheus
  - job_name: "cmy-vm-exsi-exporter"
    static_configs:
      - targets:
          - 10.168.10.41:9272
5.14 Monitoring FastDFS
Before creating the containers, a quick overview of FastDFS. A FastDFS system has three roles:
- Tracker Server: handles scheduling and load balancing. It manages all storage servers and groups; each storage server connects to the tracker on startup, reports which group it belongs to, and keeps sending periodic heartbeats.
- Storage Server: provides capacity and redundancy. Storage is organized by group; a group can contain several storage servers whose data mirror each other.
- Client: the server that uploads and downloads data, that is, the server where our own project is deployed.
1. Pull the image and create the data directories
docker pull season/fastdfs:1.2
mkdir -p /usr/local/server/fastdfs/tracker/data
mkdir -p /usr/local/server/fastdfs/storage/data
mkdir -p /usr/local/server/fastdfs/storage/path
2 Create the tracker container
docker run -id --name tracker \
-p 22122:22122 \
--restart=always --net host \
-v /usr/local/server/fastdfs/tracker/data:/fastdfs/tracker/data \
season/fastdfs:1.2 tracker
3 Create the storage container
docker run -id --name storage \
--restart=always --net host \
-v /usr/local/server/fastdfs/data/storage:/fastdfs/store_path \
-e TRACKER_SERVER="10.168.10.41:22122" \
season/fastdfs:1.2 storage
docker cp tracker:/etc/fdfs/client.conf /usr/local/server/fastdfs/
vi /usr/local/server/fastdfs/client.conf # update the tracker address
docker cp /usr/local/server/fastdfs/client.conf tracker:/etc/fdfs
4. Upload a test file
docker exec -it tracker bash
echo "niceyoo" > niceyoo.txt
Upload niceyoo.txt to the server with the fdfs_upload_file command
fdfs_upload_file /etc/fdfs/client.conf niceyoo.txt
5. Configure Nginx
docker cp storage:/etc/nginx/conf/nginx.conf /tmp/
# Edit /tmp/nginx.conf as follows
location / {
root /fastdfs/store_path/data;
ngx_fastdfs_module;
}
# Run the nginx container (mount the file edited in /tmp above)
docker run -id --name fastdfs_nginx \
--restart=always \
-v /usr/local/server/fastdfs/data/storage:/fastdfs/store_path \
-v /tmp/nginx.conf:/etc/nginx/conf/nginx.conf \
-p 8888:80 \
-e TRACKER_SERVER=10.168.10.41:22122 \
season/fastdfs:1.2 nginx
5.15 Monitoring Hadoop
Hadoop setup
1. Update /etc/hosts and distribute SSH keys
cat /etc/hosts
10.168.10.231 master-231
10.168.10.232 worker232
10.168.10.233 worker233
ssh-keygen
ssh-copy-id master-231
ssh-copy-id worker232
ssh-copy-id worker233
2. Prepare the JDK and the Hadoop package
apt update
apt install -y openjdk-8-jdk
wget https://mirrors.aliyun.com/apache/hadoop/core/hadoop-3.3.6/hadoop-3.3.6.tar.gz
sudo mkdir -p /export/{data,servers,software}
tar -zxvf hadoop-3.3.6.tar.gz -C /export/servers/
# Quote EOF so that $PATH, $JAVA_HOME and $HADOOP_HOME are written literally instead of being expanded now
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/export/servers/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
source /etc/profile
Verify the setup
hadoop version
Hadoop 3.3.6
Source code repository https://github.com/apache/hadoop.git -r 1be78238728da9266a4f88195058f08fd012bf9c
Compiled by ubuntu on 2023-06-18T08:22Z
Compiled on platform linux-x86_64
Compiled with protoc 3.7.1
From source with checksum 5652179ad55f76cb287d9c633bb53bbd
This command was run using /export/servers/hadoop-3.3.6/share/hadoop/common/hadoop-common-3.3.6.jar
Hadoop cluster configuration
cd /export/servers/hadoop-3.3.6/etc/hadoop/
vim hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
# Users that run the HDFS daemons
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
# Users that run the YARN daemons
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master-231:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/export/servers/hadoop-3.3.6/tmp</value>
</property>
</configuration>
vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>worker232:50090</value>
</property>
</configuration>
vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
vim yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master-231</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
cat workers
master-231
worker232
worker233
scp -r /export/ 10.168.10.232:/
scp -r /export/ 10.168.10.233:/
hdfs namenode -format # format the file system on the master node
start-dfs.sh
start-yarn.sh
Check with jps
jps
10146 DataNode
9991 NameNode
10489 ResourceManager
12860 Jps
10639 NodeManager
Access the web UI
http://10.168.10.231:8088
Prometheus configuration
Monitoring dashboard ID: 23175
5.16 MinIO cluster deployment and Prometheus monitoring
[[minio集群部署及prometheus 监控]]
6 Service discovery methods *****
[[服务发现]]
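7 Federation mode
By default, Prometheus stores the data it scrapes locally, which means a single server can become a storage bottleneck in this mode.
To spread the scraping load, Prometheus can be deployed in federation mode: several servers each scrape their own targets, and a central server pulls selected series from them through the /federate endpoint.
2. Configure prometheus server32
2.1 Install Prometheus server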
[root@prometheus-server32 ~]# wget http://192.168.17.253/Resources/Prometheus/Scripts/cmy-install-prometheus-server-v2.53.4.tar.gz
[root@prometheus-server32 ~]#
[root@prometheus-server32 ~]# tar xf cmy-install-prometheus-server-v2.53.4.tar.gz
[root@prometheus-server32 ~]#
[root@prometheus-server32 ~]# ./install-prometheus-server.sh i
2.2 Edit the Prometheus server configuration file
[root@prometheus-server32 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: 'cmy-file-service-discovery-32'
    # File-based service discovery
    file_sd_configs:
      # Paths of the target files
      - files:
          - /tmp/cmy-file-sd.yaml
2.3 Reload Prometheus server
[root@prometheus-server32 ~]# curl -X POST http://10.168.10.32:9090/-/reload
[root@prometheus-server32 ~]#
2.4 Write the YAML file
[root@prometheus-server32 ~]# cat > /tmp/cmy-file-sd.yaml <<EOF
- targets:
    - '10.168.10.41:9100'
  labels:
    "address": "shahe"
    "office": "www.cmy.com"
    "apps": "yaml"
EOF
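File-based service discovery also reads JSON files, so the target list can be generated by a script. The following is only a minimal sketch: the helper name `build_file_sd` and the label value "json" are made up for illustration, and the output would still have to be saved to a path listed under `files` in the job (for example a hypothetical /tmp/cmy-file-sd.json).

```python
import json


def build_file_sd(targets, labels):
    """Return a file_sd document: a list holding one target group."""
    return [{"targets": list(targets), "labels": dict(labels)}]


doc = build_file_sd(
    ["10.168.10.41:9100"],
    {"address": "shahe", "office": "www.cmy.com", "apps": "json"},
)
# Serialize the target group; write this text to the file Prometheus watches.
text = json.dumps(doc, indent=2)
print(text)
```

Prometheus re-reads the listed files when they change, so regenerating the file is enough to add or remove targets without a reload.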
2.5 Verify that data is being scraped
http://10.168.10.32:9090/targets?search=
3. Configure prometheus server33
3.1 Install Prometheus server
[root@prometheus-server33 ~]# wget http://192.168.17.253/Resources/Prometheus/Scripts/cmy-install-prometheus-server-v2.53.4.tar.gz
[root@prometheus-server33 ~]#
[root@prometheus-server33 ~]# tar xf cmy-install-prometheus-server-v2.53.4.tar.gz
[root@prometheus-server33 ~]#
[root@prometheus-server33 ~]# ./install-prometheus-server.sh i
3.2 Edit the Prometheus server configuration file
[root@prometheus-server33 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: "cmy-consul-sd"
    consul_sd_configs:
      - server: 10.168.10.43:8500
      - server: 10.168.10.42:8500
      - server: 10.168.10.41:8500
    relabel_configs:
      - source_labels: [__meta_consul_service]
        regex: consul
        action: drop
3.3 Reload Prometheus server
curl -X POST http://10.168.10.33:9090/-/reload
3.5 Register nodes
curl -X PUT -d '{"id":"prometheus-node42","name":"cmy-prometheus-node42","address":"10.168.10.42","port":9100,"tags":["node-exporter"],"checks": [{"http":"http://10.168.10.42:9100","interval":"5m"}]}' http://10.168.10.43:8500/v1/agent/service/register
curl -X PUT -d '{"id":"prometheus-node43","name":"cmy-prometheus-node43","address":"10.168.10.43","port":9100,"tags":["node-exporter"],"checks": [{"http":"http://10.168.10.43:9100","interval":"5m"}]}' http://10.168.10.43:8500/v1/agent/service/register
3.6 Verify that data is being scraped
http://10.168.10.33:9090/targets?search=
4. Configure Prometheus server31
4.1 Edit the Prometheus server configuration file
[root@prometheus-server31 ~]# cd /cmy/softwares/prometheus-2.53.4.linux-amd64/
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# vim prometheus.yml
...
  - job_name: "prometheus-federate-32"
    metrics_path: "/federate"
    # Resolves label conflicts. Valid values: true and false; the default is false.
    # true: keep the labels from the scraped data and ignore the conflicting labels this server would attach.
    # false: keep this server's labels and rename the conflicting scraped labels with an "exported_" prefix.
    honor_labels: true
    params:
      "match[]":
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
        - '{__name__=~"node.*"}'
    static_configs:
      - targets:
          - "10.168.10.32:9090"
  - job_name: "prometheus-federate-33"
    metrics_path: "/federate"
    honor_labels: true
    params:
      "match[]":
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
        - '{__name__=~"node.*"}'
    static_configs:
      - targets:
          - "10.168.10.33:9090"
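The effect of honor_labels is easier to see on a concrete series. The sketch below is a simplified model written for these notes, not Prometheus code: `merge_labels` is a hypothetical helper, and it ignores edge cases such as empty label values or an existing exported_ label.

```python
def merge_labels(scraped, target, honor_labels):
    """Combine labels from the scraped series with the labels of the scrape target."""
    if honor_labels:
        # Scraped labels win; target labels only fill in what is missing.
        merged = dict(target)
        merged.update(scraped)
        return merged
    merged = {}
    for name, value in scraped.items():
        # A scraped label that collides with a target label is renamed.
        key = f"exported_{name}" if name in target else name
        merged[key] = value
    merged.update(target)
    return merged


scraped = {"job": "cmy-file-service-discovery-32", "instance": "10.168.10.41:9100"}
target = {"job": "prometheus-federate-32", "instance": "10.168.10.32:9090"}
honored = merge_labels(scraped, target, honor_labels=True)
renamed = merge_labels(scraped, target, honor_labels=False)
print(honored)
print(renamed)
```

With federation, honor_labels: true is what keeps the original job and instance of each series visible on the central server.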
4.2 Reload Prometheus server
curl -X POST http://10.168.10.31:9090/-/reload
4.3 Verify that data is being scraped
http://10.168.10.31:9090/targets?search=
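To check by hand what the central server will receive, the /federate endpoint of a downstream server can be requested directly with the same match[] selectors. A small sketch for building such a URL follows; `federate_url` is a hypothetical helper name, and the address is the server32 instance used above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def federate_url(base, matchers):
    """Build a /federate URL with one match[] parameter per selector."""
    query = urlencode([("match[]", m) for m in matchers])
    return f"{base}/federate?{query}"


url = federate_url(
    "http://10.168.10.32:9090",
    ['{__name__=~"node.*"}', '{__name__=~"job:.*"}'],
)
print(url)
# Decoding the query again shows the selectors survive URL encoding.
print(parse_qs(urlsplit(url).query)["match[]"])
```

The printed URL can be passed to curl (in quotes) to inspect the federated series.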
4.4 Run a query to verify
node_cpu_seconds_total{job=~"cmy-consul-sd|cmy-file-service-discovery-32"}
4.5 Query the data from Grafana
Omitted; see the video.
8 Remote storage **
- Common parameters for Prometheus local storage
1. Parameter reference
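--config.file=/cmy/softwares/prometheus/prometheus.yml
Path of the Prometheus configuration file.
--web.enable-lifecycle
Enable reloading (and shutdown) through HTTP requests, for example POST /-/reload.
--storage.tsdb.path="/cmy/data/prometheus"
Directory where Prometheus stores its data.
--storage.tsdb.retention.time="60d"
How long Prometheus keeps data.
--web.listen-address="0.0.0.0:9090"
Address and port Prometheus listens on.
--web.max-connections=65535
Maximum number of simultaneous connections.
--storage.tsdb.retention.size="512MB"
Maximum total size of the stored blocks; once it is exceeded, the oldest data is removed first.
--query.timeout=10s
Maximum time a query may run before it is aborted.
--query.max-concurrency=20
Maximum number of queries executed concurrently.
--log.level=info
Log level.
--log.format=logfmt
Log format.
--web.read-timeout=5m
Maximum time before a request read times out and idle connections are closed.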
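8.1 Integrating Prometheus with VictoriaMetrics remote storage
1 VictoriaMetrics overview
VictoriaMetrics is a fast, cost-effective, and scalable monitoring solution and time series database.
Website:
https://victoriametrics.com/
Documentation:
https://docs.victoriametrics.com/
GitHub:
https://github.com/VictoriaMetrics/VictoriaMetrics
Deployment guide:
https://docs.victoriametrics.com/quick-start/
2 Deploy VictoriaMetrics
2.1 Download VictoriaMetrics
The 93 LTS release is recommended here: the 97 LTS build that was tried appears to require an enterprise license and failed at startup with the following message: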
[root@prometheus-server33 ~]# journalctl -u victoria-metrics.service -f
...
Nov 14 12:03:28 prometheus-server33 victoria-metrics-prod[16999]: 2024-11-14T04:03:28.576Z error VictoriaMetrics/lib/license/copyrights.go:33 VictoriaMetrics Enterprise license is required. Please obtain it at https://victoriametrics.com/products/enterprise/trial/ and pass it via either -license or -licenseFile command-line flags. See https://docs.victoriametrics.com/enterprise/
wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.93.16/victoria-metrics-linux-amd64-v1.93.16.tar.gz
SVIP:
[root@prometheus-server32 ~]# wget http://192.168.17.253/Resources/Prometheus/softwares/VictoriaMetrics/victoria-metrics-linux-amd64-v1.93.16.tar.gz
2.2 Extract the package
[root@prometheus-server32 ~]# tar xf victoria-metrics-linux-amd64-v1.93.16.tar.gz -C /usr/local/bin/
[root@prometheus-server32 ~]#
[root@prometheus-server32 ~]# ll /usr/local/bin/victoria-metrics-prod
-rwxr-xr-x 1 cmy cmy 22216200 Jul 18 2024 /usr/local/bin/victoria-metrics-prod*
[root@prometheus-server32 ~]#
2.3 Write the systemd unit file
[root@prometheus-server32 ~]# cat > /etc/systemd/system/victoria-metrics.service <<EOF
[Unit]
Description=cmy Linux VictoriaMetrics Server
Documentation=https://docs.victoriametrics.com/
After=network.target
[Service]
ExecStart=/usr/local/bin/victoria-metrics-prod \
-httpListenAddr=0.0.0.0:8428 \
-storageDataPath=/cmy/data/victoria-metrics \
-retentionPeriod=3
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now victoria-metrics.service
systemctl status victoria-metrics
2.4 Check that the port is listening
[root@prometheus-server32 ~]# ss -ntl | grep 8428
LISTEN 0 4096 0.0.0.0:8428 0.0.0.0:*
[root@prometheus-server32 ~]#
2.5 Open the web UI
http://10.168.10.32:8428/
3 Configure VictoriaMetrics as remote storage for Prometheus
3.1 Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
# Configure the VictoriaMetrics address as a top-level field
remote_write:
  - url: http://10.168.10.32:8428/api/v1/write
3.2 Restart Prometheus to load the new configuration
[root@prometheus-server31 ~]# systemctl stop prometheus-server
[root@prometheus-server31 ~]#
[root@prometheus-server31 ~]# cd /cmy/softwares/prometheus-2.53.4.linux-amd64/
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ./prometheus
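Tip:
To keep the experiment free of interference, start Prometheus manually.
3.3 Check the data in the VictoriaMetrics web UI
node_cpu_seconds_total{job=~"cmy-consul-sd|cmy-file-service-discovery-32"}
Tip:
If this step returns no data, do not continue with the steps below; get the data flowing first.
3.4 Configure the Grafana data source and URL
The data source type is Prometheus, but the URL must point to VictoriaMetrics.
3.5 Import a Grafana template by ID and select the data source
1860
9 Prometheus label management *
9.1 Label lifecycle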
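- Label lifecycle:
- labels:
Attach labels to a target before it is scraped.
- relabel_configs:
Rewrite target labels before the scrape; the resulting labels are stored with the scraped series.
TIPS:
At this stage the job label can be processed, but the instance label usually cannot: if instance is still unset after relabel_configs has run, Prometheus fills it in from __address__.
- metric_relabel_configs:
Rewrite the labels of individual metrics scraped from a target, after the scrape and before storage.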
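9.2 Label management
- Advanced label management
1. Label categories
Labels group and classify data, and they let you filter it.
Common label management scenarios:
- 1. Drop unneeded metrics;
- 2. Remove sensitive or unneeded labels from metrics;
- 3. Add, edit, or rewrite label values or label formats;
Label categories:
- Default labels:
Labels built into Prometheus. The internal ones have the form "__LABEL__"; instance and job are attached to every scraped series.
Typical examples:
- "__metrics_path__"
- "__address__"
- "__scheme__"
- "__scrape_interval__"
- "__scrape_timeout__"
- "__name__"
- "instance"
- "job"
- Application labels:
Labels supplied for a specific monitored service, usually by service discovery, in the form "__meta_<source>_<name>".
Taking the consul service as an example, typical labels are:
- "__meta_consul_address"
- "__meta_consul_service"
- "__meta_consul_dc"
- ...
- Custom labels:
Labels defined by the user, for example when declaring targets.
- relabel_configs: examples of rewriting target labels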
1. Example: attaching custom labels to targets
1.1 Edit the configuration file
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# vim prometheus.yml
...
  - job_name: "cmy-linux97-node-exporter-labels"
    static_configs:
      - targets: ["10.168.10.41:9100","10.168.10.42:9100","10.168.10.43:9100"]
        labels:
          auther: cmy
          office: https://www.cmy.com
...
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file syntax
1.2 Hot-reload the configuration
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# curl -X POST http://10.168.10.31:9090/-/reload
1.3 Verify with a query
node_cpu_seconds_total{office="https://www.cmy.com"}
2. Adding a label with target_label in relabel_configs
2.1 Edit the configuration file
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# vim prometheus.yml
...
  - job_name: "cmy-linux97-node-exporter-labels"
    static_configs:
      - targets: ["10.168.10.41:9100","10.168.10.42:9100","10.168.10.43:9100"]
        labels:
          auther: cmy
          office: https://www.cmy.com
    relabel_configs:
      # Read the value of the source label
      - source_labels:
          - job
        # Assign that value to the new label
        target_label: linux97_jobs
...
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file syntax
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# curl -X POST http://10.168.10.31:9090/-/reload
2.2 Verify
node_cpu_seconds_total{linux97_jobs="cmy-linux97-node-exporter-labels"}
3. Replacing labels with relabel_configs (replace action)
3.1 Edit the configuration file
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# vim prometheus.yml
...
  - job_name: "cmy-linux97-node-exporter-relabel_configs-regex-separator-replacement-action"
    static_configs:
      - targets: ["10.168.10.31:9100","10.168.10.41:9100","10.168.10.42:9100","10.168.10.43:9100"]
    relabel_configs:
      # Labels whose values are joined and matched against the regex
      - source_labels:
          - __scheme__
          - __address__
          - __metrics_path__
        # Regex matched against the joined source label values (anchored at both ends).
        # The two capture groups are referenced in replacement as "${1}" and "${2}".
        # "https?" is used because alternation is leftmost-first: "(http|https)" would capture only "http" for an https target.
        regex: "(https?)(.*)"
        # Separator used to join the source_labels values; the default is a semicolon ";".
        # Given the following source values:
        #   __address__="10.168.10.41:9100"
        #   __metrics_path__="/metrics"
        #   __scheme__="http"
        # the joined string is "http10.168.10.41:9100/metrics"
        separator: ""
        # Label that receives the result: a new label named "cmy_prometheus_ep" is added,
        # and its value is the expanded replacement.
        target_label: "cmy_prometheus_ep"
        # Value written to target_label; defaults to "$1" when omitted
        replacement: "${1}://${2}"
        # Action applied to labels or metrics. Common actions are replace|keep|drop|labelmap|labeldrop; the default is replace.
        # Reference:
        # https://prometheus.io/docs/prometheus/2.53/configuration/configuration/#relabel_config
        action: replace
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file syntax
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# curl -X POST http://10.168.10.31:9090/-/reload
3.2 Verify
node_cpu_seconds_total{job="cmy-linux97-node-exporter-relabel_configs-regex-separator-replacement-action"}
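The replace action can be traced step by step in a few lines of Python. This is a rough model for study purposes, under the assumption that Python's `re` behaves like Prometheus's RE2 engine for this simple pattern; `relabel_replace` and `expand` are hypothetical helper names, and the regex is written as `(https?)(.*)` so that an https scheme would also be captured whole.

```python
import re


def expand(match, replacement):
    """Expand ${1} / $1 style references using the groups of a regex match."""
    template = re.sub(
        r"\$\{(\w+)\}|\$(\w+)",
        lambda m: r"\g<%s>" % (m.group(1) or m.group(2)),
        replacement,
    )
    return match.expand(template)


def relabel_replace(labels, source_labels, regex, target_label,
                    replacement="$1", separator=";"):
    """Join the source label values, match them fully, and write target_label."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)  # no match: labels stay unchanged
    out = dict(labels)
    out[target_label] = expand(match, replacement)
    return out


labels = {
    "__scheme__": "http",
    "__address__": "10.168.10.41:9100",
    "__metrics_path__": "/metrics",
}
result = relabel_replace(
    labels,
    ["__scheme__", "__address__", "__metrics_path__"],
    regex="(https?)(.*)",
    target_label="cmy_prometheus_ep",
    replacement="${1}://${2}",
    separator="",
)
print(result["cmy_prometheus_ep"])
```

Labels starting with a double underscore are removed after relabeling, which is why only cmy_prometheus_ep shows up in the query result.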
4. Mapping labels to new names with relabel_configs (labelmap action)
4.1 Edit the configuration file
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# vim prometheus.yml
...
  - job_name: "cmy-linux97-node-exporter-relabel_configs-labelmap"
    static_configs:
      - targets: ["10.168.10.31:9100","10.168.10.41:9100","10.168.10.42:9100","10.168.10.43:9100"]
        labels:
          auther: cmy
    relabel_configs:
      - regex: "(job|auther)"
        replacement: "${1}_cmy_labelmap_kubernetes"
        # labelmap creates new labels, typically from part of a matching label name; the original labels are kept.
        # regex is matched against every label name, and the value of each matching label is copied to the label name given by replacement.
        action: labelmap
...
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file syntax
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# curl -X POST http://10.168.10.31:9090/-/reload
4.2 Verify
node_cpu_seconds_total{job="cmy-linux97-node-exporter-relabel_configs-labelmap"}
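As a rough illustration of what labelmap does to one target, the following sketch copies every label whose name matches the regex to a new name. `relabel_labelmap` is a hypothetical helper and Python's `re` stands in for RE2.

```python
import re


def relabel_labelmap(labels, regex, replacement="$1"):
    """Copy each label whose NAME fully matches regex to the expanded new name."""
    out = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match is None:
            continue
        # Turn ${1} / $1 into Python's \g<1> form before expanding.
        template = re.sub(
            r"\$\{(\w+)\}|\$(\w+)",
            lambda m: r"\g<%s>" % (m.group(1) or m.group(2)),
            replacement,
        )
        out[match.expand(template)] = value
    return out


before = {
    "job": "cmy-linux97-node-exporter-relabel_configs-labelmap",
    "auther": "cmy",
    "instance": "10.168.10.41:9100",
}
after = relabel_labelmap(before, "(job|auther)", "${1}_cmy_labelmap_kubernetes")
print(sorted(after))
```

The braces in `${1}` matter: with `$1_cmy...` the whole string `1_cmy...` would be read as the group name.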
5. Deleting labels with relabel_configs (labeldrop action)
5.1 Edit the configuration file
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# vim prometheus.yml
...
  - job_name: "cmy-linux97-node-exporter-relabel_configs-labelmap-labeldrop"
    static_configs:
      - targets: ["10.168.10.31:9100","10.168.10.41:9100","10.168.10.42:9100","10.168.10.43:9100"]
        labels:
          auther: cmy
          blog: https://www.cnblogs.com/cmy
          office: https://www.cmy.com/
    relabel_configs:
      - regex: "(job|auther)"
        replacement: "${1}_cmy_labelmap_kubernetes"
        action: labelmap
      - regex: "(job|blog|office)"
        # Delete every label whose name matches regex
        action: labeldrop
...
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file syntax
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# curl -X POST http://10.168.10.31:9090/-/reload
5.2 Verify
node_cpu_seconds_total{job_cmy_labelmap_kubernetes="cmy-linux97-node-exporter-relabel_configs-labelmap-labeldrop"}
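The labeldrop step can be modelled the same way: it removes every label whose name matches the regex. A minimal sketch, with `relabel_labeldrop` as a hypothetical helper name and the label set taken from the state after the labelmap rule has run:

```python
import re


def relabel_labeldrop(labels, regex):
    """Remove every label whose NAME fully matches regex."""
    return {k: v for k, v in labels.items() if re.fullmatch(regex, k) is None}


mapped = {
    "job": "cmy-linux97-node-exporter-relabel_configs-labelmap-labeldrop",
    "job_cmy_labelmap_kubernetes": "cmy-linux97-node-exporter-relabel_configs-labelmap-labeldrop",
    "auther": "cmy",
    "auther_cmy_labelmap_kubernetes": "cmy",
    "blog": "https://www.cnblogs.com/cmy",
    "office": "https://www.cmy.com/",
}
remaining = relabel_labeldrop(mapped, "(job|blog|office)")
print(sorted(remaining))
```

Because the match is anchored, job_cmy_labelmap_kubernetes survives even though it starts with "job", which is why the verification query filters on that label.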
6. Rewriting metric labels with metric_relabel_configs
6.1 Test case
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# vim prometheus.yml
...
  - job_name: "cmy-linux97-node-exporter-metric_relabel_configs-drop"
    static_configs:
      - targets: ["10.168.10.31:9100","10.168.10.41:9100","10.168.10.42:9100","10.168.10.43:9100"]
    metric_relabel_configs:
      - source_labels:
          - __name__
        regex: "node_cpu_.*"
        action: drop
        #target_label: "xixi"
        #action: uppercase
      - regex: "(id|pretty_name|version_codename)"
        action: labeldrop
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# systemctl stop prometheus-server.service
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file syntax
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# rm -rf data/
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ./prometheus
6.2 Verify
node_cpu_seconds_total{job="cmy-linux97-node-exporter-metric_relabel_configs-drop"}
node_os_info{job="cmy-linux97-node-exporter-metric_relabel_configs-drop"}
node_os_info # compare with other jobs: the '(id|pretty_name|version_codename)' labels have been removed
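The two metric_relabel_configs rules above act on each scraped sample rather than on the target. The sketch below models that with plain dictionaries; `metric_relabel` is a hypothetical helper and the sample label values are invented for the example.

```python
import re


def metric_relabel(samples, drop_name_regex, labeldrop_regex):
    """Drop samples whose metric name matches, then strip matching label names."""
    kept = []
    for labels in samples:
        if re.fullmatch(drop_name_regex, labels["__name__"]):
            continue  # action: drop removes the whole sample
        kept.append({k: v for k, v in labels.items()
                     if re.fullmatch(labeldrop_regex, k) is None})
    return kept


samples = [
    {"__name__": "node_cpu_seconds_total", "cpu": "0", "mode": "idle"},
    {"__name__": "node_os_info", "id": "ubuntu", "name": "Ubuntu",
     "pretty_name": "Ubuntu 22.04", "version_codename": "jammy"},
]
stored = metric_relabel(samples, "node_cpu_.*", "(id|pretty_name|version_codename)")
print(stored)
```

Only node_os_info is left, and it keeps just the labels that the labeldrop regex does not match.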
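10 Custom monitoring
10.1 pushgateway (temporary)
Example: tracking a live-stream viewer count with the pushgateway component
1. What is pushgateway
Pushgateway accepts metrics pushed by jobs that Prometheus cannot scrape directly, such as short-lived scripts, and exposes them for Prometheus to scrape. Put simply, it is a way to implement custom monitoring.
2. Deploy pushgateway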
wget https://github.com/prometheus/pushgateway/releases/download/v1.11.1/pushgateway-1.11.1.linux-amd64.tar.gz
SVIP:
[root@prometheus-server32 ~]# wget http://192.168.17.253/Resources/Prometheus/softwares/pushgateway/pushgateway-1.11.1.linux-amd64.tar.gz
3. Extract the package
[root@prometheus-server32 ~]# tar xf pushgateway-1.11.1.linux-amd64.tar.gz -C /usr/local/bin/ pushgateway-1.11.1.linux-amd64/pushgateway --strip-components=1
[root@prometheus-server32 ~]#
[root@prometheus-server32 ~]# ll /usr/local/bin/pushgateway
-rwxr-xr-x 1 1001 1002 21394840 Apr 9 21:24 /usr/local/bin/pushgateway*
[root@prometheus-server32 ~]#
4. Run pushgateway
[root@prometheus-server32 ~]# pushgateway --web.telemetry-path="/metrics" --web.listen-address=:9091 --persistence.file=/cmy/data/pushgateway.data
5. Open the pushgateway web UI
http://10.168.10.32:9091/#
6. Simulate a live-stream online viewer count
6.1 Push test data to pushgateway with curl
[root@node-exporter42 ~]# echo "student_online 35" | curl --data-binary @- http://10.168.10.32:9091/metrics/job/cmy_student/instance/10.168.10.42
6.2 Check on pushgateway that the data was uploaded
[root@node-exporter43 ~]# curl -s http://10.168.10.32:9091/metrics | grep student_online
# TYPE student_online untyped
student_online{instance="10.168.10.42",job="cmy_student"} 35
[root@node-exporter43 ~]#
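The same push can be made from a program instead of curl. Below is a minimal standard-library sketch; `build_push_request` is a hypothetical helper, and the request is only built here, not sent (the commented-out `urlopen` call would send it to the gateway used above).

```python
from urllib.request import Request


def build_push_request(gateway, job, instance, metric, value):
    """Build a POST request that pushes one gauge sample to Pushgateway."""
    url = f"{gateway}/metrics/job/{job}/instance/{instance}"
    # The body uses the Prometheus text format and must end with a newline.
    body = f"# TYPE {metric} gauge\n{metric} {value}\n".encode()
    return Request(url, data=body, method="POST")


req = build_push_request(
    "http://10.168.10.32:9091", "cmy_student", "10.168.10.42",
    "student_online", 35,
)
print(req.get_method(), req.full_url)
print(req.data.decode(), end="")
# To send it: from urllib.request import urlopen; urlopen(req)
```

The job and instance in the URL path form the grouping key, so pushing again to the same path overwrites the previous value of student_online.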
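7. Have Prometheus server scrape pushgateway
7.1 Edit the Prometheus configuration file
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# vim prometheus.yml
...
  - job_name: "cmy-pushgateway"
    # Keep the job and instance labels of the pushed metrics instead of renaming them to exported_job and exported_instance
    honor_labels: true
    static_configs:
      - targets:
          - 10.168.10.32:9091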
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
7.2 Hot-reload the configuration
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# curl -X POST 10.168.10.31:9090/-/reload
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
7.3 Verify that the configuration took effect
http://10.168.10.31:9090/targets?search=
7.4 Query the metric
student_online
8. Visualize in Grafana
Omitted; see the video.
9. Simulate changes in the viewer count
[root@node-exporter42 ~]# echo "student_online $RANDOM" | curl --data-binary @- http://10.168.10.32:9091/metrics/job/cmy_student/instance/10.168.10.42
[root@node-exporter42 ~]# echo "student_online $RANDOM" | curl --data-binary @- http://10.168.10.32:9091/metrics/job/cmy_student/instance/10.168.10.42
10.1.1 Monitoring website packet loss
cat network_loss_monitor.sh
#!/bin/bash
##############################################################
# File Name:network_loss_monitor.sh
# Version:V1.0
# Author: cmy
# Desc: Monitor network packet loss and push it to Pushgateway
##############################################################
# Target addresses (an array, so several can be listed)
TARGETS=("8.8.8.8" "www.baidu.com" "www.cmy.com" "www.cmysoft.fun")
# Pushgateway address
PUSHGATEWAY="http://10.168.10.32:9091"
# Host name of this machine
HOSTNAME=$(hostname)
for target in "${TARGETS[@]}"; do
  # Run the ping test (send 10 packets)
  ping_result=$(ping -c 10 -W 1 "$target" 2>&1)
  # Extract the packet loss percentage from the summary line
  packet_loss=$(echo "$ping_result" | grep -oE '[0-9.]+% packet loss' | cut -d'%' -f1)
  # If ping printed no summary line (for example, the name did not resolve), report 100% loss
  packet_loss=${packet_loss:-100}
  # Push to Pushgateway
  cat <<EOF | curl --data-binary @- "$PUSHGATEWAY/metrics/job/network_monitor/instance/$HOSTNAME/target/$target"
# TYPE network_packet_loss gauge
network_packet_loss{target="$target",host="$HOSTNAME"} $packet_loss
EOF
  # Optional: also push the ping latency
done
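Parsing the ping summary is the fragile part of this script, so here is the extraction step on its own as a Python sketch. `parse_packet_loss` is a hypothetical helper, and the sample lines imitate the summary format printed by Linux ping.

```python
import re


def parse_packet_loss(ping_output):
    """Return the packet loss percentage, or None if no summary line is present."""
    match = re.search(r"([\d.]+)% packet loss", ping_output)
    return float(match.group(1)) if match else None


ok = "10 packets transmitted, 9 received, 10% packet loss, time 9013ms"
down = "10 packets transmitted, 0 received, +10 errors, 100% packet loss, time 9200ms"
unresolved = "ping: www.example.invalid: Name or service not known"

print(parse_packet_loss(ok))
print(parse_packet_loss(down))
print(parse_packet_loss(unresolved))
```

Anchoring on the text "% packet loss" keeps the result correct when ping adds an extra "+N errors" field, which shifts the position of the percentage in the line.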
10.2 Custom exporter
1. Example: a custom exporter written in Python
1.1 Install pip3
[root@prometheus-server33 ~]# apt update
[root@prometheus-server33 ~]# apt install -y python3-pip
1.2 Install the modules the program needs
[root@prometheus-server33 ~]# pip3 install flask prometheus_client -i https://mirrors.aliyun.com/pypi/simple
1.3 Write the code
[root@prometheus-server33 ~]# cat flask_metric.py
#!/usr/bin/python3
# auther: Jason Yin
# blog: https://www.cnblogs.com/cmy/
from wsgiref.simple_server import make_server

from flask import Flask, jsonify
from prometheus_client import Counter, Summary, start_http_server

app = Flask(__name__)

# Create metrics to track time spent and requests made
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')
COUNTER_TIME = Counter("request_count", "Total request count of the host")


@app.route("/apps")
@REQUEST_TIME.time()
def requests_count():
    COUNTER_TIME.inc()
    return jsonify({"office": "https://www.cmy.com"}, {"auther": "Jason Yin"})


if __name__ == "__main__":
    print("Starting test app: cmy-linux-python-exporter, app URL: http://0.0.0.0:8001/apps, metrics URL: http://0.0.0.0:8000")
    # Expose the Prometheus metrics on port 8000
    start_http_server(8000)
    # Serve the Flask application on port 8001
    httpd = make_server('0.0.0.0', 8001, app)
    httpd.serve_forever()
[root@prometheus-server33 ~]#
[root@prometheus-server33 ~]#
1.4 Start the Python program
[root@prometheus-server33 ~]# python3 flask_metric.py
Starting test app: cmy-linux-python-exporter, app URL: http://0.0.0.0:8001/apps, metrics URL: http://0.0.0.0:8000
1.5 Test from a client
[root@node-exporter43 ~]# cat cmy_curl_metrics.sh
#!/bin/bash
URL=http://10.168.10.33:8001/apps
while true; do
  curl_num=$(( RANDOM % 50 + 1 ))
  sleep_num=$(( RANDOM % 5 + 1 ))
  for c_num in $(seq "$curl_num"); do
    curl -s "$URL" &> /dev/null
  done
  sleep "$sleep_num"
done
[root@node-exporter43 ~]#
[root@node-exporter43 ~]# bash cmy_curl_metrics.sh
2. Monitoring the custom Python exporter with Prometheus
2.1 Edit the configuration file
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# vim prometheus.yml
...
  - job_name: "cmy_python_custom_metrics"
    static_configs:
      - targets:
          - 10.168.10.33:8000
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
2.2 Check the configuration file syntax
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file syntax
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
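2.3 Reload the configuration file
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# curl -X POST http://10.168.10.31:9090/-/reload
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
2.4 Verify that Prometheus is collecting the data
http://10.168.10.31:9090/targets
2.5 Build Grafana panels
request_count_total
Total number of requests to the test app.
increase(request_count_total{job="cmy_python_custom_metrics"}[1m])
Number of requests to the test app over the last minute.
irate(request_count_total{job="cmy_python_custom_metrics"}[1m])
Instantaneous per-second request rate (QPS) of the test app, computed from the last two samples in the 1-minute window.
request_processing_seconds_sum{job="cmy_python_custom_metrics"} / request_processing_seconds_count{job="cmy_python_custom_metrics"}
Average time the test app spends processing a request, taken over all requests since the process started.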
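The arithmetic behind these queries is simple enough to check by hand. The sketch below uses invented sample values and hypothetical function names; real PromQL functions also handle counter resets and extrapolate to the window edges, which this deliberately ignores.

```python
def increase(first, last):
    """Growth of a counter between the first and last sample of a window."""
    return last - first


def per_second_rate(first, last, seconds):
    """Average per-second growth of a counter across the window."""
    return (last - first) / seconds


def average_latency(sum_seconds, count):
    """Mean processing time: Summary _sum divided by _count."""
    return sum_seconds / count if count else 0.0


# Assumed samples: request_count_total was 100, then 160 one minute later.
print(increase(100, 160))
print(per_second_rate(100, 160, 60))
# Assumed Summary values: 3.0 seconds spent over 150 requests.
print(average_latency(3.0, 150))
```

A per-minute request count and a per-second rate describe the same growth in different units, which is why the two panels differ by a factor of about 60.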
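11 Black-box monitoring
Black-box monitoring is an important monitoring approach in the Prometheus ecosystem: it checks a system's availability and functionality from the outside, without regard to internal implementation details.
Core concepts of black-box monitoring
- Comparison with white-box monitoring
- White-box monitoring: monitors internal metrics (CPU, memory, request volume, and so on)
- Black-box monitoring: simulates external user behavior to check whether the system works
- Typical use cases
1. blackbox-exporter overview
Typically used to check whether a website is reachable, whether a port is alive, how long a certificate remains valid, and so on.
blackbox exporter can probe targets over HTTP, HTTPS, DNS, TCP, ICMP, and gRPC.
For example, over HTTP it can check that a website returns status code 200 to decide whether the service is healthy.
For example, over TCP it can check whether a host port is listening.
For example, over ICMP it can ping a host to check connectivity.
For example, over gRPC it can call an interface and verify that the service works.
For example, over DNS it can check domain name resolution.
11.1 Deploy blackbox-exporter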
2. Download blackbox-exporter
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.26.0/blackbox_exporter-0.26.0.linux-amd64.tar.gz
SVIP:
[root@prometheus-server32 ~]# wget http://192.168.17.253/Resources/Prometheus/softwares/blackbox_exporter/blackbox_exporter-0.26.0.linux-amd64.tar.gz
3. Extract the package
[root@node-exporter41 ~]# tar xf blackbox_exporter-0.26.0.linux-amd64.tar.gz -C /usr/local/
[root@node-exporter41 ~]#
[root@node-exporter41 ~]# cd /usr/local/blackbox_exporter-0.26.0.linux-amd64/
[root@node-exporter41 blackbox_exporter-0.26.0.linux-amd64]# ll
total 23232
drwxr-xr-x 2 1001 1002 4096 Feb 26 20:31 ./
drwxr-xr-x 11 root root 4096 Mar 28 11:55 ../
-rwxr-xr-x 1 1001 1002 23758662 Feb 26 20:30 blackbox_exporter*
-rw-r--r-- 1 1001 1002 1209 Feb 26 20:31 blackbox.yml
-rw-r--r-- 1 1001 1002 11357 Feb 26 20:31 LICENSE
-rw-r--r-- 1 1001 1002 94 Feb 26 20:31 NOTICE
[root@node-exporter41 blackbox_exporter-0.26.0.linux-amd64]#
4. Start the blackbox service
[root@node-exporter41 blackbox_exporter-0.26.0.linux-amd64]# ./blackbox_exporter
5. Open the blackbox web UI
http://10.168.10.32:9115/
6. Test a probe
http://10.168.10.41:9115/probe?target=www.cmy.com&module=http_2xx
http://10.168.10.41:9115/probe?target=prometheus.io&module=http_2xx
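Probe URLs like the two above are easy to mistype once the target itself contains a scheme or a port. A small sketch that builds them with proper URL encoding; `probe_url` is a hypothetical helper and the exporter address is the one used in the test URLs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def probe_url(exporter, target, module="http_2xx"):
    """Build a blackbox exporter /probe URL for one target and module."""
    return f"http://{exporter}/probe?" + urlencode({"target": target, "module": module})


url = probe_url("10.168.10.41:9115", "https://www.cmy.com/")
print(url)
# Decode the query again to confirm what the exporter will receive.
print(parse_qs(urlsplit(url).query))
```

Fetching the printed URL returns the probe result as metrics, including probe_success.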
- Integrating Prometheus server with blackbox for website monitoring
1. Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  # Job name; in production this usually groups one class of services.
  - job_name: 'cmy-blackbox-exporter-http'
    # Override the scrape path; the default is "/metrics"
    metrics_path: /probe
    # URL query parameters
    params:
      # Use blackbox's http_2xx module, which checks that the response status code is 2xx
      module: [http_2xx]
      school: [cmy]
    # Static configuration: targets are listed by hand
    static_configs:
      # Targets to probe
      - targets:
          # HTTPS is supported
          - https://www.cmy.com/
          # HTTP is supported
          - http://10.168.10.41
          # HTTP with a custom port is supported
          - http://10.168.10.31:9090
    # Relabel the targets
    relabel_configs:
      # Source label: the built-in "__address__" holds the address of the listed target
      - source_labels: [__address__]
        # Target label: "__param_target" becomes the "target" URL parameter shown in the "Endpoint" column
        target_label: __param_target
        # Action to perform; the default is "replace". Common actions are replace, keep, and drop.
        # The full list of supported actions: https://prometheus.io/docs/prometheus/2.45/configuration/configuration/
        # Here the value of "__address__" is passed to the target parameter.
        action: replace
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        # Replacement value: the address of the blackbox exporter host
        replacement: 10.168.10.32:9115
[root@prometheus-server31 ~]#
2. Hot-reload the configuration
[root@prometheus-server31 ~]# curl -X POST http://10.168.10.31:9090/-/reload
3. Verify in the web UI
http://10.168.10.31:9090/targets?search=
4. Import Grafana templates by ID
7587
13659
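The three relabel rules in the blackbox job are the least obvious part of the setup, so here they are replayed in order on a single target. This is an illustrative model only: `blackbox_relabel` is a hypothetical helper, not how Prometheus implements relabeling.

```python
def blackbox_relabel(target, exporter):
    """Replay the three relabel rules of the blackbox job on one target."""
    labels = {"__address__": target}
    # Rule 1: copy the listed target into the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: show the probed target, not the exporter, as the instance label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: send the actual scrape to the blackbox exporter.
    labels["__address__"] = exporter
    return labels


final = blackbox_relabel("http://10.168.10.31:9090", "10.168.10.32:9115")
print(final)
```

Prometheus therefore scrapes the exporter at __address__ with target set to the original URL, while queries and dashboards still group results by the probed site through instance.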
11.2 tcp+icmp
- prometheus基于blackbox的ICMP监控目标主机是否存活
1 修改Prometheus配置文件
[root@prometheus-server31 ~]# vim /cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus.yml
...
  - job_name: 'cmy-blackbox-exporter-icmp'
    metrics_path: /probe
    params:
      # If no module is given, blackbox defaults to "http_2xx". The name must match a module
      # defined in the blackbox configuration; with a wrong name the probe fails and nothing is monitored.
      module: [icmp]
    static_configs:
      - targets:
          - 10.168.10.41
          - 10.168.10.42
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        # Note: if "instance" is not set here, it takes the same value as "__address__".
        # target_label: ip
        target_label: instance
      - target_label: __address__
        replacement: 10.168.10.32:9115
2 Check that the configuration file is valid
[root@prometheus-server31 ~]# cd /cmy/softwares/prometheus-2.53.4.linux-amd64/
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ll
total 261348
drwxr-xr-x 5 1001 fwupd-refresh 4096 Mar 28 14:35 ./
drwxr-xr-x 3 root root 4096 Mar 26 09:45 ../
drwxr-xr-x 2 1001 fwupd-refresh 4096 Mar 18 23:05 console_libraries/
drwxr-xr-x 2 1001 fwupd-refresh 4096 Mar 18 23:05 consoles/
drwxr-xr-x 4 root root 4096 Mar 26 14:49 data/
-rw-r--r-- 1 1001 fwupd-refresh 11357 Mar 18 23:05 LICENSE
-rw-r--r-- 1 1001 fwupd-refresh 3773 Mar 18 23:05 NOTICE
-rwxr-xr-x 1 1001 fwupd-refresh 137836884 Mar 18 22:52 prometheus*
-rw-r--r-- 1 1001 fwupd-refresh 4858 Mar 28 14:35 prometheus.yml
-rw-r--r-- 1 root root 1205 Mar 27 10:05 prometheus.yml2025-03-26
-rw-r--r-- 1 root root 2386 Mar 28 10:06 prometheus.yml2025-03-27
-rwxr-xr-x 1 1001 fwupd-refresh 129719117 Mar 18 22:52 promtool*
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file syntax
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
3 Reload the configuration
[root@prometheus-server31 ~]# curl -X POST http://10.168.10.31:9090/-/reload
[root@prometheus-server31 ~]#
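Steps 2 and 3 can be chained so that the reload request is sent only when the check passes. This is a sketch under assumptions: the function name reload_if_valid and the PROMTOOL override variable are invented here, and the reload endpoint only responds when Prometheus was started with --web.enable-lifecycle.

```shell
#!/bin/sh
# reload_if_valid CONFIG_FILE RELOAD_URL   (hypothetical helper)
# Runs "promtool check config" first and posts to the reload endpoint only
# when the check passes. Set PROMTOOL to the path of the promtool binary if
# it is not in the current directory.
reload_if_valid() {
    config_file=$1
    reload_url=$2
    if "${PROMTOOL:-./promtool}" check config "$config_file"; then
        # -f makes curl return a non-zero status on HTTP errors.
        curl -fsS -X POST "$reload_url"
    else
        echo "config check failed, reload skipped" >&2
        return 1
    fi
}

# Usage, from the Prometheus installation directory:
#   reload_if_valid prometheus.yml http://10.168.10.31:9090/-/reload
```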
4 Open the Prometheus web UI
http://10.168.10.31:9090/targets
5 Open the blackbox exporter web UI
http://10.168.10.32:9115/
6 Filter by job in Grafana
Filter on the job label "cmy-blackbox-exporter-icmp".
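Besides the web UIs, a probe result can be checked from the command line: blackbox reports the outcome in the probe_success metric (1 for success, 0 for failure). The sketch below parses that line with awk. The function name probe_ok is an assumption for illustration, and the example feeds it canned text instead of a live exporter.

```shell
#!/bin/sh
# probe_ok   (hypothetical helper)
# Reads blackbox /probe output on stdin and succeeds only if it contains
# the sample line "probe_success 1".
probe_ok() {
    awk '$1 == "probe_success" { found = 1; ok = ($2 == 1) }
         END { exit !(found && ok) }'
}

# Example with canned output instead of a live exporter:
if printf 'probe_duration_seconds 0.002\nprobe_success 1\n' | probe_ok; then
    echo "target is up"
else
    echo "target is down"
fi
```

Against a running exporter, the input would come from something like: curl -s 'http://10.168.10.32:9115/probe?module=icmp&target=10.168.10.41' | probe_ok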
- Using blackbox TCP probes from Prometheus to check whether ports are open
1 Edit the Prometheus configuration file
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# vim prometheus.yml
...
  - job_name: 'cmy-blackbox-exporter-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - 10.168.10.41:80
          - 10.168.10.42:22
          - 10.168.10.31:9090
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.168.10.32:9115
2 Check that the configuration file is valid
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file syntax
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]#
3 Reload the configuration file
[root@prometheus-server31 ~]# curl -X POST http://10.168.10.31:9090/-/reload
[root@prometheus-server31 ~]#
4 Open the Prometheus web UI
http://10.168.10.31:9090/targets
5 Open the blackbox exporter web UI
http://10.168.10.32:9115/
6 View the data in Grafana
Filter on the job label "cmy-blackbox-exporter-tcp".
12 Blacklists and whitelists
12.1 Blacklist and whitelist in node-exporter
- Blacklist and whitelist in node-exporter
Reference:
https://github.com/prometheus/node_exporter
1. Stop the service
[root@node-exporter41 ~]# systemctl stop node-exporter.service
2. Configure a blacklist (disable the cpu collector, keep all other defaults)
[root@node-exporter41 ~]# cd /cmy/softwares/node_exporter-1.9.1.linux-amd64/
[root@node-exporter41 node_exporter-1.9.1.linux-amd64]#
[root@node-exporter41 node_exporter-1.9.1.linux-amd64]# ./node_exporter --no-collector.cpu
3. Configure a whitelist (disable all defaults, enable only the cpu and uname collectors)
[root@node-exporter41 node_exporter-1.9.1.linux-amd64]# ./node_exporter --collector.disable-defaults --collector.cpu --collector.uname
Tip:
Metrics to query when testing each collector:
node_cpu_seconds_total ----> cpu
node_uname_info ----> uname
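Whether a collector is really on or off can be confirmed by looking for its metric in the exporter output. The following is a small sketch; the helper name has_metric is invented for illustration, and the sample line fed to it is canned text rather than real exporter output.

```shell
#!/bin/sh
# has_metric NAME   (hypothetical helper)
# Reads exporter /metrics output on stdin and succeeds if at least one
# sample of the metric NAME is present.
has_metric() {
    awk -v name="$1" '
        /^#/ { next }    # skip "# HELP" and "# TYPE" comment lines
        {
            # Drop the {labels} part, if any, to get the bare metric name.
            brace = index($1, "{")
            metric = brace ? substr($1, 1, brace - 1) : $1
            if (metric == name) found = 1
        }
        END { exit !found }
    '
}

# Example with a canned sample line:
printf 'node_uname_info{machine="x86_64"} 1\n' | has_metric node_uname_info \
    && echo "uname collector is enabled"
```

With the blacklist from step 2 running, "curl -s http://10.168.10.41:9100/metrics | has_metric node_cpu_seconds_total" would be expected to fail, because the cpu collector is disabled.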
12.2 Blacklist and whitelist on the Prometheus server
- Blacklist and whitelist on the Prometheus server
1. Blacklist
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# vim prometheus.yml
...
  - job_name: "cmy_k8s_exporter"
    params:
      exclude[]:
        - cpu
    static_configs:
      - targets: ["10.168.10.42:9100"]
2. Whitelist
[root@prometheus-server31 prometheus-2.53.4.linux-amd64]# vim prometheus.yml
...
  - job_name: "cmy_dba_exporter"
    params:
      collect[]:
        - uname
    static_configs:
      - targets: ["10.168.10.41:9100"]
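The params in these two jobs end up as query-string parameters on the node-exporter /metrics URL, so the same filter can be tried by hand before editing prometheus.yml. The helper name filter_url below is made up for this sketch.

```shell
#!/bin/sh
# filter_url HOST:PORT PARAM COLLECTOR   (hypothetical helper)
# PARAM is "collect[]" (whitelist) or "exclude[]" (blacklist).
filter_url() {
    printf 'http://%s/metrics?%s=%s\n' "$1" "$2" "$3"
}

# Prints: http://10.168.10.41:9100/metrics?collect[]=uname
filter_url 10.168.10.41:9100 'collect[]' uname
# Prints: http://10.168.10.42:9100/metrics?exclude[]=cpu
filter_url 10.168.10.42:9100 'exclude[]' cpu
```

When fetching such a URL with curl, pass -g (--globoff) so that the square brackets are not treated as a glob pattern, for example: curl -g -s "$(filter_url 10.168.10.41:9100 'collect[]' uname)"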
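13 Enabling HTTPS access for Prometheus
Reference:
"二、Prometheus TLS加密认证和基于 basic_auth 用户名密码访问" (CSDN blog: Prometheus TLS encryption and basic_auth username/password access)
1 Create the authentication user (generate a bcrypt password hash)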
[root@node43 ~]# htpasswd -nBC 12 ''|tr -d ':\n'
New password:
Re-type new password:
$2y$12$6yR84yKSqoYv3B2D70QAOuqggT0QvdpMp1wUNfLwBo63oLYWc1AYy
[root@node43 ~]#
2 Create the key and crt files
openssl req -new -newkey rsa:2048 -days 3650 -nodes -x509 \
  -keyout node_exporter.key -out node_exporter.crt \
  -subj "/C=CN/ST=Beijing/L=Beijing/O=Moelove.info/CN=localhost"
3. Add a config.yml file that enables TLS and basic_auth
[root@VM_2-45 /usr/local/prometheus]# vim config.yml
cat config.yml
basic_auth_users:
  admin: $2y$12$6yR84yKSqoYv3B2D70QAOuqggT0QvdpMp1wUNfLwBo63oLYWc1AYy
tls_server_config:
  cert_file: node_exporter.crt
  key_file: node_exporter.key
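4. Configure Prometheus to scrape its own metrics over TLS
  - job_name: "prometheus"
    basic_auth:
      username: admin
      password: 123456
    scheme: https
    tls_config:
      ca_file: node_exporter.crt
      # Skips certificate verification; acceptable only for this self-signed test certificate.
      insecure_skip_verify: true
    static_configs:
      - targets: ["localhost:9090"]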
5. Start Prometheus with the web config file
./prometheus --config.file=prometheus.yml --web.config.file=config.yml --web.listen-address=:9090 --web.enable-lifecycle
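Once Prometheus is running with the web config file, each request has to carry the credentials from step 4 as an HTTP Basic Authorization header. The sketch below shows what that header looks like. The function name basic_auth_header is invented for illustration, and it relies on the base64 utility being installed.

```shell
#!/bin/sh
# basic_auth_header USER PASSWORD   (hypothetical helper)
# Prints the HTTP Basic Authorization header for the given credentials,
# i.e. "user:password" encoded with base64.
basic_auth_header() {
    printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64)"
}

# Prints: Authorization: Basic YWRtaW46MTIzNDU2
basic_auth_header admin 123456
```

A manual check could then look like: curl -k -H "$(basic_auth_header admin 123456)" https://localhost:9090/-/healthy, which is equivalent to curl -k -u admin:123456. The -k flag is needed because the certificate is self-signed, and a request without credentials should be rejected with 401.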
Update the startup script (systemd unit)
systemctl cat prometheus-server.service
# /etc/systemd/system/prometheus-server.service
[Unit]
Description=cmy Linux Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
ExecStart=/bin/bash -c "/cmy/softwares/prometheus-2.53.4.linux-amd64/prometheus --config.file=/cmy/softwares/prometheus-2.53.4.linux-amd64/pr>
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65535
[Install]
WantedBy=multi-user.target