Apache Doris uses the Prometheus Alertmanager module to send exception information to a DingTalk alarm group
2022-07-21 03:04:00 [Zhangjiafeng]
Environment
1. Prometheus version: 2.22.2
   Download: https://github.com/prometheus/prometheus/releases/download/v2.22.2/prometheus-2.22.2.linux-amd64.tar.gz
2. Alertmanager version: 0.23.0
   Download: https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
3. prometheus-webhook-dingtalk version: 1.4.0
   Download: https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
1. Create the user and user group

```shell
groupadd prometheus
useradd -g prometheus -M -s /sbin/nologin prometheus
```
2. Install and configure Prometheus Server
For detailed Prometheus installation steps, refer to: https://mp.weixin.qq.com/s/BcKN4s7qDokG_YmXn8Q-zQ . After the service starts, make sure http://localhost:9090 is reachable and that Doris metrics are already being collected by Prometheus.
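Besides opening the UI, you can confirm that Doris is being scraped by querying the Prometheus HTTP API (/api/v1/query). A minimal sketch, assuming Prometheus listens on localhost:9090 and Doris is scraped under job="pro-doris" (the job name the rules in section 5 use; adjust it to your own scrape config). The parsing is kept separate from the network call so it can be tried offline.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://localhost:9090"  # assumption: default Prometheus address


def build_query_url(base, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def up_by_instance(payload):
    """Map each instance label to its `up` value (1.0 = scraped OK, 0.0 = down)."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    return {
        series["metric"].get("instance", "?"): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def fetch(promql, base=PROMETHEUS):
    """Run the query against a live Prometheus (needs network access)."""
    with urllib.request.urlopen(build_query_url(base, promql), timeout=5) as resp:
        return json.load(resp)

# Usage against a running server:
#   print(up_by_instance(fetch('up{job="pro-doris"}')))
```

An empty result usually means the job label does not match your scrape config rather than that Doris is down.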
3. Install the Alertmanager module
3.1 Download the installation package
```shell
wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
mkdir -p /soft   # tar -C needs the target directory to exist
tar xf alertmanager-0.23.0.linux-amd64.tar.gz -C /soft
cd /soft
mv alertmanager-0.23.0.linux-amd64 alertmanager
cd alertmanager
mkdir data       # persistent storage directory, passed to --storage.path
chown -R prometheus:prometheus /soft/alertmanager
```
Note: the data directory must be created; otherwise Alertmanager will fail when it is started later.
3.2 Configure alertmanager.yml
```yaml
route:
  group_by: ['alertname']
  group_wait: 1s
  group_interval: 1m
  repeat_interval: 4h
  receiver: 'webhook2'
receivers:
  - name: 'webhook2'
    webhook_configs:
      - &dingtalk_config
        send_resolved: true
        url: http://localhost:8060/dingtalk/webhook2/send
# An inhibition rule mutes an alert (target) matching a set of matchers when an alert (source)
# exists that matches another set of matchers. Both target and source alerts must have the same
# label values for the label names in the equal list.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
```
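The inhibition comment above can be read as a small matching procedure. The sketch below is a simplified model of one inhibit_rules entry with plain equality matchers, not Alertmanager's own code; in this sketch a label missing from both alerts counts as equal. The alert label sets used with it are hypothetical.

```python
def matches(labels, matcher):
    """True if every label in `matcher` has exactly that value in `labels`."""
    return all(labels.get(name) == value for name, value in matcher.items())


def is_inhibited(target, firing, rule):
    """Simplified model: is `target` muted by some other alert in `firing`?"""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing:
        if source is target:
            continue  # an alert does not inhibit itself
        same_labels = all(source.get(n) == target.get(n) for n in rule["equal"])
        if matches(source, rule["source_match"]) and same_labels:
            return True
    return False


RULE = {  # mirrors the inhibit_rules entry in alertmanager.yml above
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}
```

Note that the Doris rules in section 5 use severity: error, so under this entry they are neither sources nor targets and are never muted.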
Terminology:
- group_wait: how long Alertmanager waits after the first alert of a new group arrives before sending the first notification. Alerts for the same group that arrive during this wait are combined into one notification sent to the receiver.
- group_interval: how long to wait before sending a notification about new alerts added to a group for which a notification has already been sent.
- repeat_interval: how long to wait before re-sending a notification for an alert that has already been sent successfully and is still firing.
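How the three settings interact can be illustrated with a simplified model (an assumption-laden sketch, not Alertmanager's actual scheduling code): a single alert starts firing at t=0 and never changes; the group is flushed first after group_wait and then every group_interval, and an unchanged alert is re-sent only once repeat_interval has passed since the last notification.

```python
def notification_times(group_wait, group_interval, repeat_interval, horizon):
    """Seconds after the alert starts firing at which a notification is sent.

    Simplified model: one alert, firing from t=0, never changing.
    """
    sent = []
    tick = group_wait  # first flush of the new group
    while tick <= horizon:
        if not sent or tick - sent[-1] >= repeat_interval:
            sent.append(tick)
        tick += group_interval  # later flushes happen every group_interval
    return sent


# Values from alertmanager.yml above: group_wait 1s, group_interval 1m, repeat_interval 4h.
SCHEDULE = notification_times(1, 60, 4 * 3600, 9 * 3600)
```

With these values the model gives a first DingTalk message after about 1s and reminders roughly every 4 hours. In this model a repeat_interval shorter than group_interval has no extra effect, because reminders can only go out at flush times.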
3.3 Check the configuration file
This step is important because it determines whether Alertmanager can start normally. If the check reports SUCCESS, alertmanager.yml is configured correctly.
```shell
cd /soft/alertmanager
./amtool check-config ./alertmanager.yml
```
3.4 Create the Alertmanager systemd service file
```shell
vim /usr/lib/systemd/system/alertmanager.service
```

```ini
[Unit]
Description=alertmanager
Documentation=https://prometheus.io/
After=network.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/soft/alertmanager/alertmanager --config.file=/soft/alertmanager/alertmanager.yml --storage.path=/soft/alertmanager/data
Restart=on-failure

[Install]
WantedBy=multi-user.target
```
3.5 Start the service
```shell
systemctl daemon-reload
systemctl enable alertmanager.service
systemctl start alertmanager.service
systemctl status alertmanager.service    # view the service status
systemctl restart alertmanager.service   # restart the service when needed
```
3.6 After starting the service
After the service starts, the Alertmanager web UI is available at http://localhost:9093.
3.7 Configure Alertmanager in Prometheus
In prometheus.yml under the Prometheus installation directory, add the Alertmanager address and port, and configure the alert rules directory, which stores the alerting rules you define.
```shell
vim prometheus.yml
```

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']   # set this to your actual Alertmanager address
rule_files:
  - "rule/*.yml"   # custom rule directory; *.yml loads every rule file in it
```
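A common pitfall is where "rule/*.yml" points: Prometheus resolves relative rule_files entries against the directory that contains prometheus.yml, not against the directory it was started from. A small Python sketch that shows which files such a glob would pick up, assuming Prometheus lives in /soft/prometheus as in section 5.2:

```python
import glob
import os


def rule_glob(config_path, pattern):
    """Absolute glob for one rule_files entry (relative to prometheus.yml's directory)."""
    if os.path.isabs(pattern):
        return pattern
    return os.path.join(os.path.dirname(os.path.abspath(config_path)), pattern)


def resolve_rule_files(config_path, patterns):
    """List the rule files that the rule_files globs currently match."""
    files = []
    for pattern in patterns:
        files.extend(sorted(glob.glob(rule_glob(config_path, pattern))))
    return files

# Usage: print(resolve_rule_files("/soft/prometheus/prometheus.yml", ["rule/*.yml"]))
```

Note that *.yml does not match files ending in .yaml, so a rule file saved with that extension would be silently ignored.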
3.8 Reload the Prometheus configuration and check that the alerting settings took effect
Prometheus supports hot reloading. After changing the configuration file, validate it and reload it with the following commands (systemctl reload only works if the prometheus.service unit defines ExecReload, for example one that sends SIGHUP to the Prometheus process):
```shell
cd /soft/prometheus
./promtool check config prometheus.yml
systemctl reload prometheus.service
```
After the reload succeeds, open http://localhost:9090/config to check that the alerting configuration is in effect.
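Prometheus also reports which Alertmanagers it is actually sending alerts to through the /api/v1/alertmanagers endpoint; if localhost:9093 shows up under activeAlertmanagers, the alerting block was picked up. A minimal sketch, assuming Prometheus at localhost:9090 (the exact URL path it lists for Alertmanager depends on the Prometheus version):

```python
import json
import urllib.request


def active_alertmanagers(payload):
    """Extract the Alertmanager URLs from an /api/v1/alertmanagers response."""
    if payload.get("status") != "success":
        raise ValueError(f"request failed: {payload}")
    return [am["url"] for am in payload["data"]["activeAlertmanagers"]]


def fetch_alertmanagers(base="http://localhost:9090"):
    """Query a running Prometheus (needs network access)."""
    with urllib.request.urlopen(f"{base}/api/v1/alertmanagers", timeout=5) as resp:
        return json.load(resp)

# Usage: print(active_alertmanagers(fetch_alertmanagers()))
```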
4. Install the prometheus-webhook-dingtalk plugin
4.1 Download the plugin
```shell
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
tar -xf prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz -C /soft
cd /soft   # the archive was extracted here
mv prometheus-webhook-dingtalk-1.4.0.linux-amd64 prometheus-webhook-dingtalk
```
4.2 Configure a robot in the DingTalk group
When creating the robot, add a custom keyword. The same keyword must also appear in the alert messages produced by the rules configured later; otherwise DingTalk rejects the message and the alert is not delivered.
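Because a message without the keyword never reaches the group, it can be worth checking rule texts before deploying them. A minimal sketch, assuming the robot keyword is doris (hypothetical; the rule descriptions in section 5 happen to start with it) and using a plain case-sensitive substring match:

```python
def missing_keyword(messages, keywords):
    """Return the messages that contain none of the robot's custom keywords."""
    return [m for m in messages if not any(k in m for k in keywords)]


KEYWORDS = ["doris"]  # hypothetical: replace with the keyword set on your robot

DESCRIPTIONS = [
    # description text used by the rules in section 5
    "doris {{ $labels.instance }} of job {{ $labels.job }} has been down for more than 20s.",
    # hypothetical bad example that lacks the keyword
    "Instance {{ $labels.instance }} down",
]

# Usage: print(missing_keyword(DESCRIPTIONS, KEYWORDS))
```

What DingTalk actually checks is the final message rendered by the webhook template, so this is only a pre-flight check on the rule text.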
4.3 Modify the configuration file
Configure the address of the robot you just created under webhook2. Note that the target name is part of the URL path: if you configure the robot under a different target, the url in alertmanager.yml must change to match (http://localhost:8060/dingtalk/<target name>/send).
```shell
cd /soft/prometheus-webhook-dingtalk
cp config.example.yml config.yml
```

```yaml
## Request timeout
# timeout: 5s

## Customizable templates path (custom template location)
templates:
  - /soft/alertmanager/alarm_template/webhook.tmpl

## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
# default_message:
#   title: '{{ template "legacy.title" . }}'
#   text: '{{ template "legacy.content" . }}'

## Targets, previously was known as "profiles"
targets:
  webhook1:  # robot with signing enabled: its signing secret must be configured as well
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # secret for signature
    secret: SEC000000000000000000000
  webhook2:  # robot without signing
    url: https://oapi.dingtalk.com/robot/send?access_token=cf9c2fd69723661108b7fd7******
  webhook_legacy:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # Customize template content
    message:
      # Use legacy template
      title: '{{ template "legacy.title" . }}'
      text: '{{ template "legacy.content" . }}'
  webhook_mention_all:  # @ everyone in the DingTalk group
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      all: true
  webhook_mention_users:  # @ specific users in the DingTalk group
    url: https://oapi.dingtalk.com/robot/send?access_token=cf9c2fd69723661108b7fd7****
    mention:
      mobiles: ['152****30', '134****74']
```
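For a signing robot such as webhook1, prometheus-webhook-dingtalk computes the signature itself once secret is set, so nothing else is needed. The sketch below only shows what the scheme looks like, which can help when debugging signature errors or calling the robot directly. It follows DingTalk's published signing method: HMAC-SHA256 over "<timestamp in ms>\n<secret>" keyed with the secret, Base64-encoded, then URL-encoded and appended as the timestamp and sign query parameters. The token and secret are the placeholders from the config above.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def sign(secret, timestamp_ms):
    """Base64 of HMAC-SHA256(key=secret, msg=timestamp + newline + secret)."""
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(secret.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def signed_url(webhook_url, secret, timestamp_ms=None):
    """Append the timestamp and URL-encoded signature to a robot webhook URL."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # DingTalk expects milliseconds
    query = urllib.parse.urlencode({"timestamp": timestamp_ms, "sign": sign(secret, timestamp_ms)})
    return f"{webhook_url}&{query}"

# Usage:
#   signed_url("https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx",
#              "SEC000000000000000000000")
```

DingTalk rejects signatures whose timestamp is too far from the current time, so a signed URL should be generated right before each request.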
4.4 Create the webhook-dingtalk systemd service file
```shell
vim /usr/lib/systemd/system/webhook-dingtalk.service
```

```ini
[Unit]
Description=prometheus-webhook-dingtalk
Documentation=https://github.com/timonwong/prometheus-webhook-dingtalk
After=network.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/soft/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --config.file=/soft/prometheus-webhook-dingtalk/config.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```
4.5 Start the service
```shell
systemctl daemon-reload
systemctl enable webhook-dingtalk.service
systemctl start webhook-dingtalk.service
systemctl status webhook-dingtalk.service
systemctl restart webhook-dingtalk.service
```
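4.6 Check the webhook-dingtalk service status
Use systemctl status webhook-dingtalk.service to confirm that the service is active (running).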
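To test the plugin on its own, before any real outage, you can POST an Alertmanager-style webhook body to the same URL Alertmanager uses. The sketch below builds a minimal body; the field names follow Alertmanager's webhook format (version "4"), while the alert name, instance and description are hypothetical test values. If your custom template (webhook.tmpl) relies on other fields, add them.

```python
import json
import urllib.request
from datetime import datetime, timezone


def webhook_payload(alertname, instance, status="firing"):
    """Minimal Alertmanager-style webhook body for a single alert."""
    labels = {"alertname": alertname, "instance": instance, "severity": "error"}
    annotations = {"description": f"doris {instance} test alert"}  # keeps the robot keyword
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "version": "4",
        "status": status,
        "receiver": "webhook2",
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://localhost:9093",
        "alerts": [{
            "status": status,
            "labels": labels,
            "annotations": annotations,
            "startsAt": now,
            "endsAt": "0001-01-01T00:00:00Z",  # zero time, as sent for firing alerts
            "generatorURL": "",
        }],
    }


def post(url, body):
    """POST the JSON body to a running webhook-dingtalk service; return the HTTP status."""
    req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                 headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status

# Usage: post("http://localhost:8060/dingtalk/webhook2/send", webhook_payload("DorisAlertTest", "be1:8040"))
```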
5. Configure alerting rules
In prometheus.yml under the Prometheus installation directory, find the directory configured in rule_files and create the alerting rule files there. If you are not sure which metrics to use in a rule, browse them in the Prometheus UI at http://localhost:9090/graph.
5.1 Configure alerting rules for Doris FE and BE
When a Doris instance is healthy, up == 1; when the instance goes down, up == 0.
Note: the custom keyword configured for the DingTalk robot must appear in the description of the rule file; otherwise the DingTalk group will not receive the alert.
```shell
vim doris_instance.yml   # create it in the rule/ directory configured under rule_files
```

```yaml
groups:
  - name: doris_instance_down
    rules:
      - alert: Doris Backends Down
        expr: up{group="be", job="pro-doris"} == 0
        for: 20s
        labels:
          user: doris
          severity: error
        annotations:
          summary: "doris Instance {{ $labels.instance }} down"
          description: "doris {{ $labels.instance }} of job {{ $labels.job }} has been down for more than 20s."
      - alert: Doris Frontends Down
        expr: up{group="fe", job="pro-doris"} == 0
        for: 20s
        labels:
          user: doris
          severity: error
        annotations:
          summary: "doris Instance {{ $labels.instance }} down"
          description: "doris {{ $labels.instance }} of job {{ $labels.job }} has been down for more than 20s."
```
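The for: 20s clause means the expression must keep returning a result for at least 20 seconds before the alert moves from pending to firing; only firing alerts are sent to Alertmanager. A simplified Python model of that state machine, assuming rules are evaluated every 15s (evaluation_interval is not shown in this article):

```python
def alert_states(samples, hold_seconds):
    """Simplified model of a Prometheus alerting rule with a `for` clause.

    samples: (evaluation_time_seconds, up_value) pairs in evaluation order.
    `up == 0` returns a result whenever up is 0; the alert stays pending until
    it has been active for at least hold_seconds, then becomes firing.
    """
    states = []
    active_since = None
    for ts, up in samples:
        if up == 0:
            if active_since is None:
                active_since = ts  # first evaluation that saw the instance down
            state = "firing" if ts - active_since >= hold_seconds else "pending"
        else:
            active_since = None  # instance is back: the alert resets
            state = "inactive"
        states.append((ts, state))
    return states


# Assumed 15s evaluation interval; the instance is down from t=15 to t=45.
TIMELINE = alert_states([(0, 1), (15, 0), (30, 0), (45, 0), (60, 1)], hold_seconds=20)
```

Under these assumptions the alert only fires at the evaluation 30s after the outage was first seen, and the DingTalk message additionally waits for the scrape that detects the failure and for group_wait.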
5.2 Check the rule file
If the check reports SUCCESS for the rule file, it is configured correctly; otherwise, review the corresponding configuration.
```shell
cd /soft/prometheus
./promtool check config prometheus.yml   # also checks the rule files listed under rule_files
```
5.3 Reload the Prometheus configuration

```shell
systemctl reload prometheus.service
```
6. Test
When an instance in the Doris cluster goes down, the robot created in the DingTalk group sends an alert message.
If send_resolved: true is set in alertmanager.yml, a DingTalk message is also sent when the problem is resolved; otherwise no recovery message is sent.
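To exercise the Alertmanager, webhook-dingtalk and DingTalk path without stopping a Doris node, you can push a test alert to Alertmanager's v2 API (POST /api/v2/alerts on port 9093). Sending the same alert again with an endsAt that has already passed resolves it, which also lets you check the send_resolved behaviour. A sketch; the alert name and instance are hypothetical test values, and the firing message arrives only after group_wait (the resolved one at a later group flush).

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

ALERTMANAGER = "http://localhost:9093"  # address configured in prometheus.yml above


def iso(ts):
    """RFC 3339 timestamp in UTC."""
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


def demo_alerts(instance, resolved=False, now=None):
    """Body for POST /api/v2/alerts: a JSON array of alerts."""
    now = now or datetime.now(timezone.utc)
    alert = {
        "labels": {"alertname": "DorisAlertTest", "instance": instance,  # hypothetical test labels
                   "job": "pro-doris", "severity": "error"},
        "annotations": {"summary": f"doris Instance {instance} down (test)",
                        "description": f"doris {instance} test alert"},  # keeps the robot keyword
        "startsAt": iso(now - timedelta(minutes=1)),
    }
    if resolved:
        alert["endsAt"] = iso(now)  # an endsAt that has already passed marks it resolved
    return [alert]


def push(alerts):
    """Send the alerts to a running Alertmanager and return the HTTP status."""
    req = urllib.request.Request(
        f"{ALERTMANAGER}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status

# Usage: push(demo_alerts("be1:8040")), then later push(demo_alerts("be1:8040", resolved=True))
```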
This completes the process of monitoring Doris with Prometheus and sending exception alerts to DingTalk.