Intelligent operation and maintenance scenario analysis: how to detect abnormal business system status through anomaly detection

2022-07-22 11:08:00 【Cloud smart aiops community】

When a business system fails, the most direct and intuitive signal is usually an abnormal fluctuation in key business metrics. Take the insurance industry as an example: when the business system is abnormal, its capacity to process insurance policies drops significantly. In terms of business metrics, this means that "policy volume" falls when the business system has a problem.

How can we correctly determine that "policy volume" has dropped? The traditional approach is to set a fixed threshold. For example, define that under normal conditions the system should process between 200 and 600 policies per minute. When the policy count monitored in real time falls outside this range, the policy volume is considered abnormal. In other words, the fixed-threshold alarm of a traditional monitoring system generates alarms by comparing real data against a preset, fixed alarm threshold.
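The fixed-threshold rule described above can be sketched in a few lines of Python. The 200 to 600 range comes from the example in the text; the function name and the sample readings are hypothetical and only serve as illustration.

```python
# Fixed-threshold alarm: a minimal sketch.
# LOWER and UPPER come from the 200-600 example in the text;
# the function name and sample readings are hypothetical.
LOWER, UPPER = 200, 600


def fixed_threshold_alarm(policies_per_minute: int) -> bool:
    """Return True when the value falls outside the fixed normal range."""
    return not (LOWER <= policies_per_minute <= UPPER)


readings = [350, 580, 120, 640]
alarms = [fixed_threshold_alarm(v) for v in readings]
print(alarms)  # [False, False, True, True]
```

The rule has no notion of time, so a reading of 120 is flagged whether it occurs at a busy or a quiet hour.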

This logic looks fine on the surface, but think about it: how many new policies are submitted to the system in the early hours of each morning (assuming the insurance company only accepts domestic business)? Clearly, far more new policies are submitted between 10:00 and 12:00 each day than in the early hours.

By the same reasoning, the number of policies the business system processes on holidays differs significantly from the number on working days. Following this logic further, we find that it is hard for an enterprise to use preset rules (thresholds) to judge whether the policy volume metric of a business system is abnormal.

To solve these problems, the DOEM digital operation and maintenance event management product on the Cloudwise DOCP platform adopts a multi-algorithm ensemble learning approach and introduces three anomaly detection methods for time-series monitoring metrics: dynamic baseline, year-over-year / period-over-period comparison, and metric anomaly detection.

Dynamic baseline

After learning from historical data with intelligent algorithms, the system predicts the value of the metric at each future time point. The predicted value serves as the baseline, and monitoring and alarming are driven by the deviation (percentage difference) between the actual value and the baseline.

The dynamic baseline suits scenarios where a metric is known to change periodically, but where it is impossible to give an exact value for each cycle or where the data varies too much within a cycle. Take the insurance business scenario as an example: the system learns from historical policy volume, identifies the trend and periodic changes in the historical data, and predicts how policy volume will change in the future. At the same time, it derives future upper and lower limits from the distribution of the historical data. When the metric under test rises above the upper limit or falls below the lower limit, it is judged abnormal. In monitoring, when the actual values were frequently lower than the predicted values, we were able to detect the anomaly effectively and trace the root cause of the incident.
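The idea of a baseline with upper and lower limits can be illustrated with a deliberately simple sketch. The article does not describe the model DOEM uses, so the code below is an assumption: it takes the mean of past values for each hour of day as the baseline and the mean plus or minus k standard deviations as the limits. The function names, the value of k, and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def build_baseline(history, k=3.0):
    """Build a per-hour baseline with limits.

    history maps hour-of-day -> list of past values observed at that hour.
    Returns hour -> (baseline, lower_limit, upper_limit).
    Assumption: mean +/- k * standard deviation stands in for the learned model.
    """
    bands = {}
    for hour, values in history.items():
        base = mean(values)
        spread = stdev(values)
        bands[hour] = (base, base - k * spread, base + k * spread)
    return bands


def is_abnormal(bands, hour, actual):
    """Abnormal when the actual value leaves the band for that hour."""
    _, lower, upper = bands[hour]
    return actual < lower or actual > upper


# Hypothetical policy counts per minute for two hours of the day.
history = {
    3: [18, 22, 20, 19, 21],        # early hours: low volume is normal
    11: [540, 560, 550, 530, 570],  # late morning: high volume is normal
}
bands = build_baseline(history)

print(is_abnormal(bands, 3, 20))    # False: 20 is normal at 03:00
print(is_abnormal(bands, 11, 300))  # True: 300 is a clear drop at 11:00
print(is_abnormal(bands, 11, 555))  # False
```

Unlike a fixed 200 to 600 range, the same value can be normal at one hour and abnormal at another, because each hour carries its own limits.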

Year-over-year / period-over-period anomaly detection

This method is used to find out whether the trend of a monitored metric keeps improving or deteriorating. The target value is compared with the distribution of historical data from the same period (year-over-year) and with the preceding period (period-over-period). Based on the absolute or percentage difference, the system judges whether the new data is abnormal and whether to raise an alarm.
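A minimal sketch of the percentage comparison follows. It assumes a single reference value for the same period of the previous cycle and a single value for the preceding period, plus a hypothetical 30% drop limit. None of these names or numbers come from the product.

```python
def pct_change(current, reference):
    """Percentage change of current relative to reference."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return (current - reference) * 100.0 / reference


def compare(current, same_period_last_cycle, previous_period, max_drop_pct=30.0):
    """Return (year-over-year %, period-over-period %, alarm flag).

    Assumption: an alarm is raised when either comparison shows a drop
    of at least max_drop_pct percent.
    """
    yoy = pct_change(current, same_period_last_cycle)
    pop = pct_change(current, previous_period)
    alarm = yoy <= -max_drop_pct or pop <= -max_drop_pct
    return yoy, pop, alarm


# Hypothetical values: 300 policies now, 500 in the same period of the
# previous cycle, 480 in the preceding period.
yoy, pop, alarm = compare(current=300, same_period_last_cycle=500, previous_period=480)
print(yoy, pop, alarm)  # -40.0 -37.5 True
```

A comparison against the full distribution of historical values for the same period, as the text describes, would replace the single reference value with a statistic such as a median or percentile.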

Single-metric / multi-metric anomaly detection

To cope with the differing data characteristics of different business models, DOEM uses unsupervised ensemble learning algorithms to detect metric anomalies. There is no need to manually set a fixed threshold or define a baseline deviation. The system selects different algorithms for targeted detection according to the data characteristics, makes an overall evaluation of the anomaly, and generates an alarm after automatically identifying data that does not meet expectations.
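The article does not name the algorithms inside DOEM's ensemble, so the following is only an illustration of the general idea of unsupervised ensemble detection. Three common statistical detectors (z-score, median absolute deviation, and interquartile range) are assumed as stand-ins, and a point is reported as anomalous when at least two of them agree. All function names, parameters, and data are hypothetical.

```python
from statistics import mean, median, quantiles, stdev


def zscore_flag(values, x, k=3.0):
    """Flag x when it lies more than k standard deviations from the mean."""
    s = stdev(values)
    return s > 0 and abs(x - mean(values)) > k * s


def mad_flag(values, x, k=3.5):
    """Flag x using the modified z-score based on the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return mad > 0 and 0.6745 * abs(x - med) / mad > k


def iqr_flag(values, x, k=1.5):
    """Flag x when it lies beyond k interquartile ranges from the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return x < q1 - k * spread or x > q3 + k * spread


def ensemble_is_anomaly(values, x, min_votes=2):
    """Majority vote over the three detectors; no manual threshold on the metric itself."""
    votes = sum(flag(values, x) for flag in (zscore_flag, mad_flag, iqr_flag))
    return votes >= min_votes


# Hypothetical recent policy counts per minute.
recent = [500, 510, 495, 505, 498, 502, 507, 493]
print(ensemble_is_anomaly(recent, 300))  # True
print(ensemble_is_anomaly(recent, 503))  # False
```

Voting makes the result less sensitive to the weakness of any single detector; a production system would also choose detectors according to the characteristics of each metric, as the text describes.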

Cloudwise DOEM (short for Digital Operation Event Management) is a digital operation and maintenance event management product that serves both technical and management needs. It is centered on events and provides global control over the whole life cycle of problem events. Built on big data technology and machine learning algorithms, DOEM ingests and processes alarm messages and metric data from various monitoring systems in a unified way, and supports filtering, notification, response, handling, grading, tracking, and multidimensional analysis of alarm events. Using algorithms such as the dynamic baseline, DOEM provides alarm convergence, anomaly detection, root cause analysis, and intelligent prediction for events. It helps enterprises break down data silos, unify operation and maintenance standards and management norms, reduce the interference of routine tasks in operation and maintenance work, and improve the overall level of operation and maintenance management.

Open source benefits

Cloudwise has open-sourced its data visualization platform, FlyFish. Through data model configuration, it provides users with hundreds of visual graphics components, so that an eye-catching large-screen visualization that meets your business needs can be built with zero coding. FlyFish also offers flexible extensibility: it supports component development, custom functions, and global event configuration, which helps ensure efficient development and delivery even for complex requirements.

Click the links below to visit FlyFish. You are welcome to like it and give it a Star.

GitHub: https://github.com/CloudWise-OpenSource/FlyFish

Gitee: https://gitee.com/CloudWise/fly-fish

Copyright notice
This article was written by [Cloud smart aiops community]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/203/202207211957328431.html