[CANN Training Camp] Ascend AI Fun Applications: Implementing a Fun AI Application (Part 2), Notes
2022-07-21 03:48:00 【Tianyi Li 1997】
Continuing from the previous post, "[CANN Training Camp] Ascend AI Fun Applications: Implementing a Fun AI Application (Part 1), Notes", let's carry on with the analysis.
Let's first introduce the npu-smi tool. Its role is similar to NVIDIA's nvidia-smi: both are used to view hardware status and information. The difference is that nvidia-smi reports on graphics cards, while npu-smi reports on Ascend processors. The one we use here is the Ascend 310.
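If you want to capture this status from a script rather than read it in a terminal, a minimal sketch might look like the following. It assumes the `npu-smi` binary is on the PATH and simply runs `npu-smi info`; the helper name `query_device_tool` is hypothetical, and the function returns `None` on machines where the tool is not installed.

```python
import shutil
import subprocess


def query_device_tool(tool="npu-smi", args=("info",), timeout=10):
    """Run a device status tool and return its text output.

    Returns None when the tool is not installed, so the caller can
    decide how to handle machines without an Ascend device.
    """
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        [tool, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # a non-zero exit code still returns whatever was printed
    )
    return result.stdout


if __name__ == "__main__":
    report = query_device_tool()
    print(report if report is not None else "npu-smi not found on this machine")
```

Calling the helper periodically (for example once per second) gives a rough form of monitoring; the output is returned as raw text and is not parsed here.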
With this command you can view Ascend processor information and monitor its data in real time. Remember the open question we left last time? It was optimization and acceleration. Previously, apart from the model inference itself, almost every operation was done on the CPU with OpenCV and similar libraries, so there is a lot of room to accelerate. Some operations can be handed to dedicated hardware circuits to speed up processing and improve performance. Let's go through them one by one.
DVPP
AIPP
AIPP complements and completes DVPP: together they handle data processing more thoroughly and accelerate performance.
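To make the kind of work involved more concrete, here is a CPU-side sketch of per-channel normalization (subtract a mean, multiply by a scale), which is one of the preprocessing steps AIPP can be configured to perform in hardware. The function name and the mean and scale values are illustrative assumptions, not values taken from this demo.

```python
def normalize_pixels(pixels, mean, scale):
    """Apply (value - mean) * scale per channel to a list of RGB pixels.

    pixels: list of (r, g, b) tuples with values in 0..255
    mean:   per-channel mean to subtract (assumed values)
    scale:  per-channel multiplier (assumed values)
    """
    return [
        tuple((value - m) * s for value, m, s in zip(pixel, mean, scale))
        for pixel in pixels
    ]


if __name__ == "__main__":
    image = [(128, 64, 0), (130, 64, 2)]
    print(normalize_pixels(image, mean=(128, 64, 0), scale=(0.5, 0.5, 0.5)))
```

Doing this in Python for every pixel of every frame is exactly the sort of CPU cost that moving preprocessing onto dedicated hardware is meant to avoid.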
Optimization and improvement
Optimization, in essence, means building on a working baseline and getting the most out of the hardware's computing power to improve performance. On the one hand, operations need to be made hardware-friendly, which requires familiarity with the hardware's characteristics. On the other hand, you need to find the current performance bottleneck and address it directly. This is an efficient way to work: focus on the main problem and don't get lost in the details.
For this demo, we analyze the performance problems of data preprocessing, model inference, and post-processing, and optimize each in turn. The analysis follows.
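One simple way to find the bottleneck among these three stages is to time each of them separately. The sketch below is only an illustration: `preprocess`, `infer`, and `postprocess` are hypothetical stand-ins that sleep for a fixed time, and `StageTimer` is a helper invented for this example, not part of acllite or CANN.

```python
import time
from collections import defaultdict


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def measure(self, name, func, *args):
        start = time.perf_counter()
        result = func(*args)
        self.totals[name] += time.perf_counter() - start
        self.counts[name] += 1
        return result

    def bottleneck(self):
        """Name of the stage with the largest accumulated time."""
        return max(self.totals, key=self.totals.get)


# Hypothetical stand-ins for the real stages; the sleeps fake their cost.
def preprocess(frame):
    time.sleep(0.05)
    return frame


def infer(frame):
    time.sleep(0.005)
    return frame


def postprocess(frame):
    time.sleep(0.002)
    return frame


def run_pipeline(frames, timer):
    outputs = []
    for frame in frames:
        x = timer.measure("preprocess", preprocess, frame)
        y = timer.measure("inference", infer, x)
        outputs.append(timer.measure("postprocess", postprocess, y))
    return outputs


if __name__ == "__main__":
    timer = StageTimer()
    run_pipeline(range(3), timer)
    for name, total in timer.totals.items():
        print(f"{name}: {total / timer.counts[name] * 1000:.1f} ms per frame")
    print("bottleneck:", timer.bottleneck())
```

With real stage functions in place of the stand-ins, the per-stage averages show where optimization effort is best spent first.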
Data preprocessing
With that said, let's see how to speed up our project. The details are shown in the figure below:
The orange and yellow parts in the figure above are the steps we can accelerate with DVPP or AIPP.
In practice, this means analyzing the whole inference pipeline and seeing which operations can be replaced by dedicated hardware. Dedicated hardware circuits are faster and more efficient than a CPU implementation, which is where the speedup comes from.
Thanks to the wrapping done by acllite, using DVPP and AIPP for acceleration is relatively simple. Take a look at the following figure:
If you want more detail on the underlying interfaces, you can study the acllite code and examples in the official repository. Here we simply use it directly.
Post-processing
Post-processing can be sped up by fixing the size of the output image, because fully dynamic sizes are flexible but cost performance. We could also turn post-processing into a single operator plus DVPP operations, replacing the current software implementation with hardware, which should bring a large performance improvement.
Model inference
Everything so far has been about data processing; model inference itself has not been optimized. To optimize it, we have to "operate" on the model itself. At present the main method is quantization with AMCT. In essence it reduces the amount of computation: intuitively, the less there is to compute, the better the performance.
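To illustrate the general idea behind quantization (not the AMCT workflow or its API), here is a minimal sketch of symmetric int8 quantization in plain Python. The function names are hypothetical. It maps floats to integers in [-127, 127] with one shared scale, computes a dot product on the integers, and rescales the result at the end.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


def int8_dot(qa, scale_a, qb, scale_b):
    """Dot product accumulated on integers, rescaled once at the end."""
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * scale_a * scale_b


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25]
    activations = [1.0, 2.0, -0.5]
    qw, sw = quantize_int8(weights)
    qx, sx = quantize_int8(activations)
    exact = sum(w * x for w, x in zip(weights, activations))
    print("float result:    ", exact)
    print("quantized result:", int8_dot(qw, sw, qx, sx))
```

The two results differ slightly. That small accuracy loss is the price paid for doing the bulk of the arithmetic on 8-bit integers instead of floats.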
The specific steps are as follows, but quantization has little effect on this model, so it is not pursued here.
Other tuning methods
- Tuning tuning
- AOE tuning
- Multi-batch
Simply put, this means running inference on several images at once. Previously we processed only one image at a time; inferring several at once generally improves performance. It is not guaranteed, though: if inference on a single image already occupies most of the resources, multi-batch may not help much, and the larger data volume may even add overhead for moving or splitting data that outweighs the benefit. In most cases there is a gain, so it is worth trying.
- Multithreading
Multithreading is a widely used acceleration technique, and it applies here as well. It is generally useful when the hardware's computing power is not fully used, for example when the NPU's AI Core utilization stays around 20%. In that case we can turn on multithreading to raise AI Core utilization and improve performance.
The following figure shows the idea behind multithreaded processing of a video file:
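In code, the same idea, combined with the multi-batch approach described above, might look like the sketch below. It is only an illustration under stated assumptions: `fake_infer` is a hypothetical stand-in for a real model call, frames are plain integers, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Split a list into consecutive batches; the last one may be shorter."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def fake_infer(batch):
    """Hypothetical stand-in for a model call: one result per input frame."""
    return [frame * 2 for frame in batch]


def run_batched_parallel(frames, batch_size=4, workers=2, infer=fake_infer):
    """Group frames into batches and run the batches on a thread pool."""
    batches = make_batches(list(frames), batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input batches
        results_per_batch = list(pool.map(infer, batches))
    return [result for batch in results_per_batch for result in batch]


if __name__ == "__main__":
    print(run_batched_parallel(range(10), batch_size=4, workers=3))
```

Threads in Python help mainly when the inference call spends its time waiting on the device rather than running Python code, so whether this raises utilization in practice has to be measured on the real pipeline.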
Conclusion
In general, tuning means squeezing the most performance out of the hardware: pushing hardware utilization as high as possible to improve performance. Tuning requires case-by-case analysis. The ideas are the same, but the specific methods vary widely. Accumulate experience, analyze more, keep records, and discuss often, and you will make progress.