[CANN Training Camp] CANN Training Camp: Ascend AI Fun Applications, Implementing Fun AI Applications (Part 2), Essay

2022-07-21 03:48:00 【Tianyi Li 1997】

Continuing from the previous post, "[CANN Training Camp] CANN Training Camp: Ascend AI Fun Applications, Implementing Fun AI Applications (Part 1), Essay", let's carry on with the analysis.

Let's start with the npu-smi tool. It plays the same role as NVIDIA's nvidia-smi: both report hardware status and information. The difference is that nvidia-smi reports on graphics cards, while npu-smi reports on Ascend processors. Here we are using an Ascend 310.
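As a small illustration, the status output can also be captured from a script. The sketch below assumes that npu-smi is on the PATH of an Ascend host and uses its `info` subcommand. The helper name `read_npu_status` is made up for this example, and on a machine without the tool it simply returns None.

```python
import shutil
import subprocess
from typing import Optional


def read_npu_status(tool: str = "npu-smi") -> Optional[str]:
    """Return the text printed by `<tool> info`, or None if that is not possible."""
    if shutil.which(tool) is None:
        # Not an Ascend host, or the driver tools are not on PATH.
        return None
    result = subprocess.run(
        [tool, "info"], capture_output=True, text=True, check=False
    )
    # A non-zero exit code is treated as "no status available".
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    status = read_npu_status()
    print(status if status is not None else "npu-smi not found on this machine")
```

Calling the function in a loop with a short sleep gives a crude way to watch utilization while the demo runs.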

With this command you can view the Ascend processor's information and monitor its data in real time. Remember the open question we left last time? It was optimization and acceleration. Previously, apart from the model execution itself, almost every operation ran on the CPU through OpenCV and similar libraries. That leaves a lot of room for acceleration: some operations can run on dedicated hardware circuits, which improves performance. Let's go through them one by one.

DVPP

AIPP

AIPP complements and rounds out DVPP: together they handle data processing more completely and accelerate performance.

Optimization and improvement

In essence, optimization means starting from a working baseline and getting the most out of the hardware's computing power to improve performance. One part is making operations hardware-friendly, which requires familiarity with the hardware's features. The other part is finding the current performance bottleneck and solving it in a targeted way. This is an efficient way of working: focus on the main problem, and don't pay too much attention to the details.

For this demo, we mainly analyze the performance problems of three stages (data preprocessing, model inference and post-processing) and optimize each of them in turn. The analysis follows.
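Before optimizing any stage, it helps to measure where the time actually goes. The sketch below is one simple way to do that, not the demo's own code: `StageTimer` is a hypothetical helper, and the `time.sleep` calls are placeholders standing in for the real preprocessing, inference and post-processing work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def measure(self, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[stage] += time.perf_counter() - start
            self.counts[stage] += 1

    def bottleneck(self):
        """Return the stage with the largest accumulated time."""
        return max(self.totals, key=self.totals.get)

    def report(self):
        """Return one line per stage, slowest first, with its share of the total."""
        total = sum(self.totals.values())
        lines = []
        for stage, seconds in sorted(self.totals.items(), key=lambda kv: -kv[1]):
            share = 100.0 * seconds / total if total else 0.0
            lines.append(f"{stage}: {seconds * 1000:.1f} ms ({share:.0f}%)")
        return lines


# Placeholder workload: sleep stands in for the real work of each stage.
timer = StageTimer()
for _ in range(3):
    with timer.measure("preprocess"):
        time.sleep(0.05)
    with timer.measure("inference"):
        time.sleep(0.005)
    with timer.measure("postprocess"):
        time.sleep(0.01)

print("\n".join(timer.report()))
print("bottleneck:", timer.bottleneck())
```

With the placeholder durations used here, preprocessing dominates, which mirrors the situation described in this post: the CPU-side data handling, not the model, is the first thing to attack.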

Data preprocessing

With that background, let's see how to speed up our project. The details are shown in the figure below:

The orange and yellow parts in the figure above are the ones we can accelerate with DVPP or AIPP.

In practice, this means analyzing the whole inference pipeline and checking which operations can be handed over to dedicated hardware. Dedicated hardware circuits are more efficient and faster than a CPU implementation, and that is where the speedup comes from.

Thanks to the wrappers that acllite provides, using DVPP and AIPP for acceleration is relatively simple. Take a look at the following figure:

If you want more detail, including the underlying interfaces, you can study the acllite code and samples in the official repository. Here we simply use it directly.

Post-processing

Post-processing can be sped up by fixing the size of the output image: dynamic sizes are flexible, but that flexibility costs performance. We could also implement post-processing as a single operator plus DVPP, replacing the current software operations with hardware, which should bring a large performance improvement.

Model inference

Everything so far has been about data processing; the model inference itself has not been optimized. To optimize inference, we have to work on the model itself. At present the main method is quantization with AMCT. In essence it reduces the amount of computation: intuitively, with less to compute, performance naturally improves.
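To make the idea of quantization concrete, here is a minimal sketch of generic symmetric int8 quantization in plain Python. It is not AMCT's API or algorithm (AMCT has its own calibration workflow, which is not shown here); the function names and the sample weights are made up for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the int8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in quantized]


weights = [0.50, -1.27, 0.03, 0.90]  # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

Each value is stored in 8 bits instead of a wider floating-point type, at the cost of a rounding error of at most half a quantization step. That trade of precision for cheaper arithmetic is where the speedup comes from when the hardware has fast integer paths.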

The specific steps are as follows. However, quantization has little effect on this model, so we will not consider it further.

Other tuning methods

Tuning tuning

AOE tuning

Multi-batch

Simply put, multi-batch means running inference on several images at once, instead of one image at a time as before. Processing more per call generally improves performance, but not always. If inference on a single image already occupies most of the resources, multi-batch may not help much, and the larger data volume may even add data-movement or splitting overhead that outweighs the benefit. Still, there is usually a gain, so it is worth a try.
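The grouping step itself can be sketched as follows. This assumes the model has been converted with a fixed batch size, so the last, incomplete batch is padded and the number of valid entries is carried along. `make_batches` and the string "frames" are hypothetical stand-ins for real image buffers.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_batches(
    items: Sequence[T], batch_size: int, pad: T
) -> Iterator[Tuple[List[T], int]]:
    """Yield (batch, valid_count); the last batch is padded up to batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        chunk = list(items[start:start + batch_size])
        valid = len(chunk)
        # Pad so that every batch has exactly batch_size entries.
        chunk.extend([pad] * (batch_size - valid))
        yield chunk, valid


frames = [f"frame{i}" for i in range(10)]  # stand-ins for decoded images
batches = list(make_batches(frames, batch_size=4, pad="blank"))
for batch, valid in batches:
    print(valid, batch)
```

After inference, only the first `valid_count` results of each batch are kept. The padding is part of the overhead mentioned above: with 10 frames and a batch size of 4, two of the twelve slots are wasted work.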

Multithreading

Multithreading is a widely used acceleration technique, and it applies here too. It is worth considering when the hardware's computing power is not fully used: for example, if the utilization of the NPU's AI Core stays around 20%, we can enable multithreading to raise AI Core utilization and improve performance.

The following figure shows the acceleration idea of multi-threaded processing of video files ：
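One common way to structure such a pipeline is one thread per stage, connected by queues, so that preprocessing of the next frame overlaps with inference on the current one. The sketch below uses only the Python standard library; the three lambdas are placeholders for the real preprocessing, inference and post-processing calls, and it assumes those calls spend most of their time waiting on the device or on I/O, so that threads can actually overlap.

```python
import queue
import threading

STOP = object()  # sentinel that tells the next stage to shut down


def stage(work, inbox, outbox):
    """Apply `work` to every item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(work(item))


def run_pipeline(frames, preprocess, infer, postprocess, depth=4):
    """Run three stages concurrently; results come back in input order."""
    q_raw = queue.Queue(maxsize=depth)
    q_pre = queue.Queue(maxsize=depth)
    q_inf = queue.Queue(maxsize=depth)
    q_out = queue.Queue()  # unbounded, so feeding below can never deadlock

    workers = [
        threading.Thread(target=stage, args=(preprocess, q_raw, q_pre)),
        threading.Thread(target=stage, args=(infer, q_pre, q_inf)),
        threading.Thread(target=stage, args=(postprocess, q_inf, q_out)),
    ]
    for worker in workers:
        worker.start()
    for frame in frames:
        q_raw.put(frame)  # blocks when the first stage falls behind
    q_raw.put(STOP)
    for worker in workers:
        worker.join()

    results = []
    while True:
        item = q_out.get()
        if item is STOP:
            return results
        results.append(item)


# Placeholder stages: simple arithmetic stands in for real image work.
results = run_pipeline(
    range(8),
    preprocess=lambda x: x + 1,
    infer=lambda x: x * 10,
    postprocess=lambda x: f"result-{x}",
)
print(results)
```

Because each stage has exactly one thread, frame order is preserved. The bounded intermediate queues keep a fast stage from running far ahead of a slow one. Error handling is left out to keep the sketch short: a stage that raises would stall the pipeline.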

Conclusion

In general, tuning means getting the most out of the hardware: pushing hardware utilization as high as possible to improve performance. Tuning requires case-by-case analysis. The overall approach is the same, but the specific methods differ a great deal. Accumulate experience, analyze often, keep records and talk with others frequently, and you will make progress.

Copyright notice
This article was written by [Tianyi Li 1997]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/202/202207200530024001.html