Learning the ncnn OP forward code
2022-07-21 11:16:00 【Early lunar month in Pingqiu】
OpenMP supports the C, C++ and Fortran programming languages. Compilers that support OpenMP include Visual Studio, the Sun compiler, the GNU compiler, the Intel compiler and Clang. For the full list, see: https://www.openmp.org/resources/openmp-compilers-tools/
One of OpenMP's most powerful features is that you can start from the source code of a serial program and, with only a few changes, parallelize many serial for loops, which can significantly improve performance.
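As a minimal, generic illustration (not code from ncnn), the serial loop below becomes parallel by adding a single pragma. The function name `scale_inplace` is hypothetical. If the code is compiled without OpenMP support (for example, without `-fopenmp` in GCC or Clang), the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <vector>

// Hypothetical helper: multiply every element of v by s.
// The only difference from the serial version is the pragma line.
// Each iteration touches a different element, so OpenMP can split
// the iterations across threads safely.
void scale_inplace(std::vector<float>& v, float s) {
    const int n = static_cast<int>(v.size());
#pragma omp parallel for
    for (int i = 0; i < n; i++) {
        v[i] *= s;
    }
}
```

This only works because the iterations are independent. A loop that accumulates a running sum, for example, would need a `reduction` clause rather than a bare `parallel for`.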
1. AbsVal: There are two interfaces, forward and forward_inplace. The difference is that forward_inplace overwrites the input in place and does not allocate any additional memory. In the computation itself, the loop over the channel dimension is parallelized with #pragma omp parallel for. If a bottom value is positive, the top value equals the bottom value; otherwise, the top value is its negation.
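The AbsVal logic can be sketched as follows. This sketch assumes a simplified, hypothetical `Blob` type (channels stored contiguously, `size` floats each) in place of `ncnn::Mat`. The two function names mirror the ncnn interfaces, but their signatures are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ncnn::Mat: c channels of `size` floats each.
struct Blob {
    int c;
    int size;
    std::vector<float> data;
    Blob(int channels, int n)
        : c(channels), size(n), data(static_cast<std::size_t>(channels) * n) {}
    float* channel(int q) { return data.data() + static_cast<std::size_t>(q) * size; }
    const float* channel(int q) const { return data.data() + static_cast<std::size_t>(q) * size; }
};

// In-place variant: overwrites the input and allocates nothing new.
void absval_forward_inplace(Blob& blob) {
#pragma omp parallel for
    for (int q = 0; q < blob.c; q++) {
        float* ptr = blob.channel(q);
        for (int i = 0; i < blob.size; i++) {
            if (ptr[i] < 0) ptr[i] = -ptr[i];
        }
    }
}

// Out-of-place variant: allocates a separate output blob.
Blob absval_forward(const Blob& bottom) {
    Blob top(bottom.c, bottom.size);
#pragma omp parallel for
    for (int q = 0; q < bottom.c; q++) {
        const float* ptr = bottom.channel(q);
        float* outptr = top.channel(q);
        for (int i = 0; i < bottom.size; i++) {
            outptr[i] = ptr[i] > 0 ? ptr[i] : -ptr[i];  // keep positives, negate the rest
        }
    }
    return top;
}
```

Parallelizing over channels rather than over individual elements gives each thread a contiguous chunk of memory and keeps per-thread overhead low.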
2. ArgMax: There are two output modes: 1) output only the max indices; 2) output the max indices together with the values at those indices. The mode is chosen by the out_max_val parameter, which is read in the load_param(FILE* paramfp) interface. If out_max_val is true, mode 2) is used: the first topk output values are the max indices, and the next topk values are the corresponding max values. Otherwise, mode 1) is used.
int nscan = fscanf(paramfp, "%d %d", &out_max_val, &topk);
It is worth noting that ncnn uses std::partial_sort for the internal top-k sort:
std::partial_sort(vec.begin(), vec.begin() + topk, vec.end(), std::greater< std::pair<float, int> >());
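A minimal sketch that combines the two output modes with the partial sort is shown below. It works on a flat float array. The function `argmax_topk` and its signature are hypothetical, and the sketch assumes `1 <= topk <= x.size()`.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical ArgMax over a flat array.
// Mode 1 (out_max_val == false): returns topk indices.
// Mode 2 (out_max_val == true):  returns topk indices followed by topk values.
// Assumes 1 <= topk <= x.size().
std::vector<float> argmax_topk(const std::vector<float>& x, int topk, bool out_max_val) {
    // Pair each value with its index so the index survives the sort.
    std::vector<std::pair<float, int> > vec(x.size());
    for (int i = 0; i < static_cast<int>(x.size()); i++) {
        vec[i] = std::make_pair(x[i], i);
    }

    // Only the first topk positions need to be sorted (descending).
    std::partial_sort(vec.begin(), vec.begin() + topk, vec.end(),
                      std::greater<std::pair<float, int> >());

    std::vector<float> out(out_max_val ? 2 * topk : topk);
    for (int i = 0; i < topk; i++) {
        out[i] = static_cast<float>(vec[i].second);        // max index
        if (out_max_val) out[topk + i] = vec[i].first;     // matching max value
    }
    return out;
}
```

`std::partial_sort` costs roughly O(n log k) instead of the O(n log n) of a full sort, which matters when topk is much smaller than the number of elements. Because `std::greater` compares the pairs lexicographically, equal values are ordered with the larger index first.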
3. BatchNorm: BatchNorm also has two interfaces, forward and forward_inplace, which are not described again here. The core optimization is that everything that can be precomputed is computed once during initialization, which reduces the amount of computation in the actual forward pass:
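// size: number of elements in one channel
// a_data_ptr / b_data_ptr: per-channel coefficients precomputed at load time
#pragma omp parallel for
for(int q = 0; q < channels; q++){
    const float* ptr = bottom_blob.channel(q);
    float* outptr = top_blob.channel(q);
    float a = a_data_ptr[q];
    float b = b_data_ptr[q];
    for(int i=0; i<size; i++){
        outptr[i] = b*ptr[i] + a;  // y = b * x + a
    }
}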
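The precomputation can be sketched as follows. The sketch assumes the standard inference-time formula y = slope * (x - mean) / sqrt(var + eps) + bias. Rewriting this as y = b * x + a, with b = slope / sqrt(var + eps) and a = bias - slope * mean / sqrt(var + eps), means the square root and division run once per channel at load time. The forward pass then needs only one multiply-add per element. The struct and function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-channel BatchNorm coefficients, computed once at load time.
struct BNCoeffs {
    std::vector<float> a;  // additive term per channel
    std::vector<float> b;  // multiplicative term per channel
};

// Fold slope, mean, var, bias and eps into y = b * x + a.
BNCoeffs precompute_bn(const std::vector<float>& slope, const std::vector<float>& mean,
                       const std::vector<float>& var, const std::vector<float>& bias,
                       float eps) {
    const std::size_t channels = slope.size();
    BNCoeffs c{std::vector<float>(channels), std::vector<float>(channels)};
    for (std::size_t q = 0; q < channels; q++) {
        const float sqrt_var = std::sqrt(var[q] + eps);
        c.b[q] = slope[q] / sqrt_var;
        c.a[q] = bias[q] - slope[q] * mean[q] / sqrt_var;
    }
    return c;
}

// Forward over `channels` x `size` contiguous floats: one multiply-add per element.
void bn_forward(const float* bottom, float* top, int channels, int size, const BNCoeffs& c) {
#pragma omp parallel for
    for (int q = 0; q < channels; q++) {
        const float* ptr = bottom + static_cast<std::size_t>(q) * size;
        float* outptr = top + static_cast<std::size_t>(q) * size;
        const float a = c.a[q];
        const float b = c.b[q];
        for (int i = 0; i < size; i++) {
            outptr[i] = b * ptr[i] + a;
        }
    }
}
```

For example, with slope = 2, mean = 1, var = 3, bias = 0.5 and eps = 1, sqrt(var + eps) = 2, so b = 1 and a = -0.5. An input of 3 then maps to 2.5, which matches 2 * (3 - 1) / 2 + 0.5.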