Analysis of common activation functions
2022-07-20 09:33:00 【Early lunar month in Pingqiu】
sigmoid and ReLU
$$sigmoid(x) = \frac{1}{1+e^{-x}}$$
The problem with the $sigmoid$ activation function is that as the input approaches $\pm\infty$, the gradient quickly goes to 0. During backpropagation, the parameters of the shallow layers (those closest to the input) therefore cannot be updated effectively.
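To make the saturation concrete, here is a minimal sketch in plain Python. It relies on the standard identity $sigmoid'(x) = sigmoid(x)(1 - sigmoid(x))$; the helper names and the depth of 10 layers are chosen for illustration only.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient is largest at x = 0 (value 0.25) and shrinks quickly as |x| grows.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:5.1f}   sigmoid'(x) = {sigmoid_grad(x):.6f}")

# Backpropagation multiplies one such factor per layer. Even in the best case
# (every unit sitting exactly at x = 0), the product decays geometrically.
depth = 10  # illustrative number of stacked sigmoid layers
best_case = 0.25 ** depth
print(f"best-case gradient factor after {depth} layers: {best_case:.2e}")
```

The weights and other terms of the chain rule are left out here, so this only shows the contribution of the activation itself.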
$$ReLU(x) = \max(0, x)$$
For $x>0$, the gradient of $ReLU$ is constantly 1, so the gradient does not vanish. For $x<0$, the gradient is 0 and nothing is propagated backward, which, similar to $dropout$, can introduce more nonlinearity. When used in a model, training stability and results are better than with $sigmoid$.
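The two regimes can be sketched in a few lines of plain Python. Using 0 as the gradient at exactly $x = 0$ is a convention assumed here, and the depth of 10 is illustrative.

```python
def relu(x):
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)

def relu_grad(x):
    """Gradient of ReLU: 1 for x > 0, otherwise 0 (0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0

# Active units pass the gradient through unchanged, however deep the stack is.
depth = 10
active_chain = relu_grad(2.0) ** depth

# An inactive unit blocks the gradient entirely for that input.
blocked = relu_grad(-2.0)

print(f"gradient factor through {depth} active units: {active_chain}")
print(f"gradient through an inactive unit: {blocked}")
```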
ReLU and ReLU6
$$ReLU6(x) = \min(6, \max(0, x))$$
Capping the output of $ReLU$ at 6 makes small on-device models more robust under low-precision inference.
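One intuition for this is that a bounded output range leaves finer resolution for a fixed number of bits. The sketch below is an assumption-laden illustration rather than how any particular runtime quantizes: `fake_quantize` is a hypothetical helper that rounds a value onto a uniform 8-bit grid, and the outlier value of 100 is made up.

```python
def relu6(x):
    """ReLU capped at 6: min(6, max(0, x))."""
    return min(6.0, max(0.0, x))

def fake_quantize(value, max_value, bits=8):
    """Hypothetical helper: round value onto a uniform grid over [0, max_value]."""
    levels = 2 ** bits - 1
    step = max_value / levels
    clipped = min(max(value, 0.0), max_value)
    return round(clipped / step) * step

small = 0.5      # a typical small activation
outlier = 100.0  # a made-up large activation that stretches the range

# With plain ReLU the grid has to cover [0, 100], so the step is coarse.
err_relu = abs(fake_quantize(small, max_value=outlier) - small)

# With ReLU6 the grid only covers [0, 6], so the same 8 bits resolve more finely.
err_relu6 = abs(fake_quantize(relu6(small), max_value=6.0) - small)

print(f"rounding error with range [0, 100]: {err_relu:.4f}")
print(f"rounding error with range [0, 6]:   {err_relu6:.4f}")
```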
sigmoid and hard sigmoid
$$hard\_sigmoid(x) = ReLU6(x + 3)/6$$
It approximates the $sigmoid$ function with less computation.
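A rough way to see how close the approximation is, assuming the piecewise-linear form $ReLU6(x+3)/6$ used in MobileNetV3, is to compare the two functions on a grid. The grid range of $[-8, 8]$ and the step of 0.01 are arbitrary choices for this sketch.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def relu6(x):
    """ReLU capped at 6: min(6, max(0, x))."""
    return min(6.0, max(0.0, x))

def hard_sigmoid(x):
    """Piecewise-linear approximation: 0 for x <= -3, 1 for x >= 3, linear between."""
    return relu6(x + 3.0) / 6.0

# Compare on a grid of inputs; only min, max, an add and a divide are needed
# for hard_sigmoid, while sigmoid needs an exponential.
grid = [i / 100.0 for i in range(-800, 801)]
max_gap = max(abs(sigmoid(x) - hard_sigmoid(x)) for x in grid)

print(f"largest absolute gap on [-8, 8]: {max_gap:.4f}")
```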
swish and hard swish
$$swish(x) = x \cdot sigmoid(\beta x)$$
$$hard\_swish(x) = x \cdot ReLU6(x + 3)/6$$
The $sigmoid$ operation in $swish$ is too expensive to compute on-device, so $hard\_sigmoid$ is used to approximate it. The $h\_swish$ activation is used in $mobilenetv3$.
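The substitution can be sketched as follows, taking $\beta = 1$ for $swish$ and the form $x \cdot ReLU6(x+3)/6$ for $hard\_swish$. The comparison grid of $[-8, 8]$ is an arbitrary choice for illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def relu6(x):
    """ReLU capped at 6: min(6, max(0, x))."""
    return min(6.0, max(0.0, x))

def swish(x, beta=1.0):
    """swish(x) = x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)

def hard_swish(x):
    """hard_swish(x) = x * ReLU6(x + 3) / 6, with no exponential involved."""
    return x * relu6(x + 3.0) / 6.0

# hard_swish is exactly 0 for x <= -3 and exactly x for x >= 3;
# in between it follows a quadratic that tracks swish.
grid = [i / 100.0 for i in range(-800, 801)]
max_gap = max(abs(swish(x) - hard_swish(x)) for x in grid)

print(f"largest absolute gap on [-8, 8]: {max_gap:.4f}")
```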