RGB + depth image semantic segmentation: paper reading notes (ICRA 2021)
2022-07-21 05:05:00 【Blue feather birds】
Paper: Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis
The main contributions of this paper are as follows:
Adding the depth image improves mIoU over segmentation from RGB images alone.
The network is designed so that it can be implemented with TensorRT, which improves segmentation efficiency on the NX board, e.g. for robots with limited computing power and battery capacity.
An improved ResNet-based encoder and decoder reduce the amount of computation and improve efficiency.
The network takes both an RGB image and a depth image as input and mainly targets indoor scenes.
In the encoder, the RGB image and the depth map enter two separate branches, which are then merged. The structure is as follows:
It looks complicated, but it is easier to follow once it is broken down into parts.
Fusing features at more than one stage improves the segmentation result, so the depth features and the RGB features also pass through RGB-D Fusion at the intermediate stages.
The encoder uses ResNet as its backbone. For efficiency it does not use dilated (atrous) convolutions as DeepLabv3 does; it uses strided convolutions instead.
At the end of the encoder, the feature map is 32 times smaller than the input image. The backbone is ResNet34,
but every 3x3 convolution is replaced by a block that factorizes it into a 3x1 and a 1x3 convolution with a ReLU in between.
This block is called the Non-Bottleneck-1D-Block. Reportedly it shortens inference time and improves performance.
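A rough way to see why the factorization is cheaper is to count weights. The sketch below assumes equal input and output channel counts and ignores biases; the channel count 64 is only an illustrative value, not a figure from the paper.

```python
def conv_weights(kernel_h, kernel_w, channels_in, channels_out):
    """Number of weights in one 2D convolution (bias ignored)."""
    return kernel_h * kernel_w * channels_in * channels_out


def full_3x3_weights(channels):
    """Weights of a single 3x3 convolution."""
    return conv_weights(3, 3, channels, channels)


def factorized_weights(channels):
    """Weights of a 3x1 convolution followed by a 1x3 convolution.

    The ReLU between the two convolutions has no weights.
    """
    return conv_weights(3, 1, channels, channels) + conv_weights(1, 3, channels, channels)


channels = 64  # illustrative value, not taken from the paper
full = full_3x3_weights(channels)
factorized = factorized_weights(channels)
print(full, factorized, factorized / full)
```

Under these assumptions the factorized pair needs 6 * C * C weights instead of 9 * C * C, i.e. two thirds of the original.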
RGB-D Fusion uses a Squeeze and Excitation module. The details are shown in the green module in the lower-left corner of the figure.
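As a minimal sketch of the idea, Squeeze and Excitation computes one gate per channel from the channel's global average and rescales the channel with it. The code below is plain Python on nested lists. The helper names, the bias-free dense layers, and the final element-wise addition of the two reweighted modalities are assumptions made for illustration; they are not copied from the authors' code.

```python
import math


def global_avg_pool(feature):
    """Squeeze: average each channel (a 2D list) down to one number."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def linear(weights, vector):
    """Dense layer without bias; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def se_reweight(feature, w_reduce, w_expand):
    """Excitation: two dense layers (ReLU, then sigmoid) give one gate per channel."""
    squeezed = global_avg_pool(feature)
    hidden = [max(0.0, h) for h in linear(w_reduce, squeezed)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in linear(w_expand, hidden)]
    return [[[gate * v for v in row] for row in ch] for gate, ch in zip(gates, feature)]


def fuse(rgb, depth, rgb_weights, depth_weights):
    """Reweight each modality with its own gates, then add element-wise (assumed)."""
    rgb_se = se_reweight(rgb, *rgb_weights)
    depth_se = se_reweight(depth, *depth_weights)
    return [
        [[a + b for a, b in zip(row_r, row_d)] for row_r, row_d in zip(ch_r, ch_d)]
        for ch_r, ch_d in zip(rgb_se, depth_se)
    ]


# Toy example: 2 channels of size 2x2, all-zero weights so every gate is sigmoid(0).
rgb = [[[2, 2], [2, 2]], [[4, 4], [4, 4]]]
depth = [[[6, 6], [6, 6]], [[8, 8], [8, 8]]]
zero_weights = ([[0.0, 0.0]], [[0.0], [0.0]])  # (w_reduce, w_expand)
fused = fuse(rgb, depth, zero_weights, zero_weights)
print(fused)
```

With trained weights the gates differ per channel, which lets the network emphasize or suppress individual RGB and depth channels before they are combined.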
The context module addresses ResNet's limited receptive field by combining features at different scales, similar to the Pyramid Pooling Module in PSPNet.
Its average pooling is modified: because TensorRT only supports pooling with a fixed size, adaptive pooling is replaced by pooling whose size depends on the input resolution.
As a result, the pooling sizes differ from dataset to dataset.
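To illustrate how a fixed pooling size can stand in for adaptive pooling, the sketch below derives the kernel size from the input resolution. It assumes the encoder downsamples by 32 (as stated above) and that the feature map divides evenly into the requested number of bins. The function name, the 480x640 input, and the bin counts are illustrative choices, not values taken from the paper.

```python
def fixed_pool_kernel(input_hw, num_bins, downsampling=32):
    """Kernel size (also used as stride) that yields num_bins x num_bins outputs.

    When the feature map divides evenly into the bins, average pooling with this
    kernel and stride gives the same result as adaptive average pooling.
    """
    height, width = input_hw
    feat_h, feat_w = height // downsampling, width // downsampling
    if feat_h % num_bins or feat_w % num_bins:
        raise ValueError("feature map is not evenly divisible into this many bins")
    return feat_h // num_bins, feat_w // num_bins


# A 480x640 input gives a 15x20 feature map after 32x downsampling.
for bins in (1, 5):  # illustrative bin counts
    print(bins, fixed_pool_kernel((480, 640), bins))
```

A dataset with a different input resolution produces a different feature map size, so the same bin counts map to different kernel sizes.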
Decoder
There are 3 decoder modules.
Transposed convolution is not used for upsampling, because it is computationally expensive and produces gridding artifacts, as shown in the figure below.
Instead, the paper uses a learned upsampling method (the dark green box in Figure 1):
nearest-neighbor (NN) upsampling increases the resolution, and a following 3x3 depthwise convolution combines adjacent features.
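The two steps can be sketched in plain Python as follows. This is a simplified illustration on nested lists: the function names, the zero padding at the borders, and the identity kernel in the example are assumptions, and in the real network the 3x3 kernels are learned.

```python
def nn_upsample(channel, factor=2):
    """Nearest-neighbor upsampling of one channel given as a 2D list."""
    return [
        [value for value in row for _ in range(factor)]
        for row in channel
        for _ in range(factor)
    ]


def depthwise_conv3x3(channel, kernel):
    """3x3 convolution on a single channel with zero padding (size is preserved)."""
    height, width = len(channel), len(channel[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        total += kernel[dy + 1][dx + 1] * channel[yy][xx]
            out[y][x] = total
    return out


def learned_upsample(feature, kernels, factor=2):
    """NN upsampling followed by one 3x3 kernel per channel (depthwise)."""
    return [depthwise_conv3x3(nn_upsample(ch, factor), k) for ch, k in zip(feature, kernels)]


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # placeholder for a learned kernel
upsampled = learned_upsample([[[1, 2], [3, 4]]], [identity])
print(upsampled[0])
```

Because the convolution is depthwise, each channel has its own 3x3 kernel and no channels are mixed, which keeps the cost far below that of a full convolution.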
Upsampling alone cannot bring back fine details, because they were lost during downsampling in the encoder.
The authors therefore add skip connections that pass encoder features to the decoder stage with the same resolution.
A 1x1 convolution is used to make the channel counts match.
This recovers some of the detail.
Once the feature map is back at 1/4 of the input resolution, a 3x3 convolution is applied, and 2 more upsampling layers restore the resolution of the input image.
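The resolution bookkeeping can be checked with a few lines of arithmetic. The sketch assumes that each of the 3 decoder modules and each of the 2 final upsampling layers doubles the resolution, which is consistent with the 32x and 4x factors given above. The 480x640 input size and the function name are only illustrative.

```python
def resolution_trace(input_hw, encoder_stride=32, decoder_modules=3, final_upsamplings=2):
    """Spatial size after the encoder, each decoder module, and each final upsampling."""
    height, width = input_hw
    size = (height // encoder_stride, width // encoder_stride)
    trace = [size]
    for _ in range(decoder_modules + final_upsamplings):
        size = (size[0] * 2, size[1] * 2)  # assumed 2x upsampling per step
        trace.append(size)
    return trace


trace = resolution_trace((480, 640))  # illustrative input size
print(trace)
```

After the three decoder modules the factor is 32 / 2^3 = 4, and the two remaining 2x steps bring it to 1.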
A loss is usually computed only between the final result and the ground truth. To avoid supervising only the final result, every decoder module also outputs a prediction,
and its loss is computed against the ground truth scaled to the matching resolution. In this way the loss is computed at multiple scales.
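A minimal sketch of such multi-scale supervision is shown below. It assumes nearest-neighbor downscaling of the label map, pixel-wise cross-entropy, and an unweighted sum over scales. These choices and all function names are illustrative assumptions, not details confirmed by these notes.

```python
import math


def downscale_labels(labels, factor):
    """Nearest-neighbor downscaling of a 2D label map by an integer factor."""
    return [row[::factor] for row in labels[::factor]]


def cross_entropy(probabilities, labels):
    """Mean pixel-wise cross-entropy; probabilities[y][x] lists the class probabilities."""
    losses = [
        -math.log(probabilities[y][x][labels[y][x]])
        for y in range(len(labels))
        for x in range(len(labels[0]))
    ]
    return sum(losses) / len(losses)


def multi_scale_loss(predictions, labels):
    """Sum the loss of every output against labels scaled to its resolution.

    predictions maps a downscaling factor (1 = full resolution) to class probabilities.
    """
    return sum(
        cross_entropy(probs, downscale_labels(labels, factor))
        for factor, probs in predictions.items()
    )


labels = [[0, 1], [1, 0]]
uniform = [0.5, 0.5]  # toy prediction: both classes equally likely
predictions = {
    1: [[uniform, uniform], [uniform, uniform]],  # full-resolution output
    2: [[uniform]],                               # side output at half resolution
}
loss = multi_scale_loss(predictions, labels)
print(loss)
```

Each side output contributes its own term, so the intermediate decoder modules receive a direct training signal rather than only gradients from the final output.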
Parameters
Training ran for 500 epochs with a batch size of 8.
With the SGD optimizer (momentum=0.9), the learning rates are {0.00125, 0.0025, 0.005, 0.01, 0.02, 0.04}.
With Adam, the learning rates are {0.0001, 0.0004}.
Weight decay is 0.0001.
The learning rate is adjusted with PyTorch's one-cycle learning rate scheduler.
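For intuition, a one-cycle schedule warms the learning rate up to a peak and then anneals it to a very small value. The sketch below is a simplified pure-Python approximation, not PyTorch's implementation: the default arguments mirror the defaults of `torch.optim.lr_scheduler.OneCycleLR` (pct_start=0.3, div_factor=25, final_div_factor=1e4, cosine annealing), but the exact step boundaries differ slightly, and whether the authors kept these defaults is not stated in these notes.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Learning rate at a given step of a two-phase, cosine-annealed one-cycle schedule."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_steps = pct_start * total_steps

    def cosine(start, end, fraction):
        # fraction 0 returns start, fraction 1 returns end
        return end + (start - end) * (1.0 + math.cos(math.pi * fraction)) / 2.0

    if step <= warmup_steps:
        return cosine(initial_lr, max_lr, step / warmup_steps)
    return cosine(max_lr, min_lr, (step - warmup_steps) / (total_steps - warmup_steps))


# Peak learning rate 0.01 is one of the SGD values listed above; 100 steps is illustrative.
schedule = [one_cycle_lr(step, 100, 0.01) for step in range(101)]
print(schedule[0], max(schedule), schedule[-1])
```

The learning rates listed above would then play the role of `max_lr`, the peak of the cycle.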
Frame rate on the AGX
mIoU
The results show that ESANet's mIoU lies between that of the mobile models (SwiftNet, BiSeNet) and that of the non-mobile models.
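For reference, mIoU averages the per-class intersection over union. The sketch below computes it for a single pair of flattened prediction and label lists. Skipping classes that appear in neither list is an assumption of this sketch, and benchmark implementations typically accumulate a confusion matrix over the whole dataset instead.

```python
def mean_iou(predicted, target, num_classes):
    """Mean intersection over union from two flat lists of class indices."""
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(predicted, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(predicted, target) if p == cls or t == cls)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(intersection / union)
    return sum(ious) / len(ious)


predicted = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(predicted, target, num_classes=2)
print(score)
```

In this toy example class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.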