Deep and Direct Visual SLAM
2022-07-20 14:47:00 【tian.z】
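Abstract: A talk by Daniel Cremers on deep networks and direct SLAM, given in October 2021. The presentation covers three topics: an introduction to direct SLAM with its advantages and applications; using deep networks to estimate depth from a single image for monocular visual SLAM, and estimating the pose between two frames and related quantities end to end; and monocular dense reconstruction. Most of the material comes from papers by the speaker's research group, which are listed at the end of this post.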
Direct Visual SLAM
Keypoint-based front-end methods are bound to be suboptimal for several
reasons:
- They throw away potentially valuable brightness information from the
sensor. You are not working on the raw sensor data, so from a
statistical (say, Bayesian inference) point of view the solution
will never be optimal: you create an intermediate abstraction, and at
that point you throw away potentially valuable information.
- Any mistake you make in assigning correspondences will propagate and
deteriorate your reconstruction.
In the direct method, we do not minimize a geometric reprojection error
of points in the image; instead we minimize a photometric (color
consistency) error to infer the camera motion and the 3D map.
Works:
- LSD-SLAM[1]
- DSO[2]
Loss function:
$$\min_{\xi\in\mathbb R^6} \int_\Omega \lvert I_{KF}(x) - I(\pi(g_\xi(u\cdot x))) \rvert \, dx$$
For each pixel $x$ in the keyframe, the brightness of the keyframe
$I_{KF}$ should be the same as the brightness of the corresponding point in the new image $I$.
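The photometric error above can be sketched numerically. This is only a minimal illustration under stated assumptions, not the LSD-SLAM or DSO implementation: it assumes a pinhole camera with intrinsics `K`, reads $u$ as a per-pixel depth map that scales the viewing ray of pixel $x$, takes the motion $g_\xi$ directly as a rotation `R` and translation `t` instead of optimizing over $\xi$, and uses nearest-neighbour lookup. The function name `photometric_error` and the toy data are hypothetical.

```python
import numpy as np

def photometric_error(I_kf, I_new, depth, K, R, t):
    """Mean absolute brightness difference between keyframe pixels and
    their reprojections in the new image (nearest-neighbour lookup)."""
    h, w = I_kf.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous pixel coordinates, shape (3, h*w).
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    # Back-project: scale each pixel's viewing ray by its depth (u * x).
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Rigid-body motion g_xi, given here as rotation R and translation t.
    pts_new = R @ pts + t[:, None]
    # Projection pi into the new image.
    proj = K @ pts_new
    in_front = proj[2] > 1e-6
    u = np.full(h * w, -1, dtype=int)
    v = np.full(h * w, -1, dtype=int)
    u[in_front] = np.rint(proj[0, in_front] / proj[2, in_front]).astype(int)
    v[in_front] = np.rint(proj[1, in_front] / proj[2, in_front]).astype(int)
    # Keep only points that land inside the new image.
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    if not valid.any():
        return np.inf
    residual = np.abs(I_kf.ravel()[valid] - I_new[v[valid], u[valid]])
    return float(residual.mean())

# Toy scene: constant depth 2, focal length 10, so a sideways translation
# of 0.2 shifts every pixel by exactly one column.
rng = np.random.default_rng(0)
I_kf = rng.uniform(size=(8, 8))
I_new = np.roll(I_kf, 1, axis=1)
K = np.array([[10.0, 0.0, 4.0], [0.0, 10.0, 4.0], [0.0, 0.0, 1.0]])
depth = np.full((8, 8), 2.0)

err_true = photometric_error(I_kf, I_new, depth, K, np.eye(3), np.array([0.2, 0.0, 0.0]))
err_wrong = photometric_error(I_kf, I_new, depth, K, np.eye(3), np.zeros(3))
print(err_true, err_wrong)
```

In this toy setup the error vanishes for the true motion and is positive for the wrong one, which is the signal a direct method minimizes over the pose and the depth map.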
Deep Visual SLAM
Use a deep learning approach to predict the depth of each pixel from a
single image, and find a reconstruction such that the depth maps of each
keyframe are consistent with the deep network's predictions.
This method is semi-supervised or self-supervised in the sense that the
second camera is used only in training: depth is predicted so that it is
consistent with the second camera's intensities, and at application time
only one camera is needed [3].
Deep learning is applied both in the front-end tracking, in terms of
non-linear factor graphs, and in the back-end optimization, where the
classical loss function is extended by additional terms that enforce
consistency with these predictions: deep depth, deep pose, and deep
uncertainty [4].
Suppose two consecutive images $I_t$ and $I_{t'}$ are the input of the
pose network (PoseNet), and its output is the relative transformation
$T_t^{t'}$. Brightness consistency is used as the loss function for
self-supervised learning:
$$L_{self} = r(I_t, I_{t'-t})$$
where $I_{t'-t}$ denotes $I_{t'}$ warped into frame $t$ (written $I_{t'\to t}$ in [4]) and $r$ is the photometric residual.
To deal with variations in aperture and exposure between the image and
the warped image, the network also predicts an affine brightness
transformation that compensates for them:
$$\Rightarrow L_{self} = r({\color{lightgreen}a_t^{t'}} I_t + {\color{lightgreen}b_t^{t'}}, I_{t'-t})$$
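A small numerical sketch can show why the affine term helps. It is an illustration under assumptions, not the method of [4]: there the parameters $a_t^{t'}$ and $b_t^{t'}$ are predicted by the network, whereas here they are fitted by ordinary least squares as a stand-in, the residual $r$ is reduced to a plain mean absolute difference, and the exposure change (gain 1.5, offset 0.1) is made up for the example.

```python
import numpy as np

def l1_residual(I_a, I_b):
    """Simplified photometric residual: mean absolute brightness difference."""
    return float(np.abs(I_a - I_b).mean())

rng = np.random.default_rng(0)
I_t = rng.uniform(0.0, 1.0, size=(16, 16))
# Hypothetical exposure change: the warped image shows the same content
# with a different gain and offset.
I_warp = 1.5 * I_t + 0.1

# Stand-in for the network prediction: fit a and b by least squares.
A = np.stack([I_t.ravel(), np.ones(I_t.size)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, I_warp.ravel(), rcond=None)

plain = l1_residual(I_t, I_warp)              # no brightness compensation
corrected = l1_residual(a * I_t + b, I_warp)  # with affine compensation
print(a, b, plain, corrected)
```

Without compensation the residual is dominated by the exposure change even though the geometry is perfectly aligned; with the affine correction it drops to numerical noise.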
It is difficult to model all the phenomena (moving objects, glass or
metallic structures) in the real world. One solution is to down-weight
areas in the residual where the brightness is not likely to be
preserved. This is called aleatoric uncertainty, and it can also be
predicted by the deep network.
$$\Rightarrow L_{self} = \frac{r({\color{lightgreen}a_t^{t'}} I_t + {\color{lightgreen}b_t^{t'}}, I_{t'-t})}{\color{red}\Sigma_t} + \log{\color{red}\Sigma_t}$$
$\Sigma_t$ tells us how likely it is that the brightness is preserved.
With a Gaussian distribution we can then down-weight the residuals. To
make sure that not everything is down-weighted, the log term is added at
the end. (The final log term results from taking the log-likelihood of
the probability distribution; see [5] for details.)
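The balance between the two terms can be checked with a few numbers. The sketch below is illustrative only: the residual values are made up, the uncertainty is passed in directly rather than predicted by a network, and the name `uncertainty_weighted_loss` is hypothetical. For a fixed residual $r$, setting the derivative of $r/\Sigma + \log\Sigma$ to zero gives $-r/\Sigma^2 + 1/\Sigma = 0$, so the per-pixel minimum lies at $\Sigma = r$.

```python
import numpy as np

def uncertainty_weighted_loss(residual, sigma):
    """Mean of residual / sigma + log(sigma) over all pixels."""
    return float(np.mean(residual / sigma + np.log(sigma)))

# Made-up residuals: three well-explained pixels and one outlier
# (e.g. a moving object or a reflection).
residual = np.array([0.02, 0.03, 0.02, 0.9])

uniform = uncertainty_weighted_loss(residual, np.ones(4))             # no weighting
matched = uncertainty_weighted_loss(residual, residual)               # sigma follows the residual
ignore_all = uncertainty_weighted_loss(residual, np.full(4, 100.0))   # down-weight everything
print(uniform, matched, ignore_all)

# Grid search for the best sigma at a single pixel with residual 0.5.
sigmas = np.linspace(0.05, 5.0, 1000)
best = sigmas[np.argmin(0.5 / sigmas + np.log(sigmas))]
print(best)
```

Down-weighting every pixel with a huge uncertainty gives the worst loss of the three because of the log penalty, while an uncertainty that is large only where the residual is large gives the best.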
Mono Dense reconstruction
MonoRec (Felix Wimbauer et al.): a neural network that generates a dense
reconstruction. The network predicts depth not for a single frame but
for a sequence of consecutive frames, so that brightness consistency
across frames can be exploited for the prediction.
References
[1] ENGEL J, SCHÖPS T, CREMERS D. LSD-SLAM:
Large-scale direct monocular SLAM[J]. Springer, Cham, 2014.
[2] ENGEL J, KOLTUN V, CREMERS D. Direct sparse
odometry[J]. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2017, 40(3): 611-625.
[3] YANG N, WANG R, STÜCKLER J, et al. Deep Virtual
Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct
Sparse Odometry[C/OL]//Proceedings of the European Conference on
Computer Vision (ECCV). 2018: 817-833[2022-07-10].
https://openaccess.thecvf.com/content_ECCV_2018/html/Nan_Yang_Deep_Virtual_Stereo_ECCV_2018_paper.html.
[4] YANG N, VON STUMBERG L, WANG R, et al. D3VO: Deep
Depth, Deep Pose and Deep Uncertainty for Monocular Visual
Odometry[C/OL]//Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition. 2020: 1281-1292[2022-07-10].
https://openaccess.thecvf.com/content_CVPR_2020/html/Yang_D3VO_Deep_Depth_Deep_Pose_and_Deep_Uncertainty_for_Monocular_CVPR_2020_paper.html.
[5] KLODT M, VEDALDI A. Supervising the new with
the old: learning SFM from SFM[C/OL]//Proceedings of the European
Conference on Computer Vision (ECCV). 2018: 698-713[2022-07-10].
https://openaccess.thecvf.com/content_ECCV_2018/html/Maria_Klodt_Supervising_the_new_ECCV_2018_paper.html.