ViT Structure
2022-07-19 05:13:00 【平丘月初】
Vision Transformer
The input image batch has shape $[N, C, H, W]$, where $C$ is usually 3. To build the input that the Transformer expects, each image is cut into small non-overlapping patches of size $p_h \times p_w \times C$. This yields $n = h \cdot w$ patches in total, where $h = H / p_h$ and $w = W / p_w$.
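As a quick worked example of the patch arithmetic, the sketch below computes the grid size, patch count, and flattened patch length. The helper name `patch_grid` and the 224/16 sizes are illustrative assumptions, not values taken from the post.

```python
def patch_grid(H, W, C, p_h, p_w):
    """Return (h, w, n, patch_dim) for an image cut into non-overlapping patches."""
    if H % p_h or W % p_w:
        raise ValueError("image size must be divisible by the patch size")
    h, w = H // p_h, W // p_w          # patches per column / per row
    return h, w, h * w, p_h * p_w * C  # n = h * w, flattened patch length


# Assumed sizes: a 224x224 RGB image with 16x16 patches.
h, w, n, patch_dim = patch_grid(H=224, W=224, C=3, p_h=16, p_w=16)
print(h, w, n, patch_dim)  # 14 14 196 768
```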
# reshape and flatten
[N, C, H, W] => [N, h*w, p_h * p_w * C] => [N, h*w, dim]  # h = H // p_h, w = W // p_w; each flattened patch is fed to nn.Linear and mapped to dim dimensions
# concat cls_tokens and add positional embedding
cls_token = nn.Parameter(torch.randn(1, 1, dim))
cls_tokens = repeat(cls_token, '() n d -> b n d', b=b)  # einops.repeat; b is the batch size N
pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, dim))
[N, n, dim] => [N, n + 1, dim] => [N, n + 1, dim]  # n = h * w; prepend cls_tokens, then add pos_embedding
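The shape transitions above can be sketched as one small PyTorch module. This is a minimal sketch under stated assumptions: square images and patches, plain `reshape`/`permute` instead of einops, and a hypothetical class name `PatchEmbedding` with small default sizes chosen only for illustration.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Hypothetical helper: patches -> linear projection -> cls token -> position."""

    def __init__(self, image_size=32, patch_size=8, channels=3, dim=16):
        super().__init__()
        assert image_size % patch_size == 0, "image must divide evenly into patches"
        self.p = patch_size
        num_patches = (image_size // patch_size) ** 2
        patch_dim = channels * patch_size * patch_size
        self.proj = nn.Linear(patch_dim, dim)
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim))
        self.pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, dim))

    def forward(self, x):
        N, C, H, W = x.shape
        p = self.p
        h, w = H // p, W // p
        # [N, C, H, W] -> [N, C, h, p, w, p] -> [N, h, w, p, p, C] -> [N, h*w, p*p*C]
        x = x.reshape(N, C, h, p, w, p).permute(0, 2, 4, 3, 5, 1)
        x = x.reshape(N, h * w, p * p * C)
        x = self.proj(x)                          # [N, n, dim]
        cls = self.cls_token.expand(N, -1, -1)    # [N, 1, dim], shared across the batch
        x = torch.cat([cls, x], dim=1)            # [N, n + 1, dim]
        return x + self.pos_embedding             # broadcast over the batch dimension


tokens = PatchEmbedding()(torch.randn(2, 3, 32, 32))
print(tokens.shape)  # torch.Size([2, 17, 16])
```

Using `expand` rather than `repeat` avoids copying the class token for every sample; the gradient still accumulates into the single shared parameter.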
After a Transformer built from several stacked encoding layers has extracted features, the output is fed to the MLP head module:
[N, n + 1, dim] => [N, num_classes]
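One way to realize this mapping is to classify from the class token alone. The sketch below assumes that choice (mean pooling over all tokens is a known alternative) and a LayerNorm followed by a single linear layer; the class name `MLPHead` and the sizes are illustrative.

```python
import torch
import torch.nn as nn


class MLPHead(nn.Module):
    """Hypothetical head: pick the class token, normalize, project to class logits."""

    def __init__(self, dim, num_classes):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, tokens):
        cls = tokens[:, 0]                 # [N, n + 1, dim] -> [N, dim]
        return self.fc(self.norm(cls))     # [N, num_classes]


logits = MLPHead(dim=16, num_classes=10)(torch.randn(4, 17, 16))
print(logits.shape)  # torch.Size([4, 10])
```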
The encoding layer of the Transformer has the following structure:
encoding layer = MSA + MLP
MSA: Multi-headed Self-Attention
MLP: Multi-Layer Perceptron
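A minimal sketch of such a layer is shown here. The post only states "MSA + MLP"; the pre-norm LayerNorm placement, the residual connections, and the GELU activation are assumptions borrowed from common ViT implementations, and `EncoderLayer` with its default sizes is a hypothetical name.

```python
import torch
import torch.nn as nn


class EncoderLayer(nn.Module):
    """Hypothetical encoding layer: MSA block followed by an MLP block."""

    def __init__(self, dim=64, heads=4, mlp_dim=128):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim),
            nn.GELU(),
            nn.Linear(mlp_dim, dim),
        )

    def forward(self, x):
        y = self.norm1(x)
        attn_out, _ = self.msa(y, y, y, need_weights=False)  # self-attention: q = k = v
        x = x + attn_out                                     # residual around MSA
        x = x + self.mlp(self.norm2(x))                      # residual around MLP
        return x


x = torch.randn(2, 17, 64)
print(EncoderLayer()(x).shape)  # torch.Size([2, 17, 64])
```

The layer preserves the `[N, n + 1, dim]` shape, which is what allows several of them to be stacked.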
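The attention module works as follows:
Multi-head attention runs several single attention modules (heads), each of which extracts information independently, and then concatenates their outputs.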
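To make the concatenation step concrete, here is a from-scratch sketch of multi-head self-attention. It assumes scaled dot-product attention per head, a fused query/key/value projection, and a final output projection; the class name `MultiHeadSelfAttention` and the sizes are illustrative rather than taken from the post.

```python
import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    """Hypothetical MSA: split into heads, attend per head, concat, project."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        assert dim % heads == 0, "dim must be divisible by the number of heads"
        self.heads = heads
        self.head_dim = dim // heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        N, T, D = x.shape
        qkv = self.to_qkv(x).reshape(N, T, 3, self.heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each [N, heads, T, head_dim]
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = scores.softmax(dim=-1)                   # each row sums to 1 over keys
        out = attn @ v                                  # [N, heads, T, head_dim]
        out = out.transpose(1, 2).reshape(N, T, D)      # concat heads along features
        return self.out(out)


x = torch.randn(2, 17, 64)
print(MultiHeadSelfAttention()(x).shape)  # torch.Size([2, 17, 64])
```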