[Ascend300t product] [Distributed training function] Scripts in model_zoo cannot train on multiple cards, and warnings appear during single-card training
2022-07-20 09:28:00 【Xiaole happy】
1. Environment
Server: Taishan2280
OS: EulerOS 2.8
MindSpore version: 1.1.0
Model: SSD
Installed packages:
A300t-9000-mcu_2.0.8.hpm
A300t-9000-npu-driver_20.2.0.b070_euleros2.8-aarch64.run
A300t-9000-npu-firmware_1.76.22.1.220.run
Ascend-cann-toolkit_20.2.rc1_linux-aarch64.run
The SSD model cannot be trained on two cards at the same time:
Problem 1: Running run_distribute_train.sh has no effect.
Problem 2: An IP conflict is reported when assigning IP addresses to the NPU cards.
Problem 3: The rank*.json file cannot be generated because /etc/hccn.conf is empty.
Single-card training of the SSD model produces many warnings, but training completes successfully.
Answer:
The warnings during single-card training are normal. According to the log messages, the MindSpore and Ascend versions do not match, but MindSpore is forward compatible, so training still runs normally.
For multi-card problem 2: which exact command did you use to assign the IP addresses? /etc/hccn.conf may be empty because the IP assignment failed.
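Before re-running the multi-card script, it may help to confirm that /etc/hccn.conf actually contains the device addresses and that no address is assigned twice. The following is only a minimal sketch: it assumes the file holds plain `key=value` lines with per-device keys of the form `address_0`, `address_1`, and so on, which the post does not confirm. The helper names are hypothetical and not part of any Ascend or MindSpore tool.

```python
from collections import Counter
from pathlib import Path


def parse_hccn_conf(text):
    """Return {key: value} for every 'key=value' line; skip blanks and comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def device_ips(entries):
    """Keep only the per-device addresses (ASSUMED key format: address_<id>)."""
    return {k: v for k, v in entries.items() if k.startswith("address_")}


def duplicate_ips(ips):
    """Return the sorted list of IP addresses assigned to more than one device."""
    counts = Counter(ips.values())
    return sorted(ip for ip, n in counts.items() if n > 1)


def check_hccn_conf(path="/etc/hccn.conf"):
    """Summarise the state of the config file as a short human-readable message."""
    try:
        text = Path(path).read_text()
    except OSError:
        return f"{path} could not be read"
    ips = device_ips(parse_hccn_conf(text))
    if not ips:
        return f"{path} has no device addresses; the IP assignment may have failed"
    dupes = duplicate_ips(ips)
    if dupes:
        return f"duplicate device IPs in {path}: {', '.join(dupes)}"
    return f"{len(ips)} device address(es) found, no duplicates"


if __name__ == "__main__":
    print(check_hccn_conf())
```

If the check reports an empty file or duplicate addresses, that would be consistent with problems 2 and 3 above and suggests redoing the IP assignment before generating rank*.json.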