Data transfer principle between TX2 video memory and memory
2022-07-22 11:15:00 【Happy breeder】
Preface: As is well known, when a GPU is used to accelerate model inference on NVIDIA edge computing devices, data inevitably has to move between video memory (GPU device memory) and host memory. If those transfers and copies can be made fast, the real-time performance of inference benefits greatly. This article takes the TX2, an edge computing product, as an example, explains how data is transferred between video memory and host memory, and runs some related tests.
Catalog
1. Video memory and memory data transfer SDK
2. Principle of data transmission between video memory and memory
3. Common errors in video memory and memory operations
1. Video memory and memory data transfer SDK
When the TX2 uses its GPU to accelerate model inference, the data must first be transferred to video memory, where the inference is computed in parallel. The SDK that NVIDIA provides for its edge computing products offers a function for transferring data between video memory and host memory: cudaMemcpy. It is defined as follows:
/*
dst: destination of the transfer
src: source data to transfer
count: size of the data to transfer, in bytes
kind: direction of the transfer, an enumeration type whose values include:
cudaMemcpyHostToHost: copy from one host memory region to another
cudaMemcpyHostToDevice: copy from host memory to video memory
cudaMemcpyDeviceToHost: copy from video memory to host memory
cudaMemcpyDeviceToDevice: copy from video memory to video memory
*/
cudaError_t cudaMemcpy(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind);
I tested TRT (TensorRT) model inference on an NVIDIA TX2 and printed the time spent transferring data between video memory and host memory. The transfer from host memory to video memory took about 0.3~0.4 ms for 1.98 MB of data, while the transfer from video memory back to host memory took about 6-10 ms for only 59.4 KB. That is a difference of roughly 20 times in elapsed time, even though the amount of data copied into video memory is far larger than the amount copied back out. What is the reason for this?
2. Principle of data transmission between video memory and memory
Before CUDA 6, data was exchanged between video memory and host memory with the cudaMemcpy function described above. The principle of the exchange is shown in the figure below:
The code execution process is as follows:
cudaMalloc(&buffers, buffersize); // allocate video memory (buffers is a void* device pointer); buffersize is in bytes
cudaMemcpy(buffers, input, inputdatasize, cudaMemcpyHostToDevice); // copy the input from host memory to video memory
inputdatakernelprocess; // parallel computation on the GPU (placeholder for the kernel launch)
cudaMemcpy(output, buffers, outputdatasize, cudaMemcpyDeviceToHost); // copy the result from video memory back to host memory
cudaFree(buffers); // release video memory
Since CUDA 6, data can be exchanged between video memory and host memory through unified memory management. On the TX2, video memory and host memory are physically shared. A region allocated with the cudaMallocManaged function is accessible to both the GPU and the CPU, but only one of them may access it at a time: either the GPU or the CPU. The GPU has priority over the CPU for this region, so the CPU cannot access it while the GPU is using it.
The code execution process is as follows:
cudaMallocManaged(&buffers, buffersize); // buffersize is in bytes; the region is accessible from both the CPU and the GPU
memcpy(buffers, input, inputdatasize); // ordinary host memcpy: write the input into the shared region
inputdatakernelprocess; // parallel computation on the GPU (placeholder for the kernel launch)
cudaDeviceSynchronize(); // this statement is indispensable
memcpy(output, buffers, outputdatasize); // ordinary host memcpy: read the result out of the shared region
cudaFree(buffers); // release the shared region
cudaDeviceSynchronize() waits for the CUDA parallel computation to finish. Without this statement, the CPU and the GPU may access the shared region at the same time, which the system does not allow. The statement is therefore essential: once it returns, the GPU has finished with the shared region and the CPU can safely access it.
3. Common errors in video memory and memory operations
The most common error when working with video memory and host memory is a segmentation fault (core dumped) at run time. Beginners make this mistake easily, especially if they are not familiar with pointers in C++. The error means that you have accessed a region of host memory or video memory that was never allocated. It can be troublesome for beginners to track down; readers who run into it are welcome to message the author privately.
Sometimes the program compiles and runs without any problem, but the result is wrong. This is harder to track down, and it is usually one of the following two cases:
(1) Your input data was not read into the corresponding video memory (or host memory) region;
(2) The pointer position or the size used when reading the data back out is wrong.
After all this talk, it comes down to one thing: pointers. A solid foundation in C really is important.
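Both silent failure cases above often trace back to passing an element count where a byte count is expected, or to copying more than the destination can hold. The following is a minimal host-side sketch of one way to guard against that. The helpers bytes_for and checked_copy are hypothetical names introduced only for illustration; they are not part of CUDA or of the code in this article.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: number of bytes occupied by `count` elements of type T.
// Passing `count` itself as the size is a common way to copy too little data.
template <typename T>
constexpr std::size_t bytes_for(std::size_t count) {
    return count * sizeof(T);
}

// Hypothetical helper: copy `bytes` bytes from src to dst, but only if both
// pointers are non-null and the destination can hold that many bytes.
// Returns false instead of reading or writing out of bounds.
inline bool checked_copy(void* dst, std::size_t dst_capacity_bytes,
                         const void* src, std::size_t bytes) {
    if (dst == nullptr || src == nullptr) {
        return false;  // would otherwise end in a segmentation fault
    }
    if (bytes > dst_capacity_bytes) {
        return false;  // would otherwise overrun the destination buffer
    }
    std::memcpy(dst, src, bytes);
    return true;
}
```

For a buffer of 4 float values, bytes_for<float>(4) is 4 * sizeof(float), whereas passing 4 directly would copy only 4 bytes and leave the rest of the destination untouched, which matches the "runs fine but the result is wrong" symptom. The same byte-count discipline applies to the count argument of cudaMemcpy.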