Data transfer principle between TX2 video memory and memory

2022-07-22 11:15:00 【Happy breeder】

Preface: As is well known, when a GPU is used to accelerate model inference on NVIDIA edge computing devices, data inevitably has to be transferred between video memory and memory. Making these transfers and copies fast greatly benefits the real-time performance of model inference. This article takes the TX2, an edge computing product, as an example to explain how data is transferred between video memory and memory and to run some related tests.

Contents

1. Video memory and memory data transfer SDK

2. Principle of data transmission between video memory and memory

3. Common errors in video memory and memory operations

1. Video memory and memory data transfer SDK

When the TX2 uses the GPU to accelerate model inference, the data must first be transferred to video memory, where the inference is computed in parallel. NVIDIA's SDK for its edge computing products provides a function for transferring data between video memory and memory: cudaMemcpy. It is defined as follows:

/*
dst:   destination of the transfer
src:   source data to be transferred
count: size of the data to transfer, in bytes
kind:  direction of the transfer, an enumeration type whose values include:
cudaMemcpyHostToHost:     copy from one memory region to another
cudaMemcpyHostToDevice:   copy from memory to video memory
cudaMemcpyDeviceToHost:   copy from video memory to memory
cudaMemcpyDeviceToDevice: copy from video memory to video memory
*/
cudaError_t cudaMemcpy(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind);

I tested TRT (TensorRT) model inference on an NVIDIA TX2 and printed the time spent transferring data between video memory and memory. The transfer from memory to video memory took about 0.3~0.4 ms for 1.98 MB of data, while the transfer from video memory to memory took about 6~10 ms for only 59.4 KB. That is roughly a 20x difference in time, even though far more data goes from memory to video memory than comes back. What is the reason for this?
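A possible explanation, offered here as an assumption rather than something measured in the article: kernel launches (and asynchronous inference calls) return before the GPU has finished, and a blocking cudaMemcpy issued afterwards has to wait for that work, so a timer placed around the device-to-host copy can end up including compute time. The sketch below is one way to check this. The kernel busy_kernel, the helper time_d2h_ms and the buffer size are all hypothetical stand-ins, not the author's test code.

```cuda
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the inference work; not from the article.
__global__ void busy_kernel(float *data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.0001f;
        data[i] = v;
    }
}

// Times a device-to-host copy in milliseconds with a host clock.
// If sync_first is true, the GPU is drained before the timer starts,
// so only the copy itself is measured. Returns -1 on a CUDA error.
double time_d2h_ms(bool sync_first) {
    const int n = 1 << 14;                   // illustrative size: 64 KB of floats
    const size_t bytes = n * sizeof(float);
    std::vector<float> host(n, 1.0f);
    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return -1.0;
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    busy_kernel<<<(n + 255) / 256, 256>>>(dev, n, 100000);  // asynchronous launch
    if (sync_first) cudaDeviceSynchronize();

    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    auto t1 = std::chrono::steady_clock::now();
    cudaFree(dev);
    if (err != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    time_d2h_ms(true);  // warm-up: the first CUDA calls pay one-time setup costs
    printf("copy right after launch: %.3f ms\n", time_d2h_ms(false));
    printf("copy after synchronize : %.3f ms\n", time_d2h_ms(true));
    return 0;
}
```

If the first number is much larger than the second, the extra time belongs to the computation rather than to the transfer itself.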

2. Principle of data transmission between video memory and memory

Before CUDA 6, data was exchanged between video memory and memory with the cudaMemcpy function described above. The principle of the data exchange is shown in the figure below:

The code executes as follows:

void *buffers = nullptr;
cudaMalloc(&buffers, buffersize);  // buffersize is in bytes
cudaMemcpy(buffers, input, inputdatasize, cudaMemcpyHostToDevice);  // memory -> video memory
inputdatakernelprocess<<<grid, block>>>(buffers);  // parallel computing; kernel name and launch configuration are placeholders
cudaMemcpy(output, buffers, outputdatasize, cudaMemcpyDeviceToHost);  // video memory -> memory
cudaFree(buffers);  // release video memory
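The flow above can be turned into a small complete program. This is a minimal sketch, not the author's code: the kernel scale_by_two and the helper run_explicit_copy are hypothetical names chosen for illustration, and the kernel simply doubles each element in place of real inference.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the "parallel computing" step:
// doubles every element in place.
__global__ void scale_by_two(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Copies input to video memory, runs the kernel, copies the result back.
// Returns true when every CUDA call succeeded. Expects a non-empty input.
bool run_explicit_copy(const std::vector<float> &input, std::vector<float> &output) {
    const int n = static_cast<int>(input.size());
    const size_t bytes = n * sizeof(float);  // sizes are always in bytes
    output.resize(n);

    float *buffers = nullptr;
    if (cudaMalloc(&buffers, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(buffers, input.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scale_by_two<<<(n + 255) / 256, 256>>>(buffers, n);
        ok = cudaGetLastError() == cudaSuccess;  // catches launch errors
    }
    // The blocking copy below waits for the kernel to finish first.
    if (ok) ok = cudaMemcpy(output.data(), buffers, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(buffers);  // release video memory
    return ok;
}

int main() {
    std::vector<float> input(1024, 1.5f), output;
    if (!run_explicit_copy(input, output)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("output[0] = %.1f\n", output[0]);  // each element is 1.5 * 2
    return 0;
}
```

It should build with nvcc on a machine with the CUDA toolkit installed, for example: nvcc explicit_copy.cu -o explicit_copy (the file name is arbitrary).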

From CUDA 6 onward, data can be exchanged between video memory and memory through unified memory management. On the TX2, video memory and memory are shared: a region allocated with the cudaMallocManaged function is accessible to both the GPU and the CPU, but only one of them may access it at a time. The GPU has higher priority for this region than the CPU, and the CPU cannot access it while the GPU is using it.

The code executes as follows:

void *buffers = nullptr;
cudaMallocManaged(&buffers, buffersize);  // buffersize is in bytes; the region is accessible from both the CPU and the GPU
memcpy(buffers, input, inputdatasize);  // read the data into the shared region
inputdatakernelprocess<<<grid, block>>>(buffers);  // parallel computing; kernel name and launch configuration are placeholders
cudaDeviceSynchronize();  // this statement is indispensable
memcpy(output, buffers, outputdatasize);  // take the data out of the shared region
cudaFree(buffers);  // release the shared region

cudaDeviceSynchronize() waits for the CUDA parallel computation to finish. Without this statement, the CPU and the GPU may access the shared memory region at the same time, which the system does not allow, so the statement is essential. Once it returns, the GPU has finished with the shared memory and the CPU can access it.
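The unified memory flow can be sketched as a complete program in the same way. As before, this is an illustration under assumptions: scale_by_two and run_managed are hypothetical names, and the kernel only doubles each element. An ordinary memcpy is enough to fill and read the managed buffer, because the CPU can address it directly.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the "parallel computing" step.
__global__ void scale_by_two(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Same computation as an explicit-copy version, but on a managed buffer.
// Returns true when every CUDA call succeeded. Expects a non-empty input.
bool run_managed(const std::vector<float> &input, std::vector<float> &output) {
    const int n = static_cast<int>(input.size());
    const size_t bytes = n * sizeof(float);
    output.resize(n);

    float *buffers = nullptr;
    if (cudaMallocManaged(&buffers, bytes) != cudaSuccess) return false;

    // The CPU fills the shared region before any kernel is launched.
    std::memcpy(buffers, input.data(), bytes);

    scale_by_two<<<(n + 255) / 256, 256>>>(buffers, n);

    // Wait for the GPU before the CPU touches the managed buffer again.
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    if (ok) std::memcpy(output.data(), buffers, bytes);

    cudaFree(buffers);  // managed memory is also released with cudaFree
    return ok;
}

int main() {
    std::vector<float> input(1024, 1.5f), output;
    if (!run_managed(input, output)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("output[0] = %.1f\n", output[0]);  // each element is 1.5 * 2
    return 0;
}
```

Removing the cudaDeviceSynchronize() call is a quick way to see why the statement matters: the second memcpy would then race with the kernel.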

3. Common errors in video memory and memory operations

The most common error when working with video memory and memory is a segmentation fault (core dumped) at run time. Beginners make this mistake easily, especially when they are unfamiliar with pointers in C++. The error means that you have accessed an undefined region of memory or video memory. It can be troublesome for beginners to track down; readers who run into it are welcome to message the author privately.

Sometimes the code compiles and runs without problems, but the result is wrong. This is harder to track down, and it is usually one of the following two cases:

(1) The input data was not read into the corresponding memory (or video memory) region;

(2) The pointer position or the size used when taking the data out is wrong;
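Both cases are easier to locate when every copy is checked. The sketch below shows one defensive pattern, not a method from the article: CUDA_CHECK and checked_d2h are hypothetical helpers, and the buffer capacities are assumed to be tracked by the caller. The wrapper rejects null pointers and sizes larger than either buffer, and reports the CUDA error string when the copy itself fails.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: report the failing call with file and line,
// then return false from the enclosing function.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                     \
            return false;                                                 \
        }                                                                 \
    } while (0)

// Copies count bytes from video memory to memory, refusing sizes that
// exceed either buffer. The capacities are supplied by the caller.
bool checked_d2h(void *host_dst, size_t host_capacity,
                 const void *dev_src, size_t dev_capacity, size_t count) {
    if (host_dst == nullptr || dev_src == nullptr) {
        fprintf(stderr, "null pointer passed to checked_d2h\n");
        return false;
    }
    if (count > host_capacity || count > dev_capacity) {
        fprintf(stderr, "copy of %zu bytes exceeds a buffer\n", count);
        return false;
    }
    CUDA_CHECK(cudaMemcpy(host_dst, dev_src, count, cudaMemcpyDeviceToHost));
    return true;
}

int main() {
    std::vector<float> host(256);
    const size_t bytes = host.size() * sizeof(float);
    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return 1;
    cudaMemset(dev, 0, bytes);

    bool good = checked_d2h(host.data(), bytes, dev, bytes, bytes);
    bool bad = checked_d2h(host.data(), bytes, dev, bytes, bytes * 2);  // rejected: too large
    printf("in-range copy ok: %d, oversized copy ok: %d\n", good, bad);

    cudaFree(dev);
    return good ? 0 : 1;
}
```

The same idea applies to the host-to-device direction; only the cudaMemcpyKind argument and the roles of the two pointers change.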

In the end, all of this comes down to one thing: pointers. A solid foundation in C really is important.

Original site

Copyright notice
This article was written by [Happy breeder]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/203/202207212005464346.html