</code>
  
The correct approach is to use the atomicAdd function to serialize access.
  
<code C>
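// A minimal sketch, assuming the usual pattern of accumulating values from
// many threads into a single counter: atomicAdd makes each read-modify-write
// indivisible, so concurrent updates are serialized instead of racing.
__global__ void sum_kernel(int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(out, in[i]);
    }
}
</code>
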
  - Host -> device memory copies of less than 64 KB
  - Host -> device memory copies performed by functions with the Async suffix
  - Memory set functions (setting / initializing memory to a value)
  
To debug asynchronous execution scenarios, asynchronous execution can be disabled completely by setting the CUDA_LAUNCH_BLOCKING environment variable to 1. Kernel execution is synchronous when running under a profiler (Nsight, Visual Profiler).
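
A hedged sketch of forcing this from inside the program, assuming setenv runs before the first CUDA runtime call creates the context (the more common form is setting the variable in the shell, e.g. CUDA_LAUNCH_BLOCKING=1 ./app):

<code C>
#include <stdlib.h>

int main(void) {
    /* must run before any CUDA call initializes the context */
    setenv("CUDA_LAUNCH_BLOCKING", "1", 1);
    /* ... every kernel launch after this point blocks until it completes ... */
    return 0;
}
</code>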

==== Non-default streams ====

To use cudaMemcpyAsync, we need to work with non-default streams, which in C/C++ can be declared, created, and destroyed in the host (CPU) code as follows:

<code C>
cudaStream_t stream1;
cudaError_t result;
result = cudaStreamCreate(&stream1);
result = cudaStreamDestroy(stream1);
</code>
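
Each call above returns a cudaError_t, so it is worth checking the status; a minimal sketch reusing the result variable from above (fprintf assumes stdio.h is included):

<code C>
if (result != cudaSuccess) {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(result));
}
</code>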

Once such a stream has been created, it can be used when copying memory host -> device as follows:
  
<code C>
result = cudaMemcpyAsync(d_a, a, N, cudaMemcpyHostToDevice, stream1);
</code>
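
Note that such a copy can only overlap other work when the host buffer is pinned (page-locked). A minimal sketch, reusing the d_a, a, N, and stream1 names from above, of allocating a pinned buffer with cudaMallocHost:

<code C>
float *a;
result = cudaMallocHost((void**)&a, N); /* pinned (page-locked) host buffer of N bytes */
result = cudaMemcpyAsync(d_a, a, N, cudaMemcpyHostToDevice, stream1);
/* ... */
result = cudaFreeHost(a);               /* pinned memory has its own free */
</code>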

To issue a kernel to a non-default stream, we specify the stream identifier as the fourth execution configuration parameter. Note that there is also a third execution configuration parameter, which is used to allocate shared memory on the device (GPU); 0 is passed if no shared memory is needed.

<code C>
increment<<<1,N,0,stream1>>>(d_a);
</code>
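
Because both the copy and the launch return control to the host immediately, the host has to wait for the stream to drain before using the results; a minimal sketch using cudaStreamSynchronize:

<code C>
result = cudaMemcpyAsync(a, d_a, N, cudaMemcpyDeviceToHost, stream1);
result = cudaStreamSynchronize(stream1); /* block until stream1 has finished */
/* a can now be read safely on the host */
</code>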
  
==== Asynchronous execution and data transfer ====
  
Some devices can perform an asynchronous memory transfer alongside a kernel execution. This depends on the compute capability and can be checked via the asyncEngineCount device property.

{{:asc:lab9:cuda_async.png?900|}}

Intra-device memory transfers can also be performed simultaneously with kernel execution when both the concurrentKernels and asyncEngineCount device properties are 1.

{{:asc:lab9:cuda_async_2.png?900|}}
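
A minimal sketch of querying these two device properties at runtime:

<code C>
#include <stdio.h>

int main(void) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0); /* properties of device 0 */
    printf("asyncEngineCount  = %d\n", prop.asyncEngineCount);
    printf("concurrentKernels = %d\n", prop.concurrentKernels);
    return 0;
}
</code>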
  
===== Dynamic Parallelism =====
  
<hidden>
{{:asc:lab9:sol:lab9_sol.tar.gz|Lab 6 Solution}}
</hidden>
  
-/* {{:​asc:​lab9:​sol:​lab9_sol.zip|Solutie Laborator ​9}} */+/* {{:​asc:​lab9:​sol:​lab9_sol.zip|Solutie Laborator ​6}} */
  
-/* {{:​asc:​lab6:​asc_lab6.pdf|Enunt Laborator 6}} */+/* {{:​asc:​lab6:​asc_lab9.pdf|Enunt Laborator 6}} */
  
  * Lab coordinators: Matei Barbu, Alexandru Bala
  
==== References ====
  * Bibliography
    * [[https://booksite.elsevier.com/9780124077263/downloads/advance_contents_and_appendices/appendix_C.pdf|Graphics and Computing GPUs]]
  * CUDA documentation:
    * [[https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html|CUDA C Programming]]
    * [[https://docs.nvidia.com/cuda/pdf/CUDA_Compiler_Driver_NVCC.pdf|CUDA NVCC compiler]]
    * [[https://docs.nvidia.com/cuda/profiler-users-guide/index.html|CUDA Visual Profiler]]
    * [[https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html|CUDA Dev Toolkit]]
    * [[https://developer.nvidia.com/cuda-gpus|CUDA GPUs]]
  * xl accelerators (NVidia P100)
    * [[https://www.nvidia.com/en-us/data-center/tesla-p100/|NVIDIA Pascal P100]]
  * Advanced CUDA
    * [[http://www-personal.umich.edu/~smeyer/cuda/grid.pdf|CUDA Thread Basics]]
    * [[https://devblogs.nvidia.com/even-easier-introduction-cuda/|An Even Easier Introduction to CUDA]]
    * [[https://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf|CUDA Streams 1]]
    * [[https://devblogs.nvidia.com/gpu-pro-tip-cuda-7-streams-simplify-concurrency/|CUDA Streams 2]]
    * [[https://devblogs.nvidia.com/introduction-cuda-dynamic-parallelism/|CUDA Dynamic Parallelism]]
    * [[https://developer.nvidia.com/blog/how-overlap-data-transfers-cuda-cc/|How to Overlap Data Transfers in CUDA C/C++]]
  