This shows you the differences between two versions of the page.
asc:laboratoare:05 [2024/04/02 11:14] alexandru.bala [Ierarhia de memorie] |
asc:laboratoare:05 [2025/04/02 10:20] (current) alexandru.bala [Ierarhia de memorie] |
||
---|---|---|---|
Line 42: | Line 42: | ||
<code sh> | <code sh> | ||
/* marcam pentru compilator regValPi in register file */ | /* marcam pentru compilator regValPi in register file */ | ||
- | __private float regValPi = 3.14f; | + | __private__ float regValPi = 3.14f; |
/* compilatorul cel mai probabil oricum incadreaza regVal2Pi ca registru */ | /* compilatorul cel mai probabil oricum incadreaza regVal2Pi ca registru */ | ||
float regVal2Pi = 2 * 3.14f; | float regVal2Pi = 2 * 3.14f; | ||
Line 55: | Line 55: | ||
<code sh> | <code sh> | ||
/* fiecare work item salveaza un element */ | /* fiecare work item salveaza un element */ | ||
- | __local float lArray[lid] = data[gid]; | + | __local__ float lArray[lid] = data[gid]; |
</code> | </code> | ||
*In functie de implementarea hardware, 100GB/sec -> 2TB/sec | *In functie de implementarea hardware, 100GB/sec -> 2TB/sec | ||
Line 71: | Line 71: | ||
**Constant Memory** | **Constant Memory** | ||
<code sh> | <code sh> | ||
- | __const float pi = 3.14f | + | __const__ float pi = 3.14f |
</code> | </code> | ||
* In functie de implementarea hardware, 100GB/sec -> 1TB/sec | * In functie de implementarea hardware, 100GB/sec -> 1TB/sec | ||
Line 79: | Line 79: | ||
**Global Memory** | **Global Memory** | ||
<code sh> | <code sh> | ||
- | __kernel void process(__global float* data){ ... } | + | __kernel__ void process(__global__ float* data){ ... } |
</code> | </code> | ||
* In functie de implementarea hardware, 30GB/sec -> 500GB/sec | * In functie de implementarea hardware, 30GB/sec -> 500GB/sec | ||
* Video RAM (VRAM), de regula cu o capacitate intre 1GB si 12GB in functie de placa video | * Video RAM (VRAM), de regula cu o capacitate intre 1GB si 12GB in functie de placa video | ||
* Memorie dedicata specializata doar pentru placile grafice discrete (GPU-urile integrate in CPU folosesc RAM) | * Memorie dedicata specializata doar pentru placile grafice discrete (GPU-urile integrate in CPU folosesc RAM) | ||
- | * In general latime mare de banda (256-512 biti) si chipuri de memorii de mare viteza (GDDR5) | + | * In general latime mare de banda (256-512 biti) si chipuri de memorii de mare viteza (GDDR7) |
**Host Memory (RAM)** | **Host Memory (RAM)** | ||
Line 287: | Line 287: | ||
</hidden> | </hidden> | ||
- | * Responsabili laborator: Matei Barbu | + | * Responsabili laborator: Matei Barbu, Alexandru Bala |
==== Referinte ==== | ==== Referinte ==== | ||
Line 293: | Line 293: | ||
* [[https://booksite.elsevier.com/9780124077263/downloads/advance_contents_and_appendices/appendix_C.pdf|Graphics and Computing GPUs]] | * [[https://booksite.elsevier.com/9780124077263/downloads/advance_contents_and_appendices/appendix_C.pdf|Graphics and Computing GPUs]] | ||
* Documentatie CUDA: | * Documentatie CUDA: | ||
- | * [[https://docs.nvidia.com/pdf/CUDA_C_Programming_Guide.pdf|CUDA C Programming]] | + | * [[https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html|CUDA C Programming]] |
* [[https://docs.nvidia.com/cuda/pdf/CUDA_Compiler_Driver_NVCC.pdf| CUDA NVCC compiler]] | * [[https://docs.nvidia.com/cuda/pdf/CUDA_Compiler_Driver_NVCC.pdf| CUDA NVCC compiler]] | ||
* [[https://docs.nvidia.com/cuda/profiler-users-guide/index.html| CUDA Visual Profiler]] | * [[https://docs.nvidia.com/cuda/profiler-users-guide/index.html| CUDA Visual Profiler]] | ||
- | * [[https://developer.download.nvidia.com/compute/cuda/9.1/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf|CUDA 9.1 Toolkit]] | + | * [[https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html|CUDA Dev Toolkit]] |
* [[https://developer.nvidia.com/cuda-gpus|CUDA GPUs]] | * [[https://developer.nvidia.com/cuda-gpus|CUDA GPUs]] | ||
- | * Acceleratoare hpsl (hpsl-wn01, hpsl-wn02, hpsl-wn03) | + | * Acceleratoare xl (NVidia P100) |
- | * [[http://international.download.nvidia.com/tesla/pdf/tesla-k40-passive-board-spec.pdf|NVIDIA Tesla K40M]] | + | * [[https://www.nvidia.com/en-us/data-center/tesla-p100/|NVIDIA Pascal P100]] |
- | * [[https://en.wikipedia.org/wiki/Nvidia_Tesla|NVIDIA Tesla]] | + | |
- | * Acceleratoare dp (dp-wn01, dp-wn02, dp-wn03) | + | |
- | * [[https://www.nvidia.com/docs/IO/43395/NV_DS_Tesla_C2050_C2070_jul10_lores.pdf|NVIDIA Tesla C2070]] | + | |
- | * [[http://www.nvidia.com/docs/io/43395/nv_ds_tesla_c2050_c2070_apr10_final_lores.pdf|NVIDIA Tesla 2050/2070]] | + | |
- | * [[https://cseweb.ucsd.edu/classes/fa12/cse141/pdf/09/GPU_Gahagan_FA12.pdf|NVIDIA CUDA Fermi/Tesla]] | + | |
* Advanced CUDA | * Advanced CUDA | ||
- | * [[https://devblogs.nvidia.com/gpu-pro-tip-cuda-7-streams-simplify-concurrency/|CUDA Streams]] | + | * [[https://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf|CUDA Streams 1]] |
+ | * [[https://devblogs.nvidia.com/gpu-pro-tip-cuda-7-streams-simplify-concurrency/|CUDA Streams 2]] | ||
* [[https://devblogs.nvidia.com/introduction-cuda-dynamic-parallelism/|CUDA Dynamic Parallelism]] | * [[https://devblogs.nvidia.com/introduction-cuda-dynamic-parallelism/|CUDA Dynamic Parallelism]] | ||
+ | * [[http://www-personal.umich.edu/~smeyer/cuda/grid.pdf | CUDA Thread Basics]] | ||
+ | * [[https://devblogs.nvidia.com/even-easier-introduction-cuda/ | An Even Easier Introduction to CUDA]] | ||