This shows you the differences between two versions of the page.
asc:laboratoare:05 [2024/04/08 13:48] emil.slusanschi [Referinte] |
asc:laboratoare:05 [2025/04/02 10:20] (current) alexandru.bala [Ierarhia de memorie] |
||
---|---|---|---|
Line 42: | Line 42: | ||
<code sh> | <code sh> | ||
/* hint to the compiler that regValPi belongs in the register file */ | /* hint to the compiler that regValPi belongs in the register file */ | ||
- | __private float regValPi = 3.14f; | + | __private__ float regValPi = 3.14f; |
/* the compiler will most likely place regVal2Pi in a register anyway */ | /* the compiler will most likely place regVal2Pi in a register anyway */ | ||
float regVal2Pi = 2 * 3.14f; | float regVal2Pi = 2 * 3.14f; | ||
Line 55: | Line 55: | ||
<code sh> | <code sh> | ||
/* each work item stores one element */ | /* each work item stores one element */ | ||
- | __local float lArray[lid] = data[gid]; | + | __local__ float lArray[lid] = data[gid]; |
</code> | </code> | ||
* Depending on the hardware implementation, 100 GB/s to 2 TB/s | * Depending on the hardware implementation, 100 GB/s to 2 TB/s | ||
Line 71: | Line 71: | ||
**Constant Memory** | **Constant Memory** | ||
<code sh> | <code sh> | ||
- | __const float pi = 3.14f | + | __const__ float pi = 3.14f |
</code> | </code> | ||
* Depending on the hardware implementation, 100 GB/s to 1 TB/s | * Depending on the hardware implementation, 100 GB/s to 1 TB/s | ||
Line 79: | Line 79: | ||
**Global Memory** | **Global Memory** | ||
<code sh> | <code sh> | ||
- | __kernel void process(__global float* data){ ... } | + | __kernel__ void process(__global__ float* data){ ... } |
</code> | </code> | ||
* Depending on the hardware implementation, 30 GB/s to 500 GB/s | * Depending on the hardware implementation, 30 GB/s to 500 GB/s | ||
* Video RAM (VRAM), typically between 1 GB and 12 GB depending on the graphics card | * Video RAM (VRAM), typically between 1 GB and 12 GB depending on the graphics card | ||
* Dedicated, specialized memory found only on discrete graphics cards (GPUs integrated into the CPU use system RAM) | * Dedicated, specialized memory found only on discrete graphics cards (GPUs integrated into the CPU use system RAM) | ||
- | * Typically a wide memory bus (256-512 bits) and high-speed memory chips (GDDR5) | + | * Typically a wide memory bus (256-512 bits) and high-speed memory chips (GDDR7) |
**Host Memory (RAM)** | **Host Memory (RAM)** | ||
Line 300: | Line 300: | ||
* xl accelerators (NVIDIA P100) | * xl accelerators (NVIDIA P100) | ||
* [[https://www.nvidia.com/en-us/data-center/tesla-p100/|NVIDIA Pascal P100]] | * [[https://www.nvidia.com/en-us/data-center/tesla-p100/|NVIDIA Pascal P100]] | ||
- | * Advanced CUDA | + | * Advanced CUDA |
- | [[https://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf|CUDA Streams 1]] | + | * [[https://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf|CUDA Streams 1]] |
* [[https://devblogs.nvidia.com/gpu-pro-tip-cuda-7-streams-simplify-concurrency/|CUDA Streams 2]] | * [[https://devblogs.nvidia.com/gpu-pro-tip-cuda-7-streams-simplify-concurrency/|CUDA Streams 2]] | ||
* [[https://devblogs.nvidia.com/introduction-cuda-dynamic-parallelism/|CUDA Dynamic Parallelism]] | * [[https://devblogs.nvidia.com/introduction-cuda-dynamic-parallelism/|CUDA Dynamic Parallelism]] |
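
The memory-space snippets in the hunks above are fragments, so a complete program may help show how the same hierarchy (registers, on-chip per-group memory, constant memory, global memory) fits together. The following is only a minimal sketch, and it assumes the page is being moved toward CUDA. In CUDA C++ the on-chip per-block space is declared with `__shared__` and constant memory with `__constant__`. `__global__` marks a kernel function rather than a pointer, and automatic variables are private to a thread without any qualifier. The names `kPi`, `tile` and `scaleByTwoPi` are hypothetical and chosen for illustration, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256

// Constant memory: read-only inside kernels, written from the host.
__constant__ float kPi;

// Kernel: 'data' points to global memory (VRAM on a discrete GPU).
__global__ void scaleByTwoPi(float *data, int n)
{
    // Shared memory: on-chip, visible to all threads of one block.
    __shared__ float tile[BLOCK_SIZE];

    int lid = threadIdx.x;                            // index within the block
    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // global index

    // Automatic variable: private to the thread, normally kept in a register.
    float regVal2Pi = 2.0f * kPi;

    // Each thread stores one element into shared memory.
    if (gid < n) {
        tile[lid] = data[gid];
    }
    __syncthreads();  // reached by every thread of the block

    if (gid < n) {
        data[gid] = tile[lid] * regVal2Pi;
    }
}

int main()
{
    const int n = 1024;
    const float pi = 3.14f;
    float host[n];
    for (int i = 0; i < n; ++i) {
        host[i] = 1.0f;
    }

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(kPi, &pi, sizeof(float));

    // Round the grid size up so every element is covered.
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    scaleByTwoPi<<<blocks, BLOCK_SIZE>>>(dev, n);

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %f\n", host[0]);
    return 0;
}
```

Staging through `tile` is not needed for a plain scaling operation. It is there only to show the global-to-shared copy that the `lArray[lid] = data[gid]` fragment hints at.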