Interoperability: CUDA Streams #547
Labels: Performance optimization, backend: cuda, backend: hip, backend: sycl, component: third party, enhancement, good first issue
CUDA streams, like FFT plans, BLAS handles, and MPI communicators, are a shared resource. We need to be able to expose them (easy), but also to be able to constrain AMReX to pre-initialized handles when needed. We can already do this pretty well for FFTs, MPI, and some BLAS functionality.
Functions like `ParallelFor` will call `Gpu::gpuStream` to get a stream for their kernel, but the `launch` function allows one to pass a stream. We could expose a function that lets the user set the stream before `ParallelFor`.

Currently, AMReX builds 4 streams by default. `MFIter` defines the strategy (round robin) and sets the value of `Gpu::gpuStream`.
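To make the behavior described above concrete, here is a minimal Python sketch of the round-robin strategy over a fixed pool of streams, plus the kind of user-facing override this issue asks for. This is not the actual pyAMReX or AMReX API — `StreamPool`, `set_stream`, and `gpu_stream` are hypothetical names, and plain Python objects stand in for real CUDA stream handles.

```python
class Stream:
    """Stand-in for a CUDA stream handle (hypothetical, for illustration)."""

    def __init__(self, stream_id):
        self.stream_id = stream_id

    def __repr__(self):
        return f"Stream({self.stream_id})"


class StreamPool:
    """Fixed pool of streams with MFIter-style round-robin selection."""

    def __init__(self, num_streams=4):  # AMReX builds 4 streams by default
        self.streams = [Stream(i) for i in range(num_streams)]
        self._override = None  # user-supplied, pre-initialized stream

    def set_stream(self, stream):
        """Constrain subsequent kernel launches to a pre-initialized stream."""
        self._override = stream

    def gpu_stream(self, iteration):
        """What a Gpu::gpuStream-like query would return for a grid iteration."""
        if self._override is not None:
            return self._override
        # Round robin over the default pool, as MFIter does.
        return self.streams[iteration % len(self.streams)]


pool = StreamPool()
# Iterations 0..5 cycle through the 4 default streams.
print([pool.gpu_stream(i).stream_id for i in range(6)])  # → [0, 1, 2, 3, 0, 1]

# Constrain launches to an externally created stream instead.
pool.set_stream(Stream(42))
print(pool.gpu_stream(0).stream_id)  # → 42
```

In a real implementation the override would hold an external `cudaStream_t` (e.g. one created by CuPy or PyTorch), so AMReX kernels could be ordered on the same stream as kernels from other libraries.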