09:03:29 From Michael Klemm : Good morning! As usual, please feel free to send questions to the chat and we will answer them as we go.
09:12:54 From Thomas Hayward-Schneider : What's the expected performance for a single/dual-socket Xeon/EPYC for these parameters?
09:13:44 From cristian sommariva : Would the performance have improved using the shared memory of the GPU?
09:15:20 From Tiago Ribeiro : How do the thread teams, blocks, etc. map to the hardware when you split the OMP pragma between the two loops? E.g.:
    #pragma omp target teams distribute reduction(max:err)
    for (j = 1; j < n-1; j++) {
        #pragma omp parallel for reduction(max:err)
        for (i = 1; i < m-1; i++) {
09:16:19 From cristian sommariva : perfect, thanks
09:19:08 From Tiago Ribeiro : yes, thank you
09:28:28 From Michael Klemm : Small correction: to(input[:N]) is correct. For the "update" directive, "to", "from", and "tofrom" are clauses.
09:32:33 From Thomas Hayward-Schneider : "enter data" can "nest", right?
09:32:41 From Michael Klemm : yes
09:32:53 From Michael Klemm : I will have an example of this later with the use case
09:33:04 From Thomas Hayward-Schneider : perfect!
09:39:05 From Thomas Hayward-Schneider : A presence check would imply that it's not an implicit update. Is that correct?
09:43:22 From Thomas Hayward-Schneider : How is the "from" part of the "tofrom" handled in such a map()?
09:43:42 From Thomas Hayward-Schneider : It's obviously not just a presence check
09:45:14 From Thomas Hayward-Schneider : yes, thanks. (I think the question was partly based on a misunderstanding. Oops.)
09:45:25 From cristian sommariva : With !$omp target, if the variable "x" is not found, will it also be mapped using "map(tofrom:x)" and hence transferred back to the CPU as well?
09:48:11 From Christian Terboven to France Boillod-Cerneux (CEA) (direct message) : RWTH continues in home-office mode, as do most other public institutions in Germany.
09:48:11 From Yuuichi Asahi : Does a presence check work for a pointer member in a derived class?
09:48:11 From cristian sommariva : perfect, thank you
09:48:19 From Tiago Ribeiro : But if "x" is a pointer, this would not work, right?
09:48:24 From Michael Klemm : yes
09:49:17 From Gabriele Fatigati : Is it possible to specify the type of memory where the array will be mapped on the GPU? Or is it allocated in global memory by default?
09:51:22 From Thomas Hayward-Schneider : Maybe I'm foreshadowing what's about to come, but can we use OpenMP tasking (host-side) for more advanced control of asynchronous data transfers and kernel launches?
09:52:30 From Michael Klemm : You don't have to. Async offloads are OpenMP tasks.
09:52:52 From Thomas Hayward-Schneider : With OpenMP tasks' dependency support?
09:53:05 From Michael Klemm : @Gabriele: Yes, there is an API to specify memory traits also for GPUs
09:53:35 From Michael Klemm : @Thomas: Yes. Everything that you can do with OpenMP tasks, incl. taskwait, taskgroup, dependences, etc.
09:53:54 From Gabriele Fatigati : ok, thanks
09:54:23 From Thomas Hayward-Schneider : Excellent!
09:55:37 From Michael Klemm : That's why we really like tasking :-)
09:55:59 From Michael Klemm : (I confess, we have defined OpenMP target to promote the use of tasking in OpenMP :-))
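A minimal sketch of two of the points above, namely the 09:28:28 correction (on "update", to/from are clauses, while on enter/exit data they are map-types) and the nested "enter data" question; the array name and size are illustrative:

    #include <stdlib.h>

    void demo(int N)
    {
        double *input = malloc(N * sizeof(double));

        /* First enter data: allocates device storage (refcount 1) and
           copies host -> device via the map-type "to". */
        #pragma omp target enter data map(to: input[:N])

        /* Nested enter data on the same array: the refcount goes to 2;
           no second allocation or transfer takes place. */
        #pragma omp target enter data map(to: input[:N])

        /* On the update directive, "from" is a clause, not a map-type;
           it transfers device -> host regardless of the refcount. */
        #pragma omp target update from(input[:N])

        /* Matching pair of exit data: refcount 2 -> 1 -> 0; the copy-back
           implied by map(from: ...) happens only when it reaches 0. */
        #pragma omp target exit data map(release: input[:N])
        #pragma omp target exit data map(from: input[:N])

        free(input);
    }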
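Similarly, a sketch for the presence-check questions (09:39:05 onwards and 09:45:25): the implicit map(tofrom: x) on a target construct only transfers when x is not already present on the device. Variable names are illustrative:

    double x[1000];

    /* Case 1: x is not present, so the implicit map(tofrom: x) copies it
       to the device before the region and back to the CPU afterwards. */
    #pragma omp target
    { x[0] = 1.0; }

    /* Case 2: x is already mapped by an enclosing data region, so the
       implicit map is just a presence check plus refcounting -- no
       transfer in either direction. */
    #pragma omp target data map(to: x[:1000])
    {
        #pragma omp target
        { x[0] = 2.0; }                          /* modifies the device copy only */

        #pragma omp target update from(x[:1000]) /* explicit copy-back */
    }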
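For the 09:49:17 memory question: mapped arrays live in global device memory by default, and the OpenMP 5.x allocator API can request other kinds of memory. Whether a given allocator really maps to, say, on-chip scratchpad memory is implementation-defined; the sketch below uses the predefined omp_pteam_mem_alloc allocator, which implementations typically place in team-local (shared) memory. It assumes an enclosing function with n defined and #include <omp.h>:

    #pragma omp target teams distribute
    for (int j = 0; j < n; j++) {
        double scratch[64];                 /* illustrative per-team buffer */
        #pragma omp allocate(scratch) allocator(omp_pteam_mem_alloc)

        #pragma omp parallel for
        for (int i = 0; i < 64; i++)
            scratch[i] = 0.0;
        /* ... use scratch within the team ... */
    }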
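And for the tasking exchange (09:51:22 to 09:55:59), a sketch of deferred target tasks chained with depend clauses; function and array names are illustrative:

    void pipeline(double *a, double *b, int n)
    {
        /* Asynchronous host->device transfer of a, device allocation of b;
           nowait turns the construct into a deferred OpenMP task. */
        #pragma omp target enter data map(to: a[:n]) map(alloc: b[:n]) \
                nowait depend(out: a)

        /* Kernel launch, ordered after the transfer via the dependence. */
        #pragma omp target teams distribute parallel for \
                nowait depend(in: a) depend(out: b)
        for (int i = 0; i < n; i++)
            b[i] = 2.0 * a[i];

        /* Asynchronous device->host copy of the result; release a. */
        #pragma omp target exit data map(from: b[:n]) map(release: a[:n]) \
                nowait depend(in: b)

        /* The host waits for the whole chain here, as with any tasks. */
        #pragma omp taskwait
    }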
10:00:49 From Thomas Hayward-Schneider : Q: I don't think anything has been said about GPU(a)->GPU(b) transfers. Since these might have higher bandwidth than going via the CPU, is there anything one can do with OpenMP?
10:02:31 From Michael Klemm : Not at the moment. We are working on defining this for OpenMP 6.0
10:03:14 From Thomas Hayward-Schneider : So multi-GPU hybrid MPI/OpenMP would currently be 1 MPI rank per GPU?
10:03:28 From Michael Klemm : Or 1 MPI rank with multiple GPUs
10:04:03 From Thomas Hayward-Schneider : OK, so that works fine, but no direct GPU->GPU transfers?
10:04:11 From Michael Klemm : But then you need to go through the host to copy between GPUs
10:04:24 From Thomas Hayward-Schneider : Thanks
10:04:30 From Thomas Hayward-Schneider : *Danke
10:04:31 From Michael Klemm : (An OpenMP implementation might still do something smart here, even though technically the host is involved.)
10:10:45 From Guillaume Latu - CEA Cadarache : Any recommendation for debugging such examples with task dependencies?
10:12:19 From Guillaume Latu - CEA Cadarache : Pointers or URLs for these tools would be helpful...
10:12:59 From VITTORIO SCIORTINO : It all seems quite simple; the only problem is that there are a lot of directive keywords to combine and to chain :)
10:13:10 From Christian Terboven : The ARCHER tool: https://www.vi-hps.org/tools/archer.html
10:13:40 From Christian Terboven : A recent webinar on correctness of MPI and OpenMP programs, showing the tools I mentioned in action: https://pop-coe.eu/blog/20th-pop-webinar-debugging-tools-for-correctness-analysis-of-mpi-and-openmp-applications
10:14:57 From Christian Terboven : @Vittorio: This is not simple. With all these constructs and clauses, you can express a highly complicated concurrent and asynchronous algorithm. But if you have understood that algorithm, you have the tools.
10:22:10 From Yuuichi Asahi : How does swapping device pointers work?
10:22:58 From Gabriele Fatigati : In the previous example, what happens if you don't specify use_device_ptr? Will the code crash?
10:23:45 From France Boillod-Cerneux (CEA) to VITTORIO SCIORTINO (direct message) : Vittorio, you are unmuted
10:25:33 From Christian Terboven : Let's get back to that slide later on.
10:28:30 From VITTORIO SCIORTINO to France Boillod-Cerneux (CEA) (direct message) : OK, sorry
10:30:54 From Gabriele Fatigati : thanks
10:31:53 From VITTORIO SCIORTINO to France Boillod-Cerneux (CEA) (direct message) : Thank you for the lesson
10:32:46 From Gabriele Fatigati : Just a little question about loop collapse: if I have some initialization between the loops (e.g., a = 0), can we still collapse the loops?
10:33:37 From Gabriele Fatigati : ok
10:34:14 From cristian sommariva : Thank you very much for the session!
10:34:19 From VITTORIO SCIORTINO : Thank you very much
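Re the 10:00:49 discussion: while there is no direct device-to-device map, the runtime routine omp_target_memcpy (since OpenMP 4.5) takes separate source and destination device numbers, and, as Michael notes at 10:04:31, an implementation may still route it smartly even though the host is logically involved. A sketch with illustrative device numbers:

    #include <omp.h>
    #include <stddef.h>

    /* Copy n doubles from a buffer on device 0 to a buffer on device 1. */
    void gpu_to_gpu(size_t n)
    {
        double *buf0 = omp_target_alloc(n * sizeof(double), /*device_num=*/0);
        double *buf1 = omp_target_alloc(n * sizeof(double), /*device_num=*/1);

        /* ... fill buf0 on device 0 ... */

        omp_target_memcpy(buf1, buf0, n * sizeof(double),
                          /*dst_offset=*/0, /*src_offset=*/0,
                          /*dst_device_num=*/1, /*src_device_num=*/0);

        omp_target_free(buf0, 0);
        omp_target_free(buf1, 1);
    }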
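Re the 10:22:58 use_device_ptr question: the clause swaps the host pointer for its device counterpart inside the block, which is what a native device library needs; without it, the library would receive the host address and dereference an invalid pointer on the device, typically crashing or producing garbage. A sketch (the library call is a hypothetical stand-in, not a real API):

    void scale_on_device(double *v, int n)
    {
        #pragma omp target data map(tofrom: v[:n])
        {
            /* Within this inner region, v holds the device address. */
            #pragma omp target data use_device_ptr(v)
            {
                device_library_scale(v, n);   /* hypothetical native call */
            }
        }
    }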
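And on the 10:32:46 collapse question: collapse(2) requires a perfectly nested loop, so a statement between the two for loops (like a = 0) makes the nest non-collapsible as written; a common rework is to split the intervening statement into its own loop. A sketch with illustrative arrays:

    /* Not perfectly nested -- collapse(2) is not allowed here: */
    for (int j = 0; j < n; j++) {
        c[j] = 0.0;                      /* intervening statement */
        for (int i = 0; i < m; i++)
            d[j][i] = 2.0 * b[j][i];
    }

    /* After loop fission, the nest is perfect and collapse(2) is legal: */
    #pragma omp target teams distribute parallel for
    for (int j = 0; j < n; j++)
        c[j] = 0.0;

    #pragma omp target teams distribute parallel for collapse(2)
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            d[j][i] = 2.0 * b[j][i];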