09:03:29 From Michael Klemm : Good morning! As usual, please feel free to send questions to the chat and we will answer them as we go.
09:12:54 From Thomas Hayward-Schneider : What's the expected performance for a single/dual-socket Xeon/EPYC for these parameters?
09:13:44 From cristian sommariva : Would the performance have improved using the shared memory of the GPU?
09:15:20 From Tiago Ribeiro : How do the thread teams, blocks, etc. map to the hardware when you split the OMP pragma between the two loops? E.g.:
    #pragma omp target teams distribute reduction(max:err)
    for (j = 1; j < n-1; j++) {
        #pragma omp parallel for reduction(max:err)
        for (i = 1; i < m-1; i++) {
09:16:19 From cristian sommariva : perfect, thanks
09:19:08 From Tiago Ribeiro : yes, thank you
09:28:28 From Michael Klemm : Small correction: to(input[:N]) is correct. For the "update" directive, "to", "from", and "tofrom" are clauses.
09:32:33 From Thomas Hayward-Schneider : "enter data" can "nest", right?
09:32:41 From Michael Klemm : yes
09:32:53 From Michael Klemm : I will have an example of this later with the use case
09:33:04 From Thomas Hayward-Schneider : perfect!
09:39:05 From Thomas Hayward-Schneider : A presence check would imply that it's not an implicit update. Is that correct?
09:43:22 From Thomas Hayward-Schneider : How is the "from" part of the "tofrom" handled in such a map()?
09:43:42 From Thomas Hayward-Schneider : It's obviously not just a presence check
09:45:14 From Thomas Hayward-Schneider : yes, thanks. (I think the question was partly based on a misunderstanding. Oops.)
09:45:25 From cristian sommariva : With !$omp target, if the variable "x" is not found, will it also be mapped using "map(tofrom:x)" and hence transferred back to the CPU as well?
09:48:11 From Christian Terboven to France Boillod-Cerneux (CEA) (direct message) : RWTH continues in home-office mode, as do most other public institutions in Germany.
09:48:11 From Yuuichi Asahi : Does a presence check work for a pointer member in a derived class?
09:48:11 From cristian sommariva : perfect, thank you
09:48:19 From Tiago Ribeiro : But if "x" is a pointer, this would not work, right?
09:48:24 From Michael Klemm : yes
09:49:17 From Gabriele Fatigati : Is it possible to specify the type of memory where the array will be mapped on the GPU? Or is it allocated in global memory by default?
09:51:22 From Thomas Hayward-Schneider : Maybe I'm foreshadowing what's about to come, but can we use OpenMP tasking (host-side) for more advanced control of asynchronous data transfers and kernel launches?
09:52:30 From Michael Klemm : You don't have to. Async offloads are OpenMP tasks.
09:52:52 From Thomas Hayward-Schneider : With OpenMP tasks' dependency support?
09:53:05 From Michael Klemm : @Gabriele: Yes, there is an API to specify memory traits also for GPUs
09:53:35 From Michael Klemm : @Thomas: Yes. Everything that you can do with OpenMP tasks, incl. taskwait, taskgroup, dependences, etc.
09:53:54 From Gabriele Fatigati : ok, thanks
09:54:23 From Thomas Hayward-Schneider : Excellent!
09:55:37 From Michael Klemm : That's why we really like tasking :-)
09:55:59 From Michael Klemm : (I confess, we have defined OpenMP target to promote the use of tasking in OpenMP :-))
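A minimal sketch of two of the points above, namely the 09:28:28 correction (on "update", to/from are clauses, while on enter/exit data they are map-types) and the nested "enter data" question; the array name and size are illustrative:

    #include <stdlib.h>

    void demo(int N)
    {
        double *input = malloc(N * sizeof(double));

        /* First enter data: allocates device storage (refcount 1) and
           copies host -> device via the map-type "to". */
        #pragma omp target enter data map(to: input[:N])

        /* Nested enter data on the same array: the refcount goes to 2;
           no second allocation or transfer takes place. */
        #pragma omp target enter data map(to: input[:N])

        /* On the update directive, "from" is a clause, not a map-type;
           it transfers device -> host regardless of the refcount. */
        #pragma omp target update from(input[:N])

        /* Matching pair of exit data: refcount 2 -> 1 -> 0; the copy-back
           implied by map(from: ...) happens only when it reaches 0. */
        #pragma omp target exit data map(release: input[:N])
        #pragma omp target exit data map(from: input[:N])

        free(input);
    }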
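Similarly, a sketch for the presence-check questions (09:39:05 onwards and 09:45:25): the implicit map(tofrom: x) on a target construct only transfers when x is not already present on the device. Variable names are illustrative:

    double x[1000];

    /* Case 1: x is not present, so the implicit map(tofrom: x) copies it
       to the device before the region and back to the CPU afterwards. */
    #pragma omp target
    { x[0] = 1.0; }

    /* Case 2: x is already mapped by an enclosing data region, so the
       implicit map is just a presence check plus refcounting -- no
       transfer in either direction. */
    #pragma omp target data map(to: x[:1000])
    {
        #pragma omp target
        { x[0] = 2.0; }                          /* modifies the device copy only */

        #pragma omp target update from(x[:1000]) /* explicit copy-back */
    }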
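For the 09:49:17 memory question: mapped arrays live in global device memory by default, and the OpenMP 5.x allocator API can request other kinds of memory. Whether a given allocator really maps to, say, on-chip scratchpad memory is implementation-defined; the sketch below uses the predefined omp_pteam_mem_alloc allocator, which implementations typically place in team-local (shared) memory. It assumes an enclosing function with n defined and #include <omp.h>:

    #pragma omp target teams distribute
    for (int j = 0; j < n; j++) {
        double scratch[64];                 /* illustrative per-team buffer */
        #pragma omp allocate(scratch) allocator(omp_pteam_mem_alloc)

        #pragma omp parallel for
        for (int i = 0; i < 64; i++)
            scratch[i] = 0.0;
        /* ... use scratch within the team ... */
    }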
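And for the tasking exchange (09:51:22 to 09:55:59), a sketch of deferred target tasks chained with depend clauses; function and array names are illustrative:

    void pipeline(double *a, double *b, int n)
    {
        /* Asynchronous host->device transfer of a, device allocation of b;
           nowait turns the construct into a deferred OpenMP task. */
        #pragma omp target enter data map(to: a[:n]) map(alloc: b[:n]) \
                nowait depend(out: a)

        /* Kernel launch, ordered after the transfer via the dependence. */
        #pragma omp target teams distribute parallel for \
                nowait depend(in: a) depend(out: b)
        for (int i = 0; i < n; i++)
            b[i] = 2.0 * a[i];

        /* Asynchronous device->host copy of the result; release a. */
        #pragma omp target exit data map(from: b[:n]) map(release: a[:n]) \
                nowait depend(in: b)

        /* The host waits for the whole chain here, as with any tasks. */
        #pragma omp taskwait
    }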
10:00:49 From Thomas Hayward-Schneider : Q: I don't think anything has been said about GPU(a)->GPU(b) transfers. Since these might have higher bandwidth than going via the CPU, is there anything one can do with OpenMP?
10:02:31 From Michael Klemm : Not at the moment. We are working on defining this for OpenMP 6.0
10:03:14 From Thomas Hayward-Schneider : So multi-GPU hybrid MPI/OpenMP would currently be 1 MPI rank per GPU?
10:03:28 From Michael Klemm : Or 1 MPI rank with multiple GPUs
10:04:03 From Thomas Hayward-Schneider : OK, so that works fine, but no direct GPU->GPU transfers?
10:04:11 From Michael Klemm : But then you need to go through the host to copy between GPUs
10:04:24 From Thomas Hayward-Schneider : Thanks
10:04:30 From Thomas Hayward-Schneider : *Danke
10:04:31 From Michael Klemm : (An OpenMP implementation might still do something smart here, even though technically the host is involved.)
10:10:45 From Guillaume Latu - CEA Cadarache : Any recommendation for debugging such examples with task dependencies?
10:12:19 From Guillaume Latu - CEA Cadarache : Pointers or URLs for these tools would be helpful...
10:12:59 From VITTORIO SCIORTINO : It all seems quite simple; the only problem is that there are a lot of directive keywords to combine and to chain :)
10:13:10 From Christian Terboven : The ARCHER tool: https://www.vi-hps.org/tools/archer.html
10:13:40 From Christian Terboven : A recent webinar on correctness of MPI and OpenMP programs, showing the tools I mentioned in action: https://pop-coe.eu/blog/20th-pop-webinar-debugging-tools-for-correctness-analysis-of-mpi-and-openmp-applications
10:14:57 From Christian Terboven : @Vittorio: This is not simple. With all these constructs and clauses, you can express a highly complicated concurrent and asynchronous algorithm. But if you have understood that algorithm, you have the tools.
10:22:10 From Yuuichi Asahi : How does swapping device pointers work?
10:22:58 From Gabriele Fatigati : In the previous example, what happens if you don't specify use_device_ptr? Will the code crash?
10:23:45 From France Boillod-Cerneux (CEA) to VITTORIO SCIORTINO (direct message) : Vittorio, you are unmuted
10:25:33 From Christian Terboven : Let's get back to that slide later on.
10:28:30 From VITTORIO SCIORTINO to France Boillod-Cerneux (CEA) (direct message) : OK, sorry
10:30:54 From Gabriele Fatigati : thanks
10:31:53 From VITTORIO SCIORTINO to France Boillod-Cerneux (CEA) (direct message) : Thank you for the lesson
10:32:46 From Gabriele Fatigati : Just a little question about loop collapse: if I have some initialization between the loops (e.g., a = 0), can we still collapse the loops?
10:33:37 From Gabriele Fatigati : ok
10:34:14 From cristian sommariva : Thank you very much for the session!
10:34:19 From VITTORIO SCIORTINO : Thank you very much
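Re the 10:00:49 discussion: while there is no direct device-to-device map, the runtime routine omp_target_memcpy (since OpenMP 4.5) takes separate source and destination device numbers, and, as Michael notes at 10:04:31, an implementation may still route it smartly even though the host is logically involved. A sketch with illustrative device numbers:

    #include <omp.h>
    #include <stddef.h>

    /* Copy n doubles from a buffer on device 0 to a buffer on device 1. */
    void gpu_to_gpu(size_t n)
    {
        double *buf0 = omp_target_alloc(n * sizeof(double), /*device_num=*/0);
        double *buf1 = omp_target_alloc(n * sizeof(double), /*device_num=*/1);

        /* ... fill buf0 on device 0 ... */

        omp_target_memcpy(buf1, buf0, n * sizeof(double),
                          /*dst_offset=*/0, /*src_offset=*/0,
                          /*dst_device_num=*/1, /*src_device_num=*/0);

        omp_target_free(buf0, 0);
        omp_target_free(buf1, 1);
    }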
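Re the 10:22:58 use_device_ptr question: the clause swaps the host pointer for its device counterpart inside the block, which is what a native device library needs; without it, the library would receive the host address and dereference an invalid pointer on the device, typically crashing or producing garbage. A sketch (the library call is a hypothetical stand-in, not a real API):

    void scale_on_device(double *v, int n)
    {
        #pragma omp target data map(tofrom: v[:n])
        {
            /* Within this inner region, v holds the device address. */
            #pragma omp target data use_device_ptr(v)
            {
                device_library_scale(v, n);   /* hypothetical native call */
            }
        }
    }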
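And on the 10:32:46 collapse question: collapse(2) requires a perfectly nested loop, so a statement between the two for loops (like a = 0) makes the nest non-collapsible as written; a common rework is to split the intervening statement into its own loop. A sketch with illustrative arrays:

    /* Not perfectly nested -- collapse(2) is not allowed here: */
    for (int j = 0; j < n; j++) {
        c[j] = 0.0;                      /* intervening statement */
        for (int i = 0; i < m; i++)
            d[j][i] = 2.0 * b[j][i];
    }

    /* After loop fission, the nest is perfect and collapse(2) is legal: */
    #pragma omp target teams distribute parallel for
    for (int j = 0; j < n; j++)
        c[j] = 0.0;

    #pragma omp target teams distribute parallel for collapse(2)
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            d[j][i] = 2.0 * b[j][i];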