On days 4 and 5 of the NGCM Summer Academy 2016, Adrian Jackson from EPCC gave an introduction to parallel programming with the Intel Xeon Phi coprocessor, including a lot of very useful hands-on advice on serial and parallel code optimisation.
The course started with a general introduction to processors, comparing CPUs, GPUs and the Xeon Phi coprocessor. Only a small part of a typical CPU is dedicated to computing, i.e. floating point operations. Both GPUs and the Xeon Phi contain many computing cores that work in parallel, and both use GDDR graphics memory. The striking difference between the Xeon Phi and a GPU is that the Xeon Phi runs its own operating system and can be used in native mode, where a program runs entirely on the coprocessor, whereas graphics cards can only be used to offload computationally intensive work from the CPU.
Adrian then went on to show some examples of the achievable performance. In general, the Xeon Phi has potential for high performance, in particular in single precision. On the other hand, the speed-up will be lower if the code involves a lot of communication or memory access. The key to good performance is vectorisation of the code.
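To make the point about vectorisation concrete, here is a minimal sketch in C. It is not taken from the course materials, and the function names `saxpy` and `running_sum` are chosen purely for illustration. The first loop has independent iterations over contiguous single-precision data, which is the kind of loop a compiler can usually map onto vector instructions. The second loop has a loop-carried dependence, which generally prevents straightforward vectorisation.

```c
#include <stddef.h>

/* Vectorisable: every iteration is independent and memory access is
 * unit-stride. The restrict qualifiers promise the compiler that x and y
 * do not overlap, which removes one common obstacle to vectorisation. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hard to vectorise: iteration i reads the value written by iteration
 * i - 1, so the iterations cannot simply be executed side by side. */
void running_sum(size_t n, float *x)
{
    for (size_t i = 1; i < n; ++i) {
        x[i] = x[i] + x[i - 1];
    }
}
```

Whether the first loop is actually vectorised depends on the compiler and its flags. Most compilers can print a vectorisation report, and a hint such as OpenMP's `#pragma omp simd` can be placed above the loop if the compiler needs more encouragement.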
A significant proportion of the course consisted of hands-on exercises on using the coprocessor in native mode as well as offloading work from the CPU. Exercises were available in both C and Fortran.
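For readers unfamiliar with the offload model, the following is a rough sketch of what offloading a loop can look like in C. It uses the standard OpenMP `target` directive as an assumption, not necessarily the directives used in the course exercises, and the function name `offload_saxpy` is hypothetical. The `map` clauses describe which arrays are copied to the device before the loop and which are copied back afterwards.

```c
/* Offload sketch: the loop body is meant to run on an attached device.
 * x is only read on the device, so it is copied in (to); y is read and
 * written, so it is copied in and back out again (tofrom).
 * If no device is available, or OpenMP is not enabled at compile time,
 * the loop simply runs on the host CPU with the same result. */
void offload_saxpy(int n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The data transfers implied by the `map` clauses are part of the cost of offloading, so in practice the offloaded region should do enough work to outweigh the time spent copying the arrays.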
The advice on structuring code to get the best performance on the Xeon Phi spanned a wide range of topics, from vectorisation and serial optimisation to specific aspects of MPI and offloading optimisation. There was also an opportunity for participants to optimise their own code and port it to the Xeon Phi.
All course materials can be found online.