Programming for the Xeon Phi Coprocessor on Cypress
The Xeon Phi coprocessor is an accelerator that provides many cores to parallel applications. While it offers fewer hardware threads than a typical GPU accelerator, its cores are much more capable, and the programming model is close to conventional CPU programming.
Xeon Phi Coprocessor Hardware
Each compute node of Cypress is equipped with two (2) Xeon Phi 7120P coprocessors.
The 7120P is equipped with:
- 61 Physical cores running at 1.238 GHz
- Four (4) Hardware threads on each core
- 16GB GDDR5 memory
- Uniquely wide SIMD capabilities via 512-bit wide vectors (8 doubles or 16 floats per operation!)
All this adds up to roughly 1.2 TFLOPS of potential double-precision computing power per coprocessor (61 cores x 1.238 GHz x 16 DP FLOPs per cycle with fused multiply-add), or over 2 TFLOPS per node.
Xeon Phi Usage Models
The Intel compiler suite and libraries support three distinct programming models:
- Automatic Offloading (AO) - the Intel MKL library sends certain calculations to the Phi without any user input.
- Native Programming - Code is compiled to run on the Xeon Phi Coprocessor and ONLY on the Xeon Phi Coprocessor.
- Offloading - Certain parallel sections of your source code are identified for offloading to the coprocessor. This provides the greatest amount of control and allows the CPUs and coprocessors to work in tandem.
The number one thing to keep in mind is that all data traffic to and from the coprocessors must travel over PCIe. This is a relatively slow connection compared to memory, and the more you minimize this communication, the faster your code will run.
We've only scratched the surface on the potential of the Xeon Phi coprocessor. If you are interested in learning more, Colfax International will be giving two days of instruction on coding for the Xeon Phi at Tulane this October. Interested parties can register at