Changes between Version 2 and Version 3 of cypress/Programming/SpeedupScaling
Timestamp: 08/18/15 12:46:45
cypress/Programming/SpeedupScaling
* When task switches occur frequently enough, the illusion of parallelism is achieved.

== Multi-core ==
Nowadays virtually every computer has a multi-core processor.

[[Image(CPUpowerDensity.png,25%)]]

A multi-core processor is a single computing component with two or more independent central processing units (called "cores"), which are the units that read and execute program instructions.

…

=== GPU ===
A graphics processing unit (GPU), also occasionally called a visual processing unit (VPU), is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for algorithms that process large blocks of data in parallel. In a personal computer, a GPU can be present on a video card or on the motherboard.

[[Image(gpu-computing-feature.jpg)]]

==== GPU computing ====

…

=== Intel Xeon Phi ===
Intel Many Integrated Core Architecture (Intel MIC) is a coprocessor computer architecture developed by Intel, incorporating earlier work from the Teraflops Research Chip multicore research project and the Intel Single-chip Cloud Computer multicore microprocessor.

[[Image(6-18-201intelxeonphipciecard.jpg)]]
----


== Speedup Factor ==

[[Image(Speedup_fac1.png)]]

* Suppose you have a code that takes t_s seconds to run on one processor.
* You find that some sections of the code are parallelizable, but the remaining sections still need to run on a single processor. (This is the OpenMP picture.)
* For MPI codes, the processes run in parallel from the beginning, but there are parts where all processes do the same work.
* Those parts are considered serial sections.
* Let f be the ratio of the serial time to the total time, so the serial section takes f t_s and the parallelizable section takes (1-f) t_s.
* If you use n processors, the time for the parallelizable section can be reduced to (1-f) t_s / n.
* The speedup factor is then defined as

S(n) = \frac{\textrm{computing time on a single processor}}{\textrm{computing time on multiple processors}} = \frac{t_s}{f t_s + (1-f) t_s / n} = \frac{n}{1 + (n-1) f}

This is Amdahl's Law.
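To see what the formula implies in practice, here is a minimal C sketch (the file name and the sample serial fractions are illustrative choices, not taken from this course) that tabulates S(n) = n / (1 + (n-1)f) for a few values of f:

{{{#!c
/* amdahl.c -- a minimal sketch; the sample serial fractions
 * below are arbitrary illustrative values.
 * Prints the Amdahl speedup S(n) = n / (1 + (n-1)f) and shows
 * that it saturates no matter how many processors you add.
 *
 * Compile: gcc -O2 amdahl.c -o amdahl
 */
#include <stdio.h>

/* f = serial fraction of the run time, n = number of processors */
static double speedup(int n, double f)
{
    return (double)n / (1.0 + (n - 1) * f);
}

int main(void)
{
    const double f[] = { 0.05, 0.10, 0.25 };  /* sample serial fractions */

    printf("%6s %10s %10s %10s\n", "n", "f=0.05", "f=0.10", "f=0.25");
    for (int n = 1; n <= 1024; n *= 2) {
        printf("%6d %10.2f %10.2f %10.2f\n",
               n, speedup(n, f[0]), speedup(n, f[1]), speedup(n, f[2]));
    }
    /* As n grows, S(n) approaches 1/f: 20x, 10x, and 4x respectively. */
    return 0;
}
}}}

Note that S(n) can never exceed 1/f: with f = 0.05, even 1024 processors yield less than a 20x speedup, because the serial section dominates.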
[[Image(Speedup_fac2.png)]]

== Overhead time ==
* Parallelization requires some extra bookkeeping, so each parallel process incurs overhead.
* For MPI codes, communication between processes is a major source of overhead.

…

[[Image(Speedup_fac3.png)]]
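As a rough way to observe this overhead on a real machine, the sketch below (a hypothetical file name; the message size and repetition count are arbitrary choices) uses MPI_Wtime to time an MPI_Allreduce. Unlike the compute sections above, this cost does not shrink as processes are added; it typically grows with the process count.

{{{#!c
/* overhead_demo.c -- a minimal sketch of measuring communication
 * overhead in an MPI code; message size and repetition count are
 * arbitrary illustrative values.
 *
 * Compile: mpicc -O2 overhead_demo.c -o overhead_demo
 * Run:     mpirun -np 4 ./overhead_demo
 */
#include <stdio.h>
#include <mpi.h>

#define COUNT 1000000   /* doubles per message (8 MB)      */
#define REPS  20        /* repetitions to average over     */

int main(int argc, char **argv)
{
    int rank, size;
    static double in[COUNT], out[COUNT];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < COUNT; i++)
        in[i] = (double)rank;

    /* Time only the communication: this part does no useful
     * computation, so it is pure overhead in Amdahl's model. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++)
        MPI_Allreduce(in, out, COUNT, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d processes: %.4f s per Allreduce\n",
               size, (t1 - t0) / REPS);

    MPI_Finalize();
    return 0;
}
}}}

Running the same binary with increasing process counts shows the communication time holding steady or rising while, in a real application, the per-process compute time would be falling; the crossover is where adding processors stops paying off.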