=== Running MATLAB in Parallel with Multithreads ===
MATLAB supports multithreaded computation for a number of functions and for expressions that combine element-wise functions.
These functions automatically execute on multiple threads if the data size is large enough.
Note that on Cypress, MATLAB runs with a single thread by default, so you have to specify the number of threads explicitly in your code.
For example:
{{{#!matlab
% Matlab Test Code "FuncTest.m"
%
% Set the number of computational threads to the CPU count allocated by SLURM.
% LASTN receives the previous setting.
LASTN = maxNumCompThreads(str2double(getenv('SLURM_JOB_CPUS_PER_NODE')));
nth = maxNumCompThreads;               % query the current setting
fprintf('Number of Threads = %d.\n',nth);

N = 2^14;
A = randn(N);                          % N-by-N matrix of normal random numbers
st = cputime;                          % CPU time used by MATLAB so far (all threads)
tic;                                   % start the wall-clock timer
B = sin(A);                            % element-wise; multithreaded for large A
realT = toc;
cpuT = cputime - st;
fprintf('Real Time = %f(sec)\n',realT);
fprintf('CPU Time = %f(sec)\n',cpuT);
fprintf('Ratio = %f\n',cpuT / realT);  % roughly the number of busy threads
}}}

In the code above, the line
{{{#!matlab
LASTN = maxNumCompThreads(str2double(getenv('SLURM_JOB_CPUS_PER_NODE')));
}}}
sets the number of computational threads.
The environment variable '''SLURM_JOB_CPUS_PER_NODE''' is set by SLURM from the resources requested in the job script, for example:
{{{#!bash
#!/bin/bash
#SBATCH --qos=normal            # Quality of Service
#SBATCH --job-name=matlabMT     # Job Name
#SBATCH --time=1:00:00          # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=10      # Number of threads per task (OMP threads)

module load matlab
matlab -nodesktop -nodisplay -nosplash -r "FuncTest; exit;"
}}}
The number of cores per process (task) is set by '''--cpus-per-task=10'''.
Since this job runs one task on one node, '''SLURM_JOB_CPUS_PER_NODE''' normally equals this value, and the code uses it as the number of threads.
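
A quick way to check, inside the job script, what SLURM actually exported is sketched below. It is a hypothetical helper (''cpus_on_node'' is not part of the Cypress setup) and assumes the usual SLURM format of '''SLURM_JOB_CPUS_PER_NODE''': a plain count such as "10" for a one-node job, or a form such as "10(x2)" when several nodes receive the same count.

```bash
#!/bin/bash
# Hypothetical helper (not part of the Cypress setup): print the number of
# CPUs SLURM allocated on this node, or 1 when run outside a SLURM job.
# SLURM_JOB_CPUS_PER_NODE is usually a plain count such as "10", but may
# look like "10(x2)" for multi-node jobs; only the leading integer is kept.
cpus_on_node() {
    local n="${SLURM_JOB_CPUS_PER_NODE:-1}"
    n="${n%%[!0-9]*}"             # drop everything from the first non-digit
    echo "${n:-1}"
}

# Usage inside the job script, before the matlab line:
#   echo "CPUs on node: $(cpus_on_node)"
```

Printing the value in the job output makes it easy to confirm that ''FuncTest.m'' will see the thread count you requested.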

'''Note: Since the number of MATLAB licenses is limited, it is recommended to compile your MATLAB code into an executable.'''

See [https://wiki.hpc.tulane.edu/trac/wiki/cypress/Matlab#CompiledMatlab Compiled MATLAB].

==== Explicit parallelism ====
The ''Parallel Computing Toolbox'' is available on Cypress.
With the current MATLAB version, you can use up to 12 workers for parallel operations within a single node.
Our license does not include MATLAB Distributed Computing Server, so multi-node parallel operations are not supported.

Each worker runs as an independent MATLAB process and should have its own CPU core. To use 4 workers, request at least 4 cores on a single node, for example with '''--cpus-per-task=4''' as in the script below.

[[Image(MatlabWorkers.jpeg)]]

{{{#!bash
#!/bin/bash
#SBATCH --qos=normal            # Quality of Service
#SBATCH --job-name=matlabPool   # Job Name
#SBATCH --time=1:00:00          # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (one MATLAB client)
#SBATCH --cpus-per-task=4       # Number of cores (one per worker)

module load matlab
matlab -nodesktop -nodisplay -nosplash -r "CreateWorker; ParforTest; exit;"
}}}

''!CreateWorker.m'' creates the pool of workers:
{{{#!matlab
% Parallel Tool Box Test "CreateWorker.m"
%
% One worker per allocated CPU, capped at 12 workers;
% a single worker when not running under SLURM.
if isempty(getenv('SLURM_JOB_CPUS_PER_NODE'))
    nWorker = 1;
else
    nWorker = min(12,str2double(getenv('SLURM_JOB_CPUS_PER_NODE')));
end
% Create Workers
parpool(nWorker);
%
}}}

''!ParforTest.m'' is a sample test code that uses ''parfor'':
{{{#!matlab
% parfor "ParforTest.m"
%
iter = 10000;
sz = 50;
a = zeros(1,iter);          % preallocate the sliced output array
%
fprintf('Computing...\n');
tic;
parfor i = 1:iter
    % iterations are independent, so they are divided among the workers
    a(i) = max(svd(randn(sz)));
end
toc;
%
poolobj = gcp('nocreate');  % Returns the current pool if one exists; does not create a new one.
if isempty(poolobj)
    poolobj = gcp;
end
fprintf('Number of Workers = %d.\n',poolobj.NumWorkers);
%
}}}
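
The worker count can also be computed in the job script and passed on the MATLAB command line instead of in ''!CreateWorker.m''. The sketch below is a hypothetical alternative, not the site's documented method: ''matlab_pool_cmd'' is an illustrative name, it assumes '''SLURM_CPUS_PER_TASK''' (which SLURM sets when '''--cpus-per-task''' is given), and it keeps the same 12-worker cap.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original page): build the MATLAB -r
# command with one worker per allocated core, capped at 12 workers.
# SLURM_CPUS_PER_TASK is set by SLURM when --cpus-per-task is used;
# 1 worker is assumed when it is unset.
matlab_pool_cmd() {
    local n="${SLURM_CPUS_PER_TASK:-1}"
    if (( n > 12 )); then
        n=12                      # same cap as min(12, ...) in CreateWorker.m
    fi
    printf 'parpool(%d); ParforTest; exit;' "$n"
}

# Usage inside the job script:
#   matlab -nodesktop -nodisplay -nosplash -r "$(matlab_pool_cmd)"
```

Keeping the pool size in the job script puts the resource request and the worker count side by side, at the cost of a less readable MATLAB command line.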

'''Note: Since the number of MATLAB licenses is limited, it is recommended to compile your MATLAB code into an executable.'''

See [https://wiki.hpc.tulane.edu/trac/wiki/cypress/Matlab#CompiledMatlab Compiled MATLAB].

=== Running MATLAB with Automatic Offload ===
Internally, MATLAB uses Intel Math Kernel Library (MKL) Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) routines to perform the underlying computations when running on Intel processors.

Intel MKL includes an Automatic Offload (AO) feature that lets computationally intensive MKL functions offload part of their workload to attached '''Intel Xeon Phi''' coprocessors automatically and transparently.

As a result, MATLAB performance can benefit from Intel Xeon Phi coprocessors through MKL AO when problem sizes are large enough to amortize the cost of transferring data to the coprocessors.

In the SLURM script, make sure that the option '''--gres=mic:1''' is set and that both the ''intel-psxe'' and MATLAB modules are loaded.

{{{#!bash
#!/bin/bash
#SBATCH --qos=normal            # Quality of Service
#SBATCH --job-name=matlabAO     # Job Name
#SBATCH --time=1:00:00          # WallTime
#SBATCH --nodes=1               # Number of Nodes
#SBATCH --ntasks-per-node=1     # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1       # Number of threads per task (OMP threads)
#SBATCH --gres=mic:1            # Number of Co-Processors

module load matlab
module load intel-psxe

export MKL_MIC_ENABLE=1
matlab -nodesktop -nodisplay -nosplash -r "MatTest; exit;"
}}}

Note that
{{{#!bash
export MKL_MIC_ENABLE=1
}}}
enables Intel MKL Automatic Offload (AO).
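
If '''--gres=mic:1''' is omitted, MKL AO has no coprocessor to offload to, and the computation should simply stay on the host. The hypothetical pre-flight check below (''enable_mkl_ao'' is an illustrative name) enables AO only when a coprocessor was allocated. It assumes that Cypress's SLURM exports '''OFFLOAD_DEVICES''' for jobs granted a MIC GRES, as SLURM's MIC support does; check this in one of your own jobs before relying on it.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not from the original page). Assumes SLURM
# exports OFFLOAD_DEVICES (e.g. "0") when --gres=mic is granted; an empty or
# unset value is treated as "no coprocessor allocated".
enable_mkl_ao() {
    if [ -z "${OFFLOAD_DEVICES:-}" ]; then
        echo "warning: no Xeon Phi allocated; MKL AO left disabled" >&2
        return 1
    fi
    export MKL_MIC_ENABLE=1       # turn on Intel MKL Automatic Offload
    echo "MKL AO enabled for device(s) ${OFFLOAD_DEVICES}"
}

# Usage inside the job script, in place of the plain export line:
#   enable_mkl_ao
#   matlab -nodesktop -nodisplay -nosplash -r "MatTest; exit;"
```

The function only warns and returns a non-zero status, so the job still runs on the host unless the script itself uses '''set -e'''.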

The sample code ''!MatTest.m'' is below:
{{{#!matlab
%
% Matrix test "MatTest.m"
%
A = rand(10000, 10000);     % two 10000-by-10000 random matrices
B = rand(10000, 10000);
tic;
C = A * B;                  % matrix product computed by MKL BLAS; a candidate for AO
realT = toc;
fprintf('Real Time = %f(sec)\n',realT);
}}}
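
A rough cost model shows why problem size matters for AO. Multiplying two n-by-n double matrices takes about 2n^3 floating-point operations, while moving A, B and C between host and coprocessor costs about 24n^2 bytes (three matrices of 8-byte doubles), so the work per transferred byte grows like n/12. The sketch below (''matmul_costs'' is a hypothetical name; it assumes a classical O(n^3) multiply and one transfer per matrix, which simplifies what MKL actually does) evaluates this for the size used in ''!MatTest.m''.

```bash
#!/bin/bash
# Rough cost model (an illustration, not a measurement) for C = A * B with
# n-by-n double matrices, as in MatTest.m. Assumes a classical O(n^3)
# multiply and that A, B and C each cross the host-coprocessor link once.
matmul_costs() {
    local n=$1
    local bytes_per_matrix=$(( n * n * 8 ))       # 8 bytes per double
    local flops=$(( 2 * n * n * n ))              # n^3 multiply-add pairs
    local moved=$(( 3 * bytes_per_matrix ))       # A and B in, C out
    echo "bytes_per_matrix=$bytes_per_matrix flops=$flops flops_per_byte=$(( flops / moved ))"
}

# Usage:
#   matmul_costs 10000
#   matmul_costs 1000
```

For n = 10000 each matrix takes 800 MB and the model gives about 833 operations per transferred byte, against about 83 for n = 1000, which is why large products like the one in ''!MatTest.m'' are the best candidates for offload.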

'''Note: Since the number of MATLAB licenses is limited, it is recommended to compile your MATLAB code into an executable.'''

See [https://wiki.hpc.tulane.edu/trac/wiki/cypress/Matlab#CompiledMatlab Compiled MATLAB].

----