Changes between Version 2 and Version 3 of Workshops/JobParallelism/WhileYourJobIsRunning


Timestamp: 01/19/26 12:50:07
Author: Carl Baribault
Comment: added Example 3 case for unused node

  • Workshops/JobParallelism/WhileYourJobIsRunning

  ||--mem=128||

- == Example 1: an idev job's core efficiency: (actual core usage) / (requested core allocation)
+ == Core efficiency for running jobs: (actual core usage) / (requested core allocation)
+
+ === Example 1: an idev job for an idle interactive session
  1. Log in to Cypress.
  2. Use the SLURM '''squeue''' command to determine your job's node list - in this case an idle interactive session.
     
  This is quite far from the ideal value, 1 - not very good usage of the node's 20 requested cores.

- == Example 2: a running batch job's core efficiency ==
+ === Example 2: a running batch job using R requesting 1 node ===

  The following is an example of the core usage for the R sampling code (see [wiki:cypress/R#PassingSLURMEnvironmentVariables here]) requesting 16 cores.
     
               JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             3289740 workshop7        R tulaneID  R       0:00      1 cypress01-009
+ # note - the result below appeared only after several attempts; earlier runs of top reported just %CPU ~= 100.0
  [tulaneID@cypress1 R]$ssh cypress01-009 'top -b -n 1 -u $USER' | \
  awk 'NR > 7 { sum_cpu += $9; sum_mem += $10 } \

  Total %CPU: 1556.6
  Total %MEM: 3.3
  }}}

  }}}
  This is quite close to the ideal value, 1 - fairly good usage of the node's 16 requested cores.
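The top/awk pipeline above can be tried without access to a compute node. The sketch below feeds two made-up process lines (the PIDs and figures are illustrative, not taken from the session above) through the same awk script, mimicking the `NR > 7` skip over the header lines that `top -b` prints before its process table; on a real node the input would instead come from `ssh <node> 'top -b -n 1 -u $USER'`.

```shell
# Self-contained sketch of the summing step used in the examples.
# The two process lines are fabricated for illustration only.
{
  # stand-ins for the 7 summary/header lines that 'top -b' prints first
  for i in 1 2 3 4 5 6 7; do echo "top header line $i"; done
  # PID  USER     PR NI VIRT RES  SHR S %CPU %MEM TIME+ COMMAND
  echo "1001 tulaneID 20  0 1.2g 100m 10m R 99.9  0.2  0:10 R"
  echo "1002 tulaneID 20  0 1.2g 100m 10m R 98.7  0.2  0:10 R"
} | awk 'NR > 7 { sum_cpu += $9; sum_mem += $10 } \
END { print "Total %CPU:", sum_cpu; print "Total %MEM:", sum_mem }'
# prints: Total %CPU: 198.6
#         Total %MEM: 0.4
```

Fields 9 and 10 of each process row are %CPU and %MEM, which is why the awk script sums `$9` and `$10`.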


=== Example 3: same R code requesting 2 nodes - 1 node unused ===

The following uses the same R sampling code as above (see [wiki:cypress/R#PassingSLURMEnvironmentVariables here]), requesting 16 cores and 2 nodes ('''--nodes=2'''), one of which is left unused.

{{{
[tulaneID@cypress1 R]$diff bootstrap.sh bootstrap2nodes.sh
7c7
< #SBATCH --nodes=1               # Number of Nodes
---
> #SBATCH --nodes=2               # Number of Nodes
[tulaneID@cypress1 R]$sbatch bootstrap2nodes.sh
Submitted batch job 3289779
[tulaneID@cypress1 R]$squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           3289779 workshop7        R tulaneID  R       0:03      2 cypress01-[009-010]
[tulaneID@cypress1 R]$ssh cypress01-009 'top -b -n 1 -u $USER' | \
awk 'NR > 7 { sum_cpu += $9; sum_mem += $10 } \
END { print "Total %CPU:", sum_cpu; print "Total %MEM:", sum_mem }'
Total %CPU: 1587.6
Total %MEM: 3.3
[tulaneID@cypress1 R]$ssh cypress01-010 'top -b -n 1 -u $USER' | \
awk 'NR > 7 { sum_cpu += $9; sum_mem += $10 } \
END { print "Total %CPU:", sum_cpu; print "Total %MEM:", sum_mem }'
Total %CPU: 13.3
Total %MEM: 0
}}}

The resulting core efficiency is
{{{
[tulaneID@cypress1 R]$bc <<< "scale=3; (1587.6 / 100) / 16"
.992
[tulaneID@cypress1 R]$bc <<< "scale=3; (13.3 / 100) / 16"
.008
}}}
Result:
 * On the first node, cypress01-009, usage is '''nearly ideal''' (.992 ~= 1.0).
 * On the second node, cypress01-010, usage is '''nearly non-existent''' (.008 ~= 0.0).
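Where '''bc''' is not available, the same per-node efficiency figures can be computed with awk; this is just an alternative phrasing of the calculation above, using the measured %CPU totals and the 16 requested cores.

```shell
# Core efficiency = (summed %CPU / 100) / (requested cores),
# with the same numbers as the bc calls above.
awk -v cpu=1587.6 -v cores=16 'BEGIN { printf "%.3f\n", (cpu / 100) / cores }'  # prints 0.992
awk -v cpu=13.3   -v cores=16 'BEGIN { printf "%.3f\n", (cpu / 100) / cores }'  # prints 0.008
```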

== Running R on multiple nodes ==
See also [wiki:/cypress/R#RunningRonmultiplenodes Running R on multiple nodes].