Changes between Version 9 and Version 10 of cypress/using

Apr 8, 2015 4:07:59 PM (7 years ago)


  • cypress/using

    v9 v10  
    2626For those who are new to cluster computing and resource management, let's begin with an explanation of what a resource manager is and why it is necessary. Suppose you have a piece of C code that you would like to compile and execute, for example a helloworld program.
    28 [[Image(helloworld.png, 50%)]]
     31int main(){
     32             printf("Hello World\n");
     33             return 0;
    3137On your desktop you would open a terminal, compile the code using your favorite C compiler, and run the resulting executable. You can do this without worry because you are the only person using your computer and you know what demands are being made on your CPU and memory at the time you run your code. On a cluster, many users must share the available resources equitably and simultaneously. It's the job of the resource manager to choreograph this sharing by accepting a description of your program and the resources it requires, searching the available hardware for resources that meet your requirements, and making sure that no one else is given those resources while you are using them.
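The desktop workflow described above can be sketched in a few shell commands. This is only an illustration: the file name helloworld.c, the executable name helloworld, and the use of cc as the C compiler are assumptions, and the source file is written out in full so that the sketch stands on its own.

```shell
#!/bin/bash
# Assumption: the source file is called helloworld.c (name chosen for illustration).
cat > helloworld.c <<'EOF'
#include <stdio.h>

int main(){
    printf("Hello World\n");
    return 0;
}
EOF

# Assumption: "cc" is the system's C compiler; substitute your preferred one.
if command -v cc >/dev/null 2>&1; then
    cc helloworld.c -o helloworld   # compile the source into an executable
    ./helloworld                    # run it in the current directory
else
    echo "No C compiler named cc was found" >&2
fi
```

On a personal machine nothing else is needed; on a cluster the last step is what gets handed to the resource manager instead of being run directly.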
    3743The syntax of these script directives is manager-specific. For the SLURM resource manager, all script directives begin with "#SBATCH". Let's look at a basic SLURM script that requests one node and one core on which to run our helloworld program.
    39  [[Image(hello_srun.png, 50%)]]
     47#SBATCH --job-name=HiWorld    ### Job Name
     48#SBATCH --output=Hi.out       ### File in which to store job output
     49#SBATCH --error=Hi.err        ### File in which to store job error messages
     50#SBATCH --qos=normal          ### Quality of Service (like a queue in PBS)
     51#SBATCH --time=0-00:01:00     ### Wall clock time limit in Days-HH:MM:SS
     52#SBATCH --nodes=1             ### Node count required for the job
     53#SBATCH --ntasks-per-node=1   ### Number of tasks to be launched per Node
    4258Notice that the SLURM script begins with #!/bin/bash. This line tells Linux which shell interpreter to use to run the script. In this example we use Bash (the Bourne Again Shell). The choice of interpreter (and the syntax that follows from it) is up to the user, but every SLURM script should begin with an interpreter line of this kind. This is followed by a collection of #SBATCH script directives telling the manager about the resources needed by our code and where to put the code's output. Lastly, we have the executable we wish the manager to run (note: this script assumes it is located in the same directory as the executable).
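The three-part layout described above (interpreter line, #SBATCH directives, executable) can be sketched by writing the script to a file and inspecting it. The directives are the ones shown on this page and the file name helloworld.srun is the one used here; the executable name ./helloworld is an assumption.

```shell
#!/bin/bash
# Write the example job script to helloworld.srun.
# Assumption: the compiled executable is named ./helloworld.
cat > helloworld.srun <<'EOF'
#!/bin/bash
#SBATCH --job-name=HiWorld    ### Job Name
#SBATCH --output=Hi.out       ### File in which to store job output
#SBATCH --error=Hi.err        ### File in which to store job error messages
#SBATCH --qos=normal          ### Quality of Service (like a queue in PBS)
#SBATCH --time=0-00:01:00     ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1             ### Node count required for the job
#SBATCH --ntasks-per-node=1   ### Number of tasks to be launched per Node

./helloworld
EOF

# To Bash, every "#SBATCH" line is an ordinary comment; only SLURM acts on it.
# Count the directives the manager would read from the script.
directive_count=$(grep -c '^#SBATCH' helloworld.srun)
echo "Directives found: $directive_count"

# Check the script's shell syntax without executing it.
bash -n helloworld.srun && echo "Script syntax OK"
```

On a cluster, the saved file would then be handed to the manager with SLURM's sbatch command, for example sbatch helloworld.srun.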
    4864Our job was successfully submitted and was assigned the job number 6041. We can check the output of our job by examining the contents of our output and error files. Referring back to the helloworld.srun SLURM script, notice the lines
    50 [[Image(output_error.png, 50%)]]
     67#SBATCH --output=Hi.out       ### File in which to store job output
     68#SBATCH --error=Hi.err        ### File in which to store job error messages
    5271These directives specify the files in which to store whatever the job writes to standard output and standard error, respectively. If our code ran without issue, then the Hi.err file should be empty and the Hi.out file should contain our greeting.
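The check described above can be sketched as a small shell function. The function name check_job_output is hypothetical and not part of SLURM; the file names default to the Hi.out and Hi.err values used in the script. Note that this sketch also treats a missing error file as "no errors".

```shell
#!/bin/bash
# Hypothetical helper (not a SLURM command): inspect a finished job's files.
# Usage: check_job_output [output_file] [error_file]
check_job_output() {
    local out_file="${1:-Hi.out}"
    local err_file="${2:-Hi.err}"

    # "-s" is true only when the file exists and is not empty.
    if [ -s "$err_file" ]; then
        echo "Errors were written to $err_file:"
        cat "$err_file"
        return 1
    fi

    echo "No errors reported. Contents of $out_file:"
    cat "$out_file"
}
```

After the job has finished, calling check_job_output with no arguments from the submission directory prints the greeting when the error file is empty, and prints the error messages otherwise.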