

About Cypress

Cypress is Tulane's newest HPC cluster, offered by Technology Services for use by the Tulane research community. It is a 124-node cluster, with each node providing dual 10-core 2.8 GHz Intel Xeon E5-2680 v2 CPUs, 64 GB of RAM, and dual Xeon Phi 7120P coprocessors. Nodes are interconnected on a 40 Gigabit Ethernet network using a single Dell Z9500 Ethernet Fabric Switch.

Getting an account

All accounts must be associated with a Tulane faculty member who is the Principal Investigator (PI) of his/her research group. Please have your PI contact us to establish a group on Cypress. (If some members of your group have Cypress accounts, your group is already established.) After your group is established, visit Tulane Service Now at:

https://tulane.service-now.com/

to request an account. After logging in with your Tulane credentials, go to the Self-Service section and enter the Service Catalog. Select "Account & ID Management", then "Cypress HPC Request" and fill out the form.

Logging in to Cypress after your account is created

Cypress provides two login nodes:

cypress1.tulane.edu
cypress2.tulane.edu

You will need a secure shell (SSH) client to log in. Your username and password will be the same as those for your Tulane user account. To reset your password, visit:

https://password.tulane.edu/

Note that this will also change your password for your e-mail and Single Sign On applications.
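
For example, from a terminal on Linux or macOS you can connect with the standard ssh command (replace "yourusername" with your own Tulane username; it is only a placeholder here):

ssh yourusername@cypress1.tulane.edu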

Storage: home directory

Your home directory on Cypress is intended to store customized source code, binaries, scripts, analyzed results, manuscripts, and other small but important files. This directory is limited to 10 GB (10,000 MB), and is backed up. To view your current quota and usage, run the command:

quota -f /home

Please do not use your home directory to perform simulations with heavy I/O (file read/write) usage. Instead, please use your group's Lustre project directory.
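
If you are close to your quota, standard utilities can show which directories in your home take up the most space. For example (assuming GNU coreutils, which provides the -h option to sort):

du -sh ~/* | sort -h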

Storage: Lustre group project directory

Cypress has a 699 TB Lustre filesystem available for use by active jobs and for sharing large files within a research group. The Lustre filesystem has 2 Object Storage Servers (OSSs) which provide file I/O for 24 Object Storage Targets (OSTs). The filesystem is available to compute nodes via the 40 Gigabit Ethernet network. The default stripe count is set to 1.
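
If your workflow reads or writes very large files, you can inspect or change the striping of a directory with the standard Lustre lfs commands. The sketch below is illustrative only: <directory> is a placeholder for a directory in your project space, and the stripe count of 4 is an arbitrary example. Setting a stripe count on a directory affects new files created in it.

lfs getstripe <directory>
lfs setstripe -c 4 <directory>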

Allocations on this filesystem are provided per project/research group. Each group is given a space allocation of 1 TB and an inode allocation of 1 million (i.e. up to 1 million files or directories) on the Lustre filesystem. If you need additional disk space to run your computations, your PI may request a quota adjustment. To request a quota adjustment, please provide details and an estimate of the disk space used/required by your computations. Allocations are based on demonstrated need and available resources.

The Lustre filesystem is not for bulk or archival storage of data. The filesystem is configured for redundancy and hardware fault tolerance, but is not backed up. If you require space for bulk / archival storage, please contact us, and we will take a look at the available options.

Your group's Lustre project directory will be at:

/lustre/project/<your-group-name>

"your-group-name" is your Linux group name, as returned by the command "id -gn". Your group is free to organize your project directory as desired, but it is recommended to create separate subfolders for different sets of data, or for different groups of simulations.

To view your group's current usage and quota, run the command:

lfs quota -g `id -gn` /lustre

To view your own usage, you can run:

lfs quota -u `id -un` /lustre

Software on Cypress

Software on Cypress is organized using the Environment Modules (modules) package. The modules command is "module". To see the available software, run:

module avail

To load a specific package for use, you can run "module add", for example:

module add intel-psxe

to load Intel Parallel Studio.
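
Other standard module subcommands are useful for managing your environment, for example:

module list
module show intel-psxe
module unload intel-psxe

These list the currently loaded modules, show what a module sets, and unload a module, respectively.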

Specialized software requirements

  • Gaussian 09—Use of the Gaussian 09 software requires you to be in the gaussian group. Contact us to have your account added.
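
To check whether your account is already in the gaussian group, you can list your group memberships with the standard id command:

id -Gn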

SLURM (resource manager)

Cypress uses SLURM to manage job submission. For general information and documentation, see the SLURM website:

http://slurm.schedmd.com/documentation.html

A SLURM QOS (quality of service) is similar to a queue in other resource managers (e.g. TORQUE). Cypress has several QOSs available for jobs. Each QOS has limits on the requestable resources, as shown below:

QOS limits
QOS name      maximum job size (node-hours)   maximum walltime per job   maximum nodes per user
interactive   N/A                             1 hour                     1
normal        N/A                             24 hours                   18
long          168                             168 hours                  8

The "normal" QOS is intended for users running parallel programs. This is the preferred type of usage to best take advantage of Cypess's parallel processing capability. Each user is limited to simultaneous usage of 18 nodes (360 cores) over all his/her jobs.

The "long" QOS is intended for jobs which are not very parallel, but have longer runtime. Each job has a job-size limit of 168 node-hours, calculated as the number of nodes requested multiplied by number of hours requested. For example a job submitted to the long QOS may request 1 node for 7 days, or 2 nodes for 3.5 days. You are limited to 8 nodes across all your jobs in this QOS. So, for example, you may run up to 8 jobs each using 1 node and running for 7 days.

The "interactive" QOS is intended to be used for testing SLURM script submission, and is limited to 1 job per user. To use it, set your partition and QOS to "interactive". For example, with idev:

export MY_PARTITION=interactive
export MY_QUEUE=interactive
idev

Be sure to unset these variables to resume normal job submission:

unset MY_PARTITION
unset MY_QUEUE
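
If you prefer plain SLURM commands over idev, an interactive shell on a compute node can also be requested directly with srun; the sketch below assumes the interactive partition and QOS accept such a request:

srun --partition=interactive --qos=interactive -N 1 -n 1 --time=01:00:00 --pty bash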

Other QOSs on the system are for staff testing use only.

Job scheduling and priority

We would like each of our research groups to have equal opportunity to use the cluster. Rather than giving each research group a fixed allocation of CPU-time (where the ability to run jobs is cut off once the allocation is reached), SLURM uses a "Fair-share" feature to attempt to give each research group its fair share of resources. Each job has a priority, a number that determines which queued jobs are scheduled to run first.

You may use the "sprio" command to see the priority of queued jobs. For example, the command:

sprio -o "%Y %u %i" | sort -nr

will return a list of queued jobs in priority order, and

sprio -j <jobid>

(where <jobid> should be replaced by the actual Job ID) will show the components which go into the priority. These components are:

  • Fair-share: Fair-share is based on historical usage. For details on SLURM's Fair-share implementation, see https://slurm.schedmd.com/priority_multifactor.html#fairshare. In short, the more CPU-time a group has used recently, the lower the priority of its subsequent jobs (temporarily) becomes. SLURM uses a half-life decay parameter so that more recent usage is weighted more strongly; on Cypress this half-life is set to 1 week.
  • Age: Jobs that have been waiting in the queue longer get higher priority.
  • Job Size: Larger jobs (i.e. jobs with more CPUs/nodes requested) have higher priority, to favor jobs that take advantage of parallel processing (e.g. MPI jobs).

SLURM calculates each priority component as a fraction (value between 0 and 1), which is then multiplied by a weight. The current weights are: Fair-share: 100,000; Age: 10,000; Job Size: 1,000. That is, Fair-share is the major contributor to priority. The weighted components are added to give the final priority.
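
As an illustration with made-up fractions, a job with a Fair-share factor of 0.5, an Age factor of 0.2, and a Job Size factor of 0.1 would receive a priority of 100,000 × 0.5 + 10,000 × 0.2 + 1,000 × 0.1 = 52,100. To see your group's recorded usage and fair-share factor, you can use the standard sshare command, for example:

sshare -u `id -un`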

Running serial jobs

If you are running a large number of serial jobs, it is recommended to submit them as a job array to make the best use of your allocated resources. For example, suppose you have 100 scripts in a "scripts" folder, each of which performs a serial calculation: scripts/run1.sh, scripts/run2.sh, …, scripts/run100.sh. You would create an sbatch script named "run100scripts.srun" with the following contents:

#!/bin/bash
#SBATCH -J array_example
#SBATCH --array=0-4          # 5 array tasks, numbered 0 through 4
#SBATCH -N 1                 # 1 node per array task
#SBATCH -n 20                # 20 tasks (cores) per array task
#SBATCH --time=01:00:00

# Launch 20 copies of runscript.sh; each copy runs a different serial script.
srun ./runscript.sh

The contents of the script "runscript.sh" would be:

#!/bin/bash

# Map this task to a unique run number in 1..100:
# array task ID (0-4) * tasks per array job (20) + task rank within the job (0-19) + 1
RUNNUMBER=$((SLURM_ARRAY_TASK_ID*SLURM_NTASKS + SLURM_PROCID + 1))
./scripts/run$RUNNUMBER.sh

Make sure your scripts have executable permissions. Then, submitting with:

sbatch run100scripts.srun

will run the 100 scripts as a job array of 5 array tasks, each launching 20 parallel tasks (one script per task).
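
After submitting, you can check the state of the array tasks with the standard squeue command, for example:

squeue -u `id -un`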

Requesting memory for your job

Our standard nodes on Cypress will allow you to use up to 64 GB of memory per node (3.2 GB per core requested). This should be sufficient for many types of jobs, and you do not need to do anything if your job uses less than this amount of memory. If your jobs require more memory to run, use the --mem option of sbatch to request a larger amount of memory. For example, to request 16 GB of memory per node, put the following in your sbatch script:

#SBATCH --mem=16000

If you need more than 64 GB of memory per node, we have a few larger memory nodes available. To request 128 GB nodes for your job, put the following in your sbatch script:

#SBATCH --mem=128000

or, to request 256 GB memory nodes, use the following:

#SBATCH --mem=256000

We have a limited number of the larger memory nodes, so please only request a larger amount of memory if your job requires it. You can ask SLURM for an estimate of the amount of memory used by jobs you have previously run with the command sacct -j <jobid> -o maxvmsize. For example:

$ sacct -j 2660 -o maxvmsize
 MaxVMSize 
---------- 
           
 39172520K

This shows that job 2660 allocated close to 40 GB of memory.
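
MaxVMSize reports peak virtual memory; if you also want the peak resident memory, sacct can report the standard MaxRSS field alongside it, for example:

sacct -j <jobid> -o JobID,MaxRSS,MaxVMSize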

Citation / Acknowledgment

We would greatly appreciate acknowledgment in papers, posters, presentations, or reports resulting from the use of Tulane HPC resources. We suggest something similar to the following:

"This research was supported in part using high performance computing (HPC) resources and services provided by Technology Services at Tulane University, New Orleans, LA."

We would also appreciate informing us of any publications resulting from use of Tulane HPC resources. This information is important for acquiring funding for new resources.

Getting help / Contact us

Contact us by email at: hpcadmin (at) tulane.edu
