wiki:cypress/about


About Cypress

Cypress is Tulane's newest HPC cluster, offered by Technology Services for use by the Tulane research community. It is a 124-node cluster, with each node providing dual 10-core 2.8 GHz Intel Xeon E5-2680 v2 CPUs, 64 GB of RAM, and dual Xeon Phi 7120P coprocessors. Nodes are interconnected on a 40 Gigabit Ethernet network using a single Dell Z9500 Ethernet Fabric Switch.

Getting an account

Resource allocations on Cypress are determined by a faculty committee, which decides which projects may use the cluster.

All accounts must be part of a project. A Tulane faculty member (the principal investigator, or PI) may create a project for members of his or her research group. Please send an email to hpcadmin (at) tulane.edu for details.

Logging in to Cypress after your account is created

Cypress provides two login nodes:

cypress1.tulane.edu
cypress2.tulane.edu

You will need a secure shell (SSH) client to log in. Your username and password are the same as those for your Tulane user account. To reset your password, visit:

https://password.tulane.edu/

Note that this will also change your password for your e-mail and Single Sign-On applications.

Storage: home directory

Your home directory on Cypress is intended to store customized source code, binaries, scripts, analyzed results, manuscripts, and other small but important files. This directory is limited to 10 GB (10,000 MB), and is backed up. Use the "quota" command to view your current usage.
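For example, to report your current home directory usage in human-readable units (the -s flag is a standard option of the Linux quota command; available flags may vary with the installed version):

quota -s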

Storage: Lustre filesystem

Cypress has a 699 TB Lustre filesystem available for use by active jobs and for sharing large files among active jobs within a research group. The Lustre filesystem has 2 Object Storage Servers (OSSs), which provide file I/O for 24 Object Storage Targets (OSTs). It is available to compute nodes via the 40 Gigabit Ethernet network. The default stripe count is set to 1.
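Because the default stripe count is 1, each file is stored on a single OST unless you change the layout. Large files that are read or written in parallel by many processes can sometimes benefit from being striped across several OSTs. As a rough sketch (the paths and the stripe count of 4 below are only illustrative; the right choice depends on your workload), you can inspect and set striping with the standard lfs commands:

lfs getstripe <file-or-directory>        # show the current striping layout
lfs setstripe -c 4 <directory>           # new files created here will be striped across 4 OSTs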

Allocations on this filesystem are provided per project/research group. Each group is given a free allocation of XXX GB on the Lustre filesystem. If you need additional disk space to run your computations, your PI may request a quota adjustment. To request a quota adjustment, please provide details and an estimate of the disk space used/required by your computations. Allocations are based on demonstrated need and available resources.

The Lustre filesystem is not intended for bulk or archival storage of data. It is configured for redundancy and hardware fault tolerance, but it is not backed up. If you require space for bulk or archival storage, please contact us and we will review the available options.

Your group's Lustre project directory will be at:

/lustre/project/<your-group-name>

"your-group-name" is your Linux group name, as returned by the command "id -gn". Your group is free to organize your project directory as desired, but it is recommended to create separate subfolders for different sets of data, or for different groups of simulations.

To view your group's current usage and quota, run the command:

lfs quota -g `id -gn` /lustre

To view your own usage, you can run:

lfs quota -u `id -un` /lustre

Software on Cypress

Software on Cypress is organized using the Environment Modules (modules) package. The modules command is "module". To see the available software, run:

module avail

To load a specific package for use, you can run "module add", for example:

module add intel-psxe

to load Intel Parallel Studio.
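A few other module subcommands, standard in the Environment Modules package, are useful for managing your environment:

module list              # show the modules currently loaded in your session
module show intel-psxe   # display what loading the module would change
module rm intel-psxe     # unload the module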

SLURM (resource manager)

Cypress uses SLURM to manage job submission. For general information and documentation, see the SLURM website:

http://slurm.schedmd.com/documentation.html

To submit jobs using SLURM, you must first load the slurm module. To do this, on a login node run:

module add slurm
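Once the slurm module is loaded, batch jobs are submitted with sbatch and monitored with squeue (standard SLURM commands; the script name and job ID below are only placeholders, and a complete example script is shown after the QOS table below):

sbatch myjob.sh      # submit a batch script to the scheduler
squeue -u $USER      # list your pending and running jobs
scancel <jobid>      # cancel a job by its numeric job ID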

A SLURM QOS (quality of service) is similar to a queue in other resource managers (e.g., TORQUE). Cypress has several QOSs available for jobs. Each QOS has limits on the requestable resources, as shown below:

QOS limits
QOS name      Maximum job size (node-hours)   Maximum walltime per job   Maximum nodes per user
interactive   N/A                             4 hours                    1
normal        N/A                             24 hours                   12
long          168                             168 hours                  8

The "normal" QOS is intended for users running parallel programs. This is the preferred type of usage to best take advantage of Cypess's parallel processing capability. Each user is limited to simultaneous usage of 12 nodes (240 cores) over all his/her jobs.

The "long" QOS is intended for jobs which are not very parallel, but have longer runtime. Each job has a job-size limit of 168 node-hours, calculated as the number of nodes requested multiplied by number of hours requested. For example a job submitted to the long QOS may request 1 node for 7 days, or 2 nodes for 3.5 days. You are limited to 8 nodes across all your jobs in this QOS. So, for example, you may run up to 8 jobs each using 1 node and running for 7 days.

The "interactive" QOS is intended to be used for testing SLURM script submission, and is limited to 1 node and 4 hours.

Other QOSs on the system are reserved for staff testing.
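Putting these pieces together, here is a minimal sketch of a batch script for the "normal" QOS (the job name, resource numbers, loaded module, and program name are only illustrative; adjust them for your own work):

#!/bin/bash
#SBATCH --job-name=my_job        # illustrative job name
#SBATCH --qos=normal             # QOS from the table above
#SBATCH --nodes=2                # number of nodes requested
#SBATCH --ntasks-per-node=20     # one task per core (20 cores per node)
#SBATCH --time=24:00:00          # walltime, up to 24 hours in the normal QOS
#SBATCH --output=my_job-%j.out   # %j is replaced by the job ID

module add intel-psxe            # load the software environment your program needs
srun ./my_program                # launch the program on the allocated nodes

Save the script (for example, as myjob.sh) and submit it with "sbatch myjob.sh".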

Requesting memory for your job

If your jobs require a significant amount of memory (more than about 16 GB per node), we recommend that you explicitly request the amount of memory needed. To do this, use the --mem option of sbatch and specify the required number of megabytes (MB) of memory per node. For example, to request 32 GB of memory per node, include the following line in your sbatch script:

#SBATCH --mem=32000

You can request up to 64 GB of memory per node for your jobs in this way.

If your jobs require more than 64 GB of memory per node, a small number of large-memory nodes are available for testing on an experimental basis. To use these nodes, add a request for the "bigmem" partition to your job. For example, the following requests 128 GB of memory per node (the maximum available at this time):

#SBATCH --mem=128000
#SBATCH --partition=bigmem

Due to the very limited number of large-memory nodes at this time, only the "interactive" and "normal" QOSs are available when using the "bigmem" partition.

Citation / Acknowledgment

We would greatly appreciate acknowledgment in papers, posters, presentations, or reports resulting from the use of Tulane HPC resources. We suggest something similar to the following:

"This research was supported in part using computational resources and services provided by Tulane University, New Orleans, LA."

We would also appreciate being informed of any publications resulting from use of Tulane HPC resources. This information is important for acquiring funding for new resources.

Getting help

Contact us by email at: hpcadmin (at) tulane.edu
