== About Cypress ==

Cypress is Tulane's newest HPC cluster, offered by Technology Services for use by the Tulane research community.  It is a 124-node cluster, with each node providing dual 10-core 2.8 GHz Intel Xeon E5-2680 v2 CPUs, 64 GB of RAM, and dual Xeon Phi 7120P coprocessors.  Nodes are interconnected on a 40 Gigabit Ethernet network using a single Dell Z9500 Ethernet Fabric Switch.

== Getting an account ==

Resource allocations on Cypress are determined by a faculty committee, which decides which projects may use the cluster.

All accounts must be part of a project.  A Tulane faculty member (the PI, principal investigator) may create a project for members of his/her research group.  Please send an email to hpcadmin@tulane.edu for details.

== Logging in to Cypress after your account is created ==

Cypress provides two login nodes:

{{{
cypress1.tulane.edu
cypress2.tulane.edu
}}}

You will need a secure shell (SSH) client to log in, and you must be on the Tulane campus network or connected to the Tulane VPN.  Your username and password are the same as those for your Tulane user account.  To reset your password, visit:

https://password.tulane.edu/

Note that this will change your password for your e-mail and Single Sign On applications.
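
For example, from a terminal on a machine with an SSH client, you can connect to one of the login nodes as follows (replace "yourusername" with your Tulane username):

{{{
ssh yourusername@cypress1.tulane.edu
}}}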

== Storage: home directory ==

Your home directory on Cypress is intended to store customized source code, scripts, analyzed results, manuscripts, and other small but important files.  This directory is limited to 10 GB (10,000 MB), and is backed up.  Use the "quota" command to view your current usage.
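
For example, on a login node:

{{{
quota -s
}}}

(The "-s" flag, where supported, reports sizes in human-readable units; plain "quota" works as well.)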

== Storage: Lustre filesystem ==

Cypress has a 700 TB Lustre filesystem available for use by active jobs, and for sharing large files for active jobs within a research group.  The Lustre filesystem has 2 Object Storage Servers (OSSs) which provide file I/O for 24 Object Storage Targets (OSTs).  The Lustre filesystem is available to compute nodes via the 40 Gigabit Ethernet network.  The default stripe count is set to 1.
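
Striping can be inspected, and changed for new files, with the standard Lustre "lfs" tool.  The commands below are a generic Lustre sketch, not a Cypress-specific recommendation, and the subdirectory name is just a placeholder:

{{{
# Show the current striping settings of your project directory
lfs getstripe /lustre/project/<your-group-name>

# Stripe new files created under a subdirectory across 4 OSTs
lfs setstripe -c 4 /lustre/project/<your-group-name>/large-files
}}}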

Allocations on this filesystem are provided per project/research group.  Each group is given a free allocation of XXX GB on the Lustre filesystem.  If you need additional disk space to run your computations, your PI may request an adjustment.  To request an adjustment, please provide details and an estimate of the disk space used/required by your computations.  Allocations are based on demonstrated need and available resources.

The Lustre filesystem is not for bulk or archival storage of data.  The filesystem is configured for redundancy and hardware fault tolerance, but is not backed up.  If you require space for bulk / archival storage, please contact us, and we will take a look at the available options.

Your group's Lustre project directory will be at:

{{{
/lustre/project/<your-group-name>
}}}

To view your group's current usage and quota, run the command:

{{{
lfs quota -g `id -gn` /lustre
}}}

To view your own usage, you can run:

{{{
lfs quota -u `id -un` /lustre
}}}

== SLURM (resource manager) ==

Cypress uses SLURM to manage job submission.  For general information and documentation, see the SLURM website:

http://slurm.schedmd.com/documentation.html

A SLURM QOS (quality of service) is similar to a queue in other resource managers (e.g. TORQUE).  Cypress has several QOSs available for jobs.  Each QOS has limits on the requestable resources, as shown below:

||||||||= '''QOS limits''' =||
|| '''QOS name''' || '''maximum job size (node-hours)''' || '''walltime''' || '''maximum nodes per user''' ||
|| interactive || N/A ||   4 hours ||  1 ||
|| normal      || 288 ||  24 hours || 24 ||
|| long        || 168 || 168 hours ||  8 ||

The "normal" QOS is intended for users running parallel programs.  This is the preferred type of usage to best take advantage of Cypress's parallel processing capability.  Each job has a job-size limit of 288 node-hours, calculated as the number of nodes requested multiplied by the number of hours requested.  For example, a job submitted to the normal QOS may request 12 nodes for 24 hours, or 24 nodes for 12 hours.  Regardless of job size, each user is limited to simultaneous usage of 24 nodes (480 cores) over all his/her jobs.

The "long" QOS is intended for jobs that are not very parallel but have longer runtimes.  The maximum walltime for a job is 168 hours (7 days), and the job size is limited to 168 node-hours.  So, for example, a job may request 1 node for 7 days, or 2 nodes for 3.5 days.

The "interactive" QOS is intended to be used for testing SLURM script submission, and is limited to 1 node and 4 hours.

== Large memory jobs ==

If your jobs require more than 64 GB of memory (RAM), please contact us with details of your simulations.  We have a number of large-memory nodes in an experimental/testing stage.
== Getting help ==

Contact us by email at:  hpcadmin@tulane.edu