Changes between Version 10 and Version 11 of cypress/WGSA

Jul 20, 2020 3:02:42 PM (19 months ago)


wget --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
Guidance for using external resources (COSMIC, SPIDEX, CADD indel, dbNSFP) can be found [ here].

=== Procedure to run on Cypress ===
==== 1. Prepare input files ====
Two input files are needed: a variant file and a configuration/setting file.
The standard variant file is a plain text file with TAB-delimited columns (tsv format).

An example variant file, 'clinvar_subset.txt', can be downloaded [ here].
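
Because the variant file must be TAB-delimited, a quick consistency check before running the pipeline can catch files that were saved with spaces instead of tabs. The following is a minimal sketch, not part of WGSA: `check_tsv` is a hypothetical helper name, and it assumes every line of the file (including any header line) should have the same number of columns.

```shell
# Hypothetical helper (not part of WGSA): verify that every line of a file
# has the same number of TAB-delimited columns as the first line.
check_tsv() {
    awk -F'\t' '
        NR == 1  { n = NF }                      # remember the column count of line 1
        NF != n  { printf "line %d has %d columns, expected %d\n", NR, NF, n; bad = 1 }
        END      { exit (bad ? 1 : 0) }          # non-zero exit status on any mismatch
    ' "$1"
}
```

For example, `check_tsv clinvar_subset.txt` prints nothing and returns success when the file is consistently TAB-delimited.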

A setting/configuration file is a plain text file in which the user provides the name of the input file, the name of the output file, the directories of the various resources, and the annotation options. Example template files can be found [ here].

To run the pipeline on a local machine, '''the directory settings (lines 3 to 9) must be modified''' to reflect the absolute paths to the corresponding directories on the local machine.

input file name:                    clinvar_subset.txt                                              #name of the input file
output file name:                   clinvar_subset.txt.annotated                                    #name of the output file
resources dir:                      /lustre/project/hpcstaff/fuji/WGSA/resources/                   #the location of the resources folder
annovar dir:                        /lustre/project/hpcstaff/fuji/WGSA/annovar2019Oct24/annovar/    #the location of ANNOVAR
snpeff dir:                         /lustre/project/hpcstaff/fuji/WGSA/snpeff/snpEff/               #the location of the snpEff snpEff.jar
vep dir:                            /lustre/project/hpcstaff/fuji/WGSA/vep/ensembl-vep-release-94/  #the location of the VEP va
.vep dir:                           /lustre/project/hpcstaff/fuji/WGSA/.vep/                        #the location of the .vep folder
tmp dir:                            /lustre/project/hpcstaff/fuji/WGSA/tmp/                         #the location of the tmp folder, used for VEP on-the-fly annotation
work dir:                           /lustre/project/hpcstaff/fuji/WGSA/work/                        #the location of the working folder, used for storing intermediate files
retain intermediate file:           b                            #supported option: snp or s, indel or i, both or b, no or n
ANNOVAR/Ensembl:                    b                            #supported option: snp or s, indel or i, both or b, no or n
ANNOVAR/RefSeq:                     b                            #supported option: snp or s, indel or i, both or b, no or n
ANNOVAR/UCSC:                       b                            #supported option: snp or s, indel or i, both or b, no or n

In the example above, $WGSA_DIR='/lustre/project/hpcstaff/fuji/WGSA'.
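
When adapting the settings file for a local machine, the directory entries all share the same install prefix, so one substitution can rewrite them together. The sketch below is an illustration only: `relocate_settings` is a hypothetical helper, `/opt/WGSA/` is a made-up local path, and the approach assumes neither prefix contains the `|` character or regular-expression metacharacters.

```shell
# Hypothetical helper: copy a settings file, replacing one install prefix
# with another. Arguments: old prefix, new prefix, input file, output file.
relocate_settings() {
    old_prefix=$1
    new_prefix=$2
    # '|' is used as the sed delimiter because the prefixes contain '/'.
    sed "s|${old_prefix}|${new_prefix}|g" "$3" > "$4"
}

# Example call (file names and /opt/WGSA/ are assumptions):
#   relocate_settings /lustre/project/hpcstaff/fuji/WGSA/ /opt/WGSA/ \
#       settings_cypress.txt settings_local.txt
```

Keeping the trailing slash on both prefixes avoids accidentally rewriting sibling directories whose names merely start with 'WGSA'.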

==== 2. Upload input files ====
Upload the two input files to Cypress. You can place them in any directory. Here we create a directory 'WGSA_TEST' under '/lustre/project/hpcstaff/fuji/':

mkdir /lustre/project/hpcstaff/fuji/WGSA_TEST
cd /lustre/project/hpcstaff/fuji/WGSA_TEST
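
After uploading, it can be worth confirming that both input files actually arrived in the working directory before submitting a long job. This is a small sketch under assumptions: `require_inputs` is a hypothetical helper, and `settings.txt` in the usage comment stands in for whatever name you gave your setting file.

```shell
# Hypothetical helper: check that each named file exists and is readable
# in the given directory. Returns non-zero if any file is missing.
require_inputs() {
    dir=$1
    shift
    missing=0
    for f in "$@"; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (settings.txt is an assumed file name):
#   require_inputs /lustre/project/hpcstaff/fuji/WGSA_TEST clinvar_subset.txt settings.txt
```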

==== 3. Create the pipeline Slurm job script ====
An example Slurm job script:

#SBATCH --job-name=WGSA       # Job Name
#SBATCH --output=WGSA.out     # File in which to store job output
#SBATCH --error=WGSA.err      # File in which to store job error messages
#SBATCH --qos=normal          # Quality of Service (like a queue in PBS)
#SBATCH --time=0-10:00:00     # Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1             # Node count required for the job
#SBATCH --ntasks-per-node=1   # Number of tasks to be launched per Node
#SBATCH --cpus-per-task=20    # Number of cores per task
#SBATCH --mem=128000          # Max RAM request 128GByte

# Module load
module load java-openjdk/1.8.0

# Set the directory where WGSA is installed
export WGSA_DIR=/lustre/project/hpcstaff/fuji/WGSA

# Set 'setting/configuration file'

# Setup
echo "Understand" | java -cp "$WGSA_DIR" WGSA085 "$SETTING_FILE" -m 128 -t 20 -v hg19

# Run job
sh "./${SETTING_FILE}.sh"

Save it under a name, for example 'Slurmscript', in the same directory where the two input files are placed.
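
The job script runs the WGSA setup step and then executes a shell script named after the setting file with a `.sh` suffix. If the setup step fails, that script may be absent, so a guard before the run step can make the failure easier to spot in 'WGSA.err'. This is a sketch only: `run_generated_script` is a hypothetical helper, and it assumes the setup step writes `<setting file>.sh` into the current directory, as the `sh` line in the script suggests.

```shell
# Hypothetical helper: run "./<setting file>.sh" only if it exists and
# is non-empty; otherwise report the problem and return non-zero.
run_generated_script() {
    script="./$1.sh"
    if [ ! -s "$script" ]; then
        echo "setup did not produce $script" >&2
        return 1
    fi
    sh "$script"
}

# Inside the job script this could replace the final 'sh' line:
#   run_generated_script "$SETTING_FILE"
```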

==== 4. Run the pipeline job script ====

sbatch Slurmscript

The job takes about 7 hours to finish.
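
Since the job runs for hours, it helps to keep the job ID that `sbatch` reports so the job can be checked later. The sketch below assumes the usual confirmation line of the form "Submitted batch job <id>"; `job_id_from_sbatch` is a hypothetical helper name, and the `squeue` and `tail` calls are shown as comments because they only make sense on the cluster.

```shell
# Hypothetical helper: extract the numeric job ID from sbatch's
# confirmation line, e.g. "Submitted batch job 12345".
job_id_from_sbatch() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Usage on Cypress (not run here):
#   jobid=$(job_id_from_sbatch "$(sbatch Slurmscript)")
#   squeue -j "$jobid"            # show whether the job is pending or running
#   tail -f WGSA.out WGSA.err     # follow the output files named in the job script
```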