[[PageOutline]]

= Installing and Setting Up WGSA in a Local Directory on Cypress =

These instructions are based on [https://sites.google.com/site/jpopgen/wgsa/setting-up-wgsa-linux this] page and adapted for Cypress.

Choose a directory dedicated to the pipeline, for example '/lustre/project/group/WGSA'. Set up an environment variable and create the workspaces:
{{{
export WGSA_DIR=/lustre/project/group/WGSA
mkdir $WGSA_DIR
cd $WGSA_DIR
mkdir work
mkdir tmp
chmod 777 work
chmod 777 tmp
}}}

Create a directory for ANNOVAR:
{{{
mkdir $WGSA_DIR/annovar2019Oct24
}}}

Download the ANNOVAR main package from [http://download.openbioinformatics.org/annovar_download_form.php here]. The package comes as annovar.latest.tar.gz; save it to $WGSA_DIR/annovar2019Oct24 and unpack it:
{{{
cd $WGSA_DIR/annovar2019Oct24
tar -zxvf annovar.latest.tar.gz
}}}

Download the !RefSeq, Ensembl, and UCSC gene models for ANNOVAR:
{{{
cd $WGSA_DIR/annovar2019Oct24/annovar
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar refGene humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar ensGene humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar knownGene humandb/
perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar refGene humandb/
perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar ensGene humandb/
perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar knownGene humandb/
}}}

Install !SnpEff (required for annotating indels with !SnpEff or annotating SNVs with !SnpEff on-the-fly).

Download the !SnpEff v4.3t main package and save the zip file to $WGSA_DIR/snpeff:
{{{
mkdir $WGSA_DIR/snpeff
cd $WGSA_DIR/snpeff
wget http://sourceforge.net/projects/snpeff/files/snpEff_v4_3t_core.zip
unzip snpEff_v4_3t_core.zip
}}}

To use a newer version of the Java SDK, you have to log in to a computing node. Start an interactive session:
{{{
idev -c 1 -t 4
}}}
The downloads in this session take more than one hour.
See [https://wiki.hpc.tulane.edu/trac/wiki/Workshops/IntroToHpc2015/using#SubmittingInteractiveJobs here] for more about 'idev'.

Once you are on a computing node, make sure your current directory is $WGSA_DIR/snpeff.

Download the !RefSeq and Ensembl gene models for !SnpEff:
{{{
module load java-openjdk/1.8.0
cd snpEff
java -jar snpEff.jar download -v hg19
java -jar snpEff.jar download -v GRCh37.75
java -jar snpEff.jar download -v hg38
java -jar snpEff.jar download -v GRCh38.86
}}}

Exit from the computing node:
{{{
exit
}}}

Install htslib, which is required for the VEP API:
{{{
mkdir $WGSA_DIR/htslib
cd $WGSA_DIR/htslib
wget https://github.com/samtools/htslib/releases/download/1.9/htslib-1.9.tar.bz2
tar -vxjf htslib-1.9.tar.bz2
cd htslib-1.9
make prefix=$WGSA_DIR/htslib install
}}}

Set up the environment variables:
{{{
export PATH=$WGSA_DIR/htslib/bin:$PATH
export CPATH=$WGSA_DIR/htslib/include:$CPATH
export LD_LIBRARY_PATH=$WGSA_DIR/htslib/lib:$LD_LIBRARY_PATH
}}}

Install VEP (required for annotating indels with VEP or annotating SNVs with VEP on-the-fly).

Download the VEP 94 main package and save it to $WGSA_DIR/vep:
{{{
mkdir $WGSA_DIR/vep
cd $WGSA_DIR/vep
wget https://github.com/Ensembl/ensembl-vep/archive/release/94.zip
unzip 94.zip
}}}

Install the VEP API to $WGSA_DIR/vep and download the !RefSeq and Ensembl gene models to $WGSA_DIR/.vep:
{{{
cd $WGSA_DIR/vep/ensembl-vep-release-94/
mkdir $WGSA_DIR/.vep
export DEST_DIR=$WGSA_DIR
export PERL5LIB=$WGSA_DIR
perl INSTALL.pl -c $WGSA_DIR/.vep --ASSEMBLY GRCh37
}}}

Go through the installation steps, following the guidance at http://useast.ensembl.org/info/docs/tools/vep/script/vep_tutorial.html.
When asked for the cache files, choose "242 : homo_sapiens_merged_vep_94_GRCh37.tar.gz".
When asked for the fasta files, choose "27 : homo_sapiens".
When asked for the plugins, choose "7:LOF".
Downloading the fasta files is required by the current version of WGSA.

'''Note:''' This takes a very long time.
Repeat the installation for GRCh38:
{{{
perl INSTALL.pl -c $WGSA_DIR/.vep --ASSEMBLY GRCh38
}}}

When asked for the cache files, choose "243 : homo_sapiens_merged_vep_94_GRCh38.tar.gz".
When asked for the fasta files, choose "54: homo_sapiens".
When asked for the plugins, choose "n", as LOF has already been installed.

'''Note:''' This takes a very long time.

Change the permissions of these directories:
{{{
chmod 777 $WGSA_DIR/.vep/Plugins
chmod 777 $WGSA_DIR/.vep/homo_sapiens/94_GRCh37
chmod 777 $WGSA_DIR/.vep/homo_sapiens/94_GRCh38
}}}

Install the LOFTEE LOF plugin for the VEP API:
{{{
cd $WGSA_DIR/.vep/Plugins
wget https://github.com/konradjk/loftee/archive/v0.1.1-beta.zip
unzip -j v0.1.1-beta.zip
rm v0.1.1-beta.zip
}}}

== Download the pipeline programs and other resources ==

{{{
cd $WGSA_DIR
wget http://web.corral.tacc.utexas.edu/WGSAdownload/WGSA085.class
mkdir $WGSA_DIR/resources
cd $WGSA_DIR/resources
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/javaclass/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/hg19/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/hg38/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/precomputed/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/SpliceAI/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/GRASP/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/human_ancestor_GRCh37_e71/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/Neandertal/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/GWAS_catalog/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/GenoCanyon/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/clinvar/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/GeneHancer/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
}}}

Guidance for using external resources (COSMIC, SPIDEX, CADD indel, dbNSFP) can be found [https://sites.google.com/site/jpopgen/wgsa/use-external-resources here].

=== Procedure to run on Cypress ===

==== 1. Prepare input files ====

Two input files are needed: a variant file and a configuration/setting file.

The standard variant file is a plain-text file with TAB-delimited columns (tsv format). An example variant file, 'clinvar_subset.txt', can be downloaded [https://sites.google.com/site/jpopgen/wgsa/using-wgsa-via-aws/clinvar_subset.txt?attredirects=0&d=1 here].

The setting/configuration file is a plain-text file in which the user provides the name of the input file, the name of the output file, the directories of the various resources, and the annotation options.
Example template files can be found [https://sites.google.com/site/jpopgen/wgsa/using-wgsa-via-aws/example-config-file here]. To run the pipeline on a local machine, '''the directory settings (lines 3 to 9) must be modified''' to reflect the absolute paths to the corresponding directories on the local machine.
{{{
input file name: clinvar_subset.txt     #name of the input file
output file name: clinvar_subset.txt.annotated     #name of the output file
resources dir: /lustre/project/hpcstaff/fuji/WGSA/resources/     #the location of the resources folder
annovar dir: /lustre/project/hpcstaff/fuji/WGSA/annovar2019Oct24/annovar/     #the location of the ANNOVAR annotate_variation.pl
snpeff dir: /lustre/project/hpcstaff/fuji/WGSA/snpeff/snpEff/     #the location of the snpEff snpEff.jar
vep dir: /lustre/project/hpcstaff/fuji/WGSA/vep/ensembl-vep-release-94/     #the location of the VEP variant_effect_predictor.pl
.vep dir: /lustre/project/hpcstaff/fuji/WGSA/.vep/     #the location of the .vep folder
tmp dir: /lustre/project/hpcstaff/fuji/WGSA/tmp/     #the location of the tmp folder, used for VEP on-the-fly annotation
work dir: /lustre/project/hpcstaff/fuji/WGSA/work/     #the location of the working folder, used for storing intermediate files
retain intermediate file: b     #supported option: snp or s, indel or i, both or b, no or n
ANNOVAR/Ensembl: b     #supported option: snp or s, indel or i, both or b, no or n
ANNOVAR/RefSeq: b     #supported option: snp or s, indel or i, both or b, no or n
ANNOVAR/UCSC: b     #supported option: snp or s, indel or i, both or b, no or n
}}}
In the example above, $WGSA_DIR='/lustre/project/hpcstaff/fuji/WGSA'.

==== 2. Upload input files ====

Upload the two input files to Cypress. You can place them in any directory. Here, let's create a directory 'WGSA_TEST' under '/lustre/project/hpcstaff/fuji/':
{{{
mkdir /lustre/project/hpcstaff/fuji/WGSA_TEST
cd /lustre/project/hpcstaff/fuji/WGSA_TEST
}}}
See [https://wiki.hpc.tulane.edu/trac/wiki/cypress/FileTransfer here] for how to transfer files.
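
Once the setting file is on Cypress, it can be worth confirming that every directory it names actually exists there, since a mistyped path in lines 3 to 9 is easy to miss. The following is only a minimal sketch and not part of WGSA: `check_setting_dirs` is a hypothetical helper, and it assumes the "name dir: path #comment" layout of the example setting file.

```shell
#!/bin/bash
# Hypothetical helper (not part of WGSA): list directories named in a
# setting file that do not exist. Assumes lines of the form
#   "<name> dir: <absolute path>     #comment"
check_setting_dirs() {
    local setting_file=$1 missing=0 line path
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *" dir: "*) ;;        # only the directory settings
            *) continue ;;
        esac
        path=${line#*dir: }                       # drop everything up to "dir: "
        path=${path%%#*}                          # drop the trailing comment
        path=${path#"${path%%[![:space:]]*}"}     # trim leading whitespace
        path=${path%"${path##*[![:space:]]}"}     # trim trailing whitespace
        if [ ! -d "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done < "$setting_file"
    return $missing
}
```

After pasting the function into a shell, run it as `check_setting_dirs my.setting`, where `my.setting` is a placeholder for your own setting file. It prints one "missing:" line per absent directory and returns a non-zero status if any are missing.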
==== 3. Create the pipeline Slurm job script ====

An example Slurm job script:
{{{
#!/bin/bash
#SBATCH --job-name=WGSA          # Job Name
#SBATCH --output=WGSA.out        # File in which to store job output
#SBATCH --error=WGSA.err         # File in which to store job error messages
#SBATCH --qos=normal             # Quality of Service (like a queue in PBS)
#SBATCH --time=0-10:00:00        # Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1                # Node count required for the job
#SBATCH --ntasks-per-node=1      # Number of tasks to be launched per Node
#SBATCH --cpus-per-task=20       # Number of cores per task
#SBATCH --mem=128000             # Max RAM request 128GByte

# Module load
module load java-openjdk/1.8.0

# Set the directory where WGSA is installed
export WGSA_DIR=/lustre/project/hpcstaff/fuji/WGSA

# Set 'setting/configuration file'
SETTING_FILE=test1000g-hg38-WGSA085.EC2.setting

# Setup
echo "Understand" | java -cp $WGSA_DIR WGSA085 $SETTING_FILE -m 128 -t 20 -v hg19

# Run job
sh ./${SETTING_FILE}.sh
}}}
Save it under a name such as 'Slurmscript' in the same directory as the two input files.

==== 4. Run the pipeline job script ====

{{{
sbatch Slurmscript
}}}
See [https://wiki.hpc.tulane.edu/trac/wiki/cypress/using#SubmittingJobsonCypress here] for more about Slurm.

The job takes about 7 hours to finish.
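
The `-m` and `-t` values passed to WGSA in the job script repeat the `--mem` and `--cpus-per-task` requests, so the two can drift apart when one is edited. One way to keep them in sync is to derive them from the environment variables Slurm sets inside a job. This is a minimal sketch under assumptions: `wgsa_resource_flags` is a hypothetical helper, `-m` is assumed to be in GB (the example script pairs `-m 128` with `--mem=128000`), and the fallback values are simply those of the example script.

```shell
#!/bin/bash
# Hypothetical helper: build the -m/-t arguments for WGSA from Slurm's
# job environment. Assumes -m is given in GB, as the example script's
# pairing of "-m 128" with "--mem=128000" suggests.
wgsa_resource_flags() {
    local mem_mb=${SLURM_MEM_PER_NODE:-128000}   # MB, set by --mem
    local threads=${SLURM_CPUS_PER_TASK:-20}     # set by --cpus-per-task
    echo "-m $(( mem_mb / 1000 )) -t $threads"
}
```

Inside the job script, the setup line could then read `echo "Understand" | java -cp $WGSA_DIR WGSA085 $SETTING_FILE $(wgsa_resource_flags) -v hg19`.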