[[PageOutline]]
= Installing and Setting Up WGSA in a Local Directory on Cypress =
These instructions are based on [https://sites.google.com/site/jpopgen/wgsa/setting-up-wgsa-linux this] page and adapted for Cypress.

Choose a directory dedicated to the pipeline, for example '/lustre/project/group/WGSA'.

Set an environment variable and create the workspace directories:
{{{
export WGSA_DIR=/lustre/project/group/WGSA
mkdir $WGSA_DIR
cd $WGSA_DIR
mkdir work
mkdir tmp
chmod 777 work
chmod 777 tmp
}}}
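
A plain `mkdir` stops with an error when the directory already exists, so re-running the commands above after a partial setup fails. A minimal sketch of a re-runnable variant follows; the function name `setup_wgsa_workspace` is hypothetical and not part of WGSA.

```shell
# Hypothetical helper (not part of WGSA): create the workspace idempotently.
setup_wgsa_workspace() {
    root="$1"
    # -p creates missing parent directories and does not fail
    # if a directory already exists
    mkdir -p "$root/work" "$root/tmp" || return 1
    # world-writable, as in the commands above
    chmod 777 "$root/work" "$root/tmp"
}

# Usage (the path is the example used on this page):
#   export WGSA_DIR=/lustre/project/group/WGSA
#   setup_wgsa_workspace "$WGSA_DIR"
```

Calling the function a second time leaves the existing directories in place and only reapplies the permissions.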

Create a directory for ANNOVAR:

{{{
mkdir $WGSA_DIR/annovar2019Oct24
}}}


Download the ANNOVAR main package from [http://download.openbioinformatics.org/annovar_download_form.php here].
The package comes as annovar.latest.tar.gz. Save it to $WGSA_DIR/annovar2019Oct24 and extract it:


{{{
cd $WGSA_DIR/annovar2019Oct24
tar -zxvf annovar.latest.tar.gz
}}}

Download the !RefSeq, Ensembl and UCSC (knownGene) gene models for ANNOVAR:

{{{
cd $WGSA_DIR/annovar2019Oct24/annovar
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar refGene humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar ensGene humandb/
perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar knownGene humandb/
perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar refGene humandb/
perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar ensGene humandb/
perl annotate_variation.pl -buildver hg38 -downdb -webfrom annovar knownGene humandb/
}}}
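
Since six separate downloads are involved, it can be worth confirming that all of them arrived. The sketch below assumes ANNOVAR's usual file naming, `<buildver>_<table>.txt` (for example `hg19_refGene.txt`); the function name `check_annovar_db` is hypothetical.

```shell
# Hypothetical helper: list gene-model tables missing from an ANNOVAR humandb/.
# Assumes the naming <buildver>_<table>.txt, e.g. hg19_refGene.txt.
check_annovar_db() {
    humandb="$1"
    missing=0
    for build in hg19 hg38; do
        for table in refGene ensGene knownGene; do
            f="$humandb/${build}_${table}.txt"
            # -s is true only for an existing, non-empty file
            if [ ! -s "$f" ]; then
                echo "missing: $f"
                missing=$((missing + 1))
            fi
        done
    done
    # exit status is non-zero if anything is missing
    [ "$missing" -eq 0 ]
}

# Usage:
#   check_annovar_db "$WGSA_DIR/annovar2019Oct24/annovar/humandb"
```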

Install !SnpEff (required for annotating indels with !SnpEff or annotating SNVs with !SnpEff on-the-fly).

Download the !SnpEff v4.3t main package, save the zip file to $WGSA_DIR/snpeff and extract it:

{{{
mkdir $WGSA_DIR/snpeff
cd $WGSA_DIR/snpeff
wget http://sourceforge.net/projects/snpeff/files/snpEff_v4_3t_core.zip
unzip snpEff_v4_3t_core.zip
}}}

To use a newer version of the Java SDK, you have to log in to a computing node.

Start an interactive session:

{{{
idev -c 1 -t 4
}}}
The downloads in the next step take more than one hour.
See [https://wiki.hpc.tulane.edu/trac/wiki/Workshops/IntroToHpc2015/using#SubmittingInteractiveJobs here] for more about 'idev'.

Once you are on a computing node, make sure your current directory is $WGSA_DIR/snpeff.

Download !RefSeq and Ensembl gene models for !SnpEff:

{{{
module load java-openjdk/1.8.0
cd snpEff
java -jar snpEff.jar download -v hg19
java -jar snpEff.jar download -v GRCh37.75
java -jar snpEff.jar download -v hg38
java -jar snpEff.jar download -v GRCh38.86
}}}

Exit from the computing node:

{{{
exit
}}}

Install htslib, which is required for the VEP API:

{{{
mkdir $WGSA_DIR/htslib
cd $WGSA_DIR/htslib
wget https://github.com/samtools/htslib/releases/download/1.9/htslib-1.9.tar.bz2
tar -vxjf htslib-1.9.tar.bz2
cd htslib-1.9
make prefix=$WGSA_DIR/htslib install
}}}

Set the environment variables:

{{{
export PATH=$WGSA_DIR/htslib/bin:$PATH
export CPATH=$WGSA_DIR/htslib/include:$CPATH
export LD_LIBRARY_PATH=$WGSA_DIR/htslib/lib:$LD_LIBRARY_PATH
}}}
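
These exports only last for the current shell session. In addition, when `CPATH` or `LD_LIBRARY_PATH` is not already set, appending `:$CPATH` leaves a trailing colon, and an empty path element is treated as the current directory by the compiler and the dynamic loader. A minimal sketch that writes the settings to a file for later sessions and avoids the trailing colon follows; the function name `write_wgsa_env` and the file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: write a file that can be sourced in later sessions.
write_wgsa_env() {
    root="$1"   # WGSA installation directory
    out="$2"    # file to write
    # Unquoted heredoc: $root is expanded now; the \$-escaped parts are
    # written literally and expanded later, when the file is sourced.
    # ${VAR:+:$VAR} adds ":$VAR" only if VAR is set and non-empty.
    cat > "$out" <<EOF
export WGSA_DIR="$root"
export PATH="\$WGSA_DIR/htslib/bin:\$PATH"
export CPATH="\$WGSA_DIR/htslib/include\${CPATH:+:\$CPATH}"
export LD_LIBRARY_PATH="\$WGSA_DIR/htslib/lib\${LD_LIBRARY_PATH:+:\$LD_LIBRARY_PATH}"
EOF
}

# Usage:
#   write_wgsa_env /lustre/project/group/WGSA "$HOME/wgsa_env.sh"
#   . "$HOME/wgsa_env.sh"
```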


Install VEP (required for annotating indels with VEP or annotating SNVs with VEP on-the-fly).

Download the VEP 94 main package and save it to $WGSA_DIR/vep:

{{{
mkdir $WGSA_DIR/vep
cd $WGSA_DIR/vep
wget https://github.com/Ensembl/ensembl-vep/archive/release/94.zip
unzip 94.zip
}}}

Install the VEP API to $WGSA_DIR/vep and download the !RefSeq and Ensembl gene models to $WGSA_DIR/.vep:
{{{
cd $WGSA_DIR/vep/ensembl-vep-release-94/
mkdir $WGSA_DIR/.vep
export DEST_DIR=$WGSA_DIR
export PERL5LIB=$WGSA_DIR
perl INSTALL.pl -c $WGSA_DIR/.vep --ASSEMBLY GRCh37
}}}
Go through the steps of the installation process, following the guidance at http://useast.ensembl.org/info/docs/tools/vep/script/vep_tutorial.html. When asked for the cache files, choose "242 : homo_sapiens_merged_vep_94_GRCh37.tar.gz". When asked for FASTA files, choose "27 : homo_sapiens". When asked for the plugins, choose "7:LOF". Downloading the FASTA file is required for the current version of WGSA.

Note: this step takes a very long time.

{{{
perl INSTALL.pl -c $WGSA_DIR/.vep --ASSEMBLY GRCh38
}}}
When asked for the cache files, choose "243 : homo_sapiens_merged_vep_94_GRCh38.tar.gz". When asked for FASTA files, choose "54: homo_sapiens". When asked for the plugins, choose "n", as LOF has already been installed.

Note: this step also takes a very long time.

Change the permissions for these directories:
{{{
chmod 777 $WGSA_DIR/.vep/Plugins
chmod 777 $WGSA_DIR/.vep/homo_sapiens/94_GRCh37
chmod 777 $WGSA_DIR/.vep/homo_sapiens/94_GRCh38
}}}

Install the LOFTEE LOF plugin for the VEP API:
{{{
cd $WGSA_DIR/.vep/Plugins
wget https://github.com/konradjk/loftee/archive/v0.1.1-beta.zip
unzip -j v0.1.1-beta.zip
rm v0.1.1-beta.zip
}}}

== Download the pipeline programs and other resources ==
{{{
cd $WGSA_DIR
wget http://web.corral.tacc.utexas.edu/WGSAdownload/WGSA085.class
mkdir $WGSA_DIR/resources
cd $WGSA_DIR/resources
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/javaclass/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/hg19/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/hg38/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/precomputed/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/SpliceAI/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/GRASP/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/human_ancestor_GRCh37_e71/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/Neandertal/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/GWAS_catalog/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/GenoCanyon/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/clinvar/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
wget http://web.corral.tacc.utexas.edu/WGSAdownload/resources/GeneHancer/ --recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"
}}}
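
The twelve `wget` calls above differ only in the resource directory name, so they can also be generated by a loop. The sketch below only prints the commands and downloads nothing by itself; the function name `wgsa_resource_cmds` is hypothetical.

```shell
# Print one wget command per resource directory used above
# (nothing is downloaded by this function).
wgsa_resource_cmds() {
    base=http://web.corral.tacc.utexas.edu/WGSAdownload/resources
    opts='--recursive --continue --timestamping --no-host-directories --cut-dirs=2 --no-parent --reject="index.html*"'
    for res in javaclass hg19 hg38 precomputed SpliceAI GRASP \
               human_ancestor_GRCh37_e71 Neandertal GWAS_catalog \
               GenoCanyon clinvar GeneHancer; do
        echo "wget $base/$res/ $opts"
    done
}

# Review the commands first:
#   wgsa_resource_cmds
# Then, from $WGSA_DIR/resources, run them:
#   wgsa_resource_cmds | sh
```

Because the commands use `--continue` and `--timestamping`, an interrupted download can be resumed by running the same commands again.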

Guidance for using external resources (COSMIC, SPIDEX, CADD indel, dbNSFP) can be found [https://sites.google.com/site/jpopgen/wgsa/use-external-resources here].

=== Procedure to run on Cypress ===
==== 1. Prepare input files ====
Two input files are needed: a variant file and a configuration/setting file.
The standard variant file is a plain text file with TAB-delimited columns (TSV format).

An example variant file, 'clinvar_subset.txt', can be downloaded [https://sites.google.com/site/jpopgen/wgsa/using-wgsa-via-aws/clinvar_subset.txt?attredirects=0&d=1 here].
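
Because the variant file has to be TAB-delimited, a file whose TABs were turned into spaces by an editor is an easy mistake to make. The sketch below assumes that every row should have the same number of columns as the first line and reports rows that do not; it does not check what the columns mean. The function name `check_tsv_columns` is hypothetical.

```shell
# Hypothetical helper: report lines whose TAB-separated column count
# differs from that of the first line.
check_tsv_columns() {
    awk -F '\t' '
        BEGIN   { bad = 0 }
        NR == 1 { n = NF; next }
        NF != n { printf "line %d: %d columns, expected %d\n", NR, NF, n; bad = 1 }
        END     { exit bad }
    ' "$1"
}

# Usage:
#   check_tsv_columns clinvar_subset.txt && echo "column counts are consistent"
```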

The setting/configuration file is a plain text file in which the user provides the name of the input file, the name of the output file, the directories of the various resources, and the annotation options. Example template files can be found [https://sites.google.com/site/jpopgen/wgsa/using-wgsa-via-aws/example-config-file here].

To run the pipeline on a local machine, '''the directory settings (lines 3 to 9) must be modified''' to reflect the absolute paths to the corresponding directories on the local machine.


{{{
input file name:                    clinvar_subset.txt                                                #name of the input file
output file name:                   clinvar_subset.txt.annotated                                      #name of the output file
resources dir:                      /lustre/project/hpcstaff/fuji/WGSA/resources/                     #the location of the resources folder
annovar dir:                        /lustre/project/hpcstaff/fuji/WGSA/annovar2019Oct24/annovar/      #the location of the ANNOVAR annotate_variation.pl
snpeff dir:                         /lustre/project/hpcstaff/fuji/WGSA/snpeff/snpEff/                 #the location of the snpEff snpEff.jar
vep dir:                            /lustre/project/hpcstaff/fuji/WGSA/vep/ensembl-vep-release-94/    #the location of the VEP variant_effect_predictor.pl
.vep dir:                           /lustre/project/hpcstaff/fuji/WGSA/.vep/                          #the location of the .vep folder
tmp dir:                            /lustre/project/hpcstaff/fuji/WGSA/tmp/                           #the location of the tmp folder, used for VEP on-the-fly annotation
work dir:                           /lustre/project/hpcstaff/fuji/WGSA/work/                          #the location of the working folder, used for storing intermediate files
retain intermediate file:           b                            #supported option: snp or s, indel or i, both or b, no or n
ANNOVAR/Ensembl:                    b                            #supported option: snp or s, indel or i, both or b, no or n
ANNOVAR/RefSeq:                     b                            #supported option: snp or s, indel or i, both or b, no or n
ANNOVAR/UCSC:                       b                            #supported option: snp or s, indel or i, both or b, no or n
}}}

In the example above, $WGSA_DIR='/lustre/project/hpcstaff/fuji/WGSA'.
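
If your installation lives elsewhere, the directory entries can be rewritten in one pass instead of by hand. The sketch below replaces the example prefix with another WGSA directory; the function name `retarget_setting` and the file names in the usage comment are hypothetical, and the new path is assumed not to contain `|` or `&`.

```shell
# Hypothetical helper: print a setting file with the example installation
# prefix replaced by your own WGSA directory.
retarget_setting() {
    template="$1"   # existing setting file that uses the example prefix
    new_root="$2"   # your $WGSA_DIR, without a trailing slash
    old_root=/lustre/project/hpcstaff/fuji/WGSA
    # "|" is the sed delimiter because the paths contain "/".
    # The trailing "/" keeps sibling directories such as WGSA_TEST untouched.
    sed "s|$old_root/|$new_root/|g" "$template"
}

# Usage:
#   retarget_setting example.setting "$WGSA_DIR" > my.setting
```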

==== 2. Upload input files ====
Upload the two input files to Cypress. You can place them in any directory. Here we create a directory 'WGSA_TEST' under '/lustre/project/hpcstaff/fuji/':


{{{
mkdir /lustre/project/hpcstaff/fuji/WGSA_TEST
cd /lustre/project/hpcstaff/fuji/WGSA_TEST
}}}
See [https://wiki.hpc.tulane.edu/trac/wiki/cypress/FileTransfer here] for details on file transfer.
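
Once the setting file is on Cypress, a quick check that every directory it names actually exists can save a failed job. The sketch below assumes the `name dir: path  #comment` layout of the example setting file and paths without spaces; the function name `check_setting_dirs` is hypothetical.

```shell
# Hypothetical helper: print every "... dir:" entry of a setting file
# whose directory does not exist.
check_setting_dirs() {
    rc=0
    # Keep only "<name> dir:" lines and extract the path
    # (everything up to the next whitespace or "#").
    paths=$(sed -n 's/^[^#:]* dir:[[:space:]]*\([^#[:space:]]*\).*$/\1/p' "$1")
    for p in $paths; do
        if [ ! -d "$p" ]; then
            echo "not found: $p"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   check_setting_dirs my.setting && echo "all directories exist"
```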

==== 3. Create the pipeline Slurm job script ====
An example Slurm job script:
{{{
#!/bin/bash
#SBATCH --job-name=WGSA       # Job Name
#SBATCH --output=WGSA.out     # File in which to store job output
#SBATCH --error=WGSA.err      # File in which to store job error messages
#SBATCH --qos=normal          # Quality of Service (like a queue in PBS)
#SBATCH --time=0-10:00:00     # Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1             # Node count required for the job
#SBATCH --ntasks-per-node=1   # Number of tasks to be launched per Node
#SBATCH --cpus-per-task=20    # Number of cores per task
#SBATCH --mem=128000          # Max RAM request 128GByte

# Module load
module load java-openjdk/1.8.0

# Set the directory where WGSA is installed
export WGSA_DIR=/lustre/project/hpcstaff/fuji/WGSA

# Set 'setting/configuration file'
SETTING_FILE=test1000g-hg38-WGSA085.EC2.setting

# Setup
echo "Understand" | java -cp $WGSA_DIR WGSA085 $SETTING_FILE -m 128 -t 20 -v hg19

# Run job
sh ./${SETTING_FILE}.sh
}}}
Save it under a name, for example 'Slurmscript', in the same directory as the two input files.
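
The script states the memory and core counts twice, once in the `#SBATCH` lines and once on the `java` command line, and the two must be kept in sync by hand. A minimal sketch that derives the WGSA arguments from the allocation follows. It relies on the environment variables `SLURM_MEM_PER_NODE` (in MB) and `SLURM_CPUS_PER_TASK`, which Slurm sets when `--mem` and `--cpus-per-task` are given, and it assumes that WGSA's `-m` is in GB, as the example's `-m 128` for a 128 GB request suggests. The function name `wgsa_size_args` is hypothetical.

```shell
# Hypothetical helper: derive the "-m <GB> -t <threads>" arguments from
# the Slurm allocation. Outside a Slurm job it falls back to the values
# of the example script (128000 MB, 20 cores).
wgsa_size_args() {
    mem_mb=${SLURM_MEM_PER_NODE:-128000}
    threads=${SLURM_CPUS_PER_TASK:-20}
    # 128000 MB -> 128, matching the example's "-m 128"
    echo "-m $(( $mem_mb / 1000 )) -t $threads"
}

# Usage inside the job script:
#   echo "Understand" | java -cp $WGSA_DIR WGSA085 $SETTING_FILE $(wgsa_size_args) -v hg19
```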

==== 4. Run the pipeline job script ====

{{{
sbatch Slurmscript
}}}
See [https://wiki.hpc.tulane.edu/trac/wiki/cypress/using#SubmittingJobsonCypress here] for more about Slurm.
The job takes about 7 hours to finish.