= Example Pipeline for Data Transfer Using Globus and Computation in Batch Jobs =

This example pipeline performs the following steps:
 1. Transfer data files from Box to the Cypress Lustre directory.
 2. Perform computation using the data.
 3. Transfer the results to Box.
 4. Delete files in Cypress Lustre.

== Scripts ==
=== Job Submission Script ===
'''submitJob.sh'''
{{{
#
# Pipeline for Data Transfer Using Globus and Computation
#
# Job name
JOB_NAME="COMPUTING1"

# Set path
export BOX_DATA_DIR="/Test/"
export CYPRESS_WORK_DIR="/lustre/project/group/userid/test/"
export BOX_RESULT_DIR="/Test_result/"
#
# Submit a job to transfer data from Box to Cypress
JOB1=`sbatch --job-name=${JOB_NAME}_DL ./transferData.sh DOWNLOAD KEEP | awk '{print $4}'`; echo $JOB1 "Submitted"

# Submit a job to process data on Cypress
JOB2=`sbatch --job-name=${JOB_NAME} --dependency=afterok:$JOB1 ./computing.sh | awk '{print $4}'`; echo $JOB2 "Submitted"

# Submit a job to transfer data from Cypress to Box
JOB3=`sbatch --job-name=${JOB_NAME}_UL --dependency=afterok:$JOB2 ./transferData.sh UPLOAD DELETE | awk '{print $4}'`; echo $JOB3 "Submitted"
}}}

'''JOB_NAME''' is the job name.

'''BOX_DATA_DIR''' is the directory in Box where the source data is stored.

'''CYPRESS_WORK_DIR''' is the directory on Cypress where the downloaded data is stored.

'''BOX_RESULT_DIR''' is the directory in Box where the results are uploaded.

=== Data Transfer Script ===
'''transferData.sh'''
{{{
#!/bin/bash
#SBATCH --partition=centos7
#SBATCH --qos=long
#SBATCH --time=7-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1

# Check options
if [ $# -ne 2 ]; then
    echo 'Usage: transferData.sh [DOWNLOAD | UPLOAD] [KEEP | DELETE]'
    exit 1
fi

# Check path
if [[ -z "${BOX_DATA_DIR}" ]]; then
    echo "ERROR! BOX_DATA_DIR isn't set."
    exit 1
fi
if [[ -z "${CYPRESS_WORK_DIR}" ]]; then
    echo "ERROR! CYPRESS_WORK_DIR isn't set."
    exit 1
fi
if [[ -z "${BOX_RESULT_DIR}" ]]; then
    echo "ERROR! BOX_RESULT_DIR isn't set."
    exit 1
fi

# Start Globus Connect
module load globusconnectpersonal/3.2.5
globusconnect -start &

# Set up CLI environment
source activate globus-cli

# Obtain local UUID
MY_UUID=$(globus endpoint local-id)
uuid_code=$?
if [ $uuid_code -ne 0 ]; then
    echo "ERROR! Globus Connect isn't activated."
    globusconnect -stop
    exit 1
fi

# Build the source and destination paths.
# TULANE_BOX must hold the Globus endpoint ID of the Box collection;
# it is not set by these scripts and has to come from the environment.
if [[ "$1" == "DOWNLOAD" ]]; then
    SOURCE_EP=$TULANE_BOX:$BOX_DATA_DIR
    DEST_EP=$MY_UUID:$CYPRESS_WORK_DIR
else
    # Results are uploaded to BOX_RESULT_DIR, not back to BOX_DATA_DIR
    SOURCE_EP=$MY_UUID:$CYPRESS_WORK_DIR
    DEST_EP=$TULANE_BOX:$BOX_RESULT_DIR
fi

# Check that we are logged in to Globus
output=$(globus whoami >/dev/null 2>&1)
output_code=$?
if [ $output_code -ne 0 ]; then
    echo "ERROR! Not logged in to Globus"
    globusconnect -stop
    exit 1
fi

# Start the transfer and capture the task ID.
# $? reflects only the last command of the pipeline (awk),
# so also check that a task ID was actually captured.
task_id=$(globus transfer "$SOURCE_EP" "$DEST_EP" --label "$SLURM_JOB_NAME" | tail -1 | awk '{print $3}')
output_code=$?
if [ $output_code -ne 0 ] || [ -z "$task_id" ]; then
    echo "ERROR! The data transfer could not be started."
    globusconnect -stop
    exit 1
fi

# Wait until the task is done
output=$(globus task wait $task_id)
output_code=$?
if [ $output_code -ne 0 ]; then
    echo "ERROR! The data transfer failed."
    globus task cancel $task_id
    globusconnect -stop
    exit 1
fi

# If the DELETE option is set, remove the source files after the transfer
if [[ "$2" == "DELETE" ]]; then
    task_id=$(globus rm --recursive $SOURCE_EP |& awk '{print $6}' | sed -e "s/\"//g")
    globus task wait $task_id
fi

# Done successfully
source deactivate globus-cli
globusconnect -stop
exit 0
}}}

=== Computing Script ===
'''computing.sh'''
{{{
#!/bin/bash
#SBATCH --partition=defq
#SBATCH --qos=normal
#SBATCH --time=1-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20

# cd to working directory
cd ${CYPRESS_WORK_DIR}
pwd

# module load ...
# Placeholder for the actual computation
touch RES
sleep 5

# Done
exit 0
}}}

== How to submit a job ==
{{{
sh ./submitJob.sh
}}}
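
The job chaining in '''submitJob.sh''' works because `sbatch` prints `Submitted batch job <id>` on success, so `awk '{print $4}'` yields the job ID that the next `--dependency=afterok:` option needs. The following is a minimal sketch of that parsing step with an added sanity check. It uses a canned message instead of a real `sbatch` call, and the helper name `job_id_from_sbatch_output` and the sample job ID are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Sketch only: extract the job ID from sbatch's success message,
# "Submitted batch job <id>". Prints the ID and returns 0 on success;
# prints nothing and returns 1 if the text does not look like that
# message (for example, when the submission was rejected).
job_id_from_sbatch_output() {
    # $1: the text printed by sbatch
    _id=$(printf '%s\n' "$1" | awk '$1 == "Submitted" && $3 == "job" {print $4; exit}')
    case "$_id" in
        ''|*[!0-9]*) return 1 ;;          # empty or non-numeric: not a job ID
        *) printf '%s\n' "$_id" ;;
    esac
}

# Canned example; in the real pipeline this text comes from sbatch.
sample="Submitted batch job 123456"
if JOB1=$(job_id_from_sbatch_output "$sample"); then
    echo "--dependency=afterok:$JOB1"
else
    echo "Submission failed; not submitting dependent jobs." >&2
fi
```

Checking the ID before submitting the next job avoids passing an empty `--dependency=afterok:` value when an earlier submission fails. As an alternative, `sbatch --parsable` prints only the job ID (followed by a semicolon and the cluster name when one is present), which removes the need for `awk`.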