Getting Started#

This guide contains the basic commands to get you started with NGS on the cluster at GIS. It is based on the following guides:

  1. GIS Intranet
  2. GIS (HPC) Wiki
  3. Bioinfo Best Practices

Account#

Apply for an account by emailing GIS Scientific Support (scientificsupport@gis.a-star.edu.sg) or Shared Services (sharedservices@gis.a-star.edu.sg). You will need to complete a questionnaire.

Once completed, you will receive an email with login information and instructions.

Disk Space#

Request space from Shared Services or from members of the group. The group is allocated a total amount of space, and each project under the group is also allocated a fixed amount. Because the two allocations overlap, it is unclear which one sets the effective limit.

To check the amount of space available, run df -h . in the directory of interest.
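
As a rough illustration of the check above, the sketch below wraps df in a small function that prints only the usage percentage of the file system holding a directory. The function name fs_use_percent is hypothetical. df reports on the whole file system, so the figure may not reflect a per-project allocation.

```shell
# fs_use_percent is a hypothetical helper name, not a cluster command.
# It prints how full the file system holding the given directory is.
fs_use_percent() {
    # -P forces POSIX output: one line per file system, capacity in column 5
    df -P "${1:-.}" | awk 'NR == 2 { print $5 }'
}

# Report on the current directory
fs_use_percent .
```

To see how much space a particular directory tree occupies, du -sh [directory] is the usual companion command.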

Research folder#

To access the group's storage (not cluster) disk, use smb://nlsmb.gis.a-star.edu.sg and log in with your GIS username and password.

Location of the MetaSub sequencing files: smb://nlsmb.gis.a-star.edu.sg/Research/CSB/CSB5/Eliza/Food microbiome 2019/MetaSub/MetaSub Analysis

Start up script#

You can configure bash startup scripts in either ~/.bashrc or ~/.bash_profile.
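
Bash reads ~/.bash_profile for login shells and ~/.bashrc for interactive non-login shells, so settings are often kept in ~/.bashrc and sourced from ~/.bash_profile. The snippet below is a minimal sketch of what one might put there. The helper name path_prepend and the $HOME/bin directory are illustrative assumptions, not part of the cluster setup.

```shell
# Sketch of lines one might add to ~/.bashrc (names and paths are hypothetical).

# Prepend a directory to PATH only if it is not already there,
# so that re-sourcing the file does not keep growing PATH.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend "$HOME/bin"             # hypothetical location for personal scripts
export PATH
```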

VPN#

GlobalConnect#

  • Connect to the VPN server at vpngw.gis.a-star.edu.sg
  • Log in using your A*STAR ID and password

OpenConnect#

  • View the hipreport file from GlobalConnect under the troubleshooting settings (PanGPS.log).
  • Download, compile, and install openconnect for your platform
    • (For Linux) download the vpnc script, then run ./configure --with-vpnc-script=[location of vpnc-script], followed by make and sudo make install
    • Edit trojans/hipreport.sh from the openconnect source (the command below expects the edited copy at ./trojans/hipreport2.sh)
    • Update the fields DOMAIN, COMPUTER, and HOSTID
    • Update the <entry name="anti-malware"> and <network-interface> elements
    • Run the command below:
sudo openconnect -v --protocol=gp vpngw.gis.a-star.edu.sg --csd-wrapper=./trojans/hipreport2.sh

Cluster Access#

ssh <username>@<node>.gis.a-star.edu.sg

<username> is the username given by GIS; it typically starts with your last name. <node> can be aquila, aquilaln2, or ionode.

Aquila is the login node and should not be used to run jobs or processes. Instead, submit jobs to other nodes via qsub (see Submitting Jobs) or open an interactive session as follows:

qrsh -l h_rt=23:59:59 -l mem_free=8G -pe OpenMP 4 -q interactive.q

Change h_rt (run time limit) and mem_free (requested free memory) depending on your needs.
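
To avoid retyping the options, the request can be wrapped in a small function. The sketch below only prints the qrsh command so it can be inspected first. interactive_cmd is a hypothetical helper name, and the defaults simply mirror the command shown above.

```shell
# interactive_cmd is a hypothetical helper: it prints the qrsh command for the
# given run time, memory, and slot count. The subshell body "( ... )" keeps
# its variables from leaking into the calling shell.
interactive_cmd() (
    h_rt="${1:-23:59:59}"
    mem="${2:-8G}"
    slots="${3:-4}"
    printf 'qrsh -l h_rt=%s -l mem_free=%s -pe OpenMP %s -q interactive.q\n' \
        "$h_rt" "$mem" "$slots"
)

# Print the command for a 2 hour session with 16G and 2 slots
interactive_cmd 02:00:00 16G 2
```

On the cluster, the printed command can be run with eval "$(interactive_cmd 02:00:00 16G 2)".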

SFTP#

Download FileZilla or any other SFTP client, or connect from the command line:

sftp <username>@ionode.gis.a-star.edu.sg

X Forwarding / Graphical Interface#

To enable X forwarding on a Mac, download XQuartz.

To enable X forwarding on Windows using WSL, refer to this. Substitute localhost for 127.0.0.1.

Set up the ssh configuration file accordingly, in either of the following locations:

~/.ssh/config
/etc/ssh/ssh_config
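
The source does not show the configuration itself, so the following is only a sketch of what a per-user entry might look like. ForwardX11 and ForwardX11Trusted are standard OpenSSH client options (the latter corresponds to -Y). The helper name print_x11_host_entry and the use of the node name as the Host alias are assumptions.

```shell
# print_x11_host_entry is a hypothetical helper that prints an ssh config
# stanza with X11 forwarding enabled for one cluster node.
print_x11_host_entry() {
    node="$1"
    user="$2"
    cat <<EOF
Host $node
    HostName $node.gis.a-star.edu.sg
    User $user
    ForwardX11 yes
    ForwardX11Trusted yes
EOF
}

# Preview the stanza. To install it, append the output to ~/.ssh/config:
#   print_x11_host_entry aquila <username> >> ~/.ssh/config
print_x11_host_entry aquila "<username>"
```

With such an entry in place, ssh aquila should behave like the longer ssh -Y command.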

Run

# -Y enables graphical interaction over ssh
ssh -Y <username>@<node>.gis.a-star.edu.sg
# Once logged in, test with
echo $DISPLAY
xterm
xclock

Submitting Jobs#

On the cluster, check out /opt/uge-8.1.7p3/examples/jobs/array_submitter.sh and /opt/uge-8.1.7p3/examples/jobs/jobnet_submitter.sh for examples on how to submit jobs.

For a simple way to submit jobs, use

# standard multithread
qsub -pe OpenMP 4 -l h_rt=23:59:59 -l mem_free=16G SCRIPT.sh

# Pass the PATH environment variable to the job
qsub -pe OpenMP 4 -l h_rt=23:59:59 -l mem_free=16G -v PATH SCRIPT.sh

# Note: If a command runs in interactive mode but not when submitted as a job, it is likely due to differences in environment.
# To copy the environment over when submitting a job, use `-V`.
qsub -pe OpenMP 4 -l h_rt=23:59:59 -l mem_free=16G -V SCRIPT.sh

# Add a name to the job
qsub -N $NAME_OF_THE_JOB -pe OpenMP 4 -l h_rt=23:59:59 -l mem_free=16G -V SCRIPT.sh

# Send input/output jobs to ionode.q specifically
qsub -q ionode.q -N $NAME_OF_THE_JOB -pe OpenMP 1 -l h_rt=23:59:59 -l mem_free=16G -V SCRIPT.sh

# Add log files
qsub -q ionode.q -N $NAME_OF_THE_JOB -pe OpenMP 1 -l h_rt=23:59:59 -l mem_free=16G -V -e $DIR/log -o $DIR/log SCRIPT.sh

# Submit commands via STDIN instead of a script file
echo "commands1; commands2; commands3" | qsub -N $NAME_OF_THE_JOB -pe OpenMP 4 -l h_rt=23:59:59 -l mem_free=16G -V -e $DIR/log -o $DIR/log
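
The options above can also be embedded in the job script, since qsub by default reads lines beginning with #$ as if they were given on the command line. The script below is a minimal sketch. The job name example_job and the echo payload are placeholders, and NSLOTS is the variable the scheduler sets to the number of slots granted.

```shell
#!/bin/bash
# SCRIPT.sh: sketch of a job script with the qsub options embedded.
# Lines starting with "#$" are parsed by qsub; bash treats them as comments.
#$ -N example_job
#$ -pe OpenMP 4
#$ -l h_rt=23:59:59
#$ -l mem_free=16G
#$ -V
#$ -cwd

# NSLOTS is set by the scheduler inside a job; fall back to 1 so the
# script can also be tried outside the queue.
THREADS="${NSLOTS:-1}"
echo "Running on $(uname -n) with ${THREADS} thread(s)"
```

Such a script is submitted with a plain qsub SCRIPT.sh, and options given on the command line still take precedence over the embedded ones.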

Input/Output#

  • Use the ionode instead of aquila to transfer large amounts of data out of the GIS servers.
  • Method 1 - Create a script that uses scp with an ssh key to transfer the files. A submitted job cannot prompt for a password, so create a passphraseless private/public key pair and select the private key with the -i option.
  • Method 2 - Create a script that uses rsync with an ssh key to transfer the files. Again, use a passphraseless private/public key pair to log in without a password.
  • Submit the script as a job.
  • Remove the private key once the transfer is done.
# transfer.sh
## Method 1 - scp
scp -r -i [private key file] [source directory or file] [target]:[target directory]

## Method 2 - rsync
rsync -a [source directory or file] [target]:[target directory]
rsync -anv [source directory or file] [target]:[target directory] # dry run: list the files that would be transferred
rsync --log-file [file] --stats --size-only -a [source directory or file] [target]:[target directory]
#### Examples
# Transfer only *.fq.gz
rsync -e "ssh -i $HOME/.ssh/jonai" -anvm --include="*.fq.gz" --include="*/" --exclude="*" jon@jonai.teojy.com:/media/JonData/NEA/data/processed /home/users/astar/gis/teojyj/scratch/nea/data  # --include="*/" needed for recursion
# To rsync a list of files, use --files-from. Add -r explicitly, since -a does not imply recursion when --files-from is used
rsync -ar --stats --size-only --log-file ~/transfer_general.rsync.log --files-from=isolate_temp.txt jon@jonai.teojy.com:/media/JonData/CRE/data/raw/isolate_reads /home/teojyj/nagarajan_pool/CRE/DATA/ISOLATE_TEMP

qsub -q ionode.q -N transfer_files -pe OpenMP 1 -l h_rt=95:59:59 -l mem_free=16G -V transfer.sh
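
The passphraseless key mentioned in the steps above can be created with ssh-keygen. The sketch below wraps it in a function. The name make_transfer_key and the key path in the comments are assumptions. Because a key without a passphrase can be used by anyone who can read the file, it is best treated as temporary.

```shell
# make_transfer_key is a hypothetical helper; pass it the path for the new key.
make_transfer_key() {
    keyfile="$1"
    # -N "" sets an empty passphrase so a batch job can use the key unattended;
    # -q suppresses the progress output
    ssh-keygen -q -t rsa -b 4096 -N "" -f "$keyfile"
}

# Example workflow (not run here; the key path is an assumption):
#   make_transfer_key "$HOME/.ssh/transfer_key"
#   ssh-copy-id -i "$HOME/.ssh/transfer_key.pub" [target]
#   qsub ... transfer.sh
#   rm "$HOME/.ssh/transfer_key" "$HOME/.ssh/transfer_key.pub"
```

Removing the matching line from ~/.ssh/authorized_keys on the target afterwards revokes the key on that side as well.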

Troubleshooting#

Log files are incredibly important to troubleshoot issues with your code. Below are a few tips that I use.

  • When submitting a job, add the -o and -e options to write the stdout and stderr streams to their respective log files.
  • If a job failed but stdout and stderr didn't provide any clues, use qacct to check the exit code:
    qacct -j <job-id>
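
qacct prints many fields per job, so it can help to filter the output down to the ones that usually explain a failure. The sketch below assumes the usual two-column "field value" layout of qacct -j. job_exit_info is a hypothetical helper name and the sample lines are made up for the demonstration.

```shell
# job_exit_info is a hypothetical helper: it reads `qacct -j` output on stdin
# and keeps only the failed and exit_status lines, collapsing the padding.
job_exit_info() {
    awk '$1 == "failed" || $1 == "exit_status" { $1 = $1; print }'
}

# On the cluster: qacct -j <job-id> | job_exit_info
# Demonstration with made-up sample lines in the qacct layout:
printf 'jobname      demo\nfailed       0\nexit_status  137\n' | job_exit_info
```

By shell convention, an exit status above 128 usually means the process was ended by a signal (the status minus 128), which often points to a run time or memory limit rather than a bug in the script.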