Guide to running Kraken on AWS

Setup AWS

  1. Follow the guide
  2. Install Nextflow and Docker
  3. Attach volumes
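
Once the setup steps above are done, a quick way to confirm that the tools are on the PATH is a small check like the one below. This is only a sketch: the helper name `missing_tools` is made up for illustration and is not part of Nextflow or Docker.

```shell
# Print the name of every requested tool that is NOT found on PATH.
# Prints nothing when all tools are available.
# (missing_tools is a hypothetical helper name.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools nextflow docker)
if [ -n "$missing" ]; then
    echo "Missing tools: $missing" >&2
else
    echo "nextflow and docker are installed"
fi
```

The check only looks for the executables. It does not verify that the Docker daemon is running or that the volumes are attached.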

Kraken2-build

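The commands below are only an example. They download the taxonomy and the human library, then build the database with kraken2-build running inside Docker.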

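# Host directory that holds the Kraken 2 databases
HOSTDIR="/media/JonData/code/kraken/databases"
# Run the Kraken 2 image as UID 1000, with the host directory mounted inside the container
dkraken="docker run -u 1000 -v $HOSTDIR:/home/jon/databases --rm macadology/kraken2:2.1.3"
# Database path as seen from inside the container
DBNAME="/home/jon/databases/standard_20240123"
# $dkraken is left unquoted on purpose so the shell splits it into a command and its arguments
sudo $dkraken kraken2-build --download-taxonomy --db "$DBNAME"
sudo $dkraken kraken2-build --download-library human --db "$DBNAME"
# --max-db-size is given in bytes (52 GB here)
sudo $dkraken kraken2-build --build --db "$DBNAME" --threads 32 --max-db-size 52000000000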
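
After a build like the one above, it can help to confirm that the database directory actually contains the three files a Kraken 2 database needs (`hash.k2d`, `opts.k2d` and `taxo.k2d`). The sketch below does that. The function name `kraken_db_is_built` is hypothetical, and the host-side path assumes the volume mapping used in the example.

```shell
# Return 0 if the directory holds the three non-empty files that make up
# a Kraken 2 database; otherwise report the first missing file and return 1.
# (kraken_db_is_built is a hypothetical helper name.)
kraken_db_is_built() {
    db_dir=$1
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "$db_dir/$f" ]; then
            echo "missing or empty: $db_dir/$f" >&2
            return 1
        fi
    done
    return 0
}

# Host-side location of the example database (assumed from the -v mapping above).
if kraken_db_is_built "/media/JonData/code/kraken/databases/standard_20240123"; then
    echo "database looks complete"
else
    echo "database is not built yet"
fi
```

This only checks that the files exist and are non-empty. It does not validate their contents.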

Instance type

As of 2024, the fastest and most cost-efficient instance type for the k2_nt_20231129 database is r7i.24xlarge, which provides 768 GB of RAM and 96 compute cores.

Kraken2-nt database

# Increase the size of /dev/shm (one-off; a remount does not persist across reboots)
sudo mount -o remount,size=710G /dev/shm # 710G is enough for the .k2d files, not for the whole database folder

# Copy databases to /dev/shm
mkdir -p /dev/shm/database/k2_nt_20231129
cp /mnt/databases/k2_nt_20231129/*k2d /dev/shm/database/k2_nt_20231129

# Run the Nextflow Kraken and Bracken pipeline (example)
nextflow kraken.nf \
  --querydir /home/ec2-user/mnt/data/food_fermentation/data/processed/dosa_36hr_fermentation \
  --queryglob "fastp*{fastq,fastq.gz,fq,fq.gz}" \
  --size 2 \
  --outputdir /home/ec2-user/mnt/data/food_fermentation/data/processed/ \
  --krakenDB /dev/shm/database/k2_nt_20231129 \
  --krakenReadlength 150 \
  --krakenKeepOutput true \
  --krakenMMAP \
  --krakenThreads 2 \
  --profilers kraken2,bracken
# Note: --krakenMMAP is passed together with --krakenDB pointing at the copy preloaded in /dev/shm.
# For this trick to work in the Nextflow pipeline, the source and target folders must be the same.
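
Before copying the .k2d files into /dev/shm, it may be worth checking that the tmpfs has enough free space, since a partial copy leaves an unusable database. The following is a minimal sketch using POSIX `df` and `du`. The helper names `avail_kib` and `fits_in` are made up for illustration, and the source path is the one used in the copy step above.

```shell
# Available space, in KiB, on the filesystem that holds the given path.
# (avail_kib and fits_in are hypothetical helper names.)
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem holding <path> has at least <needed_kib> KiB free.
fits_in() {
    needed_kib=$1
    path=$2
    [ "$(avail_kib "$path")" -ge "$needed_kib" ]
}

src=/mnt/databases/k2_nt_20231129
# Only run the check when the .k2d files are actually present.
if ls "$src"/*.k2d >/dev/null 2>&1; then
    # Last line of "du -c" is the grand total; -k reports it in KiB.
    needed=$(du -ck "$src"/*.k2d | awk 'END { print $1 }')
    if fits_in "$needed" /dev/shm; then
        echo "/dev/shm has room for the .k2d files ($needed KiB)"
    else
        echo "/dev/shm is too small for the .k2d files ($needed KiB)" >&2
    fi
fi
```

`du` reports disk usage rather than exact file length, so treat the figure as an estimate and leave some headroom.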

# To set the /dev/shm size at startup instead (not recommended, since the right size depends on the instance), edit /etc/fstab
sudo vi /etc/fstab
none         /dev/shm            tmpfs       defaults,size=