Downloading multiple fastq files with fastq-dump


1

I want to download the following fastq files at the same time for use in Salmon:

 - SRR10611214
 - SRR10611215
 - SRR10611215
 - SRR10611216
 - SRR10611217

Is there a way to do this with a bash loop and fastq-dump? Or with prefetch?

4

The Salmon documentation gives the following sample code:

#!/bin/bash
mkdir data
cd data
for i in `seq 25 40`; 
do 
  mkdir DRR0161${i}; 
  cd DRR0161${i}; 
  wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/DRR016/DRR0161${i}/DRR0161${i}_1.fastq.gz; 
  wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/DRR016/DRR0161${i}/DRR0161${i}_2.fastq.gz; 
  cd ..; 
done
cd .. 

This could be modified as follows.

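#!/bin/bash
# Download both mates of SRR10611214 through SRR10611217 from ENA.
mkdir -p data
cd data || exit 1
for i in $(seq 14 17)
do
  # The 0${i} directory is 014 ... 017, matching the last two digits of each accession.
  wget "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/0${i}/SRR106112${i}/SRR106112${i}_1.fastq.gz"
  wget "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/0${i}/SRR106112${i}/SRR106112${i}_2.fastq.gz"
done
cd ..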

You can save the code as a shell script and run it from the Linux terminal, for example: bash download_fastq.sh
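
The seq loop above only works because the four accessions happen to be consecutive. As a minimal sketch for arbitrary accessions, the helper below builds the ENA URL from a full run accession. The function name ena_fastq_url is hypothetical, and it only covers the two directory layouts visible in the scripts above (9-character accessions such as DRR016125 and 11-character accessions such as SRR10611214); other lengths are rejected rather than guessed.

```shell
#!/bin/bash
# Hypothetical helper: print the ENA FTP URL for one mate of a run.
# Assumption: only the two layouts seen in the scripts above are handled.
ena_fastq_url() {
  local acc=$1 mate=$2
  local base="ftp://ftp.sra.ebi.ac.uk/vol1/fastq"
  local prefix=${acc:0:6}            # first six characters, e.g. SRR106
  case ${#acc} in
    9)  # e.g. DRR016125: no extra subdirectory
        echo "${base}/${prefix}/${acc}/${acc}_${mate}.fastq.gz" ;;
    11) # e.g. SRR10611214: subdirectory is "0" plus the last two digits
        echo "${base}/${prefix}/0${acc: -2}/${acc}/${acc}_${mate}.fastq.gz" ;;
    *)  echo "unsupported accession length: ${acc}" >&2
        return 1 ;;
  esac
}
```

It could then be used as `for acc in SRR10611214 SRR10611217; do wget "$(ena_fastq_url "$acc" 1)" "$(ena_fastq_url "$acc" 2)"; done`, assuming the runs are paired-end.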


0

You can use GNU parallel.

parallel -j 3 fastq-dump {} ::: SRR10611214 SRR10611215 SRR10611216 SRR10611217

The -j option sets the maximum number of jobs to run in parallel, so in this case at most 3 accessions are processed at the same time.

How many jobs you can run in parallel depends on your machine.

You can also take a look at parallel-fastq-dump.
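
The accession list in the question repeats SRR10611215, and two simultaneous jobs for the same accession would likely write to the same output file. A minimal sketch of removing repeats before handing the list to parallel follows; the function name unique_accessions is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print each accession once, keeping first-seen order.
unique_accessions() {
  # NF skips empty lines; seen[] remembers accessions already printed.
  printf '%s\n' "$@" | awk 'NF && !seen[$0]++'
}
```

Since parallel reads its arguments from standard input when no ::: is given, this could be used as `unique_accessions SRR10611214 SRR10611215 SRR10611215 SRR10611216 SRR10611217 | parallel -j 3 fastq-dump {}`.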


0

Make a list.txt file containing a single column of SRA numbers to download.

Then run:

for i in $(cat list.txt); do echo "$i"; date; fasterq-dump -S "$i"; done

It works well to use NCBI's web interface to find SRA samples of interest, download the results and open them in Excel, then copy the single column containing the SRA numbers and paste it into list.txt using a text editor such as vim.
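
A list pasted from a spreadsheet may carry Windows line endings, blank lines, or stray spaces, any of which would reach fasterq-dump as part of the accession. As a hedged sketch, the helper below cleans the file first; the function name clean_accessions is hypothetical, and it assumes one accession per line in the first column.

```shell
#!/bin/bash
# Hypothetical helper: print cleaned accession IDs from a one-column file.
clean_accessions() {
  # tr removes carriage returns; awk skips blank lines and trims whitespace
  # by printing only the first field of each remaining line.
  tr -d '\r' < "$1" | awk 'NF { print $1 }'
}
```

It could replace the `$(cat list.txt)` part, for example `clean_accessions list.txt | while read -r acc; do fasterq-dump -S "$acc"; done`.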

After downloading, it can be nice to include the "R" in the file names:

for i in *_1.fastq; do mv $i ${i%_1.fastq}_R1.fastq; done

for i in *_2.fastq; do mv $i ${i%_2.fastq}_R2.fastq; done
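
Because the target name has to carry the same mate number as the source, the two renaming loops can also be folded into one so the number is written only once. This is a minimal sketch; the function name rename_mates is hypothetical, and mv -n is used so an existing file is never overwritten.

```shell
#!/bin/bash
# Hypothetical helper: rename X_1.fastq / X_2.fastq to X_R1.fastq / X_R2.fastq
# in the given directory (default: current directory).
rename_mates() {
  local dir=${1:-.} mate f
  for mate in 1 2; do
    for f in "$dir"/*_"${mate}".fastq; do
      [ -e "$f" ] || continue        # skip the literal pattern if nothing matched
      mv -n -- "$f" "${f%_${mate}.fastq}_R${mate}.fastq"
    done
  done
}
```

Calling `rename_mates` with no argument in the download directory would rename every matching file there.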

and compress with pigz:

pigz *fastq

If needed, fasterq-dump can be installed with conda:

conda install -c bioconda sra-tools