Introduction
DecGPU (Distributed short read Error Correction on GPUs) is the first parallel and distributed error correction algorithm for high-throughput short reads using the CUDA and MPI parallel programming models. Performance evaluation using simulated and real datasets revealed that our algorithm demonstrates superior performance, in terms of both error correction quality and execution speed, to the Hybrid SHREC error correction algorithm. The distributed design of our algorithm makes it feasible and flexible to correct errors in large-scale datasets. This algorithm has been reported by Genome Technology Magazine (in the article "Next-Gen GPUs" written by Matthew Dublin, 2011).
DecGPU provides CPU-based and GPU-based versions, where the CPU-based version employs coarse-grained and fine-grained parallelism using the MPI and OpenMP parallel programming models, and the GPU-based version takes advantage of the CUDA and MPI parallel programming models and employs a hybrid CPU+GPU computing model to maximize the performance by overlapping the CPU and GPU computation.
Downloads
- latest source code (release 1.0.7)
more details about the changes in this version are available in the change log.
Citation
- Yongchao Liu, Bertil Schmidt, Douglas L. Maskell: "DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI". BMC Bioinformatics, 2011, 12:85.
Other related papers
- Yongchao Liu, Jan Schroeder and Bertil Schmidt: "Musket: a multistage k-mer spectrum based error corrector for Illumina sequence data". Bioinformatics, 2013, 29(3): 308-315
- Adrianto Wirawan, Robert S Harris, Yongchao Liu, Bertil Schmidt and Jan Schroeder: "HECTOR: A parallel multistage homopolymer spectrum based error corrector for 454 sequencing data." BMC Bioinformatics, 2014, 15:131
Parameters
Input:
- -fasta infile1 [infile2] (input reads file in FASTA format)
- -fastq infile1 [infile2] (input reads file in FASTQ format)
Correct:
- -k <integer> (the k-mer size (an odd number between 0 and 32), default value: 21)
- -paired (indicates that all the input reads are paired-end internally, and that the output reads are also kept in pairs)
- -minmulti <integer> (the minimum multiplicity cutoff, default value: 6)
- -minvotes <integer> (the minimum votes per nucleotide per position, default value: 3)
- -numsearch <integer> (the number of searches, default value: 1)
- -maxtrim <integer> (the maximum number of bases that can be trimmed, default value: 4)
- -est_bf_size (estimate the Bloom filter size; recommended for small datasets; the maximum size is used by default)
- -numthreads <integer> (the number of OpenMP threads per process, default value: 2)
Others:
- -version (print out the version)
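For intuition on "-est_bf_size": a Bloom filter holding n distinct k-mers at a target false-positive rate p needs roughly m = -n * ln(p) / (ln 2)^2 bits. This is the standard sizing formula, not necessarily DecGPU's internal heuristic, and the n and p values below are purely illustrative:

```shell
# Approximate Bloom filter size for n k-mers at false-positive rate p,
# using the standard formula m = -n * ln(p) / (ln 2)^2 bits.
n=1000000   # number of distinct k-mers (illustrative)
p=0.01      # target false-positive rate (illustrative)
awk -v n="$n" -v p="$p" 'BEGIN {
    m = -n * log(p) / (log(2) ^ 2)      # required number of bits
    printf "%.0f bits (%.1f MiB)\n", m, m / 8 / 1048576
}'
```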
Installation and Usage
- A "configure" script has been provided. Users can generate a Makefile for either the CPU-based
or the GPU-based version. The following are some typical example usages
(a)For CUDA-enabled GPUs, you can simply run configure with the architecture capablity specified.
./configure --with-cuda=sm_13 --with-mvapich=~/MVAPICH/ --with-openfabric=/opt/ofed/lib64
./configure --with-cuda=sm_13 --with-cudasdkdir=~/NVIDIA_GPU_Computing_SDK --with-mvapich=~/MVAPICH/ --with-openfabric=/opt/ofed/lib64/
./configure --with-cuda=sm_13 --with-cudasdkdir=~/NVIDIA_GPU_Computing_SDK --with-openmpi=~/OPENMPI/
./configure --prefix=~/test --with-cuda=sm_13 --with-cudasdkdir=~/NVIDIA_GPU_Computing_SDK --with-mpich=/opt/mpich2/gnu
(b) For CPUs, you must not specify the option "--with-cuda":
./configure --with-mpich=/opt/mpich2/gnu/
./configure --prefix=~/test --with-openmpi=~/OPENMPI/
./configure --with-mvapich=~/MVAPICH/
./configure --with-mvapich=~/MVAPICH/ --with-openfabric=/opt/ofed/lib64
(c) The option "--with-seqname" can be specified to keep the same sequence names as the original sequences in the output:
./configure --with-mpich=/opt/mpich2/gnu/ --with-seqname
./configure --with-cuda=sm_13 --with-mvapich=~/MVAPICH/ --with-openfabric=/opt/ofed/lib64 --with-seqname
(d) After running "configure", build (and optionally install) the program:
make [&& make install]
- DecGPU works with the FASTA and FASTQ file formats. For paired-end reads, it assumes
that each read is next to its mate read. In other words, if the reads are indexed from 0, then
reads 0 and 1 are paired, reads 2 and 3 are paired, reads 4 and 5 are paired, and so on. If you want the
sequence pairs to be output together (in pairs), please specify the "-paired" option when performing
error correction; otherwise, the sequence order will be disturbed.
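The interleaved layout described above can also be produced from two matched files with a few lines of awk. This is only a minimal sketch: it assumes each FASTA record is exactly two lines (one header line followed by one sequence line), and the file names are placeholders:

```shell
# Interleave reads_1.fa and reads_2.fa so that mates sit next to each other.
# Assumes each FASTA record is exactly two lines (header + sequence).
awk -v f2=reads_2.fa '{
    h1 = $0; getline s1                 # record i from the first file
    getline h2 < f2; getline s2 < f2    # matching record i from the second file
    print h1; print s1; print h2; print s2
}' reads_1.fa > interleaved.fa
```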
If you have paired reads stored in two separate FASTA (or FASTQ) files in corresponding order, the bundled Perl scripts shuffleSequences_fasta.pl and shuffleSequences_fastq.pl (we use the scripts provided in Velvet) will merge the two files into a single interleaved file. To use them, type:
./shuffleSequences_fasta.pl reads_1.fa reads_2.fa output.fa
- When running on a GPU cluster, you must make sure that the number of MPI processes running on
a node does not exceed the number of available GPUs. This constraint can be enforced using a
hostfile. An example hostfile, where each node contains two GPUs, is as follows:
compute-0-0 slots=2 max-slots=2
compute-0-1 slots=2 max-slots=2
compute-0-2 slots=2 max-slots=2
compute-0-3 slots=2 max-slots=2
- Run the program with the mpirun command (OpenMPI is used in this example):
- ./decgpu -help" or "./decgpu -?"
to get command line options
- mpirun -hostfile gpuhostfile -np 8 ./decgpu directory -fastq readsIn.fastq
- mpirun -hostfile cpuhostfile -np 8 ./decgpu directory -fastq readsIn.fastq -numthreads 4
- ./decgpu directory -fastq readsIn.fastq -numthreads 4
- ./decgpu directory -fasta readsIn.fasta -k 29 -minmulti 10 -numthreads 4
- ./decgpu directory -fasta readsIn.fasta -k 29 -minmulti 10 -maxtrim 0 -numthreads 4
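The "-np" value passed to mpirun should be consistent with the slot counts declared in the hostfile. As a quick sanity check (a sketch that assumes the "slots=N" hostfile format shown above), the total number of slots can be summed with awk:

```shell
# Sum the slots= fields of a hostfile
# (lines of the form "compute-0-0 slots=2 max-slots=2").
awk '{
    for (i = 1; i <= NF; i++)
        # match "slots=N" but not "max-slots=N" (anchored at field start)
        if ($i ~ /^slots=/) { sub(/^slots=/, "", $i); total += $i }
} END { print total }' gpuhostfile
```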
When you specify the "-minmulti" value,we recommend specifying the "-minvotes" to be at least half of "-minmulti".
- ./decgpu -help" or "./decgpu -?"
Change Log
- Aug 1, 2012 (Release 1.0.7)
- Added some synchronization points after some file operations for MPI processes.
Contact
If you have any questions or suggestions for improvement, please feel free to contact Yongchao Liu.