Introduction

DecGPU (Distributed short read Error Correction on GPUs) is the first parallel and distributed error correction algorithm for high-throughput short reads, built on the CUDA and MPI parallel programming models. Performance evaluation using simulated and real datasets showed that our algorithm outperforms the Hybrid SHREC error correction algorithm in both error correction quality and execution speed. Because the algorithm is distributed, it is feasible and flexible for the error correction of large-scale datasets. This algorithm was reported in Genome Technology Magazine (in the article "Next-Gen GPUs" by Matthew Dublin, 2011).

DecGPU provides CPU-based and GPU-based versions. The CPU-based version employs coarse-grained and fine-grained parallelism using the MPI and OpenMP parallel programming models. The GPU-based version takes advantage of the CUDA and MPI parallel programming models and employs a hybrid CPU+GPU computing model that maximizes performance by overlapping CPU and GPU computation.


Downloads


Citation

Other related papers


Parameters

Input:

Correct:

Others:


Installation and Usage

  1. A "configure" script is provided to generate a Makefile for either the CPU-based or the GPU-based version. Some typical usages follow.

    (a) For CUDA-enabled GPUs, run configure with the GPU compute capability specified (e.g. sm_13).
    ./configure --with-cuda=sm_13 --with-mvapich=~/MVAPICH/ --with-openfabric=/opt/ofed/lib64
    ./configure --with-cuda=sm_13 --with-cudasdkdir=~/NVIDIA_GPU_Computing_SDK --with-mvapich=~/MVAPICH/ --with-openfabric=/opt/ofed/lib64/
    ./configure --with-cuda=sm_13 --with-cudasdkdir=~/NVIDIA_GPU_Computing_SDK --with-openmpi=~/OPENMPI/
    ./configure --prefix=~/test --with-cuda=sm_13 --with-cudasdkdir=~/NVIDIA_GPU_Computing_SDK --with-mpich=/opt/mpich2/gnu

    (b) For CPUs, do not specify the "--with-cuda" option.
    ./configure --with-mpich=/opt/mpich2/gnu/
    ./configure --prefix=~/test --with-openmpi=~/OPENMPI/
    ./configure --with-mvapich=~/MVAPICH/
    ./configure --with-mvapich=~/MVAPICH/ --with-openfabric=/opt/ofed/lib64

    (c) Specify the "--with-seqname" option to keep the original sequence names in the output.
    ./configure --with-mpich=/opt/mpich2/gnu/ --with-seqname
    ./configure --with-cuda=sm_13 --with-mvapich=~/MVAPICH/ --with-openfabric=/opt/ofed/lib64 --with-seqname

    (d) After running "configure", build (and optionally install) the program:
    make [&& make install]
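    Rules (a) to (c) can be captured in a small dry-run helper. The sketch below is a hypothetical POSIX shell function, configure_cmd, that is not part of DecGPU: it only prints a configure command, adding "--with-cuda" solely when a GPU architecture is given and "--with-seqname" only when requested. The MPI option string is passed through unchanged and is an assumption about your installation.

```shell
# Hypothetical dry-run helper (not shipped with DecGPU): prints, but does not
# run, a configure command line that follows rules (a)-(c).
#   $1  MPI option, e.g. --with-openmpi=/opt/openmpi (assumed install path)
#   $2  CUDA architecture such as sm_13 for the GPU build; empty for the CPU build
#   $3  the word "seqname" to keep original sequence names; anything else omits it
configure_cmd() {
  cmd="./configure $1"
  if [ -n "$2" ]; then
    # Only the GPU-based build may pass --with-cuda.
    cmd="$cmd --with-cuda=$2"
  fi
  if [ "$3" = "seqname" ]; then
    cmd="$cmd --with-seqname"
  fi
  echo "$cmd"
}
```

    For example, configure_cmd --with-mpich=/opt/mpich2/gnu/ "" seqname prints the first command shown in (c).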

  2. DecGPU works with the FASTA and FASTQ file formats. For paired-end reads, it assumes that each read is immediately followed by its mate. In other words, if the reads are indexed from 0, then reads 0 and 1 are paired, reads 2 and 3 are paired, reads 4 and 5 are paired, and so on. To keep paired sequences together in the output, specify the "-paired" option when performing error correction; otherwise, the sequence order will not be preserved.

    If your paired reads are stored in two separate FASTA (or FASTQ) files in corresponding order, the bundled Perl scripts shuffleSequences_fasta.pl and shuffleSequences_fastq.pl (taken from Velvet) merge the two files into one interleaved file; use the script that matches your file format. For example:
    ./shuffleSequences_fasta.pl reads_1.fa reads_2.fa output.fa
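    If the Perl scripts are not at hand, the same interleaving can be sketched with awk. The function below, interleave_fastq, is a hypothetical helper rather than part of DecGPU or Velvet; it assumes that every FASTQ record occupies exactly four lines (no wrapped sequence lines) and that both files list mates in the same order. A second hypothetical helper, fastq_pairs_ok, checks that a file holds an even number of complete records, as the pairing rule above requires.

```shell
# Hypothetical helper: interleave two FASTQ files so that each mate-1 record
# is immediately followed by its mate-2 record (reads 0/1, 2/3, ... pair up).
# Assumes 4-line FASTQ records and mates listed in the same order in both files.
interleave_fastq() {
  awk -v mate2="$2" '
    { print }                        # copy the current line of the mate-1 file
    NR % 4 == 0 {                    # a full mate-1 record has been printed
      for (i = 0; i < 4; i++)        # so append the matching mate-2 record
        if ((getline line < mate2) > 0) print line
    }' "$1"
}

# Hypothetical check: succeed only if the FASTQ file holds a whole, even number
# of 4-line records (line count divisible by 8), so every read has a mate.
fastq_pairs_ok() {
  lines=$(wc -l < "$1" | tr -d ' ')
  [ $((lines % 8)) -eq 0 ]
}
```

    Typical use: interleave_fastq reads_1.fq reads_2.fq > reads.fq && fastq_pairs_ok reads.fq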

  3. When running on a GPU cluster, make sure that the number of MPI processes running on each node does not exceed the number of GPUs available on that node. A hostfile can enforce this constraint. The following example hostfile is for nodes that each contain two GPUs.

    compute-0-0 slots=2 max-slots=2
    compute-0-1 slots=2 max-slots=2
    compute-0-2 slots=2 max-slots=2
    compute-0-3 slots=2 max-slots=2
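    Writing such a hostfile by hand is error-prone on larger clusters. The hypothetical helper below (not part of DecGPU) prints one OpenMPI hostfile line per node, setting both slots and max-slots to the number of GPUs per node so that no node receives more MPI processes than it has GPUs. It assumes every node has the same number of GPUs.

```shell
# Hypothetical helper: print an OpenMPI hostfile that allows at most one MPI
# process per GPU.  $1 = GPUs per node; remaining arguments = node names.
make_gpu_hostfile() {
  gpus=$1
  shift
  for node in "$@"; do
    echo "$node slots=$gpus max-slots=$gpus"
  done
}
```

    Running make_gpu_hostfile 2 compute-0-0 compute-0-1 compute-0-2 compute-0-3 > gpuhostfile reproduces the example above. On a node with the NVIDIA driver installed, nvidia-smi -L typically lists one GPU per line, so nvidia-smi -L | wc -l gives a value for the first argument.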

  4. Run the program with the mpirun command (OpenMPI is used in these examples):
    • "./decgpu -help" or "./decgpu -?"

      displays the command-line options.

    • mpirun -hostfile gpuhostfile -np 8 ./decgpu directory -fastq readsIn.fastq
    • mpirun -hostfile cpuhostfile -np 8 ./decgpu directory -fastq readsIn.fastq -numthreads 4
    • ./decgpu directory -fastq readsIn.fastq -numthreads 4
    • ./decgpu directory -fasta readsIn.fasta -k 29 -minmulti 10 -numthreads 4
    • ./decgpu directory -fasta readsIn.fasta -k 29 -minmulti 10 -maxtrim 0 -numthreads 4

      When you specify the "-minmulti" value, we recommend setting "-minvotes" to at least half of "-minmulti".
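    The two rules above, keeping "-np" within the available GPUs and keeping "-minvotes" at least half of "-minmulti", can be computed rather than typed. The hypothetical helpers below (not part of DecGPU) sum the slots= entries of an OpenMPI hostfile and round half of a "-minmulti" value up to the nearest integer.

```shell
# Hypothetical helper: total number of slots in an OpenMPI hostfile, i.e. the
# largest -np that keeps every node at or below its slot (GPU) count.
total_slots() {
  awk '{
         for (i = 1; i <= NF; i++)
           if ($i ~ /^slots=/) {          # match slots= fields, not max-slots=
             sub(/^slots=/, "", $i)
             n += $i
           }
       }
       END { print n + 0 }' "$1"
}

# Hypothetical helper: smallest integer that is at least half of -minmulti.
min_votes_for() {
  echo $(( ($1 + 1) / 2 ))
}
```

    For example:
    np=$(total_slots gpuhostfile)
    mpirun -hostfile gpuhostfile -np "$np" ./decgpu directory -fasta readsIn.fasta -k 29 -minmulti 10 -minvotes "$(min_votes_for 10)"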


Change Log


Contact

If you have any questions or suggestions for improvement, please feel free to contact Liu, Yongchao.