Title
Workshop: Flexible read decomposition for improved short read error correction
Abstract
Error correction is often an important first step before analyzing reads from next-generation DNA sequencers. Intuitively, error correction at a specific genomic position can be achieved by laying out all the reads covering that position and examining the base each read reports there. Because errors are infrequent and random, reads that contain an error at this position can be corrected using the majority of reads, which have the correct base. This intuition cannot be implemented directly because the source genome is unknown and multiple sequence alignment is an NP-hard problem. Therefore, all error correction algorithms resort to correcting errors in subreads of a fixed length, such as k-mers from the input reads. By treating k-mers that differ only slightly as spanning the same genomic region, these methods avoid the need for multiple sequence alignment and enable error correction for sequencers that predominantly make substitution errors. This general framework is beset by several issues: 1) A larger k promotes specificity, while a smaller k provides enough occurrences of each k-mer to improve accuracy. 2) Memory usage may increase exponentially with k. 3) Multiple equally likely correction choices may lead to ambiguity in correction. 4) Correcting a subread may affect overlapping subreads that were previously deemed correct based on their frequency of occurrence, producing multiple conflicting choices.
Year: 2011
DOI: 10.1109/ICCABS.2011.5729931
Venue: ICCABS
Keywords: flexible read decomposition, substitution error, error correction, multiple issue, multiple sequence alignment, specific genomic position, error correction algorithm, larger k, specific position, likely correction choice, improved short read error, multiple conflicting choice, accuracy, dna, data structures, molecular biophysics, bioinformatics, genomics
Field: Data structure, Computer science, Intuition, Error detection and correction, Bioinformatics, Multiple sequence alignment, Ambiguity
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name
Order
Citations
PageRank
Xiao Yang125918.93
Karin S Dorman211113.10
Srinivas Aluru31166122.83