Full-text information
Abstract
English
Fast and exact searching for sequences similar to a query sequence in genomic databases remains a challenging task in molecular biology. This paper considers the problem of finding all e-matches in a large genomic database, i.e., all local alignments over a given length w with an error rate of at most e. A new database searching algorithm called QFLA is designed to solve this problem. The proposed algorithm is a full-sensitivity algorithm: a refined q-gram filter implemented on a q-gram index. First, new features are extracted from match-regions by logically partitioning both the query sequence and the genomic database. Second, these features are used during the search to quickly eliminate a large proportion of irrelevant subsequences. Finally, the regions that pass the filter are verified by the well-known Smith-Waterman algorithm. The experimental results demonstrate that the algorithm saves time by improving filtration efficiency while keeping filtration time short.
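The filter-then-verify pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the QFLA algorithm: the abstract does not specify the partition-based features, so the sketch substitutes the classical q-gram lemma as the filter and treats the whole query as a single window of length w. All function names, the scoring scheme (match +1, mismatch -1, gap -1), and the padding of candidate regions are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_qgram_index(database, q):
    """Map every q-gram to the list of its start positions in the database."""
    index = defaultdict(list)
    for pos in range(len(database) - q + 1):
        index[database[pos:pos + q]].append(pos)
    return index


def qgram_threshold(w, q, e):
    """Classical q-gram lemma: two strings of length w within k = floor(e*w)
    edit errors share at least (w - q + 1) - q*k q-grams."""
    k = math.floor(e * w)
    return k, (w - q + 1) - q * k


def filter_candidates(query, index, q, e):
    """Return diagonals d such that diagonals d..d+k together hold enough
    q-gram hits. Assumption: the whole query is one window of length w."""
    w = len(query)
    k, threshold = qgram_threshold(w, q, e)
    if threshold <= 0:
        raise ValueError("threshold is not positive; choose a smaller q or e")
    hits = Counter()
    for i in range(w - q + 1):
        for j in index.get(query[i:i + q], ()):
            hits[j - i] += 1  # diagonal = database position - query position
    return [d for d in sorted(hits)
            if sum(hits[d + s] for s in range(k + 1)) >= threshold]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty (assumed scoring)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search(query, database, q, e):
    """Filter with the q-gram index, then verify each surviving region."""
    index = build_qgram_index(database, q)
    k, _ = qgram_threshold(len(query), q, e)
    results = []
    for d in filter_candidates(query, index, q, e):
        start = max(0, d - k)  # pad the region by k on each side
        end = min(len(database), d + len(query) + 2 * k)
        results.append((start, end, smith_waterman_score(query, database[start:end])))
    return results
```

In this sketch only the regions returned by `filter_candidates` reach the quadratic-time verification step, which is the source of the time savings that any q-gram filter of this kind aims for; a refined filter would discard more irrelevant regions before verification.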
Table of Contents
1. Introduction
2. Preliminaries
3. A Refined Q-gram Filter
3.1. Match-region Feature Extraction Based on Partition
3.2. New Filter
3.3. Invalidation and Degeneration
4. Analysis
5. Experimental Results
5.1. Experimental Environment
5.2. Parameter z
5.3. Performance
5.4. Discussion
6. Conclusion
References