Abstract
A suffix tree is an efficient data structure because it reveals the detailed internal structure of the given sequences in linear time. However, it is difficult to build a suffix tree for a large number of sequences because of memory constraints. Therefore, to compare multi-megabase genomic sequence sets using suffix trees, the suffix tree construction algorithms need to be redesigned. We introduce a new method for constructing a suffix tree for a large number of sequences on secondary storage. Our algorithm divides three files, in a designated sequence, into parts and stores references to the locations of edges in hash tables. In our experiments, we used 1,300,000 EST sequences (around 300 MB) to generate a suffix tree on disk.
Table of Contents
1. Introduction
2. Proposed Method
2.1 Data Structure
2.2 Storing Edges
2.3 Node Numbering Process
2.4 Storing a Hash Table
3. Experimentation and Analysis
4. Discussion and Conclusion
References