Original article information
Abstract
English
The average mutual information (AMI), a measure from information theory, has been reported in the literature as a strong genome signature, and we have previously reported the use of oligonucleotide frequencies as a genome signature. In this work we improve the use of AMI as a training feature for Growing Self-Organising Maps (GSOM). Although the range of k is considered an important parameter of AMI, no standard range for k has been proposed. Our first contribution is to introduce a new growth threshold (GT) for GSOM and to use it to identify the best range of k for clustering prokaryotic sequence fragments of 10 kb. We then compare the results obtained with the best k range of AMI against our previously published results obtained with oligonucleotide frequencies. These experiments show that the newly proposed GT equation enables GSOM to analyse different data features for the same data efficiently and effectively. The results also support our use of oligonucleotide frequencies over AMI.
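As a rough illustration of the AMI feature mentioned in the abstract, the sketch below computes an AMI profile for a DNA fragment using the textbook mutual-information formula, I(k) = sum over base pairs (X, Y) of p_k(X, Y) * log2(p_k(X, Y) / (p(X) p(Y))), where p_k(X, Y) is the frequency of base X followed by base Y at a distance of k positions. This is an assumption about the general form only: the paper's exact definition, its range of k, and any normalisation are not given here. The function name `ami_profile` is hypothetical, and the marginals are estimated from the same k-spaced pairs, which is one of several possible conventions.

```python
import math
from collections import Counter


def ami_profile(seq, k_values):
    """Return {k: I(k)} in bits for a DNA string (illustrative sketch only)."""
    seq = seq.upper()
    profile = {}
    for k in k_values:
        # Collect all base pairs separated by k positions, skipping ambiguous symbols.
        pairs = [(a, b) for a, b in zip(seq, seq[k:])
                 if a in "ACGT" and b in "ACGT"]
        n = len(pairs)
        if n == 0:
            profile[k] = 0.0
            continue
        joint = Counter(pairs)
        # Assumption: marginals are taken from the same pairs rather than
        # from whole-sequence base frequencies.
        left = Counter(a for a, _ in pairs)
        right = Counter(b for _, b in pairs)
        ami = 0.0
        for (a, b), count in joint.items():
            p_ab = count / n
            # p_ab / (p_a * p_b) rewritten with counts: p_ab * n^2 / (n_a * n_b)
            ami += p_ab * math.log2(p_ab * n * n / (left[a] * right[b]))
        profile[k] = ami
    return profile


if __name__ == "__main__":
    fragment = "ACGT" * 50
    # The resulting vector over a chosen k range could serve as one training
    # feature vector per fragment; the range 1..8 here is arbitrary.
    print(ami_profile(fragment, range(1, 9)))
```

Under this sketch, each fragment would be represented by the vector of I(k) values over the chosen k range, so the choice of range directly sets the dimensionality of the feature presented to the map.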
Table of Contents
1. Introduction
2. Backgrounds
2.1. The problem in the original Growth Threshold (GT) equation in GSOM
2.2. A generalised GT equation
2.3. Average mutual information for DNA sequences
2.4. Quality measurement of the clustering performance in a mixing region
3. Results
4. Conclusion
References