Source information
Abstract
English
This paper proposes a fast text retrieval algorithm for massive-content text, based on Latent Semantic Analysis and the idea of random blocking. First, to account for the correlation between terms, the query text and the paragraphs of the massive-content text are represented in a lower-dimensional space, and the representation model is improved using singular value decomposition. Second, a random blocking query method retrieves paragraphs, taking the cosine similarity between the query text and each paragraph of the massive-content text as the fitness function; candidate paragraphs are output when their similarity values exceed a threshold. Experiments show that the proposed method takes semantic information fully into account, achieves good retrieval results, and retrieves text quickly.
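Chinese (translated)
Based on latent semantic analysis, this paper proposes a fast text retrieval algorithm that randomly partitions large-volume text into blocks. First, taking the correlation between terms fully into account, each paragraph of the text to be searched and the query text are represented in a low-dimensional space, and the model is improved using singular value decomposition. Second, a random-blocking retrieval algorithm performs the search, using the cosine similarity between the query text and each paragraph of the text to be searched as the fitness function, and outputs the candidate paragraphs whose similarity exceeds a threshold. Analysis of the experimental results shows that the algorithm takes the semantic information of the text fully into account, achieves good retrieval results, and enables fast text retrieval.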
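The pipeline outlined in the abstract (low-dimensional representation via SVD, then block-wise retrieval with a cosine-similarity threshold) can be illustrated with a minimal sketch. The abstract does not say how the random blocks are drawn, so the sketch assumes the paragraphs are shuffled and scanned in fixed-size blocks. It also assumes the input is already tokenized, skipping the word segmentation step. All function names, parameters, and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np


def term_paragraph_matrix(paragraphs, vocab):
    """Count matrix with one row per term and one column per paragraph."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(paragraphs)))
    for j, tokens in enumerate(paragraphs):
        for term in tokens:
            if term in index:
                A[index[term], j] += 1.0
    return A


def lsa_fit(A, k):
    """Truncated SVD: keep the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    docs_k = Vt[:k, :].T * s[:k]  # paragraph coordinates, equal to A.T @ U_k
    return U_k, docs_k


def cosine_to_query(rows, q):
    """Cosine similarity of each row with q; zero vectors score 0."""
    denom = np.linalg.norm(rows, axis=1) * np.linalg.norm(q)
    sims = np.zeros(len(rows))
    nonzero = denom > 0
    sims[nonzero] = (rows[nonzero] @ q) / denom[nonzero]
    return sims


def random_block_search(q_k, docs_k, block_size, threshold, seed=None):
    """Assumed blocking scheme: shuffle paragraphs, score them block by block."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(docs_k))
    hits = []
    for start in range(0, len(order), block_size):
        block = order[start:start + block_size]
        sims = cosine_to_query(docs_k[block], q_k)
        hits.extend(int(i) for i, sim in zip(block, sims) if sim > threshold)
    return sorted(hits)


# Toy data (hypothetical): two paragraphs about text retrieval, two about cooking.
paragraphs = [
    ["text", "retrieval", "algorithm"],
    ["semantic", "analysis", "text"],
    ["cooking", "recipe", "soup"],
    ["soup", "recipe", "kitchen"],
]
vocab = sorted({t for p in paragraphs for t in p})
A = term_paragraph_matrix(paragraphs, vocab)
U_k, docs_k = lsa_fit(A, k=2)

# Fold the query into the same k-dimensional space as the paragraphs.
q = term_paragraph_matrix([["text", "retrieval"]], vocab)[:, 0]
q_k = U_k.T @ q
hits = random_block_search(q_k, docs_k, block_size=2, threshold=0.5, seed=0)
print(hits)
```

Because every block is eventually visited in this sketch, the shuffle changes only the order in which candidates are found, not the final set. A variant that stops early or samples only some blocks would trade recall for speed, which may be closer to what the paper intends.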
Table of contents
Abstract
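0. Introduction
1. Design scheme
2. Implementation of key techniques
2.1 Word segmentation
2.2 Text representation
2.3 Latent Semantic Indexing (LSI) and Singular Value Decomposition (SVD)
3. Random-blocking text retrieval algorithm based on Latent Semantic Analysis
4. Experimental results and analysis
4.1 Experimental procedure
4.2 Evaluation metrics
4.3 Results and analysis
5. Conclusion
References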