Random Blocking Text Retrieval Algorithm Based on Latent Semantic Analysis
(基于潜在语义分析的随机分块文本检索算法)

赵亚慧, 金小峰, 崔荣一

한국어정보학회, 한국어정보학, Vol. 11, No. 2, December 2009, pp. 112-116

Abstract

English

This paper proposes a fast text retrieval algorithm for massive-content text that combines random blocking with Latent Semantic Analysis. First, to account for the correlation between terms, the query text and the massive-content text are represented in a lower-dimensional space, and the model is improved using singular value decomposition. Second, a random blocking query method retrieves paragraphs, taking the cosine similarity between the query and the paragraphs of the massive-content text as the fitness function; candidate paragraphs are output when their similarity exceeds a threshold. Experiments show that the proposed method retrieves text effectively by fully considering semantic information and that it does so quickly.
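
The abstract gives only the outline of the method, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It assumes whitespace tokenization of toy English paragraphs (the paper works on segmented Chinese text), raw term counts as weights, and a blocking scheme in which paragraph indices are shuffled and scanned in fixed-size blocks. The function names and the parameters `k`, `block_size`, and `threshold` are hypothetical choices made for this example.

```python
import numpy as np

def build_term_matrix(paragraphs):
    """Return (vocabulary index, term-by-paragraph count matrix)."""
    vocab = {}
    for p in paragraphs:
        for w in p.lower().split():
            vocab.setdefault(w, len(vocab))
    A = np.zeros((len(vocab), len(paragraphs)))
    for j, p in enumerate(paragraphs):
        for w in p.lower().split():
            A[vocab[w], j] += 1.0
    return vocab, A

def lsa_fit(A, k):
    """Truncated SVD keeping the k largest singular triplets.

    Assumes k does not exceed the rank of A, so no kept singular value is zero.
    Rows of the third return value are the paragraph vectors in latent space.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k].T

def fold_in(query, vocab, U_k, s_k):
    """Project a query into the latent space: q_hat = S_k^{-1} U_k^T q."""
    q = np.zeros(U_k.shape[0])
    for w in query.lower().split():
        if w in vocab:  # terms unseen in the collection are ignored
            q[vocab[w]] += 1.0
    return (U_k.T @ q) / s_k

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def random_block_search(q_hat, doc_vecs, block_size, threshold, seed=0):
    """Scan paragraphs in randomly formed blocks (an assumed scheme) and
    return (index, similarity) pairs at or above the threshold, best first."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(doc_vecs))
    hits = []
    for start in range(0, len(order), block_size):
        for j in order[start:start + block_size]:
            sim = cosine(q_hat, doc_vecs[j])
            if sim >= threshold:
                hits.append((int(j), sim))
    return sorted(hits, key=lambda h: -h[1])

# Toy collection: two paragraphs about pets, two about markets.
paragraphs = [
    "cat dog pet animal",
    "dog pet animal food",
    "stock market price trade",
    "market trade price bank",
]
vocab, A = build_term_matrix(paragraphs)
U_k, s_k, doc_vecs = lsa_fit(A, k=2)
q_hat = fold_in("cat pet", vocab, U_k, s_k)
hits = random_block_search(q_hat, doc_vecs, block_size=2, threshold=0.8)
print(hits)
```

In this toy collection the query "cat pet" also matches the second paragraph, which does not contain "cat", because both pet paragraphs collapse onto the same latent direction. That is the term-correlation effect the abstract attributes to the reduced space.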

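Chinese (English translation)

Based on latent semantic analysis, this paper proposes a fast text retrieval algorithm that randomly partitions large-volume text into blocks. First, taking full account of the correlation between terms, each paragraph of the text to be searched and the query text are represented in a low-dimensional space, and the model is improved using singular value decomposition. Second, a random blocking retrieval algorithm is applied, using the cosine similarity between the query text and each paragraph of the text to be searched as the fitness function, and candidate paragraphs whose similarity exceeds a threshold are output. Analysis of the experimental results shows that the algorithm takes full account of the semantic information in the text, retrieves well, and enables fast text retrieval.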

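Abstract (Chinese)
Abstract (English)
0. Introduction
1. Design Scheme
2. Implementation of Key Techniques
  2.1 Word Segmentation
  2.2 Text Representation
  2.3 Latent Semantic Indexing (LSI) and Singular Value Decomposition (SVD)
3. Random Blocking Text Retrieval Algorithm Based on Latent Semantic Analysis
4. Experimental Results and Analysis
  4.1 Experimental Procedure
  4.2 Evaluation Metrics
  4.3 Results and Analysis
5. Conclusion
References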

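Author information

赵亚慧 (조아혜). Intelligent Information Processing Laboratory, Department of Computer Science and Technology, College of Engineering, Yanbian University, Yanji 133002, China
金小峰 (김소봉). Intelligent Information Processing Laboratory, Department of Computer Science and Technology, College of Engineering, Yanbian University, Yanji 133002, China
崔荣一 (최영일). Intelligent Information Processing Laboratory, Department of Computer Science and Technology, College of Engineering, Yanbian University, Yanji 133002, China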
