A Study of an Efficient Information Filtering System Using One-Hot Long Short-Term Memory




In this paper, we propose an extended one-hot Long Short-Term Memory (LSTM) method and evaluate its performance on a spam-filtering task. Most traditional methods proposed for spam filtering use word occurrences to represent spam and non-spam messages, so all syntactic and semantic information is ignored. A major issue arises when spam and non-spam messages share many common words and noise words, making it challenging for the system to assign the correct label. Unlike previous studies on information filtering, which rely only on word occurrence and word context as in probabilistic models, we apply a neural network-based approach to train the filter for better performance. In addition to the one-hot representation, using term weights with an attention mechanism allows the classifier to focus on the words most likely to appear in the spam and non-spam collections. As a result, we obtain improvements over the performance of previous methods. We find that using region embedding and pooling features on top of the LSTM, together with the attention mechanism, allows the system to learn a better document representation for filtering tasks in general.
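The pipeline described above (one-hot input, an LSTM encoder, self-information term-weight attention, and pooling over the hidden states) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vocabulary, corpus frequencies, hidden size, and randomly initialised weights are all assumptions for demonstration; a real system would train the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative vocabulary and sizes (assumed, not from the paper).
VOCAB = ["free", "win", "money", "meeting", "tomorrow", "report"]
V, H = len(VOCAB), 8  # vocabulary size, hidden size

def one_hot(tokens):
    """Encode each token as a one-hot row vector over VOCAB."""
    X = np.zeros((len(tokens), V))
    for t, tok in enumerate(tokens):
        X[t, VOCAB.index(tok)] = 1.0
    return X

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialised LSTM parameters for the i, f, o, g gates.
W = rng.normal(0, 0.1, (4 * H, V))   # input-to-hidden weights
U = rng.normal(0, 0.1, (4 * H, H))   # recurrent weights
b = np.zeros(4 * H)

def lstm(X):
    """Run the LSTM over the sequence; return all hidden states."""
    h, c = np.zeros(H), np.zeros(H)
    hs = []
    for x in X:
        z = W @ x + U @ h + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
        g = np.tanh(z[3*H:])
        c = f * c + i * g
        h = o * np.tanh(c)
        hs.append(h)
    return np.stack(hs)

def self_information(tokens, corpus_freq):
    """Term weight -log p(w): rarer, more discriminative words score higher."""
    total = sum(corpus_freq.values())
    return np.array([-np.log(corpus_freq[t] / total) for t in tokens])

def document_vector(tokens, corpus_freq):
    """Attention-weighted sum plus max-pooling over LSTM hidden states."""
    hs = lstm(one_hot(tokens))
    w = self_information(tokens, corpus_freq)
    a = np.exp(w - w.max())
    a /= a.sum()                      # softmax over term weights
    attended = a @ hs                 # attention-weighted hidden state
    pooled = hs.max(axis=0)           # max-pooling over time steps
    return np.concatenate([attended, pooled])

# Hypothetical corpus frequencies for the term weights.
corpus_freq = {"free": 50, "win": 30, "money": 40,
               "meeting": 5, "tomorrow": 8, "report": 6}
vec = document_vector(["free", "money", "win"], corpus_freq)
print(vec.shape)  # prints (16,)
```

The concatenated vector (attended state plus pooled state) would then feed a binary spam/non-spam classifier; the attention weights bias the representation toward low-frequency, high-information terms.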


 1. Introduction
 2. Methods of Research
  2.1 Long Short-Term Memory
  2.2 One-hot vector and LSTM with pooling for text
  2.3 Combination of one-hot LSTM with self-information weight attention mechanism
 3. Experiments and Evaluations
  3.1 Testing corpora
  3.2 Experimental settings
  3.3 Evaluation measure
  3.4 Experimental results
 4. Discussion and Conclusions


  • Hee sook Kim Department of Computer Information, Inchon Campus of Korea Polytechnic, Korea
  • Min Hi Lee Department of Architecture, Howon University, Korea


Source: Naver Academic Information
