A Study of an Efficient Information Filtering System Using One-Hot Long Short-Term Memory




In this paper, we propose an extended one-hot Long Short-Term Memory (LSTM) method and evaluate its performance on a spam-filtering task. Most traditional methods proposed for spam filtering use word occurrences to represent spam and non-spam messages, so all syntactic and semantic information is ignored. A major issue arises when spam and non-spam messages share many common words and noise words, making it challenging for the system to assign the correct label. Unlike previous studies on information filtering, which rely only on word occurrence and word context as in probabilistic models, we apply a neural network-based approach to train the filter for better performance. In addition to the one-hot representation, using term weights with an attention mechanism allows the classifier to focus on the words most likely to appear in the spam and non-spam collections. As a result, we obtain improvements over the performance of previous methods. We find that using region embedding and pooling features on top of the LSTM, together with the attention mechanism, allows the system to learn a better document representation for filtering tasks in general.
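The pipeline described above (one-hot input, an LSTM encoder, self-information term-weight attention, and pooling over the hidden states) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vocabulary, corpus frequencies, hidden size, and randomly initialised weights are all assumptions for demonstration; a real system would train the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative vocabulary and sizes (assumed, not from the paper).
VOCAB = ["free", "win", "money", "meeting", "tomorrow", "report"]
V, H = len(VOCAB), 8  # vocabulary size, hidden size

def one_hot(tokens):
    """Encode each token as a one-hot row vector over VOCAB."""
    X = np.zeros((len(tokens), V))
    for t, tok in enumerate(tokens):
        X[t, VOCAB.index(tok)] = 1.0
    return X

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialised LSTM parameters for the i, f, o, g gates.
W = rng.normal(0, 0.1, (4 * H, V))   # input-to-hidden weights
U = rng.normal(0, 0.1, (4 * H, H))   # recurrent weights
b = np.zeros(4 * H)

def lstm(X):
    """Run the LSTM over the sequence; return all hidden states."""
    h, c = np.zeros(H), np.zeros(H)
    hs = []
    for x in X:
        z = W @ x + U @ h + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
        g = np.tanh(z[3*H:])
        c = f * c + i * g
        h = o * np.tanh(c)
        hs.append(h)
    return np.stack(hs)

def self_information(tokens, corpus_freq):
    """Term weight -log p(w): rarer, more discriminative words score higher."""
    total = sum(corpus_freq.values())
    return np.array([-np.log(corpus_freq[t] / total) for t in tokens])

def document_vector(tokens, corpus_freq):
    """Attention-weighted sum plus max-pooling over LSTM hidden states."""
    hs = lstm(one_hot(tokens))
    w = self_information(tokens, corpus_freq)
    a = np.exp(w - w.max())
    a /= a.sum()                      # softmax over term weights
    attended = a @ hs                 # attention-weighted hidden state
    pooled = hs.max(axis=0)           # max-pooling over time steps
    return np.concatenate([attended, pooled])

# Hypothetical corpus frequencies for the term weights.
corpus_freq = {"free": 50, "win": 30, "money": 40,
               "meeting": 5, "tomorrow": 8, "report": 6}
vec = document_vector(["free", "money", "win"], corpus_freq)
print(vec.shape)  # prints (16,)
```

The concatenated vector (attended state plus pooled state) would then feed a binary spam/non-spam classifier; the attention weights bias the representation toward low-frequency, high-information terms.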


 1. Introduction
 2. Methods of Research
  2.1 Long Short-Term Memory
  2.2 One-hot vector and LSTM with pooling for text
  2.3 Combination of one-hot LSTM with self-information weight attention mechanism
 3. Experiments and Evaluations
  3.1 Testing corpora
  3.2 Experimental settings
  3.3 Evaluation measure
  3.4 Experimental results
 4. Discussion and Conclusions


  • Hee sook Kim Department of Computer Information, Inchon Campus of Korea Polytechnic, Korea
  • Min Hi Lee Department of Architecture, Howon University, Korea


Source: Naver Academic Information
