Source Information
Abstract
English
In this paper, we focus on the problem of extracting signatures and roles from large-scale mail archives. Because such archives are huge and highly diverse, extraction methods must not only extract signatures and roles accurately without any training data, but also be general enough to work well on archives with different characteristics. To address this problem, we first propose an unsupervised, language-model-based method to identify signatures in large numbers of emails, and then present an unsupervised two-stage method to extract roles from the identified signatures. Experimental results on two real-world datasets show that our methods are general and effective for both signature and role extraction from large-scale mail archives.
Table of Contents
1. Introduction
2. Related Work
3. Problem Formulation
4. Unsupervised Signature Extraction
5. Unsupervised Role Extraction
5.1 Candidate Role Identification
5.2 Role Distillation
6. Experiments
6.1 Experimental Design
6.2 Experimental Results
7. Conclusion
References
