Document Information
Abstract
English
In modern society, email is at the heart of business communication. Its simplicity and low cost compared with other means of communication lead workers to rely heavily on it for a variety of purposes, from informal conversation to the exchange of confidential information. However, possibly because of this pattern of use, there have been many cases in which a worker's careless use of email compromised a business's security. This study introduces a method to help detect such careless email use by analyzing actual enterprise email data. It is designed to help discover email messages that received a reply even though they apparently did not ask for one, a pattern we suggest can be a clue to the writer's carelessness. The method builds document vectors using a basic bag-of-words model and Word2Vec, a state-of-the-art method used here to create document vectors from text documents. In an experiment on classifying email messages to find likely careless ones, the Enron email dataset was used as input to the system we built. The experiment produces a list of email messages that should be reviewed before others when searching for careless email messages.
Table of Contents
1. Introduction
2. Unintentional Insider Threats
2.1. Sending Details of Adoptive Parents
2.2. Federal Police’s Publishing Metadata from Criminal Investigations
2.3. University of Nottingham Sending Personal Details of Job Applicants
2.4. California’s Posting of 14,000 Social Security Numbers
2.5. IRS’s Posting Social Security Numbers on Website
2.6. Patient Data Breach at Rady Children’s Hospital
2.7. Conclusion from the Cases
3. Related Work
3.1. Word2Vec
3.2. Doc2Vec
4. Enron Email Dataset
5. Finding Careless Email Messages
5.1. Distinguishing Email Messages with Replies
5.2. Intermediate Result Data Format
5.3. Classifying Email Messages
5.4. Experiment Result
6. Conclusion
References
