Abstract
English
In this study, we describe the linguistic patterns of Korean time expressions containing digits, as observed in real texts such as online daily newspapers. Time information is among the most important kinds of information that an information extraction system needs to recognize automatically. Because this information is conveyed by linguistic patterns specific to each natural language, an exhaustive and reliable description of these expressions is required. We first collected the types of time expressions found in online newspapers and then classified them into 13 classes. We represent these classes in the LGG formalism, which is well suited to describing finite, local constraints. The Unitex system, designed to compile this formalism, converts LGG graphs into finite-state automata, which are then applied during text analysis to extract the time expressions described in the graphs.
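Table of Contents
2. Digit Time Expressions Observed in Newspaper Article Texts
2.1. Corpus of Digit Time Expressions
2.2. Delimiting the Scope of the Study
2.3. Method and Criteria for Subclassification
3. Regular Grammar for Recognizing Time Expressions Containing Digits
3.1. Regular Expressions and Automata
3.2. The Finite Graph Grammar LGG
3.3. LGG-DigitTimex for Time Expressions Containing Digits
4. Conclusion
References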