Source Information
Abstract
English
OCR is a complicated process, and many factors can influence the recognition rate. Early work tried to optimize the classifier to obtain a high recognition rate, on the premise that the input is a single character, whether printed or handwritten. Because classifier performance has improved considerably, the recognition rate for single characters is now high enough for commercial use. As demand for handwritten text recognition grows, raising the recognition rate of the OCR system as a whole has become very important. Unlike OCR systems for printed text, which focus on the classifier, research on OCR systems for handwritten text concentrates mainly on character segmentation. Statistical analysis shows that segmentation errors outnumber classifier errors. This follows from the nature of handwritten text: it is more irregular and its lines are not horizontal; in addition, handwritten Chinese characters are more likely to overlap, and the gaps between characters are smaller. These properties are what make handwritten Chinese text difficult to segment. This paper proposes a multi-step searching nonlinear line extraction algorithm that is simple and highly accurate, and that addresses some weaknesses of the direct projection and indirect projection methods.
Table of Contents
1. Introduction
2. Image Pre-processing
2.1. Smooth Denoising
2.2. Binarization of Images
2.3. Estimation of Stroke Width
3. Extraction of Character Rows in Text
4. Segmentation of Characters
4.1. Segmentation of Non-touching Characters
4.2. Segmentation of Touching Chinese Characters
5. Optimization of Segmentation Paths Based on Genetic Algorithm
5.1. Encoding
5.2. Parameter Setting
5.3. Experiment
6. Conclusion
Acknowledgments
References