

Unicode Han Character Lookup Service Based on Similar Radicals




Unicode 6.1 (2012) encodes more than 74,000 Han characters. This large repertoire goes a long way toward solving the problem of unencoded Han characters. However, most information systems today still support input and display of only the first 20,902 Han characters, those encoded in Unicode 1.0 (1991). Even on the latest systems, designed to support 32-bit Unicode and with suitable fonts installed, the newly encoded Han characters are not easy to use. Many of them rarely appear in everyday life, so an ordinary user may be unsure of their glyph shapes, pronunciations, meanings, and usages. IMEs (input method editors) for Han characters usually require users to know the wanted character well, and it is not unusual for users to try and fail to input an unfamiliar one. In this paper, we present an auxiliary lookup service for Unicode Han characters based on radicals. A user can key in one or more radicals with any Han character IME to look up a wanted character. Every Unicode Han character is decomposed into a glyph expression of radicals. The similarity between each glyph expression and the user input is estimated by a derived edit distance algorithm, and the most similar Unicode Han characters are returned. As a result, the system gives users a convenient way to look up unfamiliar Unicode Han characters.
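The lookup idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the `DECOMPOSITIONS` table is a hypothetical toy with five characters, each glyph expression is flattened to a plain sequence of radicals, and the similarity measure is the standard unit-cost Levenshtein distance rather than the derived edit distance with radical-aware cost functions that the paper describes. The names `edit_distance` and `lookup` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy table: each character maps to a flat sequence of radicals.
# The paper's glyph expressions are richer; this is only an illustration.
DECOMPOSITIONS: Dict[str, Tuple[str, ...]] = {
    "明": ("日", "月"),
    "朋": ("月", "月"),
    "林": ("木", "木"),
    "森": ("木", "木", "木"),
    "相": ("木", "目"),
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two radical sequences."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            substitution = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,                 # delete x
                curr[j - 1] + 1,             # insert y
                prev[j - 1] + substitution,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def lookup(query: Sequence[str], top_k: int = 3) -> List[str]:
    """Return the top_k characters whose radical sequences are closest to query."""
    scored = [
        (edit_distance(query, radicals), ch)
        for ch, radicals in DECOMPOSITIONS.items()
    ]
    # Sort by distance, breaking ties by code point for a stable order.
    scored.sort()
    return [ch for _, ch in scored[:top_k]]
```

With this toy table, `lookup(["木", "木"])` ranks 林 first, since its radical sequence matches the query exactly. A cost function that charges less for substituting visually similar radicals, as the outline's sections on similarity between basic radicals and cost function design suggest, would replace the fixed substitution cost of 1 used here.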


 1. Introduction
 2. The Method to Measure the Similarity
  2.1. Glyph Expressions
  2.2. Similarity of Two Han characters
  2.3. Han Character Lookup via Radicals
  3.1. Similarity Between Basic Radicals
  3.2. Design of Cost Functions
  3.3. Implementation
 4. Conclusion


  • Jeng-Wei Lin Department of Information Management, Tunghai University
  • Feng-Sheng Lin Institute of Information Science, Academia Sinica


