Language Engineering for Creating Relevance Corpus

Abstract

English

Building large relevance datasets is essential for training and evaluating Information Retrieval (IR) systems. The process involves collecting documents, queries, and assessors' judgments of the degree of relevance between a query and a document, and it is expensive and time-consuming. Moreover, it is not a one-off project: it may be repeated for different languages, different corpus scopes, and different techniques. This paper presents a software engineering solution for creating relevance corpora that achieves reusability, flexibility, multilingualism, and modularity, in keeping with the experimental nature of the IR field. The solution is presented as UML models. The paper then shows how the proposed design model was used to build an open-source Arabic relevance corpus based on the ClueWeb09 data set, to support research on evaluating and improving search engines for the Arabic language.
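
As an illustration only, the three ingredients named in the abstract (documents, queries, and assessors' graded judgments) could be modeled roughly as follows. Every class name, field name, and the 0 to 2 grading scale here is an assumption made for this sketch; none of it is taken from the paper's UML models.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Document:
    doc_id: str
    language: str  # e.g. "ar"; a language tag keeps the model multilingual
    text: str


@dataclass(frozen=True)
class Query:
    query_id: str
    language: str
    text: str


@dataclass
class RelevanceCorpus:
    """Hypothetical container tying documents, queries, and judgments together."""

    documents: Dict[str, Document] = field(default_factory=dict)
    queries: Dict[str, Query] = field(default_factory=dict)
    # (query_id, doc_id) -> graded judgments, one entry per assessor
    judgments: Dict[Tuple[str, str], List[int]] = field(default_factory=dict)

    def add_document(self, doc: Document) -> None:
        self.documents[doc.doc_id] = doc

    def add_query(self, query: Query) -> None:
        self.queries[query.query_id] = query

    def add_judgment(self, query_id: str, doc_id: str, grade: int) -> None:
        # Reject judgments that refer to an unknown query or document.
        if query_id not in self.queries:
            raise KeyError(f"unknown query: {query_id}")
        if doc_id not in self.documents:
            raise KeyError(f"unknown document: {doc_id}")
        self.judgments.setdefault((query_id, doc_id), []).append(grade)

    def mean_grade(self, query_id: str, doc_id: str) -> float:
        # Average the assessors' grades for one query-document pair.
        grades = self.judgments[(query_id, doc_id)]
        return sum(grades) / len(grades)


# Usage: two assessors grade the same pair on an assumed 0-2 scale.
corpus = RelevanceCorpus()
corpus.add_document(Document("d1", "ar", "sample document text"))
corpus.add_query(Query("q1", "ar", "sample query"))
corpus.add_judgment("q1", "d1", 2)
corpus.add_judgment("q1", "d1", 1)
print(corpus.mean_grade("q1", "d1"))  # 1.5
```

Keeping judgments in a mapping separate from the documents and queries is one way to let the same collection be re-judged with a different scale or assessor pool without touching the underlying data.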

Table of Contents

Abstract
 1. Introduction
 2. Literature Review
  2.1. Documents
  2.2. Queries
  2.3. Judgment
 3. Software Engineering Solution for Creating Relevance Corpus
  3.1. Design Goals
  3.2. Object Oriented Models
 4. Implementation
 5. Conclusions and Future Work
 References

Author Information

  • Nuha H. El-Khalili, Faculty of Information Technology, University of Petra, Amman, Jordan
  • Bassam Haddad, Faculty of Information Technology, University of Petra, Amman, Jordan
  • Haya El-Ghalayini, Faculty of Information Technology, University of Petra, Amman, Jordan
