earticle

논문검색

장바구니 크기가 연관규칙 척도의 정확성에 미치는 영향

원문정보

Effect of Market Basket Size on the Accuracy of Association Rule Measures

김남규

피인용수 : 0(자료제공 : 네이버학술정보)

초록

영어

Recent interests in data mining result from the expansion of the amount of business data and the growing business needs for extracting valuable knowledge from the data and then utilizing it for decision making process. In particular, recent advances in association rule mining techniques enable us to acquire knowledge concerning sales patterns among individual items from the voluminous transactional data. Certainly, one of the major purposes of association rule mining is to utilize acquired knowledge in providing marketing strategies such as cross-selling, sales promotion, and shelf-space allocation. In spite of the potential applicability of association rule mining, unfortunately, it is not often the case that the marketing mix acquired from data mining leads to the realized profit. The main difficulty of mining-based profit realization can be found in the fact that tremendous numbers of patterns are discovered by the association rule mining. Due
to the too many patterns, data mining experts should perform additional mining of the results of initial mining in order to extract only actionable and profitable knowledge, which exhausts much time and costs.
In the literature, a number of interestingness measures have been devised for estimating discovered patterns. Most of the measures can be directly calculated from what is known as a contingency table, which summarizes the sales frequencies of exclusive items or itemsets. A contingency table can provide brief insights into the relationship between two or more itemsets of concern. However, it is important to note that some useful information concerning sales transactions may be lost when a contingency table is constructed. For instance, information regarding the size of each market basket (i.e., the number of items in each transaction) cannot be described in a contingency table. It is natural that a larger basket has a tendency to consist of more sales patterns. Therefore, if two itemsets are sold together in a very large basket, it can be expected that the basket contains two or more patterns and that the two itemsets belong to mutually different patterns. Therefore, we should classify frequent itemset into two categories, inter-pattern co-occurrence and intra-pattern co-occurrence, and investigate the effect of the market basket size on the two categories. This notion implies that any interestingness measures for association rules should consider not only the total frequency of target itemsets but also the size of each basket.
There have been many attempts on analyzing various interestingness measures in the literature. Most of them have conducted qualitative comparison among various measures. The studies proposed desirable properties of interestingness measures and then surveyed how many properties are obeyed by each measure.
However, relatively few attentions have been made on evaluating how well the patterns discovered by each measure are regarded to be valuable in the real world. In this paper, attempts are made to propose two notions regarding association rule measures. First, a quantitative criterion for estimating accuracy of association rule measures is presented. According to this criterion, a measure can be considered to be accurate if it assigns high scores to meaningful patterns that actually exist and low scores to arbitrary patterns that co-occur by coincidence. Next, complementary measures are presented to improve the accuracy of traditional association rule measures. By adopting the factor of market basket size, the devised measures attempt to discriminate the co-occurrence of itemsets in a small basket from another co-occurrence in a large basket. Intensive computer simulations under various workloads were performed in order to analyze the accuracy of various interestingness measures including traditional measures and the proposed measures.

목차

Abstract
 Ⅰ. 서론
 Ⅱ. 관련 연구
 Ⅲ. 장바구니의 크기를 활용한 연관규칙척도의 정확성 향상 방안
  3.1 장바구니 크기 효과를 반영한 흥미성척도 - 결합력(cohesion)
  3.2 흥미성 척도의 정량적 평가를 위한 정확성 기준
 Ⅳ. 성능 평가
  4.1 실험 모형 및 환경
  4.2 실험 결과 및 해석
 Ⅴ. 결론
 참고문헌

저자정보

  • 김남규 Kim, Namgyu. 국민대학교 경상대학 비즈니스IT학부 전임강사

참고문헌

자료제공 : 네이버학술정보

    함께 이용한 논문

      ※ 기관로그인 시 무료 이용이 가능합니다.

      • 5,500원

      0개의 논문이 장바구니에 담겼습니다.