Poster Session 1 : IT Fusion Technologies etc.

Enhancing the Robustness of VQA Model via Plausible Counterfactual Data Generation

Abstract

English

Visual Question Answering (VQA) models suffer from language bias: they rely excessively on textual correlations instead of the image content. This study proposes Plausible Counterfactual Data Generation (PCDG), a method that uses Grad-CAM-based visual importance to identify the significant objects in an image and replace them in a contextually appropriate manner. By synthesizing more contextually relevant samples than existing augmentation methods, PCDG strengthens visual-language alignment. In experiments on the VQA-CP v2 benchmark, the method achieved significant performance improvements, notably a 10.56% increase in the 'Num' category and a 2.78% increase in the 'Other' category. These results indicate that the proposed method improves the generalization ability and robustness of the VQA model through debiasing.
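
The abstract does not give implementation details, so the following is only a minimal sketch of how a Grad-CAM-style importance map could be used to pick the object to replace. It follows the standard Grad-CAM formulation (channel weights from spatially averaged gradients, then a ReLU over the weighted sum of activations). The function names `grad_cam` and `most_important_box`, the array shapes, and the box-scoring rule (mean importance inside each box) are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Standard Grad-CAM map from one conv layer.

    activations, gradients: arrays of shape (K, H, W), where gradients
    holds d(score)/d(activations) for the answer of interest.
    """
    # Channel weights: spatial average of the gradients, shape (K,).
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over channels, shape (H, W), then ReLU.
    cam = np.tensordot(weights, activations, axes=1)
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] when the map is not all zeros.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def most_important_box(cam, boxes):
    """Hypothetical selection rule: score each box by its mean importance.

    boxes: list of (x0, y0, x1, y1) in map coordinates, end-exclusive.
    Returns the index of the highest-scoring box and all scores.
    """
    scores = [float(cam[y0:y1, x0:x1].mean()) for (x0, y0, x1, y1) in boxes]
    return int(np.argmax(scores)), scores
```

In this sketch, the selected box marks the candidate region for replacement. How the replacement object is chosen so that it stays contextually plausible is the paper's contribution and is not reproduced here.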

Table of Contents

Abstract
I. INTRODUCTION
II. RELATED WORK
A. Retrieval Visual Contrastive Decoding
B. Counterfactual Sample Synthesis
III. METHOD
A. Visual Importance
B. Dynamic Counterfactual Image Generation
IV. EXPERIMENTS
A. Experimental Settings
B. Training
C. Results
V. CONCLUSION
VI. FUTURE WORK
ACKNOWLEDGMENT

Author Information

  • JaeBong Choi, Department of Computer Engineering, Gachon University, Seongnam 1342, Gyeonggi, Republic of Korea
  • NamGyu Jung, Department of Computer Engineering, Gachon University, Seongnam 1342, Gyeonggi, Republic of Korea
  • Chang Choi, Department of Computer Engineering, Gachon University, Seongnam 1342, Gyeonggi, Republic of Korea
