Abstract (English)
We introduce a reference-guided, fully automatic mask generation framework that relies on neither textual prompts nor manual annotations. The approach first uses the Segment Anything Model (SAM) with automatic mask generation (AMG) to produce multiple mask candidates. Each candidate is then scored against the reference image in the CLIP semantic space. A robust Top-K selection with prior reweighting favors plausible regions while suppressing small, off-center, or abnormal-aspect-ratio masks. Finally, morphological closing and Gaussian feathering yield refined hard/soft masks that can be consumed directly by inpainting or blending modules. Experiments on a COCO subset and our in-house images show strong performance on segmentation metrics (IoU, Dice) and perceptual measures (FID, LPIPS, CLIP-Score) while avoiding the cost of manual mask annotation. This enables streamlined asset preparation for metaverse content creation, immersive AR/VR scenes, and large-scale digital twins, where zero-interaction mask generation is crucial.
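The abstract describes a four-stage pipeline: SAM-AMG candidate generation, CLIP matching against the reference image, prior-reweighted Top-K selection with a union of the selected masks, and morphological refinement. The sketch below illustrates one plausible way to wire these stages together; it is not the authors' implementation. The SAM checkpoint path, the ViT-B/32 CLIP backbone, the masked-crop scoring scheme, the simple union rule, and all thresholds and prior weights (min_area_frac, ar_max, the center-distance penalty, k, feather) are illustrative assumptions.

```python
# Minimal sketch of a reference-guided mask pipeline, assuming the SAM and
# OpenAI CLIP packages. All weights/thresholds below are illustrative, not
# the paper's values.
import numpy as np
import cv2
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)

def embed(image_rgb: np.ndarray) -> torch.Tensor:
    """L2-normalized CLIP embedding of an RGB uint8 image."""
    x = clip_preprocess(Image.fromarray(image_rgb)).unsqueeze(0).to(device)
    with torch.no_grad():
        f = clip_model.encode_image(x).float()
    return f / f.norm(dim=-1, keepdim=True)

def prior_weight(m: dict, img_hw, min_area_frac=0.01, ar_max=4.0) -> float:
    """Downweight small, off-center, or extreme-aspect-ratio candidates
    (a stand-in for the paper's prior reweighting; exact form unknown)."""
    h, w = img_hw
    x, y, bw, bh = m["bbox"]  # SAM reports bboxes in XYWH format
    area_ok = m["area"] / (h * w) >= min_area_frac
    ar = max(bw, bh) / max(1.0, min(bw, bh))
    cx, cy = x + bw / 2, y + bh / 2
    center_dist = np.hypot(cx / w - 0.5, cy / h - 0.5)  # 0 at image center
    w_prior = (1.0 if area_ok else 0.2) * (1.0 if ar <= ar_max else 0.3)
    return w_prior * (1.0 - 0.5 * center_dist)

def reference_guided_mask(image_rgb, reference_rgb, sam_ckpt, k=3, feather=15):
    # 1) SAM-AMG: propose candidate masks without any prompt.
    sam = sam_model_registry["vit_h"](checkpoint=sam_ckpt).to(device)
    candidates = SamAutomaticMaskGenerator(sam).generate(image_rgb)

    # 2) CLIP matching: score each masked region against the reference image.
    ref_feat = embed(reference_rgb)
    scored = []
    for m in candidates:
        crop = image_rgb * m["segmentation"][..., None]  # zero out background
        sim = (embed(crop) @ ref_feat.T).item()          # cosine similarity
        scored.append(sim * prior_weight(m, image_rgb.shape[:2]))

    # 3) Robust Top-K selection and union of the best candidates.
    top = np.argsort(scored)[-k:]
    hard = np.zeros(image_rgb.shape[:2], np.uint8)
    for i in top:
        hard |= candidates[i]["segmentation"].astype(np.uint8)

    # 4) Refinement: morphological closing, then Gaussian feathering
    #    to produce the hard and soft masks for inpainting/blending.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    hard = cv2.morphologyEx(hard * 255, cv2.MORPH_CLOSE, kernel)
    soft = cv2.GaussianBlur(hard.astype(np.float32) / 255.0, (0, 0), feather)
    return hard, np.clip(soft, 0.0, 1.0)
```

A faithful reimplementation would additionally need the exact prior and selection rules from Sections III.D and III.E, which the abstract does not specify.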
Table of Contents
I. INTRODUCTION
II. RELATED WORK
III. METHODOLOGIES
A. Overall Framework
B. Candidate Mask Generation (SAM-AMG)
C. Mask–Reference Similarity (CLIP Matching)
D. Top-K Robust Selection and Union
E. Prior Reweighting
F. Mask Refinement (Morphology and Feathering)
G. Extensions and Robustness
IV. EXPERIMENTS
A. Evaluation Setup
B. Quantitative Results
C. Qualitative Results
V. CONCLUSION
Acknowledgment
REFERENCES
