Abstract
This work presents a novel fine-tuning scheme for improving the quality of subject-driven image generation. Motivated by recent work on fine-tuning pre-trained diffusion models, the proposed method extracts information from visual patch embeddings to optimize the performance of the image encoder. In addition, the loss function of the conventional U-Net model is replaced with a masked diffusion loss. At inference time, the model can control the degree of similarity between the generated image and the reference image using the Classifier-Free Guidance method. Experimental results indicate that the proposed model achieves improved image generation quality compared to previous schemes.
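The abstract names two generic components, a masked diffusion loss and Classifier-Free Guidance, without detailing their form. The following is a minimal PyTorch sketch of how these standard techniques are usually realized; the function names, the `unet` call signature, the `mask` argument, and the default `guidance_scale` are illustrative assumptions and do not reflect the authors' actual implementation.

```python
# Hedged sketch of a masked diffusion (noise-prediction) loss and
# classifier-free guidance; placeholder names, not the paper's code.
import torch
import torch.nn.functional as F

def masked_diffusion_loss(unet, x_noisy, t, cond, noise, mask):
    """MSE between predicted and true noise, weighted by a subject mask.

    mask: 1 inside the subject region, 0 elsewhere (same spatial shape as x_noisy).
    """
    noise_pred = unet(x_noisy, t, cond)                  # predicted noise
    per_pixel = F.mse_loss(noise_pred, noise, reduction="none")
    # Average the loss only over the masked (subject) region.
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1.0)

@torch.no_grad()
def cfg_noise_estimate(unet, x_noisy, t, cond, uncond, guidance_scale=7.5):
    """Classifier-free guidance: blend conditional and unconditional predictions.

    A larger guidance_scale pushes the sample toward the conditioning signal,
    which is how the degree of similarity to the reference can be adjusted
    at inference time.
    """
    eps_cond = unet(x_noisy, t, cond)
    eps_uncond = unet(x_noisy, t, uncond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```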
Table of Contents
I. INTRODUCTION
II. RELATED WORK
III. METHOD
A. Image Encoder
B. Model Training Strategy
IV. EXPERIMENT
V. RESULTS AND DISCUSSION
VI. CONCLUSION
ACKNOWLEDGMENT
REFERENCES