Source Information
Abstract (English)
In the domain of Human-Computer Interaction (HCI), the main task of the computer is to interpret the external stimuli provided by users. Moreover, in multi-person scenarios, it is important to localize and track the active speaker. To address this problem, we introduce a framework by which multi-modal sensory data can be efficiently and meaningfully combined for speaker tracking. This framework fuses four different observation types taken from multi-modal sensors. The advantages of this fusion are that weak sensory data from either modality can be reinforced, and the presence of noise can be reduced. We propose a method of combining these modalities by employing a particle filter, which offers satisfactory real-time performance. We demonstrate speaker localization results in two- and three-person scenarios.
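To make the fusion idea concrete, the following is a minimal sketch of a particle filter that combines several modality observations by multiplying their likelihoods, assuming a 2D speaker-position state, a random-walk motion model, and Gaussian observation noise; the function names, parameters, and values are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def gaussian_likelihood(particles, observation, sigma):
    """Unnormalized Gaussian likelihood of each particle given one observation."""
    d2 = np.sum((particles - observation) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def particle_filter_step(particles, weights, observations, sigmas,
                         motion_std=0.05, rng=None):
    """One predict-update-resample cycle fusing multiple modality observations.

    particles:     (N, 2) array of candidate 2D speaker positions.
    observations:  per-modality position estimates (e.g. audio, video).
    sigmas:        per-modality observation noise levels.
    """
    rng = rng or np.random.default_rng()
    # Predict: random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: fuse modalities by multiplying their likelihoods, so weak
    # evidence in one modality is reinforced by the others.
    for obs, sigma in zip(observations, sigmas):
        weights = weights * gaussian_likelihood(particles, obs, sigma)
    weights = weights / weights.sum()
    # Resample to concentrate particles in high-weight regions.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Usage: fuse a noisy audio-derived estimate with a sharper video estimate.
rng = np.random.default_rng(0)
particles = rng.uniform(-1.0, 1.0, (500, 2))
weights = np.full(500, 1.0 / 500)
audio_obs, video_obs = np.array([0.30, 0.10]), np.array([0.28, 0.12])
particles, weights = particle_filter_step(
    particles, weights, [audio_obs, video_obs], sigmas=[0.2, 0.05], rng=rng)
print("estimated speaker position:", particles.mean(axis=0))
```

In this sketch the posterior estimate is pulled toward the lower-noise video observation while the audio observation still contributes, which mirrors the stated benefit of the fusion: noisy evidence from one modality is tempered by the other.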
Table of Contents
1: Introduction
2: The Proposed Approach
2.1: Video Modality
2.2: Audio Modality
2.3: Particle Filter Implementation
3: Experimental Results
4: Conclusions and Future Work
Acknowledgments
References
