
Thies Pfeiffer

    13. Workshop der GI-Fachgruppe VR/AR (13th Workshop of the GI Special Interest Group on VR/AR)
    Understanding multimodal deixis with gaze and gesture in conversational interfaces
    • When humans communicate, we use deictic expressions to refer to objects in our environment, complementing speech with gestures. This lets us be less precise in our words, because our interlocutors interpret speech, gestures, and eye movements together to resolve what we mean. The thesis advances research on multimodal conversational interfaces that support such natural dialogues between humans and computer systems. It provides detailed data on multimodal interactions, focusing on manual pointing gestures and eye gaze collected with 3D tracking technology. New methods for measuring the 3D point of regard are introduced (a geometric sketch follows below), alongside the concepts of Gesture Space Volumes and Attention Volumes, which visually represent gestural activity and visual attention in space. A data-driven model of human pointing is offered, clarifying how the direction of pointing is defined and emphasizing the role of gaze (see the second sketch below). Additionally, the thesis presents technologies for recording, integrating, and analyzing multimodal data, enabling comprehensive exploration of multimodal corpus data in immersive virtual environments. Finally, it introduces the DRIVE framework for real-time deictic multimodal interaction in virtual reality, which implements the developed models.
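
To make the point-of-regard idea concrete, here is a minimal geometric sketch: given binocular eye-tracking data (an origin and a gaze direction per eye), a 3D point of regard can be estimated as the point closest to both gaze rays. This is an illustrative reconstruction under that assumption, not the thesis's actual method; all function names and values are hypothetical.

```python
# Sketch: estimating a 3D point of regard from binocular gaze rays.
# Hypothetical illustration; names and parameters are not from the thesis.
import numpy as np

def point_of_regard(origin_l, dir_l, origin_r, dir_r):
    """Midpoint of the shortest segment between the two gaze rays.

    Each ray is given by an eye position and a gaze direction.
    For (near-)parallel rays the system is ill-conditioned, so a
    least-squares solve is used instead of a direct inverse.
    """
    d_l = dir_l / np.linalg.norm(dir_l)
    d_r = dir_r / np.linalg.norm(dir_r)
    # Solve for ray parameters t, s minimizing |(o_l + t*d_l) - (o_r + s*d_r)|.
    a = np.array([[d_l @ d_l, -d_l @ d_r],
                  [d_l @ d_r, -d_r @ d_r]])
    b = np.array([(origin_r - origin_l) @ d_l,
                  (origin_r - origin_l) @ d_r])
    t, s = np.linalg.lstsq(a, b, rcond=None)[0]
    p_l = origin_l + t * d_l   # closest point on the left gaze ray
    p_r = origin_r + s * d_r   # closest point on the right gaze ray
    return (p_l + p_r) / 2.0

# Example: eyes 6.5 cm apart, both converging on a point ~1 m ahead.
por = point_of_regard(np.array([-0.0325, 0.0, 0.0]), np.array([0.0325, 0.0, 1.0]),
                      np.array([0.0325, 0.0, 0.0]), np.array([-0.0325, 0.0, 1.0]))
print(por)  # ~ [0, 0, 1]
```

Triangulating the two rays rather than using a single eye is what turns 2D gaze directions into a depth estimate, which is why vergence-based approaches like this are a natural fit for immersive 3D settings.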

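The role of gaze in pointing can likewise be sketched in a few lines: one common way to operationalize distal pointing is to cast a ray from the dominant eye through the extended index fingertip and intersect it with a target surface. This is an assumption-laden illustration of such a gaze-anchored ray model; the function name, coordinates, and the specific ray construction are stand-ins, not the thesis's exact model.

```python
# Sketch: an eye-to-fingertip pointing ray intersected with a target plane.
# Hypothetical illustration of a gaze-anchored pointing model.
import numpy as np

def pointing_ray_hit(eye, fingertip, plane_point, plane_normal):
    """Intersect the ray from the (dominant) eye through the fingertip
    with a plane; return the referenced point, or None if there is no hit."""
    direction = fingertip - eye
    denom = direction @ plane_normal
    if abs(denom) < 1e-9:
        return None  # ray runs parallel to the plane
    t = ((plane_point - eye) @ plane_normal) / denom
    if t < 0:
        return None  # plane lies behind the eye
    return eye + t * direction

# Example: pointing at a wall 2 m in front of the eye.
eye = np.array([0.0, 1.6, 0.0])        # dominant-eye position
fingertip = np.array([0.2, 1.4, 0.5])  # extended index fingertip
hit = pointing_ray_hit(eye, fingertip,
                       plane_point=np.array([0.0, 0.0, 2.0]),
                       plane_normal=np.array([0.0, 0.0, 1.0]))
print(hit)  # point on the wall the gesture refers to -> [0.8, 0.8, 2.0]
```

The design choice the sketch highlights is that the pointing direction is defined relative to the eye rather than along the arm or finger axis, which is one way to make the role of gaze in deictic reference explicit.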