Abstract
This paper presents a framework for generating appropriate facial expressions for a listener engaged in a dyadic conversation. The ability to produce contextually suitable facial gestures in response to user interactions may enhance the user experience of interacting with avatars and social robots. We propose an approach based on Transformer and Siamese architectures. Positive and negative Speaker-Listener pairs are created, and a contrastive loss is applied to them to facilitate learning. Furthermore, an ensemble of reconstruction-quality-sensitive loss functions is added to the network for learning discriminative features. The listener's facial reactions are represented with a combination of 3D Morphable Model coefficients and affect-related attributes (facial action units). The inputs to the network are features from the pre-trained Transformer-based MARLIN model and affect-related features. Experimental analysis demonstrates the effectiveness of the proposed method, which improves performance across various metrics compared to a variational auto-encoder-based baseline.
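The contrastive objective on Speaker-Listener pairs can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes the classic margin-based contrastive loss over Euclidean distance. The function names, the `margin` value, and the toy embeddings are hypothetical and stand in for the outputs of the two Siamese branches.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(speaker_emb, listener_emb, is_positive, margin=1.0):
    """Margin-based contrastive loss for one Speaker-Listener pair.

    Assumed form (not confirmed by the abstract):
      positive pair: d^2                   (pull embeddings together)
      negative pair: max(0, margin - d)^2  (push apart, up to the margin)
    """
    d = euclidean_distance(speaker_emb, listener_emb)
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy embeddings (hypothetical values, for illustration only).
speaker = [0.2, 0.9, 0.1]
matched_listener = [0.25, 0.85, 0.1]    # reaction to this speaker
mismatched_listener = [0.9, 0.1, 0.4]   # reaction to a different speaker

pos_loss = contrastive_loss(speaker, matched_listener, is_positive=True)
neg_loss = contrastive_loss(speaker, mismatched_listener, is_positive=False)
print(pos_loss, neg_loss)
```

Under this assumed form, a matched pair is penalized in proportion to its squared distance, while a mismatched pair contributes no loss once it lies farther apart than the margin.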
Original language | English |
---|---|
Title of host publication | MM '23 - Proceedings of the 31st ACM International Conference on Multimedia |
Place of Publication | New York, NY |
Publisher | Association for Computing Machinery, Inc |
Pages | 9536-9540 |
Number of pages | 5 |
ISBN (Electronic) | 9798400701085 |
Publication status | Published - 27 Oct 2023 |
Externally published | Yes |
Event | 31st ACM International Conference on Multimedia, Ottawa, Canada. Duration: 29 Oct 2023 → 3 Nov 2023. Conference number: 31st |
Publication series
Name | Proceedings of the ACM International Conference on Multimedia |
---|---|
Publisher | Association for Computing Machinery |
Volume | 2023 |
Conference
Conference | 31st ACM International Conference on Multimedia |
---|---|
Abbreviated title | MM 2023 |
Country/Territory | Canada |
City | Ottawa |
Period | 29/10/23 → 3/11/23 |
Keywords
- behavioral encoder
- contrastive learning
- dyadic interactions
- facial reactions generation
- transformer