Multimodal learning

Articles 1 - 2 of 2

Full-Text Articles in Physical Sciences and Mathematics

Deep Multimodal Learning For Affective Analysis And Retrieval, Lei Pang, Shiai Zhu, Chong-Wah Ngo Nov 2015

Research Collection School Of Computing and Information Systems

Social media has become a convenient platform for voicing opinions by posting messages, ranging from tweeting a short text to uploading a media file, or any combination of the two. Understanding the perceived emotions underlying this user-generated content (UGC) could shed light on emerging applications such as advertising and media analytics. Existing research efforts on affective computing are mostly dedicated to a single medium, either text captions or visual content. Few attempts have been made at combined analysis of multiple media, even though emotion can be viewed as an expression of multimodal experience. In this paper, we explore the learning of highly …


Multimodal Learning With Deep Boltzmann Machine For Emotion Prediction In User Generated Videos, Lei Pang, Chong-Wah Ngo Jun 2015

Research Collection School Of Computing and Information Systems

Detecting emotions such as “anger” and “sadness” from user-generated videos has attracted widespread interest recently. The problem is challenging because effectively representing video data with multi-view information (e.g., audio, video, or text) is not trivial. In contrast to existing works that extract features from each modality (view) separately and then apply early or late fusion, we propose to learn a joint density model over the space of multimodal inputs (visual, auditory, and textual) with a Deep Boltzmann Machine (DBM). The model is trained directly on user-generated Web videos without any labeling effort. More importantly, the deep architecture enlightens the …
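
As a rough, illustrative sketch of the joint-representation idea named in this abstract (not the authors' implementation), the following Python snippet greedily pretrains two modality-specific restricted Boltzmann machines (RBMs) and a shared top-level RBM over their concatenated hidden activations, a common layer-wise approximation used when constructing multimodal deep Boltzmann machines. All layer sizes, the learning rate, the CD-1 training loop, and the toy random data are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with 1-step contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities and a binary sample.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # Gradient approximation: positive minus negative statistics.
        batch = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / batch
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

# Toy binary "features" standing in for the visual and textual modalities.
visual = (rng.random((256, 64)) < 0.3).astype(float)
text = (rng.random((256, 32)) < 0.3).astype(float)

# Modality-specific pathways.
rbm_visual = RBM(64, 48)
rbm_text = RBM(32, 24)

# Shared joint layer over the concatenated modality-specific representations.
rbm_joint = RBM(48 + 24, 40)

for epoch in range(10):
    rbm_visual.cd1_step(visual)
    rbm_text.cd1_step(text)
    joint_input = np.hstack([rbm_visual.hidden_probs(visual),
                             rbm_text.hidden_probs(text)])
    rbm_joint.cd1_step(joint_input)

# Joint representation of both modalities, usable as features downstream.
joint_repr = rbm_joint.hidden_probs(
    np.hstack([rbm_visual.hidden_probs(visual), rbm_text.hidden_probs(text)]))
print(joint_repr.shape)  # (256, 40)

In this sketch, joint_repr plays the role of the shared multimodal representation that a downstream emotion classifier would consume; the unsupervised CD-1 pretraining mirrors the paper's point that the joint model can be trained on unlabeled Web videos.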