Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Singapore Management University

Feature extraction

2022

Social and Behavioral Sciences

Articles 1 - 1 of 1

Full-Text Articles in Physical Sciences and Mathematics

Action-Centric Relation Transformer Network For Video Question Answering, Jipeng Zhang, Jie Shao, Rui Cao, Lianli Gao, Xing Xu, Heng Tao Shen, Jan 2022

Research Collection School Of Computing and Information Systems

Video question answering (VideoQA) has emerged as a popular research topic in recent years. Considerable effort has been devoted to developing more effective fusion strategies and better intra-modal feature preparation. To explore these issues further, we identify two key problems. (1) Existing work rarely incorporates the action of interest into the video representation, and many datasets lack sufficient labels indicating where that action occurs, even though questions in VideoQA are usually action-centric. (2) Frame-to-frame relations, which can provide useful temporal attributes (e.g., state transition, action counting), remain under-explored. Based on these observations, we …