Graphics and Human Computer Interfaces
Research Collection School Of Computing and Information Systems
Full-Text Articles in Theory and Algorithms
Snap-And-Ask: Answering Multimodal Question By Naming Visual Instance, Wei Zhang, Lei Pang, Chong-Wah Ngo
Research Collection School Of Computing and Information Systems
In real life, it is easier to provide a visual cue when asking a question about a possibly unfamiliar topic, for example, “Where was this crop circle found?” Providing an image of the instance is far more convenient than texting a verbose description of its visual properties, especially when the name of the query instance is not known. Nevertheless, having to identify the visual instance before processing the question and eventually returning the answer makes multimodal question-answering technically challenging. This paper addresses the problem of visual-to-text naming through the paradigm of answering-by-search in a two-stage computational framework, which …