Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 5 of 5
Full-Text Articles in Physical Sciences and Mathematics
Comai: Enabling Lightweight, Collaborative Intelligence By Retrofitting Vision Dnns, Kasthuri Jayarajah, Dhanuja Wanniarachchige, Tarek Abdelzaher, Archan Misra
Research Collection School Of Computing and Information Systems
While Deep Neural Network (DNN) models have transformed machine vision capabilities, their extremely high computational complexity and model sizes present a formidable deployment roadblock for AIoT applications. We show that the complexity-vs-accuracy-vs-communication tradeoffs for such DNN models can be significantly addressed via a novel, lightweight form of “collaborative machine intelligence” that requires only runtime changes to the inference process. In our proposed approach, called ComAI, the DNN pipelines of different vision sensors share intermediate processing state with one another, effectively providing hints about objects located within their mutually overlapping Fields-of-View (FoVs). ComAI uses two novel techniques: (a) a secondary shallow ML …
Deep Learning For Video-Grounded Dialogue Systems, Hung Le
Dissertations and Theses Collection (Open Access)
In recent years, we have witnessed significant progress in building systems with artificial intelligence. However, despite advancements in machine learning and deep learning, we are still far from achieving autonomous agents that can perceive multi-dimensional information from the surrounding world and converse with humans in natural language. Towards this goal, this thesis is dedicated to building intelligent systems in the task of video-grounded dialogues. Specifically, in a video-grounded dialogue, a system is required to hold a multi-turn conversation with humans about the content of a video. Given an input video, a dialogue history, and a question about the video, the …
Neighbourhood Structure Preserving Cross-Modal Embedding For Video Hyperlinking, Yanbin Hao, Chong-Wah Ngo, Benoit Huet
Research Collection School Of Computing and Information Systems
Video hyperlinking is a task aiming to enhance the accessibility of large archives, by establishing links between fragments of videos. The links model the aboutness between fragments for efficient traversal of video content. This paper addresses the problem of link construction from the perspective of cross-modal embedding. To this end, a generalized multi-modal auto-encoder is proposed. The encoder learns two embeddings from visual and speech modalities, respectively, whereas each of the embeddings performs self-modal and cross-modal translation of modalities. Furthermore, to preserve the neighbourhood structure of fragments, which is important for video hyperlinking, the auto-encoder is devised to model data …
Rotation Invariant Convolutions For 3d Point Clouds Deep Learning, Zhiyuan Zhang, Binh-Son Hua, David W. Rosen, Sai-Kit Yeung
Research Collection School Of Computing and Information Systems
Recent progress in 3D deep learning has shown that it is possible to design special convolution operators to consume point cloud data. However, a typical drawback is that rotation invariance is often not guaranteed, resulting in networks that generalize poorly to arbitrary rotations. In this paper, we introduce a novel convolution operator for point clouds that achieves rotation invariance. Our core idea is to use low-level rotation invariant geometric features such as distances and angles to design a convolution operator for point cloud learning. The well-known point ordering problem is also addressed by a binning approach seamlessly built into the …
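The abstract's core idea, that distances and angles do not change when a point cloud is rotated, can be checked with a few lines of standard-library Python. This is only an illustrative sketch, not the paper's convolution operator: the helper names, the choice of a neighbourhood centroid `m` as a second anchor, and the rotation about the z-axis are all assumptions made for brevity.

```python
import math

def rotate_z(p, theta):
    """Rotate a 3D point about the z-axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def sub(a, b):
    """Component-wise difference a - b."""
    return tuple(ai - bi for ai, bi in zip(a, b))

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))

def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    cos_uv = sum(ui * vi for ui, vi in zip(u, v)) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cos_uv)))  # clamp against rounding

def invariant_features(p, q, m):
    """Hypothetical feature tuple for neighbour q of reference point p,
    with an assumed second anchor m (e.g. a neighbourhood centroid):
    two distances and one angle."""
    pq, pm = sub(q, p), sub(m, p)
    return (norm(pq), norm(sub(q, m)), angle(pq, pm))

# Illustrative points (made up for this sketch).
p, q, m = (0.0, 0.0, 0.0), (1.0, 2.0, 0.5), (0.5, -1.0, 2.0)
theta = 0.7

before = invariant_features(p, q, m)
after = invariant_features(*(rotate_z(x, theta) for x in (p, q, m)))

# The features agree up to floating-point error, although the raw
# coordinates of q and m have changed.
assert all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))
```

Raw coordinates fed to a network would differ before and after the rotation, whereas features built only from such distances and angles stay the same, which is the property the abstract relies on.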
Deep Learning On Lie Groups For Skeleton-Based Action Recognition, Zhiwu Huang, C. Wan, T. Probst, L. Van Gool
Research Collection School Of Computing and Information Systems
In recent years, skeleton-based action recognition has become a popular 3D classification problem. State-of-the-art methods typically first represent each motion sequence as a high-dimensional trajectory on a Lie group with an additional dynamic time warping, and then shallowly learn favorable Lie group features. In this paper we incorporate the Lie group structure into a deep network architecture to learn more appropriate Lie group features for 3D action recognition. Within the network structure, we design rotation mapping layers to transform the input Lie group features into desirable ones, which are aligned better in the temporal domain. To reduce the high feature …