Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons


Articles 1–2 of 2

Full-Text Articles in Engineering

Toward Intuitive 3D Interactions In Virtual Reality: A Deep Learning-Based Dual-Hand Gesture Recognition Approach, Trudi Di Qi, Franceli L. Cibrian, Meghna Raswan, Tyler Kay, Hector M. Camarillo-Abad, Yuxin Wen May 2024


Engineering Faculty Articles and Research

Dual-hand gesture recognition is crucial for intuitive 3D interactions in virtual reality (VR), allowing the user to interact with virtual objects naturally through gestures made with both handheld controllers. While deep learning and sensor-based technology have proven effective in recognizing single-hand gestures for 3D interactions, dual-hand gesture recognition for VR interactions remains underexplored. In this work, we introduce CWT-CNN-TCN, a novel deep learning model that combines a 2D Convolutional Neural Network (CNN) with Continuous Wavelet Transformation (CWT) and a Temporal Convolutional Network (TCN). This model can simultaneously extract features from the time-frequency domain and capture long-term dependencies using …
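The abstract stops short of implementation details, so the following is a minimal sketch of how a CWT-CNN-TCN pipeline of this kind could be wired up in PyTorch: pre-computed per-channel scalograms (e.g., from pywt.cwt) pass through a 2D CNN that collapses the scale axis but preserves time, and a stack of dilated causal convolutions (the TCN) then models long-term dependencies before classification. All module names, layer sizes, channel counts, and input shapes are illustrative assumptions, not the authors' configuration.

import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    """One dilated causal convolution block of the TCN (illustrative)."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        pad = (kernel_size - 1) * dilation  # pad so the conv can stay causal
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              padding=pad, dilation=dilation)
        self.relu = nn.ReLU()
        self.downsample = (nn.Conv1d(in_ch, out_ch, 1)
                           if in_ch != out_ch else nn.Identity())

    def forward(self, x):
        out = self.conv(x)[..., :x.size(-1)]  # trim right padding -> causal
        return self.relu(out + self.downsample(x))  # residual connection

class CWTCNNTCN(nn.Module):
    """Hypothetical CWT-CNN-TCN sketch: a 2D CNN over per-channel
    scalograms, a TCN over the remaining time axis, then a classifier."""
    def __init__(self, n_channels=12, n_scales=32, n_classes=8):
        super().__init__()
        # 2D CNN over (scale, time) scalograms; time resolution preserved.
        self.cnn = nn.Sequential(
            nn.Conv2d(n_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse scale axis, keep time
        )
        # Dilated causal stack: receptive field doubles at each block.
        self.tcn = nn.Sequential(
            TemporalBlock(64, 64, kernel_size=3, dilation=1),
            TemporalBlock(64, 64, kernel_size=3, dilation=2),
            TemporalBlock(64, 64, kernel_size=3, dilation=4),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, scalograms):
        # scalograms: (batch, sensor_channels, n_scales, n_timesteps),
        # e.g. produced offline with pywt.cwt on each controller channel.
        feats = self.cnn(scalograms).squeeze(2)  # (batch, 64, time)
        feats = self.tcn(feats)                  # long-range temporal context
        return self.head(feats.mean(dim=-1))     # pool over time, classify

x = torch.randn(4, 12, 32, 128)  # toy batch of dual-controller scalograms
print(CWTCNNTCN()(x).shape)      # torch.Size([4, 8])

Doubling the dilation at each TCN block is the standard mechanism for capturing long-range temporal dependencies without recurrence, which matches the abstract's stated motivation for pairing the TCN with the time-frequency CNN features.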


Multimodal Fusion For Audio-Image And Video Action Recognition, Muhammad B. Shaikh, Douglas Chai, Syed M. S. Islam, Naveed Akhtar Jan 2024


Research outputs 2022 to 2026

Multimodal Human Action Recognition (MHAR) is an important research topic in the computer vision and event recognition fields. In this work, we address the problem of MHAR by developing a novel audio-image and video fusion-based deep learning framework that we call Multimodal Audio-Image and Video Action Recognizer (MAiVAR). We extract temporal information using image representations of audio signals and spatial information from the video modality with the help of Convolutional Neural Network (CNN)-based feature extractors, and fuse these features to recognize the respective action classes. We apply a high-level weights assignment algorithm to improve audio-visual interaction and convergence. This proposed fusion-based framework utilizes …
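The truncated abstract names the fusion idea but not its exact form, so the sketch below shows one plausible feature-level reading in PyTorch: separate CNN extractors for the audio-image and video inputs, with a learned softmax-normalized weight pair standing in for the paper's high-level weights assignment algorithm. The backbones, feature dimension, and class count are placeholder assumptions, not MAiVAR's actual architecture.

import torch
import torch.nn as nn

class MAiVARSketch(nn.Module):
    """Hypothetical fusion sketch: CNN features from an audio-image
    (e.g. a spectrogram rendered as an image) and from a video clip,
    combined with learned high-level weights before classification."""
    def __init__(self, feat_dim=256, n_classes=51):
        super().__init__()
        self.audio_cnn = nn.Sequential(  # stand-in audio-image extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.video_cnn = nn.Sequential(  # stand-in per-clip extractor
            nn.Conv3d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Learned high-level weights balancing the two modalities
        # (assumed stand-in for the paper's weights assignment algorithm).
        self.fusion_logits = nn.Parameter(torch.zeros(2))
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, audio_image, video_clip):
        a = self.audio_cnn(audio_image)           # (batch, feat_dim)
        v = self.video_cnn(video_clip)            # (batch, feat_dim)
        w = torch.softmax(self.fusion_logits, 0)  # modality weights sum to 1
        return self.classifier(w[0] * a + w[1] * v)

audio = torch.randn(2, 3, 128, 128)        # toy spectrogram images
video = torch.randn(2, 3, 16, 64, 64)      # toy clips: (B, C, T, H, W)
print(MAiVARSketch()(audio, video).shape)  # torch.Size([2, 51])

Weighting the modality features before a shared classifier keeps the fusion differentiable end to end, so the audio-visual balance is learned jointly with the action classifier rather than fixed by hand.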