Open Access. Powered by Scholars. Published by Universities.®

Computer vision

2022

Articles 1 - 28 of 28

Full-Text Articles in Physical Sciences and Mathematics

Towards A Machine Learning-Based Digital Twin For Non-Invasive Human Bio-Signal Fusion, Izaldein Al-Zyoud, Fedwa Laamarti, Xiaocong Ma, Diana Tobón, Abdulmotaleb Elsaddik Dec 2022

Computer Vision Faculty Publications

Human bio-signal fusion is considered a critical technological solution that needs to be advanced to enable modern and secure digital health and well-being applications in the metaverse. To support such efforts, we propose a new data-driven digital twin (DT) system to fuse three human physiological bio-signals: heart rate (HR), breathing rate (BR), and blood oxygen saturation level (SpO2). To accomplish this goal, we design a computer vision technology based on the non-invasive photoplethysmography (PPG) technique to extract raw time-series bio-signal data from facial video frames. Then, we implement machine learning (ML) technology to model and measure the bio-signals. We accurately …


A Robust Normalizing Flow Using Bernstein-Type Polynomials, Sameera Ramasinghe, Kasun Fernando, Salman Khan, Nick Barnes Nov 2022

Computer Vision Faculty Publications

Modeling real-world distributions can often be challenging due to sample data that are subjected to perturbations, e.g., instrumentation errors, or added random noise. Since flow models are typically nonlinear algorithms, they amplify these initial errors, leading to poor generalizations. This paper proposes a framework to construct Normalizing Flows (NFs) which demonstrate higher robustness against such initial errors. To this end, we utilize Bernstein-type polynomials inspired by the optimal stability of the Bernstein basis. Further, compared to the existing NF frameworks, our method provides compelling advantages like theoretical upper bounds for the approximation error, better suitability for compactly supported densities, and …


Face Pyramid Vision Transformer, Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood Nov 2022

Computer Vision Faculty Publications

A novel Face Pyramid Vision Transformer (FPVT) is proposed to learn discriminative multi-scale facial representations for face recognition and verification. In FPVT, Face Spatial Reduction Attention (FSRA) and Dimensionality Reduction (FDR) layers are employed to make the feature maps compact, thus reducing the computations. An Improved Patch Embedding (IPE) algorithm is proposed to exploit the benefits of CNNs in ViTs (e.g., shared weights, local context, and receptive fields) to model lower-level edges up to higher-level semantic primitives. Within the FPVT framework, a Convolutional Feed-Forward Network (CFFN) is proposed that extracts locality information to learn low-level facial information. The proposed FPVT …


Transresnet: Integrating The Strengths Of Vits And Cnns For High Resolution Medical Image Segmentation Via Feature Grafting, Muhammad Hamza Sharif, Dmitry Demidov, Asif Hanif, Mohammad Yaqub, Min Xu Nov 2022

Computer Vision Faculty Publications

High-resolution images are preferable in the medical imaging domain, as they significantly improve the diagnostic capability of the underlying method. In particular, high resolution helps substantially in improving automatic image segmentation. However, most of the existing deep learning-based techniques for medical image segmentation are optimized for input images having small spatial dimensions and perform poorly on high-resolution images. To address this shortcoming, we propose a parallel-in-branch architecture called TransResNet, which incorporates Transformer and CNN in a parallel manner to extract features from multi-resolution images independently. In TransResNet, we introduce a Cross Grafting Module (CGM), which generates the grafted features, enriched in both …


How To Train Vision Transformer On Small-Scale Datasets?, Hanan Gani, Muzammal Naseer, Mohammad Yaqub Nov 2022

Computer Vision Faculty Publications

Vision Transformer (ViT), a radically different architecture from convolutional neural networks, offers multiple advantages including design simplicity, robustness, and state-of-the-art performance on many vision tasks. However, in contrast to convolutional neural networks, the Vision Transformer lacks inherent inductive biases. Therefore, successful training of such models is mainly attributed to pre-training on large-scale datasets, such as ImageNet with 1.2M images or JFT with 300M images. This hinders the direct adaptation of the Vision Transformer to small-scale datasets. In this work, we show that self-supervised inductive biases can be learned directly from small-scale datasets and serve as an effective weight initialization scheme for fine-tuning. This …


Maximum Spatial Perturbation Consistency For Unpaired Image-To-Image Translation, Yanwu Xu, Shaoan Xie, Wenhao Wu, Kun Zhang, Mingming Gong, Kayhan Batmanghelich Sep 2022

Machine Learning Faculty Publications

Unpaired image-to-image translation (I2I) is an ill-posed problem, as an infinite number of translation functions can map the source domain distribution to the target distribution. Therefore, much effort has been put into designing suitable constraints, e.g., cycle consistency (CycleGAN), geometry consistency (GCGAN), and contrastive learning-based constraints (CUTGAN), that help better pose the problem. However, these well-known constraints have limitations: (1) they are either too restrictive or too weak for specific I2I tasks; (2) these methods result in content distortion when there is a significant spatial variation between the source and target domains. This paper proposes a universal regularization technique called …


Ubnormal: New Benchmark For Supervised Open-Set Video Anomaly Detection, Andra Acsintoae, Andrei Florescu, Mariana-Iuliana Georgescu, Tudor Mare, Paul Sumedrea, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah Sep 2022

Computer Vision Faculty Publications

Detecting abnormal events in video is commonly framed as a one-class classification task, where training videos contain only normal events, while test videos encompass both normal and abnormal events. In this scenario, anomaly detection is an open-set problem. However, some studies assimilate anomaly detection to action recognition. This is a closed-set scenario that fails to test the capability of systems at detecting new anomaly types. To this end, we propose UBnormal, a new supervised open-set benchmark composed of multiple virtual scenes for video anomaly detection. Unlike existing data sets, we introduce abnormal events annotated at the pixel level at training …


Avist: A Benchmark For Visual Object Tracking In Adverse Visibility, Mubashir Noman, Wafa Al Ghallabi, Daniya Najiha, Christoph Mayer, Hisham Cholakkal, Salman Khan, Luc Van Gool, Fahad Shahbaz Khan Aug 2022

Computer Vision Faculty Publications

One of the key factors behind the recent success in visual tracking is the availability of dedicated benchmarks. While they have greatly benefited tracking research, existing benchmarks no longer pose the same difficulty as before, with recent trackers achieving higher performance mainly due to (i) the introduction of more sophisticated transformer-based methods and (ii) the lack of diverse scenarios with adverse visibility, such as severe weather conditions, camouflage, and imaging effects. We introduce AVisT, a dedicated benchmark for visual tracking in diverse scenarios with adverse visibility. AVisT comprises 120 challenging sequences with 80k annotated frames, spanning 18 diverse scenarios …


Automated Identification Of Astronauts On Board The International Space Station: A Case Study In Space Archaeology, Rao Hamza Ali, Amir Kanan Kashefi, Alice C. Gorman, Justin St. P. Walsh, Erik J. Linstead Aug 2022

Art Faculty Articles and Research

We develop and apply a deep learning-based computer vision pipeline to automatically identify crew members in archival photographic imagery taken on-board the International Space Station. Our approach is able to quickly tag thousands of images from public and private photo repositories without human supervision with high degrees of accuracy, including photographs where crew faces are partially obscured. Using the results of our pipeline, we carry out a large-scale network analysis of the crew, using the imagery data to provide novel insights into the social interactions among crew during their missions.


3d Vision With Transformers: A Survey, Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Anwer, Salman Khan, Ming-Hsuan Yang Aug 2022

Computer Vision Faculty Publications

The success of the transformer architecture in natural language processing has recently attracted attention in the computer vision field. The transformer has been used as a replacement for the widely used convolution operators, due to its ability to learn long-range dependencies. This replacement was proven to be successful in numerous tasks, in which several state-of-the-art methods rely on transformers for better learning. In computer vision, the 3D field has also witnessed an increase in employing the transformer for 3D convolutional neural networks and multi-layer perceptron networks. Although a number of surveys have focused on transformers in vision in general, 3D …


Directed Acyclic Graph-Based Neural Networks For Tunable Low-Power Computer Vision, Abhinav Goel, Caleb Tung, Nick Eliopoulos, Xiao Hu, George K. Thiruvathukal, James C. Davis, Yung-Hsiang Lu Aug 2022

Computer Science: Faculty Publications and Other Works

Processing visual data on mobile devices has many applications, e.g., emergency response and tracking. State-of-the-art computer vision techniques rely on large Deep Neural Networks (DNNs) that are usually too power-hungry to be deployed on resource-constrained edge devices. Many techniques improve the efficiency of DNNs by compromising accuracy. However, the accuracy and efficiency of these techniques cannot be adapted for diverse edge applications with different hardware constraints and accuracy requirements. This paper demonstrates that a recent, efficient tree-based DNN architecture, called the hierarchical DNN, can be converted into a Directed Acyclic Graph-based (DAG) architecture to provide tunable accuracy-efficiency tradeoff options. We …


Robustar: Interactive Toolbox Supporting Precise Data Annotation For Robust Vision Learning, Chonghan Chen, Haohan Wang, Leyang Hu, Yuhao Zhang, Shuguang Lyu, Jingcheng Wu, Xinnuo Li, Linjing Sun, Eric Xing Jul 2022

Machine Learning Faculty Publications

We introduce the initial release of our software Robustar, which aims to improve the robustness of vision classification machine learning models through a data-driven perspective. Building upon the recent understanding that a machine learning model's lack of robustness stems from its tendency to learn spurious features, we aim to solve this problem at its root, at the data level, by removing the spurious features from the data before training. In particular, we introduce software that helps users better prepare the data for training image classification models by allowing them to annotate the spurious features …


Bridging The Gap Between Object And Image-Level Representations For Open-Vocabulary Detection, Hanoona Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman Khan, Fahad Shahbaz Khan Jul 2022

Computer Vision Faculty Publications

Existing open-vocabulary object detectors typically enlarge their vocabulary sizes by leveraging different forms of weak supervision. This helps generalize to novel objects at inference. Two popular forms of weak-supervision used in open-vocabulary detection (OVD) include pretrained CLIP model and image-level supervision. We note that both these modes of supervision are not optimally aligned for the detection task: CLIP is trained with image-text pairs and lacks precise localization of objects while the image-level supervision has been used with heuristics that do not accurately specify local object regions. In this work, we propose to address this problem by performing object-centric alignment of …


Irrelevant Pixels Are Everywhere: Find And Exclude Them For More Efficient Computer Vision, Caleb Tung, Abhinav Goel, Xiao Hu, Nick Eliopoulos, Emmanuel Amobi, George K. Thiruvathukal, Vipin Chaudhary, Yung-Hsiang Lu Jul 2022

Computer Science: Faculty Publications and Other Works

Computer vision is often performed using Convolutional Neural Networks (CNNs). CNNs are compute-intensive and challenging to deploy on power-constrained systems such as mobile and Internet-of-Things (IoT) devices. CNNs are compute-intensive because they indiscriminately compute many features on all pixels of the input image. We observe that, given a computer vision task, images often contain pixels that are irrelevant to the task. For example, if the task is looking for cars, pixels in the sky are not very useful. Therefore, we propose that a CNN be modified to only operate on relevant pixels to save computation and energy. We propose a …
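The idea of restricting computation to relevant pixels can be illustrated with a minimal sketch (the `masked_feature_map` helper and the toy brightness-based relevance mask below are hypothetical, not the authors' modified-CNN implementation):

```python
def masked_feature_map(image, relevant, feature):
    """Apply an expensive per-pixel `feature` only where `relevant` is True;
    elsewhere the output stays zero, saving work on irrelevant pixels."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if relevant[y][x]:
                out[y][x] = feature(image[y][x])
    return out

# Toy relevance mask: very bright pixels (e.g. sky) are treated as
# irrelevant when searching for cars, so they are skipped entirely.
image = [[10, 250], [30, 40]]
relevant = [[v < 200 for v in row] for row in image]
features = masked_feature_map(image, relevant, lambda v: v * 0.5)
```

The savings scale with the sparsity of the mask: computation is performed only on the pixels that pass the relevance test.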


Edgenext: Efficiently Amalgamated Cnn-Transformer Architecture For Mobile Vision Applications, Muhammad Maaz, Abdelrahman Shaker, Hisham Cholakkal, Salman Khan, Syed Waqas Zamir, Rao Anwer, Fahad Shahbaz Khan Jun 2022

Computer Vision Faculty Publications

In the pursuit of ever-increasing accuracy, large and complex neural networks are usually developed. Such models demand high computational resources and therefore cannot be deployed on edge devices. It is of great interest to build resource-efficient general-purpose networks due to their usefulness in several application areas. In this work, we strive to effectively combine the strengths of both CNN and Transformer models and propose a new efficient hybrid architecture, EdgeNeXt. Specifically, in EdgeNeXt, we introduce a split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups and utilizes depth-wise convolution along with self-attention across channel …


High-Resolution Face Swapping Via Latent Semantics Disentanglement, Yangyang Xu, Bailin Deng, Junle Wang, Yanqing Jing, Jia Pan, Shengfeng He Jun 2022

Research Collection School Of Computing and Information Systems

We present a novel high-resolution face swapping method using the inherent prior knowledge of a pre-trained GAN model. Although previous research can leverage generative priors to produce high-resolution results, their quality can suffer from the entangled semantics of the latent space. We explicitly disentangle the latent semantics by utilizing the progressive nature of the generator, deriving structure attributes from the shallow layers and appearance attributes from the deeper ones. Identity and pose information within the structure attributes are further separated by introducing a landmark-driven structure transfer latent direction. The disentangled latent code produces rich generative features that incorporate feature blending …


Metaformer Is Actually What You Need For Vision, Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan Jun 2022

Research Collection School Of Computing and Information Systems

Transformers have shown great potential in computer vision tasks. A common belief is that their attention-based token-mixer module contributes most to their competence. However, recent works show that the attention-based module in transformers can be replaced by spatial MLPs and the resulting models still perform quite well. Based on this observation, we hypothesize that the general architecture of the transformers, instead of the specific token-mixer module, is more essential to the model's performance. To verify this, we deliberately replace the attention module in transformers with an embarrassingly simple spatial pooling operator to conduct only basic token mixing. Surprisingly, we observe …
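The pooling token mixer described above can be sketched in simplified 1-D form (the `pool_token_mixer` helper is illustrative; the paper's operator works on 2-D feature maps and handles border padding differently):

```python
def pool_token_mixer(tokens, window=3):
    """Mix each token with its neighbours via stride-1 average pooling
    (zero padding at the borders), then subtract the input token, so the
    mixer contributes only the difference from the identity."""
    half = window // 2
    mixed = []
    for i in range(len(tokens)):
        lo, hi = max(0, i - half), min(len(tokens), i + half + 1)
        pooled = sum(tokens[lo:hi]) / window  # zeros assumed outside the sequence
        mixed.append(pooled - tokens[i])
    return mixed

# No learned parameters are involved: token mixing here is pure
# neighbourhood averaging, the "embarrassingly simple" replacement.
mixed = pool_token_mixer([1.0, 2.0, 3.0])
```

Note the mixer has zero parameters, which is exactly what makes the architecture-versus-token-mixer comparison meaningful.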


Visual Attention Methods In Deep Learning: An In-Depth Survey, Mohammed Hassanin, Anwar Saeed, Ibrahim Radwan, Fahad Shahbaz Khan, Ajmal Mian Apr 2022

Computer Vision Faculty Publications

Inspired by the human cognitive system, attention is a mechanism that imitates the human cognitive awareness about specific information, amplifying critical details to focus more on the essential aspects of data. Deep learning has employed attention to boost performance for many applications. Interestingly, the same attention design can suit processing different data modalities and can easily be incorporated into large networks. Furthermore, multiple complementary attention mechanisms can be incorporated in one network. Hence, attention techniques have become extremely attractive. However, the literature lacks a comprehensive survey specific to attention techniques to guide researchers in employing attention in their deep models. …


Energy-Based Latent Aligner For Incremental Learning, K.J. Joseph, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Vineeth N. Balasubramanian Mar 2022

Computer Vision Faculty Publications

Deep learning models tend to forget their earlier knowledge while incrementally learning new tasks. This behavior emerges because the parameter updates optimized for the new tasks may not align well with the updates suitable for older tasks. The resulting latent representation mismatch causes forgetting. In this work, we propose ELI: Energy-based Latent Aligner for Incremental Learning, which first learns an energy manifold for the latent representations such that previous task latents will have low energy and the current task latents have high energy values. This learned manifold is used to counter the representational shift that happens during incremental learning. The …


Volitional Control Of Lower-Limb Prosthesis With Vision-Assisted Environmental Awareness, S M Shafiul Hasan Mar 2022

FIU Electronic Theses and Dissertations

Early and reliable prediction of user’s intention to change locomotion mode or speed is critical for a smooth and natural lower limb prosthesis. Meanwhile, incorporation of explicit environmental feedback can facilitate context aware intelligent prosthesis which allows seamless operation in a variety of gait demands. This dissertation introduces environmental awareness through computer vision and enables early and accurate prediction of intention to start, stop or change speeds while walking. Electromyography (EMG), Electroencephalography (EEG), Inertial Measurement Unit (IMU), and Ground Reaction Force (GRF) sensors were used to predict intention to start, stop or increase walking speed. Furthermore, it was investigated whether …


Cocoa: Context-Conditional Adaptation For Recognizing Unseen Classes In Unseen Domains, Puneet Mangla, Shivam Chandhok, Vineeth N. Balasubramanian, Fahad Shahbaz Khan Feb 2022

Computer Vision Faculty Publications

Recent progress towards designing models that can generalize to unseen domains (i.e., domain generalization) or unseen classes (i.e., zero-shot learning) has sparked interest in building models that can tackle both domain shift and semantic shift simultaneously (i.e., zero-shot domain generalization). For models to generalize to unseen classes in unseen domains, it is crucial to learn a feature representation that preserves class-level (domain-invariant) as well as domain-specific information. Motivated by the success of generative zero-shot approaches, we propose a feature generative framework integrated with a COntext COnditional Adaptive (COCOA) Batch-Normalization layer to seamlessly integrate class-level semantic and domain-specific information. The generated visual features …


Low-Power Computer Vision: Improve The Efficiency Of Artificial Intelligence, George K. Thiruvathukal, Yung-Hsiang Lu, Jaeyoun Kim, Yiran Chen, Bo Chen Feb 2022

Computer Science: Faculty Publications and Other Works

Energy efficiency is critical for running computer vision on battery-powered systems, such as mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the methods that have won the annual IEEE Low-Power Computer Vision Challenges since 2015. The winners share their solutions and provide insight on how to improve the efficiency of machine learning systems.


Transformers In Vision: A Survey, Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah Jan 2022

Computer Vision Faculty Publications

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences, as compared to recurrent networks, e.g., Long Short-Term Memory (LSTM). Different from convolutional networks, Transformers require minimal inductive biases for their design and are naturally suited as set-functions. Furthermore, the straightforward design of Transformers allows processing multiple modalities (e.g., images, videos, text and speech) using similar processing blocks and demonstrates excellent scalability to very large capacity networks and huge …


Explainabilityaudit: An Automated Evaluation Of Local Explainability In Rooftop Image Classification, Duleep Rathgamage Don, Jonathan Boardman, Sudhashree Sayenju, Ramazan Aygun, Yifan Zhang, Bill Franks, Sereres Johnston, George Lee, Dan Sullivan, Girish Modgil Jan 2022

Published and Grey Literature from PhD Candidates

Explainable Artificial Intelligence (XAI) is a key concept in building trustworthy machine learning models. Local explainability methods seek to provide explanations for individual predictions. Usually, humans must check these explanations manually. When large numbers of predictions are being made, this approach does not scale. We address this deficiency for a rooftop classification problem specifically with ExplainabilityAudit, a method that automatically evaluates explanations generated by a local explainability toolkit and identifies rooftop images that require further auditing by a human expert. The proposed method utilizes explanations generated by the Local Interpretable Model-Agnostic Explanations (LIME) framework as the most important superpixels of …


Jointly-Learnt Networks For Future Action Anticipation Via Self-Knowledge Distillation And Cycle Consistency, Md Moniruzzaman, Zhaozheng Yin, Zhihai He, Ming-Chuan Leu, Ruwen Qin Jan 2022

Mechanical and Aerospace Engineering Faculty Research & Creative Works

Future action anticipation aims to infer future actions from the observation of a small set of past video frames. In this paper, we propose a novel Jointly learnt Action Anticipation Network (J-AAN) via Self-Knowledge Distillation (Self-KD) and cycle consistency for future action anticipation. In contrast to the current state-of-the-art methods which anticipate the future actions either directly or recursively, our proposed J-AAN anticipates the future actions jointly in both direct and recursive ways. However, when dealing with future action anticipation, one important challenge to address is the future's uncertainty since multiple action sequences may come from or be followed by …


Construction Of A Repeatable Framework For Prostate Cancer Lesion Binary Semantic Segmentation Using Convolutional Neural Networks, Ian Vincent O. Mirasol, Patricia Angela R. Abu, Rosula Sj Reyes Jan 2022

Department of Information Systems & Computer Science Faculty Publications

Prostate cancer is the 3rd most diagnosed cancer overall. Current screening methods such as the prostate-specific antigen test can result in overdiagnosis and overtreatment, while other methods, such as transrectal ultrasonography, are invasive. Recent medical advancements have allowed the use of multiparametric MRI, a noninvasive and reliable screening process for prostate cancer. However, assessments would still vary among professionals, introducing subjectivity. While convolutional neural networks have been used in multiple studies to objectively segment prostate lesions, due to the sensitivity of the datasets and the varying ground truths established in these studies, it is not possible to reproduce and …


Image-Based Malware Classification Hybrid Framework Based On Space-Filling Curves, Stephen O Shaughnessy, Stephen Sheridan Jan 2022

Articles

There exists a never-ending “arms race” between malware analysts and adversarial malicious code developers as malevolent programs evolve and countermeasures are developed to detect and eradicate them. Malware has become more complex in its intent and capabilities over time, which has prompted the need for constant improvement in detection and defence methods. Of particular concern are the anti-analysis obfuscation techniques, such as packing and encryption, that are employed by malware developers to evade detection and thwart the analysis process. In such cases, malware is generally impervious to basic analysis methods and so analysts must use more invasive techniques to extract …
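Space-filling curves, as used in the framework above, map a 1-D byte sequence onto a 2-D image while preserving locality, so bytes that are close in the binary stay close in the picture. A minimal sketch with a Hilbert curve, one common choice (the `bytes_to_image` helper is illustrative, not the paper's pipeline):

```python
def hilbert_d2xy(order, d):
    """Convert distance d along a Hilbert curve covering a (2**order)-sided
    square into (x, y) coordinates (classic iterative rotate-and-flip form)."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/reflect the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def bytes_to_image(data, order):
    """Lay raw bytes onto a side-by-side grid along the Hilbert curve."""
    side = 1 << order
    img = [[0] * side for _ in range(side)]
    for d, b in enumerate(data[: side * side]):
        x, y = hilbert_d2xy(order, d)
        img[y][x] = b
    return img
```

Because consecutive curve positions are always grid neighbours, local byte patterns in the malware binary become local textures in the resulting image, which is what makes image classifiers applicable.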


Customer Gaze Estimation In Retail Using Deep Learning, Shashimal Senarath, Primesh Pathirana, Dulani Meedeniya, Sampath Jayarathna Jan 2022

Computer Science Faculty Publications

At present, intelligent computing applications are widely used in different domains, including retail stores. The analysis of customer behaviour has become crucial for the benefit of both customers and retailers. In this regard, the concept of remote gaze estimation using deep learning has shown promising results in analyzing customer behaviour in retail due to its scalability, robustness, low cost, and uninterrupted nature. This study presents a three-stage, three-attention-based deep convolutional neural network for remote gaze estimation in retail using image data. In the first stage, we design a mechanism to estimate the 3D gaze of the subject using image data …