Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Artificial Intelligence and Robotics

Computer Vision

Articles 1 - 30 of 53

Full-Text Articles in Physical Sciences and Mathematics

Retrofitting A Legacy Cutlery Washing Machine Using Computer Vision, Hua Leong Fwa Oct 2024

Research Collection School Of Computing and Information Systems

Industry 4.0, the digitalization of manufacturing, promises to lead to lowered cost, efficient processes and even discovery of new business models. However, many enterprises have huge investments in legacy machines which are not 'smart'. In this study, we thus designed a cost-efficient solution to retrofit a legacy conveyor belt-based cutlery washing machine with a commodity web camera. We then applied computer vision (using both traditional image processing and deep learning techniques) to infer the speed and utilization of the machine. We detailed the algorithms that we designed for computing both speed and utilization. With the existing operational constraints of …


Vysion Software, Isaias Hernandez-Dominguez Jr, Chander Luderman Miller Jul 2024

2024 Symposium

Vision loss presents significant challenges in daily life. Existing solutions for blind and visually impaired individuals are often limited in functionality, expensive, or complex to use. Vysion Software addresses this gap by developing a user-friendly, all-in-one AI companion app that provides features including text summarization, real-time audio descriptions, and AI-enhanced navigation. This project details the development plan, initial functionalities, and future vision for Vysion Software.


Context In Computer Vision: A Taxonomy, Multi-Stage Integration, And A General Framework, Xuan Wang Jun 2024

Dissertations, Theses, and Capstone Projects

Contextual information has been widely used in many computer vision tasks, such as object detection, video action detection, and image classification. Recognizing a single object or action out of context can sometimes be very challenging, and context information may greatly improve the understanding of a scene or an event. However, existing approaches design specific contextual information mechanisms for different detection tasks.

In this research, we first present a comprehensive survey of context understanding in computer vision, with a taxonomy to describe context in different types and levels. We then propose MultiCLU, a new multi-stage context learning and utilization framework, …


Morp: Monocular Orientation Regression Pipeline, Jacob Gunderson Jun 2024

Master's Theses

Orientation estimation of objects plays a pivotal role in robotics, self-driving cars, and augmented reality. Beyond mere position, accurately determining the orientation of objects is essential for constructing precise models of the physical world. While 2D object detection has made significant strides, the field of orientation estimation still faces several challenges. Our research addresses these hurdles by proposing an efficient pipeline which facilitates rapid creation of labeled training data and enables direct regression of object orientation from a single image. We start by creating a digital twin of a physical object using an iPhone, followed by generating synthetic images using …


Deep Learning-Based Human Action Understanding In Videos, Elahe Vahdani Feb 2024

Dissertations, Theses, and Capstone Projects

The understanding of human actions in videos holds immense potential for technological advancement and societal betterment. This thesis explores fundamental aspects of this field, including action recognition in trimmed clips and action localization in untrimmed videos. Trimmed videos contain only one action instance, with moments before or after the action excluded from the video. However, the majority of videos captured in unconstrained environments, often referred to as untrimmed videos, are naturally unsegmented. Untrimmed videos are typically lengthy and may encompass multiple action instances, along with the moments preceding or following each action, as well as transitions between actions. In the …


Using Pose Estimation Software To Predict Actions In Sabre Fencing, Micah Edwin Peters II Jan 2024

Honors College Theses

Fencing is a combat sport that uses three different swords: epee, foil, and sabre. Due to its fast-paced nature and employment of right of way, sabre fencing is often considered the most difficult of the three to learn. Computer vision and pose estimation software can be used to lower the barrier of entry to sabre fencing by identifying the different actions in sabre fencing. This project focuses on using open-source software to design a program that can identify the sabre parries as well as the main sabre movements. This program could be used to help newer fencers and spectators better …


Generalized Differentiable Neural Architecture Search With Performance And Stability Improvements, Emily J. Herron Dec 2023

Doctoral Dissertations

This work introduces improvements to the stability and generalizability of Cyclic DARTS (CDARTS). CDARTS is a Differentiable Architecture Search (DARTS)-based approach to neural architecture search (NAS) that uses a cyclic feedback mechanism to train search and evaluation networks concurrently, thereby optimizing the search process by enforcing that the networks produce similar outputs. However, the dissimilarity between the loss functions used by the evaluation networks during the search and retraining phases results in a search-phase evaluation network that is a sub-optimal proxy for the final evaluation network utilized during retraining. ICDARTS, a revised algorithm that reformulates the search phase loss functions to ensure …


Weakly-Supervised Semantic Segmentation, Zhaozheng Chen Dec 2023

Dissertations and Theses Collection (Open Access)

Semantic segmentation is a fundamental task in computer vision that assigns a label to every pixel in an image based on the semantic meaning of the objects present. It demands a large amount of pixel-level labeled images for training deep models. Weakly-supervised semantic segmentation (WSSS) is a more feasible approach that uses only weak annotations to learn the segmentation task. Image-level label based WSSS is the most challenging and popular, where only the class label for the entire image is provided as supervision. To address this challenge, Class Activation Map (CAM) has emerged as a powerful technique in WSSS. CAM …
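As a rough illustration of the Class Activation Map idea mentioned in this abstract, the sketch below computes a CAM in its commonly cited form: a weighted sum of the final convolutional feature maps, using the classifier weights of the chosen class. It is a minimal sketch, not the thesis's method; the function name, argument names, and array shapes are assumptions made for the example.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Compute a normalized CAM for one class.

    features:   (K, H, W) feature maps from the last conv layer (assumed shape).
    fc_weights: (C, K) weights of a linear classifier applied after
                global average pooling (assumed layout).
    class_idx:  index of the image-level class label.
    """
    # Weighted sum over the K channels: CAM(x, y) = sum_k w[c, k] * f[k, x, y]
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Keep only positive evidence, then scale to [0, 1] when possible.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

In image-level WSSS pipelines, a map like this is typically thresholded to produce coarse seed regions for the labeled class; the threshold value is a design choice and is not specified here.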


Deep Learning For Photovoltaic Characterization, Adrian Manuel De Luis Garcia Dec 2023

Graduate Theses and Dissertations

This thesis introduces a novel approach to Photovoltaic (PV) installation segmentation by proposing a new architecture to understand and identify PV modules from overhead imagery. Pivotal to this concept is the creation of a new Transformer-based network, S3Former, which focuses on small object characterization and modelling intra- and inter-object differentiation inside an image. Accurate mapping of PV installations is pivotal for understanding their adoption and guiding energy policy decisions. Drawing insights from current Deep Learning methodologies for image segmentation and building upon State-of-the-Art (SOTA) techniques in solar cell mapping, this work puts forth S3Former with the following enhancements: 1. …


Smartphone Based Object Detection For Shark Spotting, Darrick W. Oliver Nov 2023

Master's Theses

Given concern over shark attacks in coastal regions, the recent use of unmanned aerial vehicles (UAVs), or drones, has increased to ensure the safety of beachgoers. However, much of city officials' process remains manual, with drone operation and review of footage still playing a significant role. In pursuit of a more automated solution, researchers have turned to the usage of neural networks to perform detection of sharks and other marine life. For on-device solutions, this has historically required assembling individual hardware components to form an embedded system to utilize the machine learning model. This means that the camera, neural processing …


Decoding The Underlying Meaning Of Multimodal Hateful Memes, Ming Shan Hee, Wen Haw Chong, Roy Ka-Wei Lee Aug 2023

Research Collection School Of Computing and Information Systems

Recent studies have proposed models that yielded promising performance for the hateful meme classification task. Nevertheless, these proposed models do not generate interpretable explanations that uncover the underlying meaning and support the classification output. A major reason for the lack of explainable hateful meme methods is the absence of a hateful meme dataset that contains ground truth explanations for benchmarking or training. Intuitively, having such explanations can educate and assist content moderators in interpreting and removing flagged hateful memes. This paper addresses this research gap by introducing Hateful meme with Reasons Dataset (HatReD), which is a new multimodal hateful meme …


Automatic Identification Of Jetting Behavior In 3D Printing With Binary Classification And Anomaly Detection, Alexander Chandy May 2023

Honors Scholar Theses

Consistently jetting different materials from the print head of a 3D printer is a key, yet challenging task in manufacturing processes. By using active machine learning, we can efficiently predict complex diagrams that illustrate the region of printing conditions under which “desirable jetting”, “jetting”, and “no jetting” of ink occurs for different substances. However, labeling the images of printed ink droplets that are fed to the active learning model can be time intensive. Therefore, it is ideal to use computer vision to automate the classification of this image data. This classification can be broken down into two steps. In the …


Vision-Based Object Manipulation For Activities Of Daily Living (ADL) Assistance Using Assistive Robot, Md Tanzil Shahria May 2023

Theses and Dissertations

Upper and lower extremity (ULE) functional deficiencies, which limit a person's ability to perform everyday tasks, have increased at an alarming rate over the past few decades. It is essential for individuals with impairments to take care of themselves without requiring a significant amount of support from other individuals. Few assistive devices are available in the market to make their life comfortable, yet controlling them sometimes becomes challenging for this group of people. Robotic devices are emerging as assistive devices to assist individuals with limited ULE functionalities in activities of daily living (ADL). As most of these devices only allow …


AI Applications On Planetary Rovers, Alexis David Pascual Mar 2023

Electronic Thesis and Dissertation Repository

The rise in the number of robotic missions to space is paving the way for the use of artificial intelligence and machine learning in the autonomy and augmentation of rover operations. For one, more rovers mean more images, and more images mean more data bandwidth required for downlinking as well as more mental bandwidth for analyzing the images. On the other hand, light-weight, low-powered microrover platforms are being developed to accommodate the drive for planetary exploration. As a result of the mass and power constraints, these microrover platforms will not carry typical navigational instruments like a stereocamera or a laser …


Reducing Negative Transfer Of Random Data In Source-Free Unsupervised Domain Adaptation, Anthony Wong Mar 2023

Electronic Thesis and Dissertation Repository

In domain adaptation, a model trained on one dataset (source domain) is applied to a different but related dataset (target domain). The most cutting-edge method is unsupervised source-free domain adaptation (SFDA), in which source data, source labels, and target labels are not available during adaptation. This thesis explores a realistic scenario where the target dataset includes some images that are unrelated to the adaptation process. This scenario can occur from errors in data collection or processing. We provide experiments and analysis to show that current state-of-the-art (SOTA) SFDA methods suffer significant performance drops under a specific domain adaptation setup when …


Region Detection & Segmentation Of Nissl-Stained Rat Brain Tissue, Alexandro Arnal Dec 2022

Open Access Theses & Dissertations

People who analyze images of biological tissue rely on the segmentation of structures as a preliminary step. In particular, laboratories studying the rat brain delineate brain regions to position scientific findings on a brain atlas to propose hypotheses about the rat brain and, ultimately, the human brain. Our work intersects with the preliminary step of delineating regions in images of brain tissue via computational methods.

We investigate pixel-wise classification or segmentation of brain regions using ten histological images of brain tissue sections stained for Nissl substance. We present a deep learning approach that uses the fully convolutional neural network, U-Net, …


Performance Enhancement Of Hyperspectral Semantic Segmentation Leveraging Ensemble Networks, Nicholas Soucy Dec 2022

Electronic Theses and Dissertations

Hyperspectral image (HSI) semantic segmentation is a growing field within computer vision, machine learning, and forestry. Due to the separate nature of these communities, research applying deep learning techniques to ground-type semantic segmentation needs improvement, along with working to bring the research and expectations of these three communities together. Semantic segmentation consists of classifying individual pixels within the image based on the features present. Many issues need to be resolved in HSI semantic segmentation including data preprocessing, feature reduction, semantic segmentation techniques, and adversarial training. In this thesis, we tackle these challenges by employing ensemble methods for HSI semantic segmentation. …
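To make the idea of pixel-wise classification with an ensemble concrete, the sketch below averages per-pixel class probabilities from several models and takes the most likely class at each pixel (soft voting). This is one common ensembling scheme, shown as an assumption; the abstract does not say which ensemble method the thesis uses, and the function name and array shapes are hypothetical.

```python
import numpy as np

def ensemble_segment(prob_maps):
    """Soft-voting ensemble for semantic segmentation.

    prob_maps: list of arrays, each of shape (C, H, W), holding one
               model's per-pixel class probabilities (assumed layout).
    Returns an (H, W) array of predicted class indices.
    """
    # Average the class probabilities across models, pixel by pixel.
    mean_probs = np.mean(np.stack(prob_maps, axis=0), axis=0)
    # Label each pixel with the class that has the highest mean probability.
    return mean_probs.argmax(axis=0)
```

Averaging probabilities rather than hard labels lets a confident model outweigh an uncertain one at a given pixel; majority voting over hard labels would be an equally plausible alternative.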


Semantic-Aligned Matching For Enhanced DETR Convergence And Multi-Scale Feature Fusion, Gongjie Zhang, Zhipeng Luo, Yingchen Yu, Jiaxing Huang, Kaiwen Cui, Shijian Lu, Eric Xing Jul 2022

Machine Learning Faculty Publications

The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection. However, DETR suffers from slow training convergence, which hinders its applicability to various detection tasks. We observe that DETR's slow convergence is largely attributed to the difficulty in matching object queries to relevant regions due to the unaligned semantics between object queries and encoded image features. With this observation, we design Semantic-Aligned-Matching DETR++ (SAM-DETR++) to accelerate DETR's convergence and improve detection performance. The core of SAM-DETR++ is a plug-and-play module that projects object queries and encoded image features into the same feature embedding space, where …


Impact Of Movements On Facial Expression Recognition, Zhebin Yin Jun 2022

Honors Theses

The ability to recognize human emotions can be a useful skill for robots. Emotion recognition can help robots understand our responses to robot movements and actions. Human emotions can be recognized through facial expressions. Facial Expression Recognition (FER) is a well-established research area; however, the majority of prior research is based on static datasets of images. With robots, often the subject is moving, the robot is moving, or both. The purpose of this research is to determine the impact of movement on facial expression recognition. We apply a pre-existing model for FER, which performs around 70.86% on a given …


Gesture Recognition Using Neural Networks, Ashwini Kurady Jan 2022

Master's Projects

Advances in technology have brought many changes in the way humans go about their lives. This has enhanced the significance of Artificial Neural Networks and Computer Vision-based interactions with the world. Gesture Recognition is one of the major focus areas in Computer Vision. This involves Human Computer Interfaces (HCI) that would capture and understand human actions. In this project, we will explore how Neural Network concepts can be applied in this challenging field of Computer Vision. By leveraging the latest research for Gesture Recognition, we researched how to capture the movement across different frames …


License Plate Image Quality Enhancement Utilizing Super Resolution Generative Adversarial Networks, Mark Moelter Jan 2022

Electronic Theses and Dissertations

This thesis focuses primarily on enhancing the image quality of blurred license plates through the use of Super-Resolution Generative Adversarial Networks (SRGANs) [1]. We propose a synthetic dataset with an SRGAN model to promote blurred image quality enhancement and allow for model evaluation on a multitude of image input and output size combinations. SRGAN is mainly used for low-resolution image enhancement, but by heavily blurring the input images, the model is tested on its ability to blindly deblur and upsample images to the desired super-resolution (SR) size. The model enhances the image quality to nearly that of the reference images. The …


Task Classification During Visual Search Using Classic Machine Learning And Deep Learning, Devangi Vilas Chinchankar Dec 2021

Master's Projects

In an average human life, the eyes not only passively scan visual scenes, but most times end up actively performing tasks including, but not limited to, searching, comparing, and counting. As a result of the advances in technology, we are observing a boost in the average screen time. Humans are now looking at an increasing number of screens and in turn images and videos. Understanding what scene a user is looking at and what type of visual task is being performed can be useful in developing intelligent user interfaces, and in virtual reality and augmented reality devices. In this research, …


Analysis Of Camera Trap Footage Through Subject Recognition, Nirnayak Bhardwaj Dec 2021

Master's Projects

Motion-sensitive cameras, otherwise known as camera traps, have become increasingly popular amongst ecologists for studying wildlife. These cameras allow scientists to remotely observe animals through an inexpensive and non-invasive approach. Due to the lenient nature of motion cameras, studies involving them often generate excessive amounts of footage with many photographs not containing any animal subjects. Thus, there is a need for a system that is capable of analyzing camera trap footage to determine if a picture holds value for researchers. While research into automated image recognition is well documented, it has had limited applications in the field of ecology. This …


Multi-Modal Data Fusion, Image Segmentation, And Object Identification Using Unsupervised Machine Learning: Conception, Validation, Applications, And A Basis For Multi-Modal Object Detection And Tracking, Nicholas Lahaye Aug 2021

Computational and Data Sciences (PhD) Dissertations

Remote sensing and instrumentation are constantly improving and increasing in capability. This includes an increase in the number of different instrument types, with various combinations of spatial and spectral resolutions, pointing angles, and various other instrument-specific qualities. While the increase in instruments, and therefore datasets, is a boon for those aiming to study the complexities of the various Earth systems, it can also present a large number of new challenges. With this information in mind, our group has set our aims on combining datasets with different spatial and spectral resolutions in an effective and as-general-as-possible way, with as little …


Forecasting Pedestrian Trajectory Using Deep Learning, Arsal Syed Aug 2021

UNLV Theses, Dissertations, Professional Papers, and Capstones

In this dissertation we develop different methods for forecasting pedestrian trajectories. Complete understanding of pedestrian motion is essential for autonomous agents and social robots to make realistic and safe decisions. Current trajectory prediction methods rely on incorporating historic motion, scene features and social interaction to model pedestrian behaviors. Our focus is to accurately understand scene semantics to better forecast trajectories. In order to do so, we leverage semantic segmentation to encode static scene features such as walkable paths, entry/exits, static obstacles etc. We further evaluate the effectiveness of using semantic maps on different datasets and compare its performance with already …


Take The Lead: Toward A Virtual Video Dance Partner, Ty Farris Aug 2021

Master's Theses

My work focuses on taking a single person as input and predicting the intentional movement of one dance partner based on the other dance partner's movement. Human pose estimation has been applied to dance and computer vision, but many existing applications focus on a single individual or multiple individuals performing. Currently there are very few works that focus specifically on dance couples combined with pose prediction. This thesis is applicable to the entertainment and gaming industry by training people to dance with a virtual dance partner.

Many existing interactive or virtual dance partners require a motion capture system, multiple cameras …


Model Uncertainty Guides Visual Object Tracking, Lijun Zhou, Antoine Ledent, Qintao Hu, Ting Liu, Jianlin Zhang, Marius Kloft Feb 2021

Research Collection School Of Computing and Information Systems

Model object trackers largely rely on the online learning of a discriminative classifier from potentially diverse sample frames. However, noisy or insufficient amounts of samples can deteriorate the classifiers' performance and cause tracking drift. Furthermore, alterations such as occlusion and blurring can cause the target to be lost. In this paper, we make several improvements aimed at tackling uncertainty and improving robustness in object tracking. Our first and most important contribution is to propose a sampling method for the online learning of object trackers based on uncertainty adjustment: our method effectively selects representative sample frames to feed the discriminative branch …


Perceptually Improved Medical Image Translations Using Conditional Generative Adversarial Networks, Anurag Vaidya Jan 2021

Honors Theses

Magnetic resonance imaging (MRI) can help visualize various brain regions. Typical MRI sequences consist of T1-weighted sequence (favorable for observing large brain structures), T2-weighted sequence (useful for pathology), and T2-FLAIR scan (useful for pathology with suppression of signal from water). While these different scans provide complementary information, acquiring them leads to acquisition times of ~1 hour and an average cost of $2,600, presenting significant barriers. To reduce these costs associated with brain MRIs, we present pTransGAN, a generative adversarial network capable of translating both healthy and unhealthy T1 scans into T2 scans. We show that the addition of non-adversarial …


Attentional Parsing Networks, Marcus Karr Dec 2020

Master's Theses

Convolutional neural networks (CNNs) have dominated the computer vision field since the early 2010s, when deep learning largely replaced previous approaches like hand-crafted feature engineering and hierarchical image parsing. Meanwhile transformer architectures have attained preeminence in natural language processing, and have even begun to supplant CNNs as the state of the art for some computer vision tasks.

This study proposes a novel transformer-based architecture, the attentional parsing network, that reconciles the deep learning and hierarchical image parsing approaches to computer vision. We recast unsupervised image representation as a sequence-to-sequence translation problem where image patches are mapped to successive layers …


Dataset And Evaluation Of Self-Supervised Learning For Panoramic Depth Estimation, Ryan Nett Dec 2020

Master's Theses

Depth detection is a very common computer vision problem. It shows up primarily in robotics, automation, or 3D visualization domains, as it is essential for converting images to point clouds. One of the poster-child applications is self-driving cars. Currently, the best methods for depth detection are either very expensive, like LIDAR, or require precise calibration, like stereo cameras. These costs have given rise to attempts to detect depth from a monocular camera (a single camera). While this is possible, it is harder than LIDAR or stereo methods since depth can't be measured from monocular images; it has to …