Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

Computer Vision

Articles 1 - 30 of 125

Full-Text Articles in Entire DC Network

Context In Computer Vision: A Taxonomy, Multi-Stage Integration, And A General Framework, Xuan Wang Jun 2024

Dissertations, Theses, and Capstone Projects

Contextual information has been widely used in many computer vision tasks, such as object detection, video action detection, and image classification. Recognizing a single object or action out of context can sometimes be very challenging, and context information may greatly improve the understanding of a scene or an event. However, existing approaches design specific contextual information mechanisms for different detection tasks.

In this research, we first present a comprehensive survey of context understanding in computer vision, with a taxonomy to describe context in different types and levels. We then propose MultiCLU, a new multi-stage context learning and utilization framework, …


Detection And Classification Of Diabetic Retinopathy Using Deep Learning Models, Aishat Olatunji May 2024

Electronic Theses and Dissertations

Healthcare analytics leverages extensive patient data for data-driven decision-making, enhancing patient care and results. Diabetic Retinopathy (DR), a complication of diabetes, stems from damage to the retina’s blood vessels. It can affect both type 1 and type 2 diabetes patients. Ophthalmologists employ retinal images for accurate DR diagnosis and severity assessment. Early detection is crucial for preserving vision and minimizing risks. In this context, we utilized a Kaggle dataset containing patient retinal images, employing Python’s versatile tools. Our research focuses on DR detection using deep learning techniques. We used a publicly available dataset to apply our proposed neural network and …


Automatic Image-Based Nutritional Calculator App, Kejvi Cupa Mar 2024

USF Tampa Graduate Theses and Dissertations

Nutrition plays a pivotal role in shaping an individual's health and quality of life, making the evaluation of dietary intake crucial for promoting healthier lifestyle choices. Various solutions, particularly mobile apps, have been developed to facilitate the process of dietary estimation. Accurate nutritional intake assessment relies on two key components: ingredient recognition and food portion estimation. For a mobile app to offer a comprehensive solution for automatic nutritional assessment, it must address both components.

In this work, we focus on a mobile app pipeline: the semi-automatic pipeline, which focuses on automatic food ingredient recognition. This pipeline integrates state-of-the-art models for …


Deep Learning-Based Human Action Understanding In Videos, Elahe Vahdani Feb 2024

Dissertations, Theses, and Capstone Projects

The understanding of human actions in videos holds immense potential for technological advancement and societal betterment. This thesis explores fundamental aspects of this field, including action recognition in trimmed clips and action localization in untrimmed videos. Trimmed videos contain only one action instance, with moments before or after the action excluded from the video. However, the majority of videos captured in unconstrained environments, often referred to as untrimmed videos, are naturally unsegmented. Untrimmed videos are typically lengthy and may encompass multiple action instances, along with the moments preceding or following each action, as well as transitions between actions. In the …


Using Pose Estimation Software To Predict Actions In Sabre Fencing, Micah Edwin Peters II Jan 2024

Honors College Theses

Fencing is a combat sport that uses three different swords: epee, foil, and sabre. Due to its fast-paced nature and employment of right of way, sabre fencing is often considered the most difficult of the three to learn. Computer vision and pose estimation software can be used to lower the barrier of entry to sabre fencing by identifying the different actions in sabre fencing. This project focuses on using open-source software to design a program that can identify the sabre parries as well as the main sabre movements. This program could be used to help newer fencers and spectators better …


Generalized Differentiable Neural Architecture Search With Performance And Stability Improvements, Emily J. Herron Dec 2023

Doctoral Dissertations

This work introduces improvements to the stability and generalizability of Cyclic DARTS (CDARTS). CDARTS is a Differentiable Architecture Search (DARTS)-based approach to neural architecture search (NAS) that uses a cyclic feedback mechanism to train search and evaluation networks concurrently, thereby optimizing the search process by enforcing that the networks produce similar outputs. However, the dissimilarity between the loss functions used by the evaluation networks during the search and retraining phases results in a search-phase evaluation network that is a sub-optimal proxy for the final evaluation network utilized during retraining. ICDARTS, a revised algorithm that reformulates the search phase loss functions to ensure …


Deep Learning For Photovoltaic Characterization, Adrian Manuel De Luis Garcia Dec 2023

Graduate Theses and Dissertations

This thesis introduces a novel approach to Photovoltaic (PV) installation segmentation by proposing a new architecture to understand and identify PV modules from overhead imagery. Pivotal to this concept is the creation of a new Transformer-based network, S3Former, which focuses on small object characterization and modelling intra- and inter-object differentiation inside an image. Accurate mapping of PV installations is pivotal for understanding their adoption and guiding energy policy decisions. Drawing insights from current Deep Learning methodologies for image segmentation and building upon State-of-the-Art (SOTA) techniques in solar cell mapping, this work puts forth S3Former with the following enhancements: 1. …


Developing Detection And Mapping Of Roads Within Various Forms Of Media Using OpenCV, Jordan C. Lyle Dec 2023

Computer Science and Computer Engineering Undergraduate Honors Theses

OpenCV, and Computer Vision in general, has been a Computer Science topic that has interested me for a long time while completing my Bachelor’s degree at the University of Arkansas. As a result of this, I ended up choosing to utilize OpenCV in order to complete the task of detecting road-lines and mapping roads when given a wide variety of images. The purpose of my Honors research and this thesis is to detail the process of creating an algorithm to detect the road-lines such that the results are effective and instantaneous, as well as detail how Computer Vision can be …


Weakly-Supervised Semantic Segmentation, Zhaozheng Chen Dec 2023

Dissertations and Theses Collection (Open Access)

Semantic segmentation is a fundamental task in computer vision that assigns a label to every pixel in an image based on the semantic meaning of the objects present. It demands a large amount of pixel-level labeled images for training deep models. Weakly-supervised semantic segmentation (WSSS) is a more feasible approach that uses only weak annotations to learn the segmentation task. Image-level label based WSSS is the most challenging and popular, where only the class label for the entire image is provided as supervision. To address this challenge, Class Activation Map (CAM) has emerged as a powerful technique in WSSS. CAM …
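
As a rough illustration of the CAM idea mentioned above, the following minimal sketch computes a class activation map as a weighted sum of the final convolutional feature maps, using the classifier weights of one class (the standard CAM formulation for networks with global average pooling). The thresholding step that turns the map into a pseudo-mask, the 0.5 threshold, and all function names are assumptions made for illustration; the visible abstract does not describe the thesis's own procedure.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel, stored as an H x W grid


def class_activation_map(feature_maps: List[FeatureMap],
                         class_weights: List[float]) -> FeatureMap:
    """Return CAM[y][x] = sum_k class_weights[k] * feature_maps[k][y][x]."""
    if len(feature_maps) != len(class_weights):
        raise ValueError("need exactly one weight per feature map")
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def normalize(cam: FeatureMap) -> FeatureMap:
    """Min-max scale the map to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


def pseudo_mask(cam: FeatureMap, threshold: float = 0.5) -> List[List[int]]:
    """Hypothetical helper: mark pixels whose normalized activation
    reaches the (assumed) threshold as belonging to the class."""
    return [[1 if v >= threshold else 0 for v in row] for row in normalize(cam)]


if __name__ == "__main__":
    # Two toy 2x2 feature maps and the weights of a single class.
    maps = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 2.0]]]
    weights = [1.0, 0.5]
    print(pseudo_mask(class_activation_map(maps, weights)))
```

In practice the map would be upsampled to the input resolution before thresholding; that step is omitted here to keep the sketch dependency-free.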


Smartphone Based Object Detection For Shark Spotting, Darrick W. Oliver Nov 2023

Master's Theses

Given concern over shark attacks in coastal regions, the recent use of unmanned aerial vehicles (UAVs), or drones, has increased to ensure the safety of beachgoers. However, much of city officials' process remains manual, with drone operation and review of footage still playing a significant role. In pursuit of a more automated solution, researchers have turned to the usage of neural networks to perform detection of sharks and other marine life. For on-device solutions, this has historically required assembling individual hardware components to form an embedded system to utilize the machine learning model. This means that the camera, neural processing …


Emotion-Aware Music Recommendation, Hieu Tran, Tuan Le, Anh Do, Tram Vu, Steven Bogaerts, Brian T. Howard Sep 2023

Computer Science Faculty publications

It is common to listen to songs that match one's mood. Thus, an AI music recommendation system that is aware of the user's emotions is likely to provide a superior user experience to one that is unaware. In this paper, we present an emotion-aware music recommendation system. Multiple models are discussed and evaluated for affect identification from a live image of the user. We propose two models: DRViT, which applies dynamic routing to vision transformers, and InvNet50, which uses involution. All considered models are trained and evaluated on the AffectNet dataset. Each model outputs the user's estimated valence and arousal …


Towards Multi-Modal Explainable Video Understanding, Kashu Yamazaki Aug 2023

Graduate Theses and Dissertations

This thesis presents a novel approach to video understanding by emulating human perceptual processes and creating an explainable and coherent storytelling representation of video content. Central to this approach is the development of a Visual-Linguistic (VL) feature for an interpretable video representation and the creation of a Transformer-in-Transformer (TinT) decoder for modeling intra- and inter-event coherence in a video. Drawing inspiration from the way humans comprehend scenes by breaking them down into visual and non-visual components, the proposed VL feature models a scene through three distinct modalities. These include: (i) a global visual environment, providing a broad contextual understanding of …


Decoding The Underlying Meaning Of Multimodal Hateful Memes, Ming Shan Hee, Wen Haw Chong, Roy Ka-Wei Lee Aug 2023

Research Collection School Of Computing and Information Systems

Recent studies have proposed models that yielded promising performance for the hateful meme classification task. Nevertheless, these proposed models do not generate interpretable explanations that uncover the underlying meaning and support the classification output. A major reason for the lack of explainable hateful meme methods is the absence of a hateful meme dataset that contains ground truth explanations for benchmarking or training. Intuitively, having such explanations can educate and assist content moderators in interpreting and removing flagged hateful memes. This paper addresses this research gap by introducing Hateful meme with Reasons Dataset (HatReD), which is a new multimodal hateful meme …


Insect Classification And Explainability From Image Data Via Deep Learning Techniques, Tanvir Hossain Bhuiyan Jun 2023

USF Tampa Graduate Theses and Dissertations

Since the dawn of the Industrial Revolution, humanity has always tried to make labor more efficient and automated, and this trend is only continuing in the modern digital age. With the advent of artificial intelligence (AI) techniques in the latter part of the 20th century, the speed and scale with which AI has been leveraged to automate tasks defy human imagination. Many people deeply entrenched in the technology field are genuinely intrigued and concerned about how AI may change many of the ways in which humans have been living for millennia. Only time will provide the answers. This dissertation is …


Automatic Identification Of Jetting Behavior In 3D Printing With Binary Classification And Anomaly Detection, Alexander Chandy May 2023

Honors Scholar Theses

Consistently jetting different materials from the print head of a 3D printer is a key, yet challenging task in manufacturing processes. By using active machine learning, we can efficiently predict complex diagrams that illustrate the region of printing conditions under which “desirable jetting”, “jetting”, and “no jetting” of ink occurs for different substances. However, labeling the images of printed ink droplets that are fed to the active learning model can be time intensive. Therefore, it is ideal to use computer vision to automate the classification of this image data. This classification can be broken down into two steps. In the …


Vision-Based Object Manipulation For Activities Of Daily Living (ADL) Assistance Using Assistive Robot, Md Tanzil Shahria May 2023

Theses and Dissertations

Upper and lower extremity (ULE) functional deficiencies, which limit a person's ability to perform everyday tasks, have increased at an alarming rate over the past few decades. It is essential for individuals with impairments to take care of themselves without requiring a significant amount of support from other individuals. Few assistive devices are available in the market to make their life comfortable, yet controlling them sometimes becomes challenging for this group of people. Robotic devices are emerging as assistive devices to assist individuals with limited ULE functionalities in activities of daily living (ADL). As most of these devices only allow …


Prototyping A 3D Reconstruction Of Fly Photoreceptors: From Traditional Computer Vision Techniques To 3D Visualization, Kunal Jain May 2023

Graduate Masters Theses

The Drosophila fruit fly is a well-established model organism for studying vision and neural perception. In this research, we focused on segmenting and analyzing photoreceptor cells in the Drosophila retina, specifically rhabdomeres. To facilitate our study, we acquired a comprehensive 3D dataset of Drosophila retina EM images and developed an automated segmentation pipeline using computer vision techniques. We evaluated the performance of our pipeline using automated metrics on Wild Type CS 1 and Nina D1 Mutant Drosophila datasets and prototyped a 3D model of segmented cells. The generated segmentation masks and 3D models of photoreceptor cells contribute to a better …


AI Applications On Planetary Rovers, Alexis David Pascual Mar 2023

Electronic Thesis and Dissertation Repository

The rise in the number of robotic missions to space is paving the way for the use of artificial intelligence and machine learning in the autonomy and augmentation of rover operations. For one, more rovers mean more images, and more images mean more data bandwidth required for downlinking as well as more mental bandwidth for analyzing the images. On the other hand, light-weight, low-powered microrover platforms are being developed to accommodate the drive for planetary exploration. As a result of the mass and power constraints, these microrover platforms will not carry typical navigational instruments like a stereocamera or a laser …


Reducing Negative Transfer Of Random Data In Source-Free Unsupervised Domain Adaptation, Anthony Wong Mar 2023

Electronic Thesis and Dissertation Repository

In domain adaptation, a model trained on one dataset (source domain) is applied to a different but related dataset (target domain). The most cutting-edge method is unsupervised source-free domain adaptation (SFDA), in which source data, source labels, and target labels are not available during adaptation. This thesis explores a realistic scenario where the target dataset includes some images that are unrelated to the adaptation process. This scenario can occur from errors in data collection or processing. We provide experiments and analysis to show that current state-of-the-art (SOTA) SFDA methods suffer significant performance drops under a specific domain adaptation setup when …


Bias Detector Tool For Face Datasets Using Image Recognition, Jatin Vamshi Battu Jan 2023

Master's Projects

Computer Vision has been quickly transforming the way we live and work. One of its subdomains, Facial Recognition, has also been advancing at a rapid pace. However, the development of machine learning models that power these systems has been marred by social biases, which open the door to various societal issues. The objective of this project is to address these issues and ensure that computer vision systems are unbiased and fair to all individuals. To achieve this, we have created a web tool that uses three image classifiers (implemented using CNNs) to classify images into categories based on …


Computer Vision In Adverse Conditions: Small Objects, Low-Resolution Images, And Edge Deployment, Raja Sunkara Jan 2023

Masters Theses

Computer vision based on deep learning is an essential field that plays a significant role in object detection, image classification, semantic segmentation, instance segmentation, and other applications. However, these models face significant challenges in adverse conditions, such as small objects, low-resolution images, and edge deployment. These challenges limit the accuracy and efficiency of computer vision algorithms, making it difficult to obtain reliable results.

The primary objective of this thesis is to assess the performance of deep learning-based computer vision models in challenging conditions and provide viable solutions to overcome the obstacles. The study will specifically address three key challenges, …


Region Detection & Segmentation Of Nissl-Stained Rat Brain Tissue, Alexandro Arnal Dec 2022

Open Access Theses & Dissertations

People who analyze images of biological tissue rely on the segmentation of structures as a preliminary step. In particular, laboratories studying the rat brain delineate brain regions to position scientific findings on a brain atlas to propose hypotheses about the rat brain and, ultimately, the human brain. Our work intersects with the preliminary step of delineating regions in images of brain tissue via computational methods.

We investigate pixel-wise classification or segmentation of brain regions using ten histological images of brain tissue sections stained for Nissl substance. We present a deep learning approach that uses the fully convolutional neural network, U-Net, …


Performance Enhancement Of Hyperspectral Semantic Segmentation Leveraging Ensemble Networks, Nicholas Soucy Dec 2022

Electronic Theses and Dissertations

Hyperspectral image (HSI) semantic segmentation is a growing field within computer vision, machine learning, and forestry. Due to the separate nature of these communities, research applying deep learning techniques to ground-type semantic segmentation needs improvement, along with working to bring the research and expectations of these three communities together. Semantic segmentation consists of classifying individual pixels within the image based on the features present. Many issues need to be resolved in HSI semantic segmentation including data preprocessing, feature reduction, semantic segmentation techniques, and adversarial training. In this thesis, we tackle these challenges by employing ensemble methods for HSI semantic segmentation. …
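
To make the combination of per-pixel classification and ensembling concrete, here is a minimal sketch that fuses the label maps predicted by several segmentation models using a per-pixel majority vote. The visible abstract does not say which ensemble scheme the thesis uses, so majority voting, the tie-breaking rule (smallest label wins), and the function name are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import List

LabelMap = List[List[int]]  # H x W grid of integer class labels


def majority_vote(predictions: List[LabelMap]) -> LabelMap:
    """Fuse several per-pixel label maps by majority vote.

    Assumption: ties are broken by choosing the smallest class label.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    height, width = len(predictions[0]), len(predictions[0][0])
    fused: LabelMap = []
    for y in range(height):
        row = []
        for x in range(width):
            # Count how many models voted for each label at this pixel.
            votes = Counter(pred[y][x] for pred in predictions)
            top = max(votes.values())
            row.append(min(label for label, n in votes.items() if n == top))
        fused.append(row)
    return fused


if __name__ == "__main__":
    # Three hypothetical models labelling the same 2x2 image.
    model_a = [[0, 1], [2, 2]]
    model_b = [[0, 2], [2, 1]]
    model_c = [[1, 1], [0, 2]]
    print(majority_vote([model_a, model_b, model_c]))
```

Voting on hard labels needs only each model's final output; averaging per-class probabilities is a common alternative when the models expose scores.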


Real–Time Semantic Segmentation For Railway Anomalies Analysis, Paul Stanik III Dec 2022

UNLV Theses, Dissertations, Professional Papers, and Capstones

In the past few years, computer vision has made huge jumps due to deep learning which leverages increased computational power and access to data. The computer vision community has also embraced transparency to accelerate research progress by sharing open datasets and open source code. Access to large scale datasets and benchmark challenges propelled and opened the field. The autonomous vehicle community is a prime example. While there has been significant growth in the automotive vision community, not much has been done in the rail domain. Traditional rail inspection methods require special trains that are run during down time, have sensitive …


Generative Spatio-Temporal And Multimodal Analysis Of Neonatal Pain, Md Sirajus Salekin Nov 2022

USF Tampa Graduate Theses and Dissertations

Neonates cannot express their pain as an adult can. Because they lack proper muscle growth and are unable to express themselves non-verbally, it is difficult to understand their emotional status. In addition, if the neonates are under any treatment or left monitored after any major surgeries (post-operative), it is more difficult to understand their pain due to the side effects of medications and the caring system (e.g., intubated, masked face, body covered with a blanket). In a clinical environment, bedside nurses usually observe the neonate routinely and measure the pain status following a standard clinical pain scale. But current …


SoftSkip: Empowering Multi-Modal Dynamic Pruning For Single-Stage Referring Comprehension, Dulanga Weerakoon, Vigneshwaran Subbaraju, Tuan Tran, Archan Misra Oct 2022

Research Collection School Of Computing and Information Systems

Supporting real-time referring expression comprehension (REC) on pervasive devices is an important capability for human-AI collaborative tasks. Model pruning techniques, applied to DNN models, can enable real-time execution even on resource-constrained devices. However, existing pruning strategies are designed principally for uni-modal applications, and suffer a significant loss of accuracy when applied to REC tasks that require fusion of textual and visual inputs. We thus present a multi-modal pruning model, LGMDP, which uses language as a pivot to dynamically and judiciously select the relevant computational blocks that need to be executed. LGMDP also introduces a new SoftSkip mechanism, whereby 'skipped' visual …


Semantic-Aligned Matching For Enhanced DETR Convergence And Multi-Scale Feature Fusion, Gongjie Zhang, Zhipeng Luo, Yingchen Yu, Jiaxing Huang, Kaiwen Cui, Shijian Lu, Eric Xing Jul 2022

Machine Learning Faculty Publications

The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection. However, DETR suffers from slow training convergence, which hinders its applicability to various detection tasks. We observe that DETR's slow convergence is largely attributed to the difficulty in matching object queries to relevant regions due to the unaligned semantics between object queries and encoded image features. With this observation, we design Semantic-Aligned-Matching DETR++ (SAM-DETR++) to accelerate DETR's convergence and improve detection performance. The core of SAM-DETR++ is a plug-and-play module that projects object queries and encoded image features into the same feature embedding space, where …


Impact Of Movements On Facial Expression Recognition, Zhebin Yin Jun 2022

Honors Theses

The ability to recognize human emotions can be a useful skill for robots. Emotion recognition can help robots understand our responses to robot movements and actions. Human emotions can be recognized through facial expressions. Facial Expression Recognition (FER) is a well-established research area; however, the majority of prior research is based on static datasets of images. With robots, often the subject is moving, the robot is moving, or both. The purpose of this research is to determine the impact of movement on facial expression recognition. We apply a pre-existing model for FER, which performs around 70.86% on a given …


Gesture Recognition Using Neural Networks, Ashwini Kurady Jan 2022

Master's Projects

Advances in technology have brought many changes to the way humans go about their lives. This has enhanced the significance of Artificial Neural Networks and Computer Vision-based interactions with the world. Gesture Recognition is one of the major focus areas in Computer Vision. This involves Human Computer Interfaces (HCI) that would capture and understand human actions. In this project, we will explore how Neural Network concepts can be applied in this challenging field of Computer Vision. By leveraging the latest research for Gesture Recognition, we researched how to capture the movement across different frames …


License Plate Image Quality Enhancement Utilizing Super Resolution Generative Adversarial Networks, Mark Moelter Jan 2022

Electronic Theses and Dissertations

This thesis focuses primarily on enhancing the image quality of blurred license plates through the use of Super-Resolution Generative Adversarial Networks (SRGANs) [1]. We propose a synthetic dataset, together with an SRGAN model, to promote blurred image quality enhancement and to allow for model evaluation on a multitude of image input and output size combinations. SRGAN is mainly used for low-resolution image enhancement, but by heavily blurring the input images, the model is tested on its ability to blindly deblur and upsample images to the desired super-resolution (SR) size. The model enhances the image quality to nearly that of the reference images. The …