Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 30 of 290
Full-Text Articles in Entire DC Network
Predictive Filtering-Based Image Inpainting, Xiaoguang Li
Theses and Dissertations
Image inpainting is an important challenge in the computer vision field. The primary goal of image inpainting is to fill in the missing parts of an image. This technique has many real-life uses, including fixing old photographs and restoring ancient artworks, e.g., the degraded Dunhuang frescoes. Moreover, image inpainting is also helpful in image editing. It can eliminate unwanted objects from images while maintaining a natural and realistic appearance, e.g., removing watermarks and subtitles. Although image inpainting expects the restored result to be identical to the original clean image, existing deep generative inpainting methods …
Challenges And Practices Of Deep Learning Model Reengineering: A Case Study On Computer Vision, Wenxin Jiang, Vishnu Banna, Naveen Vivek, Abhinav Goel, Nicholas Synovic, George K. Thiruvathukal, James C. Davis
Computer Science: Faculty Publications and Other Works
Many engineering organizations are reimplementing and extending deep neural networks from the research community. We describe this process as deep learning model reengineering. Deep learning model reengineering — reusing, replicating, adapting, and enhancing state-of-the-art deep learning approaches — is challenging for reasons including under-documented reference models, changing requirements, and the cost of implementation and testing.
Scene Text Detection And Recognition Via Discriminative Representation, Liang Zhao
Theses and Dissertations
Scene texts refer to arbitrary text presented in an image captured by a camera in the real world. The tasks of scene text detection and recognition from complex images play a crucial role in computer vision, with potential applications in scene understanding, information retrieval, robotics, autonomous driving, etc. Despite the notable progress made by existing deep-learning methods, achieving accurate text detection and recognition remains challenging for robust real-world applications. The challenges in scene text detection and recognition stem from: 1) diverse text shapes, fonts, colors, styles, layouts, etc.; 2) countless combinations of characters with unfixed attributes for complete detection, coupled …
Development Of Feature Extraction Models To Improve Image Analysis Applications In Cancer, Yu Shi
Dissertations and Doctoral Documents from University of Nebraska-Lincoln, 2023–
Cancer poses a significant global health challenge. With an estimated 20 million new cases diagnosed worldwide in 2022 and 9.7 million fatalities attributable to the disease, the economic burden of cancer is immense. It impacts healthcare systems and imposes substantial costs for its care on patients and their families. Despite advancements in early detection, prevention, and treatment that have reduced overall cancer mortality rates, the growing prevalence of cancer, particularly among younger individuals, remains a pressing issue.
Medical imaging technology has progressed significantly with the help of emerging computer vision and artificial intelligence (AI) technology. Despite these …
Embodied Visions: Interactive Installations That Reimagine Bodily Presence In Digital Imaging Apparatuses As Shadows, Yunzi Shi
Dartmouth College Master’s Theses
Contextualized within a history of technological development, the evolution of imaging devices and technologies is accompanied by the abstraction of spatial relationships between the body of the observer, the apparatus, and physical reality, which leads to disembodying experiences for the observing subject. Compared with devices and interactive experiences, critical reflection on the epistemological impact of digital imaging devices has less priority in computational imaging and human-computer interaction research. Taking an artistic approach, this thesis describes Embodied Visions, an exhibition featuring three interactive installations exploring the technical infrastructure for imaging and reflecting on the (dis)embodied experiences in the digital age. …
Star-Based Reachability Analysis Of Binary Neural Networks On Continuous Input, Mykhailo Ivashchenko
Department of Computer Science and Engineering: Dissertations, Theses, and Student Research
Deep Neural Networks (DNNs) have become a popular instrument for solving various real-world problems. DNNs’ sophisticated structure allows them to learn complex representations and features. However, architecture specifics and floating-point arithmetic increase computational complexity. For this reason, a more lightweight type of neural network, the Binary Neural Network (BNN), is widely used on edge devices such as microcomputers and microcontrollers. Like other DNNs, BNNs are vulnerable to adversarial attacks; even a small perturbation to the input set may lead to an errant output. Unfortunately, only a few approaches have been proposed for verifying …
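The star-set representation the thesis develops is beyond a short sketch, but the core reachability idea can be illustrated with plain interval arithmetic pushed through one hypothetical binarized linear layer with a sign activation (a much coarser abstraction than star sets; the layer, weights, and bounds below are illustrative assumptions, not the thesis's method):

```python
import numpy as np

def interval_binary_layer(lo, hi, W):
    """Propagate an input interval [lo, hi] through a linear layer whose
    weights are binarized to {-1, +1}, followed by a sign activation.
    Returns elementwise output bounds; a toy stand-in for reachability
    analysis (star-based methods are far more precise)."""
    Wb = np.sign(W)                     # binarize the weights
    Wb[Wb == 0] = 1.0
    pos, neg = np.maximum(Wb, 0), np.minimum(Wb, 0)
    z_lo = pos @ lo + neg @ hi          # lower bound of pre-activation
    z_hi = pos @ hi + neg @ lo          # upper bound of pre-activation
    # sign output is determined only when the interval excludes zero
    out_lo = np.where(z_lo > 0, 1.0, -1.0)
    out_hi = np.where(z_hi >= 0, 1.0, -1.0)
    return out_lo, out_hi

W = np.array([[0.5, -1.2], [2.0, 0.3]])
lo, hi = np.array([0.9, 1.0]), np.array([1.1, 1.2])
out_lo, out_hi = interval_binary_layer(lo, hi, W)
```

Here the first neuron's pre-activation interval straddles zero, so its sign output is undetermined over the input set: exactly the kind of perturbation-induced flip that verification must account for.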
3-D Reconstruction For Underwater Robots With A Monocular Camera And Lights, Monika Roznere
Dartmouth College Ph.D. Dissertations
Before a robot can act, it must perceive its environment. However, this is not a simple task given the challenges of underwater domains: poor visibility conditions, limited sensor configurations, and a lack of readily accessible localization. Underwater robots have, nevertheless, improved dramatically with more extensive sensor and navigation equipment. Robot and sensor use have enabled us to explore all reaches of our oceans. On the other hand, these same robots are not easily accessible or transferable to many practical tasks, including fishery management, infrastructure maintenance, disaster response, site conservation, and ecological surveys. There is a growing need for robots …
Sign-A-Mander: A Mobile App That Enhances ASL Learning With Computer Vision, Sandrine Adap
Honors Theses
Several machine learning researchers have developed algorithms recognizing American Sign Language (ASL), but few have applied the algorithms to real-world situations, such as with portable ASL learning applications. This project develops a beta version of a mobile application designed to allow beginner ASL learners to practice basic ASL vocabulary and receive feedback about their signing accuracy. Building on Dongxu Li et al.’s I3D sign language recognition algorithm and 2000-word dataset, the app seeks to determine whether the I3D algorithm can sufficiently recognize a user’s motions when recorded from a mobile device and accurately classify whether or not the user signed …
ReScape: Transforming Coral-Reefscape Images For Quantitative Analysis, Zachary Ferris, Eraldo Ribeiro, Tomofumi Nagata, Robert Van Woesik
Ocean Engineering and Marine Sciences Faculty Publications
Ever since the first image of a coral reef was captured in 1885, people worldwide have been accumulating images of coral reefscapes that document the historic conditions of reefs. However, these innumerable reefscape images suffer from perspective distortion, which reduces the apparent size of distant taxa, rendering the images unusable for quantitative analysis of reef conditions. Here we solve this century-long distortion problem by developing a novel computer-vision algorithm, ReScape, which removes the perspective distortion from reefscape images by transforming them into top-down views, making them usable for quantitative analysis of reef conditions. In doing so, we demonstrate the …
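Transforming a perspective view into a top-down view is classically done with a planar homography. The sketch below is a hypothetical stand-in for that single step (ReScape's actual pipeline is richer and handles full reefscapes): it estimates a homography from four hand-picked point correspondences via the direct linear transform and applies it to a point.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping src -> dst (4+ point pairs)
    using the direct linear transform (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)        # null vector of A, up to scale
    return H / H[2, 2]

def warp_point(H, p):
    """Apply a homography to a 2D point (homogeneous divide)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# corners of a trapezoid in a perspective image, mapped to a square
# (illustrative coordinates, not from the paper)
src = [(100, 300), (400, 300), (320, 120), (180, 120)]
dst = [(0, 0), (300, 0), (300, 300), (0, 300)]
H = homography_from_points(src, dst)
```

In practice one would warp the whole image (e.g., `cv2.warpPerspective`) rather than single points; four exact correspondences determine the homography completely.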
Context-Aware Affective Behavior Modeling And Analytics, Md Taufeeq Uddin
USF Tampa Graduate Theses and Dissertations
Affective computing (AC) is a sub-domain of AI that has the potential to assist people by assessing mental states and making appropriate recommendations to patients, loved ones, caregivers, and domain experts. Humans usually produce an enormous amount of data (such as face videos) every day. One of the major challenges for affective computer vision is to efficiently deal with high volumes of data to facilitate automated model development. To cope with this challenge, we developed computer vision algorithms that measure the expressivity of the human face from video data. More precisely, the developed algorithms can map complex affect information from …
A Computer Vision Solution To Cross-Cultural Food Image Classification And Nutrition Logging, Rohan Sethi, George K. Thiruvathukal
Computer Science: Faculty Publications and Other Works
The US is a culturally and ethnically diverse country, and with this diversity comes a myriad of cuisines and eating habits that extend well beyond those of western culture. Each of these meals has its own good and bad effects when it comes to nutritional value and its potential impact on human health. Thus, there is a greater need for people to be able to access the nutritional profile of their diverse daily meals and better manage their health. A revolutionary solution to democratize food image classification and nutritional logging is using deep learning to extract that information from …
Uncovering And Mitigating Spurious Features In Domain Generalization, Saeed Karimi, Hamdi Dibeklioğlu
Turkish Journal of Electrical Engineering and Computer Sciences
Domain generalization (DG) techniques strive to attain the ability to generalize to an unfamiliar target domain solely based on training data originating from the source domains. Despite the increasing attention given to learning from multiple training domains through the application of various forms of invariance across those domains, the enhancements observed in comparison to empirical risk minimization (ERM) are nearly insignificant under specified evaluation rules. In this paper, we demonstrate that the disentanglement of spurious and invariant features is a challenging task in conventional training since ERM simply minimizes the loss and does not exploit invariance among domains. To address this issue, we …
Relative Vectoring Using Dual Object Detection For Autonomous Aerial Refueling, Derek B. Worth, Jeffrey L. Choate, James Lynch, Scott L. Nykl, Clark N. Taylor
Faculty Publications
Once realized, autonomous aerial refueling will revolutionize unmanned aviation by removing current range and endurance limitations. Previous attempts at establishing vision-based solutions have come close but rely heavily on near-perfect extrinsic camera calibrations that often change midflight. In this paper, we propose dual object detection, a technique that overcomes this requirement by transforming aerial refueling imagery directly into receiver-aircraft reference-frame probe-to-drogue vectors regardless of camera position and orientation. These vectors are precisely what autonomous agents need to successfully maneuver the tanker and receiver aircraft in synchronous flight during refueling operations. Our method follows a common 4-stage process …
Motion Magnification-Inspired Feature Manipulation For Deepfake Detection, Aydamir Mirzayev, Hamdi Dibeklioğlu
Turkish Journal of Electrical Engineering and Computer Sciences
Recent advances in deep learning, the increased availability of large-scale datasets, and improvements in accelerated graphics processing units have facilitated the creation of an unprecedented amount of synthetically generated media content with impressive visual quality. Although such technology is used predominantly for entertainment, there is a widespread practice of using deepfake technology for malevolent ends. This potential for malicious use necessitates the creation of detection methods capable of reliably distinguishing manipulated video content. In this work we aim to create a learning-based detection method for synthetically generated videos. To this end, we attempt to detect spatiotemporal inconsistencies by leveraging a learning-based magnification-inspired feature manipulation …
Automatic Classification Of Activities In Classroom Videos, Jonathan K. Foster, Matthew Korban, Peter Youngs, Ginger S. Watson, Scott T. Acton
VMASC Publications
Classroom videos are a common source of data for educational researchers studying classroom interactions, as well as a resource for teacher education and professional development. Over the last several decades, emerging technologies have been applied to classroom videos to record, transcribe, and analyze classroom interactions. With the rise of machine learning, we report on the development and validation of neural networks to classify instructional activities using video signals, without analyzing speech or audio features, from a large corpus of nearly 250 hours of classroom videos from elementary mathematics and English language arts instruction. Results indicated that the neural networks performed …
A Survey On Few-Shot Class-Incremental Learning, Songsong Tian, Lusi Li, Weijun Li, Hang Ran, Xin Ning, Prayag Tiwari
Computer Science Faculty Publications
Large deep learning models are impressive, but they struggle when real-time data is not available. Few-shot class-incremental learning (FSCIL) poses a significant challenge for deep neural networks to learn new tasks from just a few labeled samples without forgetting the previously learned ones. This setup can easily lead to catastrophic forgetting and overfitting problems, severely affecting model performance. Studying FSCIL helps overcome deep learning model limitations on data volume and acquisition time, while improving the practicality and adaptability of machine learning models. This paper provides a comprehensive survey on FSCIL. Unlike previous surveys, we aim to synthesize few-shot learning and incremental …
Deep Learning Image Analysis To Isolate And Characterize Different Stages Of S-Phase In Human Cells, Kevin A. Boyd, Rudranil Mitra, John Santerre, Christopher L. Sansam
SMU Data Science Review
This research used deep learning for image analysis by isolating and characterizing distinct DNA replication patterns in human cells. By leveraging high-resolution microscopy images of multiple cells stained with 5-Ethynyl-2′-deoxyuridine (EdU), a replication marker, this analysis utilized Convolutional Neural Networks (CNNs) to perform image segmentation and to provide robust and reliable classification results. First, multiple cells in a field of focus were identified using a pretrained CNN called Cellpose. After identifying the location of each cell in the image, a Python script was created to crop out each cell into individual .tif files. After careful annotation, a CNN was …
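The cell-cropping step described above can be sketched as follows. Cellpose's `model.eval()` returns an integer label mask (0 = background, each cell a distinct positive id); the helper below, a hypothetical reconstruction rather than the authors' script, crops each labeled cell's bounding box (in the real pipeline each crop would then be written to its own .tif, e.g. with `tifffile.imwrite`):

```python
import numpy as np

def crop_cells(image, labels):
    """Crop each labeled cell out of `image` given an integer label mask
    of the kind Cellpose produces (0 = background). Returns a dict
    mapping cell id -> cropped array."""
    crops = {}
    for cell_id in np.unique(labels):
        if cell_id == 0:
            continue                     # skip background
        rows, cols = np.where(labels == cell_id)
        r0, r1 = rows.min(), rows.max() + 1
        c0, c1 = cols.min(), cols.max() + 1
        crops[int(cell_id)] = image[r0:r1, c0:c1]
    return crops

# tiny synthetic example: two "cells" in a 6x6 field
labels = np.zeros((6, 6), int)
labels[1:3, 1:4] = 1
labels[4:6, 2:5] = 2
image = np.arange(36).reshape(6, 6)
crops = crop_cells(image, labels)
```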
Enhancing Indoors Robotic Traversability Estimation With Sensor Fusion, Christos Sevastopoulos
Computer Science and Engineering Dissertations
Generally speaking, traversability estimation describes the ability to navigate or move through a particular environment (indoors or outdoors). Indoor environments are governed by uncertainty and stochasticity arising from their complex structures, encapsulating both static elements like furniture and walls and entities such as moving humans. In our research, we underline the importance of blending semantic and spatial information to ensure secure navigation for a mobile robot. We show that RGB sensors suffer from constrained situational awareness of the surroundings, thus highlighting the need to incorporate spatial and geometric data, which can collaborate synergistically to enhance overall perception and …
Implementation Of Adas And Autonomy On Unlv Campus, Zillur Rahman
UNLV Theses, Dissertations, Professional Papers, and Capstones
The integration of Advanced Driving Assistance Systems (ADAS) and autonomous driving functionalities into contemporary vehicles has notably surged, driven by the remarkable progress in artificial intelligence (AI). These AI systems, capable of learning from real-world data, now exhibit the capability to perceive their surroundings via a suite of sensors, create optimal routes from source to destination, and execute vehicle control akin to a human driver.
Within the context of this thesis, we undertake a comprehensive exploration of three distinct yet interrelated ADAS and Autonomy projects. Our central objective is the implementation of autonomous driving (AD) technology on the UNLV campus, culminating in …
Enhanced Privacy-Enabled Face Recognition Using κ-Identity Optimization, Ryan Karl
Department of Electrical and Computer Engineering: Dissertations, Theses, and Student Research
Facial recognition is becoming more and more prevalent in the daily lives of the common person. Law enforcement utilizes facial recognition to find and track suspects. The newest smartphones have the ability to unlock using the user's face. Some door locks utilize facial recognition to allow correct users to enter restricted spaces. The list of applications that use facial recognition will only increase as hardware becomes more cost-effective and more computationally powerful. As this technology becomes more prevalent in our lives, it is important to understand and protect the data provided to these companies. Any data transmitted should be encrypted …
Smart Street Light Control: A Review On Methods, Innovations, And Extended Applications, Fouad Agramelal, Mohamed Sadik, Youssef Moubarak, Saad Abouzahir
Computer Vision Faculty Publications
As urbanization increases, streetlights have become significant consumers of electrical power, making it imperative to develop effective control methods for sustainability. This paper offers a comprehensive review on control methods of smart streetlight systems, setting itself apart by introducing a novel light scheme framework that provides a structured classification of various light control patterns, thus filling an existing gap in the literature. Unlike previous studies, this work dives into the technical specifics of individual research papers and methodologies, ranging from basic to advanced control methods like computer vision and deep learning, while also assessing the energy consumption associated with each …
Object Recognition With Deep Neural Networks In Low-End Systems, Lillian Davis
Mahurin Honors College Capstone Experience/Thesis Projects
Object recognition is an important area in computer vision. Object recognition has been advanced significantly by deep learning, which unifies feature extraction and classification. In general, deep neural networks, such as Convolutional Neural Networks (CNNs), are trained in high-performance systems. Aiming to extend the reach of deep learning to personal computing, I propose a study of deep learning-based object recognition in low-end systems, such as laptops. This research includes how differing layer configurations and hyperparameter values used in CNNs can either create or resolve the issue of overfitting and affect final accuracy levels of object recognition systems. The main contribution …
STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition, Ming Li, Xiangyu Xu, Hehe Fan, Pan Zhou, Jun Liu, Jia-Wei Liu, Jiahe Li, Jussi Keppo, Mike Zheng Shou, Shuicheng Yan
Research Collection School Of Computing and Information Systems
Existing methods of privacy-preserving action recognition (PPAR) mainly focus on frame-level (spatial) privacy removal through 2D CNNs. Unfortunately, they have two major drawbacks. First, they may compromise temporal dynamics in input videos, which are critical for accurate action recognition. Second, they are vulnerable to practical attacking scenarios where attackers probe for privacy from an entire video rather than individual frames. To address these issues, we propose a novel framework STPrivacy to perform video-level PPAR. For the first time, we introduce vision Transformers into PPAR by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification …
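Treating a video as a tubelet sequence, as the abstract describes, amounts to partitioning the spatio-temporal volume into small non-overlapping blocks that a Transformer can tokenize. A generic numpy sketch of that partitioning (an assumption about the mechanism, not the authors' code):

```python
import numpy as np

def to_tubelets(video, t, h, w):
    """Reshape a video of shape (T, H, W, C) into a sequence of
    non-overlapping tubelets of shape (t, h, w, C); each tubelet
    dimension must divide the corresponding video dimension."""
    T, H, W, C = video.shape
    assert T % t == 0 and H % h == 0 and W % w == 0
    v = video.reshape(T // t, t, H // h, h, W // w, w, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)      # group the tubelet axes
    return v.reshape(-1, t, h, w, C)          # (num_tubelets, t, h, w, C)

video = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)
tubelets = to_tubelets(video, t=2, h=2, w=2)  # 1 * 2 * 2 = 4 tubelets
```

Each tubelet would then be linearly embedded into a token, so privacy removal can act on video-level rather than frame-level representations.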
PyMAiVAR: An Open-Source Python Suite For Audio-Image Representation In Human Action Recognition, Muhammad B. Shaikh, Douglas Chai, Syed M. S. Islam, Naveed Akhtar
Research outputs 2022 to 2026
We present PyMAiVAR, a versatile toolbox that encompasses the generation of image representations for audio data including Wave plots, Spectral Centroids, Spectral Roll Offs, Mel Frequency Cepstral Coefficients (MFCC), MFCC Feature Scaling, and Chromagrams. This wide-ranging toolkit generates rich audio-image representations, playing a pivotal role in reshaping human action recognition. By fully exploiting audio data's latent potential, PyMAiVAR stands as a significant advancement in the field. The package is implemented in Python and can be used across different operating systems.
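The common denominator of the representations listed above is turning a 2D audio feature matrix (MFCCs, chromagrams, and so on) into an image. A generic illustration of that final step, not PyMAiVAR's actual API, is a min-max normalization to an 8-bit grayscale array:

```python
import numpy as np

def feature_to_image(feat):
    """Min-max normalize a 2D audio feature matrix (e.g. MFCCs as
    produced by librosa.feature.mfcc) into an 8-bit grayscale image
    array suitable for an image-based action-recognition model."""
    f = feat.astype(float)
    f = (f - f.min()) / (f.max() - f.min() + 1e-12)   # scale to [0, 1]
    return (f * 255).round().astype(np.uint8)

# stand-in for a 13-coefficient MFCC matrix over 100 frames
mfcc_like = np.random.default_rng(0).normal(size=(13, 100))
img = feature_to_image(mfcc_like)
```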
IFSeg: Image-Free Semantic Segmentation Via Vision-Language Model, Sukmin Yun, Seong Hyeon Park, Paul Hongsuck Seo, Jinwoo Shin
Machine Learning Faculty Publications
Vision-language (VL) pre-training has recently gained much attention for its transferability and flexibility in novel concepts (e.g., cross-modality transfer) across various visual tasks. However, VL-driven segmentation has been under-explored, and the existing approaches still have the burden of acquiring additional training images or even segmentation annotations to adapt a VL model to downstream segmentation tasks. In this paper, we introduce a novel image-free segmentation task where the goal is to perform semantic segmentation given only a set of the target semantic categories, but without any task-specific images and annotations. To tackle this challenging task, our proposed method, coined IFSeg, generates …
Multiclass Confidence And Localization Calibration For Object Detection, Bimsara Pathiraja, Malitha Gunawardhana, Muhammad Haris Khan
Computer Vision Faculty Publications
Albeit achieving high predictive accuracy across many challenging computer vision problems, recent studies suggest that deep neural networks (DNNs) tend to make over-confident predictions, rendering them poorly calibrated. Most of the existing attempts to improve DNN calibration are limited to classification tasks and restricted to calibrating in-domain predictions. Surprisingly, very few attempts have been made to study the calibration of object detection methods, which occupy a pivotal space in vision-based, security-sensitive, and safety-critical applications. In this paper, we propose a new train-time technique for calibrating modern object detection methods. It is capable of jointly calibrating multiclass confidence and …
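The paper's detector calibration is not reproducible from the abstract, but the classification-only baseline it departs from, classic post-hoc temperature scaling, is easy to sketch: pick the single scalar T that minimizes negative log-likelihood on held-out logits (a simple grid search stands in for the usual optimizer here):

```python
import numpy as np

def softmax(z, T=1.0):
    e = np.exp(z / T - (z / T).max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Classic temperature scaling for classification: choose the scalar
    T minimizing NLL on held-out data. The paper above goes further,
    jointly calibrating class confidence and box localization for
    detectors, which this sketch does not cover."""
    def nll(T):
        p = softmax(logits, T)[np.arange(len(labels)), labels]
        return -np.log(p + 1e-12).mean()
    return min(grid, key=nll)

# simulate an overconfident model: labels are drawn from softmax(base),
# but the model reports logits scaled 3x too sharp
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 3))
labels = np.array([rng.choice(3, p=p) for p in softmax(base)])
T = fit_temperature(base * 3.0, labels)   # fitted T should sit near 3
```

Because a single T > 1 softens every prediction uniformly, temperature scaling fixes over-confidence without changing the argmax class, which is why it cannot by itself address localization calibration in detection.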
3D-Aware Multi-Class Image-To-Image Translation With NeRFs, Senmao Li, Joost Van De Weijer, Yaxing Wang, Fahad Shahbaz Khan, Meiqin Liu, Jian Yang
Computer Vision Faculty Publications
Recent advances in 3D-aware generative models (3D-aware GANs) combined with Neural Radiance Fields (NeRF) have achieved impressive results. However, no prior work investigates 3D-aware GANs for 3D-consistent multiclass image-to-image (3D-aware I2I) translation. Naively using 2D I2I translation methods suffers from unrealistic shape/identity change. To perform 3D-aware multiclass I2I translation, we decouple this learning process into a multiclass 3D-aware GAN step and a 3D-aware I2I translation step. In the first step, we propose two novel techniques: a new conditional architecture and an effective training strategy. In the second step, based on the well-trained multiclass 3D-aware GAN architecture that preserves view-consistency, we …
Discriminative Co-Saliency And Background Mining Transformer For Co-Salient Object Detection, Long Li, Junwei Han, Ni Zhang, Nian Liu, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan
Computer Vision Faculty Publications
Most previous co-salient object detection works mainly focus on extracting co-salient cues by mining the consistency relations across images while ignoring explicit exploration of background regions. In this paper, we propose a Discriminative co-saliency and background Mining Transformer framework (DMT) based on several economical multi-grained correlation modules to explicitly mine both co-saliency and background information and effectively model their discrimination. Specifically, we first propose a region-to-region correlation module for introducing inter-image relations to pixel-wise segmentation features while maintaining computational efficiency. Then, we use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation …
Autonomous Shipwreck Detection & Mapping, William Ard
LSU Master's Theses
This thesis presents the development and testing of Bruce, a low-cost hybrid Remotely Operated Vehicle (ROV) / Autonomous Underwater Vehicle (AUV) system for the optical survey of marine archaeological sites, as well as a novel sonar image augmentation strategy for semantic segmentation of shipwrecks. This approach takes side-scan sonar and bathymetry data collected using an EdgeTech 2205 AUV sensor integrated with a Harris Iver3, and generates augmented image data to be used for the semantic segmentation of shipwrecks. It is shown that, due to the feature enhancement capabilities of the proposed shipwreck detection strategy, correctly identified areas have a 15% …
SmartBrush: Text And Shape Guided Object Inpainting With Diffusion Model, Shaoan Xie, Zhifei Zhang, Zhe Lin, Tobias Hinz, Kun Zhang
Machine Learning Faculty Publications
Generic image inpainting aims to complete a corrupted image by borrowing surrounding information, which barely generates novel content. By contrast, multi-modal inpainting provides more flexible and useful controls on the inpainted content, e.g., a text prompt can be used to describe an object with richer attributes, and a mask can be used to constrain the shape of the inpainted object rather than being only considered as a missing area. We propose a new diffusion-based model named SmartBrush for completing a missing region with an object using both text and shape-guidance. While previous work such as DALLE-2 and Stable Diffusion can …