Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

Computer vision

Articles 1 - 30 of 290

Full-Text Articles in Entire DC Network

Predictive Filtering-Based Image Inpainting, Xiaoguang Li Aug 2024

Theses and Dissertations

Image inpainting is an important challenge in the computer vision field. The primary goal of image inpainting is to fill in the missing parts of an image. This technique has many real-life uses including fixing old photographs and restoring ancient artworks, e.g., the degraded Dunhuang frescoes. Moreover, image inpainting is also helpful in image editing. It has the capability to eliminate unwanted objects from images while maintaining a natural and realistic appearance, e.g., removing watermarks and subtitles. Disregarding the fact that image inpainting expects the restored result to be identical to the original clean one, existing deep generative inpainting methods …


Challenges And Practices Of Deep Learning Model Reengineering: A Case Study On Computer Vision, Wenxin Jiang, Vishnu Banna, Naveen Vivek, Abhinav Goel, Nicholas Synovic, George K. Thiruvathukal, James C. Davis Aug 2024

Computer Science: Faculty Publications and Other Works

Many engineering organizations are reimplementing and extending deep neural networks from the research community. We describe this process as deep learning model reengineering. Deep learning model reengineering — reusing, replicating, adapting, and enhancing state-of-the-art deep learning approaches — is challenging for reasons including under-documented reference models, changing requirements, and the cost of implementation and testing.


Scene Text Detection And Recognition Via Discriminative Representation, Liang Zhao Aug 2024

Theses and Dissertations

Scene texts refer to arbitrary text presented in an image captured by a camera in the real world. The tasks of scene text detection and recognition from complex images play a crucial role in computer vision, with potential applications in scene understanding, information retrieval, robotics, autonomous driving, etc. Despite the notable progress made by existing deep-learning methods, achieving accurate text detection and recognition remains challenging for robust real-world applications. The challenges in scene text detection and recognition stem from: 1) diverse text shapes, fonts, colors, styles, layouts, etc.; 2) countless combinations of characters with unfixed attributes for complete detection, coupled …


Development Of Feature Extraction Models To Improve Image Analysis Applications In Cancer, Yu Shi Aug 2024

Dissertations and Doctoral Documents from University of Nebraska-Lincoln, 2023–

Cancer poses a significant global health challenge. With an estimated 20 million new cases diagnosed worldwide in 2022 and 9.7 million fatalities attributable to the disease, the economic burden of cancer is immense. It impacts healthcare systems and imposes substantial costs for its care on patients and their families. Despite advancements in early detection, prevention, and treatment that have reduced overall cancer mortality rates, the growing prevalence of cancer, particularly among younger individuals, remains a pressing issue.

Recent advancements in medical imaging technology have progressed significantly with the help of emerging computer vision and artificial intelligence (AI) technology. Despite these …


Embodied Visions: Interactive Installations That Reimagine Bodily Presence In Digital Imaging Apparatuses As Shadows, Yunzi Shi May 2024

Dartmouth College Master’s Theses

Contextualized within a history of technological development, the evolution of imaging devices and technologies is accompanied by the abstraction of spatial relationships between the body of the observer, the apparatus, and physical reality, which leads to disembodying experiences for the observing subject. In computational imaging and human-computer interaction research, critical reflection on the epistemological impact of digital imaging devices has received less priority than the devices and interactive experiences themselves. Taking an artistic approach, this thesis describes Embodied Visions, an exhibition featuring three interactive installations that explore the technical infrastructure of imaging and reflect on (dis)embodied experiences in the digital age. …


Star-Based Reachability Analysis Of Binary Neural Networks On Continuous Input, Mykhailo Ivashchenko May 2024

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Deep Neural Networks (DNNs) have become a popular instrument for solving various real-world problems. Their sophisticated structure allows them to learn complex representations and features. However, their architectural specifics and use of floating-point numbers increase computational complexity. For this reason, a more lightweight type of neural network, the Binary Neural Network (BNN), is widely used on edge devices such as microcomputers and microcontrollers. Like other DNNs, BNNs are vulnerable to adversarial attacks; even a small perturbation to the input set may lead to an erroneous output. Unfortunately, only a few approaches have been proposed for verifying …
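
A minimal sketch of why BNNs are lightweight, assuming the common encoding of +1 as bit 1 and -1 as bit 0 (the function names are hypothetical, and this is not the thesis's star-based verification method): once weights and activations are binarized with the sign function, a dot product reduces to an XNOR followed by a popcount, because for n-element vectors the result equals 2 * popcount(XNOR(a, w)) - n.

```python
def binarize(values):
    """Map real values to +1/-1 with the sign function (0 maps to +1 here)."""
    return [1 if v >= 0 else -1 for v in values]


def pack(signs):
    """Pack a list of +1/-1 values into an int: +1 -> bit 1, -1 -> bit 0."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word, w_word, n):
    """Dot product of two packed n-element +1/-1 vectors via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_word ^ w_word) & mask  # bit is 1 where the two signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # each match adds +1, each mismatch adds -1


x = binarize([0.3, -1.2, 0.8, -0.1])   # [1, -1, 1, -1]
w = binarize([-0.5, -0.7, 0.2, 0.9])   # [-1, -1, 1, 1]
print(xnor_popcount_dot(pack(x), pack(w), len(x)))  # prints 0, same as sum(a * b for a, b in zip(x, w))
```

Because the sign function jumps at zero, nudging the last input from -0.1 to +0.1 flips a bit in the packed vector, which gives one intuition for the sensitivity to small perturbations mentioned in the abstract.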


3-D Reconstruction For Underwater Robots With A Monocular Camera And Lights, Monika Roznere May 2024

Dartmouth College Ph.D Dissertations

Before a robot can act, it must perceive its environment. This is not a simple task, however, given the challenges of underwater domains: poor visibility conditions, limited sensor configurations, and a lack of readily accessible localization. Underwater robots have, nevertheless, improved dramatically with more extensive sensor and navigation equipment. Robots and sensors have enabled us to explore all reaches of our oceans. On the other hand, these same robots are not easily accessible or transferable to many practical tasks, including fishery management, infrastructure maintenance, disaster response, site conservation, and ecological surveys. There is a growing need for robots …


Sign-A-Mander: A Mobile App That Enhances ASL Learning With Computer Vision, Sandrine Adap Apr 2024

Honors Theses

Several machine learning researchers have developed algorithms recognizing American Sign Language (ASL), but few have applied the algorithms to real-world situations, such as with portable ASL learning applications. This project develops a beta version of a mobile application designed to allow beginner ASL learners to practice basic ASL vocabulary and receive feedback about their signing accuracy. Building on Dongxu Li et al.’s I3D sign language recognition algorithm and 2000-word dataset, the app seeks to determine whether the I3D algorithm can sufficiently recognize a user’s motions when recorded from a mobile device and accurately classify whether or not the user signed …


ReScape: Transforming Coral-Reefscape Images For Quantitative Analysis, Zachary Ferris, Eraldo Ribeiro, Tomofumi Nagata, Robert Van Woesik Apr 2024

Ocean Engineering and Marine Sciences Faculty Publications

Ever since the first image of a coral reef was captured in 1885, people worldwide have been accumulating images of coral reefscapes that document the historic conditions of reefs. However, these innumerable reefscape images suffer from perspective distortion, which reduces the apparent size of distant taxa, rendering the images unusable for quantitative analysis of reef conditions. Here we solve this century-long distortion problem by developing a novel computer-vision algorithm, ReScape, which removes the perspective distortion from reefscape images by transforming them into top-down views, making them usable for quantitative analysis of reef conditions. In doing so, we demonstrate the …
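
The abstract does not describe ReScape's algorithm. As a generic, hedged illustration of turning an oblique view into a top-down view, the sketch below estimates a planar homography from four point correspondences with the direct linear transform (fixing H[2][2] = 1) and uses it to map pixels onto a ground plane. It assumes the reef surface is locally planar; the function names, corner coordinates, and the 1 m quadrat are hypothetical.

```python
def solve(A, b):
    """Solve the square system A h = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (M[r][n] - sum(M[r][c] * h[c] for c in range(r + 1, n))) / M[r][r]
    return h


def homography_from_points(src, dst):
    """Estimate the 3x3 homography (H[2][2] = 1) mapping 4 src points onto 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, x, y):
    """Map image point (x, y) through H using homogeneous coordinates."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


# Hypothetical pixel corners of a 1 m x 1 m quadrat seen obliquely (the far edge looks shorter).
image_corners = [(100, 400), (500, 400), (420, 200), (180, 200)]
top_down_corners = [(0, 0), (1, 0), (1, 1), (0, 1)]  # metres on the reef plane
H = homography_from_points(image_corners, top_down_corners)
print(apply_homography(H, 300, 300))  # an interior pixel mapped into top-down metres
```

A full pipeline would warp every pixel rather than individual points, and a strongly non-planar reef breaks the single-plane assumption this sketch relies on.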


Context-Aware Affective Behavior Modeling And Analytics, Md Taufeeq Uddin Apr 2024

USF Tampa Graduate Theses and Dissertations

Affective computing (AC) is a sub-domain of AI that has the potential to assist people by assessing mental states and making appropriate recommendations to patients, loved ones, caregivers, and domain experts. Humans usually produce an enormous amount of data (such as face videos) every day. One of the major challenges for affective computer vision is to efficiently deal with high volumes of data to facilitate automated model development. To cope with this challenge, we developed computer vision algorithms that measure the expressivity of the human face from video data. More precisely, the developed algorithms can map complex affect information from …


A Computer Vision Solution To Cross-Cultural Food Image Classification And Nutrition Logging, Rohan Sethi, George K. Thiruvathukal Apr 2024

Computer Science: Faculty Publications and Other Works

The US is a culturally and ethnically diverse country, and with this diversity comes a myriad of cuisines and eating habits that extend well beyond those of Western culture. Each of these meals has its own good and bad effects when it comes to nutritional value and potential impact on human health. Thus, there is a greater need for people to be able to access the nutritional profile of their diverse daily meals and better manage their health. A revolutionary solution to democratize food image classification and nutritional logging is using deep learning to extract that information from …


Uncovering And Mitigating Spurious Features In Domain Generalization, Saeed Karimi, Hamdi Dibeklioğlu Mar 2024

Turkish Journal of Electrical Engineering and Computer Sciences

Domain generalization (DG) techniques strive to attain the ability to generalize to an unfamiliar target domain solely based on training data originating from the source domains. Despite the increasing attention given to learning from multiple training domains through the application of various forms of invariance across those domains, the improvements observed over empirical risk minimization (ERM) are nearly insignificant under specified evaluation rules. In this paper, we demonstrate that disentangling spurious and invariant features is a challenging task in conventional training, since ERM simply minimizes the loss and does not exploit invariance among domains. To address this issue, we …
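
For readers unfamiliar with ERM, a minimal sketch of the objective named in the abstract, assuming a generic per-sample loss: ERM pools the samples from every source domain and minimizes their mean loss, discarding domain labels. The per-domain risks, shown alongside for contrast, are the kind of signal that invariance-based DG methods use (for example, by penalizing differences between domain risks). The domains, model, and function names are hypothetical toy values.

```python
def empirical_risk(predict, domains, loss):
    """ERM objective: mean loss over samples pooled from all source domains.

    Domain labels are discarded, so nothing in this objective rewards features
    whose relationship to the label is invariant across domains.
    """
    pooled = [pair for samples in domains.values() for pair in samples]
    return sum(loss(predict(x), y) for x, y in pooled) / len(pooled)


def per_domain_risks(predict, domains, loss):
    """Mean loss computed separately for each source domain."""
    return {
        name: sum(loss(predict(x), y) for x, y in samples) / len(samples)
        for name, samples in domains.items()
    }


def squared_loss(prediction, target):
    return (prediction - target) ** 2


def model(x):
    return 2.0 * x


# Hypothetical (feature, label) samples from two source domains.
domains = {"photo": [(1.0, 2.0), (2.0, 4.0)], "sketch": [(1.0, 3.0)]}
print(empirical_risk(model, domains, squared_loss))    # pooled mean loss, 1/3
print(per_domain_risks(model, domains, squared_loss))  # {'photo': 0.0, 'sketch': 1.0}
```

The pooled value hides the fact that the model fits one domain perfectly and the other poorly, which is the information ERM does not exploit.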


Relative Vectoring Using Dual Object Detection For Autonomous Aerial Refueling, Derek B. Worth, Jeffrey L. Choate, James Lynch, Scott L. Nykl, Clark N. Taylor Mar 2024

Faculty Publications

Once realized, autonomous aerial refueling will revolutionize unmanned aviation by removing current range and endurance limitations. Previous attempts at establishing vision-based solutions have come close but rely heavily on near-perfect extrinsic camera calibrations that often change midflight. In this paper, we propose dual object detection, a technique that overcomes this requirement by transforming aerial refueling imagery directly into probe-to-drogue vectors in the receiver aircraft's reference frame, regardless of camera position and orientation. These vectors are precisely what autonomous agents need to successfully maneuver the tanker and receiver aircraft in synchronous flight during refueling operations. Our method follows a common 4-stage process …


Motion Magnification-Inspired Feature Manipulation For Deepfake Detection, Aydamir Mirzayev, Hamdi Dibeklioğlu Feb 2024

Turkish Journal of Electrical Engineering and Computer Sciences

Recent advances in deep learning, the increased availability of large-scale datasets, and improvements in accelerated graphics processing units have facilitated the creation of an unprecedented amount of synthetically generated media content with impressive visual quality. Although such technology is used predominantly for entertainment, deepfake technology is also widely used for malevolent ends. This potential for malicious use necessitates detection methods capable of reliably distinguishing manipulated video content. In this work, we aim to create a learning-based detection method for synthetically generated videos. To this end, we attempt to detect spatiotemporal inconsistencies by leveraging a learning-based magnification-inspired feature manipulation …


Automatic Classification Of Activities In Classroom Videos, Jonathan K. Foster, Matthew Korban, Peter Youngs, Ginger S. Watson, Scott T. Acton Jan 2024

VMASC Publications

Classroom videos are a common source of data for educational researchers studying classroom interactions as well as a resource for teacher education and professional development. Over the last several decades emerging technologies have been applied to classroom videos to record, transcribe, and analyze classroom interactions. With the rise of machine learning, we report on the development and validation of neural networks to classify instructional activities using video signals, without analyzing speech or audio features, from a large corpus of nearly 250 h of classroom videos from elementary mathematics and English language arts instruction. Results indicated that the neural networks performed …


A Survey On Few-Shot Class-Incremental Learning, Songsong Tian, Lusi Li, Weijun Li, Hang Ran, Xin Ning, Prayag Tiwari Jan 2024

Computer Science Faculty Publications

Large deep learning models are impressive, but they struggle when real-time data is not available. Few-shot class-incremental learning (FSCIL) poses a significant challenge for deep neural networks: learning new tasks from just a few labeled samples without forgetting previously learned ones. This setup can easily lead to catastrophic forgetting and overfitting, severely affecting model performance. Studying FSCIL helps overcome the limitations of deep learning models on data volume and acquisition time, while improving the practicality and adaptability of machine learning models. This paper provides a comprehensive survey of FSCIL. Unlike previous surveys, we aim to synthesize few-shot learning and incremental …


Deep Learning Image Analysis To Isolate And Characterize Different Stages Of S-Phase In Human Cells, Kevin A. Boyd, Rudranil Mitra, John Santerre, Christopher L. Sansam Dec 2023

SMU Data Science Review

This research used deep learning for image analysis by isolating and characterizing distinct DNA replication patterns in human cells. By leveraging high-resolution microscopy images of multiple cells stained with 5-Ethynyl-2′-deoxyuridine (EdU), a replication marker, this analysis used Convolutional Neural Networks (CNNs) to perform image segmentation and to provide robust and reliable classification results. First, multiple cells in a field of focus were identified using a pretrained CNN called Cellpose. After the location of each cell in the image was identified, a Python script was created to crop each cell into an individual .tif file. After careful annotation, a CNN was …
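
The cropping step can be sketched as follows, assuming (as Cellpose does) a segmentation mask of the same size as the image in which 0 marks background and each cell carries its own positive integer label. The function names and the optional padding are hypothetical, not the authors' script, and saving each crop as a .tif (for example with the tifffile package) is left out to keep the sketch standard-library only.

```python
def bounding_boxes(mask):
    """Return {label: (row_min, row_max, col_min, col_max)} for each nonzero label."""
    boxes = {}
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label == 0:
                continue  # background pixel
            if label not in boxes:
                boxes[label] = [r, r, c, c]
            else:
                b = boxes[label]
                b[0], b[1] = min(b[0], r), max(b[1], r)
                b[2], b[3] = min(b[2], c), max(b[3], c)
    return {label: tuple(b) for label, b in boxes.items()}


def crop_cells(image, mask, pad=0):
    """Crop one sub-image per labelled cell, padded and clipped to the image bounds."""
    height, width = len(image), len(image[0])
    crops = {}
    for label, (r0, r1, c0, c1) in bounding_boxes(mask).items():
        r0, c0 = max(r0 - pad, 0), max(c0 - pad, 0)
        r1, c1 = min(r1 + pad, height - 1), min(c1 + pad, width - 1)
        crops[label] = [row[c0:c1 + 1] for row in image[r0:r1 + 1]]
    return crops


# Toy 4 x 5 field of view with two labelled cells.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 2, 0],
    [0, 0, 0, 0, 2],
]
image = [[10 * r + c for c in range(5)] for r in range(4)]
crops = crop_cells(image, mask)
print(crops[2])  # [[23, 24], [33, 34]]
```

Cropping to the bounding box keeps neighbouring pixels of other cells; zeroing pixels whose label differs from the target cell is a common refinement that this sketch omits.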


Enhancing Indoors Robotic Traversability Estimation With Sensor Fusion, Christos Sevastopoulos Dec 2023

Computer Science and Engineering Dissertations

Generally speaking, traversability estimation describes the ability to navigate or move through a particular environment (indoors or outdoors). Indoor environments are governed by uncertainty and stochasticity arising from their complex structures, which encompass both static elements such as furniture and walls and dynamic entities such as moving humans. In our research, we underline the importance of blending semantic and spatial information to ensure secure navigation for a mobile robot. We show that RGB sensors suffer from constrained situational awareness of the surroundings, thus highlighting the need to incorporate spatial and geometric data, which can collaborate synergistically to enhance overall perception and …


Implementation Of ADAS And Autonomy On UNLV Campus, Zillur Rahman Dec 2023

UNLV Theses, Dissertations, Professional Papers, and Capstones

The integration of Advanced Driving Assistance Systems (ADAS) and autonomous driving functionalities into contemporary vehicles has notably surged, driven by the remarkable progress in artificial intelligence (AI). These AI systems, capable of learning from real-world data, now exhibit the capability to perceive their surroundings via a suite of sensors, create optimal routes from source to destination, and execute vehicle control akin to a human driver.

Within the context of this thesis, we undertake a comprehensive exploration of three distinct yet interrelated ADAS and Autonomy projects. Our central objective is the implementation of autonomous driving (AD) technology on the UNLV campus, culminating in …


Enhanced Privacy-Enabled Face Recognition Using κ-Identity Optimization, Ryan Karl Dec 2023

Department of Electrical and Computer Engineering: Dissertations, Theses, and Student Research

Facial recognition is becoming more and more prevalent in the daily lives of the common person. Law enforcement utilizes facial recognition to find and track suspects. The newest smartphones have the ability to unlock using the user's face. Some door locks utilize facial recognition to allow correct users to enter restricted spaces. The list of applications that use facial recognition will only increase as hardware becomes more cost-effective and more computationally powerful. As this technology becomes more prevalent in our lives, it is important to understand and protect the data provided to these companies. Any data transmitted should be encrypted …


Smart Street Light Control: A Review On Methods, Innovations, And Extended Applications, Fouad Agramelal, Mohamed Sadik, Youssef Moubarak, Saad Abouzahir Nov 2023

Computer Vision Faculty Publications

As urbanization increases, streetlights have become significant consumers of electrical power, making it imperative to develop effective control methods for sustainability. This paper offers a comprehensive review on control methods of smart streetlight systems, setting itself apart by introducing a novel light scheme framework that provides a structured classification of various light control patterns, thus filling an existing gap in the literature. Unlike previous studies, this work dives into the technical specifics of individual research papers and methodologies, ranging from basic to advanced control methods like computer vision and deep learning, while also assessing the energy consumption associated with each …


Object Recognition With Deep Neural Networks In Low-End Systems, Lillian Davis Oct 2023

Mahurin Honors College Capstone Experience/Thesis Projects

Object recognition is an important area in computer vision. It has been advanced significantly by deep learning, which unifies feature extraction and classification. In general, deep neural networks, such as Convolutional Neural Networks (CNNs), are trained on high-performance systems. Aiming to extend the reach of deep learning to personal computing, I propose a study of deep learning-based object recognition on low-end systems, such as laptops. This research examines how differing layer configurations and hyperparameter values used in CNNs can either create or resolve overfitting and affect the final accuracy of object recognition systems. The main contribution …


STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition, Ming Li, Xiangyu Xu, Hehe Fan, Pan Zhou, Jun Liu, Jia-Wei Liu, Jiahe Li, Jussi Keppo, Mike Zheng Shou, Shuicheng Yan Oct 2023

Research Collection School Of Computing and Information Systems

Existing methods of privacy-preserving action recognition (PPAR) mainly focus on frame-level (spatial) privacy removal through 2D CNNs. Unfortunately, they have two major drawbacks. First, they may compromise temporal dynamics in input videos, which are critical for accurate action recognition. Second, they are vulnerable to practical attacking scenarios where attackers probe for privacy from an entire video rather than individual frames. To address these issues, we propose a novel framework STPrivacy to perform video-level PPAR. For the first time, we introduce vision Transformers into PPAR by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification …
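
As a hedged illustration of what treating a video as a tubelet sequence means, the sketch below splits a single-channel video into non-overlapping t x h x w blocks and flattens each block into one token, in the style of ViViT's tubelet embedding. The function name and tubelet size are hypothetical; STPrivacy's actual configuration and its sparsification mechanism are not shown, and a real model would then linearly project each token before the Transformer.

```python
def to_tubelets(video, t, h, w):
    """Split a T x H x W video (nested lists) into non-overlapping t x h x w tubelets.

    Returns a list of flattened tubelets ordered by (time, row, column) block index.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % t or H % h or W % w:
        raise ValueError("video dimensions must be divisible by the tubelet size")
    tokens = []
    for t0 in range(0, T, t):
        for y0 in range(0, H, h):
            for x0 in range(0, W, w):
                tokens.append([video[ti][yi][xi]
                               for ti in range(t0, t0 + t)
                               for yi in range(y0, y0 + h)
                               for xi in range(x0, x0 + w)])
    return tokens


# Toy 4-frame, 4 x 4 video whose pixel value encodes (frame, row, column).
video = [[[100 * f + 10 * y + x for x in range(4)] for y in range(4)] for f in range(4)]
tokens = to_tubelets(video, 2, 2, 2)
print(len(tokens), len(tokens[0]))  # prints: 8 8
```

Because each token spans several frames, removing or sparsifying tokens acts on space and time together, rather than frame by frame as in the 2D approaches the abstract criticizes.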


PyMAiVAR: An Open-Source Python Suit For Audio-Image Representation In Human Action Recognition, Muhammad B. Shaikh, Douglas Chai, Syed M. S. Islam, Naveed Akhtar Sep 2023

Research outputs 2022 to 2026

We present PyMAiVAR, a versatile toolbox that encompasses the generation of image representations for audio data including Wave plots, Spectral Centroids, Spectral Roll Offs, Mel Frequency Cepstral Coefficients (MFCC), MFCC Feature Scaling, and Chromagrams. This wide-ranging toolkit generates rich audio-image representations, playing a pivotal role in reshaping human action recognition. By fully exploiting audio data's latent potential, PyMAiVAR stands as a significant advancement in the field. The package is implemented in Python and can be used across different operating systems.
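
One of the representations listed, the spectral centroid, is the magnitude-weighted mean frequency of a frame: sum_k f_k * |X_k| / sum_k |X_k|. The sketch below computes it for a single frame with a naive DFT using only the standard library. It is an illustration under those assumptions, not PyMAiVAR's API; libraries such as librosa (librosa.feature.spectral_centroid) compute it frame by frame on a short-time Fourier transform.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz (0.0 for a silent frame)."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]  # bin k sits at k * fs / n Hz
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0


# A 1 kHz sine sampled at 8 kHz; 64 samples put it exactly on DFT bin 8.
sample_rate, n = 8000, 64
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n)]
print(spectral_centroid(frame, sample_rate))  # close to 1000.0
```

Stacking the centroids of successive frames gives the time series that a toolkit would then render as an image.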


IFSeg: Image-Free Semantic Segmentation Via Vision-Language Model, Sukmin Yun, Seong Hyeon Park, Paul Hongsuck Seo, Jinwoo Shin Aug 2023

Machine Learning Faculty Publications

Vision-language (VL) pre-training has recently gained much attention for its transferability and flexibility in novel concepts (e.g., cross-modality transfer) across various visual tasks. However, VL-driven segmentation has been under-explored, and the existing approaches still have the burden of acquiring additional training images or even segmentation annotations to adapt a VL model to downstream segmentation tasks. In this paper, we introduce a novel image-free segmentation task where the goal is to perform semantic segmentation given only a set of the target semantic categories, but without any task-specific images and annotations. To tackle this challenging task, our proposed method, coined IFSeg, generates …


Multiclass Confidence And Localization Calibration For Object Detection, Bimsara Pathiraja, Malitha Gunawardhana, Muhammad Haris Khan Aug 2023

Computer Vision Faculty Publications

Although deep neural networks (DNNs) achieve high predictive accuracy across many challenging computer vision problems, recent studies suggest that they tend to make over-confident predictions, rendering them poorly calibrated. Most existing attempts to improve DNN calibration are limited to classification tasks and restricted to calibrating in-domain predictions. Surprisingly, few if any attempts have been made to study the calibration of object detection methods, which occupy a pivotal space in vision-based security-sensitive and safety-critical applications. In this paper, we propose a new train-time technique for calibrating modern object detection methods. It is capable of jointly calibrating multiclass confidence and …
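
Miscalibration of the kind described is commonly measured with the expected calibration error (ECE): predictions are grouped into confidence bins, and ECE = sum_m (|B_m| / n) * |acc(B_m) - conf(B_m)|. The sketch below implements this classification version with equal-width bins. It does not implement the paper's train-time calibration technique, detection-specific variants that also account for box localization are not shown, and the inputs are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans, True where the prediction was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 falls into the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / n * abs(accuracy - avg_conf)
    return ece


# An over-confident model: 95% confidence, but right only half the time.
print(expected_calibration_error([0.95] * 4, [True, True, False, False]))  # about 0.45
```

A perfectly calibrated model scores 0; an over-confident detector shows up as bins whose average confidence exceeds their accuracy.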


3D-Aware Multi-Class Image-To-Image Translation With NeRFs, Senmao Li, Joost Van De Weijer, Yaxing Wang, Fahad Shahbaz Khan, Meiqin Liu, Jian Yang Aug 2023

Computer Vision Faculty Publications

Recent advances in 3D-aware generative models (3D-aware GANs) combined with Neural Radiance Fields (NeRF) have achieved impressive results. However, no prior work investigates 3D-aware GANs for 3D-consistent multiclass image-to-image (3D-aware I2I) translation. Naively using 2D-I2I translation methods suffers from unrealistic shape/identity change. To perform 3D-aware multiclass I2I translation, we decouple this learning process into a multiclass 3D-aware GAN step and a 3D-aware I2I translation step. In the first step, we propose two novel techniques: a new conditional architecture and an effective training strategy. In the second step, based on the well-trained multiclass 3D-aware GAN architecture, which preserves view-consistency, we …


Discriminative Co-Saliency And Background Mining Transformer For Co-Salient Object Detection, Long Li, Junwei Han, Ni Zhang, Nian Liu, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan Aug 2023

Computer Vision Faculty Publications

Most previous co-salient object detection works mainly focus on extracting co-salient cues by mining the consistency relations across images, while ignoring explicit exploration of background regions. In this paper, we propose a Discriminative co-saliency and background Mining Transformer framework (DMT) based on several economical multi-grained correlation modules to explicitly mine both co-saliency and background information and effectively model their discrimination. Specifically, we first propose a region-to-region correlation module for introducing inter-image relations to pixel-wise segmentation features while maintaining computational efficiency. Then, we use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation …


Autonomous Shipwreck Detection & Mapping, William Ard Aug 2023

LSU Master's Theses

This thesis presents the development and testing of Bruce, a low-cost hybrid Remote Operated Vehicle (ROV) / Autonomous Underwater Vehicle (AUV) system for the optical survey of marine archaeological sites, as well as a novel sonar image augmentation strategy for semantic segmentation of shipwrecks. This approach takes side-scan sonar and bathymetry data collected using an EdgeTech 2205 AUV sensor integrated with a Harris Iver3, and generates augmented image data to be used for the semantic segmentation of shipwrecks. It is shown that, due to the feature enhancement capabilities of the proposed shipwreck detection strategy, correctly identified areas have a 15% …


SmartBrush: Text And Shape Guided Object Inpainting With Diffusion Model, Shaoan Xie, Zhifei Zhang, Zhe Lin, Tobias Hinz, Kun Zhang Aug 2023

Machine Learning Faculty Publications

Generic image inpainting aims to complete a corrupted image by borrowing surrounding information, which barely generates novel content. By contrast, multi-modal inpainting provides more flexible and useful controls on the inpainted content, e.g., a text prompt can be used to describe an object with richer attributes, and a mask can be used to constrain the shape of the inpainted object rather than being only considered as a missing area. We propose a new diffusion-based model named SmartBrush for completing a missing region with an object using both text and shape-guidance. While previous work such as DALLE-2 and Stable Diffusion can …