Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons


Computer vision


Articles 1 - 30 of 101

Full-Text Articles in Computer Engineering

Uncovering And Mitigating Spurious Features In Domain Generalization, Saeed Karimi, Hamdi Dibeklioğlu Mar 2024


Turkish Journal of Electrical Engineering and Computer Sciences

Domain generalization (DG) techniques strive to generalize to an unfamiliar target domain using only training data from the source domains. Despite growing attention to learning from multiple training domains by enforcing various forms of invariance across them, the improvements over empirical risk minimization (ERM) are nearly insignificant under specified evaluation rules. In this paper, we demonstrate that disentangling spurious and invariant features is challenging in conventional training, since ERM simply minimizes the loss and does not exploit invariance among domains. To address this issue, we …


Motion Magnification-Inspired Feature Manipulation For Deepfake Detection, Aydamir Mirzayev, Hamdi Dibeklioğlu Feb 2024


Turkish Journal of Electrical Engineering and Computer Sciences

Recent advances in deep learning, the increased availability of large-scale datasets, and improvements in accelerated graphics processing units have facilitated the creation of an unprecedented amount of synthetically generated media content with impressive visual quality. Although such technology is used predominantly for entertainment, deepfake technology is also widely used for malevolent ends. This potential for malicious use necessitates detection methods capable of reliably distinguishing manipulated video content. In this work, we aim to create a learning-based detection method for synthetically generated videos. To this end, we attempt to detect spatiotemporal inconsistencies by leveraging a learning-based magnification-inspired feature manipulation …


AI And 6G Into The Metaverse: Fundamentals, Challenges And Future Research Trends, Muhammad Zawish, Fayaz Ali Dharejo, Sunder Ali Khowaja, Saleem Raza, Steven Davy, Kapal Dev, Paolo Bellavista Jan 2024


Articles

Since Facebook was renamed Meta, attention, debate, and exploration have intensified around what the Metaverse is, how it works, and how it might be exploited. The Metaverse is anticipated to be a continuum of rapidly emerging technologies, use cases, capabilities, and experiences that together will make up the next evolution of the Internet. Several researchers have already surveyed the literature on artificial intelligence (AI) and wireless communications in realizing the Metaverse. However, due to the rapid emergence and continuous evolution of technologies, there is a need for a comprehensive and in-depth survey of the role …


SC-Fuse: A Feature Fusion Approach For Unpaved Road Detection From Remotely Sensed Images, Aniruddh Saxena Dec 2023


Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Road network extraction from remote sensing imagery is crucial for numerous applications, ranging from autonomous navigation to urban and rural planning. A particularly challenging aspect is the detection of unpaved roads, which are often underrepresented in research and data. These roads vary in texture, width, shape, and surroundings, which makes them difficult to detect. This thesis addresses these challenges by creating a specialized dataset and introducing the SC-Fuse model.

Our custom dataset comprises high-resolution remote sensing imagery that primarily targets unpaved roads of the American Midwest. To capture diverse seasonal variations and their impact, the dataset includes images from different …


An In-Depth Analysis Of Domain Adaptation In Computer And Robotic Vision, Muhammad Hassan Tanveer, Zainab Fatima, Shehnila Zardari, David A. Guerra-Zubiaga Nov 2023


Faculty and Research Publications

This review article comprehensively delves into the rapidly evolving field of domain adaptation in computer and robotic vision. It offers a detailed technical analysis of the opportunities and challenges associated with this topic. Domain adaptation methods play a pivotal role in facilitating seamless knowledge transfer and enhancing the generalization capabilities of computer and robotic vision systems. Our methodology involves systematic data collection and preparation, followed by the application of diverse assessment metrics to evaluate the efficacy of domain adaptation strategies. This study assesses the effectiveness and versatility of conventional, deep learning-based, and hybrid domain adaptation techniques within the domains of …


Autonomous Shipwreck Detection & Mapping, William Ard Aug 2023


LSU Master's Theses

This thesis presents the development and testing of Bruce, a low-cost hybrid Remote Operated Vehicle (ROV) / Autonomous Underwater Vehicle (AUV) system for the optical survey of marine archaeological sites, as well as a novel sonar image augmentation strategy for semantic segmentation of shipwrecks. This approach takes side-scan sonar and bathymetry data collected using an EdgeTech 2205 AUV sensor integrated with a Harris Iver3 and generates augmented image data for the semantic segmentation of shipwrecks. It is shown that, due to the feature enhancement capabilities of the proposed shipwreck detection strategy, correctly identified areas have a 15% …


Learning To Rig Characters, Zhan Xu Aug 2023


Doctoral Dissertations

With the emergence of 3D virtual worlds, 3D social media, and massive online games, the need for diverse, high-quality, animation-ready characters and avatars is greater than ever. To animate characters, artists hand-craft articulation structures, such as animation skeletons and part deformers, which require a significant amount of laborious manual interaction with 2D/3D modeling interfaces. This thesis presents deep learning methods that significantly automate the process of character rigging. First, the thesis introduces RigNet, a method capable of predicting an animation skeleton for an input static 3D shape in the form of a polygon mesh. The predicted skeletons …


Detecting Road Intersections From Satellite Images Using Convolutional Neural Networks, Fatmaelzahraa Eltaher, Luis Miralles-Pechuán, Jane Courtney, Susan Mckeever Jan 2023


Conference papers

Automatic detection of road intersections is an important task in various domains such as navigation, route planning, traffic prediction, and road network extraction. Road intersections range from simple three-way T-junctions to complex large-scale junctions with many branches. The location of intersections is an important consideration for vulnerable road users such as People with Blindness or Visual Impairment (PBVI) or children. Route planning applications, however, do not provide intersection locations, because this information is not available at scale. As a first step to solving this problem, a mechanism for automatically mapping road intersection locations is required, ideally …


Analyzing The Benthic Cover Of Crustose Coralline Algae Using Mask R-CNN, Rachana Ravindra Jan 2023


Master's Projects

Coral reefs, which support 25% of marine biodiversity, face local and global threats such as overfishing, runoff, acidification, and warming. Crustose Coralline Algae (CCA), pivotal for reef structure and coral settlement, are underrepresented in research. Current methods such as Coral Point Count with Excel Extensions (CPCe) have limitations: they depend on image quality and are time-consuming. This paper proposes using computer vision and Mask R-CNN, a supervised machine learning model, to analyze CCA in reef images based on color, texture, and shape. Results indicate promise in clustering and classifying organisms. The approach reduces manual labor, enhances image analysis, and simplifies the understanding of CCA’s …


Sequential Frame-Interpolation And DCT-Based Video Compression Framework, Yeganeh Jalalpour, Wu-Chi Feng, Feng Liu Dec 2022


Computer Science Faculty Publications and Presentations

Video data is ubiquitous; capturing, transferring, and storing even compressed video is challenging because it requires substantial resources. Given the large amount of video traffic transmitted on the internet, even a small improvement in compression can drastically reduce resource consumption. In this paper, we present a hybrid video compression framework that unites the advantages of DCT-based and interpolation-based video compression methods in a single framework. We show that our work can deliver the same, or in some cases better, visual quality while reducing bandwidth by 10–20%.


Automated Identification Of Astronauts On Board The International Space Station: A Case Study In Space Archaeology, Rao Hamza Ali, Amir Kanan Kashefi, Alice C. Gorman, Justin St. P. Walsh, Erik J. Linstead Aug 2022


Art Faculty Articles and Research

We develop and apply a deep learning-based computer vision pipeline to automatically identify crew members in archival photographic imagery taken on board the International Space Station. Our approach quickly tags thousands of images from public and private photo repositories with high accuracy and without human supervision, including photographs where crew faces are partially obscured. Using the results of our pipeline, we carry out a large-scale network analysis of the crew, using the imagery data to provide novel insights into social interactions among crew members during their missions.


Computer Aided Diagnosis System For Breast Cancer Using Deep Learning, Asma Baccouche Aug 2022


Electronic Theses and Dissertations

The recent rise of big data technologies, electronic systems, and software toolkits has opened new possibilities for Artificial Intelligence (AI). With the continuous use of data-centric systems in our lives, such as social media, surveys, emails, and reports, data has become the center of scientists' attention and has motivated them to build more decision-making and operational support systems across multiple domains. With recent breakthroughs in artificial intelligence, machine learning and deep learning models have achieved remarkable advances in computer vision, e-commerce, cybersecurity, and healthcare. Particularly, numerous …


Applied Deep Learning: Case Studies In Computer Vision And Natural Language Processing, Md Reshad Ul Hoque Aug 2022


Electrical & Computer Engineering Theses & Dissertations

Deep learning has proved successful for many computer vision and natural language processing applications. In this dissertation, three studies were conducted to show the efficacy of deep learning models for computer vision and natural language processing. In the first study, an efficient deep learning model was proposed for seagrass scar detection in multispectral images, producing robust, accurate scar maps. In the second study, an arithmetic deep learning model was developed to fuse multispectral images collected at different times with different resolutions to generate high-resolution images for downstream tasks including change detection, object detection, and land cover …


Survey Of Ship Detection In Video Surveillance Based On Shallow Machine Learning, Zhenbo Bi, Shiyou Zhang, Yang Hua, Yuanhong Wu Jan 2022


Journal of System Simulation

At present, detection of ship targets in video surveillance using shallow machine learning methods still attracts attention in the fields of underwater cultural heritage protection, marine aquaculture, maritime traffic, and port management. This paper reviews and discusses these ship detection methods. Ship target detection based on video surveillance is divided into five parts according to the key technologies involved: preprocessing, region of interest extraction, target segmentation, ship feature extraction, and ship type recognition. For each functional module, the core problems are identified, and the core ideas, …


Building An Understanding Of Human Activities In First Person Video Using Fuzzy Inference, Bradley A. Schneider Jan 2022


Browse all Theses and Dissertations

Activities of Daily Living (ADLs) are the activities that people perform every day in their home as part of their typical routine. Automated in-home monitoring of ADLs has broad utility for intelligent systems that enable independent living for elderly individuals and people with mental or physical disabilities. With rising interest in electronic health (e-Health) and mobile health (m-Health) technology, opportunities abound for integrating activity monitoring systems into these newer forms of healthcare. In this dissertation, we propose a novel system for describing ADLs based on video collected from a wearable camera. Most in-home activities are naturally defined by …


Lapnitor: A Web Service That Protects Your Laptop From Theft, Michael Ameteku Jan 2022


Williams Honors College, Honors Research Projects

Laptop theft is a worldwide issue. According to a 2018 article in Security Boulevard, a laptop is stolen every 53 seconds. Using a laptop's camera, we can monitor the laptop's surroundings and reduce its probability of being stolen. According to the University of Pittsburgh, a laptop has a 1-in-10 chance of being stolen, and nearly half of these thefts occur in offices or classrooms. These thefts mostly occur when owners leave their devices unattended for some time, for example to visit the restroom or attend to a call when …


Magic: The Gathering Card Virtualizer, Vincent Garbonick, Jacen C. Conlan, Jaret A. Varn Jan 2022


Williams Honors College, Honors Research Projects

Any well-versed Magic: The Gathering (MTG) player or collector knows how difficult it can be to keep track of every card in a collection. Some spend hours searching for one specific card, and others constantly scour the internet to find out what their collection is worth. However, this issue does not affect only casual fans. Resale companies spend hours a day determining card prices, and tournament judges painstakingly check players’ decks to ensure they are not cheating. To assist with these struggles, the design team proposed the MTG Card Virtualizer. This device scans MTG playing cards …


KG-CNN: Augmenting Convolutional Neural Networks With Knowledge Graphs For Multi-Class Image Classification, Aidan O'Neill Jan 2022


Dissertations

Computer vision is becoming increasingly prevalent in daily life. Tesla has announced plans to scale up manufacturing of its Robotaxis by 2024; this increase in self-driving vehicles is just one example of how the importance of computer vision grows year by year. Vision is easy to take for granted, as most humans grow up using it as their primary way of absorbing information about their environment. The way humans process and classify visual information differs significantly from how current computer vision systems process and organise it. The human brain can use its past …


Visual Cues For Semi-Autonomous Control Of Transradial Prosthetics, Mena S.A. Kamel Aug 2021


Electronic Thesis and Dissertation Repository

Upper-limb prosthetics are typically driven exclusively by biological signals, mainly electromyography (EMG), where electrodes are placed on the residual part of an amputated limb. In this approach, amputees must control each arm joint iteratively, in a proportional manner. Research has shown that sequential control of prosthetics usually imposes a cognitive burden on amputees, leading to high abandonment rates. This thesis presents a control system for upper-limb prosthetics, leveraging a computer vision module capable of simultaneously predicting objects in a scene, their segmentation mask, and a ranked list of the optimal grasping locations. The proposed system shares control with an amputee, …


Passive Method For 3D Reconstruction Of Human Jaw: Theory And Application, Mohamad Ghanoum Aug 2021


Electronic Theses and Dissertations

Oral dental applications based on visual data pose various challenges. There are problems with lighting (the effect of saliva, tooth discoloration, gum texture, and other sources of specularity) and motion (even inevitable slight motions of the upper/lower jaw may lead to errors far beyond the desired sub-millimeter tolerance). Nowadays, dental CAM systems that obtain geometric data of the jaw from an active sensor (laser scanner) have become more compromised and accurate. However, they have not met the expectations and needs of dental professionals in many ways. The probes in these systems are bulky, even …


Towards Semantic Integration Of Machine Vision Systems To Aid Manufacturing Event Understanding, Kaishu Xia, Clint Saidy, Max Kirkpatrick, Noble Anumbe, Amit Sheth, Ramy Harik Jun 2021


Publications

A manufacturing paradigm shift from conventional control pyramids to decentralized, service-oriented, cyber-physical systems (CPSs) is taking place in today’s Fourth Industrial Revolution. Generally accepted roles and implementation recipes for cyber systems are expected to be standardized in the future manufacturing industry. The authors intend to develop a novel CPS-enabled control architecture that accommodates: (1) intelligent information systems involving domain knowledge, empirical models, and simulation; (2) fast and secure industrial communication networks; (3) cognitive automation through rapid signal analytics and machine learning (ML) based feature extraction; and (4) interoperability between machines and humans. Semantic integration of process indicators is fundamental …


A Quantitative Validation Of Multi-Modal Image Fusion And Segmentation For Object Detection And Tracking, Nicholas Lahaye, Michael J. Garay, Brian D. Bue, Hesham El-Askary, Erik Linstead Jun 2021


Mathematics, Physics, and Computer Science Faculty Articles and Research

In previous work, we showed the efficacy of using Deep Belief Networks, paired with clustering, to identify distinct classes of objects within remotely sensed data, via cluster analysis and qualitative comparison of the output with reference data. In this paper, we quantitatively validate the methodology against datasets currently generated and used within the remote sensing community, and show the capabilities and benefits of the data fusion methodologies used. The experiments take the output of our unsupervised fusion and segmentation methodology and map it to various labeled datasets at different levels of global …


Machine Learning-Based Recognition On Crowdsourced Food Images, Aditya Kulkarni May 2021


Honors Scholar Theses

With nearly a third of the world’s population suffering from food-induced chronic diseases such as obesity, understanding the role of food in community health matters more than ever. While current research emphasizes food proximity and density, there is a dearth of research on its nutrition and quality. However, recent research in geospatial data collection and analysis, together with deep learning, can help us study this further.

Employing the efficiency and interconnection of computer vision and geospatial technology, we want to study whether healthy food is attainable in the community. Specifically, with the help of deep learning in …


Accelerating Point Set Registration For Automated Aerial Refueling, Ryan M. Raettig Mar 2021


Theses and Dissertations

The goal of automated aerial refueling (AAR) is to control the tanker boom to safely refuel a receiving aircraft without input or aid from the boom operator. To achieve this, the pose of the receiver relative to the tanker must be known. Point set registration is a fundamental technique for estimating the relative pose of an object in an environment. However, it is likely to be a computational bottleneck in a vision processing pipeline. In particular, nearest neighbor matching, which pairs each sensed point with its closest truth point, is the most costly part of the point set registration process. For this reason, …
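The abstract singles out nearest neighbor matching as the costliest step of point set registration. The sketch below is not the thesis's implementation; it is a generic brute-force matcher, assuming sensed and truth points are plain 3D tuples (the function name `nearest_neighbor_matches` is hypothetical). It makes the cost visible: every sensed point is compared against every truth point, so the work grows with the product of the two set sizes.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def nearest_neighbor_matches(
    sensed: Sequence[Point3D], truth: Sequence[Point3D]
) -> List[Tuple[int, float]]:
    """Pair each sensed point with its closest truth point (hypothetical helper).

    Returns one (truth_index, distance) tuple per sensed point. This is a
    brute-force search: it performs len(sensed) * len(truth) distance
    evaluations, which is why matching tends to dominate registration cost.
    """
    if not truth:
        raise ValueError("truth point set must not be empty")
    matches: List[Tuple[int, float]] = []
    for p in sensed:
        best_idx, best_d2 = -1, math.inf
        for j, q in enumerate(truth):
            # Squared Euclidean distance avoids a sqrt per comparison.
            d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
            if d2 < best_d2:
                best_idx, best_d2 = j, d2
        # Take the sqrt only once, for the winning candidate.
        matches.append((best_idx, math.sqrt(best_d2)))
    return matches


# Example: two sensed points matched against three truth points.
sensed = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
truth = [(0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (1.0, 1.0, 1.2)]
print(nearest_neighbor_matches(sensed, truth))
```

Spatial indexes such as k-d trees are a common way to cut this cost in ICP-style registration; the abstract is truncated before it describes which acceleration the thesis adopts.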


Analysis Of Recent Trends In Continuous Sign Language Recognition Using NLP, Vijayshri Nitin Khedkar, Sonali Kothari Dr, Aarohi Prasad, Arunima Mishra, Varun Saha, Vinay Kumar Mar 2021


Library Philosophy and Practice (e-journal)

Oralism is an ideology and practice that advocates communication based solely on speech. This practice is encouraged from an early age in our country. As a consequence, people who are hard of hearing are constantly forced to negotiate with schools, colleges, organisations, workplaces, and families that do not acknowledge their need and preference for sign language over oral languages. As a result, an entire community is overlooked in admissions, jobs, and general social standing. We aim to narrow that communication gap and take a step toward fighting the stigma associated with sign language. The aim is to provide …


‘Digits’ App - Smartphone Augmented Reality For Hand Telerehabilitation, Hongdao Dong, Edward Ho, Herbert Shin, Tania Banerjee, Geoffrey Masschelein, Jacob Davidson, Sandrine De Ribaupierre, Roy Eagleson, Caitlin Symonette Jan 2021


Electrical and Computer Engineering Publications

Hand telerehabilitation currently has limitations for accurate, remote assessment of range of motion (ROM) in small finger joints. The ‘DIGITS’ application uses the front smartphone camera to measure finger ROM in a reliable and rapid assessment protocol. Our initial beta-phase testing examined the consistency of our software measurements with in-person goniometry. Differences of 6 to 9 degrees existed between the data recorded by the smartphone application and the in-person measurements. This range is within the accepted 7 to 9 degree tolerance for interrater goniometry measurements. The effect of environmental factors such as hand distance, lighting, and hand orientation was evaluated. The intraclass correlation …
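The abstract does not describe how DIGITS computes joint angles. As a hedged illustration only, the sketch below assumes three 2D image keypoints per finger joint (proximal landmark, joint, distal landmark) are already available from some hand-landmark detector; the function names `joint_angle_deg`, `flexion_deg`, and `range_of_motion_deg` are hypothetical. Flexion is taken as 180° minus the angle at the joint, and ROM as the spread of flexion across frames.

```python
import math
from typing import Sequence, Tuple

Point2D = Tuple[float, float]
Frame = Tuple[Point2D, Point2D, Point2D]  # assumed (proximal, joint, distal) keypoints


def joint_angle_deg(proximal: Point2D, joint: Point2D, distal: Point2D) -> float:
    """Unsigned angle at `joint` between the two bone segments, in degrees (0..180)."""
    v1 = (proximal[0] - joint[0], proximal[1] - joint[1])
    v2 = (distal[0] - joint[0], distal[1] - joint[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2(|cross|, dot) is numerically stabler than acos(dot / (|v1| * |v2|)).
    return math.degrees(math.atan2(abs(cross), dot))


def flexion_deg(frame: Frame) -> float:
    """Flexion angle: 0 for a straight finger, larger as the joint bends."""
    return 180.0 - joint_angle_deg(*frame)


def range_of_motion_deg(frames: Sequence[Frame]) -> float:
    """ROM as maximum minus minimum flexion observed over a sequence of frames."""
    if not frames:
        raise ValueError("need at least one frame")
    flexions = [flexion_deg(f) for f in frames]
    return max(flexions) - min(flexions)


# Example: a straight finger and the same finger bent to a right angle.
straight = ((0.0, 0.0), (1.0, 0.0), (2.0, 0.0))
bent = ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0))
print(range_of_motion_deg([straight, bent]))
```

Because this measures angles in the image plane, it is sensitive to how the hand is oriented relative to the camera, which is one of the environmental factors the study evaluated.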


Human-Robot Collaboration Enabled By Real-Time Vision Tracking, Travis Deegan Jan 2021


Electronic Theses and Dissertations

The number of robotic systems in the world is growing rapidly. However, most industrial robots are isolated in caged environments for the safety of users. There is an urgent need for human-in-the-loop collaborative robotic systems since robots are very good at performing precise and repetitive tasks but lack the cognitive ability and soft skills of humans. To fill this need, a key challenge is how to enable a robot to interpret its human co-worker’s motion and intention. This research addresses this challenge by developing a collaborative human-robot interface via innovations in computer vision, robotics, and system integration techniques. Specifically, this …


Ship Deck Segmentation In Engineering Document Using Generative Adversarial Networks, Mohammad Shahab Uddin, Raphael Pamie-George, Daron Wilkins, Andres Sousa Poza, Mustafa Canan, Samuel Kovacic, Jiang Li Jan 2021


Engineering Management & Systems Engineering Faculty Publications

Generative adversarial networks (GANs) have become very popular in recent years and have proved successful in various computer vision tasks, including image translation and image super-resolution. In this paper, we use GAN models for ship deck segmentation. We use 2D scanned raster images of ship decks provided by the US Navy Military Sealift Command (MSC) to extract necessary information, such as ship walls and objects. Our segmentation results will help produce vector and 3D images of a ship that can later be used for its maintenance. We applied the trained models to engineering documents provided …


A Deep Transfer Learning Based Model For Automatic Detection Of COVID-19 From Chest X-Rays, Prateek Chhikara, Prakhar Gupta, Prabhjot Singh, Tarunpreet Bhatia Jan 2021


Turkish Journal of Electrical Engineering and Computer Sciences

Deep learning in medical imaging has revolutionized the way we interpret medical data, as the capabilities of high-performance computing devices far exceed those of their creators. With the pandemic causing havoc for the second straight year, the findings in our paper will allow researchers worldwide to use and create state-of-the-art models to detect affected persons before it reaches the R number. The paper proposes an automated diagnostic tool that applies deep learning models to chest X-rays as input, with the goal of helping to overcome the COVID-19 pandemic. A deep transfer learning-based model for automatic detection of COVID-19 from chest …


Evaluating The Performance Of Transformer Architecture Over Attention Architecture On Image Captioning, Deepti Balasubramaniam Jan 2021


Dissertations

Over the last few decades, computer vision and natural language processing have shown tremendous improvement in tasks such as image captioning, video captioning, and machine translation using deep learning models. However, there has been little research on transformer-based image captioning and on how it outperforms other models implemented for image captioning. In this study, we design a simple encoder-decoder model, an attention model, and a transformer model for image captioning using the Flickr8K dataset, and discuss the models' hyperparameters, the type of pre-trained model used, and how long each model was trained. …