Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 30 of 31
Full-Text Articles in Physical Sciences and Mathematics
Deep Visual Recommendation System, Raksha Sunil
Master's Projects
A recommendation system is a filtering system that predicts the ratings or preferences a user might have. Recommendation systems are an evolved form of our trivial information retrieval systems. In this paper, we present a technique to solve the new-item cold-start problem. This problem occurs when a new item is added to a shopping website like Amazon.com. There is no metadata for this item, and no ratings or reviews, because it is new to the system. The absence of data results in no recommendations or bad recommendations. Our approach to solving the new-item cold-start problem requires …
UAS-Based Object Tracking Via Deep Learning, Marc Dinh
UNLV Theses, Dissertations, Professional Papers, and Capstones
Tracking is the task of identifying an object of interest and detecting its position over time, and it has numerous applications such as surveillance, security, and traffic control. Unmanned aerial vehicles (UAVs) have recently become increasingly common, which provides a new and less explored domain with an ideal vantage point for surveillance and monitoring applications. Aerial tracking is a particularly challenging task as it introduces new environmental variables such as rapid motion in 3D space. We propose a new deep learned tracker architecture catered to aerial tracking.
First, a study of six state-of-the-art deep learned trackers …
Modeling The Bioactivation And Subsequent Reactivity Of Drugs, Tyler Brian Hughes
Arts & Sciences Electronic Theses and Dissertations
Metabolism can convert drugs to harmful reactive metabolites that conjugate to DNA and off-target proteins. Reactive metabolites are a significant driver of both drug candidate attrition and withdrawal from the market of already approved drugs. Unfortunately, reactive metabolites are difficult to study in vivo, because they are transitory and generally do not circulate. Instead, this work computationally models both metabolism and reactivity. Using deep learning, predictive models were developed for the metabolic formation of quinones and epoxides, which together account for about half of known reactive metabolites. Additionally, an accurate model of DNA and protein reactivity was constructed, which predicts …
Deep Air Learning: Interpolation, Prediction, And Feature Analysis Of Fine-Grained Air Quality, Zhongang Qi, Tianchun Wang, Guojie Song, Weisong Hu, Xi Li, Zhongfei Mark Zhang
Research Collection School Of Computing and Information Systems
The interpolation, prediction, and feature analysis of fine-grained air quality are three important topics in the area of urban air computing. The solutions to these topics can provide extremely useful information to support air pollution control, and consequently generate great societal and technical impacts. Most of the existing work solves the three problems separately by different models. In this paper, we propose a general and effective approach to solve the three problems in one model called the Deep Air Learning (DAL). The main idea of DAL lies in embedding feature selection and semi-supervised learning in different layers of the deep …
A Multi-Task Approach To Incremental Dialogue State Tracking, Anh Duong Trinh, Robert J. Ross, John D. Kelleher
Conference papers
Incrementality is a fundamental feature of language in real-world use. To this point, however, the vast majority of work in automated dialogue processing has focused on language as turn-based. In this paper we explore the challenge of incremental dialogue state tracking through the development and analysis of a multi-task approach to incremental dialogue state tracking. We present the design of our incremental dialogue state tracker in detail and provide evaluation against the well-known Dialogue State Tracking Challenge 2 (DSTC2) dataset. In addition to a standard evaluation of the tracker, we also provide an analysis of the incrementality …
DSM: A Specification Mining Tool Using Recurrent Neural Network Based Language Model, Tien-Duy B. Le, Lingfeng Bao, David Lo
Research Collection School Of Computing and Information Systems
Formal specifications are important but often unavailable. Furthermore, writing these specifications is time-consuming and requires skills from developers. In this work, we present Deep Specification Miner (DSM), an automated tool that applies deep learning to mine finite-state automaton (FSA) based specifications. DSM accepts as input a set of execution traces to train a Recurrent Neural Network Language Model (RNNLM). From the input traces, DSM creates a Prefix Tree Acceptor (PTA) and leverages the inferred RNNLM to extract many features. These features are then forwarded to clustering algorithms for merging similar automata states in the PTA for assembling a number of …
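To make the Prefix Tree Acceptor step concrete, here is a minimal Python sketch of building a PTA from execution traces. It covers only that step: the RNNLM feature extraction and the state-merging clustering described in the abstract are not shown. The function names and the example traces are hypothetical and do not come from the DSM tool.

```python
from typing import Dict, List, Set, Tuple

Transitions = Dict[int, Dict[str, int]]


def build_pta(traces: List[List[str]]) -> Tuple[Transitions, Set[int]]:
    """Build a Prefix Tree Acceptor: one state per distinct trace prefix.

    State 0 is the root. Returns (transitions, accepting_states).
    """
    transitions: Transitions = {0: {}}
    accepting: Set[int] = set()
    for trace in traces:
        state = 0
        for method in trace:
            if method not in transitions[state]:
                # Unseen prefix: allocate a fresh state for it.
                new_state = len(transitions)
                transitions[state][method] = new_state
                transitions[new_state] = {}
            state = transitions[state][method]
        # The state reached at the end of a trace is accepting.
        accepting.add(state)
    return transitions, accepting


def accepts(transitions: Transitions, accepting: Set[int], trace: List[str]) -> bool:
    """Return True if the PTA accepts the given trace."""
    state = 0
    for method in trace:
        if method not in transitions[state]:
            return False
        state = transitions[state][method]
    return state in accepting


# Hypothetical execution traces (sequences of method calls).
traces = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "read", "close"],
]
transitions, accepting = build_pta(traces)
```

A PTA accepts exactly the traces it was built from, which is why a later merging step is needed to generalize it into a smaller automaton.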
Emotion Recognition Using Deep Convolutional Neural Network With Large Scale Physiological Data, Astha Sharma
USF Tampa Graduate Theses and Dissertations
Classification of emotions plays a very important role in affective computing and has real-world applications in fields as diverse as entertainment, medical, defense, retail, and education. These applications include video games, virtual reality, pain recognition, lie detection, classification of Autistic Spectrum Disorder (ASD), analysis of stress levels, and determining attention levels. This vast range of applications motivated us to study automatic emotion recognition which can be done by using facial expression, speech, and physiological data.
A person’s physiological signals, such as heart rate and blood pressure, are deeply linked with their emotional states and can be used to identify a …
Prediction Of Relatedness In Stack Overflow: Deep Learning Vs. SVM: A Reproducibility Study, Bowen Xu, Amirreza Shirani, David Lo, Mohammad Amin Alipour
Research Collection School Of Computing and Information Systems
Background: Xu et al. used a deep neural network (DNN) technique to classify the degree of relatedness between two knowledge units (question-answer threads) on Stack Overflow. More recently, extending Xu et al.'s work, Fu and Menzies proposed a simpler classification technique based on a fine-tuned support vector machine (SVM) that achieves similar performance but in a much shorter time. Thus, they suggested that researchers need to compare their sophisticated methods against simpler alternatives. Aim: The aim of this work is to replicate the previous studies and further investigate the validity of Fu and Menzies' claim by evaluating the DNN- and SVM-based …
Fake News Detection: A Deep Learning Approach, Aswini Thota, Priyanka Tilak, Simrat Ahluwalia, Nibrat Lohia
SMU Data Science Review
Fake news is defined as a made-up story with an intention to deceive or to mislead. In this paper we present the solution to the task of fake news detection by using Deep Learning architectures. Gartner research [1] predicts that “By 2022, most people in mature economies will consume more false information than true information”. The exponential increase in production and distribution of inaccurate news presents an immediate need for automatically tagging and detecting such twisted news articles. However, automated detection of fake news is a hard task to accomplish as it requires the model to understand nuances in natural …
Deep Machine Learning For Mechanical Performance And Failure Prediction, Elijah Reber, Nickolas D. Winovich, Guang Lin
The Summer Undergraduate Research Fellowship (SURF) Symposium
Deep learning has provided opportunities for advancement in many fields. One such opportunity is being able to accurately predict real world events. Ensuring proper motor function and being able to predict energy output is a valuable asset for owners of wind turbines. In this paper, we look at how effective a deep neural network is at predicting the failure or energy output of a wind turbine. A data set was obtained that contained sensor data from 17 wind turbines over 13 months, measuring numerous variables, such as spindle speed and blade position and whether or not the wind turbine experienced …
Investigating Dataset Distinctiveness, Andrew Ulmer, Kent W. Gauen, Yung-Hsiang Lu, Zohar R. Kapach, Daniel P. Merrick
The Summer Undergraduate Research Fellowship (SURF) Symposium
Just as a human might struggle to interpret another human’s handwriting, a computer vision program might fail when asked to perform one task in two different domains. To be more specific, visualize a self-driving car as a human driver who had only ever driven on clear, sunny days, during daylight hours. This driver – the self-driving car – would inevitably face a significant challenge when asked to drive when it is violently raining or foggy during the night, putting the safety of its passengers in danger. An extensive understanding of the data we use to teach computer vision models – …
Deep Neural Network Architectures For Modulation Classification Using Principal Component Analysis, Sharan Ramjee, Shengtai Ju, Diyu Yang, Aly El Gamal
The Summer Undergraduate Research Fellowship (SURF) Symposium
In this work, we investigate the application of Principal Component Analysis to the task of wireless signal modulation recognition using deep neural network architectures. Sampling signals at the Nyquist rate, which is often very high, requires a large amount of energy and space to collect and store the samples. Moreover, the time taken to train neural networks for the task of modulation classification is large due to the large number of samples. These problems can be drastically reduced using Principal Component Analysis, which is a technique that allows us to reduce the dimensionality or number of features of the samples …
A Deep Learning Approach To Recognizing Bees In Video Analysis Of Bee Traffic, Astha Tiwari
All Graduate Theses and Dissertations, Spring 1920 to Summer 2023
Colony Collapse Disorder (CCD) has been a major threat to bee colonies around the world which affects vital human food crop pollination. The decline in bee population can have tragic consequences, for humans as well as the bees and the ecosystem. Bee health has been a cause of urgent concern for farmers and scientists around the world for at least a decade but a specific cause for the phenomenon has yet to be conclusively identified.
This work uses Artificial Intelligence and Computer Vision approaches to develop and analyze techniques to help in continuous monitoring of bee traffic which will further …
Word Recognition In Nutrition Labels With Convolutional Neural Network, Anuj Khasgiwala
All Graduate Theses and Dissertations, Spring 1920 to Summer 2023
Nowadays, many people are busy trying to maintain a balance between work and family as working hours increase. In such a hectic life, people either ignore a healthy diet or do not give it enough attention. An essential part of healthy eating is awareness of nutritional information and an understanding of how different foods and nutritional constituents affect our bodies. In the USA and many other countries, nutritional information is primarily conveyed to consumers through nutrition labels (NLs), which can be found in all packaged …
Deep Specification Mining, Tien-Duy B. Le, David Lo
Research Collection School Of Computing and Information Systems
Formal specifications are essential but usually unavailable in software systems. Furthermore, writing these specifications is costly and requires skills from developers. Recently, many automated techniques have been proposed to mine specifications in various formats including finite-state automaton (FSA). However, more works in specification mining are needed to further improve the accuracy of the inferred specifications. In this work, we propose Deep Specification Miner (DSM), a new approach that performs deep learning for mining FSA-based specifications. Our proposed approach uses test case generation to generate a richer set of execution traces for training a Recurrent Neural Network Based Language Model (RNNLM). …
Online Deep Learning: Learning Deep Neural Networks On The Fly, Doyen Sahoo, Hong Quang Pham, Jing Lu, Steven C. H. Hoi
Research Collection School Of Computing and Information Systems
Deep Neural Networks (DNNs) are typically trained by backpropagation in a batch setting, requiring the entire training data to be made available prior to the learning task. This is not scalable for many real-world scenarios where new data arrives sequentially in a stream. We aim to address an open challenge of “Online Deep Learning” (ODL) for learning DNNs on the fly in an online setting. Unlike traditional online learning that often optimizes some convex objective function with respect to a shallow model (e.g., a linear/kernel-based hypothesis), ODL is more challenging as the optimization objective is non-convex, and regular DNN with …
Extractive Text Summarization With Deep Learning, Garrett G. Chan
Computer Engineering
This project explores extractive text summarization using the capabilities of Deep Learning. The goal of this project is to create an application with a neural network to take in text as its input, and create a summary that is a shorter, condensed version of the input text. This has been implemented in Python by configuring and training a neural network that takes in a vector of features that are extracted from the text using various Natural Language Processing libraries. The implementation demonstrates that we can train simple deep neural networks to successfully summarize text.
Worker Activity Recognition In Smart Manufacturing Using IMU And sEMG Signals With Convolutional Neural Networks, Wenjin Tao, Ze-Hao Lai, Ming-Chuan Leu, Zhaozheng Yin
Mechanical and Aerospace Engineering Faculty Research & Creative Works
In a smart manufacturing system involving workers, recognition of the worker's activity can be used for quantification and evaluation of the worker's performance, as well as to provide onsite instructions with augmented reality. In this paper, we propose a method for activity recognition using Inertial Measurement Unit (IMU) and surface electromyography (sEMG) signals obtained from a Myo armband. The raw 10-channel IMU signals are stacked to form a signal image. This image is transformed into an activity image by applying Discrete Fourier Transformation (DFT) and then fed into a Convolutional Neural Network (CNN) for feature extraction, resulting in a high-level …
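The signal-image and DFT steps described above can be illustrated with a small Python sketch. It rests on assumptions that the abstract does not confirm: each channel becomes one row of the signal image, a 1D DFT is taken along each row, and only the magnitude is kept. The paper may use a different transform layout (for example a 2D DFT). Two synthetic channels stand in for the 10 IMU channels, and all names are hypothetical.

```python
import cmath
import math
from typing import List


def dft_magnitude(signal: List[float]) -> List[float]:
    """Magnitude of the discrete Fourier transform of a 1D signal (naive O(n^2))."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def activity_image(channels: List[List[float]]) -> List[List[float]]:
    """Stack channels as rows (the signal image), then DFT each row.

    Assumption: a row-wise 1D DFT with magnitudes only.
    """
    signal_image = [list(channel) for channel in channels]
    return [dft_magnitude(row) for row in signal_image]


# Synthetic stand-ins for sensor channels over a window of 8 samples.
n = 8
channels = [
    [1.0] * n,                                          # constant signal
    [math.cos(2 * math.pi * t / n) for t in range(n)],  # one cycle per window
]
image = activity_image(channels)
```

The resulting 2D array is what would be fed to a CNN as an image. In practice a fast Fourier transform routine would replace the naive DFT used here.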
D-Pruner: Filter-Based Pruning Method For Deep Convolutional Neural Network, Nguyen Loc Huynh, Youngki Lee, Rajesh Krishna Balan
Research Collection School Of Computing and Information Systems
The emergence of augmented reality devices such as Google Glass and Microsoft HoloLens has opened up a new class of vision sensing applications. Those applications often require the ability to continuously capture and analyze contextual information from video streams. They often adopt various deep learning algorithms such as convolutional neural networks (CNN) to achieve high recognition accuracy, while facing severe challenges in running computationally intensive deep learning algorithms on resource-constrained mobile devices. In this paper, we propose and explore a new class of compression technique called D-Pruner to efficiently prune redundant parameters within a CNN model to run the model …
A Continuous Space Generative Model, Erzen Komoni
Graduate Theses and Dissertations
Generative models are a class of machine learning models capable of producing digital images with plausibly realistic properties. They are useful in such applications as visualizing designs, rendering game scenes, and improving images at higher magnifications. Unfortunately, existing generative models generate only images with a discrete predetermined resolution. This paper presents the Continuous Space Generative Model (CSGM), a novel generative model capable of generating images as a continuous function, rather than as a discrete set of pixel values. Like generative adversarial networks, CSGM trains by alternating between generative and discriminative steps. But unlike generative adversarial networks, CSGM uses only one …
Improving Asynchronous Advantage Actor Critic With A More Intelligent Exploration Strategy, James B. Holliday
Graduate Theses and Dissertations
We propose a simple and efficient modification to the Asynchronous Advantage Actor Critic (A3C) algorithm that improves training. In 2016, Google’s DeepMind set a new standard for state-of-the-art reinforcement learning performance with the introduction of the A3C algorithm. The goal of this research is to show that A3C can be improved by the use of a novel exploration strategy we call “Follow then Forage Exploration” (FFE). FFE forces the agents to follow the best known path at the beginning of a training episode; later in the episode, the agent is forced to “forage” and explore randomly. In …
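The follow-then-forage idea can be sketched in a few lines of Python. This illustrates only the action-selection rule as the abstract describes it. How the best known path is stored and when the switch to foraging happens are assumptions here (a fixed `switch_step`), and none of the names come from the thesis. The A3C training loop itself is not shown.

```python
import random
from typing import List, Sequence


def ffe_action(step: int, best_path: Sequence[int], switch_step: int,
               n_actions: int, rng: random.Random) -> int:
    """Follow the best known path early in the episode, then forage randomly."""
    if step < switch_step and step < len(best_path):
        return best_path[step]        # "follow" phase
    return rng.randrange(n_actions)   # "forage" phase: uniform random action


def rollout(best_path: Sequence[int], switch_step: int, episode_len: int,
            n_actions: int, seed: int = 0) -> List[int]:
    """Generate the action sequence for one episode under the rule above."""
    rng = random.Random(seed)
    return [ffe_action(t, best_path, switch_step, n_actions, rng)
            for t in range(episode_len)]


# Hypothetical best known action sequence from earlier episodes.
best_path = [2, 2, 0, 1, 3]
actions = rollout(best_path, switch_step=3, episode_len=6, n_actions=4)
```

With `switch_step=3`, the first three actions replay the stored path and the remaining ones are drawn at random.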
Walknet: A Deep Learning Approach To Improving Sidewalk Quality And Accessibility, Andrew Abbott, Alex Deshowitz, Dennis Murray, Eric C. Larson
SMU Data Science Review
This paper proposes a framework for optimizing allocation of infrastructure spending on sidewalk improvement and allowing planners to focus their budgets on the areas in the most need. In this research, we identify curb ramps from Google Street View images using traditional machine learning and deep learning methods. Our convolutional neural network approach achieved an 83% accuracy and high level of precision when classifying curb cuts. We found that as the model received more data, the accuracy increased, which with the continued collection of crowdsourced labeling of curb cuts will increase the model’s classification power. We further investigated a model …
Detecting Speakers In Video Footage, Michael Williams
Master's Theses
Facial recognition is a powerful tool for identifying people visually. Yet when the end goal is more specific than merely identifying the person in a picture, problems can arise. Speaker identification is one such task, which expects more predictive power from a facial recognition system than it can provide on its own. Speaker identification is the task of identifying who is speaking in a video, not simply who is present in the video. This extra requirement introduces numerous false positives into the facial recognition system, largely due to one main scenario: the person speaking is not on camera. This paper …
URLNet: Learning A URL Representation With Deep Learning For Malicious URL Detection, Hung Le, Hong Quang Pham, Doyen Sahoo, Steven C. H. Hoi
Research Collection School Of Computing and Information Systems
Malicious URLs host unsolicited content and are used to perpetrate cybercrimes. It is imperative to detect them in a timely manner. Traditionally, this is done through the usage of blacklists, which cannot be exhaustive, and cannot detect newly generated malicious URLs. To address this, recent years have witnessed several efforts to perform Malicious URL Detection using Machine Learning. The most popular and scalable approaches use lexical properties of the URL string by extracting bag-of-words-like features, followed by applying machine learning models such as SVMs. There are also other features designed by experts to improve the prediction performance of the …
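The lexical bag-of-words baseline mentioned above can be sketched as follows. This is an illustration of that traditional feature extraction, not of URLNet itself. The tokenization rule (split on non-alphanumeric characters), the example URLs, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter
from typing import List


def url_tokens(url: str) -> List[str]:
    """Split a URL into lowercase word tokens on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url.lower()) if tok]


def bag_of_words(url: str, vocabulary: List[str]) -> List[int]:
    """Count how often each vocabulary word occurs in the URL."""
    counts = Counter(url_tokens(url))
    return [counts[word] for word in vocabulary]


# Hypothetical URLs used only to build a small vocabulary.
urls = [
    "http://example.com/login/update-account",
    "http://example.org/docs/index.html",
]
vocabulary = sorted({tok for u in urls for tok in url_tokens(u)})
features = [bag_of_words(u, vocabulary) for u in urls]
```

Each URL becomes a fixed-length count vector that a classifier such as an SVM could consume. One known weakness of this representation is that words unseen during training map to nothing at test time.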
Multimodal Sensing And Data Processing For Speaker And Emotion Recognition Using Deep Learning Models With Audio, Video And Biomedical Sensors, Farnaz Abtahi
Dissertations, Theses, and Capstone Projects
The focus of the thesis is on Deep Learning methods and their applications on multimodal data, with a potential to explore the associations between modalities and replace missing and corrupt ones if necessary. We have chosen two important real-world applications that need to deal with multimodal data: 1) Speaker recognition and identification; 2) Facial expression recognition and emotion detection.
The first part of our work assesses the effectiveness of speech-related sensory data modalities and their combinations in speaker recognition using deep learning models. First, the role of electromyography (EMG) is highlighted as a unique biometric sensor in improving audio-visual speaker …
Data-Driven Modeling For Decision Support Systems And Treatment Management In Personalized Healthcare, Milad Zafar Nezhad
Wayne State University Dissertations
The massive amount of electronic medical records (EMRs) accumulating from patients and populations motivates clinicians and data scientists to collaborate on advanced analytics, creating the knowledge needed to deliver personalized insights to patients, clinicians, providers, scientists, and health policy makers. Learning from large and complicated data is used extensively in marketing and commercial enterprises to generate personalized recommendations. Recently, the medical research community has focused on taking advantage of big data analytic approaches and is moving toward personalized (precision) medicine. It is therefore a significant period in healthcare and medicine for transitioning to a new paradigm. …
Deep Learning Models For Scoring Protein-Ligand Interaction Energies, Md Mahmudulla Hassan
Open Access Theses & Dissertations
In recent years, the cheminformatics community has seen increased success with machine learning-based scoring functions for estimating binding affinities. The prediction of protein-ligand binding affinities is crucial for drug discovery research. Many physics-based scoring functions have been developed over the years. Lately, machine learning approaches have been shown to boost the performance of traditional scoring functions. In this study, two scoring functions were developed: one is based on Convolutional Neural Networks, and the other, called DLSCORE, is based on an ensemble of fully connected neural networks. Both models were trained on the refined PDBbind (v.2016) dataset using …
Scalable Feature Selection And Extraction With Applications In Kinase Polypharmacology, Derek Jones
Theses and Dissertations--Computer Science
To reduce the time and cost of drug discovery, machine learning is being used to automate much of the work in this process. However, the size and complex nature of molecular data make the application of machine learning especially challenging. Much work must go into engineering the features that are then used to train machine learning models, which costs considerable time and requires the knowledge of domain experts to be most effective. The purpose of this work is to demonstrate data-driven approaches to perform the feature selection and extraction steps in …
Generating Diverse And Meaningful Captions: Unsupervised Specificity Optimization For Image Captioning, Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher
Conference papers
Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty.
We make our …