Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 17 of 17

Full-Text Articles in Physical Sciences and Mathematics

Update Frequency And Background Corpus Selection In Dynamic Tf-Idf Models For First Story Detection, Fei Wang, Robert J. Ross, John D. Kelleher Oct 2019

Conference papers

First Story Detection (FSD) requires a system to detect the very first story that mentions an event from a stream of stories. Nearest neighbour-based models, using the traditional term vector document representations like TF-IDF, currently achieve the state of the art in FSD. Because of its online nature, a dynamic term vector model that is incrementally updated during the detection process is usually adopted for FSD instead of a static model. However, very little research has investigated the selection of hyper-parameters and the background corpora for a dynamic model. In this paper, we analyse how a dynamic term vector model …
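As a rough illustration of the setup this abstract describes, the sketch below scores each incoming story by its distance to the nearest earlier story under a TF-IDF model whose document frequencies are seeded from a background corpus and refreshed every `update_every` stories. It is not the authors' implementation: the class name, the `update_every` parameter, the smoothed IDF formula and the choice to re-vectorise earlier stories under the current model are all assumptions made for the example.

```python
import math
from collections import Counter


def _cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class DynamicTfidfFSD:
    """Hypothetical nearest-neighbour FSD scorer with a dynamic TF-IDF model."""

    def __init__(self, background_docs=(), update_every=1):
        self.df = Counter()        # document frequency of each term
        self.n_docs = 0            # number of documents counted in df
        self.update_every = update_every
        self.pending = []          # stories not yet folded into df
        self.seen = []             # token lists of all earlier stories
        for tokens in background_docs:
            self._add_to_df(tokens)

    def _add_to_df(self, tokens):
        self.df.update(set(tokens))
        self.n_docs += 1

    def _vector(self, tokens):
        # Smoothed IDF (an assumption): log((N + 1) / (df + 1)) + 1.
        tf = Counter(tokens)
        return {
            t: c * (math.log((self.n_docs + 1) / (self.df[t] + 1)) + 1.0)
            for t, c in tf.items()
        }

    def process(self, tokens):
        """Return a novelty score in [0, 1]; higher means more likely a first story."""
        query = self._vector(tokens)
        best = max(
            (_cosine(query, self._vector(old)) for old in self.seen),
            default=0.0,
        )
        self.seen.append(tokens)
        self.pending.append(tokens)
        # Fold pending stories into the model only at the chosen update frequency.
        if len(self.pending) >= self.update_every:
            for doc in self.pending:
                self._add_to_df(doc)
            self.pending.clear()
        return 1.0 - best
```

A story would then be flagged as a first story when its score exceeds a chosen threshold. Setting `update_every` to a very large value approximates a static model built only from the background corpus.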


Capturing Dialogue State Variable Dependencies With An Energy-Based Neural Dialogue State Tracker, Anh Duong Trinh, Robert J. Ross, John D. Kelleher Sep 2019

Conference papers

Dialogue state tracking requires the population and maintenance of a multi-slot frame representation of the dialogue state. Frequently, dialogue state tracking systems assume independence between slot values within a frame. In this paper we argue that treating the prediction of each slot value as an independent prediction task may ignore important associations between the slot values, and, consequently, we argue that treating dialogue state tracking as a structured prediction problem can help to improve dialogue state tracking performance. To support this argument, the research presented in this paper is structured into three stages: (i) analyzing variable dependencies in dialogue data; …


Investigating Variable Dependencies In Dialogue States, Anh Duong Trinh, Robert J. Ross, John D. Kelleher Sep 2019

Conference papers

Dialogue State Tracking is arguably one of the most challenging tasks among dialogue processing problems due to the uncertainties of language and complexity of dialogue contexts. We argue that this problem is made more challenging by variable dependencies in the dialogue states that must be accounted for in processing. In this paper we give details on our motivation for this argument through statistical tests on a number of dialogue datasets. We also propose a machine learning-based approach called energy-based learning that tackles variable dependencies while performing prediction on the dialogue state tracking tasks.


Bigger Versus Similar: Selecting A Background Corpus For First Story Detection Based On Distributional Similarity, Fei Wang, Robert J. Ross, John D. Kelleher Sep 2019

Conference papers

The current state of the art for First Story Detection (FSD) is achieved by nearest neighbour-based models with traditional term vector representations; however, one challenge faced by FSD models is that the document representation is usually defined by the vocabulary and term frequency from a background corpus. Consequently, the ideal background corpus should arguably be both large-scale to ensure adequate term coverage, and similar to the target domain in terms of the language distribution. However, given that these two factors cannot always both be satisfied, in this paper we examine whether the distributional similarity of common terms is more important than the scale …


Estimating Distributed Representation Performance In Disaster-Related Social Media Classification, Pallavi Jain, Robert J. Ross, Bianca Schoen-Phelan Sep 2019

Conference papers

This paper examines the effectiveness of a range of pre-trained language representations in order to determine the informativeness and information type of social media in the event of natural or man-made disasters. Within the context of disaster tweet analysis, we aim to accurately analyse tweets while minimising both false positives and false negatives in the automated information analysis. The investigation is performed across a number of well-known disaster-related Twitter datasets. Models that are built from pre-trained word embeddings from Word2Vec, GloVe, ELMo and BERT are used for performance evaluation. Given the relative ubiquity of BERT as a standout language …


Energy-Based Modelling For Dialogue State Tracking, Anh Duong Trinh, Robert J. Ross, John D. Kelleher Aug 2019

Conference papers

The uncertainties of language and the complexity of dialogue contexts make accurate dialogue state tracking one of the more challenging aspects of dialogue processing. To improve state tracking quality, we argue that relationships between different aspects of dialogue state must be taken into account as they can often guide a more accurate interpretation process. To this end, we present an energy-based approach to dialogue state tracking as a structured classification task. The novelty of our approach lies in the use of an energy network on top of a deep learning architecture to explore more signal correlations between network variables including …


Synthetic, Yet Natural: Properties Of Wordnet Random Walk Corpora And The Impact Of Rare Words On Embedding Performance, Filip Klubicka, Alfredo Maldonado, Abhijit Mahalunkar, John D. Kelleher Jul 2019

Conference papers

Creating word embeddings that reflect semantic relationships encoded in lexical knowledge resources is an open challenge. One approach is to use a random walk over a knowledge graph to generate a pseudo-corpus and use this corpus to train embeddings. However, the effect of the shape of the knowledge graph on the generated pseudo-corpora, and on the resulting word embeddings, has not been studied. To explore this, we use English WordNet, constrained to the taxonomic (tree-like) portion of the graph, as a case study. We investigate the properties of the generated pseudo-corpora, and their impact on the resulting embeddings. We find …


Comparative Study Of Feature Representations For Disaster Tweet Classification, Pallavi Jain, Bianca Schoen-Phelan, Robert J. Ross May 2019

Other resources

Twitter is a popular social media platform where users publicly broadcast short messages on a myriad of topics. In recent years it has seen increased usage around disaster events due to the availability of information in near real time. Additionally, enhanced information representations that facilitate the classification of social media in terms of relevancy and type of information are currently a highly active research area (Ashktorab et al., 2014, Imran et al., 2014, Win et al., 2018). In this work we consider the usefulness and reliability of a range of representation models in the analysis of disaster-related social media.


Examining The Limits Of Predictability Of Human Mobility, Vaibhav Klukarni, Abhijit Mahalunkar, Benoit Garbinato, John D. Kelleher Apr 2019

Articles

We challenge the upper bound of human-mobility predictability that is widely used to corroborate the accuracy of mobility prediction models. We observe that extensions of recurrent neural network architectures achieve significantly higher prediction accuracy, surpassing this upper bound. Given this discrepancy, the central objective of our work is to show that the methodology behind the estimation of the predictability upper bound is erroneous and to identify the reasons behind this discrepancy. In order to explain this anomaly, we shed light on several underlying assumptions that have contributed to this bias. In particular, we highlight the consequences of the assumed Markovian nature of …


A U-Net Deep Learning Framework For High Performance Vessel Segmentation In Patients With Cerebrovascular Disease, Michelle Livne, Jana Rieger, Orhun Utku Aydin, Abdel Aziz Taha, Ela Maria Akay, Tabea Kossen, Jan Sobesky, John D. Kelleher, Kristian Hildebrand, Dietmar Frey, Vince I. Madai Feb 2019

Articles

Brain vessel status is a promising biomarker for better prevention and treatment in cerebrovascular disease. However, classic rule-based vessel segmentation algorithms need to be hand-crafted and are insufficiently validated. A specialized deep learning method, the U-net, is a promising alternative. Using labeled data from 66 patients with cerebrovascular disease, the U-net framework was optimized and evaluated with three metrics: Dice coefficient, 95% Hausdorff distance (95HD) and average Hausdorff distance (AVD). The model performance was compared with the traditional segmentation method of graph-cuts. Training and reconstruction were performed using 2D patches. A full and a reduced architecture with fewer parameters were trained. We …
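For readers unfamiliar with the first of the three metrics named in this abstract, the Dice coefficient of a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition, not the evaluation code used in the paper; the function name and the flat 0/1 list representation of a mask are assumptions made for the example.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    # Count voxels labelled as vessel in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A value of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground voxels.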


Automatic Acquisition Of Annotated Training Corpora For Test-Code Generation, Magdalena Kacmajor, John D. Kelleher Feb 2019

Articles

Open software repositories make large amounts of source code publicly available. Potentially, this source code could be used as training data to develop new, machine learning-based programming tools. For many applications, however, raw code scraped from online repositories does not constitute an adequate training dataset. Building on the recent and rapid improvements in machine translation (MT), one possibly very interesting application is code generation from natural language descriptions. One of the bottlenecks in developing these MT-inspired systems is the acquisition of parallel text-code corpora required for training code-generative models. This paper addresses the problem of automatically synthesizing parallel text-code corpora …


Weakly-Admissible Semantics And The Propagation Of Ambiguity In Abstract Argumentation Semantics, Pierpaolo Dondio Feb 2019

Other

The concept of ambiguous literals of defeasible logics is mapped to the set of undecided arguments identified by an argumentation semantics. It follows that Dung’s complete semantics are all ambiguity propagating, since the undecided status of an attacking argument is always propagated to the attacked argument, unless the latter is defeated by another accepted argument. In this paper we investigate a novel family of abstract argumentation semantics, called weakly-admissible semantics, where we do not require an acceptable argument to be necessarily defended from the attacks of undecided arguments. Weakly-admissible semantics are conflict-free, ambiguity blocking, non-admissible (in Dung’s sense), but employing …


Audio Mixing Using Image Neural Style Transfer Networks, Susan Mckeever, Xuehao Liu, Sarah Jane Delany Jan 2019

Conference papers

Image style transfer networks are used to blend images, producing images that are a mix of source images. The process is based on controlled extraction of style and content aspects of images, using pre-trained Convolutional Neural Networks (CNNs). Our interest lies in adopting these image style transfer networks for the purpose of transforming sounds. Audio signals can be presented as grey-scale images of audio spectrograms. The purpose of our work is to investigate whether audio spectrogram inputs can be used with image neural transfer networks to produce new sounds. Using musical instrument sounds as source sounds, we apply and compare …


The Use Of Deep Learning Distributed Representations In The Identification Of Abusive Text, Susan Mckeever, Hao Chen, Sarah Jane Delany Jan 2019

Conference papers

The selection of optimal feature representations is a critical step in the use of machine learning in text classification. Traditional features (e.g. bag of words and n-grams) have dominated for decades, but in the past five years, the use of learned distributed representations has become increasingly common. In this paper, we summarise and present a categorisation of the state-of-the-art distributed representation techniques, including word and sentence embedding models. We carry out an empirical analysis of the performance of the various feature representations using the scenario of detecting abusive comments. We compare classification accuracies across a range of off-the-shelf embedding models …


Multi-Spectral Visual Crop Assessment Under Limited Data Constraints, Patricia O'Byrne, Patrick Jackman, Damon Berry, Hector-Hugo Franco-Penya, Michael French, Robert J. Ross Jan 2019

Conference papers

In an era of climate change and global population growth, deep learning-based multi-spectral imaging has the potential to significantly assist in production management across a wide range of agricultural and food production domains. A key challenge in applying state-of-the-art methods, however, is that they, unlike classical hand-crafted methods, are usually thought of as being only useful when significant amounts of data are available. In this paper we investigate this hypothesis by examining the performance of state-of-the-art deep learning methods when applied to a restricted data set that is not easily bootstrapped through pre-trained image processing networks. We demonstrate …


On The Inability Of Markov Models To Capture Criticality In Human Mobility, Vaibhav Klukarni, Abhijit Mahalunkar, Benoit Garbinato, John Kelleher Jan 2019

Conference papers

We examine the non-Markovian nature of human mobility by exposing the inability of Markov models to capture criticality in human mobility. In particular, the assumed Markovian nature of mobility was used to establish an upper bound on the predictability of human mobility, based on the temporal entropy. Since its inception, this bound has been widely used for validating the performance of mobility prediction models. We show that variants of recurrent neural network architectures can achieve significantly higher prediction accuracy, surpassing this upper bound. The central objective of our work is to show that human-mobility dynamics exhibit criticality characteristics which …


An Evaluation Of Learning Employing Natural Language Processing And Cognitive Load Assessment, Mrunal Tipari Jan 2019

Dissertations

One of the key goals of Pedagogy is to assess learning. Various paradigms exist, and one of these is Cognitivism. It essentially sees a human learner as an information processor and the mind as a black box with limited capacity that should be understood and studied. With respect to this, one approach is to employ the construct of cognitive load to assess a learner's experience and in turn design instructions better aligned to the human mind. However, cognitive load assessment is not an easy activity, especially in a traditional classroom setting. This research proposes a novel method for evaluating learning …