Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Articles 1 - 11 of 11

Full-Text Articles in Physical Sciences and Mathematics

Update Frequency And Background Corpus Selection In Dynamic TF-IDF Models For First Story Detection, Fei Wang, Robert J. Ross, John D. Kelleher Oct 2019


Conference papers

First Story Detection (FSD) requires a system to detect the very first story that mentions an event from a stream of stories. Nearest neighbour-based models, using traditional term vector document representations such as TF-IDF, currently achieve the state of the art in FSD. Because of its online nature, a dynamic term vector model that is incrementally updated during the detection process is usually adopted for FSD instead of a static model. However, very little research has investigated the selection of hyper-parameters and the background corpora for a dynamic model. In this paper, we analyse how a dynamic term vector model …
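The idea of a dynamic term vector model combined with nearest neighbour detection can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DynamicTfidfFsd`, the smoothed IDF formula, the whitespace tokeniser, the update frequency of one story, and the `threshold` value are all assumptions made for illustration.

```python
import math
from collections import Counter


class DynamicTfidfFsd:
    """Toy first-story detector with an incrementally updated IDF table.

    Hypothetical sketch: names, IDF smoothing and threshold are assumptions.
    """

    def __init__(self, background_docs, threshold=0.8):
        self.doc_freq = Counter()   # term -> number of documents containing it
        self.num_docs = 0
        self.seen = []              # raw term counts of stories seen so far
        self.threshold = threshold  # assumed novelty cut-off
        for doc in background_docs:
            self._update(doc.lower().split())

    def _update(self, tokens):
        # Incremental update of the document frequencies.
        self.num_docs += 1
        self.doc_freq.update(set(tokens))

    def _vector(self, counts):
        # TF-IDF weights computed from the *current* document frequencies.
        return {t: tf * math.log((self.num_docs + 1) / (self.doc_freq[t] + 1))
                for t, tf in counts.items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def process(self, story):
        """Return (novelty, is_first_story) for the next story in the stream."""
        tokens = story.lower().split()
        self._update(tokens)  # here the model is updated after every story
        counts = Counter(tokens)
        vec = self._vector(counts)
        nearest = max((self._cosine(vec, self._vector(c)) for c in self.seen),
                      default=0.0)
        self.seen.append(counts)
        novelty = 1.0 - nearest
        return novelty, novelty >= self.threshold
```

In this sketch the background corpus only seeds the document frequencies, and a story is flagged as a first story when its cosine distance to the nearest previously seen story reaches the threshold. A real system would typically update less often than once per story and restrict the nearest neighbour search for efficiency.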


Capturing Dialogue State Variable Dependencies With An Energy-Based Neural Dialogue State Tracker, Anh Duong Trinh, Robert J. Ross, John D. Kelleher Sep 2019


Conference papers

Dialogue state tracking requires the population and maintenance of a multi-slot frame representation of the dialogue state. Frequently, dialogue state tracking systems assume independence between slot values within a frame. In this paper we argue that treating the prediction of each slot value as an independent prediction task may ignore important associations between the slot values, and, consequently, we argue that treating dialogue state tracking as a structured prediction problem can help to improve dialogue state tracking performance. To support this argument, the research presented in this paper is structured into three stages: (i) analyzing variable dependencies in dialogue data; …


Investigating Variable Dependencies In Dialogue States, Anh Duong Trinh, Robert J. Ross, John D. Kelleher Sep 2019


Conference papers

Dialogue State Tracking is arguably one of the most challenging tasks among dialogue processing problems due to the uncertainties of language and complexity of dialogue contexts. We argue that this problem is made more challenging by variable dependencies in the dialogue states that must be accounted for in processing. In this paper we give details on our motivation for this argument through statistical tests on a number of dialogue datasets. We also propose a machine learning-based approach called energy-based learning that tackles variable dependencies while performing prediction on the dialogue state tracking tasks.
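As an illustration of how a dependency between two slots might be tested statistically, the following sketch computes Pearson's chi-square statistic for a contingency table of slot-value pairs. The abstract does not name the tests used, so the choice of chi-square is an assumption, and the function name and the slot values in the usage note are hypothetical.

```python
from collections import Counter


def chi_square_statistic(pairs):
    """Pearson chi-square statistic and degrees of freedom for value pairs.

    `pairs` is a list of (slot_a_value, slot_b_value) tuples, one per
    dialogue state. Hypothetical helper, not taken from the paper.
    """
    joint = Counter(pairs)
    rows = Counter(a for a, _ in pairs)
    cols = Counter(b for _, b in pairs)
    n = len(pairs)
    stat = 0.0
    for a in rows:
        for b in cols:
            # Expected count if the two slots were independent.
            expected = rows[a] * cols[b] / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof
```

For example, calling the function on made-up (food, price range) pairs returns a statistic that can be compared against the chi-square critical value for the returned degrees of freedom; a value above the critical value suggests the two slots are not independent.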


Bigger Versus Similar: Selecting A Background Corpus For First Story Detection Based On Distributional Similarity, Fei Wang, Robert J. Ross, John D. Kelleher Sep 2019


Conference papers

The current state of the art for First Story Detection (FSD) is nearest neighbour-based models with traditional term vector representations; however, one challenge faced by FSD models is that the document representation is usually defined by the vocabulary and term frequency from a background corpus. Consequently, the ideal background corpus should arguably be both large-scale, to ensure adequate term coverage, and similar to the target domain in terms of the language distribution. However, given that these two factors cannot always be mutually satisfied, in this paper we examine whether the distributional similarity of common terms is more important than the scale …


Estimating Distributed Representation Performance In Disaster-Related Social Media Classification, Pallavi Jain, Robert J. Ross, Bianca Schoen-Phelan Sep 2019


Conference papers

This paper examines the effectiveness of a range of pre-trained language representations in order to determine the informativeness and information type of social media in the event of natural or man-made disasters. Within the context of disaster tweet analysis, we aim to accurately analyse tweets while minimising both false positives and false negatives in the automated information analysis. The investigation is performed across a number of well-known disaster-related Twitter datasets. Models that are built from pre-trained word embeddings from Word2Vec, GloVe, ELMo and BERT are used for performance evaluation. Given the relative ubiquity of BERT as a standout language …


Energy-Based Modelling For Dialogue State Tracking, Anh Duong Trinh, Robert J. Ross, John D. Kelleher Aug 2019


Conference papers

The uncertainties of language and the complexity of dialogue contexts make accurate dialogue state tracking one of the more challenging aspects of dialogue processing. To improve state tracking quality, we argue that relationships between different aspects of dialogue state must be taken into account as they can often guide a more accurate interpretation process. To this end, we present an energy-based approach to dialogue state tracking as a structured classification task. The novelty of our approach lies in the use of an energy network on top of a deep learning architecture to explore more signal correlations between network variables including …


Synthetic, Yet Natural: Properties Of WordNet Random Walk Corpora And The Impact Of Rare Words On Embedding Performance, Filip Klubicka, Alfredo Maldonado, Abhijit Mahalunkar, John D. Kelleher Jul 2019


Conference papers

Creating word embeddings that reflect semantic relationships encoded in lexical knowledge resources is an open challenge. One approach is to use a random walk over a knowledge graph to generate a pseudo-corpus and use this corpus to train embeddings. However, the effect of the shape of the knowledge graph on the generated pseudo-corpora, and on the resulting word embeddings, has not been studied. To explore this, we use English WordNet, constrained to the taxonomic (tree-like) portion of the graph, as a case study. We investigate the properties of the generated pseudo-corpora, and their impact on the resulting embeddings. We find …
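The random walk approach described above can be sketched as follows. This is a minimal illustration under stated assumptions: the toy `TAXONOMY` is a hand-made stand-in for the taxonomic portion of WordNet, and the walk policy (uniform start node, uniform neighbour choice, fixed `stop_prob`) is hypothetical rather than the procedure used in the paper.

```python
import random

# Hypothetical toy taxonomy (hypernym -> hyponyms); a stand-in for WordNet.
TAXONOMY = {
    "entity": ["animal", "plant"],
    "animal": ["dog", "cat"],
    "plant": ["tree", "flower"],
    "dog": [], "cat": [], "tree": [], "flower": [],
}


def build_undirected(taxonomy):
    """Turn the hypernym -> hyponyms map into an undirected adjacency list."""
    neighbours = {node: set() for node in taxonomy}
    for parent, children in taxonomy.items():
        for child in children:
            neighbours[parent].add(child)
            neighbours[child].add(parent)
    return {node: sorted(adj) for node, adj in neighbours.items()}


def random_walk_corpus(graph, num_sentences, stop_prob=0.3, seed=0):
    """Generate pseudo-sentences; each one is the node sequence of one walk."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    corpus = []
    for _ in range(num_sentences):
        node = rng.choice(nodes)
        sentence = [node]
        # Keep stepping to a random neighbour until the walk decides to stop.
        while graph[node] and rng.random() > stop_prob:
            node = rng.choice(graph[node])
            sentence.append(node)
        corpus.append(sentence)
    return corpus
```

The resulting list of token sequences has the same shape as a tokenised text corpus, so it could be passed to any standard embedding trainer. Because the walk follows edges of the graph, the shape of the taxonomy directly determines which words co-occur in the pseudo-corpus.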


Audio Mixing Using Image Neural Style Transfer Networks, Susan McKeever, Xuehao Liu, Sarah Jane Delany Jan 2019


Conference papers

Image style transfer networks are used to blend images, producing images that are a mix of source images. The process is based on controlled extraction of style and content aspects of images, using pre-trained Convolutional Neural Networks (CNNs). Our interest lies in adopting these image style transfer networks for the purpose of transforming sounds. Audio signals can be presented as grey-scale images of audio spectrograms. The purpose of our work is to investigate whether audio spectrogram inputs can be used with image neural transfer networks to produce new sounds. Using musical instrument sounds as source sounds, we apply and compare …


The Use Of Deep Learning Distributed Representations In The Identification Of Abusive Text, Susan McKeever, Hao Chen, Sarah Jane Delany Jan 2019


Conference papers

The selection of optimal feature representations is a critical step in the use of machine learning in text classification. Traditional features (e.g. bag of words and n-grams) have dominated for decades, but in the past five years, the use of learned distributed representations has become increasingly common. In this paper, we summarise and present a categorisation of the state-of-the-art distributed representation techniques, including word and sentence embedding models. We carry out an empirical analysis of the performance of the various feature representations using the scenario of detecting abusive comments. We compare classification accuracies across a range of off-the-shelf embedding models …


Multi-Spectral Visual Crop Assessment Under Limited Data Constraints, Patricia O'Byrne, Patrick Jackman, Damon Berry, Hector-Hugo Franco-Penya, Michael French, Robert J. Ross Jan 2019


Conference papers

In an era of climate change and global population growth, deep learning-based multi-spectral imaging has the potential to significantly assist in production management across a wide range of agricultural and food production domains. A key challenge, however, in applying state-of-the-art methods is that they, unlike classical hand-crafted methods, are usually thought of as being only useful when significant amounts of data are available. In this paper we investigate this hypothesis by examining the performance of state-of-the-art deep learning methods when applied to a restricted data set that is not easily bootstrapped through pre-trained image processing networks. We demonstrate …


On The Inability Of Markov Models To Capture Criticality In Human Mobility, Vaibhav Kulkarni, Abhijit Mahalunkar, Benoit Garbinato, John Kelleher Jan 2019


Conference papers

We examine the non-Markovian nature of human mobility by exposing the inability of Markov models to capture criticality in human mobility. In particular, the assumed Markovian nature of mobility was used to establish an upper bound on the predictability of human mobility, based on the temporal entropy. Since its inception, this bound has been widely used for validating the performance of mobility prediction models. We show that variants of recurrent neural network architectures can achieve significantly higher prediction accuracy, surpassing this upper bound. The central objective of our work is to show that human-mobility dynamics exhibit criticality characteristics which …