Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Investigation, Detection And Prevention Of Online Child Sexual Abuse Material: A Comprehensive Survey, Vuong Ngo, Christina Thorpe, Cach N. Dang, Susan Mckeever Dec 2022

Conference papers

Child sexual abuse inflicts lifelong devastating consequences on victims and is a growing social concern. In most countries, child sexual abuse material (CSAM) distribution is illegal. As a result, many research papers have proposed technologies to detect and investigate CSAM. In this survey, a comprehensive search of the peer-reviewed journal and conference paper databases (including preprints) is conducted to identify high-quality literature. We use the PRISMA methodology to refine our search space to 2,761 papers published by Springer, Elsevier, IEEE and ACM. After iterative reviews of title, abstract and full text for relevance to …


Identity Term Sampling For Measuring Gender Bias In Training Data, Nasim Sobhani, Sarah Jane Delany Dec 2022

Conference papers

Predictions from machine learning models can reflect biases in the data on which they are trained. Gender bias has been identified in natural language processing systems such as those used for recruitment. Developing approaches to mitigate gender bias in training data typically requires isolating the effect of gender on the output. While it is possible to isolate and identify gender for some types of training data, e.g. CVs in recruitment, for most textual corpora there is no obvious gender label. This paper proposes a general approach to measure …


Machine Learning With Kay, Lasith Niroshan, James Carswell Jun 2022

Conference papers

Computational power is very important when training Deep Learning (DL) models with large amounts of data (Wooldridge, 2021). Hence, High-Performance Computing (HPC) can be leveraged to reduce computational cost, and the Irish Centre for High-End Computing (ICHEC) provides significant infrastructure and services for research and development to both academia and industry. A portion of ICHEC's HPC system has been allocated for institutional access, and this paper presents a case study of how to use Kay (Ireland's national supercomputer) in the remote sensing domain. Specifically, this study uses clusters of Kay Graphics Processing Units (GPUs) for training DL models to extract …


Pothole Detection Under Diverse Conditions Using Object Detection Models, Ibrahim Hassan Syed, Dympna O'Sullivan, Susan Mckeever May 2021

Conference papers

One of the most important tasks in road maintenance is the detection of potholes. This process is usually done through manual visual inspection, where certified engineers assess recorded images of pavements acquired using cameras or professional road assessment vehicles. Machine learning techniques are now being applied to this problem, with models trained to automatically identify road conditions. However, approaching this real-world problem with machine learning techniques presents the classic problem of how to produce generalisable models. Images and videos may be captured in different illumination conditions, with different camera types, camera angles, and resolutions. In this paper, we present our …


Interrupting The Propaganda Supply Chain, Kyle Hamilton, Bojan Bozic, Luc Longo Apr 2021

Conference papers

In this early-stage research, a multidisciplinary approach is presented for the detection of propaganda in the media, and for modeling the spread of propaganda and disinformation using semantic web and graph theory. An ontology will be designed with theoretical underpinnings from multiple disciplines, including the social sciences and epidemiology. An additional objective of this work is to automate triple extraction from unstructured text in a way that surpasses state-of-the-art performance.


K-Nearest Neighbour Classifiers - A Tutorial, Padraig Cunningham, Sarah Jane Delany Jan 2021

Conference papers

Perhaps the most straightforward classifier in the arsenal of Machine Learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours and mechanisms for reducing the dimension of …
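
The nearest-neighbour idea described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the tutorial itself: the Euclidean distance, the majority vote, the function names (`euclidean`, `knn_predict`) and the toy data are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbours.

    `train` is a list of (features, label) pairs; this layout is an
    illustrative assumption, not a convention from the paper.
    """
    # Sort the training examples by distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    # The most frequent label among those neighbours becomes the prediction.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
prediction_near_a = knn_predict(train, (1.1, 1.0), k=3)
prediction_near_b = knn_predict(train, (5.0, 5.0), k=3)
```

This brute-force version compares the query against every training example, which is the run-time cost the abstract alludes to.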


Data: The Good, The Bad And The Ethical, John D. Kelleher, Filipe Cabral Pinto, Luis M. Cortesao Dec 2020

Articles

It is often the case with new technologies that it is very hard to predict their long-term impacts and as a result, although new technology may be beneficial in the short term, it can still cause problems in the longer term. This is what happened with oil by-products in different areas: the use of plastic as a disposable material did not take into account the hundreds of years necessary for its decomposition and its related long-term environmental damage. Data is said to be the new oil. The message to be conveyed is associated with its intrinsic value. But as in …


Multimodal Fusion Strategies For Outcome Prediction In Stroke, Esra Zihni, John D. Kelleher, Vince I. Madai, Ahmed Khalil, Ivana Galinovic, Jochen Fiebach, Michelle Livne, Dietmar Frey Jan 2020

Conference papers

Data-driven methods are increasingly being adopted in the medical domain for clinical predictive modeling. Prediction of stroke outcome using machine learning could provide a decision support system for physicians to assist them in patient-oriented diagnosis and treatment. While patient-specific clinical parameters play an important role in outcome prediction, a multimodal fusion approach that integrates neuroimaging with clinical data has the potential to improve accuracy. This paper addresses two research questions: (a) does multimodal fusion aid in the prediction of stroke outcome, and (b) what fusion strategy is more suitable for the task at hand. The baselines for our experimental …


Exploring Composite Dataset Biases For Heart Sound Classification, Davoud Shariat Panah, Andrew Hines, Susan Mckeever Jan 2020

Conference papers

In the last few years, the automatic classification of heart sounds has been widely studied as a screening method for heart disease. Some of these studies have achieved high accuracies in heart abnormality prediction. However, for such models to assist clinicians in the detection of heart abnormalities, it is of critical importance that they are generalisable, working on unseen real-world data. Despite the importance of generalisability, the presence of bias in the leading heart sound datasets used in these studies has remained unexplored. In this paper, we explore the presence of potential bias in heart sound datasets. Using a small …


Brexit: Psychometric Profiling The Political Salubrious Through Machine Learning: Predicting Personality Traits Of Boris Johnson Through Twitter Political Text, James Usher, Pierpaolo Dondio Jan 2020

Conference papers

Whilst the CIA have been using psychometric profiling for decades, Cambridge Analytica showed that people's psychological characteristics can be accurately predicted from their digital footprints, such as their Facebook or Twitter accounts. To exploit this form of psychological assessment from digital footprints, we propose machine learning methods for assessing political personality from Twitter. We have extracted the tweet content of Prime Minister Boris Johnson’s Twitter account and built three predictive personality models based on his Twitter political content. We use a Multi-Layer Perceptron Neural network, a Naive Bayes multinomial model and a Support Vector Machine model to predict the OCEAN …


An Examination Of The SMOTE And Other SMOTE-Based Techniques That Use Synthetic Data To Oversample The Minority Class In The Context Of Credit-Card Fraud Classification, Eduardo Parkinson De Castro Jan 2020

Dissertations

This research project investigates sampling techniques that generate and use synthetic data to oversample the minority class as a means of handling the imbalanced distribution between non-fraudulent (majority class) and fraudulent (minority class) classes in a credit-card fraud dataset. The purpose of the research project is to assess the effectiveness of these techniques in the context of fraud detection, which involves highly imbalanced and cost-sensitive data. Learning from highly unbalanced datasets is difficult, since many of the traditional learning algorithms are not designed to cope …
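
As a rough illustration of the synthetic oversampling idea named in this abstract, the sketch below shows the core SMOTE step: interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified sketch under stated assumptions, not the dissertation's implementation or a library API; the function name `smote_like_samples`, its parameters and the toy points are hypothetical.

```python
import random


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic points from a list of minority-class tuples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # Candidate neighbours: every other minority sample, nearest first
        # (squared Euclidean distance is enough for ranking).
        others = [p for j, p in enumerate(minority) if j != idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Interpolate a random fraction of the way from base to neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class points at the corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_samples(minority, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class.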


Machine Learning Assisted Gait Analysis For The Determination Of Handedness In Able-Bodied People, Hugh Gallagher Jan 2020

Dissertations

This study has investigated the potential application of machine learning for video analysis, with a view to creating a system which can determine a person’s hand laterality (handedness) from the way that they walk (their gait). To this end, the convolutional neural network model VGG16 underwent transfer learning in order to classify videos under two ‘activities’: “walking left-handed” and “walking right-handed”. This saw varying degrees of success across five transfer learning trained models: Everything – the entire dataset; FiftyFifty – the dataset with enough right-handed samples removed to produce a set with parity between activities; Female – only the female …


Using Machine Learning Classification Methods To Detect The Presence Of Heart Disease, Nestor Pereira Dec 2019

Dissertations

Cardiovascular disease (CVD) is the most common cause of death in Ireland and probably worldwide. According to the Health Service Executive (HSE), cardiovascular disease accounts for 36% of all deaths and, importantly, 22% of premature deaths (under age 65).

Using data from the Heart Disease UCI Data Set (UCI Machine Learning), we use machine learning techniques to detect the presence or absence of heart disease in the patient from the 14 features provided in this dataset. The different results are compared based on accuracy, confusion matrix and area under the Receiver Operating Characteristics (ROC) …


Factor Analysis Of Mixed Data (FAMD) And Multiple Linear Regression In R, Nestor Pereira Dec 2019

Dissertations

Previous projects statistically analysed the factors that affect scores in Mathematics and Portuguese for several groups of secondary school students in Portugal.

This project aims to find a model, hypothetically a multiple linear regression, that predicts the student's final score (the dependent variable G3) from features divided into two groups. The first group covers the predictors more closely related to the student's performance, such as study time or past failures. The second …


Predicting Violent Crime Reports From Geospatial And Temporal Attributes Of US 911 Emergency Call Data, Vincent Corcoran Jan 2019

Dissertations

The aim of this study is to create a model to predict which 911 calls will result in crime reports of a violent nature. Such a prediction model could be used by the police to prioritise calls which are most likely to lead to violent crime reports. The model will use geospatial and temporal attributes of the call to predict whether a crime report will be generated. To create this model, a dataset of characteristics relating to the neighbourhood where the 911 call originated will be created and combined with characteristics related to the time of the 911 call. Geospatial …


Performance Comparison Of Hybrid CNN-SVM And CNN-XGBoost Models In Concrete Crack Detection, Sahana Thiyagarajan Jan 2019

Dissertations

Crack detection is an essential step in visual inspection in construction engineering, as concrete is a commonly used building material and cracks in it are an early sign of deterioration. It is hard to find cracks in massive structures by visual checks, so the development of crack detection systems has been a critical issue. The use of contextual image processing in crack detection is constrained, as image data taken under real-world conditions vary widely, and it also involves complex modelling of cracks and the extraction of handcrafted features. Therefore the …


Mind The Gap: Situated Spatial Language A Case-Study In Connecting Perception And Language, John D. Kelleher Jun 2018

Other

This abstract reviews the literature on computational models of spatial semantics and the potential of deep learning models as a useful approach to this challenge.


An Investigation Into The Effects Of Multiple Kernel Combinations On Solutions Spaces In Support Vector Machines, Paul Kelly, Luca Longo May 2018

Conference papers

The use of Multiple Kernel Learning (MKL) for Support Vector Machines (SVM) in Machine Learning tasks is a growing field of study. MKL kernels expand on traditional base kernels that are used to improve performance on non-linearly separable datasets. Multiple kernels use combinations of those base kernels to develop novel kernel shapes that allow for more diversity in the generated solution spaces. Customising these kernels to the dataset is still mostly a process of trial and error. Guidelines around what combinations to implement are lacking, and they usually require domain-specific knowledge and understanding of the data. Through a brute …


Non-Linear Machine Learning With Active Sampling For MOX Drift Compensation, Tamara Matthews, Muhammad Iqbal, Horacio Gonzalez-Velez Jan 2018

Conference papers

Metal oxide (MOX) gas detectors based on SnO2 provide low-cost solutions for real-time sensing of complex gas mixtures for indoor ambient monitoring. With high sensitivity under ideal conditions, MOX detectors may have poor long-term response accuracy due to environmental factors (humidity and temperature) along with sensor aging, leading to calibration drifts. Finding a simple and efficient solution to correct such calibration drifts has been the subject of numerous studies but remains an open problem. In this work, we present an efficient approach to MOX calibration using active and transfer sampling techniques coupled with non-linear machine learning algorithms, namely neural networks, …


Generating Diverse And Meaningful Captions: Unsupervised Specificity Optimization For Image Captioning, Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher Jan 2018

Conference papers

Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty.

We make our …


Modular Mechanistic Networks: On Bridging Mechanistic And Phenomenological Models With Deep Neural Networks In Natural Language Processing, Simon Dobnik, John D. Kelleher Nov 2017

Books/Book chapters

Natural language processing (NLP) can be done using either top-down (theory driven) or bottom-up (data driven) approaches, which we call mechanistic and phenomenological respectively. The approaches are frequently considered to stand in opposition to each other. Examining some recent approaches in deep learning, we argue that deep neural networks incorporate both perspectives and, furthermore, that leveraging this aspect of deep learning may help in solving complex problems within language technology, such as modelling language and perception in the domain of spatial cognition.


Rating By Ranking: An Improved Scale For Judgement-Based Labels, Jack O'Neill, Sarah Jane Delany, Brian Mac Namee Aug 2017

Conference papers

Labels representing value judgements are commonly elicited using an interval scale of absolute values. Data collected in such a manner is not always reliable. Psychologists have long recognized a number of biases to which many human raters are prone, and which result in disagreement among raters as to the true gold standard rating of any particular object. We hypothesize that the issues arising from rater bias may be mitigated by treating the data received as an ordered set of preferences rather than a collection of absolute values. We experiment on real-world and artificially generated data, finding that treating label ratings …


Investigating The Impact Of Unsupervised Feature-Extraction From Multi-Wavelength Image Data For Photometric Classification Of Stars, Galaxies And QSOs, Annika Lindh Dec 2016

Conference papers

Accurate classification of astronomical objects currently relies on spectroscopic data. Acquiring this data is time-consuming and expensive compared to photometric data. Hence, improving the accuracy of photometric classification could lead to far better coverage and faster classification pipelines. This paper investigates the benefit of using unsupervised feature-extraction from multi-wavelength image data for photometric classification of stars, galaxies and QSOs. An unsupervised Deep Belief Network is used, giving the model a higher level of interpretability thanks to its generative nature and layer-wise training. A Random Forest classifier is used to measure the contribution of the novel features compared to a set …


Fundamentals Of Machine Learning For Neural Machine Translation, John D. Kelleher Oct 2016

Conference papers

This paper presents a short introduction to neural networks and how they are used for machine translation, and concludes with some discussion of the current research challenges being addressed by neural machine translation (NMT) research. The primary goal of this paper is to give a no-tears introduction to NMT to readers who do not have a computer science or mathematical background. The secondary goal is to provide the reader with a deep enough understanding of NMT that they can appreciate the strengths and weaknesses of the technology. The paper starts with a brief introduction to standard feed-forward neural networks (what …


Idiom Token Classification Using Sentential Distributed Semantics, Giancarlo Salton, Robert J. Ross, John D. Kelleher Jan 2016

Conference papers

Idiom token classification is the task of deciding for a set of potentially idiomatic phrases whether each occurrence of a phrase is a literal or idiomatic usage of the phrase. In this work we explore the use of Skip-Thought Vectors to create distributed representations that encode features that are predictive with respect to idiom token classification. We show that classifiers using these representations have competitive performance compared with the state of the art in idiom token classification. Importantly, however, our models use only the sentence containing the target phrase as input and are thus less dependent on a potentially …