Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 11 of 11

Full-Text Articles in Physical Sciences and Mathematics

The Interaction Of Normalisation And Clustering In Sub-Domain Definition For Multi-Source Transfer Learning Based Time Series Anomaly Detection, Matthew Nicholson, Rahul Agrahari, Clare Conran, Haythem Assem, John D. Kelleher Dec 2022

Articles

This paper examines how data normalisation and clustering interact in the definition of sub-domains within multi-source transfer learning systems for time series anomaly detection. The paper introduces a distinction between (i) clustering as a primary/direct method for anomaly detection, and (ii) clustering as a method for identifying sub-domains within the source or target datasets. Reporting the results of three sets of experiments, we find that normalisation after feature extraction and before clustering results in the best performance for anomaly detection. Interestingly, we find that in the multi-source transfer learning scenario clustering on the target dataset and identifying subdomains in the …
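The ordering reported in the abstract (feature extraction, then normalisation, then clustering) can be illustrated with a minimal sketch. The features (window mean and standard deviation), the z-score normalisation, and the small k-means routine below are assumptions chosen for illustration; the excerpt does not state which features, normaliser, or clustering algorithm the authors used.

```python
from statistics import mean, pstdev


def extract_features(window):
    """Hypothetical per-window features: mean and population std."""
    return [mean(window), pstdev(window)]


def zscore_columns(rows):
    """Normalise each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, seeded with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels


# Toy windows from two very different scales (illustrative data only).
windows = [
    [1.0, 1.1, 0.9, 1.0],
    [100.0, 130.0, 70.0, 100.0],
    [1.2, 1.0, 1.1, 0.9],
    [98.0, 125.0, 75.0, 102.0],
]

features = [extract_features(w) for w in windows]  # 1. feature extraction
normalised = zscore_columns(features)              # 2. normalisation
labels = kmeans(normalised, k=2)                   # 3. clustering into sub-domains
```

Normalising the feature columns rather than the raw series keeps every feature on a comparable scale when distances are computed during clustering.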


A Review Of Risk Concepts And Models For Predicting The Risk Of Primary Stroke, Elizabeth Hunter, John D. Kelleher Nov 2022

Articles

Predicting an individual's risk of primary stroke is an important tool that can help to lower the burden of stroke for both the individual and society. There are a number of risk models and risk scores in existence but no review or classification designed to help the reader better understand how models differ and the reasoning behind these differences. In this paper we review the existing literature on primary stroke risk prediction models. From our literature review we identify key similarities and differences in the existing models. We find that models can differ in a number of ways, including the …


Assessing Feature Representations For Instance-Based Cross-Domain Anomaly Detection In Cloud Services Univariate Time Series Data, Rahul Agrahari, Matthew Nicholson, Clare Conran, Haythem Assem, John D. Kelleher Jan 2022

Articles

In this paper, we compare and assess the efficacy of a number of time-series instance feature representations for anomaly detection. To assess whether there are statistically significant differences between different feature representations for anomaly detection in a time series, we calculate and compare confidence intervals on the average performance of different feature sets across a number of different model types and cross-domain time-series datasets. Our results indicate that the catch22 time-series feature set augmented with features based on rolling mean and variance performs best on average, and that the difference in performance between this feature set and the next best …
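As a rough illustration of the rolling mean and variance features mentioned in the abstract, the sketch below computes both over a trailing window. The window length, the use of population variance, and the function name `rolling_features` are assumptions; the catch22 features themselves are not computed here.

```python
from statistics import mean, pvariance


def rolling_features(series, window):
    """Return (rolling_mean, rolling_variance) for each full trailing window."""
    feats = []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        feats.append((mean(chunk), pvariance(chunk)))
    return feats


# Illustrative series with a single spike.
series = [1.0, 1.0, 1.0, 1.0, 9.0, 1.0, 1.0]
feats = rolling_features(series, window=3)

for i, (m, v) in enumerate(feats):
    print(f"window {i}: mean={m:.3f} variance={v:.3f}")
```

Windows that include the spike show a jump in both the mean and the variance, which is the kind of local signal such features add to a per-instance representation.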


Using Feature Selection With Machine Learning For Generation Of Insurance Insights, Ayman Taha, Bernard Cosgrave, Susan McKeever Jan 2022

Articles

Insurance is a data-rich sector, hosting large volumes of customer data that is analysed to evaluate risk. Machine learning techniques are increasingly used in the effective management of insurance risk. Insurance datasets by their nature, however, are often of poor quality with noisy subsets of data (or features). Choosing the right features of data is a significant pre-processing step in the creation of machine learning models. The inclusion of irrelevant and redundant features has been demonstrated to affect the performance of learning models. In this article, we propose a framework for improving predictive machine learning techniques in the insurance sector …


A Review Of The Fractal Market Hypothesis For Trading And Market Price Prediction, Jonathan Blackledge, Marc Lamphiere Jan 2021

Articles

This paper provides a review of the Fractal Market Hypothesis (FMH) focusing on financial time series analysis. In order to put the FMH into a broader perspective, the Random Walk and Efficient Market Hypotheses are considered together with the basic principles of fractal geometry. After exploring the historical developments associated with different financial hypotheses, an overview of the basic mathematical modelling is provided. The principal goal of this paper is to consider the intrinsic scaling properties that are characteristic for each hypothesis. In regard to the FMH, it is explained why a financial time series can be taken to be …


Modulation Of Medical Condition Likelihood By Patient History Similarity, Jonathan Turner, Dympna O'Sullivan, Jon Bird Jun 2020

Articles

Introduction: We describe an analysis that modulates the likelihood of a particular condition occurring in an individual, initially derived from simple population prevalence, by matching the individual with others who have similar clinical histories and determining the prevalence of the condition within the matched group.

Methods: We have taken clinical event codes and dates from anonymised longitudinal primary care records for 25,979 patients with 749,053 recorded clinical events. Using a nearest neighbour approach, for each patient, the likelihood of a condition occurring was adjusted from the population prevalence to the prevalence of the condition within those patients with the closest matching clinical …
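The adjustment described in the Methods can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: Jaccard similarity over sets of event codes stands in for the unspecified matching measure, event dates are ignored, and the patient identifiers, codes, and the function name `adjusted_likelihood` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of clinical event codes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjusted_likelihood(patient_id, histories, has_condition, k):
    """Prevalence of the condition among the k most similar other patients."""
    target = histories[patient_id]
    others = [p for p in histories if p != patient_id]
    others.sort(key=lambda p: jaccard(target, histories[p]), reverse=True)
    matched = others[:k]
    return sum(has_condition[p] for p in matched) / len(matched)


# Hypothetical toy records: patient id -> set of placeholder event codes.
histories = {
    "p1": {"A", "B", "C"},
    "p2": {"A", "B", "C", "D"},
    "p3": {"A", "B"},
    "p4": {"X", "Y"},
    "p5": {"X", "Z"},
}
has_condition = {"p1": False, "p2": True, "p3": True, "p4": False, "p5": False}

# Baseline: prevalence across the whole population.
population_prevalence = sum(has_condition.values()) / len(has_condition)

# Adjusted: prevalence within the two patients most similar to p1.
adjusted = adjusted_likelihood("p1", histories, has_condition, k=2)

print(population_prevalence, adjusted)
```

In this toy data the matched group for `p1` has a higher prevalence than the population as a whole, so the estimate for `p1` moves upward from the baseline.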


An Univariable Approach For Forecasting Workload In The Maintenance Industry, Paulo Silva, Fernando Pérez Téllez, John Cardiff Jan 2020

Articles

Forecasting workload in the maintenance industry is of great value for improving human resource allocation and reducing overwork. In this paper, we discuss the problem and the challenges it presents. We analyze data from a company operating in the industry and present the results of several forecasting models.


Smart Green Communication Protocols Based On Several-Fold Messages Extracted From Common Sequential Patterns In UAVs, Iván García-Magariño, Geraldine Gray, Raquel Lacuesta, Jaime Lloret Jan 2020

Articles

Green communications can be crucial for saving energy in UAVs and enhancing their autonomy. The current work proposes to extract common sequential patterns of communications and gather each common pattern into a single several-fold message with high-level compression. Since the messages of a pattern are separated from each other in time, the current approach uses machine learning to estimate the elapsed times, with off-line training. The learned predictive model is applied by each UAV during flight when receiving a several-fold compressed message. We have explored neural networks, linear regression and correlation analyses, among others. The current …


Modulation Of Medical Condition Likelihood By Patient History Similarity, Jonathan Turner, Dympna O'Sullivan, Jon Bird Jan 2020

Articles

Introduction: We describe an analysis that modulates the likelihood of a particular condition occurring in an individual, initially derived from simple population prevalence, by matching the individual with others who have similar clinical histories and determining the prevalence of the condition within the matched group.

Methods: We have taken clinical event codes and dates from anonymised longitudinal primary care records for 25,979 patients with 749,053 recorded clinical events. Using a nearest neighbour approach, for each patient, the likelihood of a condition occurring was adjusted from the population prevalence to the prevalence of the condition within those patients with the closest …


CS1: How Will They Do? How Can We Help? A Decade Of Research And Practice, Keith Quille, Susan Bergin Jan 2019

Articles

Background and Context: Computer Science attrition rates (in the western world) are very concerning, with a large number of students failing to progress each year. It is well acknowledged that a significant factor in this attrition is students’ difficulty in mastering the introductory programming module, often referred to as CS1.

Objective: The objective of this article is to describe the evolution of a prediction model named PreSS (Predict Student Success) over a 13-year period (2005–2018).

Method: This article ties together the PreSS prediction model; pilot studies; a longitudinal, multi-institutional re-validation and replication …


An Assessment Of Case-Based Reasoning For Spam Filtering, Sarah Jane Delany, Padraig Cunningham, Lorcan Coyle Jan 2005

Articles

Because of the changing nature of spam, a spam filtering system that uses machine learning will need to be dynamic. This suggests that a case-based (memory-based) approach may work well. Case-Based Reasoning (CBR) is a lazy approach to machine learning where induction is delayed to run time. This means that the case base can be updated continuously and new training data is immediately available to the induction process. In this paper we present a detailed description of such a system called ECUE and evaluate design decisions concerning the case representation. We compare its performance with an alternative system that uses …
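The lazy, case-based idea in the abstract (no separate training step, so new cases are usable immediately) can be sketched with a small nearest-neighbour classifier. This is not ECUE: the class name `CaseBase`, the token-set case representation, and the Jaccard similarity are all assumptions made for illustration, since the excerpt does not describe ECUE's actual design.

```python
from collections import Counter


class CaseBase:
    """Minimal case-based (k-NN) classifier over token sets."""

    def __init__(self, k=3):
        self.k = k
        self.cases = []  # list of (token_set, label)

    def add_case(self, text, label):
        """Retain a new case; no retraining step is needed."""
        self.cases.append((set(text.lower().split()), label))

    def classify(self, text):
        """Label the query by majority vote of its k most similar cases."""
        query = set(text.lower().split())

        def similarity(case):
            tokens = case[0]
            union = query | tokens
            return len(query & tokens) / len(union) if union else 0.0

        nearest = sorted(self.cases, key=similarity, reverse=True)[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Illustrative messages only.
cb = CaseBase(k=1)
cb.add_case("cheap pills buy now", "spam")
cb.add_case("your meeting agenda for monday", "ham")

query = "claim your free cruise prize"
before = cb.classify(query)  # nearest case is the ham message (shares "your")

# A new kind of spam is reported and added to the case base.
cb.add_case("free cruise prize winner", "spam")
after = cb.classify(query)   # the new case is now the nearest neighbour

print(before, after)
```

Because induction is deferred to classification time, adding the new case changes the result of the very next query without rebuilding a model.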