Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 15 of 15

Full-Text Articles in Physical Sciences and Mathematics

The Interaction Of Normalisation And Clustering In Sub-Domain Definition For Multi-Source Transfer Learning Based Time Series Anomaly Detection, Matthew Nicholson, Rahul Agrahari, Clare Conran, Haythem Assem, John D. Kelleher Dec 2022

Articles

This paper examines how data normalisation and clustering interact in the definition of sub-domains within multi-source transfer learning systems for time series anomaly detection. The paper introduces a distinction between (i) clustering as a primary/direct method for anomaly detection, and (ii) clustering as a method for identifying sub-domains within the source or target datasets. Reporting the results of three sets of experiments, we find that normalisation after feature extraction and before clustering results in the best performance for anomaly detection. Interestingly, we find that in the multi-source transfer learning scenario clustering on the target dataset and identifying subdomains in the …
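The ordering reported in this abstract (extract features, then normalise, then cluster) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two per-window features, the z-score normalisation, the plain k-means loop, and the toy windows are not taken from the paper, whose abstract does not state which features or clustering algorithm were used.

```python
from statistics import mean, pstdev

def extract_features(window):
    """Illustrative per-window features: mean and population standard deviation."""
    return [mean(window), pstdev(window)]

def zscore_columns(rows):
    """Z-score each feature column; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_labels(points, initial_centroids, iterations=10):
    """Plain Lloyd-style k-means with caller-supplied initial centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels

# Hypothetical windows drawn from two very differently scaled series.
windows = [
    [1.0, 1.0, 1.0, 1.0],
    [1.1, 0.9, 1.0, 1.0],
    [100.0, 102.0, 98.0, 100.0],
    [101.0, 99.0, 100.0, 100.0],
]

features = [extract_features(w) for w in windows]   # step 1: feature extraction
normalised = zscore_columns(features)               # step 2: normalise the features
labels = kmeans_labels(normalised, [normalised[0], normalised[2]])  # step 3: cluster
print(labels)
```

Normalising the feature columns, rather than the raw series, puts the mean and the spread features on a comparable scale before any distance is computed, which is the step order the abstract reports as performing best.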


A Review Of Risk Concepts And Models For Predicting The Risk Of Primary Stroke, Elizabeth Hunter, John D. Kelleher Nov 2022

Articles

Predicting an individual's risk of primary stroke is an important tool that can help to lower the burden of stroke for both the individual and society. There are a number of risk models and risk scores in existence but no review or classification designed to help the reader better understand how models differ and the reasoning behind these differences. In this paper we review the existing literature on primary stroke risk prediction models. From our literature review we identify key similarities and differences in the existing models. We find that models can differ in a number of ways, including the …


Open-Source Clinical Machine Learning Models: Critical Appraisal Of Feasibility, Advantages, And Challenges, Keerthi B. Harish, W. Nicholson Price II, Yindalon Aphinyanaphongs Nov 2022

Articles

Machine learning applications promise to augment clinical capabilities and at least 64 models have already been approved by the US Food and Drug Administration. These tools are developed, shared, and used in an environment in which regulations and market forces remain immature. An important consideration when evaluating this environment is the introduction of open-source solutions in which innovations are freely shared; such solutions have long been a facet of digital culture. We discuss the feasibility and implications of open-source machine learning in a health care infrastructure built upon proprietary information. The decreased cost of development as compared to drugs and …


Assessing Feature Representations For Instance-Based Cross-Domain Anomaly Detection In Cloud Services Univariate Time Series Data, Rahul Agrahari, Matthew Nicholson, Clare Conran, Haythem Assem, John D. Kelleher Jan 2022

Articles

In this paper, we compare and assess the efficacy of a number of time-series instance feature representations for anomaly detection. To assess whether there are statistically significant differences between different feature representations for anomaly detection in a time series, we calculate and compare confidence intervals on the average performance of different feature sets across a number of different model types and cross-domain time-series datasets. Our results indicate that the catch22 time-series feature set augmented with features based on rolling mean and variance performs best on average, and that the difference in performance between this feature set and the next best …
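The rolling mean and variance features mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name, the window length, and the use of population variance are hypothetical choices, and the catch22 features themselves are not computed here.

```python
from statistics import mean, pvariance

def rolling_mean_variance(series, window):
    """Return one (mean, population variance) pair per full window of the series."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    features = []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        features.append((mean(chunk), pvariance(chunk)))
    return features

# Hypothetical univariate series with a jump at the end.
series = [1, 2, 3, 4, 10]
feats = rolling_mean_variance(series, window=3)
for m, v in feats:
    print(round(m, 3), round(v, 3))
```

In this toy series the last window contains the jump to 10, so its variance is much larger than that of the earlier windows, which is the kind of local signal such features add to a per-instance representation.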


Using Feature Selection With Machine Learning For Generation Of Insurance Insights, Ayman Taha, Bernard Cosgrave, Susan McKeever Jan 2022

Articles

Insurance is a data-rich sector, hosting large volumes of customer data that is analysed to evaluate risk. Machine learning techniques are increasingly used in the effective management of insurance risk. Insurance datasets by their nature, however, are often of poor quality with noisy subsets of data (or features). Choosing the right features of data is a significant pre-processing step in the creation of machine learning models. The inclusion of irrelevant and redundant features has been demonstrated to affect the performance of learning models. In this article, we propose a framework for improving predictive machine learning techniques in the insurance sector …


Finding The Needle In A Haystack: On The Automatic Identification Of Accessibility User Reviews, Eman Abdullah Alomar, Wajdi Aljedaani, Murtaza Tamjeed, Mohamed Wiem Mkaouer, Yasime Elglaly May 2021

Articles

In recent years, mobile accessibility has become an important trend with the goal of allowing all users the possibility of using any app without many limitations. User reviews include insights that are useful for app evolution. However, with the increase in the amount of received reviews, manually analyzing them is tedious and time-consuming, especially when searching for accessibility reviews. The goal of this paper is to support the automated identification of accessibility in user reviews, to help technology professionals in prioritizing their handling, and thus, creating more inclusive apps. Particularly, we design a model that takes as input accessibility user …


Modulation Of Medical Condition Likelihood By Patient History Similarity, Jonathan Turner, Dympna O'Sullivan, Jon Bird Jun 2020

Articles

Introduction: We describe an analysis that modulates the simple population prevalence derived likelihood of a particular condition occurring in an individual by matching the individual with other individuals with similar clinical histories and determining the prevalence of the condition within the matched group.

Methods: We have taken clinical event codes and dates from anonymised longitudinal primary care records for 25,979 patients with 749,053 recorded clinical events. Using a nearest neighbour approach, for each patient, the likelihood of a condition occurring was adjusted from the population prevalence to the prevalence of the condition within those patients with the closest matching clinical …
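The adjustment described in the Methods can be sketched as a nearest neighbour lookup over patients' sets of clinical event codes. This is a minimal sketch under assumptions: the Jaccard similarity, the choice to ignore the target condition when matching, the value of k, and the toy records are all hypothetical, since the abstract does not specify the matching measure.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of event codes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def population_prevalence(records, condition):
    """Fraction of all records that contain the condition code."""
    return sum(condition in codes for codes in records) / len(records)

def adjusted_likelihood(patient_codes, records, condition, k):
    """Prevalence of the condition among the k records most similar to the patient.

    Similarity is computed on histories with the condition code removed
    (an assumption), so the match does not depend on the outcome itself.
    """
    if k < 1 or not records:
        raise ValueError("need k >= 1 and at least one record")
    ranked = sorted(
        records,
        key=lambda codes: jaccard(patient_codes, codes - {condition}),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(condition in codes for codes in nearest) / len(nearest)

# Hypothetical anonymised records: each is a set of clinical event codes.
records = [
    {"A", "B", "X"},
    {"A", "B"},
    {"A", "B", "C", "X"},
    {"D"},
    {"E"},
    {"D", "E"},
]
patient = {"A", "B"}

baseline = population_prevalence(records, "X")
adjusted = adjusted_likelihood(patient, records, "X", k=3)
print(baseline, adjusted)
```

In this toy data the condition is present in one third of all records but in two of the three records closest to the patient, so the matched-group estimate is higher than the population prevalence.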


How We Refactor And How We Document It? On The Use Of Supervised Machine Learning Algorithms To Classify Refactoring Documentation, Eman Abdullah Alomar, Anthony Peruma, Mohamed Wiem Mkaouer, Christian D. Newman, Marouane Kessentini, Ali Ouni May 2020

Articles

Refactoring is the art of improving the structural design of a software system without altering its external behavior. Today, refactoring has become a well-established and disciplined software engineering practice that has attracted a significant amount of research presuming that refactoring is primarily motivated by the need to improve system structures. However, recent studies have shown that developers may incorporate refactoring strategies in other development-related activities that go beyond improving the design especially with the emerging challenges in contemporary software engineering. Unfortunately, these studies are limited to developer interviews and a reduced set of projects. To cope with the above-mentioned limitations, …


An Univariable Approach For Forecasting Workload In The Maintenance Industry, Paulo Silva, Fernando Pérez Téllez, John Cardiff Jan 2020

Articles

Forecasting the workload in the maintenance industry is of great value for improving the allocation of human resources and reducing overwork. In this paper, we discuss the problem and the challenges it presents. We analyze data from a company operating in the industry and present the results of several forecasting models.


Smart Green Communication Protocols Based On Several-Fold Messages Extracted From Common Sequential Patterns In UAVs, Iván García-Magariño, Geraldine Gray, Raquel Lacuesta, Jaime Lloret Jan 2020

Articles

Green communications can be crucial for saving energy in UAVs and enhancing their autonomy. The current work proposes to extract common sequential patterns of communications to gather each common pattern into a single several-fold message with a high level of compression. Since the messages of a pattern are separated from each other in time, the current approach uses machine learning to estimate the elapsed times using off-line training. The learned predictive model is applied by each UAV during flight when receiving a several-fold compressed message. We have explored neural networks, linear regression and correlation analyses among others. The current …


Modulation Of Medical Condition Likelihood By Patient History Similarity, Jonathan Turner, Dympna O'Sullivan, Jon Bird Jan 2020

Articles

Introduction: We describe an analysis that modulates the simple population prevalence derived likelihood of a particular condition occurring in an individual by matching the individual with other individuals with similar clinical histories and determining the prevalence of the condition within the matched group.

Methods: We have taken clinical event codes and dates from anonymised longitudinal primary care records for 25,979 patients with 749,053 recorded clinical events. Using a nearest neighbour approach, for each patient, the likelihood of a condition occurring was adjusted from the population prevalence to the prevalence of the condition within those patients with the closest …


CS1: How Will They Do? How Can We Help? A Decade Of Research And Practice, Keith Quille, Susan Bergin Jan 2019

Articles

Background and Context: Computer Science attrition rates (in the western world) are very concerning, with a large number of students failing to progress each year. It is well acknowledged that a significant factor in this attrition is students’ difficulty in mastering the introductory programming module, often referred to as CS1.

Objective: The objective of this article is to describe the evolution of a prediction model named PreSS (Predict Student Success) over a 13-year period (2005–2018).

Method: This article ties together the PreSS prediction model; pilot studies; a longitudinal, multi-institutional re-validation and replication …


Automatically Extracting Meaning From Legal Texts: Opportunities And Challenges, Kevin D. Ashley Jan 2019

Articles

This paper examines impressive new applications of legal text analytics in automated contract review, litigation support, conceptual legal information retrieval, and legal question answering against the backdrop of some pressing technological constraints. First, artificial intelligence (AI) programs cannot read legal texts like lawyers can. Using statistical methods, AI can only extract some semantic information from legal texts. For example, it can use the extracted meanings to improve retrieval and ranking, but it cannot yet extract legal rules in logical form from statutory texts. Second, machine learning (ML) may yield answers, but it cannot explain its answers to legal questions or …


Teaching Law And Digital Age Legal Practice With An AI And Law Seminar: Justice, Lawyering And Legal Education In The Digital Age, Kevin D. Ashley Jan 2013

Articles

A seminar on Artificial Intelligence ("AI") and Law can teach law students lessons about legal reasoning and legal practice in the digital age. AI and Law is a subfield of AI/computer science research that focuses on designing computer programs (computational models) that perform legal reasoning. These computational models are used in building tools to assist in legal practice and pedagogy and in studying legal reasoning in order to contribute to cognitive science and jurisprudence. Today, subject to a number of qualifications, computer programs can reason with legal rules, apply legal precedents, and even argue like a legal advocate.

This article provides a …


An Assessment Of Case-Based Reasoning For Spam Filtering, Sarah Jane Delany, Padraig Cunningham, Lorcan Coyle Jan 2005

Articles

Because of the changing nature of spam, a spam filtering system that uses machine learning will need to be dynamic. This suggests that a case-based (memory-based) approach may work well. Case-Based Reasoning (CBR) is a lazy approach to machine learning where induction is delayed to run time. This means that the case base can be updated continuously and new training data is immediately available to the induction process. In this paper we present a detailed description of such a system called ECUE and evaluate design decisions concerning the case representation. We compare its performance with an alternative system that uses …
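The lazy, case-based idea described in this abstract (no model is built ahead of time, and new cases are usable as soon as they are stored) can be sketched with a tiny nearest neighbour classifier. This is not the ECUE system: the class name, the token-overlap similarity, the voting rule, and the example messages are all hypothetical and chosen only to show the mechanism.

```python
from collections import Counter

class CaseBase:
    """A minimal case base: stores labelled token sets and classifies at query time."""

    def __init__(self):
        self.cases = []

    def add(self, tokens, label):
        # Storing the case is the only "training" step.
        self.cases.append((frozenset(tokens), label))

    def classify(self, tokens, k=1):
        # Induction is delayed to run time: rank stored cases by token overlap.
        if not self.cases:
            raise ValueError("case base is empty")
        query = frozenset(tokens)
        ranked = sorted(self.cases, key=lambda case: len(query & case[0]), reverse=True)
        votes = Counter(label for _, label in ranked[:k])
        return votes.most_common(1)[0][0]

case_base = CaseBase()
case_base.add(["win", "prize", "now"], "spam")
case_base.add(["cheap", "prize"], "spam")
case_base.add(["meeting", "agenda"], "ham")
case_base.add(["lunch", "meeting"], "ham")

message = ["crypto", "project", "meeting"]
before = case_base.classify(message)

# A new kind of spam appears; adding one case changes later decisions immediately.
case_base.add(["crypto", "project", "airdrop"], "spam")
after = case_base.classify(message)
print(before, after)
```

Because classification consults the stored cases directly, the decision for the same message changes as soon as the new case is added, with no retraining step in between.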