Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Articles 1 - 30 of 38

Full-Text Articles in Physical Sciences and Mathematics

Using Chatgpt To Generate Gendered Language, Shweta Soundararajan, Manuela Nayantara Jeyaraj, Sarah Jane Delany Mar 2024


Conference papers

Gendered language is the use of words that denote an individual's gender. This can be explicit where the gender is evident in the actual word used, e.g. mother, she, man, but it can also be implicit where social roles or behaviours can signal an individual's gender - for example, expectations that women display communal traits (e.g., affectionate, caring, gentle) and men display agentic traits (e.g., assertive, competitive, decisive). The use of gendered language in NLP systems can perpetuate gender stereotypes and bias. This paper proposes an approach to generating gendered language datasets using ChatGPT which will provide data for data-driven …


The European Commission And Ai: Guidelines, Acts And Plans Impacting The Teaching Of Ai And Teaching With Ai, Keith Quille, Brett A. Becker, Lidia Vidal-Meliá Jan 2023


Academic Posters Collection

Recent developments, guidelines, and acts by the European Commission have started to frame policy for AI and related areas such as ML and data, not only for the broader community, but in the context of education specifically. This poster presents a succinct overview of these developments. Specifically, we look to bring together all publications that might impact the teaching of AI (for example, teacher expectations in the coming years around AI competencies) and publications that affect the use of AI in the classroom. We mean using tools and systems that incorporate both ‘Good Old Fashioned’ AI and those that can …


Determining Child Sexual Abuse Posts Based On Artificial Intelligence, Susan Mckeever, Christina Thorpe, Vuong Ngo Jan 2023


Conference papers

The volume of child sexual abuse materials (CSAM) created and shared daily on both surface web platforms such as Twitter and dark web forums is very high. Given this volume, it is not viable for human experts to intercept or identify CSAM manually. However, automatically detecting and analysing child sexual abusive language in online text is challenging and time-intensive, mostly due to the variety of data formats and privacy constraints of hosting platforms. We propose a CSAM detection intelligence algorithm based on natural language processing and machine learning techniques. Our CSAM detection model is not only used to remove CSAM on …


Dataset For Gendered Language, Shweta Soundararajan Jan 2023


Datasets

Gendered language is the use of words that denote an individual’s gender. This can be explicit where the gender is evident in the actual word used, e.g. mother, she, man, but it can also be implicit where social roles or behaviours can signal an individual’s gender - for example, expectations that women display communal traits (e.g., affectionate, caring, gentle) and men display agentic traits (e.g., assertive, competitive, decisive). The use of gendered language in NLP systems can perpetuate gender stereotypes and bias. This paper proposes an approach to generating gendered language datasets using ChatGPT which will provide data for data-driven …


The Interaction Of Normalisation And Clustering In Sub-Domain Definition For Multi-Source Transfer Learning Based Time Series Anomaly Detection, Matthew Nicholson, Rahul Agrahari, Clare Conran, Haythem Assem, John D. Kelleher Dec 2022


Articles

This paper examines how data normalisation and clustering interact in the definition of sub-domains within multi-source transfer learning systems for time series anomaly detection. The paper introduces a distinction between (i) clustering as a primary/direct method for anomaly detection, and (ii) clustering as a method for identifying sub-domains within the source or target datasets. Reporting the results of three sets of experiments, we find that normalisation after feature extraction and before clustering results in the best performance for anomaly detection. Interestingly, we find that in the multi-source transfer learning scenario clustering on the target dataset and identifying subdomains in the …
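
The ordering reported in this abstract (normalise the extracted features, then cluster) can be illustrated with a minimal sketch. The function name `zscore_columns`, the choice of z-score standardisation and the toy feature rows are assumptions made for illustration; the abstract does not say which normalisation or clustering method the authors used.

```python
from statistics import fmean, pstdev

def zscore_columns(rows):
    """Standardise each feature column to zero mean and unit variance.

    Assumed normalisation: z-score with population standard deviation.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 so it maps to 0.0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical feature vectors extracted from three time-series windows.
features = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
normalised = zscore_columns(features)
# `normalised` would then be passed to whichever clustering step is in use.
```

After this step both columns are on the same scale, so a distance-based clustering method would not be dominated by the feature with the larger raw range.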


A Review Of Risk Concepts And Models For Predicting The Risk Of Primary Stroke, Elizabeth Hunter, John D. Kelleher Nov 2022


Articles

Predicting an individual's risk of primary stroke is an important tool that can help to lower the burden of stroke for both the individual and society. There are a number of risk models and risk scores in existence but no review or classification designed to help the reader better understand how models differ and the reasoning behind these differences. In this paper we review the existing literature on primary stroke risk prediction models. From our literature review we identify key similarities and differences in the existing models. We find that models can differ in a number of ways, including the …


Assessing Feature Representations For Instance-Based Cross-Domain Anomaly Detection In Cloud Services Univariate Time Series Data, Rahul Agrahari, Matthew Nicholson, Clare Conran, Haythem Assem, John D. Kelleher Jan 2022


Articles

In this paper, we compare and assess the efficacy of a number of time-series instance feature representations for anomaly detection. To assess whether there are statistically significant differences between different feature representations for anomaly detection in a time series, we calculate and compare confidence intervals on the average performance of different feature sets across a number of different model types and cross-domain time-series datasets. Our results indicate that the catch22 time-series feature set augmented with features based on rolling mean and variance performs best on average, and that the difference in performance between this feature set and the next best …
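
As a rough illustration of the rolling-statistics features mentioned in this abstract, the sketch below computes a trailing-window mean and variance for a univariate series. The function name `rolling_mean_var`, the window size and the use of population variance are assumptions for illustration; the catch22 features themselves are not reproduced here.

```python
from statistics import fmean, pvariance

def rolling_mean_var(series, window):
    """Return one (mean, variance) pair for each full trailing window."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    features = []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        features.append((fmean(chunk), pvariance(chunk)))
    return features

# Invented series with a single spike at index 3.
values = [1.0, 2.0, 3.0, 10.0, 3.0, 2.0]
feats = rolling_mean_var(values, window=3)
```

Windows that contain the spike have both a higher mean and a higher variance than the first window, which is the kind of signal such features can give an anomaly detector.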


Using Feature Selection With Machine Learning For Generation Of Insurance Insights, Ayman Taha, Bernard Cosgrave, Susan Mckeever Jan 2022


Articles

Insurance is a data-rich sector, hosting large volumes of customer data that is analysed to evaluate risk. Machine learning techniques are increasingly used in the effective management of insurance risk. Insurance datasets by their nature, however, are often of poor quality with noisy subsets of data (or features). Choosing the right features of data is a significant pre-processing step in the creation of machine learning models. The inclusion of irrelevant and redundant features has been demonstrated to affect the performance of learning models. In this article, we propose a framework for improving predictive machine learning techniques in the insurance sector …


Monitoring Quality Of Life Indicators At Home From Sparse And Low-Cost Sensor Data., Dympna O'Sullivan, Rilwan Basaru, Simone Stumpf, Neil Maiden Jun 2021


Conference papers

Supporting older people, many of whom live with chronic conditions or cognitive and physical impairments, to live independently at home is of increasing importance due to ageing demographics. To aid independent living at home, much effort is being directed at reliably detecting activities from sensor data to monitor people’s quality of life or to enhance self-management of their own health. Current efforts typically leverage smart homes which have large numbers of sensors installed to overcome challenges in the accurate detection of activities. In this work, we report on the results of machine learning models based on data collected with a …


Interactive Learning Approach For Arabic Target-Based Sentiment Analysis, Husamelddin Balla, Marisa Llorens, Sarah Jane Delany Jan 2021


Conference papers

Recently, the majority of sentiment analysis researchers have focused on target-based sentiment analysis because it delivers in-depth analysis with more accurate results as compared to traditional sentiment analysis. In this paper, we propose an interactive learning approach to tackle a target-based sentiment analysis task for the Arabic language. The proposed IALSTM model uses an interactive attention-based mechanism to force the model to focus on different parts (targets) of a sentence. We investigate the ability to use targets, right and left contexts, and model them separately to learn their own representations via interactive modeling. We evaluated our model on two different datasets: …


A Review Of The Fractal Market Hypothesis For Trading And Market Price Prediction, Jonathan Blackledge, Marc Lamphiere Jan 2021


Articles

This paper provides a review of the Fractal Market Hypothesis (FMH) focusing on financial time series analysis. In order to put the FMH into a broader perspective, the Random Walk and Efficient Market Hypotheses are considered together with the basic principles of fractal geometry. After exploring the historical developments associated with different financial hypotheses, an overview of the basic mathematical modelling is provided. The principal goal of this paper is to consider the intrinsic scaling properties that are characteristic for each hypothesis. In regard to the FMH, it is explained why a financial time series can be taken to be …


Detecting Hacker Threats: Performance Of Word And Sentence Embedding Models In Identifying Hacker Communications, Susan Mckeever, Brian Keegan, Andrei Quieroz Dec 2020


Conference papers

Cyber security is striving to find new forms of protection against hacker attacks. An emerging approach nowadays is the investigation of security-related messages exchanged on deep/dark web and even surface web channels. This approach can be supported by the use of supervised machine learning models and text mining techniques. In our work, we compare the detection accuracies of a variety of machine learning algorithms, text representations and dimension reduction approaches on software-vulnerability-related communications. Given the imbalanced nature of the three public datasets used, we investigate appropriate sampling approaches to boost detection accuracies of our models. In addition, we examine how …


Language-Driven Region Pointer Advancement For Controllable Image Captioning, Annika Lindh, Robert J. Ross, John D. Kelleher Dec 2020


Conference papers

Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more end-user control over results. A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer. In this paper, we propose a novel method for predicting the timing of region pointer advancement by treating the …


Critical Media, Information, And Digital Literacy: Increasing Understanding Of Machine Learning Through An Interdisciplinary Undergraduate Course, Barbara R. Burke, Elena Machkasova Jul 2020


Irish Communication Review

Widespread use of Artificial Intelligence in all areas of today’s society creates a unique problem: algorithms used in decision-making are generally not understandable to those without a background in data science. Thus, those who use out-of-the-box Machine Learning (ML) approaches in their work and those affected by these approaches are often not in a position to analyze their outcomes and applicability.

Our paper describes and evaluates our undergraduate course at the University of Minnesota Morris, which fosters understanding of the main ideas behind ML. With Communication, Media & Rhetoric and Computer Science faculty expertise, students from a variety of majors, …


Modulation Of Medical Condition Likelihood By Patient History Similarity, Jonathan Turner, Dympna O'Sullivan, Jon Bird Jun 2020


Articles

Introduction: We describe an analysis that modulates the simple population prevalence derived likelihood of a particular condition occurring in an individual by matching the individual with other individuals with similar clinical histories and determining the prevalence of the condition within the matched group.

Methods: We have taken clinical event codes and dates from anonymised longitudinal primary care records for 25,979 patients with 749,053 recorded clinical events. Using a nearest neighbour approach, for each patient, the likelihood of a condition occurring was adjusted from the population prevalence to the prevalence of the condition within those patients with the closest matching clinical …
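
The adjustment described in the Methods paragraph can be sketched as follows. This is a toy illustration under stated assumptions: Jaccard similarity over sets of clinical codes, a fixed number of neighbours `k`, and invented patient data. The abstract does not specify the similarity measure or the group size the authors used, and the names `jaccard` and `adjusted_likelihood` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of clinical event codes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def adjusted_likelihood(target, histories, has_condition, k):
    """Prevalence of the condition among the k patients most similar to target."""
    others = [pid for pid in histories if pid != target]
    others.sort(
        key=lambda pid: jaccard(histories[target], histories[pid]),
        reverse=True,
    )
    nearest = others[:k]
    if not nearest:
        raise ValueError("no other patients to match against")
    return sum(has_condition[pid] for pid in nearest) / len(nearest)

# Invented toy records: patient id -> set of clinical codes.
histories = {
    "p1": {"A", "B", "C"},
    "p2": {"A", "B"},
    "p3": {"A", "B", "C", "D"},
    "p4": {"X", "Y"},
    "p5": {"Z"},
}
has_condition = {"p1": False, "p2": True, "p3": True, "p4": False, "p5": False}

population_prevalence = sum(has_condition.values()) / len(has_condition)
matched_prevalence = adjusted_likelihood("p1", histories, has_condition, k=2)
```

In this toy data the two patients closest to `p1` both have the condition, so the matched-group prevalence is higher than the population prevalence.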


Language Model Co-Occurrence Linking For Interleaved Activity Discovery, Eoin Rogers, Robert J. Ross, John D. Kelleher Jan 2020


Conference papers

As ubiquitous computer and sensor systems become abundant, the potential for automatic identification and tracking of human behaviours becomes all the more evident. Annotating complex human behaviour datasets to achieve ground truth for supervised training can however be extremely labour-intensive, and error prone. One possible solution to this problem is activity discovery: the identification of activities in an unlabelled dataset by means of an unsupervised algorithm. This paper presents a novel approach to activity discovery that utilises deep learning based language production models to construct a hierarchical, tree-like structure over a sequential vector of sensor events. Our approach differs from …


Modelling Interleaved Activities Using Language Models, Eoin Rogers, Robert J. Ross, John D. Kelleher Jan 2020


Conference papers

We propose a new approach to activity discovery, based on the neural language modelling of streaming sensor events. Our approach proceeds in multiple stages: we build binary links between activities using probability distributions generated by a neural language model trained on the dataset, and combine the binary links to produce complex activities. We then use the activities as sensor events, allowing us to build complex hierarchies of activities. We put an emphasis on dealing with interleaving, which represents a major challenge for many existing activity discovery systems. The system is tested on a realistic dataset, demonstrating it as a promising …


Empowering Qualitative Research Methods In Education With Artificial Intelligence, Luca Longo Jan 2020


Conference papers

Artificial Intelligence is one of the fastest growing disciplines, disrupting many sectors. Originally mainly for computer scientists and engineers, it has been expanding its horizons and empowering many other disciplines contributing to the development of many novel applications in many sectors. These include medicine and health care, business and finance, psychology and neuroscience, physics and biology to mention a few. However, one of the disciplines in which artificial intelligence has not been fully explored and exploited yet is education. In this discipline, many research methods are employed by scholars, lecturers and practitioners to investigate the impact of different instructional approaches …


Explainable Artificial Intelligence: Concepts, Applications, Research Challenges And Visions, Luca Longo, Randy Goebel, Freddy Lecue, Peter Kieseberg, Andreas Holzinger Jan 2020


Conference papers

The development of theory, frameworks and tools for Explainable AI (XAI) is a very active area of research these days, and articulating any kind of coherence on a vision and challenges is itself a challenge. At least two sometimes complementary and colliding threads have emerged. The first focuses on the development of pragmatic tools for increasing the transparency of automatically learned prediction models, for instance those produced by deep or reinforcement learning. The second is aimed at anticipating the negative impact of opaque models with the desire to regulate or control impactful consequences of incorrect predictions, especially in sensitive areas like …


Singlet Oxygen Generation By Porphyrins And Metalloporphyrins Revisited: A Quantitative Structure-Property Relationship (Qspr) Study, Andrey A. Buglak, Mikhail Filatov, Althaf M. Hussain, Manabu Sugimoto Jan 2020


Books/Book chapters

state followed by formation of singlet oxygen (1O2), which is a highly reactive species and mediates various oxidative processes. The design of advanced sensitizers based on porphyrin compounds has attracted significant attention in recent years. However, it is still difficult to predict the efficiency of singlet oxygen generation for a given structure. Our goal was to develop a quantitative structure-property relationship (QSPR) model for the fast virtual screening and prediction of singlet oxygen quantum yields for porphyrins and metalloporphyrins. We performed QSPR analysis of a dataset containing 32 compounds, including various porphyrins and their analogues (chlorins and bacteriochlorins). Quantum-chemical descriptors …


An Univariable Approach For Forecasting Workload In The Maintenance Industry, Paulo Silva, Fernando Pérez Téllez, John Cardiff Jan 2020


Articles

Forecasting the workload in the maintenance industry is of great value for improving human resource allocation and reducing overwork. In this paper, we discuss the problem and the challenges it presents. We analyze data from a company operating in the industry and present the results of several forecasting models.


Smart Green Communication Protocols Based On Several-Fold Messages Extracted From Common Sequential Patterns In Uavs, Iván García-Magariño, Geraldine Gray, Raquel Lacuesta, Jaime Lloret Jan 2020


Articles

Green communications can be crucial for saving energy in UAVs and enhancing their autonomy. The current work proposes to extract common sequential patterns of communications to gather each common pattern into a single several-fold message with a high-level compression. Since the messages of a pattern are separated from each other in time, the current approach uses machine learning to estimate the elapsed times using off-line training. The learned predictive model is applied by each UAV during flight when receiving a several-fold compressed message. We have explored neural networks, linear regression and correlation analyses among others. The current …


Synthesising Tabular Datasets Using Wasserstein Conditional Gans With Gradient Penalty (Wcgan-Gp), Manhar Singh Walia, Brendan Tierney, Susan Mckeever Jan 2020


Conference papers

Deep learning methods based on Generative Adversarial Networks (GANs) have seen remarkable success in data synthesis of images and text. This study investigates the use of GANs for the generation of mixed tabular datasets. We apply Wasserstein Conditional Generative Adversarial Network (WCGAN-GP) to the task of generating tabular synthetic data that is indistinguishable from the real data, without incurring information leakage. The performance of WCGAN-GP is compared against both the ground truth datasets and SMOTE using three labelled real-world datasets from different domains. Our results for WCGAN-GP show that the synthetic data preserves distributions and relationships of the real …


Modulation Of Medical Condition Likelihood By Patient History Similarity, Jonathan Turner, Dympna O'Sullivan, Jon Bird Jan 2020


Articles

Introduction: We describe an analysis that modulates the simple population prevalence derived likelihood of a particular condition occurring in an individual by matching the individual with other individuals with similar clinical histories and determining the prevalence of the condition within the matched group.

Methods: We have taken clinical event codes and dates from anonymised longitudinal primary care records for 25,979 patients with 749,053 recorded clinical events. Using a nearest neighbour approach, for each patient, the likelihood of a condition occurring was adjusted from the population prevalence to the prevalence of the condition within those patients with the closest …


The Use Of Deep Learning Distributed Representations In The Identification Of Abusive Text, Susan Mckeever, Hao Chen, Sarah Jane Delany Jan 2019


Conference papers

The selection of optimal feature representations is a critical step in the use of machine learning in text classification. Traditional features (e.g. bag of words and n-grams) have dominated for decades, but in the past five years, the use of learned distributed representations has become increasingly common. In this paper, we summarise and present a categorisation of the state-of-the-art distributed representation techniques, including word and sentence embedding models. We carry out an empirical analysis of the performance of the various feature representations using the scenario of detecting abusive comments. We compare classification accuracies across a range of off-the-shelf embedding models …


Hierarchical Cluster Analysis: A New Type Of Ranking Criteria Based On Arwu Ranking Data, Zhengshuo Li Jan 2019


Dissertations

The advent of big data has led to many applications of Machine Learning techniques. University ranking is one such domain, and it currently plays a crucial role in the assessment of universities' performance. Currently, the rankings are usually carried out by some authoritative ranking institutions by means of weighting techniques and the results are conveyed in numerical rankings. Three of the most famous university ranking institutions have been introduced from a technical perspective. However, these institutions have been proven to be subjective in relation to their data selection and weighting method.


Cs1: How Will They Do? How Can We Help? A Decade Of Research And Practice, Keith Quille, Susan Bergin Jan 2019


Articles

Background and Context: Computer Science attrition rates (in the western world) are very concerning, with a large number of students failing to progress each year. It is well acknowledged that a significant factor in this attrition is students’ difficulty in mastering the introductory programming module, often referred to as CS1.

Objective: The objective of this article is to describe the evolution of a prediction model named PreSS (Predict Student Success) over a 13-year period (2005–2018).

Method: This article ties together the PreSS prediction model; pilot studies; a longitudinal, multi-institutional re-validation and replication …


Evaluating Sequence Discovery Systems In An Abstraction-Aware Manner, Eoin Rogers, Robert J. Ross, John D. Kelleher May 2018


Conference papers

Activity discovery is a challenging machine learning problem where we seek to uncover new or altered behavioural patterns in sensor data. In this paper we motivate and introduce a novel approach to evaluating activity discovery systems. Pre-annotated ground truths, often used to evaluate the performance of such systems on existing datasets, may exist at different levels of abstraction to the output produced by the system. We propose a method for detecting and dealing with this situation, allowing for useful ground truth comparisons. This work has applications for activity discovery, and also for related fields. For example, it …


Assesing Completeness Of Solvency And Financial Condition Reports Through The Use Of Machine Learning And Text Classification, Ruairí Nugent Jan 2018


Dissertations

Text mining is a method for extracting useful information from unstructured data through the identification and exploration of large amounts of text. It is a valuable support tool for organisations. It enables a greater understanding and identification of relevant business insights from text. Critically, it identifies connections between information within texts that would otherwise go unnoticed. Its application is prevalent in areas such as marketing and political science; however, until recently it has been largely overlooked within economics. Central banks are beginning to investigate the benefits of machine learning, sentiment analysis and natural language processing in light of the large …


Back To The Future: Logic And Machine Learning, Simon Dobnik, John D. Kelleher Jun 2017


Conference papers

In this paper we argue that since the beginning of natural language processing or computational linguistics there has been a strong connection between logic and machine learning. First of all, there is something logical about language or linguistic about logic. Secondly, we argue that rather than distinguishing between logic and machine learning, a more useful distinction is between top-down approaches and data-driven approaches. Examining some recent approaches in deep learning, we argue that they incorporate both properties and that this is the reason for their very successful adoption to solve several problems within language technology.