Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 20 of 20

Full-Text Articles in Physical Sciences and Mathematics

Using ChatGPT To Generate Gendered Language, Shweta Soundararajan, Manuela Nayantara Jeyaraj, Sarah Jane Delany Mar 2024

Conference papers

Gendered language is the use of words that denote an individual's gender. This can be explicit where the gender is evident in the actual word used, e.g. mother, she, man, but it can also be implicit where social roles or behaviours can signal an individual's gender - for example, expectations that women display communal traits (e.g., affectionate, caring, gentle) and men display agentic traits (e.g., assertive, competitive, decisive). The use of gendered language in NLP systems can perpetuate gender stereotypes and bias. This paper proposes an approach to generating gendered language datasets using ChatGPT which will provide data for data-driven …


Determining Child Sexual Abuse Posts Based On Artificial Intelligence, Susan McKeever, Christina Thorpe, Vuong Ngo Jan 2023

Conference papers

The volume of child sexual abuse materials (CSAM) created and shared daily on both surface web platforms such as Twitter and dark web forums is very high. Given this volume, it is not viable for human experts to intercept or identify CSAM manually. However, automatically detecting and analysing child sexual abuse language in online text is challenging and time-intensive, mostly due to the variety of data formats and privacy constraints of hosting platforms. We propose a CSAM detection intelligence algorithm based on natural language processing and machine learning techniques. Our CSAM detection model is not only used to remove CSAM on …


Monitoring Quality Of Life Indicators At Home From Sparse And Low-Cost Sensor Data., Dympna O'Sullivan, Rilwan Basaru, Simone Stumpf, Neil Maiden Jun 2021

Conference papers

Supporting older people, many of whom live with chronic conditions or cognitive and physical impairments, to live independently at home is of increasing importance due to ageing demographics. To aid independent living at home, much effort is being directed at reliably detecting activities from sensor data to monitor people’s quality of life or to enhance self-management of their own health. Current efforts typically leverage smart homes which have large numbers of sensors installed to overcome challenges in the accurate detection of activities. In this work, we report on the results of machine learning models based on data collected with a …


Interactive Learning Approach For Arabic Target-Based Sentiment Analysis, Husamelddin Balla, Marisa Llorens, Sarah Jane Delany Jan 2021

Conference papers

Recently, the majority of sentiment analysis researchers have focused on target-based sentiment analysis because it delivers in-depth analysis with more accurate results than traditional sentiment analysis. In this paper, we propose an interactive learning approach to tackle a target-based sentiment analysis task for the Arabic language. The proposed IALSTM model uses an interactive attention-based mechanism to force the model to focus on different parts (targets) of a sentence. We investigate the ability to use targets, right and left contexts, and model them separately to learn their own representations via interactive modeling. We evaluated our model on two different datasets: …


Detecting Hacker Threats: Performance Of Word And Sentence Embedding Models In Identifying Hacker Communications, Susan McKeever, Brian Keegan, Andrei Quieroz Dec 2020

Conference papers

Cyber security is striving to find new forms of protection against hacker attacks. An emerging approach is the investigation of security-related messages exchanged on deep/dark web and even surface web channels. This approach can be supported by the use of supervised machine learning models and text mining techniques. In our work, we compare a variety of machine learning algorithms, text representations and dimension reduction approaches in terms of their accuracy in detecting software-vulnerability-related communications. Given the imbalanced nature of the three public datasets used, we investigate appropriate sampling approaches to boost detection accuracies of our models. In addition, we examine how …


Language-Driven Region Pointer Advancement For Controllable Image Captioning, Annika Lindh, Robert J. Ross, John D. Kelleher Dec 2020

Conference papers

Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more end-user control over results. A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer. In this paper, we propose a novel method for predicting the timing of region pointer advancement by treating the …


Language Model Co-Occurrence Linking For Interleaved Activity Discovery, Eoin Rogers, Robert J. Ross, John D. Kelleher Jan 2020

Conference papers

As ubiquitous computer and sensor systems become abundant, the potential for automatic identification and tracking of human behaviours becomes all the more evident. However, annotating complex human behaviour datasets to achieve ground truth for supervised training can be extremely labour-intensive and error-prone. One possible solution to this problem is activity discovery: the identification of activities in an unlabelled dataset by means of an unsupervised algorithm. This paper presents a novel approach to activity discovery that utilises deep learning based language production models to construct a hierarchical, tree-like structure over a sequential vector of sensor events. Our approach differs from …


Modelling Interleaved Activities Using Language Models, Eoin Rogers, Robert J. Ross, John D. Kelleher Jan 2020

Conference papers

We propose a new approach to activity discovery, based on the neural language modelling of streaming sensor events. Our approach proceeds in multiple stages: we build binary links between activities using probability distributions generated by a neural language model trained on the dataset, and combine the binary links to produce complex activities. We then use the activities as sensor events, allowing us to build complex hierarchies of activities. We put an emphasis on dealing with interleaving, which represents a major challenge for many existing activity discovery systems. The system is tested on a realistic dataset, demonstrating it as a promising …


Empowering Qualitative Research Methods In Education With Artificial Intelligence, Luca Longo Jan 2020

Conference papers

Artificial Intelligence is one of the fastest growing disciplines, disrupting many sectors. Originally mainly for computer scientists and engineers, it has been expanding its horizons and empowering many other disciplines contributing to the development of many novel applications in many sectors. These include medicine and health care, business and finance, psychology and neuroscience, physics and biology to mention a few. However, one of the disciplines in which artificial intelligence has not been fully explored and exploited yet is education. In this discipline, many research methods are employed by scholars, lecturers and practitioners to investigate the impact of different instructional approaches …


Explainable Artificial Intelligence: Concepts, Applications, Research Challenges And Visions, Luca Longo, Randy Goebel, Freddy Lecue, Peter Kieseberg, Andreas Holzinger Jan 2020

Conference papers

The development of theory, frameworks and tools for Explainable AI (XAI) is a very active area of research these days, and articulating any kind of coherence on a vision and challenges is itself a challenge. At least two sometimes complementary and colliding threads have emerged. The first focuses on the development of pragmatic tools for increasing the transparency of automatically learned prediction models, as for instance by deep or reinforcement learning. The second is aimed at anticipating the negative impact of opaque models with the desire to regulate or control impactful consequences of incorrect predictions, especially in sensitive areas like …


Synthesising Tabular Datasets Using Wasserstein Conditional GANs With Gradient Penalty (WCGAN-GP), Manhar Singh Walia, Brendan Tierney, Susan McKeever Jan 2020

Conference papers

Deep learning methods based on Generative Adversarial Networks (GANs) have seen remarkable success in data synthesis of images and text. This study investigates the use of GANs for the generation of mixed tabular datasets. We apply a Wasserstein Conditional Generative Adversarial Network with gradient penalty (WCGAN-GP) to the task of generating tabular synthetic data that is indistinguishable from the real data, without incurring information leakage. The performance of WCGAN-GP is compared against both the ground truth datasets and SMOTE using three labelled real-world datasets from different domains. Our results for WCGAN-GP show that the synthetic data preserves distributions and relationships of the real …


The Use Of Deep Learning Distributed Representations In The Identification Of Abusive Text, Susan McKeever, Hao Chen, Sarah Jane Delany Jan 2019

Conference papers

The selection of optimal feature representations is a critical step in the use of machine learning in text classification. Traditional features (e.g. bag of words and n-grams) have dominated for decades, but in the past five years, the use of learned distributed representations has become increasingly common. In this paper, we summarise and present a categorisation of the state-of-the-art distributed representation techniques, including word and sentence embedding models. We carry out an empirical analysis of the performance of the various feature representations using the scenario of detecting abusive comments. We compare classification accuracies across a range of off-the-shelf embedding models …
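
As a rough illustration of the "traditional features" the abstract mentions, the sketch below builds bag-of-words and word n-gram counts for a single comment. It is not taken from the paper: the function names `ngrams` and `featurise`, the whitespace tokenisation, and the sample sentence are all assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def featurise(text, max_n=2):
    """Count unigrams (bag of words) up to max_n-grams for one text.

    Tokenisation is a plain lowercase whitespace split, an assumption
    chosen to keep the sketch short.
    """
    tokens = text.lower().split()
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


feats = featurise("you are not nice")
print(sorted(feats))
```

Unlike a learned distributed representation, each feature here is an exact surface string, so two comments with similar meaning but different wording share no features.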


Evaluating Sequence Discovery Systems In An Abstraction-Aware Manner, Eoin Rogers, Robert J. Ross, John D. Kelleher May 2018

Conference papers

Activity discovery is a challenging machine learning problem where we seek to uncover new or altered behavioural patterns in sensor data. In this paper we motivate and introduce a novel approach to evaluating activity discovery systems. Pre-annotated ground truths, often used to evaluate the performance of such systems on existing datasets, may exist at different levels of abstraction to the output produced by the system. We propose a method for detecting and dealing with this situation, allowing for useful ground truth comparisons. This work has applications for activity discovery, and also for related fields. For example, it …


Back To The Future: Logic And Machine Learning, Simon Dobnik, John D. Kelleher Jun 2017

Conference papers

In this paper we argue that since the beginning of natural language processing or computational linguistics there has been a strong connection between logic and machine learning. First of all, there is something logical about language or linguistic about logic. Secondly, we argue that rather than distinguishing between logic and machine learning, a more useful distinction is between top-down approaches and data-driven approaches. Examining some recent approaches in deep learning, we argue that they incorporate both properties and that this is the reason for their very successful adoption to solve several problems within language technology.


Tackling The Interleaving Problem In Activity Discovery, Eoin Rogers, Robert J. Ross, John D. Kelleher Jun 2017

Conference papers

Activity discovery (AD) is the unsupervised process of discovering activities in data produced from streaming sensor networks that are recording the actions of human subjects. One major challenge for AD systems is interleaving, the tendency for people to carry out multiple activities at a time in parallel. Following on from our previous work, we continue to investigate AD in interleaved datasets, with a view towards progressing the state-of-the-art for AD.


Presenting A Labelled Dataset For Real-Time Detection Of Abusive User Posts, Hao Chen, Susan McKeever, Sarah Jane Delany Jan 2017

Conference papers

Social media sites facilitate users in posting their own personal comments online. Most support free format user posting, with close to real-time publishing speeds. However, online posts generated by a public user audience carry the risk of containing inappropriate, potentially abusive content. To detect such content, the straightforward approach is to filter against blacklists of profane terms. However, this lexicon filtering approach is prone to problems around word variations and lack of context. Although recent methods inspired by machine learning have boosted detection accuracies, the lack of gold standard labelled datasets limits the development of this approach. In this work, …
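
The blacklist filtering that the abstract describes, and the two weaknesses it names (word variations and lack of context), can be sketched in a few lines. This is an illustrative example only, not the authors' system: the `BLACKLIST` contents, the `is_flagged` helper, and the sample posts are hypothetical, with mild placeholder terms standing in for a real profanity lexicon.

```python
import re

# Hypothetical blacklist; mild placeholder terms stand in for a real lexicon.
BLACKLIST = {"idiot", "moron"}


def is_flagged(post: str) -> bool:
    """Flag a post if any alphabetic token exactly matches a blacklisted term."""
    tokens = re.findall(r"[a-z']+", post.lower())
    return any(token in BLACKLIST for token in tokens)


# Direct use of a listed term is caught.
print(is_flagged("You are an idiot"))                    # True
# Word variation: an obfuscated spelling slips through.
print(is_flagged("You are an 1d10t"))                    # False
# Lack of context: a non-abusive mention is still flagged.
print(is_flagged("Calling someone an idiot is unkind"))  # True
```

The second and third posts show a missed detection and a false alarm respectively, which is the kind of error that a purely lexicon-based filter cannot avoid.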


Abusive Text Detection Using Neural Networks, Hao Chen, Susan McKeever, Sarah Jane Delany Jan 2017

Conference papers

Neural network models have become increasingly popular for text classification in recent years. In particular, the emergence of word embeddings within deep learning architectures has recently attracted a high level of attention amongst researchers. In this paper, we focus on how neural network models have been applied in text classification. Secondly, we extend our previous work [4, 3] using a neural network strategy for the task of abusive text detection. We compare word embedding features to the traditional feature representations such as n-grams and handcrafted features. In addition, we use an off-the-shelf neural network classifier, FastText [16]. Based on our results, …


Harnessing The Power Of Text Mining For The Detection Of Abusive Content In Social Media, Hao Chen, Susan McKeever, Sarah Jane Delany Jan 2016

Conference papers

The issues of cyberbullying and online harassment have gained considerable coverage in the last number of years. Social media providers need to be able to detect abusive content both accurately and efficiently in order to protect their users. Our aim is to investigate the application of core text mining techniques for the automatic detection of abusive content across a range of social media sources, including blogs, forums, media-sharing, Q&A and chat, using datasets from Twitter, YouTube, MySpace, Kongregate, Formspring and Slashdot. Using supervised machine learning, we compare alternative text representations and dimension reduction approaches, including feature selection and …


Designing Medical Interactive Systems Via Assessment Of Human Mental Workload, Luca Longo Jan 2015

Conference papers

In clinical settings, human-computer systems need to be designed in a way that medical errors are reduced and patient care is enhanced. Inspection methods are usually employed in HCI to assess usability of interactive systems. However, they do not consider the state of the operator while executing a task, the surrounding environment and the task demands. It is argued that assessing performance of operators is fundamental for designing optimal systems with which healthcare can be effectively delivered. The aim of our solution is to assess performance of operators employing the notion of Mental Workload (MWL), this being a construct believed …


Dataset Threshold For The Performance Estimators In Supervised Machine Learning Experiments, Zanifa Omary, Fredrick Mtenzi Nov 2009

Conference papers

The establishment of a dataset threshold is one of the first steps when comparing the performance of machine learning algorithms. It involves the use of different datasets with different sample sizes in relation to the number of attributes and the number of instances available in the dataset. Currently, no limit has been set to guide those who are unfamiliar with machine learning experiments in categorising these datasets as either small or large based on the two factors. In this paper we perform experiments in order to establish a dataset threshold. The established dataset threshold will help unfamiliar supervised …