Physical Sciences and Mathematics | Open Access Articles

OffensEval 2023: Offensive Language Identification In The Age Of Large Language Models, Marcos Zampieri, Sara Rosenthal, Preslav Nakov, Alphaeus Dmonte, Tharindu Ranasinghe

Natural Language Processing Faculty Publications

The OffensEval shared tasks organized as part of SemEval-2019 and SemEval-2020 were very popular, attracting over 1300 participating teams. The two editions of the shared task helped advance the state of the art in offensive language identification by providing the community with benchmark datasets in Arabic, Danish, English, Greek, and Turkish. The datasets were annotated using the OLID hierarchical taxonomy, which has since become the de facto standard in general offensive language identification research and has been widely used beyond OffensEval. We present a survey of OffensEval and related competitions, and we discuss the main lessons learned. We further evaluate the performance …
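For readers unfamiliar with OLID, the sketch below encodes its three-level hierarchy as described in the original OLID work: level A separates offensive (`OFF`) from not offensive (`NOT`) posts, level B separates targeted (`TIN`) from untargeted (`UNT`) offense, and level C names the target as an individual (`IND`), a group (`GRP`), or other (`OTH`). The function `is_valid_olid` is hypothetical; it only checks that a label triple respects the hierarchy and is not tooling from the paper.

```python
from typing import Optional

# Level A: is the post offensive?  Level B: is the offense targeted?
# Level C: who is targeted?  Lower levels apply only under certain parents.
LEVEL_A = {"OFF", "NOT"}
LEVEL_B = {"TIN", "UNT"}
LEVEL_C = {"IND", "GRP", "OTH"}


def is_valid_olid(a: str, b: Optional[str] = None, c: Optional[str] = None) -> bool:
    """Check that a (level A, level B, level C) label triple respects the hierarchy."""
    if a not in LEVEL_A:
        return False
    if a == "NOT":
        # Non-offensive posts carry no further labels.
        return b is None and c is None
    if b not in LEVEL_B:
        # Offensive posts must say whether the offense is targeted.
        return False
    if b == "UNT":
        # Untargeted offense has no target type.
        return c is None
    # Targeted offense must name a target type.
    return c in LEVEL_C
```

For example, `is_valid_olid("OFF", "TIN", "GRP")` accepts a targeted insult against a group, while `is_valid_olid("OFF", "UNT", "IND")` is rejected because untargeted offense has no level-C label.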


DTITD: An Intelligent Insider Threat Detection Framework Based On Digital Twin And Self-Attention Based Deep Learning Models, Zhi Qiang Wang, Abdulmotaleb El Saddik

Computer Vision Faculty Publications

Recent statistics and studies show that the losses caused by insider threats are much higher than those caused by external attacks. More and more organizations are investing in or purchasing insider threat detection systems to prevent insider risks. However, detecting insider threats accurately and in a timely manner remains a significant challenge. In this study, we propose an intelligent insider threat detection framework based on Digital Twins and self-attention-based deep learning models. First, this paper introduces insider threats and the challenges in detecting them. Then, it presents recent related work on solving insider threat detection problems and its limitations. Next, …
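The abstract does not detail the model architecture. As a generic illustration of the self-attention building block it names, here is scaled dot-product attention, softmax(QKᵀ/√d)V, in plain Python. Treating each row as, say, one day of a user's activity embedding is an assumption for illustration; learned projection matrices, multiple heads, and the digital-twin components are all omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Return softmax(Q K^T / sqrt(d)) V for row-major lists of vectors."""
    d = len(k[0])
    out = []
    for qi in q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        # Output row: attention-weighted average of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

In self-attention the same sequence supplies queries, keys, and values, e.g. `scaled_dot_product_attention(x, x, x)`, so each position is re-expressed as a weighted mix of every position in the sequence.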


Disease Progression Modelling Of Alzheimer's Disease Using Probabilistic Principal Components Analysis, Martin Saint-Jalmes, Victor Fedyashov, Daniel Beck, Timothy Baldwin, Noel G. Faux, Pierrick Bourgeat, Jurgen Fripp, Colin L. Masters, Benjamin Goudey

Natural Language Processing Faculty Publications

The recent biological redefinition of Alzheimer's Disease (AD) has spurred the development of statistical models that relate changes in biomarkers to neurodegeneration and the worsening condition linked to AD. The ability to measure such changes may facilitate earlier diagnoses for affected individuals and help in monitoring the evolution of their condition. Amongst such statistical tools, disease progression models (DPMs) are quantitative, data-driven methods that specifically attempt to describe the temporal dynamics of biomarkers relevant to AD. Due to the heterogeneous nature of this disease, with patients of similar age experiencing different AD-related changes, a challenge facing longitudinal mixed-effects-based DPMs is the …


A Unified Optimization Framework Of ANN-SNN Conversion: Towards Optimal Mapping From Activation Values To Firing Rates, Haiyan Jiang, Srinivas Anumasa, Giulia De Masi, Huan Xiong, Bin Gu

Machine Learning Faculty Publications

Spiking Neural Networks (SNNs) have gained significant attention for their energy-efficient and fast-inference capabilities, but training SNNs from scratch can be challenging due to the discrete nature of spikes. An alternative is to convert an Artificial Neural Network (ANN) into an SNN, an approach known as ANN-SNN conversion. Existing ANN-SNN conversion methods often redesign the ANN with a new activation function in place of the traditional ReLU and then convert this tailored ANN into an SNN. However, these methods do not take into account the potential performance loss between the regular ANN with ReLU and the tailored ANN. In this work, we …
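The activation-to-firing-rate mapping that conversion relies on can be seen with a single integrate-and-fire (IF) neuron. Driven by a constant input a for T steps, with threshold θ, a membrane starting at zero, and reset by subtraction, it fires ⌊Ta/θ⌋ times for 0 ≤ a ≤ θ, so θ times its firing rate underestimates a ReLU clipped at θ by less than θ/T. This is a generic illustration, not the optimization framework proposed in the paper; `if_neuron_rate` and its defaults are hypothetical.

```python
def if_neuron_rate(a: float, theta: float = 1.0, T: int = 32) -> float:
    """Simulate an integrate-and-fire neuron driven by a constant input `a`
    for T time steps; return its firing rate scaled by the threshold."""
    v = 0.0  # membrane potential
    spikes = 0
    for _ in range(T):
        v += a  # integrate the input current
        if v >= theta:
            spikes += 1
            v -= theta  # reset by subtraction keeps the residual charge
    return theta * spikes / T


def clipped_relu(a: float, theta: float = 1.0) -> float:
    """The ANN activation the IF rate approximates: min(max(a, 0), theta)."""
    return min(max(a, 0.0), theta)
```

Negative inputs never reach threshold (rate 0), inputs above θ fire every step (rate θ), and in between the rate is quantized to T + 1 levels; this quantization gap is one source of the conversion error that methods in this area try to reduce.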


High-Probability Bounds For Stochastic Optimization And Variational Inequalities: The Case Of Unbounded Variance, Abdurakhmon Sadiev, Marina Danilova, Eduard Gorbunov, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik

Machine Learning Faculty Publications

In recent years, the optimization and machine learning communities have shown growing interest in the high-probability convergence of stochastic optimization methods. One of the main reasons is that high-probability complexity bounds are more accurate, yet less studied, than in-expectation ones. However, state-of-the-art (SOTA) high-probability non-asymptotic convergence results are derived under strong assumptions, such as the boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded …
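The abstract is truncated before it names the algorithms. Gradient clipping is a standard tool for stochastic optimization when the noise has heavy tails, so the sketch below runs generic clipped SGD on the toy objective f(x) = ½(x − x*)² with symmetric Pareto noise of tail index 1.5, which has zero mean but infinite variance. The objective, noise model, step size, and clipping level are all assumptions for illustration, not the paper's methods or parameter choices.

```python
import random
from typing import Optional


def clip(g: float, level: float) -> float:
    """Scale g so that |g| <= level (1-D version of norm clipping)."""
    return g if abs(g) <= level else level * (1 if g > 0 else -1)


def heavy_tailed_noise(rng: random.Random, alpha: float = 1.5) -> float:
    # Symmetric Pareto noise: zero mean, infinite variance when alpha <= 2.
    return rng.choice((-1.0, 1.0)) * rng.paretovariate(alpha)


def sgd(x0: float, x_star: float, steps: int, lr: float,
        clip_level: Optional[float] = None, seed: int = 0) -> float:
    """Minimise f(x) = 0.5 * (x - x_star)^2 from heavy-tailed noisy gradients."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        g = (x - x_star) + heavy_tailed_noise(rng)  # stochastic gradient
        if clip_level is not None:
            g = clip(g, clip_level)  # bound the size of any single update
        x -= lr * g
    return x
```

Clipping caps how far one extreme noise sample can move the iterate, which is what makes high-probability (rather than only in-expectation) guarantees plausible when the variance is unbounded.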


On The Accelerated Noise-Tolerant Power Method, Zhiqiang Xu

Machine Learning Faculty Publications

We revisit the acceleration of the noise-tolerant power method, for which, despite previous studies, the results remain unsatisfactory: they are either wrong or suboptimal, and they also lack generality. In this work, we present a simple yet general and optimal analysis via noise-corrupted Chebyshev polynomials, which allows an iteration rank p larger than the target rank k, requires fewer noise conditions in a new form, and achieves the optimal iteration complexity (Equation presented) for some q satisfying k ≤ q ≤ p in a certain regime of the momentum parameter. Interestingly, it shows a dynamic dependence of the noise tolerance on the …
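For intuition about momentum acceleration, here is a noise-free, single-vector power method with a heavy-ball term, w_{t+1} = A w_t − β w_{t−1}. The choice β = λ₂²/4, with λ₂ the second-largest eigenvalue, is assumed here (it is known only because the toy matrix is diagonal); when λ₂ is close to λ₁ it makes the unwanted components shrink relative to the top one much faster than the plain ratio λ₂/λ₁. The paper's setting is more general (block iterates of rank p ≥ k, noisy updates, a Chebyshev-based analysis), and this sketch does not reproduce it.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def power_method(a: Matrix, x0: Vector, steps: int, beta: float = 0.0) -> Vector:
    """Power iteration with optional heavy-ball momentum:
    w_{t+1} = A w_t - beta * w_{t-1}.  beta = 0 gives the plain power method."""
    w_prev = [0.0] * len(x0)
    w = list(x0)
    for _ in range(steps):
        w_next = [y - beta * p for y, p in zip(matvec(a, w), w_prev)]
        s = math.sqrt(sum(v * v for v in w_next))
        # Rescale the current and previous iterates by the same factor so the
        # linear three-term recurrence is preserved while avoiding underflow.
        w_prev = [v / s for v in w]
        w = [v / s for v in w_next]
    return w
```

On `A = diag(1.0, 0.9, 0.5)` started from the uniform unit vector, 20 plain iterations leave the first coordinate near 0.993, whereas 20 momentum iterations with `beta = 0.9 ** 2 / 4` push it above 0.9999.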


A Hybrid Artificial Intelligence Model For Detecting Keratoconus, Zaid Abdi Alkareem Alyasseri, Ali H. Al-Timemy, Ammar Kamal Abasi, Alexandru Lavric, Husam Jasim Mohammed, Hidenori Takahashi, Jose Arthur Milhomens Filho, Mauro Campos, Rossen M. Hazarbassanov, Siamak Yousefi

Machine Learning Faculty Publications

Machine learning models have recently shown great promise in the diagnosis of several ophthalmic disorders, including keratoconus (KCN). Keratoconus, a noninflammatory ectatic corneal disorder characterized by progressive corneal thinning, is challenging to detect because its signs may be subtle. Several machine learning models have been proposed to detect KCN; however, most of them are supervised and thus require large, well-annotated datasets. This paper proposes a new unsupervised model to detect KCN, based on an adapted flower pollination algorithm (FPA) and the k-means algorithm. We will evaluate the proposed models using corneal data collected from 5430 eyes at different stages of KCN severity …
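The k-means component of the hybrid can be sketched as Lloyd's algorithm. In the paper it is combined with an adapted flower pollination algorithm, which is not shown here; the random initialization below is a stand-in assumption, and any data fed to it in the usage example is made up rather than corneal measurements.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points (stand-in for FPA).
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new.append(centroids[j])  # keep an empty cluster's centroid
        if new == centroids:
            break  # converged: assignments can no longer change
        centroids = new
    return centroids, labels
```

Because k-means is unsupervised, it needs no severity labels; its weakness is sensitivity to initialization, which is the kind of gap a metaheuristic such as FPA is typically used to address.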


Towards A Machine Learning-Based Digital Twin For Non-Invasive Human Bio-Signal Fusion, Izaldein Al-Zyoud, Fedwa Laamarti, Xiaocong Ma, Diana Tobón, Abdulmotaleb Elsaddik

Computer Vision Faculty Publications

Human bio-signal fusion is considered a critical technological solution that needs to be advanced to enable modern and secure digital health and well-being applications in the metaverse. To support such efforts, we propose a new data-driven digital twin (DT) system to fuse three human physiological bio-signals: heart rate (HR), breathing rate (BR), and blood oxygen saturation level (SpO2). To accomplish this goal, we design a computer vision technology based on the non-invasive photoplethysmography (PPG) technique to extract raw time-series bio-signal data from facial video frames. Then, we implement machine learning (ML) technology to model and measure the bio-signals. We accurately …
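Once a raw PPG trace has been extracted from facial video, a classic (non-ML) baseline for heart or breathing rate is to take the dominant frequency within a physiological band. The sketch below does this with a plain discrete Fourier transform. The band limits (0.7 to 4 Hz for heart rate, i.e. 42 to 240 beats per minute) and the synthetic test signal are assumptions, and this is not the paper's ML pipeline.

```python
import math
from typing import Sequence


def estimate_rate_bpm(signal: Sequence[float], fs: float,
                      f_lo: float = 0.7, f_hi: float = 4.0) -> float:
    """Return the dominant frequency of `signal` within [f_lo, f_hi] Hz,
    in cycles per minute, using a plain discrete Fourier transform."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC component
    best_f, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n  # frequency of DFT bin k
        if not (f_lo <= f <= f_hi):
            continue
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return 60.0 * best_f
```

With a 30 frames-per-second video and 10 s of data, bins are 0.1 Hz (6 beats per minute) apart, so longer windows trade latency for resolution. Calling it with a breathing band such as `f_lo=0.1, f_hi=0.5` on the same trace estimates breathing rate instead.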


Predicting Publication Of Clinical Trials Using Structured And Unstructured Data: Model Development And Validation Study, Siyang Wang, Simon Šuster, Timothy Baldwin, Karin Verspoor

Natural Language Processing Faculty Publications

Background: Publication of registered clinical trials is a critical step in the timely dissemination of trial findings. However, a significant proportion of completed clinical trials are never published, motivating the need to analyze the factors behind success or failure to publish. This could inform study design, help regulatory decision-making, and improve resource allocation. It could also enhance our understanding of bias in the publication of trials and publication trends based on the research direction or strength of the findings. Although the publication of clinical trials has been addressed in several descriptive studies at an aggregate level, there is a lack …


Robustar: Interactive Toolbox Supporting Precise Data Annotation For Robust Vision Learning, Chonghan Chen, Haohan Wang, Leyang Hu, Yuhao Zhang, Shuguang Lyu, Jingcheng Wu, Xinnuo Li, Linjing Sun, Eric Xing

Machine Learning Faculty Publications

We introduce the initial release of our software Robustar, which aims to improve the robustness of vision classification models from a data-driven perspective. Building on the recent understanding that a machine learning model's lack of robustness stems from its tendency to learn spurious features, we aim to solve this problem at its root, in the data, by removing the spurious features before training. In particular, we introduce software that helps users better prepare data for training image classification models by allowing them to annotate the spurious features …


Action-Sufficient State Representation Learning For Control With Structural Constraints, Biwei Huang, Chaochao Lu, Liu Leqi, José Miguel Hernández-Lobato, Clark Glymour, Bernhard Schölkopf, Kun Zhang

Machine Learning Faculty Publications

Perceived signals in real-world scenarios are usually high-dimensional and noisy; finding and using a representation of them that contains the essential and sufficient information required by downstream decision-making tasks helps improve computational efficiency and generalization in those tasks. In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making, termed Action-Sufficient state Representations (ASRs). We build a generative environment model for the structural relationships among variables in the system and present a principled way to characterize ASRs based on structural constraints and the goal of maximizing …


Industrial Digital Twins At The Nexus Of NextG Wireless Networks And Computational Intelligence: A Survey, Shah Zeb, Aamir Mahmood, Syed Ali Hassan, Md. Jalil Piran, Mikael Gidlund, Mohsen Guizani

Machine Learning Faculty Publications

By amalgamating recent communication and control technologies, computing and data analytics techniques, and modular manufacturing, Industry 4.0 promotes integrating cyber–physical worlds through cyber–physical systems (CPS) and digital twins (DTs) for monitoring, optimization, and prognostics of industrial processes. A DT enables interaction with the digital image of industrial physical objects/processes to simulate, analyze, and control their real-time operation. DT is rapidly diffusing into numerous industries with the interdisciplinary advances in the industrial Internet of things (IIoT), edge and cloud computing, machine learning, artificial intelligence, and advanced data analytics. However, the existing literature falls short in identifying and discussing the role and …


Learning To Learn Kernels With Variational Random Features, Xiantong Zhen, Haoliang Sun, Yingjun Du, Jun Xu, Yilong Yin, Ling Shao, Cees Snoek

Machine Learning Faculty Publications

We introduce kernels with random Fourier features in the meta-learning framework for few-shot learning. We propose meta variational random features (MetaVRF) to learn adaptive kernels for the base-learner, which is developed in a latent variable model by treating the random feature basis as the latent variable. We formulate the optimization of MetaVRF as a variational inference problem by deriving an evidence lower bound under the meta-learning framework. To incorporate shared knowledge from related tasks, we propose a context inference of the posterior, which is established by an LSTM architecture. The LSTM-based inference network effectively integrates the context information of previous …
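MetaVRF builds on random Fourier features, in which a shift-invariant kernel is approximated by an inner product of random cosine features. The sketch below is the standard, non-meta-learned version for the RBF kernel k(x, y) = exp(−‖x − y‖²/(2σ²)): the spectral samples ω are drawn from a fixed Gaussian, whereas MetaVRF treats this basis as a latent variable inferred with an LSTM-based context network (not shown). The function names and the default σ = 1 are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence


def make_rff(dim: int, num_features: int, sigma: float = 1.0,
             seed: int = 0) -> Callable[[Sequence[float]], List[float]]:
    """Sample a random Fourier feature map z for the RBF kernel, so that
    z(x) . z(y) approximates exp(-||x - y||^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Spectral samples omega ~ N(0, sigma^-2 I) and phases b ~ U[0, 2*pi).
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x: Sequence[float]) -> List[float]:
        return [scale * math.cos(sum(w * xi for w, xi in zip(om, x)) + b)
                for om, b in zip(omegas, phases)]

    return features


def rbf(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Exact RBF kernel, for comparison with the feature approximation."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))
```

The approximation error shrinks roughly as 1/√D in the number of features D; learning the distribution of ω per task, as MetaVRF does, aims to get a kernel adapted to each few-shot task rather than a fixed RBF.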

