Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Data mining

Articles 1 - 30 of 155

Full-Text Articles in Computer Sciences

ConceptThread: Visualizing Threaded Concepts In MOOC Videos, Zhiguang Zhou, Li Ye, Lihong Cai, Lei Wang, Yigang Wang, Yongheng Wang, Wei Chen, Yong Wang Jan 2024

Research Collection School Of Computing and Information Systems

Massive Open Online Course (MOOC) platforms have become increasingly popular in recent years. Online learners need to watch the whole course video on MOOC platforms to learn the underlying new knowledge, which is often tedious and time-consuming due to the lack of a quick overview of the covered knowledge and its structure. In this paper, we propose ConceptThread, a visual analytics approach that shows the concepts and the relations among them to facilitate effective online learning. Specifically, given that the majority of MOOC videos contain slides, we first leverage video processing and speech analysis techniques, including shot recognition, …


On The Effect Of Emotion Identification From Limited Translated Text Samples Using Computational Intelligence, Madiha Tahir, Zahid Halim, Muhmmad Waqas, Shanshan Tu Dec 2023

Research outputs 2022 to 2026

Emotion identification from text data has recently gained the attention of the research community. This has multiple utilities in an assortment of domains. Many times, the original text is written in a different language and the end-user translates it to her native language using online utilities. Therefore, this paper presents a framework to detect emotions on translated text data in four different languages. The source language is English, whereas the four target languages are Chinese, French, German, and Spanish. Computational intelligence (CI) techniques are applied for feature extraction, dimensionality reduction, and classification of data into five basic classes of emotions. Results …


Predictive Analysis Of Students’ Learning Performance Using Data Mining Techniques: A Comparative Study Of Feature Selection Methods, S. M. F. D. Syed Mustapha Sep 2023

All Works

The use of data mining techniques for the prompt prediction of academic success has gained significant importance in the current era. There is an increasing interest in using these methodologies to forecast the academic performance of students, thereby enabling educators to intervene and provide suitable assistance when required. The purpose of this study was to determine the optimal methods for feature engineering and selection in the context of regression and classification tasks. This study compared the Boruta algorithm and Lasso regression for regression, and Recursive Feature Elimination (RFE) and Random Forest Importance (RFI) for classification. According to the findings, Gradient …


Analyzing Syntactic Constructs Of Java Programs With Machine Learning, Francisco Ortin, Guillermo Facundo, Miguel Garcia Apr 2023

Department of Computer Science Publications

The massive number of open-source projects in public repositories has notably increased in recent years. Such repositories represent valuable information to be mined for different purposes, such as documenting recurrent syntactic constructs, analyzing the particular constructs used by experts and beginners, using them to teach programming and to detect bad programming practices, and building programming tools such as decompilers, Integrated Development Environments or Intelligent Tutoring Systems. An inherent problem of source code is that its syntactic information is represented with tree structures, while traditional machine learning algorithms use n-dimensional datasets. Therefore, we present a feature engineering process to translate …


Learning Relation Prototype From Unlabeled Texts For Long-Tail Relation Extraction, Yixin Cao, Jun Kuang, Ming Gao, Aoying Zhou, Yonggang Wen, Tat-Seng Chua Feb 2023

Research Collection School Of Computing and Information Systems

Relation Extraction (RE) is a vital step in completing Knowledge Graphs (KGs) by extracting entity relations from texts. However, it usually suffers from the long-tail issue. The training data mainly concentrates on a few types of relations, leading to the lack of sufficient annotations for the remaining types of relations. In this paper, we propose a general approach to learn relation prototypes from unlabeled texts, to facilitate long-tail relation extraction by transferring knowledge from the relation types with sufficient training data. We learn relation prototypes as an implicit factor between entities, which reflects the meanings of relations as well …


Dashboard Design Mining And Recommendation, Yanna Lin, Haotian Li, Aoyu Wu, Yong Wang, Huamin Qu Jan 2023

Research Collection School Of Computing and Information Systems

Dashboards, which comprise multiple views on a single display, help analyze and communicate multiple perspectives of data simultaneously. However, creating effective and elegant dashboards is challenging since it requires careful and logical arrangement and coordination of multiple visualizations. To solve the problem, we propose a data-driven approach for mining design rules from dashboards and automating dashboard organization. Specifically, we focus on two prominent aspects of the organization: arrangement, which describes the position, size, and layout of each view in the display space; and coordination, which indicates the interaction between pairwise views. We build a new dataset containing 854 dashboards crawled online, …


Mitigating Popularity Bias In Recommendation With Unbalanced Interactions: A Gradient Perspective, Weijieying Ren, Lei Wang, Kunpeng Liu, Ruocheng Guo, Ee-Peng Lim, Yanjie Fu Dec 2022

Research Collection School Of Computing and Information Systems

Recommender systems learn from historical user-item interactions to identify preferred items for target users. These observed interactions are usually unbalanced, following a long-tailed distribution. Such long-tailed data lead to popularity bias, in which popular but not personalized items are recommended to users. We present a gradient perspective to understand two negative impacts of popularity bias in recommendation model optimization: (i) the gradient direction of popular item embeddings is closer to that of positive interactions, and (ii) the magnitude of the positive gradient for popular items is much greater than that of unpopular items. To address these issues, we propose a simple yet efficient …


Hybrid Feature Selection Based On Principal Component Analysis And Grey Wolf Optimizer Algorithm For Arabic News Article Classification, Osama Ahmad Alomari, Ashraf Elnagar, Imad Afyouni, Ismail Shahin, Ali Bou Nassif, Ibrahim Abaker Hashem, Mohammad Tubishat Nov 2022

All Works

The rapid growth of electronic documents has resulted from the expansion and development of internet technologies. Text document classification is a key task in natural language processing that converts unstructured data into structured form and then extracts knowledge from it. This conversion generates high-dimensional data that needs further analysis using data mining techniques like feature extraction, feature selection, and classification to derive meaningful insights from the data. Feature selection is a technique used for reducing dimensionality in order to prune the feature space and, as a result, lower the computational cost and enhance classification accuracy. This work presents a …


An Empirical Study Of Blockchain System Vulnerabilities: Modules, Types, And Patterns, Xiao Yi, Daoyuan Wu, Lingxiao Jiang, Yuzhou Fang, Kehuan Zhang, Wei Zhang Nov 2022

Research Collection School Of Computing and Information Systems

Blockchain, as a distributed ledger technology, has become increasingly popular, especially for enabling valuable cryptocurrencies and smart contracts. However, blockchain software systems inevitably have many bugs. Although bugs in smart contracts have been extensively investigated, security bugs of the underlying blockchain systems are much less explored. In this paper, we conduct an empirical study on blockchain system vulnerabilities from four representative blockchains: Bitcoin, Ethereum, Monero, and Stellar. Specifically, we first design a systematic filtering process to effectively identify 1,037 vulnerabilities and their 2,317 patches from 34,245 issues/PRs (pull requests) and 85,164 commits on GitHub. We thus build the first blockchain …


Exploiting Reuse For GPU Subgraph Enumeration, Wentiao Guo, Yuchen Li, Kian-Lee Tan Sep 2022

Research Collection School Of Computing and Information Systems

Subgraph enumeration is important for many applications such as network motif discovery, community detection, and frequent subgraph mining. To accelerate the execution, recent works utilize graphics processing units (GPUs) to parallelize subgraph enumeration. The performances of these parallel schemes are dominated by the set intersection operations which account for up to 95% of the total processing time. (Un)surprisingly, a significant portion (as high as 99%) of these operations is actually redundant, i.e., the same set of vertices is repeatedly encountered and evaluated. Therefore, in this paper, we seek to salvage and recycle the results of such operations to avoid repeated …
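
The reuse idea in the abstract above, recycling the result of a set intersection when the same set of vertices is encountered again, can be illustrated with a small CPU-side sketch. This is not the authors' GPU method: the `IntersectionCache` class, its dictionary cache, and the hit/miss counters are hypothetical names chosen for illustration, and the sketch assumes an undirected graph stored as adjacency sets.

```python
from typing import Dict, FrozenSet, List, Tuple

Key = Tuple[int, ...]


class IntersectionCache:
    """Memoizes intersections of adjacency sets (illustrative, CPU only)."""

    def __init__(self, adjacency: Dict[int, FrozenSet[int]]) -> None:
        self.adjacency = adjacency
        self.cache: Dict[Key, FrozenSet[int]] = {}
        self.hits = 0
        self.misses = 0

    def common_neighbors(self, vertices: List[int]) -> FrozenSet[int]:
        """Return the vertices adjacent to every vertex in `vertices`."""
        if not vertices:
            raise ValueError("at least one vertex is required")
        # Sorting makes the key independent of the order of the request.
        key = tuple(sorted(vertices))
        if key in self.cache:
            self.hits += 1  # redundant request: reuse the stored result
            return self.cache[key]
        self.misses += 1
        result = self.adjacency[key[0]]
        for v in key[1:]:
            result = result & self.adjacency[v]
        self.cache[key] = result
        return result


# A complete graph on four vertices (hypothetical test data).
k4 = {v: frozenset(u for u in range(4) if u != v) for v in range(4)}
cache = IntersectionCache(k4)
first = cache.common_neighbors([0, 1])
second = cache.common_neighbors([1, 0])  # same vertex set, served from cache
```

Keying the cache on the sorted vertex tuple is one simple way to recognize a repeated request. A GPU implementation would need a very different cache layout, which this sketch does not attempt to model.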


Using Deep Learning To Detect Social Media ‘Trolls’, Áine MacDermott, Michal Motylinski, Farkhund Iqbal, Kellyann Stamp, Mohammed Hussain, Andrew Marrington Sep 2022

All Works

Detecting criminal activity online is not a new concept but how it can occur is changing. Technology and the influx of social media applications and platforms have a vital part to play in this changing landscape. As such, we observe an increasing problem with cyber abuse and ‘trolling’/toxicity amongst social media platforms used for sharing stories, posts, memes, and other content. In this paper we present our work on the application of deep learning techniques for the detection of ‘trolls’ and toxic content shared on social media platforms. We propose a machine learning solution for the detection of toxic images based on embedded …


Design Demand Trend Acquisition Method Based On Short Text Mining Of User Comments In Shopping Websites, Zhiyong Xiong, Zhaoxiong Yan, Huanan Yao, Shangsong Liang Feb 2022

Machine Learning Faculty Publications

To help designers explore the market demand trend of laptops and to establish a better “network users-market feedback mechanism”, we propose a design and research method for a short text mining tool based on the K-means clustering algorithm and the Kano model. An improved short text clustering algorithm is used to extract the design elements of laptops. Based on the traditional questionnaire, we extract the user’s attention factors, score the emotional tendency, and analyze the user’s needs based on the Kano model. Then, we select 10 laptops, process them by the improved algorithm, cluster the evaluation words and …


SubOmiEmbed: Self-Supervised Representation Learning Of Multi-Omics Data For Cancer Type Classification, Sayed Hashim, Muhammad Ali, Karthik Nandakumar, Mohammad Yaqub Feb 2022

Computer Vision Faculty Publications

For personalized medicine, crucial intrinsic information is present in high-dimensional omics data, which is difficult to capture due to the large number of molecular features and small number of available samples. Different types of omics data show various aspects of samples. Integration and analysis of multi-omics data give us a broad view of tumours, which can improve clinical decision making. Omics data, mainly DNA methylation and gene expression profiles, are usually high-dimensional with many molecular features. In recent years, variational autoencoders (VAE) [13] have been extensively used in embedding image and text data into …


Data Science Applied To Discover Ancient Minoan-Indus Valley Trade Routes Implied By Common Weight Measures, Peter Revesz Jan 2022

CSE Conference and Workshop Papers

This paper applies data mining of weight measures to discover possible long-distance trade routes among Bronze Age civilizations from the Mediterranean area to India. As a result, a new northern route via the Black Sea is discovered between the Minoan and the Indus Valley civilizations. This discovery enhances the growing set of evidence for a strong and vibrant connection among Bronze Age civilizations.


Messiness: Automating IoT Data Streaming Spatial Analysis, Christopher White, Atilio Barreda II Dec 2021

Publications and Research

The spaces we live in go through many transformations over the course of a year, a month, or a day. My room has seen tremendous clutter and pristine order within the span of a few hours. My goal is to discover patterns within my space and formulate an understanding of the changes that occur. This insight will provide actionable direction for maintaining a cleaner environment, as well as provide some information about the optimal times for productivity and energy preservation.

Using a Raspberry Pi, I will set up automated image capture in a room in my home. These images will …


Tweet-To-Act: Towards Tweet-Mining Framework For Extracting Terrorist Attack-Related Information And Reporting, Farkhund Iqbal, Rabia Batool, Benjamin C. M. Fung, Saiqa Aleem, Ahmed Abbasi, Abdul Rehman Javed Aug 2021

All Works

The widespread popularity of social networking is leading to the adoption of Twitter as an information dissemination tool. Existing research has shown that information dissemination over Twitter has a much broader reach than traditional media and can be used for effective post-incident measures. People use informal language on Twitter, including acronyms, misspelled words, synonyms, transliteration, and ambiguous terms. This makes incident-related information extraction a non-trivial task. However, this information can be valuable for public safety organizations that need to respond in an emergency. This paper proposes an early event-related information extraction and reporting framework that monitors Twitter streams, synthesizes event-specific …


Characterizing Search Activities On Stack Overflow, Jiakun Liu, Sebastian Baltes, Christoph Treude, David Lo, Yun Zhang, Xin Xia Aug 2021

Research Collection School Of Computing and Information Systems

To solve programming issues, developers commonly search on Stack Overflow to seek potential solutions. However, there is a gap between the knowledge developers are interested in and the knowledge they are able to retrieve using search engines. To help developers efficiently retrieve relevant knowledge on Stack Overflow, prior studies proposed several techniques to reformulate queries and generate summarized answers. However, few studies performed a large-scale analysis using real-world search logs. In this paper, we characterize how developers search on Stack Overflow using such logs. By doing so, we identify the challenges developers face when searching on Stack Overflow and seek …


Robust Inference Of Kinase Activity Using Functional Networks, Serhan Yılmaz, Marzieh Ayati, Daniela Schlatzer, A. Ercüment Çiçek, Mark A. Chance, Mehmet Koyutürk Feb 2021

Computer Science Faculty Publications and Presentations

Mass spectrometry enables high-throughput screening of phosphoproteins across a broad range of biological contexts. When complemented by computational algorithms, phospho-proteomic data allow the inference of kinase activity, facilitating the identification of dysregulated kinases in various diseases including cancer, Alzheimer’s disease and Parkinson’s disease. To enhance the reliability of kinase activity inference, we present a network-based framework, RoKAI, that integrates various sources of functional information to capture coordinated changes in signaling. Through computational experiments, we show that phosphorylation of sites in the functional neighborhood of a kinase is significantly predictive of its activity. The incorporation of this knowledge in RoKAI consistently …


Occam Manual, Martin Zwick Jan 2021

Systems Science Faculty Publications and Presentations

Occam is a Discrete Multivariate Modeling (DMM) tool based on the methodology of Reconstructability Analysis (RA). Its typical usage is for analysis of problems involving large numbers of discrete variables. Models are developed which consist of one or more components, which are then evaluated for their fit and statistical significance. Occam can search the lattice of all possible models, or can do detailed analysis on a specific model.

In Variable-Based Modeling (VBM), model components are collections of variables. In State-Based Modeling (SBM), components identify one or more specific states or substates.

Occam provides a web-based interface, which …


Toward Tweet-Mining Framework For Extracting Terrorist Attack-Related Information And Reporting, Farkhund Iqbal, Rabia Batool, Benjamin C. M. Fung, Saiqa Aleem, Ahmed Abbasi, Abdul Rehman Javed Jan 2021

All Works

The widespread popularity of social networking is leading to the adoption of Twitter as an information dissemination tool. Existing research has shown that information dissemination over Twitter has a much broader reach than traditional media and can be used for effective post-incident measures. People use informal language on Twitter, including acronyms, misspelled words, synonyms, transliteration, and ambiguous terms. This makes incident-related information extraction a non-trivial task. However, this information can be valuable for public safety organizations that need to respond in an emergency. This paper proposes an early event-related information extraction and reporting framework that monitors Twitter streams, synthesizes event-specific …


Prevalence, Contents And Automatic Detection Of KL-SATD, Leevi Rantala, Mika Mantyla, David Lo Aug 2020

Research Collection School Of Computing and Information Systems

When developers use different keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD from 33 software repositories with 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. We find that KL-SATD comment contents include words expressing code changes and uncertainty, such as remove, fix, maybe and probably. This makes them different compared to other comments. KL-SATD comment contents are similar to manually labeled SATD comments of prior work. Our machine learning classifier using logistic …
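
As a rough illustration of the keyword labeling described above, the sketch below flags comments that contain TODO or FIXME and reports their share of all comments. It is an assumption-laden toy, not the study's tooling: the function names, the case-sensitive whole-word match, and the sample comments are hypothetical, and the keyword set is limited to the two examples the abstract names.

```python
import re
from typing import List

# Only the two keywords named in the abstract; a real study may use more.
KEYWORDS = ("TODO", "FIXME")
_PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")


def is_kl_satd(comment: str) -> bool:
    """True if the comment contains one of the keywords as a whole word."""
    return _PATTERN.search(comment) is not None


def kl_satd_percentage(comments: List[str]) -> float:
    """Percentage of comments flagged as KL-SATD (0.0 for an empty list)."""
    if not comments:
        return 0.0
    flagged = sum(1 for c in comments if is_kl_satd(c))
    return 100.0 * flagged / len(comments)


# Hypothetical sample comments, not taken from the studied repositories.
sample = [
    "// TODO: remove this hack",
    "// compute the checksum",
    "# FIXME maybe wrong for empty input",
    "/* returns the todo list */",
]
share = kl_satd_percentage(sample)
```

The match is deliberately case-sensitive, so the lowercase "todo" in the last sample comment is not counted. Whether that matches the study's own rule is not stated in the abstract.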


Joint Lattice Of Reconstructability Analysis And Bayesian Network General Graphs, Marcus Harris, Martin Zwick Jul 2020

Systems Science Faculty Publications and Presentations

This paper integrates the structures considered in Reconstructability Analysis (RA) and those considered in Bayesian Networks (BN) into a joint lattice of probabilistic graphical models. This integration and associated lattice visualizations are done in this paper for four variables, but the approach can easily be expanded to more variables. The work builds on the RA work of Klir (1985), Krippendorff (1986), and Zwick (2001), and the BN work of Pearl (1985, 1987, 1988, 2000), Verma (1990), Heckerman (1994), Chickering (1995), Andersson (1997), and others. The RA four variable lattice and the BN four variable lattice partially overlap: there are ten …


Reconstructability Analysis & Its Occam Implementation, Martin Zwick Jul 2020

Systems Science Faculty Publications and Presentations

This talk will describe Reconstructability Analysis (RA), a probabilistic graphical modeling methodology deriving from the 1960s work of Ross Ashby and developed in the systems community in the 1980s and afterwards. RA, based on information theory and graph theory, resembles and partially overlaps Bayesian networks (BN) and log-linear techniques, but also has some unique capabilities. (A paper explaining the relationship between RA and BN will be given in this special session.) RA is designed for exploratory modeling although it can also be used for confirmatory hypothesis testing. In RA modeling, one either predicts some DV from a set of IVs …


Probabilistic Value Selection For Space Efficient Model, Gunarto Sindoro Njoo, Baihua Zheng, Kuo-Wei Hsu, Wen-Chih Peng Jul 2020

Research Collection School Of Computing and Information Systems

An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Unlike the existing methods such as feature selection that removes features and instance selection that eliminates instances, value selection eliminates the values (with respect to each feature) in the dataset with two purposes: reducing the model size and preserving its accuracy. Two probabilistic methods based on information theory's metric are proposed: PVS and P + VS. Extensive experiments on the benchmark datasets with various sizes are elaborated. Those results are compared with the existing preprocessing methods such as feature selection, feature transformation, and instance selection methods. Experiment results …


Developing Big Data Projects In Open University Engineering Courses: Lessons Learned, Juan A. Lara, Aurea Anguera De Sojo, Shadi Aljawarneh, Robert P. Schumaker, Bassam Al-Shargabi Feb 2020

Computer Science Faculty Publications and Presentations

Big Data courses in which students are asked to carry out Big Data projects are becoming more frequent as part of university engineering curricula. In these courses, instructors and students must face a series of special characteristics, difficulties and challenges that are important to know about beforehand, so the lecturer can better plan the subject and manage the teaching methods in order to prevent students' academic dropout and low performance. The goal of this research is to approach this problem by sharing the lessons learned in the process of teaching e-learning courses where students are required to develop …


A Framework For Online Social Network Volatile Data Analysis: A Case For The Fast Fashion Industry, Anoud Bani-Hani, Feras Al-Obeidat, Elhadj Benkhelifa, Oluwasegun Adedugbe Jan 2020

All Works

Consumer satisfaction is important for any business, as it has been shown to be a major factor in consumer loyalty. Identifying satisfaction with products is also important, as it allows businesses to alter production plans based on the level of consumer satisfaction for a product. With consumer satisfaction data being very volatile for some products due to a short requirement period for such products, current consumer satisfaction must be identified within a shorter period before the data becomes obsolete. The fast fashion industry, which is part of the fashion industry, is adopted as a case study in this research. …


Alpha Insurance: A Predictive Analytics Case To Analyze Automobile Insurance Fraud Using SAS Enterprise Miner (TM), Richard McCarthy, Wendy Ceccucci, Mary McCarthy, Leila Halawi Apr 2019

Publications

Automobile insurance fraud costs the insurance industry billions of dollars annually. This case study addresses claim fraud based on data extracted from Alpha Insurance’s automobile claim database. Students are provided the business problem and data sets. Initially, the students are required to develop their hypotheses and analyze the data. This includes identification of any missing or inaccurate data values and outliers as well as evaluation of the 22 variables. Next, students will develop and optimize their predictive models using five techniques: regression, decision tree, neural network, gradient boosting, and ensemble. Then, students will determine which model is the best fit …


Applications Of Supervised Machine Learning In Autism Spectrum Disorder Research: A Review, Kayleigh K. Hyde, Marlena N. Novack, Nicholas Lahaye, Chelsea Parlett-Pelleriti, Raymond Anden, Dennis R. Dixon, Erik Linstead Feb 2019

Engineering Faculty Articles and Research

Autism spectrum disorder (ASD) research has yet to leverage "big data" on the same scale as other fields; however, advancements in easy, affordable data collection and analysis may soon make this a reality. Indeed, there has been a notable increase in research literature evaluating the effectiveness of machine learning for diagnosing ASD, exploring its genetic underpinnings, and designing effective interventions. This paper provides a comprehensive review of 45 papers utilizing supervised machine learning in ASD, including algorithms for classification and text analysis. The goal of the paper is to identify and describe supervised machine learning trends in ASD literature as …


Exploratory Factor Analysis Of Graphical Features For Link Prediction In Social Networks, Lale Madahali, Lotfi Najjar, Margeret Hall Jan 2019

Interdisciplinary Informatics Faculty Proceedings & Presentations

Social networks attract much attention due to their ability to replicate social interactions at scale. Link prediction, or the assessment of which unconnected nodes are likely to connect in the future, is an interesting but non-trivial research area. Three approaches exist to deal with the link prediction problem: feature-based models, Bayesian probabilistic models, and probabilistic relational models. In feature-based methods, graphical features are extracted and used for classification. Usually, these features are subdivided into three feature groups based on their formula. Some formulas are extracted based on neighborhood graph traverse. Accordingly, there exist three groups of features, neighborhood features, path-based features, …
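
To make the notion of a neighborhood feature concrete, here is a minimal sketch of two widely used ones, the common-neighbor count and the Jaccard coefficient, computed for a pair of unconnected nodes. The abstract does not say which features the paper uses, so the choice of these two, the function names, and the toy graph are all assumptions made for illustration.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def common_neighbors(adj: Graph, u: str, v: str) -> int:
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj: Graph, u: str, v: str) -> float:
    """Shared neighbors divided by all neighbors of u or v (0.0 if none)."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)


# Hypothetical undirected toy graph: a-c, a-d, b-c, b-e.
toy: Graph = {
    "a": {"c", "d"},
    "b": {"c", "e"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"b"},
}
cn_ab = common_neighbors(toy, "a", "b")
jc_ab = jaccard(toy, "a", "b")
```

In a feature-based pipeline, scores like these would be computed for each candidate node pair and passed to a classifier as input columns.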


Brexit: A Granger Causality Of Twitter Political Polarisation On The FTSE 100 Index And The Pound, James Usher, Lucia Morales, Pierpaolo Dondio Jan 2019

Conference papers

BREXIT is the single biggest geopolitical event in British history since WWII. Whilst the political fallout has become a tragicomedy, the political ramifications have had a profound impact on the Pound and the FTSE 100 index. This paper examines Twitter political discourse surrounding the BREXIT withdrawal agreement. In particular we focus on the discussions around four different exit strategies known as “Norway”, “Article 50”, the “Backstop” and “No Deal” and their effect on the pound and FTSE 100 index from the period of rumblings of the cancellation of the Meaningful Vote on December 10th 2018 inclusive of second defeat on the …