Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Machine learning

Articles 121 - 140 of 140

Full-Text Articles in Data Science

Hierarchical Aggregation Of Multidimensional Data For Efficient Data Mining, Safaa Khalil Alwajidi Dec 2020

Dissertations

Big data analysis is essential for many smart applications in areas such as connected healthcare, intelligent transportation, human activity recognition, environment, and climate change monitoring. Traditional data mining algorithms do not scale well to big data due to the enormous number of data points and the velocity of their generation. Mining and learning from big data require time- and memory-efficient techniques, albeit at the cost of a possible loss in accuracy. This research focuses on the mining of big data using aggregated data as input. We developed a data structure that is to be used to aggregate data at multiple resolutions. …
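
As a rough illustration of what aggregating data at multiple resolutions can look like, the sketch below bins 2-D points into grids whose cell size doubles at each level and keeps a count and coordinate sums per cell. It is not the data structure developed in the dissertation: the helper names `aggregate_levels` and `cell_means`, the grid-based binning, and the choice of summary statistics are all assumptions made for illustration.

```python
from collections import defaultdict
from math import floor


def aggregate_levels(points, base_cell=1.0, levels=3):
    """Hypothetical helper: bin 2-D points into grids whose cell size
    doubles at each level, keeping a count and coordinate sums per cell."""
    pyramid = []
    for level in range(levels):
        cell = base_cell * (2 ** level)  # cell size doubles per level
        grid = defaultdict(lambda: [0, 0.0, 0.0])  # count, sum_x, sum_y
        for x, y in points:
            key = (floor(x / cell), floor(y / cell))
            stats = grid[key]
            stats[0] += 1
            stats[1] += x
            stats[2] += y
        pyramid.append(dict(grid))
    return pyramid


def cell_means(grid):
    """Return the mean point of each non-empty cell in one grid level."""
    return {key: (sx / n, sy / n) for key, (n, sx, sy) in grid.items()}


# Illustrative input only; coarser levels hold fewer, larger cells.
points = [(0.2, 0.3), (0.8, 0.1), (1.5, 1.5), (3.2, 0.4)]
pyramid = aggregate_levels(points, base_cell=1.0, levels=3)
```

A downstream mining step could then work on the per-cell summaries of a coarse level instead of the raw points, trading some accuracy for time and memory.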


Using Data Analytics To Predict Students Score, Nang Laik Ma, Gim Hong Chua Nov 2020

Research Collection School Of Computing and Information Systems

Education is very important to Singapore, and the government has continued to invest heavily in our education system to make it one of the world-class systems today. A strong foundation of Science, Technology, Engineering, and Mathematics (STEM) was what underpinned Singapore's development over the past 50 years. PISA is a triennial international survey that evaluates education systems worldwide by testing the skills and knowledge of 15-year-old students who are nearing the end of compulsory education. In this paper, the authors used the PISA data from 2012 and 2015 and developed machine learning techniques to predict the students' scores and understand the …


A New Efficient Method To Detect Genetic Interactions For Lung Cancer Gwas, Jennifer Luyapan, Xuemei Ji, Siting Li, Xiangjun Xiao, Dakai Zhu, Eric J. Duell, David C. Christiani, Matthew B. Schabath, Susanne M. Arnold, Shanbeh Zienolddiny, Hans Brunnström, Olle Melander, Mark D. Thornquist, Todd A. Mackenzie, Christopher I. Amos, Jiang Gui Oct 2020

Markey Cancer Center Faculty Publications

BACKGROUND: Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited due to computational and statistical challenges. We addressed the computational burden encountered when detecting SNP interactions for survival analysis, such as age of disease-onset. To confront this problem, we developed a novel algorithm, called the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which used Martingale Residuals as the outcome parameter to estimate survival outcomes, and implemented the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of …


European Floating Strike Lookback Options: Alpha Prediction And Generation Using Unsupervised Learning, Tristan Lim, Aldy Gunawan, Chin Sin Ong Oct 2020

Research Collection School Of Computing and Information Systems

This research utilized the intrinsic quality of European floating strike lookback call options, alongside selected return and volatility parameters, in a K-means clustering environment, to recommend an alpha generative trading strategy. The result is an elegant, easy-to-use alpha strategy based on the option mechanisms, which identifies investment assets with a high degree of significance. In an upward trending market, the research identified the European floating strike lookback call option as an evaluative criterion and investable asset, which would allow investors both to predict and to profit from alpha opportunities. The findings will be useful for (i) buy-side investors seeking alpha generation and/or …


Predicting Attrition - A Driver For Creating Value, Realizing Strategy, And Refining Key Hr Processes, Kevin Mendonsa, Maureen Stolberg, Vivek Viswanathan, Scott Crum Aug 2020

SMU Data Science Review

Talent is the most important asset for every organization's success. While attrition (or churn) and turnover can refer to both employees and customers, this paper will focus on employee attrition only. Many organizations accept attrition as an inevitable cost of doing business and do nothing to adopt or implement mitigating strategies to combat it. World-class companies, on the other hand, take deliberate measures to understand, control, and mitigate attrition (turnover) at every stage. Unmitigated attrition can have a devastating effect on an organization's bottom line and market value. In addition, the “invisible” costs of low employee morale, reduced employee …


Gaining Computational Insight Into Psychological Data: Applications Of Machine Learning With Eating Disorders And Autism Spectrum Disorder, Natalia Rosenfield Aug 2020

Computational and Data Sciences (PhD) Dissertations

Over the past 100 years, assessment tools have been developed that allow us to explore mental and behavioral processes that could not be measured before. However, conventional statistical models used for psychological data are lacking in thoroughness and predictability. This provides a perfect opportunity to use machine learning to study the data in a novel way. In this paper, we present examples of using machine learning techniques with data in three areas: eating disorders, body satisfaction, and Autism Spectrum Disorder (ASD). We explore clustering algorithms as well as virtual reality (VR).

Our first study employs the k-means clustering algorithm to …


Applications Of Artificial Intelligence And Graph Theory To Cyberbullying, Jesse D. Simpson Aug 2020

MSU Graduate Theses

Cyberbullying is an ongoing and devastating issue in today's online social media. Abusive users engage in cyber-harassment by utilizing social media to send posts, private messages, tweets, or pictures to innocent social media users. Detecting and preventing cases of cyberbullying is crucial. In this work, I analyze multiple machine learning, deep learning, and graph analysis algorithms and explore their applicability and performance in pursuit of a robust system for detecting cyberbullying. First, I evaluate the performance of the machine learning algorithms Support Vector Machine, Naïve Bayes, Random Forest, Decision Tree, and Logistic Regression. This yielded positive results and obtained upwards …


Prediction Of Feed Utilization Performance In Clarias Gariepinus Using Multiple Linear Regression In Machine Learning, Adekunle Oluwatosin Familusi Jun 2020

Journal of Bioresource Management

Machine learning models can be used to make predictions about nutrient utilization performance index using available proximate analysis data on feed composition. Data from similar experiments on nutrient utilization performance were used to fit a multiple linear regression model for the prediction of four performance indexes. A relationship with a strength of 0.57 between Specific Growth Rate and percentage inclusion was noted, along with a negative relationship between protein efficiency and protein content. A negative relationship between Nitrogen Free Extract (NFE) and Protein Efficiency Ratio (PER) at NFE content ≥25% was observed. PER was predicted with 85% accuracy, while Weight Gain …


Pathways To The Native Storyteller: A Method To Enable Computational Story Understanding, Aramide O. Kehinde Jun 2020

College of Computing and Digital Media Dissertations

The primary objective of this thesis is to develop a method that uses machine learning algorithms to enable computational story understanding. This research is conducted with the aim of establishing a system called the Native Storyteller that plans and creates storytelling experiences for human users. The thesis first establishes the desired capabilities of the system and then examines in depth how to enable story understanding, which is the core ability the system needs to function. As such, the research places emphasis on natural language processing and its application to solving key problems in this context. Namely, machine representation of story …


Development Of Fully Balanced Ssfp And Computer Vision Applications For Mri-Assisted Radiosurgery (Mars), Jeremiah Sanders May 2020

Dissertations & Theses (Open Access)

Prostate cancer is the second most common cancer in men and the second-leading cause of cancer death in men. Brachytherapy is a highly effective treatment option for prostate cancer and is the most cost-effective initial treatment among all therapeutic options for low- to intermediate-risk prostate cancer patients. In low-dose-rate (LDR) brachytherapy, verifying the location of the radioactive seeds within the prostate and in relation to critical normal structures after seed implantation is essential to ensuring positive treatment outcomes.

One current gap in knowledge is how to simultaneously image the prostate, surrounding anatomy, and radioactive seeds within the …


High Performance And Machine Learning Algorithms For Brain Fmri Data, Taban Eslami Apr 2020

Dissertations

Brain disorders are very difficult to diagnose for reasons such as the overlapping nature of symptoms, individual differences in brain structure, the lack of medical tests, and the unknown causes of some disorders. The current psychiatric diagnostic process is based on behavioral observation and may be prone to misdiagnosis.

Noninvasive brain imaging technologies such as Magnetic Resonance Imaging (MRI) and functional Magnetic Resonance Imaging (fMRI) make the process of understanding the structure and function of the brain easier. Quantitative analysis of brain imaging data using machine learning and data mining techniques can be advantageous not only to increase the accuracy of brain disorder …


Algorithm Selection Framework: A Holistic Approach To The Algorithm Selection Problem, Marc W. Chalé Mar 2020

Theses and Dissertations

A holistic approach to the algorithm selection problem is presented. The “algorithm selection framework” uses a combination of user input and meta-data to streamline the algorithm selection for any data analysis task. The framework removes the conjecture of the common trial-and-error strategy and generates a preference-ranked list of recommended analysis techniques. The framework is applied to nine analysis problems. Each of the recommended analysis techniques is implemented on the corresponding data sets. Algorithm performance is assessed using the primary metric of recall and the secondary metric of run time. In six of the problems, the recall of …
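
To make the evaluation criteria concrete, the sketch below computes recall and then orders candidate techniques by recall first and run time second. It only illustrates the two metrics named in the abstract; it is not the framework's recommendation procedure, and the function names, technique names, and numbers are hypothetical.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def rank_techniques(results):
    """Sort (name, recall, run_time_seconds) tuples: highest recall first,
    ties broken by the shortest run time."""
    return sorted(results, key=lambda r: (-r[1], r[2]))


# Hypothetical numbers, for illustration only (not results from the thesis).
results = [
    ("technique_a", 0.90, 12.0),
    ("technique_b", 0.95, 30.0),
    ("technique_c", 0.90, 4.0),
]
ranking = [name for name, _, _ in rank_techniques(results)]
```

Here `technique_b` ranks first on recall alone, while the tie between the other two is settled by run time.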


A Systematic Literature Survey Of Unmanned Aerial Vehicle Based Structural Health Monitoring, Sreehari Sreenath Jan 2020

Theses, Dissertations and Capstones

Unmanned Aerial Vehicles (UAVs) are being employed in a multitude of civil applications owing to their ease of use, low maintenance, affordability, high mobility, and ability to hover. UAVs are being utilized for real-time monitoring of road traffic, providing wireless coverage, remote sensing, search and rescue operations, delivery of goods, security and surveillance, precision agriculture, and civil infrastructure inspection. They are the next big revolution in technology and civil infrastructure, and they are expected to command a market value of more than $45 billion. The thesis surveys the UAV-assisted Structural Health Monitoring (SHM) literature over the last decade and categorizes UAVs …


Cooperative Co-Evolution For Feature Selection In Big Data With Random Feature Grouping, A.N.M. Bazlur Rashid, Mohiuddin Ahmed, Leslie F. Sikos, Paul Haskell-Dowland Jan 2020

Research outputs 2014 to 2021

© 2020, The Author(s). A massive amount of data is generated with the evolution of modern technologies. This high-throughput data generation results in Big Data, which consist of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique used to select a subset of relevant features that represent the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain. A variant of EAs, called cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for optimization problems. The existing solutions have poor performance because …


Modelling Interleaved Activities Using Language Models, Eoin Rogers, Robert J. Ross, John D. Kelleher Jan 2020

Conference papers

We propose a new approach to activity discovery, based on the neural language modelling of streaming sensor events. Our approach proceeds in multiple stages: we build binary links between activities using probability distributions generated by a neural language model trained on the dataset, and combine the binary links to produce complex activities. We then use the activities as sensor events, allowing us to build complex hierarchies of activities. We put an emphasis on dealing with interleaving, which represents a major challenge for many existing activity discovery systems. The system is tested on a realistic dataset, demonstrating it as a promising …


Disaster Damage Categorization Applying Satellite Images And Machine Learning Algorithm, Farinaz Sabz Ali Pour, Adrian Gheorghe Jan 2020

Engineering Management & Systems Engineering Faculty Publications

Special information has a significant role in disaster management. Land cover mapping can detect short- and long-term changes and monitor vulnerable habitats. It is an effective evaluation to include in the disaster management system to protect conservation areas. The critical visual and statistical information presented to decision-makers can help in mitigation or adaptation before a threshold is crossed. This paper aims to contribute to both academia and practice by offering a potential solution to enhance the effectiveness of disaster data sources. The key research question that the authors try to answer in this paper is how …


Experiments On The Neural Network Approach To The Handwritten Digit Classification Problem, William Meissner Jan 2020

Electronic Theses and Dissertations

When the MNIST dataset was introduced in 1998, training a network took multiple weeks and produced results far less accurate than those an average CPU can produce within a couple of hours today. While this indicates that training a network on such a dataset is not the complicated problem it may have been twenty years ago, the MNIST dataset makes a good tool for study and testing with beginner and medium complexity neural networks. This paper follows along with the work presented in the online textbook “Neural Networks and Deep Learning” by Michael Nielsen and an updated …


Statistical Machine Learning Methods For Mining Spatial And Temporal Data, Fei Tan May 2019

Dissertations

Spatial and temporal dependencies are ubiquitous properties of data in numerous domains. The popularity of spatial and temporal data mining has thus grown with the increasing prevalence of massive data. The presence of spatial and temporal attributes not only provides complementary useful perspectives, but also poses new challenges to the representation and integration into the learning procedure. In this dissertation, the involved spatial and temporal dependencies are explored with three genres: sample-wise, feature-wise, and target-wise. A family of novel methodologies is developed accordingly for the dependency representation in respective scenarios.

First, dependencies among discrete, continuous and repeated observations are studied …


Watersheds For Semi-Supervised Classification, Aditya Challa, Sravan Danda, B. S.Daya Sagar, Laurent Najman May 2019

Journal Articles

The watershed technique from mathematical morphology (MM) is one of the most widely used operators for image segmentation. Recently, watersheds have been adapted to edge-weighted graphs, allowing for wider applicability. However, a few questions remain to be answered: How do the boundaries of the watershed operator behave? Which loss function does the watershed operator optimize? How does the watershed operator relate to existing ideas from machine learning? In this letter, a framework is developed that allows one to answer these questions. This is achieved by generalizing the maximum margin principle to maximum margin partition and proposing a generic solution, morphMedian, resulting …


Constructing Interactive Visual Classification, Clustering And Dimension Reduction Models For N-D Data, Boris Kovalerchuk, Dmytro Dovhalets Jul 2017

Computer Science Faculty Scholarship

The exploration of multidimensional datasets of all possible sizes and dimensions is a long-standing challenge in knowledge discovery, machine learning, and visualization. While multiple efficient visualization methods for n-D data analysis exist, the loss of information, occlusion, and clutter continue to be a challenge. This paper proposes and explores a new interactive method for visual discovery of n-D relations for supervised learning. The method includes automatic, interactive, and combined algorithms for discovering linear relations, dimension reduction, and generalization for non-linear relations. This method is a special category of reversible General Line Coordinates (GLC). It produces graphs in 2-D that represent …