Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

Articles 1 - 30 of 629

Full-Text Articles in Physical Sciences and Mathematics

Experimental Comparison Of Features And Classifiers For Android Malware Detection, Lwin Khin Shar, Biniam Fisseha Demissie, Mariano Ceccato, Wei Minn Oct 2020

Research Collection School Of Information Systems

Android platform has dominated the smart phone market for years now and, consequently, gained a lot of attention from attackers. Malicious apps (malware) pose a serious threat to the security and privacy of Android smart phone users. Available approaches to detect mobile malware based on machine learning rely on features extracted with static analysis or dynamic analysis techniques. Different types of machine learning classifiers (such as support vector machine and random forest) and deep learning classifiers (based on deep neural networks) are then trained on extracted features, to produce models that can be used to detect mobile malware ...


At The Interface Of Algebra And Statistics, Tai-Danae Bradley Jun 2020

All Dissertations, Theses, and Capstone Projects

This thesis takes inspiration from quantum physics to investigate mathematical structure that lies at the interface of algebra and statistics. The starting point is a passage from classical probability theory to quantum probability theory. The quantum version of a probability distribution is a density operator, the quantum version of marginalizing is an operation called the partial trace, and the quantum version of a marginal probability distribution is a reduced density operator. Every joint probability distribution on a finite set can be modeled as a rank one density operator. By applying the partial trace, we obtain reduced density operators whose diagonals ...
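A minimal sketch of the construction this abstract describes, assuming a small 2 x 2 joint distribution chosen only for illustration (the values in `p` and all function names are hypothetical and not taken from the thesis). It builds the unit vector whose entries are the square roots of the joint probabilities, forms the partial trace of the corresponding rank-one density operator over the second factor, and reads off the diagonal, which should match the classical marginal.

```python
import math

# Hypothetical joint distribution p(x, y) on a 2 x 2 set (illustrative values).
p = [[0.5, 0.25],
     [0.0, 0.25]]

def amplitudes(joint):
    """Return psi with psi[x][y] = sqrt(p(x, y)); psi is a unit vector."""
    return [[math.sqrt(v) for v in row] for row in joint]

def reduced_density_x(joint):
    """Partial trace over Y of the rank-one operator |psi><psi|.

    Entry (x, x') of the result is sum_y psi[x][y] * psi[x'][y].
    """
    psi = amplitudes(joint)
    n = len(psi)
    return [[sum(a * b for a, b in zip(psi[i], psi[j])) for j in range(n)]
            for i in range(n)]

def marginal_x(joint):
    """Classical marginal: sum the joint distribution over y."""
    return [sum(row) for row in joint]

rho_x = reduced_density_x(p)
diagonal = [rho_x[i][i] for i in range(len(rho_x))]
```

In this toy case the diagonal agrees with `marginal_x(p)` up to floating-point rounding, while the off-diagonal entry `rho_x[0][1]` is nonzero, so the reduced operator holds more than the marginal alone.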


Deploying Machine Learning For A Sustainable Future, Cary Coglianese May 2020

Faculty Scholarship at Penn Law

To meet the environmental challenges of a warming planet and an increasingly complex, high tech economy, government must become smarter about how it makes policies and deploys its limited resources. It specifically needs to build a robust capacity to analyze large volumes of environmental and economic data by using machine-learning algorithms to improve regulatory oversight, monitoring, and decision-making. Three challenges can be expected to drive the need for algorithmic environmental governance: more problems, less funding, and growing public demands. This paper explains why algorithmic governance will prove pivotal in meeting these challenges, but it also presents four likely obstacles that ...


Yoga Pose Classification Using Deep Learning, Shruti Kothari May 2020

Master's Projects

Human pose estimation is a deep-rooted problem in computer vision that has exposed many challenges in the past. Analyzing human activities is beneficial in many fields like video surveillance, biometrics, assisted living, at-home health monitoring, etc. With our fast-paced lives these days, people usually prefer exercising at home but feel the need of an instructor to evaluate their exercise form. As these resources are not always available, human pose recognition can be used to build a self-instruction exercise system that allows people to learn and practice exercises correctly by themselves. This project lays the foundation for building such a system ...


Emerging Technologies In Healthcare: Analysis Of Unos Data Through Machine Learning, Reyhan Merekar May 2020

Student Theses

The healthcare industry is primed for a massive transformation in the coming decades due to emerging technologies such as Artificial Intelligence (AI) and Machine Learning. With a practical application to the UNOS (United Network of Organ Sharing) database, this thesis seeks to investigate how Machine Learning and analytic methods may be used to predict one-year heart transplantation outcomes. This study also sought to improve on predictive performances from prior studies by analyzing both Donor and Recipient data. Models built with algorithms such as Stacking and Tree Boosting gave the highest performance, with AUCs of 0.6810 and 0.6804 ...


Using Color Thresholding And Contouring To Understand Coral Reef Biodiversity, Scott Vuong Tran May 2020

Master's Projects

This paper presents research outcomes of understanding coral reef biodiversity through the usage of various computer vision applications and techniques. It aims to help further analyze and understand the coral reef biodiversity through the usage of color thresholding and contouring onto images of the ARMS plates to extract groups of microorganisms based on color. The results are comparable to the manual markup tool developed to do the same tasks and show that the manual process can be sped up using computer vision. The paper presents an automated way to extract groups of microorganisms based on color without the use of ...


Understanding Impact Of Twitter Feed On Bitcoin Price And Trading Patterns, Ashrit Deebadi May 2020

Master's Projects

“Cryptocurrency trading was one of the most exciting jobs of 2017.” “Bitcoin,” “Blockchain,” and “Bitcoin Trading” were the most searched words in Google during 2017. High return on investment has attracted many people towards this crypto market. Existing research has shown that the trading price is completely based on speculation, and its trading volume is highly impacted by news media. This paper discusses the existing work to evaluate the sentiment and price of the cryptocurrency, and the issues with the current trading models. It builds possible solutions to better understand the semantic orientation of text by comparing different machine learning techniques ...


Applications Of Machine Learning In High-Frequency Trade Direction Classification, Jared E. Hansen May 2020

All Graduate Theses and Dissertations

The correct assignment of trades as buyer-initiated or seller-initiated is paramount in many quantitative finance studies. Simple decision rule methods have been used for signing trades since many data sets available to researchers do not include the sign of each trade executed. By utilizing these decision rule methods, as well as engineering new variables from available data, we have demonstrated that machine learning models outperform prior methods for accurately signing trades as buys and sells, achieving state-of-the-art results. The best model developed was 4.5 percentage points more accurate than older methods when predicting onto unseen data. Since finance and ...
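The abstract does not name the decision rules it builds on. As an illustration only, the sketch below implements the tick test, one commonly used rule for signing trades from prices alone; choosing it here is an assumption, and the function name and example prices are hypothetical. A trade is marked as a buy if its price is above the previous trade price, as a sell if below, and it inherits the previous sign when the price is unchanged.

```python
def tick_rule(prices):
    """Sign each trade: +1 buyer-initiated, -1 seller-initiated, 0 unknown.

    Assumption: only the sequence of trade prices is available.
    """
    signs = []
    last_sign = 0      # no information before the first price change
    previous = None
    for price in prices:
        if previous is not None:
            if price > previous:
                last_sign = 1       # uptick: classify as a buy
            elif price < previous:
                last_sign = -1      # downtick: classify as a sell
            # unchanged price: carry the last sign forward (zero tick)
        signs.append(last_sign)
        previous = price
    return signs

# Hypothetical trade prices, in execution order.
example_prices = [10.00, 10.01, 10.01, 10.00, 10.00, 10.02]
example_signs = tick_rule(example_prices)
```

Rule outputs like these could serve both as a baseline and as input features for a learned classifier, which is the general direction the abstract describes.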


Straggler-Resistant Distributed Matrix Computation Via Coding Theory: Removing A Bottleneck In Large-Scale Data Processing, Aditya Ramamoorthy, Anindya Bijoy Das, Li Tang May 2020

Electrical and Computer Engineering Publications

The current BigData era routinely requires the processing of large scale data on massive distributed computing clusters. Such large scale clusters often suffer from the problem of "stragglers", which are defined as slow or failed nodes. The overall speed of a computational job on these clusters is typically dominated by stragglers in the absence of a sophisticated assignment of tasks to the worker nodes. In recent years, approaches based on coding theory (referred to as "coded computation") have been effectively used for straggler mitigation. Coded computation offers significant benefits for specific classes of problems such as distributed matrix computations (which ...


Investigating Machine Learning Techniques For Gesture Recognition With Low-Cost Capacitive Sensing Arrays, Michael Fahr Jr. May 2020

Computer Science and Computer Engineering Undergraduate Honors Theses

Machine learning has proven to be an effective tool for forming models to make predictions based on sample data. Supervised learning, a subset of machine learning, can be used to map input data to output labels based on pre-existing paired data. Datasets for machine learning can be created from many different sources and vary in complexity, with popular datasets including the MNIST handwritten dataset and CIFAR10 image dataset. The focus of this thesis is to test and validate multiple machine learning models for accurately classifying gestures performed on a low-cost capacitive sensing array. Multiple neural networks are trained using gesture ...


Applying Imitation And Reinforcement Learning To Sparse Reward Environments, Haven Brown May 2020

Computer Science and Computer Engineering Undergraduate Honors Theses

The focus of this project was to shorten the time it takes to train reinforcement learning agents to perform better than humans in a sparse reward environment. Finding a general purpose solution to this problem is essential to creating agents in the future capable of managing large systems or performing a series of tasks before receiving feedback. The goal of this project was to create a transition function between an imitation learning algorithm (also referred to as a behavioral cloning algorithm) and a reinforcement learning algorithm. The goal of this approach was to allow an agent to first learn to ...


Comparison Of A Collective Intelligence Tailored Messaging System On Smoking Cessation Between African American And White People Who Smoke: Quasi-Experimental Design, Jamie Faro, Catherine S. Nagawa, Jeroan J. Allison, Stephenie C. Lemon, Kathleen M. Mazor, Thomas K. Houston, Rajani S. Sadasivam Apr 2020

Open Access Articles

BACKGROUND: The Patient Experience Recommender System for Persuasive Communication Tailoring (PERSPeCT) is a machine learning recommender system with a database of messages to motivate smoking cessation. PERSPeCT uses the collective intelligence of users (ie, preferences and feedback) and demographic and smoking profiles to select motivating messages. PERSPeCT may be more beneficial for tailoring content to minority groups influenced by complex, personally relevant factors.

OBJECTIVE: The objective of this study was to describe and evaluate the use of PERSPeCT in African American people who smoke compared with white people who smoke.

METHODS: Using a quasi-experimental design, we compared African American people ...


Effective Smart Video Survillance And Alertinng, Kranthikumar Putta Apr 2020

Other Student Works

Machine learning models and AI have been popular terms in the past couple of years. In almost every aspect of our daily lives we use these machine learning models and AI, knowingly or unknowingly. The aim of this research paper is to demonstrate a reduction of human effort in camera surveillance by using machine learning models such as object detection and facial recognition.

TensorFlow is a leading open source tool, developed and published by Google, that helps to build object detection and facial recognition libraries. The code we are about to develop allows a user to train a model with their choice of images of objects and faces of ...


Does Applying Deep Learning In Financial Sentiment Analysis Lead To Better Classification Performance?, Tao Wang, Changhe Yuan, Cuiyuan Wang Apr 2020

Publications and Research

Using a unique data set from Seeking Alpha, we compare the deep learning approach with traditional machine learning approaches in classifying financial text. We apply the long short-term memory (LSTM) as the deep learning method and Naive Bayes, SVM, Logistic Regression, and XGBoost as the traditional machine learning approaches. The results suggest that the LSTM model outperforms the conventional machine learning methods on all metrics. Based on the t-SNE graph, the success of the LSTM model is partially explained as the high-accuracy LSTM model distinguishes between positive and negative important sentiment words while those words are chosen based on SHAP values ...


The Effects Of Mixed-Initiative Visualization Systems On Exploratory Data Analysis, Adam Kern Apr 2020

Engineering and Applied Science Theses & Dissertations

The main purpose of information visualization is to act as a window between a user and data. Historically, this has been accomplished via a single-agent framework: the only decision-maker in the relationship between visualization system and analyst is the analyst herself. Yet this framework arose not from first principles, but from necessity: prior to this decade, computers were limited in their decision-making capabilities, especially in the face of large, complex datasets and visualization systems. This thesis aims to present the design and evaluation of a mixed-initiative system that aids the user in handling large, complex datasets and dense visualization systems ...


Atmospheric Contrail Detection With A Deep Learning Algorithm, Nasir Siddiqui Apr 2020

Student Research, Papers, and Creative Works

Aircraft contrail emission is widely believed to be a contributing factor to global climate change. We have used machine learning techniques on images containing contrails in hopes of being able to identify those which contain contrails and those that do not. The developed algorithm processes data on contrail characteristics as captured by long-term image records. Images collected by the United States Department of Energy’s Atmospheric Radiation Measurement (ARM) user facility were used to train a deep convolutional neural network for the purpose of this contrail classification. The neural network model was trained with 1600 images taken by the Total ...


Applications Of Machine Learning To Threat Intelligence, Intrusion Detection And Malware, Charity Barker Apr 2020

Senior Honors Theses

Artificial Intelligence (AI) and Machine Learning (ML) are emerging technologies with applications to many fields. This paper is a survey of use cases of ML for threat intelligence, intrusion detection, and malware analysis and detection. Threat intelligence, especially attack attribution, can benefit from the use of ML classification. False positives from rule-based intrusion detection systems can be reduced with the use of ML models. Malware analysis and classification can be made easier by developing ML frameworks to distill similarities between the malicious programs. Adversarial machine learning will also be discussed, because while ML can be used to solve problems or ...


Text Analytics, Nlp, And Accounting Research, Richard M. Crowley Apr 2020

Research Collection School Of Accountancy

The presentation covered: What is text analytics and NLP?; How text analytics has evolved in the accounting literature since the 1980s; What current (as of 2020) methods are used in the literature; What methods are on the horizon.


Learning Latent Characteristics Of Data And Models Using Item Response Theory, John P. Lalor Mar 2020

Doctoral Dissertations

A supervised machine learning model is trained with a large set of labeled training data, and evaluated on a smaller but still large set of test data. Especially with deep neural networks (DNNs), the complexity of the model requires that an extremely large data set is collected to prevent overfitting. It is often the case that these models do not take into account specific attributes of the training set examples, but instead treat each equally in the process of model training. This is due to the fact that it is difficult to model latent traits of individual examples at the ...


Event-Based Visual-Inertial Odometry Using Smart Features, Zachary P. Friedel Mar 2020

Theses and Dissertations

Event-based cameras are a novel type of visual sensor that operate under a unique paradigm, providing asynchronous data on the log-level changes in light intensity for individual pixels. This hardware-level approach to change detection allows these cameras to achieve ultra-wide dynamic range and high temporal resolution. Furthermore, the advent of convolutional neural networks (CNNs) has led to state-of-the-art navigation solutions that now rival or even surpass human engineered algorithms. The advantages offered by event cameras and CNNs make them excellent tools for visual odometry (VO). This document presents the implementation of a CNN trained to detect and describe features within ...


Retiming Smoke Simulation Using Machine Learning, Samuel Charles Gérard Giraud Carrier Mar 2020

Theses and Dissertations

Art-directability is a crucial aspect of creating aesthetically pleasing visual effects that help tell stories. A particularly common method of art direction is the retiming of a simulation. Unfortunately, the means of retiming an existing simulation sequence which preserves the desired shapes is an ill-defined problem. Naively interpolating values between frames leads to visual artifacts such as choppy frames or jittering intensities. Due to the difficulty in formulating a proper interpolation method we elect to use a machine learning approach to approximate this function. Our model is based on the ODE-net structure and reproduces a set of desired time samples ...


Automated Detection And Mitigation Of Inefficient Visual Searching Using Electroencephalography And Machine Learning, Joshua P. Gallaher Mar 2020

Theses and Dissertations

Decisions made during the high-stress and fast-paced operations of the military are extremely prone to cognitive biases. A commonly known cognitive bias is a confirmation bias, or the inappropriate bolstering of an unknown hypothesis. One such critical military operation that can fall prey to a confirmation bias is a visual search. During a visual search, a military operator must perform a visual scan of an environment for a specific target. However, the visual search process can fall prey to the same confirmation bias which can cause inefficient searches. This study elicits inefficient visual search patterns and applies various mitigation techniques ...


Characterizing Regime-Based Flow Uncertainty, John L. Fioretti Mar 2020

Theses and Dissertations

The goal of this work is to develop a regime-based quantification of horizontal wind field uncertainty utilizing a global ensemble numerical weather prediction model. In this case, the Global Ensemble Forecast System Reforecast (GEFSR) data is utilized. The machine learning algorithm that is employed is the mini-batch K-means clustering algorithm. Horizontal flow fields at 850 hPa are clustered, and the forecast uncertainty in these flow fields is calculated for different forecast times for regions across the globe. This provides end-users with quantified flow-based forecast uncertainty.
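For readers unfamiliar with the clustering step named above, here is a from-scratch sketch of mini-batch K-means on toy two-dimensional points. It is not the author's code: the data, the function names, the explicit `init_centers` argument, and the parameter defaults are all assumptions made for illustration, and real flow fields would be much higher-dimensional vectors. Each iteration samples a small batch, assigns every batch point to its nearest center, and moves that center toward the point with a per-center step size of 1/count.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(centers, point):
    """Index of the center closest to the point."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(centers[j], point))

def mini_batch_kmeans(points, init_centers, batch_size=4, iterations=100, seed=0):
    """Mini-batch K-means with a per-center learning rate of 1/count."""
    rng = random.Random(seed)
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)
    for _ in range(iterations):
        # Sample a small batch instead of scanning the whole data set.
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Cache assignments before any center in this batch is moved.
        assignments = [nearest_center(centers, p) for p in batch]
        for point, j in zip(batch, assignments):
            counts[j] += 1
            rate = 1.0 / counts[j]
            centers[j] = [c + rate * (x - c) for c, x in zip(centers[j], point)]
    return centers

# Hypothetical toy data: two well-separated groups of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
toy_centers = mini_batch_kmeans(toy_points, [toy_points[0], toy_points[-1]])
```

With one initial center in each group, each center stays a running average of points from its own group, so the two centers end up inside the two unit squares that contain the groups.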


Cyber-Physical Security With Rf Fingerprint Classification Through Distance Measure Extensions Of Generalized Relevance Learning Vector Quantization, Trevor J. Bihl, Todd J. Paciencia, Kenneth W. Bauer Jr., Michael A. Temple Feb 2020

Faculty Publications

Radio frequency (RF) fingerprinting extracts fingerprint features from RF signals to protect against masquerade attacks by enabling reliable authentication of communication devices at the “serial number” level. Facilitating the reliable authentication of communication devices are machine learning (ML) algorithms which find meaningful statistical differences between measured data. The Generalized Relevance Learning Vector Quantization-Improved (GRLVQI) classifier is one ML algorithm which has shown efficacy for RF fingerprinting device discrimination. GRLVQI extends the Learning Vector Quantization (LVQ) family of “winner take all” classifiers that develop prototype vectors (PVs) which represent data. In LVQ algorithms, distances are computed between exemplars and PVs, and ...


Developing And Improving Risk Models Using Machine-Learning Based Algorithms, Yan Wang, Sherry Ni Jan 2020

Grey Literature from PhD Candidates

The objective of this study is to develop a good risk model for classifying business delinquency by simultaneously exploring several machine learning-based methods including regularization, hyperparameter optimization, and model ensembling algorithms. The rationale under the analyses is firstly to obtain good base binary classifiers (including Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Artificial Neural Networks (ANN)) via regularization and appropriate settings of hyper-parameters. Then two model ensembling algorithms including bagging and boosting are performed on the good base classifiers for further model improvement. The models are evaluated using accuracy, Area Under the Receiver Operating Characteristic Curve (AUC ...


Learning-Guided Network Fuzzing For Testing Cyber-Physical System Defences, Yuqi Chen, Christopher M. Poskitt, Jun Sun, Sridhar Adepu, Fan Zhang Jan 2020

Research Collection School Of Information Systems

The threat of attack faced by cyber-physical systems (CPSs), especially when they play a critical role in automating public infrastructure, has motivated research into a wide variety of attack defence mechanisms. Assessing their effectiveness is challenging, however, as realistic sets of attacks to test them against are not always available. In this paper, we propose smart fuzzing, an automated, machine learning guided technique for systematically finding 'test suites' of CPS network attacks, without requiring any knowledge of the system's control programs or physical processes. Our approach uses predictive machine learning models and metaheuristic search algorithms to guide the fuzzing ...


Gradient Boosting For Survival Analysis With Applications In Oncology, Nam Phuong Nguyen Jan 2020

Graduate Theses and Dissertations

Cancer is one of the most deadly diseases that the world has been fighting against for decades. An enormous amount of research has been conducted, via a wide range of approaches, ranging from genetic analysis to mathematical modeling. Survival analysis is a well-performed methodology frequently used to estimate the survival probability of a patient. Although there have been a large number of methods for survival analysis, efficient exploration of a high-dimensional feature space has been challenging due to its computational cost and complexity. This thesis adapts the component-wise gradient boosting algorithms for cancer survival analysis, and also proposes a new ...


Multi-Class Twitter Data Categorization And Geocoding With A Novel Computing Framework, Sakib Mahmud Khan, Mashrur Chowdhury, Linh B. Ngo, Amy Apon Jan 2020

Computer Science

This study details the progress in transportation data analysis with a novel computing framework in keeping with the continuous evolution of computing technology. The computing framework combines the Labeled Latent Dirichlet Allocation (L-LDA)-incorporated Support Vector Machine (SVM) classifier with the supporting computing strategy on publicly available Twitter data in determining transportation-related events to provide reliable information to travelers. The analytical approach includes analyzing tweets using text classification and geocoding locations based on string similarity. A case study conducted for New York City and its surrounding areas demonstrates the feasibility of the analytical approach. Approximately 700,010 tweets ...


Heterogeneous Multi-Layered Network Model For Omics Data Integration And Analysis, Bohyun Lee, Shuo Zhang, Aleksandar Poleksic, Lei Xie Jan 2020

Faculty Publications

Advances in next-generation sequencing and high-throughput techniques have enabled the generation of vast amounts of diverse omics data. These big data provide an unprecedented opportunity in biology, but impose great challenges in data integration, data mining, and knowledge discovery due to the complexity, heterogeneity, dynamics, uncertainty, and high-dimensionality inherent in the omics data. Networks have been widely used to represent relations between entities in biological systems, such as protein-protein interaction, gene regulation, and brain connectivity (i.e. network construction) as well as to infer novel relations given a reconstructed network (aka link prediction). Particularly, heterogeneous multi-layered network (HMLN) has proven ...


Analyze Informant-Based Questionnaire For The Early Diagnosis Of Senile Dementia Using Deep Learning, Fubao Zhu, Xiaonan Li, Daniel Mcgonigle, Haipeng Tang, Zhuo He, Chaoyang Zhang, Guang-Uei Hung, Pai-Yi Chiu, Weihua Zhou Jan 2020

Michigan Tech Publications

OBJECTIVE: This paper proposes a multiclass deep learning method for the classification of dementia using an informant-based questionnaire.

METHODS: A deep neural network classification model based on the Keras framework is proposed in this paper. To evaluate the advantages of our proposed method, we compared the performance of our model with industry-standard machine learning approaches. We enrolled 6,701 individuals, who were randomly divided into training data sets (6,030 participants) and test data sets (671 participants). We evaluated each diagnostic model in the test set using accuracy, precision, recall, and F1-Score.

RESULTS: Compared with the seven conventional machine learning algorithms, the ...