Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

2016

Articles 1 - 30 of 58

Full-Text Articles in Physical Sciences and Mathematics

Evaluating Machine Learning Classifiers For Defensive Cyber Operations, Michael D. Rich, Robert F. Mills, Thomas E. Dube, Steven K. Rogers Dec 2016

Military Cyber Affairs

Today’s defensive cyber sensors are dominated by signature-based analytical methods that require continuous maintenance and lack the ability to detect unknown threats. Anomaly detection offers the ability to detect unknown threats, but despite over 15 years of active research, the operationalization of anomaly detection and machine learning for Defensive Cyber Operations (DCO) is lagging. This article provides an introduction to machine learning concepts with a focus on the unique challenges to using machine learning for DCO. Traditional machine learning evaluation methods are challenged in favor of a value-focused evaluation method that incorporates evaluator-specific weights for classifier and sensitivity threshold selection …


Using Machine Learning To Predict Student Achievement On The State Of Texas Assessment Of Academic Readiness Examination In Charter Schools, Christopher D. Gonzalez Dec 2016

Theses and Dissertations

The purpose of this study was to research and develop a way to use machine learning algorithms (MLAs) to predict student achievement on the State of Texas Assessment of Academic Readiness (STAAR), specifically in the charter school setting. Charter schools have the disadvantage of a constant influx of students, so providing historical student data in order to analyze trends proves difficult. This study expands on previous research on students in secondary and post-secondary school and on determining features that indicate success in these settings. The data used is from the district of IDEA Public Schools, which focuses on providing education …


Predicting Malignant Nodules From Screening CT Scans, Samuel Hawkins, Hua Wang, Ying Liu, Alberto Garcia, Olya Stringfield, Henry Krewer, Qiang Li, Dmitry Cherezov, Matthew Schabath, Lawrence O. Hall, Robert J. Gillies Dec 2016

Computer Science and Engineering Faculty Publications

Objectives

The aim of this study was to determine whether quantitative analyses (“radiomics”) of low-dose computed tomography lung cancer screening images at baseline can predict subsequent emergence of cancer.

Methods

Public data from the National Lung Screening Trial (ACRIN 6684) were assembled into two cohorts of 104 and 92 patients with screen-detected lung cancer and then matched with cohorts of 208 and 196 screening subjects with benign pulmonary nodules. Image features were extracted from each nodule and used to predict the subsequent emergence of cancer.

Results

The best models used 23 stable features in a random forests classifier and could …


Collective Personalized Change Classification With Multiobjective Search, Xin Xia, David Lo, Xinyu Wang, Xiaohu Yang Dec 2016

Research Collection School Of Computing and Information Systems

Many change classification techniques have been proposed to identify defect-prone changes. These techniques consider all developers' historical change data to build a global prediction model. In practice, since developers have their own coding preferences and behavioral patterns, which cause different defect patterns, a separate change classification model for each developer can help to improve performance. Jiang, Tan, and Kim refer to this problem as personalized change classification, and they propose PCC+ to solve this problem. A software project has a number of developers; for a developer, building a prediction model not only based on his/her change data, but also on …


Towards Deeper Understanding In Neuroimaging, Rex Devon Hjelm Nov 2016

Computer Science ETDs

Neuroimaging is a growing domain of research, with advances in machine learning having tremendous potential to expand understanding in neuroscience and improve public health. Deep neural networks have recently and rapidly achieved historic success in numerous domains, and as a consequence have completely redefined the landscape of automated learners, giving promise of significant advances in numerous domains of research. Despite recent advances and advantages over traditional machine learning methods, deep neural networks have yet to permeate significantly into neuroscience studies, particularly as a tool for discovery. This dissertation presents well-established and novel tools for unsupervised learning which aid in …


Learning From Pairwise Proximity Data, Hamid Dadkhahi Nov 2016

Doctoral Dissertations

In many areas of machine learning, the characterization of the input data is given by a form of proximity measure between data points. Examples of such representations are pairwise differences, pairwise distances, and pairwise comparisons. In this work, we investigate different learning problems on data represented in terms of such pairwise proximities. More specifically, we consider three problems: masking (feature selection) for dimensionality reduction, extension of the dimensionality reduction for time series, and online collaborative filtering. For each of these problems, we start with a form of pairwise proximity which is relevant in the problem at hand. We evaluate the …


Spiteful, One-Off, And Kind: Predicting Customer Feedback Behavior On Twitter, Agus Sulistya, Abhishek Sharma, David Lo Nov 2016

Research Collection School Of Computing and Information Systems

Social media provides a convenient way for customers to express their feedback to companies. Identifying different types of customers based on their feedback behavior can help companies to maintain their customers. In this paper, we use a machine learning approach to predict a customer’s feedback behavior based on her first feedback tweet. First, we identify a few categories of customers based on their feedback frequency and the sentiment of the feedback. We identify three main categories: spiteful, one-off, and kind. Next, we build a model to predict the category of a customer given her first feedback. We use profile and …


Biogeographical Patterns Of Soil Microbial Communities: Ecological, Structural, And Functional Diversity And Their Application To Soil Provenance, Natalie Damaso Oct 2016

FIU Electronic Theses and Dissertations

The current ecological hypothesis states that the soil type (e.g., chemical and physical properties) determines which microbes occupy a particular soil and provides the foundation for soil provenance studies. As human profiles are used to determine a match between evidence from a crime scene and a suspect, a soil microbial profile can be used to determine a match between soil found on the suspect’s shoes or clothing to the soil at a crime scene. However, for a robust tool to be applied in forensic application, an understanding of the uncertainty associated with any comparisons and the parameters that can significantly …


Online Cross-Validation-Based Ensemble Learning, David Benkeser, Samuel D. Lendle, Cheng Ju, Mark J. Van Der Laan Oct 2016

U.C. Berkeley Division of Biostatistics Working Paper Series

Online estimators update a current estimate with a new incoming batch of data without having to revisit past data, thereby providing streaming estimates that are scalable to big data. We develop flexible, ensemble-based online estimators of an infinite-dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time-series models and, as a special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross-validation to …
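
The general idea in this abstract can be illustrated with a small sketch. The code below is not the authors' estimator: it is a simplified, hypothetical illustration in which each incoming batch is first used to score every candidate (before that candidate has seen the batch) and only then used to update it, so past data are never revisited. The two candidates (`RunningMean`, `OnlineLinear`), the squared-error loss, the synthetic data in `make_batches`, and the choice to pick a single best candidate rather than a weighted ensemble are all assumptions made for illustration.

```python
import random


class RunningMean:
    """Hypothetical candidate 1: predicts the running mean of y and ignores x."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict(self, x):
        return self.mean

    def update(self, xs, ys):
        # Incremental mean update; no past observations are stored.
        for y in ys:
            self.n += 1
            self.mean += (y - self.mean) / self.n


class OnlineLinear:
    """Hypothetical candidate 2: y ~ w*x + b, fit by stochastic gradient descent."""

    def __init__(self, lr=0.05):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.w * x + self.b

    def update(self, xs, ys):
        for x, y in zip(xs, ys):
            err = self.predict(x) - y
            self.w -= self.lr * err * x
            self.b -= self.lr * err


def online_cv_select(batches, candidates):
    """Score each candidate on every new batch before updating it.

    Returns the index of the candidate with the lowest cumulative
    squared error, together with the cumulative losses.
    """
    cum_loss = [0.0] * len(candidates)
    for xs, ys in batches:
        # 1) Validate: the batch is still unseen by every candidate.
        for i, cand in enumerate(candidates):
            cum_loss[i] += sum((cand.predict(x) - y) ** 2 for x, y in zip(xs, ys))
        # 2) Train: update each candidate with the same batch, then discard it.
        for cand in candidates:
            cand.update(xs, ys)
    best = min(range(len(candidates)), key=cum_loss.__getitem__)
    return best, cum_loss


def make_batches(n_batches=50, batch_size=20, seed=0):
    """Synthetic stream (an assumption): y = 2x + 1 plus small Gaussian noise."""
    rng = random.Random(seed)
    batches = []
    for _ in range(n_batches):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
        batches.append((xs, ys))
    return batches


if __name__ == "__main__":
    candidates = [RunningMean(), OnlineLinear()]
    best, cum_loss = online_cv_select(make_batches(), candidates)
    print("selected candidate:", type(candidates[best]).__name__)
    print("cumulative losses:", cum_loss)
```

Because each batch is scored before it is used for training, the cumulative loss acts as an out-of-sample performance estimate that can be maintained in a single pass over the stream.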


Multiple Imputation Of Missing Data In Structural Equation Models With Mediators And Moderators Using Gradient Boosted Machine Learning, Robert J. Milletich II Oct 2016

Psychology Theses & Dissertations

Mediation and moderated mediation models are two commonly used models for indirect effects analysis. In practice, missing data is a pervasive problem in structural equation modeling with psychological data. Multiple imputation (MI) is one method used to estimate model parameters in the presence of missing data, while accounting for uncertainty due to the missing data. Unfortunately, commonly used MI methods are not equipped to handle categorical variables or nonlinear variables such as interactions. In this study, we introduce a general MI framework that uses the Bayesian bootstrap (BB) method to generate posterior inferences for indirect effects and gradient boosted machine …


A Study Of The Impact Of Interaction Mechanisms And Population Diversity In Evolutionary Multiagent Systems, Sadat U. Chowdhury Sep 2016

Dissertations, Theses, and Capstone Projects

In the Evolutionary Computation (EC) research community, a major concern is maintaining optimal levels of population diversity. In the Multiagent Systems (MAS) research community, a major concern is implementing effective agent coordination through various interaction mechanisms. These two concerns coincide when one is faced with Evolutionary Multiagent Systems (EMAS).

This thesis demonstrates a methodology to study the relationship between interaction mechanisms, population diversity, and performance of an evolving multiagent system in a dynamic, real-time, and asynchronous environment. An open-source, extensible experimentation platform is developed that allows plug-ins for evolutionary models, interaction mechanisms, and genotypical encoding schemes beyond the one …


A Novel Machine Learning Classifier Based On A Qualia Modeling Agent (QMA), Sandra L. Vaughan Sep 2016

Theses and Dissertations

This dissertation addresses a problem found in supervised machine learning (ML) classification, that the target variable, i.e., the variable a classifier predicts, has to be identified before training begins and cannot change during training and testing. This research develops a computational agent, which overcomes this problem. The Qualia Modeling Agent (QMA) is modeled after two cognitive theories: Stanovich's tripartite framework, which proposes learning results from interactions between conscious and unconscious processes; and, the Integrated Information Theory (IIT) of Consciousness, which proposes that the fundamental structural elements of consciousness are qualia. By modeling the informational relationships of qualia, the QMA allows …


Creating And Automatically Grading Annotated Questions, Alicia Crowder Wood Sep 2016

Theses and Dissertations

We have created a question type that allows teachers to easily create questions, helps provide an intuitive user experience for students to take questions, and reduces the time it currently takes teachers to grade and provide feedback to students. This question type, or an "annotated" question, will allow teachers to test students' knowledge in a particular subject area by having students "annotate" or mark text and video sources to answer questions. Through user testing we determined that overall the interface and the implemented system decrease the time it would take a teacher to grade annotated quiz questions. However, there are …


Soft Confidence-Weighted Learning, Jialei Wang, Peilin Zhao, Steven C. H. Hoi Sep 2016

Research Collection School Of Computing and Information Systems

Online learning plays an important role in many big data mining problems because of its high efficiency and scalability. In the literature, many online learning algorithms using gradient information have been applied to solve online classification problems. Recently, more effective second-order algorithms have been proposed, where the correlation between the features is utilized to improve the learning efficiency. Among them, Confidence-Weighted (CW) learning algorithms are very effective, which assume that the classification model is drawn from a Gaussian distribution, which enables the model to be effectively updated with the second-order information of the data stream. Despite being studied actively, these CW algorithms cannot handle nonseparable datasets and noisy datasets very …


Real Time Activity Recognition Of Treadmill Usage Via Machine Learning, Nathan Blank, Matt Buckner, Christian Owen, Anna Scott Aug 2016

Rose-Hulman Undergraduate Research Publications

Our objective is to provide real-time classification of treadmill usage patterns based on accelerometer and magnetometer measurements. We collected data from treadmills in the Rose-Hulman Student Recreation Center (SRC) using Shimmer3 sensor units. We identified useful data features and classifiers for predicting treadmill usage patterns. We also prototyped a proof of concept wireless, real-time classification system.


Prediction And Optimal Scheduling Of Advertisements In Linear Television, Mark J. Panaggio, Pak-Wing Fok, Ghan S. Bhatt, Simon Burhoe, Michael Capps, Christina J. Edholm, Fadoua El Moustaid, Tegan Emerson, Star-Lena Estock, Nathan Gold, Ryan Halabi, Madelyn Houser, Peter R. Kramer, Hsuan-Wei Lee, Qingxia Li, Weiqiang Li, Dan Lu, Yuzhou Qian, Louis F. Rossi, Deborah Shutt, Vicky Chuqiao Yang, Yingxiang Zhou Aug 2016

Mathematical Sciences Faculty Research

Advertising is a crucial component of marketing and an important way for companies to raise awareness of goods and services in the marketplace. Advertising campaigns are designed to convey a marketing image or message to an audience of potential consumers and television commercials can be an effective way of transmitting these messages to a large audience. In order to meet the requirements for a typical advertising order, television content providers must provide advertisers with a predetermined number of "impressions" in the target demographic. However, because the number of impressions for a given program is not known a priori and because …


Classifying Pattern Formation In Materials Via Machine Learning, Lukasz Burzawa, Shuo Liu, Erica W. Carlson Aug 2016

The Summer Undergraduate Research Fellowship (SURF) Symposium

Scanning probe experiments such as scanning tunneling microscopy (STM) and atomic force microscopy (AFM) on strongly correlated materials often reveal complex pattern formation that occurs on multiple length scales. We have shown in two disparate correlated materials that the pattern formation is driven by proximity to a disorder-driven critical point. We developed new analysis concepts and techniques that relate the observed pattern formation to critical exponents by analyzing the geometry and statistics of clusters observed in these experiments and converting that information into critical exponents. Machine learning algorithms can be helpful in correlating data from scanning probe experiments to theoretical models …


A Study Of Security Issues Of Mobile Apps In The Android Platform Using Machine Learning Approaches, Lei Cen Aug 2016

Open Access Dissertations

Mobile apps pose both traditional and new potential threats to system security and user privacy. There are malicious apps that may do harm to the system, and there are app misbehaviors, which are reasonable and legal when not abused, yet may lead to real threats otherwise. Moreover, due to the nature of mobile apps, a running app on a mobile device may be only part of the software, and the server-side behavior is usually not covered by analysis. Therefore, direct analysis of the app itself may be incomplete and additional sources of information are needed. In this dissertation, we …


Data Driven Low-Bandwidth Intelligent Control Of A Jet Engine Combustor, Nathan L. Toner Aug 2016

Open Access Dissertations

This thesis introduces a low-bandwidth control architecture for navigating the input space of an un-modeled combustor system between desired operating conditions while avoiding regions of instability and blow-out. An experimental procedure is discussed for identifying regions of instability and gathering sufficient data to build a data-driven model of the system's operating modes. Regions of instability and blow-out are identified experimentally and a data-driven operating point classifier is designed. This classifier acts as a map of the operating space of the combustor, indicating regions in which the flame is in a "good" or "bad" operating mode. A data-driven predictor is also …


Learning From Data: Plant Breeding Applications Of Machine Learning, Alencar Xavier Aug 2016

Open Access Dissertations

Increasingly, new sources of data are being incorporated into plant breeding pipelines. Enormous amounts of data from field phenomics and genotyping technologies place data mining and analysis on a completely different level that is challenging from practical and theoretical standpoints. Intelligent decision-making relies on our capability of extracting from data useful information that may help us to achieve our goals more efficiently. Many plant breeders, agronomists and geneticists perform analyses without knowing relevant underlying assumptions, strengths or pitfalls of the employed methods. The study endeavors to assess statistical learning properties and plant breeding applications of supervised and unsupervised machine learning …


Lidar-Assisted Extraction Of Old Growth Baldcypress Stands Along The Black River Of North Carolina, Weston Pierce Murch Aug 2016

Graduate Theses and Dissertations

The remnants of ancient baldcypress forests continue to grow across the Southeastern United States. These long-lived trees are invaluable for biodiversity along riverine ecosystems, provide habitat to a myriad of animal species, and augment the proxy climate record for North America. While extensive logging of the areas along the Black River in North Carolina has mostly decimated ancient forests of many species including the baldcypress, conservation efforts from The Nature Conservancy and other partners are under way. In order to more efficiently find and study these enduring stands of baldcypress, some of which are estimated to be more than …


Social Sentiment And Stock Trading Via Mobile Phones, Kwansoo Kim, Sang Yong Lee, Robert John Kauffman Aug 2016

Research Collection School Of Computing and Information Systems

What happens when uninformed investors trade stocks via mobile phones? Do they react to social sentiment differently than more informed traders in traditional trading? Based on 16,817 data observations and econometric analysis for the trading of 251 equities in Korea over 39 days, we present evidence of herding behavior among uninformed traders in the mobile channel. The results indicate that mobile traders seem more easily swayed by changing social sentiment. In addition, stock trading in the traditional channel probably influences sentiment formation in the market overall. Mobile traders follow signals in social media, suggesting that they engage in less beneficial …


Machine Learning Methods For Medical And Biological Image Computing, Rongjian Li Jul 2016

Computer Science Theses & Dissertations

Medical and biological imaging technologies provide valuable visualization information of structure and function for an organ from the level of individual molecules to the whole object. The brain is the most complex organ in the body, and it increasingly attracts intense research attention with the rapid development of medical and biological imaging technologies. The massive amount of high-dimensional brain imaging data being generated creates high demand for computational methods that analyze those images efficiently. The current study of computational methods using hand-crafted features does not scale with the increasing number of brain images, hindering the pace of scientific discoveries …


Designing Human-Centered Collective Intelligence, Ivor Addo Jul 2016

Dissertations (1934 -)

Human-Centered Collective Intelligence (HCCI) is an emergent research area that seeks to bring together major research areas like machine learning, statistical modeling, information retrieval, market research, and software engineering to address challenges pertaining to deriving intelligent insights and solutions through the collaboration of several intelligent sensors, devices and data sources. An archetypal contextual CI scenario might be concerned with deriving affect-driven intelligence through multimodal emotion detection sources in a bid to determine the likability of one movie trailer over another. On the other hand, the key tenets to designing robust and evolutionary software and infrastructure architecture models to address cross-cutting …


Machine Learning Methods For Brain Image Analysis, Ahmed Fakhry Jul 2016

Computer Science Theses & Dissertations

Understanding how the brain functions and quantifying compound interactions between complex synaptic networks inside the brain remain some of the most challenging problems in neuroscience. A lack or an abundance of data, a shortage of manpower, and the heterogeneity of data from various species all add complexity to the already perplexing problem. Vast amounts of brain data need to be processed automatically, yet with an accuracy close to manual human-level performance. These automated methods essentially need to generalize well to be able to accommodate data from different species. Also, novel approaches and techniques are becoming …


Where Is The Goldmine? Finding Promising Business Locations Through Facebook Data Analytics, Jovian Lin, Richard Oentaryo, Ee-Peng Lim, Casey Vu, Adrian Vu, Agus Kwee Jul 2016

Research Collection School Of Computing and Information Systems

If you were to open your own cafe, would you not want to effortlessly identify the most suitable location to set up your shop? Choosing an optimal physical location is a critical decision for numerous businesses, as many factors contribute to the final choice of the location. In this paper, we seek to address the issue by investigating the use of publicly available Facebook Pages data (which include user "check-ins", types of business, and business locations) to evaluate a user-selected physical location with respect to a type of business. Using a dataset of 20,877 food businesses in Singapore, we conduct analysis of …


Determining The Effectiveness Of Soil Treatment On Plant Stress Using Smart-Phone Cameras, Anurag Panwar Jun 2016

USF Tampa Graduate Theses and Dissertations

Plants are vital to the health of our biosphere, and effectively sustaining their growth is fundamental to the existence of life on this planet. A critical aspect that determines the sustainability of plant growth is the quality of soil. All other things being fixed, the quality of soil greatly impacts plant stress, which in turn impacts overall health. Although plant stress manifests in many ways, one of the clearest indicators is the color of the leaves. In this thesis, we conducted an experimental study in a greenhouse for detecting plant stress caused by nutrient deficiencies in soil using smartphone cameras, …


Categorizing Blog Spam, Brandon Bevans Jun 2016

Master's Theses

The internet has matured into the focal point of our era. Its ecosystem is vast, complex, and in many regards unaccounted for. One of the most prevalent aspects of the internet is spam. Similar to the rest of the internet, spam has evolved from simply meaning ‘unwanted emails’ to a blanket term that encompasses any unsolicited or illegitimate content that appears in the wide range of media that exists on the internet.

Many forms of spam permeate the internet, and spam architects continue to develop tools and methods to avoid detection. On the other side, cyber security engineers continue to …


Machine Learning For Disease Prediction, Abraham Jacob Frandsen Jun 2016

Theses and Dissertations

Millions of people in the United States alone suffer from undiagnosed or late-diagnosed chronic diseases such as Chronic Kidney Disease and Type II Diabetes. Catching these diseases earlier facilitates preventive healthcare interventions, which in turn can lead to tremendous cost savings and improved health outcomes. We develop algorithms for predicting disease occurrence by drawing from ideas and techniques in the field of machine learning. We explore standard classification methods such as logistic regression and random forest, as well as more sophisticated sequence models, including recurrent neural networks. We focus especially on the use of medical code data for disease prediction, …


Exploring Data Mining Techniques For Tree Species Classification Using Co-Registered Lidar And Hyperspectral Data, Julia K. Marrs May 2016

Theses and Dissertations

NASA Goddard’s LiDAR, Hyperspectral, and Thermal imager provides co-registered remote sensing data on experimental forests. Data mining methods were used to achieve a final tree species classification accuracy of 68% using a combined LiDAR and hyperspectral dataset, and show promise for addressing deforestation and carbon sequestration on a species-specific level.