Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Theses/Dissertations

2016

Machine learning

Articles 1 - 30 of 39

Full-Text Articles in Physical Sciences and Mathematics

Using Machine Learning To Predict Student Achievement On The State Of Texas Assessment Of Academic Readiness Examination In Charter Schools, Christopher D. Gonzalez Dec 2016

Theses and Dissertations

The purpose of this study was to research and develop a way to use machine learning algorithms (MLAs) to predict student achievement on the State of Texas Assessment of Academic Readiness (STAAR), specifically in the charter school setting. Charter schools have the disadvantage of a constant influx of students, so providing historical student data in order to analyze trends proves difficult. This study expands on previous research done on students in secondary and post-secondary school and on determining features that indicate success in these settings. The data used is from the district of IDEA Public Schools, which focuses on providing education …


Towards Deeper Understanding In Neuroimaging, Rex Devon Hjelm Nov 2016

Computer Science ETDs

Neuroimaging is a growing domain of research, with advances in machine learning having tremendous potential to expand understanding in neuroscience and improve public health. Deep neural networks have recently and rapidly achieved historic success in numerous domains, and as a consequence have completely redefined the landscape of automated learners, giving promise of significant advances in numerous domains of research. Despite recent advances and advantages over traditional machine learning methods, deep neural networks have yet to permeate significantly into neuroscience studies, particularly as a tool for discovery. This dissertation presents well-established and novel tools for unsupervised learning which aid in …


Learning From Pairwise Proximity Data, Hamid Dadkhahi Nov 2016

Doctoral Dissertations

In many areas of machine learning, the characterization of the input data is given by a form of proximity measure between data points. Examples of such representations are pairwise differences, pairwise distances, and pairwise comparisons. In this work, we investigate different learning problems on data represented in terms of such pairwise proximities. More specifically, we consider three problems: masking (feature selection) for dimensionality reduction, extension of the dimensionality reduction for time series, and online collaborative filtering. For each of these problems, we start with a form of pairwise proximity which is relevant in the problem at hand. We evaluate the …


Multiple Imputation Of Missing Data In Structural Equation Models With Mediators And Moderators Using Gradient Boosted Machine Learning, Robert J. Milletich II Oct 2016

Psychology Theses & Dissertations

Mediation and moderated mediation models are two commonly used models for indirect effects analysis. In practice, missing data is a pervasive problem in structural equation modeling with psychological data. Multiple imputation (MI) is one method used to estimate model parameters in the presence of missing data, while accounting for uncertainty due to the missing data. Unfortunately, commonly used MI methods are not equipped to handle categorical variables or nonlinear variables such as interactions. In this study, we introduce a general MI framework that uses the Bayesian bootstrap (BB) method to generate posterior inferences for indirect effects and gradient boosted machine …


A Study Of The Impact Of Interaction Mechanisms And Population Diversity In Evolutionary Multiagent Systems, Sadat U. Chowdhury Sep 2016

Dissertations, Theses, and Capstone Projects

In the Evolutionary Computation (EC) research community, a major concern is maintaining optimal levels of population diversity. In the Multiagent Systems (MAS) research community, a major concern is implementing effective agent coordination through various interaction mechanisms. These two concerns coincide when one is faced with Evolutionary Multiagent Systems (EMAS).

This thesis demonstrates a methodology to study the relationship between interaction mechanisms, population diversity, and performance of an evolving multiagent system in a dynamic, real-time, and asynchronous environment. An open sourced extensible experimentation platform is developed that allows plug-ins for evolutionary models, interaction mechanisms, and genotypical encoding schemes beyond the one …


A Novel Machine Learning Classifier Based On A Qualia Modeling Agent (QMA), Sandra L. Vaughan Sep 2016

Theses and Dissertations

This dissertation addresses a problem found in supervised machine learning (ML) classification, that the target variable, i.e., the variable a classifier predicts, has to be identified before training begins and cannot change during training and testing. This research develops a computational agent, which overcomes this problem. The Qualia Modeling Agent (QMA) is modeled after two cognitive theories: Stanovich's tripartite framework, which proposes learning results from interactions between conscious and unconscious processes; and, the Integrated Information Theory (IIT) of Consciousness, which proposes that the fundamental structural elements of consciousness are qualia. By modeling the informational relationships of qualia, the QMA allows …


Creating And Automatically Grading Annotated Questions, Alicia Crowder Wood Sep 2016

Theses and Dissertations

We have created a question type that allows teachers to easily create questions, helps provide an intuitive user experience for students to take questions, and reduces the time it currently takes teachers to grade and provide feedback to students. This question type, or an "annotated" question, will allow teachers to test students' knowledge in a particular subject area by having students "annotate" or mark text and video sources to answer questions. Through user testing we determined that overall the interface and the implemented system decrease the time it would take a teacher to grade annotated quiz questions. However, there are …


A Study Of Security Issues Of Mobile Apps In The Android Platform Using Machine Learning Approaches, Lei Cen Aug 2016

Open Access Dissertations

Mobile apps pose both traditional and new potential threats to system security and user privacy. There are malicious apps that may do harm to the system, and there are misbehaviors of apps, which are reasonable and legal when not abused, yet may lead to real threats otherwise. Moreover, due to the nature of mobile apps, an app running on a mobile device may be only part of the software, and the server-side behavior is usually not covered by analysis. Therefore, direct analysis of the app itself may be incomplete and additional sources of information are needed. In this dissertation, we …


Data Driven Low-Bandwidth Intelligent Control Of A Jet Engine Combustor, Nathan L. Toner Aug 2016

Open Access Dissertations

This thesis introduces a low-bandwidth control architecture for navigating the input space of an un-modeled combustor system between desired operating conditions while avoiding regions of instability and blow-out. An experimental procedure is discussed for identifying regions of instability and gathering sufficient data to build a data-driven model of the system's operating modes. Regions of instability and blow-out are identified experimentally and a data-driven operating point classifier is designed. This classifier acts as a map of the operating space of the combustor, indicating regions in which the flame is in a "good" or "bad" operating mode. A data-driven predictor is also …


Learning From Data: Plant Breeding Applications Of Machine Learning, Alencar Xavier Aug 2016

Open Access Dissertations

Increasingly, new sources of data are being incorporated into plant breeding pipelines. Enormous amounts of data from field phenomics and genotyping technologies place data mining and analysis on a completely different level that is challenging from practical and theoretical standpoints. Intelligent decision-making relies on our capability of extracting from data useful information that may help us to achieve our goals more efficiently. Many plant breeders, agronomists and geneticists perform analyses without knowing relevant underlying assumptions, strengths or pitfalls of the employed methods. The study endeavors to assess statistical learning properties and plant breeding applications of supervised and unsupervised machine learning …


Lidar-Assisted Extraction Of Old Growth Baldcypress Stands Along The Black River Of North Carolina, Weston Pierce Murch Aug 2016

Graduate Theses and Dissertations

The remnants of ancient baldcypress forests continue to grow across the Southeastern United States. These long lived trees are invaluable for biodiversity along riverine ecosystems, provide habitat to a myriad of animal species, and augment the proxy climate record for North America. While extensive logging of the areas along the Black River in North Carolina has mostly decimated ancient forests of many species including the baldcypress, conservation efforts from The Nature Conservancy and other partners are under way. In order to more efficiently find and study these enduring stands of baldcypress, some of which are estimated to be more than …


Designing Human-Centered Collective Intelligence, Ivor Addo Jul 2016

Dissertations (1934 -)

Human-Centered Collective Intelligence (HCCI) is an emergent research area that seeks to bring together major research areas like machine learning, statistical modeling, information retrieval, market research, and software engineering to address challenges pertaining to deriving intelligent insights and solutions through the collaboration of several intelligent sensors, devices and data sources. An archetypal contextual CI scenario might be concerned with deriving affect-driven intelligence through multimodal emotion detection sources in a bid to determine the likability of one movie trailer over another. On the other hand, the key tenets to designing robust and evolutionary software and infrastructure architecture models to address cross-cutting …


Machine Learning Methods For Medical And Biological Image Computing, Rongjian Li Jul 2016

Computer Science Theses & Dissertations

Medical and biological imaging technologies provide valuable visualization information of structure and function for an organ from the level of individual molecules to the whole object. The brain is the most complex organ in the body, and it increasingly attracts intense research attention with the rapid development of medical and biological imaging technologies. The massive amount of high-dimensional brain imaging data being generated creates high demand for computational methods that can analyze those images efficiently. The current study of computational methods using hand-crafted features does not scale with the increasing number of brain images, hindering the pace of scientific discoveries …


Machine Learning Methods For Brain Image Analysis, Ahmed Fakhry Jul 2016

Computer Science Theses & Dissertations

Understanding how the brain functions and quantifying compound interactions between complex synaptic networks inside the brain remain some of the most challenging problems in neuroscience. Lack or abundance of data, shortage of manpower, and heterogeneity of data from various species all add complexity to the already perplexing problem. Vast amounts of brain data need to be processed automatically, yet with an accuracy close to manual human-level performance. These automated methods essentially need to generalize well to be able to accommodate data from different species. Also, novel approaches and techniques are becoming …


Determining The Effectiveness Of Soil Treatment On Plant Stress Using Smart-Phone Cameras, Anurag Panwar Jun 2016

USF Tampa Graduate Theses and Dissertations

Plants are vital to the health of our biosphere, and effectively sustaining their growth is fundamental to the existence of life on this planet. A critical aspect that determines the sustainability of plant growth is the quality of soil. All other things being fixed, the quality of soil greatly impacts plant stress, which in turn impacts overall health. Although plant stress manifests in many ways, one of the clearest indicators is the color of the leaves. In this thesis, we conducted an experimental study in a greenhouse for detecting plant stress caused by nutrient deficiencies in soil using smartphone cameras, …


Categorizing Blog Spam, Brandon Bevans Jun 2016

Master's Theses

The internet has matured into the focal point of our era. Its ecosystem is vast, complex, and in many regards unaccounted for. One of the most prevalent aspects of the internet is spam. Similar to the rest of the internet, spam has evolved from simply meaning ‘unwanted emails’ to a blanket term that encompasses any unsolicited or illegitimate content that appears in the wide range of media that exists on the internet.

Many forms of spam permeate the internet, and spam architects continue to develop tools and methods to avoid detection. On the other side, cyber security engineers continue to …


Machine Learning For Disease Prediction, Abraham Jacob Frandsen Jun 2016

Theses and Dissertations

Millions of people in the United States alone suffer from undiagnosed or late-diagnosed chronic diseases such as Chronic Kidney Disease and Type II Diabetes. Catching these diseases earlier facilitates preventive healthcare interventions, which in turn can lead to tremendous cost savings and improved health outcomes. We develop algorithms for predicting disease occurrence by drawing from ideas and techniques in the field of machine learning. We explore standard classification methods such as logistic regression and random forest, as well as more sophisticated sequence models, including recurrent neural networks. We focus especially on the use of medical code data for disease prediction, …


Exploring Data Mining Techniques For Tree Species Classification Using Co-Registered Lidar And Hyperspectral Data, Julia K. Marrs May 2016

Theses and Dissertations

NASA Goddard’s LiDAR, Hyperspectral, and Thermal imager provides co-registered remote sensing data on experimental forests. Data mining methods were used to achieve a final tree species classification accuracy of 68% using a combined LiDAR and hyperspectral dataset, and show promise for addressing deforestation and carbon sequestration on a species-specific level.


An Exercise And Sports Equipment Recognition System, Siddarth Kalra May 2016

Electronic Thesis and Dissertation Repository

Most mobile health management applications today require manual input or use sensors like the accelerometer or GPS to record user data. The onboard camera remains underused. We propose an Exercise and Sports Equipment Recognition System (ESRS) that can recognize physical activity equipment from raw image data. This system can be integrated with mobile phones to allow the camera to become a primary input device for recording physical activity. We employ a deep convolutional neural network to train models capable of recognizing 14 different equipment categories. Furthermore, we propose a preprocessing scheme that uses color normalization and denoising techniques to improve …


Revelation Of Yin-Yang Balance In Microbial Cell Factories By Data Mining, Flux Modeling, And Metabolic Engineering, Gang Wu May 2016

McKelvey School of Engineering Theses & Dissertations

The long-held assumption of never-ending rapid growth in biotechnology and especially in synthetic biology has been recently questioned, due to lack of substantial return on investment. One of the main reasons for failures in synthetic biology and metabolic engineering is the metabolic burdens that result in resource losses. Metabolic burden is defined as the portion of a host cell's resources, either energy molecules (e.g., NADH, NADPH and ATP) or carbon building blocks (e.g., amino acids), that is used to maintain the engineered components (e.g., pathways). As a result, the effectiveness of synthetic biology tools depends heavily on cell capability to …


A General Framework Of Large-Scale Convex Optimization Using Jensen Surrogates And Acceleration Techniques, Soysal Degirmenci May 2016

McKelvey School of Engineering Theses & Dissertations

In a world where data rates are growing faster than computing power, algorithmic acceleration based on developments in mathematical optimization plays a crucial role in narrowing the gap between the two. As the scale of optimization problems in many fields is getting larger, we need faster optimization methods that not only work well in theory, but also work well in practice by exploiting underlying state-of-the-art computing technology.

In this document, we introduce a unified framework of large-scale convex optimization using Jensen surrogates, an iterative optimization method that has been used in different fields since the 1970s. After this general treatment, …


Data Driven Sample Generator Model With Application To Classification, Alvaro Emilio Ulloa Cerna May 2016

Mathematics & Statistics ETDs

Despite the rapidly growing interest, progress in the study of relations between physiological abnormalities and mental disorders is hampered by complexity of the human brain and high costs of data collection. The complexity can be captured by machine learning approaches, but they still may require significant amounts of data. In this thesis, we seek to mitigate the latter challenge by developing a data driven sample generator model for the generation of synthetic realistic training data. Our method greatly improves generalization in classification of schizophrenia patients and healthy controls from their structural magnetic resonance images. A feed forward neural network trained …


A Comparative Approach To Question Answering Systems, Josue Balandrano Coronel May 2016

Theses and Dissertations

In this paper I will analyze three different algorithms and approaches to implement Question Answering Systems (QA-Systems). I will analyze the efficiency, strengths, and weaknesses of multiple algorithms by explaining them in detail and comparing them with each other. The overarching aim of this thesis is to explore ideas that can be used to create a truly open context QA-System. Open context QA-Systems remain an open problem.

The various algorithms and approaches presented in this work will be focused on complex questions. Complex questions are usually verbose and the context of the question is equally important to answer the query …


Sparse Feature Learning For Image Analysis In Segmentation, Classification, And Disease Diagnosis., Ehsan Hosseini-Asl May 2016

Electronic Theses and Dissertations

The success of machine learning algorithms generally depends on intermediate data representations, called features, that disentangle the hidden factors of variation in data. Moreover, machine learning models are required to generalize, in order to reduce the specificity or bias toward the training dataset. Unsupervised feature learning is useful in taking advantage of large amounts of unlabeled data, which are available to capture these variations. However, learned features are required to capture variational patterns in data space. In this dissertation, unsupervised feature learning with sparsity is investigated for sparse and local feature extraction with application to lung segmentation, interpretable deep …


Exploring Privacy Leakage From The Resource Usage Patterns Of Mobile Apps, Amin Rois Sinung Nugroho May 2016

Graduate Theses and Dissertations

Due to the popularity of smart phones and mobile apps, a potential privacy risk with the usage of mobile apps is that, from the usage information of mobile apps (e.g., how many hours a user plays mobile games in each day), private information about a user’s living habits and personal activities can be inferred. To assess this risk, this thesis answers the following research question: can the type of a mobile app (e.g., email, web browsing, mobile game, music streaming, etc.) used by a user be inferred from the resource (e.g., CPU, memory, network, etc.) usage patterns of the mobile …


Bridging Statistical Learning And Formal Reasoning For Cyber Attack Detection, Kexin Pei Apr 2016

Open Access Theses

Current cyber-infrastructures are facing increasingly stealthy attacks that implant malicious payloads under the cover of benign programs. Current attack detection approaches based on statistical learning methods may generate misleading decision boundaries when processing noisy data with such a mixture of benign and malicious behaviors. On the other hand, attack detection based on formal program analysis may lack completeness or adaptivity when modeling attack behaviors. In light of these limitations, we have developed LEAPS, an attack detection system based on supervised statistical learning to classify benign and malicious system events. Furthermore, we leverage control flow graphs inferred from the system event …


Predicting Changes To Source Code, Justin James Roll Apr 2016

Master's Theses

Organizations typically use issue tracking systems (ITS) such as Jira to plan software releases and assign requirements to developers. Organizations typically also use source control management (SCM) repositories such as Git to track historical changes to a code-base. These ITS and SCM repositories contain valuable data that remains largely untapped. As developers churn through an organization, it becomes expensive for developers to spend time determining which software artifact must be modified to implement a requirement. In this work we created, developed, tested and evaluated a tool called Class Change Predictor, otherwise known as CCP, for predicting which class will implement …


Cross-Subject Continuous Analytic Workload Profiling Using Stochastic Discrete Event Simulation, Joseph J. Giametta Mar 2016

Theses and Dissertations

Operator functional state (OFS) in remotely piloted aircraft (RPA) simulations is modeled using electroencephalograph (EEG) physiological data and continuous analytic workload profiles (CAWPs). A framework is proposed that provides solutions to the limitations that stem from lengthy training data collection and labeling techniques associated with generating CAWPs for multiple operators/trials. The framework focuses on the creation of scalable machine learning models using two generalization methods: 1) the stochastic generation of CAWPs and 2) the use of cross-subject physiological training data to calibrate machine learning models. Cross-subject workload models are used to infer OFS on new subjects, reducing the need to …


Algorithms For First-Order Sparse Reinforcement Learning, Bo Liu Mar 2016

Doctoral Dissertations

This thesis presents a general framework for first-order temporal difference learning algorithms with an in-depth theoretical analysis. The main contribution of the thesis is the development and design of a family of first-order regularized temporal-difference (TD) algorithms using stochastic approximation and stochastic optimization. To scale up TD algorithms to large-scale problems, we use first-order optimization to explore regularized TD methods using linear value function approximation. Previous regularized TD methods often use matrix inversion, which requires cubic time and quadratic memory complexity. We propose two algorithms, sparse-Q and RO-TD, for on-policy and off-policy learning, respectively. These two algorithms exhibit linear computational …


Vehicle Engine Classification Using Of Laser Vibrometry Feature Extraction, Chi Him Liu Jan 2016

Dissertations and Theses

Used as a non-invasive and remote sensor, the laser Doppler vibrometer (LDV) has been used in many different applications, such as inspection of aircraft, bridges, and structures and remote voice acquisition. However, using LDV as a vehicle surveillance device has not been feasible due to the lack of systematic investigations on its behavioral properties. In this thesis, the LDV data from different vehicles are examined and features are extracted. A tone-pitch indexing (TPI) scheme is developed to classify different vehicles by exploiting the engine’s periodic vibrations that are transferred throughout the vehicle’s body. Using the TPI with a two-layer feed-forward …