Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

Theses/Dissertations

2016

Machine learning

Articles 1 - 30 of 36

Full-Text Articles in Entire DC Network

Integrative Approaches For Large-Scale Biomedical Data Analysis, Ashis Kumer Biswas Dec 2016

Computer Science and Engineering Dissertations

Advances in Next Generation Sequencing (NGS), also known as High Throughput Sequencing (HTS), technologies allow researchers to investigate the genome, transcriptome, or epigenome of any organism from any perspective, thereby enriching biomedical data repositories for many lesser-known phenomena. The regulatory activity of non-coding RNAs (ncRNAs) inside the genome, the transcribed products of the long-neglected "junk DNA" molecules, is one such phenomenon. While large-scale data about ncRNAs are becoming publicly available, bioinformaticians face computational challenges in mining them efficiently to get reliable answers to a few subtle questions. Given …


Machine Learning Based Datacenter Monitoring Framework, Ravneet Singh Sidhu Dec 2016

Computer Science and Engineering Theses

Monitoring the health of large data centers is a major concern given the ever-increasing demand for grid/cloud computing and the growing need for computational power. In a High Performance Computing (HPC) environment, the need to maintain high availability makes monitoring tasks and hardware more daunting and demanding. As data centers grow, it becomes hard to manage the complex interactions between different systems. Many open-source systems have been implemented that report the specific state of an individual machine using Nagios, Ganglia, or Torque monitoring software. In this work we focus on the detection and prediction of data center anomalies by using …


Using Machine Learning To Predict Student Achievement On The State Of Texas Assessment Of Academic Readiness Examination In Charter Schools, Christopher D. Gonzalez Dec 2016

Theses and Dissertations

The purpose of this study was to research and develop a way to use machine learning algorithms (MLAs) to predict student achievement on the State of Texas Assessment of Academic Readiness (STAAR), specifically in the charter school setting. Charter schools have the disadvantage of a constant influx of students, so providing historical student data in order to analyze trends proves difficult. This study expands on previous research on students in secondary and post-secondary school and on determining features that indicate success in these settings. The data used is from the district of IDEA Public Schools, which focuses on providing education …


Towards Deeper Understanding In Neuroimaging, Rex Devon Hjelm Nov 2016

Computer Science ETDs

Neuroimaging is a growing domain of research, with advances in machine learning having tremendous potential to expand understanding in neuroscience and improve public health. Deep neural networks have recently and rapidly achieved historic success in numerous domains, and as a consequence have completely redefined the landscape of automated learners, giving promise of significant advances in numerous domains of research. Despite recent advances and advantages over traditional machine learning methods, deep neural networks have yet to permeate significantly into neuroscience studies, particularly as a tool for discovery. This dissertation presents well-established and novel tools for unsupervised learning which aid in …


Learning From Pairwise Proximity Data, Hamid Dadkhahi Nov 2016

Doctoral Dissertations

In many areas of machine learning, the characterization of the input data is given by a form of proximity measure between data points. Examples of such representations are pairwise differences, pairwise distances, and pairwise comparisons. In this work, we investigate different learning problems on data represented in terms of such pairwise proximities. More specifically, we consider three problems: masking (feature selection) for dimensionality reduction, extension of the dimensionality reduction for time series, and online collaborative filtering. For each of these problems, we start with a form of pairwise proximity which is relevant in the problem at hand. We evaluate the …


Multiple Imputation Of Missing Data In Structural Equation Models With Mediators And Moderators Using Gradient Boosted Machine Learning, Robert J. Milletich Ii Oct 2016

Psychology Theses & Dissertations

Mediation and moderated mediation models are two commonly used models for indirect effects analysis. In practice, missing data is a pervasive problem in structural equation modeling with psychological data. Multiple imputation (MI) is one method used to estimate model parameters in the presence of missing data, while accounting for uncertainty due to the missing data. Unfortunately, commonly used MI methods are not equipped to handle categorical variables or nonlinear variables such as interactions. In this study, we introduce a general MI framework that uses the Bayesian bootstrap (BB) method to generate posterior inferences for indirect effects and gradient boosted machine …


A Study Of The Impact Of Interaction Mechanisms And Population Diversity In Evolutionary Multiagent Systems, Sadat U. Chowdhury Sep 2016

Dissertations, Theses, and Capstone Projects

In the Evolutionary Computation (EC) research community, a major concern is maintaining optimal levels of population diversity. In the Multiagent Systems (MAS) research community, a major concern is implementing effective agent coordination through various interaction mechanisms. These two concerns coincide when one is faced with Evolutionary Multiagent Systems (EMAS).

This thesis demonstrates a methodology to study the relationship between interaction mechanisms, population diversity, and performance of an evolving multiagent system in a dynamic, real-time, and asynchronous environment. An open-source, extensible experimentation platform is developed that allows plug-ins for evolutionary models, interaction mechanisms, and genotypical encoding schemes beyond the one …


A Novel Machine Learning Classifier Based On A Qualia Modeling Agent (Qma), Sandra L. Vaughan Sep 2016

Theses and Dissertations

This dissertation addresses a problem found in supervised machine learning (ML) classification: the target variable, i.e., the variable a classifier predicts, has to be identified before training begins and cannot change during training and testing. This research develops a computational agent that overcomes this problem. The Qualia Modeling Agent (QMA) is modeled after two cognitive theories: Stanovich's tripartite framework, which proposes that learning results from interactions between conscious and unconscious processes; and the Integrated Information Theory (IIT) of Consciousness, which proposes that the fundamental structural elements of consciousness are qualia. By modeling the informational relationships of qualia, the QMA allows …


Creating And Automatically Grading Annotated Questions, Alicia Crowder Wood Sep 2016

Theses and Dissertations

We have created a question type that allows teachers to easily create questions, helps provide an intuitive user experience for students to take questions, and reduces the time it currently takes teachers to grade and provide feedback to students. This question type, or an "annotated" question, will allow teachers to test students' knowledge in a particular subject area by having students "annotate" or mark text and video sources to answer questions. Through user testing we determined that overall the interface and the implemented system decrease the time it would take a teacher to grade annotated quiz questions. However, there are …


A Study Of Security Issues Of Mobile Apps In The Android Platform Using Machine Learning Approaches, Lei Cen Aug 2016

Open Access Dissertations

Mobile apps pose both traditional and new potential threats to system security and user privacy. There are malicious apps that may do harm to the system, and there are misbehaviors of apps, which are reasonable and legal when not abused, yet may lead to real threats otherwise. Moreover, due to the nature of mobile apps, an app running on a mobile device may be only part of the software, and the server-side behavior is usually not covered by analysis. Therefore, direct analysis of the app itself may be incomplete and additional sources of information are needed. In this dissertation, we …


Data Driven Low-Bandwidth Intelligent Control Of A Jet Engine Combustor, Nathan L. Toner Aug 2016

Open Access Dissertations

This thesis introduces a low-bandwidth control architecture for navigating the input space of an un-modeled combustor system between desired operating conditions while avoiding regions of instability and blow-out. An experimental procedure is discussed for identifying regions of instability and gathering sufficient data to build a data-driven model of the system's operating modes. Regions of instability and blow-out are identified experimentally and a data-driven operating point classifier is designed. This classifier acts as a map of the operating space of the combustor, indicating regions in which the flame is in a "good" or "bad" operating mode. A data-driven predictor is also …


Personalization And Data Relation Exploration Using Predictive Analytics For The Production And Distributed Analysis System (Panda), Mikhail Titov Aug 2016

Computer Science and Engineering Dissertations

Efficient data distribution among computing centers is one of the biggest challenges in large-scale scientific distributed computing systems. Such data distribution issues include: i) the rational utilization of storage and computing resources, ii) the minimization of the completion time for data processing (which requires a reduction in redundant data transfers, and intelligent allocation of processing tasks), and iii) user experience enhancement, i.e., availability and fast access to the desired data, and discovery of new relevant data. In the literature and in practice, there have been significant new approaches to the improvement of workflow management to address the above described issues, …


Designing Human-Centered Collective Intelligence, Ivor Addo Jul 2016

Dissertations (1934 -)

Human-Centered Collective Intelligence (HCCI) is an emergent research area that seeks to bring together major research areas like machine learning, statistical modeling, information retrieval, market research, and software engineering to address challenges pertaining to deriving intelligent insights and solutions through the collaboration of several intelligent sensors, devices and data sources. An archetypal contextual CI scenario might be concerned with deriving affect-driven intelligence through multimodal emotion detection sources in a bid to determine the likability of one movie trailer over another. On the other hand, the key tenets to designing robust and evolutionary software and infrastructure architecture models to address cross-cutting …


Machine Learning Methods For Medical And Biological Image Computing, Rongjian Li Jul 2016

Computer Science Theses & Dissertations

Medical and biological imaging technologies provide valuable visual information on the structure and function of an organ, from the level of individual molecules to the whole object. The brain is the most complex organ in the body, and it attracts increasingly intense research attention with the rapid development of medical and biological imaging technologies. The massive amount of high-dimensional brain imaging data being generated creates high demand for computational methods that analyze those images efficiently. Current computational methods based on hand-crafted features do not scale with the increasing number of brain images, hindering the pace of scientific discoveries …


Machine Learning Methods For Brain Image Analysis, Ahmed Fakhry Jul 2016

Computer Science Theses & Dissertations

Understanding how the brain functions and quantifying compound interactions between complex synaptic networks inside the brain remain some of the most challenging problems in neuroscience. Lack or abundance of data, a shortage of manpower, and the heterogeneity of data from various species all add complexity to an already perplexing problem. Vast amounts of brain data need to be processed automatically, yet with an accuracy close to manual human-level performance. These automated methods need to generalize well in order to accommodate data from different species. Also, novel approaches and techniques are becoming …


Determining The Effectiveness Of Soil Treatment On Plant Stress Using Smart-Phone Cameras, Anurag Panwar Jun 2016

USF Tampa Graduate Theses and Dissertations

Plants are vital to the health of our biosphere, and effectively sustaining their growth is fundamental to the existence of life on this planet. A critical factor that determines the sustainability of plant growth is the quality of the soil. All other things being fixed, the quality of soil greatly impacts plant stress, which in turn impacts overall health. Although plant stress manifests in many ways, one of the clearest indicators is the color of the leaves. In this thesis, we conducted an experimental study in a greenhouse for detecting plant stress caused by nutrient deficiencies in soil using smartphone cameras, …


Categorizing Blog Spam, Brandon Bevans Jun 2016

Master's Theses

The internet has matured into the focal point of our era. Its ecosystem is vast, complex, and in many regards unaccounted for. One of the most prevalent aspects of the internet is spam. Similar to the rest of the internet, spam has evolved from simply meaning ‘unwanted emails’ to a blanket term that encompasses any unsolicited or illegitimate content that appears in the wide range of media that exists on the internet.

Many forms of spam permeate the internet, and spam architects continue to develop tools and methods to avoid detection. On the other side, cyber security engineers continue to …


Exploring Data Mining Techniques For Tree Species Classification Using Co-Registered Lidar And Hyperspectral Data, Julia K. Marrs May 2016

Theses and Dissertations

NASA Goddard’s LiDAR, Hyperspectral, and Thermal imager provides co-registered remote sensing data on experimental forests. Data mining methods were used to achieve a final tree species classification accuracy of 68% using a combined LiDAR and hyperspectral dataset, and show promise for addressing deforestation and carbon sequestration on a species-specific level.


An Exercise And Sports Equipment Recognition System, Siddarth Kalra May 2016

Electronic Thesis and Dissertation Repository

Most mobile health management applications today require manual input or use sensors like the accelerometer or GPS to record user data. The onboard camera remains underused. We propose an Exercise and Sports Equipment Recognition System (ESRS) that can recognize physical activity equipment from raw image data. This system can be integrated with mobile phones to allow the camera to become a primary input device for recording physical activity. We employ a deep convolutional neural network to train models capable of recognizing 14 different equipment categories. Furthermore, we propose a preprocessing scheme that uses color normalization and denoising techniques to improve …


Revelation Of Yin-Yang Balance In Microbial Cell Factories By Data Mining, Flux Modeling, And Metabolic Engineering, Gang Wu May 2016

McKelvey School of Engineering Theses & Dissertations

The long-held assumption of never-ending rapid growth in biotechnology and especially in synthetic biology has recently been questioned, due to a lack of substantial return on investment. One of the main reasons for failures in synthetic biology and metabolic engineering is the metabolic burden that results in resource losses. Metabolic burden is defined as the portion of a host cell's resources, either energy molecules (e.g., NADH, NADPH, and ATP) or carbon building blocks (e.g., amino acids), that is used to maintain the engineered components (e.g., pathways). As a result, the effectiveness of synthetic biology tools depends heavily on cell capability to …


Comparison Of Machine Learning Algorithms In Suggesting Candidate Edges To Construct A Query On Heterogeneous Graphs, Rohit Ravi Kumar Bhoopalam May 2016

Computer Science and Engineering Theses

Querying graph data can be difficult as it requires the user to have knowledge of the underlying schema and the query language. Visual query builders allow users to formulate the intended query by drawing nodes and edges of the query graph, which can be translated into a database query. Visual query builders help users formulate the query without requiring the user to have knowledge of the query language and the underlying schema. To the best of our knowledge, none of the currently available visual query builders suggest to users which nodes/edges to include in their query graph. We provide suggestions to …


Exploring Privacy Leakage From The Resource Usage Patterns Of Mobile Apps, Amin Rois Sinung Nugroho May 2016

Graduate Theses and Dissertations

Due to the popularity of smart phones and mobile apps, a potential privacy risk with the usage of mobile apps is that, from the usage information of mobile apps (e.g., how many hours a user plays mobile games in each day), private information about a user’s living habits and personal activities can be inferred. To assess this risk, this thesis answers the following research question: can the type of a mobile app (e.g., email, web browsing, mobile game, music streaming, etc.) used by a user be inferred from the resource (e.g., CPU, memory, network, etc.) usage patterns of the mobile …


A Comparative Approach To Question Answering Systems, Josue Balandrano Coronel May 2016

Theses and Dissertations

In this paper I will analyze three different algorithms and approaches to implement Question Answering Systems (QA-Systems). I will analyze the efficiency, strengths, and weaknesses of multiple algorithms by explaining them in detail and comparing them with each other. The overarching aim of this thesis is to explore ideas that can be used to create a truly open context QA-System. Open context QA-Systems remain an open problem.

The various algorithms and approaches presented in this work will be focused on complex questions. Complex questions are usually verbose and the context of the question is equally important to answer the query …


Sparse Feature Learning For Image Analysis In Segmentation, Classification, And Disease Diagnosis., Ehsan Hosseini-Asl May 2016

Electronic Theses and Dissertations

The success of machine learning algorithms generally depends on intermediate data representations, called features, that disentangle the hidden factors of variation in data. Moreover, machine learning models are required to generalize, in order to reduce specificity or bias toward the training dataset. Unsupervised feature learning is useful for taking advantage of the large amounts of unlabeled data available to capture these variations. However, learned features are required to capture variational patterns in data space. In this dissertation, unsupervised feature learning with sparsity is investigated for sparse and local feature extraction with application to lung segmentation, interpretable deep …


Bridging Statistical Learning And Formal Reasoning For Cyber Attack Detection, Kexin Pei Apr 2016

Open Access Theses

Current cyber-infrastructures are facing increasingly stealthy attacks that implant malicious payloads under the cover of benign programs. Current attack detection approaches based on statistical learning methods may generate misleading decision boundaries when processing noisy data with such a mixture of benign and malicious behaviors. On the other hand, attack detection based on formal program analysis may lack completeness or adaptivity when modeling attack behaviors. In light of these limitations, we have developed LEAPS, an attack detection system based on supervised statistical learning to classify benign and malicious system events. Furthermore, we leverage control flow graphs inferred from the system event …


Predicting Changes To Source Code, Justin James Roll Apr 2016

Master's Theses

Organizations typically use issue tracking systems (ITS) such as Jira to plan software releases and assign requirements to developers. Organizations typically also use source control management (SCM) repositories such as Git to track historical changes to a code-base. These ITS and SCM repositories contain valuable data that remains largely untapped. As developers churn through an organization, it becomes expensive for developers to spend time determining which software artifact must be modified to implement a requirement. In this work we created, developed, tested and evaluated a tool called Class Change Predictor, otherwise known as CCP, for predicting which class will implement …


Cross-Subject Continuous Analytic Workload Profiling Using Stochastic Discrete Event Simulation, Joseph J. Giametta Mar 2016

Theses and Dissertations

Operator functional state (OFS) in remotely piloted aircraft (RPA) simulations is modeled using electroencephalograph (EEG) physiological data and continuous analytic workload profiles (CAWPs). A framework is proposed that provides solutions to the limitations that stem from lengthy training data collection and labeling techniques associated with generating CAWPs for multiple operators/trials. The framework focuses on the creation of scalable machine learning models using two generalization methods: 1) the stochastic generation of CAWPs and 2) the use of cross-subject physiological training data to calibrate machine learning models. Cross-subject workload models are used to infer OFS on new subjects, reducing the need to …


Algorithms For First-Order Sparse Reinforcement Learning, Bo Liu Mar 2016

Doctoral Dissertations

This thesis presents a general framework for first-order temporal difference learning algorithms with an in-depth theoretical analysis. The main contribution of the thesis is the development and design of a family of first-order regularized temporal-difference (TD) algorithms using stochastic approximation and stochastic optimization. To scale up TD algorithms to large-scale problems, we use first-order optimization to explore regularized TD methods using linear value function approximation. Previous regularized TD methods often use matrix inversion, which requires cubic time and quadratic memory complexity. We propose two algorithms, sparse-Q and RO-TD, for on-policy and off-policy learning, respectively. These two algorithms exhibit linear computational …


Vehicle Engine Classification Using Of Laser Vibrometry Feature Extraction, Chi Him Liu Jan 2016

Dissertations and Theses

As a non-invasive and remote sensor, the laser Doppler vibrometer (LDV) has been used in many different applications, such as inspection of aircraft, bridges, and structures, and remote voice acquisition. However, using LDV as a vehicle surveillance device has not been feasible due to the lack of systematic investigations on its behavioral properties. In this thesis, the LDV data from different vehicles are examined and features are extracted. A tone-pitch indexing (TPI) scheme is developed to classify different vehicles by exploiting the engine’s periodic vibrations that are transferred throughout the vehicle’s body. Using the TPI with a two-layer feed-forward …


Greenc5: An Adaptive, Energy-Aware Collection For Green Software Development, Junya Michanan Jan 2016

Electronic Theses and Dissertations

Dynamic data structures in software applications have been shown to have a large impact on system performance. In this paper, we explore energy saving opportunities of interface-based dynamic data structures. Our results suggest that savings opportunities exist in the C5 Collection between 16.95% and 97.50%. We propose a prototype and architecture for creating adaptive green data structures by applying machine learning tools to build a model for predicting energy efficient data structures based on the dynamic workload. Our neural network model can classify energy efficient data structures based on features such as the number of elements, frequency of operations, interface …