Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

Brigham Young University

Theses and Dissertations

Machine learning

Articles 1 - 30 of 32

Full-Text Articles in Entire DC Network

Flow Adaptive Video Object Segmentation, Fanqing Lin Dec 2018

Theses and Dissertations

We tackle the task of semi-supervised video object segmentation, i.e., pixel-level object classification of the images in video sequences using very limited ground-truth training data from the corresponding video. The recently introduced online adaptation of convolutional neural networks for video object segmentation (OnAVOS) has achieved good results by pretraining the network, fine-tuning on the first frame, and training the network at test time using its approximate prediction as newly obtained ground truth. We propose Flow Adaptive Video Object Segmentation (FAVOS) that refines the generated adaptive ground truth for online updates and utilizes temporal consistency between video frames with the help …


Toward Real-Time FLIP Fluid Simulation Through Machine Learning Approximations, Javid Kennon Pack Dec 2018

Theses and Dissertations

Fluids in computer-generated imagery can add an impressive amount of realism to a scene, but they are particularly time-consuming to simulate. To run fluid simulations in real time, recent efforts have used machine learning techniques to approximate the movement of fluids. We explore using machine learning to simulate fluids while also integrating the Fluid-Implicit-Particle (FLIP) simulation method into machine learning fluid simulation approaches.


The OGCleaner: Detecting False-Positive Sequence Homology, Masaki Stanley Fujimoto Jun 2017

Theses and Dissertations

Within bioinformatics, phylogenetics is the study of the evolutionary relationships between different species and organisms. The genetic revolution has caused an explosion in the amount of raw genomic information available to scientists, but analysis methods have lagged behind. A key task in phylogenetics is identifying homology clusters. Current methods rely on heuristics based on pairwise sequence comparison to identify homology clusters. We propose the Orthology Group Cleaner (the OGCleaner) as a method to evaluate cluster-level verification of putative homology clusters in order to create higher quality …


Creating And Automatically Grading Annotated Questions, Alicia Crowder Wood Sep 2016

Theses and Dissertations

We have created a question type that allows teachers to easily create questions, helps provide an intuitive user experience for students to take questions, and reduces the time it currently takes teachers to grade and provide feedback to students. This question type, or an "annotated" question, will allow teachers to test students' knowledge in a particular subject area by having students "annotate" or mark text and video sources to answer questions. Through user testing we determined that overall the interface and the implemented system decrease the time it would take a teacher to grade annotated quiz questions. However, there are …


Feature Identification And Reduction For Improved Generalization Accuracy In Secondary-Structure Prediction Using Temporal Context Inputs In Machine-Learning Models, Matthew Benjamin Seeley May 2015

Theses and Dissertations

A protein's properties are influenced by both its amino-acid sequence and its three-dimensional conformation. Ascertaining a protein's sequence is relatively easy using modern techniques, but determining its conformation requires much more expensive and time-consuming techniques. Consequently, it would be useful to identify a method that can accurately predict a protein's secondary-structure conformation using only the protein's sequence data. This problem is not trivial, however, because identical amino-acid subsequences in different contexts sometimes have disparate secondary structures, while highly dissimilar amino-acid subsequences sometimes have identical secondary structures. We propose (1) to develop a set of metrics that facilitates better comparisons between …


Using Instance-Level Meta-Information To Facilitate A More Principled Approach To Machine Learning, Michael Reed Smith Apr 2015

Theses and Dissertations

As the capability for capturing and storing data increases and becomes more ubiquitous, an increasing number of organizations are looking to use machine learning techniques as a means of understanding and leveraging their data. However, the success of applying machine learning techniques depends on which learning algorithm is selected, the hyperparameters that are provided to the selected learning algorithm, and the data that is supplied to the learning algorithm. Even among machine learning experts, selecting an appropriate learning algorithm, setting its associated hyperparameters, and preprocessing the data can be a challenging task and is generally left to the expertise of …


Intelligent Indexing: A Semi-Automated, Trainable System For Field Labeling, Robert T. Clawson Sep 2014

Theses and Dissertations

We present Intelligent Indexing: a general, scalable, collaborative approach to indexing and transcription of non-machine-readable documents that exploits visual consensus and group labeling while harnessing human recognition and domain expertise. In our system, indexers work directly on the page, and with minimal context switching can navigate the page, enter labels, and interact with the recognition engine. Interaction with the recognition engine occurs through preview windows that allow the indexer to quickly verify and correct recommendations. This interaction is far superior to conventional, tedious, inefficient post-correction and editing. Intelligent Indexing is a trainable system that improves over time and can provide …


Musical Motif Discovery In Non-Musical Media, Daniel S. Johnson Jun 2014

Theses and Dissertations

Many music composition algorithms attempt to compose music in a particular style. The resulting music is often impressive and indistinguishable from the style of the training data, but it tends to lack significant innovation. In an effort to increase innovation in the selection of pitches and rhythms, we present a system that discovers musical motifs by coupling machine learning techniques with an inspirational component. The inspirational component allows for the discovery of musical motifs that are unlikely to be produced by a generative model, while the machine learning component harnesses innovation. Candidate motifs are extracted from non-musical media such as …


Ensemble Methods For Historical Machine-Printed Document Recognition, William B. Lund Apr 2014

Theses and Dissertations

The usefulness of digitized documents is directly related to the quality of the extracted text. Optical Character Recognition (OCR) has reached a point where well-formatted and clean machine-printed documents are easily recognizable by current commercial OCR products; however, older or degraded machine-printed documents present problems to OCR engines, resulting in word error rates (WER) that severely limit either automated or manual use of the extracted text. Major archives of historical machine-printed documents are being assembled around the globe, requiring an accurate transcription of the text for the automated creation of descriptive metadata, full-text searching, and information extraction. Given document …


Practical Cost-Conscious Active Learning For Data Annotation In Annotator-Initiated Environments, Robbie A. Haertel Aug 2013

Theses and Dissertations

Many projects exist whose purpose is to augment raw data with annotations that increase the usefulness of the data. The number of these projects is rapidly growing, and in the age of “big data” the amount of data to be annotated is likewise growing within each project. One common use of such data is in supervised machine learning, which requires labeled data to train a predictive model. Annotation is often a very expensive proposition, particularly for structured data. The purpose of this dissertation is to explore methods of reducing the cost of creating such data sets, including annotated text corpora. We …


Probabilistic Explicit Topic Modeling, Joshua Aaron Hansen Apr 2013

Theses and Dissertations

Latent Dirichlet Allocation (LDA) is widely used for automatic discovery of latent topics in document corpora. However, output from analysis using an LDA topic model suffers from a lack of identifiability between topics not only across corpora, but across runs of the algorithm. The output is also isolated from enriching information from knowledge sources such as Wikipedia and is difficult for humans to interpret due to a lack of meaningful topic labels. This thesis introduces two methods for probabilistic explicit topic modeling that address these issues: Latent Dirichlet Allocation with Static Topic-Word Distributions (LDA-STWD), and Explicit Dirichlet Allocation (EDA). LDA-STWD …


Bayesian Text Analytics For Document Collections, Daniel David Walker Nov 2012

Theses and Dissertations

Modern document collections are too large to annotate and curate manually. As increasingly large amounts of data become available, historians, librarians, and other scholars increasingly need to rely on automated systems to efficiently and accurately analyze the contents of their collections and to find new and interesting patterns therein. Modern techniques in Bayesian text analytics are becoming widespread and have the potential to revolutionize the way that research is conducted. Much work has been done in the document modeling community towards this end, though most of it is focused on modern, relatively clean text data. We present research for improved …


A Confidence-Prioritization Approach To Data Processing In Noisy Data Sets And Resulting Estimation Models For Predicting Streamflow Diel Signals In The Pacific Northwest, Nathaniel Lee Gustafson Aug 2012

Theses and Dissertations

Streams in small watersheds are often known to exhibit diel fluctuations, in which streamflow oscillates on a 24-hour cycle. Streamflow diel fluctuations, which we investigate in this study, are an informative indicator of environmental processes. However, in environmental data sets, as in many others, there is a range of noise associated with individual data points. Some points are extracted under relatively clear and defined conditions, while others may include a range of known or unknown confounding factors, which may decrease those points' validity. These points may or may not remain useful for training, depending on how much uncertainty they …


Practical Improvements In Applied Spectral Learning, Adam C. Drake Jun 2010

Theses and Dissertations

Spectral learning algorithms, which learn an unknown function by learning a spectral representation of the function, have been widely used in computational learning theory to prove many interesting learnability results. These algorithms have also been successfully used in real-world applications. However, previous work has left open many questions about how to best use these methods in real-world learning scenarios. This dissertation presents several significant advances in real-world spectral learning. It presents new algorithms for finding large spectral coefficients (a key sub-problem in spectral learning) that allow spectral learning methods to be applied to much larger problems and to a wider …


Transformation Learning: Modeling Transferable Transformations In High-Dimensional Data, Christopher R. Wilson May 2010

Theses and Dissertations

The goal of learning transfer is to apply knowledge gained from one problem to a separate related problem. Transformation learning is a proposed approach to computational learning transfer that focuses on modeling high-level transformations that are well suited for transfer. By using a high-level representation of transferable data, transformation learning facilitates both shallow transfer (intra-domain) and deep transfer (inter-domain) scenarios. Transformations can be discovered in data using manifold learning to order data instances according to the transformations they represent. For high-dimensional data representable with coordinate systems, such as images and sounds, data instances can be decomposed into small sub-instances based …


A Bayesian Decision Theoretical Approach To Supervised Learning, Selective Sampling, And Empirical Function Optimization, James Lamond Carroll Mar 2010

Theses and Dissertations

Many have used the principles of statistics and Bayesian decision theory to model specific learning problems. It is less common to see models of the processes of learning in general. One exception is the model of the supervised learning process known as the "Extended Bayesian Formalism" or EBF. This model is descriptive, in that it can describe and compare learning algorithms. Thus the EBF is capable of modeling both effective and ineffective learning algorithms. We extend the EBF to model unsupervised learning, semi-supervised learning, supervised learning, and empirical function optimization. We also generalize the utility model of the EBF to …


Noninvasive Estimation Of Pulmonary Artery Pressure Using Heart Sound Analysis, Aaron W. Dennis Dec 2009

Theses and Dissertations

Right-heart catheterization is the most accurate method for estimating pulmonary artery pressure (PAP). Because it is an invasive procedure, it is expensive, exposes patients to the risk of infection, and is not suited for long-term monitoring situations. Medical researchers have shown that PAP influences the characteristics of heart sounds. This suggests that heart sound analysis is a potential noninvasive solution to the PAP estimation problem. This thesis describes the development of a prototype system, called PAPEr, which estimates PAP noninvasively using heart sound analysis. PAPEr uses patient data with machine learning algorithms to build models of how PAP affects heart …


Real-Time Automatic Price Prediction For eBay Online Trading, Ilya Igorevitch Raykhel Nov 2008

Theses and Dissertations

While Machine Learning is one of the most popular research areas in Computer Science, there are still only a few deployed applications intended for use by the general public. We have developed an exemplary application that can be directly applied to eBay trading. Our system predicts how much an item would sell for on eBay based on that item's attributes. We ran our experiments on the eBay laptop category, with prior trades used as training data. The system implements a feature-weighted k-Nearest Neighbor algorithm, using genetic algorithms to determine feature weights. Our results demonstrate an average prediction error of 16%; …


Improving Liquid State Machines Through Iterative Refinement Of The Reservoir, R David Norton Mar 2008

Theses and Dissertations

Liquid State Machines (LSMs) exploit the power of recurrent spiking neural networks (SNNs) without training the SNN. Instead, a reservoir, or liquid, is randomly created which acts as a filter for a readout function. We develop three methods for iteratively refining a randomly generated liquid to create a more effective one. First, we apply Hebbian learning to LSMs by building the liquid with spike-time dependent plasticity (STDP) synapses. Second, we create an eligibility-based reinforcement learning algorithm for synaptic development. Third, we apply principles of Hebbian learning and reinforcement learning to create a new algorithm called separation-driven synaptic modification …


A Direct Algorithm For The K-Nearest-Neighbor Classifier Via Local Warping Of The Distance Metric, Tohkoon Neo Nov 2007

Theses and Dissertations

The k-nearest neighbor (k-NN) pattern classifier is a simple yet effective learner. However, it has a few drawbacks, one of which is the large model size. There are a number of algorithms that are able to condense the model size of the k-NN classifier at the expense of accuracy. Boosting is therefore desirable for increasing the accuracy of these condensed models. Unfortunately, there does not exist a boosting algorithm that works well with k-NN directly. We present a direct boosting algorithm for the k-NN classifier that creates an ensemble of models with locally modified distance weighting. An empirical study conducted …


Heuristic Weighted Voting, Kristine Perry Monteith Oct 2007

Theses and Dissertations

Selecting an effective method for combining the votes of classifiers in an ensemble can have a significant impact on the overall classification accuracy an ensemble is able to achieve. With some methods, the ensemble cannot even achieve as high a classification accuracy as its most accurate individual classifier. To address this issue, we present the strategy of Heuristic Weighted Voting, a technique that uses heuristics to determine the confidence that a classifier has in its predictions on an instance-by-instance basis. Using these heuristics to weight the votes in an ensemble results in an overall average increase in …


Limitations And Extensions Of The WoLF-PHC Algorithm, Philip R. Cook Sep 2007

Theses and Dissertations

Policy Hill Climbing (PHC) is a reinforcement learning algorithm that extends Q-learning to learn probabilistic policies for multi-agent games. WoLF-PHC extends PHC with the "win or learn fast" principle. A proof that PHC will diverge in self-play when playing Shapley's game is given, and WoLF-PHC is shown empirically to diverge as well. Various WoLF-PHC based modifications were created, evaluated, and compared in an attempt to obtain convergence to the single shot Nash equilibrium when playing Shapley's game in self-play without using more information than WoLF-PHC uses. Partial Commitment WoLF-PHC (PCWoLF-PHC), which performs best on Shapley's game, is tested on other …


Improving Neural Network Classification Training, Michael Edwin Rimer Sep 2007

Theses and Dissertations

The following work presents a new set of general methods for improving neural network accuracy on classification tasks, grouped under the label of classification-based methods. The central theme of these approaches is to provide problem representations and error functions that more directly improve classification accuracy than conventional learning and error functions. The CB1 algorithm attempts to maximize classification accuracy by selectively backpropagating error only on misclassified training patterns. CB2 incorporates a sliding error threshold to the CB1 algorithm, interpolating between the behavior of CB1 and standard error backpropagation as training progresses in order to avoid prematurely saturated network weights. CB3 …


Obstacle Avoidance And Path Traversal Using Interactive Machine Learning, Jonathan M. Turner Jul 2007

Theses and Dissertations

Recently there has been a growing interest in using robots in activities that are dangerous or cost prohibitive for humans to do. Such activities include military uses and space exploration. While robotic hardware is often capable of being used in these types of situations, the ability of human operators to control robots in an effective manner is often limited. This deficiency is often related to the control interface of the robot and the level of autonomy that control system affords the human operator. This thesis describes a robot control system, called the safe/unsafe system, which gives a human operator the …


Cognitive And Behavioral Model Ensembles For Autonomous Virtual Characters, Jeffrey S. Whiting Jun 2007

Theses and Dissertations

Cognitive and behavioral models have become popular methods to create autonomous self-animating characters. Creating these models presents the following challenges: (1) creating a cognitive or behavioral model is a time-intensive and complex process that must be done by an expert programmer; (2) the models are created to solve a specific problem in a given environment and, because of their specific nature, cannot be easily reused. Combining existing models would allow an animator, without the need of a programmer, to create new characters in less time and to leverage each model's strengths to increase the character's …


Learning In Short-Time Horizons With Measurable Costs, Patrick Bowen Mullen Nov 2006

Theses and Dissertations

Dynamic pricing is a difficult problem for machine learning. The environment is noisy, dynamic and has a measurable cost associated with exploration that necessitates that learning be done in short-time horizons. These short-time horizons force the learning algorithms to make pricing decisions based on scarce data. In this work, various machine learning algorithms are compared in the context of dynamic pricing. These algorithms include the Kalman filter, artificial neural networks, particle swarm optimization and genetic algorithms. The majority of these algorithms have been modified to handle the pricing problem. The results show that these adaptations allow the learning algorithms to …


Temporal Data Mining In A Dynamic Feature Space, Brent K. Wenerstrom May 2006

Theses and Dissertations

Many interesting real-world applications for temporal data mining are hindered by concept drift. One particular form of concept drift is characterized by changes to the underlying feature space. Seemingly little has been done to address this issue. This thesis presents FAE, an incremental ensemble approach to mining data subject to concept drift. FAE achieves better accuracies over four large datasets when compared with a similar incremental learning algorithm.


Learning Real-World Problems By Finding Correlated Basis Functions, Adam C. Drake Mar 2006

Theses and Dissertations

Learning algorithms based on the Fourier transform attempt to learn functions by approximating the largest coefficients of their Fourier representations. Nearly all previous work in Fourier-based learning has been in the theoretical realm, where properties of the transform have made it possible to prove many interesting learnability results. The real-world usefulness of Fourier-based methods, however, has not been thoroughly explored. This thesis explores methods for the practical application of Fourier-based learning. The primary contribution of this thesis is a new search algorithm for finding the largest coefficients of a function's Fourier representation. Although the search space is exponentially large, empirical …


Surface Realization Using A Featurized Syntactic Statistical Language Model, Thomas L. Packer Mar 2006

Theses and Dissertations

An important challenge in natural language surface realization is the generation of grammatical sentences from incomplete sentence plans. Realization can be broken into a two-stage process consisting of an over-generating rule-based module followed by a ranker that outputs the most probable candidate sentence based on a statistical language model. Thus far, an n-gram language model has been evaluated in this context. More sophisticated syntactic knowledge is expected to improve such a ranker. In this thesis, a new language model based on featurized functional dependency syntax was developed and evaluated. Generation accuracies and cross-entropy for the new language model did not …


Improving And Extending Behavioral Animation Through Machine Learning, Jonathan J. Dinerstein Apr 2005

Theses and Dissertations

Behavioral animation has become popular for creating virtual characters that are autonomous agents and thus self-animating. This is useful for lessening the workload of human animators, populating virtual environments with interactive agents, etc. Unfortunately, current behavioral animation techniques suffer from three key problems: (1) deliberative behavioral models (i.e., cognitive models) are slow to execute; (2) interactive virtual characters cannot adapt online due to interaction with a human user; (3) programming of behavioral models is a difficult and time-intensive process. This dissertation presents a collection of papers that seek to overcome each of these problems. Specifically, these issues are alleviated …