Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Theses/Dissertations

2017

Machine Learning

Articles 1 - 30 of 31

Full-Text Articles in Computer Sciences

Knowledge Driven Approaches And Machine Learning Improve The Identification Of Clinically Relevant Somatic Mutations In Cancer Genomics, Benjamin John Ainscough Dec 2017

Arts & Sciences Electronic Theses and Dissertations

For cancer genomics to fully expand its utility from research discovery to clinical adoption, somatic variant detection pipelines must be optimized and standardized to ensure identification of clinically relevant mutations and to reduce laborious and error-prone post-processing steps. To address the need for improved catalogues of clinically and biologically important somatic mutations, we developed DoCM, a Database of Curated Mutations in Cancer (http://docm.info), as described in Chapter 2. DoCM is an open-source, openly licensed resource that enables the cancer research community to aggregate, store, and track biologically and clinically important cancer variants. DoCM currently comprises 1,364 variants …


Threshold Free Detection Of Elliptical Landmarks Using Machine Learning, Lifan Zhang Dec 2017

Theses and Dissertations

Elliptical shape detection is widely used in practical applications. Nearly all classical ellipse detection algorithms require some form of threshold, which can be a major cause of detection failure, especially in the challenging case of Moire Phase Tracking (MPT) target images. To meet this challenge, this thesis proposes a threshold-free detection algorithm for elliptical landmarks. The proposed Aligned Gradient and Unaligned Gradient (AGUG) algorithm is a Support Vector Machine (SVM)-based classification algorithm whose original features are extracted from the gradient information corresponding to the sampled pixels. With proper selection of features, the proposed algorithm has a high accuracy …


Developing Leading And Lagging Indicators To Enhance Equipment Reliability In A Lean System, Dhanush Agara Mallesh Dec 2017

Masters Theses

With increasing equipment complexity, failure rates are becoming a critical metric because of the unplanned maintenance they cause in a production environment. Unplanned maintenance in the manufacturing process causes downtime and decreases equipment reliability. Equipment failures have resulted in lost revenue for organizations, encouraging maintenance practitioners to look for ways to convert unplanned maintenance into planned maintenance. Efficient failure prediction models are being developed to learn about failures in advance. With this information, predicted failures can be addressed early, reducing downtime in the system and improving throughput.

The goal of this thesis is to predict …


Graph-Based Latent Embedding, Annotation And Representation Learning In Neural Networks For Semi-Supervised And Unsupervised Settings, Ismail Ozsel Kilinc Nov 2017

USF Tampa Graduate Theses and Dissertations

Machine learning has been immensely successful in supervised learning with outstanding examples in major industrial applications such as voice and image recognition. Following these developments, the most recent research has now begun to focus primarily on algorithms which can exploit very large sets of unlabeled examples to reduce the amount of manually labeled data required for existing models to perform well. In this dissertation, we propose graph-based latent embedding/annotation/representation learning techniques in neural networks tailored for semi-supervised and unsupervised learning problems. Specifically, we propose a novel regularization technique called Graph-based Activity Regularization (GAR) and a novel output layer modification called …


AdaFT: A Resource-Efficient Framework For Adaptive Fault-Tolerance In Cyber-Physical Systems, Ye Xu Nov 2017

Doctoral Dissertations

Cyber-physical systems frequently have to use massive redundancy to meet application requirements for high reliability. While such redundancy is required, it can be activated adaptively, based on the current state of the controlled plant. Most of the time the physical plant is in a state that allows for a lower level of fault-tolerance. Avoiding the continuous deployment of massive fault-tolerance will greatly reduce the workload of CPSs. In this dissertation, we demonstrate a software simulation framework (AdaFT) that can automatically generate the sub-spaces within which our adaptive fault-tolerance can be applied. We also show the theoretical benefits of AdaFT, and …


Automatic Music Transcription With Convolutional Neural Networks Using Intuitive Filter Shapes, Jonathan Sleep Oct 2017

Master's Theses

This thesis explores the challenge of automatic music transcription with a combination of digital signal processing and machine learning methods. Automatic music transcription is important for musicians who can't do it themselves or find it tedious. We start with an existing model, designed by Sigtia, Benetos and Dixon, and develop it in a number of original ways. We find that by using convolutional neural networks with filter shapes more tailored for spectrogram data, we see better and faster transcription results when evaluating the new model on a dataset of classical piano music. We also find that employing better practices shows …


Lung CT Radiomics: An Overview Of Using Images As Data, Samuel Hunt Hawkins Sep 2017

USF Tampa Graduate Theses and Dissertations

Lung cancer is the leading cause of cancer-related death in the United States and worldwide. Early detection of lung cancer can help improve patient outcomes, and survival prediction can inform plans of treatment. By extracting quantitative features from computed tomography scans of lung cancer, predictive models can be built that can achieve both early detection and survival prediction. To build these predictive models, first a detected lung nodule is segmented, then image features are extracted, and finally a model can be built utilizing image features to make predictions. These predictions can help radiologists improve cancer care.

Building predictive models based …


Information Theoretic Study Of Gaussian Graphical Models And Their Applications, Ali Moharrer Aug 2017

LSU Doctoral Dissertations

Many problems involve characterizing the behavior of a complex stochastic system or its response to a particular set of inputs. Such problems span several topics, such as machine learning, complex networks (e.g., social or communication networks), and biology. Probabilistic graphical models (PGMs) are powerful tools that offer compact models of complex systems. They are designed to capture the random behavior, i.e., the joint distribution, of the system as accurately as possible. Our goal is to study certain algebraic and topological properties of a special class of graphical models, known as Gaussian graphs. First, …
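In a Gaussian graphical model the graph can be read directly off the precision (inverse covariance) matrix K: an off-diagonal zero K[i][j] = 0 means variables i and j are conditionally independent given all the others, and the partial correlation between them is -K[i][j] / sqrt(K[i][i] * K[j][j]). The sketch below illustrates only this standard relationship on a small, made-up precision matrix; the function names are hypothetical and nothing here is taken from the dissertation itself.

```python
import math

def partial_correlations(K):
    """Partial correlations implied by a precision matrix K (diagonal set to 1)."""
    n = len(K)
    return [
        [1.0 if i == j else -K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
        for i in range(n)
    ]

def edges(K, tol=1e-12):
    """Edges of the Gaussian graph: pairs (i, j), i < j, with a nonzero precision entry."""
    n = len(K)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if abs(K[i][j]) > tol]

# Hypothetical 3-variable chain X0 - X1 - X2. K[0][2] == 0, so X0 and X2 are
# conditionally independent given X1 even though they are marginally correlated.
K = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
print(edges(K))                       # [(0, 1), (1, 2)]
print(partial_correlations(K)[0][1])  # 0.5
```

The missing edge (0, 2) is exactly the kind of topological property that can be read from the sparsity pattern of K without ever inverting it.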


Machine Learning Based Protein Sequence To (Un)Structure Mapping And Interaction Prediction, Sumaiya Iqbal Aug 2017

University of New Orleans Theses and Dissertations

Proteins are the fundamental macromolecules within a cell that carry out most biological functions. The computational study of protein structure and function, using machine learning and data analytics, is essential to advancing life-science research, given the fast-growing volume of biological data and the complexity of analyzing it to discover meaningful insights. Mapping a protein’s primary sequence is not limited to its structure; we extend this mapping to its disordered components, known as Intrinsically Disordered Proteins or Regions (IDPs/IDRs), and hence to the dynamics involved, which help explain complex interactions within a cell that …


Operating System Identification By IPv6 Communication Using Machine Learning Ensembles, Adrian Ordorica Aug 2017

Graduate Theses and Dissertations

Operating system (OS) identification tools, sometimes called fingerprinting tools, are essential for the reconnaissance phase of penetration testing. While OS identification is traditionally performed by passive or active tools that use fingerprint databases, very little work has focused on using machine learning techniques. Moreover, significantly more work has focused on IPv4 than IPv6. We introduce a collaborative neural network ensemble that uses a unique voting system and a random forest ensemble to deliver accurate predictions. This approach uses IPv6 features as well as packet metadata features for OS identification. Our experiment shows that our approach is valid and we achieve …
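The abstract pairs a neural-network ensemble that uses its own voting system with a random forest, but does not spell out the voting rule. As a point of reference only, the sketch below shows plain hard majority voting over one predicted OS label per ensemble member; the function name and the labels are hypothetical, and the thesis's "unique voting system" is presumably more elaborate.

```python
from collections import Counter

def majority_vote(predictions):
    """Hard majority vote over one label per ensemble member.

    Ties go to the tied label encountered first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    label, _ = Counter(predictions).most_common(1)[0]
    return label

# Hypothetical per-model guesses for a single IPv6 packet capture.
votes = ["Linux", "Windows", "Linux", "macOS", "Linux"]
print(majority_vote(votes))  # Linux
```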


Bayesian Methods And Machine Learning For Processing Text And Image Data, Yingying Gu Aug 2017

Theses and Dissertations

Classification/clustering is an important class of unstructured data processing problems. Classification (supervised, semi-supervised, and unsupervised) aims to discover clusters and group similar data into categories for information organization and knowledge discovery. My work focuses on using Bayesian methods and machine learning techniques to classify free-text and image data, and addresses how to overcome the limitations of traditional methods. The Bayesian approach makes it possible to use more variables (numerical or categorical) and to estimate probabilities instead of explicit rules, which helps in ambiguous cases. MAP (maximum a posteriori) estimation is used to …
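The MAP decision rule mentioned above picks the class with the highest posterior probability. In generic notation (a class $c$ and observed features $x$; these symbols are not taken from the dissertation), it follows from Bayes' rule:

```latex
\hat{c}_{\mathrm{MAP}}
  = \arg\max_{c} P(c \mid x)
  = \arg\max_{c} \frac{P(x \mid c)\, P(c)}{P(x)}
  = \arg\max_{c} P(x \mid c)\, P(c)
```

Dropping $P(x)$ is safe because it does not depend on $c$. In practice the product is usually evaluated as $\log P(x \mid c) + \log P(c)$ to avoid numerical underflow when $x$ has many features.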


Method For Enabling Causal Inference In Relational Domains, David Arbour Jul 2017

Doctoral Dissertations

The analysis of data from complex systems is quickly becoming a fundamental aspect of modern business, government, and science. The field of causal learning is concerned with developing a set of statistical methods that allow practitioners to make inferences about unseen interventions. This field has seen significant advances in recent years. However, the vast majority of this work assumes that data instances are independent, whereas many systems are best described in terms of interconnected instances, i.e., relational systems. This discrepancy prevents causal inference techniques from being reliably applied in many real-world settings.
In this thesis, I will present three contributions to …


Signet: A Neural Network Architecture For Predicting Protein-Protein Interactions, Muhammad S. Ahmed Jul 2017

Electronic Thesis and Dissertation Repository

The study of protein-protein interactions (PPI) is critically important within the field of Molecular Biology, as proteins facilitate key organismal functions, including the maintenance of both cellular structure and function. Current experimental methods for elucidating PPIs are greatly hindered by high operating costs, lengthy wait times, and low accuracy. The recent development of computational PPI prediction techniques has attempted to address many of these issues. Despite this, many of these methods rely on over-engineered features and naive learning algorithms. With the recent advances in Machine Learning and Artificial Intelligence, we attempt to view this problem through a novel, deep …


Travel Mode Identification With Smartphone Sensors, Xing Su Jun 2017

Dissertations, Theses, and Capstone Projects

Personal trips in a modern urban society typically involve multiple travel modes. Recognizing a traveller's transportation mode is not only critical to personal context-awareness in related applications, but also essential to urban traffic operations, transportation planning, and facility design. While the state of the art in travel mode recognition mainly relies on large-scale infrastructure-based fixed sensors or on individuals' GPS devices, the emergence of the smartphone provides a promising alternative with its ever-growing computing, networking, and sensing powers. In this thesis, we propose new algorithms for travel mode identification using smartphone sensors. The prototype system is built upon the latest …


Document Classification Using Machine Learning, Ankit Basarkar May 2017

Master's Projects

To perform document classification algorithmically, documents need to be represented in a form that the machine learning classifier can understand. The report discusses the different types of feature vectors through which a document can be represented and later classified. The project aims to compare Binary, Count, and TfIdf feature vectors and their impact on document classification. To test how well each of the three feature vectors performs, we used the 20-newsgroup dataset and converted the documents to all three feature vectors. For each feature vector representation, we trained a Naïve Bayes classifier and then tested the generated classifier …
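To make the three representations concrete, here is a small stdlib-only sketch that turns a toy corpus into binary, count, and TF-IDF vectors over a shared vocabulary. The whitespace tokenizer, the helper names, and the unsmoothed weighting idf = log(N / df) are assumptions for illustration; libraries such as scikit-learn use smoothed and normalized variants by default, so their numbers will differ.

```python
import math
from collections import Counter

def vocabulary(docs):
    """Sorted list of all distinct lowercase whitespace tokens."""
    return sorted({tok for d in docs for tok in d.lower().split()})

def count_vectors(docs, vocab):
    """Raw term counts per document."""
    rows = []
    for d in docs:
        c = Counter(d.lower().split())
        rows.append([c[t] for t in vocab])
    return rows

def binary_vectors(docs, vocab):
    """1 if the term occurs in the document at all, else 0."""
    return [[1 if x > 0 else 0 for x in row] for row in count_vectors(docs, vocab)]

def tfidf_vectors(docs, vocab):
    """Term count times idf = log(N / df), with no smoothing or normalization."""
    counts = count_vectors(docs, vocab)
    n = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    return [[row[j] * math.log(n / df[j]) for j in range(len(vocab))] for row in counts]

docs = ["the cat sat", "the cat sat on the cat"]
vocab = vocabulary(docs)
print(vocab)                        # ['cat', 'on', 'sat', 'the']
print(count_vectors(docs, vocab))   # [[1, 0, 1, 1], [2, 1, 1, 2]]
print(binary_vectors(docs, vocab))  # [[1, 0, 1, 1], [1, 1, 1, 1]]
print(tfidf_vectors(docs, vocab))
```

With this unsmoothed idf, a term that appears in every document (here "cat", "sat", and "the") gets weight 0, which is the down-weighting of uninformative words that TF-IDF is meant to provide; only "on" keeps a nonzero weight.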


Credit Scoring Using Logistic Regression, Ansen Mathew May 2017

Master's Projects

This report presents an approach to predict the credit scores of customers using the Logistic Regression machine learning algorithm. The research objective of this project is to perform a comparative study between feature selection and feature extraction, against the same dataset using the Logistic Regression machine learning algorithm. For feature selection, we have used Stepwise Logistic Regression. For feature extraction, we have used Singular Value Decomposition (SVD) and Weighted Singular Value Decomposition (WSVD). In order to test the accuracy obtained using feature selection and feature extraction, we used a public credit dataset with 11 features and 150,000 records. After performing …
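For readers unfamiliar with the model itself, the following is a minimal logistic regression trained by batch gradient descent on the mean log-loss. It is a stdlib-only sketch on a hypothetical one-feature toy dataset; the learning rate, epoch count, and function names are assumptions, and it does not reproduce the project's stepwise selection or SVD-based feature extraction.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # For log-loss, d(loss)/d(score) = predicted probability - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical standardized single feature (e.g. a debt ratio); label 1 = default.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(predict_proba(w, b, [-1.2]) < 0.5, predict_proba(w, b, [1.2]) > 0.5)  # True True
```

On real credit data the features would first be reduced, either by selection or by projecting onto leading singular vectors, and the same training loop would run on the reduced columns.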


On The Aggregation Of Subjective Inputs From Multiple Sources, Mithun Chakraborty May 2017

McKelvey School of Engineering Theses & Dissertations

When we have a population of individuals or artificially intelligent agents possessing diverse subjective inputs (e.g., predictions or opinions) about a common topic, how should we collect and combine them into a single judgment or estimate? This has long been a fundamental question across disciplines concerned with forecasting and decision-making, and it has attracted the attention of computer scientists, particularly because of the proliferation of online platforms for electronic commerce and the harnessing of collective intelligence. In this dissertation, I study this problem through the lens of computational social science in three main parts: (1) Incentives in information …


Adaptive Region-Based Approaches For Cellular Segmentation Of Bright-Field Microscopy Images, Hady Ahmady Phoulady May 2017

USF Tampa Graduate Theses and Dissertations

Microscopy image processing is an emerging and quickly growing field of medical imaging research. Recent technological advances, including greater computing power, larger and cheaper storage, and faster, more efficient data acquisition devices such as whole-slide imaging scanners, have driven recent progress in microscopy image processing research. Most methods in this area either focus on automatically processing images so that pathologists can direct their attention to the important regions of an image, or aim to automate the whole job of experts, including processing and classifying images or tissues that leads …


On The Role Of Genetic Algorithms In The Pattern Recognition Task Of Classification, Isaac Ben Sherman May 2017

Masters Theses

In this dissertation we ask, formulate an apparatus for answering, and answer the following three questions: Where do Genetic Algorithms fit in the greater scheme of pattern recognition? Given primitive mechanics, can Genetic Algorithms match or exceed the performance of theoretically-based methods? Can we build a generic universal Genetic Algorithm for classification? To answer these questions, we develop a genetic algorithm that optimizes MATLAB classifiers and a variable-length genetic algorithm that performs classification based entirely on Boolean logic. We test these algorithms on disparate datasets rooted in cellular biology, music theory, and medicine. We then get results from these …
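The "primitive mechanics" of a genetic algorithm are selection, crossover, and mutation. The sketch below applies them to a deliberately tiny task: evolving a single decision threshold for a hypothetical one-dimensional, two-class dataset. It is a generic real-valued GA written for illustration (tournament selection, arithmetic crossover, Gaussian mutation, and elitism are all choices made here), not the MATLAB-classifier optimizer or the variable-length Boolean GA described in the thesis.

```python
import random

# Hypothetical 1-D toy data: (feature value, class label).
DATA = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]

def fitness(threshold):
    """Fraction of points the rule 'x > threshold means class 1' gets right."""
    correct = sum(1 for x, label in DATA if (x > threshold) == (label == 1))
    return correct / len(DATA)

def tournament(population, rng):
    """Tournament selection: the fitter of two randomly drawn individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def evolve(pop_size=30, generations=40, mutation_sd=0.3, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    population = [rng.uniform(0.0, 5.0) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: best survives unchanged
        children = [elite]
        while len(children) < pop_size:
            p1, p2 = tournament(population, rng), tournament(population, rng)
            alpha = rng.random()
            child = alpha * p1 + (1 - alpha) * p2  # arithmetic crossover
            child += rng.gauss(0.0, mutation_sd)   # Gaussian mutation
            children.append(child)
        population = children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Because of elitism the best fitness never decreases between generations; any threshold in [2.0, 3.0) classifies this toy set perfectly.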


Explorations Into Machine Learning Techniques For Precipitation Nowcasting, Aditya Nagarajan Mar 2017

Masters Theses

Recent advances in cloud-based big-data technologies now make data-driven solutions feasible for an increasing number of scientific computing applications. One such data-driven approach is machine learning, in which patterns in large data sets are brought to the surface by finding complex mathematical relationships within the data. Nowcasting, or short-term prediction of rainfall in a given region, is an important problem in meteorology. In this thesis we explore the nowcasting problem through a data-driven approach by formulating it as a machine learning problem.

State-of-the-art nowcasting systems today are based on numerical models which describe the physical processes leading to …


Deep Learning Approach For Intrusion Detection System (IDS) In The Internet Of Things (IoT) Network Using Gated Recurrent Neural Networks (GRU), Manoj Kumar Putchala Jan 2017

Browse all Theses and Dissertations

The Internet of Things (IoT) is a complex paradigm where billions of devices are connected to a network. These connected devices form an intelligent system of systems that shares data without human-to-computer or human-to-human interaction. These systems extract meaningful data that can transform human lives, businesses, and the world in significant ways. However, IoT is prone to countless cyber-attacks in an extremely hostile environment like the internet. The recent hacks of the 2014 Jeep Cherokee, the iStan pacemaker, and a German steel plant are a few notable security breaches. To secure an IoT system, the traditional high-end security …


Pulsar Search Using Supervised Machine Learning, John M. Ford Jan 2017

CCE Theses and Dissertations

Pulsars are rapidly rotating neutron stars which emit a strong beam of energy through mechanisms that are not entirely clear to physicists. These very dense stars are used by astrophysicists to study many basic physical phenomena, such as the behavior of plasmas in extremely dense environments, behavior of pulsar-black hole pairs, and tests of general relativity. Many of these tasks require information to answer the scientific questions posed by physicists. In order to provide more pulsars to study, there are several large-scale pulsar surveys underway, which are generating a huge backlog of unprocessed data. Searching for pulsars is a very …


Performance Envelopes Of Adaptive Ensemble Data Stream Classifiers, Stefan Joe-Yen Jan 2017

CCE Theses and Dissertations

This dissertation documents a study of the performance characteristics of algorithms designed to mitigate the effects of concept drift on online machine learning. Several supervised binary classifiers were evaluated on their performance when applied to an input data stream with a non-stationary class distribution. The selected classifiers included ensembles that combine the contributions of their member algorithms to improve overall performance. These ensembles adapt to changing class definitions, known as “concept drift,” often present in real-world situations, by adjusting the relative contributions of their members. Three stream classification algorithms and three adaptive ensemble algorithms were compared to determine the capabilities …
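The abstract describes ensembles that cope with concept drift by adjusting the relative contributions of their members. A minimal sketch of one such scheme is below: a weighted-majority vote in which a member's weight is multiplied by a penalty factor `beta` every time it errs, followed by renormalization. The penalty value, the class and function names, and the two toy "concept" members are assumptions, and this is not necessarily one of the algorithms the dissertation compared.

```python
class WeightedMajorityEnsemble:
    """Binary ensemble whose member weights shrink multiplicatively on mistakes."""

    def __init__(self, members, beta=0.5):
        if not 0.0 < beta < 1.0:
            raise ValueError("beta must be in (0, 1)")
        self.members = members                # callables mapping x -> 0 or 1
        self.weights = [1.0] * len(members)
        self.beta = beta                      # penalty factor applied on a mistake

    def predict(self, x):
        """Weighted vote; ties go to class 0."""
        score = [0.0, 0.0]
        for w, member in zip(self.weights, self.members):
            score[member(x)] += w
        return 1 if score[1] > score[0] else 0

    def update(self, x, y):
        """Penalize members that misclassify (x, y), then renormalize weights."""
        for i, member in enumerate(self.members):
            if member(x) != y:
                self.weights[i] *= self.beta
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]

def old_concept(x):
    return 1 if x > 0 else 0

def new_concept(x):
    return 1 if x < 0 else 0

ens = WeightedMajorityEnsemble([old_concept, new_concept])
# Simulated drift: labels in the stream now follow the new concept.
for x in [1, -1, 2, -2, 3]:
    ens.update(x, new_concept(x))
print(ens.predict(5))  # 0
```

After five drifted examples the outdated member holds only 1/33 of the total weight, so the ensemble's vote follows the new concept; a smaller `beta` reacts faster to drift but is also more easily swayed by noise.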


Machine Learning And Natural Language Methods For Detecting Psychopathy In Textual Data, Andrew Stephen Henning Jan 2017

Electronic Theses and Dissertations

Among the myriad mental conditions permeating society, psychopathy is perhaps the most elusive to diagnose and treat. With the advent of natural language processing and machine learning, however, we have ushered in a new age of technology that provides a fresh toolkit for analyzing text and context. Because text remains the medium of choice for most personal and professional interactions, it may be possible to use textual samples from psychopaths as a means for understanding and ultimately classifying similar individuals based on the content of their language usage. This paper aims to investigate natural language processing and supervised …


Improved Detection For Advanced Polymorphic Malware, James B. Fraley Jan 2017

CCE Theses and Dissertations

Malicious Software (malware) attacks across the internet are increasing at an alarming rate. Cyber-attacks have become increasingly sophisticated and targeted. These targeted attacks are aimed at compromising networks, stealing personal financial information, and removing sensitive data or disrupting operations. Current malware detection approaches work well for previously known signatures. However, malware developers use techniques to mutate and change software properties (signatures) to evade detection. Polymorphic malware is practically undetectable with signature-based defensive technologies. Today’s effective detection rate for polymorphic malware ranges from 68.75% to 81.25%. New techniques are needed to improve malware detection rates. Improved detection …


Context-Aware Debugging For Concurrent Programs, Justin Chu Jan 2017

Theses and Dissertations--Computer Science

Concurrency faults are difficult to reproduce and localize because they usually occur under specific inputs and thread interleavings. Most existing fault localization techniques focus on sequential programs but fail to identify faulty memory access patterns across threads, which are usually the root causes of concurrency faults. Moreover, existing techniques for sequential programs cannot be adapted to identify faulty paths in concurrent programs. While concurrency fault localization techniques have been proposed to analyze passing and failing executions obtained from running a set of test cases to identify faulty access patterns, they primarily focus on using statistical analysis. We present a novel …
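The statistical analyses mentioned above typically rank candidate program elements, here memory-access patterns, by how strongly their presence correlates with failing runs. A minimal sketch using the Ochiai suspiciousness score, a standard metric in spectrum-based fault localization, follows. The pattern names and run data are hypothetical, and this illustrates the kind of statistical baseline the dissertation contrasts its approach with, not its own technique.

```python
import math

def ochiai(runs):
    """Rank observed elements by Ochiai suspiciousness.

    runs: list of (set_of_elements_observed, passed) pairs.
    score(e) = failed(e) / sqrt(total_failed * (failed(e) + passed(e)))
    Returns (element, score) pairs sorted by descending score, then name.
    """
    total_failed = sum(1 for _, passed in runs if not passed)
    elements = set().union(*(obs for obs, _ in runs))
    scores = {}
    for e in elements:
        failed_e = sum(1 for obs, passed in runs if e in obs and not passed)
        passed_e = sum(1 for obs, passed in runs if e in obs and passed)
        denom = math.sqrt(total_failed * (failed_e + passed_e))
        scores[e] = failed_e / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical access patterns observed per run (e.g. read-write interleavings).
runs = [
    ({"RWR@x", "WW@y"}, False),  # failing run
    ({"RWR@x"}, False),          # failing run
    ({"WW@y"}, True),            # passing run
    ({"WW@y", "WR@z"}, True),    # passing run
]
print(ochiai(runs)[0])  # ('RWR@x', 1.0)
```

A pattern that appears in every failing run and no passing run scores 1.0; the limitation the abstract points to is that such scores say nothing about the thread context in which the pattern occurred.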


Optimized Multilayer Perceptron With Dynamic Learning Rate To Classify Breast Microwave Tomography Image, Chulwoo Pack Jan 2017

Electronic Theses and Dissertations

Most recently developed Computer Aided Diagnosis (CAD) systems and their related research are based on medical images that are usually obtained through conventional imaging techniques such as Magnetic Resonance Imaging (MRI), x-ray mammography, and ultrasound. With the development of a new imaging technology called Microwave Tomography Imaging (MTI), it has become necessary to develop a CAD system that performs well on this new data format. The platform can be flexible with respect to its input by adopting an Artificial Neural Network (ANN) as its classifier. Among the various phases of a CAD system, we have focused on optimizing the classification phase …
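The abstract does not say which dynamic learning-rate rule the optimized multilayer perceptron uses. One classic option, shown here purely as an illustrative assumption, is the "bold driver" heuristic: grow the step size slightly after a step that lowers the loss, and cut it sharply (rejecting the step) after one that would raise it. To stay short, the sketch minimizes a one-dimensional quadratic rather than training a network; the growth and shrink factors are hypothetical defaults.

```python
def bold_driver(grad, loss, x0, lr=0.1, up=1.05, down=0.5, steps=100):
    """Gradient descent with a 'bold driver' adaptive learning rate."""
    x, current = x0, loss(x0)
    for _ in range(steps):
        candidate = x - lr * grad(x)
        new_loss = loss(candidate)
        if new_loss < current:
            x, current = candidate, new_loss  # accept the step and speed up
            lr *= up
        else:
            lr *= down                        # reject the step and slow down
    return x, lr

# Toy objective f(x) = (x - 3)^2 with gradient 2(x - 3).
x_min, final_lr = bold_driver(lambda x: 2 * (x - 3), lambda x: (x - 3) ** 2, x0=0.0)
print(round(x_min, 4))  # 3.0
```

The appeal for network training is that no fixed schedule has to be tuned in advance: the rate climbs while progress is steady and backs off automatically once steps start to overshoot.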


Analysing The Effects Of Data Augmentation And Free Parameters For Text Classification With Recurrent Convolutional Neural Networks, Jonathan Quijas Jan 2017

Open Access Theses & Dissertations

Convolutional neural networks have seen much success in computer vision and natural language processing tasks. When training convolutional neural networks for text classification tasks, a common technique is to transform an input sequence of words into a dense matrix of word embeddings, or words represented as dense vectors, using table lookup operations. This enables the inputs to be represented in a way that the well-known convolution/pooling operations can be applied to them in a manner similar to images. These word embeddings may be further incorporated into the neural network itself as a trainable layer to allow fine-tuning, usually leading to …
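The table-lookup step described above can be sketched in a few lines: each token selects a row of an embedding table, the rows are stacked into a sentence matrix, and a 1-D convolution slides a filter over windows of consecutive word vectors before max-over-time pooling. The tiny vocabulary, the 2-dimensional embeddings, and the single hand-set filter are made-up values for illustration; in a real model the table and filter weights are learned, and the table can be fine-tuned as a trainable layer.

```python
# Hypothetical vocabulary and 2-dimensional embedding table (normally learned).
VOCAB = {"<unk>": 0, "good": 1, "bad": 2, "movie": 3}
EMBEDDINGS = [
    [0.0, 0.0],   # <unk>
    [1.0, 0.5],   # good
    [-1.0, 0.5],  # bad
    [0.0, 1.0],   # movie
]

def embed(tokens):
    """Table lookup: map each token to its embedding row (unknowns -> <unk>)."""
    return [EMBEDDINGS[VOCAB.get(t, VOCAB["<unk>"])] for t in tokens]

def conv1d_maxpool(matrix, kernel, bias=0.0):
    """Slide a (width x dim) kernel over consecutive rows, apply ReLU, max-pool."""
    width = len(kernel)
    if len(matrix) < width:
        raise ValueError("sentence shorter than filter width")
    features = []
    for start in range(len(matrix) - width + 1):
        window = matrix[start:start + width]
        s = bias + sum(k * v for krow, vrow in zip(kernel, window)
                       for k, v in zip(krow, vrow))
        features.append(max(0.0, s))  # ReLU
    return max(features)              # max-over-time pooling

sentence = embed("good movie".split())
print(sentence)  # [[1.0, 0.5], [0.0, 1.0]]
# A width-2 filter that responds to the first word's first coordinate.
print(conv1d_maxpool(sentence, kernel=[[1.0, 0.0], [0.0, 0.0]]))  # 1.0
```

The same filter applied to "bad movie" yields 0.0 after the ReLU, which is how a learned filter can act as a detector for a particular kind of phrase regardless of where it occurs in the sentence.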


Autonomous Driving With A Simulation Trained Convolutional Neural Network, Cameron Franke Jan 2017

University of the Pacific Theses and Dissertations

Autonomous vehicles will help society if they can easily support a broad range of driving environments, conditions, and vehicles.

Achieving this requires reducing the complexity of the algorithmic system, easing the collection of training data, and verifying operation using real-world experiments. Our work addresses these issues by using a reflexive neural network that translates images into steering and throttle commands. This network is trained using simulation data from Grand Theft Auto V, which we augment to reduce the number of simulation hours driven. We then validate our work using an RC car system through numerous tests. Our system successfully drives …
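The abstract says the simulated driving data are augmented to reduce the simulation hours needed, without saying how. A common augmentation for steering-from-images data, shown here only as an assumed example, mirrors each frame left to right and negates its steering angle while leaving the throttle unchanged. The sign convention (negative means left), the function names, and the nested-list image format are all assumptions made to keep the sketch dependency-free.

```python
def flip_sample(image, steering, throttle):
    """Mirror an image horizontally; a left turn becomes an equally sharp right turn."""
    flipped = [list(reversed(row)) for row in image]
    return flipped, -steering, throttle

def augment(dataset):
    """Return the original samples plus one mirrored copy of each."""
    out = list(dataset)
    for image, steering, throttle in dataset:
        out.append(flip_sample(image, steering, throttle))
    return out

# A hypothetical 2x3 grayscale frame with steering -0.25 (left) and throttle 0.6.
frame = [[0, 1, 2],
         [3, 4, 5]]
augmented = augment([(frame, -0.25, 0.6)])
print(len(augmented))  # 2
print(augmented[1])    # ([[2, 1, 0], [5, 4, 3]], 0.25, 0.6)
```

Mirroring doubles the dataset at no simulation cost and also balances left and right turns, which otherwise tend to be skewed by the layout of the simulated roads.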


Triple Non-Negative Matrix Factorization Technique For Sentiment Analysis And Topic Modeling, Alexander A. Waggoner Jan 2017

CMC Senior Theses

Topic modeling refers to the process of algorithmically sorting documents into categories based on some common relationship between the documents. This common relationship is considered the “topic” of the documents. Sentiment analysis refers to the process of algorithmically sorting a document into a positive or negative category depending on whether the document expresses a positive or negative opinion on its respective topic. In this paper, I consider the open problem of classifying documents into a topic category as well as a sentiment category. This has a direct application to the retail industry where companies may want to scour …
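Triple NMF factors a matrix into three non-negative factors, and the abstract does not give its update rules. As background only, the sketch below implements standard two-factor NMF, V ≈ WH, with the Lee-Seung multiplicative updates for the Frobenius-norm objective, in plain Python. It is a simplification rather than the thesis's triple factorization; the helper names, the small `eps` guard, and the toy matrix are assumptions. If rows of V are documents and columns are terms, W holds document-topic weights and H holds topic-term weights.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (m x n) as W (m x k) @ H (k x n) via multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps avoids division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H

def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)) ** 0.5

# Hypothetical document-term counts with two "topics" (terms 0-1 vs. terms 2-3).
V = [[1, 2, 0, 0],
     [2, 4, 0, 0],
     [0, 0, 3, 1],
     [0, 0, 6, 2]]
W, H = nmf(V, k=2)
print(round(frobenius_error(V, W, H), 4))
```

Because every update multiplies non-negative numbers, W and H stay non-negative throughout, which is what makes the factors interpretable as additive topic weights.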