Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 13 of 13

Full-Text Articles in Physical Sciences and Mathematics

Can Clustering Improve Requirements Traceability? A Tracelab-Enabled Study, Brett Taylor Armstrong Dec 2013

Master's Theses

Software permeates every aspect of our modern lives. In many applications, such as the software for airplane flight controls or nuclear power control systems, software failures can have catastrophic consequences. As we place so much trust in software, how can we know if it is trustworthy? Through software assurance, we can attempt to quantify just that.

Building complex, high-assurance software is no simple task. The difficult information landscape of a software engineering project can make verification and validation, the process by which the assurance of software is assessed, very difficult. In order to manage the inevitable information overload …


Enabling Richer Insight Into Runtime Executions Of Systems, Karthik Swaminathan Nagaraj Oct 2013

Open Access Dissertations

Very large-scale systems software is heavily used today in important scenarios such as online retail, banking, content services, web search, and social networks. As the scale of functionality and complexity grows in this software, managing the implementations becomes a considerable challenge for developers, designers, and maintainers. Software needs to be constantly monitored and tuned for optimal efficiency and user satisfaction. At large scale, these systems incorporate significant degrees of asynchrony, parallelism, and distributed execution, reducing the manageability of the software, including performance management. Adding to the complexity, developers are under pressure between developing new functionality for customers …


Practical Cost-Conscious Active Learning For Data Annotation In Annotator-Initiated Environments, Robbie A. Haertel Aug 2013

Theses and Dissertations

Many projects exist whose purpose is to augment raw data with annotations that increase the usefulness of the data. The number of these projects is rapidly growing, and in the age of “big data” the amount of data to be annotated is likewise growing within each project. One common use of such data is in supervised machine learning, which requires labeled data to train a predictive model. Annotation is often a very expensive proposition, particularly for structured data. The purpose of this dissertation is to explore methods of reducing the cost of creating such data sets, including annotated text corpora. We …


Online Multi-Stage Deep Architectures For Feature Extraction And Object Recognition, Derek Christopher Rose Aug 2013

Doctoral Dissertations

Multi-stage visual architectures have recently found success in achieving high classification accuracies over image datasets with large variations in pose, lighting, and scale. Inspired by techniques currently at the forefront of deep learning, such architectures are typically composed of one or more layers of preprocessing, feature encoding, and pooling to extract features from raw images. Training these components traditionally relies on large sets of patches that are extracted from a potentially large image dataset. In this context, high-dimensional feature space representations are often helpful for obtaining the best classification performances and providing a higher degree of invariance to object transformations. …


Segmentation And Model Generation For Large-Scale Cyber Attacks, Steven E. Strapp Aug 2013

Theses

Raw cyber attack traffic can present more questions than answers to security analysts. Especially with large-scale observables, it is difficult to identify which packets are relevant and what attack behaviors are present. Many existing works in host or flow clustering attempt to group similar behaviors to expedite analysis; these works often phrase the problem directly as offline unsupervised machine learning. This work proposes online processing to simultaneously model coordinating actors and segment traffic that is relevant to a target of interest, all while it is being received. The goal is not just to aggregate similar attack behaviors, but to provide …


Computer Sketch Recognition, Richard Steigerwald Jun 2013

Master's Theses

Tens of thousands of years ago, humans drew sketches that we can see and identify even today. Sketches are the oldest recorded form of human communication and are still widely used. The universality of sketches supersedes that of culture and language. Despite the universal accessibility of sketches by humans, computers are unable to interpret or even correctly identify the contents of sketches drawn by humans with a practical level of accuracy.

In my thesis, I demonstrate that the accuracy of existing sketch recognition techniques can be improved by optimizing the classification criteria. Current techniques classify a 20,000-sketch crowd-sourced dataset …


Document Classification, Shane K. Panter May 2013

Boise State University Theses and Dissertations

We present an overview of the document classification process and report research conducted against the newly constructed SBIR-STTR corpus. Specifically, the current methods in use for annotation, corpus construction, feature construction, feature weighting, and classifier algorithms are surveyed. We introduce a new dataset derived from public data downloaded from sbir.gov and the Text Annotation Toolkit (TAT) for use in classification research.

TAT is a collection of independent components packaged together into one open source software application. TAT was engineered to support the document classification process and workflow. Tracking of changes in a working corpus, saving data used in the …


Probabilistic Explicit Topic Modeling, Joshua Aaron Hansen Apr 2013

Theses and Dissertations

Latent Dirichlet Allocation (LDA) is widely used for automatic discovery of latent topics in document corpora. However, output from analysis using an LDA topic model suffers from a lack of identifiability between topics not only across corpora, but across runs of the algorithm. The output is also isolated from enriching information from knowledge sources such as Wikipedia and is difficult for humans to interpret due to a lack of meaningful topic labels. This thesis introduces two methods for probabilistic explicit topic modeling that address these issues: Latent Dirichlet Allocation with Static Topic-Word Distributions (LDA-STWD), and Explicit Dirichlet Allocation (EDA). LDA-STWD …


Artificial Immune Systems And Particle Swarm Optimization For Solutions To The General Adversarial Agents Problem, Jeremy Mange Apr 2013

Dissertations

The general adversarial agents problem is an abstract problem description touching on the fields of artificial intelligence, machine learning, decision theory, and game theory. The goal of the problem is, given one or more mobile agents, each identified as either “friendly” or “enemy”, along with a specified environment state, to choose an action or series of actions from all possible valid choices for the next “timestep” or series thereof, in order to lead toward a specified outcome or set of outcomes. This dissertation explores approaches to this problem utilizing Artificial Immune Systems, Particle Swarm Optimization, and hybrid approaches, along with …


Object Detection And Recognition In Natural Settings, George William Dittmar Jan 2013

Dissertations and Theses

Much recent research has focused on biologically inspired vision models that are based on our understanding of how the visual cortex processes information. One prominent example of such a system is HMAX [17]. HMAX attempts to simulate the biological process for object recognition in the cortex based on the model proposed by Hubel & Wiesel [10]. This thesis investigates the ability of an HMAX-like system (GLIMPSE [20]) to perform object detection in cluttered natural scenes. I evaluate these results using the StreetScenes database from MIT [1, 8]. This thesis addresses three questions: (1) Can the GLIMPSE-based object detection system replicate …


Human Intention Recognition Based Assisted Telerobotic Grasping Of Objects In An Unstructured Environment, Karan Hariharan Khokar Jan 2013

USF Tampa Graduate Theses and Dissertations

In this dissertation work, a methodology is proposed to enable a robot to identify an object to be grasped and its intended grasp configuration while a human is teleoperating a robot towards the desired object. Based on the detected object and grasp configuration, the human is assisted in the teleoperation task. The environment is unstructured and consists of a number of objects, each with various possible grasp configurations. The identification of the object and the grasp configuration is carried out in real time, by recognizing the intention of the human motion. Simultaneously, the human user is assisted to preshape over …


A Machine Learning Approach To Diagnosis Of Parkinson’s Disease, Sumaiya F. Hashmi Jan 2013

CMC Senior Theses

I will investigate applications of machine learning algorithms to medical data, adaptations of differences in data collection, and the use of ensemble techniques.

Focusing on the binary classification problem of Parkinson’s Disease (PD) diagnosis, I will apply machine learning algorithms to a primary dataset consisting of voice recordings from healthy and PD subjects. Specifically, I will use Artificial Neural Networks, Support Vector Machines, and an Ensemble Learning algorithm to reproduce results from [MS12] and [GM09].

Next, I will adapt a secondary regression dataset of PD recordings and combine it with the primary binary classification dataset, testing various techniques to consolidate …


On Identifying Critical Nuggets Of Information During Classification Task, David Sathiaraj Jan 2013

LSU Doctoral Dissertations

In large databases, there may exist critical nuggets: small collections of records or instances that contain important domain-specific information. This information can be used for future decision making, such as labeling critical, unlabeled data records and improving classification results by reducing false positive and false negative errors. In recent years, data mining efforts have focused on pattern and outlier detection methods. However, not much effort has been dedicated to finding critical nuggets within a data set. This work introduces the idea of critical nuggets, proposes an innovative domain-independent method to measure criticality, suggests a heuristic to reduce the …