Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Articles 1 - 15 of 15

Full-Text Articles in Physical Sciences and Mathematics

Facilitating Corpus Annotation By Improving Annotation Aggregation, Paul L. Felt Dec 2015


Theses and Dissertations

Annotated text corpora facilitate the linguistic investigation of language as well as the automation of natural language processing (NLP) tasks. NLP tasks include problems such as spam email detection, grammatical analysis, and identifying mentions of people, places, and events in text. However, constructing high quality annotated corpora can be expensive. Cost can be reduced by employing low-cost internet workers in a practice known as crowdsourcing, but the resulting annotations are often inaccurate, decreasing the usefulness of a corpus. This inaccuracy is typically mitigated by collecting multiple redundant judgments and aggregating them (e.g., via majority vote) to produce high quality consensus …
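The majority-vote aggregation mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the baseline only, not the thesis's own method. The function names `majority_vote` and `aggregate`, the dictionary input format, and the alphabetical tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def majority_vote(judgments):
    """Return the most frequent label among redundant judgments for one item.

    Ties are broken by picking the alphabetically first label; this rule is
    an assumption for the sketch, not something the abstract specifies.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    counts = Counter(judgments)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


def aggregate(annotations):
    """Map each item id to its consensus label.

    `annotations` is a hypothetical dict of item id -> list of worker labels.
    """
    return {item: majority_vote(labels) for item, labels in annotations.items()}


if __name__ == "__main__":
    crowd = {
        "email-1": ["spam", "spam", "ham"],
        "email-2": ["ham", "ham", "ham"],
    }
    print(aggregate(crowd))
```

Majority vote treats every annotator as equally reliable, which is one reason the resulting consensus labels can still be noisy when worker accuracy varies.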


Data Selection Using Topic Adaptation For Statistical Machine Translation, Hitokazu Matsushita Nov 2015


Theses and Dissertations

Statistical machine translation (SMT) requires large quantities of bitexts (i.e., bilingual parallel corpora) as training data to yield good quality translations. While obtaining a large amount of training data is critical, the similarity between training and test data also has a significant impact on SMT performance. Many SMT studies define data similarity in terms of domain-overlap, and domains are defined to be synonymous with data sources. Consequently, the SMT community has focused on domain adaptation techniques that augment small (in-domain) datasets with large datasets from other sources (hence, out-of-domain, per the definition). However, many training datasets consist of topically diverse …


Building 3D-Printed Widgets To Incorporate Into Prototypes, David E. Brandt Nov 2015


Theses and Dissertations

Creating interactive prototypes can be a long and difficult process that requires expertise in various fields. Prior work on developing interactive prototypes minimizes the time required to make a prototype, but generally sacrifices fidelity for fluidity. Advances in 3D printing create new opportunities to prototype with greater fidelity and fluidity. We investigate the use of several kinds of sensors, including IR photo interrupters, IR photo reflectors, push button switches, and potentiometers, to create interactive prototypes. We first design a library of 3D-printable interaction components (buttons, sliders, and knobs) using those sensors; then we develop software to transform interaction events into …


CVIC: Cluster Validation Using Instance-Based Confidences, Dean M. Lebaron Nov 2015


Theses and Dissertations

As unlabeled data becomes increasingly available, the need for robust data mining techniques increases as well. Clustering is a common data mining tool which seeks to find related, independent patterns in data called clusters. The cluster validation problem addresses the question of how well a given clustering fits the data set. We present CVIC (cluster validation using instance-based confidences) which assigns confidence scores to each individual instance, as opposed to more traditional methods which focus on the clusters themselves. CVIC trains supervised learners to recreate the clustering, and instances are scored based on output from the learners which corresponds to …


Interactive Machine Assistance: A Case Study In Linking Corpora And Dictionaries, Kevin P. Black Nov 2015


Theses and Dissertations

Machine learning can provide assistance to humans in making decisions, including linguistic decisions such as determining the part of speech of a word. Supervised machine learning methods derive patterns indicative of possible labels (decisions) from annotated example data. For many problems, including most language analysis problems, acquiring annotated data requires human annotators who are trained to understand the problem and to disambiguate among multiple possible labels. Hence, the availability of experts can limit the scope and quantity of annotated data. Machine-learned pre-annotation assistance, which suggests probable labels for unannotated items, can enable expert annotators to work more quickly and thus …


Fast Inference For Interactive Models Of Text, Jeffrey A. Lund Sep 2015


Theses and Dissertations

Probabilistic models of text are a useful tool for enabling the analysis of large collections of digital text. For example, Latent Dirichlet Allocation can quickly produce topical summaries of large collections of text documents. Many important use cases of such models include human interaction during the inference process. For example, the Interactive Topic Model extends Latent Dirichlet Allocation to incorporate human expertise during inference in order to produce topics that are better suited to individual user needs. However, interactive use cases of probabilistic models of text introduce new constraints on inference: the inference procedure …


An Improved Classifier Chain Ensemble For Multi-Dimensional Classification With Conditional Dependence, Joseph Ethan Heydorn Jul 2015


Theses and Dissertations

We focus on multi-dimensional classification (MDC) problems with conditional dependence, which we call multiple output dependence (MOD) problems. MDC is the task of predicting a vector of categorical outputs for each input. Conditional dependence in MDC means that the choice for one output value affects the choice for others, so it is not desirable to predict outputs independently. We show that conditional dependence in MDC implies that a single input can map to multiple correct output vectors. This means it is desirable to find multiple correct output vectors per input. Current solutions for MOD problems are not sufficient because they …


Feature Identification And Reduction For Improved Generalization Accuracy In Secondary-Structure Prediction Using Temporal Context Inputs In Machine-Learning Models, Matthew Benjamin Seeley May 2015


Theses and Dissertations

A protein's properties are influenced by both its amino-acid sequence and its three-dimensional conformation. Ascertaining a protein's sequence is relatively easy using modern techniques, but determining its conformation requires much more expensive and time-consuming techniques. Consequently, it would be useful to identify a method that can accurately predict a protein's secondary-structure conformation using only the protein's sequence data. This problem is not trivial, however, because identical amino-acid subsequences in different contexts sometimes have disparate secondary structures, while highly dissimilar amino-acid subsequences sometimes have identical secondary structures. We propose (1) to develop a set of metrics that facilitates better comparisons between …


Gaseous Particulate Interaction In A 3-Phase Granular Simulation, Kevin W. Munns May 2015


Theses and Dissertations

As computer-generated special effects play an increasingly integral role in the development of films and other media, simulating granular material continues to be a challenging and resource-intensive process. Solutions tend to be pieced together in order to address the complex and different behaviors of granular flow. As such, these solutions tend to be brittle, overly specific, and unnatural. With the introduction of a holistic 3-phase granular simulation, we can now create a reliable and adaptable granular simulation. Our solution improves upon this hybrid solution by addressing the issue of particle flow and correcting interpenetration amongst particles while maintaining the …


An Extensible Technology Framework For Cyber Security Education, Frank Jordan Sheen Apr 2015


Theses and Dissertations

Cyber security education has evolved over the last decade to include new methods of teaching and technology to prepare students. Instructors in this field of study often deal with a subject matter that has rigid principles, but changing ways of applying those principles. This makes maintaining courses difficult. This case study explored the kind of teaching methods, technology, and means used to explain these concepts. This study shows that generally, cyber security courses require more time to keep up to date. It also evaluates one effort, the NxSecLab, on how it attempted to relieve the administrative issues in teaching these …


Authentication Melee: A Usability Analysis Of Seven Web Authentication Systems, Scott Ruoti Apr 2015


Theses and Dissertations

Passwords continue to dominate the authentication landscape in spite of numerous proposals to replace them. Even though usability is a key factor in replacing passwords, very few alternatives have been subjected to formal usability studies and even fewer have been analyzed using a standard metric. We report the results of four within-subjects usability studies for seven web authentication systems. These systems span federated, smartphone, paper token, and email-based approaches. Our results indicate that participants prefer single sign-on systems. We utilize the System Usability Scale (SUS) as a standard metric for empirical analysis and find that it produces reliable, replicable results. …
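For readers unfamiliar with the SUS metric named in the abstract above, the standard scoring rule can be sketched as follows. This reflects the usual published SUS procedure (ten items rated 1 to 5, alternating positively and negatively worded), not any code from the thesis. The function name `sus_score` and the list-based input are assumptions for this example.

```python
def sus_score(responses):
    """Compute a SUS score (0 to 100) from ten responses on a 1-5 scale.

    Odd-numbered items are positively worded and contribute (response - 1);
    even-numbered items are negatively worded and contribute (5 - response).
    The summed contributions (0 to 40) are scaled by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    return total * 2.5


if __name__ == "__main__":
    # One hypothetical participant's answers to items 1 through 10.
    print(sus_score([4, 2, 5, 1, 4, 2, 4, 1, 5, 2]))
```

Because every questionnaire reduces to a single number on the same 0 to 100 scale, scores can be compared across systems and across studies, which is what makes SUS usable as a standard metric.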


Using Instance-Level Meta-Information To Facilitate A More Principled Approach To Machine Learning, Michael Reed Smith Apr 2015


Theses and Dissertations

As the capability for capturing and storing data increases and becomes more ubiquitous, an increasing number of organizations are looking to use machine learning techniques as a means of understanding and leveraging their data. However, the success of applying machine learning techniques depends on which learning algorithm is selected, the hyperparameters that are provided to the selected learning algorithm, and the data that is supplied to the learning algorithm. Even among machine learning experts, selecting an appropriate learning algorithm, setting its associated hyperparameters, and preprocessing the data can be a challenging task and is generally left to the expertise of …


Color Relationship Transfer For Digital Painting, Gregory Eric Philbrick Apr 2015


Theses and Dissertations

A digital painter uses reference photography to add realism to a scene. This involves making n colors in a painting relate to each other more like n corresponding colors in a photograph, in terms of value and temperature. Doing this manually requires either experience or tedious experimentation. Color relationship transfer performs the task automatically, recoloring n regions of a painting so they relate in value and temperature more like n corresponding regions of a photograph. Relationship transfer also has applications in computational photography. In fact, it introduces a new paradigm for image editing in general, based on treating an image's …


Judicious Use Of Communication For Inherently Parallel Optimization, Andrew W. Mcnabb Mar 2015


Theses and Dissertations

Function optimization, finding the minimum or maximum of a given function, is an extremely challenging problem with applications in physics, economics, machine learning, engineering, and many other fields. While optimization is an active area of research, only a portion of this work acknowledges parallel computation, which is now widely available. Today, anyone with a modest budget can buy a cluster with hundreds of cores, pay for access to a supercomputer with thousands of processors, or at least purchase a laptop with 8 cores. Thus, an algorithm that works well in serial but cannot be parallelized is needlessly inefficient in real-life computational environments. We address …


FROntIER: A Framework For Extracting And Organizing Biographical Facts In Historical Documents, Joseph Park Jan 2015


Theses and Dissertations

The tasks of entity recognition through ontological commitment, fact extraction and organization with respect to a target schema, and entity deduplication have all been examined in recent years, and systems exist that can perform each individual task. A framework combining all these tasks, however, is still needed to accomplish the goal of automatically extracting and organizing biographical facts about persons found in historical documents into disambiguated entity records. We introduce FROntIER (Fact Recognizer for Ontologies with Inference and Entity Resolution) as the framework to recognize and extract facts using an ontology and organize facts of interest through inferring implicit facts …