Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 7 of 7

Full-Text Articles in Physical Sciences and Mathematics

Data Mining Revision Controlled Document History Metadata For Automatic Classification, Dustin Maass Dec 2013

Theses and Dissertations

Version-controlled documents provide a complete history of the changes to the document, including everything from what was changed to who made the change and much more. Through the use of cluster analysis and several sets of manipulated data, this research examines the revision history of Wikipedia in an attempt to find language-independent patterns that could assist automatic page classification software. Applying this cluster analysis to two sample data sets, no conclusive evidence was found that would indicate that such patterns exist. Our work on the software, however, does provide a foundation for more possible types of …


Extracting The Structure And Conformations Of Biological Entities From Large Datasets, Ali Dashti Dec 2013

Theses and Dissertations

In biology, structure determines function, which often proceeds via changes in conformation. Efficient means for determining structure exist, but mapping conformations continues to present a serious challenge. Single-particle approaches, such as cryogenic electron microscopy (cryo-EM) and emerging "diffract & destroy" X-ray techniques, are, in principle, ideally positioned to overcome these challenges. But the algorithmic ability to extract information from large heterogeneous datasets consisting of "unsorted" snapshots - each emanating from an unknown orientation of an object in an unknown conformation - remains elusive.

It is the objective of this thesis to describe and validate a powerful suite of manifold-based algorithms …


Extraction And Classification Of Drug-Drug Interaction From Biomedical Text Using A Two-Stage Classifier, Majid Rastegar-Mojarad Dec 2013

Theses and Dissertations

One of the critical causes of medical errors is drug-drug interaction (DDI), which occurs when one drug increases or decreases the effect of another drug. We propose a machine learning system to extract and classify drug-drug interactions from the biomedical literature, using the annotated corpus from the DDIExtraction-2013 shared task challenge. Our approach applies a two-stage classifier to handle the highly unbalanced class distribution in the corpus. The first stage performs binary classification of drug pairs as interacting or non-interacting, and the second stage further classifies interacting pairs into one of four interaction types: advise, effect, …


Economic Perspective On Cloud Computing: Three Essays, Abhijit Dutt Aug 2013

Theses and Dissertations

Improvements in Information Technology (IT) infrastructure and the standardization of interoperability among heterogeneous Information System (IS) applications have brought a paradigm shift in the way an IS application can be used and delivered. Not only can an IS application be built using standardized components, but parts of it can also be hosted by different organizations in different locations, provided it can be accessed over the Internet. This dissertation is an attempt to uncover unique aspects of this phenomenon, known as Software as a Service (SaaS).

The first essay examines design decision making by SaaS providers by analyzing effects of two …


Efficient Computation Of K-Nearest Neighbor Graphs For Large High-Dimensional Data Sets On Gpu Clusters, Ali Dashti Aug 2013

Theses and Dissertations

The k-Nearest Neighbor Graph (k-NNG) and the related k-Nearest Neighbor (k-NN) methods have a wide variety of applications in areas such as bioinformatics, machine learning, data mining, clustering analysis, and pattern recognition. Our application of interest is manifold embedding. Due to the large dimensionality of the input data (<15k), techniques based on spatial subdivision, such as OBBs, k-d trees, BSP, etc., are not viable. The only alternative is brute-force search, which has two distinct parts. The first finds distances between individual vectors in the corpus based on a pre-defined metric. Given the distance matrix, the second step selects the k nearest neighbors for each member of the query data set.
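The two-part brute-force search described above can be illustrated with a minimal single-threaded sketch. It is not the thesis's distributed GPU implementation: the function names are hypothetical, and Euclidean distance is assumed here only as one example of a pre-defined metric.

```python
import heapq
import math


def pairwise_distances(queries, corpus):
    """Step 1: distance from every query vector to every corpus vector.

    Euclidean distance is an assumption; any pre-defined metric could be
    substituted for math.dist.
    """
    return [[math.dist(q, c) for c in corpus] for q in queries]


def select_k_nearest(distance_matrix, k):
    """Step 2: for each row, the corpus indices of the k smallest distances."""
    return [
        heapq.nsmallest(k, range(len(row)), key=row.__getitem__)
        for row in distance_matrix
    ]


def knn_brute_force(queries, corpus, k):
    """Run both steps and return one list of neighbor indices per query."""
    return select_k_nearest(pairwise_distances(queries, corpus), k)


if __name__ == "__main__":
    # Toy data for illustration only.
    corpus = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 2.0)]
    queries = [(0.0, 0.0), (4.0, 4.0)]
    print(knn_brute_force(queries, corpus, k=2))
```

When the query set is the corpus itself, as in graph construction, each vector's nearest neighbor is itself at distance zero, so a caller would typically request k + 1 neighbors and drop the first. The full distance matrix costs memory proportional to the number of queries times the corpus size, which is one reason large inputs call for partitioning the work.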

This thesis presents the development and implementation of a distributed exact k-Nearest Neighbor Graph (k-NNG) construction method. The proposed method uses Graphics Processing Units (GPUs) and exploits multiple levels of parallelism for distributed computational systems using GPUs. It is scalable for different cluster sizes, with each compute node in the cluster …


Global Technical Communication And Content Management: A Study Of Multilingual Quality, Tatiana Batova May 2013

Theses and Dissertations

The field of technical communication (TC) is facing a dilemma. Content management (CM) strategies and technologies that completely reshape writing and translation practices are being adopted in an increasing number of TC work groups. One driving factor in CM adoption is the promise of improving the quality of multilingual technical texts while reducing the time and cost of technical translation and localization. Yet CM relies on automation and privileges consistency, an approach that is problematic in global TC with its focus on adapting texts based on the characteristics of end-users.

To better understand the interdisciplinary dilemma of multilingual quality in CM, during my dissertation …


System For Detection Of Defects In Cables Of Bridge Structures, Emad Ismail Abdel Salam May 2013

Theses and Dissertations

Over the last 75 years, many cable-supported bridges have been built in America, Europe, Asia and other parts of the world. However, over the years these bridges have aged and been exposed to environmental conditions such as rain, snow, de-icing and harmful chemicals. These conditions cause various levels of deterioration in bridges, particularly corrosion. Corrosion causes a loss of cross-section in the steel, adversely affecting the bridge's capacity to carry its service loads, and can possibly cause bridge failures. Although many methods have been tried for inspecting these bridges, most have had limited success. In recent years, it has …