Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Improving Library Searches Using Word-Correlation Factors And Folksonomies, Maria Soledad Pera Dec 2008

Theses and Dissertations

Libraries, private and public, offer valuable resources to library patrons; however, formulating library queries to retrieve relevant results can be difficult. This occurs because when using a library catalog for library searches, patrons often do not know the exact keywords to be included in a query that match the rigid subject terms (chosen by the Library of Congress) or terms in other fields of a desired library catalog record. These improperly formulated queries often translate into a high percentage of failed searches that retrieve irrelevant results or no results at all. This explains why frustrated library patrons nowadays rely on …


Ontology Generation, Information Harvesting And Semantic Annotation For Machine-Generated Web Pages, Cui Tao Dec 2008

Theses and Dissertations

The current World Wide Web is a web of pages. Users have to guess possible keywords that might lead through search engines to the pages that contain information of interest and browse hundreds or even thousands of the returned pages in order to obtain what they want. This frustrating problem motivates an approach to turn the web of pages into a web of knowledge, so that web users can query the information of interest directly. This dissertation provides a step in this direction and a way to partially overcome the challenges. Specifically, this dissertation shows how to turn machine-generated web …


Decision Tree Ensemble: Small Heterogeneous Is Better Than Large Homogeneous, Mike Gashler, Christophe G. Giraud-Carrier, Tony R. Martinez Dec 2008

Faculty Publications

Using decision trees that split on randomly selected attributes is one way to increase the diversity within an ensemble of decision trees. Another approach increases diversity by combining multiple tree algorithms. The random forest approach has become popular because it is simple and yields good results with common datasets. We present a technique that combines heterogeneous tree algorithms and contrast it with homogeneous forest algorithms. Our results indicate that random forests do poorly when faced with irrelevant attributes, while our heterogeneous technique handles them robustly. Further, we show that large ensembles of random trees are more susceptible to diminishing returns …


Learning-Based Fusion For Data Deduplication, Sabra Dinerstein, Parris K. Egbert, Stephen W. Clyde, Jared Dinerstein Dec 2008

Faculty Publications

Rule-based deduplication utilizes expert domain knowledge to identify and remove duplicate data records. Achieving high accuracy in a rule-based system requires the creation of rules containing a good combination of discriminatory clues. Unfortunately, accurate rule-based deduplication often requires significant manual tuning of both the rules and the corresponding thresholds. This need for manual tuning reduces the efficacy of rule-based deduplication and its applicability to real-world data sets. No adequate solution exists for this problem. We propose a novel technique for rule-based deduplication. We apply individual deduplication rules, and combine the resultant match scores via learning-based information fusion. We show empirically …


Dynamic Load Balancing Of Virtual Machines Hosted On Xen, Terry Clyde Wilcox Dec 2008

Theses and Dissertations

Currently, systems of virtual machines are load balanced statically, which can create load imbalances for systems where the load changes dynamically over time. For the throughput and response time of a system to be maximized, load must be evenly distributed among the parts of the system. We implement a prototype policy engine for the Xen virtual machine monitor which can dynamically load balance virtual machines. We compare the throughput and response time of our system using the cpu2000 and the WEB2005 benchmarks from SPEC. Under the loads we tested, dynamic load balancing had 5%-8% higher throughput than …


Nowhere To Hide: Finding Plagiarized Documents Based On Sentence Similarity, Nathaniel Gustafson, Yiu-Kai D. Ng, Maria Soledad Pera Dec 2008

Faculty Publications

Plagiarism is a serious problem: it infringes copyrighted documents and materials, is an unethical practice, and decreases the economic incentive received by the authors (owners) of the original copies. Unfortunately, plagiarism is getting worse due to the increasing number of online publications on the Web, which makes information easier to locate and paraphrase. To address this problem, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity computed by using pre-defined word-correlation factors, and (ii) generates a graphical view of sentences that are similar (or the same) …


Sequence Alignment With Traceback On Reconfigurable Hardware, Scott Lloyd, Quinn O. Snell Dec 2008

Faculty Publications

Biological sequence alignment is an essential tool used in molecular biology and biomedical applications. The growing volume of genetic data and the complexity of sequence alignment present a challenge in obtaining alignment results in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain over 300 times that of a desktop …
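As a point of reference for the "forward scan and traceback" mentioned in this abstract, the following is a minimal software sketch of textbook global alignment (Needleman-Wunsch). It is not the paper's hardware architecture and it is not space-efficient, since it stores the full score matrix. The function name `global_align` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Textbook Needleman-Wunsch: return (score, aligned_a, aligned_b).

    The scoring parameters are illustrative assumptions, not values
    taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # Forward scan: fill the full dynamic-programming score matrix.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Traceback: walk from the bottom-right cell back to the origin,
    # following any move that reproduces the stored score.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_align("GATTACA", "GATACA")
```

The traceback needs the scores produced by the forward scan, which is why the memory and I/O behavior of that step matters for a hardware implementation.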


Real-Time Automatic Price Prediction For eBay Online Trading, Ilya Igorevitch Raykhel Nov 2008

Theses and Dissertations

While Machine Learning is one of the most popular research areas in Computer Science, there are still only a few deployed applications intended for use by the general public. We have developed an exemplary application that can be directly applied to eBay trading. Our system predicts how much an item would sell for on eBay based on that item's attributes. We ran our experiments on the eBay laptop category, with prior trades used as training data. The system implements a feature-weighted k-Nearest Neighbor algorithm, using genetic algorithms to determine feature weights. Our results demonstrate an average prediction error of 16%; …
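To make the idea of a feature-weighted k-Nearest Neighbor predictor concrete, here is a minimal sketch. It is not the thesis's implementation: the feature set, the sample data, the weight values, and the names `weighted_distance` and `predict_price` are all hypothetical, and the weights are fixed by hand rather than found by a genetic algorithm.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance in which each feature has its own weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def predict_price(query, training, weights, k=3):
    """Average the prices of the k training items closest to `query`.

    `training` is a list of (feature_vector, price) pairs.
    """
    nearest = sorted(
        training,
        key=lambda item: weighted_distance(query, item[0], weights),
    )[:k]
    return sum(price for _, price in nearest) / len(nearest)


# Hypothetical past sales: (RAM in GB, screen size in inches) -> price.
past_sales = [
    ((1.0, 15.0), 400.0),
    ((2.0, 15.0), 550.0),
    ((2.0, 17.0), 600.0),
    ((4.0, 17.0), 900.0),
]

# Assumed weights; the abstract determines these with a genetic algorithm.
weights = (1.0, 0.1)

estimate = predict_price((2.0, 15.4), past_sales, weights, k=2)
```

With these assumed weights, a difference in RAM counts for more than a difference in screen size, so the two 2 GB laptops are the nearest neighbors of the query.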


An Analysis Of Document Retrieval And Clustering Using An Effective Semantic Distance Measure, Nathan Scott Davis Nov 2008

Theses and Dissertations

As large amounts of digital information become more and more accessible, the ability to effectively find relevant information is increasingly important. Search engines have historically performed well at finding relevant information by relying primarily on lexical and word-based measures. Similarly, standard approaches to organizing and categorizing large amounts of textual information have previously relied on lexical and word-based measures to perform grouping or classification tasks. Quite often, however, these processes take place without respect to semantics, or word meanings. This is perhaps due to the fact that the idea of meaningful similarity is naturally qualitative, and thus difficult …


A See-Ability Metric To Improve Mini Unmanned Aerial Vehicle Operator Awareness Using Video Georegistered To Terrain Models, Cameron Howard Engh Nov 2008

Theses and Dissertations

Search and rescue operations conducted in wilderness environments can be greatly aided by the use of video filmed from mini-UAVs. While lightweight, inexpensive and easily transportable, these small aircraft suffer from wind buffeting and may produce video that is difficult to search. To aid in the video search process, we have created a system to project video frames into a 3D representation of the search region. This projection allows us to tie each frame of video to a real-world location, enabling a myriad of novel views, mosaics and metrics that can be used to guide the search including a new …


The Hybrid Game Architecture: Distributing Bandwidth For MMOGs While Maintaining Central Control, Jared L. Jardine Nov 2008

Theses and Dissertations

Current Massively Multi-player Online Games (MMOGs) have enormous server-side bandwidth requirements. The cost of providing this bandwidth is in turn passed on to the consumer in the form of high monthly subscription fees. Prior work has primarily focused on distributing this bandwidth using peer-to-peer architectures, but these architectures have difficulty preventing cheating, protecting low-resource peers from being overwhelmed, and maintaining consistent game state. We have developed a hybrid game architecture that combines client-server and peer-to-peer technologies to prevent cheating, maintain centralized and consistent game state, significantly reduce central server bandwidth, and prevent lower capacity players from being overwhelmed. By dramatically reducing the …


Using Vagueness Measures To Re-Rank Documents Retrieved By A Fuzzy Set Information Retrieval Model, Stephen Lynn, Yiu-Kai D. Ng Oct 2008

Faculty Publications

Traditional information retrieval (IR) systems evaluate user queries and retrieve/rank documents based on matching keywords in user queries with words in documents. These exact word-matching and ranking approaches ignore too many relevant documents that do not contain the exact keywords as specified in a user query. Instead of considering these traditional approaches, we propose to retrieve documents using a fuzzy set IR model and rank retrieved documents for any vague query using the “vagueness score” of the documents based on the word senses as defined in WordNet. Using the vagueness scores, we rank the most “relevant” documents of a …


Enhancement Of Unusual Color In Aerial Video Sequences For Assisting Wilderness Search And Rescue, Bryan S. Morse, Nathan D. Rasmussen, Daniel Thornton Oct 2008

Faculty Publications

The use of aerial video for search and surveillance has been popularized by the increased use of camera-equipped unmanned aerial vehicles. For many search applications, objects may also be missed by observers due to their small size, brief visibility, or the inherent monotony of the scene. This paper presents a novel method for automatically emphasizing unusually colored objects to improve their detectability. We use a hue histogram and a local saliency measure to find unusually colored objects, then boost the saturation of these objects while desaturating more common colors, thus drawing the observer’s attention and facilitating video search.
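The hue-histogram step described above can be illustrated with a simplified sketch. It covers only the global hue histogram and the saturation adjustment; the paper's local saliency measure is omitted. The function name `enhance_unusual_hues`, the bin count, the rarity threshold, and the boost and damping factors are all assumptions made for illustration.

```python
import colorsys


def enhance_unusual_hues(pixels, bins=12, rare_fraction=0.1, boost=1.5, damp=0.5):
    """Boost saturation of rarely occurring hues and desaturate common ones.

    `pixels` is a list of (r, g, b) tuples with components in [0, 1].
    All numeric parameters are illustrative assumptions.
    """
    hsv = [colorsys.rgb_to_hsv(*p) for p in pixels]

    def bin_of(hue):
        # Map a hue in [0, 1] to a histogram bin index.
        return min(int(hue * bins), bins - 1)

    # Build the hue histogram over the whole frame.
    counts = [0] * bins
    for h, _, _ in hsv:
        counts[bin_of(h)] += 1

    total = len(hsv)
    result = []
    for h, s, v in hsv:
        share = counts[bin_of(h)] / total
        # Hues covering a small share of the frame count as "unusual".
        factor = boost if share < rare_fraction else damp
        result.append(colorsys.hsv_to_rgb(h, min(1.0, s * factor), v))
    return result


# A mostly green frame with a single reddish pixel.
frame = [(0.4, 0.6, 0.4)] * 19 + [(0.8, 0.4, 0.4)]
enhanced = enhance_unusual_hues(frame)
```

In this toy frame the lone reddish pixel becomes more saturated while the dominant green background is muted, which is the visual effect the abstract describes.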


Scalable Multicast Routing For Ad Hoc Networks, Manoj Pandey, Daniel Zappala Oct 2008

Faculty Publications

Routing in a mobile ad hoc network is challenging because nodes can move at any time, invalidating a previously discovered route. Multicast routing is even more challenging, because a source needs to maintain a route to potentially many group members simultaneously. Providing scalable solutions to this problem typically requires building a hierarchy or an overlay network to reduce the cost of route discovery and maintenance. In this paper, we show that a much simpler alternative is possible, by using source-specific semantics and relying on the unicast routing protocol to find all routes. This separation of concerns enables the multicast routing …


Hop-By-Hop Multicast Transport For Mobile Ad Hoc Wireless Networks, Manoj Pandey, Daniel Zappala Oct 2008

Faculty Publications

Multicast transport is a challenging problem because the source must provide congestion control and reliability for a tree, rather than a single path. This problem is made even more difficult in mobile ad hoc networks due to problems caused by contention, spatial reuse, and mobility. In this paper, we design a hop-by-hop multicast transport protocol, which pushes transport functionality into the core of the network. Although this requires per-flow state, a hop-by-hop approach simplifies congestion control, enables local recovery of lost packets, and provides low delay and efficient use of wireless capacity. We use a simulation study to demonstrate the …


Autonomous And Intelligent Radio Switching For Heterogeneous Wireless Networks, Qiuyi Duan, Charles D. Knutson, Lei Wang, Daniel Zappala Sep 2008

Faculty Publications

As wireless devices continue to become more prevalent, heterogeneous wireless networks - in which communicating devices have at their disposal multiple types of radios - will become the norm. Communication between nodes in these networks ought to be as simple as possible; they should be able to seamlessly switch between different radios and network stacks on the fly in order to better serve the user. To make this a possibility, we consider the challenging problems of when two communicating devices should decide to switch to a different radio, and which radio they should choose. We design an Autonomous and Intelligent …


Improving Live Sequence Chart To Automata Transformation For Verification, Rahul Kumar, Eric G. Mercer Aug 2008

Faculty Publications

This paper presents a Live Sequence Chart (LSC) to automata transformation algorithm that enables the verification of communication protocol implementations. Using this LSC to automata transformation, a communication protocol implementation can be verified using a single verification run, as opposed to previous techniques that rely on a three-stage verification approach. The novelty and simplicity of the transformation algorithm lie in its placement of accept states in the automata generated from the LSC. We present in detail an example of the transformation as well as the transformation algorithm. Further, we present a detailed analysis and an empirical study comparing the …


Biologically Relevant Multiple Sequence Alignment, Hyrum D. Carroll Aug 2008

Theses and Dissertations

Researchers use multiple sequence alignment algorithms to detect conserved regions in genetic sequences and to identify drug docking sites for drug development. In this dissertation, a novel algorithm is presented for using physicochemical properties to increase the accuracy of multiple sequence alignments. Secondary structures are also incorporated in the evaluation function. Additionally, the location of the secondary structures is assimilated into the function. Multiple properties are combined with weights, determined from prediction accuracies of protein secondary structures using artificial neural networks. A new metric, the PPD Score, is developed that captures the average change in physicochemical properties. Using the physicochemical …


An Infrastructure For Performance Measurement And Comparison Of Information Retrieval Solutions, Gary Saunders Aug 2008

Theses and Dissertations

The amount of information available on both public and private networks continues to grow at a phenomenal rate. This information is contained within a wide variety of objects, including documents, e-mail archives, medical records, manuals, pictures and music. To be of any value, this data must be easily searchable and accessible. Information Retrieval (IR) is concerned with the ability to find and gain access to relevant information. As electronic data repositories continue to proliferate, so too grows the variety of methods used to locate and access the information contained therein. Similarly, the introduction of innovative retrieval strategies, and the optimization of …


Autonomous And Intelligent Radio Switching, Qiuyi Duan Aug 2008

Theses and Dissertations

With the proliferation of mobile applications and the abundance of wireless devices, it is increasingly common for devices to support multiple radios. When two devices are communicating they should choose the best available radio based on user preference and application requirements. This type of “radio switching” should happen automatically, so that the system optimizes performance dynamically. To achieve this objective, we design an Autonomous and Intelligent Radio Switching (AIRS) system to leverage the radio heterogeneity common in today's wireless devices. The AIRS system consists of three key components. First, we design a radio preference evaluation module to dynamically select the …


On Autonomous Multi-Agent Control In Wilderness Search And Rescue: A Mixed Initiative Approach, Benjamin C. Hardin Aug 2008

Theses and Dissertations

Searching for lost people in a Wilderness Search and Rescue (WiSAR) scenario is a task that can benefit from large numbers of agents, some of whom may be robotic. These agents may have differing levels of autonomy, determined by the set of tasks they are performing. In addition, the level of autonomy that results in the best performance may change due to varying workload or other factors. Allowing a supervisor and a searcher to jointly decide the correct level of autonomy for a given situation (“mixed initiative”) results in better overall performance than giving an agent absolute control over their …


Watertight Trimmed NURBS, Thomas W. Sederberg, Xin Li, Hongwei Lin, Heather Ipson Aug 2008

Faculty Publications

This paper addresses the long-standing problem of the unavoidable gaps that arise when expressing the intersection of two NURBS surfaces using conventional trimmed-NURBS representation. The solution converts each trimmed NURBS into an untrimmed T-Spline, and then merges the untrimmed T-Splines into a single, watertight model. The solution enables watertight fillets of NURBS models, as well as arbitrary feature curves that do not have to follow isoparameter curves. The resulting T-Spline representation can be exported without error as a collection of NURBS surfaces.


Simple, Secure, Selective Delegation In Online Identity Systems, Bryant Gordon Cutler Jul 2008

Theses and Dissertations

The ability to delegate privileges to others is so important to users of online identity systems that users create ad hoc delegation systems by sharing authentication credentials if no other easy delegation mechanism is available. With the rise of internet-scale relationship-based single sign-on protocols like OpenID, the security risks of password sharing are unacceptable. We therefore propose SimpleAuth, a simple modification to relationship-based authentication protocols that gives users a secure way to selectively delegate subsets of their privileges, making identity systems more flexible and increasing user security. We also present a proof-of-concept implementation of the SimpleAuth pattern using the sSRP …


Predicting The Longevity Of DVDR Media By Periodic Analysis Of Parity, Jitter, And ECC Performance Parameters, Daniel Patrick Wells Jul 2008

Theses and Dissertations

For the last ten years, DVD-R media have played an important role in the storage of large amounts of digital data throughout the world. During this time it was assumed that the DVD-R was as long-lasting and stable as its predecessor, the CD-R. Several reports have surfaced over the last few years questioning the DVD-R's ability to maintain many of its claims regarding archival quality life spans. These reports have shown a wide range of longevity between the different brands. While some DVD-Rs may last a while, others may fail early and unexpectedly. Compounding this problem is …


Using Live Sequence Chart Specifications For Formal Verification, Rahul Kumar Jul 2008

Theses and Dissertations

Formal methods play an important part in the development as well as testing stages of software and hardware systems. A significant and often overlooked part of the process is the development of specifications and correctness requirements for the system under test. Traditionally, English has been used as the specification language, which has resulted in verbose and difficult-to-use specification documents that are usually abandoned during product development. This research focuses on investigating the use of Live Sequence Charts (LSCs), a graphical and intuitive language directly suited for expressing communication behaviors of a system, as the specification language for a …


Adapting ADtrees For Improved Performance On Large Datasets With High Arity Features, Robert D. Van Dam Jul 2008

Theses and Dissertations

The ADtree, a data structure useful for caching sufficient statistics, has been successfully adapted to grow lazily when memory is limited and to update sequentially with an incrementally updated dataset. However, even these modified forms of the ADtree still exhibit inefficiencies in terms of both space usage and query time, particularly on datasets with very high dimensionality and with high arity features. We propose five modifications to the ADtree, each of which can be used to improve size and query time under specific types of datasets and features. These modifications also provide an increased ability to precisely control how an …


Arbitrary Degree T-Splines, Gordon Thomas Finnigan Jul 2008

Theses and Dissertations

T-Splines are a freeform surface type, similar to NURBS, that allows partial rows of control points. Up until now, T-Splines have only been formally defined for the degree three case. This paper extends the definition to support all odd, even, and mixed degree T-Spline surfaces, making T-Splines a proper superset of all standard NURBS surfaces.


Data-Driven Programming And Behavior For Autonomous Virtual Characters, Jonathan Dinerstein, Parris K. Egbert, Michael A. Goodrich, Dan A. Ventura Jul 2008

Faculty Publications

In the creation of autonomous virtual characters, two levels of autonomy are common. They are often called motion synthesis (low-level autonomy) and behavior synthesis (high-level autonomy), where an action (i.e. motion) achieves a short-term goal and a behavior is a sequence of actions that achieves a long-term goal. There exists a rich literature addressing many aspects of this general problem (and it is discussed in the full paper). In this paper we present a novel technique for behavior (high-level) autonomy and utilize existing motion synthesis techniques. Creating an autonomous virtual character with behavior synthesis abilities frequently includes three stages: forming …


Reducing Seed Load In The BitTorrent File Sharing System, Brian T. Sanderson Jun 2008

Theses and Dissertations

BitTorrent is an attractive peer-to-peer technology that attempts to reduce load on file sharers by allowing downloaders to share content between themselves. BitTorrent's current focus is to provide users with a fast download, which requires the file sharer to serve a disproportionate amount of the file. We present a modification to the BitTorrent seeding algorithm that reduces the load on BitTorrent file sharers. Essentially, if a block of a file is already available from a significant number of peers, the file sharer refuses to share that block, forcing peers to get it from each other. Using this modification, we show …
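The seeding rule described in this abstract can be sketched in a few lines. This is an illustration of the stated idea only, not the thesis's algorithm or any real BitTorrent client API: the function name `seed_should_serve`, the `peer_blocks` data structure, and the `max_copies` threshold (standing in for "a significant number of peers") are all hypothetical.

```python
def seed_should_serve(block, peer_blocks, max_copies=3):
    """Decide whether the seed should upload `block`.

    `peer_blocks` maps a peer id to the set of block indices that peer
    already holds. `max_copies` is an assumed threshold; the abstract
    does not state what counts as "a significant number of peers".
    """
    copies = sum(1 for held in peer_blocks.values() if block in held)
    # Refuse well-replicated blocks so peers fetch them from each other.
    return copies < max_copies


# Hypothetical swarm state: block 0 is held by three peers, block 1 by one.
swarm = {
    "peer-a": {0, 1},
    "peer-b": {0},
    "peer-c": {0, 2},
}

serve_common = seed_should_serve(0, swarm)
serve_rare = seed_should_serve(1, swarm)
```

Under this rule the seed spends its upload capacity on blocks that are scarce in the swarm, which is how the load on the file sharer is reduced.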


Link Quality Prediction For Wireless Devices With Multiple Radios, Qiuyi Duan, Charles D. Knutson, Lei Wang, Daniel Zappala Jun 2008

Faculty Publications

Communication between wireless devices ought to be as simple as possible; they should be able to seamlessly switch between different radios and network stacks on the fly in order to better serve the user. To make this a possibility, we consider the challenging problem of predicting link quality in a changing mobile environment. In this paper we present an algorithm that uses Weighted Least Squares Regression to predict whether a given link can meet application requirements in terms of throughput, delay, and jitter. We use a simulation study to demonstrate that our algorithm is able to predict link quality accurately …
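As a rough illustration of how Weighted Least Squares regression can be used for this kind of prediction, the sketch below fits a weighted straight line to recent throughput samples and extrapolates it. It is not the paper's algorithm: the exponential weighting scheme, the `decay` value, the sample data, and the names `wls_fit` and `link_meets_requirement` are assumptions, and only throughput is modeled (delay and jitter are omitted).

```python
def wls_fit(xs, ys, ws):
    """Fit y = a + b * x by minimizing sum(w * (y - a - b * x) ** 2)."""
    total_w = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total_w
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def link_meets_requirement(times, throughputs, horizon, required, decay=0.8):
    """Predict throughput `horizon` time units ahead and compare it
    with the application's requirement.

    Newer samples get larger weights (an assumed weighting scheme).
    """
    newest = times[-1]
    weights = [decay ** (newest - t) for t in times]
    intercept, slope = wls_fit(times, throughputs, weights)
    predicted = intercept + slope * (newest + horizon)
    return predicted >= required


# Hypothetical samples: throughput falling by 2 units per time step.
times = [0.0, 1.0, 2.0, 3.0]
throughputs = [10.0, 8.0, 6.0, 4.0]

ok_soon = link_meets_requirement(times, throughputs, horizon=1.0, required=1.0)
ok_later = link_meets_requirement(times, throughputs, horizon=2.0, required=3.0)
```

Weighting recent samples more heavily lets the fitted trend follow a link whose quality is changing, which matters in a mobile environment.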