Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Decision Tree Ensemble: Small Heterogeneous Is Better Than Large Homogeneous, Mike Gashler, Christophe G. Giraud-Carrier, Tony R. Martinez Dec 2008

Faculty Publications

Using decision trees that split on randomly selected attributes is one way to increase the diversity within an ensemble of decision trees. Another approach increases diversity by combining multiple tree algorithms. The random forest approach has become popular because it is simple and yields good results with common datasets. We present a technique that combines heterogeneous tree algorithms and contrast it with homogeneous forest algorithms. Our results indicate that random forests do poorly when faced with irrelevant attributes, while our heterogeneous technique handles them robustly. Further, we show that large ensembles of random trees are more susceptible to diminishing returns …


Learning-Based Fusion For Data Deduplication, Sabra Dinerstein, Parris K. Egbert, Stephen W. Clyde, Jared Dinerstein Dec 2008

Faculty Publications

Rule-based deduplication utilizes expert domain knowledge to identify and remove duplicate data records. Achieving high accuracy in a rule-based system requires the creation of rules containing a good combination of discriminatory clues. Unfortunately, accurate rule-based deduplication often requires significant manual tuning of both the rules and the corresponding thresholds. This need for manual tuning reduces the efficacy of rule-based deduplication and its applicability to real-world data sets. No adequate solution exists for this problem. We propose a novel technique for rule-based deduplication. We apply individual deduplication rules, and combine the resultant match scores via learning-based information fusion. We show empirically …


Nowhere To Hide: Finding Plagiarized Documents Based On Sentence Similarity, Nathaniel Gustafson, Yiu-Kai D. Ng, Maria Soledad Pera Dec 2008

Faculty Publications

Plagiarism is a serious problem: it infringes on copyrighted documents and materials, is unethical, and decreases the economic incentive received by the authors (owners) of the original copies. Unfortunately, plagiarism is getting worse due to the increasing number of online publications on the Web, which facilitates locating and paraphrasing information. To address this problem, we propose a novel plagiarism-detection method, called SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 based on their sentence-to-sentence similarity computed by using pre-defined word-correlation factors, and (ii) generates a graphical view of sentences that are similar (or the same) …


Sequence Alignment With Traceback On Reconfigurable Hardware, Scott Lloyd, Quinn O. Snell Dec 2008

Faculty Publications

Biological sequence alignment is an essential tool used in molecular biology and biomedical applications. The growing volume of genetic data and the complexity of sequence alignment present a challenge in obtaining alignment results in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture are presented that accelerate the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain over 300 times that of a desktop …


Using Vagueness Measures To Re-Rank Documents Retrieved By A Fuzzy Set Information Retrieval Model, Stephen Lynn, Yiu-Kai D. Ng Oct 2008

Faculty Publications

Traditional information retrieval (IR) systems evaluate user queries and retrieve and rank documents by matching keywords in user queries with words in documents. These exact word-matching and ranking approaches ignore too many relevant documents that do not contain the exact keywords specified in a user query. Instead of considering these traditional approaches, we propose to retrieve documents using a fuzzy set IR model and rank retrieved documents for any vague query using the “vagueness score” of the documents based on the word senses as defined in WordNet. Using the vagueness scores, we rank the most highly “relevant” documents of a …


Hop-By-Hop Multicast Transport For Mobile Ad Hoc Wireless Networks, Manoj Pandey, Daniel Zappala Oct 2008

Faculty Publications

Multicast transport is a challenging problem because the source must provide congestion control and reliability for a tree, rather than a single path. This problem is made even more difficult in mobile ad hoc networks due to problems caused by contention, spatial reuse, and mobility. In this paper, we design a hop-by-hop multicast transport protocol, which pushes transport functionality into the core of the network. Although this requires per-flow state, a hop-by-hop approach simplifies congestion control, enables local recovery of lost packets, and provides low delay and efficient use of wireless capacity. We use a simulation study to demonstrate the …


Scalable Multicast Routing For Ad Hoc Networks, Manoj Pandey, Daniel Zappala Oct 2008

Faculty Publications

Routing in a mobile ad hoc network is challenging because nodes can move at any time, invalidating a previously discovered route. Multicast routing is even more challenging, because a source needs to maintain a route to potentially many group members simultaneously. Providing scalable solutions to this problem typically requires building a hierarchy or an overlay network to reduce the cost of route discovery and maintenance. In this paper, we show that a much simpler alternative is possible, by using source-specific semantics and relying on the unicast routing protocol to find all routes. This separation of concerns enables the multicast routing …


Enhancement Of Unusual Color In Aerial Video Sequences For Assisting Wilderness Search And Rescue, Bryan S. Morse, Nathan D. Rasmussen, Daniel Thornton Oct 2008

Faculty Publications

The use of aerial video for search and surveillance has been popularized by the increased use of camera-equipped unmanned aerial vehicles. For many search applications, objects may also be missed by observers due to their small size, brief visibility, or the inherent monotony of the scene. This paper presents a novel method for automatically emphasizing unusually colored objects to improve their detectability. We use a hue histogram and a local saliency measure to find unusually colored objects, then boost the saturation of these objects while desaturating more common colors, thus drawing the observer’s attention and facilitating video search.


Autonomous And Intelligent Radio Switching For Heterogeneous Wireless Networks, Qiuyi Duan, Charles D. Knutson, Lei Wang, Daniel Zappala Sep 2008

Faculty Publications

As wireless devices continue to become more prevalent, heterogeneous wireless networks - in which communicating devices have at their disposal multiple types of radios - will become the norm. Communication between nodes in these networks ought to be as simple as possible; they should be able to seamlessly switch between different radios and network stacks on the fly in order to better serve the user. To make this a possibility, we consider the challenging problems of when two communicating devices should decide to switch to a different radio, and which radio they should choose. We design an Autonomous and Intelligent …


Improving Live Sequence Chart To Automata Transformation For Verification, Rahul Kumar, Eric G. Mercer Aug 2008

Faculty Publications

This paper presents a Live Sequence Chart (LSC) to automata transformation algorithm that enables the verification of communication protocol implementations. Using this LSC-to-automata transformation, a communication protocol implementation can be verified using a single verification run, as opposed to previous techniques that rely on a three-stage verification approach. The novelty and simplicity of the transformation algorithm lie in its placement of accept states in the automata generated from the LSC. We present in detail an example of the transformation as well as the transformation algorithm. Further, we present a detailed analysis and an empirical study comparing the …


Watertight Trimmed NURBS, Thomas W. Sederberg, Xin Li, Hongwei Lin, Heather Ipson Aug 2008

Faculty Publications

This paper addresses the long-standing problem of the unavoidable gaps that arise when expressing the intersection of two NURBS surfaces using conventional trimmed-NURBS representation. The solution converts each trimmed NURBS into an untrimmed T-Spline, and then merges the untrimmed T-Splines into a single, watertight model. The solution enables watertight fillets of NURBS models, as well as arbitrary feature curves that do not have to follow isoparameter curves. The resulting T-Spline representation can be exported without error as a collection of NURBS surfaces.


Data-Driven Programming And Behavior For Autonomous Virtual Characters, Jonathan Dinerstein, Parris K. Egbert, Michael A. Goodrich, Dan A. Ventura Jul 2008

Faculty Publications

In the creation of autonomous virtual characters, two levels of autonomy are common. They are often called motion synthesis (low-level autonomy) and behavior synthesis (high-level autonomy), where an action (i.e. motion) achieves a short-term goal and a behavior is a sequence of actions that achieves a long-term goal. There exists a rich literature addressing many aspects of this general problem (and it is discussed in the full paper). In this paper we present a novel technique for behavior (high-level) autonomy and utilize existing motion synthesis techniques. Creating an autonomous virtual character with behavior synthesis abilities frequently includes three stages: forming …


Link Quality Prediction For Wireless Devices With Multiple Radios, Qiuyi Duan, Charles D. Knutson, Lei Wang, Daniel Zappala Jun 2008

Faculty Publications

Communication between wireless devices ought to be as simple as possible; they should be able to seamlessly switch between different radios and network stacks on the fly in order to better serve the user. To make this a possibility, we consider the challenging problem of predicting link quality in a changing mobile environment. In this paper we present an algorithm that uses Weighted Least Squares Regression to predict whether a given link can meet application requirements in terms of throughput, delay, and jitter. We use a simulation study to demonstrate that our algorithm is able to predict link quality accurately …


Or Best Offer: A Privacy Policy Negotiation Protocol, Eric G. Mercer, Kent E. Seamons, Daniel D. Walker Jun 2008

Faculty Publications

Privacy policy languages, such as P3P, allow websites to publish their privacy practices and policies in machine readable form. Currently, software agents designed to protect users’ privacy follow a “take it or leave it” approach that is inflexible and gives the server ultimate control. Privacy policy negotiation is one approach to leveling the playing field by allowing a client to negotiate with a server to determine how that server collects and uses the client’s data. We present a privacy policy negotiation protocol, “Or Best Offer”, that includes a formal model for specifying privacy preferences and reasoning about privacy policies. The …


Application And Evaluation Of Spatiotemporal Enhancement Of Live Aerial Video Using Temporally Local Mosaics, Dennis Eggett, Cameron Engh, Damon Gerhardt, Michael A. Goodrich, Bryan S. Morse, Nathan Rasmussen, Daniel Thornton Jun 2008

Faculty Publications

Camera-equipped mini-UAVs are popular for many applications, including search and surveillance, but video from them is commonly plagued with distracting jittery motions and disorienting rotations that make it difficult for human viewers to detect objects of interest and infer spatial relationships. For time-critical search situations there are also inherent tradeoffs between detection and search speed. These problems make the use of dynamic mosaics to expand the spatiotemporal properties of the video appealing. However, for many applications it may not be necessary to maintain full mosaics of all of the video but to mosaic and retain only a number of recent …


Assessing The Costs Of Sampling Methods In Active Learning For Annotation, James Carroll, Robbie Haertel, Peter McClanahan, Eric K. Ringger, Kevin Seppi Jun 2008

Faculty Publications

Traditional Active Learning (AL) techniques assume that the annotation of each datum costs the same. This is not the case when annotating sequences; some sequences will take longer than others. We show that the AL technique which performs best depends on how cost is measured. Applying an hourly cost model based on the results of an annotation user study, we approximate the amount of time necessary to annotate a given sentence. This model allows us to evaluate the effectiveness of AL sampling methods in terms of time spent in annotation. We achieve a 77% reduction in hours from a random …


Analysis Of Canonical Chinese Antonym Co-Occurrence, Eric K. Ringger, Guohui Liu, Shiping Liu, Xingfu Wang Mar 2008

Faculty Publications

PDF of PowerPoint presentation on canonical Chinese antonym co-occurrence. This presentation was given at the Conference of the American Association for Corpus Linguistics in 2008.


Compiling And Annotating A Syriac Corpus, George Busby, James Carroll, Marc Carmen, Carl Griffin, Robbie Haertel, Kristian Heal, Joshua Heaton, Deryle W. Lonsdale, Peter McClanahan, Eric K. Ringger, Kevin Seppi, David Taylor Mar 2008

Faculty Publications

PDF of PowerPoint presentation on compiling and annotating a Syriac corpus. This presentation was given at the Conference of the American Association for Corpus Linguistics in 2008.


Accelerating Corpus Annotation Through Active Learning, George Busby, Marc Carmen, James Carroll, Robbie Haertel, Deryle W. Lonsdale, Peter McClanahan, Eric K. Ringger, Kevin Seppi Mar 2008

Faculty Publications

PDF of PowerPoint presentation on accelerating corpus annotation through active learning. This presentation was given at the Conference of the American Association for Corpus Linguistics in 2008.


Sub-Symbolic Re-Representation To Facilitate Learning Transfer, Dan A. Ventura Mar 2008

Faculty Publications

We consider the issue of knowledge (re-)representation in the context of learning transfer and present a subsymbolic approach for effecting such transfer. Given a set of data, manifold learning is used to automatically organize the data into one or more representational transformations, which are then learned with a set of neural networks. The result is a set of neural filters that can be applied to new data as re-representation operators. Encouraging preliminary empirical results elucidate the approach and demonstrate its feasibility, suggesting possible implications for the broader field of creativity.


Learning Policies For Embodied Virtual Agents Through Demonstration, Jonathan Dinerstein, Parris K. Egbert, Dan A. Ventura Jan 2008

Faculty Publications

Although many powerful AI and machine learning techniques exist, it remains difficult to quickly create AI for embodied virtual agents that produces visually lifelike behavior. This is important for applications (e.g., games, simulators, interactive displays) where an agent must behave in a manner that appears human-like. We present a novel technique for learning reactive policies that mimic demonstrated human behavior. The user demonstrates the desired behavior by dictating the agent’s actions during an interactive animation. Later, when the agent is to behave autonomously, the recorded data is generalized to form a continuous state-to-action mapping. Combined with an appropriate animation algorithm …


A Reductio Ad Absurdum Experiment In Sufficiency For Evaluating (Computational) Creative Systems, Dan A. Ventura Jan 2008

Faculty Publications

We consider a combination of two recent proposals for characterizing computational creativity and explore the sufficiency of the resultant framework. We do this in the form of a gedanken experiment designed to expose the nature of the framework, what it has to say about computational creativity, how it might be improved and what questions this raises.


Utilizing Phrase-Similarity Measures For Detecting And Clustering Informative RSS News Articles, Yiu-Kai D. Ng, Maria Soledad Pera Jan 2008

Faculty Publications

As the number of RSS news feeds continues to increase over the Internet, it becomes necessary to minimize the workload of the user who is otherwise required to scan through huge numbers of news articles to find related articles of interest, which is a tedious and often impossible task. In order to solve this problem, we present a novel approach, called InFRSS, which consists of a correlation-based phrase matching (CPM) model and a fuzzy compatibility clustering (FCC) model. CPM can detect RSS news articles containing phrases that are the same as well as semantically alike, and dictate the degrees …


Automatic Composition Of Themed Mood Pieces, Heather Chan, Dan A. Ventura Jan 2008

Faculty Publications

Musical harmonization of a given melody is a nontrivial problem; slight variations in instrumentation, voicing, texture, and bass rhythm can lead to significant differences in the mood of the resulting piece. This study explores the possibility of automatic musical composition by using machine learning and statistical natural language processing to tailor a piece to a particular mood using an existing melody.


Adapting ADtrees For High Arity Features, Irene Langkilde-Geary, Robert Van Dam, Dan A. Ventura Jan 2008

Faculty Publications

ADtrees, a data structure useful for caching sufficient statistics, have been successfully adapted to grow lazily when memory is limited and to update sequentially with an incrementally updated dataset. For low arity symbolic features, ADtrees trade a slight increase in query time for a reduction in overall tree size. Unfortunately, for high arity features, the same technique can often result in a very large increase in query time and a nearly negligible tree size reduction. In the dynamic (lazy) version of the tree, both query time and tree size can increase for some applications. Here we present two modifications to …


Sentiment Regression: Using Real-Valued Scores To Summarize Overall Document Sentiment, Adam Drake, Eric K. Ringger, Dan A. Ventura Jan 2008

Faculty Publications

In this paper, we consider a sentiment regression problem: summarizing the overall sentiment of a review with a real-valued score. Empirical results on a set of labeled reviews show that real-valued sentiment modeling is feasible, as several algorithms improve upon baseline performance. We also analyze performance as the granularity of the classification problem moves from two-class (positive vs. negative) towards infinite-class (real-valued).