Open Access. Powered by Scholars. Published by Universities.®

Full-Text Articles in Physical Sciences and Mathematics

Autonomous Estimates Of Horizontal Decorrelation Lengths For Digital Elevation Models, Andres Corrada-Emmanuel, Howard Schultz Jan 2008

Andrés Corrada-Emmanuel

The precision errors in a collection of digital elevation models (DEMs) can be estimated in the presence of large but sparse correlations, even when no ground truth is known. We demonstrate this by considering the problem of how to estimate the horizontal decorrelation length of DEMs produced by an automatic photogrammetric process that relies on the epipolar constraint equations. The procedure is based on a set of autonomous elevation difference equations that we recently proposed. In this paper we show that these equations can only estimate the precision errors of DEMs. The accuracy errors are unknowable since there is no …
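
A minimal sketch of one generic way to estimate a horizontal decorrelation length without ground truth. It assumes (this is not stated in the abstract) that the decorrelation length is the e-folding lag of the autocorrelation of the elevation difference between two DEMs along a profile. Because both DEMs see the same terrain, differencing cancels the unknown surface and leaves only error terms; any error shared by both DEMs cancels too, which is consistent with the abstract's point that only precision errors are estimable. The helper names, the AR(1) error model and the 2 m grid spacing are hypothetical, and this is not the authors' autonomous elevation difference procedure.

```python
import math
import random


def ar1_errors(n, corr_len, sigma, rng):
    """Synthetic correlated DEM errors (hypothetical model): AR(1) whose
    autocorrelation at lag k is exp(-k / corr_len)."""
    phi = math.exp(-1.0 / corr_len)
    # Draw the first value from the stationary distribution.
    errors = [rng.gauss(0.0, sigma / math.sqrt(1.0 - phi * phi))]
    for _ in range(n - 1):
        errors.append(phi * errors[-1] + rng.gauss(0.0, sigma))
    return errors


def autocorrelation(x, lag):
    """Biased sample autocorrelation of a 1-D profile at an integer lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def decorrelation_length(diff, spacing=1.0):
    """First lag (in ground units) where the autocorrelation drops below 1/e."""
    threshold = 1.0 / math.e
    for lag in range(1, len(diff)):
        if autocorrelation(diff, lag) < threshold:
            return lag * spacing
    return None  # never decorrelated within the profile


rng = random.Random(42)
n = 20000
terrain = [50.0 * math.sin(i / 300.0) + 0.01 * i for i in range(n)]  # unknown true surface
dem_a = [t + e for t, e in zip(terrain, ar1_errors(n, 5.0, 0.3, rng))]
dem_b = [t + e for t, e in zip(terrain, ar1_errors(n, 5.0, 0.3, rng))]
diff = [a - b for a, b in zip(dem_a, dem_b)]  # terrain cancels; only errors remain
print(decorrelation_length(diff, spacing=2.0))
```

With a simulated correlation length of 5 samples at 2 m spacing, the printed estimate should land at roughly 10 to 12 m. The terrain never enters the estimate, which is the point of working with differences.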


A Unified Approach For Schema Matching, Coreference And Canonicalization, Michael Wick, Khashayar Rohanimanesh, Karl Schultz, Andrew McCallum Jan 2008

Andrew McCallum

The automatic consolidation of database records from many heterogeneous sources into a single repository requires solving several information integration tasks. Although tasks such as coreference, schema matching, and canonicalization are closely related, they are most commonly studied in isolation. Systems that do tackle multiple integration problems traditionally solve each independently, allowing errors to propagate from one task to another. In this paper, we describe a discriminatively trained model that reasons about schema matching, coreference, and canonicalization jointly. We evaluate our model on a real-world data set of people and demonstrate that simultaneously solving these tasks reduces errors over a cascaded or …


FACTORIE: Efficient Probabilistic Programming For Relational Factor Graphs Via Imperative Declarations Of Structure, Inference And Learning, Andrew McCallum, Khashayar Rohanimanesh, Michael Wick, Karl Schultz, Sameer Singh Jan 2008

Andrew McCallum

No abstract provided.


MAP Inference In Large Factor Graphs With Reinforcement Learning, Khashayar Rohanimanesh, Michael Wick, Sameer Singh, Andrew McCallum Jan 2008

Andrew McCallum

Large, relational factor graphs with structure defined by first-order logic or other languages give rise to notoriously difficult inference problems. Because unrolling the structure necessary to represent distributions over all hypotheses leads to exponential blow-up, solutions are often derived from MCMC. However, because of limitations in the design and parameterization of the jump function, these sampling-based methods suffer from local minima: the system must transition through lower-scoring configurations before arriving at a better MAP solution. This paper presents a new method of explicitly selecting fruitful downward jumps by leveraging reinforcement learning (RL) to model delayed reward with a log-linear function approximation of …
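
The local-optimum problem described above can be seen on a toy example. This sketch assumes a hypothetical score table over two binary variables in which the MAP configuration (1, 1) can only be reached from (0, 0) through a lower-scoring neighbour. Greedy single-variable flips get stuck, while a Metropolis-style search that sometimes accepts downward jumps escapes. It illustrates only the problem; it does not implement the paper's reinforcement-learning jump selection, and all names are illustrative.

```python
import math
import random
from itertools import product

# Hypothetical score table over two binary variables. (1, 1) is the MAP,
# but every single-variable flip away from (0, 0) lowers the score first.
SCORES = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 3.0}


def score(config):
    return SCORES[config]


def flip(config, i):
    """Jump function: change the value of variable i."""
    c = list(config)
    c[i] = 1 - c[i]
    return tuple(c)


def greedy_map(config):
    """Hill climbing: accept only flips that strictly raise the score."""
    improved = True
    while improved:
        improved = False
        for i in range(len(config)):
            candidate = flip(config, i)
            if score(candidate) > score(config):
                config, improved = candidate, True
    return config


def metropolis_map(config, steps, temperature, rng):
    """Metropolis search that also accepts downward jumps with
    probability exp(delta / temperature); returns the best state seen."""
    best = config
    for _ in range(steps):
        candidate = flip(config, rng.randrange(len(config)))
        delta = score(candidate) - score(config)
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            config = candidate
        if score(config) > score(best):
            best = config
    return best


def brute_force_map(n):
    """Exhaustive MAP; feasible only for tiny graphs."""
    return max(product((0, 1), repeat=n), key=score)


print(greedy_map((0, 0)))
print(metropolis_map((0, 0), 1000, 1.0, random.Random(0)))
print(brute_force_map(2))
```

Greedy search stays at (0, 0), whereas the sampler should report (1, 1), which the exhaustive search confirms is the MAP. In a large relational factor graph exhaustive search is infeasible and blind sampling wastes many steps, which is why choosing good downward jumps matters.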


Learning To Predict The Quality Of Contributions To Wikipedia, Gregory Druck, Gerome Miklau, Andrew McCallum Jan 2008

Andrew McCallum

Although some have argued that Wikipedia's open edit policy is one of the primary reasons for its success, it also raises concerns about quality: vandalism, bias, and errors can be problems. Despite these challenges, Wikipedia articles are often (perhaps surprisingly) of high quality, which many attribute to both the dedicated Wikipedia community and "good Samaritan" users. As Wikipedia continues to grow, however, it becomes more difficult for these users to keep up with the increasing number of articles and edits. This motivates the development of tools to assist users in creating and maintaining quality. In this paper, we propose …


Unsupervised Deduplication Using Cross-Field Dependencies, Robert Hall, Charles Sutton, Andrew McCallum Jan 2008

Andrew McCallum

Recent work in deduplication has shown that collective deduplication of different attribute types can improve performance. But although these techniques cluster the attributes collectively, they do not model them collectively. For example, in citations in the research literature, canonical venue strings and title strings are dependent (because venues tend to focus on a few research areas), but this dependence is not modeled by current unsupervised techniques. We call this dependence between fields in a record a cross-field dependence. In this paper, we present an unsupervised generative model for the deduplication problem that explicitly models cross-field dependence. Our model uses a single set …


Learning From Labeled Features Using Generalized Expectation Criteria, Gregory Druck, Gideon Mann, Andrew McCallum Jan 2008

Andrew McCallum

It is difficult to apply machine learning to new domains because we often lack labeled problem instances. In this paper, we provide a solution to this problem that leverages domain knowledge in the form of affinities between input features and classes. For example, in a baseball vs. hockey text classification problem, even without any labeled data, we know that the presence of the word "puck" is a strong indicator of hockey. We refer to this type of domain knowledge as a labeled feature. In this paper, we propose a method for training discriminative probabilistic models with labeled features and unlabeled …
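
A minimal sketch of how a labeled feature such as "puck" could be turned into a training signal on unlabeled data. It assumes a generalized-expectation-style penalty scored as the KL divergence between a target label distribution for the feature and the classifier's average predicted distribution over unlabeled documents containing that feature. The documents, weights and the 0.1/0.9 target are hypothetical, and the sketch only evaluates the penalty; it does not train the model.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def predict(weights, doc):
    """p(y | doc) for a log-linear classifier; weights[y] maps feature -> weight."""
    return softmax([sum(w.get(f, 0.0) for f in doc) for w in weights])


def model_expectation(weights, docs, feature):
    """Average predicted label distribution over unlabeled docs containing `feature`."""
    hits = [d for d in docs if feature in d]
    if not hits:
        return None
    totals = [0.0] * len(weights)
    for d in hits:
        for y, p in enumerate(predict(weights, d)):
            totals[y] += p
    return [t / len(hits) for t in totals]


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ge_penalty(weights, docs, targets):
    """Sum over labeled features of KL(target || model expectation)."""
    total = 0.0
    for feature, target in targets.items():
        expected = model_expectation(weights, docs, feature)
        if expected is not None:
            total += kl(target, expected)
    return total


# Label order: [baseball, hockey]. All values below are hypothetical.
docs = [{"puck", "ice", "goal"}, {"pitcher", "inning"}, {"puck", "stick"}]
targets = {"puck": [0.1, 0.9]}
untrained = [{}, {}]
trained = [{"puck": -2.0}, {"puck": 2.0}]
print(ge_penalty(untrained, docs, targets))
print(ge_penalty(trained, docs, targets))
```

All-zero weights predict 0.5/0.5 for every document, giving a penalty of about 0.37; weights that favour hockey when "puck" appears lower it to about 0.09. A trainer would adjust the weights to reduce such penalties without ever seeing a labeled document.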


Generalized Expectation Criteria For Semi-Supervised Learning Of Conditional Random Fields, Gideon S. Mann, Andrew McCallum Jan 2008

Andrew McCallum

This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accomplished by using generalized expectation criteria to express a preference for parameter settings in which the model's distribution on unlabeled data matches a target distribution. We induce target conditional probability distributions of labels given features from both annotated feature occurrences in context and ad hoc feature majority label assignment. The use of generalized expectation criteria allows for a dramatic reduction in annotation time by shifting from traditional instance-labeling to feature-labeling, and the methods presented outperform traditional CRF …
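
The two ways of inducing target distributions mentioned above can be sketched as follows. This assumes (neither detail is stated in the abstract) that the ad hoc majority scheme puts a fixed mass `q_major = 0.9` on the feature's majority label and spreads the remainder uniformly, and that annotated occurrences are converted into a distribution with add-alpha smoothed counts. The label names are hypothetical. The resulting dictionaries would play the role of the targets that a generalized expectation criterion compares against the CRF's expectations on unlabeled data.

```python
from collections import Counter


def majority_target(majority_label, labels, q_major=0.9):
    """Ad hoc target: fixed mass on the feature's majority label, rest uniform."""
    rest = (1.0 - q_major) / (len(labels) - 1)
    return {y: (q_major if y == majority_label else rest) for y in labels}


def occurrence_target(annotated, labels, alpha=1.0):
    """Target from labels seen on annotated occurrences of the feature,
    smoothed by adding alpha to every label count."""
    counts = Counter(annotated)
    total = len(annotated) + alpha * len(labels)
    return {y: (counts[y] + alpha) / total for y in labels}


labels = ["O", "PERSON", "LOCATION"]
print(majority_target("LOCATION", labels))
print(occurrence_target(["LOCATION", "LOCATION", "PERSON"], labels))
```

The majority scheme needs only one judgement per feature, while the occurrence-based scheme needs a few annotated contexts but can express ambiguity; either yields a proper probability distribution over labels.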