Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Articles 1 - 30 of 120

Full-Text Articles in Physical Sciences and Mathematics

Latent Relation Representations For Universal Schemas, Sebastian Riedel, Limin Yao, Andrew McCallum Jan 2013

Andrew McCallum

No abstract provided.


Relation Extraction With Matrix Factorization And Universal Schemas, Sebastian Riedel, Limin Yao, Andrew McCallum, Benjamin M. Marlin Jan 2013

Andrew McCallum

Traditional relation extraction predicts relations within some fixed and finite target schema. Machine learning approaches to this task require either manual annotation or, in the case of distant supervision, existing structured sources of the same schema. The need for existing datasets can be avoided by using a universal schema: the union of all involved schemas (surface form predicates as in OpenIE, and relations in the schemas of pre-existing databases). This schema has an almost unlimited set of relations (due to surface forms), and supports integration with existing structured data (through the relation types of existing databases). To populate a database …
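
The core mechanism sketched above (filling in unobserved cells of a large entity-pair by relation matrix via low-rank factorization) can be illustrated with a minimal, self-contained example. The sketch below is not the paper's model; the function names, hyperparameters, and negative-sampling scheme are illustrative assumptions.

```python
# Illustrative sketch (not the paper's implementation): logistic low-rank
# factorization over a binary matrix whose rows are entity pairs and whose
# columns are relations from a universal schema (surface patterns plus
# database relation types). Names and hyperparameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def train(observed, n_pairs, n_rels, dim=10, lr=0.05, epochs=50, n_neg=5):
    """observed: set of (pair_index, relation_index) cells known to hold."""
    A = rng.normal(scale=0.1, size=(n_pairs, dim))   # entity-pair embeddings
    V = rng.normal(scale=0.1, size=(n_rels, dim))    # relation embeddings
    observed = list(observed)
    for _ in range(epochs):
        for (p, r) in observed:
            # one positive cell and a few randomly sampled negative relations
            for (rr, y) in [(r, 1.0)] + [(rng.integers(n_rels), 0.0) for _ in range(n_neg)]:
                score = A[p] @ V[rr]
                prob = 1.0 / (1.0 + np.exp(-score))
                grad = prob - y                      # d(log-loss)/d(score)
                A[p], V[rr] = A[p] - lr * grad * V[rr], V[rr] - lr * grad * A[p]
    return A, V

def predict(A, V, p, r):
    """Probability that relation r holds for entity pair p."""
    return 1.0 / (1.0 + np.exp(-(A[p] @ V[r])))
```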


Open Scholarship And Peer Review: A Time For Experimentation, David Soergel, Adam Saunders, Andrew McCallum Jan 2013

Andrew McCallum

Across a wide range of scientific communities, there is growing interest in accelerating and improving the progress of scholarship by making the peer review process more open. Multiple new publication venues and services are arising, especially in the life sciences, but each represents a single point in the multi-dimensional landscape of paper and review access for authors, reviewers and readers. In this paper, we introduce a vocabulary for describing the landscape of choices regarding open access, formal peer review, and public commentary. We argue that the opportunities and pitfalls of open peer review warrant experimentation in these dimensions, and discuss …


Dynamic Knowledge-Base Alignment For Coreference Resolution, Jiaping Zheng, Luke Vilnis, Sameer Singh, Jinho D. Choi, Andrew McCallum Jan 2013

Andrew McCallum

Coreference resolution systems can benefit greatly from inclusion of global context, and a number of recent approaches have demonstrated improvements when precomputing an alignment to external knowledge sources. However, since alignment itself is a challenging task and is often noisy, existing systems either align conservatively, resulting in very few links, or combine the attributes of multiple candidates, leading to a conflation of entities. Our approach instead maintains ranked lists of candidate entities that are dynamically merged and reranked during inference. Further, we incorporate a large set of surface string variations for each entity by using anchor texts from the …


Combining Joint Models For Biomedical Event Extraction, David McClosky, Sebastian Riedel, Mihai Surdeanu, Andrew McCallum, Christopher D. Manning Jun 2012

Andrew McCallum

Background: We explore techniques for performing model combination between the UMass and Stanford biomedical event extraction systems. Both sub-components address event extraction as a structured prediction problem, and use dual decomposition (UMass) and parsing algorithms (Stanford) to find the best scoring event structure. Our primary focus is on stacking where the predictions from the Stanford system are used as features in the UMass system. For comparison, we look at simpler model combination techniques such as intersection and union which require only the outputs from each system and combine them directly. Results: First, we find that stacking substantially improves performance while …


Topic Models Conditioned On Arbitrary Features With Dirichlet-Multinomial Regression, David Mimno, Andrew McCallum Jun 2012

Andrew McCallum

Although fully generative models have been successfully used to model the contents of text documents, they are often awkward to apply to combinations of text data and document metadata. In this paper we propose a Dirichlet-multinomial regression (DMR) topic model that includes a log-linear prior on document-topic distributions that is a function of observed features of the document, such as author, publication venue, references, and dates. We show that by selecting appropriate features, DMR topic models can meet or exceed the performance of several previously published topic models designed for specific data.
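
One way to write the log-linear, feature-dependent Dirichlet prior that the abstract describes is sketched below; the symbols (x_d for the observed metadata features of document d, lambda_t for per-topic regression weights) are our own notation, not taken from the paper.

```latex
% A DMR-style log-linear prior consistent with the description above;
% notation is ours, not the paper's.
\begin{align*}
\alpha_{d,t} &= \exp\!\left(\mathbf{x}_d^{\top}\boldsymbol{\lambda}_t\right), \\
\theta_d &\sim \mathrm{Dirichlet}\!\left(\alpha_{d,1},\dots,\alpha_{d,T}\right), \\
z_{d,n} &\sim \mathrm{Multinomial}(\theta_d).
\end{align*}
```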


Inference By Minimizing Size, Divergence, Or Their Sum, Sebastian Riedel, David A. Smith, Andrew McCallum Jan 2012

Andrew McCallum

We speed up marginal inference by ignoring factors that do not significantly contribute to overall accuracy. In order to pick a suitable subset of factors to ignore, we propose three schemes: minimizing the number of model factors under a bound on the KL divergence between pruned and full models; minimizing the KL divergence under a bound on factor count; and minimizing the weighted sum of KL divergence and factor count. All three problems are solved using an approximation of the KL divergence that can be calculated in terms of marginals computed on a simple seed graph. Applied to synthetic image …
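
The three pruning objectives listed in the abstract can be written schematically as follows; the notation (F for the full factor set, F' for the retained subset, p_F and p_{F'} for the corresponding distributions, and the direction of the divergence) is our own shorthand rather than the paper's.

```latex
% Schematic form of the three objectives; notation is ours, not the paper's.
\begin{align*}
&\min_{F' \subseteq F} \; |F'|
  &&\text{s.t.}\quad \mathrm{KL}\!\left(p_{F}\,\middle\|\,p_{F'}\right) \le \varepsilon \\
&\min_{F' \subseteq F} \; \mathrm{KL}\!\left(p_{F}\,\middle\|\,p_{F'}\right)
  &&\text{s.t.}\quad |F'| \le k \\
&\min_{F' \subseteq F} \; \mathrm{KL}\!\left(p_{F}\,\middle\|\,p_{F'}\right) + \lambda\,|F'|
\end{align*}
```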


Unsupervised Relation Discovery With Sense Disambiguation, Limin Yao, Sebastian Riedel, Andrew McCallum Jan 2012

Andrew McCallum

To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between …


A Discriminative Hierarchical Model For Fast Coreference At Large Scale, Michael Wick, Sameer Singh, Andrew McCallum Jan 2012

Andrew McCallum

Methods that measure compatibility between mention pairs are currently the dominant approach to coreference. However, they suffer from a number of drawbacks, including difficulties scaling to large numbers of mentions and limited representational power. As the severity of these drawbacks continues to grow with the increasing demand for more data, the need to replace the pairwise approaches with a more expressive, highly scalable alternative is becoming increasingly urgent. In this paper we propose a novel discriminative hierarchical model that recursively structures entities into trees. These trees succinctly summarize the mentions, providing a highly compact, information-rich structure for reasoning about entities and …
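
To make the idea of a compact, recursively summarized entity tree concrete, here is a toy sketch in which each node keeps a bag-of-words summary of the mentions beneath it, so two subtrees can be compared through their root summaries alone. The data structure and threshold are illustrative assumptions, not the paper's implementation.

```python
# Toy illustration of a summary tree for coreference; not the paper's code.
from collections import Counter

class Node:
    def __init__(self, mention_tokens=None, children=None):
        self.children = children or []
        # here the summary is just a bag of words; a real system would
        # aggregate richer attributes (names, topics, entity types, ...)
        self.summary = Counter(mention_tokens or [])
        for child in self.children:
            self.summary.update(child.summary)

    def cosine(self, other):
        shared = set(self.summary) & set(other.summary)
        num = sum(self.summary[w] * other.summary[w] for w in shared)
        den = (sum(v * v for v in self.summary.values()) ** 0.5 *
               sum(v * v for v in other.summary.values()) ** 0.5)
        return num / den if den else 0.0

# merging two subtrees only requires comparing their root summaries
left = Node(children=[Node(["andrew", "mccallum"]), Node(["a", "mccallum"])])
right = Node(["mccallum", "umass"])
should_merge = left.cosine(right) > 0.5
```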


MAP Inference In Chains Using Column Generation, David Belanger, Alexandre Passos, Sebastian Riedel, Andrew McCallum Jan 2012

Andrew McCallum

Linear chains and trees are basic building blocks in many applications of graphical models. Although exact inference in these models can be performed by dynamic programming, this computation can still be prohibitively expensive with non-trivial target variable domain sizes due to the quadratic dependence on this size. Standard message-passing algorithms for these problems are inefficient because they compute scores on hypotheses for which there is strong negative local evidence. For this reason there has been significant previous interest in beam search and its variants; however, these methods provide only approximate inference. This paper presents new efficient exact inference algorithms based …
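
For reference, the dynamic program whose cost the abstract describes is standard Viterbi decoding; the minimal sketch below just makes the O(n * K^2) inner loop explicit. It is the baseline being accelerated, not the column-generation algorithm itself.

```python
# Minimal Viterbi sketch for a linear chain, included only to show the
# quadratic dependence on the label-domain size K (the K*K inner step).
import numpy as np

def viterbi(unary, pairwise):
    """unary: (n, K) local log-scores; pairwise: (K, K) transition log-scores."""
    n, K = unary.shape
    score = unary[0].copy()
    back = np.zeros((n, K), dtype=int)
    for t in range(1, n):                       # O(n) positions ...
        cand = score[:, None] + pairwise        # ... each doing O(K^2) work
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + unary[t]
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path))
```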


Learning To Speed Up MAP Decoding With Column Generation, D. Belanger, A. Passos, S. Riedel, Andrew McCallum Jan 2012

Andrew McCallum

In this paper, we show how the connections between max-product message passing and linear programming relaxations allow for a more efficient exact algorithm for the MAP problem. Our proposed algorithm uses column generation to pass messages only on a small subset of the possible assignments to each variable, while guaranteeing to find the exact solution. This algorithm is three times faster than Viterbi decoding for part-of-speech tagging on WSJ data and as fast as beam search with a beam of size two, while remaining exact. The empirical performance of column generation depends on how quickly we can rule …


Monte Carlo MCMC: Efficient Inference By Approximate Sampling, Sameer Singh, Michael Wick, Andrew McCallum Jan 2012

Andrew McCallum

Conditional random fields and other graphical models have achieved state of the art results in a variety of tasks such as coreference, relation extraction, data integration, and parsing. Increasingly, practitioners are using models with more complex structure---higher tree-width, larger fan-out, more features, and more data---rendering even approximate inference methods such as MCMC inefficient. In this paper we propose an alternative MCMC sampling scheme in which transition probabilities are approximated by sampling from the set of relevant factors. We demonstrate that our method converges more quickly than a traditional MCMC sampler for both marginal and MAP inference. In an author coreference …
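
A hedged sketch of the sampling idea described above: score a Metropolis-Hastings-style proposal using only a random subset of the factors that touch the changed variables, rather than all of them. The interface and the rescaling of the partial score are our own simplifications, not the authors' code.

```python
# Generic MH-style step whose acceptance score is estimated from a sampled
# subset of the relevant factors; illustrative only.
import math, random

def approx_mcmc_step(state, propose, relevant_factors, subset_frac=0.2):
    """state: current assignment; propose(state) -> new assignment;
    relevant_factors(old, new) -> list of callables f(assignment) -> log-score
    for the factors whose arguments changed between old and new."""
    new = propose(state)
    factors = relevant_factors(state, new)
    if not factors:
        return new                                   # nothing affects the score
    k = max(1, int(subset_frac * len(factors)))
    sample = random.sample(factors, k)
    # approximate log acceptance ratio from the sampled factors only,
    # rescaled to stand in for the full set
    delta = (len(factors) / k) * sum(f(new) - f(state) for f in sample)
    if math.log(random.random()) < min(0.0, delta):
        return new
    return state
```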


Monte Carlo MCMC: Efficient Inference By Sampling Factors, Sameer Singh, Michael Wick, Andrew McCallum Jan 2012

Andrew McCallum

Discriminative graphical models such as conditional random fields and Markov logic networks have achieved state of the art results in a variety of NLP and IE tasks including coreference and relation extraction. Increasingly, automated knowledge extraction is demanding models with more complex structure—higher tree-width, larger fan-out, more features, more data—rendering even approximate inference methods such as MCMC inefficient. In this paper we propose a new MCMC sampling scheme where transition probabilities are approximated. We demonstrate that our method converges more quickly than a traditional MCMC sampler for both marginal and MAP inference. For a task of author coreference …


Probabilistic Databases Of Universal Schema, Limin Yao, Sebastian Riedel, Andrew McCallum Jan 2012

Andrew McCallum

In data integration we transform information from a source into a target schema. A general problem in this task is loss of fidelity and coverage: the source expresses more knowledge than can fit into the target schema, or knowledge that is hard to fit into any schema at all. This problem is taken to an extreme in information extraction (IE) where the source is natural language. To address this issue, one can either automatically learn a latent schema emergent in text (a brittle and ill-defined task), or manually extend schemas. We propose instead to store data in a probabilistic database …


Large-Scale Cross-Document Coreference Using Distributed Inference And Hierarchical Models, Sameer Singh, Amarnag Subramanya, Fernando Pereira, Andrew McCallum Jan 2011

Andrew McCallum

Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus with 1.5 million mentions from links to …


Towards Asynchronous Distributed MCMC Inference For Large Graphical Models, Sameer Singh, Andrew McCallum Jan 2011

Andrew McCallum

With the increasingly cheap availability of computational resources such as storage and bandwidth, access to large amounts of data has become commonplace. To perform inference over these millions of variables, there is a need to distribute the inference; however, the dense, loopy structure with long-range dependencies makes the problem non-trivial. There has been some recent work in distributed inference for graphical models; however, these approaches make strong synchronization assumptions that we do not desire in large-scale models. In this work, we explore a number of approaches for distributed MCMC inference for graphical models in an asynchronous manner. The overall architecture consists of …


SampleRank: Training Factor Graphs With Atomic Gradients, Michael Wick, Khashayar Rohanimanesh, Kedar Bellare, Aron Culotta, Andrew McCallum Jan 2011

Andrew McCallum

We present SampleRank, an alternative to contrastive divergence (CD) for estimating parameters in complex graphical models. SampleRank harnesses a user-provided loss function to distribute stochastic gradients across an MCMC chain. As a result, parameter updates can be computed between arbitrary MCMC states. SampleRank is not only faster than CD, but also achieves better accuracy in practice (up to 23% error reduction on noun-phrase coreference).
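
The kind of update the abstract describes (a stochastic, loss-driven gradient computed between two states of an MCMC chain) can be sketched as a perceptron-style correction applied whenever the model's ranking of the two states disagrees with the ranking induced by the user-provided loss. The notation and the margin-free form below are our own simplifications, not the paper's algorithm.

```python
# Schematic SampleRank-style update between two states of an MCMC chain;
# illustrative only.
import numpy as np

def samplerank_update(w, feats, loss, state_a, state_b, lr=0.1):
    """w: weight vector (np.ndarray); feats(s) -> feature vector; loss(s) -> task loss."""
    better, worse = (state_a, state_b) if loss(state_a) < loss(state_b) else (state_b, state_a)
    if loss(better) == loss(worse):
        return w                                     # no preference between the states
    if w @ feats(better) <= w @ feats(worse):        # model ranks the states the wrong way
        w = w + lr * (feats(better) - feats(worse))  # perceptron-style correction
    return w

# toy usage: two "states" with hand-written features and losses
feats = lambda s: np.array([1.0, 0.0]) if s == "a" else np.array([0.0, 1.0])
loss = lambda s: 0.0 if s == "a" else 1.0
w = samplerank_update(np.zeros(2), feats, loss, "a", "b")
```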


Structured Relation Discovery Using Generative Models, Limin Yao, Aria Haghighi, Sebastian Riedel, Andrew McCallum Jan 2011

Andrew McCallum

We explore unsupervised approaches to relation extraction between two named entities; for instance, the semantic bornIn relation between a person and a location entity. Concretely, we propose a series of generative probabilistic models, broadly similar to topic models, each of which generates a corpus of observed triples of entity mention pairs and the surface syntactic dependency path between them. The output of each model is a clustering of observed relation tuples and their associated textual expressions to underlying semantic relation types. Our proposed models exploit entity type constraints within a relation as well as features on the dependency path between entity mentions. …


Toward Interactive Training And Evaluation, Gregory Druck, Andrew McCallum Jan 2011

Andrew McCallum

Machine learning often relies on costly labeled data, which impedes its application to new classification and information extraction problems. This motivates the development of methods that leverage our abundant prior knowledge about these problems in learning. Several recently proposed methods incorporate prior knowledge with constraints on the expectations of a probabilistic model. Building on this work, we envision an interactive training paradigm in which practitioners perform evaluation, analyze errors, and provide and refine expectation constraints in a closed loop. In this paper, we focus on several key subproblems in this paradigm that can be cast as selecting a representative sample …


Selecting Actions For Resource-Bounded Information Extraction Using Reinforcement Learning, Andrew McCallum Jan 2011

Andrew McCallum

Given a database with missing or uncertain information, our goal is to extract specific information from a large corpus such as the Web under limited resources. We formulate the information gathering task as a series of alternative, resource-consuming actions to choose from and use Reinforcement Learning to select the best action to perform at each time step. We use the temporal difference Q-learning method to train the function that selects these actions, and compare it to an online, error-driven algorithm called SampleRank. We present a system that finds information such as email, job title and department …


Query-Aware MCMC, Michael Wick, Andrew McCallum Jan 2011

Andrew McCallum

Traditional approaches to probabilistic inference, such as loopy belief propagation and Gibbs sampling, typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user's interests are more focused and may be specified by a query over a subset of the model's variables. In this case it would be wasteful to uniformly Gibbs sample one million variables in a model when the query concerns only ten variables. In this paper we propose a query-specific approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order …


Inducing Value Sparsity For Parallel Inference In Tree-Shaped Models, Sameer Singh, Brian Martin, Andrew McCallum Jan 2011

Andrew McCallum

With easy access to multi-core parallelism, the machine learning community needs to take this additional form of flexibility into account. In this work, we study inference in tree-shaped models. Since the marginals of many variables at the end of inference are often peaked, this value sparsity can be detected at earlier stages of inference and used to dynamically decompose the model into smaller pieces (“islands of certainty”). Given the computational constraints on time and number of cores, we can set a parameter for our inference algorithm to provide the best accuracy.


Learning To Select Actions For Resource-Bounded Information Extraction, P. Kanani, Andrew McCallum Jan 2011

Andrew McCallum

Given a database with missing or uncertain information, our goal is to extract specific information from a large corpus such as the Web under limited resources. We cast the information gathering task as a series of alternative, resource-consuming actions to choose from and propose a new algorithm for learning to select the best action to perform at each time step. The function that selects these actions is trained using an online, error-driven algorithm called SampleRank. We present a system that finds the faculty directory pages of top Computer Science departments in the U.S. and show that the learning-based approach accomplishes …


Discovering Issue-Based Voting Groups Within The US Senate, Rachel Shorey, Andrew McCallum, Hanna Wallach Dec 2010

Andrew McCallum

Members of the US Senate cast votes on a wide array of issues. Understanding a senator's position on an issue is important to constituents, sources of campaign funding, and groups seeking to persuade senators or build consensus. Classifying senators' positions often falls into the hands of interest groups. Many lobbyists and issue-based organizations give senators scores based on the number of times senators vote in accordance with the organization's ideals. Organization staff must choose which bills to consider and then investigate their content manually. To produce more objective and replicable rankings, political scientists have developed statistical models to group and …


Distantly Labeling Data For Large Scale Cross-Document Coreference, Sameer Singh, Michael Wick, Andrew McCallum May 2010

Andrew McCallum

Cross-document coreference, the problem of resolving entity mentions across multi-document collections, is crucial to automated knowledge base construction and data mining tasks. However, the scarcity of large labeled data sets has hindered supervised machine learning research for this task. In this paper we develop and demonstrate an approach based on “distantly-labeling” a data set from which we can train a discriminative cross-document coreference model. In particular we build a dataset of more than a million people mentions extracted from 3.5 years of New York Times articles, leverage Wikipedia for distant labeling with a generative model (and measure the reliability of …


Scalable Probabilistic Databases With Factor Graphs And MCMC, Michael Wick, Andrew McCallum, Gerome Miklau May 2010

Andrew McCallum

Probabilistic databases play a crucial role in the management and understanding of uncertain data. However, incorporating probabilities into the semantics of incomplete databases has posed many challenges, forcing systems to sacrifice modeling power, scalability, or restrict the class of relational algebra formulas under which they are closed. We propose an alternative approach where the underlying relational database always represents a single world, and an external factor graph encodes a distribution over possible worlds; Markov chain Monte Carlo (MCMC) inference is then used to recover this uncertainty to a desired level of fidelity. Our approach allows the efficient evaluation of arbitrary …


High-Performance Semi-Supervised Learning Using Discriminatively Constrained Generative Models, Gregory Druck, Andrew McCallum Jan 2010

Andrew McCallum

We develop a semi-supervised learning algorithm that encourages generative models to discover latent structure that is relevant to a prediction task. The method constrains the posterior distribution of latent variables under a generative model to satisfy a rich set of feature expectation constraints from labeled data. We focus on the application of this method to sequence labeling and estimate parameters with a modified EM algorithm. The E-step involves estimating the parameters of a log-linear model with an HMM as the base distribution. This HMM-CRF can be used for test time prediction. The approach is related to other semi-supervised methods, but …


Collective Cross-Document Relation Extraction Without Labelled Data, Limin Yao, Sebastian Riedel, Andrew McCallum Jan 2010

Andrew McCallum

We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision to train a factor graph model for relation extraction based on an existing knowledge base (Freebase, derived in parts from Wikipedia). For inference we run an efficient Gibbs sampler that leads to linear time joint inference. We evaluate our approach both for an in-domain (Wikipedia) and a more realistic out-of-domain (New York Times Corpus) setting. For the in-domain setting, our joint model leads to 4% …
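
The distant-supervision labeling step that the abstract builds on can be sketched generically as follows: any sentence mentioning an entity pair found in the knowledge base is collected as weak, possibly noisy evidence for that pair's KB relation. The toy knowledge base and function names below are hypothetical, not the paper's pipeline.

```python
# Minimal, generic sketch of distant-supervision labeling; illustrative only.
KB = {("Barack Obama", "Hawaii"): "born_in"}   # toy stand-in for Freebase

def distant_label(sentences, entity_pairs_in):
    """sentences: list of raw strings; entity_pairs_in(s) -> iterable of
    (e1, e2) pairs detected in sentence s by some entity tagger."""
    examples = []
    for s in sentences:
        for pair in entity_pairs_in(s):
            if pair in KB:
                # weak label: assume (possibly incorrectly) that the sentence
                # expresses the KB relation for this pair
                examples.append((s, pair, KB[pair]))
    return examples
```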


Modeling Relations And Their Mentions Without Labeled Text, Sebastian Riedel, Limin Yao, Andrew McCallum Jan 2010

Andrew McCallum

Several recent works on relation extraction have been applying the distant supervision paradigm: instead of relying on annotated text to learn how to predict relations, they employ existing knowledge bases (KBs) as a source of supervision. Crucially, these approaches are trained based on the assumption that each sentence which mentions the two related entities is an expression of the given relation. Here we argue that this leads to noisy patterns that hurt precision, in particular if the knowledge base is not directly related to the text we are working with. We present a novel approach to distant supervision that can alleviate …


Constraint-Driven Rank-Based Learning For Information Extraction, Sameer Singh, Limin Yao, Sebastian Riedel, Andrew McCallum Jan 2010

Andrew McCallum

Most learning algorithms for factor graphs require complete inference over the dataset or an instance before making an update to the parameters. SampleRank is a rank-based learning framework that alleviates this problem by updating the parameters during inference. Most semi-supervised learning algorithms also rely on complete inference, i.e. calculating expectations or MAP configurations. We extend the SampleRank framework to semi-supervised learning, avoiding these inference bottlenecks. Different approaches for incorporating unlabeled data and prior knowledge into this framework are explored. We evaluated our method on a standard information extraction dataset. Our approach outperforms the supervised method significantly and matches …