Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


University of Massachusetts Amherst

Andrew McCallum

Artificial intelligence



Full-Text Articles in Physical Sciences and Mathematics

Learning From Labeled Features Using Generalized Expectation Criteria, Gregory Druck, Gideon Mann, Andrew McCallum, Jan 2008


Andrew McCallum

It is difficult to apply machine learning to new domains because we often lack labeled problem instances. In this paper, we provide a solution to this problem that leverages domain knowledge in the form of affinities between input features and classes. For example, in a baseball vs. hockey text classification problem, even without any labeled data, we know that the presence of the word "puck" is a strong indicator of hockey. We refer to this type of domain knowledge as a labeled feature. In this paper, we propose a method for training discriminative probabilistic models with labeled features and unlabeled …
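The abstract is cut off before it describes the training objective, so the following is not the authors' method. It is a minimal sketch of one way a labeled feature such as "puck" could be scored against a model using only unlabeled text: each labeled feature is turned into a target class distribution, and the model's average prediction over documents containing that word is compared to the target with a KL divergence. The feature list, the 0.9 probability mass, the function names, and the toy models are all illustrative assumptions.

```python
import math

# Hypothetical labeled features: word -> class it is assumed to indicate.
LABELED_FEATURES = {"puck": "hockey", "pitcher": "baseball"}
CLASSES = ["baseball", "hockey"]


def reference_distribution(label, classes=CLASSES, majority=0.9):
    """Turn one labeled feature into a target class distribution.

    Putting 0.9 of the mass on the indicated class is an assumption.
    """
    rest = (1.0 - majority) / (len(classes) - 1)
    return {c: (majority if c == label else rest) for c in classes}


def expected_prediction(word, docs, predict_proba, classes=CLASSES):
    """Average the model's class probabilities over docs containing `word`."""
    hits = [d for d in docs if word in d.split()]
    if not hits:
        return None
    totals = {c: 0.0 for c in classes}
    for d in hits:
        probs = predict_proba(d)
        for c in classes:
            totals[c] += probs[c]
    return {c: totals[c] / len(hits) for c in classes}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two distributions stored as dicts over the same keys."""
    return sum(p[c] * math.log((p[c] + eps) / (q[c] + eps)) for c in p)


def labeled_feature_penalty(docs, predict_proba):
    """Sum of KL terms, one per labeled feature seen in the unlabeled docs."""
    total = 0.0
    for word, label in LABELED_FEATURES.items():
        q = expected_prediction(word, docs, predict_proba)
        if q is not None:
            total += kl_divergence(reference_distribution(label), q)
    return total


# Unlabeled toy documents and two toy models to compare.
docs = ["the puck hit the post", "the pitcher threw a strike", "great game last night"]


def uniform_model(doc):
    return {"baseball": 0.5, "hockey": 0.5}


def keyword_model(doc):
    words = doc.split()
    if "puck" in words:
        return {"baseball": 0.1, "hockey": 0.9}
    if "pitcher" in words:
        return {"baseball": 0.9, "hockey": 0.1}
    return {"baseball": 0.5, "hockey": 0.5}


uniform_penalty = labeled_feature_penalty(docs, uniform_model)
keyword_penalty = labeled_feature_penalty(docs, keyword_model)
```

In this toy setup the model that ignores the labeled features receives a larger penalty than the one whose predictions agree with them, so minimizing such a penalty would push a model toward the stated domain knowledge without any labeled documents.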


Rapid Development Of Hindi Named Entity Recognition Using Conditional Random Fields And Feature Induction, Wei Li, Andrew McCallum, Jan 2003


Andrew McCallum

This paper describes our application of Conditional Random Fields (CRFs) with feature induction to a Hindi named entity recognition task. With only five days' development time and little knowledge of this language, we automatically discover relevant features by providing a large array of lexical tests and using feature induction to automatically construct the features that most increase conditional likelihood. In an effort to reduce overfitting, we use a combination of a Gaussian prior and early stopping based on the results of 10-fold cross-validation.
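The combination of a Gaussian prior and early stopping chosen by 10-fold cross-validation can be illustrated on a much simpler model. The sketch below swaps the CRF for a binary logistic regression trained by gradient ascent, so it is not the authors' system. The synthetic data, the prior variance `sigma2`, the learning rate, and the iteration cap are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(w, data):
    """Average conditional log-likelihood of the labels given the inputs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        p = min(max(p, 1e-12), 1 - 1e-12)
        total += math.log(p) if y == 1 else math.log(1 - p)
    return total / len(data)


def gradient_step(w, data, sigma2, lr):
    """One ascent step on log-likelihood plus the log of a Gaussian prior."""
    n = len(data)
    grad = [0.0] * len(w)
    for x, y in data:
        err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j, xj in enumerate(x):
            grad[j] += err * xj
    # A zero-mean Gaussian prior with variance sigma2 adds -w / sigma2
    # to the gradient, which shrinks the weights toward zero.
    return [wi + lr * (g - wi / sigma2) / n for wi, g in zip(w, grad)]


def choose_stopping_point(data, k=10, max_iters=50, sigma2=10.0, lr=0.5):
    """Pick the iteration count with the best mean held-out log-likelihood."""
    folds = [data[i::k] for i in range(k)]
    scores = [0.0] * max_iters
    for i in range(k):
        held_out = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        w = [0.0] * len(data[0][0])
        for t in range(max_iters):
            w = gradient_step(w, train, sigma2, lr)
            scores[t] += log_likelihood(w, held_out) / k
    best = max(range(max_iters), key=lambda t: scores[t])
    return best + 1, scores


# Synthetic data: a bias term plus two features, only the first informative.
rng = random.Random(0)
data = []
for _ in range(200):
    x = [1.0, rng.gauss(0, 1), rng.gauss(0, 1)]
    y = 1 if x[1] + 0.5 * rng.gauss(0, 1) > 0 else 0
    data.append((x, y))

best_iters, cv_scores = choose_stopping_point(data)
```

After cross-validation selects `best_iters`, a final model would be trained on all of the data for that many iterations. The prior and the stopping point both limit how far the weights can move from zero, which is why they are often used together.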