Computer Sciences Commons

Articles 1 - 16 of 16

Full-Text Articles in Computer Sciences

A Conditional Model Of Deduplication For Multi-Type Relational Data, Aron Culotta, Andrew Mccallum Jan 2005

Record deduplication is the task of merging database records that refer to the same underlying entity. In relational databases, accurate deduplication for records of one type is often dependent on the merge decisions made for records of other types. Whereas nearly all previous approaches have merged records of different types independently, this work models these inter-dependencies explicitly to collectively deduplicate records of multiple types. We construct a conditional random field model of deduplication that captures these relational dependencies, and then employ a novel relational partitioning algorithm to jointly deduplicate records. We evaluate the system on two citation matching datasets, for …
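
The collective flavor of this work is easiest to see in the partitioning step: pairwise "same entity" scores must be turned into a clustering of records that is transitively consistent. The sketch below illustrates only that step with a greedy union-find pass; it is not the paper's conditional random field or its relational partitioning algorithm, and the `pair_scores` and threshold are hypothetical stand-ins for model output.

```python
# Minimal sketch: turn pairwise "same entity" scores into a transitive
# partition of records. Illustration of the partitioning step only; the
# paper uses a CRF with a dedicated relational partitioning algorithm.

def deduplicate(records, pair_scores, threshold=0.5):
    """records: list of ids; pair_scores: {(i, j): score in [0, 1]}."""
    parent = {r: r for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    # Merge the most confident pairs first so weak evidence cannot
    # override strong evidence through transitivity.
    for (i, j), score in sorted(pair_scores.items(), key=lambda kv: -kv[1]):
        if score >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for r in records:
        clusters.setdefault(find(r), []).append(r)
    return list(clusters.values())


if __name__ == "__main__":
    recs = ["a1", "a2", "a3", "b1"]
    scores = {("a1", "a2"): 0.9, ("a2", "a3"): 0.8, ("a1", "b1"): 0.1}
    print(deduplicate(recs, scores))   # [['a1', 'a2', 'a3'], ['b1']]
```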


Group And Topic Discovery From Relations And Text, Xuerui Wang, Natasha Mohanty, Andrew Mccallum Jan 2005

We present a probabilistic generative model of entity relationships and textual attributes that simultaneously discovers groups among the entities and topics among the corresponding text. Block-models of relationship data have been studied in social network analysis for some time. Here we simultaneously cluster in several modalities at once, incorporating the words associated with certain relationships. Significantly, joint inference allows the discovery of groups to be guided by the emerging topics, and vice-versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, and 43 years …
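
A toy generative sketch of the kind of coupling the abstract describes: each bill draws a topic, the topic generates the bill's words, and how an entity votes depends on the group it belongs to under that topic. This is an illustrative simplification, not the paper's model or its inference procedure; every distribution, size, and variable name below is made up.

```python
# Toy generative sketch (not the paper's model): topics generate a bill's
# words, and the vote an entity casts depends on its group *under that
# topic*, which is what couples the text and the relations.
import numpy as np

rng = np.random.default_rng(0)
n_topics, n_groups, n_entities, vocab = 3, 2, 10, 50

topic_word = rng.dirichlet(np.ones(vocab), size=n_topics)           # word dist per topic
entity_group = rng.integers(n_groups, size=(n_topics, n_entities))  # group per (topic, entity)
group_vote = rng.beta(1, 1, size=(n_topics, n_groups))              # P(vote "yes")

def generate_bill(n_words=20):
    topic = rng.integers(n_topics)
    words = rng.choice(vocab, size=n_words, p=topic_word[topic])
    votes = rng.random(n_entities) < group_vote[topic, entity_group[topic]]
    return topic, words, votes

topic, words, votes = generate_bill()
print(topic, words[:5], votes.astype(int))
```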


Multi-Way Distributional Clustering Via Pairwise Interactions, Ron Bekkerman, Ran El-Yaniv, Andrew Mccallum Jan 2005

We present a novel unsupervised learning scheme that simultaneously clusters variables of several types (e.g., documents, words and authors) based on pairwise interactions between the types, as observed in co-occurrence data. In this scheme, multiple clustering systems are generated aiming at maximizing an objective function that measures multiple pairwise mutual information between cluster variables. To implement this idea, we propose an algorithm that interleaves top-down clustering of some variables and bottom-up clustering of the other variables, with a local optimization correction routine. Focusing on document clustering we present an extensive empirical study of two-way, three-way and four-way applications of our …
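
The quantity being maximized, mutual information between cluster variables, can be computed directly from a co-occurrence matrix once each variable's elements are assigned to clusters. The helper below computes that quantity alone (the interleaved top-down/bottom-up optimization is not shown); the matrix and cluster assignments are hypothetical.

```python
# Mutual information between two clustered variables, estimated from a
# raw co-occurrence count matrix. Illustration of the objective only.
import numpy as np

def clustered_mutual_info(cooc, row_clusters, col_clusters):
    """I(X~; Y~) for clustered rows (e.g. documents) and columns (e.g. words)."""
    n_rc, n_cc = max(row_clusters) + 1, max(col_clusters) + 1
    joint = np.zeros((n_rc, n_cc))
    for i, ci in enumerate(row_clusters):
        for j, cj in enumerate(col_clusters):
            joint[ci, cj] += cooc[i, j]
    joint /= joint.sum()
    px, py = joint.sum(axis=1, keepdims=True), joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

# Hypothetical example: 4 documents x 4 words, clustered into 2 x 2 clusters.
cooc = np.array([[5, 4, 0, 1],
                 [6, 3, 1, 0],
                 [0, 1, 7, 5],
                 [1, 0, 4, 6]])
print(clustered_mutual_info(cooc, [0, 0, 1, 1], [0, 0, 1, 1]))
```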


A Note On Topical N-Grams, Xuerui Wang, Andrew Mccallum Jan 2005

Most popular topic models (such as Latent Dirichlet Allocation) rely on an underlying bag-of-words assumption. However, text is in fact a sequence of discrete word tokens, and without considering word order (in other words, the nearby context in which a word occurs), the meaning of language cannot be captured accurately by word co-occurrences alone. In this sense, collocations of words (phrases) have to be considered. However, like individual words, phrases can be polysemous depending on the context. More noticeably, a composition of two (or more) words is a phrase in some contexts, but …
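
To make the motivation concrete, the sketch below scores adjacent word pairs by pointwise mutual information (PMI), a simple way to surface collocations that a bag-of-words model would treat as independent tokens. It is only a motivating illustration, not the topical n-gram model itself; the example text is made up.

```python
# Score adjacent word pairs by PMI to find candidate collocations.
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c >= min_count:
            scores[(w1, w2)] = math.log((c / (n - 1)) /
                                        ((unigrams[w1] / n) * (unigrams[w2] / n)))
    return sorted(scores.items(), key=lambda kv: -kv[1])

text = ("the white house said the white house will respond "
        "while the house debated").split()
print(pmi_bigrams(text))   # recurring pairs such as "white house" score well above independence
```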


Automatic Categorization Of Email Into Folders: Benchmark Experiments On Enron And Sri Corpora, Ron Bekkerman, Andrew Mccallum, Gary Huang Jan 2005

Office workers everywhere are drowning in email---not only spam, but also large quantities of legitimate email to be read and organized for browsing. Although there have been extensive investigations of automatic document categorization, email gives rise to a number of unique challenges, and there has been relatively little study of classifying email into folders. This paper presents an extensive benchmark study of email foldering using two large corpora of real-world email messages and foldering schemes: one from former Enron employees, another from participants in an SRI research project. We discuss the challenges that arise from differences between email foldering and …


Composition Of Conditional Random Fields For Transfer Learning, Charles Sutton, Andrew Mccallum Jan 2005

Many learning tasks have subtasks for which much training data exists. Therefore, we want to transfer learning from the old, general-purpose subtask to a more specific new task, for which there is often less data. While work in transfer learning often considers how the old task should affect learning on the new task, in this paper we show that it helps to take into account how the new task affects the old. Specifically, we perform joint decoding of separately-trained sequence models, preserving uncertainty between the tasks and allowing information from the new task to affect predictions on the old task. On …
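
One simple way to picture joint decoding of two separately-trained chain models is a single Viterbi pass over the product of their label sets, adding both models' log-scores plus a per-position compatibility score that links the two label sets. The sketch below does exactly that; the `compat` coupling term and all score arrays are hypothetical stand-ins, not the paper's composed CRF.

```python
# Joint Viterbi over the product label space of two linear-chain models,
# with a per-position compatibility score coupling the two tasks.
import itertools
import numpy as np

def joint_viterbi(emit_a, trans_a, emit_b, trans_b, compat):
    """emit_*: (T, L*) log-scores; trans_*: (L*, L*); compat: (La, Lb) coupling."""
    T, La = emit_a.shape
    Lb = emit_b.shape[1]
    states = list(itertools.product(range(La), range(Lb)))

    def node(t, s):
        return emit_a[t, s[0]] + emit_b[t, s[1]] + compat[s[0], s[1]]

    delta = {s: node(0, s) for s in states}
    back = []
    for t in range(1, T):
        new_delta, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: delta[p]
                       + trans_a[p[0], s[0]] + trans_b[p[1], s[1]])
            new_delta[s] = (delta[prev] + trans_a[prev[0], s[0]]
                            + trans_b[prev[1], s[1]] + node(t, s))
            pointers[s] = prev
        delta = new_delta
        back.append(pointers)
    path = [max(states, key=lambda s: delta[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))        # one (label_a, label_b) pair per position

rng = np.random.default_rng(1)
T, La, Lb = 5, 3, 2
print(joint_viterbi(rng.normal(size=(T, La)), rng.normal(size=(La, La)),
                    rng.normal(size=(T, Lb)), rng.normal(size=(Lb, Lb)),
                    rng.normal(size=(La, Lb))))
```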


Feature Bagging: Preventing Weight Undertraining In Structured Discriminative Learning, Charles Sutton, Michael Sindelar, Andrew Mccallum Jan 2005

Discriminatively-trained probabilistic models are widely useful because of the latitude they afford in designing features. But training involves complex trade-offs among weights, which can be dangerous: a few highly-indicative features can swamp the contribution of many individually weaker features, causing their weights to be undertrained. Such a model is less robust, for the highly-indicative features may be noisy or missing in the test data. To ameliorate this weight undertraining, we propose a new training method, called feature bagging, in which separate models are trained on subsets of the original features, and combined using a mixture model or a product of …
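
The core recipe is simple enough to sketch with off-the-shelf classifiers: train one model per feature subset, then combine them so that no single strong feature can dominate training. Below is a minimal sketch using logistic regressions and a geometric-mean (product-of-experts) combination; the data and the two-way feature split are synthetic and hypothetical, and this is not the structured models used in the paper.

```python
# Feature bagging sketch: separate classifiers on feature subsets, combined
# as a product of experts (average of log-probabilities, renormalized).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.3 * X[:, 1:].sum(axis=1) + 0.5 * rng.normal(size=200) > 0).astype(int)

feature_subsets = [[0, 1, 2], [3, 4, 5]]          # hypothetical bags of features
experts = [LogisticRegression().fit(X[:, s], y) for s in feature_subsets]

def predict_proba(X_new):
    # Average the experts' log-probabilities, then renormalize.
    log_p = np.mean([np.log(e.predict_proba(X_new[:, s]))
                     for e, s in zip(experts, feature_subsets)], axis=0)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

print(predict_proba(X[:5]))
```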


Sparse Forward-Backward For Fast Training Of Conditional Random Fields, Charles Sutton, Chris Pal, Andrew Mccallum Jan 2005

Complex tasks in speech and language processing often include random variables with large state spaces, both in speech tasks that involve predicting words and phonemes, and in joint processing of pipelined systems, in which the state space can be the labeling of an entire sequence. In large state spaces, however, discriminative training can be expensive, because it often requires many calls to forward-backward. Beam search is a standard heuristic for controlling complexity during Viterbi decoding, but during forward-backward, standard beam heuristics can be dangerous, as they can make training unstable. We introduce sparse forward-backward, a variational perspective on beam methods …
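
A pruned forward pass gives the flavor of the problem: at every position only the highest-mass states are kept and renormalized, so the per-step cost depends on the beam size rather than the full state space. The sketch below is a plain beam heuristic for illustration, not the paper's variational sparse forward-backward; the emission and transition matrices are random placeholders.

```python
# Pruned ("sparse") forward pass: keep a small beam of states per step.
import numpy as np

def sparse_forward(emit, trans, beam=5):
    """emit: (T, S) state likelihoods; trans: (S, S) row-normalized transitions."""
    T, S = emit.shape
    alpha = emit[0] / emit[0].sum()                  # uniform prior assumed
    for t in range(1, T):
        keep = np.argsort(alpha)[-beam:]             # states kept in the beam
        w = alpha[keep] / alpha[keep].sum()          # renormalize the kept mass
        alpha = (w @ trans[keep]) * emit[t]          # O(beam * S) per step
        alpha /= alpha.sum()
    return alpha                                     # approximate filtering distribution

rng = np.random.default_rng(0)
T, S = 10, 1000
emit = rng.random((T, S))
trans = rng.random((S, S))
trans /= trans.sum(axis=1, keepdims=True)
print(sparse_forward(emit, trans, beam=20).argmax())
```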


Direct Maximization Of Rank-Based Metrics For Information Retrieval, Donald A. Metzler, W. Bruce Croft, Andrew Mccallum Jan 2005

Ranking is an essential component for a number of tasks, such as information retrieval and collaborative filtering. It is often the case that the underlying task attempts to maximize some evaluation metric, such as mean average precision, over rankings. Most past work on learning how to rank has focused on likelihood- or margin-based approaches. In this work we explore directly maximizing rank-based metrics, which are a family of metrics that only depend on the order of ranked items. This allows us to maximize different metrics for the same training data. We show how the parameter space of linear scoring functions …
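
Because rank-based metrics depend only on the order of the scored items, one can evaluate a candidate weight vector for a linear scoring function by ranking the training items and computing, say, average precision, then search weight space for the best value. The sketch below uses naive random search as a stand-in for the paper's optimization procedure; the features and relevance labels are synthetic.

```python
# Directly evaluating and searching for weights that maximize average precision.
import numpy as np

def average_precision(scores, relevant):
    if relevant.sum() == 0:
        return 0.0
    order = np.argsort(-scores)
    rel = relevant[order]
    hits = np.cumsum(rel)
    ranks_of_hits = np.where(rel == 1)[0] + 1
    return float((hits[rel == 1] / ranks_of_hits).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                        # feature vectors for ranked items
relevant = (X[:, 0] + 0.5 * X[:, 1] > 0.8).astype(int)

best_w, best_ap = None, -1.0
for _ in range(2000):                                # random search over linear weights
    w = rng.normal(size=4)
    ap = average_precision(X @ w, relevant)
    if ap > best_ap:
        best_w, best_ap = w, ap
print(best_ap, best_w)
```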


Reducing Labeling Effort For Structured Prediction Tasks, Aron Culotta, Andrew Mccallum Jan 2005

A common obstacle preventing the rapid deployment of supervised machine learning algorithms is the lack of labeled training data. This is particularly expensive to obtain for structured prediction tasks, where each training instance may have multiple, interacting labels, all of which must be correctly annotated for the instance to be of use to the learner. Traditional active learning addresses this problem by optimizing the order in which the examples are labeled to increase learning efficiency. However, this approach does not consider the difficulty of labeling each example, which can vary widely in structured prediction tasks. For example, the labeling predicted …
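
One simple way to account for labeling difficulty is to estimate, from the model's per-token confidences, how many tokens an annotator would have to correct if shown the model's predicted labeling, and to trade that effort off against the example's uncertainty when choosing what to label next. The sketch below is an illustrative heuristic of that idea, not the paper's exact selection criterion; the pool and marginals are made up.

```python
# Effort-aware selection sketch: prefer examples with high model uncertainty
# per unit of expected correction effort.
import numpy as np

def token_entropy(p):
    """Entropy of a per-token marginal distribution over labels."""
    p = np.asarray(p)
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())

def select_batch(pool, batch_size=2):
    """pool: list of (example_id, list of per-token marginal distributions)."""
    def utility(item):
        _, marginals = item
        uncertainty = sum(token_entropy(m) for m in marginals)
        expected_corrections = sum(1.0 - max(m) for m in marginals)
        return uncertainty / (expected_corrections + 1.0)
    return sorted(pool, key=utility, reverse=True)[:batch_size]

# Hypothetical pool: two short sequences with per-token label marginals.
pool = [
    ("ex1", [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45]]),
    ("ex2", [[0.99, 0.01], [0.98, 0.02], [0.5, 0.5]]),
]
print([ex_id for ex_id, _ in select_batch(pool, batch_size=1)])
```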


Joint Parsing And Semantic Role Labeling, Charles Sutton, Andrew Mccallum Jan 2005

A striking feature of human syntactic processing is that it is context-dependent, that is, it seems to take into account semantic information from the discourse context and world knowledge. In this paper, we attempt to use this insight to bridge the gap between SRL results from gold parses and from automatically-generated parses. To do this, we jointly perform parsing and semantic role labeling, using a probabilistic SRL system to rerank the results of a probabilistic parser. Our current results are negative, because a locally-trained SRL model can return inaccurate probability estimates.
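
The reranking setup itself is compact: take the parser's n-best list and choose the parse that maximizes a combination of the parser's log-probability and the SRL model's log-probability given that parse. The tiny sketch below shows only that combination step; the candidate scores and the weighting parameter `alpha` are hypothetical.

```python
# Rerank the parser's n-best list using the SRL model's score.

def rerank(candidates, alpha=1.0):
    """candidates: list of (parse, parser_logprob, srl_logprob)."""
    return max(candidates, key=lambda c: c[1] + alpha * c[2])[0]

candidates = [
    ("parse_A", -10.2, -7.5),   # parser's top parse, weaker SRL fit
    ("parse_B", -10.9, -5.1),   # slightly worse parse, better SRL fit
]
print(rerank(candidates))       # "parse_B" wins once SRL evidence is added
```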


Gene Prediction With Conditional Random Fields, Aron Culotta, David Kulp, Andrew Mccallum Jan 2005

Given a sequence of DNA nucleotide bases, the task of gene prediction is to find subsequences of bases that encode proteins. Reasonable performance on this task has been achieved using generatively trained sequence models, such as hidden Markov models. We propose instead the use of a discriminitively trained sequence model, the conditional random field (CRF). CRFs can naturally incorporate arbitrary, non-independent features of the input without making conditional independence assumptions among the features. This can be particularly important for gene finding, where including evidence from protein databases, EST data, or tiling arrays may improve accuracy. We evaluate our model on …


Disambiguating Web Appearances Of People In A Social Network, Ron Bekkerman, Andrew Mccallum Jan 2005

You are looking for information about a particular person. A search engine returns many pages for that person's name, but which pages are about the person you care about, and which are about other people who happen to have the same name? Furthermore, if we are looking for multiple people who are related in some way, how can we best leverage this social network? This paper presents two unsupervised frameworks for solving this problem: one based on the link structure of the web pages, another using the recently introduced Bootstrapped Information Bottleneck (BIB) clustering method. To evaluate our methods, we collected …
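
The link-structure intuition is that pages mentioning the same name and sharing many outgoing links probably refer to the same person. The sketch below clusters pages greedily by Jaccard overlap of their link sets; it is an illustrative simplification, not either of the paper's frameworks, and the URLs, link sets, and threshold are made up.

```python
# Cluster same-name web pages by overlap of their outgoing links.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_pages(pages, threshold=0.3):
    """pages: {url: set of outgoing links}. Greedy single-link clustering."""
    clusters = []
    for url, links in pages.items():
        for cluster in clusters:
            if any(jaccard(links, pages[other]) >= threshold for other in cluster):
                cluster.append(url)
                break
        else:
            clusters.append([url])
    return clusters

pages = {
    "site1/jsmith": {"umass.edu", "acl.org", "dblp.org"},
    "site2/j-smith": {"umass.edu", "dblp.org", "github.com"},
    "band-fans/jsmith": {"lastfm.com", "ticketmaster.com"},
}
print(cluster_pages(pages))   # the two researcher pages cluster together
```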


Reducing Weight Undertraining In Structured Discriminative Learning, Charles Sutton, Michael Sindelar, Andrew Mccallum Jan 2005

Discriminative probabilistic models are very popular in NLP because of the latitude they afford in designing features. But training involves complex trade-offs among weights, which can be dangerous: a few highly-indicative features can swamp the contribution of many individually weaker features, causing their weights to be undertrained. Such a model is less robust, for the highly-indicative features may be noisy or missing in the test data. To ameliorate this weight undertraining, we introduce several new feature bagging methods, in which separate models are trained on subsets of the original features, and combined using a mixture model or a product of …


Fast, Piecewise Training For Discriminative Finite-State And Parsing Models, Charles Sutton, Andrew Mccallum Jan 2005

Discriminative models for sequences and trees---such as linear-chain conditional random fields (CRFs) and max-margin parsing---have shown great promise because they combine the ability to incorporate arbitrary input features and the benefits of principled global inference over their structured outputs. However, since parameter estimation in these models involves repeatedly performing this global inference, training can be very slow. We present piecewise training, a new training method that combines the speed of local training with the accuracy of global training by incorporating a limited amount of global information derived from previous errors of the model. On named-entity and part-of-speech data, …


Piecewise Training For Undirected Models, Charles Sutton, Andrew Mccallum Jan 2005

For many large undirected models that arise in real-world applications, exact maximum-likelihood training is intractable, because it requires computing marginal distributions of the model. Conditional training is even more difficult, because the partition function depends not only on the parameters, but also on the observed input, requiring repeated inference over each training example. An appealing idea for such models is to independently train a local undirected classifier over each clique, afterwards combining the learned weights into a single global model. In this paper, we show that this piecewise method can be justified as minimizing a new family of upper bounds …
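
The "train each piece locally, combine globally" idea can be sketched on a linear chain: train the node factor as an ordinary local classifier, estimate the edge factor from label-bigram counts, each in isolation, and only add their log-scores together at decoding time. This is an illustrative sketch with synthetic data, not the paper's bound-based derivation; the sticky label process and feature construction below are made up.

```python
# Piecewise-style sketch: independently trained node and edge pieces,
# combined in a single global Viterbi decode.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
L, T, D = 3, 400, 5
labels = [rng.integers(L)]
for _ in range(T - 1):                               # sticky synthetic label sequence
    labels.append(labels[-1] if rng.random() < 0.7 else rng.integers(L))
labels = np.array(labels)
X = rng.normal(size=(T, D)) + labels[:, None]        # features informative of the label

# Piece 1: node factor = local classifier trained ignoring the chain.
node = LogisticRegression(max_iter=1000).fit(X, labels)

# Piece 2: edge factor = smoothed log label-bigram frequencies.
bigram = np.ones((L, L))
for a, b in zip(labels[:-1], labels[1:]):
    bigram[a, b] += 1
edge = np.log(bigram / bigram.sum(axis=1, keepdims=True))

def viterbi(X_new):
    """Combine the independently trained pieces in one global decode."""
    emit = np.log(node.predict_proba(X_new))
    delta, back = emit[0].copy(), []
    for t in range(1, len(X_new)):
        scores = delta[:, None] + edge + emit[t]
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0)
    path = [int(delta.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return list(reversed(path))

print(viterbi(X[:10]), labels[:10].tolist())
```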