Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 11 of 11

Full-Text Articles in Entire DC Network

Chunk-Based EBMT, Jae Dong Kim, Ralf D. Brown, Jaime G. Carbonell May 2013

Jaime G. Carbonell

Corpus-driven machine translation approaches such as Phrase-Based Statistical Machine Translation and Example-Based Machine Translation have been successful by using word alignment to find translation fragments for matched source parts in a bilingual training corpus. However, they still cannot properly handle the systematic translation of inserted or deleted words between two distant languages. In this work, we use syntactic chunks as translation units to alleviate this problem and improve alignments, and we show improvements in BLEU on Korean-to-English and Chinese-to-English translation tasks.


Cost Complexity Of Proactive Learning Via A Reduction To Realizable Active Learning, Liu Yang, Jaime G. Carbonell May 2013

Jaime G. Carbonell

Proactive Learning is a generalized form of active learning with multiple oracles exhibiting different reliabilities (label noise) and costs. We propose a general approach for Proactive Learning that explicitly addresses the cost vs. reliability tradeoff for oracle and instance selection. We formulate the problem in the PAC learning framework with bounded noise, and transform it into realizable active learning via a reduction technique, while keeping the overall query cost small. We propose two types of sequential hypothesis tests (denoted as SeqHT) that estimate the label of a given query from the noisy replies of different oracles with varying reliabilities and …
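
The idea of inferring one label from several noisy oracle replies can be illustrated with a classical Wald-style sequential probability ratio test. This is only a minimal sketch, not the SeqHT procedures proposed in the paper: it assumes binary labels in {+1, -1}, that each oracle's accuracy is known in advance and lies strictly between 0.5 and 1, and the function name and `error_rate` parameter are hypothetical.

```python
import math


def sequential_label_test(replies, error_rate=0.05):
    """Estimate a binary label from noisy oracle replies, stopping early.

    replies: iterable of (label, accuracy) pairs, with label in {+1, -1}
             and accuracy in (0.5, 1) assumed known for each oracle.
    Returns (estimated_label, number_of_replies_used); the label is 0 if
    the evidence is exactly balanced when the replies run out.
    """
    # Wald's approximate stopping threshold for symmetric error rates.
    threshold = math.log((1.0 - error_rate) / error_rate)
    log_ratio = 0.0  # log-likelihood ratio of label +1 versus label -1
    used = 0
    for label, accuracy in replies:
        # A reliable oracle moves the evidence further than an unreliable one.
        log_ratio += label * math.log(accuracy / (1.0 - accuracy))
        used += 1
        if log_ratio >= threshold:
            return 1, used
        if log_ratio <= -threshold:
            return -1, used
    # Replies exhausted before reaching a threshold: fall back to the sign.
    if log_ratio > 0:
        return 1, used
    if log_ratio < 0:
        return -1, used
    return 0, used
```

In this sketch a single reply from a highly reliable oracle can outweigh several replies from weak ones, which is one simple way to picture the cost vs. reliability tradeoff the abstract describes.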


Analysis Of Uncertain Data: Evaluation Of Given Hypotheses, Anatole Gershman, Eugene Fink, Bin Fu, Jaime G. Carbonell May 2013

Jaime G. Carbonell

We consider the problem of heuristic evaluation of given hypotheses based on limited observations, in situations when available data are insufficient for rigorous statistical analysis.


Active Learning And Crowd-Sourcing For Machine Translation, Vamshi Ambati, Stephan Vogel, Jaime G. Carbonell May 2013

Jaime G. Carbonell

In recent years, corpus-based approaches to machine translation have become predominant, with Statistical Machine Translation (SMT) being the most actively progressing area. The success of these approaches depends on the availability of parallel corpora. In this paper we propose Active Crowd Translation (ACT), a new paradigm where active learning and crowd-sourcing come together to enable automatic translation for low-resource language pairs. Active learning aims at reducing the cost of label acquisition by prioritizing the most informative data for annotation, while crowd-sourcing reduces cost by using the power of the crowds to make up for the lack of expensive language experts. We …


A Probabilistic Framework To Learn From Multiple Annotators With Time-Varying Accuracy, Pinar Donmez, Jaime G. Carbonell, Jeff Schneider May 2013

Jaime G. Carbonell

This paper addresses the challenging problem of learning from multiple annotators whose labeling accuracy (reliability) differs and varies over time. We propose a framework based on Sequential Bayesian Estimation to learn the expected accuracy at each time step while simultaneously deciding which annotators to query for a label in an incremental learning framework. We develop a variant of the particle filtering method that estimates the expected accuracy at every time step by sets of weighted samples and performs sequential Bayes updates. The estimated expected accuracies are then used to decide which annotators to query at the next time step. …
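
A generic bootstrap particle filter gives a rough picture of tracking one annotator's accuracy with weighted samples and sequential Bayes updates. This is a sketch under simplifying assumptions, not the variant developed in the paper: it assumes the accuracy drifts as a clamped Gaussian random walk and that each step yields a binary observation `agreed` (whether the annotator's label matched some reference label). All function and parameter names are hypothetical.

```python
import random


def particle_filter_step(particles, weights, agreed, rng, drift_sd=0.05):
    """One predict/update/resample step for a time-varying accuracy.

    particles: candidate accuracy values in [0, 1]
    weights:   normalized weights, one per particle
    agreed:    True if the annotator's label matched the reference label
    rng:       a random.Random instance (keeps runs reproducible)
    """
    # Predict: let each candidate accuracy drift, clamped to [0, 1].
    moved = [min(1.0, max(0.0, p + rng.gauss(0.0, drift_sd))) for p in particles]
    # Update: Bayes rule with a Bernoulli likelihood of the observation.
    new_weights = [w * (p if agreed else 1.0 - p) for p, w in zip(moved, weights)]
    total = sum(new_weights)
    n = len(moved)
    if total == 0.0:
        # Degenerate case: every particle contradicts the observation.
        new_weights = [1.0 / n] * n
    else:
        new_weights = [w / total for w in new_weights]
    # Resample in proportion to weight, then reset to uniform weights.
    resampled = rng.choices(moved, weights=new_weights, k=n)
    return resampled, [1.0 / n] * n


def expected_accuracy(particles, weights):
    """Weighted mean of the particles: the current accuracy estimate."""
    return sum(p * w for p, w in zip(particles, weights))
```

Running one such filter per annotator and comparing the `expected_accuracy` values is one simple way to decide whom to query next; the paper's actual selection rule is not reproduced here.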


Alternative Paths In HIV-1 Targeted Human Signal Transduction Pathways, Sivaraman Balakrishnan, Oznur Tastan, Jaime Carbonell, Judith Klein-Seetharaman May 2013

Jaime G. Carbonell

Background:

Human immunodeficiency virus-1 (HIV-1) has a minimal genome of only 9 genes, which encode 15 proteins. HIV-1 thus depends on the human host for virtually every aspect of its life cycle. The universal language of communication in biological systems, including between pathogen and host, is via signal transduction pathways. The fundamental units of these pathways are protein-protein interactions. Understanding the functional significance of HIV-1, human interactions requires viewing them in the context of human signal transduction pathways.

Results:

Integration of HIV-1, human interactions with known signal transduction pathways indicates that the majority of known human pathways have the …


Analysis Of Uncertain Data: Smoothing Of Histograms, Eugene Fink, Ankur Sarin, Jaime G. Carbonell May 2013

Jaime G. Carbonell

We consider the problem of converting a set of numeric data points into a smoothed approximation of the underlying probability distribution. We describe a representation of distributions by histograms with variable-width bars, and give a greedy smoothing algorithm based on this representation.
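
A variable-width histogram and a greedy smoothing pass can be sketched as follows. The abstract does not state the authors' merge criterion, so this example assumes one for illustration: start from equal-width bins and repeatedly merge the two adjacent bars whose heights (densities) differ least. The `Bar` class, function names, and parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    left: float
    right: float
    mass: float  # probability mass falling inside [left, right)

    @property
    def height(self) -> float:
        # Density of the bar: mass spread evenly over its width.
        return self.mass / (self.right - self.left)


def initial_bars(points, n_bins):
    """Build an equal-width histogram; assumes max(points) > min(points)."""
    lo, hi = min(points), max(points)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in points:
        # Clamp so that the maximum point falls into the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(points)
    return [
        Bar(lo + i * width, lo + (i + 1) * width, c / total)
        for i, c in enumerate(counts)
    ]


def greedy_smooth(bars, target_bars):
    """Merge the most similar adjacent bars until target_bars remain."""
    bars = list(bars)
    while len(bars) > target_bars:
        # Assumed criterion: smallest absolute difference in height.
        i = min(
            range(len(bars) - 1),
            key=lambda j: abs(bars[j].height - bars[j + 1].height),
        )
        merged = Bar(bars[i].left, bars[i + 1].right, bars[i].mass + bars[i + 1].mass)
        bars[i:i + 2] = [merged]
    return bars
```

Each merge preserves total probability mass and produces a wider bar, so regions where the data are evenly spread end up covered by a few wide bars while regions with sharp changes keep narrow ones.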


Active Learning For Membrane Protein Structure Prediction, Hatice U. Osmanbeyoglu, Jessica A. Wehner, Jaime G. Carbonell, Madhavi K. Ganapathiraju May 2013

Jaime G. Carbonell

Background: About 30% of genes code for membrane proteins, which are involved in a wide variety of crucial biological functions. Despite their importance, experimentally determined structures correspond to only about 1.7% of protein structures deposited in the Protein Data Bank due to the difficulty in crystallizing membrane proteins. Algorithms that can identify proteins whose high-resolution structure can aid in predicting the structure of many previously unresolved proteins are therefore of potentially high value. Active machine learning is a supervised machine learning approach which is suitable for this domain where there are a large number of sequences but only very few …


Temporal Collaborative Filtering With Bayesian Probabilistic Tensor Factorization, Liang Xiong, Xi Chen, Tzu-Kuo Huang, Jeff Schneider, Jaime G. Carbonell May 2013

Jaime G. Carbonell

Real-world relational data are seldom stationary, yet traditional collaborative filtering algorithms generally rely on this assumption. Motivated by our sales prediction problem, we propose a factor-based algorithm that is able to take time into account. By introducing additional factors for time, we formalize this problem as a tensor factorization with a special constraint on the time dimension. Further, we provide a fully Bayesian treatment to avoid tuning parameters and achieve automatic model complexity control. To learn the model we develop an efficient sampling procedure that is capable of analyzing large-scale data sets. This new algorithm, called Bayesian Probabilistic Tensor Factorization …


Pairwise Document Classification For Relevance Feedback, Jonathan L. Elsas, Pinar Donmez, Jamie Callan, Jaime G. Carbonell May 2013

Jaime G. Carbonell

In this paper we present Carnegie Mellon University's submission to the TREC 2009 Relevance Feedback Track. In this submission we take a classification approach on document pairs to using relevance feedback information. We explore using textual and non-textual document-pair features to classify unjudged documents as relevant or non-relevant, and use this prediction to re-rank a baseline document retrieval. These features include co-citation measures, URL similarities, as well as features often used in machine learning systems for document ranking such as the difference in scores assigned by the baseline retrieval system.


Graph-Structured Multi-Task Regression And An Efficient Optimization Method For General Fused Lasso Manuscript, Xi Chen, Seyoung Kim, Qihang Lin, Jaime G. Carbonell, Eric P. Xing May 2013

Jaime G. Carbonell

We consider the problem of learning a structured multi-task regression, where the output consists of multiple responses that are related by a graph and the correlated response variables are dependent on the common inputs in a sparse but synergistic manner. Previous methods such as l1/l2-regularized multi-task regression assume that all of the output variables are equally related to the inputs, although in many real-world problems, outputs are related in a complex manner. In this paper, we propose graph-guided fused lasso (GFlasso) for structured multi-task regression that exploits the graph structure over the output variables. We introduce a novel penalty …