Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network


Articles 1 - 30 of 125

Full-Text Articles in Entire DC Network

Integrating Planning And Learning: The Prodigy Architecture, Manuela M. Veloso, Jaime G. Carbonell, Alicia Pérez, Daniel Borrajo, Eugene Fink, Jim Blythe May 2013


Jaime G. Carbonell

Planning is a complex reasoning task that is well suited for the study of improving performance and knowledge by learning, i.e. by accumulation and interpretation of planning experience. PRODIGY is an architecture that integrates planning with multiple learning mechanisms. Learning occurs at the planner’s decision points and integration in PRODIGY is achieved via mutually interpretable knowledge structures. This article describes the PRODIGY planner, briefly reports on several learning modules developed earlier in the project, and presents in more detail two recently explored methods to learn to generate plans of better quality. We introduce the techniques, illustrate them with comprehensive examples, …


Discourse Pragmatics In Task-Oriented Natural Language Interfaces, Jaime G. Carbonell May 2013


Jaime G. Carbonell

This paper reviews discourse phenomena that occur frequently in task-oriented man-machine dialogs, reporting on an empirical study that demonstrates the necessity of handling ellipsis, anaphora, extragrammaticality, inter-sentential metalanguage, and other abbreviatory devices in order to achieve convivial user interaction. Invariably, users prefer to generate terse or fragmentary utterances instead of longer, more complete "stand-alone" expressions, even when given clear instructions to the contrary. The XCALIBUR expert system interface is designed to meet these needs, including generalized ellipsis resolution by means of a rule-based caseframe method superior to previous semantic grammar approaches.


Summarizing Text Documents: Sentence Selection And Evaluation Metrics, Jade Goldstein, Mark Kantrowitz, Vibhu Mittal, Jaime G. Carbonell May 2013


Jaime G. Carbonell

Human-quality text summarization systems are difficult to design, and even more difficult to evaluate, in part because documents can differ along several dimensions, such as length, writing style and lexical usage. Nevertheless, certain cues can often help suggest the selection of sentences for inclusion in a summary. This paper presents our analysis of news-article summaries generated by sentence selection. Sentences are ranked for potential inclusion in the summary using a weighted combination of statistical and linguistic features. The statistical features were adapted from standard IR methods. The potential linguistic ones were derived from an analysis of news-wire summaries. To evaluate …


Topic Detection And Tracking Pilot Study Final Report, James Allan, Jaime G. Carbonell, George Doddington, Jonathan Yamron, Yiming Yang May 2013


Jaime G. Carbonell

Topic Detection and Tracking (TDT) is a DARPA-sponsored initiative to investigate the state of the art in finding and following new events in a stream of broadcast news stories. The TDT problem consists of three major tasks: (1) segmenting a stream of data, especially recognized speech, into distinct stories; (2) identifying those news stories that are the first to discuss a new event occurring in the news; and (3) given a small number of sample news stories about an event, finding all following stories in the stream. The TDT Pilot Study ran from September 1996 through October 1997. The primary …


The Translation Correction Tool: English-Spanish User Studies, Ariadna Font-Llitjos, Jaime G. Carbonell May 2013


Jaime G. Carbonell

Machine translation systems should improve with feedback from post-editors, but none do beyond the very localized benefit of adding the corrected translation to parallel training data (for statistical and example-based MT systems) or to a memory database. Rule-based systems to date improve only via manual debugging. In contrast, we introduce a largely automated method for capturing more information from the human post-editor, so that corrections may be performed automatically to translation grammar rules and lexical entries. This paper focuses on the information capture phase and reports on an experiment with English-Spanish translation.


Chunk-Based EBMT, Jae Dong Kim, Ralf D. Brown, Jaime G. Carbonell May 2013


Jaime G. Carbonell

Corpus-driven machine translation approaches such as Phrase-Based Statistical Machine Translation and Example-Based Machine Translation have been successful by using word alignment to find translation fragments for matched source parts in a bilingual training corpus. However, they still cannot properly handle the systematic insertion or deletion of words when translating between two distant languages. In this work, we use syntactic chunks as translation units to alleviate this problem and improve alignments, and we show improvement in BLEU for Korean-to-English and Chinese-to-English translation tasks.


Prediction Of Interactions Between HIV-1 And Human Proteins By Information Integration, Osnur Tastan, Yanjun Qi, Jaime G. Carbonell, Judith Klein-Seetharaman May 2013


Jaime G. Carbonell

Human immunodeficiency virus-1 (HIV-1) in acquired immune deficiency syndrome (AIDS) relies on human host cell proteins in virtually every aspect of its life cycle. Knowledge of the set of interacting human and viral proteins would greatly contribute to our understanding of the mechanisms of infection and subsequently to the design of new therapeutic approaches. This work is the first attempt to predict the global set of interactions between HIV-1 and human host cellular proteins. We propose a supervised learning framework, where multiple information data sources are utilized, including cooccurrence of functional motifs and their interaction domains and protein classes, gene …


The Universal Parser Architecture For Knowledge-Based Machine Translation, Masaru Tomita, Jaime G. Carbonell May 2013


Jaime G. Carbonell

Machine translation should be semantically accurate, linguistically principled, user-interactive, and extensible to multiple languages and domains. This paper presents the universal parser architecture that strives to meet these objectives. In essence, linguistic knowledge bases (syntactic, semantic, lexical, pragmatic), encoded in theoretically-motivated formalisms such as lexical-functional grammars, are unified and precompiled into fast run-time grammars for parsing and generation. Thus, the universal parser provides principled run-time integration of syntax and semantics, while preserving the generality of domain-independent syntactic grammars, and language-independent domain knowledge bases; the optimized cross product is generated automatically in the precompilation phase. Initial results for bi-directional English-Japanese translation show considerable …


Cost Complexity Of Proactive Learning Via A Reduction To Realizable Active Learning, Liu Yang, Jaime G. Carbonell May 2013


Jaime G. Carbonell

Proactive Learning is a generalized form of active learning with multiple oracles exhibiting different reliabilities (label noise) and costs. We propose a general approach for Proactive Learning that explicitly addresses the cost vs. reliability tradeoff for oracle and instance selection. We formulate the problem in the PAC learning framework with bounded noise, and transform it into realizable active learning via a reduction technique, while keeping the overall query cost small. We propose two types of sequential hypothesis tests (denoted as SeqHT) that estimate the label of a given query from the noisy replies of different oracles with varying reliabilities and …


Scheduling With Uncertain Resources: Representation And Utility Function, Ulas Bardak, Eugene Fink, Jaime G. Carbonell May 2013


Jaime G. Carbonell

We describe the representation of uncertain knowledge in a conference-scheduling system, which may include incomplete information about available resources, conference events, and scheduling constraints. We then explain the use of this incomplete knowledge in the evaluation of schedule quality.


Analysis Of Uncertain Data: Evaluation Of Given Hypotheses, Anatole Gershman, Eugene Fink, Bin Fu, Jaime G. Carbonell May 2013


Jaime G. Carbonell

We consider the problem of heuristic evaluation of given hypotheses based on limited observations, in situations when available data are insufficient for rigorous statistical analysis.


Exploring Massive Structured Data In Argus, Jaime G. Carbonell, Eugene Fink, Chun Jin, Cenk Gazen May 2013


Jaime G. Carbonell

Project Argus is focused on helping an analyst explore massive, structured data. This exploration includes exact and partial match queries, monitoring hypotheses and discovery of new patterns in both static and streaming data. We provide these facilities within the context of a workbench interface, called Data Explorer. We support exploration of data that is a collection of records, each of which is structured as several distinct fields. For instance, financial transfers are typically represented as structured records, with such fields as sending bank, sending account number, currency, amount, date, receiving account, etc. Most fields are well-defined, like a date, a …


Active Learning And Crowd-Sourcing For Machine Translation, Vamshi Ambati, Stephan Vogel, Jaime G. Carbonell May 2013


Jaime G. Carbonell

In recent years, corpus-based approaches to machine translation have become predominant, with Statistical Machine Translation (SMT) being the most actively progressing area. Success of these approaches depends on the availability of parallel corpora. In this paper we propose Active Crowd Translation (ACT), a new paradigm where active learning and crowd-sourcing come together to enable automatic translation for low-resource language pairs. Active learning aims at reducing the cost of label acquisition by prioritizing the most informative data for annotation, while crowd-sourcing reduces cost by using the power of the crowds to make up for the lack of expensive language experts. We …


Metaphor And Common-Sense Reasoning, Jaime G. (Jaime Guillermo) Carbonell, Steven Minton May 2013


Jaime G. Carbonell

No abstract provided.


Cluster-Based Selection Of Statistical Answering Strategies, Lucian Vlad Lita, Jaime G. Carbonell May 2013


Jaime G. Carbonell

Question answering (QA) is a highly complex task that brings together classification, clustering, retrieval, and extraction. Question answering systems include various statistical and rule-based components that combine and form multiple strategies for finding answers. However, in real-life scenarios efficiency constraints make it infeasible to simultaneously use all available strategies in a QA system. To address this issue, we present an approach for carefully selecting answering strategies that are likely to benefit individual questions, without significantly reducing performance. We evaluate the impact of strategy selection on question answering performance at several important QA stages: document retrieval, answer extraction, and answer merging. …


Machine Learning: A Historical And Methodological Analysis, Jaime G. Carbonell, Ryszard S. Michalski, Tom M. Mitchell May 2013


Jaime G. Carbonell

Machine learning has always been an integral part of artificial intelligence, and its methodology has evolved in concert with the major concerns of the field. In response to the difficulties of encoding ever-increasing volumes of knowledge in modern AI systems, many researchers have recently turned their attention to machine learning as a means to overcome the knowledge acquisition bottleneck. This article presents a taxonomic analysis of machine learning organized primarily by learning strategies and secondarily by knowledge representation and application areas. A historical survey outlining the development of various approaches to machine learning is presented from early neural networks to …


Active Sampling For Rank Learning Via Optimizing The Area Under The ROC Curve, Pinar Donmez, Jaime G. Carbonell May 2013


Jaime G. Carbonell

Learning ranking functions is crucial for solving many problems, ranging from document retrieval to building recommendation systems based on an individual user’s preferences or on collaborative filtering. Learning-to-rank is particularly necessary for adaptive or personalizable tasks, including email prioritization, individualized recommendation systems, personalized news clipping services and so on. Whereas the learning-to-rank challenge has been addressed in the literature, little work has been done in an active-learning framework, where requisite user feedback is minimized by selecting only the most informative instances to train the rank learner. This paper addresses active rank-learning head on, proposing a new sampling strategy based on …


A Probabilistic Framework To Learn From Multiple Annotators With Time-Varying Accuracy, Pinar Donmez, Jaime G. Carbonell, Jeff Schneider May 2013


Jaime G. Carbonell

This paper addresses the challenging problem of learning from multiple annotators whose labeling accuracy (reliability) differs and varies over time. We propose a framework based on Sequential Bayesian Estimation to learn the expected accuracy at each time step while simultaneously deciding which annotators to query for a label in an incremental learning framework. We develop a variant of the particle filtering method that estimates the expected accuracy at every time step by sets of weighted samples and performs sequential Bayes updates. The estimated expected accuracies are then used to decide which annotators to query at the next time step. …


Towards A General Scientific Reasoning Engine, Jaime G. (Jaime Guillermo) Carbonell, Jill H. Larkin, F Reif May 2013


Jaime G. Carbonell

No abstract provided.


Developing Language Resources For A Transnational Digital Government System, Violetta Cavalli-Sforza, Jaime G. Carbonell, Peter J. Jansen May 2013


Jaime G. Carbonell

We describe ongoing efforts towards developing language resources for a transnational digital government project aimed at applying information technology (IT) to a problem of international concern: detecting and monitoring activities related to the transnational movement of illicit drugs. The project seeks to support information sharing, coordination and collaboration among government agencies within a country and across national boundaries by combining a variety of technologies including a distributed query processor with form-based and conversational user interfaces, a language translation system, an event server for event filtering and notification, and an event-trigger-rule server. The prototype system is being developed by U.S. universities …


Learning By Analogy: Formulating And Generalizing Plans From Past Experience, Jaime G. (Jaime Guillermo) Carbonell May 2013


Jaime G. Carbonell

No abstract provided.


Experiments With A Hindi-To-English Transfer-Based MT System Under A Miserly Data Scenario, Alon Lavie, Stephan Vogel, Lori Levin, Erik Peterson, Katharina Probst, Ariadna Font-Llitjos, Rachel Reynolds, Jaime G. Carbonell, Richard Cohen May 2013


Jaime G. Carbonell

We describe an experiment designed to evaluate the capabilities of our trainable transfer-based (Xfer) machine translation approach, as applied to the task of Hindi-to-English translation, and trained under an extremely limited data scenario. We compare the performance of the Xfer approach with two corpus-based approaches---Statistical MT (SMT) and Example-based MT (EBMT)---under the limited data scenario. The results indicate that the Xfer system significantly outperforms both EBMT and SMT in this scenario. Results also indicate that automatically learned transfer rules are effective in improving translation performance, compared with a baseline word-to-word translation version of the system. Xfer system performance with a …


A Pairwise Ensemble Approach For Accurate Genre Classification, Yan Liu, Jaime G. Carbonell, Rong Jin May 2013


Jaime G. Carbonell

Text classification, whether by topic or genre, is an important task that contributes to text extraction, retrieval, summarization and question answering. In this paper we present a new pairwise ensemble approach, which uses pairwise Support Vector Machine (SVM) classifiers as base classifiers and an “input-dependent latent variable” method for model combination. This new approach better captures the characteristics of genre classification, including its heterogeneous nature. Our experiments on two multi-genre collections and one topic-based classification dataset show that the pairwise ensemble method outperforms both boosting, which has been demonstrated as a powerful ensemble approach, and Error-Correcting Output Codes (ECOC), which applies …


Suppressing Outliers In Pairwise Preference Ranking, Vitor S. Cavalho, Jonathan L. Elsas, William W. Cohen, Jaime G. Carbonell May 2013


Jaime G. Carbonell

Many of the recently proposed algorithms for learning feature-based ranking functions are based on the pairwise preference framework, in which instead of taking documents in isolation, document pairs are used as instances in the learning process [3, 5]. One disadvantage of this process is that a noisy relevance judgment on a single document can lead to a large number of mislabeled document pairs. This can jeopardize robustness and deteriorate overall ranking performance. In this paper we study the effects of outlying pairs in rank learning with pairwise preferences and introduce a new meta-learning algorithm capable of suppressing these undesirable effects. …
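
The amplification effect described in this abstract can be illustrated with a small sketch. This is not the authors' algorithm; the label lists, the `preference_pairs` helper, and the binary relevance scale are all hypothetical, chosen only to show how one noisy judgment propagates into several changed document pairs.

```python
from itertools import combinations

def preference_pairs(labels):
    """Return ordered pairs (i, j) meaning document i is preferred to document j.

    Documents with equal labels produce no pair.
    """
    pairs = set()
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] > labels[j]:
            pairs.add((i, j))
        elif labels[j] > labels[i]:
            pairs.add((j, i))
    return pairs

# Hypothetical relevance judgments for six documents (1 = relevant, 0 = not).
clean = [1, 0, 0, 0, 0, 0]
noisy = [1, 1, 0, 0, 0, 0]  # document 1 is mislabeled as relevant

clean_pairs = preference_pairs(clean)
noisy_pairs = preference_pairs(noisy)

spurious = noisy_pairs - clean_pairs  # pairs that exist only because of the noise
lost = clean_pairs - noisy_pairs      # correct pairs that the noise removed

print(len(spurious), len(lost))
```

In this toy setting a single wrong label changes five training pairs (four spurious, one lost), and the number of affected pairs grows with the number of documents judged for the query.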


Symmetric Probabilistic Alignment For Example-Based Translation, Jae Dong Kim, Ralf D. Brown, Peter J. Jansen, Jaime G. Carbonell May 2013


Jaime G. Carbonell

Since subsentential alignment is critically important to the translation quality of an Example-Based Machine Translation (EBMT) system which operates by finding and combining phrase-level matches against the training examples, we recently decided to develop a new alignment algorithm for the purpose of improving the EBMT system’s performance. Unlike most algorithms in the literature, this new Symmetric Probabilistic Alignment (SPA) algorithm treats the source and target languages in a symmetric fashion. In this paper, we describe our basic algorithm and some extensions for using context and positional information, compare its alignment accuracy with IBM Model 4, and report on experiments in …


Optimizing Estimated Loss Reduction For Active Sampling In Rank Learning, Pinar Donmez, Jaime G. Carbonell May 2013


Jaime G. Carbonell

Learning to rank is becoming an increasingly popular research area in machine learning. The ranking problem aims to induce an ordering or preference relations among a set of instances in the input space. However, collecting labeled data is growing into a burden in many rank applications since labeling requires eliciting the relative ordering over the set of alternatives. In this paper, we propose a novel active learning framework for SVM-based and boosting-based rank learning. Our approach suggests sampling based on maximizing the estimated loss differential over unlabeled data. Experimental results on two benchmark corpora show that the proposed model substantially …


A Prototype System For Transnational Information Sharing And Process Coordination, S Su, T R. Kasad, M Patil, A Matsunaga, M Tsugawa, V Cavalli-Sforza, Jaime G. Carbonell, Peter J. Jansen, W Ward, R Cole, D Towsley, W. Chen, A I. Anton, Q He, C. Mcsweeney, L De Brens, J Ventura, P Taveras, R Connolly, C Ortega, B Piñeres, O Brooks, M Herrera May 2013


Jaime G. Carbonell

Global problems such as disease detection and control, terrorism, immigration and border control, illicit drug trafficking, etc. require information sharing, coordination and collaboration among government agencies within a country and across national boundaries. This paper presents a prototype of a transnational information system which aims at achieving information sharing, process coordination and enforcement of policies, constraints, regulations, and security and privacy rules by integrating a distributed query processor with form-based and conversational user interfaces, a language translation system, an event server for event filtering and notification, and an event-trigger-rule server. The Web-services infrastructure is used to achieve the interoperation of …


Rare And Frequent N-Grams In Whole-Genome Protein Sequences, Madhavi Ganapathiraju, Judith Klein-Seetharaman, Roni Rosenfeld, Jaime G. Carbonell, Raj Reddy May 2013


Jaime G. Carbonell

The precise relationship between a primary protein sequence, its three-dimensional structure and its function in a complex cellular environment is one of the most fundamental unanswered questions in biology. Unprecedented amounts of genomic and proteomic data create an opportunity for attacking the sequence-structure-function mapping problem with data-driven methods. The mapping of biological sequences to form and function of proteins is conceptually similar to the mapping of words to meaning. This analogy is being studied by a growing body of research ([1] and pointers thereof). Thus, n-gram analysis (statistical analysis of co-occurrence of words in a text) has found applications to …
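
As a rough illustration of n-gram analysis applied to sequences rather than text, the sketch below counts overlapping n-grams in a string of one-letter amino-acid codes. It assumes each residue plays the role of a "word"; the `ngram_counts` helper and the toy sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter

def ngram_counts(sequence, n):
    """Count every overlapping n-gram (substring of length n) in the sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))

# Hypothetical toy sequence of one-letter amino-acid codes, not real data.
toy_sequence = "MKVLAAGMKVLA"
counts = ngram_counts(toy_sequence, 3)

# Frequent n-grams come first; rare ones are those with the lowest counts.
for ngram, count in counts.most_common():
    print(ngram, count)
```

Comparing such counts against what a background model would predict is one way rare and frequent n-grams could be identified, though the abstract as shown does not specify the method used.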


Final Report On The Automated Classification And Retrieval Project: Medsort-1, Jaime G. (Jaime Guillermo) Carbonell, David A. (David Andreoff) Evans, Dana S. Scott, Richmond H. Thomason May 2013


Jaime G. Carbonell

No abstract provided.


Chinese Sentence Generation In A Knowledge-Based Machine Translation System, Tangqiu Li, Eric H. Nyberg, Jaime G. Carbonell May 2013


Jaime G. Carbonell

This paper presents a technique for generating Chinese sentences from the Interlingua expressions used in the KANT knowledge-based machine translation system. Chinese sentences are generated directly from the semantic representation using a unification-based generation formalization which takes advantage of certain linguistic features of Chinese. Direct generation from the semantic form eliminates the need for an intermediate syntactic structure, thus simplifying the generation procedure. The generation algorithm is top-down, data-driven and recursive. The descriptive nature of the pseudo-unification grammar formalism used in KANT allows the grammar developer to write very straightforward semantic grammar rules. We also discuss some of the crucial …