Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Keyword
- Government (3)
- Technology (3)
- Digital Libraries (2)
- Active learning (1)
- Artificial Intelligence (1)
- Book Alignment (1)
- Centrality (1)
- Clustering (1)
- Content distribution network (1)
- Data mining (1)
- Database Applications (1)
- Database Management (1)
- DieHard (1)
- Document Quality (1)
- Document and Text processing (1)
- Electronic Publishing (1)
- Find-Similar (1)
- Graphical Models (1)
- Graphical models (1)
- Hash chain (1)
- Information Retrieval (1)
- Information Storage and Retrieval (1)
- Information extraction (1)
- Key distribution (1)
- Key regression (1)
- Key rotation (1)
- Knowledge discovery in graphs (1)
- Language Models (1)
- Lazy revocation (1)
- Manifolds (1)
Articles 1 - 30 of 40
Full-Text Articles in Physical Sciences and Mathematics
Enacting Technology In Networked Governance: Developmental Processes Of Cross-Agency Arrangements, Jane E. Fountain
National Center for Digital Government
This paper discusses the technology enactment framework, an analytical framework to guide exploration and examination of information-based change in governments. The original technology enactment framework is extended in this paper to delineate the distinctive roles played by key actors in technology enactment. I then examine institutional change in government by drawing from current initiatives in the U.S. federal government to build cross-agency relationships and systems. The U.S. government is one of the first central states to undertake not only back office integration within the government but also integration of systems and processes across agencies. For this reason its experience during …
Icts And Political Accountability: An Assessment Of The Impact Of Digitization In Government On Political Accountability In Connecticut, Massachusetts And New York State, Albert Meijer
National Center for Digital Government
This report presents a first analysis of the results of empirical research into the impact of digitization on political accountability in Connecticut, Massachusetts and New York State. The report focuses on presenting the empirical findings and these data still require further analysis.
Statistical Models And Analysis Techniques For Learning In Relational Data, Jennifer Neville
Computer Science Department Faculty Publication Series
Many data sets routinely captured by organizations are relational in nature— from marketing and sales transactions, to scientific observations and medical records. Relational data record characteristics of heterogeneous objects and persistent relationships among those objects (e.g., citation graphs, the World Wide Web, genomic structures). These data offer unique opportunities to improve model accuracy, and thereby decision-making, if machine learning techniques can effectively exploit the relational information. This work focuses on how to learn accurate statistical models of complex, relational data sets and develops two novel probabilistic models to represent, learn, and reason about statistical dependencies in these data. Relational dependency …
A Framework To Predict The Quality Of Answers With Nontextual, University Of Massachusetts Amherst
Computer Science Department Faculty Publication Series
New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we present a framework to use non-textual features to predict the quality of documents. We also show our quality measure can be successfully incorporated into the language modeling-based retrieval model. We test our approach on a collection of question and answer pairs gathered from a community-based question answering service where people ask and answer questions. Experimental results using our quality measure show a significant improvement over our baseline.
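The combination described above — a relevance score from a language model multiplied by a document quality prior — can be sketched as follows. This is a minimal illustration, not the paper's actual model: the logistic form of the quality score and all feature weights here are hypothetical.

```python
import math

def query_likelihood(query, doc, collection, mu=2000.0):
    """Dirichlet-smoothed log P(Q|D) over whitespace-tokenized text."""
    coll_len = sum(len(d.split()) for d in collection)
    def cf(t):
        return sum(d.split().count(t) for d in collection)
    words = doc.split()
    score = 0.0
    for t in query.split():
        p_c = max(cf(t), 1) / coll_len      # background language model
        score += math.log((words.count(t) + mu * p_c) / (len(words) + mu))
    return score

def quality_prior(clicks, answer_len, w=(0.5, 0.01), b=-1.0):
    """Hypothetical logistic quality score from two non-textual
    features; the weights are made up for illustration."""
    z = b + w[0] * math.log1p(clicks) + w[1] * answer_len
    return 1.0 / (1.0 + math.exp(-z))

def rank(query, docs, features):
    """Rank by log P(Q|D) + log quality(D): relevance and quality
    combine multiplicatively, as in a document-prior formulation."""
    scored = []
    for doc, feat in zip(docs, features):
        s = query_likelihood(query, doc, docs) + math.log(quality_prior(*feat))
        scored.append((s, doc))
    return [d for _, d in sorted(scored, reverse=True)]
```

With equal quality features, the better term match wins; with equal term statistics, the higher-quality answer wins.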
Spectral Methods Based On Prolate Spheroidal Wave Functions For Hyperbolic Pdes, Qian-Yong Chen, D. Gottlieb, J. S. Hesthaven
Qian-Yong Chen
We examine the merits of using prolate spheroidal wave functions (PSWFs) as basis functions when solving hyperbolic PDEs using pseudospectral methods. The relevant approximation theory is reviewed and some new approximation results in Sobolev spaces are established. An optimal choice of the band-limit parameter for PSWFs is derived for single-mode functions. Our conclusion is that one might gain from using the PSWFs over the traditional Chebyshev or Legendre methods in terms of accuracy and efficiency for marginally resolved broadband solutions.
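For reference, the PSWFs ψₙ are commonly defined (following Slepian) as eigenfunctions of the finite Fourier transform at band-limit c, and they equivalently satisfy a Sturm–Liouville equation — the property that makes them natural candidates for a spectral basis:

```latex
% Eigenfunctions of the finite Fourier transform with band-limit c:
\int_{-1}^{1} e^{\,i c x t}\, \psi_n(t)\, dt = \mu_n\, \psi_n(x)

% Equivalently, eigenfunctions of a Sturm--Liouville operator:
\frac{d}{dx}\!\left[(1-x^2)\,\frac{d\psi_n}{dx}\right]
  + \left(\chi_n - c^2 x^2\right)\psi_n = 0
```

The band-limit parameter c referred to in the abstract is the c appearing in both equations; at c = 0 the PSWFs reduce to the Legendre polynomials.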
Bibliometric Impact Measures Leveraging Topic Analysis, Gideon S. Mann
Computer Science Department Faculty Publication Series
Measurements of the impact and history of research literature provide a useful complement to scientific digital library collections. Bibliometric indicators have been extensively studied, mostly in the context of journals. However, journal-based metrics poorly capture topical distinctions in fast-moving fields, and are increasingly problematic with the rise of open-access publishing. Recent developments in latent topic models have produced promising results for automatic sub-field discovery. The fine-grained, faceted topics produced by such models provide a clearer view of the topical divisions of a body of research literature and the interactions between those divisions. We demonstrate the usefulness of topic models in …
A Hierarchical, HMM-Based Accuracy For A Digital Library Of Books, Shaolei Feng
Computer Science Department Faculty Publication Series
A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project and similar efforts from Yahoo and Microsoft. Content-based online book retrieval usually requires first converting printed text into machine-readable (e.g. ASCII) text using an optical character recognition (OCR) engine and then doing full-text search on the results. Many of these books are old and there are a variety of processing steps that are required to create an end-to-end system. Changing any step (including the scanning process) can affect OCR performance and hence a …
Challenges To Organizational Change: Multi-Level Integrated Information Structures (Miis), Jane E. Fountain
National Center for Digital Government
From introduction: Governments are extraordinary information creators, users, and disseminators. I-government focuses attention on the flow and structuring of information within government (Mayer-Schoenberger and Lazer, this volume). Government actors engage in knowledge work, specifically, in the creation, sharing, and communication of information. They design and redesign processes by which information flows according to legislative mandate, organizational practice and public need. Recently, they have sought to rethink information flows in order to leverage benefits from information and communication technologies. When public sector actors seek to change these information flows at any appreciable level of complexity, they inevitably engage in complex organizational …
A Continuous-Time Model Of Topic Co-Occurrence Trends, Wei Li, Xuerui Wang, Andrew Mccallum
Andrew McCallum
Recent work in statistical topic models has investigated richer structures to capture either temporal or inter-topic correlations. This paper introduces a topic model that combines the advantages of two recently proposed models: (1) The Pachinko Allocation model (PAM), which captures arbitrary topic correlations with a directed acyclic graph (DAG), and (2) the Topics over Time model (TOT), which captures time-localized shifts in topic prevalence with a continuous distribution over timestamps. Our model can thus capture not only temporal patterns in individual topics, but also the temporal patterns in their co-occurrences. We present results on a research paper corpus, showing interesting …
Pachinko Allocation: Dag-Structured Mixture Models Of Topic Correlations, Wei Li, Andrew Mccallum
Andrew McCallum
Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, LDA does not capture correlations between topics. In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). The leaves of the DAG represent individual words in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics). PAM provides a flexible alternative to recent work by Blei and Lafferty (2006), which captures …
Combining Generative And Discriminative Methods For Pixel Classification With Multi-Conditional Learning, B. Michael Kelm, Chris Pal, Andrew Mccallum
Andrew McCallum
It is possible to broadly characterize two approaches to probabilistic modeling in terms of generative and discriminative methods. Provided with sufficient training data the discriminative approach is expected to yield superior accuracy as compared to the analogous generative model since no modeling power is expended on the marginal distribution of the features. Conversely, if the model is accurate the generative approach can perform better with less data. In general it is less vulnerable to overfitting and allows one to more easily specify meaningful priors on the model parameters. We investigate multi-conditional learning--a method combining the merits of both approaches. Through …
Tractable Learning And Inference With High-Order Representations, Aron Culotta, Andrew Mccallum
Andrew McCallum
Representing high-order interactions in data often results in large models with an intractable number of hidden variables. In these models, inference and learning must operate without instantiating the entire set of variables. This paper presents a Metropolis-Hastings sampling approach to address this issue, and proposes new methods to discriminatively estimate the proposal and target distribution of the sampler using a ranking function over configurations. We demonstrate our approach on the task of paper and author deduplication, showing that our method enables complex, advantageous representations of the data while maintaining tractable learning and inference procedures.
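The core sampling idea — explore configurations without instantiating the full variable set — can be sketched with a generic Metropolis–Hastings loop. The energy function and bit-flip proposal below are toy stand-ins, not the paper's learned proposal and ranking functions:

```python
import math
import random

def mh_sample(energy, propose, init, steps=5000, seed=0):
    """Metropolis-Hastings over discrete configurations without ever
    enumerating the configuration space. `energy` scores a configuration
    (target density proportional to exp(-energy)); `propose` must be
    symmetric. Returns the lowest-energy configuration visited."""
    rng = random.Random(seed)
    x, e = init, energy(init)
    best, best_e = x, e
    for _ in range(steps):
        y = propose(x, rng)
        ey = energy(y)
        # standard MH acceptance rule for a symmetric proposal
        if ey <= e or rng.random() < math.exp(e - ey):
            x, e = y, ey
            if e < best_e:
                best, best_e = x, e
    return best

# Toy deduplication flavor: decide which of four candidate mention
# pairs corefer; `scores` are hypothetical model beliefs per pair.
scores = [0.9, 0.1, 0.8, 0.7]
energy = lambda cfg: sum((s - b) ** 2 for s, b in zip(scores, cfg))

def propose(cfg, rng):
    i = rng.randrange(len(cfg))          # flip one randomly chosen decision
    return cfg[:i] + (1 - cfg[i],) + cfg[i + 1:]

best = mh_sample(energy, propose, (0, 0, 0, 0))
```

The paper's contribution is, in effect, learning good `energy`/`propose` functions discriminatively rather than hand-coding them as above.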
Group And Topic Discovery From Relations And Their Attributes, Xuerui Wang, Natasha Mohanty, Andrew Mccallum
Andrew McCallum
We present a probabilistic generative model of entity relationships and their attributes that simultaneously discovers groups among the entities and topics among the corresponding textual attributes. Block-models of relationship data have been studied in social network analysis for some time. Here we simultaneously cluster in several modalities at once, incorporating the attributes (here, words) associated with certain relationships. Significantly, joint inference allows the discovery of topics to be guided by the emerging groups, and vice-versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, …
Learning Field Compatibilities To Extract Database Records From Unstructured Text, Michael Wick, Aron Culotta, Andrew Mccallum
Andrew McCallum
Named-entity recognition systems extract typed entities, such as people, organizations, and locations, from unstructured text. Rather than extract these fields in isolation, in this paper we present a record extraction system that clusters fields together into records (i.e. database tuples). We construct a probabilistic model of the compatibility of field values, then employ graph partitioning algorithms to partition fields into cohesive records. We also investigate compatibility functions over sets of fields, rather than simply pairs of fields, to examine how higher representational power can impact performance. We apply our techniques to the task of extracting contact records …
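A bare-bones version of the clustering step — grouping extracted fields into records by a compatibility score — might look like this. The greedy merge and the toy compatibility table are illustrative stand-ins for the paper's learned probabilistic model and graph-partitioning algorithms:

```python
def partition_fields(fields, compatible):
    """Greedily assign each field to the existing record it is most
    compatible with (average pairwise compatibility), or start a new
    record if every score is non-positive."""
    records = []
    for f in fields:
        best, best_score = None, 0.0
        for rec in records:
            score = sum(compatible(f, g) for g in rec) / len(rec)
            if score > best_score:
                best, best_score = rec, score
        if best is None:
            records.append([f])
        else:
            best.append(f)
    return records

# Hypothetical compatibility: +1 for field values known to co-occur
# in one contact record, -1 otherwise. A real system would learn this.
PAIRS = {frozenset({"Alice", "555-1234"}), frozenset({"Bob", "555-9999"})}
compat = lambda a, b: 1.0 if frozenset({a, b}) in PAIRS else -1.0
```

Averaging compatibility over a whole record, rather than testing a single pair, is a crude analogue of the set-valued compatibility functions the abstract mentions.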
Cc Prediction With Graphical Models, Chris Pal, Andrew Mccallum
Andrew McCallum
We address the problem of suggesting who to add as an additional recipient (i.e. cc, or carbon copy) for an email under composition. We address the problem using graphical models for words in the body and subject line of the email as well as the recipients given so far on the email. The problem of cc prediction is closely related to the problem of expert finding in an organization. We show that graphical models present a variety of solutions to these problems. We present results using naively structured models and introduce a powerful new modeling tool: plated factor graphs.
Multi-Conditional Learning For Joint Probability Models With Latent Variables, Chris Pal, Xuerui Wang, Michael Kelm, Andrew Mccallum
Andrew McCallum
We introduce Multi-Conditional Learning, a framework for optimizing graphical models based not on joint likelihood, or on conditional likelihood, but based on a product of several marginal conditional likelihoods each relying on common sets of parameters from an underlying joint model and predicting different subsets of variables conditioned on other subsets. When applied to undirected models with latent variables, such as the Harmonium, this approach can result in powerful, structured latent variable representations that combine some of the advantages of conditional random fields with the unsupervised clustering ability of popular topic models, such as latent Dirichlet allocation and its successors. …
On Discriminative And Semi-Supervised Dimensionality Reduction, Chris Pal, Michael Kelm, Xuerui Wang, Greg Druck, Andrew Mccallum
Andrew McCallum
We are interested in using the goal of making predictions to influence dimensionality reduction procedures. A number of new methods are emerging aimed at combining attributes of generative and discriminative approaches to data modeling. New approaches to semi-supervised learning have also been emerging. We present and apply some new methods to non-linear and richly structured problems, comparing and contrasting models designed for computer vision with those designed for text processing, and discuss essential properties that need to be preserved when reducing dimensionality.
Integrating Probabilistic Extraction Models And Data Mining To Discover Relations And Patterns In Text, Aron Culotta, Andrew Mccallum, Jonathon Betz
Andrew McCallum
In order for relation extraction systems to obtain human-level performance, they must be able to incorporate relational patterns inherent in the data (for example, that one's sister is likely one's mother's daughter, or that children are likely to attend the same college as their parents). Hand-coding such knowledge can be time-consuming and inadequate. Additionally, there may exist many interesting, unknown relational patterns that both improve extraction performance and provide insight into text. We describe a probabilistic extraction model that provides mutual benefits to both ``top-down'' relational pattern discovery and ``bottom-up'' relation extraction.
Joint Group And Topic Discovery From Relations And Text, Andrew Mccallum, Xuerui Wang, Natasha Mohanty
Andrew McCallum
We present a probabilistic generative model of entity relationships and textual attributes; the model simultaneously discovers groups among the entities and topics among the corresponding text. Block models of relationship data have been studied in social network analysis for some time, however here we cluster in multiple modalities at once. Significantly, joint inference allows the discovery of groups to be guided by the emerging topics, and vice-versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, and 43 years of similar data from the …
Multi-Conditional Learning: Generative/Discriminative Training For Clustering And Classification, Andrew Mccallum, Chris Pal, Greg Druck, Xuerui Wang
Andrew McCallum
This paper presents multi-conditional learning (MCL), a training criterion based on a product of multiple conditional likelihoods. When combining the traditional conditional probability of "label given input" with a generative probability of "input given label", the latter acts as a surprisingly effective regularizer. When applied to models with latent variables, MCL combines the structure-discovery capabilities of generative topic models, such as latent Dirichlet allocation and the exponential family harmonium, with the accuracy and robustness of discriminative classifiers, such as logistic regression and conditional random fields. We present results on several standard text data sets showing significant reductions in classification error …
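In its simplest two-view form, the multi-conditional criterion replaces a joint or single-conditional objective with a weighted product of conditional likelihoods. The following is a schematic of that criterion (the weights α and β are tuning parameters; this is not necessarily the exact form used in the paper):

```latex
\mathcal{O}(\theta) = \sum_{i} \Big[\,
    \alpha \,\log p_\theta(y_i \mid x_i)
  + \beta  \,\log p_\theta(x_i \mid y_i)
\,\Big]
```

The first term is the usual discriminative "label given input" likelihood; the second, generative "input given label" term shares the same parameters θ and acts as the regularizer the abstract describes.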
Topics Over Time: A Non-Markov Continuous-Time Model Of Topical Trends, Xuerui Wang, Andrew Mccallum
Andrew McCallum
This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time. We present results on nine months of personal email, …
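The generative story sketched above — each topic owning a continuous distribution over rescaled timestamps — can be written out directly. This toy sampler follows the paper's use of per-topic Beta distributions over timestamps, though the helper functions and parameter values are illustrative:

```python
import random

def dirichlet(alpha, rng):
    """Sample a probability vector from a Dirichlet distribution."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def categorical(p, rng):
    """Sample an index from a discrete distribution p."""
    r, acc = rng.random(), 0.0
    for i, pi in enumerate(p):
        acc += pi
        if r < acc:
            return i
    return len(p) - 1

def generate_tot_doc(alpha, phi, psi, length, rng):
    """Topics-over-Time generative sketch: every token draws a topic z,
    a word from phi[z], and a timestamp from Beta(psi[z]) -- tying topic
    prevalence to (rescaled) time without discretizing it."""
    theta = dirichlet(alpha, rng)        # per-document topic mixture
    words, stamps = [], []
    for _ in range(length):
        z = categorical(theta, rng)
        words.append(categorical(phi[z], rng))
        a, b = psi[z]
        stamps.append(rng.betavariate(a, b))
    return words, stamps
```

A topic with Beta parameters like (2, 8) concentrates early in the time span, while (8, 2) concentrates late — this is how time-localized topic shifts are expressed.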
Corrective Feedback And Persistent Learning For Information Extraction, Aron Culotta, Trausti Kristjansson, Andrew Mccallum, Paul Viola
Andrew McCallum
To successfully embed statistical machine learning models in real world applications, two post-deployment capabilities must be provided: (1) the ability to solicit user corrections and (2) the ability to update the model from these corrections. We refer to the former capability as corrective feedback and the latter as persistent learning. While these capabilities have a natural implementation for simple classification tasks such as spam filtering, we argue that a more careful design is required for structured classification tasks. One example of a structured classification task is information extraction, in which raw text is analyzed to automatically populate a database. In …
Practical Markov Logic Containing First-Order Quantifiers With Application To Identity Uncertainty, Aron Culotta, Andrew Mccallum
Andrew McCallum
Markov logic is a highly expressive language recently introduced to specify the connectivity of a Markov network using first-order logic. While Markov logic is capable of constructing arbitrary first-order formulae over the data, the complexity of these formulae is often limited in practice because of the size and connectivity of the resulting network. In this paper, we present approximate inference and estimation methods that incrementally instantiate portions of the network as needed to enable first-order existential and universal quantifiers in Markov logic networks. When applied to the problem of identity uncertainty, this approach results in a conditional probabilistic model that …
First-Order Probabilistic Models For Coreference Resolution, Aron Culotta, Michael Wick, Robert Hall, Andrew Mccallum
Andrew McCallum
Traditional noun phrase coreference resolution systems represent features only of pairs of noun phrases. In this paper, we propose a machine learning method that enables features over sets of noun phrases, resulting in a first-order probabilistic model for coreference. We outline a set of approximations that make this approach practical, and apply our method to the ACE coreference dataset, achieving an 11% error reduction over a comparable method that only considers features of pairs of noun phrases. This result demonstrates an example of how a powerful representation language can be incorporated into a probabilistic model and be scaled efficiently.
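The jump from pairwise to first-order features can be illustrated with a scoring function that examines a whole candidate cluster of mentions at once. The shared-token feature and greedy merging below are deliberately crude stand-ins for the learned model and the approximations developed in the paper:

```python
def shared_token(cluster):
    """First-order feature over a set of mentions: do ALL mentions in
    the cluster share at least one token? (hypothetical feature)"""
    common = set(cluster[0].lower().split())
    for m in cluster[1:]:
        common &= set(m.lower().split())
    return bool(common)

def score(cluster):
    """Score an entire cluster, not a pair: singletons are neutral."""
    if len(cluster) == 1:
        return 0.0
    return 1.0 if shared_token(cluster) else -1.0

def greedy_coref(mentions):
    """Greedy agglomerative clustering driven by cluster-level scores."""
    clusters = [[m] for m in mentions]
    improved = True
    while improved:
        improved = False
        best_gain, best_pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = clusters[i] + clusters[j]
                gain = score(merged) - score(clusters[i]) - score(clusters[j])
                if gain > best_gain:
                    best_gain, best_pair = gain, (i, j)
        if best_pair:
            i, j = best_pair
            clusters[i] += clusters.pop(j)
            improved = True
    return clusters
```

Because `score` sees the whole set, a merge that would be attractive pairwise can be rejected when the full cluster becomes incoherent — exactly the expressive power pairwise models lack.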
Bibliometric Impact Measures Leveraging Topic Analysis, Gideon S. Mann, David Mimno, Andrew Mccallum
Andrew McCallum
Measurements of the impact and history of research literature provide a useful complement to scientific digital library collections. Bibliometric indicators have been extensively studied, mostly in the context of journals. However, journal-based metrics poorly capture topical distinctions in fast-moving fields, and are increasingly problematic in the context of open-access publishing. Recent developments in latent topic models have produced promising results for automatic sub-field discovery. The fine-grained, faceted topics produced by such models provide a clearer view of the topical divisions of a body of research literature and the interactions between those divisions. We demonstrate the usefulness of topic models …
Flux: A Language For Programming High-Performance Servers, Brendan Burns, Kevin Grimaldi, Alexander Kostadinov, Emery D. Berger, Mark D. Corner
Computer Science Department Faculty Publication Series
Programming high-performance server applications is challenging: it is both complicated and error-prone to write the concurrent code required to deliver high performance and scalability. Server performance bottlenecks are difficult to identify and correct. Finally, it is difficult to predict server performance prior to deployment. This paper presents Flux, a language that dramatically simplifies the construction of scalable high-performance server applications. Flux lets programmers compose off-the-shelf, sequential C or C++ functions into concurrent servers. Flux programs are type-checked and guaranteed to be deadlock-free. We have built a number of servers in Flux, including a web server with PHP support, an …
Autonomous Shaping: Knowledge Transfer In Reinforcement Learning, George Konidaris
Computer Science Department Faculty Publication Series
We introduce the use of learned shaping rewards in reinforcement learning tasks, where an agent uses prior experience on a sequence of tasks to learn a portable predictor that estimates intermediate rewards, resulting in accelerated learning in later tasks that are related but distinct. Such agents can be trained on a sequence of relatively easy tasks in order to develop a more informative measure of reward that can be transferred to improve performance on more difficult tasks without requiring a hand coded shaping function. We use a rod positioning task to show that this significantly improves performance even after a …
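The role of a learned shaping signal is easiest to see in tabular Q-learning, where a potential Φ contributes a bonus γΦ(s′) − Φ(s) on top of the environment reward. In the paper the shaping predictor is learned from easier prior tasks; in the sketch below, the chain task, the parameters, and the hand-supplied potential are all toy stand-ins:

```python
import random

def q_learning(n_states, actions, step, potential, episodes=300,
               alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning with a potential-based shaping bonus
    gamma*Phi(s') - Phi(s) added to the environment reward."""
    rng = random.Random(seed)
    Q = [[0.0] * len(actions) for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s is not None:
            a = (rng.randrange(len(actions)) if rng.random() < eps
                 else max(range(len(actions)), key=lambda i: Q[s][i]))
            s2, r = step(s, actions[a])
            phi2 = potential(s2) if s2 is not None else 0.0
            r += gamma * phi2 - potential(s)      # shaping bonus
            target = r + (gamma * max(Q[s2]) if s2 is not None else 0.0)
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Toy chain task: states 0..4; moving past state 4 reaches the goal.
def step(s, a):
    s2 = max(0, s + a)
    if s2 >= 5:
        return None, 1.0          # goal reached, terminal
    return s2, 0.0

# Hand-supplied potential standing in for a learned shaping predictor:
Q = q_learning(5, [+1, -1], step, potential=lambda s: s / 5.0)
```

Because the bonus is potential-based, it accelerates learning without changing which policy is optimal.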
Hierarchical Power Management In Disruption Tolerant Networks With Traffic-Aware, Hyewon Jun
Computer Science Department Faculty Publication Series
Recent efforts in Disruption Tolerant Networks (DTNs) have shown that mobility can be a powerful means for delivering messages in highly-challenged environments. DTNs are wireless mobile networks that are particularly useful in sparse environments where the density of nodes is insufficient to support direct end-to-end communication. Unfortunately, many mobility scenarios depend on untethered devices with limited energy supplies. Without careful management, depleted energy supplies will degrade network connectivity and counteract the robustness gained by mobility. A primary concern is the energy consumed by wireless communication, and in particular the energy consumed in searching for other nodes to communicate with. In …
Using Structure Indices For Efficient Approximation Of Network Properties, Matthew J. Rattigan, Marc Maier, David Jensen
Computer Science Department Faculty Publication Series
Statistics on networks have become vital to the study of relational data drawn from areas including bibliometrics, fraud detection, bioinformatics, and the Internet. Calculating many of the most important measures—such as betweenness centrality, closeness centrality, and graph diameter—requires identifying short paths in these networks. However, finding these short paths can be intractable for even moderate-size networks. We introduce the concept of a network structure index (NSI), a composition of (1) a set of annotations on every node in the network and (2) a function that uses the annotations to estimate graph distance between pairs of nodes. We present several varieties …
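The annotate-then-estimate pattern can be sketched in landmark style: precompute each node's exact distance to a few reference nodes, then bound any pairwise distance from the annotations alone. The paper's NSIs use different annotation schemes; random landmarks and the triangle-inequality bound below are a simplified stand-in:

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Exact hop distances from `source` via breadth-first search."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def build_nsi(adj, n_landmarks=3, seed=0):
    """Annotation step: store every node's distance to a few
    randomly chosen landmark nodes."""
    rng = random.Random(seed)
    landmarks = rng.sample(list(adj), n_landmarks)
    return [bfs_distances(adj, l) for l in landmarks]

def estimate_distance(nsi, u, v):
    """Estimate d(u,v) <= min over landmarks l of d(u,l) + d(l,v),
    computed from annotations alone -- no search over the network."""
    return min(d[u] + d[v] for d in nsi)
```

After the one-time annotation cost, each distance estimate is O(number of landmarks), which is what makes approximating path-based measures like betweenness and closeness centrality tractable on large networks.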
Oasis: An Overlay-Aware, Harsha V. Madhyastha
Computer Science Department Faculty Publication Series
Overlays have enabled several new and popular distributed applications such as Akamai, Kazaa, and Bittorrent. However, the lack of an overlay-aware network stack has hindered the widespread use of general purpose overlay packet delivery services [16, 29, 26]. In this paper, we describe the design and implementation of Oasis, a system and toolkit that enables legacy operating systems to access overlay-based packet delivery services. Oasis combines a set of ideas – network address translation, name resolution, packet capture, dynamic code execution – to provide greater user choice. We are in the process of making the Oasis toolkit available for public …