Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 9 of 9

Full-Text Articles in Physical Sciences and Mathematics

Learning With Aggregate Data, Tao Sun Mar 2019

Doctoral Dissertations

Many real-world applications work directly with aggregate data. In this work, we study learning with aggregate data from several perspectives and aim to address its combinatorial challenges. First, we study the problem of learning in Collective Graphical Models (CGMs), where only noisy aggregate observations are available. Inference in CGMs is NP-hard, and we propose an approximate inference algorithm. Solving the inference problem enables us to build large-scale bird migration models and models of human mobility under the differential privacy setting. Second, we consider problems given bags of instances and bag-level aggregate supervision. Specifically, we study …


Comparison Mining From Text, Maksim Tkachenko Dec 2018

Dissertations and Theses Collection (Open Access)

Online product reviews are important factors in consumers' purchase decisions. They reach into more and more spheres of our lives: we have reviews of books, electronics, groceries, entertainment, restaurants, travel experiences, etc. More than 90 percent of consumers read online reviews before purchasing products, as reported by various consumer surveys. This observation suggests that product review information enhances the consumer experience and helps consumers make better-informed purchase decisions. There is an enormous number of online reviews posted on e-commerce platforms such as Amazon, Apple, Yelp, and TripAdvisor. They vary in the information they provide and may reflect different experiences and preferences.

If …


Towards Deeper Understanding In Neuroimaging, Rex Devon Hjelm Nov 2016

Computer Science ETDs

Neuroimaging is a growing domain of research, and advances in machine learning have tremendous potential to expand understanding in neuroscience and improve public health. Deep neural networks have recently and rapidly achieved historic success in numerous domains and, as a consequence, have completely redefined the landscape of automated learners, giving promise of significant advances in numerous domains of research. Despite recent advances and advantages over traditional machine learning methods, deep neural networks have yet to permeate significantly into neuroscience studies, particularly as a tool for discovery. This dissertation presents well-established and novel tools for unsupervised learning which aid in …


Profiling Social Media Users With Selective Self-Disclosure Behavior, Wei Gong Aug 2016

Dissertations and Theses Collection

Social media has become a popular platform for millions of users to share activities and thoughts. Many applications now tap into social media to disseminate information (e.g., news), to promote products (e.g., advertisements), to manage customer relationships (e.g., customer feedback), and to source investment (e.g., crowdfunding). Many of these applications require user profile knowledge to select target social media users or to personalize messages to users. Social media user profiling is the task of constructing user profiles, such as demographic labels, interests, and opinions, using social media data. Among the social media user profiling research works, …


Epistemological Databases For Probabilistic Knowledge Base Construction, Michael Louis Wick Mar 2015

Doctoral Dissertations

Knowledge bases (KBs) facilitate real-world decision making by providing access to structured relational information that enables pattern discovery and semantic queries. Although a large amount of data is available for populating a KB, the data must first be gathered and assembled. Traditionally, this integration is performed automatically by storing the output of an information extraction pipeline directly into a database as if this prediction were the "truth." However, the resulting KB is often not reliable because (a) errors accumulate in the integration pipeline, and (b) they persist in the KB even after new information arrives that could rectify …


Learning With Joint Inference And Latent Linguistic Structure In Graphical Models, Jason Narad Mar 2015

Doctoral Dissertations

Constructing end-to-end NLP systems requires the processing of many types of linguistic information prior to solving the desired end task. A common approach to this problem is to construct a pipeline, one component for each task, with each system's output becoming input for the next. This approach poses two problems. First, errors propagate, and, much like the childhood game of "telephone", combining systems in this manner can lead to unintelligible outcomes. Second, each component task requires annotated training data to act as supervision for training the model. These annotations are often expensive and time-consuming to produce, may differ from each …


Polyhedral Problems In Combinatorial Convex Geometry, Liam Solus Jan 2015

Theses and Dissertations--Mathematics

In this dissertation, we exhibit two instances of polyhedra in combinatorial convex geometry. The first instance arises in the context of Ehrhart theory, and the polyhedra are the central objects of study. The second instance arises in algebraic statistics, and the polyhedra act as a conduit through which we study a nonpolyhedral problem.

In the first case, we examine combinatorial and algebraic properties of the Ehrhart h*-polynomial of the r-stable (n,k)-hypersimplices. These are a family of polytopes which form a nested chain of subpolytopes within the (n,k)-hypersimplex. We show that a well-studied unimodular triangulation of the (n,k)-hypersimplex restricts to a …


Causal Discovery For Relational Domains: Representation, Reasoning, And Learning, Marc Maier Nov 2014

Doctoral Dissertations

Many domains are currently experiencing a growing trend of recording and analyzing massive observational data sets of increasing complexity. A commonly made claim is that these data sets hold the potential to transform their corresponding domains by providing previously unknown or unexpected explanations and enabling informed decision-making. However, only knowledge of the underlying causal generative process, as opposed to knowledge of associational patterns, can support such tasks. Most methods for traditional causal discovery (the development of algorithms that learn causal structure from observational data) are restricted to representations that require limiting assumptions on the form of the data. Causal discovery has almost exclusively …


Scaling MCMC Inference And Belief Propagation To Large, Dense Graphical Models, Sameer Singh Aug 2014

Doctoral Dissertations

With the physical constraints of semiconductor-based electronics becoming increasingly limiting over the past decade, single-core CPUs have given way to multi-core and distributed computing platforms. At the same time, access to large data collections is becoming commonplace due to the falling cost of storage and bandwidth. Traditional machine learning paradigms that were designed to operate sequentially on single-processor architectures seem destined to become obsolete in this world of multi-core, multi-node systems and massive data sets. Inference for graphical models is one such example, for which most existing algorithms are sequential in nature and are difficult to scale …