Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 30 of 35

Full-Text Articles in Physical Sciences and Mathematics

Automated Identification And Mapping Of Interesting Mineral Spectra In Crism Images, Arun M. Saranathan Mar 2024

Automated Identification And Mapping Of Interesting Mineral Spectra In Crism Images, Arun M. Saranathan

Doctoral Dissertations

The Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) has proven to be an invaluable tool for the mineralogical analysis of the Martian surface. It has been crucial in identifying and mapping the spatial extents of various minerals. Primarily, the identification and mapping of these mineral spectral-shapes have been performed manually. Given the size of the CRISM image dataset, manual analysis of the full dataset would be arduous/infeasible. This dissertation attempts to address this issue by describing an (machine learning based) automated processing pipeline for CRISM data that can be used to identify and map the unique mineral signatures present in …


Data To Science With Ai And Human-In-The-Loop, Gustavo Perez Sarabia Mar 2024

Data To Science With Ai And Human-In-The-Loop, Gustavo Perez Sarabia

Doctoral Dissertations

AI has the potential to accelerate scientific discovery by enabling scientists to analyze vast datasets more efficiently than traditional methods. For example, this thesis considers the detection of star clusters in high-resolution images of galaxies taken from space telescopes, as well as studying bird migration from RADAR images. In these applications, the goal is to make measurements to answer scientific questions, such as how the star formation rate is affected by mass, or how the phenology of bird migration is influenced by climate change. However, current computer vision systems are far from perfect for conducting these measurements directly. They may …


Towards Robust Long-Form Text Generation Systems, Kalpesh Krishna Nov 2023

Towards Robust Long-Form Text Generation Systems, Kalpesh Krishna

Doctoral Dissertations

Text generation is an important emerging AI technology that has seen significant research advances in recent years. Due to its closeness to how humans communicate, mastering text generation technology can unlock several important applications such as intelligent chat-bots, creative writing assistance, or newer applications like task-agnostic few-shot learning. Most recently, the rapid scaling of large language models (LLMs) has resulted in systems like ChatGPT, capable of generating fluent, coherent and human-like text. However, despite their remarkable capabilities, LLMs still suffer from several limitations, particularly when generating long-form text. In particular, (1) long-form generated text is filled with factual inconsistencies to …


Quantifying And Enhancing The Security Of Federated Learning, Virat Vishnu Shejwalkar Nov 2023

Quantifying And Enhancing The Security Of Federated Learning, Virat Vishnu Shejwalkar

Doctoral Dissertations

Federated learning is an emerging distributed learning paradigm that allows multiple users to collaboratively train a joint machine learning model without having to share their private data with any third party. Due to many of its attractive properties, federated learning has received significant attention from academia as well as industry and now powers major applications, e.g., Google's Gboard and Assistant, Apple's Siri, Owkin's health diagnostics, etc. However, federated learning is yet to see widespread adoption due to a number of challenges. One such challenge is its susceptibility to poisoning by malicious users who aim to manipulate the joint machine learning …


Models And Machine Learning Techniques For Improving The Planning And Operation Of Electricity Systems In Developing Regions, Santiago Correa Cardona Jun 2022

Models And Machine Learning Techniques For Improving The Planning And Operation Of Electricity Systems In Developing Regions, Santiago Correa Cardona

Doctoral Dissertations

The enormous innovation in computational intelligence has disrupted the traditional ways we solve the main problems of our society and allowed us to make more data-informed decisions. Energy systems and the ways we deliver electricity are not exceptions to this trend: cheap and pervasive sensing systems and new communication technologies have enabled the collection of large amounts of data that are being used to monitor and predict in real-time the behavior of this infrastructure. Bringing intelligence to the power grid creates many opportunities to integrate new renewable energy sources more efficiently, facilitate grid planning and expansion, improve reliability, optimize electricity …


Incremental Non-Greedy Clustering At Scale, Nicholas Monath Mar 2022

Incremental Non-Greedy Clustering At Scale, Nicholas Monath

Doctoral Dissertations

Clustering is the task of organizing data into meaningful groups. Modern clustering applications such as entity resolution put several demands on clustering algorithms: (1) scalability to massive numbers of points as well as clusters, (2) incremental additions of data, (3) support for any user-specified similarity functions. Hierarchical clusterings are often desired as they represent multiple alternative flat clusterings (e.g., at different granularity levels). These tree-structured clusterings provide for both fine-grained clusters as well as uncertainty in the presence of newly arriving data. Previous work on hierarchical clustering does not fully address all three of the aforementioned desiderata. Work on incremental …


High-Dimensional Feature Selection And Multi-Level Causal Mediation Analysis With Applications To Human Aging And Cluster-Based Intervention Studies, Hachem Saddiki Oct 2021

High-Dimensional Feature Selection And Multi-Level Causal Mediation Analysis With Applications To Human Aging And Cluster-Based Intervention Studies, Hachem Saddiki

Doctoral Dissertations

Many questions in public health and medicine are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. As a result, causal inference frameworks and methodologies have gained interest as a promising tool to reliably answer scientific questions. However, the tasks of identifying and efficiently estimating causal effects from observed data still pose significant challenges under complex data generating scenarios. We focus on (1) high-dimensional settings where the number of variables is orders of magnitude higher than the number of observations; and (2) multi-level settings, where study participants …


3d Shape Understanding And Generation, Matheus Gadelha Oct 2021

3d Shape Understanding And Generation, Matheus Gadelha

Doctoral Dissertations

In recent years, Machine Learning techniques have revolutionized solutions to longstanding image-based problems, like image classification, generation, semantic segmentation, object detection and many others. However, if we want to be able to build agents that can successfully interact with the real world, those techniques need to be capable of reasoning about the world as it truly is: a tridimensional space. There are two main challenges while handling 3D information in machine learning models. First, it is not clear what is the best 3D representation. For images, convolutional neural networks (CNNs) operating on raster images yield the best results in virtually …


Improving Evaluation Methods For Causal Modeling, Amanda Gentzel Jun 2021

Improving Evaluation Methods For Causal Modeling, Amanda Gentzel

Doctoral Dissertations

Causal modeling is central to many areas of artificial intelligence, including complex reasoning, planning, knowledge-base construction, robotics, explanation, and fairness. Active communities of researchers in machine learning, statistics, social science, and other fields develop and enhance algorithms that learn causal models from data, and this work has produced a series of impressive technical advances. However, evaluation techniques for causal modeling algorithms have remained somewhat primitive, limiting what we can learn from the experimental studies of algorithm performance, constraining the types of algorithms and model representations that researchers consider, and creating a gap between theory and practice. We argue for expanding …


Utilizing Graph Structure For Machine Learning, Stefan Dernbach Apr 2021

Utilizing Graph Structure For Machine Learning, Stefan Dernbach

Doctoral Dissertations

The information age has led to an explosion in the size and availability of data. This data often exhibits graph-structure that is either explicitly defined, as in the web of a social network, or is implicitly defined and can be determined by measuring similarity between objects. Utilizing this graph-structure allows for the design of machine learning algorithms that reflect not only the attributes of individual objects but their relationships to every other object in the domain as well. This thesis investigates three machine learning problems and proposes novel methods that leverage the graph-structure inherent in the tasks. Quantum walk neural …


Reasoning About User Feedback Under Identity Uncertainty In Knowledge Base Construction, Ariel Kobren Dec 2020

Reasoning About User Feedback Under Identity Uncertainty In Knowledge Base Construction, Ariel Kobren

Doctoral Dissertations

Intelligent, automated systems that are intertwined with everyday life---such as Google Search and virtual assistants like Amazon’s Alexa or Apple’s Siri---are often powered in part by knowledge bases (KBs), i.e., structured data repositories of entities, their attributes, and the relationships among them. Despite a wealth of research focused on automated KB construction methods, KBs are inevitably imperfect, with errors stemming from various points in the construction pipeline. Making matters more challenging, new data is created daily and must be integrated with existing KBs so that they remain up-to-date. As the primary consumers of KBs, human users have tremendous potential to …


A Framework For Performance-Based Facade Design: Approach For Automated And Multi-Objective Simulation And Optimization, Mahsa Minaei Jul 2020

A Framework For Performance-Based Facade Design: Approach For Automated And Multi-Objective Simulation And Optimization, Mahsa Minaei

Doctoral Dissertations

Buildings have a considerable impact on the environment, and it is crucial to consider environmental and energy performance in building design. Buildings account for about 40% of the global energy consumption and contribute over 30% of the CO2 emissions. A large proportion of this energy is used for meeting occupants’ thermal comfort in buildings, followed by lighting. The building facade forms a barrier between the exterior and interior environments; therefore, it has a crucial role in improving energy efficiency and building performance. In this regard, decision-makers are required to establish an optimal solution, considering multi-objective problems that are usually competitive …


The Limits Of Location Privacy In Mobile Devices, Keen Yuun Sung Jul 2020

The Limits Of Location Privacy In Mobile Devices, Keen Yuun Sung

Doctoral Dissertations

Mobile phones are widely adopted by users across the world today. However, the privacy implications of persistent connectivity are not well understood. This dissertation focuses on one important concern of mobile phone users: location privacy. I approach this problem from the perspective of three adversaries that users are exposed to via smartphone apps: the mobile advertiser, the app developer, and the cellular service provider. First, I quantify the proportion of mobile users who use location permissive apps and are able to be tracked through their advertising identifier, and demonstrate a mark and recapture attack that allows continued tracking of users …


Learning Latent Characteristics Of Data And Models Using Item Response Theory, John P. Lalor Mar 2020

Learning Latent Characteristics Of Data And Models Using Item Response Theory, John P. Lalor

Doctoral Dissertations

A supervised machine learning model is trained with a large set of labeled training data, and evaluated on a smaller but still large set of test data. Especially with deep neural networks (DNNs), the complexity of the model requires that an extremely large data set is collected to prevent overfitting. It is often the case that these models do not take into account specific attributes of the training set examples, but instead treat each equally in the process of model training. This is due to the fact that it is difficult to model latent traits of individual examples at the …


Neural Models For Information Retrieval Without Labeled Data, Hamed Zamani Oct 2019

Neural Models For Information Retrieval Without Labeled Data, Hamed Zamani

Doctoral Dissertations

Recent developments of machine learning models, and in particular deep neural networks, have yielded significant improvements on several computer vision, natural language processing, and speech recognition tasks. Progress with information retrieval (IR) tasks has been slower, however, due to the lack of large-scale training data as well as neural network models specifically designed for effective information retrieval. In this dissertation, we address these two issues by introducing task-specific neural network architectures for a set of IR tasks and proposing novel unsupervised or \emph{weakly supervised} solutions for training the models. The proposed learning solutions do not require labeled training data. Instead, …


Extracting And Representing Entities, Types, And Relations, Patrick Verga Oct 2019

Extracting And Representing Entities, Types, And Relations, Patrick Verga

Doctoral Dissertations

Making complex decisions in areas like science, government policy, finance, and clinical treatments all require integrating and reasoning over disparate data sources. While some decisions can be made from a single source of information, others require considering multiple pieces of evidence and how they relate to one another. Knowledge graphs (KGs) provide a natural approach for addressing this type of problem: they can serve as long-term stores of abstracted knowledge organized around concepts and their relationships, and can be populated from heterogeneous sources including databases and text. KGs can facilitate higher level reasoning, influence the interpretation of new data, and …


From Optimization To Equilibration: Understanding An Emerging Paradigm In Artificial Intelligence And Machine Learning, Ian Gemp Jul 2019

From Optimization To Equilibration: Understanding An Emerging Paradigm In Artificial Intelligence And Machine Learning, Ian Gemp

Doctoral Dissertations

Many existing machine learning (ML) algorithms cannot be viewed as gradient descent on some single objective. The solution trajectories taken by these algorithms naturally exhibit rotation, sometimes forming cycles, a behavior that is not expected with (full-batch) gradient descent. However, these algorithms can be viewed more generally as solving for the equilibrium of a game with possibly multiple competing objectives. Moreover, some recent ML models, specifically generative adversarial networks (GANs) and its variants, are now explicitly formulated as equilibrium problems. Equilibrium problems present challenges beyond those encountered in optimization such as limit-cycles and chaotic attractors and are able to abstract …


Learning With Aggregate Data, Tao Sun Mar 2019

Learning With Aggregate Data, Tao Sun

Doctoral Dissertations

Various real-world applications involve directly dealing with aggregate data. In this work, we study Learning with Aggregate Data from several perspectives and try to address their combinatorial challenges. At first, we study the problem of learning in Collective Graphical Models (CGMs), where only noisy aggregate observations are available. Inference in CGMs is NP- hard and we proposed an approximate inference algorithm. By solving the inference problems, we are empowered to build large-scale bird migration models, and models for human mobility under the differential privacy setting. Secondly, we consider problems given bags of instances and bag-level aggregate supervisions. Specifically, we study …


Machine Learning Methods For Activity Detection In Wearable Sensor Data Streams, Roy Adams Oct 2018

Machine Learning Methods For Activity Detection In Wearable Sensor Data Streams, Roy Adams

Doctoral Dissertations

Wearable wireless sensors have the potential for transformative impact on the fields of health and behavioral science. Recent advances in wearable sensor technology have made it possible to simultaneously collect multiple streams of physiological and context data from individuals in natural environments; however, extracting reliable high-level inferences from these raw data streams remains a key data analysis challenge. In this dissertation, we address three challenges that arise when trying to perform activity detection from wearable sensor streams. First, we address the challenge of learning from small amounts of noisy data by proposing a class of conditional random field models for …


Transfer Learning With Mixtures Of Manifolds, Thomas Boucher Jul 2018

Transfer Learning With Mixtures Of Manifolds, Thomas Boucher

Doctoral Dissertations

Advances in scientific instrumentation technology have increased the speed of data acquisition and the precision of sampling, creating an abundance of high-dimensional data sets. The ability to combine these disparate data sets and to transfer information between them is critical to accurate scientific analysis. Many modern-day instruments can record data at many thousands of channels, far greater than the actual degrees of freedom in the sample data. This makes manifold learning, a class of methods that exploit the observation that high-dimensional data tend to lie on lower-dimensional manifolds, especially well-suited to this transfer learning task. Existing manifold-based transfer learning methods …


Using Latent Variable Models To Improve Causal Estimation, Huseyin Oktay Mar 2018

Using Latent Variable Models To Improve Causal Estimation, Huseyin Oktay

Doctoral Dissertations

Estimating the causal effect of a treatment from data has been a key goal for a large number of studies in many domains. Traditionally, researchers use carefully designed randomized experiments for causal inference. However, such experiments can not only be costly in terms of time and money but also infeasible for some causal questions. To overcome these challenges, causal estimation methods from observational data have been developed by researchers from diverse disciplines and increasingly studies using such methods account for a large share in empirical work. Such growing interest has also brought together two arguably separate fields: machine learning and …


Deep-Learned Generative Representations Of 3d Shape Families, Haibin Huang Nov 2017

Deep-Learned Generative Representations Of 3d Shape Families, Haibin Huang

Doctoral Dissertations

Digital representations of 3D shapes are becoming increasingly useful in several emerging applications, such as 3D printing, virtual reality and augmented reality. However, traditional modeling softwares require users to have extensive modeling experience, artistic skills and training to handle their complex interfaces and perform the necessary low-level geometric manipulation commands. Thus, there is an emerging need for computer algorithms that help novice and casual users to quickly and easily generate 3D content. In this work, I will present deep learning algorithms that are capable of automatically inferring parametric representations of shape families, which can be used to generate new 3D …


Deep Energy-Based Models For Structured Prediction, David Belanger Nov 2017

Deep Energy-Based Models For Structured Prediction, David Belanger

Doctoral Dissertations

We introduce structured prediction energy networks (SPENs), a flexible frame- work for structured prediction. A deep architecture is used to define an energy func- tion over candidate outputs and predictions are produced by gradient-based energy minimization. This deep energy captures dependencies between labels that would lead to intractable graphical models, and allows us to automatically discover discrim- inative features of the structured output. Furthermore, practitioners can explore a wide variety of energy function architectures without having to hand-design predic- tion and learning methods for each model. This is because all of our prediction and learning methods interact with the energy …


Problems In Graph-Structured Modeling And Learning, James Atwood Jul 2017

Problems In Graph-Structured Modeling And Learning, James Atwood

Doctoral Dissertations

This thesis investigates three problems in graph-structured modeling and learning. We first present a method for efficiently generating large instances from nonlinear preferential attachment models of network structure. This is followed by a description of diffusion-convolutional neural networks, a new model for graph-structured data which is able to outperform probabilistic relational models and kernel-on-graph methods at node classification tasks. We conclude with an optimal privacy-protection method for users of online services that remains effective when users have poor knowledge of an adversary's behavior.


Learning From Pairwise Proximity Data, Hamid Dadkhahi Nov 2016

Learning From Pairwise Proximity Data, Hamid Dadkhahi

Doctoral Dissertations

In many areas of machine learning, the characterization of the input data is given by a form of proximity measure between data points. Examples of such representations are pairwise differences, pairwise distances, and pairwise comparisons. In this work, we investigate different learning problems on data represented in terms of such pairwise proximities. More specifically, we consider three problems: masking (feature selection) for dimensionality reduction, extension of the dimensionality reduction for time series, and online collaborative filtering. For each of these problems, we start with a form of pairwise proximity which is relevant in the problem at hand. We evaluate the …


Algorithms For First-Order Sparse Reinforcement Learning, Bo Liu Mar 2016

Algorithms For First-Order Sparse Reinforcement Learning, Bo Liu

Doctoral Dissertations

This thesis presents a general framework for first-order temporal difference learning algorithms with an in-depth theoretical analysis. The main contribution of the thesis is the development and design of a family of first-order regularized temporal-difference (TD) algorithms using stochastic approximation and stochastic optimization. To scale up TD algorithms to large-scale problems, we use first-order optimization to explore regularized TD methods using linear value function approximation. Previous regularized TD methods often use matrix inversion, which requires cubic time and quadratic memory complexity. We propose two algorithms, sparse-Q and RO-TD, for on-policy and off-policy learning, respectively. These two algorithms exhibit linear computational …


Epistemological Databases For Probabilistic Knowledge Base Construction, Michael Louis Wick Mar 2015

Epistemological Databases For Probabilistic Knowledge Base Construction, Michael Louis Wick

Doctoral Dissertations

Knowledge bases (KB) facilitate real world decision making by providing access to structured relational information that enables pattern discovery and semantic queries. Although there is a large amount of data available for populating a KB; the data must first be gathered and assembled. Traditionally, this integration is performed automatically by storing the output of an information extraction pipeline directly into a database as if this prediction were the ``truth.'' However, the resulting KB is often not reliable because (a) errors accumulate in the integration pipeline, and (b) they persist in the KB even after new information arrives that could rectify …


Learning With Joint Inference And Latent Linguistic Structure In Graphical Models, Jason Narad Mar 2015

Learning With Joint Inference And Latent Linguistic Structure In Graphical Models, Jason Narad

Doctoral Dissertations

Constructing end-to-end NLP systems requires the processing of many types of linguistic information prior to solving the desired end task. A common approach to this problem is to construct a pipeline, one component for each task, with each system's output becoming input for the next. This approach poses two problems. First, errors propagate, and, much like the childhood game of "telephone", combining systems in this manner can lead to unintelligible outcomes. Second, each component task requires annotated training data to act as supervision for training the model. These annotations are often expensive and time-consuming to produce, may differ from each …


Causal Discovery For Relational Domains: Representation, Reasoning, And Learning, Marc Maier Nov 2014

Causal Discovery For Relational Domains: Representation, Reasoning, And Learning, Marc Maier

Doctoral Dissertations

Many domains are currently experiencing the growing trend to record and analyze massive, observational data sets with increasing complexity. A commonly made claim is that these data sets hold potential to transform their corresponding domains by providing previously unknown or unexpected explanations and enabling informed decision-making. However, only knowledge of the underlying causal generative process, as opposed to knowledge of associational patterns, can support such tasks. Most methods for traditional causal discovery—the development of algorithms that learn causal structure from observational data—are restricted to representations that require limiting assumptions on the form of the data. Causal discovery has almost exclusively …


Adaptive Step-Sizes For Reinforcement Learning, William C. Dabney Nov 2014

Adaptive Step-Sizes For Reinforcement Learning, William C. Dabney

Doctoral Dissertations

The central theme motivating this dissertation is the desire to develop reinforcement learning algorithms that “just work” regardless of the domain in which they are applied. The largest impediment to this goal is the sensitivity of reinforcement learning algorithms to the step-size parameter used to rescale incremental updates. Adaptive step-size algorithms attempt to reduce this sensitivity or eliminate the step-size parameter entirely by automatically adjusting the step size throughout the learning process. Such algorithms provide an alternative to the standard “guess-and-check” methods used to find parameters known as parameter tuning. However, the problems with parameter tuning are currently masked by …