Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 47

Full-Text Articles in Physical Sciences and Mathematics

Reasoning About User Feedback Under Identity Uncertainty In Knowledge Base Construction, Ariel Kobren Dec 2020

Doctoral Dissertations

Intelligent, automated systems that are intertwined with everyday life---such as Google Search and virtual assistants like Amazon’s Alexa or Apple’s Siri---are often powered in part by knowledge bases (KBs), i.e., structured data repositories of entities, their attributes, and the relationships among them. Despite a wealth of research focused on automated KB construction methods, KBs are inevitably imperfect, with errors stemming from various points in the construction pipeline. Making matters more challenging, new data is created daily and must be integrated with existing KBs so that they remain up-to-date. As the primary consumers of KBs, human users have tremendous potential to …


Understanding The Dynamic Visual World: From Motion To Semantics, Huaizu Jiang Dec 2020

Doctoral Dissertations

We live in a dynamic world, which is continuously in motion. Perceiving and interpreting the dynamic surroundings is an essential capability for an intelligent agent. Human beings have the remarkable capability to learn from limited data, with partial or little annotation, in sharp contrast to computational perception models that rely on large-scale, manually labeled data. Reliance on strongly supervised models with manually labeled data inherently prohibits us from modeling the dynamic visual world, as manual annotations are tedious, expensive, and not scalable, especially if we would like to solve multiple scene understanding tasks at the same time. Even worse, in …


Algorithms For Massive, Expensive, Or Otherwise Inconvenient Graphs, David Tench Dec 2020

Doctoral Dissertations

A long-standing assumption common in algorithm design is that any part of the input is accessible at any time for unit cost. However, as we work with increasingly large data sets, or as we build smaller devices, we must revisit this assumption. In this thesis, I present some of my work on graph algorithms designed for circumstances where traditional assumptions about inputs do not apply.
1. Classical graph algorithms require direct access to the input graph and this is not feasible when the graph is too large to fit in memory. For computation on massive graphs we consider the dynamic …


System Design For Digital Experimentation And Explanation Generation, Emma Tosch Dec 2020

Doctoral Dissertations

Experimentation increasingly drives everyday decisions in modern life, as it is considered by some to be the gold standard for determining cause and effect within any system. Digital experiments have expanded the scope and frequency of experiments, which can range in complexity from classic A/B tests to contextual bandits experiments, which share features with reinforcement learning. Although there exists a large body of prior work on estimating treatment effects using experiments, this prior work did not anticipate the new challenges and opportunities introduced by digital experimentation. Novel errors and threats to validity arise at the intersection of software and …


Nonparametric Bayesian Deep Learning For Scientific Data Analysis, Devanshu Agrawal Dec 2020

Doctoral Dissertations

Deep learning (DL) has emerged as the leading paradigm for predictive modeling in a variety of domains, especially those involving large volumes of high-dimensional spatio-temporal data such as images and text. With the rise of big data in scientific and engineering problems, there is now considerable interest in the research and development of DL for scientific applications. The scientific domain, however, poses unique challenges for DL, including special emphasis on interpretability and robustness. In particular, a priority of the Department of Energy (DOE) is the research and development of probabilistic ML methods that are robust to overfitting and offer reliable …


Exploration Of Mid To Late Paleozoic Tectonics Along The Cincinnati Arch Using Gis And Python To Automate Geologic Data Extraction From Disparate Sources, Kenneth Steven Boling Dec 2020

Doctoral Dissertations

Structure contour maps are one of the most common methods of visualizing geologic horizons as three-dimensional surfaces. In addition to their practical applications in the oil and gas and mining industries, these maps can be used to evaluate the relationships of different geologic units in order to unravel the tectonic history of an area. The construction of high-resolution regional structure contour maps of a particular geologic horizon requires a significant volume of data that must be compiled from all available surface and subsurface sources. Processing these data using conventional methods and even basic GIS tools can be tedious and very …


Benchmarks And Controls For Optimization With Quantum Annealing, Erica Kelley Grant Dec 2020

Doctoral Dissertations

Quantum annealing (QA) is a metaheuristic specialized for solving optimization problems which uses principles of adiabatic quantum computing, namely the adiabatic theorem. Some devices implement QA using quantum mechanical phenomena. These QA devices do not perfectly adhere to the adiabatic theorem because they are subject to thermal and magnetic noise. Thus, QA devices return statistical solutions with some probability of success where this probability is affected by the level of noise of the system. As these devices improve, it is believed that they will become less noisy and more accurate. However, some tuning strategies may further improve that probability of …


Mixed-Precision Numerical Linear Algebra Algorithms: Integer Arithmetic Based Lu Factorization And Iterative Refinement For Hermitian Eigenvalue Problem, Yaohung Tsai Dec 2020

Doctoral Dissertations

Mixed-precision algorithms are a class of algorithms that uses low precision in part of the algorithm in order to save time and energy with less accurate computation and communication. These algorithms usually utilize iterative refinement processes to improve the approximate solution obtained from low precision to the accuracy we desire from doing all the computation in high precision. Due to the demand of deep learning applications, there are hardware developments offering different low-precision formats including half precision (FP16), Bfloat16 and integer operations for quantized integers, which uses integers with a shared scalar to represent a set of equally spaced numbers. …
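
The iterative refinement idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the function names (`to_f32`, `solve_low_precision`, `refine`) are hypothetical, float32 arithmetic is simulated by rounding through the standard `struct` module, and the elimination is simply re-run on each pass, whereas a practical solver would typically factor the matrix once and reuse the factors.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_low_precision(a, b):
    """Gaussian elimination with partial pivoting, rounding each result to float32."""
    n = len(b)
    # Work on an augmented, rounded copy so the caller's data is untouched.
    m = [[to_f32(v) for v in row] + [to_f32(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = to_f32(m[r][col] / m[col][col])
            for c in range(col, n + 1):
                m[r][c] = to_f32(m[r][c] - to_f32(factor * m[col][c]))
    # Back substitution, still rounding to float32 after every operation.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n]
        for c in range(r + 1, n):
            s = to_f32(s - to_f32(m[r][c] * x[c]))
        x[r] = to_f32(s / m[r][r])
    return x


def residual(a, b, x):
    """Compute r = b - A x in full double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]


def refine(a, b, iterations=5):
    """Start from a low-precision solve, then correct it with low-precision solves of A d = r."""
    x = solve_low_precision(a, b)
    for _ in range(iterations):
        r = residual(a, b, x)           # high-precision residual
        d = solve_low_precision(a, r)   # cheap low-precision correction
        x = [xi + di for xi, di in zip(x, d)]
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(solve_low_precision(a, b))
    print(refine(a, b))
```

The point of the design is that only the residual is computed in high precision; every solve stays in the cheaper format, and for a well-conditioned system the corrections shrink quickly.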


Modeling User-Affected Software Properties For Open Source Software Supply Chains, Tapajit Dey Dec 2020

Doctoral Dissertations

Background: The Open Source Software development community relies heavily on users of the software and contributors outside of the core developers to produce top-quality software and provide long-term support. However, the relationship between a software and its contributors in terms of exactly how they are related through dependencies and how the users of a software affect many of its properties are not very well understood.

Aim: My research covers a number of aspects related to answering the overarching question of modeling the software properties affected by users and the supply chain structure of software ecosystems, viz. 1) Understanding how software usage …


Leveraging Conventional Internet Routing Protocol Behavior To Defeat Ddos And Adverse Networking Conditions, Jared M. Smith Aug 2020

Doctoral Dissertations

The Internet is a cornerstone of modern society. Yet increasingly devastating attacks against the Internet threaten to undermine the Internet's success at connecting the unconnected. Of all the adversarial campaigns waged against the Internet and the organizations that rely on it, distributed denial of service, or DDoS, tops the list of the most volatile attacks. In recent years, DDoS attacks have been responsible for large swaths of the Internet blacking out, while other attacks have completely overwhelmed key Internet services and websites. Core to the Internet's functionality is the way in which traffic on the Internet gets from one destination …


Mobile Location Data Analytics, Privacy, And Security, Yunhe Feng Aug 2020

Doctoral Dissertations

Mobile location data are ubiquitous in the digital world. People intentionally and unintentionally generate numerous location data when connecting to cellular networks or sharing posts on social networks. As mobile devices normally choose to communicate with nearby cell towers outdoors, it is reasonable to infer human locations based on cell tower coordinates. Many social networking platforms, such as Twitter, allow users to geo-tag their posts optionally, publishing personal locations to friends or everyone. These location data are particularly useful for understanding mobile usage behaviors and human mobility patterns. Meanwhile, the public expresses great concern about the privacy and security of …


Using Applications To Guide Data Management For Emerging Memory Technologies, Timothy C. Effler Aug 2020

Doctoral Dissertations

A number of promising new memory technologies, such as non-volatile, storage-class memories and high-bandwidth, on-chip RAMs, are emerging. Since each of these new technologies presents tradeoffs distinct from conventional DRAMs, many high performance and scientific computing systems have begun to include multiple tiers of memory storage, each with its own type of device. To efficiently utilize the available hardware, such systems will need to alter their data management strategies to consider the performance and capabilities provided by each tier. This work aims to understand and increase the effectiveness of application data management for emerging complex memory systems. A key realization …


Bayesian Topological Machine Learning, Christopher A. Oballe Aug 2020

Doctoral Dissertations

Topological data analysis encompasses a broad set of ideas and techniques that address 1) how to rigorously define and summarize the shape of data, and 2) how to use these constructs for inference. This dissertation addresses the second problem by developing new inferential tools for topological data analysis and applying them to solve real-world data problems. First, a Bayesian framework to approximate probability distributions of persistence diagrams is established. The key insight underpinning this framework is that persistence diagrams may be viewed as Poisson point processes with prior intensities. With this assumption in hand, one may compute posterior intensities by adopting techniques …


A Framework For Performance-Based Facade Design: Approach For Automated And Multi-Objective Simulation And Optimization, Mahsa Minaei Jul 2020

Doctoral Dissertations

Buildings have a considerable impact on the environment, and it is crucial to consider environmental and energy performance in building design. Buildings account for about 40% of the global energy consumption and contribute over 30% of the CO2 emissions. A large proportion of this energy is used for meeting occupants’ thermal comfort in buildings, followed by lighting. The building facade forms a barrier between the exterior and interior environments; therefore, it has a crucial role in improving energy efficiency and building performance. In this regard, decision-makers are required to establish an optimal solution, considering multi-objective problems that are usually competitive …


Deep Neural Networks For 3d Processing And High-Dimensional Filtering, Hang Su Jul 2020

Doctoral Dissertations

Deep neural networks (DNN) have seen tremendous success in the past few years, advancing state of the art in many AI areas by significant margins. Part of the success can be attributed to the wide adoption of convolutional filters. These filters can effectively capture the invariance in data, leading to faster training and more compact representations, and at the same time can leverage efficient parallel implementations on modern hardware. Since convolution operates on regularly structured grids, it is a particularly good fit for texts and images where there are inherent rigid 1D or 2D structures. However, extending DNNs to 3D or …


The Limits Of Location Privacy In Mobile Devices, Keen Yuun Sung Jul 2020

Doctoral Dissertations

Mobile phones are widely adopted by users across the world today. However, the privacy implications of persistent connectivity are not well understood. This dissertation focuses on one important concern of mobile phone users: location privacy. I approach this problem from the perspective of three adversaries that users are exposed to via smartphone apps: the mobile advertiser, the app developer, and the cellular service provider. First, I quantify the proportion of mobile users who use location permissive apps and are able to be tracked through their advertising identifier, and demonstrate a mark and recapture attack that allows continued tracking of users …


Design And Implementation Of Path Finding And Verification In The Internet, Hao Cai Jul 2020

Doctoral Dissertations

In the Internet, network traffic between endpoints typically follows one path that is determined by the control plane. Endpoints have little control over the choice of which path their network traffic takes and little ability to verify if the traffic indeed follows a specific path. With the emergence of software-defined networking (SDN), more control over connections can be exercised, and thus the opportunity for novel solutions exists. However, there remain concerns about the attack surface exposed by fine-grained control, which may allow attackers to inject and redirect traffic. To address these opportunities and concerns, we consider two specific challenges: (1) …


Learning From Irregularly-Sampled Time Series, Steven Cheng-Xian Li Jul 2020

Doctoral Dissertations

Irregularly-sampled time series are characterized by non-uniform time intervals between successive measurements. Such time series naturally occur in application areas including climate science, ecology, biology, and medicine. Irregular sampling poses a great challenge for modeling this type of data as there can be substantial uncertainty about the values of the underlying temporal processes. Moreover, different time series are not necessarily synchronized or of the same length, which makes it difficult to deal with using standard machine learning methods that assume fixed-dimensional data spaces. The goal of this thesis is to develop scalable probabilistic tools for modeling a large collection of …


Integrating Recognition And Decision Making To Close The Interaction Loop For Autonomous Systems, Richard Freedman Jul 2020

Doctoral Dissertations

Intelligent systems are becoming increasingly ubiquitous in daily life. Mobile devices are providing machine-generated support to users, robots are "coming out of their cages" in manufacturing to interact with co-workers, and cars with various degrees of self-driving capabilities operate amongst pedestrians and the driver. However, these interactive intelligent systems' effectiveness depends on their understanding and recognition of human activities and goals, as well as their responses to people in a timely manner. The average person does not follow instructions step-by-step or act in a formulaic manner, but instead varies the order of actions and timing when performing a given task. …


Improving Reinforcement Learning Techniques By Leveraging Prior Experience, Francisco M. Garcia Jul 2020

Doctoral Dissertations

In this dissertation we develop techniques to leverage prior knowledge for improving the learning speed of existing reinforcement learning (RL) algorithms. RL systems can be expensive to train, which limits their applicability when a large number of agents need to be trained to solve a large number of tasks; a situation that often occurs in industry and is often ignored in the RL literature. In this thesis, we develop three methods to leverage the experience obtained from solving a small number of tasks to improve an agent's ability to learn on new tasks the agent might face in the future. …


Improving Visual Recognition With Unlabeled Data, Aruni Roy Chowdhury Jul 2020

Doctoral Dissertations

The success of deep neural networks has resulted in computer vision systems that obtain high accuracy on a wide variety of tasks such as image classification, object detection, semantic segmentation, etc. However, most state-of-the-art vision systems are dependent upon large amounts of labeled training data, which is not a scalable solution in the long run. This work focuses on improving existing models for visual object recognition and detection without being dependent on such large-scale human-annotated data. We first show how large numbers of hard examples (cases where an existing model makes a mistake) can be obtained automatically from unlabeled video …


Intelligent Tutoring Systems, Pedagogical Agent Design, And Hispanic English Language Learners, Danielle Allessio May 2020

Doctoral Dissertations

According to the most recent data from the National Center of Education Statistics (NCES) there were approximately 5 million English Language Learners (ELLs) in the U.S. public schools in the Fall of 2016, representing about 10% of the student population (2019). Spanish is the primary language for most ELL students, by a large margin. As a group, ELLs have faced a deeply rooted and persistent math achievement gap (U.S. Department of Education, 2015). Despite research indicating that intelligent tutors and animated pedagogical agents enhance learning, many tutors are not designed with ELLs in mind. As a result, Hispanic ELL students …


Finding Critical And Gradient-Flat Points Of Deep Neural Network Loss Functions, Charles Gearhart Frye '09 Apr 2020

Doctoral Dissertations

Despite the fact that the loss functions of deep neural networks are highly non-convex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. This makes neural networks easy to train, which, combined with their high representational capacity and implicit and explicit regularization strategies, leads to machine-learned algorithms of high quality with reasonable computational cost in a wide variety of domains.

One thread of work has focused on explaining this phenomenon by numerically characterizing the local curvature at critical points of the loss function, where gradients are zero. Such studies have reported that the loss functions …


Probabilistic Inference With Generating Functions For Population Dynamics Of Unmarked Individuals, Kevin Winner Mar 2020

Doctoral Dissertations

Modeling the interactions of different population dynamics (e.g. reproduction, migration) within a population is a challenging problem that underlies numerous ecological research questions. Powerful, interpretable models for population dynamics are key to developing intervention tactics, allocating limited conservation resources, and predicting the impact of uncertain environmental forces on a population. Fortunately, probabilistic graphical models provide a robust mechanistic framework for these kinds of problems. However, in the relatively common case where individuals in the population are unmarked (i.e. indistinguishable from one another), models of the population dynamics naturally contain a deceptively challenging statistical feature: discrete latent variables with unbounded/countably infinite …


Dynamic Composition Of Functions For Modular Learning, Clemens Gb Rosenbaum Mar 2020

Doctoral Dissertations

Compositionality is useful to reduce the complexity of machine learning models and increase their generalization capabilities, because new problems can be linked to the composition of existing solutions. Recent work has shown that compositional approaches can offer substantial benefits over a wide variety of tasks, from multi-task learning over visual question-answering to natural language inference, among others. A key variant is functional compositionality, where a meta-learner composes different (trainable) functions into complex machine learning models. In this thesis, I generalize existing approaches to functional compositionality under the umbrella of the routing paradigm, where trainable arbitrary functions are 'stacked' to form …


Higher-Order Representations For Visual Recognition, Tsung-Yu Lin Mar 2020

Doctoral Dissertations

In this thesis, we present a simple and effective architecture called Bilinear Convolutional Neural Networks (B-CNNs). These networks represent an image as a pooled outer product of features derived from two CNNs and capture localized feature interactions in a translationally invariant manner. B-CNNs generalize classical orderless texture-based image models such as bag-of-visual-words and Fisher vector representations. However, unlike prior work, they can be trained in an end-to-end manner. In the experiments, we demonstrate that these representations generalize well to novel domains by fine-tuning and achieve excellent results on fine-grained, texture and scene recognition tasks. The visualization of fine-tuned convolutional filters …
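
As a rough illustration of the "pooled outer product" described in this abstract, the sketch below sums the outer products of two local feature vectors over all image locations. It makes several assumptions not stated in the listing: the function name `bilinear_pool` is hypothetical, the features are plain Python lists standing in for the outputs of two CNNs, sum pooling is used, and any later normalization or classification steps are left out.

```python
def bilinear_pool(features_a, features_b):
    """Sum the outer products of paired local feature vectors over all locations.

    features_a[k] and features_b[k] are the two feature vectors extracted at
    location k (stand-ins for the outputs of two CNNs at that location).
    """
    dim_a, dim_b = len(features_a[0]), len(features_b[0])
    pooled = [[0.0] * dim_b for _ in range(dim_a)]
    for fa, fb in zip(features_a, features_b):
        for i, u in enumerate(fa):
            for j, v in enumerate(fb):
                pooled[i][j] += u * v  # entry (i, j) of the outer product fa fb^T
    return pooled


if __name__ == "__main__":
    feats_a = [[1.0, 2.0], [3.0, 4.0]]            # 2 locations, 2-dim features
    feats_b = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]  # 2 locations, 3-dim features
    print(bilinear_pool(feats_a, feats_b))
```

Because the descriptor is a sum over locations, reordering the locations does not change it, which is the sense in which such a representation is orderless.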


Learning Latent Characteristics Of Data And Models Using Item Response Theory, John P. Lalor Mar 2020

Doctoral Dissertations

A supervised machine learning model is trained with a large set of labeled training data, and evaluated on a smaller but still large set of test data. Especially with deep neural networks (DNNs), the complexity of the model requires that an extremely large data set is collected to prevent overfitting. It is often the case that these models do not take into account specific attributes of the training set examples, but instead treat each equally in the process of model training. This is due to the fact that it is difficult to model latent traits of individual examples at the …


Improving Face Clustering In Videos, Souyoung Jin Mar 2020

Doctoral Dissertations

Human faces represent not only a challenging recognition problem for computer vision, but are also an important source of information about identity, intent, and state of mind. These properties make the analysis of faces important not just as algorithmic challenges, but as a gateway to developing computer vision methods that can better follow the intent and goals of human beings. In this thesis, we are interested in face clustering in videos. Given a raw video, with no caption or annotation, we want to group all detected faces by their identity. We address three problems in the area of face clustering …


Optimization And Training Of Generational Garbage Collectors, Nicholas Jacek Mar 2020

Doctoral Dissertations

Garbage collectors are nearly ubiquitous in modern programming languages, and we want to minimize the cost they impose in terms of time and space. Generally, a collector waits until its space is full and then performs a collection to reclaim needed memory. However, this is not the only option; a collection could be performed early when some free space remains. For copying collectors, which are what we consider here, the system must traverse the graph of live objects and copy them, so the cost of a collection is proportional to the volume of objects that are live. Since this value …
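
The claim that a copying collection costs time proportional to the volume of live objects can be seen in a toy model. The sketch below is an assumption-laden illustration, not the collector studied in the dissertation: the heap is a hypothetical dictionary mapping an object id to `(size, references)`, and `copy_collect` is a made-up name for a breadth-first traversal from the roots.

```python
from collections import deque


def copy_collect(heap, roots):
    """Copy every object reachable from roots; return (to_space, bytes_copied).

    heap maps an object id to a (size_in_bytes, list_of_referenced_ids) pair.
    Unreachable objects are never visited, so the work done depends only on
    the live data, not on the total size of the heap.
    """
    to_space = {}
    worklist = deque(roots)
    bytes_copied = 0
    while worklist:
        obj_id = worklist.popleft()
        if obj_id in to_space:
            continue  # already copied; a real collector would follow a forwarding pointer
        size, refs = heap[obj_id]
        to_space[obj_id] = (size, refs)
        bytes_copied += size
        worklist.extend(refs)
    return to_space, bytes_copied


if __name__ == "__main__":
    heap = {
        "a": (16, ["b"]),
        "b": (8, ["a"]),    # cycle a <-> b, both live
        "c": (1024, []),    # garbage
        "d": (32, ["c"]),   # garbage
    }
    print(copy_collect(heap, roots=["a"]))
```

In this model the 1056 bytes of garbage cost nothing to reclaim, which is why the timing of a collection (and how much is still live at that moment) matters for total cost.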


An Empirical Assessment Of The Effectiveness Of Deception For Cyber Defense, Kimberly J. Ferguson-Walter Mar 2020

Doctoral Dissertations

The threat of cyber attacks is a growing concern across the world, leading to an increasing need for sophisticated cyber defense techniques. The Tularosa Study was designed and conducted to understand how defensive deception, both cyber and psychological, affects cyber attackers (Ferguson-Walter et al. [2019c]). More specifically, for this empirical study, cyber deception refers to a decoy system and psychological deception refers to false information of the presence of defensive deception techniques on the network. Over 130 red teamers participated in a network penetration test over two days in which we controlled both the presence of and explicit mention of …