Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Latent Class Models For At-Risk Populations, Shuaimin Kang Jul 2020

Doctoral Dissertations

Clustering Network Tree Data From Respondent-Driven Sampling With Application to Opioid Users in New York City

There is great interest in finding meaningful subgroups of attributed network data. Many methods are available for clustering complete networks. Unfortunately, much network data is collected through sampling and is therefore incomplete. Respondent-driven sampling (RDS) is a widely used method for sampling hard-to-reach human populations based on tracing links in the underlying unobserved social network. The resulting data therefore have a tree structure representing a sub-sample of the network, along with many nodal attributes. In this paper, we introduce an approach to adjust mixture models …


Learning From Irregularly-Sampled Time Series, Steven Cheng-Xian Li Jul 2020

Doctoral Dissertations

Irregularly-sampled time series are characterized by non-uniform time intervals between successive measurements. Such time series naturally occur in application areas including climate science, ecology, biology, and medicine. Irregular sampling poses a great challenge for modeling this type of data, as there can be substantial uncertainty about the values of the underlying temporal processes. Moreover, different time series are not necessarily synchronized or of the same length, which makes them difficult to handle with standard machine learning methods that assume fixed-dimensional data spaces. The goal of this thesis is to develop scalable probabilistic tools for modeling a large collection of …


A Comparison Of Techniques For Handling Missing Data In Longitudinal Studies, Alexander R. Bogdan Nov 2016

Masters Theses

Missing data are a common problem in virtually all epidemiological research, especially when conducting longitudinal studies. In these settings, clinicians may collect biological samples to analyze changes in biomarkers, which often do not conform to parametric distributions and may be censored due to limits of detection. Using complete data from the BioCycle Study (2005-2007), which followed 259 premenopausal women over two menstrual cycles, we compared four techniques for handling missing biomarker data with non-Normal distributions. We imposed increasing degrees of missing data on two non-Normally distributed biomarkers under conditions of missing completely at random, missing at random, and missing not …


Learning To Select Actions For Resource-Bounded Information Extraction, P. Kinani, Andrew McCallum Jan 2011

Andrew McCallum

Given a database with missing or uncertain information, our goal is to extract specific information from a large corpus such as the Web under limited resources. We cast the information gathering task as a series of alternative, resource-consuming actions to choose from and propose a new algorithm for learning to select the best action to perform at each time step. The function that selects these actions is trained using an online, error-driven algorithm called SampleRank. We present a system that finds the faculty directory pages of top Computer Science departments in the U.S. and show that the learning-based approach accomplishes …