Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics Commons

Statistical Models

2019

Articles 1 - 30 of 42

Full-Text Articles in Applied Statistics

Personalized Detection Of Anxiety Provoking News Events Using Semantic Network Analysis, Jacquelyn Cheun PhD, Luay Dajani, Quentin B. Thomas Dec 2019

SMU Data Science Review

In the age of hyper-connectivity, 24/7 news cycles, and instant news alerts via social media, mental health researchers don't have a way to automatically detect news content which is associated with triggering anxiety or depression in mental health patients. Using the Associated Press news wire, a semantic network was built with 1,056 news articles containing over 500,000 connections across multiple topics to provide a personalized algorithm which detects problematic news content for a given reader. We make use of Semantic Network Analysis to surface the relationship between news article text and anxiety in readers who struggle with mental health disorders. …


Habitat Associations And Reproduction Of Fishes On The Northwestern Gulf Of Mexico Shelf Edge, Elizabeth Marie Keller Nov 2019

LSU Doctoral Dissertations

Several of the northwestern Gulf of Mexico (GOM) shelf-edge banks provide critical hard bottom habitat for coral and fish communities, supporting a wide diversity of ecologically and economically important species. These sites may be fish aggregation and spawning sites and provide important habitat for fish growth and reproduction. Already designated as habitat areas of particular concern, many of these banks are also under consideration for inclusion in the expansion of the Flower Garden Banks National Marine Sanctuary. This project aimed to gain a more comprehensive understanding of the communities and fish species on shelf-edge banks by way of gonad histology, …


Classification Of Coronary Artery Disease In Non-Diabetic Patients Using Artificial Neural Networks, Demond Handley Oct 2019

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Statistical Modeling And Characterization Of Induced Seismicity Within The Western Canada Sedimentary Basin, Sid Kothari Oct 2019

Electronic Thesis and Dissertation Repository

In western Canada, there has been an increase in seismic activity linked to anthropogenic energy-related operations including conventional hydrocarbon production, wastewater fluid injection and more recently hydraulic fracturing (HF). Statistical modeling and characterization of the space, time and magnitude distributions of the seismicity clusters is vital for a better understanding of induced earthquake processes and development of predictive models. In this work, a statistical analysis of the seismicity in the Western Canada Sedimentary Basin was performed across past and present time periods by utilizing a compiled earthquake catalogue for Alberta and eastern British Columbia. Specifically, the frequency-magnitude statistics were analyzed …


Texture-Based Deep Neural Network For Histopathology Cancer Whole Slide Image (WSI) Classification, Nelson Zange Tsaku Aug 2019

Master of Science in Computer Science Theses

Automatic histopathological Whole Slide Image (WSI) analysis for cancer classification has been highlighted along with the advancements in microscopic imaging techniques. However, manual examination and diagnosis with WSIs is time-consuming and tiresome. Recently, deep convolutional neural networks have succeeded in histopathological image analysis. In this paper, we propose a novel cancer texture-based deep neural network (CAT-Net) that learns scalable texture features from histopathological WSIs. The innovation of CAT-Net is twofold: (1) capturing invariant spatial patterns by dilated convolutional layers and (2) Reducing model complexity while improving performance. Moreover, CAT-Net can provide discriminative texture patterns formed on cancerous regions of histopathological …


Identifying Risk Factors Related To Premature Birth Through Binary Logistic And Proportional Odds Ordinal Logistic Regression, Clayton Elwood Aug 2019

Electronic Theses and Dissertations

Premature birth has been identified as the single greatest cause of death worldwide in children under the age of five. This thesis implements binary logistic regression and proportional odds ordinal logistic regression to predict different levels of premature birth and identify associated risk factors. The models are built from the Centers for Disease Control and Prevention's 2014 Vital Statistics Natality Birth Data, which contain nearly 4 million live births within the United States. Odds ratios and confidence intervals for the risk factors were produced using binary logistic regression.
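
As a rough illustration of how odds ratios and confidence intervals follow from a fitted binary logistic regression, the sketch below exponentiates a coefficient and its Wald interval. The function name, the coefficient, and the standard error are hypothetical placeholders, not values from the thesis.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for one logistic
    regression coefficient, using a Wald interval on the log-odds scale."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and standard error, for illustration only.
or_hat, lower, upper = odds_ratio_ci(beta=0.40, se=0.10)
print(f"OR = {or_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

An interval that excludes 1 would indicate a risk factor associated with the outcome at roughly the 5% level, under the usual large-sample assumptions.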


GARCH Modeling Of Value At Risk And Expected Shortfall Using Bayesian Model Averaging, Ismail Kheir Aug 2019

Theses and Dissertations

This thesis conducts Value at Risk (VaR) and Expected Shortfall (ES) estimation using GARCH modeling and Bayesian Model Averaging (BMA). BMA considers multiple models weighted by some information criterion. Through BMA, this thesis finds that VaR and ES estimates can be improved through enhanced modeling of the data generation process.
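
The weighting idea can be sketched as follows, assuming the information criterion is BIC and using the common approximation in which each model's weight is proportional to exp(-0.5 × ΔBIC). The BIC values and VaR estimates below are hypothetical, and the thesis may use a different criterion or weighting scheme.

```python
import math

def bma_weights(criteria):
    """Convert information-criterion values (lower is better) into
    normalized model weights proportional to exp(-0.5 * delta)."""
    best = min(criteria)
    raw = [math.exp(-0.5 * (c - best)) for c in criteria]
    total = sum(raw)
    return [r / total for r in raw]

def bma_average(estimates, weights):
    """Weighted average of per-model estimates."""
    return sum(e * w for e, w in zip(estimates, weights))

# Hypothetical BIC values and VaR estimates from three candidate GARCH models.
bic = [1002.3, 1000.0, 1005.1]
var_estimates = [0.031, 0.028, 0.035]
weights = bma_weights(bic)
print(weights, bma_average(var_estimates, weights))
```

Subtracting the smallest criterion value before exponentiating keeps the computation numerically stable without changing the normalized weights.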


Stability Of Single-Parent Gene Expression Complementation In Maize Hybrids Upon Water Deficit Stress, Caroline Marcon, Anja Paschold, Waqas Ahmed Malik, Andrew Lithio, Jutta A. Baldauf, Lena Altrogge, Nina Opitz, Christa Lanz, Heiko Schoof, Dan Nettleton, Hans-Peter Piepho, Frank Hochholdinger Jul 2019

Dan Nettleton

Heterosis is the superior performance of F1 hybrids compared with their homozygous, genetically distinct parents. In this study, we monitored the transcriptomic divergence of the maize (Zea mays) inbred lines B73 and Mo17 and their reciprocal F1 hybrid progeny in primary roots under control and water deficit conditions simulated by polyethylene glycol treatment. Single-parent expression (SPE) of genes is an extreme instance of gene expression complementation, in which genes are active in only one of two parents but are expressed in both reciprocal hybrids. In this study, 1,997 genes only expressed in B73 and 2,024 genes …


Genomic Neighborhoods For Arabidopsis Retrotransposons: A Role For Targeted Integration In The Distribution Of The Metaviridae, Brooke D. Peterson-Burch, Dan Nettleton, Daniel F. Voytas Jul 2019

Dan Nettleton

Background: Retrotransposons are an abundant component of eukaryotic genomes. The high quality of the Arabidopsis thaliana genome sequence makes it possible to comprehensively characterize retroelement populations and explore factors that contribute to their genomic distribution.

Results: We identified the full complement of A. thaliana long terminal repeat (LTR) retroelements using RetroMap, a software tool that iteratively searches genome sequences for reverse transcriptases and then defines retroelement insertions. Relative ages of full-length elements were estimated by assessing sequence divergence between LTRs: the Pseudoviridae were significantly younger than the Metaviridae. All retroelement insertions were mapped onto the genome sequence and their distribution …


Allocative Poisson Factorization For Computational Social Science, Aaron Schein Jul 2019

Doctoral Dissertations

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \mathrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …
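
A minimal sketch of the Poisson factorization idea, assuming the simplest bilinear form in which the rate for cell (i, j) is the sum over k of theta[i][k] * phi[k][j]. The factor matrices and counts are hypothetical, and the thesis considers its own specific family of models rather than this one.

```python
import math

def poisson_rates(theta, phi):
    """Rate matrix mu[i][j] = sum_k theta[i][k] * phi[k][j]."""
    n_cols = len(phi[0])
    return [[sum(t_k * phi[k][j] for k, t_k in enumerate(row))
             for j in range(n_cols)] for row in theta]

def poisson_log_likelihood(counts, mu):
    """Sum of log Pois(y | mu) over all cells of the count matrix."""
    ll = 0.0
    for y_row, mu_row in zip(counts, mu):
        for y, m in zip(y_row, mu_row):
            ll += y * math.log(m) - m - math.lgamma(y + 1)
    return ll

# Hypothetical non-negative factors (2 rows, 2 components, 3 columns).
theta = [[1.0, 0.5], [0.2, 2.0]]
phi = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
counts = [[1, 0, 3], [2, 1, 0]]
mu = poisson_rates(theta, phi)
print(mu, poisson_log_likelihood(counts, mu))
```

Because the factors are shared across cells, every observed count informs the same small set of parameters, which is what makes the approach workable on sparse data.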


Empirical Bayes Analysis Of RNA-Seq Data For Detection Of Gene Expression Heterosis, Jarad Niemi, Eric Mittman, Will Landau, Dan Nettleton Jun 2019

Dan Nettleton

An important type of heterosis, known as hybrid vigor, refers to the enhancements in the phenotype of hybrid progeny relative to their inbred parents. Although hybrid vigor is extensively utilized in agriculture, its molecular basis is still largely unknown. In an effort to understand phenotypic heterosis at the molecular level, researchers are measuring transcript abundance levels of thousands of genes in parental inbred lines and their hybrid offspring using RNA sequencing (RNA-seq) technology. The resulting data allow researchers to search for evidence of gene expression heterosis as one potential molecular mechanism underlying heterosis of agriculturally important traits. The null hypotheses …


A Statistical Analysis Of The Roulette Martingale System: Examples, Formulas And Simulations With R, Peter Pflaumer May 2019

International Conference on Gambling & Risk Taking

Some gamblers use a martingale or doubling strategy as a way of improving their chances of winning. This paper derives important formulas for the martingale strategy, such as the distribution, the expected value, the standard deviation of the profit, the risk of a loss or the expected bet of one or multiple martingale rounds. A computer simulation study with R of the doubling strategy is presented. The results of doubling to gambling with a constant sized bet on simple chances (red or black numbers, even or odd numbers, and low (1 – 18) or high (19 – 36) numbers) and …
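
The doubling strategy can be sketched in a few lines. The paper's simulations use R; the Python sketch below is an independent illustration that assumes a European wheel (18 winning pockets out of 37) and a cap on the number of consecutive bets per round. Under those assumptions the expected profit of one round, in units of the initial stake, is 1 - (2q)^n, where q is the probability of losing a single bet and n is the cap.

```python
import random

P_LOSE = 19 / 37  # assumption: European wheel, 18 winning pockets out of 37

def expected_round_profit(max_bets, p_lose=P_LOSE):
    """Expected profit of one martingale round, in units of the initial stake."""
    return 1 - (2 * p_lose) ** max_bets

def simulate_round(max_bets, rng, p_lose=P_LOSE):
    """Play one round: double the stake after each loss and stop at the
    first win or after max_bets consecutive losses."""
    stake, lost = 1, 0
    for _ in range(max_bets):
        if rng.random() >= p_lose:
            return stake - lost  # a win always nets one initial stake
        lost += stake
        stake *= 2
    return -lost  # all bets lost: 2**max_bets - 1 units

rng = random.Random(0)
n_rounds, max_bets = 100_000, 6
mean_profit = sum(simulate_round(max_bets, rng) for _ in range(n_rounds)) / n_rounds
print(mean_profit, expected_round_profit(max_bets))
```

Most rounds end with a small win, but the rare losing round costs 2^n - 1 units, which is why the average stays negative whenever q exceeds one half.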


Quantifying Sleep Architecture For Pediatric Hypersomnia Conditions, Alicia K. Colclasure May 2019

Biology and Medicine Through Mathematics Conference

No abstract provided.


Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley May 2019

SMU Data Science Review

In this paper, we will explore and present a method of finding characteristics of a restaurant using its reviews through machine learning algorithms. We begin by building models to predict the ratings of individual reviews using text and categorical features. This is to examine the efficacy of the algorithms to the task. Both XGBoost and logistic regression will be examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experience. Our analysis makes use of review data publicly made available by Yelp. Key bigrams extracted were non-specific to the …


Visualization And Machine Learning Techniques For NASA’s EM-1 Big Data Problem, Antonio P. Garza III, Jose Quinonez, Misael Santana, Nibhrat Lohia May 2019

SMU Data Science Review

In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data to determine available launch opportunities (about 90 TB of data). We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory …


A Bayesian Framework For Estimating Seismic Wave Arrival Time, Hua Zhong May 2019

Graduate Theses and Dissertations

Because earthquakes have a large impact on human society, statistical methods for better studying earthquakes are required. One characteristic of earthquakes is the arrival time of seismic waves at a seismic signal sensor. Once we can estimate the earthquake arrival time accurately, the earthquake location can be triangulated, and assistance can be sent to that area correctly. This study presents a Bayesian framework to predict the arrival time of seismic waves with associated uncertainty. We use a change point framework to model the different conditions before and after the seismic wave arrives. To evaluate the performance of the model, we …
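
To illustrate the change point idea in its simplest form, the sketch below computes a posterior over the arrival index under strong simplifying assumptions: zero-mean Gaussian samples with a known standard deviation before the arrival and a known, larger one after it, and a uniform prior over the change index. The signal and both standard deviations are hypothetical, and this is not the model developed in the thesis.

```python
import math

def changepoint_posterior(x, sd_before, sd_after):
    """Posterior over the change index tau (uniform prior), where
    x[:tau] ~ N(0, sd_before^2) and x[tau:] ~ N(0, sd_after^2)."""
    def loglik(v, sd):
        return -0.5 * math.log(2 * math.pi * sd * sd) - v * v / (2 * sd * sd)

    taus = range(1, len(x))
    log_post = [sum(loglik(v, sd_before) for v in x[:tau])
                + sum(loglik(v, sd_after) for v in x[tau:])
                for tau in taus]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    w = [math.exp(lp - m) for lp in log_post]
    total = sum(w)
    return {tau: wi / total for tau, wi in zip(taus, w)}

# Hypothetical signal: quiet background noise, then a larger-amplitude arrival.
signal = [0.1, -0.2, 0.05, 0.15, -0.1, 3.0, -2.5, 4.0, -3.5, 2.0]
posterior = changepoint_posterior(signal, sd_before=0.2, sd_after=3.0)
arrival = max(posterior, key=posterior.get)
print(arrival, posterior[arrival])
```

The spread of the posterior across neighbouring indices gives the kind of uncertainty statement about arrival time that a point estimate alone cannot provide.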


Comparing Elo, Glicko, IRT, And Bayesian IRT Statistical Models For Educational And Gaming Data, Breanna Morrison May 2019

Graduate Theses and Dissertations

Statistical models used for estimating skill or ability levels often vary by field; however, their underlying mathematical models can be very similar. Differences in the underlying models can be due to the need to accommodate data with different underlying formats and structure. As the models from varying fields increase in complexity, the range of data types to which they can be applied may also increase. Models that are applied to educational or psychological data have advanced to accommodate a wide range of data formats, including increased estimation accuracy with sparsely populated data matrices. Conversely, the field of online …


Deep Neural Network Architectures For Music Genre Classification, Kai Middlebrook, Shyam Sudhakaran, Kunal Sonar, David Guy Brizan Apr 2019

Creative Activity and Research Day - CARD

With the recent advancements in technology, many tasks in fields such as computer vision, natural language processing, and signal processing have been solved using deep learning architectures. In the audio domain, these architectures have been used to learn musical features of songs to predict moods, genres, and instruments. In the case of genre classification, deep learning models were applied to popular datasets, which are explicitly chosen to represent their genres, and achieved state-of-the-art results. However, these results have not been reproduced on less refined datasets. To this end, we introduce an un-curated dataset which contains genre labels and 30-second audio previews for …


Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse Apr 2019

FIU Electronic Theses and Dissertations

Regression is a statistical technique for modeling the relationship between a dependent variable Y and two or more predictor variables, also known as regressors. In the broad field of regression, there exists a special case in which the relationship between the dependent variable and the regressor(s) is linear. This is known as linear regression.

The purpose of this paper is to create a useful method that effectively selects a subset of regressors when dealing with high-dimensional data and/or collinearity in linear regression. As the name suggests, high-dimensional data occurs when the number of predictor variables is far …


Assessment And Correction Of Lidar-Derived DEMs In The Coastal Marshes Of Louisiana, William M. Lauve Mar 2019

LSU Master's Theses

The onset of airborne light detection and ranging (lidar) has resulted in expansive, precise digital elevation models (DEMs). DEMs are essential for modeling complex systems, such as the coastal land margin of Louisiana. They are used for many applications (e.g. tide, storm surge, and ecological modeling) and by diverse groups (e.g. state and federal agencies, NGOs, and academia). However, in a marsh environment, it is difficult for airborne lidar to produce accurate bare-earth measurements and even accurate elevations are rarely verified by ground truth data. The accuracy of lidar in marshes is limited by the sensor’s resolution …


Computational Analysis Of Large-Scale Trends And Dynamics In Eukaryotic Protein Family Evolution, Joseph Boehm Ahrens Mar 2019

FIU Electronic Theses and Dissertations

The myriad protein-coding genes found in present-day eukaryotes arose from a combination of speciation and gene duplication events, spanning more than one billion years of evolution. Notably, as these proteins evolved, the individual residues at each site in their amino acid sequences were replaced at markedly different rates. The relationship between protein structure, protein function, and site-specific rates of amino acid replacement is a topic of ongoing research. Additionally, there is much interest in the different evolutionary constraints imposed on sequences related by speciation (orthologs) versus sequences related by gene duplication (paralogs). A principal aim of this dissertation is to …


Population Viability And Connectivity Of The Federally Threatened Eastern Indigo Snake In Central Peninsular Florida, Javan Bauder Mar 2019

Doctoral Dissertations

Understanding the factors influencing the likelihood of persistence of real-world populations requires not only an accurate understanding of the traits and behaviors of individuals within those populations (e.g., movement, habitat selection, survival, fecundity, dispersal) but also an understanding of how those traits and behaviors are influenced by landscape features. The federally threatened eastern indigo snake (EIS, Drymarchon couperi) has declined throughout its range primarily due to anthropogenically-induced habitat loss and fragmentation, making spatially-explicit assessments of population viability and connectivity essential for understanding its current status and directing future conservation efforts. The primary goal of my dissertation was to understand how …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Predicting Unplanned Medical Visits Among Patients With Diabetes Using Machine Learning, Arielle Selya, Eric L. Johnson Feb 2019

SDSU Data Science Symposium

Diabetes poses a variety of medical complications to patients, resulting in a high rate of unplanned medical visits, which are costly to patients and healthcare providers alike. However, unplanned medical visits by their nature are very difficult to predict. The current project draws upon electronic health records (EMRs) of adult patients with diabetes who received care at Sanford Health between 2014 and 2017. Various machine learning methods were used to predict which patients have had an unplanned medical visit based on a variety of EMR variables (age, BMI, blood pressure, # of prescriptions, # of diagnoses on problem list, A1C, …


Level Crossing Simulation Of A Queueing Model, Zhanxuan Ding Jan 2019

Major Papers

Simulation of the level crossing method will be used to find approximations of the distribution of the workload for several queueing models. In particular, three different types of queueing models, with different methods of handling workload bound thresholds, will be considered. Simulation applied to workload bound thresholds is new work.


Pedestrian Safety -- Fundamental To A Walkable City, Joshua Herrera, Patrick McDevitt, Preeti Swaminathan, Raghuram Srinivas Jan 2019

SMU Data Science Review

In this paper, we present a method to identify urban areas with a higher likelihood of pedestrian safety related events. Pedestrian safety related events are pedestrian-vehicle interactions that result in fatalities, injuries, accidents without injury, or near-misses between pedestrians and vehicles. To develop a solution to this problem of identifying likely event locations, we assemble data, primarily from the City of Cincinnati and Hamilton County, that include safety reports from a five-year period, geographic information for these events, citizen survey of pedestrian reported concerns, non-emergency requests for service for any cause in the city, property values and public transportation …


Improving VIX Futures Forecasts Using Machine Learning Methods, James Hosker, Slobodan Djurdjevic, Hieu Nguyen, Robert Slater Jan 2019

SMU Data Science Review

The problem of forecasting market volatility is a difficult task for most fund managers. Volatility forecasts are used for risk management, alpha (risk) trading, and the reduction of trading friction. Improving the forecasts of future market volatility assists fund managers in adding or reducing risk in their portfolios as well as in increasing hedges to protect their portfolios in anticipation of a market sell-off event. Our analysis compares three existing financial models that forecast future market volatility using the Chicago Board Options Exchange Volatility Index (VIX) to six machine/deep learning supervised regression methods. This analysis determines which models provide best …


Automatic 13C Chemical Shift Reference Correction Of Protein NMR Spectral Data Using Data Mining And Bayesian Statistical Modeling, Xi Chen Jan 2019

Theses and Dissertations--Molecular and Cellular Biochemistry

Nuclear magnetic resonance (NMR) is a highly versatile analytical technique for studying molecular configuration, conformation, and dynamics, especially of biomacromolecules such as proteins. However, due to the intrinsic properties of NMR experiments, results from NMR instruments require a referencing step before downstream analysis. Poor chemical shift referencing, especially for 13C in protein NMR experiments, fundamentally limits and even prevents effective study of biomacromolecules via NMR. There is no available method that can re-reference carbon chemical shifts from protein NMR without secondary experimental information such as structure or resonance assignment.

To solve this problem, we …


Methods For Evaluating Dropout Attrition In Survey Data, Camille J. Hochheimer Jan 2019

Theses and Dissertations

As researchers increasingly use web-based surveys, the ease of dropping out in the online setting is a growing issue in ensuring data quality. One theory is that dropout or attrition occurs in phases that can be generalized to phases of high dropout and phases of stable use. In order to detect these phases, several methods are explored. First, existing methods and user-specified thresholds are applied to survey data, where a significant change in the dropout rate between two questions is interpreted as the start or end of a high dropout phase. Next, survey dropout is considered as a time-to-event outcome and …


Controlling For Confounding Via Propensity Score Methods Can Result In Biased Estimation Of The Conditional AUC: A Simulation Study, Hadiza I. Galadima, Donna K. McClish Jan 2019

Community & Environmental Health Faculty Publications

In the medical literature, there has been an increased interest in evaluating the association between exposure and outcomes using nonrandomized observational studies. However, because assignments to exposure are not random in observational studies, comparisons of outcomes between exposed and nonexposed subjects must account for the effect of confounders. Propensity score methods have been widely used to control for confounding when estimating exposure effect. Previous studies have shown that conditioning on the propensity score results in biased estimation of conditional odds ratio and hazard ratio. However, research is lacking on the performance of propensity score methods for covariate adjustment when estimating the …