Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Articles 121 - 150 of 1371

Full-Text Articles in Statistical Models

Analyzing Relationships With Machine Learning, Oscar Ko Feb 2023

Dissertations, Theses, and Capstone Projects

Procedurally, this project aims to take a dataset, analyze it, and offer insights to the audience in an easy-to-digest format. Conceptually, this project will seek to explore questions like: “Do couples that meet through online dating or dating apps have higher or lower quality relationships?”, “Can any features in this dataset help predict how a subject would rate their relationship quality?”, and “What other insights can I derive from using machine learning for exploratory analysis?” The intended audience for this project is anyone interested in romantic relationships or machine learning.

The dataset is from a Stanford University survey, “How Couples …


Biasing Estimator To Mitigate Multicollinearity In Linear Regression Model, Abdulrasheed Bello Badawaire, Issam Dawoud, Adewale Folaranmi Lukman, Victoria Laoye, Arowolo Olatunji Jan 2023

Al-Bahir Journal for Engineering and Pure Sciences

A new two-parameter estimator was developed to combat the threat of multicollinearity for the linear regression model. Some necessary and sufficient conditions for the dominance of the proposed estimator over ordinary least squares (OLS) estimator, ridge regression estimator, Liu estimator, KL estimator, and some two-parameter estimators are obtained in the matrix mean square error sense. Theory and simulation results show that, under some conditions, the proposed two-parameter estimator consistently dominates other estimators considered in this study. The real-life application result follows suit.
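As a rough illustration of why biased estimators are considered under multicollinearity, the sketch below compares OLS with the ridge regression estimator, one of the comparison estimators the abstract names. It is not the proposed two-parameter estimator, which the abstract does not specify; the simulated data, the function names `ols` and `ridge`, and the choice `k=1.0` are all assumptions made for illustration.

```python
import numpy as np

def ols(X, y):
    # Ordinary least squares via a numerically stable solver.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, k):
    # Ridge estimator: (X'X + kI)^{-1} X'y, with biasing parameter k > 0.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Hypothetical simulated data: two nearly collinear predictors.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)  # almost a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)

beta_ols = ols(X, y)
beta_ridge = ridge(X, y, k=1.0)  # k chosen arbitrarily for the sketch

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

With nearly collinear columns, the OLS coefficients tend to be large and unstable, while the ridge coefficients are shrunk toward zero at the cost of some bias.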


On Partially Observed Tensor Regression, Dinara Miftyakhetdinova Jan 2023

Major Papers

Tensor data is widely used in modern data science. The interest lies in identifying and characterizing the relationship between tensor datasets and external covariates. These datasets, though, are often incomplete. An efficient nonconvex alternating updating algorithm proposed by J. Zhou et al. in the paper "Partially Observed Dynamic Tensor Response Regression" provides a novel approach. The algorithm handles the problem of unobserved entries by solving an optimization problem of a loss function under the low-rankness, sparsity, and fusion constraints. This analysis aims to understand in detail the proposed algorithms and their theoretical proofs with, potentially, dropping some of the assumptions …


Uniformity Test Based On The Empirical Bernstein Distribution, Ran Sun Jan 2023

Major Papers

In this paper, we first review the origin of the Bernstein polynomial and its various applications. Then we review the importance of goodness-of-fit tests, especially the uniformity test, and we examine many of the test statistics proposed so far. After that we suggest two new statistics for testing uniformity. These two statistics are of the Kolmogorov-Smirnov type and the Cramér-von Mises type, respectively. We also embed the Bernstein polynomial into those test types to take advantage of its strong approximation performance. Finally, we run a Monte Carlo simulation to compare the performance of our statistics to those …


Potential Alzheimer's Disease Plasma Biomarkers, Taylor Estepp Jan 2023

Theses and Dissertations--Epidemiology and Biostatistics

In this series of studies, we examined the potential of a variety of blood-based plasma biomarkers for the identification of Alzheimer's disease (AD) progression and cognitive decline. With the end goal of studying these biomarkers via mixture modeling, we began with a literature review of the methodology. An examination of the biomarkers with demographics and other health factors found evidence of minimal risk of confounding along the causal pathway from biomarkers to cognitive performance. Further study examined the usefulness of linear combinations of biomarkers, achieved via partial least squares (PLS) analysis, as predictors of various cognitive assessment scores and clinical …


Modeling Growth And Stress Factors For Converted Silvopasture Systems In The Missouri Ozarks, Bailee N. Suedmeyer Jan 2023

MSU Graduate Theses

Silvopasture systems are becoming increasingly popular among sustainable agriculture ranchers, due to growing knowledge of the benefits to cattle and the ability to grow cool season grasses beneath the canopy. This project focuses on the forest crop aspect of silvopasture systems, from monitoring the health of the trees over time to recommendations for thinning management that keep the site functioning as viable silvopasture. The study site consists of five acres of upland hardwood forest area in Southern Missouri with 18 monumented fixed area plots. Aerial and ground data were collected at each plot throughout the growing season, along with …


High Dimensional Data Analysis: Variable Screening And Inference, Lei Fang Jan 2023

Theses and Dissertations--Statistics

This dissertation focuses on the problem of high dimensional data analysis, which arises in many fields including genomics, finance, and social sciences. In such settings, the number of features or variables is much larger than the number of observations, posing significant challenges to traditional statistical methods.

To address these challenges, this dissertation proposes novel methods for variable screening and inference. The first part of the dissertation focuses on variable screening, which aims to identify a subset of important variables that are strongly associated with the response variable. Specifically, we propose a robust nonparametric screening method to effectively select the predictors …


Finite Mixtures Of Mean-Parameterized Conway-Maxwell-Poisson Models, Dongying Zhan Jan 2023

Theses and Dissertations--Statistics

For modeling count data, the Conway-Maxwell-Poisson (CMP) distribution is a popular generalization of the Poisson distribution due to its ability to characterize data over- or under-dispersion. While the classic parameterization of the CMP has been well-studied, its main drawback is that it does not directly model the mean of the counts. This is mitigated by using a mean-parameterized version of the CMP distribution. In this work, we are concerned with the setting where count data may comprise subpopulations, each possibly having varying degrees of data dispersion. Thus, we propose a finite mixture of mean-parameterized CMP distributions. An …


Application Of Sentiment Analysis And Machine Learning Techniques To Predict Daily Cryptocurrency Price Returns, Edward Wu Jan 2023

CMC Senior Theses

This paper examines the effects of social media sentiment relating to Bitcoin on the daily price returns of Bitcoin and other popular cryptocurrencies by utilizing sentiment analysis and machine learning techniques to predict daily price returns. Many investors think that social media sentiment affects cryptocurrency prices. However, the results of this paper find that social media sentiment relating to Bitcoin does not add significant predictive value to forecasting daily price returns for each of the six cryptocurrencies used for analysis and that machine learning models that do not assume linearity between the current day price return and previous daily price …


Shallow Water Coral Distribution And Its Response To Climate Change, Amaury De Jesus Jan 2023

Dissertations and Theses

Shallow water corals are one of the main reef-building organisms that secrete carbonates as their skeletons, and therefore, are one of the major sinks of CO2 in the ocean. These reef builders are also very crucial to marine environments and human society. As the global energy demand continues rising, fossil fuel burning increases at a faster pace despite the increase in energy supply using clean and renewable energy. The increase of CO2 in the atmosphere has been shown to exacerbate global warming and may cause ocean acidification, threatening the habitat of shallow-water corals. Many recent observations show alarming signs of …


Forecasting Remission Time Of A Treatment Method For Leukemia As An Application To Statistical Inference Approach, Ahmed Galal Atia, Mahmoud Mansour, Rashad Mohamed El-Sagheer, B. S. El-Desouky Jan 2023

Basic Science Engineering

In this paper, we investigate whether the Weibull-Linear Exponential distribution (WLED) is a good fit for a real clinical dataset. These data represent the duration of remission achieved by a certain drug used in the treatment of leukemia for a group of patients. The statistical inference approach is used to estimate the parameters of the WLED from the fitted data. The estimated parameters are used to evaluate the survival and hazard functions and hence to assess the treatment method by forecasting the remission times of patients. A two-sample prediction approach has been applied to …


Carnivore And Ungulate Occurrence In A Fire-Prone Region, Sara J. Moriarty-Graves Jan 2023

Cal Poly Humboldt theses and projects

Increasing fire size and severity in the western United States causes changes to ecosystems, species’ habitat use, and interspecific interactions. Wide-ranging carnivore and ungulate mammalian species and their interactions may be influenced by an increase in fire activity in northern California. Depending on the fire characteristics, ungulates may benefit from burned habitat due to an increase in forage availability, while carnivore species may be differentially impacted, but ultimately driven by bottom-up processes from a shift in prey availability. I used a three-step approach to estimate the single-species occupancy of four large mammal species: mountain lion (Puma concolor), coyote …


Exploring Information Leakage In Historical Stock Market Data, Edison Hua Jan 2023

Dissertations and Theses

Information leakage is a major concern for traders who want to execute large orders without affecting the market price. In this paper, we explore the sources and effects of information leakage in historical stock market data using various methods and metrics. We first define information leakage as a pattern caused by a trader that would otherwise not occur without the trader’s activity. Using historical data, the direct impact of a potential large trade cannot be measured, but we consider a minimal impact large trade to be one that minimizes changes to the established trading data. We then analyze how information …


Utilizing Markov Chains To Estimate Allele Progression Through Generations, Ronit Gandhi Jan 2023

Honors Theses

All populations display patterns in allele frequencies over time. Some alleles cease to exist, while some grow to become the norm. These frequencies can shift or stay constant based on the conditions the population lives in. If in Hardy-Weinberg equilibrium, the allele frequencies stay constant. Most populations, however, have bias from environmental factors, sexual preferences, other organisms, etc. We propose a stochastic Markov chain model to study allele progression across generations. In such a model, the allele frequencies in the next generation depend only on the frequencies in the current one.

We use this model to track a recessive allele …
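The Markov property described in this abstract (next-generation allele frequencies depend only on the current ones) can be illustrated with the classical Wright-Fisher model. This is a standard textbook chain swapped in for illustration, not necessarily the model used in the thesis; the population size `n_copies = 20`, the starting count of 5, and the function name `wright_fisher_matrix` are assumptions.

```python
import math
import numpy as np

def wright_fisher_matrix(n_copies):
    """Transition matrix P[i, j] = Pr(j copies next generation | i copies now).

    Each of the n_copies gene copies in the next generation is drawn
    independently from the current pool, so j is Binomial(n_copies, i / n_copies).
    """
    P = np.zeros((n_copies + 1, n_copies + 1))
    for i in range(n_copies + 1):
        p = i / n_copies  # current allele frequency
        for j in range(n_copies + 1):
            P[i, j] = math.comb(n_copies, j) * p**j * (1 - p) ** (n_copies - j)
    return P

n_copies = 20                      # hypothetical number of gene copies
P = wright_fisher_matrix(n_copies)

start = np.zeros(n_copies + 1)
start[5] = 1.0                     # begin with exactly 5 copies of the allele

# Distribution over allele counts after 10 generations.
dist = start @ np.linalg.matrix_power(P, 10)

print("Pr(allele lost after 10 generations): ", dist[0])
print("Pr(allele fixed after 10 generations):", dist[-1])
```

In this neutral sketch the states 0 and `n_copies` are absorbing, which matches the observation that some alleles cease to exist while others become the norm. Selection or other biases would require modifying the per-generation sampling probability.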


Classification Of Adult Income Using Decision Tree, Roland Fiagbe Jan 2023

Data Science and Data Mining

The decision tree is a commonly used data mining methodology for performing classification tasks. It is a tree-based supervised machine learning algorithm that classifies or makes predictions based on how a sequence of previous questions is answered. Generally, the decision tree algorithm categorizes data into branch-like segments that develop into a tree that contains a root, nodes, and leaves. This project seeks to explore the decision tree methodology and apply it to the Adult Income dataset from the UCI Machine Learning Repository, to determine whether a person makes over 50K per year and determine the necessary factors that improve …


Network Intrusion Detection Using Deep Reinforcement Learning, Hamed T. Sanusi Jan 2023

Electronic Theses and Dissertations

This thesis delves into cybersecurity by applying Deep Reinforcement Learning (DRL) in network intrusion detection. One advantage of DRL is the ability to adapt to changing network conditions and evolving attack methods, making it a promising solution for addressing the challenges involved in intrusion detection. The thesis will also discuss the obstacles and benefits of using classification methods for network intrusion detection and the need for high-quality training data. To train and test our proposed method, the NSL-KDD dataset was used and then adjusted by converting it from a multi-class to a binary classification problem, achieved by joining all attacks into one class. …


The Influence Of Urban Forms And Street Infrastructure On Pedestrian-Motorist Collisions, Taylor J. Foreman Jan 2023

Electronic Theses and Dissertations

Unwalkable cities are afflicted by serious issues such as increasing rates of pedestrian traffic accidents, public health concerns, and the denied right to have an accessible city. This study examines how different types of urban forms and street infrastructure contribute to the prevalence of traffic accidents in two major metropolitan cities in the United States: Atlanta, Georgia, and Boston, Massachusetts. This study utilizes geospatial analysis through the Average Nearest Neighbor and Optimized Hot Spot Analysis tools to determine the spatial distribution of traffic accidents throughout both cities. Additionally, statistical tests were conducted to explore the relationships between the number of …


Enhancing Control Room Operator Decision Making: An Application Of Dynamic Influence Diagrams In Formaldehyde Manufacturing, Joseph Mietkiewicz, Anders L. Madsen Jan 2023

Articles

In today’s rapidly evolving industrial landscape, control room operators must grapple with an ever-growing array of tasks and responsibilities. One major challenge facing these operators is the potential for task overload, which can lead to decision fatigue and increased reliance on cognitive biases. To address this issue, we propose the use of dynamic influence diagrams (DID) as the core of our decision support system. By monitoring the process over time and identifying anomalies, DIDs can recommend the most effective course of action based on a probabilistic assessment of future outcomes. Instead of letting the operator choose or search for the …


Graph Learning On Multi-Modality Medical Data To Generate Clinical Predictions, Justin Jiang Jan 2023

HMC Senior Theses

There exist petabytes of data pertaining to medical visits – everything from blood pressure recordings to X-rays and doctors’ notes. Electronic health records (EHRs) organize this data into databases, providing an exciting opportunity for machine learning researchers to dive deeper into analyzing human health. There already exist machine learning models that aim to expedite the process of hospital visits; for example, summary models can digest a patient’s medical history and highlight certain parts of their past that merit attention. The current frontier of medical machine learning is combining the various formats of data to generate a clinical prediction – much like …


Statistical Intervals For Neural Network And Its Relationship With Generalized Linear Model, Sheng Yuan Jan 2023

Theses and Dissertations--Statistics

Neural networks have experienced widespread adoption and have become integral in cutting-edge domains like computer vision, natural language processing, and various contemporary fields. However, addressing the statistical aspects of neural networks has been a persistent challenge, with limited satisfactory results. In my research, I focused on exploring statistical intervals applied to neural networks, specifically confidence intervals and tolerance intervals. I employed variance estimation methods, such as direct estimation and resampling, to assess neural networks and their performance under outlier scenarios. Remarkably, when outliers were present, the resampling method with infinitesimal jackknife estimation yielded confidence intervals that closely aligned with nominal …


Statistical Models For Decision-Making In Professional Soccer, Sean Hellingman Jan 2023

Theses and Dissertations (Comprehensive)

As soccer is widely regarded as the most popular sport in the world, there is high interest in methods of improving team performance. There are many ways teams and individual athletes can influence their own performance during competition. This thesis focuses on developing statistical methodologies for improving competition-based decision-making in soccer, so as to allow professional soccer teams to make better-informed decisions regarding player selection and in-game decision-making.

To properly capture the dynamic actions of professional soccer, Markov chains with increasing complexity are proposed. These models allow for the inclusion of potential changes in the process caused by goals …


Aircraft Damage Classification By Using Machine Learning Methods, Tüzün Tolga İnan Jan 2023

International Journal of Aviation, Aeronautics, and Aerospace

Safety is the most significant factor that has affected incidents (non-fatal) and accidents (fatal) in civil aviation history related to scheduled flights. In the history of scheduled flights, the total number of incidents and accidents up to 2022 is 1988. This study considers the 677 of them that occurred since 11 September 2001. The purpose of this study is to reveal the factors that can classify the type of aircraft damage (none, minor, or substantial) across these incidents and accidents. ML algorithms with different configurations are applied for the classification process. The RFE and PCA are used to find the most …


The Birds And The Trees: Quantifying The Drivers Of Whitebark Pine Decline And Clark's Nutcracker Habitat Use In Glacier National Park, Vladimir Kovalenko Jan 2023

Graduate Student Theses, Dissertations, & Professional Papers

Whitebark pine (Pinus albicaulis), recently listed as threatened under the Endangered Species Act, is in steep decline in Glacier National Park, Montana, USA due to the non-native pathogen Cronartium ribicola, causal agent of the fatal disease white pine blister rust. A sample of the park’s population suggests that approximately 70 percent of whitebark pines have died, while 65 percent of the remaining trees are infected. Using landscape and climate variables, we show how geographic location, elevation, aspect, solar radiation, relative humidity, and snowpack interact with tree diameter to affect mortality, disease incidence, cone production, and regeneration. We also examine how …


Applications Of Transfer Learning From Malicious To Vulnerable Binaries, Sean Patrick Mcnulty Jan 2023

Graduate Student Theses, Dissertations, & Professional Papers

Malware detection and vulnerability detection are important cybersecurity tasks. Previous research has successfully applied a variety of machine learning methods to both. However, despite their potential synergies, previous research has yet to unite these two tasks. Given the recent success of transfer learning in many domains, such as language modeling and image recognition, this thesis investigated the use of transfer learning to improve vulnerability detection. Specifically, we pre-trained a series of models to detect malicious binaries and used the weights from those models to kickstart the detection of vulnerable binaries. In our study, we also investigated five different data representations …


An Assessment Of "Long-Thin" Airline Routes: Network Structure And Emissions Implications For Environmental Policy, Porter Burns Jan 2023

All Master's Theses

The purpose of this research was to define, map, and quantify the network and environmental implications of “long-thin” routes (LTRs) – a route structure that has been discussed in the aviation industry but not formally studied in literature. LTRs were defined through the use of global OAG scheduling data from 1998 to 2018 to identify trends in air traffic growth and network dynamics. Flights were separated into seven aircraft class sizes (e.g., 75–150 seats, 150–225 seats) to measure LTRs at multiple scales. Routes were considered “long” if the stage length was at or above the 75th percentile in each …


Statistical Methods For Gene Selection And Genetic Association Studies, Xuewei Cao Jan 2023

Dissertations, Master's Theses and Master's Reports

This dissertation includes five chapters. A brief description of each chapter follows.

In Chapter One, we propose a signed bipartite genotype and phenotype network (GPN) by linking phenotypes and genotypes based on the statistical associations. It provides a new insight to investigate the genetic architecture among multiple correlated phenotypes and explore where phenotypes might be related at a higher level of cellular and organismal organization. We show that multiple phenotypes association studies by considering the proposed network are improved by incorporating the genetic information into the phenotype clustering.

In Chapter Two, we first illustrate the proposed GPN …


The Impact Of Subjective Risk Analysis On Real Estate Prices In The Nisqually Region Following The 2001 Nisqually Earthquake, Ryan Espedal Jan 2023

All Master's Theses

Earthquakes are an environmental hazard that poses great risks to communities almost every day. With earthquakes, the main cause of concern is physical destruction of property; however, there are also psychological effects that are researched and discussed much less. In 2001, the Nisqually area of western Washington experienced a substantial earthquake that produced minimal physical damage but caused a significant decrease in real estate prices. Studying single-family homes from 1986 to 2012, this research utilizes hedonic property models to measure the change in consumers’ subjective risk calculations with reference to real estate purchases after the Nisqually earthquake, measure the relationship between earthquake …


Stochastic Optimization To Reduce Aircraft Taxi-In Time At IGIA, New Delhi, Rajib Das, Saileswar Ghosh, Rajendra Desai, Pijus Kanti Bhuin, Stuti Agarwal Jan 2023

International Journal of Aviation, Aeronautics, and Aerospace

Because the arrival times of flights are uncertain, pre-scheduled allocation of runways and stands, followed by first-come-first-served treatment, results in a sub-optimal allocation of runways and stands. This is the prime reason for the unusual delays in taxi-in times at IGIA, New Delhi.

We simulated the arrival pattern of aircraft and utilized stochastic optimization to arrive at the best runway-stands allocation for a day. Optimization is done using a GRG Non-Linear algorithm in the Frontline Systems Analytic Solver platform. We applied this model to eight representative scenarios of two different days. Our results show that without …


Bayesian Structural Time Series Methods For Modeling Cattle Body Temperature In Heat-Stressed Animals, Lacey Quandt Jan 2023

Murray State Theses and Dissertations

Climate change has had devastating effects globally, most commonly talked about during natural disasters and rising temperatures. Notably, the climate concern is turning towards agriculture and livestock. With rising temperatures, the prolonged amount of heat stress put on animals, specifically cattle, is becoming more apparent. Heat stress has been linked to a reduction in cattle growing and fattening, feed intake, productivity, reproduction, and fertility; increased heart rates and respiration; changes in behavior; and mortality in severe cases. There are abatement strategies put in place to lower heat stress in cattle, such as improvements in shading and cooling, nutritional management, and …


Study On Innovation Networks And Its Spillover Effect Of China’s New Energy Automobile Industry, Zhifei Xiong, Wenzhong Zhang Dec 2022

Bulletin of Chinese Academy of Sciences (Chinese Version)

The network spillover effect of knowledge has been playing an increasingly significant role in the development of industrial innovation. The urban cooperation matrix of China’s new energy automobile industry is built based on new energy automobile patent data, and the structure and evolution process of China’s new energy automobile industry are depicted. On this basis, the spatial Durbin model (SDM) is used to calculate the network spillover effect, and its results are compared with the results of spillover effect based on the relationship of spatial contiguity and distance of cities. The results show that the innovation activities of China’s new …