Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 30 of 86
Full-Text Articles in Statistical Models
Multi-Level Small Area Estimation Based On Calibrated Hierarchical Likelihood Approach Through Bias Correction With Applications To Covid-19 Data, Nirosha Rathnayake
Theses & Dissertations
Small area estimation (SAE) has been widely used in a variety of applications to draw estimates in geographic domains represented as a metropolitan area, district, county, or state. The direct estimation methods provide accurate estimates when the sample size of study participants within each area unit is sufficiently large, but it might not always be realistic to have large sample sizes of study participants when considering small geographical regions. Meanwhile, high dimensional socio-ecological data exist at the community level, providing an opportunity for model-based estimation by incorporating rich auxiliary information at the individual and area levels. Thus, it is critical …
Influence Of Some Climatic Elements On Radon Concentration In Saeva Dupka Cave, Bulgaria, Peter Nojarov, Petar Stefanov, Karel Turek
International Journal of Speleology
This study reveals the influence of some climatic elements on radon concentration in Saeva Dupka Cave, Bulgaria. The research is based mainly on statistical methods. Radon concentration in the cave is determined by two main mechanisms. The first is penetration of radon from the soil and rocks around the cave (present all year round, but playing the leading role during the warm half of the year). The second is thermodynamic exchange of air between the inside of the cave and the outside atmosphere (cold half of the year). Climatic factors that affect radon concentration in the cave are temperatures (air, …
A Management Strategy Evaluation Of The Impacts Of Interspecific Competition And Recreational Fishery Dynamics On Vermilion Snapper (Rhomboplites Aurorubens) In The Gulf Of Mexico, Megumi C. Oshima
Dissertations
In the Gulf of Mexico (GOM), Vermilion Snapper (Rhomboplites aurorubens) are believed to compete with Red Snapper directly for prey and habitat. The two species share similar diets and have significant spatial overlap in the Gulf. Red Snapper are thought to be the dominant competitor, forcing Vermilion Snapper to feed on less nutritious prey when local resources are depleted. In addition to ecological pressures, GOM Vermilion Snapper support substantial commercial and recreational fisheries. Over the past decade, recreational landings have steadily increased, reaching a historical high in 2018. One cause may be stricter regulations for similar target species such as …
Development Of A Statistical Model To Predict Materials’ Unit Prices For Future Maintenance And Rehabilitation In Highway Life Cycle Cost Analysis, Changmo Kim, Ghazan Khan, Brent Nguyen, Emily L. Hoang
Mineta Transportation Institute Publications
The main objectives of this study are to investigate the trends in primary pavement materials’ unit price over time and to develop statistical models and guidelines for using predictive unit prices of pavement materials instead of uniform unit prices in life cycle cost analysis (LCCA) for future maintenance and rehabilitation (M&R) projects. Various socio-economic data were collected for the past 20 years (1997–2018) in California, including oil price, population, government expenditure in transportation, vehicle registration, and other key variables, in order to identify factors affecting pavement materials’ unit price. Additionally, the unit price records of the popular pavement materials were …
Gene Set Testing By Distance Correlation, Sho-Hsien Su
Graduate Theses and Dissertations
Pathways are the functional building blocks of complex diseases such as cancers. Pathway-level studies may provide insights into some important biological processes. Gene set testing is an important tool for studying the differential expression of a gene set between two groups, e.g., cancer vs. normal. The differential expression of a gene set could be due to a difference in mean, variability, or both. However, most existing gene set tests target only the mean difference and overlook other types of differential expression. In this thesis, we propose to use the recently developed distance correlation for gene set testing. To assess the …
Incorporating Shear Resistance Into Debris Flow Triggering Model Statistics, Noah J. Lyman
Master's Theses
Several regions of the Western United States use statistical binary classification models to predict and manage debris flow initiation probability after wildfires. As the occurrence of wildfires and high-intensity rainfall events increases, so does the frequency with which development occurs in the steep, mountainous terrain where these events arise. This intersection brings an increasing need to derive improved results from existing models, or to develop new models, to reduce the economic and human impacts that debris flows may bring. Any development or change to these models could also theoretically increase the ease of collection, processing, and …
Quantifying The Simultaneous Effect Of Socio-Economic Predictors And Build Environment On Spatial Crime Trends, Alfieri Daniel Ek
Graduate Theses and Dissertations
Proper allocation of law enforcement agencies falls under the umbrella of risk terrain modeling (Caplan et al., 2011, 2015; Drawve, 2016), which primarily focuses on crime prediction and prevention by spatially aggregating response and predictor variables of interest. Although mental health incidents demand resource allocation from law enforcement agencies and the city, relatively little emphasis has been placed on building spatial models for mental health incidents. Analyzing spatial mental health events in Little Rock, AR from 2015 to 2018, we found evidence of spatial heterogeneity via Moran’s I statistic. A spatial modeling framework is then built using generalized linear models, …
Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das
Electronic Theses and Dissertations
Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as Microarrays, bulk RNA-Sequencing, single cell RNA-Sequencing, etc. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Further, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …
Statistical Methods With A Focus On Joint Outcome Modeling And On Methods For Fire Science, Da Zhong Xi
Electronic Thesis and Dissertation Repository
Understanding the dynamics of wildfires contributes significantly to the development of fire science. Challenges in the analysis of historical fire data include defining fire dynamics within existing statistical frameworks, modeling the duration and size of fires as joint outcomes, identifying how fires are grouped into clusters of subpopulations, and assessing the effect of environmental variables in different modeling frameworks. We develop novel statistical methods to consider outcomes related to fire science jointly. These methods address these challenges by linking univariate models for separate outcomes through shared random effects, an approach referred to as joint modeling. Comparisons with existing …
Stochastic Analysis And Statistical Inference For Seir Models Of Infectious Diseases, Andrés Ríos-Gutiérrez, Viswanathan Arunachalam, Anuj Mubayi
Annual Symposium on Biomathematics and Ecology Education and Research
No abstract provided.
Stochastic Modeling Of Ovarian Follicle Growth In Adult Female Rats, Zhaozhi Li
Annual Symposium on Biomathematics and Ecology Education and Research
No abstract provided.
Statistical Modeling Of Private Sector Participation In Disaster Risk Reduction Data, Wupeng Yin
FIU Electronic Theses and Dissertations
The impacts of disaster on the private sector are inevitable, but their risks can be managed and reduced by preventively evaluative measures. Disaster risk reduction index (DRRI) and Disaster Experience (DE) variables were investigated in a survey study in six Western Hemisphere cities within the private sector of various business sizes. Our thesis built and evaluated 16 predictive models of DRRI with 36 categorical predictors and N = 1162 observations. Four statistical methods for linear regression and five for classification as well as seven machine learning methods were utilized. We also used stepwise selection and regulation methods for variable selection. …
Applying The Data: Predictive Analytics In Sport, Anthony Teeter, Margo Bergman
Access*: Interdisciplinary Journal of Student Research and Scholarship
The history of wagering predictions and their impact on wide-reaching disciplines such as statistics and economics dates to at least the 1700s. Predicting the outcomes of sports is a multibillion-dollar business that capitalizes on these tools but is in constant development with the addition of big data analytics methods. Sportsline.com, a popular website for fantasy sports leagues, provides odds predictions in multiple sports, produces proprietary computer models of both winning and losing teams, and provides specific point estimates. To test likely candidates for inclusion in these prediction algorithms, the authors developed a computer model, and test …
Interval Estimation Of Proportion Of Second-Level Variance In Multi-Level Modeling, Steven Svoboda
The Nebraska Educator: A Student-Led Journal
Physical, behavioral and psychological research questions often relate to hierarchical data systems. Examples of hierarchical data systems include repeated measures of students nested within classrooms, nested within schools and employees nested within supervisors, nested within organizations. Applied researchers studying hierarchical data structures should have an estimate of the intraclass correlation coefficient (ICC) for every nested level in their analyses because ignoring even relatively small amounts of interdependence is known to inflate Type I error rate in single-level models. Traditionally, researchers rely upon the ICC as a point estimate of the amount of interdependency in their data. Recent methods utilizing an …
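As a rough illustration of the ICC point estimate mentioned in the abstract above, the following minimal sketch computes the one-way random-effects ICC for balanced two-level data. It is not taken from the article: the function name `icc_oneway` and the toy data are hypothetical, and the ANOVA-based estimator is only one of several common ICC definitions.

```python
def icc_oneway(groups):
    """One-way random-effects ICC for balanced groups (ANOVA estimator).

    `groups` is a list of equally sized lists, one per level-2 unit
    (for example, one list of student scores per classroom).
    """
    n = len(groups)        # number of level-2 units
    k = len(groups[0])     # observations per unit
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 balanced groups of size >= 2")

    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / k for g in groups]

    # Between-group and within-group mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))

    # Note: raises ZeroDivisionError if every observation is identical.
    return (msb - msw) / (msb + (k - 1) * msw)


# Toy data: members of a group are identical, so all variance is between groups.
print(icc_oneway([[1, 1], [2, 2], [3, 3]]))
```

A value near 1 means most of the variance lies between groups; a value near 0 means group membership explains little. The estimator can be negative when the within-group variance exceeds the between-group variance.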
Cost Estimating Using A New Learning Curve Theory For Non-Constant Production Rates, Dakotah Hogan, John J. Elshaw, Clay M. Koschnick, Jonathan D. Ritschel, Adedeji B. Badiru, Shawn M. Valentine
Faculty Publications
Traditional learning curve theory assumes a constant learning rate regardless of the number of units produced. However, a collection of theoretical and empirical evidence indicates that learning rates decrease as more units are produced in some cases. These diminishing learning rates cause traditional learning curves to underestimate required resources, potentially resulting in cost overruns. A diminishing learning rate model, namely Boone’s learning curve, was recently developed to model this phenomenon. This research confirms that Boone’s learning curve systematically reduced error in modeling observed learning curves using production data from 169 Department of Defense end-items. However, high amounts of variability in …
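To make the "constant learning rate" assumption in the abstract above concrete, here is a minimal sketch of the classic power-law unit learning curve, in which every doubling of cumulative output multiplies unit cost by the same factor. The function name `unit_cost` and the numbers are illustrative assumptions. The sketch does not implement Boone's learning curve, whose form is not given in the abstract.

```python
import math


def unit_cost(first_unit_cost, unit_number, learning_rate):
    """Cost of the given unit under a constant-rate power-law learning curve.

    learning_rate is the factor applied per doubling of output,
    e.g. 0.8 means each doubling cuts unit cost to 80% of its prior value.
    """
    if unit_number < 1 or not (0 < learning_rate <= 1):
        raise ValueError("unit_number must be >= 1 and 0 < learning_rate <= 1")
    slope = math.log2(learning_rate)  # exponent b in cost = a * x**b
    return first_unit_cost * unit_number ** slope


# With an assumed 80% curve, the cost falls by the same 20% at every doubling.
for x in (1, 2, 4, 8):
    print(x, round(unit_cost(100.0, x, 0.8), 2))
```

Because the exponent is fixed, the curve keeps predicting the same proportional savings no matter how many units have been built, which is the assumption the abstract says can lead to underestimated resources.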
A Differential Geometry-Based Machine Learning Algorithm For The Brain Age Problem, Justin Asher, Khoa Tan Dang, Maxwell Masters
The Journal of Purdue Undergraduate Research
No abstract provided.
Predicting Postoperative Delirium Risk For Intracranial Surgery: A Statistical Machine Learning Approach, Juliet Aygun, Alaina Bartfeld, Sahana Rayan
The Journal of Purdue Undergraduate Research
No abstract provided.
Renewable-Energy Resources, Economic Growth And Their Causal Link, Yiyang Chen
Electronic Thesis and Dissertation Repository
This thesis examines the presence and strength of predictive causal relationship between renewable energy prices and economic growth. We look for evidence by investigating the cases of Norway, New Zealand, and Canada’s two provinces of Alberta and Ontario. The usual vector autoregressive model (VAR) and its various improved versions still assume constant parameters over time. We devise a Markov-switching VAR (MS-VAR) model in order to accommodate the observed time-dependent causal relation changes. Our proposed modelling approach is induced by the hidden Markov model methodologies in terms of an online parameter estimation through recursive filtering. The parameters of the MS-VAR model are governed by …
Machine Learning Approaches For Improving Prediction Performance Of Structure-Activity Relationship Models, Gabriel Idakwo
Dissertations
In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) are used to correlate the structure of a molecule to its biological property in drug design and toxicological studies. In this body of work, I started with two in-depth reviews into the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies.
First, to improve the prediction accuracy of learning …
A Geochemical And Statistical Investigation Of The Big Four Springs Region In Southern Missouri, Jordan Jasso Vega
MSU Graduate Theses
The Big Four Springs region hosts four major first-order magnitude springs in southern Missouri and northern Arkansas. These springs are Big Spring (Carter County, MO), Greer Spring (Oregon County, MO), Mammoth Spring (Fulton County, AR), and Hodgson Mill Spring (Ozark County, MO). Based on historic dye traces and hydrogeological investigations, these springs drain an area of approximately 1500 square miles and collectively discharge an average of 780 million gallons of water per day. The rocks from youngest to oldest that are found in Big Four Springs region are the Cotter and Jefferson City Dolomite (Ordovician), Roubidoux Formation (Ordovician), Gasconade Dolomite …
D-Vine Pair-Copula Models For Longitudinal Binary Data, Huihui Lin
Mathematics & Statistics Theses & Dissertations
Dependent longitudinal binary data are prevalent in a wide range of scientific disciplines, including healthcare and medicine. A popular method for analyzing such data is the multivariate probit (MP) model. The motivation for this dissertation stems from the fact that the MP model can fail even when the binary correlations are within the feasible range, because the underlying correlation matrix of the latent variables in the MP model may not be positive definite. In this dissertation, we study alternatives that are based on D-vine pair-copula models. We consider both the serial dependence modeled by the first order autoregressive (AR(1)) and …
Lectures On Mathematical Computing With Python, Jay Gopalakrishnan
PDXOpen: Open Educational Resources
This open resource is a collection of class activities for use in undergraduate courses aimed at teaching mathematical computing, and computational thinking in general, using the Python programming language. It was developed for a second-year course (MTH 271) revamped for a new undergraduate program in data science at Portland State University. The activities are designed to guide students in using Python modules effectively for scientific computation, data analysis, and visualization.
Statistical Methodology To Establish A Benchmark For Evaluating Antimicrobial Resistance Genes Through Real Time Pcr Assay, Enakshy Dutta
Department of Statistics: Dissertations, Theses, and Student Work
Novel diagnostic tests are usually compared with gold standard tests for evaluating diagnostic accuracy. For assessing antimicrobial resistance (AMR) to bovine respiratory disease (BRD) pathogens, phenotypic broth microdilution method is used as gold standard (GS). The objective of the thesis is to evaluate the optimal cycle threshold (Ct) generated by real-time polymerase chain reaction (rtPCR) to genes that confer resistance that will translate to the phenotypic classification of AMR. Data from two different methodologies are assessed to identify Ct that will discriminate between resistance (R) and susceptibility (S). First, the receiver operating characteristic (ROC) curve was used to determine the …
Latent Class Models For At-Risk Populations, Shuaimin Kang
Doctoral Dissertations
Clustering Network Tree Data From Respondent-Driven Sampling With Application to Opioid Users in New York City: There is great interest in finding meaningful subgroups of attributed network data. There are many available methods for clustering complete networks. Unfortunately, much network data is collected through sampling and is therefore incomplete. Respondent-driven sampling (RDS) is a widely used method for sampling hard-to-reach human populations based on tracing links in the underlying unobserved social network. The resulting data therefore have a tree structure representing a sub-sample of the network, along with many nodal attributes. In this paper, we introduce an approach to adjust mixture models …
Causal Inference And Prediction On Observational Data With Survival Outcomes, Xiaofei Chen
Statistical Science Theses and Dissertations
Infants with hypoplastic left heart syndrome require an initial Norwood operation, followed some months later by a stage 2 palliation (S2P). The timing of S2P is critical for the operation’s success and the infant’s survival, but the optimal timing, if one exists, is unknown. We attempt to estimate the optimal timing of S2P by analyzing data from the Single Ventricle Reconstruction Trial (SVRT), which randomized patients between two different types of Norwood procedure. In the SVRT, the timing of the S2P was chosen by the medical team; thus with respect to this exposure, the trial constitutes an observational study, and …
Working Children On Java Island 2017, Yuniarti
English Language Institute
Children's wellbeing has become a global concern because many children are engaged in the labor force. A small area estimation (SAE) technique, EBLUP under the Fay-Herriot model, is employed to estimate the number of working children in the regencies of Java Island. Statistics have been disaggregated by geographical location (urban/rural) and gender. These statistics are required by the government as the basis for policy making.
Mathematical Modeling: Instructor And Student Resources, Marnie Phipps, Patty Wagner
Mathematics Ancillary Materials
This collection of student and instructor materials for Mathematical Modeling contains lesson plans, lecture slides, homework, learning goals, and student notes for the following major topics:
- Linear Functions
- Quadratic Functions
- Exponential Functions
- Logarithmic Functions
This is a materials update for a collection of materials created for a Round Nine ALG Textbook Transformation Grant.
Structural Analysis Of The Multifunctional Spoiie Regulatory Protein Of Clostridioides Difficile., Blythe Emily Bunkers
Graduate Theses and Dissertations
Clostridioides (formerly Clostridium) difficile is a medically relevant pathogen pertinent to infectious disease research. C. difficile is distinctly known for its ability to produce two toxins, enterotoxin A and cytotoxin B, and for its propensity to colonize the mammalian gastrointestinal tract. It is known that metabolism is tightly correlated with sporulation in endospore producers such as C. difficile, but an interesting and novel regulatory relationship found by the Ivey lab has yet to be understood. The relationship explored in this study is between the sporulation factor SpoIIE and an ABC peptide transporter, app, whose expression SpoIIE represses. In this study, two …
Italian Sociologists: A Community Of Disconnected Groups, Aliakbar Akbaritabar, Vincent Traag, Alberto Caimo, Flaminio Squazzoni
Articles
Examining coauthorship networks is key to studying scientific collaboration patterns and structural characteristics of scientific communities. Here, we studied coauthorship networks of sociologists in Italy, using temporal and multi-level quantitative analysis. By looking at publications indexed in Scopus, we detected research communities among Italian sociologists. We found that Italian sociologists are fractured into many disconnected groups. The giant connected component of Italian sociology could be split into five main groups with a mixture of three main disciplinary topics: sociology of culture and communication (present in two groups), economic sociology (present in three groups) and general sociology (present in three …
Extensions Of Classification Method Based On Quantiles, Yuanhao Lai
Electronic Thesis and Dissertation Repository
This thesis deals with the problem of classification in general, with a particular focus on heavy-tailed or skewed data. The classification problem is first formalized by statistical learning theory, and several important classification methods are reviewed. Among these, the distance-based classifiers, including the median-based classifier and the quantile-based classifier (QC), are especially useful for heavy-tailed or skewed inputs. However, QC is limited by its model capacity and by the issue of high-dimensional accumulated errors. The objective of this study is to investigate more general methods while retaining the merits of QC.
We present four extensions of QC, which appear in chronological …