Open Access. Powered by Scholars. Published by Universities.^{®}
Statistics and Probability Commons^{™}
Articles 1-30 of 12116
Full-Text Articles in Statistics and Probability
"A Comparison Of Variable Selection Methods Using Bootstrap Samples From Environmental Metal Mixture Data", Paul-Yvann Djamen
Mathematics & Statistics ETDs
In this thesis, I studied a newly developed variable selection method, SODA, and three customarily used variable selection methods (LASSO, elastic net, and random forest) for environmental mixture data. The motivating datasets have neurodevelopmental status as responses and metal measurements and demographic variables as covariates. The challenges for variable selection include (1) many measured metal concentrations are highly correlated, (2) there are many possible ways of modeling interactions among the metals, (3) the relationships between the outcomes and explanatory variables are possibly nonlinear, and (4) the signal-to-noise ratio in the real data may be low. To compare these methods ...
Comparing Means Under Heteroscedasticity And Nonnormality: Further Exploring Robust Means Modeling, Alyssa Counsell, Robert Philip Chalmers, Robert A. Cribbie
Journal of Modern Applied Statistical Methods
Comparing the means of independent groups is a concern when the assumptions of normality and variance homogeneity are violated. Robust means modeling (RMM) was proposed as an alternative to ANOVA-type procedures when the assumptions of normality and variance homogeneity are violated. The purpose of this study is to compare the Type I error and power rates of RMM to the trimmed Welch procedure. A Monte Carlo study was used to investigate RMM and the trimmed Welch procedure under several conditions of nonnormality and variance heterogeneity. The results suggest that the trimmed Welch provides a better balance of Type I error ...
Inferences About The Probability Of Success, Given The Value Of A Covariate, Using A Nonparametric Smoother, Rand Wilcox
Journal of Modern Applied Statistical Methods
For a binary random variable Y, let p(x) = P(Y = 1 | X = x) for some covariate X. The goal of computing a confidence interval for p(x) is considered. In the logistic regression model, even a slight departure from the model that is difficult to detect via a goodness-of-fit test can yield inaccurate results. The accuracy of a confidence interval can deteriorate as the sample size increases. The goal is to suggest an alternative approach based on a smoother, which provides a more flexible approximation of p(x).
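To illustrate what "a smoother for p(x)" can mean, here is a minimal sketch of a Gaussian-kernel (Nadaraya-Watson) estimate of P(Y = 1 | X = x). This is a generic smoother chosen for illustration, not necessarily the one used in the paper, and it produces only a point estimate, not the confidence interval the abstract discusses. The function name, the toy data, and the bandwidth are all assumptions.

```python
import math

def smooth_success_probability(x0, xs, ys, bandwidth):
    """Kernel-weighted average of binary outcomes ys near the point x0.

    Observations with covariate values close to x0 get larger Gaussian
    weights, so the result is a local estimate of P(Y = 1 | X = x0).
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("bandwidth too small: all kernel weights underflowed")
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical toy data: successes become more common as x grows.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]

low = smooth_success_probability(0.5, xs, ys, bandwidth=1.0)
mid = smooth_success_probability(2.5, xs, ys, bandwidth=1.0)
high = smooth_success_probability(4.5, xs, ys, bandwidth=1.0)
print(low, mid, high)
```

Unlike a logistic fit, nothing here forces the estimated curve into an S-shape; the bandwidth alone controls how local the average is.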
A Note On Inferences About The Probability Of Success, Rand Wilcox
Journal of Modern Applied Statistical Methods
There is an extensive literature dealing with inferences about the probability of success. A minor goal in this note is to point out when certain recommended methods can be unsatisfactory when the sample size is small. The main goal is to report results on the two-sample case. Extant results suggest using one of four methods. The results indicate that, when computing a 0.95 confidence interval, two of these methods can be more satisfactory when dealing with small sample sizes.
At The Interface Of Algebra And Statistics, Tai-Danae Bradley
All Dissertations, Theses, and Capstone Projects
This thesis takes inspiration from quantum physics to investigate mathematical structure that lies at the interface of algebra and statistics. The starting point is a passage from classical probability theory to quantum probability theory. The quantum version of a probability distribution is a density operator, the quantum version of marginalizing is an operation called the partial trace, and the quantum version of a marginal probability distribution is a reduced density operator. Every joint probability distribution on a finite set can be modeled as a rank one density operator. By applying the partial trace, we obtain reduced density operators whose diagonals ...
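The construction described above can be checked numerically on a small case. The sketch below assumes the standard recipe of taking the unit vector with entries sqrt(p(x, y)), forming the rank-one density operator from it, and tracing out Y; the function name and the 2x2 joint distribution are hypothetical. It follows directly from these definitions that the diagonal of the reduced operator is the classical marginal on X.

```python
import math

def reduced_density_on_x(joint):
    """Trace out Y from the rank-one density operator built from `joint`.

    With psi[x][y] = sqrt(joint[x][y]), the operator |psi><psi| has entries
    psi[x][y] * psi[x2][y2]; its partial trace over Y has entries
    sum_y psi[x][y] * psi[x2][y].
    """
    psi = [[math.sqrt(p) for p in row] for row in joint]
    n = len(psi)
    return [
        [sum(a * b for a, b in zip(psi[i], psi[j])) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical joint distribution p(x, y) on a 2 x 2 set (rows are x).
joint = [[0.1, 0.2],
         [0.3, 0.4]]

rho_x = reduced_density_on_x(joint)
diagonal = [rho_x[i][i] for i in range(len(rho_x))]
marginal = [sum(row) for row in joint]
print(diagonal, marginal)
```

The off-diagonal entries of `rho_x` have no counterpart in the classical marginal; they are what the quantum version carries in addition.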
Research In Short Term Actuarial Modeling, Elijah Howells
Electronic Theses, Projects, and Dissertations
This paper covers mathematical methods used to conduct actuarial analysis in the short term, such as policy deductible analysis, maximum covered loss analysis, and mixtures of distributions. Assessment of a loss variable's distribution under the effect of a policy deductible, as well as one with an implemented maximum covered loss, and under both a policy deductible and maximum covered loss will also be covered. The derivation, meaning, and use of cost per loss and cost per payment will be discussed, as will those of an aggregate sum distribution, stop loss policy, and maximum likelihood estimation. For each topic, special ...
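As a small illustration of the difference between cost per loss and cost per payment, the sketch below applies a policy deductible and a maximum covered loss to a handful of losses. It uses the usual textbook convention that the insurer pays the part of the loss between the deductible and the maximum covered loss; the function names and the loss amounts are made up for illustration and are not taken from the paper.

```python
def payment(loss, deductible=0.0, max_covered_loss=float("inf")):
    """Insurer's payment: the slice of the loss between deductible and cap."""
    return max(0.0, min(loss, max_covered_loss) - deductible)

def cost_per_loss(losses, deductible=0.0, max_covered_loss=float("inf")):
    """Average payment over all losses, including those that pay nothing."""
    pays = [payment(x, deductible, max_covered_loss) for x in losses]
    return sum(pays) / len(pays)

def cost_per_payment(losses, deductible=0.0, max_covered_loss=float("inf")):
    """Average payment over only the losses that trigger a payment."""
    pays = [payment(x, deductible, max_covered_loss) for x in losses]
    positive = [p for p in pays if p > 0]
    if not positive:
        raise ValueError("no loss exceeds the deductible")
    return sum(positive) / len(positive)

# Hypothetical losses with a 500 deductible and a 2000 maximum covered loss.
losses = [100, 400, 900, 2600]
per_loss = cost_per_loss(losses, deductible=500, max_covered_loss=2000)
per_payment = cost_per_payment(losses, deductible=500, max_covered_loss=2000)
print(per_loss, per_payment)  # 475.0 950.0
```

Cost per payment is never smaller than cost per loss, because it divides the same total by the smaller number of losses that actually exceed the deductible.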
Integrated Multiple Mediation Analysis: A Robustness–Specificity Trade-Off In Causal Structure, An-Shun Tai, Sheng-Hsuan Lin
Harvard University Biostatistics Working Paper Series
Recent methodological developments in causal mediation analysis have addressed several issues regarding multiple mediators. However, these developed methods differ in their definitions of causal parameters, assumptions for identification, and interpretations of causal effects, making it unclear which method ought to be selected when investigating a given causal effect. Thus, in this study, we construct an integrated framework, which unifies all existing methodologies, as a standard for mediation analysis with multiple mediators. To clarify the relationship between existing methods, we propose four strategies for effect decomposition: two-way, partially forward, partially backward, and complete decompositions. This study reveals how the direct and ...
Waiting-Time Paradox In 1922, Naoki Masuda, Takayuki Hiraoka
Northeast Journal of Complex Systems (NEJCS)
We present an English translation and discussion of an essay that a Japanese physicist, Torahiko Terada, wrote in 1922. In the essay, he described the waiting-time paradox, also called the bus paradox, which is a known mathematical phenomenon in queuing theory, stochastic processes, and modern temporal network analysis. He also observed and analyzed data on Tokyo City trams to verify the relevance of the waiting-time paradox to busy passengers in Tokyo at the time. This essay seems to be one of the earliest documentations of the waiting-time paradox in a sufficiently scientific manner.
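The paradox itself can be shown with a few lines of arithmetic. A passenger who arrives at a uniformly random moment is more likely to land in a long gap between trams than in a short one, so the mean wait is sum(t^2) / (2 * sum(t)) over the gaps t, which exceeds half the average gap whenever the gaps are unequal. The sketch below uses a made-up timetable, not Terada's Tokyo data, and the function names are illustrative.

```python
from fractions import Fraction

def mean_wait_random_arrival(gaps):
    """Mean wait for a passenger arriving uniformly at random in time.

    The passenger lands in a gap of length t with probability proportional
    to t and then waits t / 2 on average.
    """
    return Fraction(sum(t * t for t in gaps), 2 * sum(gaps))

def naive_mean_wait(gaps):
    """Half the average gap: correct only when all gaps are equal."""
    return Fraction(sum(gaps), 2 * len(gaps))

# Hypothetical timetable: trams alternate between 5- and 15-minute gaps.
gaps = [5, 15]
naive = naive_mean_wait(gaps)
actual = mean_wait_random_arrival(gaps)
print(f"naive: {float(naive)} min, actual: {float(actual)} min")
# naive: 5.0 min, actual: 6.25 min
```

With perfectly regular gaps the two numbers coincide; any irregularity makes the actual wait longer than the naive one.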
Metabolomic Profiling Of Nicotiana Spp. Nectars Indicate That Pollinator Feeding Preference Is A Stronger Determinant Than Plant Phylogenetics In Shaping Nectar Diversity, Fredy A. Silva, Elizabeth C. Chatt, Siti-Nabilla Mahalim, Adel Guirgis, Xingche Guo, Dan S. Nettleton, Basil J. Nikolau, Robert W. Thornburg
Statistics Publications
Floral nectar is a rich secretion produced by the nectary gland and is offered as reward to attract pollinators leading to improved seed set. Nectars are composed of a complex mixture of sugars, amino acids, proteins, vitamins, lipids, organic and inorganic acids. This composition is influenced by several factors, including floral morphology, mechanism of nectar secretion, time of flowering, and visitation by pollinators. The objective of this study was to determine the contributions of flowering time, plant phylogeny, and pollinator selection on nectar composition in Nicotiana. The main classes of nectar metabolites (sugars and amino acids) were quantified using gas ...
On Statistical Significance Of Discriminant Function Coefficients, Tolulope T. Sajobi, Gordon H. Fick, Lisa M. Lix
Journal of Modern Applied Statistical Methods
Discriminant function coefficients are useful for describing group differences and identifying variables that distinguish between groups. Test procedures based on asymptotic approximations, empirical distributions, and exact distributions were compared for testing hypotheses about discriminant function coefficients. These tests are useful for assessing variable importance in multivariate group designs.
Sensitivity Analysis For Incomplete Data And Causal Inference, Heng Chen
Statistical Science Theses and Dissertations
In this dissertation, we explore sensitivity analyses under three different types of incomplete data problems: missing outcomes, missing outcomes and missing predictors, and potential outcomes in the Rubin causal model (RCM). The first sensitivity analysis is conducted for the missing completely at random (MCAR) assumption in frequentist inference; the second one is conducted for the missing at random (MAR) assumption in likelihood inference; the third one is conducted for one novel assumption, the "sixth assumption" proposed for the robustness of instrumental variable estimand in causal inference.
Statistical Models And Analysis Of Univariate And Multivariate Degradation Data, Lochana Palayangoda
Statistical Science Theses and Dissertations
For degradation data in reliability analysis, estimation of the first-passage time (FPT) distribution to a threshold provides valuable information on reliability characteristics. Recently, Balakrishnan and Qin (2019; Applied Stochastic Models in Business and Industry, 35:571-590) studied a nonparametric method to approximate the FPT distribution of such degradation processes if the underlying process type is unknown. In this thesis, we propose improved techniques based on saddlepoint approximation, which improve upon their suggested methods. Numerical examples and Monte Carlo simulation studies are used to illustrate the advantages of the proposed techniques. Limitations of the improved techniques are discussed and some possible ...
Evaluation Of The Utility Of Informative Priors In Bayesian Structural Equation Modeling With Small Samples, Hao Ma
Department of Education Policy and Leadership Theses and Dissertations
The estimation of parameters in structural equation modeling (SEM) has been primarily based on the maximum likelihood estimator (MLE) and relies on large sample asymptotic theory. Consequently, the results of the SEM analyses with small samples may not be as satisfactory as expected. In contrast, informative priors typically do not require a large sample, and they may be helpful for improving the quality of estimates in the SEM models with small samples. However, the role of informative priors in the Bayesian SEM has not been thoroughly studied to date. Given the limited body of evidence, specifying effective informative priors remains ...
Statistical Inference Of Adaptation At Multiple Genomic Scales Using Supervised Classification And A Hidden Markov Model, Lauren A. Sugden
Biology and Medicine Through Mathematics Conference
No abstract provided.
An Improved Two Independent-Samples Randomization Test For Single-Case AB-Type Intervention Designs: A 20-Year Journey, Joel R. Levin, John M. Ferron, Boris S. Gafurov
Journal of Modern Applied Statistical Methods
Detailed is a 20-year arduous journey to develop a statistically viable two-phase (AB) single-case two independent-samples randomization test procedure. The test is designed to compare the effectiveness of two different interventions that are randomly assigned to cases. In contrast to the unsatisfactory simulation results produced by an earlier proposed randomization test, the present test consistently exhibited acceptable Type I error control under various design and effect-type configurations, while at the same time possessing adequate power to detect moderately sized intervention-difference effects. Selected issues, applications, and a multiple-baseline extension of the two-sample test are discussed.
Support Vector Machine-Based Modified Sp Statistic For Subset Selection With Non-Normal Error Terms, Shivaji Shripati Desai, D N. Kashid
Journal of Modern Applied Statistical Methods
Support vector machine (SVM) is used for estimation of regression parameters to modify the sum of cross products (Sp). It works well for some nonnormal error distributions. The performance of existing robust methods and the modified Sp is evaluated through simulated and real data. The results show the performance of the modified Sp is good.
Recurrence Relations For Marginal And Joint Moment Generating Functions Of Topp-Leone Generated Exponential Distribution Based On Record Values And Its Characterization, Zaki Anwar, Neetu Gupta, Mohd Akram Raza Khan, Qazi Azhad Jamal
Journal of Modern Applied Statistical Methods
The exact expressions and some recurrence relations are derived for marginal and joint moment generating functions of k^{th} lower record values from the Topp-Leone Generated (TLG) Exponential distribution. This distribution is characterized by using the recurrence relation of the marginal moment generating function of k^{th} lower record values.
A New Exponential Approach For Reducing The Mean Squared Errors Of The Estimators Of Population Mean Using Conventional And Non-Conventional Location Parameters, Housila P. Singh, Anita Yadav
Journal of Modern Applied Statistical Methods
Classes of ratio-type estimators t (say) and ratio-type exponential estimators t_{e} (say) of the population mean are proposed, and their biases and mean squared errors under large sample approximation are presented. It is shown that the class of ratio-type exponential estimators t_{e} provides estimators more efficient than the ratio-type estimators t.
Decision Tree For Predicting The Party Of Legislators, Afsana Mimi
Publications and Research
The motivation of the project is to identify legislators who frequently voted against their party, based on roll call votes in the Office of the Clerk, U.S. House of Representatives data sets collected in 2018 and 2019. We construct a model to predict the parties of legislators based on their votes. The method used is a decision tree from data mining. Python was used to collect raw data from the internet, SAS was used to clean the data, and all other calculations and graphical presentations were performed using R.
First-Year Computer Science Students: Pathways And Perceptions In Introductory Computer Science Courses, Christina A. Leblanc
Electronic Theses and Dissertations
This study examined student perceptions and experiences of an introductory Computer Science course at the University of Maine, COS 125: Introduction to Problem Solving Using Computer Programs. It also explored the pathways that students pursue after taking COS 125, depending on their success in the course and their motivation to persist. By characterizing student populations and their performance in their first semester in the Computer Science program, students can be placed into one of three categories that explain their path: a “continuer” (passed COS 125 and decided to stay in the major), a “persister” (did not pass COS 125 and ...
Using Stability To Select A Shrinkage Method, Dean Dustin
Dissertations and Theses in Statistics
Shrinkage methods are estimation techniques based on optimizing expressions to find which variables to include in an analysis, typically a linear regression. The general form of these expressions is the sum of an empirical risk plus a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The ...
Ragweed And Sagebrush Pollen Can Distinguish Between Vegetation Types At Broad Spatial Scales, Hannah M. Carroll, Alan D. Wanamaker, Lynn G. Clark, Brian J. Wilsey
Ecology, Evolution and Organismal Biology Publications
Patterns of vegetation distribution at regional to subcontinental scales can inform understanding of climate. Delineating ecoregion boundaries over geologic time is complicated by the difficulty of distinguishing between prairie types at broad spatial scales using the pollen record. Pollen ratios are sometimes employed to distinguish between vegetation types, although their applicability is often limited to a geographic range. The Neotoma Paleoecology Database offers an unparalleled opportunity to synthesize a large number of pollen datasets. Ambrosia (ragweed) is a genus of mesic‐adapted species sensitive to summer moisture. Artemisia (sagebrush, wormwood, mugwort) is a genus of dry‐mesic‐adapted species resilient ...
The Effects Of Zoledronate And Sleep Deprivation On The Distal Femur Trabecular Thickness Of Ovariectomized Rats: Application Of Different Statistical Methods, Erin Nolte
Student Scholar Symposium Abstracts and Posters
Osteoporosis is a disease that causes the degradation of bone, leading to an increased risk of fracture. One in three women over the age of 50 will be affected by osteoporosis. This study aims to understand how bone is affected by sleep deprivation in estrogen-deficient rats, and how Zoledronate might negate the inimical effects of sleep deprivation on bone. As bone mineral density (BMD) is a crude evaluation of the architectural changes seen in osteoporosis, trabecular thickness may serve as a better single evaluation of bone health. Thirty-one female Wistar rats were ovariectomized and separated into 4 random groups. The ...
Life And Death: Quantifying The Risk Of Heart Disease With Machine Learning, Jack Scott Glienke
Honors Program Theses
Coronary heart disease has long been a key area of focus in the discussion of public health. As such, numerous studies have been conducted throughout history with the sole intention of identifying risk factors leading to the onset of cardiovascular conditions. A plethora of statistical procedures can be used to identify an individual’s risk of developing heart disease, yet regression models tend to be the default tool used by researchers. Using the data obtained from the most influential cardiovascular study to date, the Framingham Heart Study, this analysis uses machine learning techniques to generate and test the predictive power ...
Gait Characterization Using Computer Vision Video Analysis, Martha T. Gizaw
Undergraduate Honors Theses
The World Health Organization reports that falls are the second-leading cause of accidental death among senior adults around the world. Currently, a research team at William & Mary’s Department of Kinesiology & Health Sciences attempts to recognize and correct aging-related factors that can result in falling. To meet this goal, the members of that team videotape walking tests to examine individual gait parameters of older subjects. However, they undergo a slow, laborious process of analyzing video frame by video frame to obtain such parameters. This project uses computer vision software to reconstruct walking models from residents of an independent living retirement ...
Modeling Movement: A Machine-Learning Approach To Track Migration Routes After Displacement, Ethan Harrison
Undergraduate Honors Theses
Over the past decade, the number of individuals internally displaced by conflict (IDPs) has reached unprecedented levels. Humanitarian actors and first responders face persistent information gaps in meeting the needs of these populations. Specifically, they face challenges in understanding where and how IDPs move after they are displaced, which is necessary to locate them in conflict-affected situations and provide them with lifesaving assistance. In this paper, I propose a framework, using established machine-learning methods, to forecast the migration routes of these displaced populations (Chapter 1). In a case study of displacement in Yemen, my models predict 80% of IDPs' migration routes ...
'Lmshapemaker': Utilizing The 'Rmapshaper' R Package To Modify Shapefiles For Use In Linked Micromap Plots, Braden D. Probst
All Graduate Theses and Dissertations
To create map-based visualizations effectively, some map modifications are needed to ensure the map is readable and interpretable. Several issues need to be addressed to achieve this. The boundaries of a country may be overly complex, which is particularly true of coastal areas. Regions may be small and not seen in the final plot, as is the case with many capital cities in the world’s countries such as Washington D.C. and the Federal District of Mexico City. In other countries, regions may geographically lie far away from the rest ...
Predicting Disease Progression Using Deep Recurrent Neural Networks And Longitudinal Electronic Health Record Data, Seunghwan Kim
Engineering and Applied Science Theses & Dissertations
Electronic Health Records (EHR) are widely adopted and used throughout healthcare systems and are able to collect and store longitudinal information data that can be used to describe patient phenotypes. From the underlying data structures used in the EHR, discrete data can be extracted and analyzed to improve patient care and outcomes via tasks such as risk stratification and prospective disease management. Temporality in EHR is innately present given the nature of these data, however, and traditional classification models are limited in this context by the cross sectional nature of training and prediction processes. Finding temporal patterns in EHR is ...
Applications Of Machine Learning In High-Frequency Trade Direction Classification, Jared E. Hansen
All Graduate Theses and Dissertations
The correct assignment of trades as buyer-initiated or seller-initiated is paramount in many quantitative finance studies. Simple decision rule methods have been used for signing trades since many data sets available to researchers do not include the sign of each trade executed. By utilizing these decision rule methods, as well as engineering new variables from available data, we have demonstrated that machine learning models outperform prior methods for accurately signing trades as buys and sells, achieving state-of-the-art results. The best model developed was 4.5 percentage points more accurate than older methods when predicting onto unseen data. Since finance and ...
Introduction To Research Statistical Analysis: An Overview Of The Basics, Christian Vandever
HCA Healthcare Journal of Medicine
This article covers many statistical ideas essential to research statistical analysis. Sample size is explained through the concepts of statistical significance level and power. Variable types and definitions are included to clarify necessities for how the analysis will be interpreted. Categorical and quantitative variable types are defined, as well as response and predictor variables. Statistical tests described include t-tests, ANOVA and chi-square tests. Multiple regression is also explored for both logistic and linear regression. Finally, the most common statistics produced by these methods are explored.