Open Access. Powered by Scholars. Published by Universities.®
- Discipline
-
- Multivariate Analysis (8)
- Medicine and Health Sciences (5)
- Applied Statistics (3)
- Clinical Epidemiology (3)
- Data Science (3)
-
- Public Health (3)
- Statistical Methodology (3)
- Biostatistics (2)
- Chemistry (2)
- Epidemiology (2)
- Microarrays (2)
- Other Chemistry (2)
- Other Statistics and Probability (2)
- Statistical Theory (2)
- Analytical Chemistry (1)
- Art and Design (1)
- Arts and Humanities (1)
- Bioinformatics (1)
- Biometry (1)
- Categorical Data Analysis (1)
- Computational Biology (1)
- Computer Sciences (1)
- Design of Experiments and Sample Surveys (1)
- Digital Humanities (1)
- Electrical and Computer Engineering (1)
- Engineering (1)
- Genetics and Genomics (1)
- Institution
- Publication Year
- Publication
- Publication Type
Articles 1 - 12 of 12
Full-Text Articles in Statistical Models
Differentiation Of Human, Dog, And Cat Hair Fibers Using Dart Tofms And Machine Learning, Laura Ahumada, Erin R. Mcclure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre
Differentiation Of Human, Dog, And Cat Hair Fibers Using Dart Tofms And Machine Learning, Laura Ahumada, Erin R. Mcclure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre
SMU Data Science Review
Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …
Analyzing Relationships With Machine Learning, Oscar Ko
Analyzing Relationships With Machine Learning, Oscar Ko
Dissertations, Theses, and Capstone Projects
Procedurally, this project aims to take a dataset, analyze it, and offer insights to the audience in an easy-to-digest format. Conceptually, this project will seek to explore questions like: “Do couples that meet through online dating or dating apps have higher or lower quality relationships?”, “Can any features in this dataset help predict how a subject would rate their relationship quality?”, and “What other insights can I derive from using machine learning for exploratory analysis?” The intended audience for this project is anyone interested in romantic relationships or machine learning.
The dataset is from a Stanford University survey, “How Couples …
Machine Learning Approaches For Improving Prediction Performance Of Structure-Activity Relationship Models, Gabriel Idakwo
Machine Learning Approaches For Improving Prediction Performance Of Structure-Activity Relationship Models, Gabriel Idakwo
Dissertations
In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) are used to correlate the structure of a molecule to its biological property in drug design and toxicological studies. In this body of work, I started with two in-depth reviews into the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies.
First, to improve the prediction accuracy of learning …
Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan
Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan
SMU Data Science Review
In this paper, we present novel approaches to predicting as- set failure in the electric distribution system. Failures in overhead power lines and their associated equipment in particular, pose significant finan- cial and environmental threats to electric utilities. Electric device failure furthermore poses a burden on customers and can pose serious risk to life and livelihood. Working with asset data acquired from an electric utility in Southern California, and incorporating environmental and geospatial data from around the region, we applied a Random Forest methodology to predict which overhead distribution lines are most vulnerable to fail- ure. Our results provide evidence …
Deep Learning Analysis Of Limit Order Book, Xin Xu
Deep Learning Analysis Of Limit Order Book, Xin Xu
Arts & Sciences Electronic Theses and Dissertations
In this paper, we build a deep neural network for modeling spatial structure in limit order book and make prediction for future best ask or best bid price based on ideas of (Sirignano 2016). We propose an intuitive data processing method to approximate the data is non-available for us based only on level I data that is more widely available. The model is based on the idea that there is local dependence for best ask or best bid price and sizes of related orders. First we use logistic regression to prove that this approach is reasonable. To show the advantages …
Data Analysis Methods Using Persistence Diagrams, Andrew Marchese
Data Analysis Methods Using Persistence Diagrams, Andrew Marchese
Doctoral Dissertations
In recent years, persistent homology techniques have been used to study data and dynamical systems. Using these techniques, information about the shape and geometry of the data and systems leads to important information regarding the periodicity, bistability, and chaos of the underlying systems. In this thesis, we study all aspects of the application of persistent homology to data analysis. In particular, we introduce a new distance on the space of persistence diagrams, and show that it is useful in detecting changes in geometry and topology, which is essential for the supervised learning problem. Moreover, we introduce a clustering framework directly …
Advanced Data Analysis - Lecture Notes, Erik B. Erhardt, Edward J. Bedrick, Ronald M. Schrader
Advanced Data Analysis - Lecture Notes, Erik B. Erhardt, Edward J. Bedrick, Ronald M. Schrader
Open Textbooks
Lecture notes for Advanced Data Analysis (ADA1 Stat 427/527 and ADA2 Stat 428/528), Department of Mathematics and Statistics, University of New Mexico, Fall 2016-Spring 2017. Additional material including RMarkdown templates for in-class and homework exercises, datasets, R code, and video lectures are available on the course websites: https://statacumen.com/teaching/ada1 and https://statacumen.com/teaching/ada2 .
Contents
I ADA1: Software
- 0 Introduction to R, Rstudio, and ggplot
II ADA1: Summaries and displays, and one-, two-, and many-way tests of means
- 1 Summarizing and Displaying Data
- 2 Estimation in One-Sample Problems
- 3 Two-Sample Inferences
- 4 Checking Assumptions
- 5 One-Way Analysis of Variance
III ADA1: Nonparametric, categorical, …
Integrative Biomarker Identification And Classification Using High Throughput Assays, Pan Tong
Integrative Biomarker Identification And Classification Using High Throughput Assays, Pan Tong
Dissertations & Theses (Open Access)
It is well accepted that tumorigenesis is a multi-step procedure involving aberrant functioning of genes regulating cell proliferation, differentiation, apoptosis, genome stability, angiogenesis and motility. To obtain a full understanding of tumorigenesis, it is necessary to collect information on all aspects of cell activity. Recent advances in high throughput technologies allow biologists to generate massive amounts of data, more than might have been imagined decades ago. These advances have made it possible to launch comprehensive projects such as (TCGA) and (ICGC) which systematically characterize the molecular fingerprints of cancer cells using gene expression, methylation, copy number, microRNA and SNP microarrays …
Statistical Methods For Proteomic Biomarker Discovery Based On Feature Extraction Or Functional Modeling Approaches, Jeffrey S. Morris
Statistical Methods For Proteomic Biomarker Discovery Based On Feature Extraction Or Functional Modeling Approaches, Jeffrey S. Morris
Jeffrey S. Morris
In recent years, developments in molecular biotechnology have led to the increased promise of detecting and validating biomarkers, or molecular markers that relate to various biological or medical outcomes. Proteomics, the direct study of proteins in biological samples, plays an important role in the biomarker discovery process. These technologies produce complex, high dimensional functional and image data that present many analytical challenges that must be addressed properly for effective comparative proteomics studies that can yield potential biomarkers. Specific challenges include experimental design, preprocessing, feature extraction, and statistical analysis accounting for the inherent multiple testing issues. This paper reviews various computational …
Combining Predictors For Classification Using The Area Under The Roc Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang, Gary M. Longton
Combining Predictors For Classification Using The Area Under The Roc Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang, Gary M. Longton
UW Biostatistics Working Paper Series
No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically the objective function that is optimized for combining markers is the likelihood function. In this paper we consider an alternative objective function -- the area under the empirical receiver operating characteristic curve (AUC). We note that it yields consistent estimates of parameters in a generalized linear model for the risk score but does not require specifying the link function. Like logistic regression it yields …
Combining Predictors For Classification Using The Area Under The Roc Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang
Combining Predictors For Classification Using The Area Under The Roc Curve, Margaret S. Pepe, Tianxi Cai, Zheng Zhang
UW Biostatistics Working Paper Series
We compare simple logistic regression with an alternative robust procedure for constructing linear predictors to be used for the two state classification task. Theoritical advantages of the robust procedure over logistic regression are: (i) although it assumes a generalized linear model for the dichotomous outcome variable, it does not require specification of the link function; (ii) it accommodates case-control designs even when the model is not logistic; and (iii) it yields sensible results even when the generalized linear model assumption fails to hold. Surprisingly, we find that the linear predictor derived from the logistic regression likelihood is very robust in …
Semi-Parametric Regression For The Area Under The Receiver Operating Characteristic Curve, Lori E. Dodd, Margaret S. Pepe
Semi-Parametric Regression For The Area Under The Receiver Operating Characteristic Curve, Lori E. Dodd, Margaret S. Pepe
UW Biostatistics Working Paper Series
Medical advances continue to provide new and potentially better means for detecting disease. Such is true in cancer, for example, where biomarkers are sought for early detection and where improvements in imaging methods may pick up the initial functional and molecular changes associated with cancer development. In other binary classification tasks, computational algorithms such as Neural Networks, Support Vector Machines and Evolutionary Algorithms have been applied to areas as diverse as credit scoring, object recognition, and peptide-binding prediction. Before a classifier becomes an accepted technology, it must undergo rigorous evaluation to determine its ability to discriminate between states. Characterization of …