Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Data analysis

Articles 1 - 30 of 34

Full-Text Articles in Physical Sciences and Mathematics

Statistical Methods For Single Cell Sequencing Data Analysis, Fei Qin Jul 2023

Theses and Dissertations

The recent emergence of single cell sequencing (SCS) technology has provided us with single-cell DNA or RNA sequencing (scDNA/RNA-seq) information for investigating cellular evolutionary relationships. Although many analysis methods have been developed to infer intra-tumor genetic heterogeneity, cluster cellular subclones, detect genetic mutations, and investigate spatially variable (SV) genes, exploring SCS data remains statistically challenging due to its noisy nature.

To identify subclones with scDNA-seq data, many existing studies use an independent statistical model to detect copy number profiles in a first step, followed by classical clustering methods for subclone identification in downstream analyses. However, spurious results might be generated …


The Effectiveness Of Visualization Techniques For Supporting Decision-Making, Cansu Yalim, Holly A. H. Handley Apr 2023

Modeling, Simulation and Visualization Student Capstone Conference

Although visualization is beneficial for evaluating and communicating data, the efficiency of various visualization approaches for different data types is not always evident. This research aims to address this issue by investigating the usefulness of several visualization techniques for various kinds of data, including continuous, categorical, and time-series data. Each technique is appraised qualitatively in terms of its strengths, its weaknesses, and its interpretation of the dataset. The research questions include: which visualization approaches perform best for different data types, and what factors affect their usefulness? The absence of clear directions for both researchers and practitioners on how to identify the most effective visualization …


Financial Literacy: Self-Evaluation And Reality, Yangsijia Wang Aug 2022

Undergraduate Student Research Internships Conference

This study examines financial literacy using a data source that contains clients' demographic information and self-evaluations, changes in account value, and trade records. Three major questions were investigated: first, whether a client's demographic traits are related to his/her self-evaluation of financial knowledge level; second, whether trading behaviour differs among clients who self-identified as belonging to different financial knowledge groups; and third, whether people who self-identified as financially knowledgeable achieve better investment results. Data manipulation was done using SQL and R. Exploratory analysis including multiple types of plots and proportion tables was used to derive the …


A Bayesian Programming Approach To Car-Following Model Calibration And Validation Using Limited Data, Franklin Abodo Jun 2022

FIU Electronic Theses and Dissertations

Traffic simulation software is used by transportation researchers and engineers to design and evaluate changes to roadway networks. Underlying these simulators are mathematical models of microscopic driver behavior from which macroscopic measures of flow and congestion can be recovered. Many models are intended to apply to only a subset of possible traffic scenarios and roadway configurations, while others do not have any explicit constraint on their applicability. Work zones on highways are one scenario for which no model invented to date has been shown to accurately reproduce realistic driving behavior. This makes it difficult to optimize for safety and other …


How Blockchain Solutions Enable Better Decision Making Through Blockchain Analytics, Sammy Ter Haar May 2022

Information Systems Undergraduate Honors Theses

Since the advent of computers, data scientists have been able to engineer devices that increase individuals’ opportunities to communicate with each other. In the 1990s, the internet took over, even though many people did not understand its utility. Fast forward 30 years, and we cannot live without our connection to the internet. The early internet, used by early adopters, was an internet of information in which individuals posted blogs for others to read; this was known as Web 1.0. As it progressed, platforms became social, allowing individuals in different areas to communicate and engage with each other; this was known as Web 2.0. As Dr. …


Statistical Analysis Of 2017-18 Premier League Match Statistics Using A Regression Analysis In R, Bergen Campbell May 2021

Undergraduate Theses and Capstone Projects

This thesis analyzes the correlation between a team’s statistics and the success of its performances, and develops a predictive model that can be used to forecast final season results for that team. Data from the 2017-2018 Premier League season are gathered and broken down in R to highlight which factors and variables contribute most to the success or downfall of a team. A multiple linear regression model with a stepwise selection process is then used to include the factors that are significant in predicting match results.

The predictions about the 17-18 season results based on the model …
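For readers unfamiliar with the technique, multiple linear regression estimates coefficients by solving the normal equations (XᵀX)β = Xᵀy. The sketch below does this in Python with the standard library only; it is not the thesis's R code, and the function names `fit_ols` and `solve_linear_system` are hypothetical. A stepwise procedure would repeatedly refit such a model on candidate subsets of predictors, which is not shown here.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update a and b together.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Choose the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] minimising the sum of squared residuals."""
    # Prepend a column of ones so the first coefficient is the intercept.
    x = [[1.0, *map(float, r)] for r in rows]
    k = len(x[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

Each row of `rows` would hold one match's predictor values (for example, hypothetical columns such as shots and possession). In R, the built-in `lm()` fits the same kind of model.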


Analysis Of Surface Temperature Trends Of Global Lakes Using Satellite Remote Sensing And In Situ Observations, Christal Jean Soverall, Zahida Yasmin, Mahoutin Godnou, Wen Yong Huang, Ryan Chen, Abdou Bah, Hamidreza Norouzi, Reginald Blake Aug 2020

Publications and Research

Even though lakes make up a small percentage of the water bodies on the global land surface, they provide critically important ecosystem services. Unfortunately, the surface areas of several lakes around the globe have been changing, with many of them decreasing drastically due to climate variability and local mismanagement at the basin scale. Lake Surface Water Temperature (LSWT) is recognized as a critical indicator of climate change in lakes. Changes in water temperature and in the temperature of the surrounding land may indicate climate variability if the two change consistently. This project focuses on the application of …


Circada: Shiny Apps For Exploration Of Experimental And Synthetic Circadian Time Series With An Educational Emphasis, Lisa Cenek, Liubou Klindziuk, Cindy Lopez, Eleanor Mccartney, Blanca Martin Burgos, Selma Tir, Mary E. Harrington, Tanya L. Leise Apr 2020

Psychology: Faculty Publications

Circadian rhythms are daily oscillations in physiology and behavior that can be assessed by recording body temperature, locomotor activity, or bioluminescent reporters, among other measures. These different types of data can vary greatly in waveform, noise characteristics, typical sampling rate, and length of recording. We developed two Shiny apps for exploration of these data, enabling visualization and analysis of circadian parameters such as period and phase. Methods include the discrete wavelet transform, sine fitting, the Lomb-Scargle periodogram, autocorrelation, and maximum entropy spectral analysis, giving a sense of how well each method works on each type of data. The apps also …
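To illustrate one of the listed methods, here is a minimal Python sketch of the classical (normalised) Lomb-Scargle periodogram, which tolerates unevenly sampled recordings. It is not Circada's code (the apps are built with Shiny, i.e. R); the function names and the grid of trial periods are assumptions for illustration.

```python
import math
from typing import Sequence


def lomb_scargle_power(t: Sequence[float], y: Sequence[float], period: float) -> float:
    """Classical normalised Lomb-Scargle power at a single trial period."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    w = 2 * math.pi / period  # angular frequency
    # The time offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                     sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
    c = [math.cos(w * (ti - tau)) for ti in t]
    s = [math.sin(w * (ti - tau)) for ti in t]
    yc = sum((v - mean) * ci for v, ci in zip(y, c))
    ys = sum((v - mean) * si for v, si in zip(y, s))
    return (yc ** 2 / sum(ci * ci for ci in c)
            + ys ** 2 / sum(si * si for si in s)) / (2 * var)


def best_period(t: Sequence[float], y: Sequence[float],
                periods: Sequence[float]) -> float:
    """Return the trial period with the highest Lomb-Scargle power."""
    return max(periods, key=lambda p: lomb_scargle_power(t, y, p))
```

For example, hourly samples over ten days with some hours missing can be scanned over trial periods from 16 to 32 hours; for a clean 24-hour rhythm the peak falls at or very near 24.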


Utilizing Design Structure For Improving Design Selection And Analysis, Ahlam Ali Alzharani Jan 2020

Theses and Dissertations

Recent work has shown that the structure of a design plays a role in the simplicity or complexity of data analysis. To advance research in these areas, this dissertation aims to utilize design structure to improve design selection and analysis. In this regard, minimal dependent sets and block diagonal structure are both important concepts that are relevant to the orthogonality of the columns of a design. We are interested in finding ways to improve data analysis, especially active effect detection, by utilizing minimal dependent sets and block diagonal structure for design.

We introduce a new classification …


Twitter And Disasters: A Social Resilience Fingerprint, Benjamin A. Rachunok, Jackson B. Bennett, Roshanak Nateghi May 2019

Purdue University Libraries Open Access Publishing Fund

Understanding the resilience of a community facing a crisis event is critical to improving its adaptive capacity. Community resilience has been conceptualized as a function of the resilience of components of a community such as ecological, infrastructure, economic, and social systems, etc. In this paper, we introduce the concept of a “resilience fingerprint” and propose a multi-dimensional method for analyzing components of community resilience by leveraging existing definitions of community resilience with data from the social network Twitter. Twitter data from 14 events are analyzed and their resulting resilience fingerprints computed. We compare the fingerprints between events and show that …


Understanding Water Consumption And Energy Trends In New York City, Wen Yong Huang, Johann Thiel May 2019

Publications and Research

In this study, we will be using the NYC Open Data website to examine publicly available data sets on water and energy consumption in New York City. In particular, we will use various scientific programming and machine learning modules in Python to analyze and visualize trends in water and energy usage within the five boroughs.


A Self-Contained Course In The Mathematical Theory Of Statistics For Scientists & Engineers With An Emphasis On Predictive Regression Modeling & Financial Applications., Tim Smith Apr 2019

Timothy Smith

Preface & Acknowledgments

This textbook is designed for a higher-level undergraduate, perhaps even first-year graduate, course for engineering or science students who are interested in learning how to use data analysis to build predictive models. While no prerequisite knowledge of statistics is required to read this book, it would be best read with some familiarity with elementary statistics, because it is designed for the reader to truly understand the underlying theory rather than just learn how to read computer output. The book is self-contained and the only true prerequisite knowledge is a solid …


A Self-Contained Course In The Mathematical Theory Of Statistics For Scientists & Engineers With An Emphasis On Predictive Regression Modeling & Financial Applications., Tim Smith Jan 2019

Open Access Textbooks

Preface & Acknowledgments

This textbook is designed for a higher-level undergraduate, perhaps even first-year graduate, course for engineering or science students who are interested in learning how to use data analysis to build predictive models. While no prerequisite knowledge of statistics is required to read this book, it would be best read with some familiarity with elementary statistics, because it is designed for the reader to truly understand the underlying theory rather than just learn how to read computer output. The book is self-contained and the only true prerequisite knowledge is a solid …


Using Neural Networks To Classify Discrete Circular Probability Distributions, Madelyn Gaumer Jan 2019

HMC Senior Theses

Given the rise in the application of neural networks to all sorts of interesting problems, it seems natural to apply them to statistical tests. This senior thesis studies whether neural networks built to classify discrete circular probability distributions can outperform a class of well-known statistical tests for uniformity for discrete circular data that includes the Rayleigh Test [1], the Watson Test [2], and the Ajne Test [3]. Each neural network used is relatively small, with no more than 3 layers: an input layer taking in discrete data sets on a circle, a hidden layer, and an output …
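As a point of reference for the classical baseline, the Rayleigh test measures departure from uniformity through the mean resultant length r̄ of the observed angles. The Python sketch below assumes "discrete circular data" means counts in m equally spaced bins (an assumption), and uses the large-sample approximation p ≈ exp(−Z) with Z = n·r̄², so it is rough for small samples and applies no correction for grouping. It is not the thesis's implementation, and `rayleigh_test_binned` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def rayleigh_test_binned(counts: Sequence[int]) -> Tuple[float, float]:
    """Rayleigh test of uniformity for counts in m equally spaced circular bins.

    Returns (mean resultant length r_bar, approximate p-value). The p-value uses
    the large-sample approximation exp(-Z), Z = n * r_bar**2, capped at 1.
    """
    m = len(counts)
    n = sum(counts)
    if n == 0:
        raise ValueError("need at least one observation")
    # Bin k sits at angle 2*pi*k/m; sum the unit vectors weighted by the counts.
    c = sum(nk * math.cos(2 * math.pi * k / m) for k, nk in enumerate(counts))
    s = sum(nk * math.sin(2 * math.pi * k / m) for k, nk in enumerate(counts))
    r_bar = math.hypot(c, s) / n
    z = n * r_bar ** 2
    return r_bar, min(1.0, math.exp(-z))
```

Evenly spread counts give r̄ near 0 and a p-value near 1, while counts piled into one bin give r̄ = 1 and a very small p-value.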


Data Analysis Basics – Part Ii, Judith A. Savageau Mar 2018

Judith A. Savageau

Blog post to AEA365, a blog sponsored by the American Evaluation Association (AEA) dedicated to highlighting Hot Tips, Cool Tricks, Rad Resources, and Lessons Learned for evaluators. The American Evaluation Association is an international professional association of evaluators devoted to the application and exploration of program evaluation, personnel evaluation, technology, and many other forms of evaluation. Evaluation involves assessing the strengths and weaknesses of programs, policies, personnel, products, and organizations to improve their effectiveness.


Data Analysis Basics – Part I, Judith A. Savageau Mar 2018

Judith A. Savageau

Blog post to AEA365, a blog sponsored by the American Evaluation Association (AEA) dedicated to highlighting Hot Tips, Cool Tricks, Rad Resources, and Lessons Learned for evaluators. The American Evaluation Association is an international professional association of evaluators devoted to the application and exploration of program evaluation, personnel evaluation, technology, and many other forms of evaluation. Evaluation involves assessing the strengths and weaknesses of programs, policies, personnel, products, and organizations to improve their effectiveness.


An Old Tool With Enduring Value: Using Excel To Prepare Data For Analysis, Gregory A. Smith Mar 2018

Gregory A. Smith

Microsoft Excel was first released on the Windows platform 30 years ago and has since become widely used. Although new tools for manipulating, analyzing, and visualizing data are constantly emerging, Excel remains a potent tool, and not just because of newer features. Simple functions such as TRIM, MID, SUBSTITUTE, FIND, ROUNDDOWN, and VLOOKUP can be used to manipulate data sets in powerful ways.

This workshop applies selected functions to realistic library data sets. Demonstrations include:

  • deriving time-series categories from date and time stamps
  • pre-coding survey comments based on keywords
  • dealing with messy data points such as call numbers and publisher names …


An Old Tool With Enduring Value: Using Excel To Prepare Data For Analysis, Gregory A. Smith Feb 2018

Faculty Publications and Presentations

Microsoft Excel was first released on the Windows platform 30 years ago and has since become widely used. Although new tools for manipulating, analyzing, and visualizing data are constantly emerging, Excel remains a potent tool—and not just because of newer features. Simple functions such as TRIM, MID, SUBSTITUTE, FIND, ROUNDDOWN, and VLOOKUP can be used to manipulate data sets in powerful ways.

This workshop applies selected functions to realistic library data sets. Demonstrations include:

  • deriving time-series categories from date and time stamps
  • pre-coding survey comments based on keywords
  • dealing with messy data points such as …
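For comparison outside Excel, the first two demonstrations and TRIM's behavior can be mirrored in a few lines of Python. This is an analogous sketch, not the workshop's material: the time-of-day cut-offs, the timestamp format, and the keyword codebook are all hypothetical choices.

```python
import re
from datetime import datetime
from typing import List


def excel_trim(text: str) -> str:
    """Mimic Excel TRIM: drop leading/trailing spaces and collapse runs of spaces."""
    # Excel TRIM only touches the ordinary space character (ASCII 32).
    return re.sub(" +", " ", text).strip(" ")


def time_category(stamp: str) -> str:
    """Derive a coarse time-of-day bucket from a 'YYYY-MM-DD HH:MM' stamp."""
    hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
    # Hypothetical cut-offs: before noon, before 17:00, otherwise evening.
    if hour < 12:
        return "morning"
    if hour < 17:
        return "afternoon"
    return "evening"


# Hypothetical keyword codebook for pre-coding free-text survey comments.
CODEBOOK = {
    "noise": ("loud", "noisy", "quiet"),
    "hours": ("open", "close", "hours"),
}


def precode(comment: str) -> List[str]:
    """Return every code whose keywords appear in the comment (case-insensitive)."""
    words = set(re.findall(r"[a-z']+", comment.lower()))
    return sorted(code for code, keys in CODEBOOK.items() if words & set(keys))
```

A comment such as "Too noisy, and it should open earlier" would be pre-coded as both `hours` and `noise`, leaving a human coder to confirm or correct the assignment.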


Experimental Design And Data Analysis In Computer Simulation Studies In The Behavioral Sciences, Michael Harwell, Nidhi Kohli, Yadira Peralta Dec 2017

Journal of Modern Applied Statistical Methods

Treating computer simulation studies as statistical sampling experiments subject to established principles of experimental design and data analysis should further enhance their ability to inform statistical practice and a program of statistical research. Latin hypercube designs to enhance generalizability and meta-analytic methods to analyze simulation results are presented.
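To make the idea of a Latin hypercube design concrete: each factor's range is cut into n equal strata, and every stratum is sampled exactly once per factor, so n simulation conditions cover every factor's range evenly. A minimal stdlib Python sketch on the unit cube follows; rescaling each column to real simulation factors (sample size, effect size, and so on) is a hypothetical step left out, and the function name is illustrative.

```python
import random
from typing import List


def latin_hypercube(n_points: int, n_dims: int, seed: int = 0) -> List[List[float]]:
    """Draw an n_points x n_dims Latin hypercube sample on the unit cube [0, 1)^d.

    Each dimension is cut into n_points equal strata, and every stratum is hit
    exactly once per dimension.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each stratum [i/n, (i+1)/n) ...
        col = [(i + rng.random()) / n_points for i in range(n_points)]
        # ... then shuffle so strata are paired at random across dimensions.
        rng.shuffle(col)
        columns.append(col)
    # Transpose: one row per design point (one simulation condition).
    return [list(row) for row in zip(*columns)]
```

Unlike a full factorial grid, the number of conditions stays at n regardless of how many factors are varied.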


Integrative Pathway Analysis Pipeline For Mirna And Mrna Data, Diana Mabel Diaz Herrera Jan 2017

Wayne State University Theses

The identification of pathways that are involved in a particular phenotype helps us understand the underlying biological processes. Traditional pathway analysis techniques aim to infer the impact on individual pathways using only mRNA levels. However, recent studies have shown that gene expression alone is unable to capture the whole picture of biological phenomena. At the same time, microRNAs (miRNAs) are newly discovered gene regulators that have been shown to play an important role in the diagnosis and prognosis of different types of diseases. Current pathway analysis techniques do not take miRNAs into consideration. In this project, we investigate the effect of integrating miRNA …


The Document Similarity Network: A Novel Technique For Visualizing Relationships In Text Corpora, Dylan Baker Jan 2017

HMC Senior Theses

With the abundance of written information available online, it is useful to be able to automatically synthesize and extract meaningful information from text corpora. We present a unique method for visualizing relationships between documents in a text corpus. By using Latent Dirichlet Allocation to extract topics from the corpus, we create a graph whose nodes represent individual documents and whose edge weights indicate the distance between topic distributions in documents. These edge lengths are then scaled using multidimensional scaling techniques, such that more similar documents are clustered together. Applying this method to several datasets, we demonstrate that these graphs are …
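The graph-building step described above can be sketched as follows, assuming each document's topic distribution has already been produced by an LDA model (not computed here). The abstract does not say which distance between topic distributions is used, so the Hellinger distance below is an assumed stand-in, and the multidimensional scaling step is omitted. Function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions (0 identical, 1 disjoint)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))


def similarity_edges(topics: Dict[str, Sequence[float]]) -> List[Tuple[str, str, float]]:
    """Weighted edge list: one edge per document pair, weighted by topic distance."""
    return [(a, b, hellinger(topics[a], topics[b]))
            for a, b in combinations(sorted(topics), 2)]
```

Documents with similar topic mixtures get short edges, which is what lets a later layout step place them close together.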


Making Sense Out Of Big Data - Popular Machine Learning Tools In Business Analytics, Kuldeep Kumar, Sukanto Bhattacharya Apr 2015

Kuldeep Kumar

'Big data' is the new buzzword in academic as well as industry circles. Laney (2001) came up with the three Vs that characterize big data: volume, velocity and variety. When talking about big data, one is usually referring to a huge volume, in terabytes rather than gigabytes, captured either cross-sectionally or over time or, more likely, both, i.e. as a panel. However, it is the sheer size of the data set that puts big data in an entirely different category, requiring a special set of analytical tools and approaches for extracting information and also data …


Ensemble Prediction And Data Assimilation For Operational Hydrology, Dong-Jun Seo, Yuqiong Liu, Hamid Moradkhani, Albrecht Weerts Dec 2014

Civil and Environmental Engineering Faculty Publications and Presentations

This special section in the Journal of Hydrology will discuss the need for advancing hydrologic ensemble prediction and data assimilation (DA).


Exonest: Bayesian Model Selection Applied To The Detection And Characterization Of Exoplanets Via Photometric Variations, Ben Placek, Kevin H. Knuth, Daniel Angerhausen Oct 2014

Physics Faculty Scholarship

EXONEST is an algorithm dedicated to detecting and characterizing the photometric signatures of exoplanets, which include reflection and thermal emission, Doppler boosting, and ellipsoidal variations. Using Bayesian inference, we can test between competing models that describe the data as well as estimate model parameters. We demonstrate this approach by testing circular versus eccentric planetary orbital models, as well as testing for the presence or absence of four photometric effects. In addition to using Bayesian model selection, a unique aspect of EXONEST is the potential capability to distinguish between reflective and thermal contributions to the light curve. A case study is …


Using Spss To Understand Research And Data Analysis, Daniel Arkkelin Jan 2014

Psychology Curricular Materials

No abstract provided.


The Single-Case Data Analysis Package: Analysing Single-Case Experiments With R Software, Isis Bulté, Patrick Onghena Nov 2013

Journal of Modern Applied Statistical Methods

The RcmdrPlugin.SCDA plug-in package is discussed. It integrates three R packages in the R Commander interface: SCVA (for Single-Case Visual Analysis), SCRT (for Single-Case Randomization Tests), and SCMA (for Single-Case Meta-Analysis). In this way, the plug-in package covers three important steps in the analysis of single-case data.
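To show the logic behind a single-case randomization test, consider an AB design in which the intervention start point was chosen at random from the admissible points. The test statistic is recomputed at every admissible start point, and the p-value is the share of those statistics at least as large as the observed one. The Python sketch below is illustrative only; it is not the SCRT package's API, and the statistic (mean B minus mean A) and the minimum phase length of 3 are assumptions.

```python
from typing import Sequence, Tuple


def ab_randomization_test(data: Sequence[float], actual_start: int,
                          min_phase: int = 3) -> Tuple[float, float]:
    """Randomization test for an AB design with a randomly chosen intervention start.

    Statistic: mean(B) - mean(A). Reference set: every start point leaving at
    least min_phase observations in each phase. Returns (observed, one-sided p).
    """
    def stat(start: int) -> float:
        a, b = data[:start], data[start:]
        return sum(b) / len(b) - sum(a) / len(a)

    starts = range(min_phase, len(data) - min_phase + 1)
    if actual_start not in starts:
        raise ValueError("actual_start is not an admissible start point")
    observed = stat(actual_start)
    # Count start points whose statistic is at least as extreme (small tolerance for floats).
    extreme = sum(1 for s in starts if stat(s) >= observed - 1e-12)
    return observed, extreme / len(starts)
```

With ten observations and at least three per phase there are only five admissible start points, so the smallest attainable p-value is 1/5; the number of admissible start points limits the test's resolution.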


A Data Generating Review That Bops, Twists And Pulls At Misconceptions, Kimberly Gardner Apr 2013

Faculty and Research Publications

Statistics is an integral part of the K-12 mathematics curriculum (ages 5-18). Naturally, students construct misconceptions about what they learn. This article discusses The Bop It© Challenge, a review activity that assesses student understanding and reveals students' misunderstandings of statistical concepts.


Heterogeneity And Data Analysis, Peter J. Taylor Sep 2011

Working Papers on Science in a Changing World

This working paper is a discussion paper for a September 2011 meeting of the research group of Prof. Di Cook on data visualization and exploratory data analysis at Iowa State University. A taxonomy of eleven kinds of heterogeneity is presented, followed by a set of vignettes that illustrate some of the meanings and sketch some implications, then a series of images that illustrate the heterogeneities. Several of the vignettes speak to a broad contention about heterogeneity and control: In relation to modern understandings of heredity and development over the life course, research and application of resulting knowledge are untroubled by …


How To Make Teaching Of Statistics More Effective In Business Schools?, Kuldeep Kumar Sep 2011

Kuldeep Kumar

Statistics is taught in almost all Business Schools as a core course and as a prerequisite for many advanced economics, finance and accountancy courses. However, Statistics has to be taught differently in Business Schools from how it is taught in a statistics department. There should be more emphasis on applications in the Business area rather than on theory. There has been a lot of interest in the teaching of statistics in Business Schools for a very long time; see, for example, Cox (1965), Moore (1976) and Love and Hildebrand (2002). This paper discusses the author's experience of teaching statistics in an …


An Approach To Nearest Neighboring Search For Multi-Dimensional Data, Yong Shi, Li Zhang, Lei Zhu Mar 2011

Faculty and Research Publications

Finding nearest neighbors in large multi-dimensional data has long been one of the research interests in the data mining field. In this paper, we present our continuing research on similarity search problems. Previously, we explored the meaning of K nearest neighbors from a new perspective in PanKNN [20]. It redefines the distances between data points and a given query point Q, efficiently and effectively selecting the data points that are closest to Q. It can be applied in various data mining fields. Many real data sets contain irrelevant or obstacle information, which greatly affects the effectiveness …
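For orientation, the conventional baseline that approaches such as PanKNN build on is a brute-force K-nearest-neighbor search under plain Euclidean distance. The Python sketch below shows only that baseline; it does not reproduce PanKNN's redefined distances, and the function name `knn` is hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def knn(points: Sequence[Sequence[float]], query: Sequence[float],
        k: int) -> List[Tuple[float, int]]:
    """Brute-force k nearest neighbours as (distance, index) pairs, closest first."""
    # O(n * d) scan over all points; heapq.nsmallest keeps only the k best.
    return heapq.nsmallest(
        k, ((math.dist(p, query), i) for i, p in enumerate(points)))
```

Every point is compared with the query, which is exactly the cost that indexing and pruning methods for high-dimensional data try to avoid. Note that `math.dist` requires Python 3.8 or later.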