Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

A Review Of Recent Gene Expression-Based And Dna Methylation-Based Mathematical Cell Type Deconvolution Methods, Chenxiao Tian Aug 2023

Arts & Sciences Electronic Theses and Dissertations

In recent years, many cell type deconvolution methods based on DNA methylation data and gene expression data have been developed. Each of the two approaches has its own advantages and disadvantages. For example, the data source for DNA methylation-based methods is usually more stable than gene expression, and DNA methylation is easier to measure in formalin-fixed paraffin-embedded (FFPE) tissues, while some gene expression data, such as scRNA-seq data, are usually costly and complex to obtain. On the other hand, many more gene expression-based deconvolution methods are currently available than DNA methylation-based ones, which means that DNA methylation-based methods in many cases can learn …


Effects Of Functional Network Model Definition On Biomarker Outcome Prediction, Xinyang Feng May 2023

Arts & Sciences Electronic Theses and Dissertations

Machine learning (ML) models are widely used to investigate the human connectome and to predict and understand behavior, emotion, and cognition. Prior research has organized pediatric connectome data using adult functional network models. However, this assumes that adult functional network models are appropriate and useful for predicting developmental outcomes from pediatric connectome data. We hypothesize that the application of adult brain network models could result in poor model fit, limiting the generalizability of results. Here, we test whether prediction of biological age is improved by concordant brain network models matching underlying functional connectome data. To quantify the difference in age …


Dealing With Dimensionality: Problems And Techniques In High-Dimensional Statistics, Cezareo Rodriguez Dec 2022

Arts & Sciences Electronic Theses and Dissertations

In modern data analysis, problems involving high-dimensional data with more variables than subjects are increasingly common. Two such cases are mediation analysis and distributed optimization. In Chapter 2 we start with an overview of high-dimensional statistics and mediation analysis. In Chapter 3 we motivate and prove properties for a new marginal screening procedure for performing high-dimensional mediation analysis. This screening procedure is shown via simulation to perform better than benchmark approaches and is applied to a DNA methylation study. In Chapter 4 we construct a cryptosystem that accurately performs distributed penalized quantile regression in the high-dimensional setting …


Kernel Estimation Of Spot Volatility And Its Application In Volatility Functional Estimation, Bei Wu Dec 2022

Arts & Sciences Electronic Theses and Dissertations

Itô semimartingale models for the dynamics of asset returns have been widely studied in financial econometrics. A key component of the model, spot volatility, plays a crucial role in option pricing, portfolio management, and financial risk assessment. In this dissertation, we consider three problems related to the estimation of spot volatility using high-frequency asset returns. We first revisit the problem of estimating the spot volatility of an Itô semimartingale using a kernel estimator. We prove a Central Limit Theorem with an optimal convergence rate for a general two-sided kernel under quite mild assumptions, which includes leverage effects and jumps of …
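As a rough illustration of what a kernel estimator of spot volatility looks like, the sketch below computes a kernel-weighted sum of squared log-price increments around a target time. It is a generic textbook-style form, not the estimator studied in the dissertation: the Gaussian kernel, the bandwidth, the absence of any jump handling, and all names (`kernel_spot_variance`, `gaussian_kernel`, the toy price path) are assumptions made for this example.

```python
import math


def gaussian_kernel(x):
    """Standard Gaussian density, used here as a two-sided kernel (an assumption)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def kernel_spot_variance(times, log_prices, t, bandwidth):
    """Kernel-weighted sum of squared increments as an estimate of spot variance at time t.

    Each squared increment is weighted by K((t_{i-1} - t) / h) / h, so only
    observations close to t contribute materially.
    """
    estimate = 0.0
    for i in range(1, len(log_prices)):
        increment = log_prices[i] - log_prices[i - 1]
        weight = gaussian_kernel((times[i - 1] - t) / bandwidth) / bandwidth
        estimate += weight * increment ** 2
    return estimate


# Toy path (hypothetical): every increment has size sigma * sqrt(dt), with
# alternating sign, so the true spot variance is sigma ** 2 at every time.
n = 1000
dt = 1.0 / n
sigma = 0.2
times = [i * dt for i in range(n + 1)]
log_prices = [0.0]
for i in range(n):
    step = sigma * math.sqrt(dt) * (1 if i % 2 == 0 else -1)
    log_prices.append(log_prices[-1] + step)

spot_var = kernel_spot_variance(times, log_prices, t=0.5, bandwidth=0.05)
```

On this toy path the estimate at an interior time should be close to `sigma ** 2`, because the kernel weights sum to approximately one. With real high-frequency data, the choice of bandwidth and the treatment of jumps matter a great deal, and neither is addressed here.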


Contribution To Data Science: Time Series, Uncertainty Quantification And Applications, Dhrubajyoti Ghosh Dec 2022

Arts & Sciences Electronic Theses and Dissertations

Time series analysis is an essential tool in modern statistical analysis, with a myriad of real data problems having temporal components that need to be studied to gain a better understanding of the temporal dependence structure in the data. For example, in the stock market, it is important to identify the ups and downs of stock prices, for which time series analysis is crucial. Most of the existing literature on time series deals with linear time series or assumes Gaussianity. However, there are multiple instances where the time series shows nonlinear trends, or when the …


Association Of Structural Variation (Sv) With Cardiometabolic Traits In Finns, Lei Chen Aug 2021

Arts & Sciences Electronic Theses and Dissertations

Cardiovascular diseases (CVDs) are known to be associated with a variety of quantitative risk factors such as cholesterol, metabolites, and insulin. Understanding the genetic basis of these quantitative traits can shed light on the etiology, prevention, diagnosis, and treatment of disease. However, most prior trait-mapping studies have focused on single nucleotide variants (SNVs) and indels, with the contribution of structural variation (SV) remaining unknown. In this thesis, we present the results of a study examining genetic association between SVs and cardiometabolic traits in the Finnish population. In the first chapter, we used sensitive methods to identify and genotype 129,166 high-confidence …


Market Making In A Limit Order Book: Classical Optimal Control And Reinforcement Learning Approaches, Chuyi Yu Aug 2021

Arts & Sciences Electronic Theses and Dissertations

Over the last decade, algorithmic trading has become one of the most significant developments in electronic security markets. Several types of problems and practices have been studied, such as optimal execution, market making, statistical arbitrage, and latency arbitrage. Among these, high-frequency market making plays a crucial role since it provides large liquidity to the market, which makes trading and investing cheaper for other market participants, and also creates sizable profits for high-frequency market makers (HFM) from the large quantity of round-trip executions involved in such practices. In this thesis, we discuss two approaches to solve the high-frequency market …


Smooth Ica Under Time Pattern Assumptions, Jiayi Fu Aug 2021

Arts & Sciences Electronic Theses and Dissertations

Independent component analysis (ICA) is widely used in different areas. As traditional ICA models make no assumptions on time pattern, they do not take time domain information into consideration. In this thesis, we introduced new assumptions that allow local dependence over time, and we built smooth ICA models to utilize the smoothness information of the source signals. Based on the local dependence assumptions, constrained optimization problems with a smoothing penalty were discussed. Then we introduced smooth ICA estimators and estimating equations. Under local dependence assumptions, we gave proofs of the consistency and asymptotic normality of these estimators. We derived the Newton iterative …


Adaptive Optimal Market Making Strategies With Inventory Liquidation Cost, Yi Zhang May 2021

Arts & Sciences Electronic Theses and Dissertations

Along the lines of the paper \cite{zoe}, we find a general form of the optimal market making strategy for a high-frequency market maker (HFM) in a discrete-time Limit Order Book (LOB) model. Unlike \cite{zoe}, the optimal market making strategy is adaptive depending on the arrival of Market Order (MO) in the previous time intervals. We provide a method to make each placement of Limit Orders (LO) dependent on previous information in the same trading day and prove the admissibility of the optimal market making strategy under some general assumptions. Empirical study shows the adaptive optimal strategies outperform the non-adaptive strategy …


Genetics Of Pediatric Musculoskeletal Disorders, Lilian Antunes Jan 2021

Arts & Sciences Electronic Theses and Dissertations

Pediatric musculoskeletal disorders are an extremely broad category of diseases that are often inherited. While individually rare, collectively these disorders are common, affecting around 3% of live births in the US. Despite the mounting clinical and molecular evidence for a genetic etiology, the cause for many patients with pediatric musculoskeletal disorders remains largely unknown. Major challenges in rare pediatric diseases include recruiting large numbers of patients and determining the significance and functional impacts of variants associated with disease within individuals or families. Whole exome sequencing (WES) is a powerful tool to identify coding variants that are associated with rare pediatric …


Wavelet Coherence Analysis With An Application Of Brain Images, Yiqian Fang Aug 2020

Arts & Sciences Electronic Theses and Dissertations

Wavelet analysis has become an emerging method in a wide range of applications with non-stationary data. In this work, we apply wavelets to tackle the problem of estimating dynamic association in a collection of multivariate non-stationary time series. Coherence is a common metric for linear dependence across signals. However, it assumes static dependence and does not sufficiently model many biological processes with time-evolving dependence structures. We explore continuous wavelet analysis for modeling and estimating such dynamic dependence under the replicated multivariate time series settings. Wavelet transformation provides a decomposition of signals that localizes in both time and frequency domains, hence …


Multi-Omics Integration For Gene Fusion Discovery And Somatic Mutation Haplotyping In Cancer, Steven Mason Foltz May 2020

Arts & Sciences Electronic Theses and Dissertations

Cancer is a disease caused by changes to the genome and dysregulation of gene expression. Among many types of mutations, including point mutations, small insertions and deletions, large scale structural variants, and copy number changes, gene fusions are another category of genomic and transcriptomic alteration that can lead to cancer and which can serve as therapeutic targets. We studied gene fusion events using data from The Cancer Genome Atlas, including over 9,000 patients from 33 cancer types, finding patterns of gene fusion events and dysregulation of gene expression within and across cancer types. With data from the CoMMpass study (Multiple …


Bayesian Posterior Inference And Lan For Lévy Models Under High-Frequency Data, Qi Wang May 2020

Arts & Sciences Electronic Theses and Dissertations

Parameter estimation and inference for Lévy models under high-frequency data has been an exciting and important task in the field of financial mathematics and has been found practically useful when analyzing real financial data. One feature of Lévy models is the allowance of jumps to model the abrupt changes sometimes observed in the market. In this thesis, we discuss some problems related to the statistical inference of Lévy models based on high-frequency data, with emphasis on the presence of jumps. The first problem we consider focuses on the estimation of the volatility, which is critical to measure and control the …


Bayesian Variable Selection And Post-Selection Inference, Qiyiwen Zhang May 2020

Arts & Sciences Electronic Theses and Dissertations

In this dissertation, we first develop a novel perspective to compare Bayesian variable selection procedures in terms of their selection criteria as well as their finite-sample properties. Secondly, we investigate Bayesian post-selection inference in two types of selection problems: linear regression and population selection. We will demonstrate that both inference problems are susceptible to selection effects since the selection procedure is data-dependent. Before comparing Bayesian variable selection procedures, we first classify the current Bayesian variable selection procedures into two classes: those with selection criteria defined on the space of candidate models, and those with selection criteria not explicitly formulated on …


Variational Inference For Quantile Regression, Bufei Guo May 2019

Arts & Sciences Electronic Theses and Dissertations

Quantile regression (QR) (Koenker and Bassett, 1978) is an alternative to classic linear regression with extensive applications in many fields. This thesis studies Bayesian quantile regression (Yu and Moyeed, 2001) using variational inference, which is one of the alternatives to Markov chain Monte Carlo (MCMC) for approximating intractable posterior distributions. Lasso regularization has been shown to be effective in improving the accuracy of quantile regression (Li and Zhu, 2008). This thesis develops variational inference for quantile regression and for regularized quantile regression with the lasso penalty. Simulation results show that variational inference is a computationally more efficient alternative …
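For readers unfamiliar with quantile regression, the sketch below shows the check (pinball) loss that underlies it, in the simplest intercept-only case: minimizing the total check loss over a constant recovers a sample quantile. It does not implement the Bayesian or variational methods of the thesis. The function names and the toy data are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(values, q, tau):
    """Sum of check losses when every observation is predicted by the constant q."""
    return sum(check_loss(v - q, tau) for v in values)


def sample_quantile_by_check_loss(values, tau):
    """Return a data point minimizing the total check loss.

    The objective is piecewise linear and convex in q, so a minimizer is
    always attained at one of the observations; searching them is enough.
    """
    return min(sorted(values), key=lambda q: total_check_loss(values, q, tau))


data = [2.0, 7.0, 1.0, 9.0, 4.0]  # toy data (hypothetical)
median = sample_quantile_by_check_loss(data, 0.5)
upper = sample_quantile_by_check_loss(data, 0.9)
```

Replacing the constant `q` with a linear predictor in covariates gives the usual quantile regression objective. Adding an L1 penalty on the coefficients gives the lasso-regularized version mentioned in the abstract.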


Essays On Econometrics And Rational Choice, Junnan He May 2019

Arts & Sciences Electronic Theses and Dissertations

Decision and choice theory is a topic of interest in both econometrics and microeconomic theory. We contribute to the theory of decision in both contexts, that is, the theory of model selection in econometrics, and the theory of rational decision in microeconomics.

There is a long-standing theoretical interest in model selection. More recently, research on sparse estimators, a class of estimation methods that select and estimate important parameters simultaneously, has been the central focus of model selection. The methods become especially relevant when the problem is high-dimensional in nature. Theoretically, sparse methods can perform well when the true data generating …


Quantifying Lithochemical Diversity Of Martian Materials Using Hierarchical Clustering And A Similarity Index For Classification, Michael Conner Bouchard May 2019

Arts & Sciences Electronic Theses and Dissertations

We are currently living in the golden age of robotic exploration of Mars, with a continued robotic presence there since 1997. Next to Earth, Mars is the planet about which we have gathered the most geologic information. Unlike Earth, Mars does not appear to have plate tectonics, and the planet’s primary and secondary crust is dominated by basalts. Understanding the compositional diversity of the materials that make up the martian crust will give us a better insight into the geologic processes that formed the planet and its subsequent evolution. One large and growing source of martian surface compositions is the …


Mechanics Of Phenotypic Aging Trajectories In C. Elegans And Humans, William Zhang May 2019

Arts & Sciences Electronic Theses and Dissertations

Overall, my dissertation integrates longitudinal measurements of physiology to investigate the aging process. In the first half, I examine the surprising and largely unexplained degree of variation in lifespan within even homogeneous populations. I sought to understand how physiological aging differs between long- and short-lived individuals within a population of genetically identical C. elegans reared in a homogeneous environment. Using a novel culture apparatus, I longitudinally monitored aspects of aging physiology across a large population of isolated individuals. Aggregating several measures into an overall estimate of senescence, I find that long- and short-lived individuals start adulthood on an equal physiological …


Topics In Complex And Large-Scale Data Analysis, Guanshengrui Hao May 2019

Arts & Sciences Electronic Theses and Dissertations

The past few decades have witnessed the rapid development of modern technologies. As a result, data collected with these technologies are becoming more complicated in structure and larger in scale, driving traditional data analysis methods to develop and adapt. In this dissertation, we study three statistical issues arising in data with complicated structure and/or large scale. In Chapter 2, we propose a Bayesian framework via exponential random graph models (ERGM) to estimate the model parameters and network structures for networks with measurement errors. In Chapter 3, we design a novel network sampling algorithm for large-scale networks with community …


Grammar And Variation: Understanding How Cis-Regulatory Information Is Encoded In Mammalian Genomes, Dana Michele King Dec 2018

Arts & Sciences Electronic Theses and Dissertations

Understanding how genotype leads to phenotype is key to understanding both the development and dysfunction of complex organisms. In the context of regulating the gene expression patterns that contribute to cell identity and function, the goal of my thesis research is to understand how changes in genome sequence may impact gene expression by determining how sequence features contribute to regulatory potential. To accomplish this goal, I first leveraged the key regulatory role of pluripotency transcription factors (TFs) in mouse embryonic stem cells (mESCs) and tested synthetically generated and genomically identified combinations of binding sites for four TFs, OCT4, SOX2, KLF4, …


Different Estimation Methods For The Basic Independent Component Analysis Model, Zhenyi An Dec 2018

Arts & Sciences Electronic Theses and Dissertations

The basic Independent Component Analysis (ICA) model was inspired by the classic cocktail-party problem. What distinguishes ICA from other kinds of analysis is the intrinsic assumption that the data are non-Gaussian. Several approaches have been proposed based on maximizing the non-Gaussianity of the data, which is measured by kurtosis, mutual information, and other criteria. With each estimation, we need to optimize functions of expectations of non-quadratic functions, since this gives us access to the higher-order statistics of the non-Gaussian part of the data. In this thesis, our goal is to review one of the most efficient estimation methods, …


Generalized Non-Inferential Approach To Modeling Restricted Discrete Choice For The Case Of The Spatial Random Utility, Elena Labzina Aug 2018

Arts & Sciences Electronic Theses and Dissertations

The multinomial logistic regression model (MNL) is a powerful and easily tractable way of measuring the probabilistic impact of input variables on individual categorical choices. Crucially, the standard MNL assumes that all subjects of the study have the same choice sets. However, especially in political science and economics, this condition is frequently violated. Probably the most vivid example of varying choice sets (VCS) is partially contested elections. Furthermore, the MNL implies the Independence of Irrelevant Alternatives (IIA) assumption by requiring i.i.d. errors, which distinguishes the MNL from the multinomial probit (MNP) and mixed logit (MXL) models. In …


Deep Learning Analysis Of Limit Order Book, Xin Xu May 2018

Arts & Sciences Electronic Theses and Dissertations

In this paper, we build a deep neural network for modeling spatial structure in the limit order book and predict the future best ask or best bid price, based on ideas of (Sirignano 2016). We propose an intuitive data processing method to approximate the data that are not available to us, based only on level I data, which are more widely available. The model is based on the idea that there is local dependence for the best ask or best bid price and the sizes of related orders. First we use logistic regression to show that this approach is reasonable. To show the advantages …


Algorithmic Trading With Prior Information, Xinyi Cai May 2018

Arts & Sciences Electronic Theses and Dissertations

Traders use strategies that mix market and limit orders to generate profits. There are different types of traders in the market: some have prior information and can learn from changes in prices to tweak their trading strategy continuously (Informed Traders), some have no prior information but can learn (Uninformed Learners), and some have no prior information and cannot learn (Uninformed Traders). In this thesis, we consider a model proposed in 2014 by Alvaro C, Sebastian J and Damir K \cite{AL} for algorithmic traders to assess the impact of dynamic learning on profit and loss. The traders can employ the model to decide which …


Variable Selection Via Lasso With High-Dimensional Proteomic Data, Hongxuan Zhai May 2018

Arts & Sciences Electronic Theses and Dissertations

Multiclass classification with high-dimensional data is an applied topic both in statistics and machine learning. The classification procedure could be done in various ways. In this thesis, we review the theory of the Lasso procedure which provides a parameter estimator while simultaneously achieving dimension reduction due to a property of the L1 norm. Lasso with elastic net penalty and sparse group lasso are also reviewed. Our data is high-dimensional proteomic data (iTRAQ ratios) of breast cancer patients with four subtypes of breast cancer. We use the multinomial logistic regression to train our classifier and use the false classification rates obtained …
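The "property of the L1 norm" mentioned above can be made concrete with the soft-thresholding operator. Under an orthonormal design and the objective (1/2)·||y − Xb||² + λ·||b||₁, the lasso solution is obtained by soft-thresholding each least-squares coefficient, which sets small coefficients exactly to zero. The sketch below shows only this operator. It is not the multinomial classifier used in the thesis, and the coefficient values are made up for illustration.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam, and return 0 if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical least-squares coefficients under an orthonormal design.
ols_coefficients = [3.0, -0.5, 0.2, -3.0]
penalty = 1.0

# Coefficients with magnitude at most `penalty` are set exactly to zero,
# which is how the L1 penalty performs variable selection.
lasso_coefficients = [soft_threshold(b, penalty) for b in ols_coefficients]
selected = [i for i, b in enumerate(lasso_coefficients) if b != 0.0]
```

For general (non-orthonormal) designs there is no closed form, but coordinate-descent solvers apply the same operator one coefficient at a time.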


Distributed Quantile Regression Analysis And A Group Variable Selection Method, Liqun Yu May 2018

Arts & Sciences Electronic Theses and Dissertations

This dissertation develops novel methodologies for distributed quantile regression analysis for big data by utilizing a distributed optimization algorithm called the alternating direction method of multipliers (ADMM). Specifically, we first write the penalized quantile regression into a specific form that can be solved by the ADMM and propose numerical algorithms for solving the ADMM subproblems. This results in the distributed QR-ADMM algorithm. Then, to further reduce the computational time, we formulate the penalized quantile regression into another equivalent ADMM form in which all the subproblems have exact closed-form solutions and hence avoid iterative numerical methods. This results in the single-loop …


Nonparametric Estimation Of Time Series Volatility Model Estimation, Teng Tu May 2018

Arts & Sciences Electronic Theses and Dissertations

In this article we consider two estimation methods for a non-parametric volatility model with autoregressive errors of order two. The first estimation method is based on the two-lag difference. To get a better result, we consider a second approach based on general quadratic forms. For illustration, we provide several data sets from different simulation models to support the procedures of both methods, and show that the second approach yields a better estimate.


Mortgage Transition Model Based On Loanperformance Data, Shuyao Yang May 2017

Arts & Sciences Electronic Theses and Dissertations

The unexpected increase in loan defaults in the mortgage market is widely considered to be one of the main causes of the economic crisis. To provide some insight into loan delinquency and default, I analyze the mortgage performance data from the Fannie Mae website and investigate how economic factors and individual loan and borrower information affect the events of default and prepayment. Various delinquency statuses, including default and prepayment, are treated as discrete states of a Markov chain. One-step transition probabilities are estimated via multinomial logistic models. We find that in general current loan-to-value ratio, credit score, unemployment rate, and interest …
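To make the Markov-chain view concrete, the sketch below estimates one-step transition probabilities by simple empirical frequencies over observed state paths. This is a deliberate simplification and not the thesis's method: the thesis estimates the probabilities with multinomial logistic models that depend on covariates, whereas this covariate-free version only counts transitions. The state labels and loan paths are invented for illustration.

```python
from collections import Counter


def estimate_transition_matrix(paths, states):
    """Estimate P[s][t] = (number of s -> t transitions) / (number of transitions out of s)."""
    counts = Counter()
    for path in paths:
        # Pair each state with its successor along the path.
        for current, nxt in zip(path, path[1:]):
            counts[(current, nxt)] += 1

    matrix = {}
    for s in states:
        row_total = sum(counts[(s, t)] for t in states)
        # States with no observed outgoing transitions get an all-zero row.
        matrix[s] = {
            t: (counts[(s, t)] / row_total if row_total else 0.0) for t in states
        }
    return matrix


# Hypothetical state labels and monthly status paths for two loans.
states = ["current", "delinquent", "default", "prepaid"]
paths = [
    ["current", "current", "delinquent", "current", "prepaid"],
    ["current", "delinquent", "delinquent", "default"],
]
P = estimate_transition_matrix(paths, states)
```

In a covariate-dependent model, each row of this matrix would instead be the output of a multinomial logistic regression evaluated at the loan's current characteristics.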


On Post-Selection Confidence Intervals In Linear Regression, Xinwei Zhang May 2017

Arts & Sciences Electronic Theses and Dissertations

The general goal of this thesis is to investigate some issues in post-selection inference, which arises in the setting where statistical inference is carried out after a data-driven model selection step. In this setting, classical inference theory, which requires a model fixed a priori, becomes invalid since the selected model is the result of a random event. Hence, the common practice in applied research of ignoring the model selection step when building confidence intervals will result in misleading or even false conclusions. In this thesis, specifically, we first discuss some examples to show how the classical inference theory loses …


Statistical Analysis Of Markovian Queueing Models Of Limit Order Books, Yiyao Luo May 2017

Arts & Sciences Electronic Theses and Dissertations

The objective of this thesis is to investigate how well some Markovian queueing models describe the dynamical properties of a limit order book. We review and compare the assumptions proposed by Huang et al. [Quantitative Finance, 12, 547-557 (2012)] and Cont et al. [SIAM Journal on Financial Mathematics, 4, 1-25 (2013)], and estimate the intensity parameters in both ways, based on real data of a stock on the Nasdaq Stock Market. By comparing the cumulative distribution functions of the first-passage time to state 0, we will show that the estimators of Cont’s model fit our data better and we put …