Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Keyword
- Bias assessment (1)
- Bias reduction (1)
- Binding Sites (1)
- Cluster detection (1)
- Copy number variation (1)
- Deterministic design matrix (1)
- Double-smoothing (1)
- High-dimensional statistics (1)
- Hypothesis testing (1)
- Influence measure (1)
- Information Theory (1)
- Local Polynomial (1)
- Machine Learning (1)
- Microarray probe design sampling (1)
- Missing data (1)
- Mouse Diversity Genotyping Array (1)
- Mutation (1)
- Mutation shower (1)
- Point processes (1)
- Position-Specific Scoring Matrices (1)
- Post-selection inference (1)
- Single nucleotide polymorphism (1)
- Spatial statistics (1)
- Transcription Factors (1)
- Variable screening (1)
Articles 1 - 4 of 4
Full-Text Articles in Physical Sciences and Mathematics
Bias Assessment And Reduction In Kernel Smoothing, Wenkai Ma
Electronic Thesis and Dissertation Repository
When performing local polynomial regression (LPR) with kernel smoothing, the choice of the smoothing parameter, or bandwidth, is critical. The performance of the method is often evaluated using the mean squared error (MSE), which decomposes into squared bias plus variance. Kernel methods are known to exhibit varying degrees of bias, and boundary effects and data sparsity are two potential problems to watch for. There is a need for a tool to visually assess the potential bias when applying kernel smooths to a given scatterplot of data. In this dissertation, we propose pointwise confidence intervals for bias and demonstrate a …
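As a rough illustration of the setting this abstract describes, the sketch below fits a local linear regression (the degree-1 case of LPR) at a single point with a Gaussian kernel. It is not the dissertation's method: the function name `local_linear_fit`, the Gaussian kernel, and the bandwidth values are assumptions chosen for illustration only.

```python
import numpy as np

def local_linear_fit(x, y, x0, bandwidth):
    """Local linear estimate of the regression function at x0.

    Hypothetical helper for illustration; assumes a Gaussian kernel.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = (x - x0) / bandwidth
    w = np.exp(-0.5 * u**2)                      # kernel weights
    # Design matrix centred at x0, so the intercept is the fit at x0.
    X = np.column_stack([np.ones_like(x), x - x0])
    XtW = X.T * w                                # weight each observation
    beta = np.linalg.solve(XtW @ X, XtW @ y)     # weighted least squares
    return beta[0]

# Noise-free examples on an evenly spaced grid.
x = np.linspace(0.0, 1.0, 51)
line_at_boundary = local_linear_fit(x, 2.0 + 3.0 * x, x0=0.0, bandwidth=0.2)
curve_at_middle = local_linear_fit(x, x**2, x0=0.5, bandwidth=0.2)
```

A local linear fit reproduces a straight line exactly, even at the boundary, so `line_at_boundary` equals 2 up to rounding. For the curved function the estimate at 0.5 lies above the true value 0.25, which is the kind of smoothing bias the abstract refers to; the gap shrinks as the bandwidth decreases.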
Statistical Tools For Assessment Of Spatial Properties Of Mutations Observed Under The Microarray Platform, Bin Luo
Electronic Thesis and Dissertation Repository
Mutations are alterations of the DNA nucleotide sequence of the genome. Analyses of spatial properties of mutations are critical for understanding certain mutational mechanisms relevant to genetic disease, diversity, and evolution. The studies in this thesis focus on two types of mutations: point mutations, i.e., single nucleotide polymorphism (SNP) genotype differences, and mutations in segments, i.e., copy number variations (CNVs). A microarray platform, such as the Mouse Diversity Genotyping Array (MDGA), detects these mutations genome-wide at lower cost than whole genome sequencing, and is thus considered for suitability as a screening tool for large populations. Yet it provides observation …
Analysis Challenges For High Dimensional Data, Bangxin Zhao
Electronic Thesis and Dissertation Repository
In this thesis, we propose new methodologies targeting the areas of high-dimensional variable screening, influence measures, and post-selection inference. We propose a new estimator for the correlation between the response and high-dimensional predictor variables, and based on this estimator we develop a new screening technique, termed Dynamic Tilted Current Correlation Screening (DTCCS), for high-dimensional variable screening. DTCCS is capable of picking up the relevant predictor variables within a finite number of steps. The DTCCS method includes the widely used sure independence screening (SIS) method and the high-dimensional ordinary least squares projection (HOLP) approach as special cases.
Two methods …
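For orientation, the sketch below shows plain SIS, the baseline that the abstract names as a special case: rank predictors by the absolute value of their marginal correlation with the response and keep the top `d`. It does not implement DTCCS or HOLP. The function name `sis_screen`, the simulated data, and the choice of `d` are hypothetical.

```python
import numpy as np

def sis_screen(X, y, d):
    """Return indices of the d predictors with the largest |marginal correlation|.

    Hypothetical helper; assumes no predictor column is constant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each predictor
    yc = y - y.mean()                            # centre the response
    denom = np.sqrt((Xc**2).sum(axis=0)) * np.sqrt((yc**2).sum())
    corr = (Xc.T @ yc) / denom                   # marginal Pearson correlations
    order = np.argsort(-np.abs(corr))            # strongest first
    return order[:d]

# Simulated example: only predictors 7 and 21 affect the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = 3.0 * X[:, 7] - 2.0 * X[:, 21] + 0.1 * rng.standard_normal(200)
selected = sis_screen(X, y, d=5)
```

Marginal ranking of this kind can miss predictors that matter only jointly with others, which is one common motivation for alternatives to plain SIS.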
Computational Modelling Of Human Transcriptional Regulation By An Information Theory-Based Approach, Ruipeng Lu
Electronic Thesis and Dissertation Repository
ChIP-seq experiments can identify the genome-wide binding site motifs of a transcription factor (TF) and determine its sequence specificity. Multiple algorithms have been developed to derive TF binding site (TFBS) motifs from ChIP-seq data, including the entropy minimization-based Bipad, which can derive both contiguous and bipartite motifs. Prior studies applying these algorithms to ChIP-seq data analyzed only a small number of top peaks with the highest signal strengths, biasing the resulting position weight matrices (PWMs) towards consensus-like, strong binding sites; nor did they derive bipartite motifs, which prevented accurate modelling of the binding behavior of dimeric TFs.
This thesis presents a novel …
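To make the information-theoretic vocabulary above concrete, the sketch below computes the per-position information content, in bits, of a motif from a matrix of nucleotide counts. It is a generic illustration, not Bipad or the thesis's pipeline: it assumes a uniform background over A, C, G, T and omits any small-sample correction, and the function name and example counts are hypothetical.

```python
import numpy as np

def information_content(counts):
    """Per-position information content (bits) of a DNA motif.

    counts: array of shape (positions, 4) with nucleotide counts.
    Hypothetical helper; assumes a uniform background, so the maximum is 2 bits.
    """
    counts = np.asarray(counts, dtype=float)
    freqs = counts / counts.sum(axis=1, keepdims=True)
    # Treat 0 * log2(0) as 0 and silence the warnings from log2(0).
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(freqs > 0, freqs * np.log2(freqs), 0.0)
    entropy = -terms.sum(axis=1)                 # Shannon entropy per position
    return 2.0 - entropy

# Made-up counts: fully conserved, two-way split, and uniform positions.
example_counts = [[10, 0, 0, 0],
                  [5, 5, 0, 0],
                  [3, 3, 3, 3]]
bits = information_content(example_counts)
```

For these counts the three positions carry 2, 1, and 0 bits respectively: lower entropy at a position means higher information content.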