Digital Commons Network

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 24 of 24

Full-Text Articles in Entire DC Network

Wavelet-Based Functional Mixed Models To Characterize Population Heterogeneity In Accelerometer Profiles: A Case Study. , Jeffrey S. Morris, Cassandra Arroyo, Brent A. Coull, Louise M. Ryan, Steven L. Gortmaker Dec 2006

Jeffrey S. Morris

We present a case study illustrating the challenges of analyzing accelerometer data taken from a sample of children participating in an intervention study designed to increase physical activity. An accelerometer is a small device worn on the hip that records the minute-by-minute activity levels of the child throughout the day for each day it is worn. The resulting data are irregular functions characterized by many peaks representing short bursts of intense activity. We model these data using the wavelet-based functional mixed model. This approach incorporates multiple fixed effects and random effect functions of arbitrary form, the estimates of which are …


Alternative Probeset Definitions For Combining Microarray Data Across Studies Using Different Versions Of Affymetrix Oligonucleotide Arrays, Jeffrey S. Morris, Chunlei Wu, Kevin R. Coombes, Keith A. Baggerly, Jing Wang, Li Zhang Dec 2006

Jeffrey S. Morris

Many published microarray studies have small to moderate sample sizes, and thus have low statistical power to detect significant relationships between gene expression levels and outcomes of interest. By pooling data across multiple studies, however, we can gain power, enabling us to detect new relationships. This type of pooling is complicated by the fact that gene expression measurements from different microarray platforms are not directly comparable. In this chapter, we discuss two methods for combining information across different versions of Affymetrix oligonucleotide arrays. Each involves a new approach for combining probes on the array into probesets. The first approach involves …
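
As a rough illustration of the probeset-redefinition idea only (not the chapter's specific algorithm), probe-level intensities can be regrouped by a probe-to-transcript mapping shared across array versions and then summarized per group; the sketch below assumes hypothetical column names, toy values, and a simple median summary.

# Hypothetical sketch: regroup probe-level intensities by a probe->transcript
# mapping shared across array versions, then summarize each new probeset by its
# median. Column names and data are invented for illustration.
import pandas as pd

probe_data = pd.DataFrame({
    "probe_id":   ["p1", "p2", "p3", "p4"],
    "transcript": ["NM_0001", "NM_0001", "NM_0002", "NM_0002"],
    "sample_1":   [7.2, 7.8, 5.1, 5.4],
    "sample_2":   [8.0, 8.3, 4.9, 5.2],
})

# One row per redefined probeset, columns are per-sample summaries
probesets = probe_data.groupby("transcript")[["sample_1", "sample_2"]].median()
print(probesets)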


An Econometric Method Of Correcting For Unit Nonresponse Bias In Surveys, Martin Ravallion, Anton Korinek, Johan Mistiaen Dec 2006

Martin Ravallion

Past approaches to correcting for unit nonresponse in sample surveys by re-weighting the data assume that the problem is ignorable within arbitrary subgroups of the population. Theory and evidence suggest that this assumption is unlikely to hold, and that household characteristics such as income systematically affect survey compliance. We show that this leaves a bias in the re-weighted data and we propose a method of correcting for this bias. The geographic structure of nonresponse rates allows us to identify a micro compliance function, which is then used to re-weight the unit-record data. An example is given for the US Current …
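
As an illustration of the re-weighting step only: a minimal Python sketch that scales design weights by the inverse of an already-estimated compliance probability. The logistic compliance function in income and the column names are hypothetical, and the paper's identification of the compliance function from geographic nonresponse rates is not reproduced here.

# Minimal sketch of compliance-based re-weighting. The compliance function and
# column names are hypothetical; the paper's identification strategy based on
# geographic nonresponse rates is omitted.
import numpy as np
import pandas as pd

def reweight_by_compliance(df, compliance_prob):
    """Scale design weights by the inverse of the estimated response probability."""
    out = df.copy()
    p = np.clip(compliance_prob(out["income"].to_numpy()), 1e-3, 1.0)  # cap extreme weights
    out["corrected_weight"] = out["design_weight"] / p
    return out

# Hypothetical compliance function: richer households respond less often
compliance = lambda income: 1.0 / (1.0 + np.exp(-(3.0 - 0.00003 * income)))

survey = pd.DataFrame({"income": [20_000, 60_000, 150_000],
                       "design_weight": [1.0, 1.0, 1.0]})
print(reweight_by_compliance(survey, compliance))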


Identifying Important Explanatory Variables For Time-Varying Outcomes., Oliver Bembom, Maya L. Petersen, Mark J. Van Der Laan Dec 2006

Maya Petersen

This chapter describes a systematic and targeted approach for estimating the impact of each of a large number of baseline covariates on an outcome that is measured repeatedly over time. These variable importance estimates can be adjusted for a user-specified set of confounders and lend themselves in a straightforward way to obtaining confidence intervals and p-values. Hence, they can in particular be used to identify a subset of baseline covariates that are the most important explanatory variables for the time-varying outcome of interest. We illustrate the methodology in a data analysis aimed at finding mutations of the human immunodeficiency virus …
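
As a simplified, non-targeted analogue of the screening workflow described above (an estimate, confidence interval, and p-value per baseline covariate, adjusted for a user-chosen confounder set), the Python sketch below loops over covariates with a GEE for the repeated outcome. The chapter's actual estimator is targeted; all column names here are hypothetical.

# Simplified regression-based analogue of the variable-importance screen: one
# GEE per baseline covariate, adjusting for a chosen confounder set, collecting
# the estimate, confidence interval, and p-value. Not the chapter's targeted
# estimator; column names are hypothetical.
import pandas as pd
import statsmodels.api as sm

def screen_baseline_covariates(df, covariates, confounders, outcome="y", group="id"):
    rows = []
    for cov in covariates:
        exog = sm.add_constant(df[[cov] + confounders])
        fit = sm.GEE(df[outcome], exog, groups=df[group],
                     cov_struct=sm.cov_struct.Exchangeable()).fit()
        lo, hi = fit.conf_int().loc[cov]
        rows.append({"covariate": cov, "estimate": fit.params[cov],
                     "ci_low": lo, "ci_high": hi, "p_value": fit.pvalues[cov]})
    return pd.DataFrame(rows).sort_values("p_value")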


Identifying Important Explanatory Variables For Time-Varying Outcomes., Oliver Bembom, Maya L. Petersen, Mark J. Van Der Laan Dec 2006

Oliver Bembom

This chapter describes a systematic and targeted approach for estimating the impact of each of a large number of baseline covariates on an outcome that is measured repeatedly over time. These variable importance estimates can be adjusted for a user-specified set of confounders and lend themselves in a straightforward way to obtaining confidence intervals and p-values. Hence, they can in particular be used to identify a subset of baseline covariates that are the most important explanatory variables for the time-varying outcome of interest. We illustrate the methodology in a data analysis aimed at finding mutations of the human immunodeficiency virus …


PrepMS: TOF MS Data Graphical Preprocessing Tool, Yuliya V. Karpievitch, Elizabeth G. Hill, Adam J. Smolka, Jeffrey S. Morris, Kevin R. Coombes, Keith A. Baggerly, Jonas S. Almeida Nov 2006

Jeffrey S. Morris

We introduce a simple-to-use graphical tool that enables researchers to easily prepare time-of-flight mass spectrometry data for analysis. For ease of use, the graphical executable provides default parameter settings experimentally determined to work well in most situations. These values can be changed by the user if desired. PrepMS is a stand-alone application made freely available (open source), and is under the General Public License (GPL). Its graphical user interface, default parameter settings, and display plots allow PrepMS to be used effectively for data preprocessing, peak detection, and visual data quality assessment.
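
PrepMS itself is a stand-alone graphical application; purely to illustrate the preprocessing steps it automates (baseline correction, normalization, peak detection), here is a SciPy-based sketch with made-up window and prominence settings and synthetic placeholder data.

# Illustrative sketch of generic TOF-MS preprocessing steps (not PrepMS code):
# crude baseline subtraction, mean normalization, and prominence-based peak
# detection, with arbitrary parameter values.
import numpy as np
from scipy.ndimage import minimum_filter1d, uniform_filter1d
from scipy.signal import find_peaks

def preprocess_spectrum(intensity, baseline_window=501, prominence=5.0):
    """Baseline-correct, normalize, and detect peaks in one spectrum."""
    baseline = uniform_filter1d(minimum_filter1d(intensity, baseline_window), baseline_window)
    corrected = np.clip(intensity - baseline, 0.0, None)
    normalized = corrected / corrected.mean()
    peaks, _ = find_peaks(normalized, prominence=prominence)
    return normalized, peaks

# Synthetic spectrum: drifting baseline, three peaks, and noise (placeholder data)
t = np.arange(10_000)
spectrum = (50 * np.exp(-t / 8000.0)
            + sum(a * np.exp(-0.5 * ((t - c) / 15.0) ** 2) for a, c in [(40, 2000), (25, 4500), (60, 7200)])
            + np.random.default_rng(1).normal(0.0, 1.0, t.size))
normalized, peak_idx = preprocess_spectrum(spectrum)
print(peak_idx)  # indices of the detected peaks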


Wavelet-Based Functional Mixed Model Analysis: Computational Considerations, Richard C. Herrick, Jeffrey S. Morris Aug 2006

Jeffrey S. Morris

The wavelet-based functional mixed model is a new Bayesian method extending mixed models to irregular functional data (Morris and Carroll, JRSS-B, 2006). These data sets are typically very large and can quickly run into memory and time constraints unless these issues are handled carefully in the software. We reduce runtime by (1) identifying and optimizing hotspots, (2) using wavelet compression to do less computation with minimal impact on results, and (3) dividing the code into multiple executables to be run in parallel using a grid computing resource. We discuss rules of thumb for estimating memory requirements and computation times in …
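
A minimal sketch of the compression idea in step (2), assuming PyWavelets: keep only the largest wavelet coefficients needed to retain a chosen fraction of each curve's energy (the 99.5% target below is illustrative, not the paper's setting), so that downstream model fitting handles far fewer coefficients.

# Sketch of wavelet compression: zero out the smallest coefficients while
# retaining a target fraction of each curve's total energy. Target and wavelet
# choice are illustrative.
import numpy as np
import pywt

def compress_curve(y, wavelet="db4", energy_kept=0.995):
    """Threshold small wavelet coefficients, keeping `energy_kept` of total energy."""
    coeffs = pywt.wavedec(y, wavelet)
    flat, slices = pywt.coeffs_to_array(coeffs)
    order = np.argsort(np.abs(flat))[::-1]                  # largest coefficients first
    cum_energy = np.cumsum(flat[order] ** 2) / np.sum(flat ** 2)
    n_keep = int(np.searchsorted(cum_energy, energy_kept)) + 1
    compressed = np.zeros_like(flat)
    compressed[order[:n_keep]] = flat[order[:n_keep]]
    y_hat = pywt.waverec(pywt.array_to_coeffs(compressed, slices, output_format="wavedec"), wavelet)
    return y_hat[: len(y)], n_keep, flat.size

y = np.cos(np.linspace(0, 8 * np.pi, 1024)) + 0.1 * np.random.default_rng(0).normal(size=1024)
y_hat, n_keep, n_total = compress_curve(y)
print(f"kept {n_keep} of {n_total} coefficients")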


Bayesian Sample Size Calculations In Phase Ii Clinical Trials Using A Mixture Of Informative Priors., Byron J. Gajewski, Matthew S. Mayo Aug 2006

Byron J Gajewski

A number of researchers have discussed phase II clinical trials from a Bayesian perspective. A recent article by Mayo and Gajewski focuses on sample size calculations, which they determine by specifying an informative prior distribution and then calculating a posterior probability that the true response will exceed a prespecified target. In this article, we extend these sample size calculations to include a mixture of informative prior distributions. The mixture comes from several sources of information. For example, consider information from two (or more) clinicians: the first clinician is pessimistic about the drug and the second is optimistic. We tabulate …
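
To make the mixture-prior calculation concrete, here is a minimal sketch for a binomial phase II response rate under a two-component mixture of Beta priors (one pessimistic, one optimistic); the prior parameters, data, and 0.2 target below are illustrative, not the paper's values.

# Posterior probability that the true response rate exceeds a target, under a
# mixture of Beta priors for binomial data. Priors, data, and target are
# illustrative placeholders.
import numpy as np
from scipy.special import betaln
from scipy.stats import beta

def posterior_prob_exceeds(x, n, target, priors, weights):
    """P(true response rate > target | x responses in n patients) under a Beta-mixture prior."""
    log_marg = np.array([betaln(a + x, b + n - x) - betaln(a, b) for a, b in priors])
    w_post = np.asarray(weights, float) * np.exp(log_marg - log_marg.max())
    w_post /= w_post.sum()                         # updated mixture weights given the data
    return sum(w * beta.sf(target, a + x, b + n - x)
               for w, (a, b) in zip(w_post, priors))

# Pessimistic Beta(2, 8) and optimistic Beta(8, 2) components with equal prior weight
print(posterior_prob_exceeds(x=9, n=30, target=0.2, priors=[(2, 8), (8, 2)], weights=[0.5, 0.5]))

A sample size calculation in this spirit would scan n, with a planned or design value for the number of responses, until the posterior probability reaches the desired threshold.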


Some Statistical Issues In Microarray Gene Expression Data, Matthew S. Mayo, Byron J. Gajewski, Jeffrey S. Morris Jun 2006

Jeffrey S. Morris

In this paper we discuss some of the statistical issues that should be considered when conducting experiments involving microarray gene expression data. We discuss statistical issues related to preprocessing the data as well as the analysis of the data. Analysis of the data is discussed in three contexts: class comparison, class prediction and class discovery. We also review the methods used in two studies that are using microarray gene expression to assess the effect of exposure to radiofrequency (RF) fields on gene expression. Our intent is to provide a guide for radiation researchers when conducting studies involving microarray gene expression …


A Review Of LIMDEP 9.0 And NLOGIT 4.0, Joseph Hilbe May 2006

Joseph M Hilbe

No abstract provided.


Mathematica 5.2: A Review, Joseph Hilbe May 2006

Joseph M Hilbe

No abstract provided.


Wavelet-Based Functional Mixed Models, Jeffrey S. Morris, Raymond J. Carroll Apr 2006

Jeffrey S. Morris

Increasingly, scientific studies yield functional data, in which the ideal units of observation are curves and the observed data consist of sets of curves that are sampled on a fine grid. We present new methodology that generalizes the linear mixed model to the functional mixed model framework, with model fitting done by using a Bayesian wavelet-based approach. This method is flexible, allowing functions of arbitrary form and the full range of fixed effects structures and between-curve covariance structures that are available in the mixed model framework. It yields nonparametric estimates of the fixed and random-effects functions as well as the …
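
As intuition only, here is a rough frequentist sketch of the wavelet-domain idea: transform each curve, fit an ordinary mixed model independently to each wavelet coefficient, and invert the transform to recover fixed-effect functions. The actual method is Bayesian, with shrinkage priors and joint posterior sampling, none of which appears in this stripped-down analogue; the function and argument names are invented.

# Stripped-down, non-Bayesian analogue of wavelet-domain functional mixed
# modeling: DWT each curve, fit a random-intercept mixed model per coefficient,
# then back-transform the fixed-effect coefficients.
import numpy as np
import pywt
import statsmodels.api as sm

def wavelet_domain_fixed_effects(curves, design, groups, wavelet="db4"):
    """curves: (n_curves, T); design: (n_curves, p) fixed-effect matrix; groups: subject ids."""
    coeffs = [pywt.wavedec(y, wavelet) for y in curves]
    slices = pywt.coeffs_to_array(coeffs[0])[1]
    flat = np.vstack([pywt.coeffs_to_array(c)[0] for c in coeffs])
    beta = np.zeros((design.shape[1], flat.shape[1]))
    for j in range(flat.shape[1]):                 # one random-intercept model per coefficient
        beta[:, j] = sm.MixedLM(flat[:, j], design, groups=groups).fit().fe_params
    # Back-transform each row of fixed-effect coefficients to the time domain
    T = curves.shape[1]
    return np.vstack([pywt.waverec(pywt.array_to_coeffs(b, slices, output_format="wavedec"), wavelet)[:T]
                      for b in beta])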


Synchrony Of Change In Depressive Symptoms, Health Status, And Quality Of Life In Persons With Clinical Depression, Paula Diehr Apr 2006

Paula Diehr

BACKGROUND: Little is known about longitudinal associations among measures of depression, mental and physical health, and quality of life (QOL). We followed 982 clinically depressed persons to determine which measures changed and whether the change was synchronous with change in depressive symptoms. METHODS: Data were from the Longitudinal Investigation of Depression Outcomes (LIDO). Depressive symptoms, physical and mental health, and quality of life were measured at baseline, 6 weeks, 3 months, and 9 months. Change in the measures was examined over time and for persons with different levels of change in depressive symptoms. RESULTS: On average, all of the measures …


Shrinkage Estimation For Sage Data Using A Mixture Dirichlet Prior, Jeffrey S. Morris, Keith A. Baggerly, Kevin R. Coombes Mar 2006

Jeffrey S. Morris

Serial Analysis of Gene Expression (SAGE) is a technique for estimating the gene expression profile of a biological sample. Any efficient inference in SAGE must be based upon efficient estimates of these gene expression profiles, which consist of the estimated relative abundances for each mRNA species present in the sample. The data from SAGE experiments are counts for each observed mRNA species, and can be modeled using a multinomial distribution with two characteristics: skewness in the distribution of relative abundances and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample will fail …
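
A toy sketch of the shrinkage idea for SAGE tag counts: posterior-mean relative abundances under a two-component mixture of symmetric Dirichlet priors, so that unobserved tags receive small but nonzero estimates. The concentration values and weights below are illustrative, not the paper's calibrated mixture.

# Posterior-mean relative abundances for multinomial tag counts under a mixture
# of symmetric Dirichlet priors; concentrations and weights are placeholders.
import numpy as np
from scipy.special import gammaln

def dirichlet_mixture_posterior_mean(counts, alphas=(0.1, 10.0), weights=(0.5, 0.5)):
    """Posterior-mean relative abundances under a mixture of symmetric Dirichlet priors."""
    counts = np.asarray(counts, dtype=float)
    n, k = counts.sum(), counts.size
    log_marg, means = [], []
    for a in alphas:
        # Dirichlet-multinomial marginal likelihood (multinomial coefficient cancels)
        log_marg.append(gammaln(k * a) - gammaln(n + k * a)
                        + np.sum(gammaln(counts + a) - gammaln(a)))
        means.append((counts + a) / (n + k * a))   # component posterior mean (shrunken MLE)
    w = np.array(weights) * np.exp(np.array(log_marg) - max(log_marg))
    w /= w.sum()
    return sum(wi * m for wi, m in zip(w, means))

print(dirichlet_mixture_posterior_mean([50, 3, 0, 0, 1]))  # unobserved tags get nonzero estimates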


An Introduction To High-Throughput Bioinformatics Data, Keith A. Baggerly, Kevin R. Coombes, Jeffrey S. Morris Mar 2006

Jeffrey S. Morris

High-throughput biological assays supply thousands of measurements per sample, and the sheer amount of related data increases the need for better models to enhance inference. Such models, however, are more effective if they take into account the idiosyncrasies associated with the specific methods of measurement: where the numbers come from. We illustrate this point by describing three different measurement platforms: microarrays, serial analysis of gene expression (SAGE), and proteomic mass spectrometry.


Bayesian Mixture Models For Gene Expression And Protein Profiles, Michele Guindani, Kim-Anh Do, Peter Mueller, Jeffrey S. Morris Mar 2006

Jeffrey S. Morris

We review the use of semi-parametric mixture models for Bayesian inference in high throughput genomic data. We discuss three specific approaches for microarray data, for protein mass spectrometry experiments, and for SAGE data. For the microarray data and the protein mass spectrometry we assume group comparison experiments, i.e., experiments that seek to identify genes and proteins that are differentially expressed across two biologic conditions of interest. For the SAGE data example we consider inference for a single biologic sample.


Analysis Of Mass Spectrometry Data Using Bayesian Wavelet-Based Functional Mixed Models, Jeffrey S. Morris, Philip J. Brown, Keith A. Baggerly, Kevin R. Coombes Mar 2006

Jeffrey S. Morris

In this chapter, we demonstrate how to analyze MALDI-TOF/SELDI-TOF mass spectrometry data using the wavelet-based functional mixed model introduced by Morris and Carroll (2006), which generalizes the linear mixed model to the case of functional data. This approach models each spectrum as a function, and is very general, accommodating a broad class of experimental designs and allowing one to model nonparametric functional effects for various factors, which can be conditions of interest (e.g. cancer/normal) or experimental factors (blocking factors). Inference on these functional effects allows us to identify protein peaks related to various outcomes of interest, including dichotomous outcomes, categorical …


The "Duty" To Be A Rational Shareholder, David A. Hoffman Feb 2006

The "Duty" To Be A Rational Shareholder, David A. Hoffman

David A Hoffman

How and when do courts determine that corporate disclosures are actionable under the federal securities laws? The applicable standard is materiality: would a (mythical) reasonable investor have considered a given disclosure important? As I establish through empirical and statistical testing of approximately 500 cases analyzing the materiality standard, judicial findings of immateriality are remarkably common, and have been stable over time. Materiality's scope results in the dismissal of a large number of claims, and creates a set of cases in which courts attempt to explain and defend their vision of who is, and is not, a reasonable investor. Thus, materiality …


Investigating Omitted Variable Bias In Regression Parameter Estimation: A Genetic Algorithm Approach, Lonnie K. Stevans, David N. Sessions Jan 2006

Lonnie K. Stevans

Bias in regression estimates resulting from the omission of a correlated relevant variable is a well known phenomenon. In this study, we apply a genetic algorithm to estimate the missing variable and, using that estimated variable, demonstrate that significant bias in regression estimates can be substantially corrected with relatively high confidence in effective models. Our interest is restricted to the case of a missing binary indicator variable and the analytical properties of bias and MSE dominance of the resulting dependent error generated vector process. These findings are compared to prior results for the independent error proxy process. Simulations are run …
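
A toy sketch of the approach, under the assumption that the omitted variable is a binary indicator: treat it as an unknown 0/1 vector, evolve candidate vectors with a simple genetic algorithm to minimize the residual sum of squares of the augmented regression, and then re-estimate the regression with the recovered indicator included. The GA settings below are made up, not the paper's.

# Toy genetic algorithm for recovering an omitted binary indicator: fitness is
# the negative SSE of the regression augmented with a candidate 0/1 vector.
# Population size, generations, and mutation rate are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def fitness(z, y, X):
    """Negative SSE of y regressed on an intercept, the included regressors, and candidate z."""
    Xa = np.column_stack([np.ones_like(y), X, z])
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    resid = y - Xa @ beta
    return -resid @ resid

def ga_recover_indicator(y, X, pop=60, gens=200, mut=0.02):
    n = len(y)
    population = rng.integers(0, 2, size=(pop, n))
    for _ in range(gens):
        scores = np.array([fitness(z, y, X) for z in population])
        parents = population[np.argsort(scores)][-pop // 2:]        # keep the fitter half
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)
            child = np.concatenate([a[:cut], b[cut:]])              # one-point crossover
            flip = rng.random(n) < mut                              # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        population = np.vstack([parents] + children)
    return population[np.argmax([fitness(z, y, X) for z in population])]

With the best candidate included as a regressor, the slope on the included variable can be re-estimated and compared with the omitted-variable fit to gauge the bias correction.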


Spatial-Temporal Data Mining Procedure: Lasr, Xiao-Feng Wang, Jiayang Sun, Kath Bogie Jan 2006

Xiaofeng Wang

This paper is concerned with the statistical development of our spatial-temporal data mining procedure, LASR (pronounced "laser"). LASR is the abbreviation for Longitudinal Analysis with Self-Registration of large-p, small-n data. It was motivated by a study of "Neuromuscular Electrical Stimulation" experiments, where the data are noisy and heterogeneous, might not align from one session to another, and involve a large number of multiple comparisons. The three main components of LASR are: (1) data segmentation for separating heterogeneous data and for distinguishing outliers, (2) automatic approaches for spatial and temporal data registration, and (3) statistical smoothing mapping for identifying "activated" regions based …


Non-Normal Path Analysis In The Presence Of Measurement Error And Missing Data: A Bayesian Analysis Of Nursing Homes' Structure And Outcomes, Byron J. Gajewski, Robert Lee, Sarah Thompson, Nancy Dunton, Annette Becker, Valorie Coffland Jan 2006

Byron J Gajewski

Path analytic models are useful tools in quantitative nursing research. They allow researchers to hypothesize causal inferential paths and test the significance of these paths both directly and indirectly through a mediating variable. A standard statistical method in the path analysis literature is to treat the variables as having a normal distribution and to estimate paths using several least squares regression equations. The parameters corresponding to the direct paths have point and interval estimates based on normal distribution theory. Indirect paths are a product of the direct path from the independent variable to the mediating variable and the direct path …
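
For the product-of-paths idea, here is a classical least-squares illustration: the indirect effect of X on Y through mediator M is the product of the X -> M and M -> Y (given X) coefficients, with a bootstrap percentile interval. The paper's actual model is Bayesian and handles non-normal indicators, measurement error, and missing data, none of which is reproduced in this sketch.

# Least-squares illustration of direct and indirect paths: indirect effect as
# the product of two regression coefficients, with a bootstrap interval.
import numpy as np
import statsmodels.api as sm

def indirect_effect(x, m, y):
    a = sm.OLS(m, sm.add_constant(x)).fit().params[1]                            # X -> M
    b = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit().params[2]      # M -> Y given X
    return a * b

def bootstrap_ci(x, m, y, reps=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = [indirect_effect(x[i], m[i], y[i])
             for i in (rng.integers(0, n, n) for _ in range(reps))]
    return np.percentile(draws, [2.5, 97.5])

# Simulated placeholder data with a true indirect effect of 0.5 * 0.4
rng = np.random.default_rng(1)
x = rng.normal(size=200)
m = 0.5 * x + rng.normal(size=200)
y = 0.4 * m + 0.2 * x + rng.normal(size=200)
print(indirect_effect(x, m, y), bootstrap_ci(x, m, y, reps=500))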


Inter-Rater Reliability Of Nursing Home Surveys: A Bayesian Latent Class Approach, Byron J. Gajewski, Sarah Thompson, Nancy Dunton, Annette Becker, Marcia Wrona Jan 2006

Byron J Gajewski

In the U.S., federal and state governments perform routine inspections of nursing homes. Results of the inspections allow government to generate fines for findings of non-compliance as well as allow consumers to rank facilities. The purpose of this study is to investigate the inter-rater reliability of the nursing home survey process. In general, the survey data involves 191 binary deficiency variables interpreted as 'deficient' or 'non-deficient'. To reduce the dimensionality of the problem, our proposed method involves two steps. First, we reduce the deficiency categories to sub-categories using previous nursing home studies. Second, looking at the State of Kansas specifically, …


On The Robustness Of Robustness Checks Of The Environmental Kuznets Curve, Marzio Galeotti, Matteo Manera, Alessandro Lanza Jan 2006

Matteo Manera

Since its first inception in the debate on the relationship between environment and growth in 1992, the Environmental Kuznets Curve has been subject of continuous and intense scrutiny. The literature can be roughly divided in two historical phases. Initially, after the seminal contributions, additional work aimed to extend the investigation to new pollutants and to verify the existence of an inverted-U shape as well as assessing the value of the turning point. The following phase focused instead on the robustness of the empirical relationship, particularly with respect to the omission of relevant explanatory variables other than GDP, alternative datasets, functional …


The Asymmetric Effects Of Oil Shocks On Output Growth: A Markov-Switching Analysis For The G-7 Countries, Alessandro Cologni, Matteo Manera Jan 2006

Matteo Manera

In this paper we specify and estimate different Markov-switching (MS) regime autoregressive models. The empirical performance of the univariate MS models used to describe the switches between different economic regimes for the G-7 countries is in general not satisfactory. We extend these models to verify if the inclusion of asymmetric oil shocks as an exogenous variable improves the ability of each specification to identify the different phases of the business cycle for each country under scrutiny. Following the wide literature on this topic, we have considered six different definitions of oil shocks: oil price changes, asymmetric transformations of oil price …
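
A hedged sketch of the model class: a two-regime Markov-switching AR(1) for output growth with an asymmetric oil-shock regressor (the simple "oil price increases only" transformation), fitted with statsmodels on simulated placeholder data rather than the G-7 series used in the paper; all settings are illustrative.

# Two-regime Markov-switching AR(1) with an asymmetric oil-shock regressor,
# on simulated placeholder data (not the paper's G-7 series).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
T = 240
oil_change = rng.normal(0.0, 0.1, T)
oil_increase = np.maximum(oil_change, 0.0)            # one common asymmetric transformation

regime = (np.arange(T) // 40) % 2                     # alternating blocks as stand-in recessions
growth = np.where(regime == 1, -0.5, 0.8) - 0.7 * oil_increase + rng.normal(0.0, 0.3, T)

mod = sm.tsa.MarkovAutoregression(growth, k_regimes=2, order=1,
                                  exog=oil_increase, switching_variance=True)
res = mod.fit()
print(res.summary())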