Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 21 of 21

Full-Text Articles in Physical Sciences and Mathematics

Machine Learning As A Tool For Early Detection: A Focus On Late-Stage Colorectal Cancer Across Socioeconomic Spectrums, Hadiza Galadima, Rexford Anson-Dwamena, Ashley Johnson, Ghalib Bello, Georges Adunlin, James Blando Jan 2024

Community & Environmental Health Faculty Publications

Purpose: To assess the efficacy of various machine learning (ML) algorithms in predicting late-stage colorectal cancer (CRC) diagnoses against the backdrop of socio-economic and regional healthcare disparities. Methods: An innovative theoretical framework was developed to integrate individual- and census tract-level social determinants of health (SDOH) with sociodemographic factors. A comparative analysis of the ML models was conducted using key performance metrics such as AUC-ROC to evaluate their predictive accuracy. Spatio-temporal analysis was used to identify disparities in late-stage CRC diagnosis probabilities. Results: Gradient boosting emerged as the superior model, with the top predictors for late-stage CRC diagnosis being anatomic site, …
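
The AUC-ROC metric named in this abstract can be illustrated with a short sketch. It is not the authors' evaluation code: the function name `auc_roc`, the labels, and the two score lists are hypothetical, chosen only to show how two models' scores might be compared. The sketch uses the pairwise definition of AUC, the probability that a randomly chosen positive case is scored above a randomly chosen negative case.

```python
def auc_roc(labels, scores):
    """Return AUC-ROC via pairwise comparison; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = late-stage diagnosis, 0 = early-stage diagnosis.
labels = [1, 0, 1, 0, 1]
scores_model_a = [0.9, 0.2, 0.7, 0.6, 0.4]  # made-up scores for one model
scores_model_b = [0.6, 0.5, 0.4, 0.7, 0.3]  # made-up scores for another model

auc_a = auc_roc(labels, scores_model_a)
auc_b = auc_roc(labels, scores_model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

The quadratic loop keeps the definition visible; for large datasets a rank-based formulation would usually be preferred.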


AweGNN: Auto-Parametrized Weighted Element-Specific Graph Neural Networks For Molecules, Timothy Szocinski, Duc Duy Nguyen, Guo-Wei Wei Jul 2021

Mathematics Faculty Publications

While automated feature extraction has had tremendous success in many deep learning algorithms for image analysis and natural language processing, it does not work well for data involving complex internal structures, such as molecules. Data representations via advanced mathematics, including algebraic topology, differential geometry, and graph theory, have demonstrated superiority in a variety of biomolecular applications; however, their performance is often dependent on manual parametrization. This work introduces the auto-parametrized weighted element-specific graph neural network, dubbed AweGNN, to overcome the obstacle of this tedious parametrization process while also being a suitable technique for automated feature extraction on these internally complex …


Incorporating Pathway Information Into Feature Selection Towards Better Performed Gene Signatures, Suyan Tian, Chi Wang, Bing Wang Apr 2019

Biostatistics Faculty Publications

To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart in which no biological knowledge is considered. In this article, pathway-based feature selection is first divided into three major categories, namely, pathway-level selection, …


Quantifying Climate Sensitivity And Climate-Driven Change In North American Amphibian Communities, David A. W. Miller, Evan H Campbell Grant, Erin Muths, Staci M. Amburgey, Michael J. Adams, Maxwell B. Joseph, J. Hardin Waddle, Pieter T. J. Johnson, Maureen E. Ryan, Benedikt R. Schmidt, Daniel L. Calhoun, Courtney L. Davis, Robert N. Fisher, David M. Green, Blake R. Hossack, Tracy A. G. Rittenhouse, Susan C. Walls, Larissa L. Bailey, Sam S. Cruickshank, Gary M. Fellers, Thomas A. Gorman, Carola A. Haas, Ward Hughson, David S. Pilliod, Steve J. Price, Andrew M. Ray, Walt Sadinski, Daniel Saenz, William J. Barichivich, Adrianne Brand Sep 2018

Forestry and Natural Resources Faculty Publications

Changing climate will impact species’ ranges only when environmental variability directly impacts the demography of local populations. However, measurement of demographic responses to climate change has largely been limited to single species and locations. Here we show that amphibian communities are responsive to climatic variability, using > 500,000 time-series observations for 81 species across 86 North American study areas. The effect of climate on local colonization and persistence probabilities varies among eco-regions and depends on local climate, species life-histories, and taxonomic classification. We found that local species richness is most sensitive to changes in water availability during breeding and changes in …


Auditing SNOMED CT Hierarchical Relations Based On Lexical Features Of Concepts In Non-Lattice Subgraphs, Licong Cui, Olivier Bodenreider, Jay Shi, Guo-Qiang Zhang Feb 2018

Computer Science Faculty Publications

Objective—We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations.

Methods—Our approach involves 3 stages. In stage 1, all non-lattice subgraphs of SNOMED CT’s IS-A hierarchical relations are extracted. In stage 2, lexical attributes of fully-specified concept names in such non-lattice subgraphs are extracted. For each concept in a non-lattice subgraph, we enrich its set of attributes with attributes from its ancestor …


Detecting And Accounting For Multiple Sources Of Positional Variance In Peak List Registration Analysis And Spin System Grouping, Andrey Smelter, Eric C. Rouchka, Hunter N. B. Moseley Aug 2017

Molecular and Cellular Biochemistry Faculty Publications

Peak lists derived from nuclear magnetic resonance (NMR) spectra are commonly used as input data for a variety of computer assisted and automated analyses. These include automated protein resonance assignment and protein structure calculation software tools. Prior to these analyses, peak lists must be aligned to each other and sets of related peaks must be grouped based on common chemical shift dimensions. Even when programs can perform peak grouping, they require the user to provide uniform match tolerances or use default values. However, peak grouping is further complicated by multiple sources of variance in peak position limiting the effectiveness of …
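
To make the idea of grouping peaks with a uniform match tolerance concrete, here is a minimal sketch. It illustrates the simple fixed-tolerance approach that the abstract describes as a limitation, not the method the authors propose. The function name `group_by_tolerance`, the peak values, and the tolerance are all hypothetical.

```python
def group_by_tolerance(peaks, dim, tol):
    """Group peaks whose values along dimension `dim` chain together within `tol`.

    Peaks are sorted along the chosen dimension; a new group starts whenever
    the gap to the previous peak exceeds the tolerance (single-linkage grouping).
    """
    ordered = sorted(peaks, key=lambda p: p[dim])
    groups = []
    for peak in ordered:
        if groups and abs(peak[dim] - groups[-1][-1][dim]) <= tol:
            groups[-1].append(peak)
        else:
            groups.append([peak])
    return groups


# Hypothetical 2D peaks as (1H ppm, 15N ppm) tuples.
peaks = [
    (8.20, 120.10),
    (8.21, 120.30),
    (7.95, 115.00),
    (8.22, 120.15),
    (7.96, 115.02),
]

# Group on the 15N dimension (index 1) with an assumed uniform tolerance of 0.25 ppm.
groups = group_by_tolerance(peaks, dim=1, tol=0.25)
for g in groups:
    print(g)
```

A single tolerance like this cannot adapt to peak-position variance that differs between spectra or dimensions, which is the complication the abstract points to.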


A Dynamic Run-Profile Energy-Aware Approach For Scheduling Computationally Intensive Bioinformatics Applications, Sachin Pawaskar, Hesham Ali Jul 2016

Computer Science Faculty Proceedings & Presentations

High Performance Computing (HPC) resources are housed in large datacenters, which consume exorbitant amounts of energy and are quickly demanding attention from businesses as they result in high operating costs. On the other hand, HPC environments have been very useful to researchers in many emerging areas in life sciences such as Bioinformatics and Medical Informatics. In an earlier work, we introduced a dynamic model for energy aware scheduling (EAS) in an HPC environment; the model is domain agnostic and incorporates both the deadline parameter and energy parameters for computationally intensive applications. Our proposed EAS model comprises two phases. In …


TRiP: Tracking Rhythms In Plants, An Automated Leaf Movement Analysis Program For Circadian Period Estimation, Kathleen Greenham, Ping Lou, Sara E. Remsen, Hany Farid, C Robertson McClung May 2015

Dartmouth Scholarship

Background: A well characterized output of the circadian clock in plants is the daily rhythmic movement of leaves. This process has been used extensively in Arabidopsis to estimate circadian period in natural accessions as well as mutants with known defects in circadian clock function. Current methods for estimating circadian period by leaf movement involve manual steps throughout the analysis and are often limited to analyzing one leaf or cotyledon at a time.

Methods: In this study, we describe the development of TRiP (Tracking Rhythms in Plants), a new method for estimating circadian period using a motion estimation algorithm that can …


Spectral Gene Set Enrichment (SGSE), H Robert Frost, Zhigang Li, Jason H. Moore Mar 2015

Dartmouth Scholarship

Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes …


Extracting City Traffic Events From Social Streams, Pramod Anantharam, Payam Barnaghi, Krishnaprasad Thirunarayan, Amit P. Sheth Jan 2015

Kno.e.sis Publications

Cities are composed of complex systems with physical, cyber, and social components. Current works on extracting and understanding city events mainly rely on technology enabled infrastructure to observe and record events. In this work, we propose an approach to leverage citizen observations of various city systems and services such as traffic, public transport, water supply, weather, sewage, and public safety as a source of city events. We investigate the feasibility of using such textual streams for extracting city events from annotated text. We formalize the problem of annotating social streams such as microblogs as a sequence labeling problem. We present …
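
Framing event extraction as sequence labeling means assigning one tag per token and then reading events off the tag sequence. The sketch below assumes the common BIO tagging scheme and hypothetical tag names (`EVENT`, `LOC`); the abstract does not state which scheme or label set the authors use, and the example sentence is made up.

```python
def spans_from_bio(tokens, tags):
    """Collect (label, text) spans from parallel token and BIO-tag lists."""
    spans = []
    current, label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new span, closing any open one first.
            if current:
                spans.append((label, " ".join(current)))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # An I- tag continues the open span only if the labels match.
            current.append(token)
        else:
            # "O" or an inconsistent I- tag closes any open span.
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans


# Hypothetical microblog text with hand-assigned tags.
tokens = ["Accident", "on", "Main", "Street", "near", "City", "Hall"]
tags = ["B-EVENT", "O", "B-LOC", "I-LOC", "O", "B-LOC", "I-LOC"]

print(spans_from_bio(tokens, tags))
```

In a full pipeline the tags would come from a trained sequence model rather than being written by hand as they are here.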


Methane Retrievals From Greenhouse Gases Observing Satellite (GOSAT) Shortwave Infrared Measurements: Performance Comparison Of Proxy And Physics Retrieval Algorithms, D Schepers, S Guerlet, A Butz, J Landgraf, C Frankenberg, O Hasekamp, J-F Blavier, N M. Deutscher, D W. T Griffith, F Hase, E Kyro, I Morino, V Sherlock, R Sussmann, I Aben Jan 2012

Faculty of Science - Papers (Archive)

We compare two conceptually different methods for determining methane column-averaged mixing ratios (XCH4) from Greenhouse Gases Observing Satellite (GOSAT) shortwave infrared (SWIR) measurements. These methods account differently for light scattering by aerosol and cirrus. The proxy method retrieves a CO2 column which, in conjunction with prior knowledge on CO2, acts as a proxy for scattering effects. The physics-based method accounts for scattering by retrieving three effective parameters of a scattering layer. Both retrievals are validated on a 19-month data set using ground-based XCH4 measurements at 12 stations of the Total Carbon Column Observing Network (TCCON), …


Computing Inconsistency Measure Based On Paraconsistent Semantics, Pascal Hitzler, Yue Ma, Guilin Qi Dec 2011

Computer Science and Engineering Faculty Publications

Measuring inconsistency in knowledge bases has been recognized as an important problem in several research areas. Many methods have been proposed to solve this problem and a main class of them is based on some kind of paraconsistent semantics. However, existing methods suffer from two limitations: (i) they are mostly restricted to propositional knowledge bases; (ii) very few of them discuss computational aspects of computing inconsistency measures. In this article, we try to solve these two limitations by exploring algorithms for computing an inconsistency measure of first-order knowledge bases. After introducing a four-valued semantics for first-order logic, we define an …


Parallel Progressive Multiple Sequence Alignment On Reconfigurable Meshes, Ken Nguyen, Yi Pan, Ge Nong Jan 2011

Computer Science Faculty Publications

Background: One of the most fundamental and challenging tasks in bioinformatics is to identify related sequences and their hidden biological significance. The most popular and proven best-practice method to accomplish this task is aligning multiple sequences together. However, multiple sequence alignment is a computationally intensive task. In addition, the advancement in DNA/RNA and protein sequencing techniques has created a vast number of sequences to be analyzed, exceeding the capability of traditional computing models. Therefore, an effective parallel multiple sequence alignment model capable of resolving these issues is in great demand.

Results: We design O(1) run-time solutions …


A Comparison Of The Functional Modules Identified From Time Course And Static PPI Network Data, Xiwei Tang, Jianxin Wang, Binbin Liu, Min Li, Gang Chen, Yi Pan Jan 2011

Computer Science Faculty Publications

Background: Cellular systems are highly dynamic and responsive to cues from the environment. Cellular function and response patterns to external stimuli are regulated by biological networks. A protein-protein interaction (PPI) network with static connectivity is dynamic in the sense that the nodes implement so-called functional activities that evolve in time. The shift from static to dynamic network analysis is essential for further understanding of molecular systems.

Results: In this paper, Time Course Protein Interaction Networks (TC-PINs) are reconstructed by incorporating time series gene expression into PPI networks. Then, a clustering algorithm is used to create functional modules from three …


Partitioning Of Minimotifs Based On Function With Improved Prediction Accuracy, Sanguthevar Rajasekaran, Tian Mi, Jerlin Camilus Merlin, Aaron Oommen, Patrick R. Gradie, Martin R. Schiller Apr 2010

Life Sciences Faculty Research

Background

Minimotifs are short contiguous peptide sequences in proteins that are known to have a function in at least one other protein. One of the principal limitations in minimotif prediction is that false positives limit the usefulness of this approach. As a step toward resolving this problem we have built, implemented, and tested a new data-driven algorithm that reduces false-positive predictions.

Methodology/Principal Findings

Certain domains and minimotifs are known to be strongly associated with a known cellular process or molecular function. Therefore, we hypothesized that by restricting minimotif predictions to those where the minimotif containing protein and target protein have …


Identifying Protein Complexes From Interaction Networks Based On Clique Percolation And Distance Restriction, Jianxin Wang, Binbin Liu, Min Li, Yi Pan Jan 2010

Computer Science Faculty Publications

Background: Identification of protein complexes in large interaction networks is crucial for understanding principles of cellular organization and predicting protein functions, which is one of the most important issues in the post-genomic era. In real protein-protein interaction networks, each protein may belong to multiple protein complexes. Identifying overlapping protein complexes from protein-protein interaction networks is an important research topic.

Result: As an effective algorithm for identifying overlapping module structures, the clique percolation method (CPM) has a wide range of applications in social and biological networks. However, the recognition accuracy of CPM is low. Furthermore, CPM is unfit to …
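
As background for the clique percolation method (CPM) discussed above, the following sketch shows the basic, unmodified technique on a toy graph. It does not include the distance restriction or any other refinement the authors propose. The function name `clique_percolation` and the example edges are hypothetical. In basic CPM, two k-cliques are adjacent when they share k-1 nodes, and a community is the union of a connected set of adjacent k-cliques, so a node can belong to more than one community.

```python
from itertools import combinations


def clique_percolation(edges, k):
    """Return overlapping communities as sorted node lists (brute force, small graphs only)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every k-node subset and keep those that are fully connected.
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: merge cliques that share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(sorted(c) for c in communities.values())


# Toy interaction network: protein D takes part in two overlapping modules.
edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("B", "D"), ("C", "D"),
    ("D", "E"), ("D", "F"), ("E", "F"),
]
print(clique_percolation(edges, k=3))
```

Enumerating all k-subsets is exponential in general and is used here only to keep the sketch short; practical implementations start from maximal cliques.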


Bounded Search For De Novo Identification Of Degenerate Cis-Regulatory Elements, Jonathan M. Carlson, Arijit Chakravarty, Radhika S. Khetani, Robert H. Gross May 2006

Dartmouth Scholarship

The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-counting strategy for their identification. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that the independence assumptions inherent in the model, as well as the inability to reach a global optimum, limit this approach.


GPNN: Power Studies And Applications Of A Neural Network Method For Detecting Gene-Gene Interactions In Studies Of Human Disease, Alison A. Motsinger, Stephen L. Lee, George Mellick, Marylyn D. Ritchie Jan 2006

Dartmouth Scholarship

The identification and characterization of genes that influence the risk of common, complex multifactorial disease primarily through interactions with other genes and environmental factors remains a statistical and computational challenge in genetic epidemiology. We have previously introduced a genetic programming optimized neural network (GPNN) as a method for optimizing the architecture of a neural network to improve the identification of gene combinations associated with disease risk. The goal of this study was to evaluate the power of GPNN for identifying high-order gene-gene interactions. We were also interested in applying GPNN to a real data analysis in Parkinson's disease.


A Novel Approach To Phylogenetic Tree Construction Using Stochastic Optimization And Clustering, Ling Qin, Yixin Chen, Yi Pan, Ling Chen Jan 2006

Computer Science Faculty Publications

Background: The problem of inferring the evolutionary history and constructing the phylogenetic tree with high performance has become one of the major problems in computational biology.

Results: A new phylogenetic tree construction method from a given set of objects (proteins, species, etc.) is presented. As an extension of ant colony optimization, this method proposes an adaptive phylogenetic clustering algorithm based on a digraph to find a tree structure that defines the ancestral relationships among the given objects.

Conclusion: Our phylogenetic tree construction method is tested by comparing its results with those of the genetic algorithm (GA). Experimental results show that …


Principal Component Analysis For Predicting Transcription-Factor Binding Motifs From Array-Derived Data, Yunlong Liu, Matthew P Vincenti, Hiroki Yokota Nov 2005

Dartmouth Scholarship

The responses to interleukin 1 (IL-1) in human chondrocytes constitute a complex regulatory mechanism, where multiple transcription factors interact combinatorially with transcription-factor binding motifs (TFBMs). In order to select a critical set of TFBMs from genomic DNA information and array-derived data, an efficient algorithm to solve a combinatorial optimization problem is required. Although computational approaches based on evolutionary algorithms are commonly employed, an analytical algorithm would be useful to predict TFBMs at nearly no computational cost and evaluate varying modelling conditions. Singular value decomposition (SVD) is a powerful method to derive primary components of a given matrix. Applying SVD …
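
The role of SVD in extracting a primary component can be illustrated with a small sketch. It is not the authors' procedure: it uses plain power iteration (a swapped-in, dependency-free technique) to approximate only the leading singular triplet, and the function name `leading_singular_triplet` and the 2x2 matrix are hypothetical.

```python
import math


def leading_singular_triplet(a, iterations=100):
    """Approximate the largest singular value of `a` and its singular vectors.

    Power iteration is applied to A^T A. The start vector is the first unit
    vector, which is assumed not to be orthogonal to the leading direction.
    """
    rows, cols = len(a), len(a[0])
    v = [1.0] + [0.0] * (cols - 1)
    for _ in range(iterations):
        av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(a[i][j] * av[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return u, sigma, v


# Made-up matrix whose singular values are 4 and 2.
matrix = [[3.0, 1.0],
          [1.0, 3.0]]
u, sigma, v = leading_singular_triplet(matrix)
print(sigma, v)
```

In practice a library routine that returns the full decomposition would normally be used; the loop above only shows what the leading component represents.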


A Subgroup Algorithm To Identify Cross-Rotation Peaks Consistent With Non-Crystallographic Symmetry, Ryan H. Lilien, Chris Bailey-Kellogg, Amy C. Anderson, Bruce R. Donald Mar 2004

Dartmouth Scholarship

Molecular replacement (MR) often plays a prominent role in determining initial phase angles for structure determination by X-ray crystallography. In this paper, an efficient quaternion-based algorithm is presented for analyzing peaks from a cross-rotation function in order to identify model orientations consistent with proper non-crystallographic symmetry (NCS) and to generate proper NCS-consistent orientations missing from the list of cross-rotation peaks. The algorithm, CRANS, analyzes the rotation differences between each pair of cross-rotation peaks to identify finite subgroups. Sets of rotation differences satisfying the subgroup axioms correspond to orientations compatible with the correct proper NCS. The CRANS algorithm was first …
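
The rotation difference between a pair of orientations, the quantity the abstract says is analyzed, can be computed with quaternions as sketched below. This is not the CRANS implementation: the function names and the two example orientations are hypothetical, and the sketch covers only the pairwise difference, not the subgroup search.

```python
import math


def quat_from_axis_angle(axis, degrees):
    """Unit quaternion (w, x, y, z) for a rotation of `degrees` about `axis`."""
    half = math.radians(degrees) / 2.0
    length = math.sqrt(sum(c * c for c in axis))
    s = math.sin(half) / length
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def quat_mul(a, b):
    """Hamilton product of two quaternions."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotation_difference_angle(q1, q2):
    """Angle in degrees of the rotation taking orientation q1 to orientation q2."""
    w = quat_mul(q2, quat_conj(q1))[0]
    # q and -q encode the same rotation, so use |w|; clamp against rounding.
    w = min(1.0, abs(w))
    return math.degrees(2.0 * math.acos(w))


# Two hypothetical peak orientations about the z axis, 120 degrees apart.
peak_1 = quat_from_axis_angle((0.0, 0.0, 1.0), 30.0)
peak_2 = quat_from_axis_angle((0.0, 0.0, 1.0), 150.0)
print(rotation_difference_angle(peak_1, peak_2))
```

A difference of about 120 degrees about a common axis is what a threefold rotation would produce, which is the kind of relationship a finite-subgroup test would look for.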