Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 13 of 13

Full-Text Articles in Physical Sciences and Mathematics

Counting And Sampling Small Structures In Graph And Hypergraph Data Streams, Themistoklis Haris Jun 2021

Counting And Sampling Small Structures In Graph And Hypergraph Data Streams, Themistoklis Haris

Dartmouth College Undergraduate Theses

In this thesis, we explore the problem of approximating the number of elementary substructures called simplices in large k-uniform hypergraphs. The hypergraphs are assumed to be too large to be stored in memory, so we adopt a data stream model, where the hypergraph is defined by a sequence of hyperedges.

First we propose an algorithm that (ε, δ)-estimates the number of simplices using O(m1+1/k / T) bits of space. In addition, we prove that no constant-pass streaming algorithm can (ε, δ)- approximate the number of simplices using less than O( m 1+1/k / T ) bits of space. Thus …


The Accuracy, Fairness, And Limits Of Predicting Recidivism, Julie Dressel, Hany Farid Jan 2018

The Accuracy, Fairness, And Limits Of Predicting Recidivism, Julie Dressel, Hany Farid

Dartmouth Scholarship

Algorithms for predicting recidivism are commonly used to assess a criminal defendant’s likelihood of committing a crime. These predictions are used in pretrial, parole, and sentencing decisions. Proponents of these systems argue that big data and advanced machine learning make these analyses more accurate and less biased than humans. We show, however, that the widely used commercial risk assessment software COMPAS is no more accurate or fair than predictions made by people with little or no criminal justice expertise. We further show that a simple linear predictor provided with only two features is nearly equivalent to COMPAS with its 137 …


Development And Validation Of An Epitope Prediction Tool For Swine (Pigmatrix) Based On The Pocket Profile Method, Andres H. Gutiérrez, William D. Martin, Chris Bailey-Kellogg, Frances Terry, Leonard Moise, Anee S. De Groot Sep 2015

Development And Validation Of An Epitope Prediction Tool For Swine (Pigmatrix) Based On The Pocket Profile Method, Andres H. Gutiérrez, William D. Martin, Chris Bailey-Kellogg, Frances Terry, Leonard Moise, Anee S. De Groot

Dartmouth Scholarship

Background: T cell epitope prediction tools and associated vaccine design algorithms have accelerated the development of vaccines for humans. Predictive tools for swine and other food animals are not as well developed, primarily because the data required to develop the tools are lacking. Here, we overcome a lack of T cell epitope data to construct swine epitope predictors by systematically leveraging available human information. Applying the “pocket profile method ”, we use sequence and structural similarities in the binding pockets of human and swine major histocompatibility complex proteins to infer Swine Leukocyte Antigen (SLA) peptide binding preferences. We developed epitope-prediction …


Trip: Tracking Rhythms In Plants, An Automated Leaf Movement Analysis Program For Circadian Period Estimation, Kathleen Greenham, Ping Lou, Sara E. Remsen, Hany Farid, C Robertson Mcclung May 2015

Trip: Tracking Rhythms In Plants, An Automated Leaf Movement Analysis Program For Circadian Period Estimation, Kathleen Greenham, Ping Lou, Sara E. Remsen, Hany Farid, C Robertson Mcclung

Dartmouth Scholarship

Background: A well characterized output of the circadian clock in plants is the daily rhythmic movement of leaves. This process has been used extensively in Arabidopsis to estimate circadian period in natural accessions as well as mutants with known defects in circadian clock function. Current methods for estimating circadian period by leaf movement involve manual steps throughout the analysis and are often limited to analyzing one leaf or cotyledon at a time.

Methods: In this study, we describe the development of TRiP (Tracking Rhythms in Plants), a new method for estimating circadian period using a motion estimation algorithm that can …


Spectral Gene Set Enrichment (Sgse), H Robert Frost, Zhigang Li, Jason H. Moore Mar 2015

Spectral Gene Set Enrichment (Sgse), H Robert Frost, Zhigang Li, Jason H. Moore

Dartmouth Scholarship

Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes …


Orienteering In Knowledge Spaces: The Hyperbolic Geometry Of Wikipedia Mathematics, Gregory Leibon, Daniel N. Rockmore Jul 2013

Orienteering In Knowledge Spaces: The Hyperbolic Geometry Of Wikipedia Mathematics, Gregory Leibon, Daniel N. Rockmore

Dartmouth Scholarship

In this paper we show how the coupling of the notion of a network with directions with the adaptation of the four-point probe from materials testing gives rise to a natural geometry on such networks. This four-point probe geometry shares many of the properties of hyperbolic geometry wherein the network directions take the place of the sphere at infinity, enabling a navigation of the network in terms of pairs of directions: the geodesic through a pair of points is oriented from one direction to another direction, the pair of which are uniquely determined. We illustrate this in the interesting example …


Measures Of Centrality Based On The Spectrum Of The Laplacian, Scott D. Pauls, Daniel Remondini Dec 2012

Measures Of Centrality Based On The Spectrum Of The Laplacian, Scott D. Pauls, Daniel Remondini

Dartmouth Scholarship

We introduce a family of new centralities, the k-spectral centralities. k-Spectral centrality is a measurement of importance with respect to the deformation of the graph Laplacian associated with the graph. Due to this connection, k-spectral centralities have various interpretations in terms of spectrally determined information.

We explore this centrality in the context of several examples. While for sparse unweighted net- works 1-spectral centrality behaves similarly to other standard centralities, for dense weighted net- works they show different properties. In summary, the k-spectral centralities provide a novel and useful measurement of relevance (for single network elements as well as whole subnetworks) …


Information-Preserving Structures: A General Framework For Quantum Zero-Error Information, Robin Blume-Kohout, Hui Khoon Ng, David Poulin, Lorenza Viola Dec 2010

Information-Preserving Structures: A General Framework For Quantum Zero-Error Information, Robin Blume-Kohout, Hui Khoon Ng, David Poulin, Lorenza Viola

Dartmouth Scholarship

Quantum systems carry information. Quantum theory supports at least two distinct kinds of information (classical and quantum), and a variety of different ways to encode and preserve information in physical systems. A system’s ability to carry information is constrained and defined by the noise in its dynamics. This paper introduces an operational framework, using information-preserving structures, to classify all the kinds of information that can be perfectly (i.e., with zero error) preserved by quantum dynamics. We prove that every perfectly preserved code has the same structure as a matrix algebra, and that preserved information can always be corrected. We …


Bounded Search For De Novo Identification Of Degenerate Cis-Regulatory Elements, Jonathan M. Carlson, Arijit Chakravarty, Radhika S. Khetani, Robert H. Gross May 2006

Bounded Search For De Novo Identification Of Degenerate Cis-Regulatory Elements, Jonathan M. Carlson, Arijit Chakravarty, Radhika S. Khetani, Robert H. Gross

Dartmouth Scholarship

The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-counting strategy for their identification. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that the independence assumptions inherent in the model, as well as the inability to reach a global optimum, limit this approach.


Gpnn: Power Studies And Applications Of A Neural Network Method For Detecting Gene-Gene Interactions In Studies Of Human Disease, Alison A. Motsinger, Stephen L. Lee, George Mellick, Marylyn D. Ritchie Jan 2006

Gpnn: Power Studies And Applications Of A Neural Network Method For Detecting Gene-Gene Interactions In Studies Of Human Disease, Alison A. Motsinger, Stephen L. Lee, George Mellick, Marylyn D. Ritchie

Dartmouth Scholarship

The identification and characterization of genes that influence the risk of common, complex multifactorial disease primarily through interactions with other genes and environmental factors remains a statistical and computational challenge in genetic epidemiology. We have previously introduced a genetic programming optimized neural network (GPNN) as a method for optimizing the architecture of a neural network to improve the identification of gene combinations associated with disease risk. The goal of this study was to evaluate the power of GPNN for identifying high-order gene-gene interactions. We were also interested in applying GPNN to a real data analysis in Parkinson's disease.


Principal Component Analysis For Predicting Transcription-Factor Binding Motifs From Array-Derived Data, Yunlong Liu, Matthew P Vincenti, Hiroki Yokota Nov 2005

Principal Component Analysis For Predicting Transcription-Factor Binding Motifs From Array-Derived Data, Yunlong Liu, Matthew P Vincenti, Hiroki Yokota

Dartmouth Scholarship

The responses to interleukin 1 (IL-1) in human chondrocytes constitute a complex regulatory mechanism, where multiple transcription factors interact combinatorially to transcription-factor binding motifs (TFBMs). In order to select a critical set of TFBMs from genomic DNA information and an array-derived data, an efficient algorithm to solve a combinatorial optimization problem is required. Although computational approaches based on evolutionary algorithms are commonly employed, an analytical algorithm would be useful to predict TFBMs at nearly no computational cost and evaluate varying modelling conditions. Singular value decomposition (SVD) is a powerful method to derive primary components of a given matrix. Applying SVD …


A Subgroup Algorithm To Identify Cross-Rotation Peaks Consistent With Non-Crystallographic Symmetry, Ryan H. Lilien, Chris Bailey-Kellogg, Amy C. Anderson, Bruce R. Donald Mar 2004

A Subgroup Algorithm To Identify Cross-Rotation Peaks Consistent With Non-Crystallographic Symmetry, Ryan H. Lilien, Chris Bailey-Kellogg, Amy C. Anderson, Bruce R. Donald

Dartmouth Scholarship

Molecular replacement (MR) often plays a prominent role in determining initial phase angles for structure determination by X-ray crystallography. In this paper, an efficient quaternion-based algorithm is presented for analyzing peaks from a cross-rotation function in order to identify model orientations consistent with proper non-crystallographic symmetry (NCS) and to generate proper NCS-consistent orientations missing from the list of cross-rotation peaks. The algorithm, CRANS, analyzes the rotation differences between each pair of cross-rotation peaks to identify finite subgroups. Sets of rotation differences satisfying the subgroup axioms correspond to orientations compatible with the correct proper NCS. The CRANS algorithm was first …


Low-Degree Spanning Trees Of Small Weight, Samir Khuller, Balaji Raghavachari, Neal Young Apr 1996

Low-Degree Spanning Trees Of Small Weight, Samir Khuller, Balaji Raghavachari, Neal Young

Dartmouth Scholarship

Given n points in the plane, the degree-K spanning-tree problem asks for a spanning tree of minimum weight in which the degree of each vertex is at most K. This paper addresses the problem of computing low-weight degree-K spanning trees for $K > 2$. It is shown that for an arbitrary collection of n points in the plane, there exists a spanning tree of degree 3 whose weight is at most 1.5 times the weight of a minimum spanning tree. It is shown that there exists a spanning tree of degree 4 whose weight is at most 1.25 times …