Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Articles 1 - 27 of 27

Full-Text Articles in Physical Sciences and Mathematics

Temporally Consistent Urban-Rural Delineations For Global Urban Heat Island Monitoring, Tc Chakraborty Dec 2019


Yale Day of Data

Urbanization leads to local-scale modification of climate, particularly the urban heat island (UHI) effect: higher temperatures in cities than in their surroundings. The UHI effect is generally quantified by measuring the temperature differential between the city and its surrounding rural reference. The delineation of both the city and the rural reference rests on assumptions, which may affect, among other things, temporal variability in UHI intensity. To reduce these uncertainties, I create a global dataset of urban-rural delineations that can be used to better constrain the temporal trends in UHI intensity throughout the globe using the European Space Agency's …
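The temperature differential described above can be illustrated with a minimal sketch. The function name `uhi_intensity` and the sample temperatures are hypothetical and are not taken from the dataset; the sketch only assumes that UHI intensity is the mean urban temperature minus the mean temperature of the rural reference.

```python
from statistics import mean


def uhi_intensity(urban_temps, rural_temps):
    """Return mean urban temperature minus mean rural reference temperature."""
    if not urban_temps or not rural_temps:
        raise ValueError("both urban and rural samples are required")
    return mean(urban_temps) - mean(rural_temps)


# Hypothetical surface temperatures in degrees Celsius (illustrative only).
urban = [31.0, 32.0, 33.0]
rural = [28.0, 29.0, 30.0]
print(uhi_intensity(urban, rural))
```

Changing which pixels count as "urban" or "rural" changes both averages, which is why the choice of delineation matters for the resulting intensity.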


Generating Contextual Text Embeddings For Emergency Department Chief Complaints Using Bert, David Chang Dec 2019


Yale Day of Data

We applied BERT, a state-of-the-art natural language processing model, to chief complaint data from the Yale Emergency Department to map free-text notes to structured chief complaint categories.


Saving Software And Using Emulation To Reproduce Computationally Dependent Research Results, Euan Cochrane, Limor Peer, Ethan Gates, Seth Anderson Dec 2019


Yale Day of Data

Using digital data necessarily involves software. How do institutions think about software in the context of the long-term usability of their data assets? How do they address usability challenges uniquely posed by software, such as license restrictions, legacy software, code rot, and dependencies? These questions are germane to the agenda set forth by the FAIR principles. At Yale University, a team in the Library is looking into the application of a novel approach to emulation as a potential solution. In this presentation, we will outline the work of the Emulation as a Service Infrastructure (EaaSI) program, discuss our plans for …


A Global Database Of Surface Urban Heat Island Intensity, Tc Chakraborty, Xuhui Lee Jan 2019


Yale Day of Data

The urban heat island (UHI) effect, the phenomenon of higher temperatures in urban environments, is one of the best-known consequences of urbanization for local climate. We develop the simplified urban-extent (SUE) algorithm, a new algorithm to estimate UHI intensity at a global scale. This algorithm is implemented on the Google Earth Engine platform and uses satellite-derived images to calculate the surface UHI intensity for over 9500 urban clusters covering 15 years, making this the most comprehensive global UHI database. The data are validated against previous multi-city studies and then used to estimate the …


Gene Co-Expression Networks Analysis Reveal Novel Molecular Endotypes In Alpha-1 Antitrypsin Deficiency, Jen-Hwa Chu, Wenlan Zang Jan 2019


Yale Day of Data

Rationale: Alpha-1 antitrypsin deficiency (AATD) is a genetic condition that predisposes to early onset pulmonary emphysema and airways obstruction. The exact mechanism through which AATD leads to lung disease is incompletely understood.

Objectives: To investigate the effect of AAT genotype and augmentation therapy on the bronchoalveolar lavage (BAL) and peripheral blood mononuclear cell (PBMC) transcriptomes, while examining the link between gene expression profiles and clinical features of AATD.

Methods: We performed RNA-Seq on RNA extracted from BAL and PBMC on samples obtained from 89 AATD patients enrolled in the Genomic Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) study. Differential …


Topovar90m: Global High-Resolution Topographic Variables For Environmental Modeling, Giuseppe Amatulli Dr. Jan 2019


Yale Day of Data

Topographical relief involves the vertical and horizontal variation of the Earth's terrain, and it drives processes in hydrology, climatology, geography and ecology. Its assessment and characterization are fundamental for various types of modeling and simulation analysis. In this regard, the Multi-Error-Removed Improved Terrain (MERIT) Digital Elevation Model (DEM) currently provides the best high-resolution DEM globally available, at a 3 arc-second resolution (90m), due to the removal of multiple error components from the underlying SRTM3 and AW3D30 DEMs. To depict topographical variations worldwide, we developed a new dataset comprising different terrain features derived from the MERIT-DEM. The fully standardized topographical variables …


Non-Invasive Analysis Of The Sputum Transcriptome Discriminates Clinical Phenotypes Of Asthma, Xiting Yan Jan 2019


Yale Day of Data

Whole-transcriptome gene expression profiles in the sputum and circulation from 100 asthma patients were measured using the Affymetrix HuGene 1.0ST arrays. Unsupervised clustering analysis based on pathways from KEGG was used to identify TEA clusters of patients from the sputum gene expression profiles. The identified TEA clusters have significantly different pre-bronchodilator FEV1, bronchodilator responsiveness, exhaled nitric oxide levels, history of hospitalization for asthma and history of intubation. Evaluation of TEA clusters in children from the Asthma BRIDGE cohort confirmed the identified differences in intubation and hospitalization. Furthermore, evaluation of the TH2 gene signatures suggested a much lower prevalence of …


A Novel Pathway-Based Distance Score Enhances Assessment Of Disease Heterogeneity In Gene Expression, Yunqing Liu, Xiting Yan Jan 2019


Yale Day of Data

Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biologic samples. However, high noise levels in gene expression data and the relatively high correlation between genes are often encountered, so traditional distances such as Euclidean distance may not be effective at discriminating the biological differences between samples. In this study, we developed a novel computational method to assess the biological differences based on pathways by assuming that ontologically defined biological pathways in biologically similar samples have similar behavior. Application of this distance score results in more accurate, robust, and biologically meaningful clustering results in both …


Analyzing Neuronal Dendritic Trees With Convolutional Neural Networks, Olivier Trottier, Jonathon Howard Jan 2019


Yale Day of Data

In the biological sciences, image analysis software is used to detect, segment or classify a variety of features encountered in living matter. However, the algorithms that accomplish these tasks are often designed for a specific dataset, which makes them difficult to port to the same tasks on images of different biological structures. Recently, convolutional neural networks have been used to perform complex image analysis on a multitude of datasets. While applications of these networks abound in the technology industry and computer science, use cases are not as common in the academic sciences. Motivated by the generalizability of neural networks, we aim …


Yale’s Environmental Performance Index: The Construction And Use Of A Composite Index For Global Sustainability, Zach Wendling Jan 2017


Yale Day of Data

No abstract provided.


Closing The Water Budget In An Experimental Urban Watershed: A Comparative Assessment Of Methods For Measuring Evapotranspiration, Leana M. Weissberg, Gaboury Benoit Jan 2017


Yale Day of Data

No abstract provided.


Extracting Geography From Datasets In Social Sciences, Yuke Li, Tianhao Wu, Nicholas Marshall, Stefan Steinerberger Jan 2017


Yale Day of Data

No abstract provided.


Safer Chemicals Design Diagrams, Longzhu Shen, Fjodor Melnikov, John Roethle, Aditya Gudibanda, Richard Judson, Julie Zimmerman, Paul Anastas Jan 2017


Yale Day of Data

The NRF2-ARE antioxidant pathway is an important biological sensing and regulating system that responds to chemical insults. At minute levels of insult, it protects living organisms under harsh environmental conditions. However, when the external disruption exceeds the inherent resilience, cellular damage can occur, eventually leading to cytotoxicity. Therefore, studying the likelihood of a chemical activating the NRF2-ARE pathway is of interest for discovering therapeutic agents and designing safer chemicals. In this research, we used a combination of computational chemistry, statistical learning and mechanistic toxicology to estimate the likelihood that a chemical perturbs this critical toxicological pathway and derive a …


Data Collection And Analysis At The ATLAS Detector, Savannah Thais Jan 2017


Yale Day of Data

No abstract provided.


A Nonlinear Filter For Markov Chains And Its Effect On Diffusion Maps, Stefan Steinerberger Sep 2015


Yale Day of Data

Diffusion maps are a modern mathematical tool that helps to find structure in large data sets. We present a new filtering technique, based on the assumption that errors in the data are intrinsically random, that isolates and filters errors and thus boosts the efficiency of diffusion maps. Applications include data sets from medicine (the Cleveland Heart Disease Data set and the Wisconsin Breast Cancer Data set) and engineering (the Ionosphere data set).


Crowdsourcing Global Wastewater Data, Don Mosteller, Sam Cohen, Cory Nestor, Angel Hsu, Omar Malik Sep 2015


Yale Day of Data

No time to waste: Crowdsourcing global wastewater treatment data

Worldwide, over 80 percent of wastewater is discharged into water bodies without undergoing treatment, severely impairing human well-being and ecosystem vitality along the way. National performance on wastewater treatment is difficult to quantify and is poorly understood due to a lack of common definitions, poor data collection standards, and limited historical data. To address this, the Yale Environmental Performance Index (EPI), a research group that produces a biennial ranking of country-level environmental performance, developed a first-of-its-kind national wastewater treatment indicator.[1]

The indicator assesses wastewater treatment performance for 183 countries, …


A Machine Learning Approach To Post-Market Surveillance Of Medical Devices, Jonathan Bates, Shu-Xia Li, Craig Parzynski, Ronald Coifman, Harlan Krumholz, Joseph Ross Sep 2015


Yale Day of Data

Post-market surveillance is a collection of processes and activities used by product manufacturers and regulators, such as the U.S. Food and Drug Administration (FDA), to monitor the safety and effectiveness of medical devices once they are available for use “on the market”. These activities are designed to generate information to identify poorly performing devices and other safety problems, accurately characterize real-world device performance and clinical outcomes, and facilitate the development of new devices, or new uses for existing devices. Typically, a device is monitored by comparing adverse events in the exposed population to a matched unexposed population. This research considers …


K-Mer Analysis On Developmental And Housekeeping Enhancer Peaks, Yunsi Yang, Anurag Sethi, Mark Gerstein Sep 2015


Yale Day of Data

The regulation of gene expression involves interaction between transcriptional enhancers and core promoters. However, the separation between developmental and housekeeping gene regulation remains unknown. Here, we present a method to detect whether different core promoters exhibit specificity to certain enhancers within massively parallel assays for enhancer detection. We use k-mers of various lengths (3-8 bp) as sequence features and compare k-mer frequencies between developmental and housekeeping enhancers. This method shows promoter specificity of enhancers in D. melanogaster.
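The comparison of k-mer frequencies between two enhancer groups can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function names `kmer_frequencies` and `frequency_differences` and the toy sequences are hypothetical, and the sketch assumes plain relative frequencies with no statistical testing or reverse-complement handling.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Relative frequency of each k-mer across a list of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k over the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def frequency_differences(group_a, group_b, k):
    """Per-k-mer frequency in group_a minus frequency in group_b."""
    fa = kmer_frequencies(group_a, k)
    fb = kmer_frequencies(group_b, k)
    return {
        kmer: fa.get(kmer, 0.0) - fb.get(kmer, 0.0)
        for kmer in set(fa) | set(fb)
    }


# Toy sequences standing in for two enhancer classes (illustrative only).
developmental = ["ACGTACGT", "TTACGTAA"]
housekeeping = ["GGGCGGGC", "CCGGGCGG"]
for k in range(3, 9):  # k-mer lengths 3 to 8, as in the abstract
    diffs = frequency_differences(developmental, housekeeping, k)
    top = max(diffs, key=lambda kmer: abs(diffs[kmer]))
    print(k, top, round(diffs[top], 3))
```

K-mers with large positive or negative differences are candidates for features that distinguish the two groups; a real analysis would also need a significance test and a background model.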


Using Graphs To Characterize Nationwide Physician Referral Networks, Ding Tong, Shu-Xia Li, Isuru Ranasinghe, Sudhakar Nuti, Hongyu Zhao, Harlan Krumholz Sep 2014


Yale Day of Data

AIM:

Evaluating physician referral network characteristics can help to understand how physicians and hospitals interact to provide patient services within the US healthcare system and ultimately how this may influence patient outcomes.

METHOD:

We used the 2012-2013 national Physician Referral data from the Centers for Medicare & Medicaid Services (CMS), which consists of 73,071,804 pairs of referrals from one health provider to another in calendar year 2012 and the first two quarters of year 2013 within 30 days of care. These referrals are from 642,144 physicians nationwide and 4,811 hospitals. We obtained information for each provider, physician or hospital, from …


Partitioning Bipartite Graphs: A Modified Louvain, Emily Diana Sep 2014


Yale Day of Data

Abstract

How do we find communities in a graph? How does this change if the graph is bipartite? The Louvain method maximizes links within communities and minimizes those between in order to determine an optimal grouping. Yet, because it may fail when bipartite restrictions are introduced, we have adjusted the null model so as to improve performance in these conditions.
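For orientation, the quantity that the standard Louvain method optimizes, modularity, can be computed for a given partition as in the sketch below. This uses the usual Newman-Girvan null model for an undirected, unweighted graph without self-loops; it is not the adjusted bipartite null model developed in this work, and the function name `modularity` and the example graph are hypothetical.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition.

    edges: list of (u, v) pairs for an undirected, unweighted graph
           with no self-loops.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    # Node degrees.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Number of edges with both endpoints in the same community.
    intra = sum(1 for u, v in edges if community[u] == community[v])

    # Total degree per community, used by the null model.
    degree_sum = {}
    for node, d in degree.items():
        label = community[node]
        degree_sum[label] = degree_sum.get(label, 0) + d

    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return intra / m - expected


# Two triangles joined by a single edge (illustrative only).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, split))
```

The `expected` term is where the null model enters: it estimates how many within-community links a random graph with the same degrees would have. In a bipartite graph no links can occur within one side, so this estimate is a poor baseline, which is one motivation for adjusting the null model.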

Conclusion

Our Bipartite Louvain is more robust with respect to permutations of vertices than the standard Louvain. For our synthetic examples, Bipartite Louvain typically yields a higher modularity and uncovers the ground truth communities with a higher probability. In the …


A Study Of The N-D-K Scalability Problem In Large-Scale Image Classification, Carlos E. Del-Castillo-Negrete, Sreenivas R. Sukumar Sep 2014


Yale Day of Data

Image classification is an extensively studied problem that lies at the heart of computer vision. However, the challenge remains to develop a system that can identify and classify thousands of objects like the human visual system. The accumulation of massive image data sets has permitted the study of this problem at a big-data scale. However, current algorithms have been shown to fall short of being practical and accurate at scale. To further understand how these algorithms scale, we developed a library of functions to explore the scalability of the support vector machine (SVM) linear classification algorithm when applied to problems …


Stratified Meta-Analysis To Examine Data Biases In Lung Cancer Studies Of Refinery Workers, Sherman Selix Sep 2014


Yale Day of Data

Petroleum refineries employ a variety of workers who historically experienced different potentials for asbestos exposure depending on job tasks. Associations between petroleum refinery work and lung cancer related to occupational asbestos exposure have been quantified among various locations, corporations, and time periods. To combine the data from several individual refinery studies and examine an overall effect, a systematic review and stratified meta-analysis were employed. Using set search terms among four databases, 112 potential publications were identified, of which 29 qualified for meta-analysis. Risk estimates and confidence intervals were extracted from these publications to construct four separate datasets. Inverse variance weighting …
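As a rough illustration of how inverse variance weighting combines risk estimates and confidence intervals, the sketch below pools studies under a fixed-effect model on the log scale. The function name `pooled_estimate` and the input values are hypothetical, not data from the reviewed publications, and the sketch assumes 95% confidence intervals that are symmetric on the log scale; the study's actual model may differ.

```python
import math


def pooled_estimate(studies, z=1.96):
    """Fixed-effect inverse-variance pooling of ratio-scale risk estimates.

    studies: list of (estimate, ci_lower, ci_upper) tuples, where the
             interval is assumed to be a 95% CI symmetric on the log scale.
    Returns (pooled_estimate, pooled_ci_lower, pooled_ci_upper).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for estimate, lower, upper in studies:
        # Recover the standard error of the log estimate from the CI width.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2  # more precise studies get larger weights
        weighted_sum += weight * math.log(estimate)
        weight_total += weight

    pooled_log = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Hypothetical risk estimates with 95% confidence intervals.
example = [(1.20, 0.90, 1.60), (0.95, 0.80, 1.13), (1.40, 0.70, 2.80)]
print(pooled_estimate(example))
```

In a stratified analysis, the same pooling would be run separately on each stratum's subset of studies.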


Rotating Optical Microcavities With Broken Chiral Symmetry, Raktim Sarma, Li Ge, Jan Wiersig, Hui Cao Sep 2014


Yale Day of Data

We develop a finite difference time domain simulation algorithm to simulate photonic structures in a rotating frame. Using the algorithm, we numerically demonstrate that, in open microcavities with broken chiral symmetry, quasi-degenerate pairs of co-propagating modes in a non-rotating cavity evolve into counter-propagating modes with rotation. The emission patterns change dramatically with rotation, due to distinct output directions of CW and CCW waves. By tuning the degree of spatial chirality, we maximize the sensitivity of microcavity emission to rotation. The rotation-induced change of emission is orders of magnitude larger than the Sagnac effect, pointing to a promising direction for …


Incorporating Satellite Derived Cloud Climatologies To Improve High Resolution Interpolation Of Daily Precipitation, Adam M. Wilson, Benoit Parmentier, Brian McGill, Rob Guralnick, Walter Jetz Sep 2013


Yale Day of Data

Conservation of biodiversity demands comprehension of evolutionary and ecological patterns and processes that occur over vast spatial and temporal scales. A central goal of ecology is to understand the factors that control the spatial distribution of species, and this has become even more important in the face of climate change. However, at global scales there can be enormous uncertainty in environmental data used to model species distributions. Even ‘simple’ metrics such as mean annual precipitation are difficult to estimate in areas with few weather stations, and available data sets do not quantify uncertainty in these surfaces. We are developing a …


Two Suns In The Sky: Stellar Multiplicity Influence On Planet Formation, Ji Wang, Debra Fischer Sep 2013


Yale Day of Data

We found that a planet is less likely to exist around a binary star, and thus Tatooine may be just a dream.


The Future Of Research And Collaboration – The Dedicated Science Network, Andrew Sherman, David Galassi, William Boos, Daisuke Nagai Sep 2013


Yale Day of Data

This poster describes the new high-speed Science Network and Science DMZ at Yale that will be used for data movement, data sharing, and scientific collaboration.


Data Analysis Using Regression Modeling: Visual Display And Setup Of Simple And Complex Statistical Models, Emil N. Coman, Maria A. Coman, Eugen Iordache, Russell Barbour, Lisa Dierker Sep 2013


Yale Day of Data

We present visual modeling solutions for testing simple and more advanced statistical hypotheses in any research field. All models can be directly specified in analytical software like Mplus or R.

Data analysis in any substantive field can be easily accomplished by translating statistical tests into the intuitive language of regression-based path diagrams with observed and unobserved variables. All models we present can be directly specified and estimated in analytical software.

Students can particularly benefit from being taught the simple regression modeling setup of the path analytical method, as it empowers them to apply the techniques to any data to test …