Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Other Computer Sciences

2021

Articles 1 - 29 of 29

Full-Text Articles in Data Science

Local Feature Selection For Multiple Instance Learning With Applications., Aliasghar Shahrjooihaghighi Dec 2021

Electronic Theses and Dissertations

Feature selection is a data processing approach that has been successfully and effectively used in developing machine learning algorithms for various applications. It has been proven to effectively reduce the dimensionality of the data and increase the accuracy and interpretability of machine learning algorithms. Conventional feature selection algorithms assume that there is an optimal global subset of features for the whole sample space. Thus, only one global subset of relevant features is learned. An alternative approach is based on the concept of Local Feature Selection (LFS), where each training sample can have its own subset of relevant features. Multiple Instance …


Visualizing Features From Deep Neural Networks Trained On Alzheimer’s Disease And Few-Shot Learning Models For Alzheimer’s Disease, John Reeder Dec 2021

All Theses

Alzheimer’s disease is an incurable neural disease, usually affecting the elderly. The afflicted suffer from cognitive impairments that get dramatically worse at each stage. Previous research on Alzheimer’s disease classification leveraged statistical models such as support vector machines. However, such models train on numerical data instead of medical images. Today, convolutional neural networks (CNNs) are widely considered to achieve state-of-the-art image classification performance. However, due to their black-box nature, medical professionals can be reluctant to use them. On the other hand, …


Pre-Earthquake Ionospheric Perturbation Identification Using CSES Data Via Transfer Learning, Pan Xiong, Cheng Long, Huiyu Zhou, Roberto Battiston, Angelo De Santis, Dimitar Ouzounov, Xuemin Zhang, Xuhui Shen Nov 2021

Mathematics, Physics, and Computer Science Faculty Articles and Research

During the lithospheric buildup to an earthquake, complex physical changes occur within the earthquake hypocenter. Data pertaining to the changes in the ionosphere may be obtained by satellites, and the analysis of data anomalies can help identify earthquake precursors. In this paper, we present a deep-learning model, SeqNetQuake, that uses data from the first China Seismo-Electromagnetic Satellite (CSES) to identify ionospheric perturbations prior to earthquakes. SeqNetQuake achieves the best performance [F-measure (F1) = 0.6792 and Matthews correlation coefficient (MCC) = 0.427] when directly trained on the CSES dataset with a spatial window centered on the earthquake epicenter with the Dobrovolsky …
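
For readers unfamiliar with the two scores quoted in this abstract, the following is a minimal sketch of how an F-measure (F1) and a Matthews correlation coefficient (MCC) are computed from binary confusion-matrix counts. The counts used here are invented for illustration only and are not taken from the paper; the function names are likewise hypothetical.

```python
import math


def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to 1."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Illustrative counts only (assumed, not from the SeqNetQuake study).
tp, tn, fp, fn = 40, 30, 20, 10
example_f1 = f1_score(tp, fp, fn)
example_mcc = mcc(tp, tn, fp, fn)
print(round(example_f1, 4), round(example_mcc, 4))
```

Unlike F1, MCC also uses the true-negative count, which is one reason the two scores can differ noticeably on the same classifier.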


Transfer-Learned Pruned Deep Convolutional Neural Networks For Efficient Plant Classification In Resource-Constrained Environments, Martinson Ofori Nov 2021

Masters Theses & Doctoral Dissertations

Traditional means of on-farm weed control mostly rely on manual labor. This process is time-consuming, costly, and contributes to major yield losses. Further, the conventional application of chemical weed control can be economically and environmentally inefficient. Site-specific weed management (SSWM) counteracts this by reducing the amount of chemical application with localized spraying of weed species. To solve this using computer vision, precision agriculture researchers have used remote sensing weed maps, but this has been largely ineffective for early season weed control due to problems such as solar reflectance and cloud cover in satellite imagery. With the current advances in artificial …


The Forestecology R Package For Fitting And Assessing Neighborhood Models Of The Effect Of Interspecific Competition On The Growth Of Trees, Albert Y. Kim, David N. Allen, Simon P. Couch Nov 2021

Statistical and Data Sciences: Faculty Publications

Neighborhood competition models are powerful tools to measure the effect of interspecific competition. Statistical methods to ease the application of these models are currently lacking. We present the forestecology package providing methods to (a) specify neighborhood competition models, (b) evaluate the effect of competitor species identity using permutation tests, and (c) measure model performance using spatial cross-validation. Following Allen and Kim (PLoS One, 15, 2020, e0229930), we implement a Bayesian linear regression neighborhood competition model. We demonstrate the package's functionality using data from the Smithsonian Conservation Biology Institute's large forest dynamics plot, part of the ForestGEO global network of research …


Facilitating Team-Based Data Science: Lessons Learned From The DSC-WAV Project, Chelsey Legacy, Andrew Zieffler, Benjamin S. Baumer, Valerie Barr, Nicholas J. Horton Oct 2021

Statistical and Data Sciences: Faculty Publications

While coursework provides undergraduate data science students with some relevant analytic skills, many are not given the rich experiences with data and computing they need to be successful in the workplace. Additionally, students often have limited exposure to team-based data science and the principles and tools of collaboration that are encountered outside of school. In this paper, we describe the DSC-WAV program, an NSF-funded data science workforce development project in which teams of undergraduate sophomores and juniors work with a local non-profit organization on a data-focused problem. To help students develop a sense of agency and improve confidence in their …


Human Mobility Monitoring Using WiFi: Analysis, Modeling, And Applications, Amee Trivedi Oct 2021

Doctoral Dissertations

Understanding and modeling human and device mobility is of fundamental importance in mobile computing, with implications ranging from network design and location-aware technologies to urban infrastructure planning. Today's users carry a plethora of devices such as smartphones, laptops, tablets, and smartwatches. Each device offers a different set of services, resulting in different usage and mobility patterns, which raises the research question of how to understand and model the trajectories of a user's multiple devices. Additionally, prior research on mobility focuses on outdoor mobility, even though users are known to spend 80% of their time indoors, resulting in wide gaps in knowledge in the area of indoor …


Infer: An R Package For Tidyverse-Friendly Statistical Inference, Simon P. Couch, Andrew P. Bray, Chester Ismay, Evgeni Chasnovski, B. Baumer, Mine Cetinkaya-Rundel Sep 2021

Statistical and Data Sciences: Faculty Publications

infer implements an expressive grammar to perform statistical inference that adheres to the tidyverse design framework (Wickham et al., 2019). Rather than providing methods for specific statistical tests, this package consolidates the principles that are shared among common hypothesis tests and confidence intervals into a set of four main verbs (functions), supplemented with many utilities to visualize and extract value from their outputs.
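
As a rough illustration of the workflow this abstract describes, the sketch below walks through a permutation test in plain Python. It is not the infer API (infer is an R package whose four verbs are `specify()`, `hypothesize()`, `generate()`, and `calculate()`); the function names, data, and parameters here are hypothetical and only mirror those four steps in comments.

```python
import random
from statistics import mean


def diff_in_means(values, labels):
    """Statistic of interest: mean of group 'a' minus mean of group 'b'."""
    a = [v for v, g in zip(values, labels) if g == "a"]
    b = [v for v, g in zip(values, labels) if g == "b"]
    return mean(a) - mean(b)


def permutation_p_value(values, labels, reps=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    # "specify": the response is `values`, the explanatory variable is `labels`.
    # "hypothesize": the null hypothesis is that the two are independent.
    observed = diff_in_means(values, labels)  # "calculate" on the observed data
    shuffled = list(labels)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # "generate": one resample under the null
        if abs(diff_in_means(values, shuffled)) >= abs(observed):  # "calculate"
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (reps + 1)


# Made-up measurements for two groups of four observations each.
values = [5.1, 4.9, 6.0, 5.5, 3.9, 4.2, 4.0, 4.4]
labels = ["a"] * 4 + ["b"] * 4
observed, p_value = permutation_p_value(values, labels)
print(observed, p_value)
```

The same loop works for any statistic by swapping out `diff_in_means`, which is the kind of shared structure across tests that the abstract refers to.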


Monitoring At-Home Care Patients Through A Scalar Polar Plot Visualization Of Motion Sensor Data, Michael Mcgavin Aug 2021

Undergraduate Student Research Internships Conference

In Canada, approximately 18 percent (6.6 million) of the total population are age 65 or older, and 88 percent of people over age 65 want to stay in their residence for as long as possible. This older demographic is a group that is dependent on proactive and preventative healthcare. Using motion sensor data collected from a local company providing home-care services to this demographic, a data visualization was constructed to assist users in observing patient behavior and improving their quality of life while maintaining their independence. However, since the collected data is time-based, it results in a dataset that is …


Enhancing Microbiome Host Disease Prediction With Variational Autoencoders, Celeste Manughian-Peter Aug 2021

Computational and Data Sciences (MS) Theses

Advancements in genetic sequencing methods for microbiomes in recent decades have permitted the collection of taxonomic and functional profiles of microbial communities, accelerating the discovery of the functional aspects of the microbiome and generating an increased interest among clinicians in applying these techniques with patients. This advancement has coincided with software and hardware improvements in the field of machine learning and deep learning. Combined, these advancements indicate further potential for progress in disease diagnosis and treatment in humans. The ability to classify a human microbiome profile into a disease category, and additionally identify the differentiating factors within the profile between …


Towards Understanding The Temporal Accuracy Of OpenStreetMap: A Quantitative Experiment, Levente Juhasz Jul 2021

GIS Center

No abstract provided.


A Configurable Social Network For Running IRB-Approved Experiments, Mihovil Mandic Jun 2021

Dartmouth College Undergraduate Theses

Our world has never been more connected, and the size of the social media landscape draws a great deal of attention from academia. However, social networks are also a growing challenge for the Institutional Review Boards concerned with the subjects’ privacy. These networks contain a monumental variety of personal information of almost 4 billion people, allow for precise social profiling, and serve as a primary news source for many users. They are perfect environments for influence operations that are becoming difficult to defend against. Motivated to study online social influence via IRB-approved experiments, we designed and implemented a flexible, scalable, …


Advancing The Ability To Predict Cognitive Decline And Alzheimer’s Disease Based On Genetic Variants Beyond Amyloid-Beta And Tau, Naveen Rawat Jun 2021

Master's Projects

A growing amount of neurodegenerative R&D is focused on identifying genomic-based explanations of AD that go beyond Amyloid-β (Aβ) and Tau. The proposed effort involves identifying some of the genomic variations, such as single nucleotide polymorphisms (SNPs), allele, chromosome, and epigenetic contributors to MCI and AD, that are beyond Aβ and Tau.

The project involves building a prediction model based on a support vector machine (SVM) classifier that takes into account the genomic variations and epigenetic factors to predict the early stage of mild cognitive impairment (MCI) and Alzheimer’s disease (AD). To achieve this, picking up important feature sets which …


Machine Learning For Terminal Procedure Chart Change Detection, Anthony M. Marchiafava May 2021

University of New Orleans Theses and Dissertations

Terminal Procedure Charts are a constantly updated and necessary tool for aircraft personnel to approach and take off from airport runways safely. Detecting changes within these charts is a time-consuming and laborious process. Here, machine learning techniques were used to predict regions of change in charts by detecting the charts’ image regions and comparing features extracted from those regions. Outlined are methodologies to detect differences between two separate charts to produce images with changed regions clearly indicated. Both more conventional computer vision and machine learning techniques were applied. For images with minor shifts, the proposed model is able to …


Convolutional Neural Networks For Deflate Data Encoding Classification Of High Entropy File Fragments, Nehal Ameen May 2021

University of New Orleans Theses and Dissertations

Data reconstruction is significantly improved in terms of speed and accuracy by reliable data encoding fragment classification. To date, work on this problem has been successful with file structures of low entropy that contain sparse data, such as large tables or logs. Classifying compressed, encrypted, and random data that exhibit high entropy is an inherently difficult problem that requires more advanced classification approaches. We explore the ability of convolutional neural networks and word embeddings to classify deflate data encoding of high entropy file fragments after establishing ground truth using controlled datasets. Our model is designed to either successfully classify file …


Prediction Of Financial Capacity Using Diffusion Compartment Imaging, Lok Yi Tai May 2021

Master's Projects

Financial Capacity (FC) is the ability to manage one’s financial affairs, which is essential for autonomy and independence, particularly for aging adults. Since dementia develops gradually, it is often difficult to detect the early signs that this cognitive dysfunction is developing. This project aims to use Neurite orientation dispersion and density imaging (NODDI) to identify the white matter tracts that are associated with FC. Diffusion Tensor Images (DTI) and T1 Magnetic Resonance Images (MRI) of 18 Alzheimer’s Disease (AD) subjects, 47 Mild Cognitive Impairment (MCI) subjects, and 193 healthy control (CN) subjects are compared to neuropsychological tests. Orientation Dispersion Index (ODI) …


Spaceflight And The Differential Gene Expression Of Human Stem Cell-Derived Cardiomyocytes, Eugenie Zhu May 2021

Master's Projects

The National Aeronautics and Space Administration (NASA) has performed many experiments on the International Space Station (ISS) to further understand how conditions in space can affect life on Earth. This project analyzed GLDS-258, a gene set from NASA’s GeneLab repository which examines the impact of microgravity on human induced pluripotent stem-cell-derived cardiomyocytes (hiPSC-CMs). While many datasets have been run through NASA’s RNA-Seq Consensus Pipeline (RCP) to study differential gene expression in space, a Homo sapiens dataset has yet to be analyzed using the RCP. The aim of this project was to run the first Homo sapiens dataset, GLDS-258, through the …


Wildfire Risk Prediction For A Smart City, Rekha Rani May 2021

Master's Projects

Wildfires are uncontrolled fires that may lead to the destruction of biodiversity, soil fertility, and human resources. There is a need for timely detection and prediction of wildfires to minimize their disastrous effects. In this research, we propose a wildfire prediction model that relies on multi-criteria decision making (MCDM) to explicitly evaluate multiple conflicting criteria in decision making and weave the wildfire risks into the city’s resiliency plan. We incorporate fuzzy set theory to handle imprecision and uncertainties. In the process, we create a new data set that includes California cities’ weather, vegetation, topography, and population density records. The model …


Machine Learning Methods For Depression Detection Using sMRI And rs-fMRI Images, Marzieh Sadat Mousavian May 2021

LSU Doctoral Dissertations

Major Depressive Disorder (MDD) is a common disease throughout the world that negatively influences people’s lives. Early diagnosis of MDD is beneficial, so detecting practical biomarkers would aid clinicians in the diagnosis of MDD. Having an automated method to find biomarkers for MDD is helpful even though it is difficult. The main aim of this research is to generate a method for detecting discriminative features for MDD diagnosis based on Magnetic Resonance Imaging (MRI) data.

In this research, representational similarity analysis provides a framework to compare distributed patterns and obtain the similarity/dissimilarity of brain regions. Regions are obtained by either …


Molecular Cluster Fragment Machine Learning Training Techniques To Predict Energetics Of Brown Carbon Aerosol Clusters, Emily E. Chappie May 2021

Undergraduate Honors Theses

Density functional theory (DFT) has become a popular method for computational work involving larger molecular systems as it provides accuracy that rivals ab initio methods while lowering computational cost. Nevertheless, computational cost is still high for systems greater than ten atoms in size, preventing their application in modeling realistic atmospheric systems at the molecular level. Machine learning techniques, however, show promise as cost-effective tools in predicting chemical properties when properly trained. In the interest of furthering chemical machine learning in the field of atmospheric science, I have developed a training method for predicting cluster energetics of newly characterized nitrogen-based brown …


Interrupting The Propaganda Supply Chain, Kyle Hamilton, Bojan Bozic, Luc Longo Apr 2021

Conference papers

In this early-stage research, a multidisciplinary approach is presented for the detection of propaganda in the media, and for modeling the spread of propaganda and disinformation using semantic web and graph theory. An ontology will be designed which has the theoretical underpinnings from multiple disciplines including the social sciences and epidemiology. An additional objective of this work is to automate triple extraction from unstructured text which surpasses the state-of-the-art performance.


Network-Based Analysis Of Early Pandemic Mitigation Strategies: Solutions, And Future Directions, Pegah Hozhabrierdi, Raymond Zhu, Maduakolam Onyewu, Sucheta Soundarajan Mar 2021

Northeast Journal of Complex Systems (NEJCS)

Despite the large amount of literature on mitigation strategies for pandemic spread, in practice, we are still limited by naive strategies, such as lockdowns, that are not effective in controlling the spread of the disease in the long term. One major reason behind adopting basic strategies in real-world settings is that, in the early stages of a pandemic, we lack knowledge of the behavior of a disease, and so cannot tailor a more sophisticated response. In this study, we design different mitigation strategies for early stages of a pandemic and perform a comprehensive analysis among them. We then propose a novel …


A Consent Framework For The Internet Of Things In The GDPR Era, Gerald Chikukwa Mar 2021

Masters Theses & Doctoral Dissertations

The Internet of Things (IoT) is an environment of connected physical devices and objects that communicate amongst themselves over the internet. The IoT is based on the notion of always-connected customers, which allows businesses to collect large volumes of customer data to give them a competitive edge. Most of the data collected by these IoT devices include personal information, preferences, and behaviors. However, constant connectivity and sharing of data create security and privacy concerns. Laws and regulations like the General Data Protection Regulation (GDPR) of 2016 ensure that customers are protected by providing privacy and security guidelines to businesses. Data …


JRevealPEG: A Semi-Blind JPEG Steganalysis Tool Targeting Current Open-Source Embedding Programs, Charles A. Badami Mar 2021

Masters Theses & Doctoral Dissertations

Steganography in computer science refers to the hiding of messages or data within other messages or data; the detection of these hidden messages is called steganalysis. Digital steganography can be used to hide any type of file or data, including text, images, audio, and video inside other text, image, audio, or video data. While steganography can be used to legitimately hide data for non-malicious purposes, it is also frequently used in a malicious manner. This paper proposes JRevealPEG, a software tool written in Python that will aid in the detection of steganography in JPEG images with respect to identifying a …


Uncovering Object Categories In Infant Views, Naiti S. Bhatt Jan 2021

Scripps Senior Theses

While adults recognize objects in a near-instant, infants must learn how to categorize the objects in their visual environments. Recent work has shown that egocentric head-mounted camera videos contain rich data that illuminate the infant experience (Clerkin et al., 2017; Franchak et al., 2011; Yoshida & Smith, 2008). While past work has focused on the social information in view, in this work, we aim to characterize the objects in infants’ at-home visual environments by modifying modern computer vision models for the infant view. To do so, we collected manual annotations of objects that infants seemed to be interacting within a …


Automatic Hierarchy Expansion For Improved Structure And Chord Evaluation, Katherine M. Kinnaird, Brian McFee Jan 2021

Statistical and Data Sciences: Faculty Publications

No abstract provided.


Interactive Visual Self-Service Data Classification Approach To Democratize Machine Learning, Sridevi Narayana Wagle Jan 2021

All Master's Theses

Machine learning algorithms often produce models that both end users and developers regard as complex black boxes. Such algorithms fail to explain the model in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a model and classify data with more confidence and without having to compromise on accuracy. Such a technique is especially helpful when dealing with sensitive and crucial data, like cancer data in the medical domain, where errors carry a high cost. With the help of the proposed interactive and …


Ensemble Protein Inference Evaluation, Kyle Lee Lucke Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Protein inference is becoming an increasingly important tool that aids in the characterization of complex proteomes and analysis of complex protein samples. In bottom-up shotgun proteomics experiments, the metrics for evaluation (like AUC and calibration error) are based on an often imperfect target-decoy database. These metrics make the inherent assumption that all of the proteins in the target set are present in the sample being analyzed. In general, this is not the case: the target set is typically a mix of present and absent proteins. To objectively evaluate inference methods, protein standard datasets are used. These datasets are special in …


The Data Science Corps Wrangle-Analyze-Visualize Program: Building Data Acumen For Undergraduate Students, Nicholas J. Horton, Benjamin Baumer, Andrew Zieffler, Valerie Barr Jan 2021

Statistical and Data Sciences: Faculty Publications

We congratulate Kolaczyk, Wright, and Yajima on their innovative statistics practicum that places “practice” at the center of data science education (Kolaczyk et al., 2021, this issue). Their year-long practicum course focuses on the data science life cycle with engagement with external partners and university consulting projects. We agree that training postgraduates in practice needs to be foregrounded in the curriculum in order for students to develop necessary depth in data science practice.