Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

Series

2020

Articles 1 - 30 of 113

Full-Text Articles in Physical Sciences and Mathematics

Data: The Good, The Bad And The Ethical, John D. Kelleher, Filipe Cabral Pinto, Luis M. Cortesao Dec 2020

Articles

It is often the case with new technologies that it is very hard to predict their long-term impacts and as a result, although new technology may be beneficial in the short term, it can still cause problems in the longer term. This is what happened with oil by-products in different areas: the use of plastic as a disposable material did not take into account the hundreds of years necessary for its decomposition and its related long-term environmental damage. Data is said to be the new oil. The message to be conveyed is associated with its intrinsic value. But as in …


Analysis And Implementation Of The Maximum Likelihood Expectation Maximization Algorithm For Find, Angus Boyd Jameson Dec 2020

Student Research Projects

This thesis presents an organized explanation and breakdown of the Maximum Likelihood Expectation Maximization image reconstruction algorithm. This background research was used to develop a means of implementing the algorithm in the imaging code for UNH's Field Deployable Imaging Neutron Detector to improve its ability to resolve complex neutron sources. This thesis provides an overview of this implementation scheme and includes the results of a couple of reconstruction tests for the algorithm. A discussion is given on the current state of the algorithm and its integration with the neutron detector system, and suggestions are given for how the work and …


Data Science In The Time Of Covid-19, Tony Breitzman Dec 2020

Faculty Scholarship for the College of Science & Mathematics

No abstract provided.


Principal Component Analysis For Predicting The Party Of The Legislators, Afsana Mimi Dec 2020

Publications and Research

In Spring 2020, I did a project, "Decision Tree Predicting the Party of Legislators," and constructed a decision tree model to predict legislators' parties based on their votes. We also used this model to identify legislators who frequently voted against their parties. We used the legislators' roll call votes from the Office of the Clerk, U.S. House of Representatives data sets (categorical values), collected in 2018 and 2019. In this new project, we study the 2018 and 2019 vote data using Principal Component Analysis (PCA). The goal is to find a (compressed) model using unsupervised learning to distinguish the legislators' parties, and PCA …


Introduction To Data Science Lti 110, Joanna Burkhardt Dec 2020

Library Impact Statements

No abstract provided.


Introduction To Data Science, Joanna Burkhardt Dec 2020

Library Impact Statements

No abstract provided.


Spatial Frequency Implications For Global And Local Processing In Autistic Children, Riya Mody, Ayra Tusneem, Louanne Boyd, Vincent Berardi Dec 2020

Student Scholar Symposium Abstracts and Posters

Visual processing in humans is done by integrating and updating multiple streams of global and local sensory input. Interaction between these two systems can be disrupted in individuals with ASD and other learning disabilities. When this integration is not done smoothly, it becomes difficult to see the “big picture”, which has been found to have implications on emotion recognition, social skills, and conversation skills. An example of this phenomenon is local interference, which is when local details are prioritized over the global features. Previous research in this field has aimed to decrease local interference by developing and evaluating a filter …


Factors Affecting Computer Science Research Productivity And Impact In Nigeria: A Bibliometric Evidence, Azubuike Ezenwoke Dec 2020

Library Philosophy and Practice (e-journal)

Computer science is a burgeoning research field and has the potential to accelerate the rate of industrialisation and, subsequently, economic development. Using bibliometric data obtained from Scopus, this study employed a 15-year bibliometric analysis to highlight Nigeria’s productivity and impact trends in the computer science research landscape. Our findings are summarised as follows: First, Nigeria’s computer science research contribution and citations are meager in comparison to the global output. Second, international collaboration is generally weak as most collaborations are national in scope. Third, Nigeria’s computer science-related research is published in low-quality outlets, as Scopus has discontinued the indexing of most …


Evaluating The Reproducibility Of Physiological Stress Detection Models, Varun Mishra, Sougata Sen, Grace Chen, Tian Hao, Jeffrey Rogers, Ching-Hua Chen, David Kotz Dec 2020

Dartmouth Scholarship

Recent advances in wearable sensor technologies have led to a variety of approaches for detecting physiological stress. Even with over a decade of research in the domain, there still exist many significant challenges, including a near-total lack of reproducibility across studies. Researchers often use some physiological sensors (custom-made or off-the-shelf), conduct a study to collect data, and build machine-learning models to detect stress. There is little effort to test the applicability of the model with similar physiological data collected from different devices, or the efficacy of the model on data collected from different studies, populations, or demographics.

This paper takes …


Open Data, Collaborative Working Platforms, And Interdisciplinary Collaboration: Building An Early Career Scientist Community Of Practice To Leverage Ocean Observatories Initiative Data To Address Critical Questions In Marine Science, Robert M. Levine, Kristen E. Fogaren, Johna E. Rudzin, Christopher J. Russoniello, Dax C. Soule, Justine M. Whitaker Dec 2020

Publications and Research

Ocean observing systems are well-recognized as platforms for long-term monitoring of near-shore and remote locations in the global ocean. High-quality observatory data is freely available and accessible to all members of the global oceanographic community—a democratization of data that is particularly useful for early career scientists (ECS), enabling ECS to conduct research independent of traditional funding models or access to laboratory and field equipment. The concurrent collection of distinct data types with relevance for oceanographic disciplines including physics, chemistry, biology, and geology yields a unique incubator for cutting-edge, timely, interdisciplinary research. These data are both an opportunity and an incentive …


Detecting Hacker Threats: Performance Of Word And Sentence Embedding Models In Identifying Hacker Communications, Susan Mckeever, Brian Keegan, Andrei Quieroz Dec 2020

Conference papers

Cyber security is striving to find new forms of protection against hacker attacks. An emerging approach nowadays is the investigation of security-related messages exchanged on deep/dark web and even surface web channels. This approach can be supported by the use of supervised machine learning models and text mining techniques. In our work, we compare a variety of machine learning algorithms, text representations and dimension reduction approaches for the detection accuracies of software-vulnerability-related communications. Given the imbalanced nature of the three public datasets used, we investigate appropriate sampling approaches to boost detection accuracies of our models. In addition, we examine how …


Healthcare Regulation And Governance: Big Data Analytics And Healthcare Data Protection, Xuejuan Zhang Dec 2020

School of Continuing and Professional Studies Student Papers

No abstract provided.


Creating Optimal Conditions For Reproducible Data Analysis In R With ‘Fertile’, Audrey M. Bertin, Benjamin Baumer Nov 2020

Statistical and Data Sciences: Faculty Publications

The advancement of scientific knowledge increasingly depends on ensuring that data-driven research is reproducible: that two people with the same data obtain the same results. However, while the necessity of reproducibility is clear, there are significant behavioral and technical challenges that impede its widespread implementation and no clear consensus on standards of what constitutes reproducibility in published research. We present fertile, an R package that focuses on a series of common mistakes programmers make while conducting data science projects in R, primarily through the RStudio integrated development environment. fertile operates in two modes: proactively, to prevent reproducibility mistakes from happening …


Viral Data, Agnieszka Leszczynski, Matthew Zook Nov 2020

Geography Faculty Publications

We are experiencing a historical moment characterized by unprecedented conditions of virality: a viral pandemic, the viral diffusion of misinformation and conspiracy theories, the viral momentum of ongoing Hong Kong protests, and the viral spread of #BlackLivesMatter demonstrations and related efforts to defund policing. These co-articulations of crises, traumas, and virality both implicate and are implicated by big data practices occurring in a present that is pervasively mediated by data materialities, deeply rooted dataist ideologies that entrench processes of datafication as granting objective access to truth and attendant practices of tracking, data analytics, algorithmic prediction, and data-driven targeting of individuals …


Lis Online Graduate Certificate In Data Science, Joanna Burkhardt Nov 2020

Library Impact Statements

No abstract provided.


Using Data Analytics To Predict Students Score, Nang Laik Ma, Gim Hong Chua Nov 2020

Research Collection School Of Computing and Information Systems

Education is very important to Singapore, and the government has continued to invest heavily in our education system, which has become one of the world-class systems today. A strong foundation of Science, Technology, Engineering, and Mathematics (STEM) was what underpinned Singapore's development over the past 50 years. PISA is a triennial international survey that evaluates education systems worldwide by testing the skills and knowledge of 15-year-old students who are nearing the end of compulsory education. In this paper, the authors used the PISA data from 2012 and 2015 and developed machine learning techniques to predict the students' scores and understand the …


A New Efficient Method To Detect Genetic Interactions For Lung Cancer Gwas, Jennifer Luyapan, Xuemei Ji, Siting Li, Xiangjun Xiao, Dakai Zhu, Eric J. Duell, David C. Christiani, Matthew B. Schabath, Susanne M. Arnold, Shanbeh Zienolddiny, Hans Brunnström, Olle Melander, Mark D. Thornquist, Todd A. Mackenzie, Christopher I. Amos, Jiang Gui Oct 2020

Markey Cancer Center Faculty Publications

BACKGROUND: Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited due to computational and statistical challenges. We addressed the computational burden encountered when detecting SNP interactions for survival analysis, such as age of disease-onset. To confront this problem, we developed a novel algorithm, called the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which used Martingale Residuals as the outcome parameter to estimate survival outcomes, and implemented the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of …


Comparing Variable Importance In Prediction Of Silence Behaviours Between Random Forest And Conditional Inference Forest Models., Stephen Barrett Dr, Geraldine Gray Dr, Colm Mcguinness Dr, Michael Knoll Dr. Oct 2020

Articles

This paper explores variable importance metrics of Random Forests based on Conditional Inference Trees (CIT) and on classical Classification And Regression Trees (CART). The paper compares both algorithms' variable importance rankings and highlights why CIT should be used when dealing with data with different levels of aggregation. The models analysed explored the role of cultural factors at the individual and societal levels when predicting Organisational Silence behaviours.


Towards High Performance Stock Market Prediction Methods, Warren M. Landis, Sangwhan Cha Oct 2020

Other Student Works

Stock markets today rely, and will continue to rely, on the metrics of timeliness and efficiency to reach optimal profits. One way stock investors have continued to strive for the best of these two factors is through the use of predictive machine learning systems to aid in their decision making. However, among the many systems currently in use, it could be said that the myriad of data that they are based on may not be sufficient. In an effort to devise an ensemble learning predictive system that will utilize an array of big …


Tapping Twitter Data For Analyzing And Visualizing Public Sentiments On Censorship, Naveen Kumar Yadav, Akhilesh K.S. Yadav Oct 2020

Library Philosophy and Practice (e-journal)

The main objective of this research study is to analyse and visualize Twitter data tagged “#Censorship”. A connection was established with Twitter using the Twitter API, and the tweets were received in Google Spreadsheets. Data visualization was performed using various tools such as Voyant Tools, Tableau, Google Spreadsheet and Orange in order to generate different visualizations based upon language, geographical areas, retweets etc. The sentiment analysis was performed for the sentiments that were attached to the given set of data by the public in their respective tweets. A total of 23,680 tweets were retrieved during the data collection time and there were 13,771 …


Espade: An Efficient And Semantically Secure Shortest Path Discovery For Outsourced Location-Based Services, Bharath K. Samanthula, Divyadharshini Karthikeyan, Boxiang Dong, K. Anitha Kumari Oct 2020

Department of Computer Science Faculty Scholarship and Creative Works

With the rapid growth of smart devices and technological advancements in tracking geospatial data, the demand for Location-Based Services (LBS) is facing a constant rise in several domains, including military, healthcare and transportation. It is a natural step to migrate LBS to a cloud environment to achieve on-demand scalability and increased resiliency. Nonetheless, outsourcing sensitive location data to a third-party cloud provider raises a host of privacy concerns as the data owners have reduced visibility and control over the outsourced data. In this paper, we consider outsourced LBS where users want to retrieve map directions without disclosing their location information. …


Project In Data Science Dsp 499, Joanna Burkhardt Oct 2020

Library Impact Statements

No abstract provided.


Research In Data Science Dsp 599, Harrison Dekker Oct 2020

Library Impact Statements

No abstract provided.


Data Science Internship Dsp 477, Harrison Dekker Oct 2020

Library Impact Statements

No abstract provided.


Data Analytics Beyond Traditional Probabilistic Approach To Uncertainty, Vladik Kreinovich Oct 2020

Departmental Technical Reports (CS)

Data for processing mostly comes from measurements, and measurements are never absolutely accurate: there is always the "measurement error" -- the difference between the measurement result and the actual (unknown) value of the measured quantity. In many applications, it is important to find out how these measurement errors affect the accuracy of the result of data processing. Traditional data processing techniques implicitly assume that we know the probability distributions. In many practical situations, however, we only have partial information about these distributions. In some cases, all we know is the upper bound on the absolute value of the measurement error. …


Imaging Data On Characterization Of Retinal Autofluorescent Lesions In A Mouse Model Of Juvenile Neuronal Ceroid Lipofuscinosis (Cln3 Disease), Qing Jun Wang, Kyung Sik Jung, Kabhilan Mohan, Mark E. Kleinman Oct 2020

Ophthalmology and Visual Science Faculty Publications

Juvenile neuronal ceroid lipofuscinosis (JNCL, a.k.a. juvenile Batten disease or CLN3 disease), a lethal pediatric neurodegenerative disease without cure, often presents with vision impairment and characteristic ophthalmoscopic features including focal areas of hyper-autofluorescence. In the associated research article “Loss of CLN3, the gene mutated in juvenile neuronal ceroid lipofuscinosis, leads to metabolic impairment and autophagy induction in retinal pigment epithelium” (Zhong et al., 2020) [1], we reported ophthalmoscopic observations of focal autofluorescent lesions or puncta in the Cln3Δex7/8 mouse retina as young as 8 months old. In this data article, we performed differential interference contrast and …


A Tree Frog (Boana Pugnax) Dataset Of Skin Transcriptome For The Identification Of Biomolecules With Potential Antimicrobial Activities, Yamil Liscano Martinez, Claudia Marcela Arenas Gómez, Jeramiah J. Smith, Jean Paul Delgado Oct 2020

Biology Faculty Publications

Increases in the prevalence of multiply resistant microbes have necessitated the search for new molecules with antimicrobial properties. One noteworthy avenue in this search is inspired by the presence of native antimicrobial peptides in the skin of amphibians. Having the second highest diversity of frogs worldwide, Colombian anurans represent an extensive natural reservoir that could be tapped in this search. Among this diversity, species such as Boana pugnax (the Chirique-Flusse Treefrog) are particularly notable, in that they thrive in a diversity of marginal habitats, utilize both aquatic and arboreal habitats, and are members of one of few genera that are …


Automated Discussion Analysis - Framework For Knowledge Analysis From Class Discussions, Swapna Gottipati, Venky Shankararaman, Mallikan Gokarn Nitin Oct 2020

Research Collection School Of Computing and Information Systems

This full research paper describes knowledge management of class discussions using an analytics-based framework. Discussions, whether in a live classroom or through online forums, can help stimulate critical thinking when used as a teaching method. They allow the teacher to explore in depth the key concepts covered in the course, motivate students to articulate their ideas clearly, and challenge the students to think more deeply. Analysing the discussions helps instructors gain better insights into the personal and collaborative learning behaviour of students. However, knowledge from in-class discussions and online forums is not effectively captured and mined due to lack of appropriate automated …


Visual Sentiment Analysis For Review Images With Item-Oriented And User-Oriented Cnn: Reproducibility Companion Paper, Quoc Tuan Truong, Hady W. Lauw, Martin Aumuller, Naoko Nitta Oct 2020

Research Collection School Of Computing and Information Systems

We revisit our contributions on visual sentiment analysis for online review images published at ACM Multimedia 2017, where we develop item-oriented and user-oriented convolutional neural networks that better capture the interaction of image features with specific expressions of users or items. In this work, we outline the experimental claims as well as describe the procedures to reproduce the results therein. In addition, we provide artifacts including data sets and code to replicate the experiments.


European Floating Strike Lookback Options: Alpha Prediction And Generation Using Unsupervised Learning, Tristan Lim, Aldy Gunawan, Chin Sin Ong Oct 2020

Research Collection School Of Computing and Information Systems

This research utilized the intrinsic quality of European floating strike lookback call options, alongside selected return and volatility parameters, in a K-means clustering environment, to recommend an alpha generative trading strategy. The result is an elegant, easy-to-use alpha strategy based on the option mechanisms, which identifies investment assets with a high degree of significance. In an upward trending market, the research identified the European floating strike lookback call option as an evaluative criterion and investable asset, which would allow investors both to predict and to profit from alpha opportunities. The findings will be useful for (i) buy-side investors seeking alpha generation and/or …