Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,483 Full-Text Articles 2,962 Authors 435,013 Downloads 189 Institutions

All Articles in Data Science

Safe Sharing For Sensitive Data, Kristi Thompson 2022 Western University

Western Libraries Presentations

This workshop focused on the question of when and how human subjects' data can be safely shared. It introduced the basics of data anonymization and discussed how to tell whether a dataset has been de-identified. Case studies of successful anonymization and some spectacular failures were shared.
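A common first test of whether a dataset has been de-identified is k-anonymity: every combination of quasi-identifiers (attributes such as an age band or a postal-code prefix that could be linked to outside data) should be shared by at least k records. The sketch below assumes records are Python dicts and that the analyst has already chosen which columns count as quasi-identifiers; the column names are hypothetical, and the workshop itself may have used other criteria.

```python
from collections import Counter

def _group_sizes(records, quasi_identifiers):
    # Count how many records share each combination of quasi-identifier values.
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that are
    indistinguishable on the chosen quasi-identifier columns."""
    groups = _group_sizes(records, quasi_identifiers)
    return min(groups.values()) if groups else 0

def risky_groups(records, quasi_identifiers, k):
    """List the quasi-identifier combinations shared by fewer than k records."""
    groups = _group_sizes(records, quasi_identifiers)
    return [combo for combo, count in groups.items() if count < k]

# Hypothetical example records; "age_band" and "postcode" are illustrative names.
records = [
    {"age_band": "30-39", "postcode": "N6A", "diagnosis": "asthma"},
    {"age_band": "30-39", "postcode": "N6A", "diagnosis": "flu"},
    {"age_band": "40-49", "postcode": "N6B", "diagnosis": "flu"},
]
```

A result of 1 means at least one person is unique on those attributes and could potentially be re-identified by linkage; generalizing values (wider age bands, shorter postal prefixes) or suppressing rare rows raises k.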


Enhancing The Performance Of The Mtcnn For The Classification Of Cancer Pathology Reports: From Data Annotation To Model Deployment, Kevin De Angeli 2022 University of Tennessee, Knoxville

Doctoral Dissertations

Information contained in electronic health records (EHRs), combined with the latest advances in machine learning (ML), has the potential to revolutionize the medical sciences. In particular, the information in cancer pathology reports is essential for investigating cancer trends across the country. Unfortunately, much of the information in EHRs is stored as unstructured free text, which limits its usability and research potential. To overcome this accessibility barrier, cancer registries depend on expert personnel who read, interpret, and extract the relevant information. Naturally, as the number of stored pathology reports increases every day, relying on human experts presents scalability challenges. Recently, …


Denoising And Deconvolving Sperm Whale Data In The Northern Gulf Of Mexico Using Fourier And Wavelet Techniques, Kendal McCain Leftwich 2022 University of New Orleans, New Orleans

University of New Orleans Theses and Dissertations

Underwater acoustics can be an important means of obtaining information from the world's oceans. It is desirable (but difficult) to compile an acoustic catalog of sounds emitted by various underwater objects to complement optical catalogs. For example, the current visual catalog of the tail flukes of large marine mammals (whales) can identify individual whales from their distinctive fluke characteristics. However, since sperm whales, Physeter macrocephalus, do not fluke up when they dive, they cannot be identified in this manner. A corresponding acoustic catalog of sperm whale clicks could be compiled to identify individual …
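The simplest Fourier-domain denoising idea is a low-pass filter: transform the signal, zero the frequency bins above a cutoff, and transform back. The sketch below uses a direct O(n²) discrete Fourier transform from the standard library so it stays self-contained; the `cutoff` parameter is an assumption, and this is not the thesis's processing chain, which also involves wavelets and deconvolution.

```python
import cmath

def dft(x):
    """Direct discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def lowpass_denoise(x, cutoff):
    """Zero every frequency bin whose frequency index exceeds `cutoff`.

    For a real signal, bins k and n - k hold the same frequency, so both are
    judged by min(k, n - k); this keeps the spectrum symmetric and the output real.
    """
    spectrum = dft(x)
    n = len(spectrum)
    for k in range(n):
        if min(k, n - k) > cutoff:
            spectrum[k] = 0
    return [value.real for value in idft(spectrum)]
```

For real recordings an FFT routine would replace the direct transform, and a band-pass window around the frequencies of interest could replace the plain low-pass cutoff.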


From Computer Curriculum That Works For The Use Of Computer Intelligence Computer Science, Malachi B. Bacchus 2022 CUNY New York City College of Technology

Publications and Research

Computer interconnection can link different networks through electrical pathways that carry traffic across different connections; these are called data networks, and their traffic passes through different sectors of the network. This work simulates a computer network service using artificial intelligence to further the understanding of the computations involved. I have also used the network to better understand how ethical computing can be taught at universities and colleges, which can help establish knowledge and healthy computer information practices. The main tools of the research are data networking, ethical learning, and translation across different computer systems.


The Role Of Generative Adversarial Networks In Bioimage Analysis And Computational Diagnostics, Ahmed Naglah 2022 University of Louisville

Electronic Theses and Dissertations

Computational technologies can contribute to the modeling and simulation of biological environments and activities, toward better interpretation, analysis, and understanding. With the emergence of digital pathology, there is an increasing demand for more innovative, effective, and efficient computational models. Under the umbrella of artificial intelligence, deep learning mimics the brain's way of learning complex relationships from data and experience. In the field of bioimage analysis, models usually take discriminative approaches, such as classification and segmentation tasks. In this thesis, we study how generative AI models can be used to improve bioimage analysis tasks using Generative Adversarial Networks …


Identity Term Sampling For Measuring Gender Bias In Training Data, Nasim Sobhani, Sarah Jane Delany 2022 Technological University Dublin

Conference Papers

Predictions from machine learning models can reflect biases in the data on which they are trained. Gender bias has been identified in natural language processing systems such as those used for recruitment. Approaches to mitigating gender bias in training data typically need to isolate the effect of gender on the model's output. While gender can be isolated and identified for some types of training data, e.g. CVs in recruitment, most textual corpora have no obvious gender label. This paper proposes a general approach to measure …
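The abstract is cut off before the method is described, so the following only illustrates the general idea of using identity terms as a proxy gender label for text that has none: count gendered words and assign a label when one side clearly dominates. The term lists, the `margin` threshold, and the function names are hypothetical and are not the paper's identity term sampling procedure.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny identity-term lists for illustration only.
FEMALE_TERMS = {"she", "her", "hers", "woman", "women", "mother", "daughter"}
MALE_TERMS = {"he", "him", "his", "man", "men", "father", "son"}

def gender_term_counts(text):
    """Return (female_count, male_count) of identity terms in the text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    female = sum(counts[t] for t in FEMALE_TERMS)
    male = sum(counts[t] for t in MALE_TERMS)
    return female, male

def infer_gender_label(text, margin=1):
    """Return 'female', 'male', or None when identity terms do not
    favour one side by at least `margin` occurrences."""
    female, male = gender_term_counts(text)
    if female - male >= margin:
        return "female"
    if male - female >= margin:
        return "male"
    return None
```

Documents that receive a label could then be grouped by it, so that a model's behaviour can be compared across the two groups; documents labelled None are simply left out of the comparison.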


Performance Enhancement Of Hyperspectral Semantic Segmentation Leveraging Ensemble Networks, Nicholas Soucy 2022 University of Maine

Electronic Theses and Dissertations

Hyperspectral image (HSI) semantic segmentation is a growing field within computer vision, machine learning, and forestry. Because these communities work largely separately, research applying deep learning techniques to ground-type semantic segmentation needs improvement, as does the effort to bring the research and expectations of the three communities together. Semantic segmentation consists of classifying individual pixels within an image based on the features present. Many issues remain to be resolved in HSI semantic segmentation, including data preprocessing, feature reduction, semantic segmentation techniques, and adversarial training. In this thesis, we tackle these challenges by employing ensemble methods for HSI semantic segmentation. …
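One simple way to combine an ensemble's outputs for semantic segmentation is a per-pixel majority vote over the class maps produced by each member model. The abstract does not say how the thesis's ensemble fuses predictions, so the function below is an assumed, illustrative fusion rule operating on plain nested lists of class labels.

```python
from collections import Counter

def majority_vote(label_maps):
    """Fuse per-pixel class predictions from several models.

    label_maps: list of 2-D lists (rows x cols) of class labels, one map per
    ensemble member, all the same shape. Ties go to the label predicted by
    the earliest model in the list.
    """
    rows, cols = len(label_maps[0]), len(label_maps[0][0])
    fused = []
    for i in range(rows):
        fused_row = []
        for j in range(cols):
            votes = [m[i][j] for m in label_maps]
            counts = Counter(votes)
            top = max(counts.values())
            # First label, in model order, that reaches the top vote count.
            fused_row.append(next(v for v in votes if counts[v] == top))
        fused.append(fused_row)
    return fused
```

Weighted voting, or averaging per-class probabilities instead of hard labels, are common alternatives when member models differ in reliability.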


Natural Language Processing For Disaster Tweets, Akinyemi D. Apampa, Nan Li 2022 CUNY New York City College of Technology

Publications and Research

Our goal is to build an automatic model that identifies which tweets are about natural disasters based on their content. Our method is to construct a decision tree based on keyword searching. We will construct the model using 7,645 tweets and test it on 3,465 tweets to assess its performance.
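A keyword-based decision tree can be written as nested tests on the words of a tweet. The keyword sets and the second, "figurative use" split below are hypothetical placeholders, since the abstract does not list the keywords or the tree structure; in practice they would be chosen from the training tweets and checked on the held-out test tweets.

```python
import re

# Hypothetical keyword sets; the project's actual keywords are not given.
DISASTER_KEYWORDS = {"earthquake", "flood", "wildfire", "hurricane", "tornado", "evacuation"}
FIGURATIVE_CUES = {"lol", "movie", "song", "album"}

def is_disaster_tweet(text):
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Root split: does the tweet mention any disaster keyword?
    if not words & DISASTER_KEYWORDS:
        return False
    # Second split: a disaster word next to cues of figurative or entertainment use.
    if words & FIGURATIVE_CUES:
        return False
    return True

def accuracy(tweets, labels):
    """Fraction of tweets whose predicted label matches the true label."""
    correct = sum(is_disaster_tweet(t) == y for t, y in zip(tweets, labels))
    return correct / len(labels)
```

Hand-written splits like these are easy to inspect; a learned tree (for example, fitted on keyword-presence features) would choose the splits from the data instead.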


Extracellular Dnases Facilitate Antagonism And Coexistence In Bacterial Competitor-Sensing Interference Competition, Aoi Ogawa, Christophe Golé, Maria Bermudez, Odrine Habarugira, Gabrielle Joslin, Taylor McCain, Autumn Mineo, Jennifer Wise, Julie Xiong, Katherine Yan, Jan A.C. Vriezen 2022 Smith College

Biological Sciences: Faculty Publications

Over the last four decades, the rate of discovery of novel antibiotics has decreased drastically, ending the era of fortuitous antibiotic discovery. A better understanding of the biology of bacteriogenic toxins could help in prospecting for new antibiotics. To initiate this line of research, we quantified antagonists from two different sites at two different depths of soil and found the relative number of antagonists to correlate with the bacterial load and carbon-to-nitrogen (C/N) ratio of the soil. Subsequent studies show the importance of antagonist interactions between soil isolates and the lack of a predicted role for nutrient availability and, therefore, …


Sentiment Analysis In Application To Behavior Prediction, Anna Singley 2022 University of Portland

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Ischaemic Hepatitis (Ih): Modeling Outcome Based On Ih Patients' Attributes, Madison Utterback, Christiana Beard 2022 Illinois State University

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


A New Kind Of Data Science: The Need For Ethical Analytics, Jonathan Boardman 2022 Kennesaw State University

Published and Grey Literature from PhD Candidates

Ethics can no longer be regarded as an add-on in data science and analytics. This paper argues for the necessity of formalizing a new, practically oriented sub-discipline of AI ethics by outlining the needs, highlighting shortcomings in current approaches, and providing a framework for ethical analytics, which is concerned with the study of the ethical issues surrounding the development, deployment, and/or dissemination of ML/AI systems and data science research, as well as the development of tools and procedures to mitigate ethical harms. While data science and machine learning are primarily concerned with data from start to finish, ethical analytics is concerned …


Polarimetric Radar And Vhf Lightning Observations In A Significantly Tornadic Supercell, Jacob Bruss 2022 Purdue University

The Journal of Purdue Undergraduate Research

No abstract provided.


Supporting The Protect Initiative, Josh Lefton, Jackson Murray, Ahmed Thabet, Sriram Baireddy, Prakash Shukla, Mridul Gupta, Reagan Becker, Julie Ertle, Tony Doan, Aerin Yang 2022 Purdue University

Purdue Journal of Service-Learning and International Engagement

Recently, medication dosage errors have received more political and media attention. Dosage errors are the most common medical errors, affecting about 1.5 million people annually.

Furthermore, U.S. poison-control centers reported more than 200,000 medication-error cases per year. These cases result in medical costs of around $3.5 billion, and children under 6 years old constitute approximately 30% of these cases.

The PROTECT Initiative (Preventing Overdoses and Treatment Errors in Children Taskforce) was launched in 2008 as a collaborative effort between public health agencies and patient advocates to minimize dosage errors.

In alignment with the PROTECT Initiative effort, this project …


Design Of Secure Communication Schemes To Provide Authentication And Integrity Among The Iot Devices, Vidya Rao Dr. 2022 Manipal Institute of Technology

Technical Collection

The rapid growth of Internet-of-Things (IoT) applications has increased the number of end devices communicating over the Internet. These end devices have limited resources and low battery power. Such resource-constrained devices are exposed to various security and privacy threats when communicating over the public Internet. It is therefore essential to provide lightweight security solutions that safeguard data and user privacy. Elliptic Curve Cryptography (ECC) can be used both to generate digital signatures and to encrypt data. The method can be evaluated on a real-time testbed deployed using Raspberry Pi 3 devices, with every transmitted message subjected to ECC. …
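To show what generating a digital signature with ECC involves, here is a toy ECDSA (Elliptic Curve Digital Signature Algorithm) over the textbook curve y² = x³ + 2x + 2 mod 17, whose base point (5, 1) has prime order 19. The curve, private key, nonce, and integer digests are illustrative assumptions: the parameters are far too small to be secure, message hashing is omitted (digests are passed in as integers mod 19), and the scheme designed in the thesis may differ. It requires Python 3.8+ for `pow(x, -1, m)`.

```python
# Toy textbook curve y^2 = x^3 + 2x + 2 over GF(17); G = (5, 1) has prime order 19.
# These parameters are far too small to be secure and are for illustration only.
P_MOD, A = 17, 2
G = (5, 1)
N = 19  # order of G

def ec_add(p, q):
    """Add two curve points; None stands for the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) is the point at infinity
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD  # chord slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def ec_mul(k, p):
    """Scalar multiplication k*p by double-and-add."""
    result, addend = None, p
    while k:
        if k & 1:
            result = ec_add(result, addend)
        addend = ec_add(addend, addend)
        k >>= 1
    return result

def ecdsa_sign(h, d, k):
    """Sign digest h (integer mod N) with private key d and nonce k in 1..N-1."""
    r = ec_mul(k, G)[0] % N
    s = pow(k, -1, N) * (h + d * r) % N
    if r == 0 or s == 0:
        raise ValueError("choose a different nonce k")
    return r, s

def ecdsa_verify(h, q, sig):
    """Check signature sig = (r, s) on digest h against public key q = d*G."""
    r, s = sig
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    point = ec_add(ec_mul(h * w % N, G), ec_mul(r * w % N, q))
    return point is not None and point[0] % N == r
```

Real deployments use a standardized curve such as NIST P-256 through a vetted library, and must never reuse or leak the nonce k, since that reveals the private key.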


Getting Started Analyzing Data In Spss, Kristi Thompson 2022 Western University

Western Libraries Presentations

SPSS is a popular package for analyzing data. This session will discuss how to get started on a simple quantitative analysis project using SPSS. Topics covered will include getting summary statistics, creating and modifying variables, creating graphs, running simple analyses, and interpreting SPSS output.


Lstm-Sdm: An Integrated Framework Of Lstm Implementation For Sequential Data Modeling, Hum Nath Bhandari, Binod Rimal, Nawa Raj Pokhrel, Ramchandra Rimal, Keshab R. Dahal 2022 Roger Williams University

Arts & Sciences Faculty Publications

LSTM-SDM is a Python-based integrated computational framework built on top of TensorFlow/Keras and written in Jupyter notebooks. It provides several object-oriented functionalities for implementing single-layer and multilayer LSTM models for sequential data modeling and time series forecasting. Multiple subroutines are combined to create a user-friendly environment that facilitates data exploration and visualization, normalization and input preparation, hyperparameter tuning, performance evaluation, visualization of results, and statistical analysis. We used the LSTM-SDM framework to predict the stock market index and observed impressive results. The framework can be generalized to solve several other real-world time series problems.
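The normalization and input-preparation step for LSTM forecasting usually means scaling the series and cutting it into fixed-length windows, each paired with the value to predict. The helpers below are a generic sketch of that step, not LSTM-SDM's own API; `lookback` and `horizon` are assumed parameter names.

```python
def min_max_scale(series):
    """Scale values to [0, 1]; also return (lo, hi) so predictions can be un-scaled."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series], lo, hi  # constant series: avoid division by zero
    return [(v - lo) / (hi - lo) for v in series], lo, hi

def make_windows(series, lookback, horizon=1):
    """Split a 1-D series into (input window, target) pairs.

    Each input holds `lookback` consecutive values; the target is the value
    `horizon` steps after the end of the window.
    """
    inputs, targets = [], []
    for start in range(len(series) - lookback - horizon + 1):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback + horizon - 1])
    return inputs, targets
```

For a Keras `LSTM` layer, the list of windows would then be reshaped into a 3-D array of shape (samples, timesteps, features), here (len(inputs), lookback, 1), and scaled predictions mapped back with `v * (hi - lo) + lo`.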


Recall Distortion In Neural Network Pruning And The Undecayed Pruning Algorithm, Aidan Good, Jiaqi Lin, Hannah Sieg, Mikey Ferguson, Xin Yu, Shandian Zhe, Jerzy Wieczorek, Thiago Serra 2022 Bucknell University

Faculty Conference Papers and Presentations

Pruning techniques have been successfully used in neural networks to trade accuracy for sparsity. However, the impact of network pruning is not uniform: prior work has shown that the recall for underrepresented classes in a dataset may be more negatively affected. In this work, we study such relative distortions in recall by hypothesizing an intensification effect that is inherent to the model: namely, that pruning makes recall relatively worse for a class with recall below accuracy and, conversely, relatively better for a class with recall above accuracy. In addition, we propose a new pruning algorithm aimed …
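The hypothesis compares each class's recall with the model's overall accuracy, so the quantity to track before and after pruning is the per-class gap, recall minus accuracy. The sketch below computes it from plain label lists; comparing the gaps of a dense and a pruned model would show whether negative gaps become more negative, as the intensification effect predicts. The function names are illustrative, and this does not implement the paper's proposed pruning algorithm.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall for each class: correct predictions of the class / true members of the class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}

def accuracy(y_true, y_pred):
    """Overall fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall_gaps(y_true, y_pred):
    """Recall minus overall accuracy per class; negative values mark classes
    whose recall is below accuracy."""
    acc = accuracy(y_true, y_pred)
    return {c: r - acc for c, r in per_class_recall(y_true, y_pred).items()}
```

Because recall is normalized per class, a minority class can have a large negative gap even when overall accuracy barely changes, which is why the gap is a more sensitive signal than accuracy alone.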


'Flux+Mutability': A Conditional Generative Approach To One-Class Classification And Anomaly Detection, Cristiano Fanelli, James Giroux, Z. Papandreou 2022 William & Mary

Arts & Sciences Articles

Anomaly detection is becoming increasingly popular within the experimental physics community. At experiments such as the Large Hadron Collider, anomaly detection is of growing interest for finding new physics beyond the Standard Model. This paper details the implementation of a novel machine learning architecture, called Flux+Mutability, which combines cutting-edge conditional generative models with clustering algorithms. In the 'flux' stage, we learn the distribution of a reference class. The 'mutability' stage, at inference, assesses whether data deviate significantly from the reference class. We demonstrate the validity of our approach and its connection to multiple problems spanning from one-class classification to anomaly …


Unobtrusive Assessment Of Upper-Limb Motor Impairment Using Wearable Inertial Sensors, Brandon R. Oubre 2022 University of Massachusetts Amherst

Doctoral Dissertations

Many neurological diseases cause motor impairments that limit autonomy and reduce health-related quality of life. Upper-limb motor impairments, in particular, significantly hamper the performance of essential activities of daily living, such as eating, bathing, and changing clothing. Assessment of impairment is necessary for tracking disease progression, measuring the efficacy of interventions, and informing clinical decision making. Impairment is currently assessed by trained clinicians using semi-quantitative rating scales that are limited by their reliance on subjective, visual assessments. Furthermore, existing scales are often burdensome to administer and do not capture patients' motor performance in home and community settings, resulting in a …


Digital Commons powered by bepress