Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Central Washington University

Articles 1 - 20 of 20

Full-Text Articles in Data Science

Developing Research Data Management Services In A Regional Comprehensive University: The Case Of Central Washington University, Ping Fu, Maurice Blackson, Maura Valentino Aug 2022

Library Scholarship

This study aims to analyze the needs of researchers in a regional comprehensive university for research data management services; discuss the options for developing a research data management program at the university; and then propose a phased three-year implementation plan for the university libraries. The method was to design a survey to collect information from researchers and assess and evaluate their needs for research data management services. The results show that researchers’ needs in a regional comprehensive university could be quite different from those of researchers in a research-intensive university. Also, the results verify the hypothesis that researchers in the …


Wilderness And The Geotag: Exploring The Claim That "Geotagging Ruins Nature" In The Alpine Lakes Wilderness, WA, Mara Gans Jan 2022

All Master's Theses

This research explores the claim that “geotagging ruins nature” by quantifying and qualifying patterns in geotag use and visitors’ experiences in the Alpine Lakes Wilderness, in Washington, United States. Many have raised concerns that geotags increase recreational visitation to public lands, which subsequently contributes to negative resource impacts. Others, however, claim that geotagging has made the outdoors more accessible to less privileged communities and raise concerns that condemning geotags will perpetuate the exclusion of certain groups from outdoor recreation. This debate is studied within federally designated Wilderness, which is legally defined as “untrammeled by man,” a definition rooted in problematic …


Interpretable Machine Learning For Self-Service High-Risk Decision Making, Charles Recaido Jan 2022

All Master's Theses

This research contributes to interpretable machine learning via visual knowledge discovery in General Line Coordinates (GLC). The concepts of hyperblocks as interpretable dataset units and GLC are combined to create a visual self-service machine learning model. Two variants of GLC, known as Dynamic Scaffold Coordinates (DSC), are proposed. DSC1 and DSC2 can losslessly map multiple dataset attributes onto a single two-dimensional (X, Y) Cartesian plane using a dynamic scaffolding graph construction algorithm.

Hyperblock analysis is used to determine visually appealing dataset attribute orders and to reduce line occlusion. It is shown that hyperblocks can generalize decision tree …
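Hyperblocks are commonly described as axis-aligned n-D boxes: one closed [min, max] interval per attribute, so membership reads directly as an if-then rule. The minimal sketch below assumes that definition and builds each block as the bounding box of one class's samples. It does not reproduce the thesis's hyperblock construction, attribute ordering, or DSC drawing; the names `Hyperblock`, `envelope`, and `classify` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hyperblock:
    """Axis-aligned n-D box: one closed interval [lows[i], highs[i]] per attribute."""
    lows: list[float]
    highs: list[float]
    label: str

    @classmethod
    def envelope(cls, points: Sequence[Sequence[float]], label: str) -> Hyperblock:
        # Smallest box containing every given point (per-attribute min and max).
        dims = range(len(points[0]))
        lows = [min(p[i] for p in points) for i in dims]
        highs = [max(p[i] for p in points) for i in dims]
        return cls(lows, highs, label)

    def contains(self, x: Sequence[float]) -> bool:
        # Inside only if every attribute lies within its interval, i.e. the rule
        # "if lows[0] <= x0 <= highs[0] and ... and lows[n-1] <= x(n-1) <= highs[n-1]".
        return all(lo <= v <= hi for lo, v, hi in zip(self.lows, x, self.highs))


def classify(blocks: Sequence[Hyperblock], x: Sequence[float]) -> str | None:
    # Label of the first block containing x, or None if x falls outside all blocks.
    for block in blocks:
        if block.contains(x):
            return block.label
    return None


blocks = [Hyperblock.envelope([[0.1, 0.2], [0.3, 0.4]], "A"),
          Hyperblock.envelope([[0.7, 0.6], [0.9, 0.8]], "B")]
print(classify(blocks, [0.2, 0.3]))  # A
print(classify(blocks, [0.5, 0.5]))  # None (the point lies in the gap between blocks)
```

Returning `None` for uncovered points keeps the model honest about regions the blocks do not describe, which is part of what makes box-shaped units easy to inspect.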


Concept Drift Adaptation With Incremental–Decremental SVM, Honorius Gâlmeanu, Răzvan Andonie Oct 2021

Computer Science Faculty Scholarship

Data classification in streams where the underlying distribution changes over time is known to be difficult. This problem—known as concept drift detection—involves two aspects: (i) detecting the concept drift and (ii) adapting the classifier. Online training only considers the most recent samples; they form the so-called shifting window. Dynamic adaptation to concept drift is performed by varying the width of the window. Defining an online Support Vector Machine (SVM) classifier able to cope with concept drift by dynamically changing the window size and avoiding retraining from scratch is currently an open problem. We introduce the Adaptive Incremental–Decremental SVM (AIDSVM), a …
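The shifting-window idea in this abstract (train on the most recent samples and vary the window width to adapt to drift) can be illustrated with a deliberately simple stream learner. This is a hypothetical sketch, not AIDSVM: a nearest-centroid model stands in for the SVM, and a plain error-rate rule (the parameters `probe`, `threshold`, and `min_width` are assumptions of this sketch) decides when to cut the window back. Unlike AIDSVM, it recomputes the model from the window rather than updating it incrementally and decrementally.

```python
from __future__ import annotations

from collections import deque
from typing import Hashable, Optional, Sequence


class WindowedCentroidClassifier:
    """Hypothetical stream classifier trained on a shifting window of adaptive width.

    A nearest-centroid model stands in for the SVM; drift is flagged when the error
    rate over the last `probe` predictions exceeds `threshold`. Not AIDSVM.
    """

    def __init__(self, max_width: int = 200, min_width: int = 10,
                 probe: int = 20, threshold: float = 0.5) -> None:
        self.window: deque = deque(maxlen=max_width)  # most recent (x, y) samples
        self.errors: deque = deque(maxlen=probe)      # 1 per recent mistake, else 0
        self.min_width = min_width
        self.threshold = threshold

    def _centroids(self) -> dict:
        # Per-class mean of the samples currently in the window.
        sums: dict = {}
        counts: dict = {}
        for x, y in self.window:
            s = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                s[i] += v
            counts[y] = counts.get(y, 0) + 1
        return {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(self, x: Sequence[float]) -> Optional[Hashable]:
        cents = self._centroids()
        if not cents:
            return None
        # Nearest centroid by squared Euclidean distance.
        return min(cents, key=lambda y: sum((a - c) ** 2 for a, c in zip(x, cents[y])))

    def learn_one(self, x: Sequence[float], y: Hashable) -> Optional[Hashable]:
        # Test-then-train: predict first, then record whether the prediction was wrong.
        pred = self.predict(x)
        self.errors.append(int(pred is not None and pred != y))
        full = len(self.errors) == self.errors.maxlen
        if full and sum(self.errors) / len(self.errors) > self.threshold:
            # Suspected drift: shrink the window to its newest samples, forget old errors.
            recent = list(self.window)[-self.min_width:]
            self.window.clear()
            self.window.extend(recent)
            self.errors.clear()
        # Otherwise the window keeps growing until it reaches max_width.
        self.window.append((list(x), y))
        return pred


model = WindowedCentroidClassifier()
for _ in range(50):
    model.learn_one([0.0], "a")
    model.learn_one([10.0], "b")
# Concept drift: the label attached to each region flips.
for _ in range(20):
    model.learn_one([0.0], "b")
    model.learn_one([10.0], "a")
print(model.predict([0.0]), len(model.window))  # b 40
```

In the usage above, the window grows to 100 samples while the stream is stable, is cut back to its newest 10 samples once more than half of the last 20 predictions fail, and then grows again from post-drift data only.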


Integrating Common Data Analytics Tools Into Non-Technical Undergraduate Curricula, Kurt Kirstein Apr 2021

All Faculty Scholarship for the College of Education and Professional Studies

Aside from statistics courses, accessible data analytics skills are often excluded from traditional non-technical university programs. These are topics that are typically the domain of programs that focus on math, statistics and computer science. Yet the need for these skills in non-technical disciplines is changing. A rapid expansion of data-related processes in organizations of many types requires individuals who have at least a working knowledge of common analytic tools. This article briefly describes three categories of data analytics tools that can be useful for graduates in any discipline. The first category covers descriptive tools that allow students to learn what …


Full Interpretable Machine Learning Method With In-Line Coordinates, Hoang Phan Jan 2021

All Master's Theses

This thesis explores a new approach to the machine learning classification task in 2-dimensional space (2-D ML) with In-line Coordinates. This is a full machine learning approach that does not require dealing with n-dimensional data in n-dimensional space. The In-line Coordinates method allows discovering n-D patterns in 2-D space without loss of n-D information, using a graph representation of n-D data in 2-D. Specifically, this thesis shows that this can be done with In-line Based Coordinates in several defined modifications, including static and dynamic ones. Some classification and regression algorithms based on these In-line Coordinates were explored. Two successful cases …


Interactive Visual Self-Service Data Classification Approach To Democratize Machine Learning, Sridevi Narayana Wagle Jan 2021

All Master's Theses

Machine learning algorithms often produce models that both end users and developers regard as complex black boxes. Such algorithms fail to explain the model in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a model and classify data with more confidence, without compromising accuracy. Such a technique is especially helpful when dealing with sensitive and crucial data, like cancer data in the medical domain, where errors are costly. With the help of the proposed interactive and …


An Analytical Examination On The Effects Of Vegetarian And Omnivorous Diets On C-Reactive Protein, Aletha Kleis Jan 2020

Undergraduate Honors Theses

There is a lack of research, based on analysis of data gathered from the National Health and Nutrition Examination Survey (NHANES), on how following a vegetarian or omnivorous diet affects people's C-Reactive Protein (CRP) levels. The level of CRP reflects how much inflammation there is in one's body and is a popular indicator of risk for heart disease. Thus, in this research, I use the NHANES data to examine the relationship between CRP levels and whether people identified themselves as vegetarian, while also considering the general healthiness of each …


Image Features For Tuberculosis Classification In Digital Chest Radiographs, Brian Hooper Jan 2020

All Master's Theses

Tuberculosis (TB) is a respiratory disease that affects millions of people each year, ranks as the tenth leading cause of death worldwide, and is especially prevalent in underdeveloped regions where access to adequate medical care may be limited. Analysis of digital chest radiographs (CXRs) is a common and inexpensive method for diagnosing TB; however, interpreting the results requires a trained radiologist and is subject to human error. Computer-Aided Detection (CAD) systems are a promising machine-learning-based solution for automating the diagnosis of TB from CXR images. As the dimensionality of a high-resolution CXR image is very …


Automated Morgan Keenan Classification Of Observed Stellar Spectra Collected By The Sloan Digital Sky Survey Using A Single Classifier, Michael J. Brice, Răzvan Andonie Oct 2019

All Faculty Scholarship for the College of the Sciences

The classification of stellar spectra is a fundamental task in stellar astrophysics. Standard classification methods, k-nearest neighbors and random forest, are applied to stellar spectra from the Sloan Digital Sky Survey to classify the spectra automatically. Stellar spectra are high-dimensional data, and because classifiers work in a low-dimensional space, the dimensionality is reduced using astronomical knowledge. These methods are used to classify the stellar spectra into a complete Morgan Keenan classification (spectral and luminosity) using a single classifier. The motion of stars (radial velocity) causes machine-learning complications through the feature matrix when classifying stellar spectra. Due to the nature …
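The abstract names k-nearest neighbors as one of the standard classifiers applied after dimensionality reduction. The generic from-scratch sketch below shows only the k-NN voting step on toy 2-D feature vectors; the spectral feature extraction, radial-velocity handling, and random forest are not shown, and the labels "K" and "M" are placeholder classes on made-up data, not results from the paper.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Classify `query` by majority vote of its k nearest training vectors."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Euclidean distance from the query to every training vector, nearest first.
    ranked = sorted((math.dist(x, query), i) for i, x in enumerate(train_x))
    nearest = [train_y[i] for _, i in ranked[:k]]
    # On a count tie, Counter.most_common keeps first-encountered order,
    # so the label appearing first among the nearest neighbours wins.
    return Counter(nearest).most_common(1)[0][0]


X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
Y = ["K", "K", "K", "M", "M", "M"]  # placeholder class labels on toy data
print(knn_predict(X, Y, [0.5, 0.5]))  # K
```

Sorting every distance is fine for small samples; for large spectral catalogues a spatial index such as a k-d tree would usually replace the full sort.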


Constructing Interactive Visual Classification, Clustering And Dimension Reduction Models For N-D Data, Boris Kovalerchuk, Dmytro Dovhalets Jul 2017

Computer Science Faculty Scholarship

The exploration of multidimensional datasets of all possible sizes and dimensions is a long-standing challenge in knowledge discovery, machine learning, and visualization. While multiple efficient visualization methods for n-D data analysis exist, the loss of information, occlusion, and clutter continue to be a challenge. This paper proposes and explores a new interactive method for visual discovery of n-D relations for supervised learning. The method includes automatic, interactive, and combined algorithms for discovering linear relations, dimension reduction, and generalization for non-linear relations. This method is a special category of reversible General Line Coordinates (GLC). It produces graphs in 2-D that represent …


Using Data Analytics To Further Understand The Role That Boredom, Loneliness, Social Anxiety, Social Gratification, And Social Relationships (BRAG) Play In A Driver’s Decision To Text, Nathan White, Yair Levy, Steven R. Terrell, Steve Bronsburg Jan 2016

All Faculty Scholarship for the College of Education and Professional Studies

Texting while driving is a growing problem that current efforts have failed to curtail. This behavior has serious, and sometimes fatal, consequences, and the factors that cause a driver to text are not well understood. This study investigates the influence that boredom, social relationships, social anxiety, and social gratification (BRAG) have upon the texting driver. A survey instrument was used to collect data from 297 respondents at a mid-sized regional university in the Pacific Northwest of the United States. The data was evaluated with PLS-SEM, which indicated that social gratification plays a very significant role in a driver’s decision to …


Visualization Of Multidimensional Data With Collocated Paired Coordinates And General Line Coordinates, Boris Kovalerchuk Feb 2014

All Faculty Scholarship for the College of the Sciences

Multidimensional data are often visualized by splitting n-D data into a set of low-dimensional data. While useful, this destroys the integrity of n-D data and leads to a shallow understanding of complex n-D data. To mitigate this, a difficult perceptual task must be solved: assembling the low-dimensional visualized pieces back into whole n-D vectors. Another way is lossy dimension reduction by mapping n-D vectors to 2-D vectors (e.g., Principal Component Analysis). Such 2-D vectors carry only part of the information in the n-D vectors, with no way to restore the n-D vectors exactly from them. An alternative way for …
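As Collocated Paired Coordinates are commonly described, consecutive attribute pairs of an n-D vector, (x1, x2), (x3, x4), ..., are plotted as points in one shared 2-D plane and joined in order into a polyline, so the whole vector stays visible and can be restored exactly. The minimal sketch below assumes that reading; the rule of padding odd-length vectors by repeating the last value is an assumption of this sketch, and `to_cpc` and `from_cpc` are hypothetical names.

```python
from __future__ import annotations

from typing import Sequence


def to_cpc(x: Sequence[float]) -> list[tuple[float, float]]:
    """Map an n-D vector to an ordered 2-D polyline (Collocated Paired Coordinates sketch).

    Consecutive attribute pairs become points in one shared (X, Y) plane; drawing
    them in order, joined by segments, displays the whole vector at once.
    """
    vals = list(x)
    if len(vals) % 2:
        vals.append(vals[-1])  # assumption of this sketch: pad odd n with the last value
    return [(vals[i], vals[i + 1]) for i in range(0, len(vals), 2)]


def from_cpc(points: Sequence[tuple[float, float]], n: int) -> list[float]:
    # Lossless inverse: flatten the points in order and drop any padding beyond n.
    flat = [v for p in points for v in p]
    return flat[:n]


print(to_cpc([0.2, 0.4, 0.1, 0.6]))  # [(0.2, 0.4), (0.1, 0.6)]
```

Because `from_cpc` recovers the original vector exactly, this mapping contrasts with the lossy PCA projection the abstract mentions: an n-D vector becomes n/2 points rather than a single 2-D point.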


Rigidity Analysis Of Protein Biological Assemblies And Periodic Crystal Structures, Filip Jagodzinski, Pamela Clark, Jessica Grant, Tiffany Liu, Samantha Monastra, Ileana Streinu Nov 2013

All Faculty Scholarship for the College of the Sciences

Background

We initiate in silico rigidity-theoretical studies of biological assemblies and small crystals for protein structures. The goal is to determine if, and how, the interactions among neighboring cells and subchains affect the flexibility of a molecule in its crystallized state. We use experimental X-ray crystallography data from the Protein Data Bank (PDB). The analysis relies on an efficient graph-based algorithm. Computational experiments were performed using new protein rigidity analysis tools available in the new release of our KINARI-Web server http://kinari.cs.umass.edu.

Results

We provide two types of results: on biological assemblies and on crystals. We found that when only isolated …


A Conservation And Rigidity Based Method For Detecting Critical Protein Residues, Bahar Akbal-Delibas, Filip Jagodzinski, Nurit Haspel Oct 2013

All Faculty Scholarship for the College of the Sciences

Background

Certain amino acids in proteins play a critical role in determining their structural stability and function. Examples include flexible regions such as hinges, which allow domain motion, and highly conserved residues on functional interfaces, which allow interactions with other proteins. Detecting these regions can aid in the analysis and simulation of protein rigidity and conformational changes, and can help characterize protein binding and docking. We present an analysis of critical residues in proteins using a combination of two complementary techniques. One method performs in-silico mutations and analyzes the protein's rigidity to infer the role of a point substitution to Glycine …


Mapsnap System To Perform Vector-To-Raster Fusion, Boris Kovalerchuk, Peter Doucette, Gamal Seedahmed, Jerry Tagestad, Sergei Kovalerchuk, Brian Graff May 2011

All Faculty Scholarship for the College of the Sciences

As the availability of geospatial data increases, there is a growing need to match these datasets together. However, since these datasets often vary in their origins and spatial accuracy, they frequently do not correspond well to each other, which creates multiple problems. To align vectors accurately with imagery, analysts currently either: 1) manually move the vectors, 2) perform a labor-intensive spatial registration of vectors to imagery, 3) move imagery to vectors, or 4) redigitize the vectors from scratch and transfer the attributes. All of these are time-consuming and labor-intensive operations. Automated matching and fusion of vector datasets has been a subject …


Extreme Data Mining: Inference From Small Datasets, Răzvan Andonie Sep 2010

All Faculty Scholarship for the College of the Sciences

Neural networks have been applied successfully in many fields. However, satisfactory results are obtained only under large-sample conditions. With small training sets, performance may be poor, or the learning task may not be accomplished at all. This deficiency severely limits the applications of neural networks. The main reason small datasets cannot provide enough information is that there are gaps between samples; even the domain of the samples cannot be established with certainty. Several computational intelligence techniques have been proposed to overcome the limits of learning from small datasets.

We have the following goals: i. To …


Relational Methodology For Data Mining And Knowledge Discovery, Evgenii Vityaev, Boris Kovalerchuk Apr 2008

All Faculty Scholarship for the College of the Sciences

Knowledge discovery and data mining methods have been successful in many domains. However, their ability to build or discover a domain theory remains unclear. This is largely because many fundamental methodological questions of KDD&DM remain unexplored, such as (1) the nature of the information contained in input data relative to the domain theory, and (2) the nature of the knowledge that these methods discover. The goal of this paper is to clarify methodological questions of KDD&DM methods. This is done by using the concept of Relational Data Mining (RDM), representative measurement theory, an ontology of a …


Symbolic Methodology For Numeric Data Mining, Boris Kovalerchuk, Evgenii Vityaev Apr 2008

All Faculty Scholarship for the College of the Sciences

Currently, statistical and artificial neural network methods dominate data mining applications. Alternative relational (symbolic) data mining methods have shown their effectiveness in robotics, drug design, and other areas. Neural networks and decision tree methods have serious limitations in capturing relations that may take a variety of forms. Learning systems based on symbolic first-order logic (FOL) representations capture relations naturally. The learned regularities are directly understandable in domain terms, which helps build a domain theory. This paper describes relational data mining methodology and develops it further for numeric data such as financial and spatial data. This includes (1) comparing …


Detecting Patterns Of Fraudulent Behavior In Forensic Accounting, Boris Kovalerchuk, Evgenii Vityaev Sep 2003

All Faculty Scholarship for the College of the Sciences

Often evidence from a single case does not reveal any suspicious patterns to aid investigations in forensic accounting and other forensic fields. In contrast, correlation of sets of evidence from several cases with suitable background knowledge may reveal suspicious patterns. Link Discovery (LD) has recently emerged as a promising new area for such tasks. Currently LD mostly relies on deterministic graphical techniques. Other relevant techniques are Bayesian probabilistic and causal networks. These techniques need further development to handle rare events. This paper combines first-order logic (FOL) and probabilistic semantic inference (PSI) to address this challenge. Previous research has shown this …