Full-Text Articles in Data Science
Developing Research Data Management Services In A Regional Comprehensive University: The Case Of Central Washington University, Ping Fu, Maurice Blackson, Maura Valentino
Library Scholarship
This study aims to analyze the needs of researchers in a regional comprehensive university for research data management services; discuss the options for developing a research data management program at the university; and then propose a phased three-year implementation plan for the university libraries. The method was to design a survey to collect information from researchers and assess and evaluate their needs for research data management services. The results show that researchers’ needs in a regional comprehensive university could be quite different from those of researchers in a research-intensive university. Also, the results verify the hypothesis that researchers in the …
Wilderness And The Geotag: Exploring The Claim That "Geotagging Ruins Nature" In The Alpine Lakes Wilderness, Wa, Mara Gans
All Master's Theses
This research explores the claim that “geotagging ruins nature” by quantifying and qualifying patterns in geotag use and visitors’ experiences in the Alpine Lakes Wilderness, in Washington, United States. Many have raised concerns that geotags increase recreational visitation to public lands, which subsequently contributes to negative resource impacts. Others, however, claim that geotagging has made the outdoors more accessible to less privileged communities and raise concerns that condemning geotags will perpetuate the exclusion of certain groups from outdoor recreation. This debate is studied within federally designated Wilderness, which is legally defined as “untrammeled by man,” a definition rooted in problematic …
Interpretable Machine Learning For Self-Service High-Risk Decision Making, Charles Recaido
All Master's Theses
This research contributes to interpretable machine learning via visual knowledge discovery in General Line Coordinates (GLC). The concepts of hyperblocks as interpretable dataset units and GLC are combined to create a visual self-service machine learning model. Two variants of GLC known as Dynamic Scaffold Coordinates (DSC) are proposed. DSC1 and DSC2 can map in a lossless manner multiple dataset attributes to a single two-dimensional (X, Y) Cartesian plane using a dynamic scaffolding graph construction algorithm.
Hyperblock analysis is used to determine visually appealing dataset attribute orders and to reduce line occlusion. It is shown that hyperblocks can generalize decision tree …
Concept Drift Adaptation With Incremental–Decremental Svm, Honorius Gâlmeanu, Răzvan Andonie
Computer Science Faculty Scholarship
Data classification in streams where the underlying distribution changes over time is known to be difficult. This problem—known as concept drift detection—involves two aspects: (i) detecting the concept drift and (ii) adapting the classifier. Online training only considers the most recent samples; they form the so-called shifting window. Dynamic adaptation to concept drift is performed by varying the width of the window. Defining an online Support Vector Machine (SVM) classifier able to cope with concept drift by dynamically changing the window size and avoiding retraining from scratch is currently an open problem. We introduce the Adaptive Incremental–Decremental SVM (AIDSVM), a …
Integrating Common Data Analytics Tools Into Non-Technical Undergraduate Curricula, Kurt Kirstein
All Faculty Scholarship for the College of Education and Professional Studies
Aside from statistics courses, accessible data analytics skills are often excluded from traditional non-technical university programs. These are topics that are typically the domain of programs that focus on math, statistics and computer science. Yet the need for these skills in non-technical disciplines is changing. A rapid expansion of data-related processes in organizations of many types requires individuals who have at least a working knowledge of common analytic tools. This article briefly describes three categories of data analytics tools that can be useful for graduates in any discipline. The first category covers descriptive tools that allow students to learn what …
Full Interpretable Machine Learning Method With In-Line Coordinates, Hoang Phan
All Master's Theses
This thesis explores a new approach to the machine learning classification task in 2-dimensional space (2-D ML) with In-line Coordinates. This is a full machine learning approach that does not require dealing with n-dimensional data in n-dimensional space. The In-line Coordinates method allows discovering n-D patterns in 2-D space without loss of n-D information by using a graph representation of n-D data in 2-D. Specifically, this thesis shows that this can be done with In-line Based Coordinates in different modifications, which are defined, including static and dynamic ones. Some classification and regression algorithms based on these In-line Coordinates were explored. Two successful cases …
Interactive Visual Self-Service Data Classification Approach To Democratize Machine Learning, Sridevi Narayana Wagle
All Master's Theses
Machine learning algorithms often produce models considered as complex black-box models by both end users and developers. Such algorithms fail to explain the model in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a model and classify data with more confidence and without having to compromise on accuracy. Such a technique is especially helpful when dealing with sensitive and crucial data, like cancer data in the medical domain, with a high cost of errors. With the help of the proposed interactive and …
An Analytical Examination On The Effects Of Vegetarian And Omnivorous Diets On C-Reactive Protein, Aletha Kleis
Undergraduate Honors Theses
There is a lack of research regarding how following a vegetarian or omnivorous diet affects the C-Reactive Protein (CRP) levels of people, as seen through results from an analysis of data gathered from the National Health and Nutrition Examination Survey (NHANES). The level of CRP is a reflection of how much inflammation there is in one’s body and is a popular indicator of risk for heart disease. Thus, in this research, I use the NHANES data to look at the relationship of CRP levels of people who identified themselves as vegetarian or not, while also considering the general healthiness of each …
Image Features For Tuberculosis Classification In Digital Chest Radiographs, Brian Hooper
All Master's Theses
Tuberculosis (TB) is a respiratory disease which affects millions of people each year, ranking as the tenth leading cause of death worldwide, and is especially prevalent in underdeveloped regions where access to adequate medical care may be limited. Analysis of digital chest radiographs (CXRs) is a common and inexpensive method for the diagnosis of TB; however, a trained radiologist is required to interpret the results, and the interpretation is subject to human error. Computer-Aided Detection (CAD) systems are a promising machine-learning based solution to automate the diagnosis of TB from CXR images. As the dimensionality of a high-resolution CXR image is very …
Automated Morgan Keenan Classification Of Observed Stellar Spectra Collected By The Sloan Digital Sky Survey Using A Single Classifier, Michael J. Brice, Răzvan Andonie
All Faculty Scholarship for the College of the Sciences
The classification of stellar spectra is a fundamental task in stellar astrophysics. Stellar spectra from the Sloan Digital Sky Survey are applied to standard classification methods, k-nearest neighbors and random forest, to automatically classify the spectra. Stellar spectra are high dimensional data and the dimensionality is reduced using astronomical knowledge because classifiers work in low dimensional space. These methods are utilized to classify the stellar spectra into a complete Morgan Keenan classification (spectral and luminosity) using a single classifier. The motion of stars (radial velocity) causes machine-learning complications through the feature matrix when classifying stellar spectra. Due to the nature …
Constructing Interactive Visual Classification, Clustering And Dimension Reduction Models For N-D Data, Boris Kovalerchuk, Dmytro Dovhalets
Computer Science Faculty Scholarship
The exploration of multidimensional datasets of all possible sizes and dimensions is a long-standing challenge in knowledge discovery, machine learning, and visualization. While multiple efficient visualization methods for n-D data analysis exist, the loss of information, occlusion, and clutter continue to be a challenge. This paper proposes and explores a new interactive method for visual discovery of n-D relations for supervised learning. The method includes automatic, interactive, and combined algorithms for discovering linear relations, dimension reduction, and generalization for non-linear relations. This method is a special category of reversible General Line Coordinates (GLC). It produces graphs in 2-D that represent …
Using Data Analytics To Further Understand The Role That Boredom, Loneliness, Social Anxiety, Social Gratification, And Social Relationships (Brag) Play In A Driver’S Decision To Text, Nathan White, Yair Levy, Steven R. Terrell, Steve Bronsburg
All Faculty Scholarship for the College of Education and Professional Studies
Texting while driving is a growing problem that current efforts have failed to curtail. This behavior has serious, and sometimes fatal, consequences, and the factors that cause a driver to text are not well understood. This study investigates the influence that boredom, social relationships, social anxiety, and social gratification (BRAG) have upon the texting driver. A survey instrument was used to collect data from 297 respondents at a mid-sized regional university in the Pacific Northwest of the United States. The data was evaluated with PLS-SEM, which indicated that social gratification plays a very significant role in a driver’s decision to …
Visualization Of Multidimensional Data With Collocated Paired Coordinates And General Line Coordinates, Boris Kovalerchuk
All Faculty Scholarship for the College of the Sciences
Often multidimensional data are visualized by splitting n-D data into a set of low-dimensional data. While this is useful, it destroys the integrity of n-D data and leads to a shallow understanding of complex n-D data. To mitigate this challenge, a difficult perceptual task of assembling low-dimensional visualized pieces into the whole n-D vectors must be solved. Another way is a lossy dimension reduction by mapping n-D vectors to 2-D vectors (e.g., Principal Component Analysis). Such 2-D vectors carry only part of the information from n-D vectors, without a way to restore the n-D vectors exactly from them. An alternative way for …
Rigidity Analysis Of Protein Biological Assemblies And Periodic Crystal Structures, Filip Jagodzinski, Pamela Clark, Jessica Grant, Tiffany Liu, Samantha Monastra, Ileana Streinu
All Faculty Scholarship for the College of the Sciences
Background
We initiate in silico rigidity-theoretical studies of biological assemblies and small crystals for protein structures. The goal is to determine if, and how, the interactions among neighboring cells and subchains affect the flexibility of a molecule in its crystallized state. We use experimental X-ray crystallography data from the Protein Data Bank (PDB). The analysis relies on an efficient graph-based algorithm. Computational experiments were performed using new protein rigidity analysis tools available in the new release of our KINARI-Web server http://kinari.cs.umass.edu.
Results
We provide two types of results: on biological assemblies and on crystals. We found that when only isolated …
A Conservation And Rigidity Based Method For Detecting Critical Protein Residues, Bahar Akbal-Delibas, Filip Jagodzinski, Nurit Haspel
All Faculty Scholarship for the College of the Sciences
Background
Certain amino acids in proteins play a critical role in determining their structural stability and function. Examples include flexible regions such as hinges, which allow domain motion, and highly conserved residues on functional interfaces, which allow interactions with other proteins. Detecting these regions can aid in the analysis and simulation of protein rigidity and conformational changes, and helps characterize protein binding and docking. We present an analysis of critical residues in proteins using a combination of two complementary techniques. One method performs in-silico mutations and analyzes the protein's rigidity to infer the role of a point substitution to Glycine …
Mapsnap System To Perform Vector-To-Raster Fusion, Boris Kovalerchuk, Peter Doucette, Gamal Seedahmed, Jerry Tagestad, Sergei Kovalerchuk, Brian Graff
All Faculty Scholarship for the College of the Sciences
As the availability of geospatial data increases, there is a growing need to match these datasets together. However, since these datasets often vary in their origins and spatial accuracy, they frequently do not correspond well to each other, which creates multiple problems. To accurately align vectors with imagery, analysts currently either: 1) manually move the vectors, 2) perform a labor-intensive spatial registration of vectors to imagery, 3) move imagery to vectors, or 4) redigitize the vectors from scratch and transfer the attributes. All of these are time-consuming and labor-intensive operations. Automated matching and fusing of vector datasets has been a subject …
Extreme Data Mining: Inference From Small Datasets, Răzvan Andonie
All Faculty Scholarship for the College of the Sciences
Neural networks have been applied successfully in many fields. However, satisfactory results can only be obtained under large-sample conditions. With small training sets, performance may be poor, or the learning task may not be accomplished at all. This deficiency severely limits the applications of neural networks. The main reason why small datasets cannot provide enough information is that gaps exist between samples, and even the domain of the samples cannot be ensured. Several computational intelligence techniques have been proposed to overcome the limits of learning from small datasets.
We have the following goals: i. To …
Relational Methodology For Data Mining And Knowledge Discovery, Evgenii Vityaev, Boris Kovalerchuk
All Faculty Scholarship for the College of the Sciences
Knowledge discovery and data mining methods have been successful in many domains. However, their abilities to build or discover a domain theory remain unclear. This is largely due to the fact that many fundamental KDD&DM methodological questions are still unexplored such as (1) the nature of the information contained in input data relative to the domain theory, and (2) the nature of the knowledge that these methods discover. The goal of this paper is to clarify methodological questions of KDD&DM methods. This is done by using the concept of Relational Data Mining (RDM), representative measurement theory, an ontology of a …
Symbolic Methodology For Numeric Data Mining, Boris Kovalerchuk, Evgenii Vityaev
All Faculty Scholarship for the College of the Sciences
Currently statistical and artificial neural network methods dominate in data mining applications. Alternative relational (symbolic) data mining methods have shown their effectiveness in robotics, drug design, and other areas. Neural networks and decision tree methods have serious limitations in capturing relations that may have a variety of forms. Learning systems based on symbolic first-order logic (FOL) representations capture relations naturally. The learned regularities are understandable directly in domain terms that help to build a domain theory. This paper describes relational data mining methodology and develops it further for numeric data such as financial and spatial data. This includes (1) comparing …
Detecting Patterns Of Fraudulent Behavior In Forensic Accounting, Boris Kovalerchuk, Evgenii Vityaev
All Faculty Scholarship for the College of the Sciences
Often evidence from a single case does not reveal any suspicious patterns to aid investigations in forensic accounting and other forensic fields. In contrast, correlation of sets of evidence from several cases with suitable background knowledge may reveal suspicious patterns. Link Discovery (LD) has recently emerged as a promising new area for such tasks. Currently LD mostly relies on deterministic graphical techniques. Other relevant techniques are Bayesian probabilistic and causal networks. These techniques need further development to handle rare events. This paper combines first-order logic (FOL) and probabilistic semantic inference (PSI) to address this challenge. Previous research has shown this …