Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

2017

Clustering

Articles 1 - 23 of 23

Full-Text Articles in Physical Sciences and Mathematics

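Pembagian Tingkat Kecanduan Game Online Menggunakan K-Means Clustering Serta Korelasinya Terhadap Prestasi Akademik [Classifying Levels of Online Game Addiction Using K-Means Clustering and Its Correlation with Academic Achievement], Yudi Prastyo Dec 2017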

Elinvo (Electronics, Informatics, and Vocational Education)

Online games offer not only entertainment but also engaging challenges to complete, so individuals play them without regard for time in pursuit of satisfaction. One method that can be used to group levels of online game addiction is K-Means Clustering. K-Means Clustering is a non-hierarchical data clustering method that attempts to partition the available data into one or more clusters (groups). This study takes sample questionnaire data from students at Universitas Ibn Khaldun Bogor; the questionnaire responses are processed as the basis for grouping levels of online game addiction. The clustering results are used to determine the relationship between the level of game addiction …
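As a rough illustration of the K-Means partitioning this abstract refers to, the sketch below runs Lloyd's algorithm on one-dimensional scores. The function name `kmeans_1d`, the use of a single numeric score per respondent, and the deterministic quantile-based initialisation are all assumptions made for the example; the abstract does not say how the questionnaire data were encoded or how the centroids were seeded.

```python
def kmeans_1d(values, k, iters=100):
    """Minimal Lloyd's algorithm on 1-D scores (illustrative sketch only).

    Returns (centroids, labels), with labels in the same order as `values`.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: seed centroids at evenly spaced quantiles so the run is
    # deterministic; real implementations often use random or k-means++ seeds.
    centroids = [float(ordered[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: abs(v - centroids[j])) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Hypothetical addiction scores forming three well-separated groups.
scores = [1, 2, 3, 20, 21, 22, 40, 41, 42]
centroids, labels = kmeans_1d(scores, k=3)
```

With well-separated groups such as these, each group ends up in its own cluster; on real questionnaire data the result would depend on the seeding and on the chosen number of clusters.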


Automated Species Classification Methods For Passive Acoustic Monitoring Of Beaked Whales, John Lebien Dec 2017

University of New Orleans Theses and Dissertations

The Littoral Acoustic Demonstration Center has collected passive acoustic monitoring data in the northern Gulf of Mexico since 2001. Recordings were made in 2007 near the Deepwater Horizon oil spill that provide a baseline for an extensive study of regional marine mammal populations in response to the disaster. Animal density estimates can be derived from detections of echolocation signals in the acoustic data. Beaked whales are of particular interest as they remain one of the least understood groups of marine mammals, and relatively few abundance estimates exist. Efficient methods for classifying detected echolocation transients are essential for mining long-term passive …


Evaluating Spatial Variability In Sediment And Phosphorus Concentration-Discharge Relationships Using Bayesian Inference And Self-Organizing Maps, Kristen L. Underwood, Donna M. Rizzo, Andrew W. Schroth, Mandar M. Dewoolkar Dec 2017

College of Engineering and Mathematical Sciences Faculty Publications

Given the variable biogeochemical, physical, and hydrological processes driving fluvial sediment and nutrient export, the water science and management communities need data-driven methods to identify regions prone to production and transport under variable hydrometeorological conditions. We use Bayesian analysis to segment concentration-discharge linear regression models for total suspended solids (TSS) and particulate and dissolved phosphorus (PP, DP) using 22 years of monitoring data from 18 Lake Champlain watersheds. Bayesian inference was leveraged to estimate segmented regression model parameters and identify threshold position. The identified threshold positions demonstrated a considerable range below and above the median discharge—which has been used previously …


A Novel Density Peak Clustering Algorithm Based On Squared Residual Error, Milan Parmar, Di Wang, Ah-Hwee Tan, Chunyan Miao, Jianhua Jiang, You Zhou Dec 2017

Research Collection School Of Computing and Information Systems

The density peak clustering (DPC) algorithm is designed to quickly identify intricate-shaped clusters with high dimensionality by finding high-density peaks in a non-iterative manner and using only one threshold parameter. However, DPC has certain limitations in processing low-density data points because it only takes the global data density distribution into account. As such, DPC may struggle to form low-density data clusters; in other words, it may fail to detect anomalies and borderline points. In this paper, we analyze the limitations of DPC and propose a novel density peak clustering algorithm to better handle low-density clustering tasks. Specifically, our algorithm …
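For orientation, the baseline DPC idea (not the modified algorithm this paper proposes, which the truncated abstract does not spell out) can be sketched as follows. Each point gets a local density `rho` (number of neighbours closer than a cutoff `d_c`, the single threshold parameter) and a distance `delta` to its nearest higher-density point; points where both are large are taken as peaks, and every other point inherits the label of its nearest higher-density neighbour in one pass. The function name `density_peaks` and the choice to pick the `n_clusters` points with the largest `rho * delta` automatically are assumptions for the example; DPC is usually described with a manually inspected decision graph instead.

```python
import math


def density_peaks(points, d_c, n_clusters):
    """Baseline density-peak clustering sketch; returns one label per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff d_c.
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention, delta is its largest distance.
            delta[i] = max(dist[i])
        else:
            # Nearest point among those already visited (higher density).
            j = min(order[:rank], key=lambda h: dist[i][h])
            delta[i] = dist[i][j]
            parent[i] = j
    # Assumption: take the n_clusters largest rho * delta values as peaks.
    centers = sorted(range(n), key=lambda i: -(rho[i] * delta[i]))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Single non-iterative pass: inherit the label of the denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Two hypothetical, well-separated blobs of five points each.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = density_peaks(pts, d_c=1.2, n_clusters=2)
```

Because `rho` is computed against one global cutoff, sparse regions receive uniformly low densities, which is the kind of limitation the abstract attributes to DPC.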


Graph-Based Latent Embedding, Annotation And Representation Learning In Neural Networks For Semi-Supervised And Unsupervised Settings, Ismail Ozsel Kilinc Nov 2017

USF Tampa Graduate Theses and Dissertations

Machine learning has been immensely successful in supervised learning with outstanding examples in major industrial applications such as voice and image recognition. Following these developments, the most recent research has now begun to focus primarily on algorithms which can exploit very large sets of unlabeled examples to reduce the amount of manually labeled data required for existing models to perform well. In this dissertation, we propose graph-based latent embedding/annotation/representation learning techniques in neural networks tailored for semi-supervised and unsupervised learning problems. Specifically, we propose a novel regularization technique called Graph-based Activity Regularization (GAR) and a novel output layer modification called …


A Conceptual Framework For Analyzing Students' Feedback, Venky Shankararaman, Swapna Gottipati, Sandy Gan Oct 2017

Research Collection School Of Computing and Information Systems

In academic institutions it is normal practice that at the end of each term, students are required to complete a questionnaire that is designed to gather students’ perceptions of the instructor and their learning experience in the course. This questionnaire comprises Likert-scale questions and qualitative questions. One of the important goals of this exercise is to enable the instructor and the senior management to examine the feedback and then enhance students’ learning experience. In most universities, including our own, a lot of attention is paid to the quantitative feedback, which is summarized, and statistical comparisons are computed, analysed and presented. However, the …


Data Analysis Methods Using Persistence Diagrams, Andrew Marchese Aug 2017

Doctoral Dissertations

In recent years, persistent homology techniques have been used to study data and dynamical systems. Using these techniques, information about the shape and geometry of the data and systems leads to important information regarding the periodicity, bistability, and chaos of the underlying systems. In this thesis, we study all aspects of the application of persistent homology to data analysis. In particular, we introduce a new distance on the space of persistence diagrams, and show that it is useful in detecting changes in geometry and topology, which is essential for the supervised learning problem. Moreover, we introduce a clustering framework directly …


Constructing Interactive Visual Classification, Clustering And Dimension Reduction Models For N-D Data, Boris Kovalerchuk, Dmytro Dovhalets Jul 2017

Computer Science Faculty Scholarship

The exploration of multidimensional datasets of all possible sizes and dimensions is a long-standing challenge in knowledge discovery, machine learning, and visualization. While multiple efficient visualization methods for n-D data analysis exist, the loss of information, occlusion, and clutter continue to be a challenge. This paper proposes and explores a new interactive method for visual discovery of n-D relations for supervised learning. The method includes automatic, interactive, and combined algorithms for discovering linear relations, dimension reduction, and generalization for non-linear relations. This method is a special category of reversible General Line Coordinates (GLC). It produces graphs in 2-D that represent …


Enhancing Informative Frame Filtering By Water And Bubble Detection In Colonoscopy Videos, Ashok Dahal, Junghwan Oh, Wallapak Tavanapong, Johnny S. Wong, Piet C. De Groen Jun 2017

Johnny Wong

Colonoscopy has contributed to a marked decline in the number of colorectal cancer related deaths. However, recent data suggest that there is a significant (4-12%) miss-rate for the detection of even large polyps and cancers. To address this, we have been investigating an ‘automated feedback system’ which informs the endoscopist of possible sub-optimal inspection during colonoscopy. A fundamental step of this system is to distinguish non-informative frames from informative ones. Existing methods for this cannot classify water/bubble frames as non-informative even though they do not carry any useful visual information of the colon mucosa. In this paper, we propose a …


Unsupervised Machine Learning In Agent-Based Modeling, Luke D. Robinson May 2017

Celebration of Learning

Agent-based models (ABMs) are used by researchers in a variety of fields to model natural phenomena. In an ABM, a wide range of behaviors and outcomes can be observed based on the parameters of the model. In many cases, these behaviors can be categorized into discrete outcomes identifiable by human observers. Our goal was to use clustering algorithms to identify those outcomes from model output data. For this project, we used data from the NetLogo Wolf Sheep Predation model to explore and evaluate three clustering algorithms from Python's scikit-learn package. If this task can be completed reliably by a computer, …


Image Segmentation Using De-Textured Images, Yaswanth Kodavali May 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Image segmentation is one of the fundamental problems in computer vision. The outputs of segmentation are used to extract regions of interest and carry out identification or classification tasks. For these tasks to be reliable, segmentation has to be made more reliable. Although there are exceptionally well-built algorithms available today, they perform poorly in many instances by producing over-merged (combining many unrelated objects) or under-merged (one object appeared as many) results. This leads to far fewer or more segments than expected. Such problems primarily arise due to varying textures within a single object and/or common textures near borders of adjacent …


Estimating Autoantibody Signatures To Detect Autoimmune Disease Patient Subsets, Zhenke Wu, Livia Casciola-Rosen, Ami A. Shah, Antony Rosen, Scott L. Zeger Apr 2017

Johns Hopkins University, Dept. of Biostatistics Working Papers

Autoimmune diseases are characterized by highly specific immune responses against molecules in self-tissues. Different autoimmune diseases are characterized by distinct immune responses, making autoantibodies useful for diagnosis and prediction. In many diseases, the targets of autoantibodies are incompletely defined. Although the technologies for autoantibody discovery have advanced dramatically over the past decade, each of these techniques generates hundreds of possibilities, which are onerous and expensive to validate. We set out to establish a method to greatly simplify autoantibody discovery, using a pre-filtering step to define subgroups with similar specificities based on migration of labeled, immunoprecipitated proteins on sodium dodecyl sulfate …


Identifying Major Tasks From On-Line Reviews, Feras Al-Obeidat, Bruce Spencer Jan 2017

All Works

© 2017 The Authors. Published by Elsevier B.V. Many e-commerce websites allow customers to provide reviews that reflect their experiences and opinions about the business's products or services. Such published reviews potentially benefit the business's reputation, improve both current and future customers' trust in the business, and accordingly improve the business. Negative reviews can inform the merchant of issues that, when addressed, also improve the business. However, when reviews reflect negative experiences and the merchant fails to respond, the business faces potential loss of reputation, trust, and damage. We present the Sentiminder system that identifies reviews with negative sentiment, organizes …


A Novel Approach For Classifying Gene Expression Data Using Topic Modeling, Soon Jye Kho, Himi Yalamanchili, Michael L. Raymer, Amit Sheth Jan 2017

Kno.e.sis Publications

Understanding the role of differential gene expression in cancer etiology and cellular process is a complex problem that continues to pose a challenge due to the sheer number of genes and inter-related biological processes involved. In this paper, we employ an unsupervised topic model, Latent Dirichlet Allocation (LDA), to mitigate overfitting of high-dimensionality gene expression data and to facilitate understanding of the associated pathways. LDA has recently been applied for clustering and exploring genomic data but not for classification and prediction. Here, we propose to use LDA in clustering as well as in classification of cancer and healthy tissues using lung cancer …


Xic Clustering By Baseyian Network, Kyle J. Handy Jan 2017

Graduate Student Theses, Dissertations, & Professional Papers

No abstract provided.


A Clustering Approach Using A Combination Of Gravitational Search Algorithm And K-Harmonic Means And Its Application In Text Document Clustering, Mina Mirhosseini Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

Data clustering is one of the most popular techniques of information management, which is used in many applications of science and engineering such as machine learning, pattern recognition, image processing, data mining, and web mining. Different algorithms have been suggested by researchers, among which evolutionary algorithms perform best in data clustering, especially on big datasets. It has been shown that GSA-KM, which is a combination of the gravitational search algorithm (GSA) and K-means (KM), is superior to some other comparable evolutionary methods. One of the drawbacks of this approach is its dependency on the initial seeds. In this paper, a …


An Intelligent PSO-Based Energy Efficient Load Balancing Multipath Technique In Wireless Sensor Networks, Sukhchandan Randhawa, Sushma Jain Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

To provide a reliable and efficient service, load balancing plays an important role in wireless sensor networks (WSNs). There is a need to maximize the network lifetime for WSNs applications with periodic generation of data. Due to the relationship between energy consumption and network sensor node lifetime, energy consumption in a network should be minimized and balanced in order to increase network lifetime. Energy-efficient load-balancing techniques are needed to solve this problem. In this paper, a particle swarm optimization (PSO)-based energy-efficient load-balancing technique is proposed, in which the required number of routing paths and energy consumption of different nodes and …


Proposing A New Clustering Method To Detect Phishing Websites, Morteza Arab, Mohammad Karim Sohrabi Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

Phishing websites are fake sites developed by ill-intentioned people to imitate real and legal websites. Most such web pages closely resemble the sites they imitate in order to deceive victims. The victims of phishing websites may give their bank accounts, passwords, credit card numbers, and other important information to the designers and owners of phishing websites. The increasing number of phishing websites has become a great challenge in e-business in general and in electronic banking specifically. In the present study, a novel framework based on model-based clustering is introduced to fight against phishing websites. First, a model is …


A Novel Approach For Extracting Ideal Exemplars By Clustering For Massive Time-Ordered Datasets, Ömer Faruk Ertuğrul Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

The number and length of massive datasets have increased day by day and this yields more complex machine learning stages due to the high computational costs. To decrease the computational cost many methods were proposed in the literature such as data condensing, feature selection, and filtering. Although clustering methods are generally employed to divide samples into groups, another way of data condensing is by determining ideal exemplars (or prototypes), which can be used instead of the whole dataset. In this study, first the efficiency of traditional data condensing by clustering approach was confirmed according to obtained accuracies and condensing ratios …


Unsupervised Learning Of Allomorphs In Turkish, Burcu Can Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

One morpheme may have several surface forms that correspond to allomorphs. In English, ed and d are surface forms of the past tense morpheme, and s, es, and ies are surface forms of the plural or present tense morpheme. Turkish has a large number of allomorphs due to its morphophonemic processes. One morpheme can have tens of different surface forms in Turkish. This leads to a sparsity problem in natural language processing tasks in Turkish. Detection of allomorphs has not been studied much because of its difficulty. For example, tü and di are Turkish allomorphs (i.e. past tense morpheme), but …


An Adaptive Clustering Segmentation Algorithm Based On FCM, Jun Yang, Yun-Sheng Ke, Mao-Zheng Wang Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

The cluster number and the initial clustering centers must be reasonably set before the analysis of clustering in most cases. Traditional clustering segmentation algorithms have many shortcomings, such as high reliance on the specially established initial clustering center, tendency to fall into the local maximum point, and poor performance with multithreshold values. To overcome these defects, an adaptive fuzzy C-means segmentation algorithm based on a histogram (AFCMH), which synthesizes both main peaks of the histogram and optimized Otsu criterion, is proposed. First, the main peaks of the histogram are chosen by operations like histogram smoothing, merging of adjacent peaks, and …


Semantics-Based Summarization Of Entities In Knowledge Graphs, Kalpa Gunaratna Jan 2017

Browse all Theses and Dissertations

The processing of structured and semi-structured content on the Web has been gaining attention with the rapid progress in the Linking Open Data project and the development of commercial knowledge graphs. Knowledge graphs capture domain-specific or encyclopedic knowledge in the form of a data layer and add rich and explicit semantics on top of the data layer to infer additional knowledge. The data layer of a knowledge graph represents entities and their descriptions. The semantic layer on top of the data layer is called the schema (ontology), where relationships of the entity descriptions, their classes, and the hierarchy of the …


Daily Traffic Flow Pattern Recognition By Spectral Clustering, Matthew Aven Jan 2017

CMC Senior Theses

This paper explores the potential applications of existing spectral clustering algorithms to real life problems through experiments on existing road traffic data. The analysis begins with an overview of previous unsupervised machine learning techniques and constructs an effective spectral clustering algorithm that demonstrates the analytical power of the method. The paper focuses on the spectral embedding method’s ability to project non-linearly separable, high dimensional data into a more manageable space that allows for accurate clustering. The key step in this method involves solving a normalized eigenvector problem in order to construct an optimal representation of the original data.

While this …
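
The "normalized eigenvector problem" mentioned in this abstract is not written out in the excerpt. One common formulation of normalized spectral clustering, given here only as an assumed illustration rather than the thesis's own derivation, starts from a symmetric affinity matrix $W$ between the $n$ data points (with every point having positive total affinity) and proceeds as follows; all symbols are introduced for this sketch.

```latex
% Assumed notation (not taken from the thesis): W is an n x n symmetric
% affinity matrix, D the diagonal degree matrix, k the number of clusters.
\[
  D_{ii} = \sum_{j=1}^{n} W_{ij}, \qquad
  L_{\mathrm{sym}} = I - D^{-1/2}\, W\, D^{-1/2}
\]
% Eigenproblem for the normalized graph Laplacian.
\[
  L_{\mathrm{sym}}\, u_\ell = \lambda_\ell\, u_\ell, \qquad
  0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n
\]
% Spectral embedding: the k eigenvectors with the smallest eigenvalues
% form the columns of U; each row is rescaled to unit length.
\[
  U = [\, u_1 \;\; u_2 \;\; \cdots \;\; u_k \,] \in \mathbb{R}^{n \times k},
  \qquad
  y_i = \frac{U_{i,:}}{\lVert U_{i,:} \rVert_2}
\]
```

The rows $y_i$ are then grouped with a standard method such as K-means, which is how data that are not linearly separable in the original space can become separable in the embedded one.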