Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Physical Sciences and Mathematics

Clustering

Articles 1 - 30 of 54

Full-Text Articles in Computer Engineering

Exploring Human Aging Proteins Based On Deep Autoencoders And K-Means Clustering, Sondos M. Hammad, Mohamed Talaat Saidahmed, Elsayed A. Sallam, Reda Elbasiony Mar 2024

Journal of Engineering Research

Aging significantly affects human health and the overall economy, yet understanding of the underlying molecular mechanisms remains limited. Among all human genes, almost three hundred and five have been linked to human aging. While certain subsets of these genes or specific aging-related genes have been extensively studied, there has been a lack of comprehensive examination encompassing the entire set of aging-related genes. Here, the main objective is to overcome this limited understanding with an innovative approach that combines the capabilities of deep learning, particularly a One-Dimensional Deep AutoEncoder (1D-DAE), followed by the K-means clustering technique as a means of unsupervised learning. …


Analyzing Ground Motion Records With Cvi Fuzzy Art, Dustin Tanksley, Xinzhe Yuan, Genda Chen, Donald C. Wunsch Jan 2023

Civil, Architectural and Environmental Engineering Faculty Research & Creative Works

This paper explores using Cluster Validity Indices Fuzzy Adaptive Resonance Theory (CVI Fuzzy ART) to cluster ground motion records (GMRs). Clustering the features extracted from a supervised network trained for predicting the structure damage results in less overfitting from the trained network. Using Cluster Validity Indices (CVIs) to evaluate the clustering gives feedback on how well the data is being classified, allowing further separation of the data. By using CVI Fuzzy ART in combination with features extracted from a trained Convolutional Neural Network (CNN), we were able to form additional clusters in the data. Within the primary clusters, accuracy was …


K-Means Clustering Using Gravity Distance, Ajinkya Vishwas Indulkar Apr 2022

Masters Theses & Specialist Projects

Clustering is an important topic in data modeling. K-means clustering is a well-known partitional clustering algorithm, where a dataset is separated into groups sharing similar properties. Clustering an unbalanced dataset, where some groups have many more data points than others, is a challenging problem in data modeling. When a K-means clustering algorithm with Euclidean distance is applied to such data, the algorithm fails to form good clusters. The standard K-means tends to split the data evenly into smaller clusters during the clustering process.

We propose a new K-means clustering algorithm to overcome the disadvantage by introducing a different …
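
For orientation, the baseline that this abstract contrasts against, standard K-means with Euclidean distance, can be sketched as follows. This is a minimal illustration of Lloyd's algorithm only; it is not the gravity-distance variant proposed in the thesis, and the function name `kmeans` and its parameters are hypothetical choices made for this sketch.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Standard K-means (Lloyd's algorithm) with Euclidean distance.

    `points` is a list of equal-length tuples; returns (centroids, labels).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centers
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


# Two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(data, k=2)
```

On balanced, well-separated data such as this toy set the two groups are recovered; the abstract's point is that the same procedure degrades when group sizes differ sharply.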


Topological Hierarchies And Decomposition: From Clustering To Persistence, Kyle A. Brown Jan 2022

Browse all Theses and Dissertations

Hierarchical clustering is a class of algorithms commonly used in exploratory data analysis (EDA) and unsupervised learning. However, these algorithms suffer from some drawbacks, including the difficulty of interpreting the resulting dendrogram, arbitrariness in the choice of cut to obtain a flat clustering, and the lack of an obvious way of comparing individual clusters. In this dissertation, we develop the notion of a topological hierarchy on recursively-defined subsets of a metric space. We look to the field of topological data analysis (TDA) for the mathematical background to associate topological structures such as simplicial complexes and maps of covers to clusters in …


A Quantitative Validation Of Multi-Modal Image Fusion And Segmentation For Object Detection And Tracking, Nicholas Lahaye, Michael J. Garay, Brian D. Bue, Hesham El-Askary, Erik Linstead Jun 2021

Mathematics, Physics, and Computer Science Faculty Articles and Research

In previous works, we have shown the efficacy of using Deep Belief Networks, paired with clustering, to identify distinct classes of objects within remotely sensed data via cluster analysis and qualitative analysis of the output data in comparison with reference data. In this paper, we quantitatively validate the methodology against datasets currently being generated and used within the remote sensing community, as well as show the capabilities and benefits of the data fusion methodologies used. The experiments run take the output of our unsupervised fusion and segmentation methodology and map them to various labeled datasets at different levels of global …


Can Generative Adversarial Networks Help Us Fight Financial Fraud?, Sean Mciver Jan 2021

Dissertations

Transactional fraud datasets exhibit extreme class imbalance. Learners cannot make accurate generalizations without sufficient data. Researchers can account for imbalance at the data level, algorithmic level or both. This paper focuses on techniques at the data level. We evaluate the evidence of the optimal technique and potential enhancements. Global fraud losses totalled more than 80 % of the UK’s GDP in 2019. The improvement of preprocessing is inherently valuable in fighting these losses. Synthetic minority oversampling technique (SMOTE) and extensions of SMOTE are currently the most common preprocessing strategies. SMOTE oversamples the minority classes by randomly generating a point between …
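
As SMOTE is commonly described, each synthetic sample lies on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The sketch below illustrates only that interpolation step, under that assumption; it is not the evaluation pipeline of the dissertation, and `smote_sample` with its parameters is a hypothetical helper written for this illustration.

```python
import math
import random


def smote_sample(minority, k=2, n_new=5, seed=0):
    """Generate n_new synthetic minority points by SMOTE-style interpolation.

    `minority` is a list of at least two equal-length tuples of floats.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_sample(minority, k=2, n_new=10)
```

Because every new point is a convex combination of two existing minority samples, the synthetic data stays inside the region already spanned by the minority class.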


Clustered Mobile Data Collection In Wsns: An Energy-Delay Trade-Off, İzzet Fati̇h Şentürk Jan 2021

Turkish Journal of Electrical Engineering and Computer Sciences

Wireless sensor networks enable monitoring remote areas with limited human intervention. However, network connectivity between sensor nodes and the base station (BS) may not always be possible due to the limited transmission range of the nodes. In such a case, one or more mobile data collectors (MDCs) can be employed to visit nodes for data collection. If multiple MDCs are available, it is desirable to minimize the energy cost of mobility while distributing the cost among the MDCs in a fair manner. Despite the availability of various clustering algorithms, there is no one-size-fits-all clustering solution when different requirements …


Texture-Driven Image Clustering In Laser Powder Bed Fusion, Alexander H. Groeger Jan 2021

Browse all Theses and Dissertations

The additive manufacturing (AM) field is striving to identify anomalies in laser powder bed fusion (LPBF) using multi-sensor in-process monitoring paired with machine learning (ML). In-process monitoring can reveal the presence of anomalies but creating a ML classifier requires labeled data. The present work approaches this problem by printing hundreds of Inconel-718 coupons with different processing parameters to capture a wide range of process monitoring imagery with multiple sensor types. Afterwards, the process monitoring images are encoded into feature vectors and clustered to isolate groups in each sensor modality. Four texture representations were learned by training two convolutional neural network …


Slashing Quality Index Modeling And Simulation Based On Data Dispersion Clustering, Yuxian Zhang, Xiaoyi Qian, Dong Xiao, Jianhui Wang Aug 2020

Journal of System Simulation

Abstract: To address the sensitivity of typical partitioning clustering algorithms to noise and outliers, a clustering algorithm based on data dispersion was proposed. The data dispersion was defined and introduced to a non-Euclidean distance. The similarity metric was established, and the data clustering was realized. The optimal clustering number was obtained by the validity function based on improved partition coefficient. Then the proposed clustering algorithm was applied to quality index model in slashing process. A size add-on quality index model was built by radial basis function neural networks. The node number of hidden layer was determined and the center …


Key Technologies Of Precaution And Prediction Of Abnormal Spatial-Temporal Trajectory: A Review Of Recent Advances, Gongda Qiu, He Ming, Yang Jie, Yuting Cao, Jihong Sun Jun 2020

Journal of System Simulation

Abstract: The ex-post disposition of a major incident, which is expected to transform into prediction and precaution of abnormal behavior, is increasingly unable to meet the urgent needs of the society. The rapid development and popularization of sensor network and positioning technology lay the foundation for mining spatial-temporal trajectory data. With the key objective of prediction and precaution of abnormal trajectory based on big data mining, the future research directions and prospects on trajectory clustering and recognition are analyzed, discussed and elaborated in this paper. Temporal trajectory prediction applied in prediction and precaution of abnormal spatial-temporal trajectory is also presented, providing a reference for further research on …


An Efficient Storage-Optimizing Tick Data Clustering Model, Haleh Amintoosi, Masood Niazi Torshiz, Yahya Forghani, Sara Alinejad Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

Tick data is a large volume of data, related to a phenomenon such as stock market or weather change, with data values changing rapidly over time. An important issue is to store tick data table in a way that it occupies minimum storage space while at the same time it can provide fast execution of queries. In this paper, a mathematical model is proposed to partition tick data tables into clusters with the aim of minimizing the required storage space. The genetic algorithm is then used to solve the mathematical model which is indeed a clustering model. The proposed method …


Bibsqlqc: Brown Infomax Boosted Sql Query Clustering Algorithm To Detect Anti-Patterns In The Query Log, Vinothsaravanan Ramakrishnan, Palanisamy Chenniappan Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

Discovery of antipatterns from arbitrary SQL query log depends on the static code analysis used to enhance the quality and performance of software applications. The existence of antipatterns reduces the quality and leads to redundant SQL statements. SQL log includes a large load on the database and it is difficult for an analyst to extract large patterns in a minimal time. Existing techniques which discover antipatterns in SQL queries face numerous challenges in discovering the normal sequences of queries within the log. In order to discover the antipatterns in the log, an efficient technique called Brown infomax …


Spatiotemporal Mode Analysis Of Urban Dockless Shared Bikes Based On Point Of Interests Clustering, Zhang Fang, Bin Chen, Yanghua Tang, Dong Jian, Chuan Ai, Xiaogang Qiu Dec 2019

Journal of System Simulation

Abstract: The city’s dockless shared bikes have developed rapidly, and their features of convenience, economy and efficiency have been widely welcomed. The digital footprint they generate reveals the movement of people in time and space within the city, which makes it possible to quantify the activities of people in the city using shared bikes. In this paper, based on the collected shared bikes data of Beijing, a clustering method based on the point of interests is proposed to divide the urban space, so as to construct a mobile network of urban shared bikes, and analyze the spatiotemporal mode of bike …


Cure: Flexible Categorical Data Representation By Hierarchical Coupling Learning, Songlei Jian, Guansong Pang, Longbing Cao, Kai Lu, Hang Gao May 2019

Research Collection School Of Computing and Information Systems

The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is very critical yet challenging for capturing complex data characteristics in learning tasks. This paper proposes a novel and flexible coupled unsupervised categorical data representation (CURE) framework, which not only captures the hierarchical couplings but is also flexible enough to be instantiated for different learning tasks. CURE first learns the value clusters of different granularities based on multiple value coupling functions and then learns the value representation from the couplings between the obtained value clusters. With two complementary value coupling functions, CURE is instantiated into …


Exploring Bigram Character Features For Arabic Text Clustering, Dia Eddin Abuzeina Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

The vector space model (VSM) is an algebraic model that is widely used for data representation in text mining applications. However, the VSM poses a critical challenge, as it requires a high-dimensional feature space. Therefore, many feature selection techniques, such as employing roots or stems (i.e. words without infixes, prefixes, and/or suffixes) instead of using complete word forms, are proposed to tackle this space challenge problem. Recently, the literature shows that one more basic unit feature can be used to handle the textual features, which is the two-neighboring character form that we call microword. To evaluate this feature type, …


A New Model To Determine The Hierarchical Structure Of The Wireless Sensor Networks, Resmi̇ye Nasi̇boğlu, Zülküf Teki̇n Erten Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Wireless sensor networks are one of the rising areas of scientific research. A common purpose of these investigations is to construct an optimal network structure that prolongs the network's lifetime. In this study, a new model has been proposed to construct a hierarchical structure of wireless sensor networks. Methods used in the model to determine clusters and appropriate cluster heads are k-means clustering and fuzzy inference system (FIS), respectively. The weighted averaging based on levels (WABL) defuzzification method is used to calculate crisp outputs of the FIS. A new theorem for calculation of WABL values has been proved in order to …


Evaluating The Attributes Of Remote Sensing Image Pixels For Fast K-Means Clustering, Ali̇ Sağlam, Nurdan Baykan Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

The clustering process is an important stage for many data mining applications. In this process, data elements are grouped according to their similarities. One of the best-known clustering algorithms is the k-means algorithm. The algorithm initially requires the number of clusters as a parameter and runs iteratively. Like many image processing applications, remote sensing image processing applications usually need a clustering stage. Remote sensing images provide more information about the environments with the development of the multispectral sensor and laser technologies. In the dataset used in this paper, the infrared (IR) and the digital surface maps (DSM) are also …


Efficient Hierarchical Temporal Segmentation Method For Facial Expression Sequences, Jiali Bian, Xue Mei, Yu Xue, Liang Wu, Yao Ding Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Temporal segmentation of facial expression sequences is important to understand and analyze human facial expressions. It is, however, challenging to deal with the complexity of facial muscle movements by finding a suitable metric to distinguish among different expressions and to deal with the uncontrolled environmental factors in the real world. This paper presents a two-step unsupervised segmentation method composed of rough segmentation and fine segmentation stages to compute the optimal segmentation positions in video sequences to facilitate the segmentation of different facial expressions. The proposed method performs localization of facial expression patches to aid in recognition and extraction of specific …


Scalable Clustering For Immune Repertoire Sequence Analysis, Prem Bhusal Jan 2019

Browse all Theses and Dissertations

The development of the next-generation sequencing technology has enabled systems immunology researchers to conduct detailed immune repertoire analysis at the molecule level. Large sequence datasets (e.g., millions of sequences) are being collected to comprehensively understand how the immune system of a patient evolves over different stages of disease development. A recent study has shown that the hierarchical clustering (HC) algorithm gives the best results for B-cell clones analysis - an important type of immune repertoire sequencing (IR-Seq) analysis. However, due to the inherent complexity, the classical hierarchical clustering algorithm does not scale well to large sequence datasets. Surprisingly, no algorithms …


Clustering Method Based On Graph Data Model And Reliability Detection, Yanyun Cheng, Huisong Bian, Changsheng Bian Jun 2018

Journal of System Simulation

Abstract: For data in feature space, traditional clustering algorithms can perform cluster analysis directly. However, high-dimensional spatial data cannot achieve intuitive and effective graphical visualization of clustering results in a 2D plane. Graph data can clearly reflect the similarity relationship between objects. According to the distance of the data objects, the feature space data are modeled as graph data by iteration. Cluster analysis based on modularity is carried out on the modeling graph data. The two-dimensional visualization of non-spherical-shape distribution data cluster and result is achieved. The concept of credibility of the clustering result is proposed, and a method is proposed, …


Composite Vector Quantization For Optimizing Antenna Locations, Zekeri̇ya Uykan, Riku Jantti Jan 2018

Turkish Journal of Electrical Engineering and Computer Sciences

In this paper, we study the location optimization problem of remote antenna units (RAUs) in generalized distributed antenna systems (GDASs). We propose a composite vector quantization (CVQ) algorithm that consists of unsupervised and supervised terms for RAU location optimization. We show that the CVQ can be used i) to minimize an upper bound to the cell-averaged SNR error for a desired/demanded location-specific SNR function, and ii) to maximize the cell-averaged effective SNR. The CVQ-DAS includes the standard VQ, and thus the well-known squared distance criterion (SDC) as a special case. Computer simulations confirm the findings and suggest that the proposed …


Machine Learning Techniques Implementation In Power Optimization, Data Processing, And Bio-Medical Applications, Khalid Khairullah Mezied Al-Jabery Jan 2018

Doctoral Dissertations

The rapid progress and development in machine-learning algorithms has become a key factor in determining the future of humanity. These algorithms and techniques were utilized to solve a wide spectrum of problems extending from data mining and knowledge discovery to unsupervised learning and optimization. This dissertation consists of two study areas. The first area investigates the use of reinforcement learning and adaptive critic design algorithms in the field of power grid control. The second area in this dissertation, consisting of three papers, focuses on developing and applying clustering algorithms on biomedical data. The first paper presents a novel modelling approach for …


Enhancing Informative Frame Filtering By Water And Bubble Detection In Colonoscopy Videos, Ashok Dahal, Junghwan Oh, Wallapak Tavanapong, Johnny S. Wong, Piet C. De Groen Jun 2017

Johnny Wong

Colonoscopy has contributed to a marked decline in the number of colorectal cancer related deaths. However, recent data suggest that there is a significant (4-12%) miss-rate for the detection of even large polyps and cancers. To address this, we have been investigating an ‘automated feedback system’ which informs the endoscopist of possible sub-optimal inspection during colonoscopy. A fundamental step of this system is to distinguish non-informative frames from informative ones. Existing methods for this cannot classify water/bubble frames as non-informative even though they do not carry any useful visual information of the colon mucosa. In this paper, we propose a …


Unsupervised Learning Of Allomorphs In Turkish, Burcu Can Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

One morpheme may have several surface forms that correspond to allomorphs. In English, ed and d are surface forms of the past tense morpheme, and s, es, and ies are surface forms of the plural or present tense morpheme. Turkish has a large number of allomorphs due to its morphophonemic processes. One morpheme can have tens of different surface forms in Turkish. This leads to a sparsity problem in natural language processing tasks in Turkish. Detection of allomorphs has not been studied much because of its difficulty. For example, tü and di are Turkish allomorphs (i.e. past tense morpheme), but …


An Adaptive Clustering Segmentation Algorithm Based On Fcm, Jun Yang, Yun-Sheng Ke, Mao-Zheng Wang Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

The cluster number and the initial clustering centers must be reasonably set before the analysis of clustering in most cases. Traditional clustering segmentation algorithms have many shortcomings, such as high reliance on the specially established initial clustering center, tendency to fall into the local maximum point, and poor performance with multithreshold values. To overcome these defects, an adaptive fuzzy C-means segmentation algorithm based on a histogram (AFCMH), which synthesizes both main peaks of the histogram and optimized Otsu criterion, is proposed. First, the main peaks of the histogram are chosen by operations like histogram smoothing, merging of adjacent peaks, and …


An Intelligent Pso-Based Energy Efficient Load Balancing Multipath Technique In Wireless Sensor Networks, Sukhchandan Randhawa, Sushma Jain Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

To provide a reliable and efficient service, load balancing plays an important role in wireless sensor networks (WSNs). There is a need to maximize the network lifetime for WSNs applications with periodic generation of data. Due to the relationship between energy consumption and network sensor node lifetime, energy consumption in a network should be minimized and balanced in order to increase network lifetime. Energy-efficient load-balancing techniques are needed to solve this problem. In this paper, a particle swarm optimization (PSO)-based energy-efficient load-balancing technique is proposed, in which the required number of routing paths and energy consumption of different nodes and …


Proposing A New Clustering Method To Detect Phishing Websites, Morteza Arab, Mohammad Karim Sohrabi Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

Phishing websites are fake ones that are developed by ill-intentioned people to imitate real and legal websites. Most of these types of web pages have high visual similarities to hustle the victims. The victims of phishing websites may give their bank accounts, passwords, credit card numbers, and other important information to the designers and owners of phishing websites. The increasing number of phishing websites has become a great challenge in e-business in general and in electronic banking specifically. In the present study, a novel framework based on model-based clustering is introduced to fight against phishing websites. First, a model is …


A Clustering Approach Using A Combination Of Gravitational Search Algorithm And K-Harmonic Means And Its Application In Text Document Clustering, Mina Mirhosseini Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

Data clustering is one of the most popular techniques of information management, which is used in many applications of science and engineering such as machine learning, pattern recognition, image processing, data mining, and web mining. Different algorithms have been suggested by researchers, where the evolutionary algorithms are the best in data clustering and especially in big datasets. It is illustrated that GSA-KM, which is a combination of the gravitational search algorithm (GSA) and K-means (KM), is superior over some other comparative evolutionary methods. One of the drawbacks of this approach is dependency on the initial seeds. In this paper, a …


A Novel Approach For Extracting Ideal Exemplars By Clustering For Massive Time-Ordered Datasets, Ömer Faruk Ertuğrul Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

The number and length of massive datasets have increased day by day and this yields more complex machine learning stages due to the high computational costs. To decrease the computational cost many methods were proposed in the literature such as data condensing, feature selection, and filtering. Although clustering methods are generally employed to divide samples into groups, another way of data condensing is by determining ideal exemplars (or prototypes), which can be used instead of the whole dataset. In this study, first the efficiency of traditional data condensing by clustering approach was confirmed according to obtained accuracies and condensing ratios …