Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Articles 1 - 25 of 25

Full-Text Articles in Computer Sciences

A High-Precision Machine Learning Algorithm To Classify Left And Right Outflow Tract Ventricular Tachycardia, Jianwei Zhang, Guohua Fu, Islam Abudayyeh, Magdi Yacoub, Anthony Chang, William Feaster, Louis Ehwerhemuepha, Hesham El-Askary, Xianfeng Du, Bin He, Mingjun Feng, Yibo Yu, Binhao Wang, Jing Liu, Hai Yao, Hulmin Chu, Cyril Rakovski Feb 2021

Mathematics, Physics, and Computer Science Faculty Articles and Research

Introduction: Multiple algorithms based on 12-lead ECG measurements have been proposed to identify the right ventricular outflow tract (RVOT) and left ventricular outflow tract (LVOT) locations from which ventricular tachycardia (VT) and frequent premature ventricular complex (PVC) originate. However, a clinical-grade machine learning algorithm that automatically analyzes characteristics of 12-lead ECGs and predicts RVOT or LVOT origins of VT and PVC is not currently available. The effective ablation sites of RVOT and LVOT, confirmed by a successful ablation procedure, provide evidence to create RVOT and LVOT labels for the machine learning model.

Methods: We randomly sampled training, validation, and testing …


A Unified Framework For Sparse Online Learning, Peilin Zhao, Dayong Wong, Pengcheng Wu, Steven C. H. Hoi Aug 2020

Research Collection School Of Computing and Information Systems

The amount of data in our society has been exploding in the era of big data. This article aims to address several open challenges in big data stream classification. Many existing studies in the data mining literature follow the batch learning setting, which suffers from low efficiency and poor scalability. To tackle these challenges, we investigate a unified online learning framework for the big data stream classification task. Unlike existing online data stream classification techniques, we propose a unified Sparse Online Classification (SOC) framework. Based on SOC, we derive a second-order online learning algorithm and a cost-sensitive sparse online …
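For readers new to sparse online classification, the following is a minimal Python sketch of one generic way to keep an online linear classifier sparse: online gradient descent on the logistic loss, followed by an L1 soft-thresholding (proximal) step after each example. It is not the SOC framework from the article; the function names, the step size `eta`, and the L1 strength `lam` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, tau: float) -> float:
    """Proximal operator of tau*|w|: shrink toward zero, snapping small values to exactly 0."""
    if w > tau:
        return w - tau
    if w < -tau:
        return w + tau
    return 0.0


def sparse_online_logistic(
    stream: Sequence[Tuple[List[float], int]],
    dim: int,
    eta: float = 0.1,   # illustrative step size (assumption)
    lam: float = 0.01,  # illustrative L1 strength (assumption)
) -> List[float]:
    """One pass of online gradient descent on logistic loss with an L1 proximal step.

    Labels are assumed to be in {-1, +1}.
    """
    w = [0.0] * dim
    for x, y in stream:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Gradient of log(1 + exp(-margin)) w.r.t. w is coeff * x,
        # with coeff = -y / (1 + exp(margin)) = -y * sigmoid(-margin).
        coeff = -y * sigmoid(-margin)
        w = [soft_threshold(wi - eta * coeff * xi, eta * lam) for wi, xi in zip(w, x)]
    return w
```

With a weak feature whose sign is unrelated to the label, each gradient step on that weight stays below the shrinkage threshold, so the weight remains exactly zero while the informative weight grows; this is the kind of sparsity an online sparse classifier aims for.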


Classifying Challenging Behaviors In Autism Spectrum Disorder With Neural Document Embeddings, Abigail Atchison May 2019

Computational and Data Sciences (MS) Theses

The understanding and treatment of challenging behaviors in individuals with Autism Spectrum Disorder is paramount to the success of behavioral therapy, and an essential step in this process is labeling the challenging behaviors demonstrated in therapy sessions. These manifestations differ across individuals and within individuals over time; thus, when only qualitative factors are considered, the appropriate classification of a challenging behavior can be unclear. In this thesis we seek to add quantitative depth to this otherwise qualitative task of challenging behavior classification. We do so by applying natural language processing techniques to behavioral descriptions extracted from the …


Multiple-Attribute Entity Recommendation Based On Classification, Meina Song, Xuejun Zhao, Haihong E Jan 2019

Journal of System Simulation

Abstract: In entity recommendation, entities with diverse attributes have attracted increasing attention. Most current work selects a single attribute and incorporates it into the related algorithms and their extensions, even though entities in recommendation settings combine multiple attributes. In this paper, building on a classification method, we examine the physical properties of the recommended entities and divide each entity’s attribute information network into multiple sub-networks. Within each sub-network, bounded by the number of attributes, single or even multiple attributes can be diverted into diverse …


Latent Dirichlet Allocation For Textual Student Feedback Analysis, Swapna Gottipati, Venky Shankararaman, Jeff Lin Nov 2018

Research Collection School Of Computing and Information Systems

Education institutions collect feedback from students upon course completion and analyse it to improve curriculum design, delivery methodology and students' learning experience. A large part of the feedback comes in the form of textual comments, which pose a challenge in quantifying and deriving insights. In this paper, we present a novel approach that applies the Latent Dirichlet Allocation (LDA) model to address this difficulty in handling textual student feedback. The analysis of the quantitative part of student feedback provides general ratings and helps to identify aspects of the teaching that are successful and those that can be improved. The reasons for the failure or success, however, …


Predict The Failure Of Hydraulic Pumps By Different Machine Learning Algorithms, Yifei Zhou, Monika Ivantysynova, Nathan Keller Aug 2018

The Summer Undergraduate Research Fellowship (SURF) Symposium

Pump failure is a common concern in the hydraulic field. When it happens, it can cause huge property loss and even loss of life. The common methods of preventing pump failure are preventive maintenance and breakdown maintenance; however, both have significant drawbacks. This research focuses on the axial piston pump and provides a new solution: prognosis of pump failure using machine learning classification. Different kinds of sensors (temperature, acceleration, etc.) were installed in a pump in good condition and in three pumps with different kinds of damage to measure 10 of …


Unified Locally Linear Classifiers With Diversity-Promoting Anchor Points, Chenghao Liu, Teng Zhang, Peilin Zhao, Jianling Sun, Steven C. H. Hoi Feb 2018

Research Collection School Of Computing and Information Systems

Locally Linear Support Vector Machine (LLSVM) has been actively used in classification tasks due to its capability of classifying nonlinear patterns. However, existing LLSVM suffers from two drawbacks: (1) a particular and appropriate regularization for LLSVM has not yet been addressed; (2) it usually adopts a three-stage learning scheme composed of learning anchor points by clustering, learning local coding coordinates by a predefined coding scheme, and finally training the classifiers. We argue that this decoupled approach oversimplifies the original optimization problem, resulting in a large deviation due to the disparate purpose of each step. To address the first issue, …
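The idea behind a locally linear classifier can be illustrated with a small prediction-only sketch: each anchor point j owns a linear model (w_j, b_j), and a point's score is the sum of those models weighted by its local coordinates gamma_j(x). Here the coordinates are simply normalized inverse distances to the k nearest anchors, a stand-in assumption rather than the coding scheme used in LLSVM or in the article, and the anchors and weights are taken as given instead of learned. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def local_coordinates(x: Point, anchors: Sequence[Point], k: int = 2) -> List[float]:
    """Normalized inverse-distance weights over the k nearest anchors (illustrative coding scheme)."""
    dists = [math.dist(x, a) for a in anchors]
    nearest = sorted(range(len(anchors)), key=lambda j: dists[j])[:k]
    gamma = [0.0] * len(anchors)
    for j in nearest:
        if dists[j] == 0.0:
            # x sits exactly on an anchor: give that anchor all the weight.
            gamma = [0.0] * len(anchors)
            gamma[j] = 1.0
            return gamma
        gamma[j] = 1.0 / dists[j]
    total = sum(gamma)
    return [g / total for g in gamma]


def locally_linear_score(
    x: Point,
    anchors: Sequence[Point],
    W: Sequence[Sequence[float]],
    b: Sequence[float],
    k: int = 2,
) -> float:
    """Score = sum_j gamma_j(x) * (w_j . x + b_j); its sign gives the predicted class."""
    gamma = local_coordinates(x, anchors, k)
    return sum(
        g * (sum(wi * xi for wi, xi in zip(W[j], x)) + b[j])
        for j, g in enumerate(gamma)
    )
```

With two anchors whose local models disagree, for example w_1 = (1, 0) at the origin and w_2 = (-1, 0) at (10, 0), points near each anchor are scored mainly by that anchor's model, so the overall decision function is nonlinear even though every piece is linear.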


Automated Species Classification Methods For Passive Acoustic Monitoring Of Beaked Whales, John Lebien Dec 2017

University of New Orleans Theses and Dissertations

The Littoral Acoustic Demonstration Center has collected passive acoustic monitoring data in the northern Gulf of Mexico since 2001. Recordings made in 2007 near the site of the Deepwater Horizon oil spill provide a baseline for an extensive study of regional marine mammal populations in response to the disaster. Animal density estimates can be derived from detections of echolocation signals in the acoustic data. Beaked whales are of particular interest as they remain one of the least understood groups of marine mammals, and relatively few abundance estimates exist. Efficient methods for classifying detected echolocation transients are essential for mining long-term passive …


Process Models Discovery And Traces Classification: A Fuzzy-BPMN Mining Approach, Kingsley Okoye Dr, Usman Naeem Dr, Syed Islam Dr, Abdel-Rahman H. Tawil Dr, Elyes Lamine Dr Dec 2017

Journal of International Technology and Information Management

The discovery of useful or worthwhile process models must be performed with due regard to the transformation that needs to be achieved. The blend of data representations (i.e., data mining) and process modelling methods, often allied to the field of Process Mining (PM), has proven to be effective in the process analysis of the event logs readily available in many organisations' information systems. Moreover, process discovery has lately been seen as the most important and most visible intellectual challenge related to process mining. The method involves automatic construction of process models from event logs about any domain …


On Profiling Bots In Social Media, Richard J. Oentaryo, Arinto Murdopo, Philips K. Prasetyo, Ee Peng Lim Nov 2016

Research Collection School Of Computing and Information Systems

The popularity of social media platforms such as Twitter has led to the proliferation of automated bots, creating both opportunities and challenges in information dissemination, user engagement, and quality of service. Past work on profiling bots has focused largely on malicious bots, with the assumption that these bots should be removed. In this work, however, we find many bots that are benign, and propose a new, broader categorization of bots based on their behaviors, which includes broadcast, consumption, and spam bots. To facilitate comprehensive analyses of bots and how they compare to human accounts, we develop a systematic profiling …


Computerized Classification Of Surface Spikes In Three-Dimensional Electron Microscopic Reconstructions Of Viruses, Younes Benkarroum Sep 2016

Dissertations, Theses, and Capstone Projects

The purpose of this research is to develop computer techniques for improved three-dimensional (3D) reconstruction of viruses from electron microscopic images of them and for the subsequent improved classification of the surface spikes in the resulting reconstruction. The broader impact of such work is the following.

Influenza is an infectious disease caused by rapidly-changing viruses that appear seasonally in the human population. New strains of influenza viruses appear every year, with the potential to cause a serious global pandemic. Two kinds of spikes – hemagglutinin (HA) and neuraminidase (NA) – decorate the surface of the virus particles and these proteins …


Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao Jan 2015

Zhongmei Yao

Clustering is well-suited for Web mining by automatically organizing Web pages into categories, each of which contains Web pages having similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain in particular, currently there is no such method suitable for Web page clustering. In an attempt to address this problem, we discover a constant factor that characterizes the Web domain, based on which we propose a new method for automatically determining the number of clusters in Web page data sets. We discover that the …


On Predicting User Affiliations Using Social Features In Online Social Networks, Minh Thap Nguyen Mar 2014

Dissertations and Theses Collection (Open Access)

User profiling, such as user affiliation prediction in online social networks, is a challenging task with many important applications in targeted marketing and personalized recommendation. The research task here is to predict user affiliation attributes that suggest user participation in different social groups.


Online Feature Selection And Its Applications, Jialei Wang, Peilin Zhao, Steven C. H. Hoi, Rong Jin Mar 2014

Research Collection School Of Computing and Information Systems

Feature selection is an important technique for data mining. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of online feature selection (OFS) in …


MKBoost: A Framework Of Multiple Kernel Boosting, Hao Xia, Steven C. H. Hoi Jul 2013

Research Collection School Of Computing and Information Systems

Multiple kernel learning (MKL) is a promising family of machine learning algorithms using multiple kernel functions for various challenging data mining tasks. Conventional MKL methods often formulate the problem as an optimization task of learning the optimal combinations of both kernels and classifiers, which usually results in challenging optimization problems that are often difficult to solve. Different from the existing MKL methods, in this paper we investigate a boosting framework of MKL for classification tasks, i.e., we adopt boosting to solve a variant of the MKL problem, which avoids solving the complicated optimization tasks. Specifically, we present …


Cost-Sensitive Online Classification, Jialei Wang, Peilin Zhao, Steven C. H. Hoi Dec 2012

Research Collection School Of Computing and Information Systems

Both cost-sensitive classification and online learning have been extensively studied in the data mining and machine learning communities. However, very few studies address an important intersecting problem, namely “Cost-Sensitive Online Classification”. In this paper, we formally study this problem and propose a new framework for Cost-Sensitive Online Classification that directly optimizes cost-sensitive measures using online gradient descent techniques. Specifically, we propose two novel cost-sensitive online classification algorithms, which are designed to directly optimize two well-known cost-sensitive measures: (i) maximization of the weighted sum of sensitivity and specificity, and (ii) minimization of the weighted misclassification cost. We analyze the theoretical bounds of …
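The two cost-sensitive measures named in the abstract can be computed from a confusion matrix. The sketch below assumes labels in {+1, -1}, a positive-class weight `eta_p` with eta_n = 1 - eta_p for the weighted sum, and per-error costs `c_p` (for a missed positive) and `c_n` (for a false alarm). These parameter and function names are illustrative assumptions, and the online learning algorithms themselves are not shown.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp) for labels in {+1, -1}."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        else:
            if p == 1:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def weighted_sum(y_true: Sequence[int], y_pred: Sequence[int], eta_p: float = 0.5) -> float:
    """eta_p * sensitivity + (1 - eta_p) * specificity; higher is better.

    Assumes both classes occur in y_true (otherwise a denominator is zero).
    """
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return eta_p * sensitivity + (1.0 - eta_p) * specificity


def weighted_cost(
    y_true: Sequence[int], y_pred: Sequence[int], c_p: float = 1.0, c_n: float = 1.0
) -> float:
    """c_p * (#false negatives) + c_n * (#false positives); lower is better."""
    _, fn, _, fp = confusion_counts(y_true, y_pred)
    return c_p * fn + c_n * fp
```

For instance, with 4 positives and 6 negatives, one missed positive and one false alarm give sensitivity 0.75 and specificity 5/6; with c_p = 5 and c_n = 1 the weighted cost is 6.0, so the missed positive dominates, which is the kind of asymmetry cost-sensitive learning is meant to capture.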


Online Feature Selection For Mining Big Data, Steven C. H. Hoi, Jialei Wang, Peilin Zhao, Rong Jin Aug 2012

Research Collection School Of Computing and Information Systems

Most studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or when it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS), in which the online learner is only allowed to maintain a classifier involving a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate predictions using a small and fixed number of active features. …
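The constraint that the learner may keep only a small, fixed number of active features can be illustrated with a perceptron-style update followed by truncation to the B largest-magnitude weights. This is a minimal sketch under that assumption, not necessarily the exact OFS algorithm from the article (which may, for example, include additional steps such as a projection); `budget` (B), `eta`, and the function names are illustrative.

```python
import heapq
from typing import List, Sequence, Tuple


def truncate(w: List[float], budget: int) -> List[float]:
    """Keep the `budget` largest-magnitude weights and zero out the rest."""
    keep = set(heapq.nlargest(budget, range(len(w)), key=lambda i: abs(w[i])))
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]


def ofs_train(
    stream: Sequence[Tuple[List[float], int]],
    dim: int,
    budget: int,
    eta: float = 0.2,  # illustrative step size (assumption)
) -> List[float]:
    """Perceptron-style online learner that never holds more than `budget` nonzero weights.

    Labels are assumed to be in {-1, +1}.
    """
    w = [0.0] * dim
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= 0:  # mistake or zero margin: update, then re-impose the budget
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
            w = truncate(w, budget)
    return w
```

Because truncation runs after every update, the classifier never uses more than `budget` features at prediction time, which matches the fixed-size constraint stated in the abstract.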


Collaborative Online Learning Of User Generated Content, Guangxia Li, Kuiyu Chang, Steven C. H. Hoi, Wenting Liu, Ramesh Jain Oct 2011

Research Collection School Of Computing and Information Systems

We study the problem of online classification of user generated content, with the goal of efficiently learning to categorize content generated by individual users. This problem is challenging for several reasons. First, the huge amount of user generated content demands a highly efficient and scalable classification solution. Second, the categories are typically highly imbalanced, i.e., samples from a particular useful class could be few and far between compared to some others (the majority class). In some applications, like spam detection, identifying the minority class often has significantly greater value than identifying the majority class. Last …


A Novel Framework For Efficient Automated Singer Identification In Large Music Databases, Jialie Shen, John Shepherd, Bin Cui, Kian-Lee Tan May 2009

Research Collection School Of Computing and Information Systems

Over the past decade, there has been explosive growth in the availability of multimedia data, particularly image, video, and music. Because of this, content-based music retrieval has attracted attention from the multimedia database and information retrieval communities. Content-based music retrieval requires us to be able to automatically identify particular characteristics of music data. One such characteristic, useful in a range of applications, is the identification of the singer in a musical piece. Unfortunately, existing approaches to this problem suffer from either low accuracy or poor scalability. In this article, we propose a novel scheme, called Hybrid Singer Identifier (HSI), for …


A Model For Anticipatory Event Detection, Qi He, Kuiyu Chang, Ee Peng Lim Nov 2006

Research Collection School Of Computing and Information Systems

Event detection is an important area of research that aims to discover new events reported in a stream of text documents. Previous research in event detection has largely focused on finding the first story and tracking the events of a specific topic. A topic is simply a set of related events defined by user-supplied keywords with no associated semantics and little domain knowledge. We therefore introduce the Anticipatory Event Detection (AED) problem: given some user-preferred event transition in a topic, detect the occurrence of the transition in the stream of news covering the topic. We confine the events to …


FISA: Feature-Based Instance Selection For Imbalanced Text Classification, Aixin Sun, Ee Peng Lim, Boualem Benatallah, Mahbub Hassan Apr 2006

Research Collection School Of Computing and Information Systems

Support Vector Machine (SVM) classifiers are widely used in text classification tasks, and these tasks often involve imbalanced training data. In this paper, we specifically address the cases where negative training documents significantly outnumber the positive ones. A generic algorithm known as FISA (Feature-based Instance Selection Algorithm) is proposed to select only a subset of negative training documents for training an SVM classifier. With a smaller, carefully selected training set, an SVM classifier can be trained more efficiently while delivering comparable or better classification accuracy. In our experiments on the 20-Newsgroups dataset, using only 35% negative training examples and 60% learning …


WEBARC: Website Archival Using A Structured Approach, Ee Peng Lim, Maria Marissa Dec 2005

Research Collection School Of Computing and Information Systems

Website archival refers to the task of monitoring and storing snapshots of website(s) for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WEBARC, a set of software tools that allows users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and are subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to …


Translation Initiation Sites Prediction With Mixture Gaussian Models In Human cDNA Sequences, G. Li, Tze-Yun Leong, Louxin Zhang Aug 2005

Research Collection School Of Computing and Information Systems

Translation initiation sites (TISs) are important signals in cDNA sequences. Many research efforts have tried to predict TISs in cDNA sequences. In this paper, we propose to use mixture Gaussian models for TIS prediction. Using both local features and some features generated from global measures, the proposed method predicts TISs with a sensitivity of 98 percent and a specificity of 93.6 percent. Our method outperforms many other existing methods in sensitivity while keeping specificity high. We attribute the improvement in sensitivity to the nature of the global features and the mixture Gaussian models. © 2005 IEEE.


Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao Jun 2005

Computer Science Faculty Publications

Clustering is well-suited for Web mining by automatically organizing Web pages into categories, each of which contains Web pages having similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain in particular, currently there is no such method suitable for Web page clustering. In an attempt to address this problem, we discover a constant factor that characterizes the Web domain, based on which we propose a new method for automatically determining the number of clusters in Web page data sets. We discover that the …


Blocking Reduction Strategies In Hierarchical Text Classification, Ee Peng Lim, Aixin Sun, Wee-Keong Ng, Jaideep Srivastava Oct 2004

Research Collection School Of Computing and Information Systems

One common approach in hierarchical text classification involves associating classifiers with nodes in the category tree and classifying text documents in a top-down manner. Classification methods using this top-down approach can scale well and cope with changes to the category trees. However, all these methods suffer from blocking, which refers to documents that are wrongly rejected by the classifiers at higher levels and therefore cannot be passed to the classifiers at lower levels. We propose a classifier-centric performance measure known as blocking factor to determine the extent of the blocking. Three methods are proposed to address the blocking problem, namely, threshold reduction, restricted voting, and …
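Blocking can be reproduced with a toy top-down classifier: each node has a scoring function and a threshold, and a document descends only through nodes that accept it. In the sketch below, the keyword-lookup scorers, the 0.5 default threshold, and all names are hypothetical, chosen for illustration. Lowering the threshold of the higher-level node shows the basic idea behind threshold reduction, one of the remedies named in the abstract; the other proposed remedies are not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Doc = Dict[str, float]


@dataclass
class Node:
    name: str
    score: Callable[[Doc], float]  # hypothetical per-node classifier score
    threshold: float = 0.5         # illustrative acceptance threshold
    children: List["Node"] = field(default_factory=list)


def classify_top_down(doc: Doc, root: Node) -> List[str]:
    """Return the leaf categories a document reaches.

    A rejection at an internal node blocks the document from every classifier below it.
    """
    reached: List[str] = []
    frontier = [root]
    while frontier:
        node = frontier.pop()
        if node.score(doc) < node.threshold:
            continue  # rejected here; descendants never see the document
        if node.children:
            frontier.extend(node.children)
        else:
            reached.append(node.name)
    return reached


# Toy tree: root -> sports -> football, scored by keyword weights in the document.
football = Node("football", lambda d: d.get("football", 0.0))
sports = Node("sports", lambda d: d.get("sports", 0.0), children=[football])
root = Node("root", lambda d: 1.0, children=[sports])

doc = {"sports": 0.4, "football": 0.9}  # truly a football document
print(classify_top_down(doc, root))  # [] (blocked at "sports")
sports.threshold = 0.3               # threshold reduction at the higher level
print(classify_top_down(doc, root))  # ['football']
```

The football classifier would accept this document, but with the default threshold it never sees it; reducing the higher-level threshold trades some extra documents passed downward for fewer blocked ones.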