Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 29 of 29
Full-Text Articles in Physical Sciences and Mathematics
Determining What Characteristics Constitute A Darknet, Symon Aked, Christopher Bolan, Murray Brand
Australian Information Security Management Conference
Privacy on the Internet has always been a concern, but monitoring of content by both private corporations and government departments has pushed people to search for ways to communicate over the Internet in a more secure manner. This has given rise to the creation of Darknets, which are networks that operate “inside” the Internet and allow anonymous participation via a de‐centralised, encrypted, peer‐to‐peer network topology. This research investigates some sources of known Internet content monitoring, and how they provided the template for the creation of a system to avoid such surveillance. It then highlights how communication on the Clearnet is …
Data Mining Revision Controlled Document History Metadata For Automatic Classification, Dustin Maass
Theses and Dissertations
Version controlled documents provide a complete history of the changes to the document, including everything from what was changed to who made the change and much more. Through the use of cluster analysis and several sets of manipulated data, this research examines the revision history of Wikipedia in an attempt to find language-independent patterns that could assist in automatic page classification software. Utilizing two sample data sets and applying the aforementioned cluster analysis, no conclusive evidence was found that would indicate that such patterns exist. Our work on the software, however, does provide a foundation for more possible types of …
Extraction And Classification Of Drug-Drug Interaction From Biomedical Text Using A Two-Stage Classifier, Majid Rastegar-Mojarad
Theses and Dissertations
One of the critical causes of medical errors is drug-drug interaction (DDI), which occurs when one drug increases or decreases the effect of another drug. We propose a machine learning system to extract and classify drug-drug interactions from the biomedical literature, using the annotated corpus from the DDIExtraction-2013 shared task challenge. Our approach applies a two-stage classifier to handle the highly unbalanced class distribution in the corpus. The first stage is designed for binary classification of drug pairs as interacting or non-interacting, and the second stage for further classification of interacting pairs into one of four interaction types: advise, effect, …
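The two-stage idea in this abstract can be sketched as a simple cascade. This is only an illustration, not the thesis's system: the function names, the feature keys (`cue_words`, `modal_verbs`), and the toy rules standing in for trained models are all hypothetical, and only the two interaction types visible in the truncated abstract are used.

```python
from typing import Callable, Mapping, Optional

Features = Mapping[str, float]


def two_stage_classify(
    pair: Features,
    is_interacting: Callable[[Features], bool],
    interaction_type: Callable[[Features], str],
) -> Optional[str]:
    """Stage 1 filters out non-interacting pairs; stage 2 labels the rest."""
    if not is_interacting(pair):
        return None  # stage 2 only ever sees pairs judged to interact
    return interaction_type(pair)


# Hypothetical stand-ins for trained models (not taken from the thesis).
def toy_stage1(pair: Features) -> bool:
    return pair.get("cue_words", 0) > 0


def toy_stage2(pair: Features) -> str:
    return "advise" if pair.get("modal_verbs", 0) > 0 else "effect"


labels = [
    two_stage_classify(p, toy_stage1, toy_stage2)
    for p in (
        {"cue_words": 0, "modal_verbs": 1},
        {"cue_words": 2, "modal_verbs": 1},
        {"cue_words": 1, "modal_verbs": 0},
    )
]
```

Separating the binary decision from the type decision means the second model is trained and applied only on the minority of pairs that interact, which is one plausible way to cope with a heavily unbalanced corpus.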
Analysis Of Polarimetric Synthetic Aperture Radar And Passive Visible Light Polarimetric Imaging Data Fusion For Remote Sensing Applications, Sanjit Maitra
Theses
The recent launch of spaceborne (TerraSAR-X, RADARSAT-2, ALOS-PALSAR, RISAT) and airborne (SIRC, AIRSAR, UAVSAR, PISAR) polarimetric radar sensors, with the capability of imaging through day and night in almost all weather conditions, has made polarimetric synthetic aperture radar (PolSAR) image interpretation and analysis an active area of research. PolSAR image classification is sensitive to object orientation and scattering properties. In recent years, significant work has been done in many areas including agriculture, forestry, oceanography, geology, and terrain analysis. Visible light passive polarimetric imaging has also emerged as a powerful tool in remote sensing for enhanced information extraction. The intensity image provides information …
Regularization Methods For Predicting An Ordinal Response Using Longitudinal High-Dimensional Genomic Data, Jiayi Hou
Theses and Dissertations
Ordinal scales are commonly used to measure health status and disease-related outcomes in hospital settings as well as in translational medical research. Notable examples include cancer staging, which is a five-category ordinal scale indicating tumor size, node involvement, and likelihood of metastasizing. The Glasgow Coma Scale (GCS), which gives a reliable and objective assessment of a patient's conscious state, is also an ordinal measure. In addition, repeated measurements are common in clinical practice for tracking and monitoring the progression of complex diseases. Classical ordinal modeling methods based on the likelihood approach have contributed to the analysis of data in …
Coupling Self-Organizing Maps With A Naïve Bayesian Classifier: Stream Classification Studies Using Multiple Assessment Data, Nikolaos Fytilis, Donna M. Rizzo
College of Engineering and Mathematical Sciences Faculty Publications
Organizing or clustering data into natural groups is one of the most fundamental aspects of understanding and mining information. The recent explosion in sensor networks and data storage associated with hydrological monitoring has created a huge potential for automating data analysis and classification of large, high-dimensional data sets. In this work, we develop a new classification tool that couples a Naïve Bayesian classifier with a neural network clustering algorithm (i.e., Kohonen Self-Organizing Map (SOM)). The combined Bayesian-SOM algorithm reduces classification error by leveraging the Bayesian's ability to accommodate parameter uncertainty with the SOM's ability to reduce high-dimensional data to lower …
Predictive Handling Of Asynchronous Concept Drifts In Distributed Environments, Hock Hee Ang, Vivek Gopalkrishnan, Indre Zliobaite, Mykola Pechenizkiy, Steven C. H. Hoi
Research Collection School Of Computing and Information Systems
In a distributed computing environment, peers collaboratively learn to classify concepts of interest from each other. When external changes happen and their concepts drift, the peers should adapt to avoid an increase in misclassification errors. The problem of adaptation becomes more difficult when the changes are asynchronous, i.e., when peers experience drifts at different times. We address this problem by developing an ensemble approach, PINE, that combines reactive adaptation via drift detection with proactive handling of upcoming changes via early warning and adaptation across the peers. In an empirical study on simulated and real-world data sets, we show that PINE handles asynchronous …
An Investigation Of Decision Analytic Methodologies For Stress Identification, Yong Deng, Chao-Hsien Chu, Huayou Si, Qixun Zhang, Zhonghai Wu
Research Collection School Of Computing and Information Systems
In modern society, more and more people are suffering from some type of stress. Monitoring stress levels and detecting stress in time would help a person take countermeasures. In this paper, we investigate the use of decision-analytic methodologies to detect stress. We present a new feature selection method based on principal component analysis (PCA), compare three feature selection methods, and evaluate five information fusion methods for stress detection. A driving stress data set created by the MIT Media Lab is used to evaluate the relative performance of these methods. Our study shows that the …
Will Fault Localization Work For These Failures? An Automated Approach To Predict Effectiveness Of Fault Localization Tools, Tien-Duy B. Le, David Lo
Research Collection School Of Computing and Information Systems
Debugging is a crucial yet expensive activity to improve the reliability of software systems. To reduce debugging cost, various fault localization tools have been proposed. A spectrum-based fault localization tool often outputs an ordered list of program elements sorted by their likelihood of being the root cause of a set of failures (i.e., their suspiciousness scores). Despite the many studies on fault localization, for many bugs the root causes are often low in the ordered list. This potentially causes developers to distrust fault localization tools. Recently, Parnin and Orso highlighted in their user study that many debuggers …
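To make "suspiciousness scores" concrete, here is a minimal sketch of spectrum-based ranking. The abstract does not say which scoring formula the tools use; the Ochiai formula below is one commonly used choice and is swapped in here purely as an illustration. The function names and the toy coverage data are hypothetical.

```python
import math


def ochiai(failed_cover: int, passed_cover: int, total_failed: int) -> float:
    """Ochiai score: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0


def rank_elements(coverage, outcomes):
    """Rank program elements from most to least suspicious.

    coverage: test name -> set of program elements the test executed
    outcomes: test name -> True if the test passed, False if it failed
    """
    total_failed = sum(1 for ok in outcomes.values() if not ok)
    elements = set().union(*coverage.values())
    scores = {}
    for el in elements:
        ef = sum(1 for t, cov in coverage.items() if el in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if el in cov and outcomes[t])
        scores[el] = ochiai(ef, ep, total_failed)
    # Highest score first; ties are broken by element name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical spectrum: three tests over three statements.
coverage = {"t1": {"s1", "s2"}, "t2": {"s1", "s3"}, "t3": {"s2"}}
outcomes = {"t1": True, "t2": False, "t3": True}
ranking = rank_elements(coverage, outcomes)
```

In this toy spectrum `s3` is executed only by the failing test, so it ranks first; a real fault can still land far down such a list, which is the situation the abstract describes.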
Assessment And Prediction Of Cardiovascular Status During Cardiac Arrest Through Machine Learning And Dynamical Time-Series Analysis, Sharad Shandilya
Theses and Dissertations
In this work, new methods of feature extraction, feature selection, stochastic data characterization/modeling, variance reduction and measures for parametric discrimination are proposed. These methods have implications for data mining, machine learning, and information theory. A novel decision-support system is developed in order to guide intervention during cardiac arrest. The models are built upon knowledge extracted with signal-processing, non-linear dynamic and machine-learning methods. The proposed ECG characterization, combined with information extracted from PetCO2 signals, shows viability for decision-support in clinical settings. The approach, which focuses on integration of multiple features through machine learning techniques, is well suited to the inclusion of multiple physiologic …
MKBoost: A Framework Of Multiple Kernel Boosting, Hao Xia, Steven C. H. Hoi
Research Collection School Of Computing and Information Systems
Multiple kernel learning (MKL) is a promising family of machine learning algorithms using multiple kernel functions for various challenging data mining tasks. Conventional MKL methods often formulate the problem as an optimization task of learning the optimal combinations of both kernels and classifiers, which usually results in challenging optimization tasks that are difficult to solve. Unlike existing MKL methods, in this paper we investigate a boosting framework of MKL for classification tasks, i.e., we adopt boosting to solve a variant of the MKL problem, which avoids solving the complicated optimization tasks. Specifically, we present …
Geometric Approach To Support Vector Machines Learning For Large Datasets, Robert Strack
Theses and Dissertations
The dissertation introduces Sphere Support Vector Machines (SphereSVM) and Minimal Norm Support Vector Machines (MNSVM) as new fast classification algorithms that use geometrical properties of the underlying classification problems to efficiently obtain models describing training data. SphereSVM is based on combining the minimal enclosing ball approach, state-of-the-art nearest point problem solvers, and probabilistic techniques. The blending of the three speeds up the training phase of SVMs significantly and reaches similar (i.e., practically the same) accuracy as the other classification models over several large real data sets within the strict validation frame of a double (nested) …
Integrative Biomarker Identification And Classification Using High Throughput Assays, Pan Tong
Dissertations & Theses (Open Access)
It is well accepted that tumorigenesis is a multi-step procedure involving aberrant functioning of genes regulating cell proliferation, differentiation, apoptosis, genome stability, angiogenesis and motility. To obtain a full understanding of tumorigenesis, it is necessary to collect information on all aspects of cell activity. Recent advances in high throughput technologies allow biologists to generate massive amounts of data, more than might have been imagined decades ago. These advances have made it possible to launch comprehensive projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), which systematically characterize the molecular fingerprints of cancer cells using gene expression, methylation, copy number, microRNA and SNP microarrays …
Classification Of Satellite Time Series-Derived Land Surface Phenology Focused On The Northern Fertile Crescent, Brian Embree Bunker
Graduate Theses and Dissertations
Land surface phenology describes events in a seasonal vegetation cycle and can be used in a variety of applications, from predicting the onset of future drought conditions, to revealing potential limits of historical dry farming, to guiding more accurate dating of archeological sites. Traditional methods of monitoring vegetation phenology use data collected in situ. However, vegetation health indices derived from satellite remote sensor data, such as the normalized difference vegetation index (NDVI), have been used as a proxy for vegetation phenology due to their repeated acquisition and broad area coverage. Land surface phenology is accessible in the NDVI satellite record when images …
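For readers unfamiliar with NDVI, it is computed per pixel from near-infrared and red reflectance as (NIR − Red) / (NIR + Red). The sketch below applies that standard formula to a short time series and picks the peak of the season. The reflectance values, the zero-signal convention, and the names `series` and `peak_index` are hypothetical and are not taken from the thesis.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index for one pixel's reflectances."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: report 0 for a pixel with no signal
    return (nir - red) / total


# Hypothetical (NIR, red) reflectance pairs for one pixel over four dates.
series = [(0.30, 0.20), (0.45, 0.15), (0.50, 0.10), (0.35, 0.18)]
ndvi_series = [ndvi(nir, red) for nir, red in series]

# Index of the date with the highest NDVI, a crude "peak greenness" marker.
peak_index = max(range(len(ndvi_series)), key=ndvi_series.__getitem__)
```

A phenology study would typically smooth such a series and extract several seasonal markers; the peak is shown here only because it is the simplest one to compute.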
Enhancement Of Random Forests Using Trees With Oblique Splits, Andrejus Parfionovas
All Graduate Theses and Dissertations, Spring 1920 to Summer 2023
Statistical classification is widely used in many areas where there is a need to make a data-driven decision, or to classify complicated cases or objects. For instance: disease diagnostics (is a patient sick or healthy, based on the blood test results?); weather forecasting (will there be a storm tomorrow, based on today's atmospheric pressure, air temperature, and wind velocity?); speech recognition (what was said over the phone, based on the caller's voice level and articulation); spam detection (can the unsolicited commercial e-mails be identified by their content?); and so on.
Classification trees …
The Net Reclassification Index (NRI): A Misleading Measure Of Prediction Improvement With Miscalibrated Or Overfit Models, Margaret Pepe, Jin Fang, Ziding Feng, Thomas Gerds, Jorgen Hilden
UW Biostatistics Working Paper Series
The Net Reclassification Index (NRI) is a very popular measure for evaluating the improvement in prediction performance gained by adding a marker to a set of baseline predictors. However, the statistical properties of this novel measure have not been explored in depth. We demonstrate the alarming result that the NRI statistic calculated on a large test dataset using risk models derived from a training set is likely to be positive even when the new marker has no predictive information. A related theoretical example is provided in which a miscalibrated risk model that includes an uninformative marker is proven to erroneously …
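As background for this abstract, the sketch below computes the category-free form of the NRI in common use: the share of events whose predicted risk goes up minus the share whose risk goes down, plus the share of nonevents whose risk goes down minus the share whose risk goes up. Whether the working paper uses exactly this variant is an assumption; the function name and sample risks are hypothetical.

```python
def net_reclassification_index(old_risk, new_risk, event):
    """Category-free NRI = event NRI + nonevent NRI."""
    up_e = down_e = n_e = 0  # counts among subjects with the event
    up_n = down_n = n_n = 0  # counts among subjects without the event
    for old, new, is_event in zip(old_risk, new_risk, event):
        if is_event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one nonevent")
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Hypothetical predicted risks from a baseline model and an expanded model.
old = [0.2, 0.4, 0.6, 0.3]
event = [1, 1, 0, 0]
nri_helpful = net_reclassification_index(old, [0.3, 0.5, 0.5, 0.2], event)
nri_neutral = net_reclassification_index(old, [0.3, 0.3, 0.7, 0.2], event)
```

Because the statistic only counts the direction of each risk change, small shifts caused by noise in a fitted model are counted the same way as large, meaningful ones.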
Ontology Matching Techniques: A 3-Tier Classification Framework, Nelson Leung, Seung Kang, Sim Lau, Joshua Fan
Joshua P Fan
No abstract provided.
Ontology Matching Techniques: A 3-Tier Classification Framework, Nelson Leung, Seung Kang, S Lau, Joshua Fan
Joshua P Fan
No abstract provided.
Compressed Sensing-Based Frequency Selection For Classification Of Ground Penetrating Radar Signals, Wenbin Shao, Abdesselam Bouzerdoum, Son Lam Phung
Professor Salim Bouzerdoum
In this paper we present an automatic classification system for ground penetrating radar (GPR) signals. The system extracts the magnitude spectra at resonant frequencies and classifies them using support vector machines. To locate the resonant frequencies, we propose an approach based on compressed sensing and orthogonal matching pursuit. The performance of the system is evaluated by classifying GPR traces from different ballast fouling conditions. The experimental results show that the proposed approach, compared to the approach of using frequencies at local maxima, represents the GPR signal more efficiently using a small number of coefficients, and obtains higher classification accuracy. 2012 …
Online Multiple Kernel Classification, Steven C. H. Hoi, Rong Jin, Peilin Zhao, Tianbao Yang
Research Collection School Of Computing and Information Systems
Although both online learning and kernel learning have been studied extensively in machine learning, there is limited effort in addressing the intersecting research problems of these two important topics. As an attempt to fill the gap, we address a new research problem, termed Online Multiple Kernel Classification (OMKC), which learns a kernel-based prediction function by selecting a subset of predefined kernel functions in an online learning fashion. OMKC is in general more challenging than typical online learning because both the kernel classifiers and the subset of selected kernels are unknown, and more importantly the solutions to the kernel classifiers and …
Distinction Of Lakes And Rivers On Satellite Images Using Mathematical Morphology, Przemysław Kupidura
Przemysław Kupidura
This paper concerns the application of mathematical morphology to object-oriented classification of satellite images. An example of distinguishing different bodies of water using the author's own algorithm is presented. Different types of water bodies, such as lakes and rivers, are easy to differentiate when visually interpreted. However, they are much more difficult to differentiate using a traditional, pixel-based classification process. Mathematical morphology operations, which take into account such important object features as shape and size, allow these two types of water bodies to be distinguished in object classification. The proposed algorithm allows practically error-free classification. The results show that …
Cyfrowe Przetwarzanie Zdjęć Satelitarnych [Digital Processing Of Satellite Images], Przemysław Kupidura, Piotr Podlasiak
Przemysław Kupidura
No abstract provided.
A Convex Optimization Algorithm For Sparse Representation And Applications In Classification Problems, Reinaldo Sanchez Arias
Open Access Theses & Dissertations
In pattern recognition and machine learning, a classification problem refers to finding an algorithm for assigning a given input to one of several categories. Many natural signals are sparse or compressible in the sense that they have short representations when expressed in a suitable basis. Motivated by the recent successful development of algorithms for sparse signal recovery, we apply the selective nature of sparse representation to perform classification. Any test sample is represented in an overcomplete dictionary with the training samples as base elements. A given test sample can be expressed as a linear combination of only those training …
Knowledge Extraction From Survey Data Using Neural Networks, Khan Imran, Arun Kulkarni
Computer Science Faculty Publications and Presentations
Surveys are an important tool for researchers. It is increasingly important to develop powerful means for analyzing such data and to extract knowledge that could help in decision-making. Survey attributes are typically discrete data measured on a Likert scale. The process of classification becomes complex if the number of survey attributes is large. Another major issue in Likert-Scale data is the uniqueness of tuples. A large number of unique tuples may result in a large number of patterns. The main focus of this paper is to propose an efficient knowledge extraction method that can extract knowledge in terms of rules. …
A Rule Induction Algorithm For Knowledge Discovery And Classification, Ömer Akgöbek
Turkish Journal of Electrical Engineering and Computer Sciences
Classification and rule induction are key topics in the fields of decision making and knowledge discovery. The objective of this study is to present a new algorithm developed for automatic knowledge acquisition in data mining. The proposed algorithm has been named RES-2 (Rule Extraction System). It aims at eliminating the pitfalls and disadvantages of the techniques and algorithms currently in use. The proposed algorithm makes use of the direct rule extraction approach, rather than the decision tree. For this purpose, it uses a set of examples to induce general rules. In this study, 15 datasets consisting of multiclass values with …
Detection Of Microcalcification Clusters In Digitized X-Ray Mammograms Using Unsharp Masking And Image Statistics, Pelin Kuş, İrfan Karagöz
Turkish Journal of Electrical Engineering and Computer Sciences
A fully automated method for detecting microcalcification (MC) clusters in regions of interest (ROIs) extracted from digitized X-ray mammograms is proposed. In the first stage, an unsharp masking is used to perform the contrast enhancement of the MCs. In the second stage, the ROIs are decomposed into a 2-level contourlet representation and the reconstruction is obtained by eliminating the low-frequency subband in the second level. In the third stage, statistical textural features are extracted from the ROIs and they are classified using support vector machines. To test the performance of the method, 57 ROIs selected from the Mammographic Image Analysis …
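The first stage mentioned in this abstract, unsharp masking, adds back a scaled difference between an image and a blurred copy of it: sharpened = original + amount × (original − blurred). The sketch below is a minimal illustration under stated assumptions: it uses a 3×3 mean blur with edge replication and an `amount` of 1.0, none of which is specified in the abstract, and the function names are hypothetical.

```python
def box_blur(img):
    """3x3 mean filter; border pixels reuse the nearest valid neighbour."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def unsharp_mask(img, amount=1.0):
    """Return original + amount * (original - blurred), pixel by pixel."""
    blurred = box_blur(img)
    return [
        [p + amount * (p - b) for p, b in zip(row, blurred_row)]
        for row, blurred_row in zip(img, blurred)
    ]


# Hypothetical 3x3 patch with one bright spot, standing in for a small MC.
patch = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
sharpened = unsharp_mask(patch)
```

The bright centre pixel is pushed further above its surroundings, which is the contrast-enhancement effect the method relies on; real pipelines usually clip the result back into the valid intensity range.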
Hybrid Of Genetic Algorithm And Great Deluge Algorithm For Rough Set Attribute Reduction, Najmeh Sadat Jaddi, Salwani Abdullah
Turkish Journal of Electrical Engineering and Computer Sciences
The attribute reduction problem is the process of reducing unimportant attributes from a decision system to decrease the difficulty of data mining or knowledge discovery tasks. Many algorithms have been used to optimize this problem in rough set theory. The genetic algorithm (GA) is one of the algorithms that has already been applied to optimize this problem. This paper proposes 2 kinds of memetic algorithms, which are a hybridization of the GA, with 2 versions (linear and nonlinear) of the great deluge (GD) algorithm. The purpose of this hybridization is to investigate the ability of this local search algorithm to …
Context Aware Privacy Preserving Clustering And Classification, Nirmal Thapa
Theses and Dissertations--Computer Science
Data are valuable assets to any organization or individual. Data are sources of useful information, which plays a large part in decision making. All sectors have the potential to benefit from having information. Commerce, health, and research are some of the fields that have benefited from data. On the other hand, the availability of the data makes it easy for anyone to exploit the data, which in many cases are private and confidential. It is necessary to preserve the confidentiality of the data. We study two categories of privacy: Data Value Hiding and Data Pattern Hiding. Privacy is a huge concern …
Instrument And Method Development For Single-Cell Classification Using Fluorescence Imaging Multivariate Optical Computing, Joseph Swanstrom
Theses and Dissertations
Multivariate optical computing (MOC) is an all-optical approach to predictive spectroscopy that utilizes multivariate calibration and spectral pattern recognition techniques while operating in a simple filter photometer instrument, removing the need for expensive instrumentation and post-processing of spectral data. This is accomplished with specially designed interference filters called multivariate optical elements (MOEs). MOC can provide analytical solutions for applications requiring low-cost, rugged, and simple-to-operate instrumentation for use in remote and hazardous environments such as open ocean waters. These instrument specifications are central for developing a method for classifying phytoplankton in their natural environment. Phytoplankton are photosynthetic single …