Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

Theses/Dissertations

Classification

Articles 91 - 120 of 122

Full-Text Articles in Physical Sciences and Mathematics

Using Instance-Level Meta-Information To Facilitate A More Principled Approach To Machine Learning, Michael Reed Smith Apr 2015

Theses and Dissertations

As the capability for capturing and storing data increases and becomes more ubiquitous, an increasing number of organizations are looking to use machine learning techniques as a means of understanding and leveraging their data. However, the success of applying machine learning techniques depends on which learning algorithm is selected, the hyperparameters that are provided to the selected learning algorithm, and the data that is supplied to the learning algorithm. Even among machine learning experts, selecting an appropriate learning algorithm, setting its associated hyperparameters, and preprocessing the data can be a challenging task and is generally left to the expertise of …


Feature Selection And Classification Methods For Decision Making: A Comparative Analysis, Osiris Villacampa Jan 2015

CCE Theses and Dissertations

The use of data mining methods in corporate decision making has been increasing in the past decades. Its popularity can be attributed to better use of data mining algorithms, increased computer performance, and results that can be measured and applied to decision making. The effective use of data mining methods to analyze various types of data has shown great advantages in various application domains. Some data sets need little preparation to be mined, whereas others, in particular high-dimensional data sets, need to be preprocessed in order to be mined due to the complexity and inefficiency in mining high dimensional …


Novel Classification Of Slow Movement Objects In Urban Traffic Environments Using Wideband Pulse Doppler Radar, Berta Rodriguez Hervas Jan 2015

Open Access Theses & Dissertations

Every year thousands of people are involved in traffic accidents, some of which are fatal. An important percentage of these fatalities are caused by human error, which could be prevented by increasing the awareness of drivers and the autonomy of vehicles. Since driver assistance systems have the potential to positively impact tens of millions of people, the purpose of this research is to study the micro-Doppler characteristics of vulnerable urban traffic components, i.e. pedestrians and bicyclists, based on information obtained from radar backscatter, and to develop a classification technique that allows automatic target recognition with a vehicle integrated system. For …


Contrast Pattern Aided Regression And Classification, Vahid Taslimitehrani Jan 2015

Browse all Theses and Dissertations

Regression and classification techniques play an essential role in many data mining tasks and have broad applications. However, most state-of-the-art regression and classification techniques are often unable to adequately model the interactions among predictor variables in highly heterogeneous datasets. New techniques that can effectively model such complex and heterogeneous structures are needed to significantly improve prediction accuracy. In this dissertation, we propose a novel type of accurate and interpretable regression and classification models, named Pattern Aided Regression (PXR) and Pattern Aided Classification (PXC) respectively. Both PXR and PXC rely on identifying regions in the data space where …


Intelligent Network Intrusion Detection Using An Evolutionary Computation Approach, Samaneh Rastegari Jan 2015

Theses: Doctorates and Masters

With the enormous growth of users' reliance on the Internet, the need for secure and reliable computer networks also increases. Availability of effective automatic tools for carrying out different types of network attacks raises the need for effective intrusion detection systems.

Generally, a comprehensive defence mechanism consists of three phases, namely, preparation, detection and reaction. In the preparation phase, network administrators aim to find and fix security vulnerabilities (e.g., insecure protocols and vulnerable computer systems or firewalls) that can be exploited to launch attacks. Although the preparation phase increases the level of security in a network, this will never completely …


Gender And Ethnicity Classification Using Partial Face In Biometric Applications, Jamie Lyle Dec 2014

All Dissertations

As the number of biometric applications increases, the use of non-ideal information such as images which are not strictly controlled, images taken covertly, or images where the main interest is partially occluded, also increases. Face images are a specific example of this. In these non-ideal instances, other information, such as gender and ethnicity, can be determined to narrow the search space and/or improve the recognition results. Some research exists for gender classification using partial-face images, but there is little research involving ethnic classifications on such images. Few datasets have had the ethnic diversity needed and sufficient subjects for each ethnicity …


On Predicting User Affiliations Using Social Features In Online Social Networks, Minh Thap Nguyen Mar 2014

Dissertations and Theses Collection (Open Access)

User profiling, such as user affiliation prediction in online social networks, is a challenging task with many important applications in targeted marketing and personalized recommendation. The research task here is to predict some user affiliation attributes that suggest user participation in different social groups.


Sketchart: A Pen-Based Tool For Chart Generation And Interaction., Andres Vargas Gonzalez Jan 2014

Electronic Theses and Dissertations

It has been shown that representing data with the right visualization increases the understanding of qualitative and quantitative information encoded in documents. However, current tools for generating such visualizations involve the use of traditional WIMP techniques, which perhaps makes free interaction and direct manipulation of the content harder. In this thesis, we present a pen-based prototype for data visualization using 10 different types of bar based charts. The prototype lets users sketch a chart and interact with the information once the drawing is identified. The prototype's user interface consists of an area to sketch and touch based elements that will …


Automated Classification Of Malignant Melanoma Based On Detection Of Atypical Pigment Network In Dermoscopy Images Of Skin Lesions, Nabin K. Mishra Jan 2014

Doctoral Dissertations

Melanoma causes more deaths than any other form of skin cancer. Early melanoma detection is important to prevent progression to a more deadly stage. Automated computer-based identification of melanoma from dermoscopic images of skin lesions is the most efficient method in early diagnosis. An automated melanoma identification system must include multiple steps, involving lesion segmentation, feature extraction, feature combination and classification. In this research, a classifier-based approach for automatically selecting a lesion border mask for segmentation of dermoscopic skin lesion images is presented. A logistic regression based model selects a single lesion border mask from multiple border masks generated by …


Data Mining Revision Controlled Document History Metadata For Automatic Classification, Dustin Maass Dec 2013

Theses and Dissertations

Version controlled documents provide a complete history of the changes to the document, including everything from what was changed to who made the change and much more. Through the use of cluster analysis and several sets of manipulated data, this research examines the revision history of Wikipedia in an attempt to find language-independent patterns that could assist in automatic page classification software. Utilizing two sample data sets and applying the aforementioned cluster analysis, no conclusive evidence was found that would indicate that such patterns exist. Our work on the software, however, does provide a foundation for more possible types of …


Extraction And Classification Of Drug-Drug Interaction From Biomedical Text Using A Two-Stage Classifier, Majid Rastegar-Mojarad Dec 2013

Theses and Dissertations

One of the critical causes of medical errors is Drug-Drug interaction (DDI), which occurs when one drug increases or decreases the effect of another drug. We propose a machine learning system to extract and classify drug-drug interactions from the biomedical literature, using the annotated corpus from the DDIExtraction-2013 shared task challenge. Our approach applies a two-stage classifier to handle the highly unbalanced class distribution in the corpus. The first stage is designed for binary classification of drug pairs as interacting or non-interacting, and the second stage for further classification of interacting pairs into one of four interacting types: advise, effect, …
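The two-stage arrangement described in this abstract can be sketched roughly as a cascade: a binary detector runs first, and only pairs it accepts reach the type classifier. The sketch below is illustrative only. The function names, the `"none"` label, and the keyword rules standing in for trained classifiers are all assumptions, not the thesis's implementation, and only the two interaction types named in the visible part of the abstract are used.

```python
from typing import Callable

NO_INTERACTION = "none"  # hypothetical label for non-interacting pairs


def two_stage_classify(
    sentence: str,
    detect: Callable[[str], bool],
    categorize: Callable[[str], str],
) -> str:
    """Run the binary stage first; call the type stage only on positives."""
    if not detect(sentence):
        return NO_INTERACTION
    return categorize(sentence)


def toy_detect(sentence: str) -> bool:
    """Stand-in for stage one (a trained binary classifier in practice)."""
    s = sentence.lower()
    return "should not" in s or "increases" in s


def toy_categorize(sentence: str) -> str:
    """Stand-in for stage two; handles only 'advise' and 'effect'."""
    return "advise" if "should" in sentence.lower() else "effect"


if __name__ == "__main__":
    for text in (
        "DrugA should not be taken with DrugB",
        "DrugA increases the effect of DrugB",
        "DrugA and DrugB were both studied",
    ):
        print(text, "->", two_stage_classify(text, toy_detect, toy_categorize))
```

One reason such a cascade can help with unbalanced data is that the second stage is trained and evaluated only on pairs the first stage accepts, so the large non-interacting class does not dominate the type decision.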


Assessment And Prediction Of Cardiovascular Status During Cardiac Arrest Through Machine Learning And Dynamical Time-Series Analysis, Sharad Shandilya Jul 2013

Theses and Dissertations

In this work, new methods of feature extraction, feature selection, stochastic data characterization/modeling, variance reduction and measures for parametric discrimination are proposed. These methods have implications for data mining, machine learning, and information theory. A novel decision-support system is developed in order to guide intervention during cardiac arrest. The models are built upon knowledge extracted with signal-processing, non-linear dynamic and machine-learning methods. The proposed ECG characterization, combined with information extracted from PetCO2 signals, shows viability for decision-support in clinical settings. The approach, which focuses on integration of multiple features through machine learning techniques, is well suited to the inclusion of multiple physiologic …


Geometric Approach To Support Vector Machines Learning For Large Datasets, Robert Strack May 2013

Theses and Dissertations

The dissertation introduces Sphere Support Vector Machines (SphereSVM) and Minimal Norm Support Vector Machines (MNSVM) as new fast classification algorithms that use geometrical properties of the underlying classification problems to efficiently obtain models describing training data. SphereSVM is based on combining the minimal enclosing ball approach, state-of-the-art nearest point problem solvers, and probabilistic techniques. The blending of the three speeds up the training phase of SVMs significantly and reaches similar (i.e., practically the same) accuracy as the other classification models over several big and large real data sets within the strict validation frame of a double (nested) …


A Convex Optimization Algorithm For Sparse Representation And Applications In Classification Problems, Reinaldo Sanchez Arias Jan 2013

Open Access Theses & Dissertations

In pattern recognition and machine learning, a classification problem refers to finding an algorithm for assigning given input data to one of several categories. Many natural signals are sparse or compressible in the sense that they have short representations when expressed in a suitable basis. Motivated by the recent successful development of algorithms for sparse signal recovery, we apply the selective nature of sparse representation to perform classification. Any test sample is represented in an overcomplete dictionary with the training samples as base elements. A given test sample can be expressed as a linear combination of only those training …


Context Aware Privacy Preserving Clustering And Classification, Nirmal Thapa Jan 2013

Theses and Dissertations--Computer Science

Data are valuable assets to any organization or individual. Data are sources of useful information, which is a big part of decision making. All sectors have the potential to benefit from having information. Commerce, health, and research are some of the fields that have benefited from data. On the other hand, the availability of the data makes it easy for anyone to exploit the data, which in many cases are private, confidential data. It is necessary to preserve the confidentiality of the data. We study two categories of privacy: Data Value Hiding and Data Pattern Hiding. Privacy is a huge concern …


Fast And Efficient Classification, Tracking, And Simulation In Wireless Sensor Networks, Hao Jiang Aug 2012

All Dissertations

Wireless sensor networks are composed of large numbers of resource-lean sensors that collect low-level inputs from the physical world. The applications present challenges for programmers. On the one hand, lightweight algorithms are required given the limited capacity of the constituent devices. On the other, the algorithms must be scalable to accommodate large networks. In this thesis, we focus on the design and implementation of fast and lean (yet scalable) algorithms for classification, simulation, and target tracking in the context of wireless sensor networks. We briefly consider each of these challenges in turn.
The first challenge is to achieve high precision …


Composite Feature-Based Face Detection Using Skin Color Modeling And Svm Classification, Swathi Rajashekar May 2012

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

This report proposes a face detection algorithm based on skin color modeling and support vector machine (SVM) classification. Said classification is based on various face features used to detect specific faces in an input color image. A YCbCr color space is used to filter the skin color pixels from the input color image. Template matching is used on the result with various window sizes of the template created from an ORL face database. The candidates obtained above are then classified by SVM classifiers using the histogram of oriented gradients, eigen features, edge ratio, and edge statistics features.
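As a rough illustration of the skin-color filtering step mentioned in this abstract, the sketch below converts RGB pixels to YCbCr and keeps those whose chrominance falls inside a fixed box. The conversion uses the standard full-range ITU-R BT.601 (JPEG/JFIF) coefficients. The Cb and Cr bounds are assumed values of a kind often quoted for skin detection; the report's own thresholds are not given here, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (ITU-R BT.601, JPEG/JFIF form)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# Assumed chrominance bounds for skin; not taken from the report.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def is_skin(r, g, b):
    """Return True if the pixel's Cb and Cr both fall in the assumed ranges."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return (CB_RANGE[0] <= cb <= CB_RANGE[1]
            and CR_RANGE[0] <= cr <= CR_RANGE[1])


def skin_mask(image):
    """Map an image (rows of (r, g, b) tuples) to rows of booleans."""
    return [[is_skin(*pixel) for pixel in row] for row in image]


if __name__ == "__main__":
    tiny_image = [[(224, 172, 105), (0, 0, 255)]]
    print(skin_mask(tiny_image))
```

Thresholding on Cb and Cr while ignoring Y is a common design choice because it makes the filter less sensitive to overall brightness; the resulting mask would then feed later stages such as template matching.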


Recognizing Patterns In Transmitted Signals For Identification Purposes, Baha' A. Alsaify May 2012

Graduate Theses and Dissertations

The ability to identify and authenticate entities in cyberspace such as users, computers, cell phones, smart cards, and radio frequency identification (RFID) tags is usually accomplished by having the entity demonstrate knowledge of a secret key. When the entity is portable and physically accessible, like an RFID tag, it can be difficult to secure given the memory, processing, and economic constraints. This work proposes to use unique patterns in the transmitted signals caused by manufacturing differences to identify and authenticate a wireless device such as an RFID tag. Both manufacturer identification and tag identification are performed on a population of …


Fast Neural Network Algorithm For Solving Classification Tasks, Noor Albarakati Apr 2012

Theses and Dissertations

Classification is one of several applications in the neural network (NN) world. Multilayer perceptron (MLP) is the common neural network architecture used for classification tasks. It is famous for its error back propagation (EBP) algorithm, which opened a new way of solving classification problems given a set of empirical data. In the thesis, we performed experiments using three different NN structures in order to find the best MLP neural network structure for performing the nonlinear classification of multiclass data sets. The learning algorithm developed here is the batch EBP algorithm, which uses all the data as a …


Contributions To K-Means Clustering And Regression Via Classification Algorithms, Raied Salman Apr 2012

Theses and Dissertations

The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using support vector machines (SVMs). An extension to the most popular unsupervised clustering method, k-means algorithm, is proposed, dubbed k-means2 (k-means squared) algorithm, applicable to ultra large datasets. The main idea is based on using a small portion of the dataset in the first stage of the clustering. Thus, the centers of such a smaller …


Computer Methods For Pre-Microrna Secondary Structure Prediction, Dianwei Han Jan 2012

Theses and Dissertations--Computer Science

This thesis presents a new algorithm to predict the pre-microRNA secondary structure. An accurate prediction of the pre-microRNA secondary structure is important in miRNA informatics. Based on a recently proposed model, nucleotide cyclic motifs (NCM), to predict RNA secondary structure, we propose and implement a Modified NCM (MNCM) model with a physics-based scoring strategy to tackle the problem of pre-microRNA folding. Our microRNAfold is implemented using a global optimal algorithm based on the bottom-up local optimal solutions.

It has been shown that studying the functions of multiple genes and predicting the secondary structure of multiple related microRNA is more important …


Using Semantic Templates To Study Vulnerabilities Recorded In Large Software Repositories, Yan Wu Oct 2011

Student Work

Software vulnerabilities allow an attacker to reduce a system's Confidentiality, Availability, and Integrity by exposing information, executing malicious code, and undermining system functionalities that contribute to the overall system purpose and need. With new vulnerabilities discovered every day in a variety of applications and user environments, a systematic study of their characteristics is a subject of immediate need for the following reasons:

  • The high rate at which information about past and new vulnerabilities is accumulated makes it difficult to absorb and comprehend.
  • Rather than learning from past mistakes, similar types of vulnerabilities are observed repeatedly.
  • As the scale and complexity of …


Processing And Classification Of Physiological Signals Using Wavelet Transform And Machine Learning Algorithms, Abed Al-Raoof Bsoul Apr 2011

Theses and Dissertations

Over the last century, physiological signals have been broadly analyzed and processed not only to assess the function of the human physiology, but also to better diagnose illnesses or injuries and provide treatment options for patients. In particular, Electrocardiogram (ECG), blood pressure (BP) and impedance are among the most important biomedical signals processed and analyzed. The majority of studies that utilize these signals attempt to diagnose important irregularities such as arrhythmia or blood loss by processing one of these signals. However, the relationship between them is not yet fully studied using computational methods. Therefore, a system that extracts and combines …


Algorithms For Training Large-Scale Linear Programming Support Vector Regression And Classification, Pablo Rivas Perea Jan 2011

Open Access Theses & Dissertations

The main contribution of this dissertation is the development of a method to train a Support Vector Regression (SVR) model for the large-scale case where the number of training samples exceeds the available computational resources. The proposed scheme consists of posing the SVR problem entirely as a Linear Programming (LP) problem and of developing a sequential optimization method based on variables decomposition, constraints decomposition, and the use of primal-dual interior point methods. Experimental results demonstrate that the proposed approach has comparable performance with other SV-based classifiers. Particularly, experiments demonstrate that as the problem size increases, the sparser the solution …


Evolutionary Strategies For Data Mining, Rose Lowe Dec 2010

All Dissertations

Learning classifier systems (LCS) have been successful in generating rules for solving classification problems in data mining. The rules are of the form IF condition THEN action. The condition encodes the features of the input space and the action encodes the class label. What is lacking in those systems is the ability to express each feature using a function that is appropriate for that feature. The genetic algorithm is capable of doing this but cannot because only one type of membership function is provided. Thus, the genetic algorithm learns only the shape and placement of the membership function, and in …


The Impact Of Overfitting And Overgeneralization On The Classification Accuracy In Data Mining, Huy Nguyen Anh Pham Jan 2010

LSU Doctoral Dissertations

Current classification approaches usually do not try to achieve a balance between fitting and generalization when they infer models from training data. Such approaches ignore the possibility of different penalty costs for the false-positive, false-negative, and unclassifiable types. Thus, their performances may not be optimal or may even be coincidental. This dissertation analyzes the above issues in depth. It also proposes two new approaches called the Homogeneity-Based Algorithm (HBA) and the Convexity-Based Algorithm (CBA) to address these issues. These new approaches aim at optimally balancing the data fitting and generalization behaviors of models when some traditional classification approaches are used. …


Integrating Information Theory Measures And A Novel Rule-Set-Reduction Technique To Improve Fuzzy Decision Tree Induction Algorithms, Nael Mohammed Abu-Halaweh Dec 2009

Computer Science Dissertations

Machine learning approaches have been successfully applied to many classification and prediction problems. One of the most popular machine learning approaches is decision trees. A main advantage of decision trees is the clarity of the decision model they produce. The ID3 algorithm proposed by Quinlan forms the basis for many decision tree applications. Trees produced by ID3 are sensitive to small perturbations in training data. To overcome this problem and to handle data uncertainties and spurious precision in data, fuzzy ID3 integrated fuzzy set theory and ideas from fuzzy logic with ID3. Several fuzzy decision tree algorithms and …


Improving Neural Network Classification Training, Michael Edwin Rimer Sep 2007

Theses and Dissertations

The following work presents a new set of general methods for improving neural network accuracy on classification tasks, grouped under the label of classification-based methods. The central theme of these approaches is to provide problem representations and error functions that more directly improve classification accuracy than conventional learning and error functions. The CB1 algorithm attempts to maximize classification accuracy by selectively backpropagating error only on misclassified training patterns. CB2 incorporates a sliding error threshold to the CB1 algorithm, interpolating between the behavior of CB1 and standard error backpropagation as training progresses in order to avoid prematurely saturated network weights. CB3 …
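The idea attributed to CB1 above, propagating error only for misclassified training patterns, can be illustrated on a single sigmoid unit. This is a minimal sketch and not the CB1 algorithm as specified in the dissertation: the one-unit model, the 0.5 decision threshold, the squared-error delta rule, and the names `predict` and `selective_update` are all assumptions made for illustration.

```python
import math


def predict(weights, x):
    """Sigmoid output of a single unit; x is expected to include a bias input."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def selective_update(weights, x, target, lr=0.5):
    """Return updated weights, changing them only if x is misclassified.

    target is 0 or 1. The 0.5 threshold and the squared-error gradient are
    assumptions for this sketch.
    """
    out = predict(weights, x)
    predicted_label = 1 if out >= 0.5 else 0
    if predicted_label == target:
        return list(weights)  # correctly classified: no error is propagated
    delta = (target - out) * out * (1.0 - out)  # delta rule for a sigmoid unit
    return [w + lr * delta * xi for w, xi in zip(weights, x)]


if __name__ == "__main__":
    w = [0.0, 1.0]   # bias weight, input weight
    x = [1.0, 2.0]   # bias input fixed at 1.0, one feature
    print(selective_update(w, x, target=1))  # classified correctly: unchanged
    print(selective_update(w, x, target=0))  # misclassified: weights move
```

By contrast, standard error backpropagation would also adjust the weights in the first call, pushing the output further toward 1 even though the pattern is already classified correctly.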


Data Mining And Analysis On Multiple Time Series Object Data, Chunyu Jiang Jan 2007

Browse all Theses and Dissertations

A huge amount of data is available in our society, and the need for turning such data into useful information and knowledge is urgent. Data mining is an important field addressing that need, and significant progress has been achieved in the last decade. In several important application areas, data arises in the format of Multiple Time Series Object (MTSO) data, where each data object is an array of time series over a large set of features and each has an associated class or state. Very little research has been conducted on this kind of data. Examples include computational toxicology, where each …


Toward A Heuristic Model For Evaluating The Complexity Of Computer Security Visualization Interface, Hsiu-Chung Wang Dec 2006

Computer Science Theses

Computer security visualization has gained much attention in the research community in the past few years. However, the advancement in security visualization research has been hampered by the lack of standardization in visualization design, centralized datasets, and evaluation methods. We propose a new heuristic model for evaluating the complexity of computer security visualizations. This complexity evaluation method is designed to evaluate the efficiency of performing visual search in security visualizations in terms of measuring critical memory capacity load needed to perform such tasks. Our method is based on research in cognitive psychology along with characteristics found in a majority of …