Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

2013

Articles 1 - 27 of 27

Full-Text Articles in Physical Sciences and Mathematics

Can Clustering Improve Requirements Traceability? A Tracelab-Enabled Study, Brett Taylor Armstrong Dec 2013

Master's Theses

Software permeates every aspect of our modern lives. In many applications, such as the software for airplane flight controls or nuclear power control systems, software failures can have catastrophic consequences. As we place so much trust in software, how can we know if it is trustworthy? Through software assurance, we can attempt to quantify just that.

Building complex, high-assurance software is no simple task. The difficult information landscape of a software engineering project can make verification and validation, the process by which the assurance of software is assessed, very difficult. In order to manage the inevitable information overload …


Using Machine Learning Techniques To Customize The User's Profile, Helps Intelligent TV Decoder’s Design, Alketa Hyso, Roneda Mucaj Nov 2013

UBT International Conference

In today's society, the growing quantity of information makes it increasingly difficult to find the information we are searching for. "Data mining" offers us the most important methods and techniques in data analysis. In this work, we study several data mining techniques, methods, and applications in specific areas. We experiment with the “open software” WEKA to perform some data analysis, presenting the reliability and advantages of the data mining classification technique. We use decision trees to carry out the classification task and to customize user profiles based on their requirements and needs. This paper presents …


What You Want Is Not What You Get: Predicting Sharing Policies For Text-Based Content On Facebook, Arunesh Sinha, Li Yan, Lujo Bauer Nov 2013

Research Collection Lee Kong Chian School Of Business

As the amount of content users publish on social networking sites rises, so do the danger and costs of inadvertently sharing content with an unintended audience. Studies repeatedly show that users frequently misconfigure their policies or misunderstand the privacy features offered by social networks. A way to mitigate these problems is to develop automated tools to assist users in correctly setting their policy. This paper explores the viability of one such approach: we examine the extent to which machine learning can be used to deduce users' sharing preferences for content posted on Facebook. To generate data on which to evaluate …


Variable Importance And Prediction Methods For Longitudinal Problems With Missing Variables, Ivan Diaz, Alan E. Hubbard, Anna Decker, Mitchell Cohen Oct 2013

U.C. Berkeley Division of Biostatistics Working Paper Series

In this paper we present prediction and variable importance (VIM) methods for longitudinal data sets containing both continuous and binary exposures subject to missingness. We demonstrate the use of these methods for prognosis of medical outcomes of severe trauma patients, a field in which current medical practice involves rules of thumb and scoring methods that only use a few variables and ignore the dynamic and high-dimensional nature of trauma recovery. Well-principled prediction and VIM methods can thus provide a tool to make care decisions informed by the high-dimensional patient’s physiological and clinical history. Our VIM parameters can be causally interpreted …


Asymptotically Unbiased Estimator Of The Informational Energy With Knn, Angel Caţaron, Răzvan Andonie, Chinmei Y. Chueh Oct 2013

All Faculty Scholarship for the College of the Sciences

Motivated by machine learning applications (e.g., classification, function approximation, feature extraction), in previous work, we have introduced a non-parametric estimator of Onicescu’s informational energy. Our method was based on the k-th nearest neighbor distances between the n sample points, where k is a fixed positive integer. In the present contribution, we discuss mathematical properties of this estimator. We show that our estimator is asymptotically unbiased and consistent. We provide further experimental results which illustrate the convergence of the estimator for standard distributions.
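
As a rough illustration of the general idea, and not the authors' estimator (whose exact form is not given in this abstract), the informational energy E[p(X)] = ∫ p(x)² dx can be approximated by averaging a k-nearest-neighbor density estimate over the sample points. The sketch below assumes one-dimensional data with distinct values and uses the standard density estimate (k − 1) / ((n − 1) · 2R_k), where R_k is the distance to the k-th nearest neighbor; the function name and parameters are hypothetical.

```python
import random


def knn_informational_energy(samples, k=5):
    """Approximate E[p(X)] by averaging a k-NN density estimate (1-D sketch)."""
    n = len(samples)
    if not 1 < k < n:
        raise ValueError("need 1 < k < n")
    total = 0.0
    for i, x in enumerate(samples):
        # Distances from x to every other sample point, sorted ascending.
        dists = sorted(abs(x - y) for j, y in enumerate(samples) if j != i)
        r_k = dists[k - 1]      # distance to the k-th nearest neighbor
        volume = 2.0 * r_k      # length of the 1-D ball of radius r_k
        # Assumes distinct sample values, so volume > 0.
        total += (k - 1) / ((n - 1) * volume)
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    # For Uniform(0, 1) the density is 1 on [0, 1], so the true value is 1.
    data = [rng.random() for _ in range(1000)]
    print(knn_informational_energy(data, k=5))
```

On uniform data the estimate should land close to 1; the brute-force neighbor search is quadratic in n and is chosen here only for readability.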


Enabling Richer Insight Into Runtime Executions Of Systems, Karthik Swaminathan Nagaraj Oct 2013

Open Access Dissertations

Very large-scale systems software is heavily used today in important scenarios such as online retail, banking, content services, web search, and social networks. As the functionality and complexity of this software grow, managing the implementations becomes a considerable challenge for developers, designers, and maintainers. Software needs to be constantly monitored and tuned for optimal efficiency and user satisfaction. At large scale, these systems incorporate significant degrees of asynchrony, parallelism, and distributed execution, reducing the manageability of the software, including performance management. Adding to the complexity, developers are under pressure between developing new functionality for customers …


Practical Cost-Conscious Active Learning For Data Annotation In Annotator-Initiated Environments, Robbie A. Haertel Aug 2013

Theses and Dissertations

Many projects exist whose purpose is to augment raw data with annotations that increase the usefulness of the data. The number of these projects is rapidly growing and in the age of “big data” the amount of data to be annotated is likewise growing within each project. One common use of such data is in supervised machine learning, which requires labeled data to train a predictive model. Annotation is often a very expensive proposition, particularly for structured data. The purpose of this dissertation is to explore methods of reducing the cost of creating such data sets, including annotated text corpora. We …


Online Multi-Stage Deep Architectures For Feature Extraction And Object Recognition, Derek Christopher Rose Aug 2013

Doctoral Dissertations

Multi-stage visual architectures have recently found success in achieving high classification accuracies over image datasets with large variations in pose, lighting, and scale. Inspired by techniques currently at the forefront of deep learning, such architectures are typically composed of one or more layers of preprocessing, feature encoding, and pooling to extract features from raw images. Training these components traditionally relies on large sets of patches that are extracted from a potentially large image dataset. In this context, high-dimensional feature space representations are often helpful for obtaining the best classification performances and providing a higher degree of invariance to object transformations. …


Segmentation And Model Generation For Large-Scale Cyber Attacks, Steven E. Strapp Aug 2013

Theses

Raw cyber attack traffic can present more questions than answers to security analysts. Especially with large-scale observables, it is difficult to identify which packets are relevant and what attack behaviors are present. Many existing works in Host or Flow Clustering attempt to group similar behaviors to expedite analysis; these works often phrase the problem directly as offline unsupervised machine learning. This work proposes online processing to simultaneously model coordinating actors and segment traffic that is relevant to a target of interest, all while it is being received. The goal is not just to aggregate similar attack behaviors, but to provide …


Computer Sketch Recognition, Richard Steigerwald Jun 2013

Master's Theses

Tens of thousands of years ago, humans drew sketches that we can see and identify even today. Sketches are the oldest recorded form of human communication and are still widely used. The universality of sketches supersedes that of culture and language. Despite the universal accessibility of sketches by humans, computers are unable to interpret or even correctly identify the contents of sketches drawn by humans with a practical level of accuracy.

In my thesis, I demonstrate that the accuracy of existing sketch recognition techniques can be improved by optimizing the classification criteria. Current techniques classify a 20,000 sketch crowd-sourced dataset …


Mind Change Speed-Up For Learning Languages From Positive Data, Sanjay Jain, Efim Kinber Jun 2013

School of Computer Science & Engineering Faculty Publications

Within the frameworks of learning in the limit of indexed classes of recursive languages from positive data and automatic learning in the limit of indexed classes of regular languages (with automatically computable sets of indices), we study the problem of minimizing the maximum number of mind changes by a learner on all languages with indices not exceeding . For inductive inference of recursive languages, we establish two conditions under which can be made smaller than any recursive unbounded non-decreasing function. We also establish how is affected if at least one of these two conditions does not hold. In the case …


Mobile Computing: Challenges And Opportunities For Autonomy And Feedback, Ole J. Mengshoel, Bob Iannucci, Abe Ishihara May 2013

Ole J Mengshoel

Mobile devices have evolved to become computing platforms more similar to desktops and workstations than the cell phones and handsets of yesteryear. Unfortunately, today’s mobile infrastructures are mirrors of the wired past. Devices, apps, and networks impact one another, but a systematic approach for allowing them to cooperate is currently missing. We propose an approach that seeks to open key interfaces and to apply feedback and autonomic computing to improve both user experience and mobile system dynamics.


Document Classification, Shane K. Panter May 2013

Boise State University Theses and Dissertations

We present an overview of the document classification process and present research conducted against the newly constructed SBIR-STTR corpus. Specifically, the current methods in use for annotation, corpus construction, feature construction, feature weighting, and classifier algorithms are surveyed. We introduce a new dataset derived from public data downloaded from sbir.gov and the Text Annotation Toolkit (TAT) for use in classification research.

TAT is a collection of independent components packaged together into one open source software application. TAT was engineered to support the document classification process and workflow. Tracking of changes in a working corpus, saving data used in the …


Probabilistic Explicit Topic Modeling, Joshua Aaron Hansen Apr 2013

Theses and Dissertations

Latent Dirichlet Allocation (LDA) is widely used for automatic discovery of latent topics in document corpora. However, output from analysis using an LDA topic model suffers from a lack of identifiability between topics not only across corpora, but across runs of the algorithm. The output is also isolated from enriching information from knowledge sources such as Wikipedia and is difficult for humans to interpret due to a lack of meaningful topic labels. This thesis introduces two methods for probabilistic explicit topic modeling that address these issues: Latent Dirichlet Allocation with Static Topic-Word Distributions (LDA-STWD), and Explicit Dirichlet Allocation (EDA). LDA-STWD …


Artificial Immune Systems And Particle Swarm Optimization For Solutions To The General Adversarial Agents Problem, Jeremy Mange Apr 2013

Dissertations

The general adversarial agents problem is an abstract problem description touching on the fields of Artificial Intelligence, machine learning, decision theory, and game theory. The goal of the problem is, given one or more mobile agents, each identified as either “friendly” or “enemy”, along with a specified environment state, to choose an action or series of actions from all possible valid choices for the next “timestep” or series thereof, in order to lead toward a specified outcome or set of outcomes. This dissertation explores approaches to this problem utilizing Artificial Immune Systems, Particle Swarm Optimization, and hybrid approaches, along with …


Object Detection And Recognition In Natural Settings, George William Dittmar Jan 2013

Dissertations and Theses

Much research as of late has focused on biologically inspired vision models that are based on our understanding of how the visual cortex processes information. One prominent example of such a system is HMAX [17]. HMAX attempts to simulate the biological process for object recognition in cortex based on the model proposed by Hubel & Wiesel [10]. This thesis investigates the ability of an HMAX-like system (GLIMPSE [20]) to perform object-detection in cluttered natural scenes. I evaluate these results using the StreetScenes database from MIT [1, 8]. This thesis addresses three questions: (1) Can the GLIMPSE-based object detection system replicate …


Using Methods From The Data-Mining And Machine-Learning Literature For Disease Classification And Prediction: A Case Study Examining Classification Of Heart Failure Subtypes, Peter C. Austin Jan 2013

Peter Austin

OBJECTIVE: Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine-learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines.

STUDY DESIGN AND SETTING: We compared the performance of these classification methods with that of conventional classification trees to classify patients with heart failure (HF) …


An Automated Prognosis System For Estrogen Hormone Status Assessment In Breast Cancer Tissue Samples, Fati̇h Sarikoç, Adem Kalinli, Hülya Akgün, Fi̇gen Öztürk Jan 2013

Turkish Journal of Electrical Engineering and Computer Sciences

Estrogen receptor (ER) status evaluation is a widely applied method in the prognosis of breast cancer. However, testing for the existence of the ER biomarker in a patient's tumor sample mainly depends on the subjective decisions of the doctors. The aim of this paper is to introduce the usage of a machine learning tool, functional trees (FTs), to attain an ER prognosis of the disease via an objective decision model. For this aim, 27 image files, each of which came from a biopsy sample of an invasive ductal carcinoma patient, were scanned and captured by a light microscope. From these …


Anticipating The Friction Coefficient Of Friction Materials Used In Automobiles By Means Of Machine Learning Without Using A Test Instrument, Mustafa Ti̇mur, Fati̇h Aydin Jan 2013

Turkish Journal of Electrical Engineering and Computer Sciences

The most important factor for designs in which friction materials are used is the coefficient of friction. The coefficient of friction has been determined taking such variants as velocity, temperature, and pressure into account, which arise from various factors in friction materials, and by analyzing the effects of these variants on friction materials. Many test instruments have been produced in order to determine the coefficient of friction. In this article, a study about the use of machine learning algorithms instead of test instruments in order to determine the coefficient of friction is presented. Isotonic regression was selected as the machine …
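
The abstract names isotonic regression but is truncated before it describes the setup, so the following is only a generic sketch of that technique and not the authors' pipeline. It fits a non-decreasing sequence to the observations by least squares using the pool-adjacent-violators procedure; the function name and the input values are made up for illustration.

```python
def isotonic_regression(y, weights=None):
    """Least-squares non-decreasing fit to y via pool-adjacent-violators."""
    if weights is None:
        weights = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for value, w in zip(y, weights):
        blocks.append([float(value), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total_w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total_w, total_w, c1 + c2])
    # Expand each block back to one fitted value per original point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


if __name__ == "__main__":
    # Hypothetical measurements ordered by an input variable such as pressure.
    print(isotonic_regression([1, 3, 2, 4]))
```

Adjacent values that break the ordering are replaced by their weighted mean, which is why the fit is piecewise constant.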


Automated Detection Of Vehicles With Machine Learning, Michael N. Johnstone, Andrew Woodward Jan 2013

Australian Information Security Management Conference

Considering the significant volume of data generated by sensor systems and network hardware which is required to be analysed and interpreted by security analysts, the potential for human error is significant. This error can lead to consequent harm for some systems in the event of an adverse event not being detected. In this paper we compare two machine learning algorithms that can assist in supporting the security function effectively and present results that can be used to select the best algorithm for a specific domain. It is suggested that a naive Bayesian classifier (NBC) and an artificial neural network (ANN) …


Human Intention Recognition Based Assisted Telerobotic Grasping Of Objects In An Unstructured Environment, Karan Hariharan Khokar Jan 2013

USF Tampa Graduate Theses and Dissertations

In this dissertation work, a methodology is proposed to enable a robot to identify an object to be grasped and its intended grasp configuration while a human is teleoperating a robot towards the desired object. Based on the detected object and grasp configuration, the human is assisted in the teleoperation task. The environment is unstructured and consists of a number of objects, each with various possible grasp configurations. The identification of the object and the grasp configuration is carried out in real time, by recognizing the intention of the human motion. Simultaneously, the human user is assisted to preshape over …


A Machine Learning Approach To Diagnosis Of Parkinson’s Disease, Sumaiya F. Hashmi Jan 2013

CMC Senior Theses

I will investigate applications of machine learning algorithms to medical data, adaptations of differences in data collection, and the use of ensemble techniques.

Focusing on the binary classification problem of Parkinson’s Disease (PD) diagnosis, I will apply machine learning algorithms to a primary dataset consisting of voice recordings from healthy and PD subjects. Specifically, I will use Artificial Neural Networks, Support Vector Machines, and an Ensemble Learning algorithm to reproduce results from [MS12] and [GM09].

Next, I will adapt a secondary regression dataset of PD recordings and combine it with the primary binary classification dataset, testing various techniques to consolidate …


Artificial Intelligence And Data Mining: Algorithms And Applications, Jianhong Xia, Fuding Xie, Yong Zhang, Craig Caulfield Jan 2013

Research outputs 2013

Artificial intelligence and data mining techniques have been used in many domains to solve classification, segmentation, association, diagnosis, and prediction problems. The overall aim of this special issue is to open a discussion among researchers actively working on algorithms and applications. The issue covers a wide variety of problems for computational intelligence, machine learning, time series analysis, remote sensing image mining, and pattern recognition. After a rigorous peer review process, 20 papers have been selected from 38 submissions. The accepted papers in this issue addressed the following topics: (i) advanced artificial intelligence and data mining techniques; (ii) computational intelligence in …


Teaching Law And Digital Age Legal Practice With An AI And Law Seminar: Justice, Lawyering And Legal Education In The Digital Age, Kevin D. Ashley Jan 2013

Articles

A seminar on Artificial Intelligence ("AI") and Law can teach law students lessons about legal reasoning and legal practice in the digital age. AI and Law is a subfield of AI/computer science research that focuses on designing computer programs (computational models) that perform legal reasoning. These computational models are used in building tools to assist in legal practice and pedagogy and in studying legal reasoning in order to contribute to cognitive science and jurisprudence. Today, subject to a number of qualifications, computer programs can reason with legal rules, apply legal precedents, and even argue like a legal advocate.

This article provides a …


Concept Drift Datasets, Patrick Lindstrom Jan 2013

Doctoral

This zip file contains the datasets used in the PhD thesis:

Lindstrom, P., 2013. Handling Concept Drift in the Context of Expensive Labels. Technological University Dublin. For more information about the datasets please see the README file and the aforementioned thesis.


On Identifying Critical Nuggets Of Information During Classification Task, David Sathiaraj Jan 2013

LSU Doctoral Dissertations

In large databases, there may exist critical nuggets - small collections of records or instances that contain domain-specific important information. This information can be used for future decision making such as labeling of critical, unlabeled data records and improving classification results by reducing false positive and false negative errors. In recent years, data mining efforts have focussed on pattern and outlier detection methods. However, not much effort has been dedicated to finding critical nuggets within a data set. This work introduces the idea of critical nuggets, proposes an innovative domain-independent method to measure criticality, suggests a heuristic to reduce the …


Improved Cardiovascular Risk Prediction Using Nonparametric Regression And Electronic Health Record Data, Edward Kennedy, Wyndy Wiitala, Rodney Hayward, Jeremy Sussman Dec 2012

Edward H. Kennedy

Use of the electronic health record (EHR) is expected to increase rapidly in the near future, yet little research exists on whether analyzing internal EHR data using flexible, adaptive statistical methods could improve clinical risk prediction. Extensive implementation of EHR in the Veterans Health Administration provides an opportunity for exploration. Our objective was to compare the performance of various approaches for predicting risk of cerebrovascular and cardiovascular (CCV) death, using traditional risk predictors versus more comprehensive EHR data. Regression methods outperformed the Framingham risk score, even with the same predictors (AUC increased from 71% to 73% and calibration also improved). …
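
For readers unfamiliar with the AUC figures quoted above, the metric can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; it is a generic illustration with made-up scores and a hypothetical function name, not the authors' evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical risk scores; label 1 marks cases where the event occurred.
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in the number of cases; a rank-based formula gives the same value more efficiently on large data sets.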