Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

2015

Articles 1 - 30 of 33

Full-Text Articles in Physical Sciences and Mathematics

A Data Science Course For Undergraduates: Thinking With Data, Benjamin Baumer Dec 2015

Mathematics and Statistics: Faculty Publications

Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the increasingly sophisticated array of data available in many settings. These data tend to be nontraditional, in the sense that they are often live, large, complex, and/or messy. A first course in statistics at the undergraduate level typically introduces students to a variety of techniques to analyze small, neat, and clean datasets. However, whether they pursue more formal training in statistics or not, many of these students will end ...


The Performance Of Random Prototypes In Hierarchical Models Of Vision, Kendall Lee Stewart Dec 2015

Dissertations and Theses

I investigate properties of HMAX, a computational model of hierarchical processing in the primate visual cortex. High-level cortical neurons have been shown to respond highly to particular natural shapes, such as faces. HMAX models this property with a dictionary of natural shapes, called prototypes, that respond to the presence of those shapes. The resulting set of similarity measurements is an effective descriptor for classifying images. Curiously, prior work has shown that replacing the dictionary of natural shapes with entirely random prototypes has little impact on classification performance. This work explores that phenomenon by studying the performance of random prototypes on ...


Dynamic Data Management In A Data Grid Environment, Björn Barrefors Dec 2015

Computer Science and Engineering: Theses, Dissertations, and Student Research

A data grid is a geographically distributed set of resources providing a facility for computationally intensive analysis of large datasets to a large number of geographically distributed users. In the scientific community, data grids have become increasingly popular as scientific research is driven by large datasets. Until recently, developments in data management for data grids have focused on management of data at lower layers in the data grid architecture. With dataset sizes expected to approach exabyte scale in coming years, data management in data grids is facing a new set of challenges. In particular, the problem of automatically placing and ...


Energy Forecasting For Event Venues: Big Data And Prediction Accuracy, Katarina Grolinger, Alexandra L'Heureux, Miriam Am Capretz, Luke Seewald Dec 2015

Electrical and Computer Engineering Publications

Advances in sensor technologies and the proliferation of smart meters have resulted in an explosion of energy-related data sets. These Big Data have created opportunities for development of new energy services and a promise of better energy management and conservation. Sensor-based energy forecasting has been researched in the context of office buildings, schools, and residential buildings. This paper investigates sensor-based forecasting in the context of event-organizing venues, which present an especially difficult scenario due to large variations in consumption caused by the hosted events. Moreover, the significance of the data set size, specifically the impact of temporal granularity, on energy ...


Evaluating The Intrinsic Similarity Between Neural Networks, Stephen Charles Ashmore Dec 2015

Theses and Dissertations

We present Forward Bipartite Alignment (FBA), a method that aligns the topological structures of two neural networks. Neural networks are considered to be a black box because they contain a complex model surface, determined by their weights, that combines attributes non-linearly. Two networks that make similar predictions on training data may still generalize differently. FBA enables a diversity of applications, including visualization and canonicalization of neural networks, ensembles, and cross-over between unrelated neural networks in evolutionary optimization. We describe the FBA algorithm, and describe implementations for three applications: genetic algorithms, visualization, and ensembles. We demonstrate FBA's usefulness by comparing ...


Distributed Approach For Peptide Identification, Naga V K Abhinav Vedanbhatla Oct 2015

Masters Theses & Specialist Projects

A crucial step in protein identification is peptide identification. The Peptide Spectrum Match (PSM) information set is enormous. Hence, it is a time-consuming procedure to work on a single machine. PSMs are situated by a cross connection, a factual score, or a probability that the match between the trial and speculative is right and original. This procedure takes quite a while to execute. So, there is demand for enhancement of the performance to handle extensive peptide information sets. Development of appropriate distributed frameworks is expected to lessen the processing time.

The designed framework uses a peptide handling algorithm named C-Ranker ...


Learning Relative Similarity From Data Streams: Active Online Learning Approaches, Shuji Hao, Peilin Zhao, Steven C. H. Hoi, Chunyan Miao Oct 2015

Research Collection School Of Information Systems

Relative similarity learning, as an important learning scheme for information retrieval, aims to learn a bi-linear similarity function from a collection of labeled instance-pairs, and the learned function would assign a high similarity value for a similar instance-pair and a low value for a dissimilar pair. Existing algorithms usually assume the labels of all the pairs in data streams are always made available for learning. However, this is not always realistic in practice since the number of possible pairs is quadratic to the number of instances in the database, and manually labeling the pairs could be very costly and time ...


Generating A Multiplicity Of Policies For Agent Steering In Crowd Simulation, Cory D Boatright, Mubbasir Kapadia, Jennie Shapira, Norman I. Badler Sep 2015

Center for Human Modeling and Simulation

Pedestrian steering algorithms range from completely procedural to entirely data-driven, but the former grossly generalize across possible human behaviors and suffer computationally, whereas the latter are limited by the burden of ever-increasing data samples. Our approach seeks the balanced middle ground by deriving a collection of machine-learned policies based on the behavior of a procedural steering algorithm through the decomposition of the space of possible steering scenarios into steering contexts. The resulting algorithm scales well in the number of contexts, the use of new data sets to create new policies, and in the number of controlled agents as the policies ...


A Machine Learning Approach To Edge Type Prediction In Internet AS Graphs, Jinu Susan Varghese, Lu Ruan Jul 2015

Computer Science Technical Reports

The Internet consists of a large number of interconnected autonomous systems (ASes). ASes engage in two types of business relationships to exchange traffic: provider-to-customer (p2c) relationship and peer-to-peer (p2p) relationship. Internet AS-level topology can be represented by AS graphs where nodes represent autonomous systems (ASes) and edges represent connectivity between ASes. While researchers have derived AS graphs using various data sources, inferring the types of edges (p2c or p2p) in AS graphs remains an open problem. In this paper we present a new machine learning approach to edge type inference in AS graphs. Our method uses the AdaBoost machine learning ...


Modeling Words For Online Sexual Behavior Surveillance And Clinical Text Information Extraction, Jason Alan Fries Jul 2015

Theses and Dissertations

How do we model the meaning of words? In domains like information retrieval, words have classically been modeled as discrete entities using 1-of-n encoding, a representation that elides most of a word's syntactic and semantic structure. Recent research, however, has begun exploring more robust representations called word embeddings. Embeddings model words as a parameterized function mapping into an n-dimensional continuous space and implicitly encode a number of interesting semantic and syntactic properties. This dissertation examines two application areas where existing, state-of-the-art terminology modeling improves the task of information extraction (IE) -- the process of transforming unstructured data into structured form ...


Detecting, Modeling, And Predicting User Temporal Intention, Hany M. Salaheldeen Jul 2015

Computer Science Theses & Dissertations

The content of social media has grown exponentially in recent years, and its role has evolved from narrating life events to actually shaping them. Unfortunately, content posted and shared in social networks is vulnerable and prone to loss or change, rendering the context associated with it (a tweet, post, status, or others) meaningless. There is an inherent value in maintaining the consistency of such social records as in some cases they take over the task of being the first draft of history as collections of these social posts narrate the pulse of the street during historic events, protest, riots ...


Reliable Patch Trackers: Robust Visual Tracking By Exploiting Reliable Patches, Yang Li, Jianke Zhu, Steven C. H. Hoi Jun 2015

Research Collection School Of Information Systems

Most modern trackers typically employ a bounding box given in the first frame to track visual objects, where their tracking results are often sensitive to the initialization. In this paper, we propose a new tracking method, Reliable Patch Trackers (RPT), which attempts to identify and exploit the reliable patches that can be tracked effectively through the whole tracking process. Specifically, we present a tracking reliability metric to measure how reliably a patch can be tracked, where a probability model is proposed to estimate the distribution of reliable patches under a sequential Monte Carlo framework. As the reliable patches distributed over ...


Information Filtering By Multiple Examples, Mingzhu Zhu May 2015

Dissertations

A key to successfully satisfying an information need lies in how users express it using keywords as queries. However, for many users, expressing their information needs using keywords is difficult, especially when the information need is complex. Search By Multiple Examples (SBME), a promising method for overcoming this problem, allows users to specify their information needs as a set of relevant documents rather than as a set of keywords.

Most of the studies on SBME adopt Positive Unlabeled learning (PU learning) techniques by treating the user's provided examples (denoted as query examples) as the positive set and the entire ...


Neuroscience-Inspired Dynamic Architectures, Catherine Dorothy Schuman May 2015

Doctoral Dissertations

Biological brains are some of the most powerful computational devices on Earth. Computer scientists have long drawn inspiration from neuroscience to produce computational tools. This work introduces neuroscience-inspired dynamic architectures (NIDA), spiking neural networks embedded in a geometric space that exhibit dynamic behavior. A neuromorphic hardware implementation based on NIDA networks, Dynamic Adaptive Neural Network Array (DANNA), is discussed. Neuromorphic implementations are one alternative/complement to traditional von Neumann computation. A method for designing/training NIDA networks, based on evolutionary optimization, is introduced. We demonstrate the utility of NIDA networks on classification tasks, a control task, and an anomaly detection ...


Pattern Recognition And Matching In Ice Core Data, Nathan Dunn Apr 2015

Honors College

The purpose of this research is to investigate the potential of applying concepts from machine learning, such as pattern recognition and matching, to detect climatic signals in ice core data. The main components of this project are the development of a pattern language for expressing relationships between chemical signals over time, a method of tokenizing ice core chemistry data into an easily manageable form, a method of matching specific instances of climatic signals to a specific pattern string, and a method to recognize and evaluate patterns within ice core chemistry data. While there are weaknesses in each of these ...


Epistemological Databases For Probabilistic Knowledge Base Construction, Michael Louis Wick Mar 2015

Doctoral Dissertations

Knowledge bases (KB) facilitate real world decision making by providing access to structured relational information that enables pattern discovery and semantic queries. Although there is a large amount of data available for populating a KB, the data must first be gathered and assembled. Traditionally, this integration is performed automatically by storing the output of an information extraction pipeline directly into a database as if this prediction were the "truth." However, the resulting KB is often not reliable because (a) errors accumulate in the integration pipeline, and (b) they persist in the KB even after new information arrives that could rectify ...


Learning With Joint Inference And Latent Linguistic Structure In Graphical Models, Jason Narad Mar 2015

Doctoral Dissertations

Constructing end-to-end NLP systems requires the processing of many types of linguistic information prior to solving the desired end task. A common approach to this problem is to construct a pipeline, one component for each task, with each system's output becoming input for the next. This approach poses two problems. First, errors propagate, and, much like the childhood game of "telephone", combining systems in this manner can lead to unintelligible outcomes. Second, each component task requires annotated training data to act as supervision for training the model. These annotations are often expensive and time-consuming to produce, may differ from ...


Leveraging Contextual Relationships Between Objects For Localization, Clinton Leif Olson Mar 2015

Dissertations and Theses

Object localization is currently an active area of research in computer vision. The object localization task is to identify all locations of an object class within an image by drawing a bounding box around objects that are instances of that class. Object locations are typically found by computing a classification score over a small window at multiple locations in the image, based on some chosen criteria, and choosing the highest scoring windows as the object bounding-boxes. Localization methods vary widely, but there is a growing trend towards methods that are able to make localization more accurate and efficient through the ...


Cancer Risk Prediction With Next Generation Sequencing Data Using Machine Learning, Nihir Patel Jan 2015

Theses

The use of computational biology for next generation sequencing (NGS) analysis is rapidly increasing in genomics research. However, the effectiveness of NGS data to predict disease abundance is as yet unclear. This research investigates the problem in the whole exome NGS data of the chronic lymphocytic leukemia (CLL) available at dbGaP. Initially, raw reads from samples are aligned to the human reference genome using the Burrows-Wheeler Aligner. From the samples, structural variants, namely, Single Nucleotide Polymorphism (SNP) and Insertion Deletion (INDEL) are identified and are filtered using SAMtools as well as with the Genome Analysis Toolkit (GATK). Subsequently, the variants are ...


Characterization Of Prose By Rhetorical Structure For Machine Learning Classification, James Java Jan 2015

CCE Theses and Dissertations

Measures of classical rhetorical structure in text can improve accuracy in certain types of stylistic classification tasks such as authorship attribution. This research augments the relatively scarce work in the automated identification of rhetorical figures and uses the resulting statistics to characterize an author's rhetorical style. These characterizations of style can then become part of the feature set of various classification models.

Our Rhetorica software identifies 14 classical rhetorical figures in free English text, with generally good precision and recall, and provides summary measures to use in descriptive or classification tasks. Classification models trained on Rhetorica's rhetorical measures ...


Forecasting Obsolescence Risk And Product Lifecycle With Machine Learning, Connor Patrick Jennings Jan 2015

Graduate Theses and Dissertations

Rapid changes in technology have led to an increasingly fast pace of product introductions. New components offering added functionality, improved performance and quality are routinely available to a growing number of industry sectors (e.g., electronics, automotive, and defense industries). For long-life systems such as planes, ships, nuclear power plants, and more, these rapid changes help sustain the useful life, but at the same time, present significant challenges associated with managing change. Obsolescence of components and/or subsystems can be technical, functional, related to style, etc., and occur in nearly any industry. Over the years, many approaches for forecasting obsolescence ...


Fast Linear Algorithms For Machine Learning, Yichao Lu Jan 2015

Publicly Accessible Penn Dissertations

Nowadays, linear methods like Regression, Principal Component Analysis and Canonical Correlation Analysis are well understood and widely used by the machine learning community for predictive modeling and feature generation. Generally speaking, all these methods aim at capturing interesting subspaces in the original high dimensional feature space. Due to the simple linear structures, these methods all have a closed form solution which makes computation and theoretical analysis very easy for small datasets. However, in modern machine learning problems it's very common for a dataset to have millions or billions of features and samples. In these cases, pursuing the closed ...


Energy Cost Forecasting For Event Venues, Katarina Grolinger, Andrea Zagar, Miriam Am Capretz, Luke Seewald Jan 2015

Electrical and Computer Engineering Publications

Electricity price, consumption, and demand forecasting has been a topic of research interest for a long time. The proliferation of smart meters has created new opportunities in energy prediction. This paper investigates energy cost forecasting in the context of entertainment event-organizing venues, which poses significant difficulty due to fluctuations in energy demand and wholesale electricity prices. The objective is to predict the overall cost of energy consumed during an entertainment event. Predictions are carried out separately for each event category and feature selection is used to select the most effective combination of event attributes for each category. Three machine learning ...


Case-Specific Random Forests For Big Data Prediction, Joshua Zimmerman, Dan Nettleton Jan 2015

Statistics Conference Proceedings, Presentations and Posters

Some training datasets may be too large for storage on a single computer. Such datasets may be partitioned and stored on separate computers connected in a parallel computing environment. To predict the response associated with a specific target case when training data are partitioned, we propose a method for finding the training cases within each partition that are most relevant for predicting the response of a target case of interest. These most relevant training cases from each partition can be combined into a single dataset, which can be a subset of the entire training dataset that is small enough for ...


A Middleware Framework For Application-Aware And User-Specific Energy Optimization In Smart Mobile Devices, Sudeep Pasricha, Brad K. Donohoo, Chris Ohlsen Jan 2015

U.S. Air Force Research

munication, and social interaction. In addition to the demand for an acceptable level of performance and a comprehensive set of features, users often desire extended battery lifetime. In fact, limited battery lifetime is one of the biggest obstacles facing the current utility and future growth of increasingly sophisticated "smart" mobile devices. This paper proposes a novel application-aware and user-interaction aware energy optimization middleware framework (AURA) for pervasive mobile devices. AURA optimizes CPU and screen backlight energy consumption while maintaining a minimum acceptable level of performance. The proposed framework employs a novel Bayesian application classifier and management strategies based on Markov ...


Extensions And Applications Of Ensemble-Of-Trees Methods In Machine Learning, Justin Bleich Jan 2015

Publicly Accessible Penn Dissertations

Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits.

These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part ...


Singular Value Computation And Subspace Clustering, Qiao Liang Jan 2015

Theses and Dissertations--Mathematics

In this dissertation we discuss two problems. In the first part, we consider the problem of computing a few extreme eigenvalues of a symmetric definite generalized eigenvalue problem or a few extreme singular values of a large and sparse matrix. The standard method of choice for computing a few extreme eigenvalues of a large symmetric matrix is the Lanczos or the implicitly restarted Lanczos method. These methods usually employ a shift-and-invert transformation to accelerate the speed of convergence, which is not practical for truly large problems. With this in mind, Golub and Ye propose an inverse-free preconditioned Krylov subspace method ...


Modeling User Transportation Patterns Using Mobile Devices, Erfan Davami Jan 2015

Electronic Theses and Dissertations, 2004-2019

Participatory sensing frameworks use humans and their computing devices as a large mobile sensing network. Dramatic accessibility and affordability have turned mobile devices (smartphone and tablet computers) into the most popular computational machines in the world, exceeding laptops. By the end of 2013, more than 1.5 billion people on earth will have a smartphone. Increased coverage and higher speeds of cellular networks have given these devices the power to constantly stream large amounts of data. Most mobile devices are equipped with advanced sensors such as GPS, cameras, and microphones. This expansion of smartphone numbers and power has created a ...


Converting Neuroimaging Big Data To Information: Statistical Frameworks For Interpretation Of Image Driven Biomarkers And Image Driven Disease Subtyping, Bilwaj Krishnanand Gaonkar Jan 2015

Publicly Accessible Penn Dissertations

Large scale clinical trials and population based research studies collect huge amounts of neuroimaging data. Machine learning classifiers can potentially use these data to train models that diagnose brain related diseases from individual brain scans. In this dissertation we address two distinct challenges that beset a wider adoption of these tools for diagnostic purposes.

The first challenge that besets the neuroimaging based disease classification is the lack of a statistical inference machinery for highlighting brain regions that contribute significantly to the classifier decisions. In this dissertation, we address this challenge by developing an analytic framework for interpreting support vector machine ...


Reverse Engineering The Human Brain: An Evolutionary Computation Approach To The Analysis Of fMRI, Nicholas Allgaier Jan 2015

Graduate College Dissertations and Theses

The field of neuroimaging has truly become data rich, and as such, novel analytical methods capable of gleaning meaningful information from large stores of imaging data are in high demand. Those methods that might also be applicable on the level of individual subjects, and thus potentially useful clinically, are of special interest. In this dissertation we introduce just such a method, called nonlinear functional mapping (NFM), and demonstrate its application in the analysis of resting state fMRI (functional Magnetic Resonance Imaging) from a 242-subject subset of the IMAGEN project, a European study of risk-taking behavior in adolescents that includes longitudinal ...