Physical Sciences and Mathematics | Open Access Articles

Data Analytics For Power Utility Storm Planning, Lan Lin, Aldo Dagnino, Derek Doran, Swapna S. Gokhale Oct 2014

Kno.e.sis Publications

As the world population grows, recent climatic changes seem to bring powerful storms to populated areas. The impact of these storms on utility services is devastating. Hurricane Sandy is a recent example of the enormous damage that storms can inflict on infrastructure, society, and the economy. Quick response to these emergencies represents a big challenge to electric power utilities. Traditionally, utilities develop preparedness plans for storm emergency situations based on the experience of utility experts and with limited use of historical data. With the advent of the Smart Grid, utilities are incorporating automation and sensing technologies in their grids and …

Identification Of Informativeness In Text Using Natural Language Stylometry, Rushdi Shams Aug 2014

Electronic Thesis and Dissertation Repository

In this age of information overload, one experiences a rapidly growing over-abundance of written text. To assist with handling this bounty, this plethora of texts is now widely used to develop and optimize statistical natural language processing (NLP) systems. Surprisingly, the use of more fragments of text to train these statistical NLP systems may not necessarily lead to improved performance. We hypothesize that those fragments that help the most with training are those that contain the desired information. Therefore, determining informativeness in text has become a central issue in our view of NLP. Recent developments in this field have spawned …

Complex Network Analysis For Scientific Collaboration Prediction And Biological Hypothesis Generation, Qing Zhang Aug 2014

Theses and Dissertations

With the rapid development of digitalized literature, more and more knowledge has been discovered by computational approaches. This thesis addresses the problem of link prediction in co-authorship networks and protein-protein interaction networks derived from the literature. These networks (and most other types of networks) are growing over time and we assume that a machine can learn from past link creations by examining the network status at the time of their creation. Our goal is to create a computationally efficient approach to recommend new links for a node in a network (e.g., new collaborations in co-authorship networks and new interactions in …

Element Detection In Japanese Comic Book Panels, Toshihiro Kuboi Aug 2014

Master's Theses

Comic books are a unique and increasingly popular form of entertainment combining visual and textual elements of communication. This work pertains to making comic books more accessible. Specifically, this paper explains how we detect elements such as speech bubbles present in Japanese comic book panels. Some applications of the work presented in this paper are automatic detection of text and its transformation into audio or into other languages. Automatic detection of elements can also allow reasoning and analysis at a deeper semantic level than what’s possible today. Our approach uses an expert system and a machine learning system. The expert …

Predicting Music Genre Preferences Based On Online Comments, Andrew J. Sinclair Jun 2014

Master's Theses

Communication Accommodation Theory (CAT) states that individuals adapt to each other’s communicative behaviors. This adaptation is called “convergence.” In this work we explore the convergence of writing styles of users of the online music distribution platform SoundCloud.com. In order to evaluate our system we created a corpus of over 38,000 comments retrieved from SoundCloud in April 2014. The corpus represents comments from 8 distinct musical genres: Classical, Electronic, Hip Hop, Jazz, Country, Metal, Folk, and World. Our corpus contains: short comments, frequent misspellings, little sentence structure, hashtags, emoticons, and URLs. We adapt techniques used by researchers analyzing other …

Challenges For MapReduce In Big Data, Katarina Grolinger, Michael Hayes, Wilson Higashino, Alexandra L'Heureux, David Allison, Miriam Capretz May 2014

Wilson A Higashino

In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting continuously increasing demands on computing resources imposed by massive data sets. The reason for this is the high scalability of the MapReduce paradigm which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research in this field. The identified challenges are grouped …
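
As a rough illustration of the paradigm the abstract refers to, the following is a minimal single-process sketch of the map, shuffle, and reduce phases applied to word counting. It is not taken from the paper; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are hypothetical, and a real deployment would run the map and reduce calls in parallel across many nodes rather than in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key before reduction."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases sequentially (illustrative, not distributed)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands in for one input split.
counts = word_count(["big data big challenges", "Data at scale"])
```

Because each `map_phase` call depends only on its own split and each `reduce_phase` call depends only on one key's values, both steps can in principle be distributed independently, which is the source of the scalability the abstract mentions.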

A Computational Model Of Memetic Evolution: Optimizing Collective Intelligence, Noah Welsh May 2014

All Dissertations

The purpose of this study was to create an adaptive agent-based simulation modeling the processes of creative collaboration. This model aided in the development of a new evolution-based framework through which education scholars, academics, and professionals in all disciplines and industries can work to optimize their collective ability to find creative solutions to complex problems. The basic premise is that the process of idea exchange parallels the role of sexual reproduction in biological evolution and is essential to society's collective ability to solve complex problems. The study outlined a set of assumptions used to develop a new theory of …

Deep Learning Via Stacked Sparse Autoencoders For Automated Voxel-Wise Brain Parcellation Based On Functional Connectivity, Céline Gravelines Apr 2014

Electronic Thesis and Dissertation Repository

Functional brain parcellation – the delineation of brain regions based on functional connectivity – is an active research area lacking an ideal subject-specific solution independent of anatomical composition, manual feature engineering, or heavily labelled examples. Deep learning is a cutting-edge area of machine learning on the forefront of current artificial intelligence developments. Specifically, autoencoders are artificial neural networks which can be stacked to form hierarchical sparse deep models from which high-level features are compressed, organized, and extracted, without labelled training data, allowing for unsupervised learning. This thesis presents a novel application of stacked sparse autoencoders to the problem of parcellating …

A Scalable Supervised Subsemble Prediction Algorithm, Stephanie Sapp, Mark J. Van Der Laan Apr 2014

U.C. Berkeley Division of Biostatistics Working Paper Series

Subsemble is a flexible ensemble method that partitions a full data set into subsets of observations, fits the same algorithm on each subset, and uses a tailored form of V-fold cross-validation to construct a prediction function that combines the subset-specific fits with a second metalearner algorithm. Previous work studied the performance of Subsemble with subsets created randomly, and showed that these types of Subsembles often result in better prediction performance than the underlying algorithm fit just once on the full dataset. Since the final Subsemble estimator varies depending on the data used to create the subset-specific fits, different strategies for …

Challenges For MapReduce In Big Data, Katarina Grolinger, Michael Hayes, Wilson A. Higashino, Alexandra L'Heureux, David S. Allison, Miriam A.M. Capretz Jan 2014

Electrical and Computer Engineering Publications

Random Forests Based Rule Learning And Feature Elimination, Sheng Liu Jan 2014

Electronic Theses and Dissertations

Much research combines data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. We propose an efficient approach, combining rule extraction and feature elimination, based on 1-norm regularized random forests. This approach simultaneously extracts a small number of rules generated by random forests and selects important features. To evaluate this approach, we have applied it to …

An Evolutionary Approximation To Contrastive Divergence In Convolutional Restricted Boltzmann Machines, Ryan R. Mccoppin Jan 2014

Browse all Theses and Dissertations

Deep learning is an emerging area in machine learning that exploits multi-layered neural networks to extract invariant relationships from large data sets. Deep learning uses layers of non-linear transformations to represent data in abstract and discrete forms. Several different architectures have been developed over the past few years specifically to process images, including the Convolutional Restricted Boltzmann Machine. The Boltzmann Machine is trained using contrastive divergence, a depth-first gradient-based training algorithm. Gradient-based training methods have no guarantee of reaching an optimal solution and tend to search a limited region of the solution space. In this thesis, we present …
