Power-Efficient And Highly Scalable Parallel Graph Sampling Using Fpgas, Usman Tariq, Umer Cheema, Fahad Saeed
Parallel Computing and Data Science Lab Technical Reports
Energy efficiency is a crucial problem in data centers where big data is generally represented by directed or undirected graphs. Analysis of this big data graph is challenging due to volume and velocity of the data as well as irregular memory access patterns. Graph sampling is one of the most effective ways to reduce the size of graph while maintaining crucial characteristics. In this paper we present design and implementation of an FPGA based graph sampling method which is both time- and energy-efficient. This is in contrast to existing parallel approaches which include memory-distributed clusters, multicore and GPUs. Our ...
Hash-Map-Eradicator: Filtering Non-Target Sequences From Next Generation Sequencing Reads, 2017 Loyola University Chicago
Hash-Map-Eradicator: Filtering Non-Target Sequences From Next Generation Sequencing Reads, Jonathon Brenner, Catherine Putonti
Contemporary DNA sequencing technologies are continuously increasing throughput at ever decreasing costs. Moreover, due to recent advances in sequencing technology new platforms are emerging. As such computational challenges persist. The average read length possible has taken a giant leap forward with the PacBio and Nanopore solutions. Regardless of the platform used, impurities within the DNA preparation of the sample - be it from unintentional contaminants or pervasive symbiots - remains an issue. We have developed a new tool, HAsh-MaP-ERadicator (HAMPER), for the detection and removal of non-target, contaminating DNA sequences. Integrating hash-based and mapping-based strategies, HAMPER is both memory and time efficient ...
Data Predictive Control Using Regression Trees And Ensemble Learning, 2017 University of Pennsylvania
Data Predictive Control Using Regression Trees And Ensemble Learning, Achin Jain, Francesco Smarra, Rahul Mangharam
Real-Time and Embedded Systems Lab (mLAB)
Decisions on how to best operate large complex plants such as natural gas processing, oil refineries, and energy efficient buildings are becoming ever so complex that model-based predictive control (MPC) algorithms must play an important role. However, a key factor prohibiting the widespread adoption of MPC, is the cost, time, and effort associated with learning first-principles dynamical models of the underlying physical system. An alternative approach is to employ learning algorithms to build black-box models which rely only on real-time data from the sensors. Machine learning is widely used for regression and classification, but thus far data-driven models have not ...
Approximation Algorithms For Effective Team Formation, 2017 City University of New York (CUNY)
Approximation Algorithms For Effective Team Formation, George Rabanca
All Graduate Works by Year: Dissertations, Theses, and Capstone Projects
This dissertation investigates the problem of creating multiple disjoint teams of maximum efficacy from a fixed set of workers. We identify three parameters which directly correlate to the team effectiveness — team expertise, team cohesion and team size — and propose efficient algorithms for optimizing each in various settings. We show that under standard assumptions the problems we explore are not optimally solvable in polynomial time, and thus we focus on developing efficient algorithms with guaranteed worst case approximation bounds. First, we investigate maximizing team expertise in a setting where each worker has different expertise for each job and each job may ...
Morphogenesis And Growth Driven By Selection Of Dynamical Properties, 2017 The Graduate Center, City University of New York
Morphogenesis And Growth Driven By Selection Of Dynamical Properties, Yuri Cantor
All Graduate Works by Year: Dissertations, Theses, and Capstone Projects
Organisms are understood to be complex adaptive systems that evolved to thrive in hostile environments. Though widely studied, the phenomena of organism development and growth, and their relationship to organism dynamics is not well understood. Indeed, the large number of components, their interconnectivity, and complex system interactions all obscure our ability to see, describe, and understand the functioning of biological organisms.
Here we take a synthetic and computational approach to the problem, abstracting the organism as a cellular automaton. Such systems are discrete digital models of real-world environments, making them more accessible and easier to study then their physical world ...
Parallelization Of Molecular Docking Algorithms Using Cuda For Use In Drug Discovery, 2017 Purdue University
Parallelization Of Molecular Docking Algorithms Using Cuda For Use In Drug Discovery, Brandon Stewart, Jonathan Fine, Gaurav Chopra Phd
The Summer Undergraduate Research Fellowship (SURF) Symposium
Traditional drug discovery methodology uses a multitude of software packages to design and evaluate new drug-like compounds. While software packages implement a wide variety of methods, the serial (i.e. single core) implementation for many of these algorithms, prohibit large scale docking, such as proteome-wide docking (i.e. thousands of compounds with thousands of proteins). Several docking algorithms can be parallelized, significantly reducing the runtime of the calculations, thus enabling large-scale docking. Implementing algorithms that take advantage of the distributed nature of graphical processing units (GPUs) via the Compute Unified Device Architecture (CUDA) enables us to efficiently implement massively parallel ...
Vertex Weighted Spectral Clustering, 2017 East Tennessee State University
Vertex Weighted Spectral Clustering, Mohammad Masum
Electronic Theses and Dissertations
Spectral clustering is often used to partition a data set into a specified number of clusters. Both the unweighted and the vertex-weighted approaches use eigenvectors of the Laplacian matrix of a graph. Our focus is on using vertex-weighted methods to refine clustering of observations. An eigenvector corresponding with the second smallest eigenvalue of the Laplacian matrix of a graph is called a Fiedler vector. Coefficients of a Fiedler vector are used to partition vertices of a given graph into two clusters. A vertex of a graph is classified as unassociated if the Fiedler coefficient of the vertex is close to ...
Classification With Large Sparse Datasets: Convergence Analysis And Scalable Algorithms, 2017 The University of Western Ontario
Classification With Large Sparse Datasets: Convergence Analysis And Scalable Algorithms, Xiang Li
Electronic Thesis and Dissertation Repository
Large and sparse datasets, such as user ratings over a large collection of items, are common in the big data era. Many applications need to classify the users or items based on the high-dimensional and sparse data vectors, e.g., to predict the profitability of a product or the age group of a user, etc. Linear classifiers are popular choices for classifying such datasets because of their efficiency. In order to classify the large sparse data more effectively, the following important questions need to be answered.
1. Sparse data and convergence behavior. How different properties of a dataset, such as ...
An Analysis Of The Application Of Simplified Silhouette To The Evaluation Of K-Means Clustering Validity, 2017 Dublin Institute of Technology
An Analysis Of The Application Of Simplified Silhouette To The Evaluation Of K-Means Clustering Validity, Fei Wang, Hector-Hugo Franco-Penya, John D. Kelleher, John Pugh, Robert Ross
Silhouette is one of the most popular and effective internal measures for the evaluation of clustering validity. Simplified Silhouette is a computationally simplified version of Silhouette. However, to date Simplified Silhouette has not been systematically analysed in a specific clustering algorithm. This paper analyses the application of Simplified Silhouette to the evaluation of k-means clustering validity and compares it with the k-means Cost Function and the original Silhouette from both theoretical and empirical perspectives. The theoretical analysis shows that Simplified Silhouette has a mathematical relationship with both the k-means Cost Function and the original Silhouette, while empirically, we show that ...
Educational Magic Tricks Based On Error-Detection Schemes, 2017 Loyola University Chicago
Educational Magic Tricks Based On Error-Detection Schemes, Ronald I. Greenberg
Computer Science: Faculty Publications and Other Works
Magic tricks based on computer science concepts help grab student attention and can motivate them to delve more deeply. Error detection ideas long used by computer scientists provide a rich basis for working magic; probably the most well known trick of this type is one included in the CS Unplugged activities. This paper shows that much more powerful variations of the trick can be performed, some in an unplugged environment and some with computer assistance. Some of the tricks also show off additional concepts in computer science and discrete mathematics.
Game Specific Approaches To Monte Carlo Tree Search For Dots And Boxes, 2017 Western Kentucky University
Game Specific Approaches To Monte Carlo Tree Search For Dots And Boxes, Jared Prince
Honors College Capstone Experience/Thesis Projects
In this project, a Monte Carlo tree search player was designed and implemented for the child’s game dots and boxes, the computational burden of which has left traditional artificial intelligence approaches like minimax ineffective. Two potential improvements to this player were implemented using game-specific information about dots and boxes: the lack of information for decision-making provided by the net score and the inherent symmetry in many states. The results of these two approaches are presented, along with details about the design of the Monte Carlo tree search player. The first improvement, removing net score from the state information, was ...
Resource Bound Guarantees Via Programming Languages, 2017 The University of Western Ontario
Resource Bound Guarantees Via Programming Languages, Michael J. Burrell
Electronic Thesis and Dissertation Repository
We present a programming language in which every well-typed program halts in time polynomial with respect to its input and, more importantly, in which upper bounds on resource requirements can be inferred with certainty. Ensuring that software meets its resource constraints is important in a number of domains, most prominently in hard real-time systems and safety critical systems where failing to meet its time constraints can result in catastrophic failure. The use of test- ing in ensuring resource constraints is of limited use since the testing of every input or environment is impossible in general. Static analysis, whether via the ...
Quo Vadis-A Framework For Intelligent Routing In Large Communication Networks., 2017 Iowa State University
Quo Vadis-A Framework For Intelligent Routing In Large Communication Networks., Armin Mikler, Johnny S. Wong, Vasant Honavar
This paper presents Quo Vadis, an evolving framework for intelligent traffic management in very large communication networks. Quo Vadis is designed to exploit topological properties of large networks as well as their spatio-temporal dynamics to optimize multiple performance criteria through cooperation among nodes in the network. It employs a distributed representation of network state information using local load measurements supplemented by a less precise global summary. Routing decisions in Quo Vadis are based on parameterized heuristics designed to optimize various performance metrics in an anticipatory or pro-active as well as compensatory or reactive mode and to minimize the overhead associated ...
Tree-Based Algorithm To Find The K-Th Value In Distributed Systems, 2017 Iowa State University
Tree-Based Algorithm To Find The K-Th Value In Distributed Systems, Yoonsik Cheon, Johnny S. Wong
In this paper, we study distributed algorithms for finding the k-th value in the decentralized systems. First we consider the case of circular configuration of processors where no processor knows the total number of participants. Later a network of arbitrary configuration is examined and a tree-based algorithm is proposed. The proposed algorithm requires O(N) messages and O(log N) rounds of message passing, where N is the number of nodes in the network.
Utility-Theoretic Heuristics For Intelligent Adaptive Routing In Large Communcation Networks, 2017 Iowa State University
Utility-Theoretic Heuristics For Intelligent Adaptive Routing In Large Communcation Networks, Armin Mikler, Vasant Honavar, Johnny S. Wong
Utility theory offers an elegant and powerful theoretical framework for design and analysis of autonomous adaptive communication networks. Routing of messages in such networks presents a real-time instance of a multi-criterion quasi-optimization problem in a dynamic and uncertain environment. In this paper, we examine several heuristic decision functions that can be used to guide messages along a near-optimal (e.g., minimum delay) path in a large network. We present an analysis of properties of such heuristics under a set of simplifying assumptions about the network topology and load dynamics. In particular, we identify the conditions under which one such utility-theoretic ...
Encryption Backdoors: A Discussion Of Feasibility, Ethics, And The Future Of Cryptography, 2017 Seattle Pacific University
Encryption Backdoors: A Discussion Of Feasibility, Ethics, And The Future Of Cryptography, Jennifer A. Martin
In the age of technological advancement and the digitization of information, privacy seems to be all but an illusion. Encryption is supposed to be the white knight that keeps our information and communications safe from unwanted eyes, but how secure are the encryption algorithms that we use? Do we put too much trust in those that are charged with implementing our everyday encryption systems? This paper addresses the concept of backdoors in encryption: ways that encryption systems can be implemented so that the security can be bypassed by those that know about its existence. Many governments around the world are ...
Neural Network Ai For Fightingice, 2017 California Polytechnic State University, San Luis Obispo
Neural Network Ai For Fightingice, Alan D. Robison
Game AI in the ﬁghting game genre, along the lines of Street Fighter, Mortal Kombat and Tekken, is traditionally script-based, with hard-coded reactions to various situations. Though this approach is often easy to understand and tweak, it requires substantial time and understanding of the game to implement in a way that is challenging and satisfying for the player due to the very large possibility space. This paper explores the use of neural networks as an alternative approach by implementing and training a network to select an action to take each frame based on the game state.
Effective Ann Topologies For Use As Genotypes For Evaluating Design And Fabrication, 2017 Union College - Schenectady, NY
Effective Ann Topologies For Use As Genotypes For Evaluating Design And Fabrication, John R. Peterson
Honors Theses and Student Projects
There is promise in the field of Evolutionary Design for systems that evolve not only what to manufacture but also how to manufacture it. EvoFab is a system that uses Genetic Algorithms to evolve Artificial Neural Networks (ANNs) which control a modified 3d-printer with the goal of automating some level of invention. ANNs are an obvious choice for use with a system like this as they are canonically evolvable encodings, and have been successfully used as evolved control systems in Evolutionary Robotics. However, there is little known about how the structural characteristics of an ANN affect the shapes that can ...
Scalable And Fully Distributed Localization In Large-Scale Sensor Networks, 2017 Old Dominion University
Scalable And Fully Distributed Localization In Large-Scale Sensor Networks, Miao Jin, Su Xia, Hongyi Wu, Xianfeng David Gu
Electrical & Computer Engineering Faculty Publications
This work proposes a novel connectivity-based localization algorithm, well suitable for large-scale sensor networks with complex shapes and a non-uniform nodal distribution. In contrast to current state-of-the-art connectivity-based localization methods, the proposed algorithm is highly scalable with linear computation and communication costs with respect to the size of the network; and fully distributed where each node only needs the information of its neighbors without cumbersome partitioning and merging process. The algorithm is theoretically guaranteed and numerically stable. Moreover, the algorithm can be readily extended to the localization of networks with a one-hop transmission range distance measurement, and the propagation of ...
Cataloging Github Repositories, 2017 Singapore Management University
Cataloging Github Repositories, Abhishek Sharma, Ferdian Thung, Pavneet Singh Kochhar, Agus Sulistya, David Lo
Research Collection School Of Information Systems
GitHub is one of the largest and most popular repository hosting service today, having about 14 million users and more than 54 million repositories as of March 2017. This makes it an excellent platform to find projects that developers are interested in exploring. GitHub showcases its most popular projects by cataloging them manually into categories such as DevOps tools, web application frameworks, and game engines. We propose that such cataloging should not be limited only to popular projects. We explore the possibility of developing such cataloging system by automatically extracting functionality descriptive text segments from readme files of GitHub repositories ...