Articles 1 - 14 of 14
Full-Text Articles in Computer Engineering
Large Genomes Assembly Using Mapreduce Framework, Yuehua Zhang
All Dissertations
Knowing the genome sequence of an organism is the essential step toward understanding its genomic and genetic characteristics. Currently, whole genome shotgun (WGS) sequencing is the most widely used genome sequencing technique to determine the entire DNA sequence of an organism. Recent advances in next-generation sequencing (NGS) techniques have enabled biologists to generate large DNA sequences in a high-throughput and low-cost way. However, the assembly of NGS reads faces significant challenges due to short reads and an enormously high volume of data. Despite recent progress in genome assembly, current NGS assemblers cannot generate high-quality results or efficiently handle large genomes …
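The map/reduce decomposition behind such assemblers can be illustrated with the classic first stage of de Bruijn graph construction: counting k-mers across all reads. The sketch below is a toy single-process illustration of the pattern, not the dissertation's actual pipeline; all names are hypothetical.

```python
from collections import defaultdict

def map_kmers(read, k=4):
    """Map: emit (k-mer, 1) pairs from one sequencing read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1

def reduce_counts(pairs):
    """Reduce: sum the counts emitted for each k-mer key."""
    counts = defaultdict(int)
    for kmer, n in pairs:
        counts[kmer] += n
    return dict(counts)

reads = ["ACGTACGT", "CGTACGTA"]
pairs = (p for read in reads for p in map_kmers(read))
kmer_counts = reduce_counts(pairs)
```

In a real MapReduce job the shuffle phase groups the emitted pairs by k-mer across machines, so each reducer sees all counts for its keys; the toy above collapses that into one dictionary.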
Design Development And Performance Analysis Of Distributed Least Square Twin Support Vector Machine For Binary Classification, Bakshi Rohit Prasad, Sonali Agarwal
Turkish Journal of Electrical Engineering and Computer Sciences
Machine learning (ML) on Big Data has gone beyond the capacity of traditional machines and technologies. ML for large-scale datasets is the current focus of researchers. Most ML algorithms primarily suffer from memory constraints, complex computation, and scalability issues. The least square twin support vector machine (LSTSVM) technique is an extended version of the support vector machine (SVM). It is much faster than the SVM and is widely used for classification tasks. However, when applied to large-scale datasets having millions or billions of samples and/or a large number of classes, it causes computational and storage bottlenecks. This paper …
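The speed advantage of the LSTSVM comes from replacing the SVM's quadratic program with two small systems of linear equations, one per class hyperplane; each point is then assigned to the class whose hyperplane lies closer. A minimal single-machine sketch, not the authors' distributed implementation (the ridge term `reg` is added here only for numerical stability):

```python
import numpy as np

def lstsvm_train(A, B, c1=1.0, c2=1.0, reg=1e-6):
    """Least-squares twin SVM: fit one hyperplane per class by
    solving two small linear systems (no quadratic program)."""
    e1 = np.ones((A.shape[0], 1))
    e2 = np.ones((B.shape[0], 1))
    E = np.hstack([A, e1])   # augmented class-(+1) data
    F = np.hstack([B, e2])   # augmented class-(-1) data
    I = np.eye(E.shape[1]) * reg
    # Plane 1 passes near class +1, unit distance from class -1.
    z1 = -np.linalg.solve(F.T @ F + (1.0 / c1) * E.T @ E + I, F.T @ e2)
    # Plane 2 passes near class -1, unit distance from class +1.
    z2 = np.linalg.solve(E.T @ E + (1.0 / c2) * F.T @ F + I, E.T @ e1)
    return z1.ravel(), z2.ravel()

def lstsvm_predict(X, z1, z2):
    """Assign each point to the class whose hyperplane is closer."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    d1 = np.abs(Xa @ z1) / np.linalg.norm(z1[:-1])
    d2 = np.abs(Xa @ z2) / np.linalg.norm(z2[:-1])
    return np.where(d1 <= d2, 1, -1)

A = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]])        # class +1
B = np.array([[-1.0, -1.0], [-1.1, -0.8], [-0.9, -1.2]])  # class -1
z1, z2 = lstsvm_train(A, B)
preds = lstsvm_predict(np.vstack([A, B]), z1, z2)
```

Because each system is only (d+1)×(d+1) in the feature dimension d, the per-class solve stays cheap even when the number of samples is large; it is the data matrices E and F that grow, which is what motivates a distributed formulation.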
Scale-Invariant Histogram Of Oriented Gradients: Novel Approach For Pedestrian Detection In Multiresolution Image Dataset, Sweta Panigrahi, Surya Narayana Raju Undi
Turkish Journal of Electrical Engineering and Computer Sciences
This paper proposes a scale-invariant histogram of oriented gradients (SI-HOG) for pedestrian detection. Most of the algorithms for pedestrian detection use the HOG as the basic feature and combine other features with the HOG to form the feature set, which is usually applied with a support vector machine (SVM). Hence, the HOG feature is the most efficient and fundamental feature for pedestrian detection. However, the HOG feature produces feature vectors of different lengths for different image resolutions; thus, the feature vectors are incomparable for the SVM. The proposed method forms a scale-space pyramid wherein the histogram bin is calculated. Thus, …
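The core idea of a resolution-independent descriptor can be illustrated by binning gradient orientations into a fixed number of bins, so the vector length no longer depends on image size. This is a simplified sketch of that principle, not the paper's full SI-HOG pipeline:

```python
import numpy as np

def orientation_histogram(img, n_bins=9):
    """Resolution-independent orientation histogram: gradient
    orientations are accumulated into a fixed number of bins,
    weighted by gradient magnitude, so the descriptor length
    does not depend on the image resolution."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.degrees(np.arctan2(gy, gx)), 180.0)  # unsigned orientation
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    s = hist.sum()
    return hist / s if s > 0 else hist   # L1-normalise

rng = np.random.default_rng(0)
small = rng.random((32, 16))    # low-resolution window
large = rng.random((128, 64))   # high-resolution window
h_small = orientation_histogram(small)
h_large = orientation_histogram(large)
```

Both windows yield a 9-dimensional vector, so descriptors from different pyramid levels become directly comparable for an SVM; the standard HOG instead concatenates per-cell histograms, and the number of cells changes with resolution.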
A Counter Based Approach For Reducer Placement With Augmented Hadoop Rack Awareness, Mir Wajahat Hussain, K Hemant Reddy, Diptendu Sinha Roy
Turkish Journal of Electrical Engineering and Computer Sciences
As the data-driven paradigm for intelligent systems design gains prominence, performance requirements have become very stringent, leading to numerous fine-tuned versions of Hadoop and its MapReduce programming model. However, very few researchers have investigated the effect of intelligent reducer placement on Hadoop's performance. This paper delves into this much-ignored reducer placement phase to improve Hadoop's performance and proposes to spawn the reduce phase of Hadoop tasks in an asynchronous fashion across nodes in a Hadoop cluster. The main contributions of this paper are: (i) to track when the map phase of tasks is completed, (ii) to count the number of …
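Contributions (i) and (ii) can be sketched as a per-node counter of completed map tasks that informs where reducers are launched, so reducers land near the bulk of the map output and cross-rack shuffle traffic shrinks. This is a hypothetical illustration of the counter-based idea, not the authors' Hadoop modification; node names are invented:

```python
from collections import Counter

class MapCompletionTracker:
    """Hypothetical counter-based placement policy: track completed
    map tasks per node, then launch reducers on the nodes that
    produced the most map output."""
    def __init__(self):
        self.completed = Counter()

    def on_map_complete(self, node):
        """Called whenever a map task finishes on `node`."""
        self.completed[node] += 1

    def pick_reducer_nodes(self, n):
        """Choose the n nodes holding the most map output."""
        return [node for node, _ in self.completed.most_common(n)]

tracker = MapCompletionTracker()
for node in ["rack1-n1", "rack1-n1", "rack1-n2",
             "rack2-n1", "rack1-n1", "rack1-n2"]:
    tracker.on_map_complete(node)
placement = tracker.pick_reducer_nodes(2)
```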
Scalable Profiling And Visualization For Characterizing Microbiomes, Camilo Valdes
FIU Electronic Theses and Dissertations
Metagenomics is the study of the combined genetic material found in microbiome samples, and it serves as an instrument for studying microbial communities, their biodiversities, and the relationships to their host environments. Creating, interpreting, and understanding microbial community profiles produced from microbiome samples is a challenging task as it requires large computational resources along with innovative techniques to process and analyze datasets that can contain terabytes of information.
The community profiles are critical because they provide information about what microorganisms are present in the sample, and in what proportions. This is particularly important as many human diseases and environmental disasters …
The Scheduling Algorithm Of Cloud Job Based On Hopfield Neural Network, Yudong Guo, Jinping Zuo
Journal of System Simulation
Abstract: Focusing on the low efficiency of cloud job scheduling and the insufficient utilization of resources, a job scheduling algorithm based on the Hopfield neural network is proposed. To improve the resource scheduling ability of the system, the resource characteristics that influence cloud job scheduling are identified. A mathematical model of resource constraints is established, and the Hopfield energy function is designed and optimized. The average utilization rate of 9 nodes is analyzed using standard test cases, and the performance and resource utilization of the proposed strategy are compared with three typical algorithms. …
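A Hopfield formulation typically encodes the scheduling constraints as penalty terms in an energy function that the network then minimizes. The sketch below is one plausible energy for job-to-node assignment, assumed here purely for illustration (the paper's actual energy function may differ):

```python
import numpy as np

def scheduling_energy(V, load, capacity, a=1.0, b=1.0):
    """Hypothetical Hopfield-style energy for job-to-node assignment.
    V[i, j] = 1 if job i runs on node j.  The first penalty forces
    each job onto exactly one node; the second penalises nodes
    whose summed job load exceeds their capacity."""
    one_node = ((V.sum(axis=1) - 1.0) ** 2).sum()
    node_load = V.T @ load
    overload = (np.maximum(node_load - capacity, 0.0) ** 2).sum()
    return a * one_node + b * overload

load = np.array([2.0, 3.0, 1.0])     # per-job resource demand
capacity = np.array([4.0, 4.0])      # per-node capacity
valid = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
invalid = np.array([[1, 1], [0, 1], [0, 0]], dtype=float)
e_valid = scheduling_energy(valid, load, capacity)
e_invalid = scheduling_energy(invalid, load, capacity)
```

A feasible assignment sits at zero energy, so the network's gradient dynamics are pulled toward constraint-satisfying schedules; the weights a and b trade off constraint violations against each other.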
Cloud Job Scheduling Model Based On Improved Plant Growth Algorithm, Li Qiang, Xiaofeng Liu
Journal of System Simulation
Abstract: The performance of the cloud job scheduling algorithm is of great importance to the whole cloud system. The key factors that affect cloud job scheduling are identified, and a resource constraint model is established. The existing simulated plant growth algorithm is improved based on the Logistic model of the plant growth law, so that the plant's growth pattern changes according to its energy level. Four different plant models are compared and their distinct features analyzed. Compared with 6 typical cloud job scheduling algorithms, it is concluded that the improved simulated plant growth …
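The Logistic model referred to here describes growth that starts slowly, accelerates, and then saturates at a carrying capacity. A minimal sketch of the curve (parameter values are illustrative only, not taken from the paper):

```python
import math

def logistic_growth(t, carrying=1.0, rate=1.5, x0=0.05):
    """Logistic law of plant growth: slow start, rapid middle
    phase, saturation at the carrying capacity."""
    return carrying / (1.0 + ((carrying - x0) / x0) * math.exp(-rate * t))

# Growth trajectory sampled at integer time steps.
energies = [logistic_growth(t) for t in range(0, 8)]
```

In a plant-growth search heuristic, a curve like this can modulate how much "energy" (search effort) each growth point receives as the run progresses.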
Scheduling In Mapreduce Clusters, Chen He
Department of Computer Science and Engineering: Dissertations, Theses, and Student Research
MapReduce is a framework proposed by Google for processing huge amounts of data in a distributed environment. The simplicity of the programming model and the fault-tolerance feature of the framework make it very popular in Big Data processing.
As MapReduce clusters get popular, their scheduling becomes increasingly important. On one hand, many MapReduce applications have high performance requirements, for example, on response time and/or throughput. On the other hand, with the increasing size of MapReduce clusters, the energy-efficient scheduling of MapReduce clusters becomes inevitable. These scheduling challenges, however, have not been systematically studied.
The objective of this dissertation is to …
Hadoop Framework Implementation And Performance Analysis On A Cloud, Göksu Zeki̇ye Özen, Mehmet Tekerek, Rayi̇mbek Sultanov
Turkish Journal of Electrical Engineering and Computer Sciences
The Hadoop framework uses the MapReduce programming paradigm to process big data by distributing data across a cluster and aggregating the results. MapReduce is one of the methods used to process big data hosted on large clusters. In this method, jobs are processed by dividing them into small pieces and distributing them over nodes. Parameters such as the method of distribution over nodes, the number of jobs run in parallel, and the number of nodes in the cluster affect the execution time of jobs. The aim of this paper is to determine how the numbers of nodes, maps, and reduces affect the performance of …
A Mapreduce-Based Distributed Svm Algorithm For Binary Classification, Ferhat Özgür Çatak, Mehmet Erdal Balaban
Turkish Journal of Electrical Engineering and Computer Sciences
Although the support vector machine (SVM) algorithm has high generalization performance for classifying unseen examples after the training phase and a small loss value, the algorithm is not suitable for real-life classification and regression problems: SVMs cannot scale to training datasets containing hundreds of thousands of examples. In previous studies on distributed machine-learning algorithms, the SVM was trained in a costly and preconfigured computer environment. In this research, we present a MapReduce-based distributed parallel SVM training algorithm for binary classification problems. This work shows how to distribute optimization problems over cloud computing systems with the MapReduce technique. In the second …
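The cascade pattern such algorithms build on can be sketched as: each map task trains an SVM on its data split and emits only its (approximate) support vectors, and the reduce step retrains on their union. The toy below substitutes a Pegasos-style subgradient solver for a real SVM library and runs in a single process; it illustrates the data flow, not the paper's implementation:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Tiny Pegasos-style linear SVM (a stand-in for a real solver)."""
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            if yi * (w @ xi) < 1.0:            # hinge-loss violation
                w = (1 - eta * lam) * w + eta * yi * xi
            else:
                w = (1 - eta * lam) * w
    return w

def map_step(split, lam=0.01):
    """Map: train on a local split, emit its margin points
    (an approximation of the split's support vectors)."""
    X, y = split
    w = train_linear_svm(X, y, lam)
    sv = y * (X @ w) <= 1.2                    # loose margin threshold
    if not sv.any():
        sv[np.argmin(y * (X @ w))] = True      # keep at least one point
    return X[sv], y[sv]

def reduce_step(sv_sets, lam=0.01):
    """Reduce: retrain on the union of all emitted support vectors."""
    X = np.vstack([s[0] for s in sv_sets])
    y = np.concatenate([s[1] for s in sv_sets])
    return train_linear_svm(X, y, lam)

rng = np.random.default_rng(1)
Xp = rng.normal([2, 2], 0.3, (20, 2))          # class +1 cluster
Xn = rng.normal([-2, -2], 0.3, (20, 2))        # class -1 cluster
X = np.vstack([Xp, Xn])
y = np.concatenate([np.ones(20), -np.ones(20)])
splits = [(X[::2], y[::2]), (X[1::2], y[1::2])]
w = reduce_step([map_step(s) for s in splits])
acc = np.mean(np.sign(X @ w) == y)
```

The payoff is that only a small fraction of each split (the margin points) crosses the shuffle boundary, which is what makes the approach practical over MapReduce.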
Scalable Sentiment Analytics, Aslan Baki̇rov, Kevser Nur Çoğalmiş, Ahmet Bulut
Turkish Journal of Electrical Engineering and Computer Sciences
Spark has become a widely popular analytics framework that provides an implementation of the equally popular MapReduce programming model. Hadoop is an Apache Foundation framework that can be used for processing large datasets on a cluster of computers using the MapReduce programming model. Mahout is an Apache Foundation project developed for building scalable machine learning libraries, which includes built-in machine learning classifiers. In this paper, we show how to build a simple text classifier on Spark, Apache Hadoop, and Apache Mahout for extracting sentiments from a text collection containing millions of text documents. Using a collection of 7 million …
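The MapReduce shape of such a pipeline can be shown with a deliberately tiny lexicon-based scorer standing in for a trained classifier (the paper trains real classifiers with Mahout on Spark and Hadoop; the lexicon and document set here are hypothetical):

```python
from collections import Counter

POSITIVE = {"good", "great", "love"}   # toy lexicon, illustrative only
NEGATIVE = {"bad", "poor", "hate"}

def map_sentiment(doc):
    """Map: score one document and emit a (label, 1) pair."""
    words = doc.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    label = "pos" if score > 0 else "neg" if score < 0 else "neutral"
    return label, 1

def reduce_sentiment(pairs):
    """Reduce: aggregate label counts across the whole collection."""
    counts = Counter()
    for label, n in pairs:
        counts[label] += n
    return counts

docs = ["I love this phone, it is great",
        "poor battery, bad screen",
        "arrived on time"]
totals = reduce_sentiment(map_sentiment(d) for d in docs)
```

On Spark the same shape is a `map` over the document RDD followed by a `reduceByKey`, with the classifier evaluated independently per document, which is what lets the job scale to millions of texts.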
Distributed Formal Concept Analysis Algorithms Based On An Iterative Mapreduce Framework, Ruairí De Fréin, Biao Xu, Eric Robson, Mícheál Ó Fóghlú
Conference papers
While many existing formal concept analysis algorithms are efficient, they are typically unsuitable for distributed implementation. Taking the MapReduce (MR) framework as our inspiration, we introduce a distributed approach for performing formal concept mining. Our method is novel in that we use a lightweight MapReduce runtime called Twister, which is better suited to iterative algorithms than recent distributed approaches. First, we describe the theoretical foundations underpinning our distributed formal concept analysis approach. Second, we provide a representative exemplar of how a classic centralized algorithm can be implemented in a distributed fashion using our methodology: we modify Ganter's classic algorithm …
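A formal concept is a pair (extent, intent) in which the objects' shared attributes and those attributes' common objects close on each other. The brute-force sketch below enumerates the concepts of a toy context by closing every object subset; Ganter's NextClosure algorithm, which the paper distributes, instead enumerates intents in lectic order and avoids this exponential sweep:

```python
from itertools import combinations

def derive_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small
    binary context {object: set_of_attributes}.  Brute force,
    for illustration only."""
    objects = sorted(context)
    attributes = set().union(*context.values())

    def up(objs):     # attributes common to every object in objs
        return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

    def down(attrs):  # objects having every attribute in attrs
        return {o for o in objects if attrs <= context[o]}

    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            extent = down(up(set(objs)))        # close the object set
            concepts.add((frozenset(extent), frozenset(up(extent))))
    return concepts

# Toy context: which small numbers have which properties.
context = {
    1: {"odd"},
    2: {"even", "prime"},
    3: {"odd", "prime"},
    4: {"even", "square"},
}
concepts = derive_concepts(context)
```

For this context the closure sweep yields eight concepts, from the bottom concept (no objects, all attributes) up to the top concept (all objects, no shared attribute); pairs such as ({2, 3}, {prime}) sit between them.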
Managing Large Data Sets Using Support Vector Machines, Ranjini Srinivas
Department of Computer Science and Engineering: Dissertations, Theses, and Student Research
Hundreds of Terabytes of CMS (Compact Muon Solenoid) data are being accumulated for storage day by day at the University of Nebraska-Lincoln, which is one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This is an important task that is currently being done manually and it requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when there is a requirement for …
Improving Performance And Programmer Productivity For I/O-Intensive High Performance Computing Applications, Saba Sehrish
Electronic Theses and Dissertations
Due to the explosive growth in the size of scientific data sets, data-intensive computing is an emerging trend in computational science. HPC applications are generating and processing large amounts of data, ranging from terabytes (TB) to petabytes (PB). This growth in data for HPC applications has posed challenges as to what constitutes an appropriate parallel programming framework for efficiently processing large data sets. In this work, we study the applicability of two programming models (MPI/MPI-IO and MapReduce) to a variety of I/O-intensive HPC applications, ranging from simulations to analytics. We identify several performance and programmer productivity related …