Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Computer Engineering

Big Data

Articles 1 - 30 of 35

Full-Text Articles in Engineering

Building A Benchmark For Industrial IoT Application, Pranay K. Tiru, Soma Tummala Oct 2023

College of Engineering Summer Undergraduate Research Program

In this project, we have developed a robust means of processing and displaying large volumes of IoT data using several cutting-edge, industry-standard technologies. Our data pipeline integrates physical sensors that send various environmental data such as temperature, humidity, and pressure. Once created, the data is collected at an MQTT broker, streamed through a Kafka cluster, processed within a Spark cluster, and stored in a Cassandra database.

In order to test the robustness of the pipeline, we also created virtual sensors. This allowed us to send an immense amount of data, which wasn’t necessarily feasible with just the physical sensors. …
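As a rough illustration of the virtual-sensor idea in this abstract, the following Python sketch generates synthetic temperature, humidity, and pressure readings as JSON payloads. The function name `virtual_sensor`, the field names, and the value ranges are assumptions made for this example; the abstract does not describe the project's actual message format, and publishing the payloads to an MQTT broker is deliberately left out.

```python
import json
import random
from typing import Iterator


def virtual_sensor(sensor_id: str, count: int, seed: int = 0) -> Iterator[str]:
    """Yield `count` synthetic readings as JSON strings (hypothetical schema)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    for seq in range(count):
        reading = {
            "sensor_id": sensor_id,
            "seq": seq,
            # Value ranges below are illustrative assumptions.
            "temperature_c": round(rng.uniform(15.0, 35.0), 2),
            "humidity_pct": round(rng.uniform(20.0, 90.0), 2),
            "pressure_hpa": round(rng.uniform(980.0, 1040.0), 2),
        }
        yield json.dumps(reading)


if __name__ == "__main__":
    for payload in virtual_sensor("virtual-001", 3):
        print(payload)
```

Because the generator is lazy, the number of simulated readings can be raised well beyond what a handful of physical sensors would produce without holding them all in memory.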


Parallel Algorithms For Scalable Graph Mining: Applications On Big Data And Machine Learning, Naw Safrin Sattar Aug 2022

University of New Orleans Theses and Dissertations

Parallel computing plays a crucial role in processing large-scale graph data. Complex network analysis is an exciting area of research for many applications in different scientific domains, e.g., sociology, biology, online media, recommendation systems, and many more. Graph mining is an area of interest with diverse problems from different domains of our daily life. Due to the advancement of data and computing technologies, graph data is growing at an enormous rate; for example, the number of links in social networks is growing every millisecond. Machine/Deep learning plays a significant role for technological accomplishments to work with big data in modern …


Learning Analytics For The Formative Assessment Of New Media Skills, Negar Shabihi Mar 2022

Electronic Thesis and Dissertation Repository

Recent theories of education have shifted learning environments towards student-centred education. Also, the advancement of technology and the need for skilled individuals in different areas have led to the introduction of new media skills. Along with new pedagogies and content, these changes require new forms of assessment. However, assessment as the core of learning has not been modified as much as other educational aspects. Hence, much attention is required to develop assessment methods based on current educational requirements. To address this gap, we have implemented two data-driven systematic literature reviews to recognize the existing state of the field in the …


A Method For Monitoring Operating Equipment Effectiveness With The Internet Of Things And Big Data, Carl D. Hays III Jun 2021

Master's Theses

The purpose of this paper was to use the Overall Equipment Effectiveness productivity formula in plant manufacturing and convert it to measuring productivity for forklifts. Productivity for a forklift was defined as being available and picking up and moving containers at port locations in Seattle and Alaska. This research uses performance measures in plant manufacturing and applies them to mobile equipment in order to establish the most effective means of analyzing reliability and productivity. Using the Internet of Things to collect data on fifteen forklift trucks in three different locations, this data was then analyzed over a six-month period to …
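For readers unfamiliar with the metric, Overall Equipment Effectiveness is conventionally computed as the product of availability, performance, and quality, each expressed as a fraction between 0 and 1. The sketch below shows only that standard formula; how the thesis maps each factor onto forklift activity is not stated in the truncated abstract, and the function name and sample figures are hypothetical.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Return OEE as the product of three ratios, each in the range [0, 1]."""
    factors = {
        "availability": availability,
        "performance": performance,
        "quality": quality,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


if __name__ == "__main__":
    # Hypothetical figures for one piece of equipment over one shift.
    print(f"OEE: {oee(0.90, 0.80, 0.95):.1%}")
```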


A New Distributed Anomaly Detection Approach For Log IDS Management Based On Deep Learning, Murat Koca, Muhammed Ali Aydin, Ahmet Sertbaş, Abdül Halim Zaim Jan 2021

Turkish Journal of Electrical Engineering and Computer Sciences

Today, with the rapid increase of data, the security of big data has become more important than ever for managers. However, traditional infrastructure systems cannot cope with increasingly big data that is created like an avalanche. In addition, as the existing database systems increase licensing costs per transaction, organizations using information technologies are shifting to free and open source solutions. For this reason, we propose an anomaly attack detection model on Apache Hadoop distributed file system (HDFS), which stands out in open source big data analytics, and Apache Spark, which stands out with its speed performance in analysis to reduce …


Design Development And Performance Analysis Of Distributed Least Square Twin Support Vector Machine For Binary Classification, Bakshi Rohit Prasad, Sonali Agarwal Jan 2021

Turkish Journal of Electrical Engineering and Computer Sciences

Machine learning (ML) on Big Data has gone beyond the capacity of traditional machines and technologies. ML for large scale datasets is the current focus of researchers. Most of the ML algorithms primarily suffer from memory constraints, complex computation, and scalability issues. The least square twin support vector machine (LSTSVM) technique is an extended version of support vector machine (SVM). It is much faster as compared to SVM and is widely used for classification tasks. However, when applied to large scale datasets having millions or billions of samples and/or large number of classes, it causes computational and storage bottlenecks. This paper …


A Study On The Improvement Of Data Collection In Data Centers And Its Analysis On Deep Learning-Based Applications, Dipak Kumar Singh Jun 2020

LSU Doctoral Dissertations

Big data are usually stored in data center networks for processing and analysis through various cloud applications. Such applications are a collection of data-intensive jobs which often involve many parallel flows and are network bound in the distributed environment. The recent networking abstraction, coflow, which lets the data-parallel programming paradigm express its communication requirements, has opened new opportunities in network scheduling for such applications. Therefore, I propose a coflow-based network scheduling algorithm, Coflourish, to enhance the job completion time for such data-parallel applications, in the presence of the increased background traffic to mimic the cloud environment infrastructure. It outperforms …


Design And Implementation Of Anomaly Detections For User Authentication Framework, Iman Abu Sulayman Dec 2019

Electronic Thesis and Dissertation Repository

Anomaly detection is quickly becoming a very significant tool for a variety of applications such as intrusion detection, fraud detection, fault detection, system health monitoring, and event detection in IoT devices. An application that lacks a strong implementation for anomaly detection is user trait modeling for user authentication purposes. User trait models expose up-to-date representation of the user so that changes in their interests, their learning progress or interactions with the system are noticed and interpreted. The reason behind the lack of adoption in user trait modeling arises from the need of a continuous flow of high-volume data, that is …


Similarity-Based Chained Transfer Learning For Energy Forecasting With Big Data, Yifang Tian, Ljubisa Sehovac, Katarina Grolinger Sep 2019

Electrical and Computer Engineering Publications

Smart meter popularity has resulted in the ability to collect big energy data and has created opportunities for large-scale energy forecasting. Machine Learning (ML) techniques commonly used for forecasting, such as neural networks, involve computationally intensive training typically with data from a single building or a single aggregated load to predict future consumption for that same building or aggregated load. With hundreds of thousands of meters, it becomes impractical or even infeasible to individually train a model for each meter. Consequently, this paper proposes Similarity-Based Chained Transfer Learning (SBCTL), an approach for building neural network-based models for many meters by …


Big Five Technologies In Aeronautical Engineering Education: Scoping Review, Ruth Martinez-Lopez Jan 2019

International Journal of Aviation, Aeronautics, and Aerospace

The constant demands that technology creates in aerospace engineering also influence education. The identification of the technologies with practical application in aerospace engineering is of current interest to decision makers in both universities and industry. A social network approach enhances this scoping review of the research literature to identify the main topics using the Big Five technologies in aerospace engineering education. The conceptual structure of the dataset (n=447) was analyzed from different approaches: at macro-level, a comparative of the digital technology identified by cluster analysis with the number of co-words established in 3 and 8 and, a keyword central structure …


Automatic Identification Of Animals In The Wild: A Comparative Study Between C-Capsule Networks And Deep Convolutional Neural Networks, Joel Kamdem Teto, Ying Xie Nov 2018

Master of Science in Computer Science Theses

The evolution of machine learning and computer vision in technology has driven a lot of improvements and innovation into several domains. We see it being applied for credit decisions, insurance quotes, malware detection, fraud detection, email composition, and any other area having enough information to allow the machine to learn patterns. Over the years the number of sensors, cameras, and cognitive pieces of equipment placed in the wilderness has been growing exponentially. However, the resources (human) to leverage these data into something meaningful are not improving at the same rate. For instance, a team of scientist volunteers took 8.4 years, …


Exploring The Role Of Sentiments In Identification Of Active And Influential Bloggers, Mohammad Alghobiri, Umer Ishfaq, Hikmat Ullah Khan, Tahir Afzal Malik Nov 2018

International Journal of Business and Technology

The social Web provides opportunities for the public to have social interactions and online discussions. The large number of users on social web sites creates a high volume of data. This leads to the emergence of Big Data, which focuses on computational analysis of data to reveal patterns and associations relating to human interactions. Such analyses have vast applications in various fields such as understanding human behaviors, studying cultural influence, and promoting online marketing. Blogs are one of the social web channels that offer a way to discuss various topics. Finding the top bloggers has been a …


A New Framework For Securing, Extracting And Analyzing Big Forensic Data, Hitesh Sachdev, Hayden Wimmer, Lei Chen, Carl Rebman Oct 2018

Journal of Digital Forensics, Security and Law

Finding new methods to investigate criminal activities, behaviors, and responsibilities has always been a challenge for forensic research. Advances in big data and technology and the increased capabilities of smartphones have contributed to the demand for modern techniques of examination. Smartphones are ubiquitous, transformative, and have become a goldmine for forensics research. Given the right tools and research methods, investigating agencies can help crack almost any illegal activity using smartphones. This paper focuses on conducting forensic analysis in exposing a terrorist or criminal network and introduces a new Big Forensic Data Framework model where different technologies of Hadoop and EnCase software are …


Resampling Methods And Visualization Tools For Computer Performance Comparisons In The Presence Of Performance Variation, Samuel Oridge Irving Apr 2018

LSU Master's Theses

Performance variability, stemming from non-deterministic hardware and software behaviors or deterministic behaviors such as measurement bias, is a well-known phenomenon of computer systems which increases the difficulty of comparing computer performance metrics and is slated to become even more of a concern as interest in Big Data Analytics increases. Conventional methods use various measures (such as geometric mean) to quantify the performance of different benchmarks to compare computers without considering this variability which may lead to wrong conclusions. In this paper, we propose three resampling methods for performance evaluation and comparison: a randomization test for a general performance comparison between …
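To make the idea of a randomization test for performance comparison concrete, here is a minimal Python sketch of a generic two-sample randomization (permutation) test on the difference in mean run time. It is a textbook version written under stated assumptions, not the thesis's procedure: the function name, the choice of the mean as the test statistic, and the default number of resamples are all illustrative.

```python
import random
from statistics import mean
from typing import Sequence


def randomization_test(a: Sequence[float], b: Sequence[float],
                       n_resamples: int = 2000, seed: int = 0) -> float:
    """Estimate a two-sided p-value for the difference in means of a and b."""
    rng = random.Random(seed)  # seeded so that results are repeatable
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign measurements to the two machines at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical run times (seconds) of one benchmark on two machines.
    machine_a = [10.2, 10.5, 9.9, 10.1, 10.4, 10.0, 10.3, 10.6]
    machine_b = [10.9, 11.2, 10.8, 11.0, 11.4, 10.7, 11.1, 11.3]
    print(randomization_test(machine_a, machine_b))
```

Unlike a single geometric-mean comparison, the returned p-value reflects how often a difference at least as large as the observed one arises when run-to-run variation alone decides which machine each measurement is attributed to.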


GraphMP: An Efficient Semi-External-Memory Big Graph Processing System On A Single Machine, Peng Sun, Yonggang Wen, Nguyen Binh Duong Ta, Xiaokui Xiao Dec 2017

Research Collection School Of Computing and Information Systems

Recent studies showed that single-machine graph processing systems can be as highly competitive as cluster-based approaches on large-scale problems. While several out-of-core graph processing systems and computation models have been proposed, the high disk I/O overhead could significantly reduce performance in many practical cases. In this paper, we propose GraphMP to tackle big graph analytics on a single machine. GraphMP achieves low disk I/O overhead with three techniques. First, we design a vertex-centric sliding window (VSW) computation model to avoid reading and writing vertices on disk. Second, we propose a selective scheduling method to skip loading and processing unnecessary edge …


A Study Of Application-Awareness In Software-Defined Data Center Networks, Chui-Hui Chiu Nov 2017

LSU Doctoral Dissertations

A data center (DC) has been a fundamental infrastructure for academia and industry for many years. Applications in DC have diverse requirements on communication. There are huge demands on data center network (DCN) control frameworks (CFs) for coordinating communication traffic. Simultaneously satisfying all demands is difficult and inefficient using existing traditional network devices and protocols. Recently, agile software-defined networking (SDN) has been introduced to DCN for speeding up the development of the DCNCF. Application-awareness preserves the application semantics including the collective goals of communications. Previous works have illustrated that application-aware DCNCFs can much more efficiently allocate network resources by explicitly …


Wide-Area Measurement-Driven Approaches For Power System Modeling And Analytics, Hesen Liu Aug 2017

Doctoral Dissertations

This dissertation presents wide-area measurement-driven approaches for power system modeling and analytics. Accurate power system dynamic models are the very basis of power system analysis, control, and operation. Meanwhile, phasor measurement data provide first-hand knowledge of power system dynamic behaviors. The idea of building out innovative applications with synchrophasor data is promising.

Taking advantage of the real-time wide-area measurements, one of phasor measurements’ novel applications is to develop a synchrophasor-based auto-regressive with exogenous inputs (ARX) model that can be updated online to estimate or predict system dynamic responses.

Furthermore, since auto-regressive models are in a big family, the ARX model …


Data Masking, Encryption, And Their Effect On Classification Performance: Trade-Offs Between Data Security And Utility, Juan C. Asenjo Jan 2017

CCE Theses and Dissertations

As data mining increasingly shapes organizational decision-making, the quality of its results must be questioned to ensure trust in the technology. Inaccuracies can mislead decision-makers and cause costly mistakes. With more data collected for analytical purposes, privacy is also a major concern. Data security policies and regulations are increasingly put in place to manage risks, but these policies and regulations often employ technologies that substitute and/or suppress sensitive details contained in the data sets being mined. Data masking and substitution and/or data encryption and suppression of sensitive attributes from data sets can limit access to important details. It is believed …


Conditional Correlation Analysis, Sanjeev Bhatta Jan 2017

Browse all Theses and Dissertations

Correlation analysis is a frequently used statistical measure to examine the relationship among variables in different practical applications. However, the traditional correlation analysis uses an overly simplistic method to do so. It measures how two variables are related in an application by examining only their relationship in the entire underlying data space. As a result, traditional correlation analysis may miss a strong correlation between those variables, especially when that relationship exists in a small subpopulation of the larger data space. This is no longer acceptable and may lose a fair share of information in this era of Big Data which …
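The motivating observation, that a correlation computed over the whole data space can hide a strong relationship inside a subpopulation, can be reproduced with a small Python sketch. The data set, the group labels, and the helper names are hypothetical and chosen so the effect is exact; this illustrates the problem statement only and is not the thesis's conditional correlation method.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical records (group, x, y): y rises with x in group A
# and falls with x in group B.
records: List[Tuple[str, float, float]] = (
    [("A", float(x), 2.0 * x) for x in range(1, 6)]
    + [("B", float(x), 12.0 - 2.0 * x) for x in range(1, 6)]
)


def correlation_in(group: str) -> float:
    """Correlation of x and y restricted to one subpopulation."""
    rows = [(x, y) for g, x, y in records if g == group]
    return pearson([x for x, _ in rows], [y for _, y in rows])


overall = pearson([x for _, x, _ in records], [y for _, _, y in records])

if __name__ == "__main__":
    print("whole data space:", overall)
    print("group A only:", correlation_in("A"))
    print("group B only:", correlation_in("B"))
```

In this constructed example the two opposite trends cancel over the whole data space, while each subpopulation taken alone is perfectly correlated.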


Accelerating Big Data Applications Using Lightweight Virtualization Framework On Enterprise Cloud, Janki Bhimani, Zhengyu Yang, Miriam Leeser, Ningfang Mi Dec 2016

Zhengyu Yang

Hypervisor-based virtualization technology has been successfully used to deploy high-performance and scalable infrastructure for Hadoop, and now Spark applications. Container-based virtualization techniques are becoming an important option, which is increasingly used due to their lightweight operation and better scaling when compared to Virtual Machines (VM). With containerization techniques such as Docker becoming mature and promising better performance, we can use Docker to speed up big data applications. However, as applications have different behaviors and resource requirements, before replacing traditional hypervisor-based virtual machines with Docker, it is important to analyze and compare the performance of applications running in the cloud with VMs and …


MetaFlow: A Scalable Metadata Lookup Service For Distributed File Systems In Data Centers, Peng Sun, Yonggang Wen, Nguyen Binh Duong Ta, Haiyong Xie Sep 2016

Research Collection School Of Computing and Information Systems

In large-scale distributed file systems, efficient metadata operations are critical since most file operations have to interact with metadata servers first. In existing distributed hash table (DHT) based metadata management systems, the lookup service could be a performance bottleneck due to its significant CPU overhead. Our investigations showed that the lookup service could reduce system throughput by up to 70%, and increase system latency by a factor of up to 8 compared to ideal scenarios. In this paper, we present MetaFlow, a scalable metadata lookup service utilizing software-defined networking (SDN) techniques to distribute lookup workload over network components. MetaFlow tackles …


CEPSim: Modelling And Simulation Of Complex Event Processing Systems In Cloud Environments, Wilson A. Higashino, Miriam Am Capretz, Luiz F. Bittencourt Jan 2016

Electrical and Computer Engineering Publications

The emergence of Big Data has had profound impacts on how data are stored and processed. As technologies created to process continuous streams of data with low latency, Complex Event Processing (CEP) and Stream Processing (SP) have often been related to the Big Data velocity dimension and used in this context. Many modern CEP and SP systems leverage cloud environments to provide the low latency and scalability required by Big Data applications, yet validating these systems at the required scale is a research problem per se. Cloud computing simulators have been used as a tool to facilitate reproducible and repeatable …


Data To Decisions For Cyberspace Operations, Steve Stone Dec 2015

Military Cyber Affairs

In 2011, the United States (U.S.) Department of Defense (DOD) named cyberspace a new operational domain. The U.S. Cyber Command and the Military Services are working to make the cyberspace environment a suitable place for achieving national objectives and enabling military command and control (C2). To effectively conduct cyberspace operations, DOD requires data and analysis of the Mission, Network, and Adversary. However, the DOD’s current data processing and analysis capabilities do not meet mission needs within critical operational timelines. This paper presents a summary of the data processing and analytics necessary to effectively conduct cyberspace operations.


Exploring The Role Of Sentiments In Identification Of Active And Influential Bloggers, Mohammad Alghobiri, Umer Ishfaq, Hikmat Ullah Khan, Tahir Afzal Malik Nov 2015

UBT International Conference

The social Web provides opportunities for the public to have social interactions and online discussions. The large number of users on social web sites creates a high volume of data. This leads to the emergence of Big Data, which focuses on computational analysis of data to reveal patterns and associations relating to human interactions. Such analyses have vast applications in various fields such as understanding human behaviors, studying cultural influence, and promoting online marketing. Blogs are one of the social web channels that offer a way to discuss various topics. Finding the top bloggers has been a …


The Importance Of Big Data Analytics, Eljona Proko Nov 2015

UBT International Conference

Identified as a major trend in IT, Big Data has gained global attention. Advances in data analytics are changing the way businesses compete, enabling them to make faster and better decisions based on real-time analysis. Big Data introduces a new set of challenges. Three characteristics define Big Data: volume, variety, and velocity. Big Data requires tools and methods that can be applied to analyze and extract patterns from large-scale data. Companies generate enormous volumes of polystructured data from the Web, social network posts, sensors, mobile devices, emails, and many other sources. Companies need a cost-effective, massively scalable solution for capturing, storing, and analyzing …


Data Management In Cloud Environments: NoSQL And NewSQL Data Stores, Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari, Miriam Am Capretz May 2015

Wilson A Higashino

Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the …


DCMS: A Data Analytics And Management System For Molecular Simulation, Meryem Berrada Mar 2015

USF Tampa Graduate Theses and Dissertations

Although Molecular Simulation (MS) systems represent a major research tool in multiple scientific and engineering fields, there is still a lack of systems for effective data management and fast data retrieval and processing. This is mainly due to the nature of MS, which generates a very large amount of data: a system usually encompasses millions of data items, and one query usually runs over tens of thousands of time frames. For this purpose, we designed and developed a new application, DCMS (A data Analytics and Management System for molecular Simulation), that intends to speed up the process …


Browser Based Visualization For Parameter Spaces Of Big Data Using Client-Server Model, Kurtis M. Glendenning Jan 2015

Browse all Theses and Dissertations

Visualization is an important task in data analytics, as it allows researchers to view abstract patterns within the data instead of reading through extensive raw data. Allowing the ability to interact with the visualizations is an essential aspect since it provides the ability to intuitively explore data to find meaning and patterns more efficiently. Interactivity, however, becomes progressively more difficult as the size of the dataset increases. This project begins by leveraging existing web-based data visualization technologies and extends their functionality through the use of parallel processing. This methodology utilizes state-of-the-art techniques, such as Node.js, to split the visualization rendering …


Improvements On Scientific System Analysis, Vladimir Grupchev Jan 2015

USF Tampa Graduate Theses and Dissertations

Thanks to the advancement of modern computer simulation systems, many scientific applications generate and require manipulation of large volumes of data. Scientific exploration substantially relies on effective and accurate data analysis. The sheer size of the generated data, however, imposes big challenges in the process of analyzing the system. In this dissertation we propose novel techniques, and use some known designs in a novel way, in order to improve scientific data analysis.

We develop an efficient method to compute an analytical query called spatial distance histogram (SDH). Special heuristics are exploited to process SDH efficiently and accurately. …
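For context, a spatial distance histogram counts the pairwise distances among a set of points, grouped into buckets of fixed width. The Python sketch below is the straightforward quadratic-time baseline, shown only to make the query concrete; it is not the dissertation's heuristic method, and the function name, the 2D point type, and the choice to clamp long distances into the last bucket are assumptions of this example.

```python
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def spatial_distance_histogram(points: Sequence[Point],
                               bucket_width: float,
                               num_buckets: int) -> List[int]:
    """Count every pairwise distance into buckets of width `bucket_width`."""
    hist = [0] * num_buckets
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            # Distances beyond the last bucket boundary go into the last bucket.
            idx = min(int(d // bucket_width), num_buckets - 1)
            hist[idx] += 1
    return hist


if __name__ == "__main__":
    # Three collinear points: two pairs are 5 apart, one pair is 10 apart.
    pts = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    print(spatial_distance_histogram(pts, bucket_width=4.0, num_buckets=3))
```

The nested loop visits every pair once, so the cost grows with the square of the number of points, which is the scaling that makes faster approaches attractive for simulation-sized data.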


Disaster Data Management In Cloud Environments, Katarina Grolinger Jan 2014

Katarina Grolinger

Facilitating decision-making in a vital discipline such as disaster management requires information gathering, sharing, and integration on a global scale and across governments, industries, communities, and academia. A large quantity of immensely heterogeneous disaster-related data is available; however, current data management solutions offer few or no integration capabilities and limited potential for collaboration. Moreover, recent advances in cloud computing, Big Data, and NoSQL have opened the door for new solutions in disaster data management. In this thesis, a Knowledge as a Service (KaaS) framework is proposed for disaster cloud data management (Disaster-CDM) with the objectives of 1) facilitating information gathering …