Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 27 of 27
Full-Text Articles in Engineering
Parallel Algorithms For Scalable Graph Mining: Applications On Big Data And Machine Learning, Naw Safrin Sattar
University of New Orleans Theses and Dissertations
Parallel computing plays a crucial role in processing large-scale graph data. Complex network analysis is an exciting area of research for many applications in different scientific domains e.g., sociology, biology, online media, recommendation systems and many more. Graph mining is an area of interest with diverse problems from different domains of our daily life. Due to the advancement of data and computing technologies, graph data is growing at an enormous rate, for example, the number of links in social networks is growing every millisecond. Machine/Deep learning plays a significant role for technological accomplishments to work with big data in modern …
Machine Learning With Big Data For Electrical Load Forecasting, Alexandra L'Heureux
Electronic Thesis and Dissertation Repository
Today, the amount of data collected is exploding at an unprecedented rate due to developments in Web technologies, social media, mobile and sensing devices and the internet of things (IoT). Data is gathered in every aspect of our lives: from financial information to smart home devices and everything in between. The driving force behind these extensive data collections is the promise of increased knowledge. Therefore, the potential of Big Data relies on our ability to extract value from these massive data sets. Machine learning is central to this quest because of its ability to learn from data and provide data-driven …
A New Distributed Anomaly Detection Approach For Log IDS Management Based On Deep Learning, Murat Koca, Muhammed Ali Aydin, Ahmet Sertbaş, Abdül Halim Zaim
Turkish Journal of Electrical Engineering and Computer Sciences
Today, with the rapid increase of data, the security of big data has become more important than ever for managers. However, traditional infrastructure systems cannot cope with increasingly big data that is created like an avalanche. In addition, as the existing database systems increase licensing costs per transaction, organizations using information technologies are shifting to free and open source solutions. For this reason, we propose an anomaly attack detection model on Apache Hadoop distributed file system (HDFS), which stands out in open source big data analytics, and Apache Spark, which stands out with its speed performance in analysis to reduce …
Design Development And Performance Analysis Of Distributed Least Square Twin Support Vector Machine For Binary Classification, Bakshi Rohit Prasad, Sonali Agarwal
Turkish Journal of Electrical Engineering and Computer Sciences
Machine learning (ML) on Big Data has gone beyond the capacity of traditional machines and technologies. ML for large-scale datasets is the current focus of researchers. Most ML algorithms suffer primarily from memory constraints, complex computation, and scalability issues. The least square twin support vector machine (LSTSVM) technique is an extended version of the support vector machine (SVM). It is much faster than SVM and is widely used for classification tasks. However, when applied to large-scale datasets having millions or billions of samples and/or a large number of classes, it causes computational and storage bottlenecks. This paper …
Similarity-Based Chained Transfer Learning For Energy Forecasting With Big Data, Yifang Tian, Ljubisa Sehovac, Katarina Grolinger
Electrical and Computer Engineering Publications
Smart meter popularity has resulted in the ability to collect big energy data and has created opportunities for large-scale energy forecasting. Machine Learning (ML) techniques commonly used for forecasting, such as neural networks, involve computationally intensive training typically with data from a single building or a single aggregated load to predict future consumption for that same building or aggregated load. With hundreds of thousands of meters, it becomes impractical or even infeasible to individually train a model for each meter. Consequently, this paper proposes Similarity-Based Chained Transfer Learning (SBCTL), an approach for building neural network-based models for many meters by …
Visualization And Machine Learning Techniques For NASA’s EM-1 Big Data Problem, Antonio P. Garza III, Jose Quinonez, Misael Santana, Nibhrat Lohia
SMU Data Science Review
In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data to determine available launch opportunities (about 90TBs of data). We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory …
Deep Neural Network Learning-Based Classifier Design For Big-Data Analytics, Krishnan Raghavan
Doctoral Dissertations
"In this digital age, big-data sets are commonly found in the field of healthcare, manufacturing and others where sustainable analysis is necessary to create useful information. Big-data sets are often characterized by high-dimensionality and massive sample size. High dimensionality refers to the presence of unwanted dimensions in the data where challenges such as noise, spurious correlation and incidental endogeneity are observed. Massive sample size, on the other hand, introduces the problem of heterogeneity because complex and unstructured data types must be analyzed. To mitigate the impact of these challenges while considering the application of classification, a two-step analysis approach is …
Automatic Identification Of Animals In The Wild: A Comparative Study Between C-Capsule Networks And Deep Convolutional Neural Networks., Joel Kamdem Teto, Ying Xie
Master of Science in Computer Science Theses
The evolution of machine learning and computer vision in technology has driven a lot of improvements and innovation into several domains. We see it being applied for credit decisions, insurance quotes, malware detection, fraud detection, email composition, and any other area having enough information to allow the machine to learn patterns. Over the years the number of sensors, cameras, and cognitive pieces of equipment placed in the wilderness has been growing exponentially. However, the resources (human) to leverage these data into something meaningful are not improving at the same rate. For instance, a team of scientist volunteers took 8.4 years, …
Exploring The Role Of Sentiments In Identification Of Active And Influential Bloggers, Mohammad Alghobiri, Umer Ishfaq, Hikmat Ullah Khan, Tahir Afzal Malik
International Journal of Business and Technology
The social Web provides opportunities for the public to have social interactions and online discussions. The large number of online users on social web sites creates a high volume of data. This leads to the emergence of Big Data, which focuses on computational analysis of data to reveal patterns and associations relating to human interactions. Such analyses have vast applications in various fields such as understanding human behaviors, studying cultural influence, and promoting online marketing. Blogs are one of the social web channels that offer a way to discuss various topics. Finding the top bloggers has been a …
A New Framework For Securing, Extracting And Analyzing Big Forensic Data, Hitesh Sachdev, Hayden Wimmer, Lei Chen, Carl Rebman
Journal of Digital Forensics, Security and Law
Finding new methods to investigate criminal activities, behaviors, and responsibilities has always been a challenge for forensic research. Advances in big data and technology, together with the increased capabilities of smartphones, have contributed to the demand for modern techniques of examination. Smartphones are ubiquitous, transformative, and have become a goldmine for forensics research. Given the right tools and research methods, investigating agencies can help crack almost any illegal activity using smartphones. This paper focuses on conducting forensic analysis in exposing a terrorist or criminal network and introduces a new Big Forensic Data Framework model where different technologies of Hadoop and EnCase software are …
GraphMP: An Efficient Semi-External-Memory Big Graph Processing System On A Single Machine, Peng Sun, Yonggang Wen, Nguyen Binh Duong Ta, Xiaokui Xiao
Research Collection School Of Computing and Information Systems
Recent studies showed that single-machine graph processing systems can be as competitive as cluster-based approaches on large-scale problems. While several out-of-core graph processing systems and computation models have been proposed, the high disk I/O overhead could significantly reduce performance in many practical cases. In this paper, we propose GraphMP to tackle big graph analytics on a single machine. GraphMP achieves low disk I/O overhead with three techniques. First, we design a vertex-centric sliding window (VSW) computation model to avoid reading and writing vertices on disk. Second, we propose a selective scheduling method to skip loading and processing unnecessary edge …
Conditional Correlation Analysis, Sanjeev Bhatta
Browse all Theses and Dissertations
Correlation analysis is a frequently used statistical measure to examine the relationship among variables in different practical applications. However, traditional correlation analysis uses an overly simplistic method to do so. It measures how two variables are related in an application by examining only their relationship in the entire underlying data space. As a result, traditional correlation analysis may miss a strong correlation between those variables, especially when that relationship exists only in a small subpopulation of the larger data space. This is no longer acceptable and may lose a fair share of information in this era of Big Data which …
Data Masking, Encryption, And Their Effect On Classification Performance: Trade-Offs Between Data Security And Utility, Juan C. Asenjo
CCE Theses and Dissertations
As data mining increasingly shapes organizational decision-making, the quality of its results must be questioned to ensure trust in the technology. Inaccuracies can mislead decision-makers and cause costly mistakes. With more data collected for analytical purposes, privacy is also a major concern. Data security policies and regulations are increasingly put in place to manage risks, but these policies and regulations often employ technologies that substitute and/or suppress sensitive details contained in the data sets being mined. Data masking and substitution and/or data encryption and suppression of sensitive attributes from data sets can limit access to important details. It is believed …
MetaFlow: A Scalable Metadata Lookup Service For Distributed File Systems In Data Centers, Peng Sun, Yonggang Wen, Nguyen Binh Duong Ta, Haiyong Xie
Research Collection School Of Computing and Information Systems
In large-scale distributed file systems, efficient metadata operations are critical since most file operations have to interact with metadata servers first. In existing distributed hash table (DHT) based metadata management systems, the lookup service could be a performance bottleneck due to its significant CPU overhead. Our investigations showed that the lookup service could reduce system throughput by up to 70%, and increase system latency by a factor of up to 8 compared to ideal scenarios. In this paper, we present MetaFlow, a scalable metadata lookup service utilizing software-defined networking (SDN) techniques to distribute lookup workload over network components. MetaFlow tackles …
Energy Consumption Prediction With Big Data: Balancing Prediction Accuracy And Computational Resources, Katarina Grolinger, Miriam Am Capretz, Luke Seewald
Electrical and Computer Engineering Publications
In recent years, advances in sensor technologies and expansion of smart meters have resulted in massive growth of energy data sets. These Big Data have created new opportunities for energy prediction, but at the same time, they impose new challenges for traditional technologies. On the other hand, new approaches for handling and processing these Big Data have emerged, such as MapReduce, Spark, Storm, and Oxdata H2O. This paper explores how findings from machine learning with Big Data can benefit energy consumption prediction. An approach based on local learning with support vector regression (SVR) is presented. Although local learning itself is …
Data To Decisions For Cyberspace Operations, Steve Stone
Military Cyber Affairs
In 2011, the United States (U.S.) Department of Defense (DOD) named cyberspace a new operational domain. The U.S. Cyber Command and the Military Services are working to make the cyberspace environment a suitable place for achieving national objectives and enabling military command and control (C2). To effectively conduct cyberspace operations, DOD requires data and analysis of the Mission, Network, and Adversary. However, the DOD’s current data processing and analysis capabilities do not meet mission needs within critical operational timelines. This paper presents a summary of the data processing and analytics necessary to effectively conduct cyberspace operations.
Energy Forecasting For Event Venues: Big Data And Prediction Accuracy, Katarina Grolinger, Alexandra L'Heureux, Miriam Am Capretz, Luke Seewald
Electrical and Computer Engineering Publications
Advances in sensor technologies and the proliferation of smart meters have resulted in an explosion of energy-related data sets. These Big Data have created opportunities for development of new energy services and a promise of better energy management and conservation. Sensor-based energy forecasting has been researched in the context of office buildings, schools, and residential buildings. This paper investigates sensor-based forecasting in the context of event-organizing venues, which present an especially difficult scenario due to large variations in consumption caused by the hosted events. Moreover, the significance of the data set size, specifically the impact of temporal granularity, on energy …
Exploring The Role Of Sentiments In Identification Of Active And Influential Bloggers, Mohammad Alghobiri, Umer Ishfaq, Hikmat Ullah Khan, Tahir Afzal Malik
UBT International Conference
The social Web provides opportunities for the public to have social interactions and online discussions. The large number of online users on social web sites creates a high volume of data. This leads to the emergence of Big Data, which focuses on computational analysis of data to reveal patterns and associations relating to human interactions. Such analyses have vast applications in various fields such as understanding human behaviors, studying cultural influence, and promoting online marketing. Blogs are one of the social web channels that offer a way to discuss various topics. Finding the top bloggers has been a …
The Importance Of Big Data Analytics, Eljona Proko
UBT International Conference
Identified as a major trend in IT, Big Data has gained global attention. Advances in data analytics are changing the way businesses compete, enabling them to make faster and better decisions based on real-time analysis. Big Data introduces a new set of challenges. Three characteristics define Big Data: volume, variety, and velocity. Big Data requires tools and methods that can be applied to analyze and extract patterns from large-scale data. Companies generate enormous volumes of polystructured data from the Web, social network posts, sensors, mobile devices, emails, and many other sources. Companies need a cost-effective, massively scalable solution for capturing, storing, and analyzing …
Data Management In Cloud Environments: NoSQL And NewSQL Data Stores, Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari, Miriam Am Capretz
Wilson A Higashino
Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the …
DCMS: A Data Analytics And Management System For Molecular Simulation, Meryem Berrada
USF Tampa Graduate Theses and Dissertations
Although Molecular Simulation (MS) systems represent a major research tool in multiple scientific and engineering fields, there is still a lack of systems for effective data management and fast data retrieval and processing. This is mainly due to the nature of MS, which generates a very large amount of data: a system usually encompasses millions of data items, and one query usually runs over tens of thousands of time frames. For this purpose, we designed and developed a new application, DCMS (A data Analytics and Management System for molecular Simulation), that intends to speed up the process …
Browser Based Visualization For Parameter Spaces Of Big Data Using Client-Server Model, Kurtis M. Glendenning
Browse all Theses and Dissertations
Visualization is an important task in data analytics, as it allows researchers to view abstract patterns within the data instead of reading through extensive raw data. The ability to interact with the visualizations is essential, since it lets users intuitively explore data to find meaning and patterns more efficiently. Interactivity, however, becomes progressively more difficult as the size of the dataset increases. This project begins by leveraging existing web-based data visualization technologies and extends their functionality through the use of parallel processing. This methodology utilizes state-of-the-art techniques, such as Node.js, to split the visualization rendering …
Disaster Data Management In Cloud Environments, Katarina Grolinger
Katarina Grolinger
Facilitating decision-making in a vital discipline such as disaster management requires information gathering, sharing, and integration on a global scale and across governments, industries, communities, and academia. A large quantity of immensely heterogeneous disaster-related data is available; however, current data management solutions offer few or no integration capabilities and limited potential for collaboration. Moreover, recent advances in cloud computing, Big Data, and NoSQL have opened the door for new solutions in disaster data management. In this thesis, a Knowledge as a Service (KaaS) framework is proposed for disaster cloud data management (Disaster-CDM) with the objectives of 1) facilitating information gathering …
Data Management In Cloud Environments: NoSQL And NewSQL Data Stores, Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari, Miriam Am Capretz
Katarina Grolinger
Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the …
Data Management In Cloud Environments: NoSQL And NewSQL Data Stores, Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari, Miriam Am Capretz
Electrical and Computer Engineering Publications
Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the …
Disaster Data Management In Cloud Environments, Katarina Grolinger
Electronic Thesis and Dissertation Repository
Facilitating decision-making in a vital discipline such as disaster management requires information gathering, sharing, and integration on a global scale and across governments, industries, communities, and academia. A large quantity of immensely heterogeneous disaster-related data is available; however, current data management solutions offer few or no integration capabilities and limited potential for collaboration. Moreover, recent advances in cloud computing, Big Data, and NoSQL have opened the door for new solutions in disaster data management.
In this thesis, a Knowledge as a Service (KaaS) framework is proposed for disaster cloud data management (Disaster-CDM) with the objectives of 1) facilitating information gathering …
Efficient And Private Processing Of Analytical Queries In Scientific Datasets, Anand Kumar
USF Tampa Graduate Theses and Dissertations
Large amounts of data are generated by applications used in basic-science research and development. The size of the data introduces great challenges in storage, analysis, and privacy preservation. This dissertation proposes novel techniques to efficiently analyze the data and reduce storage space requirements through a data compression technique while preserving privacy and providing data security.
We present an efficient technique to compute an analytical query called spatial distance histogram (SDH) using spatiotemporal properties of the data. Special spatiotemporal properties present in the data are exploited to process SDH efficiently on the fly. General purpose graphics processing units (GPGPU or just …