Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Institution
- New Jersey Institute of Technology (6)
- University of Texas at Arlington (6)
- Old Dominion University (4)
- Walden University (3)
- Boise State University (2)
- Louisiana State University (2)
- Purdue University (2)
- United Arab Emirates University (2)
- University of Nevada, Las Vegas (2)
- University of South Florida (2)
- City University of New York (CUNY) (1)
- Coastal Carolina University (1)
- DePaul University (1)
- Governors State University (1)
- James Madison University (1)
- Marquette University (1)
- Michigan Technological University (1)
- Missouri University of Science and Technology (1)
- Nova Southeastern University (1)
- Portland State University (1)
- Singapore Management University (1)
- Southeastern University (1)
- University at Albany, State University of New York (1)
- University of Missouri, St. Louis (1)
- University of New Mexico (1)
- University of Texas Rio Grande Valley (1)
- Virginia Commonwealth University (1)
- Wayne State University (1)
- Wilfrid Laurier University (1)
- World Maritime University (1)
- Publication Year
- Publication
- Dissertations (6)
- Computer Science and Engineering Theses (4)
- Walden Dissertations and Doctoral Studies (3)
- Boise State University Theses and Dissertations (2)
- Computational Modeling & Simulation Engineering Theses & Dissertations (2)
- Computer Science and Engineering Dissertations (2)
- LSU Doctoral Dissertations (2)
- Theses (2)
- UNLV Theses, Dissertations, Professional Papers, and Capstones (2)
- USF Tampa Graduate Theses and Dissertations (2)
- Accounting Dissertations (1)
- All Capstone Projects (1)
- CCE Theses and Dissertations (1)
- College of Computing and Digital Media Dissertations (1)
- Computer Science ETDs (1)
- Computer Science Theses & Dissertations (1)
- Dissertations (1934 -) (1)
- Dissertations and Theses (1)
- Dissertations and Theses Collection (Open Access) (1)
- Dissertations, Master's Theses and Master's Reports (1)
- Dissertations, Theses, and Capstone Projects (1)
- Doctor of Education (Ed.D) (1)
- Doctoral Dissertations (1)
- Electrical & Computer Engineering Theses & Dissertations (1)
- Honors Theses (1)
- Legacy Theses & Dissertations (2009 - 2024) (1)
- Masters Theses, 2020-current (1)
- Open Access Dissertations (1)
- Open Access Theses (1)
- Theses and Dissertations (1)
Articles 1 - 30 of 51
Full-Text Articles in Physical Sciences and Mathematics
Interposition Based Container Optimization For Data Intensive Applications, Rohan Tikmany
College of Computing and Digital Media Dissertations
Reproducibility of applications is paramount in several scenarios such as collaborative work and software testing. Containers provide an easy way of addressing reproducibility by packaging the application's software and data dependencies into one executable unit, which can be executed multiple times in different environments. With the increased use of containers in industry as well as academia, current research has examined the provisioning and storage cost of containers and has shown that container deployments often include unnecessary software packages. Current methods to optimize the container size prune unnecessary data at the granularity of files and thus make binary decisions. We show …
Digital DNA: The Ethical Implications Of Big Data As The World’s New-Age Commodity, Clark H. Dotson
Honors Theses
In the emerging digital world that we find ourselves in, it becomes apparent that data collection has become a staple of daily life, whether we like it or not. This research discussion aims to bring light to just how much one’s own digital identity is valued in the technologically-infused world of today, with distinct research and local examples to bring awareness to the ethical implications of your online presence. The paper in question examines anecdotal and research evidence of the collection of data, both through true and unjust means, as well as ethical implications of what this information truly represents. …
A Study Of The Impact Of Data Intelligence On Software Delivery Performance, Yongdong Dong
Dissertations and Theses Collection (Open Access)
With the rise of big data and artificial intelligence, data intelligence has gradually become the focus of academia and industry. Data intelligence has two obvious characteristics: it is driven by big data and by application scenarios. More and more enterprises extract valuable patterns contained in data with prediction and decision analysis methods and technologies such as large-scale data mining, machine learning and deep learning, and use them to improve management and decision-making in complex practice, so as to promote changes in business models, organizational structures and even business strategies, and to improve the operational efficiency of organizations. However, there are few studies …
Compilation Optimizations To Enhance Resilience Of Big Data Programs And Quantum Processors, Travis D. Lecompte
LSU Doctoral Dissertations
Modern computers can experience a variety of transient errors due to the surrounding environment, known as soft faults. Although the frequency of these faults is low enough to not be noticeable on personal computers, they become a considerable concern during large-scale distributed computations or systems in more vulnerable environments like satellites. These faults occur as a bit flip of some value in a register, operation, or memory during execution. They surface as either program crashes, hangs, or silent data corruption (SDC), each of which can waste time, money, and resources. Hardware methods, such as shielding or error correcting memory (ECM), …
Efficient And Scalable Triangle Centrality Algorithms In The Arkouda Framework, Joseph Thomas Patchett
Theses
Graph data structures provide a unique challenge for both analysis and algorithm development. These data structures are irregular in that memory accesses are not known a priori and accesses to these structures tend to lack locality.
Despite these challenges, graph data structures are a natural way to represent relationships between entities and to exhibit unique features about these relationships. The network created from these relationships can create unique local structures that can describe the behavior between members of these structures. Graphs can be analyzed in a number of different ways including at a high level in community detection and at …
On Performance Optimization And Prediction Of Parallel Computing Frameworks In Big Data Systems, Haifa Alquwaiee
Dissertations
A wide spectrum of big data applications in science, engineering, and industry generate large datasets, which must be managed and processed in a timely and reliable manner for knowledge discovery. These tasks are now commonly executed in big data computing systems exemplified by Hadoop based on parallel processing and distributed storage and management. For example, many companies and research institutions have developed and deployed big data systems on top of NoSQL databases such as HBase and MongoDB, and parallel computing frameworks such as MapReduce and Spark, to ensure timely data analyses and efficient result delivery for decision making and business …
Translation Of Array-Based Loop Programs To Optimized SQL-Based Distributed Programs, Md Hasanuzzaman Noor
Computer Science and Engineering Dissertations
Most programs written to operate on data are expressed in terms of array operations in sequential loops. However, these programs do not scale to the large amounts of data generated by scientific experiments and industrial and commercial markets. Given the success of machine learning algorithms on large amounts of data and the recent shift of industries to data-driven decision making, data scientists who are not familiar with Big Data frameworks have to rewrite sequential programs as distributed data-parallel programs by hand. We present a novel framework, called SQLgen, that automatically translates sequential loops to distributed data-parallel programs. SQLgen …
Data-Driven Operational And Safety Analysis Of Emerging Shared Electric Scooter Systems, Qingyu Ma
Computational Modeling & Simulation Engineering Theses & Dissertations
The rapid rise of shared electric scooter (E-Scooter) systems offers many urban areas a new micro-mobility solution. The portable and flexible characteristics have made E-Scooters a competitive mode for short-distance trips. Compared to other modes such as bikes, E-Scooters allow riders to freely ride on different facilities such as streets, sidewalks, and bike lanes. However, sharing lanes with vehicles and other users tends to cause safety issues for riding E-Scooters. Conventional methods are often not applicable for analyzing such safety issues because well-archived historical crash records are not commonly available for emerging E-Scooters.
Perceiving the growth of such a micro-mobility …
Learning From Multi-Class Imbalanced Big Data With Apache Spark, William C. Sleeman IV
Theses and Dissertations
With data becoming a new form of currency, its analysis has become a top priority in both academia and industry, furthering advancements in high-performance computing and machine learning. However, these large, real-world datasets come with additional complications such as noise and class overlap. Problems are magnified when multi-class data is presented, especially since many of the popular algorithms were originally designed for binary data. Another challenge arises when the examples are not evenly distributed across all classes in a dataset. This often causes classifiers to favor the majority class over the minority classes, leading to undesirable results …
Binary Black Widow Optimization Algorithm For Feature Selection Problems, Ahmed Al-Saedi
Theses and Dissertations (Comprehensive)
This thesis addresses feature selection (FS), a primary stage in data mining. FS is a significant pre-processing stage that improves computation cost and accuracy and offers a better comprehension of stored data by removing unnecessary and irrelevant features from the basic dataset. However, because of the size of the problem, FS is known to be very challenging and has been classified as an NP-hard problem. Traditional methods can only be used to solve small problems. Therefore, metaheuristic algorithms (MAs) are becoming powerful methods for addressing FS problems. …
Performance Optimization Of Big Data Computing Workflows For Batch And Stream Data Processing In Multi-Clouds, Huiyan Cao
Dissertations
Workflow techniques have been widely used as a major computing solution in many science domains. With the rapid deployment of cloud infrastructures around the globe and the economic benefits of cloud-based computing and storage services, an increasing number of scientific workflows have migrated or are in active transition to clouds. As the scale of scientific applications continues to grow, it is now common to deploy various data- and network-intensive computing workflows such as serial computing workflows, MapReduce/Spark-based workflows, and Storm-based stream data processing workflows in multi-cloud environments, where inter-cloud data transfer oftentimes plays a significant role in both workflow performance …
Improving A Wireless Localization System Via Machine Learning Techniques And Security Protocols, Zachary Yorio
Masters Theses, 2020-current
The recent advancements made in Internet of Things (IoT) devices have brought forth new opportunities for technologies and systems to be integrated into our everyday life. In this work, we investigate how edge nodes can effectively utilize 802.11 wireless beacon frames being broadcast from pre-existing access points in a building to achieve room-level localization. We explain the needed hardware and software for this system and demonstrate a proof of concept with experimental data analysis. Improvements to localization accuracy are shown via machine learning by implementing the random forest algorithm. Using this algorithm, historical data can train the model and make …
Mr_Qp: A Scalable Approach To Query Processing On Arbitrary-Size Graphs Using The Map/Reduce Framework, Harshit Ashokkumar Modi
Computer Science and Engineering Theses
The utility and widespread use of Relational Database Management Systems (RDBMSs) come not only from their simple, easy-to-understand data model (a relation or a set) but mainly from the ability to write non-procedural queries and have the system optimize them. Queries produce exact answers that match the contents of the database. Query processing in RDBMSs has been researched for more than four decades and includes extensions to more complex analysis on data warehouses. In contrast, search has not been addressed by RDBMSs. As the use of other data types (key-value store, column-store, and graphs to name a few) is becoming …
Exploring Strategies To Transition To Big Data Technologies From DW Technologies, Mbah Johnas Fortem
Walden Dissertations and Doctoral Studies
As a result of innovation and technological improvements, organizations are now capable of capturing and storing massive amounts of data from various sources and domains. This increase in the volume of data resulted in traditional tools used for processing, storing, and analyzing large amounts of data becoming increasingly inefficient. Grounded in the extended technology acceptance model, the purpose of this qualitative multiple case study was to explore the strategies data managers use to transition from traditional data warehousing technologies to big data technologies. The participants included data managers from 6 organizations (medium and large size) based in Munich, Germany, who …
Performance Modeling And Resource Provisioning For Data-Intensive Applications, Zhongwei Li
Computer Science and Engineering Theses
Performance evaluation and resource provisioning are the two most critical factors for designers of distributed systems at modern warehouse data centers. The ever-increasing volumes of data in recent years have pushed many businesses to move their computing tasks to the Cloud, which offers many benefits including low system management and maintenance costs and better scalability. As a result, most recent prominently emerging workloads are data-intensive, calling for scaling out the workload to a large number of servers for parallel processing. Questions can be asked as to what factors impact the system scaling performance, and how to efficiently schedule …
Performance Modeling And Resource Provisioning For Data-Intensive Applications, Zhongwei Li
Computer Science and Engineering Dissertations
Performance evaluation and resource provisioning are the two most critical factors for designers of distributed systems at modern warehouse data centers. The ever-increasing volumes of data in recent years have pushed many businesses to move their computing tasks to the Cloud, which offers many benefits including low system management and maintenance costs and better scalability. As a result, most recent prominently emerging workloads are data-intensive, calling for scaling out the workload to a large number of servers for parallel processing. Questions can be asked as to what factors impact the system scaling performance, and how to efficiently schedule …
High-Performance Computing Frameworks For Large-Scale Genome Assembly, Sayan Goswami
LSU Doctoral Dissertations
Genome sequencing technology has witnessed tremendous progress in terms of throughput and cost per base pair, resulting in an explosion in the size of data. Typical de Bruijn graph-based assembly tools demand a lot of processing power and memory and cannot assemble big datasets unless running on a scaled-up server with terabytes of RAM or a scaled-out cluster with several dozen nodes. In the first part of this work, we present a distributed next-generation sequence (NGS) assembler called Lazer that achieves both scalability and memory efficiency by using partitioned de Bruijn graphs. By enhancing the memory-to-disk swapping and reducing the …
Statistical Machine Learning Methods For Mining Spatial And Temporal Data, Fei Tan
Dissertations
Spatial and temporal dependencies are ubiquitous properties of data in numerous domains. The popularity of spatial and temporal data mining has thus grown with the increasing prevalence of massive data. The presence of spatial and temporal attributes not only provides complementary useful perspectives, but also poses new challenges to the representation and integration into the learning procedure. In this dissertation, the involved spatial and temporal dependencies are explored with three genres: sample-wise, feature-wise, and target-wise. A family of novel methodologies is developed accordingly for the dependency representation in respective scenarios.
First, dependencies among discrete, continuous and repeated observations are studied …
Google Trends Data As A Proxy For Interest In Leadership, Finley W. Walker
Doctor of Education (Ed.D)
The purpose of this quantitative study was to investigate the observable patterns of online search behavior in the topic of leadership using Google Trends data. Institutions have had a historically difficult time predicting good leadership candidates. Better predictions can be made by using the big data offered by groups such as Google to learn who is interested in leadership, and where and when. The study utilized descriptive, comparative, and correlative methodologies to study Google users’ interest in leadership from 2004 to 2017. Society has placed great value on leadership throughout history, and though overall interest remains strong, it appears that …
A Data-Driven Approach For Modeling Agents, Hamdi Kavak
Computational Modeling & Simulation Engineering Theses & Dissertations
Agents are commonly created on a set of simple rules driven by theories, hypotheses, and assumptions. Such a modeling premise makes limited use of real-world data and is challenged when modeling real-world systems due to the lack of empirical grounding. Simultaneously, the last decade has witnessed the production and availability of large-scale data from various sensors that carry behavioral signals. These data sources have the potential to change the way we create agent-based models: from being driven by simple rules to being driven by data. Despite this opportunity, the literature has neglected to offer a modeling approach to generate granular agent behaviors from data, creating …
Privacy Preservation In Social Media Environments Using Big Data, Katrina Ward
Doctoral Dissertations
With the pervasive use of mobile devices, social media, home assistants, and smart devices, the idea of individual privacy is fading. More than ever, the public is giving up personal information in order to take advantage of what are now considered everyday conveniences and ignoring the consequences. Even seemingly harmless information is making headlines for its unauthorized use (18). Among this data is user trajectory data which can be described as a user's location information over a time period (6). This data is generated whenever users access their devices to record their location, query the location of a point …
Leveraging Tiled Display For Big Data Visualization Using D3.js, Ujjwal Acharya
Boise State University Theses and Dissertations
Data visualization has proven effective at detecting patterns and drawing inferences from raw data by transforming it into visual representations. As data grows large, visualizing it faces two major challenges: 1) limited resolution, i.e., a screen is limited to a few million pixels but the data can have a billion data points, and 2) computational load, i.e., processing this data becomes computationally challenging for a single-node system. This work addresses both of these issues for efficient big data visualization. In the developed system, a High Pixel Density and Large Format display was used enabling the display of fine …
Deep Data Locality On Apache Hadoop, Sungchul Lee
UNLV Theses, Dissertations, Professional Papers, and Capstones
The amount of data being collected in various areas such as social media, networks, scientific instruments, mobile devices, and sensors is growing continuously, and the technology to process it is also advancing rapidly. One of the fundamental technologies for processing big data is Apache Hadoop, which has been adopted by many commercial products, such as InfoSphere by IBM, or Spark by Cloudera. MapReduce on Hadoop has been widely used in many data science applications. As a dominant big data processing platform, the performance of MapReduce on Hadoop system has a significant impact on the big data processing capability across multiple …
Secure Multiparty Protocol For Differentially-Private Data Release, Anthony Harris
Boise State University Theses and Dissertations
In an era where big data is the new norm, a higher emphasis has been placed on models which guarantee the release and exchange of data. The need for privacy-preserving data arose as more sophisticated data-mining techniques led to breaches of sensitive information. In this thesis, we present a secure multiparty protocol for the purpose of integrating multiple datasets simultaneously such that the contents of each dataset are not revealed to any of the data owners, and the contents of the integrated data do not compromise individuals’ privacy. We utilize privacy by simulation to prove that the protocol is privacy-preserving, …
Efficient Reduced Bias Genetic Algorithm For Generic Community Detection Objectives, Aditya Karnam Gururaj Rao
Theses
The problem of community structure identification has been an extensively investigated area for biology, physics, social sciences, and computer science in recent years for studying the properties of networks representing complex relationships. Most traditional methods, such as K-means and hierarchical clustering, are based on the assumption that communities have spherical configurations. Lately, Genetic Algorithms (GA) are being utilized for efficient community detection without imposing sphericity. GAs are machine learning methods which mimic natural selection and scale with the complexity of the network. However, traditional GA approaches employ a representation method that dramatically increases the solution space to be searched by …
Supporting Big Data At The Vehicular Edge, Lloyd Decker
Computer Science Theses & Dissertations
Vehicular networks are commonplace, and many applications have been developed to utilize their sensor and computing resources. This is a great utilization of these resources as long as they are mobile. The question to ask is whether these resources could be put to use when the vehicle is not mobile. If the vehicle is parked, the resources are simply dormant and waiting for use. If the vehicle has a connection to a larger computing infrastructure, then it can put its resources towards that infrastructure. With enough vehicles interconnected, there exists a computing environment that could handle many cloud-based application services. …
Assessment Of Factors Influencing Intent-To-Use Big Data Analytics In An Organization: A Survey Study, Wayne Madhlangobe
CCE Theses and Dissertations
The central question was how the relationship between trust-in-technology and intent-to-use Big Data Analytics in an organization is mediated by both Perceived Risk and Perceived Usefulness. Big Data Analytics is quickly becoming a critically important driver for business success. Many organizations are increasing their Information Technology budgets on Big Data Analytics capabilities. The Technology Acceptance Model stands out as a critical theoretical lens primarily due to its assessment approach and predictive explanatory capacity to explain individual behaviors in the adoption of technology. Big Data Analytics use in this study was considered a voluntary act and therefore well aligned with the Theory of …
Analytic Extensions To The Data Model For Management Analytics And Decision Support In The Big Data Environment, Nsikak Etim Akpakpan
Walden Dissertations and Doctoral Studies
From 2006 to 2016, an estimated average of 50% of big data analytics and decision support projects failed to deliver acceptable and actionable outputs to business users. The resulting management inefficiency came with high cost, and wasted investments estimated at $2.7 trillion in 2016 for companies in the United States. The purpose of this quantitative descriptive study was to examine the data model of a typical data analytics project in a big data environment for opportunities to improve the information created for management problem-solving. The research questions focused on finding artifacts within enterprise data to model key business scenarios for …
Offline And Online Density Estimation For Large High-Dimensional Data, Aref Majdara
Dissertations, Master's Theses and Master's Reports
Density estimation has wide applications in machine learning and data analysis techniques including clustering, classification, multimodality analysis, bump hunting and anomaly detection. In high-dimensional space, the sparsity of data in local neighborhoods makes many parametric and nonparametric density estimation methods inefficient.
This work presents development of computationally efficient algorithms for high-dimensional density estimation, based on Bayesian sequential partitioning (BSP). Copula transform is used to separate the estimation of marginal and joint densities, with the purpose of reducing the computational complexity and estimation error. Using this separation, a parallel implementation of the density estimation algorithm on a 4-core CPU is …
Visual Logging Framework Using ELK Stack, Ravi Nishant
Computer Science and Engineering Theses
Logging is the process of storing information for future reference and audit purposes. In software applications, logging plays a very critical role as a development utility and ensures code quality. It acts as an enabler for developers and support professionals by giving them the capability to see an application’s functionality and understand any issues with it. Data logging has widespread use in scientific experiments and analytical systems. Major systems that heavily use data logging include weather reporting services, digital advertisement, search engines, and space exploration systems, to name a few. Although data logging increases the productivity and efficiency of a software system, …