Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 10 of 10

Full-Text Articles in Physical Sciences and Mathematics

Dimension Reduction Techniques For High Dimensional And Ultra-High Dimensional Data, Subha Datta Dec 2019

Dissertations

This dissertation introduces two statistical techniques for handling high-dimensional data, which are now commonplace. It consists of two topics linked by a common theme, dimension reduction.

The first topic is a recently introduced classification technique, the weighted principal support vector machine (WPSVM), which is incorporated into a spatial point process framework. In addition to the regularization parameter, the WPSVM has a weight parameter. Most statistical techniques, including WPSVM, carry an inherent assumption of independence, meaning that the data points are not connected with each other in any manner. Spatial data violate this assumption. Correlation between …


Performance Modeling And Resource Provisioning For Data-Intensive Applications, Zhongwei Li Dec 2019

Computer Science and Engineering Theses

Performance evaluation and resource provisioning are two of the most critical factors for designers of distributed systems at modern warehouse data centers. The ever-increasing volumes of data in recent years have pushed many businesses to move their computing tasks to the Cloud, which offers many benefits, including low system management and maintenance costs and better scalability. As a result, most of the prominent recently emerging workloads are data-intensive, calling for scaling out the workload to a large number of servers for parallel processing. Questions arise as to what factors impact the system scaling performance, and how to efficiently schedule …


Performance Modeling And Resource Provisioning For Data-Intensive Applications, Zhongwei Li Dec 2019

Computer Science and Engineering Dissertations

Performance evaluation and resource provisioning are two of the most critical factors for designers of distributed systems at modern warehouse data centers. The ever-increasing volumes of data in recent years have pushed many businesses to move their computing tasks to the Cloud, which offers many benefits, including low system management and maintenance costs and better scalability. As a result, most of the prominent recently emerging workloads are data-intensive, calling for scaling out the workload to a large number of servers for parallel processing. Questions arise as to what factors impact the system scaling performance, and how to efficiently schedule …


High-Performance Computing Frameworks For Large-Scale Genome Assembly, Sayan Goswami Jun 2019

LSU Doctoral Dissertations

Genome sequencing technology has witnessed tremendous progress in terms of throughput and cost per base pair, resulting in an explosion in the size of data. Typical de Bruijn graph-based assembly tools demand a lot of processing power and memory and cannot assemble big datasets unless running on a scaled-up server with terabytes of RAM or a scaled-out cluster with several dozen nodes. In the first part of this work, we present a distributed next-generation sequence (NGS) assembler called Lazer that achieves both scalability and memory efficiency by using partitioned de Bruijn graphs. By enhancing the memory-to-disk swapping and reducing the …


Statistical Machine Learning Methods For Mining Spatial And Temporal Data, Fei Tan May 2019

Dissertations

Spatial and temporal dependencies are ubiquitous properties of data in numerous domains. The popularity of spatial and temporal data mining has thus grown with the increasing prevalence of massive data. The presence of spatial and temporal attributes not only provides useful complementary perspectives, but also poses new challenges for their representation and integration into the learning procedure. In this dissertation, the spatial and temporal dependencies involved are explored in three genres: sample-wise, feature-wise, and target-wise. A family of novel methodologies is developed accordingly for representing the dependencies in the respective scenarios.

First, dependencies among discrete, continuous and repeated observations are studied …


Dynamic Sampling Versions Of Popular SPC Charts For Big Data Analysis, Samuel Anyaso-Samuel May 2019

Boise State University Theses and Dissertations

The statistical process control (SPC) chart is an effective tool for the analysis, interpretation, and visualization of data from sequential processes. Commonly used SPC charts such as the Shewhart, CUSUM and EWMA charts are widely implemented in detecting distributional shifts in various processes. With recent scientific and technological advancements, massive amounts of data continue to be generated by production, medical, agricultural and many other industrial processes. Conventional SPC charts have significant drawbacks in monitoring such processes, specifically when the velocity of the data flow is greater than the run time of the monitoring procedure. In the literature, dynamic sampling control …


Google Trends Data As A Proxy For Interest In Leadership, Finley W. Walker Apr 2019

Doctor of Education (Ed.D)

The purpose of this quantitative study was to investigate the observable patterns of online search behavior on the topic of leadership using Google Trends data. Institutions have historically had difficulty predicting good leadership candidates. Better predictions can be made by using the big data offered by groups such as Google to learn who is interested in leadership, and where and when. The study utilized descriptive, comparative, and correlative methodologies to study Google users’ interest in leadership from 2004 to 2017. Society has placed great value on leadership throughout history, and though overall interest remains strong, it appears that …


A Data-Driven Approach For Modeling Agents, Hamdi Kavak Apr 2019

Computational Modeling & Simulation Engineering Theses & Dissertations

Agents are commonly created from a set of simple rules driven by theories, hypotheses, and assumptions. Such a modeling premise makes limited use of real-world data and, lacking empirical grounding, is challenged when modeling real-world systems. At the same time, the last decade has witnessed the production and availability of large-scale data from various sensors that carry behavioral signals. These data sources have the potential to change the way we create agent-based models, from rule-driven to data-driven. Despite this opportunity, the literature has neglected to offer a modeling approach to generate granular agent behaviors from data, creating …


Design Of Experiment And Analysis Techniques For Fuel Consumption Data Using Heavy-Duty Diesel Vehicles And On-Road Testing, Sarah Ann Mills Jan 2019

Graduate Theses, Dissertations, and Problem Reports

Chassis dynamometer and on-road testing are usually employed to test vehicle operation. Because of its controlled environment, testing on a chassis dynamometer reduces data variability compared to on-road testing, but it does not account for other important variables that affect real-world vehicle operation. This study used on-road testing to investigate the differences between two test fuels under real-world conditions. Three heavy-duty diesel vehicles were driven on different routes for a period of three months. Each vehicle was instrumented with flow meters to gather fuel consumption data, which were then compared to the fuel rate broadcast by the engine control unit …


Privacy Preservation In Social Media Environments Using Big Data, Katrina Ward Jan 2019

Doctoral Dissertations

"With the pervasive use of mobile devices, social media, home assistants, and smart devices, the idea of individual privacy is fading. More than ever, the public is giving up personal information in order to take advantage of what are now considered everyday conveniences, while ignoring the consequences. Even seemingly harmless information is making headlines for its unauthorized use (18). Among this data is user trajectory data, which can be described as a user's location information over a time period (6). This data is generated whenever users access their devices to record their location, query the location of a point …