Physical Sciences and Mathematics | Open Access Articles

HydroTerre Strahler Network Service For Any Level 12 HUC Catchment In The USA, Lorne Leonard, Chris Duffy Aug 2014

International Conference on Hydroinformatics

My talk will discuss two topics related to HydroTerre (http://www.hydroterre.psu.edu), a prototype infrastructure that provides researchers, educators, and resource managers with seamless access to geospatial/geotemporal data for supporting physics-based numerical models. The first topic describes the prototype, defining the supporting Essential Terrestrial Variables (ETVs) and the infrastructure to support models and data anywhere in the continental USA (CONUS). I will address how we are overcoming important problems of accessibility to high-resolution geospatial data sets from multiple sources, scalability of geospatial data in support of distributed models, and data-intensive computation for multi-scale, multi-state simulations. The second topic will describe a derived …

Transit Demand Estimation And Crowding Prediction Based On Real-Time Transit Data, Michael Aro Jul 2014

Electronic Thesis and Dissertation Repository

With an increasing number of intelligent analytic techniques and increasing networking capabilities, municipal transit authorities can leverage real-time data to estimate transit volume and predict crowding conditions. We introduce a proactive Transit Demand Estimation and Prediction System (TraDEPS) – an approach that has the potential to prevent crowding and improve transit service, by measuring the transit activity (the number of passengers on the individual modes of public transportation and the demand on a route), and estimating crowding levels at a given time. This system utilizes a combination of real-time data streams from multiple sources, a predictive model and data analytics …

Challenges For MapReduce In Big Data, Katarina Grolinger, Michael Hayes, Wilson Higashino, Alexandra L'Heureux, David Allison, Miriam Capretz May 2014

Wilson A Higashino

In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting continuously increasing demands on computing resources imposed by massive data sets. The reason for this is the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research in this field. The identified challenges are grouped …
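
To illustrate why the paradigm parallelizes so readily, the following is a minimal single-process sketch of the map, shuffle, and reduce phases applied to word counting. It is not taken from the paper; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would run the map and reduce calls on separate nodes rather than in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word; each document is handled independently."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; each key can be reduced independently."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases sequentially (a stand-in for distributed execution)."""
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big compute"]))
```

Because no map call depends on another document and no reduce call depends on another key, both phases can be spread over many nodes; only the shuffle step requires moving data between them.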

A Scalable Supervised Subsemble Prediction Algorithm, Stephanie Sapp, Mark J. Van Der Laan Apr 2014

U.C. Berkeley Division of Biostatistics Working Paper Series

Subsemble is a flexible ensemble method that partitions a full data set into subsets of observations, fits the same algorithm on each subset, and uses a tailored form of V-fold cross-validation to construct a prediction function that combines the subset-specific fits with a second metalearner algorithm. Previous work studied the performance of Subsemble with subsets created randomly, and showed that these types of Subsembles often result in better prediction performance than the underlying algorithm fit just once on the full dataset. Since the final Subsemble estimator varies depending on the data used to create the subset-specific fits, different strategies for …
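
The procedure described in this abstract can be sketched roughly as follows. This is a simplified illustration based only on the description above, not the authors' implementation: it assumes random subsets, uses ordinary least squares as both the underlying algorithm and the metalearner, and assigns folds at random, so the paper's exact fold construction may differ. All function names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Fit least squares with an intercept; return the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Predict with coefficients produced by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def subsemble_fit(X, y, n_subsets=3, n_folds=5, seed=0):
    """Return (subset-specific fits, metalearner fit)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Assumption: subsets and folds are both assigned at random.
    subset_id = rng.permutation(n) % n_subsets
    fold_id = rng.permutation(n) % n_folds

    # Z[i, j] holds the prediction for observation i from the fit on subset j,
    # trained without the fold that contains observation i.
    Z = np.zeros((n, n_subsets))
    for v in range(n_folds):
        held_out = fold_id == v
        for j in range(n_subsets):
            train = (subset_id == j) & ~held_out
            coef = fit_ols(X[train], y[train])
            Z[held_out, j] = predict_ols(coef, X[held_out])

    # The metalearner learns how to combine the subset-specific predictions.
    meta = fit_ols(Z, y)

    # Final subset-specific fits use all observations in each subset.
    subset_fits = [
        fit_ols(X[subset_id == j], y[subset_id == j]) for j in range(n_subsets)
    ]
    return subset_fits, meta


def subsemble_predict(model, X):
    """Combine the subset-specific predictions with the metalearner."""
    subset_fits, meta = model
    Z = np.column_stack([predict_ols(c, X) for c in subset_fits])
    return predict_ols(meta, Z)
```

Each subset-specific fit touches only a fraction of the data, so the fits can be computed independently of one another, which is where the scalability in the title would come from under this reading.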

Disaster Data Management In Cloud Environments, Katarina Grolinger Jan 2014

Katarina Grolinger

Facilitating decision-making in a vital discipline such as disaster management requires information gathering, sharing, and integration on a global scale and across governments, industries, communities, and academia. A large quantity of immensely heterogeneous disaster-related data is available; however, current data management solutions offer few or no integration capabilities and limited potential for collaboration. Moreover, recent advances in cloud computing, Big Data, and NoSQL have opened the door for new solutions in disaster data management. In this thesis, a Knowledge as a Service (KaaS) framework is proposed for disaster cloud data management (Disaster-CDM) with the objectives of 1) facilitating information gathering …

Data Management In Cloud Environments: NoSQL And NewSQL Data Stores, Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari, Miriam A.M. Capretz Jan 2014

Katarina Grolinger

Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the …

Reading In Binary Data And Creating An R User Interface, Malika J. Onstad, Brett Amidan, Kimberly Freeman Jan 2014

STAR Program Research Presentations

The Bonneville Power Administration (BPA) employs Phasor Measurement Units (PMUs) to measure variables such as Voltage, Frequency, and Phasor Angles every sixtieth of a second. These measurements result in terabytes of data which are analyzed to detect abnormalities in the power grid. Recently BPA has switched the data file format from DST to PDAT. A function does not currently exist to read in PDAT files in order to prepare the data for analysis. In order to do this the raw PMU data must be sorted and extracted to ensure its accuracy prior to analysis. This research worked to produce a …

M-Grid: A Distributed Framework For Multidimensional Indexing And Querying Of Location Based Big Data, Shashank Kumar Jan 2014

Masters Theses

The widespread use of mobile devices and the real-time availability of user-location information are facilitating the development of new personalized, location-based applications and services (LBSs). Such applications require multi-attribute query processing, handling of high access scalability, support for millions of users, real-time querying capability, and analysis of large volumes of data. Cloud computing aided a new generation of distributed databases commonly known as key-value stores. Key-value stores were designed to extract value from very large volumes of data while being highly available, fault-tolerant, and scalable, hence providing much-needed features to support LBSs. However, complex queries on multidimensional …

Commons At The Intersection Of Peer Production, Citizen Science, And Big Data: Galaxy Zoo, Michael J. Madison Jan 2014

Book Chapters

The knowledge commons research framework is applied to a case of commons governance grounded in research in modern astronomy. The case, Galaxy Zoo, is a leading example of at least three different contemporary phenomena. In the first place, Galaxy Zoo is a global citizen science project, in which volunteer non-scientists have been recruited to participate in large-scale data analysis via the Internet. In the second place, Galaxy Zoo is a highly successful example of peer production, sometimes known colloquially as crowdsourcing, by which data are gathered, supplied, and/or analyzed by very large numbers of anonymous and pseudonymous contributors to an …

Challenges For MapReduce In Big Data, Katarina Grolinger, Michael Hayes, Wilson A. Higashino, Alexandra L'Heureux, David S. Allison, Miriam A.M. Capretz Jan 2014

Electrical and Computer Engineering Publications
