Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Discipline
- Computer Sciences (8)
- Databases and Information Systems (3)
- Computer Engineering (2)
- Data Storage Systems (2)
- Engineering (2)
- Software Engineering (2)
- Anthropology (1)
- Applied Statistics (1)
- Data Science (1)
- Economics (1)
- Environmental Sciences (1)
- Intellectual Property Law (1)
- Internet Law (1)
- Law (1)
- Law and Economics (1)
- Law and Society (1)
- Numerical Analysis and Scientific Computing (1)
- Organization Development (1)
- Political Economy (1)
- Property Law and Real Estate (1)
- Public Affairs, Public Policy and Public Administration (1)
- Public Law and Legal Theory (1)
- Public Policy (1)
- Rule of Law (1)
- Science and Technology Studies (1)
- Social and Behavioral Sciences (1)
- Social and Cultural Anthropology (1)
- Sociology (1)
- Statistics and Probability (1)
Articles 1 - 10 of 10
Full-Text Articles in Physical Sciences and Mathematics
HydroTerre Strahler Network Service for Any Level 12 HUC Catchment in the USA, Lorne Leonard, Chris Duffy
International Conference on Hydroinformatics
My talk will discuss two topics related to HydroTerre (http://www.hydroterre.psu.edu), a prototype infrastructure that provides researchers, educators, and resource managers with seamless access to geospatial/geotemporal data for supporting physics-based numerical models. The first topic describes the prototype, defining the supporting Essential Terrestrial Variables (ETVs) and the infrastructure to support models and data anywhere in the continental USA (CONUS). I will address how we are overcoming important problems of accessibility to high-resolution geospatial data sets from multiple sources, scalability of geospatial data in support of distributed models, and data-intensive computation for multi-scale, multi-state simulations. The second topic will describe a derived …
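The title refers to Strahler ordering of a stream network. As a rough illustration of the standard Strahler rule only (the excerpt does not describe how HydroTerre computes it), the following Python sketch assigns orders to a tree-shaped network: headwater segments get order 1, and where tributaries join, the downstream segment takes the highest incoming order, plus one if two or more tributaries share that highest order. The `upstream` dictionary layout and the segment names are hypothetical.

```python
def strahler_orders(upstream: dict[str, list[str]], outlet: str) -> dict[str, int]:
    """Compute the Strahler order of every segment in a tree-shaped network.

    `upstream` (hypothetical layout) maps each segment id to the ids of the
    segments flowing directly into it; headwater segments map to an empty
    list or are simply absent from the dictionary.
    """
    orders: dict[str, int] = {}

    def visit(seg: str) -> int:
        children = upstream.get(seg, [])
        if not children:
            order = 1  # headwater segment
        else:
            child_orders = [visit(c) for c in children]
            top = max(child_orders)
            # The order increases only when at least two tributaries share the maximum.
            order = top + 1 if child_orders.count(top) >= 2 else top
        orders[seg] = order
        return order

    visit(outlet)
    return orders


# Two second-order branches meet at the outlet, which therefore becomes third order.
network = {"outlet": ["a", "b", "c"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
print(strahler_orders(network, "outlet"))
```

The recursive traversal keeps the sketch short; a network covering a large catchment would more likely use an iterative post-order traversal to avoid Python's recursion limit.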
Transit Demand Estimation and Crowding Prediction Based on Real-Time Transit Data, Michael Aro
Electronic Thesis and Dissertation Repository
With an increasing number of intelligent analytic techniques and growing networking capabilities, municipal transit authorities can leverage real-time data to estimate transit volume and predict crowding conditions. We introduce a proactive Transit Demand Estimation and Prediction System (TraDEPS), an approach that has the potential to prevent crowding and improve transit service by measuring transit activity (the number of passengers on the individual modes of public transportation and the demand on a route) and estimating crowding levels at a given time. This system uses a combination of real-time data streams from multiple sources, a predictive model and data analytics …
Challenges for MapReduce in Big Data, Katarina Grolinger, Michael Hayes, Wilson Higashino, Alexandra L'Heureux, David Allison, Miriam Capretz
Wilson A Higashino
In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting the continuously increasing demands on computing resources imposed by massive data sets. This is due to the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objectives of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research in this field. The identified challenges are grouped …
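To make the paradigm concrete, here is the classic word-count example as a single-process Python sketch. The function names are hypothetical, and a real MapReduce framework (e.g. Hadoop) would run the map calls in parallel across nodes and perform the shuffle over the network; the point is only that each map call is independent of the others and each reduce call sees one key, which is what lets the model scale out.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    # Emit an intermediate (word, 1) pair for every word in one input split.
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values collected for a single key.
    return key, sum(values)


def word_count(docs: list[str]) -> dict[str, int]:
    intermediate = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["big data", "Big compute"]))  # {'big': 2, 'data': 1, 'compute': 1}
```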
A Scalable Supervised Subsemble Prediction Algorithm, Stephanie Sapp, Mark J. van der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Subsemble is a flexible ensemble method that partitions a full dataset into subsets of observations, fits the same algorithm on each subset, and uses a tailored form of V-fold cross-validation to construct a prediction function that combines the subset-specific fits with a second metalearner algorithm. Previous work studied the performance of Subsemble with randomly created subsets and showed that such Subsembles often achieve better prediction performance than the underlying algorithm fit just once on the full dataset. Since the final Subsemble estimator varies depending on the data used to create the subset-specific fits, different strategies for …
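The procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: ordinary least squares serves as both the subset-level algorithm and the metalearner, subsets are assigned at random (the setting of the previous work mentioned in the abstract, not the supervised partitioning the paper proposes), and all function names are hypothetical. Each subset is split into V folds, so every fold holds observations from every subset.

```python
import numpy as np


def fit_ols(X, y):
    # Ordinary least squares with an intercept column (assumed base learner).
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def subsemble(X, y, n_subsets=3, n_folds=5, seed=0):
    n = len(y)
    perm = np.random.default_rng(seed).permutation(n)
    # Random subset label per observation, then V folds balanced within each subset.
    subset = np.empty(n, dtype=int)
    subset[perm] = np.arange(n) % n_subsets
    fold = np.empty(n, dtype=int)
    fold[perm] = (np.arange(n) // n_subsets) % n_folds

    # Z[i, j]: prediction for observation i from subset j's fit with i's fold held out.
    Z = np.empty((n, n_subsets))
    for v in range(n_folds):
        test = fold == v
        for j in range(n_subsets):
            train = (subset == j) & ~test
            Z[test, j] = predict_ols(fit_ols(X[train], y[train]), X[test])

    meta = fit_ols(Z, y)  # metalearner: regress y on the J cross-validated predictions
    fits = [fit_ols(X[subset == j], y[subset == j]) for j in range(n_subsets)]

    def predict(X_new):
        # Final estimator: full subset-specific fits combined by the metalearner.
        Z_new = np.column_stack([predict_ols(c, X_new) for c in fits])
        return predict_ols(meta, Z_new)

    return predict
```

Because each subset-specific fit only touches its own subset, the inner fits could be distributed across machines, which is the scalability argument behind the method.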
Disaster Data Management in Cloud Environments, Katarina Grolinger
Katarina Grolinger
Facilitating decision-making in a vital discipline such as disaster management requires information gathering, sharing, and integration on a global scale and across governments, industries, communities, and academia. A large quantity of immensely heterogeneous disaster-related data is available; however, current data management solutions offer few or no integration capabilities and limited potential for collaboration. Moreover, recent advances in cloud computing, Big Data, and NoSQL have opened the door for new solutions in disaster data management. In this thesis, a Knowledge as a Service (KaaS) framework is proposed for disaster cloud data management (Disaster-CDM) with the objectives of 1) facilitating information gathering …
Data Management in Cloud Environments: NoSQL and NewSQL Data Stores, Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari, Miriam A.M. Capretz
Katarina Grolinger
Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and face challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the …
Reading in Binary Data and Creating an R User Interface, Malika J. Onstad, Brett Amidan, Kimberly Freeman
STAR Program Research Presentations
The Bonneville Power Administration (BPA) employs Phasor Measurement Units (PMUs) to measure variables such as voltage, frequency, and phasor angles every sixtieth of a second. These measurements produce terabytes of data, which are analyzed to detect abnormalities in the power grid. BPA recently switched its data file format from DST to PDAT, and no function currently exists to read PDAT files and prepare the data for analysis. To do this, the raw PMU data must be sorted and extracted to ensure its accuracy prior to analysis. This research worked to produce a …
M-Grid: A Distributed Framework for Multidimensional Indexing and Querying of Location-Based Big Data, Shashank Kumar
Masters Theses
The widespread use of mobile devices and the real-time availability of user-location information are facilitating the development of new personalized, location-based applications and services (LBSs). Such applications require multi-attribute query processing, high access scalability, support for millions of users, real-time querying capability, and analysis of large volumes of data. Cloud computing has fostered a new generation of distributed databases commonly known as key-value stores. Key-value stores were designed to extract value from very large volumes of data while being highly available, fault-tolerant, and scalable, hence providing much-needed features to support LBSs. However, complex queries on multidimensional …
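Key-value stores index data by a single key, so multidimensional location data is often linearized with a space-filling curve before storage. The Python sketch below shows one common choice, a Z-order (Morton) key over quantized latitude/longitude. It is a generic illustration under assumed parameters (16 bits per coordinate, hexadecimal keys, hypothetical function names), not a description of M-Grid's own indexing scheme, which the excerpt does not detail.

```python
def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    # Map a coordinate in [lo, hi] onto an integer grid cell in [0, 2**bits - 1].
    cells = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * cells)


def interleave(x: int, y: int, bits: int) -> int:
    # Morton (Z-order) code: bit i of x goes to position 2i, bit i of y to 2i + 1.
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def location_key(lat: float, lon: float, bits: int = 16) -> str:
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    # Zero-padded hex so that lexicographic key order matches Z-order.
    width = (2 * bits + 3) // 4
    return format(interleave(x, y, bits), f"0{width}x")


print(location_key(-90.0, -180.0))  # 00000000
print(location_key(90.0, 180.0))    # ffffffff
```

Nearby points usually share long key prefixes, so a bounding-box query can be answered with a set of key-range scans. Because the curve has jumps, a box typically maps to several ranges, and results still need a final filter on the exact coordinates.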
Commons at the Intersection of Peer Production, Citizen Science, and Big Data: Galaxy Zoo, Michael J. Madison
Book Chapters
The knowledge commons research framework is applied to a case of commons governance grounded in research in modern astronomy. The case, Galaxy Zoo, is a leading example of at least three different contemporary phenomena. First, Galaxy Zoo is a global citizen science project, in which volunteer non-scientists have been recruited to participate in large-scale data analysis via the Internet. Second, Galaxy Zoo is a highly successful example of peer production, sometimes known colloquially as crowdsourcing, by which data are gathered, supplied, and/or analyzed by very large numbers of anonymous and pseudonymous contributors to an …
Challenges for MapReduce in Big Data, Katarina Grolinger, Michael Hayes, Wilson A. Higashino, Alexandra L'Heureux, David S. Allison, Miriam A.M. Capretz
Electrical and Computer Engineering Publications
In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting the continuously increasing demands on computing resources imposed by massive data sets. This is due to the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objectives of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research in this field. The identified challenges are grouped …