Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 17 of 17

Full-Text Articles in Databases and Information Systems

Assessing Spurious Correlations In Big Search Data, Jesse T. Richman, Ryan J. Roberts Jan 2023

Political Science & Geography Faculty Publications

Big search data offers the opportunity to identify new and potentially real-time measures and predictors of important political, geographic, social, cultural, economic, and epidemiological phenomena, measures that might serve an important role as leading indicators in forecasts and nowcasts. However, it also presents vast new risks that scientists or the public will identify meaningless and totally spurious ‘relationships’ between variables. This study is the first to quantify that risk in the context of search data. We find that spurious correlations arise at exceptionally high frequencies among probability distributions examined for random variables based upon gamma (1, 1) and Gaussian random …
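The risk described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: it draws one target series and many independent candidate series, then counts how many candidates correlate with the target at or above a chosen threshold purely by chance. The function names, the threshold, and the sample sizes are assumptions made for illustration; only the gamma (1, 1) and Gaussian distributions come from the abstract.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def gaussian(rng):
    """One standard normal draw."""
    return rng.gauss(0.0, 1.0)


def gamma_1_1(rng):
    """One gamma draw with shape 1 and scale 1."""
    return rng.gammavariate(1.0, 1.0)


def count_spurious(n_candidates, n_obs, threshold, draw, seed=0):
    """Count independent candidate series whose |r| with the target
    meets the threshold. Every hit is spurious by construction,
    because all series are generated independently."""
    rng = random.Random(seed)
    target = [draw(rng) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_candidates):
        candidate = [draw(rng) for _ in range(n_obs)]
        if abs(pearson(target, candidate)) >= threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for name, draw in (("gaussian", gaussian), ("gamma(1, 1)", gamma_1_1)):
        hits = count_spurious(5000, 12, 0.7, draw)
        print(f"{name}: {hits} of 5000 independent series reach |r| >= 0.7")
```

With short series and a large pool of candidates, some hits are expected even though no real relationship exists, which is the general hazard of searching a very large set of variables for matches.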


Creating Data From Unstructured Text With Context Rule Assisted Machine Learning (CRAML), Stephen Meisenbacher, Peter Norlander Dec 2022

School of Business: Faculty Publications and Other Works

Popular approaches to building data from unstructured text come with limitations, such as scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets which are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, …


Big Issues For Big Data: Challenges For Critical Spatial Data Analytics, Chris Brunsdon, Alexis Comber Jul 2021

Journal of Spatial Information Science

In this paper we consider some of the issues of working with big data and big spatial data and highlight the need for an open and critical framework. We focus on a set of challenges underlying the collection and analysis of big data. In particular, we consider 1) inference when working with usually biased big data, challenging the assumed inferential superiority of data with observations, n, approaching N, the population n -> N. We also emphasise 2) the need for analyses that answer questions of practical significance or with greater emphasis on the size of the effect, rather than the …
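The first challenge named in this abstract, that a very large but biased sample need not beat a small random one, can be sketched numerically. This is a hypothetical illustration and not taken from the paper: the population, the inclusion probabilities (0.9 for positive values, 0.5 otherwise), and the function name are all assumptions.

```python
import random


def compare_samples(pop_size=100_000, srs_size=1_000, seed=0):
    """Compare the error of a large biased sample with that of a small
    simple random sample when estimating a population mean."""
    rng = random.Random(seed)
    population = [rng.gauss(0.0, 1.0) for _ in range(pop_size)]
    true_mean = sum(population) / pop_size

    # "Big data" sample: most of the population is captured, but units
    # with positive values are more likely to be included (assumed bias).
    big_biased = [v for v in population
                  if rng.random() < (0.9 if v > 0 else 0.5)]

    # Small simple random sample drawn without replacement.
    small_random = rng.sample(population, srs_size)

    biased_error = abs(sum(big_biased) / len(big_biased) - true_mean)
    random_error = abs(sum(small_random) / len(small_random) - true_mean)
    return len(big_biased), biased_error, random_error


if __name__ == "__main__":
    n_big, biased_error, random_error = compare_samples()
    print(f"biased sample size: {n_big}, error: {biased_error:.4f}")
    print(f"random sample size: 1000, error: {random_error:.4f}")
```

Under these assumptions the biased sample covers well over half of the population, yet its estimate stays off target because the selection bias does not shrink as n grows.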


Big Data, Spatial Optimization, And Planning, Kai Cao, Wenwen Li, Richard Church Jul 2020

Research Collection School Of Computing and Information Systems

Spatial optimization represents a set of powerful spatial analysis techniques that can be used to identify optimal solution(s) and even generate a large number of competitive alternatives. The formulation of such problems involves maximizing or minimizing one or more objectives while satisfying a number of constraints. Solution techniques range from exact models solved with such approaches as linear programming and integer programming to heuristic algorithms, e.g., Tabu Search, Simulated Annealing, and Genetic Algorithms. Spatial optimization techniques have been utilized in numerous planning applications, such as location-allocation modeling/site selection, land use planning, school districting, regionalization, routing, and urban design. These methods …


Google Trends Data As A Proxy For Interest In Leadership, Finley W. Walker Apr 2019

Doctor of Education (Ed.D)

The purpose of this quantitative study was to investigate the observable patterns of online search behavior in the topic of leadership using Google Trends data. Institutions have had a historically difficult time predicting good leadership candidates. Better predictions can be made by using the big data offered by groups such as Google to learn who is interested in leadership, and where and when. The study utilized descriptive, comparative, and correlative methodologies to study Google users’ interest in leadership from 2004 to 2017. Society has placed great value on leadership throughout history, and though overall interest remains strong, it appears that …


Big Data Investment And Knowledge Integration In Academic Libraries, Saher Manaseer, Afnan R. Alawneh, Dua Asoudi Jan 2019

Copyright, Fair Use, Scholarly Communication, etc.

Recently, big data investment has become important for organizations, especially with the fast growth of data following the huge expansion in the usage of social media applications and websites. Many organizations depend on extracting and reaching the needed reports and statistics. As investment in big data and its storage has become a major challenge for organizations, many technologies and methods have been developed to tackle it.

One such technology is Hadoop, a framework that is used to divide big data into packages and distribute those packages through nodes to be processed, consuming less cost than the traditional storage …


Big Data For Climate Change Actions And The Paradox Of Citizen Informedness, Kustini Lim-Wavde, Robert J. Kauffman May 2018

Research Collection School Of Computing and Information Systems

Advanced sensor technology, social media, and other information technologies have provided us with “big data” on climate change. Due to the World Meteorological Organization’s Global Climate Observing System, climate observations and records, as well as discussions on climate-related concerns such as measurement of air temperature, are widely available now. The United Nations’ Global Pulse visualises public engagement on climate change globally, with data such as the volume of climate-related tweets. Big data, data analytics, and the sharing of scientific results in the popular press have created, as a result, an unprecedented level of citizen informedness—the degree to which citizens have …


Analytic Extensions To The Data Model For Management Analytics And Decision Support In The Big Data Environment, Nsikak Etim Akpakpan Jan 2018

Walden Dissertations and Doctoral Studies

From 2006 to 2016, an estimated average of 50% of big data analytics and decision support projects failed to deliver acceptable and actionable outputs to business users. The resulting management inefficiency came with high cost, and wasted investments estimated at $2.7 trillion in 2016 for companies in the United States. The purpose of this quantitative descriptive study was to examine the data model of a typical data analytics project in a big data environment for opportunities to improve the information created for management problem-solving. The research questions focused on finding artifacts within enterprise data to model key business scenarios for …


The Billion Object Platform (BOP): A System To Lower Barriers To Support Big, Streaming, Spatio-Temporal Data Sources, Devika Kakkar, Ben Lewis, David Smiley, Ariel Nunez Sep 2017

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

With funding from the Sloan Foundation and Harvard Dataverse, the Harvard Center for Geographic Analysis (CGA) has developed a big spatio-temporal data visualization platform called the Billion Object Platform or "BOP". The goal of the project is to lower barriers for scholars who wish to access large, streaming, spatio-temporal datasets. Since once archived, streaming data gets big fast, and since most GIS systems don't support interactive visualization of millions of objects, a new platform was needed. The BOP is loaded with the latest billion geo-tweets and is fed a real-time stream of about 1 million tweets per day. The CGA …


Optimizing Spatiotemporal Analysis Using Multidimensional Indexing With GeoWave, Richard Fecher, Michael A. Whitby Sep 2017

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

The open source software GeoWave bridges the gap between geographic information systems and distributed computing. This is done by preserving locality of multidimensional data when indexing it into a single-dimensional key-value store, using space filling curves. This means that like values in each dimension are stored physically close together in the datastore. We demonstrate the efficiencies and benefits of the GeoWave indexing algorithm to store and query billions of spatiotemporal data points. We show how this indexing strategy can be used to reduce query and processing times by multiple orders of magnitude using publicly available taxi trip data published by …
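The idea of mapping multidimensional data to a single-dimensional key while preserving locality can be sketched with a Z-order (Morton) curve. This is an illustration only: the abstract does not say which space-filling curve GeoWave uses, and none of the function names below belong to the GeoWave API. The sketch quantizes longitude and latitude to integer grid cells and interleaves their bits into one sortable key.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order key.
    Bits of x land in even positions, bits of y in odd positions."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def quantize(value, lo, hi, bits=16):
    """Map a value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    frac = (value - lo) / (hi - lo)
    return min(cells, max(0, int(frac * cells)))


def morton_key(lon, lat, bits=16):
    """Single-dimensional key for a longitude/latitude pair."""
    return interleave_bits(
        quantize(lon, -180.0, 180.0, bits),
        quantize(lat, -90.0, 90.0, bits),
        bits,
    )


if __name__ == "__main__":
    # Hypothetical points: the first two are close together, the third is far away.
    points = {"a": (-73.99, 40.73), "b": (-73.98, 40.74), "c": (103.82, 1.35)}
    for name, (lon, lat) in sorted(points.items(),
                                   key=lambda item: morton_key(*item[1])):
        print(name, morton_key(lon, lat))
```

Sorting records by such a key tends to place points that are close in both dimensions near each other in a key-value store, so a spatial range query can be answered by scanning a small number of key ranges. Z-order curves do have discontinuities at cell boundaries, which is one reason other curves are sometimes preferred.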


Ten Simple Rules For Responsible Big Data Research, Matthew Zook, Solon Barocas, Danah Boyd, Kate Crawford, Emily Keller, Seeta Peña Gangadharan, Alyssa Goodman, Rachelle Hollander, Barbara A. Koenig, Jacob Metcalf, Arvind Narayanan, Alondra Nelson, Frank Pasquale Mar 2017

Geography Faculty Publications

No abstract provided.


Proactive IT Incident Prevention: Using Data Analytics To Reduce Service Interruptions, Mark G. Malley Jan 2017

Walden Dissertations and Doctoral Studies

The cost of resolving user requests for IT assistance rises annually. Researchers have demonstrated that data warehouse analytic techniques can improve service, but they have not established the benefit of using global organizational data to reduce reported IT incidents. The purpose of this quantitative, quasi-experimental study was to examine the extent to which IT staff use of organizational knowledge generated from data warehouse analytical measures reduces the number of IT incidents over a 30-day period, as reported by global users of IT within an international pharmaceutical company headquartered in Germany. Organizational learning theory was used to approach the theorized relationship …


Major Challenges And Solutions For Utilizing Big Data In The Maritime Industry, Sadaharu Koga Jan 2015

World Maritime University Dissertations

The dissertation is a study of big data for use in the maritime industry. Today’s society is information-intensive. The term “big data” is becoming more common. In fact, some maritime companies and institutions have already been trying to utilize big data for enhancing maritime safety and environmental protection. In order to promote this trend, the dissertation tries to identify common and important challenges for the whole maritime industry in terms of the utilization of big data and to propose corresponding solutions. First, by reviewing the definitions of big data, three major features are identified. Big data takes electronic form, is …


Networked Employment Discrimination, Tamara Kneese Oct 2014

Media Studies

Employers often struggle to assess qualified applicants, particularly in contexts where they receive hundreds of applications for job openings. In an effort to increase efficiency and improve the process, many have begun employing new tools to sift through these applications, looking for signals that a candidate is “the best fit.” Some companies use tools that offer algorithmic assessments of workforce data to identify the variables that lead to stronger employee performance, or to high employee attrition rates, while others turn to third party ranking services to identify the top applicants in a labor pool. Still others eschew automated systems, but …


Time-Series Data Mining In Transportation: A Case Study On Singapore Public Train Commuter Travel Patterns, Roy Ka Wei Lee, Tin Seong Kam Oct 2014

Research Collection School Of Computing and Information Systems

The adoption of smart card technologies and automated data collection systems (ADCS) in the transportation domain has provided public transport planners with opportunities to amass a huge and continuously increasing amount of time-series data about the behaviors and travel patterns of commuters. However, the explosive growth of temporal databases has far outpaced the transport planners’ ability to interpret these data using conventional statistical techniques, creating an urgent need for new techniques to support the analyst in transforming the data into actionable information and knowledge. This research study thus explores and discusses the potential use of time-series data mining, a relatively new …


The Use Of Business Intelligence Techniques In Supply Chain Performance, Jue Gu Jul 2014

Open Access Theses

Who likes data? Businesses are always loyal data followers. Companies analyze various forms of data to maintain businesses and identify their current performance in different areas so they can find business opportunities to improve and obtain more market share in advance (Qrunfleh & Tarafdar, 2012). When Big Data comes to businesses, companies who can take advantage of data the best tend to regularly get more business and customers (Waller & Fawcett, 2013). Collecting, analyzing, and demonstrating data could be essential to a single business, a company's supply chain performance and its sustainability. As an intelligent data processing product in terms …


Predicting Human Behavior, Tamara Kneese Mar 2014

Media Studies

Countless highly accurate predictions can be made from trace data, with varying degrees of personal or societal consequence (e.g., search engines predict hospital admission, gaming companies can predict compulsive gambling problems, government agencies predict criminal activity). Predicting human behavior can be both hugely beneficial and deeply problematic depending on the context. What kinds of predictive privacy harms are emerging? And what are the implications for systems of oversight and due process protections? For example, what are the implications for employment, health care and policing when predictive models are involved? How should varied organizations address what they can predict?