Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Databases and Information Systems

Big data

Articles 1 - 30 of 46

Full-Text Articles in Physical Sciences and Mathematics

Interposition Based Container Optimization For Data Intensive Applications, Rohan Tikmany Jul 2023

College of Computing and Digital Media Dissertations

Reproducibility of applications is paramount in several scenarios such as collaborative work and software testing. Containers provide an easy way of addressing reproducibility by packaging the application's software and data dependencies into one executable unit, which can be executed multiple times in different environments. With the increased use of containers in industry as well as academia, current research has examined the provisioning and storage cost of containers and has shown that container deployments often include unnecessary software packages. Current methods to optimize the container size prune unnecessary data at the granularity of files and thus make binary decisions. We show …


Digital DNA: The Ethical Implications Of Big Data As The World’s New-Age Commodity, Clark H. Dotson May 2023

Honors Theses

In the emerging digital world that we find ourselves in, it becomes apparent that data collection has become a staple of daily life, whether we like it or not. This research discussion aims to bring light to just how much one’s own digital identity is valued in the technologically-infused world of today, with distinct research and local examples to bring awareness to the ethical implications of your online presence. The paper in question examines anecdotal and research evidence of the collection of data, both through true and unjust means, as well as ethical implications of what this information truly represents. …


A Study Of The Impact Of Data Intelligence On Software Delivery Performance, Yongdong Dong Mar 2023

Dissertations and Theses Collection (Open Access)

With the rise of big data and artificial intelligence, data intelligence has gradually become the focus of academia and industry. Data intelligence has two obvious characteristics: big data drive and application scene drive. More and more enterprises extract valuable patterns contained in data with prediction and decision analysis methods and technologies such as large-scale data mining, machine learning and deep learning and use them to improve the management and decision in complex practice, so as to promote changes of new business modes, organizational structures and even business strategies, and improve the operational efficiency of organizations. However, there are few studies …


Assessing Spurious Correlations In Big Search Data, Jesse T. Richman, Ryan J. Roberts Jan 2023

Political Science & Geography Faculty Publications

Big search data offers the opportunity to identify new and potentially real-time measures and predictors of important political, geographic, social, cultural, economic, and epidemiological phenomena, measures that might serve an important role as leading indicators in forecasts and nowcasts. However, it also presents vast new risks that scientists or the public will identify meaningless and totally spurious ‘relationships’ between variables. This study is the first to quantify that risk in the context of search data. We find that spurious correlations arise at exceptionally high frequencies among probability distributions examined for random variables based upon gamma (1, 1) and Gaussian random …


Creating Data From Unstructured Text With Context Rule Assisted Machine Learning (CRAML), Stephen Meisenbacher, Peter Norlander Dec 2022

School of Business: Faculty Publications and Other Works

Popular approaches to building data from unstructured text come with limitations, such as scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets which are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, …


Visual Descriptor Extraction From Patent Figure Captions: A Case Study Of Data Efficiency Between BiLSTM And Transformer, Xin Wei, Jian Wu, Kehinde Ajayi, Diane Oyen Jan 2022

Computer Science Faculty Publications

Technical drawings used for illustrating designs are ubiquitous in patent documents, especially design patents. Different from natural images, these drawings are usually made using black strokes with little color information, making it challenging for models trained on natural images to recognize objects. To facilitate indexing and searching, we propose an effective and efficient visual descriptor model that extracts object names and aspects from patent captions to annotate benchmark patent figure datasets. We compared two state-of-the-art named entity recognition (NER) models and found that with a limited number of annotated samples, the BiLSTM-CRF model outperforms the Transformer model by a significant …


On Performance Optimization And Prediction Of Parallel Computing Frameworks In Big Data Systems, Haifa Alquwaiee Dec 2021

Dissertations

A wide spectrum of big data applications in science, engineering, and industry generate large datasets, which must be managed and processed in a timely and reliable manner for knowledge discovery. These tasks are now commonly executed in big data computing systems exemplified by Hadoop based on parallel processing and distributed storage and management. For example, many companies and research institutions have developed and deployed big data systems on top of NoSQL databases such as HBase and MongoDB, and parallel computing frameworks such as MapReduce and Spark, to ensure timely data analyses and efficient result delivery for decision making and business …


Big Issues For Big Data: Challenges For Critical Spatial Data Analytics, Chris Brunsdon, Alexis Comber Jul 2021

Journal of Spatial Information Science

In this paper we consider some of the issues of working with big data and big spatial data and highlight the need for an open and critical framework. We focus on a set of challenges underlying the collection and analysis of big data. In particular, we consider 1) inference when working with usually biased big data, challenging the assumed inferential superiority of data with observations, n, approaching N, the population n -> N. We also emphasise 2) the need for analyses that answer questions of practical significance or with greater emphasis on the size of the effect, rather than the …


An Introduction To Seshat: Global History Databank, Peter Turchin, Harvey Whitehouse, Pieter François, Daniel Hoyer, Abel Alves, John Baines, David Baker, Marta Bartkowiak, Jennifer Bates, James Bennett, Julye Bidmead, Peter Bol, Alessandro Ceccarelli, Kostis Christakis, David Christian, Alan Covey, Franco De Angelis, Timothy K. Earle, Neil R. Edwards, Gary Feinman, Stephanie Grohmann, Philip B. Holden, Árni Júlíusson, Andrey Korotayev, Axel Kristinsson, Jennifer Larson, Oren Litwin, Victor Mair, Joseph G. Manning, Patrick Manning, Arkadiusz Marciniak, Gregory Mcmahon, John Miksic, Juan Carlos Moreno Garcia, Ian Morris, Ruth Mostern, Daniel Mullins, Oluwole Oyebamiji, Peter Peregrine, Cameron Petrie, Johannes Preiser-Kapeller, Peter Rudiak-Gould, Paula Sabloff, Patrick Savage, Charles Spencer, Miriam Stark, Barend Ter Haar, Stefan Thurner, Vesna Wallace, Nina Witoszek, Liye Xie Nov 2020

Religious Studies Faculty Articles and Research

This article introduces the Seshat: Global History Databank, its potential, and its methodology. Seshat is a databank containing vast amounts of quantitative data buttressed by qualitative nuance for a large sample of historical and archaeological polities. The sample is global in scope and covers the period from the Neolithic Revolution to the Industrial Revolution. Seshat allows scholars to capture dynamic processes and to test theories about the co-evolution (or not) of social scale and complexity, agriculture, warfare, religion, and any number of such Big Questions. Seshat is rapidly becoming a massive resource for innovative cross-cultural and cross-disciplinary research. Seshat is …


Big Data, Spatial Optimization, And Planning, Kai Cao, Wenwen Li, Richard Church Jul 2020

Research Collection School Of Computing and Information Systems

Spatial optimization represents a set of powerful spatial analysis techniques that can be used to identify optimal solution(s) and even generate a large number of competitive alternatives. The formulation of such problems involves maximizing or minimizing one or more objectives while satisfying a number of constraints. Solution techniques range from exact models solved with such approaches as linear programming and integer programming to heuristic algorithms, e.g., Tabu Search, Simulated Annealing, and Genetic Algorithms. Spatial optimization techniques have been utilized in numerous planning applications, such as location-allocation modeling/site selection, land use planning, school districting, regionalization, routing, and urban design. These methods …


Exploring Strategies To Transition To Big Data Technologies From DW Technologies, Mbah Johnas Fortem Jan 2020

Walden Dissertations and Doctoral Studies

As a result of innovation and technological improvements, organizations are now capable of capturing and storing massive amounts of data from various sources and domains. This increase in the volume of data resulted in traditional tools used for processing, storing, and analyzing large amounts of data becoming increasingly inefficient. Grounded in the extended technology acceptance model, the purpose of this qualitative multiple case study was to explore the strategies data managers use to transition from traditional data warehousing technologies to big data technologies. The participants included data managers from 6 organizations (medium and large size) based in Munich, Germany, who …


Repositories For Taxonomic Data: Where We Are And What Is Missing, Aurélian Miralles, Teddy Bruy, Katherine Wolcott, Mark D. Scherz, Dominik Begerow, Bank Beszteri, Michael Bonkowski, Janine Felden, Birgit Gemeinholzer, Frank Glaw, Frank Oliver Glöckner, Oliver Hawlitschek, Ivaylo Kostadinov, Tim W. Nattkemper, Christian Printzen, Jasmin Renz, Nataliya Rybalka, Marc Stadler, Tanja Weibulat, Thomas Wilke, Susanne S. Renner, Miguel Vences Jan 2020

Harold W. Manter Laboratory: Library Materials

Natural history collections are leading successful large-scale projects of specimen digitization (images, metadata, DNA barcodes), thereby transforming taxonomy into a big data science. Yet, little effort has been directed towards safeguarding and subsequently mobilizing the considerable amount of original data generated during the process of naming 15,000–20,000 species every year. From the perspective of alpha-taxonomists, we provide a review of the properties and diversity of taxonomic data, assess their volume and use, and establish criteria for optimizing data repositories. We surveyed 4,113 alpha-taxonomic studies in representative journals for 2002, 2010, and 2018, and found an increasing yet comparatively limited use …


ML4IoT: A Framework To Orchestrate Machine Learning Workflows On Internet Of Things Data, Jose Miguel Alves, Leonardo Honorio, Miriam A M Capretz Oct 2019

Electrical and Computer Engineering Publications

Internet of Things (IoT) applications generate vast amounts of real-time data. Temporal analysis of these data series to discover behavioural patterns may lead to qualified knowledge affecting a broad range of industries. Hence, the use of machine learning (ML) algorithms over IoT data has the potential to improve safety, economy, and performance in critical processes. However, creating ML workflows at scale is a challenging task that depends upon both production and specialized skills. Such tasks require investigation, understanding, selection, and implementation of specific ML workflows, which often lead to bottlenecks, production issues, and code management complexity and even then may …


Data Mining And Machine Learning To Improve Northern Florida’s Foster Care System, Daniel Oldham, Nathan Foster, Mihhail Berezovski Jun 2019

Beyond: Undergraduate Research Journal

The purpose of this research project is to use statistical analysis, data mining, and machine learning techniques to determine identifiable factors in child welfare service records that could lead to a child entering the foster care system multiple times. This would allow us to accurately predict a case’s outcome based on these factors. We were provided with eight years of data in the form of multiple spreadsheets from Partnership for Strong Families (PSF), a child welfare services organization based in Gainesville, Florida, which is contracted by the Florida Department for Children and Families (DCF). This data contained a …


Spatio-Temporal Multimedia Big Data Analytics Using Deep Neural Networks, Samira Pouyanfar Jun 2019

FIU Electronic Theses and Dissertations

With the proliferation of online services and mobile technologies, the world has stepped into a multimedia big data era, where new opportunities and challenges appear with the high diversity multimedia data together with the huge amount of social data. Nowadays, multimedia data consisting of audio, text, image, and video has grown tremendously. With such an increase in the amount of multimedia data, the main question raised is how one can analyze this high volume and variety of data in an efficient and effective way. A vast amount of research work has been done in the multimedia area, targeting different aspects …


How To Derive Causal Insights For Digital Commerce In China? A Research Commentary On Computational Social Science Methods, David C.W. Phang, Kanliang Wang, Qiu-Hong Wang, Robert John Kauffman, Maurizio Naldi May 2019

Research Collection School Of Computing and Information Systems

The transformation of empirical research due to the arrival of big data analytics and data science, as well as the new availability of methods that emphasize causal inference, are moving forward at full speed. In this Research Commentary, we examine the extent to which this has the potential to influence how e-commerce research is conducted. China offers the ultimate in data-at-scale settings, and the construction of real-world natural experiments. Chinese e-commerce includes some of the largest firms involved in e-commerce, mobile commerce, social media and social networks. This article was written to encourage young faculty and doctoral students to engage …


Google Trends Data As A Proxy For Interest In Leadership, Finley W. Walker Apr 2019

Doctor of Education (Ed.D)

The purpose of this quantitative study was to investigate the observable patterns of online search behavior in the topic of leadership using Google Trends data. Institutions have had a historically difficult time predicting good leadership candidates. Better predictions can be made by using the big data offered by groups such as Google to learn who, where, and when people are interested in leadership. The study utilized descriptive, comparative, and correlative methodologies to study Google users’ interest in leadership from 2004 to 2017. Society has placed great value into leadership throughout history, and though overall interest remains strong, it appears that …


Big Data Investment And Knowledge Integration In Academic Libraries, Saher Manaseer, Afnan R. Alawneh, Dua Asoudi Jan 2019

Copyright, Fair Use, Scholarly Communication, etc.

Recently, big data investment has become important for organizations, especially with the fast growth of data following the huge expansion in the usage of social media applications and websites. Many organizations depend on extracting and reaching the needed reports and statistics. As investments in big data and its storage have become major challenges for organizations, many technologies and methods have been developed to tackle those challenges.

One such technology is Hadoop, a framework used to divide big data into packages and distribute those packages across nodes for processing, at a lower cost than the traditional storage …


Big Data For Climate Change Actions And The Paradox Of Citizen Informedness, Kustini Lim-Wavde, Robert J. Kauffman May 2018

Research Collection School Of Computing and Information Systems

Advanced sensor technology, social media, and other information technologies have provided us with “big data” on climate change. Due to the World Meteorological Organization’s Global Climate Observing System, climate observations and records, as well as discussions on climate-related concerns such as measurement of air temperature, are widely available now. The United Nations’ Global Pulse visualises public engagement on climate change globally, with data such as the volume of climate-related tweets. Big data, data analytics, and the sharing of scientific results in the popular press have created, as a result, an unprecedented level of citizen informedness—the degree to which citizens have …


Efficient Reduced Bias Genetic Algorithm For Generic Community Detection Objectives, Aditya Karnam Gururaj Rao Apr 2018

Theses

The problem of community structure identification has been an extensively investigated area for biology, physics, social sciences, and computer science in recent years for studying the properties of networks representing complex relationships. Most traditional methods, such as K-means and hierarchical clustering, are based on the assumption that communities have spherical configurations. Lately, Genetic Algorithms (GA) are being utilized for efficient community detection without imposing sphericity. GAs are machine learning methods which mimic natural selection and scale with the complexity of the network. However, traditional GA approaches employ a representation method that dramatically increases the solution space to be searched by …


The Graph Database: Jack Of All Trades Or Just Not SQL?, George F. Hurlburt, Maria R. Lee, George K. Thiruvathukal Jan 2018

George K. Thiruvathukal

This special issue of IT Professional focuses on the graph database. The graph database, a relatively new phenomenon, is well suited to the burgeoning information era in which we are increasingly becoming immersed. Here, the guest editors briefly explain how a graph database works, its relation to the relational database management system (RDBMS), and its quantitative and qualitative pros and cons, including how graph databases can be harnessed in a hybrid environment. They also survey the excellent articles submitted for this special issue.


Recommender Systems For Large-Scale Social Networks: A Review Of Challenges And Solutions, Magdalini Eirinaki, Jerry Gao, Iraklis Varlamis, Konstantinos Tserpes Jan 2018

Faculty Publications

Social networks have become very important for networking, communications, and content sharing. Social networking applications generate a huge amount of data on a daily basis and social networks constitute a growing field of research, because of the heterogeneity of data and structures formed in them, and their size and dynamics. When this wealth of data is leveraged by recommender systems, the resulting coupling can help address interesting problems related to social engagement, member recruitment, and friend recommendations. In this work we review the various facets of large-scale social recommender systems, summarizing the challenges and interesting problems and discussing some of the …


Analytic Extensions To The Data Model For Management Analytics And Decision Support In The Big Data Environment, Nsikak Etim Akpakpan Jan 2018

Walden Dissertations and Doctoral Studies

From 2006 to 2016, an estimated average of 50% of big data analytics and decision support projects failed to deliver acceptable and actionable outputs to business users. The resulting management inefficiency came with high cost, and wasted investments estimated at $2.7 trillion in 2016 for companies in the United States. The purpose of this quantitative descriptive study was to examine the data model of a typical data analytics project in a big data environment for opportunities to improve the information created for management problem-solving. The research questions focused on finding artifacts within enterprise data to model key business scenarios for …


The Graph Database: Jack Of All Trades Or Just Not SQL?, George F. Hurlburt, Maria R. Lee, George K. Thiruvathukal Nov 2017

Computer Science: Faculty Publications and Other Works

This special issue of IT Professional focuses on the graph database. The graph database, a relatively new phenomenon, is well suited to the burgeoning information era in which we are increasingly becoming immersed. Here, the guest editors briefly explain how a graph database works, its relation to the relational database management system (RDBMS), and its quantitative and qualitative pros and cons, including how graph databases can be harnessed in a hybrid environment. They also survey the excellent articles submitted for this special issue.


Vungle Inc. Improves Monetization Using Big-Data Analytics, Bert De Reyck, Ioannis Fragkos, Yael Gruksha-Cockayne, Casey Lichtendahl, Hammond Guerin, Andre Kritzer Oct 2017

Research Collection Lee Kong Chian School Of Business

The advent of big data has created opportunities for firms to customize their products and services to unprecedented levels of granularity. Using big data to personalize an offering in real time, however, remains a major challenge. In the mobile advertising industry, once a customer enters the network, an ad-serving decision must be made in a matter of milliseconds. In this work, we describe the design and implementation of an ad-serving algorithm that incorporates machine-learning methods to make personalized ad-serving decisions within milliseconds. We developed this algorithm for Vungle Inc., one of the largest global mobile ad networks. Our approach also …


VetCompass Australia: A National Big Data Collection System For Veterinary Science, Paul McGreevy, Peter Thomson, Navneet K. Dhand, David Raubenheimer, Sophie Masters, Caroline S. Mansfield, Timothy Baldwin, Ricardo J. Soares Magalhaes, Jacquie Rand, Peter Hill, Anne Peaston, James Gilkerson, Martin Combs, Shane Raidal, Peter Irwin, Peter Irons, Richard Squires, David Brodbelt, Jeremy Hammond Sep 2017

Paul McGreevy, PhD

VetCompass Australia is veterinary medical records-based research coordinated with the global VetCompass endeavor to maximize its quality and effectiveness for Australian companion animals (cats, dogs, and horses). Bringing together all seven Australian veterinary schools, it is the first nationwide surveillance system collating clinical records on companion-animal diseases and treatments. VetCompass data service collects and aggregates real-time, clinical records for researchers to interrogate, delivering sustainable and cost-effective access to data from hundreds of veterinary practitioners nationwide. Analysis of these clinical records will reveal geographical and temporal trends in the prevalence of inherited and acquired diseases, identify frequently prescribed treatments, revolutionize clinical …


The Billion Object Platform (BOP): A System To Lower Barriers To Support Big, Streaming, Spatio-Temporal Data Sources, Devika Kakkar, Ben Lewis, David Smiley, Ariel Nunez Sep 2017

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

With funding from the Sloan Foundation and Harvard Dataverse, the Harvard Center for Geographic Analysis (CGA) has developed a big spatio-temporal data visualization platform called the Billion Object Platform or "BOP". The goal of the project is to lower barriers for scholars who wish to access large, streaming, spatio-temporal datasets. Since once archived, streaming data gets big fast, and since most GIS systems don't support interactive visualization of millions of objects, a new platform was needed. The BOP is loaded with the latest billion geo-tweets and is fed a real-time stream of about 1 million tweets per day. The CGA …


Optimizing Spatiotemporal Analysis Using Multidimensional Indexing With GeoWave, Richard Fecher, Michael A. Whitby Sep 2017

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

The open source software GeoWave bridges the gap between geographic information systems and distributed computing. This is done by preserving locality of multidimensional data when indexing it into a single-dimensional key-value store, using space filling curves. This means that like values in each dimension are stored physically close together in the datastore. We demonstrate the efficiencies and benefits of the GeoWave indexing algorithm to store and query billions of spatiotemporal data points. We show how this indexing strategy can be used to reduce query and processing times by multiple orders of magnitude using publicly available taxi trip data published by …
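
The locality-preserving idea in this abstract can be illustrated with a Z-order (Morton) curve, one of the simplest space-filling curves. The sketch below is an assumption-laden illustration, not GeoWave's code: the function names, the 16-bit grid resolution, and the choice of a Z-order curve are all hypothetical, and GeoWave's own curve and key layout may differ.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key.

    Bit i of x lands at position 2*i, bit i of y at position 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def quantize(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Map a value in [lo, hi] to an integer grid cell in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    clamped = min(max(value, lo), hi)  # keep out-of-range input on the grid
    return int((clamped - lo) / (hi - lo) * cells)


def z_order_key(lon: float, lat: float, bits: int = 16) -> int:
    """Hypothetical helper: turn a lon/lat pair into a single sortable key."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return interleave_bits(x, y, bits)
```

Sorting rows by such a key in a key-value store tends to place points that are close in both dimensions near one another, so a bounding-box query can be served by scanning a small number of contiguous key ranges rather than the whole table.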


Ten Simple Rules For Responsible Big Data Research, Matthew Zook, Solon Barocas, Danah Boyd, Kate Crawford, Emily Keller, Seeta Peña Gangadharan, Alyssa Goodman, Rachelle Hollander, Barbara A. Koenig, Jacob Metcalf, Arvind Narayanan, Alondra Nelson, Frank Pasquale Mar 2017

Geography Faculty Publications

No abstract provided.


Semantic Inference On Clinical Documents: Combining Machine Learning Algorithms With An Inference Engine For Effective Clinical Diagnosis And Treatment, Shuo Yang, Ran Wei, Jingzhi Guo, Lida Xu Jan 2017

Information Technology & Decision Sciences Faculty Publications

Clinical practice calls for reliable diagnosis and optimized treatment. However, human errors in health care remain a severe issue even in industrialized countries. The application of clinical decision support systems (CDSS) casts light on this problem. However, despite the great improvement in CDSS over the past several years, challenges to their wide-scale application are still present, including: 1) decision making of CDSS is complicated by the complexity of the data regarding human physiology and pathology, which could render the whole process more time-consuming by loading big data related to patients; and 2) information incompatibility among different health information systems (HIS) …