Open Access. Powered by Scholars. Published by Universities.®

Social and Behavioral Sciences Commons


Physical Sciences and Mathematics

Big data


Articles 1 - 30 of 44

Full-Text Articles in Social and Behavioral Sciences

In Pursuit Of Consumption-Based Forecasting, Charles Chase, Kenneth B. Kahn Jan 2024


Marketing Faculty Publications

[Introduction] Today's most mature, most sophisticated, best-in-class forecasting is what we call consumption-based forecasting (CBF). In contrast, the least sophisticated companies typically do not forecast at all, but rather set financial targets based on management expectations. Companies beginning to use statistical forecasting techniques usually take a supply-centric orientation, relying on time series techniques applied to shipment and/or order history. The next stage of progression is to incorporate promotions data, economic data, and market data alongside supply-centric data so that regression and other advanced analytics can be used. Companies pursuing CBF utilize even more advanced capabilities to capture, examine, and understand …


Regulating Machine Learning: The Challenge Of Heterogeneity, Cary Coglianese Feb 2023


All Faculty Scholarship

Machine learning, or artificial intelligence, refers to a vast array of different algorithms that are being put to highly varied uses, including in transportation, medicine, social media, marketing, and many other settings. Not only do machine-learning algorithms vary widely across their types and uses, but they are evolving constantly. Even the same algorithm can perform quite differently over time as it is fed new data. Due to the staggering heterogeneity of these algorithms, multiple regulatory agencies will be needed to regulate the use of machine learning, each within their own discrete area of specialization. Even these specialized expert agencies, though, …


Assessing Spurious Correlations In Big Search Data, Jesse T. Richman, Ryan J. Roberts Jan 2023


Political Science & Geography Faculty Publications

Big search data offers the opportunity to identify new and potentially real-time measures and predictors of important political, geographic, social, cultural, economic, and epidemiological phenomena, measures that might serve an important role as leading indicators in forecasts and nowcasts. However, it also presents vast new risks that scientists or the public will identify meaningless and totally spurious ‘relationships’ between variables. This study is the first to quantify that risk in the context of search data. We find that spurious correlations arise at exceptionally high frequencies among probability distributions examined for random variables based upon gamma (1, 1) and Gaussian random …


Strategic Perspective Of Leveraging New Generation Information Technology To Enable Modernization Of Emergency Management, Haibo Zhang, Xinyu Dai, Depei Qian, Jian Lyu Dec 2022


Bulletin of Chinese Academy of Sciences (Chinese Version)

The application and development of new-generation information technology is a vital support for modernizing emergency management. At present, new-generation information technologies such as big data and artificial intelligence have been widely used in natural disasters, safe production, and other fields. They have improved the monitoring and early warning, regulation and law enforcement, command and decision support, rescue, and social mobilization capabilities of governments, promoted the level of intrinsic safety of enterprises, provided important support for the precise prevention and control of COVID-19, and increased the efficiency of China’s emergency management and sense …


Creating Data From Unstructured Text With Context Rule Assisted Machine Learning (CRAML), Stephen Meisenbacher, Peter Norlander Dec 2022


School of Business: Faculty Publications and Other Works

Popular approaches to building data from unstructured text come with limitations, such as scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets which are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, …


Theory Entity Extraction For Social And Behavioral Sciences Papers Using Distant Supervision, Xin Wei, Lamia Salsabil, Jian Wu Jan 2022


Computer Science Faculty Publications

Theories and models, which are common in scientific papers in almost all domains, usually provide the foundations of theoretical analysis and experiments. Understanding the use of theories and models can shed light on the credibility and reproducibility of research works. Compared with metadata, such as title, author, keywords, etc., theory extraction in scientific literature is rarely explored, especially for social and behavioral science (SBS) domains. One challenge of applying supervised learning methods is the lack of a large number of labeled samples for training. In this paper, we propose an automated framework based on distant supervision that leverages entity mentions …


Data-Driven Operational And Safety Analysis Of Emerging Shared Electric Scooter Systems, Qingyu Ma Dec 2021


Computational Modeling & Simulation Engineering Theses & Dissertations

The rapid rise of shared electric scooter (E-Scooter) systems offers many urban areas a new micro-mobility solution. The portable and flexible characteristics have made E-Scooters a competitive mode for short-distance trips. Compared to other modes such as bikes, E-Scooters allow riders to freely ride on different facilities such as streets, sidewalks, and bike lanes. However, sharing lanes with vehicles and other users tends to cause safety issues for riding E-Scooters. Conventional methods are often not applicable for analyzing such safety issues because well-archived historical crash records are not commonly available for emerging E-Scooters.

Perceiving the growth of such a micro-mobility …


Big Issues For Big Data: Challenges For Critical Spatial Data Analytics, Chris Brunsdon, Alexis Comber Jul 2021


Journal of Spatial Information Science

In this paper we consider some of the issues of working with big data and big spatial data and highlight the need for an open and critical framework. We focus on a set of challenges underlying the collection and analysis of big data. In particular, we consider 1) inference when working with usually biased big data, challenging the assumed inferential superiority of data with observations, n, approaching N, the population (n → N). We also emphasise 2) the need for analyses that answer questions of practical significance or with greater emphasis on the size of the effect, rather than the …


GeoAI: Where Machine Learning And Big Data Converge In GIScience, Wenwen Li Jul 2021


Journal of Spatial Information Science

In this paper GeoAI is introduced as an emergent spatial analytical framework for data-intensive GIScience. As the new fuel of geospatial research, GeoAI leverages recent breakthroughs in machine learning and advanced computing to achieve scalable processing and intelligent analysis of geospatial big data. The three-pillar view of GeoAI, its two methodological threads (data-driven and knowledge-driven), as well as their geospatial applications are highlighted. The paper concludes with discussion of remaining challenges and future research directions of GeoAI.


Spatio-Temporal Visual Analytics: A Vision For 2020s, Natalia Andrienko, Gennady Andrienko Jul 2021


Journal of Spatial Information Science

Visual analytics is a research discipline that is based on acknowledging the power and the necessity of the human vision, understanding, and reasoning in data analysis and problem solving. Visual analytics develops methods, analytical workflows, and software tools for analysing data of various types, particularly, spatio-temporal data, which can describe the processes going on in the environment, society, and economy. We briefly overview the achievements of the visual analytics research concerning spatio-temporal data analysis and discuss the major open problems.


On The Semantics Of Big Earth Observation Data For Land Classification, Gilberto Camara Jul 2021


Journal of Spatial Information Science

This paper discusses the challenges of using big Earth observation data for land classification. The approach taken is to consider pure data-driven methods to be insufficient to represent continuous change. I argue for sound theories when working with big data. After revising existing classification schemes such as FAO's Land Cover Classification System (LCCS), I conclude that LCCS and similar proposals cannot capture the complexity of landscape dynamics. I then investigate concepts that are being used for analyzing satellite image time series; I show these concepts to be instances of events. Therefore, for continuous monitoring of land change, event recognition needs …


Big Data: Ethics, Resources, And Potential Collaboration, Matthew Zook Feb 2021


Geography Presentations

This presentation goes over 10 simple rules for responsible big data research.


The State Of #Digitalentrepreneurship: A Big Data Leximancer Analysis Of Social Media Activity, Violetta Wilk, Helen Cripps, Alexandru Capatina, Adrian Micu, Angela-Eliza Micu Jan 2021


Research outputs 2014 to 2021

This paper examined online sentiment, key themes and patterns evident in social media activity about digital entrepreneurship. It provides a snapshot-in-time, visual-first perspective on social media user-generated-content (UGC) to better understand the topic of digital entrepreneurship. Global data consisting of 31,017 publicly available UGC which used the #digitalentrepreneurship (hashtag) and the keywords ‘digital entrepreneurship’ were collected. A computer assisted qualitative data analysis software (CAQDAS), Leximancer, was used for an automated text-mining analysis. There is positive online sentiment surrounding digital entrepreneurship technology, ecosystem and industry, and one which promotes women transformation of digital entrepreneurship globally. Negative sentiment pointed out that future …


Administrative Law In The Automated State, Cary Coglianese Jan 2021


All Faculty Scholarship

In the future, administrative agencies will rely increasingly on digital automation powered by machine learning algorithms. Can U.S. administrative law accommodate such a future? Not only might a highly automated state readily meet longstanding administrative law principles, but the responsible use of machine learning algorithms might perform even better than the status quo in terms of fulfilling administrative law’s core values of expert decision-making and democratic accountability. Algorithmic governance clearly promises more accurate, data-driven decisions. Moreover, due to their mathematical properties, algorithms might well prove to be more faithful agents of democratic institutions. Yet even if an automated state were …


Human Trafficking In Nepal: Can Big Data Help?, Shushant Khanal Aug 2020


Undergraduate Research Journal

This paper provides an overview of human trafficking in Nepal, identifies strategies implemented by the country's government to handle the problem, and explores the possibilities of using big data as a solution to the problem of human trafficking in Nepal. Big data may be defined as the collection of a large volume of data from the past that is processed using machine learning and artificial intelligence to find a common pattern. The use of big data in tackling the problem of human trafficking is not new in developed countries like the United States but it is still a foreign idea …


Big Data, Spatial Optimization, And Planning, Kai Cao, Wenwen Li, Richard Church Jul 2020


Research Collection School Of Computing and Information Systems

Spatial optimization represents a set of powerful spatial analysis techniques that can be used to identify optimal solution(s) and even generate a large number of competitive alternatives. The formulation of such problems involves maximizing or minimizing one or more objectives while satisfying a number of constraints. Solution techniques range from exact models, solved with such approaches as linear programming and integer programming, to heuristic algorithms, e.g., Tabu Search, Simulated Annealing, and Genetic Algorithms. Spatial optimization techniques have been utilized in numerous planning applications, such as location-allocation modeling/site selection, land use planning, school districting, regionalization, routing, and urban design. These methods …


Prospects And Challenges Of Population Health With Online And Other Big Data In Africa; Understanding The Link To Improving Healthcare Service Delivery, Rowland Edet, Bolarinwa Afolabi Jan 2020


Department of Sociology: Faculty Publications

Big data analytics holds promise for addressing many health care service challenges and can provide answers to many population health issues. Big data is having a positive impact in almost every sphere of life in the more advanced world, while developing countries are striving to catch up. Even though healthcare systems in the developed world are recording some breakthroughs due to the application of big data, it is important to research the impact of big data in developing regions of the world, such as Africa, and to identify its peculiar needs. The purpose of this review was to summarize the challenges faced by …


Disaster Damage Categorization Applying Satellite Images And Machine Learning Algorithm, Farinaz Sabz Ali Pour, Adrian Gheorghe Jan 2020


Engineering Management & Systems Engineering Faculty Publications

Spatial information has a significant role in disaster management. Land cover mapping can detect short- and long-term changes and monitor vulnerable habitats. It is an effective evaluation to include in the disaster management system to protect conservation areas. The critical visual and statistical information presented to decision-makers can help in mitigation or adaptation before crossing a threshold. This paper aims to contribute to both academia and practice by offering a potential solution to enhance the effectiveness of disaster data sources. The key research question that the authors try to answer in this paper is how …


“Where’s The I-O?” Artificial Intelligence And Machine Learning In Talent Management Systems, Manuel F. Gonzalez, John F. Capman, Frederick L. Oswald, Evan R. Theys, David L. Tomczak Nov 2019


Personnel Assessment and Decisions

Artificial intelligence (AI) and machine learning (ML) have seen widespread adoption by organizations seeking to identify and hire high-quality job applicants. Yet the volume, variety, and velocity of professional involvement among I-O psychologists remains relatively limited when it comes to developing and evaluating AI/ML applications for talent assessment and selection. Furthermore, there is a paucity of empirical research that investigates the reliability, validity, and fairness of AI/ML tools in organizational contexts. To stimulate future involvement and research, we share our review and perspective on the current state of AI/ML in talent assessment as well as its benefits and potential pitfalls; …


CS + Sociology: Using Big Data To Identify And Understand Educational Inequality In America (1), Joseph Cleary, Elin Waring Jun 2019


Open Educational Resources

This is the first of two lessons/labs for teaching and learning computer science and sociology. Either can be used on its own, or they can be used in sequence, in which case this one should be used first.

Students will develop CS skills and behaviors including but not limited to: learning what an API is, learning how to access and utilize data on an API, and developing their R coding skills and knowledge. Students will also learn basic, but important, sociological principles such as how poverty is related to educational opportunities in America. Although prior knowledge of CS and sociology …


What To Do When Privacy Is Gone, James Brusseau May 2019


Computer Ethics - Philosophical Enquiry (CEPE) Proceedings

Today’s ethics of privacy is largely dedicated to defending personal information from big data technologies. This essay goes in the other direction. It considers the struggle to be lost, and explores two strategies for living after privacy is gone. First, total exposure embraces privacy’s decline, and then contributes to the process with transparency. All personal information is shared without reservation. The resulting ethics is explored through a big data version of Robert Nozick’s Experience Machine thought experiment. Second, transient existence responds to privacy’s loss by ceaselessly generating new personal identities, which translates into constantly producing temporarily unviolated private information. The …


Google Trends Data As A Proxy For Interest In Leadership, Finley W. Walker Apr 2019


Doctor of Education (Ed.D)

The purpose of this quantitative study was to investigate the observable patterns of online search behavior in the topic of leadership using Google Trends data. Institutions have had a historically difficult time predicting good leadership candidates. Better predictions can be made by using the big data offered by groups such as Google to learn who, where, and when people are interested in leadership. The study utilized descriptive, comparative, and correlative methodologies to study Google users’ interest in leadership from 2004 to 2017. Society has placed great value into leadership throughout history, and though overall interest remains strong, it appears that …


The Paradox Of Big Data, Gary N. Smith Jan 2019


Pomona Economics

Data mining is often used to discover patterns in Big Data. It is tempting to believe that because an unearthed pattern is unusual it must be meaningful, but patterns are inevitable in Big Data and usually meaningless. The paradox of Big Data is that data mining is most seductive when there are a large number of variables, but a large number of variables exacerbates the perils of data mining.
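
The abstract's central claim can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes independent standard-normal variables with 30 observations each, and the helper names `pearson` and `max_abs_correlation` are hypothetical. It reports the strongest pairwise correlation found among purely random variables as the number of variables grows.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def max_abs_correlation(n_vars, n_obs, seed=0):
    """Largest |r| over all pairs of independent random variables.

    Assumption for illustration: every variable is i.i.d. Gaussian noise,
    so any correlation that shows up is spurious by construction.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_vars)]
    return max(abs(pearson(a, b)) for a, b in combinations(data, 2))


if __name__ == "__main__":
    # More variables means more pairs to search, hence a larger "best" pattern.
    for n_vars in (5, 20, 80):
        print(n_vars, round(max_abs_correlation(n_vars, n_obs=30), 3))
```

Because the seed is fixed and variables are generated one after another, each larger run contains the smaller run's variables, so the reported maximum can only stay the same or grow as variables are added.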


Big Data Investment And Knowledge Integration In Academic Libraries, Saher Manaseer, Afnan R. Alawneh, Dua Asoudi Jan 2019


Copyright, Fair Use, Scholarly Communication, etc.

Recently, big data investment has become important for organizations, especially with the fast growth of data following the huge expansion in the usage of social media applications and websites. Many organizations depend on extracting and reaching the needed reports and statistics. As investment in big data and its storage has become a major challenge for organizations, many technologies and methods have been developed to tackle it.

One such technology is Hadoop, a framework that is used to divide big data into packages and distribute those packages through nodes to be processed, at lower cost than the traditional storage …


Transparency And Algorithmic Governance, Cary Coglianese, David Lehr Jan 2019


All Faculty Scholarship

Machine-learning algorithms are improving and automating important functions in medicine, transportation, and business. Government officials have also started to take notice of the accuracy and speed that such algorithms provide, increasingly relying on them to aid with consequential public-sector functions, including tax administration, regulatory oversight, and benefits administration. Despite machine-learning algorithms’ superior predictive power over conventional analytic tools, algorithmic forecasts are difficult to understand and explain. Machine learning’s “black-box” nature has thus raised concern: Can algorithmic governance be squared with legal principles of governmental transparency? We analyze this question and conclude that machine-learning algorithms’ relative inscrutability does not pose a …


Machine Learning For Ecosystem Services, Simon Willcock, Javier Martínez-López, Danny A.P. Hooftman, Kenneth J. Bagstad, Stefano Balbi, Alessia Marzo, Carlo Prato, Saverio Sciandrello, Giovanni Signorello Oct 2018


Rubenstein School of Environment and Natural Resources Faculty Publications

Recent developments in machine learning have expanded data-driven modelling (DDM) capabilities, allowing artificial intelligence to infer the behaviour of a system by computing and exploiting correlations between observed variables within it. Machine learning algorithms may enable the use of increasingly available ‘big data’ and assist applying ecosystem service models across scales, analysing and predicting the flows of these services to disaggregated beneficiaries. We use the Weka and ARIES software to produce two examples of DDM: firewood use in South Africa and biodiversity value in Sicily, respectively. Our South African example demonstrates that DDM (64–91% accuracy) can identify the areas where …


Big Data For Climate Change Actions And The Paradox Of Citizen Informedness, Kustini Lim-Wavde, Robert J. Kauffman May 2018


Research Collection School Of Computing and Information Systems

Advanced sensor technology, social media, and other information technologies have provided us with “big data” on climate change. Due to the World Meteorological Organization’s Global Climate Observing System, climate observations and records, as well as discussions on climate-related concerns such as measurement of air temperature, are widely available now. The United Nations’ Global Pulse visualises public engagement on climate change globally, with data such as the volume of climate-related tweets. Big data, data analytics, and the sharing of scientific results in the popular press have created, as a result, an unprecedented level of citizen informedness—the degree to which citizens have …


Analytic Extensions To The Data Model For Management Analytics And Decision Support In The Big Data Environment, Nsikak Etim Akpakpan Jan 2018


Walden Dissertations and Doctoral Studies

From 2006 to 2016, an estimated average of 50% of big data analytics and decision support projects failed to deliver acceptable and actionable outputs to business users. The resulting management inefficiency came with high cost, and wasted investments estimated at $2.7 trillion in 2016 for companies in the United States. The purpose of this quantitative descriptive study was to examine the data model of a typical data analytics project in a big data environment for opportunities to improve the information created for management problem-solving. The research questions focused on finding artifacts within enterprise data to model key business scenarios for …


Feature Space Augmentation: Improving Prediction Accuracy Of Classical Problems In Cognitive Science And Computer Vison, Piyush Saxena Oct 2017


Dissertations (1934 -)

The prediction accuracy in many classical problems across multiple domains has seen a rise since computational tools such as multi-layer neural nets and complex machine learning algorithms have become widely accessible to the research community. In this research, we take a step back and examine the feature space in two problems from very different domains. We show that novel augmentation to the feature space yields higher performance. Emotion Recognition in Adults from a Control Group: The objective is to quantify the emotional state of an individual at any time using data collected by wearable sensors. We define emotional state as …


The Billion Object Platform (BOP): A System To Lower Barriers To Support Big, Streaming, Spatio-Temporal Data Sources, Devika Kakkar, Ben Lewis, David Smiley, Ariel Nunez Sep 2017


Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

With funding from the Sloan Foundation and Harvard Dataverse, the Harvard Center for Geographic Analysis (CGA) has developed a big spatio-temporal data visualization platform called the Billion Object Platform or "BOP". The goal of the project is to lower barriers for scholars who wish to access large, streaming, spatio-temporal datasets. Since once archived, streaming data gets big fast, and since most GIS systems don't support interactive visualization of millions of objects, a new platform was needed. The BOP is loaded with the latest billion geo-tweets and is fed a real-time stream of about 1 million tweets per day. The CGA …