Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 11 of 11

Full-Text Articles in Physical Sciences and Mathematics

Creating Data From Unstructured Text With Context Rule Assisted Machine Learning (Craml), Stephen Meisenbacher, Peter Norlander Dec 2022

School of Business: Faculty Publications and Other Works

Popular approaches to building data from unstructured text come with limitations, such as scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets which are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, …


Investigating Toxicity Changes Of Cross-Community Redditors From 2 Billion Posts And Comments, Hind Almerekhi, Haewoon Kwak, Bernard J. Jansen Aug 2022

Research Collection School Of Computing and Information Systems

This research investigates changes in online behavior of users who publish in multiple communities on Reddit by measuring their toxicity at two levels. With the aid of crowdsourcing, we built a labeled dataset of 10,083 Reddit comments, then used the dataset to train and fine-tune a Bidirectional Encoder Representations from Transformers (BERT) neural network model. The model predicted the toxicity levels of 87,376,912 posts from 577,835 users and 2,205,581,786 comments from 890,913 users on Reddit over 16 years, from 2005 to 2020. This study utilized the toxicity levels of user content to identify toxicity changes by the user within the …


Imagining New Futures Beyond Predictive Systems In Child Welfare: A Qualitative Study With Impacted Stakeholders, Logan Stapleton, Min Hun Lee, Diana Qing, Marya Wright, Alexandra Chouldechova, Ken Holstein, Zhiwei Steven Wu, Haiyi Zhu Jun 2022

Research Collection School Of Computing and Information Systems

Child welfare agencies across the United States are turning to data-driven predictive technologies (commonly called predictive analytics) which use government administrative data to assist workers’ decision-making. While some prior work has explored impacted stakeholders’ concerns with current uses of data-driven predictive risk models (PRMs), less work has asked stakeholders whether such tools ought to be used in the first place. In this work, we conducted a set of seven design workshops with 35 stakeholders who have been impacted by the child welfare system or who work in it to understand their beliefs and concerns around PRMs, and to engage them …


Moving Toward Personalized Law, Cary Coglianese Mar 2022

All Faculty Scholarship

Rules operate as a tool of governance by making generalizations, thereby cutting down on government officials’ need to make individual determinations. But because they are generalizations, rules can result in inefficient or perverse outcomes due to their over- and under-inclusiveness. With the aid of advances in machine-learning algorithms, however, it is becoming increasingly possible to imagine governments shifting away from a predominant reliance on general rules and instead moving toward increased reliance on precise individual determinations—or on “personalized law,” to use the term Omri Ben-Shahar and Ariel Porat use in the title of their 2021 book. Among the various technological, …


Landslide Detection In The Himalayas Using Machine Learning Algorithms And U-Net, Sansar Raj Meena, Lucas Pedrosa Soares, Carlos H. Grohmann, Cees Van Westen, Kushanav Bhuyan, Ramesh P. Singh, Mario Floris, Filippo Catani Feb 2022

Biology, Chemistry, and Environmental Sciences Faculty Articles and Research

Event-based landslide inventories are essential sources to broaden our understanding of the causal relationship between triggering events and the occurring landslides. Moreover, detailed inventories are crucial for the succeeding phases of landslide risk studies like susceptibility and hazard assessment. The openly available inventories differ in the quality and completeness levels. Event-based landslide inventories are created based on manual interpretation, and there can be significant differences in the mapping preferences among interpreters. To address this issue, we used two different datasets to analyze the potential of U-Net and machine learning approaches for automated landslide detection in the Himalayas. Dataset-1 is composed …


Algorithm Vs. Algorithm, Cary Coglianese, Alicia Lai Jan 2022

All Faculty Scholarship

Critics raise alarm bells about governmental use of digital algorithms, charging that they are too complex, inscrutable, and prone to bias. A realistic assessment of digital algorithms, though, must acknowledge that government is already driven by algorithms of arguably greater complexity and potential for abuse: the algorithms implicit in human decision-making. The human brain operates algorithmically through complex neural networks. And when humans make collective decisions, they operate via algorithms too—those reflected in legislative, judicial, and administrative processes. Yet these human algorithms undeniably fail and are far from transparent. On an individual level, human decision-making suffers from memory limitations, fatigue, …


A Synthetic Prediction Market For Estimating Confidence In Published Work, Sarah Rajtmajer, Christopher Griffin, Jian Wu, Robert Fraleigh, Laxmann Balaji, Anna Squicciarini, Anthony Kwasnica, David Pennock, Michael Mclaughlin, Timothy Fritton, Nishanth Nakshatri, Arjun Menon, Sai Ajay Modukuri, Rajal Nivargi, Xin Wei, Lee Giles Jan 2022

Computer Science Faculty Publications

[First paragraph] Concerns about the replicability, robustness and reproducibility of findings in scientific literature have gained widespread attention over the last decade in the social sciences and beyond. This attention has been catalyzed by and has likewise motivated a number of large-scale replication projects which have reported successful replication rates between 36% and 78%. Given the challenges and resources required to run high-powered replication studies, researchers have sought other approaches to assess confidence in published claims. Initial evidence has supported the promise of prediction markets in this context. However, they require the coordinated, sustained effort of collections of human experts …


From Negative To Positive Algorithm Rights, Cary Coglianese, Kat Hefter Jan 2022

All Faculty Scholarship

Artificial intelligence, or “AI,” is raising alarm bells. Advocates and scholars propose policies to constrain or even prohibit certain AI uses by governmental entities. These efforts to establish a negative right to be free from AI stem from an understandable motivation to protect the public from arbitrary, biased, or unjust applications of algorithms. This movement to enshrine protective rights follows a familiar pattern of suspicion that has accompanied the introduction of other technologies into governmental processes. Sometimes this initial suspicion of a new technology later transforms into widespread acceptance and even a demand for its use. In this paper, we …


Antitrust By Algorithm, Cary Coglianese, Alicia Lai Jan 2022

All Faculty Scholarship

Technological innovation is changing private markets around the world. New advances in digital technology have created new opportunities for subtle and evasive forms of anticompetitive behavior by private firms. But some of these same technological advances could also help antitrust regulators improve their performance in detecting and responding to unlawful private conduct. We foresee that the growing digital complexity of the marketplace will necessitate that antitrust authorities increasingly rely on machine-learning algorithms to oversee market behavior. In making this transition, authorities will need to meet several key institutional challenges—building organizational capacity, avoiding legal pitfalls, and establishing public trust—to ensure successful …


Machine Learning Land Cover And Land Use Classification Of 4-Band Satellite Imagery, Lorelei Turner [*], Torrey J. Wagner, Paul Auclair, Brent T. Langhals Jan 2022

Faculty Publications

Land-cover and land-use classification generates categories of terrestrial features, such as water or trees, which can be used to track how land is used. This work applies classical, ensemble and neural network machine learning algorithms to a multispectral remote sensing dataset containing 405,000 28x28 pixel image patches in 4 electromagnetic frequency bands. For each algorithm, model metrics and prediction execution time were evaluated, resulting in two families of models: fast and precise. The prediction time for an 81,000-patch group of predictions was [value missing in source] for the fast models, and >5s for the precise models, and there was not a significant change in prediction time when a …


Taming The Data In The Internet Of Vehicles, Shahab Tayeb Jan 2022

Mineta Transportation Institute

As an emerging field, the Internet of Vehicles (IoV) has a myriad of security vulnerabilities that must be addressed to protect system integrity. To stay ahead of novel attacks, cybersecurity professionals are developing new software and systems using machine learning techniques. Neural network architectures improve such systems, including Intrusion Detection Systems (IDSs), by implementing anomaly detection, which differentiates benign data packets from malicious ones. For an IDS to best predict anomalies, the model is trained on data that is typically pre-processed through normalization and feature selection/reduction. These pre-processing techniques play an important role in training a neural network to optimize …