Physical Sciences and Mathematics | Open Access Articles

Virtual Wrap-Up Presentation: Digital Libraries, Intelligent Data Analytics, And Augmented Description, Elizabeth Lorang, Leen-Kiat Soh, Yi Liu, Chulwoo Pack Nov 2019

CSE Conference and Workshop Papers

Includes framing, overview, and discussion of the explorations pursued as part of the Digital Libraries, Intelligent Data Analytics, and Augmented Description demonstration project, pursued by members of the Aida digital libraries research team at the University of Nebraska-Lincoln through a research services contract with the Library of Congress. This presentation covered: Aida research team and background for the demonstration project; broad outlines of “Digital Libraries, Intelligent Data Analytics, and Augmented Description”; what changed for us as a research team over the collaboration and why; deliverables of our work; thoughts toward “What next”; and deep-dives into the explorations. The machine learning …

Collaborating On Machine Reading: Training Algorithms To Read Complex Collections, Carrie M. Pirmann, Brian R. King, Bhagawat Acharya, Katherine M. Faull Oct 2019

Bucknell University Digital Scholarship Conference

Interdisciplinary collaboration between two faculty members in the humanities and computer science, a research librarian, and an undergraduate student has led to remarkable results in an ongoing international DH research project that has at its core 18th century manuscripts. The corpus stems from a vast collection of archival materials held by the Moravian Church in the UK, Germany, and the US. The number of pages to be transcribed, differences in handwriting styles, paper quality, and original language pose enormous problems for the feasibility of human transcription. This presentation will review the hypothesis, process, and findings of a summer research project …

Document Images And Machine Learning: A Collaboratory Between The Library Of Congress And The Image Analysis For Archival Discovery (Aida) Lab At The University Of Nebraska, Lincoln, Ne, Yi Liu, Chulwoo Pack, Leen-Kiat Soh, Elizabeth Lorang Aug 2019

CSE Conference and Workshop Papers

This presentation summarized preliminary results from the first weeks of work conducted by the Aida research team in response to Library of Congress funding notice ID 030ADV19Q0274, “The Library of Congress – Pre-processing Pilot.” It includes overviews of projects on historic document segmentation, document classification, document quality assessment, figure and graph extraction from historic documents, text-line extraction from figures, subjective and objective quality assessments, and digitization type differentiation.

Rethinking Algorithmic Bias Through Phenomenology And Pragmatism, Johnathan C. Flowers May 2019

Computer Ethics - Philosophical Enquiry (CEPE) Proceedings

In 2017, Amazon discontinued an attempt at developing a hiring algorithm which would enable the company to streamline its hiring processes due to apparent gender discrimination. Specifically, the algorithm, trained on over a decade’s worth of resumes submitted to Amazon, learned to penalize applications that contained references to women, that indicated graduation from all women’s colleges, or otherwise indicated that an applicant was not male. Amazon’s algorithm took up the history of Amazon’s applicant pool and integrated it into its present “problematic situation,” for the purposes of future action. Consequently, Amazon declared the project a failure: even after attempting to …

Every Data Point Counts: Political Elections In The Age Of Digital Analytics, Julian Kehle, Samir Naimi May 2019

Honors Thesis

Synthesizing the investigative research and cautionary messages from experts in the fields of technology, political science, and behavioral science, this project explores the ways in which digital analytics has begun to influence the American political arena. Historically, political parties have constructed systems to target voters and win elections. However, rapid changes in the field of technology (such as big data, artificial intelligence, and the prevalence of social media) threaten to undermine the integrity of elections themselves. Future political campaigns will utilize profiling to micro-target individuals in order to manipulate and persuade them with hyper-personalized political content. Most dangerously, the average …

Interim Performance Report, Lg‐71‐16‐0152‐16, Extending Intelligent Computational Image Analysis For Archival Discovery, March 2019, Elizabeth Lorang, Leen-Kiat Soh, John O'Brien Mar 2019

CDRH Grant Reports

The primary goal of "Extending Intelligent Computational Image Analysis for Archival Discovery" is to investigate the use of image analysis as a methodology for content identification, description, and information retrieval in digital libraries and other digitized collections. Building on work started under a National Endowment for the Humanities' Office of Digital Humanities Start-up Grant, our IMLS project seeks to 1) analyze and verify our previously developed image analysis approach and extend it so that it is newspaper agnostic, type agnostic, and language agnostic; 2) scale and revise the intelligent image analysis approach and determine the ideal balance between precision and …

Automatically Extracting Meaning From Legal Texts: Opportunities And Challenges, Kevin D. Ashley Jan 2019

Articles

This paper examines impressive new applications of legal text analytics in automated contract review, litigation support, conceptual legal information retrieval, and legal question answering against the backdrop of some pressing technological constraints. First, artificial intelligence (AI) programs cannot read legal texts like lawyers can. Using statistical methods, AI can only extract some semantic information from legal texts. For example, it can use the extracted meanings to improve retrieval and ranking, but it cannot yet extract legal rules in logical form from statutory texts. Second, machine learning (ML) may yield answers, but it cannot explain its answers to legal questions or …

Work-In-Progress Reports Submitted To The Library Of Congress As Part Of Digital Libraries, Intelligent Data Analytics, And Augmented Description, Chulwoo Pack, Yi Liu, Leen-Kiat Soh, Elizabeth Lorang Jan 2019

CSE Technical Reports

This document includes work-in-progress reports submitted to the Library of Congress as part of the Aida digital libraries research team's work on Digital Libraries, Intelligent Data Analytics, and Augmented Description: A Demonstration Project. These reports provide snapshots of the project and its machine learning explorations at various points in their development, along with the underlying rationale and decision-making. The reports cover explorations on historic newspapers, minimally-processed manuscript collections, materials digitized from physical originals, and materials digitized from microform surrogates. They investigate challenges related to image segmentation and document zoning, classification, document image quality analysis, metadata generation, and more.

The Use Of Deep Learning Distributed Representations In The Identification Of Abusive Text, Susan Mckeever, Hao Chen, Sarah Jane Delany Jan 2019

Conference papers

The selection of optimal feature representations is a critical step in the use of machine learning in text classification. Traditional features (e.g. bag of words and n-grams) have dominated for decades, but in the past five years, the use of learned distributed representations has become increasingly common. In this paper, we summarise and present a categorisation of the state-of-the-art distributed representation techniques, including word and sentence embedding models. We carry out an empirical analysis of the performance of the various feature representations using the scenario of detecting abusive comments. We compare classification accuracies across a range of off-the-shelf embedding models …
