Library and Information Science | Open Access Articles

Creating Data From Unstructured Text With Context Rule Assisted Machine Learning (Craml), Stephen Meisenbacher, Peter Norlander Dec 2022

Creating Data From Unstructured Text With Context Rule Assisted Machine Learning (Craml), Stephen Meisenbacher, Peter Norlander

School of Business: Faculty Publications and Other Works

Popular approaches to building data from unstructured text come with limitations, such as scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets which are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, …

Go to article

Why We Should Remember The Soviet Information Age?, Ksenia Tatarchenko Oct 2022

Why We Should Remember The Soviet Information Age?, Ksenia Tatarchenko

Research Collection College of Integrative Studies

How to navigate the rapidly changing digital geopolitics of the world today? How do we make sense of digital transformation and its many social, political, cultural, and environmental implications at different locations around the world?

Go to article

Building Capacity For Data-Driven Scholarship, Jamie Rogers Mar 2022

Building Capacity For Data-Driven Scholarship, Jamie Rogers

Works of the FIU Libraries

This talk provides an overview of "dLOC as Data: A Thematic Approach to Caribbean Newspapers," an initiative developed to increase access to digitized Caribbean newspaper text for bulk download, facilitating computational analysis. Capacity building for future research in Caribbean Studies being a crucial aspect of this initiative, a thematic toolkit was developed to facilitate use of the project data as well as provide replicable processes. The toolkit includes sample text analysis projects, as well as tutorials and detailed project documentation. While the toolkit focuses on the history of hurricanes and tropical cyclones of the region, the methodologies and tools used …

Go to article

Campus Mobile History Application, Drew Adan, Christine Sears Jan 2022

Campus Mobile History Application, Drew Adan, Christine Sears

Summer Community of Scholars (RCEU and HCR) Project Proposals

No abstract provided.

Go to article

Law Library Blog (January 2022): Legal Beagle's Blog Archive, Roger Williams University School Of Law Jan 2022

Law Library Blog (January 2022): Legal Beagle's Blog Archive, Roger Williams University School Of Law

Law Library Newsletters/Blog

No abstract provided.

Go to article

D-Lib Magazine Pioneered Web-Based Scholarly Communication, Michael L. Nelson, Herbert Van De Sompel Jan 2022

D-Lib Magazine Pioneered Web-Based Scholarly Communication, Michael L. Nelson, Herbert Van De Sompel

Computer Science Faculty Publications

The web began with a vision of, as stated by Tim Berners-Lee in 1991, “that much academic information should be freely available to anyone”. For many years, the development of the web and the development of digital libraries and other scholarly communications infrastructure proceeded in tandem. A milestone occurred in July, 1995, when the first issue of D-Lib Magazine was published as an online, HTML-only, open access magazine, serving as the focal point for the then emerging digital library research community. In 2017 it ceased publication, in part due to the maturity of the community it served as well as …

Go to article

Scholarly Big Data Quality Assessment: A Case Study Of Document Linking And Conflation With S2orc, Jian Wu, Ryan Hiltabrand, Dominik Soós, C. Lee Giles Jan 2022

Scholarly Big Data Quality Assessment: A Case Study Of Document Linking And Conflation With S2orc, Jian Wu, Ryan Hiltabrand, Dominik Soós, C. Lee Giles

Computer Science Faculty Publications

Recently, the Allen Institute for Artificial Intelligence released the Semantic Scholar Open Research Corpus (S2ORC), one of the largest open-access scholarly big datasets with more than 130 million scholarly paper records. S2ORC contains a significant portion of automatically generated metadata. The metadata quality could impact downstream tasks such as citation analysis, citation prediction, and link analysis. In this project, we assess the document linking quality and estimate the document conflation rate for the S2ORC dataset. Using semi-automatically curated ground truth corpora, we estimated that the overall document linking quality is high, with 92.6% of documents correctly linking to six major …

Go to article

Guide To The Dr. L.S. Dederick Papers, 1908-1956, Undated, Orson Kingsley, Patrick Koetsch Jan 2022

Guide To The Dr. L.S. Dederick Papers, 1908-1956, Undated, Orson Kingsley, Patrick Koetsch

Archives & Special Collections Finding Aids

Louis Serle (L.S.) Dederick was born in Chicago in 1883. He received his Ph.D. in Mathematics from Harvard University in 1909. From 1909 – 1917 he was a professor at Princeton University. From 1917 – 1924 he was professor at the U.S. Naval Academy in Annapolis, Maryland. In 1926 Dederick began working for the U.S. Army, Ordnance. During his time there he was the Associate Director of the Ballistic Research Laboratory at the Aberdeen Proving Grounds in Aberdeen, Maryland where he focused on ballistics research.

While Dederick worked as a mathematician at the Aberdeen Proving Grounds, he was involved with …

Go to article

The Dsa Toolkit Shines Light Into Dark And Stormy Archives, Shawn Morgan Jones, Himarsha R. Jayanetti, Alex Osborne, Paul Koerbin, Klein Martin, Michele C. Weigle, Michael L. Nelson Jan 2022

The Dsa Toolkit Shines Light Into Dark And Stormy Archives, Shawn Morgan Jones, Himarsha R. Jayanetti, Alex Osborne, Paul Koerbin, Klein Martin, Michele C. Weigle, Michael L. Nelson

Computer Science Faculty Publications

Web archive collections are created with a particular purpose in mind. A curator selects seeds, or original resources, which are then captured by an archiving system and stored as archived web pages, or mementos. The systems that build web archive collections are often configured to revisit the same original resource multiple times. This is incredibly useful for understanding an unfolding news story or the evolution of an organization. Unfortunately, over time, some of these original resources can go off-topic and no longer suit the purpose for which the collection was originally created. They can go off-topic due to web site …

Go to article

Theory Entity Extraction For Social And Behavioral Sciences Papers Using Distant Supervision, Xin Wei, Lamia Salsabil, Jian Wu Jan 2022

Theory Entity Extraction For Social And Behavioral Sciences Papers Using Distant Supervision, Xin Wei, Lamia Salsabil, Jian Wu

Computer Science Faculty Publications

Theories and models, which are common in scientific papers in almost all domains, usually provide the foundations of theoretical analysis and experiments. Understanding the use of theories and models can shed light on the credibility and reproducibility of research works. Compared with metadata, such as title, author, keywords, etc., theory extraction in scientific literature is rarely explored, especially for social and behavioral science (SBS) domains. One challenge of applying supervised learning methods is the lack of a large number of labeled samples for training. In this paper, we propose an automated framework based on distant supervision that leverages entity mentions …

Go to article

Streaminghub: Interactive Stream Analysis Workflows, Yasith Jayawardana, Vikas G. Ashok, Sampath Jayarathna Jan 2022

Streaminghub: Interactive Stream Analysis Workflows, Yasith Jayawardana, Vikas G. Ashok, Sampath Jayarathna

Computer Science Faculty Publications

Reusable data/code and reproducible analyses are foundational to quality research. This aspect, however, is often overlooked when designing interactive stream analysis workflows for time-series data (e.g., eye-tracking data). A mechanism to transmit informative metadata alongside data may allow such workflows to intelligently consume data, propagate metadata to downstream tasks, and thereby auto-generate reusable, reproducible analytic outputs with zero supervision. Moreover, a visual programming interface to design, develop, and execute such workflows may allow rapid prototyping for interdisciplinary research. Capitalizing on these ideas, we propose StreamingHub, a framework to build metadata propagating, interactive stream analysis workflows using visual programming. We conduct …

Go to article

Library and Information Science Commons^™

Full-Text Articles in Library and Information Science

Creating Data From Unstructured Text With Context Rule Assisted Machine Learning (Craml), Stephen Meisenbacher, Peter Norlander

School of Business: Faculty Publications and Other Works

Why We Should Remember The Soviet Information Age?, Ksenia Tatarchenko

Research Collection College of Integrative Studies

Building Capacity For Data-Driven Scholarship, Jamie Rogers

Works of the FIU Libraries

Campus Mobile History Application, Drew Adan, Christine Sears

Summer Community of Scholars (RCEU and HCR) Project Proposals

Law Library Blog (January 2022): Legal Beagle's Blog Archive, Roger Williams University School Of Law

Law Library Newsletters/Blog

D-Lib Magazine Pioneered Web-Based Scholarly Communication, Michael L. Nelson, Herbert Van De Sompel

Computer Science Faculty Publications

Scholarly Big Data Quality Assessment: A Case Study Of Document Linking And Conflation With S2orc, Jian Wu, Ryan Hiltabrand, Dominik Soós, C. Lee Giles

Computer Science Faculty Publications

Guide To The Dr. L.S. Dederick Papers, 1908-1956, Undated, Orson Kingsley, Patrick Koetsch

Archives & Special Collections Finding Aids

The Dsa Toolkit Shines Light Into Dark And Stormy Archives, Shawn Morgan Jones, Himarsha R. Jayanetti, Alex Osborne, Paul Koerbin, Klein Martin, Michele C. Weigle, Michael L. Nelson

Computer Science Faculty Publications

Theory Entity Extraction For Social And Behavioral Sciences Papers Using Distant Supervision, Xin Wei, Lamia Salsabil, Jian Wu

Computer Science Faculty Publications

Streaminghub: Interactive Stream Analysis Workflows, Yasith Jayawardana, Vikas G. Ashok, Sampath Jayarathna

Computer Science Faculty Publications