Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 21 of 21

Full-Text Articles in Physical Sciences and Mathematics

Migrating 120,000 Legacy Publications From Several Systems Into A Current Research Information System Using Advanced Data Wrangling Techniques, Yrjö Lappalainen, Matti Lassila, Tanja Heikkilä, Jani Nieminen, Tapani Lehtilä Nov 2023

All Works

This article describes a complex CRIS (current research information system) implementation project involving the migration of around 120,000 legacy publication records from three different systems. The project, undertaken by Tampere University, encountered several challenges in data diversity, data quality, and resource allocation. To handle the extensive and heterogeneous dataset, innovative approaches such as machine learning techniques and various data wrangling tools were used to process data, correct errors, and merge information from different sources. Despite significant delays and unforeseen obstacles, the project was ultimately successful in achieving its goals. The project served as a valuable learning experience, highlighting the importance …


Creating Data From Unstructured Text With Context Rule Assisted Machine Learning (CRAML), Stephen Meisenbacher, Peter Norlander Dec 2022

School of Business: Faculty Publications and Other Works

Popular approaches to building data from unstructured text come with limitations, such as scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets which are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, …


Application Of Artificial Intelligence And Machine Learning In Libraries: A Systematic Review, Rajesh Kumar Das, Mohammad Sharif Ul Islam Aug 2021

Library Philosophy and Practice (e-journal)

As cutting-edge technologies like artificial intelligence and machine learning have become relevant, academics, researchers, and information professionals have taken up research in this area. The objective of this systematic literature review is to provide a synthesis of empirical studies exploring the application of artificial intelligence and machine learning in libraries. To achieve the objectives of the study, a systematic literature review was conducted based on the original guidelines proposed by Kitchenham et al. (2009). Data were collected from the Web of Science, Scopus, LISA, and LISTA databases. Following the rigorous, established selection process, a total of thirty-two articles were …


Improving Collection Understanding For Web Archives With Storytelling: Shining Light Into Dark And Stormy Archives, Shawn M. Jones Jul 2021

Computer Science Theses & Dissertations

Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections because search engines do not currently represent multiple document versions well. Web archive collections are vast, some containing hundreds of thousands of documents. Thousands of collections exist, many of which cover the same topic. Few collections include standardized metadata. Too many documents from too many collections with insufficient metadata makes collection understanding …


Designing Targeted Mobile Advertising Campaigns, Kimia Keshanian Jun 2021

USF Tampa Graduate Theses and Dissertations

With the proliferation of smart, handheld devices, there has been a multifold increase in the ability of firms to target and engage with customers through mobile advertising. Not surprisingly, mobile advertising campaigns have become an integral aspect of firms’ brand-building activities, such as improving the awareness and overall visibility of firms’ brands. In addition, retailers are increasingly using mobile advertising for targeted promotional activities that increase in-store visits and eventual sales conversions. However, in recent years, mobile, or more generally online, advertising campaigns have been facing one major challenge and one major threat that can negatively impact the …


Multimodal Data Fusion And Attack Detection In Recommender Systems, Mehmet Aktukmak Nov 2020

USF Tampa Graduate Theses and Dissertations

Commercial platforms that use recommender systems can collect relevant information to produce useful recommendations for their users. However, these sources usually contain missing values, imbalanced and heterogeneous data, and noisy observations. Such characteristics render the process of exploiting the information nontrivial, as one should carefully address them during the data fusion process. In addition to these degenerative characteristics, some entries can be fake, i.e., they can be the outcome of malicious intent to manipulate the system. These entries should be eliminated before being incorporated into any recommendation task. Detecting such malicious attacks quickly and accurately and then mitigating them …


Literature Review: How U.S. Government Documents Are Addressing The Increasing National Security Implications Of Artificial Intelligence, Bert Chapman Jun 2020

Libraries Faculty and Staff Scholarship and Research

This article emphasizes the increasing importance of artificial intelligence (AI) in military and national security policy making. It seeks to inform interested individuals about the proliferation of publicly accessible U.S. government and military literature on this multifaceted topic. An additional objective of this endeavor is encouraging greater public awareness of and participation in the emerging public policy debate on AI's moral and national security implications.


Harnessing Artificial Intelligence Capabilities To Improve Cybersecurity, Sherali Zeadally, Erwin Adi, Zubair Baig, Imran A. Khan Jan 2020

Information Science Faculty Publications

Cybersecurity is a fast-evolving discipline that has been constantly in the news over the last decade, as the number of threats rises and cybercriminals constantly endeavor to stay a step ahead of law enforcement. Over the years, although the original motives for carrying out cyberattacks largely remain unchanged, cybercriminals have become increasingly sophisticated in their techniques. Traditional cybersecurity solutions are becoming inadequate at detecting and mitigating emerging cyberattacks. Advances in cryptographic and Artificial Intelligence (AI) techniques (in particular, machine learning and deep learning) show promise in enabling cybersecurity experts to counter the ever-evolving threat posed by adversaries. Here, we explore AI's …


Digital Libraries, Intelligent Data Analytics, And Augmented Description: A Demonstration Project, Elizabeth Lorang, Leen-Kiat Soh, Yi Liu, Chulwoo Pack Jan 2020

UNL Libraries: Faculty Publications

From July 16 to November 8, 2019, the Aida digital libraries research team at the University of Nebraska-Lincoln collaborated with the Library of Congress on “Digital Libraries, Intelligent Data Analytics, and Augmented Description: A Demonstration Project.” This demonstration project sought to (1) develop and investigate the viability and feasibility of textual and image-based data analytics approaches to support and facilitate discovery; (2) understand technical tools and requirements for the Library of Congress to improve access and discovery of its digital collections; and (3) enable the Library of Congress to plan for future possibilities. In pursuit of these goals, we focused our …


Final Presentation To The Library Of Congress On Digital Libraries, Intelligent Data Analytics, And Augmented Description, Elizabeth Lorang, Leen-Kiat Soh, Yi Liu, Chulwoo Pack Jan 2020

University of Nebraska-Lincoln Libraries: Conference Presentations and Speeches

This presentation to Library of Congress staff, delivered onsite on January 10, 2020, presents a tour through the demonstration project pursued by the Aida digital libraries research team with the Library of Congress in 2019-2020. In addition to providing an overview and analysis of the specific machine learning projects scoped and explored, this presentation includes a number of high-level take-aways and recommendations designed to influence and inform the Library of Congress's machine learning efforts going forward.


Virtual Wrap-Up Presentation: Digital Libraries, Intelligent Data Analytics, And Augmented Description, Elizabeth Lorang, Leen-Kiat Soh, Yi Liu, Chulwoo Pack Nov 2019

CSE Conference and Workshop Papers

Includes framing, overview, and discussion of the explorations pursued as part of the Digital Libraries, Intelligent Data Analytics, and Augmented Description demonstration project, pursued by members of the Aida digital libraries research team at the University of Nebraska-Lincoln through a research services contract with the Library of Congress. This presentation covered: Aida research team and background for the demonstration project; broad outlines of “Digital Libraries, Intelligent Data Analytics, and Augmented Description”; what changed for us as a research team over the collaboration and why; deliverables of our work; thoughts toward “What next”; and deep-dives into the explorations. The machine learning …


What Do You Mean? Research In The Age Of Machines, Arthur J. Boston Nov 2019

Faculty & Staff Research and Creative Activity

“What Do You Mean?” was an undeniable bop of its era in which Justin Bieber explores the ambiguities of romantic communication. (I pinky promise this will soon make sense for scholarly communication librarians interested in artificial intelligence [AI].) When the single hit airwaves in 2015, there was a meta-debate over what Bieber meant to add to public discourse with lyrics like “What do you mean? Oh, oh, when you nod your head yes, but you wanna say no.” It is unlikely Bieber had consent culture in mind, but the failure of his songwriting team to take into account that some …


Document Images And Machine Learning: A Collaboratory Between The Library Of Congress And The Image Analysis For Archival Discovery (Aida) Lab At The University Of Nebraska, Lincoln, NE, Yi Liu, Chulwoo Pack, Leen-Kiat Soh, Elizabeth Lorang Aug 2019

CSE Conference and Workshop Papers

This presentation summarized and presented preliminary results from the first weeks of work conducted by the Aida research team in response to Library of Congress funding notice ID 030ADV19Q0274, “The Library of Congress – Pre-processing Pilot.” It includes overviews of projects on historic document segmentation, document classification, document quality assessment, figure and graph extraction from historic documents, text-line extraction from figures, subject and objective quality assessments, and digitization type differentiation.


Improved Evolutionary Support Vector Machine Classifier For Coronary Artery Heart Disease Prediction Among Diabetic Patients, Narasimhan B, Malathi A Dr Apr 2019

Library Philosophy and Practice (e-journal)

Soft computing paves the way for many applications, including medical informatics. Decision support systems that aid medical practitioners in diagnosing diseases have gained major attention. Diabetes mellitus is a hereditary disease that might result in major heart disease. This research work aims to propose a soft computing mechanism named the Improved Evolutionary Support Vector Machine (IESVM) classifier for CAHD risk prediction among diabetes patients. The study attempts to build an attribute selection mechanism together with the classifier in order to reduce the misclassification error rate of the conventional support vector machine classifier. A radial basis kernel function is employed in IESVM. The IESVM classifier is evaluated through …


The New Legal Landscape For Text Mining And Machine Learning, Matthew Sag Jan 2019

Faculty Articles

Now that the dust has settled on the Authors Guild cases, this Article takes stock of the legal context for TDM research in the United States. This reappraisal begins in Part I with an assessment of exactly what the Authors Guild cases did and did not establish with respect to the fair use status of text mining. Those cases held unambiguously that reproducing copyrighted works as one step in the process of knowledge discovery through text data mining was transformative, and thus ultimately a fair use of those works. Part I explains why those rulings followed inexorably from copyright's most …


Work-In-Progress Reports Submitted To The Library Of Congress As Part Of Digital Libraries, Intelligent Data Analytics, And Augmented Description, Chulwoo Pack, Yi Liu, Leen-Kiat Soh, Elizabeth Lorang Jan 2019

CSE Technical Reports

This document includes work-in-progress reports submitted to the Library of Congress as part of the Aida digital libraries research team's work on Digital Libraries, Intelligent Data Analytics, and Augmented Description: A Demonstration Project. These work-in-progress reports provide a snapshot glimpse, as well as underlying rationale and decision-making, at various points in the development of the project and its machine learning explorations. Reports cover explorations on historic newspapers, minimally-processed manuscript collections, materials digitized from physical originals and those digitized from microform surrogates, and investigate challenges related to image segmentation and document zoning, classification, document image quality analysis, metadata generation, and more.


Using Chronicling America’s Images To Explore Digitized Historic Newspapers & Imagine Alternative Futures, Elizabeth Lorang, Leen-Kiat Soh Sep 2018

University of Nebraska-Lincoln Libraries: Conference Presentations and Speeches

This presentation situates the work of the Aida team broadly and also hinges this work on some very specific challenges for digital libraries. In doing so, it demonstrates the many types of questions and domains to be explored in digitized newspapers.


Increasing Our Vision For 21st-Century Digital Libraries, Elizabeth M. Lorang, Leen-Kiat Soh Jan 2018

University of Nebraska-Lincoln Libraries: Conference Presentations and Speeches

This presentation

  1. Reads digital library interfaces—or their "main door" interfaces—as glimpses into what we have thus far valued in the development of digital libraries
  2. Frames a visual way of thinking about textual materials
  3. Introduces the work of our research team—where we are now, and where we're headed
  4. Draws some connections between the parts

This presentation is very much a look into thinking in process and work in progress and proposes the following ideas:

  1. As a community, we can do much more with the digital images we're creating of textual materials than we've heretofore done.
  2. We aspire to have additional layers …


Clinical Information Extraction From Unstructured Free-Texts, Mingzhe Tao Jan 2018

Legacy Theses & Dissertations (2009 - 2024)

Information extraction (IE) is a fundamental component of natural language processing (NLP) that provides a deeper understanding of texts. In the clinical domain, documents prepared by medical experts (e.g., discharge summaries, drug labels, medical history records) contain a significant amount of clinically relevant information that is crucial to the overall well-being of patients. Unfortunately, in many cases, clinically relevant information is presented in an unstructured format, predominantly consisting of free text, making it inaccessible to computerized methods. Automatic extraction of this information can improve accessibility. However, the presence of synonymous expressions, medical acronyms, misspellings, negated phrases, and ambiguous terminologies makes automatic extraction …


Novel Machine Learning Methods For Modeling Time-To-Event Data, Bhanukiran Vinzamuri Jan 2016

Wayne State University Dissertations

Predicting time-to-event from longitudinal data where different events occur at different time points is an extremely important problem in several domains such as healthcare, economics, social networks and seismology, to name a few. A unique challenge in this problem involves building predictive models from right-censored data (also called survival data). Right censoring occurs when the event of interest has not yet been observed for an instance within a given observation time window. Effective models for predicting time-to-event labels from such right-censored data with good accuracy can have a significant impact in these …
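
As an illustration of what right-censored data looks like, the sketch below computes a Kaplan-Meier survival estimate, a standard baseline for such data. It is not the method proposed in the dissertation; the function name `kaplan_meier` and the toy durations are assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations[i] is the follow-up time of instance i; observed[i] is True
    when the event occurred at that time and False when the instance was
    right censored (still event-free when observation stopped).
    """
    # Count events at each time; censored instances contribute no events.
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    survival = 1.0
    curve = []
    for t in sorted(events):
        # Everyone with follow-up of at least t is still at risk at t,
        # including instances censored exactly at t.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - events[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: two of the five instances are right censored.
durations = [2, 3, 3, 5, 8]
observed = [True, False, True, True, False]
curve = kaplan_meier(durations, observed)
```

Censored instances still count toward the at-risk set up to their censoring time, which is why they cannot simply be discarded.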


A Comparative Study On Text Categorization, Aditya Chainulu Karamcheti May 2010

UNLV Theses, Dissertations, Professional Papers, and Capstones

Automated text categorization is a supervised learning task, defined as assigning category labels to new documents based on likelihood suggested by a training set of labeled documents. Two examples of methodology for text categorizations are Naive Bayes and K-Nearest Neighbor.

In this thesis, we implement two categorization engines based on Naive Bayes and K-Nearest Neighbor methodology. We then compare the effectiveness of these two engines by calculating standard precision and recall for a collection of documents. We will further report on time efficiency of these two engines.
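
The precision and recall measures mentioned in this abstract can be sketched as follows. This is a minimal illustration for a single category, not the thesis's evaluation code; the function name `precision_recall` and the example labels are assumptions.

```python
def precision_recall(true_labels, predicted_labels, category):
    """Compute precision and recall for one category.

    precision = TP / (TP + FP): share of documents assigned to the
    category that really belong to it.
    recall = TP / (TP + FN): share of documents belonging to the
    category that the classifier found.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    # Return 0.0 when a ratio is undefined (no predictions or no members).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for five documents.
true_labels = ["sports", "sports", "politics", "sports", "politics"]
predicted_labels = ["sports", "politics", "politics", "sports", "sports"]
precision, recall = precision_recall(true_labels, predicted_labels, "sports")
```

The same function can be applied to the output of either categorization engine, one category at a time, to compare them on equal terms.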