Open Access. Powered by Scholars. Published by Universities.®

Social and Behavioral Sciences Commons

Old Dominion University

Computer Science Faculty Publications

Articles 1 - 30 of 55

Full-Text Articles in Social and Behavioral Sciences

Identifying Patterns For Neurological Disabilities By Integrating Discrete Wavelet Transform And Visualization, Soo Yeon Ji, Sampath Jayarathna, Anne M. Perrotti, Katrina Kardiasmenos, Dong Hyun Jeong Jan 2024

Neurological disabilities cause diverse health and mental challenges, impacting quality of life and imposing financial burdens on both the individuals diagnosed with these conditions and their caregivers. Abnormal brain activity, stemming from malfunctions in the human nervous system, characterizes neurological disorders. Therefore, the early identification of these abnormalities is crucial for devising suitable treatments and interventions aimed at promoting and sustaining quality of life. Electroencephalogram (EEG), a non-invasive method for monitoring brain activity, is frequently employed to detect abnormal brain activity in neurological and mental disorders. This study introduces an approach that extends the understanding and identification of neurological disabilities …
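
The title names the discrete wavelet transform (DWT) as the signal-processing step. As a minimal sketch only, the code below applies a single-level Haar DWT, the simplest wavelet, to a short hypothetical signal standing in for one EEG channel. The wavelet choice, the decomposition depth, the helper names `haar_dwt`/`haar_idwt`, and the detail-energy feature are assumptions, not the study's pipeline.

```python
import math

def haar_dwt(signal):
    """Single-level Haar DWT: returns (approximation, detail) coefficients.

    Assumes an even-length input; each output has half the input length.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: reconstructs the original samples."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

# Hypothetical 8-sample segment standing in for one EEG channel.
segment = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_dwt(segment)
# Energy in the detail band is one simple feature that could be compared or visualized.
detail_energy = sum(d * d for d in detail)
print(approx, detail, detail_energy)
```

The approximation coefficients keep the slow trend, while the detail coefficients capture sample-to-sample changes; repeating the transform on the approximation yields a multi-level decomposition.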


Autonomous Strike UAVs For Counterterrorism Missions: Challenges And Preliminary Solutions, Meshari Aljohani, Ravi Mukkamala, Stephan Olariu Jan 2024

UAVs are becoming a crucial tool in modern warfare, primarily due to their cost-effectiveness, risk reduction, and ability to perform a wider range of activities. This research focuses on the use of autonomous UAVs to conduct strike missions against high-value targets. Owing to developments in ledger technology, smart contracts, and machine learning, activities formerly carried out by human professionals or remotely piloted UAVs are now feasible for autonomous UAVs. Our study provides the first in-depth analysis of the challenges and potential solutions for the successful implementation of an autonomous UAV mission.


Robots Still Outnumber Humans In Web Archives In 2019, But Less Than In 2015 And 2012, Himarsha R. Jayanetti, Kritika Garg, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2024

The significance of the web and the crucial role of web archives in its preservation highlight the necessity of understanding how users, both human and robot, access web archive content, and how best to satisfy the disparate needs of both types of users. To identify robots and humans in web archives and analyze their respective access patterns, we used the Internet Archive’s (IA) Wayback Machine access logs from 2012, 2015, and 2019, as well as Arquivo.pt’s (Portuguese Web Archive) access logs from 2019. We identified user sessions in the access logs and classified those sessions as human or robot based …
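
Identifying sessions in an access log can be sketched as follows. This is a minimal sketch assuming requests are keyed by (IP address, user agent) and that a 30-minute inactivity gap ends a session; both choices, the `sessionize` helper, and the sample log entries are hypothetical. The human/robot classification criteria are not shown because the abstract does not state them.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a 30-minute inactivity gap ends a session (a common convention,
# not necessarily the threshold used in the study).
SESSION_TIMEOUT = timedelta(minutes=30)

def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (ip, user_agent, timestamp, url) records into sessions.

    Requests are keyed by (ip, user_agent); a new session starts whenever the
    gap since the previous request from the same key exceeds `timeout`.
    """
    by_client = defaultdict(list)
    for ip, agent, ts, url in records:
        by_client[(ip, agent)].append((ts, url))

    sessions = []
    for client, hits in by_client.items():
        hits.sort()  # chronological order per client
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > timeout:
                sessions.append((client, current))
                current = []
            current.append(hit)
        sessions.append((client, current))
    return sessions

# Hypothetical log entries (documentation IP ranges, placeholder paths).
t0 = datetime(2019, 2, 1, 12, 0)
log = [
    ("192.0.2.1", "Mozilla/5.0", t0, "/web/2019/example.com"),
    ("192.0.2.1", "Mozilla/5.0", t0 + timedelta(minutes=5), "/web/2018/example.com"),
    ("192.0.2.1", "Mozilla/5.0", t0 + timedelta(hours=2), "/web/2017/example.com"),
    ("198.51.100.7", "python-requests/2.31", t0, "/web/2019/example.org"),
]
for client, hits in sessionize(log):
    print(client, len(hits))
```

Per-session features such as request rate or the mix of resource types could then feed whatever human/robot heuristics are applied.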


Building Datasets To Support Information Extraction And Structure Parsing From Electronic Theses And Dissertations, William A. Ingram, Jian Wu, Sampanna Yashwant Kahu, Javaid Akbar Manzoor, Bipasha Banerjee, Aman Ahuja, Muntabir Hasan Choudhury, Lamia Salsabil, Winston Shields, Edward A. Fox Jan 2024

Despite the millions of electronic theses and dissertations (ETDs) publicly available online, digital library services for ETDs have not evolved past simple search and browse at the metadata level. We need better digital library services that allow users to discover and explore the content buried in these long documents. Recent advances in machine learning have shown promising results for decomposing documents into their constituent parts, but these models and techniques require data for training and evaluation. In this article, we present high-quality datasets to train, evaluate, and compare machine learning methods in tasks that are specifically suited to identify and …


FAIR Signposting Profile, Herbert Van De Sompel, Martin Klein, Shawn Jones, Michael L. Nelson, Simeon Warner, Anusuriya Devaraju, Robert Huber, Wilko Steinhoff, Vyacheslav Tykhonov, Luc Boruta, Enno Meijers, Stian Soiland-Reyes, Mark Wilkinson May 2023

[First paragraph] This page details concrete recipes that platforms that host research outputs (e.g. data repositories, institutional repositories, publisher platforms, etc.) can follow to implement Signposting, a lightweight yet powerful approach to increase the FAIRness of scholarly objects.
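
Signposting exposes typed links (Web Linking, RFC 8288), typically in HTTP `Link` headers or HTML `<link>` elements, using relation types such as `cite-as`, `describedby`, and `item`. The sketch below is a simplified parser that groups a `Link` header's targets by relation type. It assumes parameter values contain no commas or semicolons, and the header, its URLs, and the placeholder DOI are hypothetical.

```python
import re

# One link-value: <target> followed by ;-separated parameters.
# Simplification: parameter values must not contain commas or semicolons.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
REL_PARAM = re.compile(r'rel\s*=\s*"?([^";]+)"?')

def parse_link_header(header):
    """Map each relation type in an HTTP Link header to its target URIs."""
    links = {}
    for target, params in LINK_VALUE.findall(header):
        match = REL_PARAM.search(params)
        if not match:
            continue
        # A rel value may list several space-separated relation types.
        for rel in match.group(1).split():
            links.setdefault(rel, []).append(target)
    return links

# Hypothetical Link header from a repository landing page.
header = (
    '<https://doi.org/10.1234/example>; rel="cite-as", '
    '<https://repo.example.org/meta.json>; rel="describedby"; type="application/ld+json", '
    '<https://repo.example.org/data.csv>; rel="item"; type="text/csv"'
)
links = parse_link_header(header)
print(links["cite-as"])  # → ['https://doi.org/10.1234/example']
```

A production client would use a full RFC 8288 parser; this version only illustrates how a machine agent can follow `cite-as` to the persistent identifier and `item` to the content.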


Hashes Are Not Suitable To Verify Fixity Of The Public Archived Web, Mohamed Aturban, Martin Klein, Herbert Van De Sompel, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2023

Web archives, such as the Internet Archive, preserve the web and allow access to prior states of web pages. We implicitly trust their versions of archived pages, but as their role moves from preserving curios of the past to facilitating present-day adjudication, we are concerned with verifying the fixity of archived web pages, or mementos, to ensure they have always remained unaltered. A widely used technique in digital preservation to verify the fixity of an archived resource is to periodically compute a cryptographic hash value on a resource and then compare it with a previous hash value. If the …
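
The hash-comparison technique described above can be sketched with Python's standard `hashlib`. SHA-256, the helper names, and the memento payloads are assumptions for illustration. Note that any byte-level difference, however small, yields a different digest.

```python
import hashlib

def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of a memento's bytes as a hex string."""
    return hashlib.sha256(content).hexdigest()

def fixity_unchanged(content: bytes, previous_digest: str) -> bool:
    """True if the newly computed hash matches the previously recorded one."""
    return sha256_hex(content) == previous_digest

# Hypothetical memento payloads fetched at two different times.
first_fetch = b"<html><body>Archived page</body></html>"
recorded = sha256_hex(first_fetch)

later_fetch = b"<html><body>Archived page</body></html>"
print(fixity_unchanged(later_fetch, recorded))         # → True
print(fixity_unchanged(later_fetch + b" ", recorded))  # → False
```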


Mitigating Anomalous Electricity Consumption In Smart Cities Using An AI-Based Stacked-Generalization Technique, Arshid Ali, Laiq Khan, Nadeem Javaid, Safdar Hussain Bouk, Abdulaziz Aldegheishem, Nabil Alrahjeh Jan 2023

Energy management and efficient asset utilization play an important role in the economic development of a country. Electricity produced at the power station incurs two types of losses between the generation point and the end user: technical losses (TL) and non-technical losses (NTL). TLs occur due to the use of inefficient equipment, while NTLs occur due to the anomalous consumption of electricity by customers, which happens in many ways, energy theft being one of them. Energy theft is mostly committed to cut down on electricity bills. These losses in the smart grid (SG) are the …


DeepPatent2: A Large-Scale Benchmarking Corpus For Technical Drawing Understanding, Kehinde Ajayi, Xin Wei, Martin Gryder, Winston Shields, Jian Wu, Shawn M. Jones, Michal Kucer, Diane Oyen Jan 2023

Recent advances in computer vision (CV) and natural language processing have been driven by exploiting big data on practical applications. However, these research fields are still limited by the sheer volume, versatility, and diversity of the available datasets. CV tasks, such as image captioning, which has primarily been carried out on natural images, still struggle to produce accurate and meaningful captions on sketched images often included in scientific and technical documents. The advancement of other tasks such as 3D reconstruction from 2D images requires larger datasets with multiple viewpoints. We introduce DeepPatent2, a large-scale dataset, providing more than 2.7 million …


Progenitor Cell Isolation From Mouse Epididymal Adipose Tissue And Sequencing Library Construction, Qianglin Liu, Chaoyang Li, Yuxia Li, Leshan Wang, Xujia Zhang, Buhao Deng, Peidong Gao, Mohammad Shiri, Fozi Alkaifi, Junxing Zhao, Jacqueline M. Stephens, Constantine A. Simintiras, Joseph Francis, Jiangwen Sun, Xing Fu Jan 2023

Here, we present a protocol to isolate progenitor cells from mouse epididymal visceral adipose tissue and construct bulk RNA and assay for transposase-accessible chromatin with sequencing (ATAC-seq) libraries. We describe steps for adipose tissue collection, cell isolation, and cell staining and sorting. We then detail procedures for both ATAC-seq and RNA sequencing library construction. This protocol can also be applied to other tissues and cell types directly or with minor modifications.

For complete details on the use and execution of this protocol, please refer to Liu et al. (2023).[1]

[1] Liu, Q., Li, C., Deng, B., Gao, P., …


Enabling Customization Of Discussion Forums For Blind Users, Mohan Sunkara, Yash Prakash, Hae-Na Lee, Sampath Jayarathna, Vikas Ashok Jan 2023

Online discussion forums have become an integral component of news, entertainment, information, and video-streaming websites, where people all over the world actively engage in discussions on a wide range of topics including politics, sports, music, business, health, and world affairs. Yet, little is known about their usability for blind users, who aurally interact with the forum conversations using screen reader assistive technology. In an interview study, blind users stated that they often had an arduous and frustrating interaction experience while consuming conversation threads, mainly due to the highly redundant content and the absence of customization options to selectively view portions …


ClaimDistiller: Scientific Claim Extraction With Supervised Contrastive Learning, Xin Wei, Md Reshad Ul Hoque, Jian Wu, Jiang Li Jan 2023

The growth of scientific papers in the past decades calls for effective claim extraction tools to automatically and accurately locate key claims from unstructured text. Such claims will benefit content-wise aggregated exploration of scientific knowledge beyond the metadata level. One challenge of building such a model is how to effectively use limited labeled training data. In this paper, we compared transfer learning and contrastive learning frameworks in terms of performance, time and training data size. We found contrastive learning has better performance at a lower cost of data across all models. Our contrastive-learning-based model ClaimDistiller has the highest performance, boosting …


Eye Movement And Pupil Measures: A Review, Bhanuka Mahanama, Yasith Jayawardana, Sundararaman Rengarajan, Gavindya Jayawardena, Leanne Chukoskie, Joseph Snider, Sampath Jayarathna Jan 2022

Our subjective visual experiences involve complex interaction between our eyes, our brain, and the surrounding world. It gives us the sense of sight, color, stereopsis, distance, pattern recognition, motor coordination, and more. The increasing ubiquity of gaze-aware technology brings with it the ability to track gaze and pupil measures with varying degrees of fidelity. With this in mind, a review that considers the various gaze measures becomes increasingly relevant, especially considering our ability to make sense of these signals given different spatio-temporal sampling capacities. In this paper, we selectively review prior work on eye movements and pupil measures. We first …


StreamingHub: Interactive Stream Analysis Workflows, Yasith Jayawardana, Vikas G. Ashok, Sampath Jayarathna Jan 2022

Reusable data/code and reproducible analyses are foundational to quality research. This aspect, however, is often overlooked when designing interactive stream analysis workflows for time-series data (e.g., eye-tracking data). A mechanism to transmit informative metadata alongside data may allow such workflows to intelligently consume data, propagate metadata to downstream tasks, and thereby auto-generate reusable, reproducible analytic outputs with zero supervision. Moreover, a visual programming interface to design, develop, and execute such workflows may allow rapid prototyping for interdisciplinary research. Capitalizing on these ideas, we propose StreamingHub, a framework to build metadata propagating, interactive stream analysis workflows using visual programming. We conduct …


D-Lib Magazine Pioneered Web-Based Scholarly Communication, Michael L. Nelson, Herbert Van De Sompel Jan 2022

The web began with a vision of, as stated by Tim Berners-Lee in 1991, “that much academic information should be freely available to anyone”. For many years, the development of the web and the development of digital libraries and other scholarly communications infrastructure proceeded in tandem. A milestone occurred in July 1995, when the first issue of D-Lib Magazine was published as an online, HTML-only, open access magazine, serving as the focal point for the then-emerging digital library research community. In 2017 it ceased publication, in part due to the maturity of the community it served as well as …


Scholarly Big Data Quality Assessment: A Case Study Of Document Linking And Conflation With S2ORC, Jian Wu, Ryan Hiltabrand, Dominik Soós, C. Lee Giles Jan 2022

Recently, the Allen Institute for Artificial Intelligence released the Semantic Scholar Open Research Corpus (S2ORC), one of the largest open-access scholarly big datasets with more than 130 million scholarly paper records. S2ORC contains a significant portion of automatically generated metadata. The metadata quality could impact downstream tasks such as citation analysis, citation prediction, and link analysis. In this project, we assess the document linking quality and estimate the document conflation rate for the S2ORC dataset. Using semi-automatically curated ground truth corpora, we estimated that the overall document linking quality is high, with 92.6% of documents correctly linking to six major …


A Synthetic Prediction Market For Estimating Confidence In Published Work, Sarah Rajtmajer, Christopher Griffin, Jian Wu, Robert Fraleigh, Laxmann Balaji, Anna Squicciarini, Anthony Kwasnica, David Pennock, Michael Mclaughlin, Timothy Fritton, Nishanth Nakshatri, Arjun Menon, Sai Ajay Modukuri, Rajal Nivargi, Xin Wei, Lee Giles Jan 2022

[First paragraph] Concerns about the replicability, robustness and reproducibility of findings in scientific literature have gained widespread attention over the last decade in the social sciences and beyond. This attention has been catalyzed by and has likewise motivated a number of large-scale replication projects which have reported successful replication rates between 36% and 78%. Given the challenges and resources required to run high-powered replication studies, researchers have sought other approaches to assess confidence in published claims. Initial evidence has supported the promise of prediction markets in this context. However, they require the coordinated, sustained effort of collections of human experts …


Theory Entity Extraction For Social And Behavioral Sciences Papers Using Distant Supervision, Xin Wei, Lamia Salsabil, Jian Wu Jan 2022

Theories and models, which are common in scientific papers in almost all domains, usually provide the foundations of theoretical analysis and experiments. Understanding the use of theories and models can shed light on the credibility and reproducibility of research works. Compared with metadata, such as title, author, keywords, etc., theory extraction in scientific literature is rarely explored, especially for social and behavioral science (SBS) domains. One challenge of applying supervised learning methods is the lack of a large number of labeled samples for training. In this paper, we propose an automated framework based on distant supervision that leverages entity mentions …
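
Distant supervision replaces manual annotation with labels derived automatically from an existing resource. A minimal sketch, assuming a hypothetical seed list of known theory names and exact case-insensitive matching, produces BIO tags that could then train a sequence tagger. The list, the tag names, and the `distant_labels` helper are illustrative, not the paper's framework.

```python
# Hypothetical seed list standing in for a knowledge base of known theory names.
KNOWN_THEORIES = [
    "social cognitive theory",
    "theory of planned behavior",
    "prospect theory",
]

def distant_labels(sentence, known=KNOWN_THEORIES):
    """Weakly label tokens with BIO tags by exact, case-insensitive matching
    against known theory names (no human annotation involved)."""
    tokens = sentence.split()
    lowered = [t.lower().strip(".,;:()") for t in tokens]
    tags = ["O"] * len(tokens)
    for name in known:
        parts = name.split()
        n = len(parts)
        for i in range(len(tokens) - n + 1):
            # Only tag spans that are still unlabeled, so matches never overlap.
            if lowered[i:i + n] == parts and all(t == "O" for t in tags[i:i + n]):
                tags[i] = "B-THEORY"
                for j in range(i + 1, i + n):
                    tags[j] = "I-THEORY"
    return list(zip(tokens, tags))

example = "We build on prospect theory and the theory of planned behavior."
for token, tag in distant_labels(example):
    print(token, tag)
```

Labels produced this way are noisy (they miss unseen theories and paraphrases), which is why they typically serve as training data for a model that generalizes beyond the seed list.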


The DSA Toolkit Shines Light Into Dark And Stormy Archives, Shawn Morgan Jones, Himarsha R. Jayanetti, Alex Osborne, Paul Koerbin, Martin Klein, Michele C. Weigle, Michael L. Nelson Jan 2022

Web archive collections are created with a particular purpose in mind. A curator selects seeds, or original resources, which are then captured by an archiving system and stored as archived web pages, or mementos. The systems that build web archive collections are often configured to revisit the same original resource multiple times. This is incredibly useful for understanding an unfolding news story or the evolution of an organization. Unfortunately, over time, some of these original resources can go off-topic and no longer suit the purpose for which the collection was originally created. They can go off-topic due to web site …
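
One simple way to flag off-topic mementos is to compare each capture's text with the first capture, assumed to be on-topic, and flag low similarity. The sketch below uses cosine similarity over word counts with an arbitrary threshold of 0.2. The measure, the threshold, and the sample captures are assumptions, not necessarily what the DSA toolkit does.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Lowercased word counts for a memento's extracted text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def flag_off_topic(mementos, threshold=0.2):
    """Flag mementos whose similarity to the first (assumed on-topic) capture
    falls below `threshold`; the threshold here is an arbitrary example."""
    baseline = term_vector(mementos[0])
    return [cosine(baseline, term_vector(m)) < threshold for m in mementos]

captures = [
    "Hurricane relief updates shelters and volunteer information",
    "Hurricane relief updates donations shelters open",
    "This domain is for sale contact the registrar",
]
print(flag_off_topic(captures))  # → [False, False, True]
```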


Smart Parking Systems: Reviewing The Literature, Architecture And Ways Forward, Can Biyik, Zaheer Allam, Gabriele Pieri, Davide Moroni, Muftah O' Fraifer, Eoin O' Connell, Stephan Olariu, Muhammad Khalid Jan 2021

The Internet of Things (IoT) has come of age, and complex solutions can now be implemented seamlessly within urban governance and management frameworks and processes. For cities, growing rates of car ownership are rendering parking availability a challenge and lowering the quality of life through increased carbon emissions. The development of smart parking solutions is thus necessary to reduce the time spent looking for parking and to reduce greenhouse gas emissions. The principal role of this research paper is to analyze smart parking solutions from a technical perspective, underlining the systems and sensors that are available, as documented in the …


A Survey Of Enabling Technologies For Smart Communities, Amna Iqbal, Stephan Olariu Jan 2021

In 2016, the Japanese Government publicized an initiative and a call to action for the implementation of a "Super Smart Society" announced as Society 5.0. The stated goal of Society 5.0 is to meet the various needs of the members of society through the provisioning of goods and services to those who require them, when they are required and in the amount required, thus enabling the citizens to live an active and comfortable life. In spite of its genuine appeal, details of a feasible path to Society 5.0 are conspicuously missing. The first main goal of this survey is to …


Understanding The Impact Of Encrypted DNS On Internet Censorship, Lin Jin, Shuai Hao, Haining Wang, Chase Cotton Jan 2021

DNS traffic is transmitted in plaintext, resulting in privacy leakage. To combat this problem, secure protocols have been used to encrypt DNS messages. Existing studies have investigated the performance overhead and privacy benefits of encrypted DNS communications, yet little has been done from the perspective of censorship. In this paper, we study the impact of encrypted DNS on Internet censorship in two respects. On the one hand, we explore the severity of DNS manipulation, which could be leveraged for Internet censorship, given the use of encrypted DNS resolvers. In particular, we perform 7.4 million DNS lookup measurements on 3,813 DoT …


Detecting Incentivized Review Groups With Co-Review Graph, Yubao Zhang, Shuai Hao, Haining Wang Jan 2021

Online reviews play a crucial role in the ecosystem of today's businesses (especially e-commerce platforms) and have become the primary source of consumer opinions. To manipulate consumers’ opinions, some sellers on e-commerce platforms outsource opinion spamming, offering incentives (e.g., free products) in exchange for incentivized reviews. Incentives, by nature, are likely to drive more biased or even fake reviews. Although e-commerce platforms such as Amazon have taken initiatives to squash the incentivized review practice, sellers turn to various social networking platforms (e.g., Facebook) to outsource the incentivized reviews. The aggregation of sellers who …


Recognizing Figure Labels In Patents, Ming Gong, Xin Wei, Diane Oyen, Jian Wu, Martin Gryder Jan 2021

Scientific documents often contain significant information in figures. The United States Patent and Trademark Office (USPTO) awards thousands of patents each week, with each patent containing on the order of a dozen figures. The information conveyed by these figures typically includes a drawing or diagram, a label, a caption, and reference text within the document. Yet associating the short bits of text with the figure is challenging when labels are embedded within the figure, as they typically are in patents. Using patents as a testbench, this paper highlights an open challenge in analyzing all of the information presented in scientific/technical documents …


Automatic Metadata Extraction Incorporating Visual Features From Scanned Electronic Theses And Dissertations, Muntabir Hasan Choudhury, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, Edward A. Fox Jan 2021

Electronic Theses and Dissertations (ETDs) contain domain knowledge that can be used for many digital library tasks, such as analyzing citation networks and predicting research trends. Automatic metadata extraction is important to build scalable digital library search engines. Most existing methods are designed for born-digital documents, so they often fail to extract metadata from scanned documents such as ETDs. Traditional sequence tagging methods mainly rely on text-based features. In this paper, we propose a conditional random field (CRF) model that combines text-based and visual features. To verify the robustness of our model, we extended an existing corpus and created a …


Vehicular Crowdsourcing For Congestion Support In Smart Cities, Stephan Olariu Jan 2021

Under present-day practices, the vehicles on our roadways and city streets are mere spectators that witness traffic-related events without being able to participate in the mitigation of their effect. This paper lays the theoretical foundations of a framework for harnessing the on-board computational resources in vehicles stuck in urban congestion in order to assist transportation agencies with preventing or dissipating congestion through large-scale signal re-timing. Our framework is called VACCS: Vehicular Crowdsourcing for Congestion Support in Smart Cities. What makes this framework unique is that we suggest that in such situations the vehicles have the potential to cooperate with various …


Large Scale Subject Category Classification Of Scholarly Papers With Deep Attentive Neural Networks, Bharath Kandimalla, Shaurya Rohatgi, Jian Wu, C. Lee Giles Jan 2021

Subject categories of scholarly papers generally refer to the knowledge domain(s) to which the papers belong, examples being computer science or physics. Subject category classification is a prerequisite for bibliometric studies, organizing scientific publications for domain knowledge extraction, and facilitating faceted searches for digital library search engines. Unfortunately, many academic papers do not have such information as part of their metadata. Most existing methods for solving this task focus on unsupervised learning that often relies on citation networks. However, a complete list of papers citing the current paper may not be readily available. In particular, new papers that have few …


SSentiA: A Self-Supervised Sentiment Analyzer For Classification From Unlabeled Data, Salim Sazzed, Sampath Jayarathna Jan 2021

In recent years, supervised machine learning (ML) methods have realized remarkable performance gains for sentiment classification utilizing labeled data. However, labeled data are usually expensive to obtain and thus not always available. When annotated data are unavailable, unsupervised tools are used, which still lag behind the performance of supervised ML methods by a large margin. Therefore, in this work, we focus on improving the performance of sentiment classification from unlabeled data. We present a self-supervised hybrid methodology SSentiA (Self-supervised Sentiment Analyzer) that couples an ML classifier with a lexicon-based method for sentiment classification from unlabeled data. We first introduce LRSentiA …
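
The general idea of pairing a lexicon with a classifier on unlabeled data can be sketched as self-training: a lexicon scores each text, only confidently scored texts become pseudo-labeled training data, and a classifier trained on them labels the rest. The tiny lexicon, the margin of 2, and the `pseudo_label` helper are hypothetical; this is not LRSentiA or SSentiA itself, and the classifier step is omitted.

```python
# Tiny hypothetical lexicon; a real system would use a curated opinion lexicon.
POSITIVE = {"good", "great", "excellent", "love", "enjoyable"}
NEGATIVE = {"bad", "poor", "boring", "hate", "awful"}

def lexicon_score(text):
    """Net sentiment: number of positive hits minus number of negative hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def pseudo_label(texts, margin=2):
    """Keep texts with |score| >= margin as pseudo-labeled data
    (1 = positive, 0 = negative); the rest stay unlabeled for a classifier
    trained on the confident subset to label later."""
    labeled, unlabeled = [], []
    for t in texts:
        s = lexicon_score(t)
        if abs(s) >= margin:
            labeled.append((t, 1 if s > 0 else 0))
        else:
            unlabeled.append(t)
    return labeled, unlabeled

reviews = [
    "Great acting and an excellent, enjoyable plot!",
    "Boring, awful and poor.",
    "The movie was good but the ending was bad.",
]
labeled, unlabeled = pseudo_label(reviews)
print(labeled)
print(unlabeled)
```

The margin trades coverage for label quality: a higher margin yields fewer but cleaner pseudo-labels for the downstream classifier.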


Extractive Research Slide Generation Using Windowed Labeling Ranking, Athar Sefid, Prasenjit Mitra, Jian Wu, C. Lee Giles Jan 2021

Presentation slides generated from original research papers provide an efficient way to present research innovations. Manually generating presentation slides is labor-intensive. We propose a method to automatically generate slides for scientific articles based on a corpus of 5000 paper-slide pairs compiled from conference proceedings websites. The sentence labeling module of our method is based on SummaRuNNer, a neural sequence model for extractive summarization. Instead of ranking sentences based on semantic similarities in the whole document, our algorithm measures the importance and novelty of sentences by combining semantic and lexical features within a sentence window. Our method outperforms several baseline methods …


Systematizing Confidence In Open Research And Evidence (SCORE), Nazanin Alipourfard, Beatrix Arendt, Daniel M. Benjamin, Noam Benkler, Michael Bishop, Mark Burstein, Martin Bush, James Caverlee, Yiling Chen, Chae Clark, Anna Dreber Almenberg, Timothy M. Errington, Fiona Fidler, Nicholas Fox, Aaron Frank, Hannah Fraser, Scott Friedman, Ben Gelman, James Gentile, Jian Wu, et al., SCORE Collaboration Jan 2021

Assessing the credibility of research claims is a central, continuous, and laborious part of the scientific process. Credibility assessment strategies range from expert judgment to aggregating existing evidence to systematic replication efforts. Such assessments can require substantial time and effort. Research progress could be accelerated if there were rapid, scalable, accurate credibility indicators to guide attention and resource allocation for further assessment. The SCORE program is creating and validating algorithms to provide confidence scores for research claims at scale. To investigate the viability of scalable tools, teams are creating: a database of claims from papers in the social and behavioral …


A Heuristic Baseline Method For Metadata Extraction From Scanned Electronic Theses And Dissertations, Muntabir H. Choudhury, Jian Wu, William A. Ingram, Edward A. Fox Jan 2020

Extracting metadata from scholarly papers is an important text mining problem. Widely used open-source tools such as GROBID are designed for born-digital scholarly papers but often fail for scanned documents, such as Electronic Theses and Dissertations (ETDs). Here we present a preliminary baseline work with a heuristic model to extract metadata from the cover pages of scanned ETDs. The process started with converting scanned pages into images and then text files by applying OCR tools. Then a series of carefully designed regular expressions for each field is applied, capturing patterns for seven metadata fields: titles, authors, years, degrees, academic programs, …
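
Regular-expression extraction over OCR output can be sketched as below. The two patterns (year and degree) and the sample cover-page text are hypothetical illustrations, not the expressions designed in the paper, which covers more fields.

```python
import re

# Hypothetical patterns for two of the fields named above; real cover pages
# vary widely, and these are illustrative, not the paper's expressions.
PATTERNS = {
    "year": re.compile(r"\b(19[0-9]{2}|20[0-9]{2})\b"),
    "degree": re.compile(
        r"\b(Doctor of Philosophy|Master of Science|Master of Arts)\b",
        re.IGNORECASE,
    ),
}

def extract_fields(ocr_text):
    """Return the first match for each field, or None if no pattern fires."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields

# Hypothetical OCR text from a scanned cover page.
cover = """AN ANALYSIS OF SOMETHING
by
Jane Doe
A Dissertation Submitted in Partial Fulfillment of the
Requirements for the Degree of DOCTOR OF PHILOSOPHY
May 1998"""
print(extract_fields(cover))
# → {'year': '1998', 'degree': 'DOCTOR OF PHILOSOPHY'}
```

Because OCR errors (for example, "0" read as "O") break exact patterns, rules like these are brittle, which is why such a heuristic serves as a baseline.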