Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 20 of 20

Full-Text Articles in Physical Sciences and Mathematics

Deepfrag-K: A Fragment-Based Deep Learning Approach For Protein Fold Recognition, Wessam Elhefnawy, Min Li, Jianxin Wang, Yaohang Li Nov 2020

Computer Science Faculty Publications

Background: One of the most essential problems in structural bioinformatics is protein fold recognition. In this paper, we design a novel deep learning architecture, so-called DeepFrag-k, which identifies fold discriminative features at fragment level to improve the accuracy of protein fold recognition. DeepFrag-k is composed of two stages: the first stage employs a multi-modal Deep Belief Network (DBN) to predict the potential structural fragments given a sequence, represented as a fragment vector, and then the second stage uses a deep convolutional neural network (CNN) to classify the fragment vector into the corresponding fold.

Results: Our results show that DeepFrag-k yields …


A Heuristic Baseline Method For Metadata Extraction From Scanned Electronic Theses And Dissertations, Muntabir H. Choudhury, Jian Wu, William A. Ingram, Edward A. Fox Jan 2020

Computer Science Faculty Publications

Extracting metadata from scholarly papers is an important text mining problem. Widely used open-source tools such as GROBID are designed for born-digital scholarly papers but often fail for scanned documents, such as Electronic Theses and Dissertations (ETDs). Here we present a preliminary baseline work with a heuristic model to extract metadata from the cover pages of scanned ETDs. The process started with converting scanned pages into images and then text files by applying OCR tools. Then a series of carefully designed regular expressions for each field is applied, capturing patterns for seven metadata fields: titles, authors, years, degrees, academic programs, …
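
To make the "regular expressions per field" step concrete, here is a minimal sketch of applying field patterns to OCR'd cover-page text. The abstract does not give the authors' expressions, so `FIELD_PATTERNS` and `extract_fields` are hypothetical, and only two of the seven fields (year and degree) are illustrated.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for two of the seven fields (year and degree).
# The paper's actual regular expressions are not given in the abstract.
FIELD_PATTERNS = {
    # A four-digit year between 1900 and 2099.
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    # A few common degree names; case-insensitive because cover pages
    # often print the degree in capitals.
    "degree": re.compile(
        r"\b(?:Doctor of Philosophy|Doctor of Education|Master of Science|Master of Arts)\b",
        re.IGNORECASE,
    ),
}


def extract_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field in OCR'd cover-page text, or None."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(0) if match else None
    return fields


if __name__ == "__main__":
    cover = "A Dissertation Submitted for the Degree of\nDOCTOR OF PHILOSOPHY\nMay 2019"
    print(extract_fields(cover))  # {'year': '2019', 'degree': 'DOCTOR OF PHILOSOPHY'}
```

Taking the first match per field is a simplifying choice; a real cover page can contain several years (for example, a copyright year and a defense date), which is one reason such heuristics need careful per-field design.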


A Saliency-Driven Video Magnifier For People With Low Vision, Ali Selman Aydin, Shirin Feiz, Iv Ramakrishnan, Vikas Ashok Jan 2020

Computer Science Faculty Publications

Consuming video content poses significant challenges for many users of screen magnifiers, the “go-to” assistive technology for people with low vision. While screen magnifier software can achieve a zoom factor that makes the content of a video visible to low-vision users, navigating through videos is often a major challenge for these users. Towards making videos more accessible for low-vision users, we have developed the SViM video magnifier system [6]. Specifically, SViM consists of three different magnifier interfaces with easy-to-use means of interaction. All three interfaces are driven by visual saliency as a …


Streaming Analytics And Workflow Automation For Dfs, Yasith Jayawardana, Sampath Jayarathna Jan 2020

Computer Science Faculty Publications

Researchers reuse data from past studies to avoid costly re-collection of experimental data. However, large-scale data reuse is challenging due to a lack of consensus on metadata representations among research groups and disciplines. Dataset File System (DFS) is a semi-structured data description format that promotes such consensus by standardizing the semantics of data description, storage, and retrieval. In this paper, we present analytic-streams, a specification for streaming data analytics with DFS, and streaming-hub, a visual programming toolkit built on DFS to simplify data analysis workflows. Analytic-streams facilitate higher-order data analysis with less computational overhead, while streaming-hub enables storage, retrieval, …


Shari- An Integration Of Tools To Visualize The Story Of The Day, Shawn M. Jones, Alexander C. Nwala, Martin Klein, Michele C. Weigle, Michael L. Nelson Jan 2020

Computer Science Faculty Publications

Tools such as Google News and Flipboard exist to convey daily news, but what about the news of the past? In this paper, we describe how to combine several existing tools and web archive holdings to convey the “biggest story” for a given date in the past. StoryGraph clusters news articles together to identify a common news story. Hypercane leverages ArchiveNow to store URLs produced by StoryGraph in web archives. Hypercane analyzes these URLs to identify the most common terms, entities, and highest-quality images for social media storytelling. Raintale then takes the output of these tools to produce a …


Mementoembed And Raintale For Web Archive Storytelling, Shawn M. Jones, Martin Klein, Michele C. Weigle, Michael L. Nelson Jan 2020

Computer Science Faculty Publications

For traditional library collections, archivists can select a representative sample from a collection and display it in a featured physical or digital library space. Web archive collections may consist of thousands of archived pages, or mementos. How should an archivist display this sample to drive visitors to their collection? Search engines and social media platforms often represent web pages as cards consisting of text snippets, titles, and images. Web storytelling is a popular method for grouping these cards in order to summarize a topic. Unfortunately, social media platforms are not archive-aware and fail to consistently create a good experience for …


Tmvis: Visualizing Webpage Changes Over Time, Abigail Mabe, Dhruv Patel, Maheedhar Gunnam, Surbhi Shankar, Mat Kelly, Sawood Alam, Michael L. Nelson, Michele C. Weigle Jan 2020

Computer Science Faculty Publications

TMVis is a web service that provides visualizations of how individual webpages have changed over time. We leverage past research on summarizing collections of webpages with thumbnail-sized screenshots and on choosing a small number of representative archived webpages from a large collection. We offer four visualizations: Image Grid, Image Slider, Timeline, and Animated GIF. Embed codes can be produced for the Image Grid and Image Slider to include these visualizations on separate webpages. This tool allows scholars from various disciplines, as well as the general public, to explore the temporal nature of webpages.


Repurposing Visual Input Modalities For Blind Users: A Case Study Of Word Processors, Hae-Na Lee, Vikas Ashok, I.V. Ramakrishnan Jan 2020

Computer Science Faculty Publications

Visual 'point-and-click' interaction artifacts such as the mouse and touchpad are tangible input modalities that are essential for sighted users to interact conveniently with computer applications. In contrast, blind users are unable to leverage these visual input modalities and are thus limited to interacting with computers through a sequentially narrating screen-reader assistive technology coupled to the keyboard. As a consequence, blind users generally require significantly more time and effort to complete even simple application tasks (e.g., applying a style to text in a word processor) using only the keyboard, compared to their sighted peers, who can effortlessly accomplish the same tasks …


Towards Making Videos Accessible For Low Vision Screen Magnifier Users, Ali Selman Aydin, Shirin Feiz, Vikas Ashok, Iv Ramakrishnan Jan 2020

Computer Science Faculty Publications

People with low vision who use screen magnifiers to interact with computing devices find it very challenging to interact with dynamically changing digital content such as videos, since they do not have the luxury of time to manually move (i.e., pan) the magnifier lens to different regions of interest (ROIs) or zoom into these ROIs before the content changes across frames.

In this paper, we present SViM, a first-of-its-kind screen-magnifier interface for such users that leverages advances in computer vision, particularly video saliency models, to identify salient ROIs in videos. SViM's interface allows users to zoom in/out …


Sail: Saliency-Driven Injection Of Aria Landmarks, Ali Selman Aydin, Shirin Feiz, Vikas Ashok, Iv Ramakrishnan Jan 2020

Computer Science Faculty Publications

Navigating webpages with screen readers is a challenge even with recent improvements in screen reader technologies and the increased adoption of web standards for accessibility, namely ARIA. ARIA landmarks, an important aspect of ARIA, let screen reader users access different sections of a webpage quickly by enabling them to skip over blocks of irrelevant or redundant content. However, web developers use these landmarks sporadically and inconsistently, and many web pages lack them entirely. Therefore, we propose SaIL, a scalable approach that automatically detects the important sections of a web page and then injects ARIA landmarks …


Psu At Clef-2020 Arqmath Track: Unsupervised Re-Ranking Using Pretraining, Shaurya Rohatgi, Jian Wu, C. Lee Giles Jan 2020

Computer Science Faculty Publications

This paper elaborates on our submission to the ARQMath track at CLEF 2020. Our primary run for the main Task 1 (Question Answering) uses a two-stage retrieval technique: the first stage fuses traditional BM25 scoring with tf-idf cosine-similarity-based retrieval, while the second stage is a finer re-ranking technique using contextualized embeddings. For the re-ranking, we use a pre-trained roberta-base model (110 million parameters) to make the language model more math-aware. Our approach achieves a higher NDCG′ score than the baseline, while our MAP and P@10 scores are competitive, performing better than the best submission …
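
The first retrieval stage described above combines two lexical scorers. The sketch below shows one common way to do that on pre-tokenized text: Okapi BM25, tf-idf cosine similarity, and a fusion step. The abstract does not say how the two scores are fused, so the min-max normalization and the weight `alpha` are assumptions, as are the BM25 defaults `k1=1.2` and `b=0.75`; the embedding-based re-ranking stage is not shown.

```python
import math
from collections import Counter
from typing import Dict, List

Doc = List[str]  # a pre-tokenized document or query


def bm25_scores(query: Doc, docs: List[Doc], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each document for the query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def tfidf_cosine_scores(query: Doc, docs: List[Doc]) -> List[float]:
    """Cosine similarity between tf-idf vectors of the query and each document."""
    n_docs = len(docs)
    df = Counter(term for d in docs for term in set(d))
    idf = {t: math.log(n_docs / c) + 1.0 for t, c in df.items()}  # smoothed idf

    def vectorize(tokens: Doc) -> Dict[str, float]:
        counts = Counter(t for t in tokens if t in idf)
        return {t: c * idf[t] for t, c in counts.items()}

    def length(v: Dict[str, float]) -> float:
        return math.sqrt(sum(x * x for x in v.values()))

    q = vectorize(query)
    scores = []
    for d in docs:
        v = vectorize(d)
        dot = sum(w * v.get(t, 0.0) for t, w in q.items())
        denom = length(q) * length(v)
        scores.append(dot / denom if denom else 0.0)
    return scores


def min_max(xs: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so the two scorers are comparable."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def fused_ranking(query: Doc, docs: List[Doc], alpha: float = 0.5) -> List[int]:
    """Document indices sorted by alpha * BM25 + (1 - alpha) * cosine (both normalized)."""
    bm25 = min_max(bm25_scores(query, docs))
    cosine = min_max(tfidf_cosine_scores(query, docs))
    fused = [alpha * x + (1 - alpha) * y for x, y in zip(bm25, cosine)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)


if __name__ == "__main__":
    docs = [["integral", "of", "sine"], ["derivative", "of", "cosine"], ["integral", "by", "parts"]]
    print(fused_ranking(["integral", "sine"], docs))  # [0, 2, 1]
```

Normalizing before mixing matters because raw BM25 scores are unbounded while cosine similarity lies in [0, 1]; without it, BM25 would dominate the sum.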


Smartcitecon: Implicit Citation Context Extraction From Academic Literature Using Unsupervised Learning, Chenrui Gao, Haoran Cui, Li Zhang, Jiamin Wang, Wei Lu, Jian Wu Jan 2020

Computer Science Faculty Publications

We introduce SmartCiteCon (SCC), a Java API for extracting both explicit and implicit citation context from academic literature in English. The tool is built on a Support Vector Machine (SVM) model trained on a set of 7,058 manually annotated citation context sentences, curated from 34,000 papers in the ACL Anthology. The model, with 19 features, achieves F1=85.6%. SCC supports PDF, XML, and JSON files out of the box, provided that they conform to certain schemas. The API supports single-document processing and parallel batch processing. It takes about 12–45 seconds on average, depending on the format, to process a …


Acknowledgement Entity Recognition In Cord-19 Papers, Jian Wu, Pei Wang, Xin Wei, Sarah Rajtmajer, C. Lee Giles, Christopher Griffin Jan 2020

Computer Science Faculty Publications

Acknowledgements are ubiquitous in scholarly papers. Existing acknowledgement entity recognition methods assume that all named entities are acknowledged. Here, we examine the nuances between acknowledged and named entities by analyzing sentence structure. We develop an acknowledgement extraction system, AckExtract, based on open-source text mining software, and evaluate our method using manually labeled data. AckExtract takes the PDF of a scholarly paper as input and outputs acknowledgement entities. Results show an overall performance of F1=0.92. We built a supplementary database by linking CORD-19 papers with acknowledgement entities extracted by AckExtract, including persons and organizations, and found that only up to …


Outlier Profiles Of Atomic Structures Derived From X-Ray Crystallography And From Cryo-Electron Microscopy, Lin Chen, Jing He, Angelo Facchiano Jan 2020

Computer Science Faculty Publications

Background: As more protein atomic structures are determined from cryo-electron microscopy (cryo-EM) density maps, validation of such structures is an important task. Methods: We applied a histogram-based outlier score (HBOS) to six sets of cryo-EM atomic structures and five sets of X-ray atomic structures, including one derived from X-ray data with better than 1.5 Å resolution. The cryo-EM data sets contain structures released by December 2016 and structures released between 2017 and 2019, derived from resolution ranges of 0–4 Å and 4–6 Å, respectively. Results: The distributions of HBOS values in the five sets of X-ray structures show that HBOS is sensitive in distinguishing …
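
For readers unfamiliar with HBOS, the sketch below shows the standard formulation with equal-width bins: each feature gets its own histogram, normalized so the tallest bin has height 1, and a row's score is the sum over features of log(1 / bin height). The abstract does not list which structural features were scored or how bins were chosen, so the fixed `n_bins=10` and the toy input are assumptions.

```python
import math
from typing import List


def hbos_scores(data: List[List[float]], n_bins: int = 10) -> List[float]:
    """Histogram-based outlier score for each row; larger means more anomalous.

    Each feature gets its own equal-width histogram, normalized so the tallest
    bin has height 1; a row's score is the sum over features of log(1 / height).
    """
    n_rows, n_features = len(data), len(data[0])
    scores = [0.0] * n_rows
    for j in range(n_features):
        column = [row[j] for row in data]
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins or 1.0  # constant feature: everything lands in one bin
        # Clamp the maximum value into the last bin instead of an extra one.
        bins = [min(int((x - lo) / width), n_bins - 1) for x in column]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        tallest = max(counts)
        for i, b in enumerate(bins):
            # log(1 / (count / tallest)) == log(tallest / count)
            scores[i] += math.log(tallest / counts[b])
    return scores


if __name__ == "__main__":
    values = [[0.0], [0.1], [0.2], [0.1], [0.0], [0.2], [0.1], [0.0], [0.1], [10.0]]
    scores = hbos_scores(values)
    print(scores.index(max(scores)))  # 9, the isolated value 10.0
```

Because features are scored independently and simply summed, HBOS is fast enough to run over whole structure collections, at the cost of ignoring correlations between features.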


Smart Communities: From Sensors To Internet Of Things And To A Marketplace Of Services, Stephan Olariu, Nirwan Ansari (Editor), Andreas Ahrens (Editor), Cesar Benavente-Preces (Editor) Jan 2020

Computer Science Faculty Publications

Our paper was inspired by the recent Society 5.0 initiative of the Japanese Government, which seeks to create a sustainable, human-centric society by putting recent technological advances to work: sensor networks, edge computing, IoT ecosystems, AI, Big Data, and robotics, to name just a few. The main contribution of this work is a vision of how these technological advances can contribute, directly or indirectly, to making Society 5.0 a reality. For this purpose, we build on a recently proposed concept of a Marketplace of Services that, in our view, will turn out to be one of the cornerstones of Society 5.0. Instead of …


Rotate-And-Press: A Non-Visual Alternative To Point-And-Click, Hae-Na Lee, Vikas Ashok, I. V. Ramakrishnan Jan 2020

Computer Science Faculty Publications

Most computer applications present visually rich and dense graphical user interfaces (GUIs) that are primarily tailored for easy and efficient interaction by sighted users through a combination of two default input modalities, namely the keyboard and the mouse/touchpad. However, blind screen-reader users rely predominantly on the keyboard alone, and therefore struggle to interact with these applications, since it is both arduous and tedious to perform visual 'point-and-click' tasks, such as accessing the various application commands and features, using just the keyboard shortcuts supported by screen readers.

In this paper, we investigate the suitability of a 'rotate-and-press' input modality as an effective non-visual substitute for the visual …


A Genome-Wide Association Study Of Cocaine Use Disorder Accounting For Phenotypic Heterogeneity And Gene–Environment Interaction, Jiangwen Sun, Henry R. Kranzler, Joel Gelernter, Jinbo Bi Jan 2020

Computer Science Faculty Publications

Background: Phenotypic heterogeneity and complicated gene-environment interplay in etiology are among the primary factors that hinder the identification of genetic variants associated with cocaine use disorder. Methods: To detect novel genetic variants associated with cocaine use disorder, we derived disease traits with reduced phenotypic heterogeneity using cluster analysis of a study sample (n = 9965). We then used these traits in genome-wide association tests, performed separately for 2070 African Americans and 1570 European Americans, using a new mixed model that accounted for the moderating effects of 5 childhood environmental factors. We used an independent sample (918 African Americans, 1382 European …


Gabapentin Drug Misuse Signals: A Pharmacovigilance Assessment Using The Fda Adverse Event Reporting System, Rachel Vickers-Smith, Jiangwen Sun, Richard J. Charnigo, Michelle R. Lofwall, Sharon L. Walsh, Jennifer R. Havens Jan 2020

Computer Science Faculty Publications

Background: Although there have been increasing reports of intentional gabapentin misuse, epidemiological evidence for the phenomenon is limited. The purpose of this study was to determine whether there are pharmacovigilance abuse signals for gabapentin.

Methods: Using FDA Adverse Event Reporting System reports from January 1, 2005 to December 31, 2015, we calculated pharmacovigilance signal measures (i.e., reporting odds ratio, proportional reporting ratio, information component, and empirical Bayes geometric mean) for abuse-related adverse event (AR-AE)-gabapentin pairs. Loglinear modeling assessed the frequency of concurrent reporting of abuse-related and abuse-specific AEs (AS-AEs) associated with gabapentin. Findings were compared to a positive (pregabalin) and …
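
Three of the signal measures named above can be computed from a 2×2 contingency table of report counts. The sketch below uses the textbook formulas: ROR = ad/(bc), PRR = [a/(a+b)] / [c/(c+d)], and a simplified information component IC = log2(observed/expected) without the Bayesian shrinkage normally applied. The empirical Bayes geometric mean is omitted because it requires fitting a prior across all drug-event pairs. The counts in the usage example are hypothetical, not data from the study.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReportCounts:
    a: int  # reports mentioning the drug AND the adverse event
    b: int  # reports mentioning the drug, other events
    c: int  # reports mentioning other drugs AND the event
    d: int  # reports mentioning other drugs, other events


def reporting_odds_ratio(t: ReportCounts) -> float:
    return (t.a * t.d) / (t.b * t.c)


def ror_ci95(t: ReportCounts) -> Tuple[float, float]:
    """Approximate 95% confidence interval for the ROR (log-normal)."""
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    centre = math.log(reporting_odds_ratio(t))
    return math.exp(centre - 1.96 * se), math.exp(centre + 1.96 * se)


def proportional_reporting_ratio(t: ReportCounts) -> float:
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def information_component(t: ReportCounts) -> float:
    """Simplified IC: log2 of observed over expected co-reports (no shrinkage)."""
    n = t.a + t.b + t.c + t.d
    expected = (t.a + t.b) * (t.a + t.c) / n
    return math.log2(t.a / expected)


if __name__ == "__main__":
    counts = ReportCounts(a=20, b=80, c=100, d=1800)  # hypothetical counts
    print(round(reporting_odds_ratio(counts), 2))          # 4.5
    print(round(proportional_reporting_ratio(counts), 2))  # 3.8
    print(round(information_component(counts), 2))         # 1.74
```

A common screening convention treats a pair as a signal when, for example, the lower bound of the ROR confidence interval exceeds 1; the thresholds used in the study are not stated in the abstract.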


Extreme Ultraviolet Quasar Colours From Galex Observations Of The Sdss Dr14q Catalogue, Daniel E. Vanden Berk, Sarah C. Wesolowski, Mary J. Yeckley, Joseph M. Marcinik, Jean M. Quashnock, Lawrence M. Machia, Jian Wu Jan 2020

Computer Science Faculty Publications

The rest-frame far to extreme ultraviolet (UV) colour–redshift relationship has been constructed from data on over 480,000 quasars carefully cross-matched between SDSS Data Release 14 and the final GALEX photometric catalogue. UV matching and detection probabilities are given for all the quasars, including dependencies on separation, optical brightness, and redshift. Detection limits are also provided for all objects. The UV colour distributions are skewed redward at virtually all redshifts, especially when detection limits are accounted for. The median GALEX far-UV minus near-UV (FUV − NUV) colour–redshift relation is reliably determined up to z ≈ 2.8, corresponding to rest-frame wavelengths as …
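
The colour–redshift relation above is built from per-object colours (FUV magnitude minus NUV magnitude) summarized in redshift bins. The sketch below computes a median colour per fixed-width bin for detected objects only. It ignores detection limits, which the abstract notes shift the distributions redward, so it is a simplification. The bin width and the example magnitudes are hypothetical.

```python
from statistics import median
from typing import Dict, List


def median_colour_by_redshift(
    z: List[float], fuv: List[float], nuv: List[float], bin_width: float = 0.1
) -> Dict[float, float]:
    """Median FUV - NUV colour (in magnitudes) in fixed-width redshift bins.

    Only detected objects are used; upper limits from non-detections are ignored.
    """
    colours_per_bin: Dict[int, List[float]] = {}
    for zi, m_fuv, m_nuv in zip(z, fuv, nuv):
        key = int(zi // bin_width)  # index of the redshift bin
        colours_per_bin.setdefault(key, []).append(m_fuv - m_nuv)
    # Label each bin by its lower redshift edge.
    return {round(k * bin_width, 6): median(c) for k, c in sorted(colours_per_bin.items())}


if __name__ == "__main__":
    z = [1.02, 1.04, 1.07, 2.53, 2.51]
    fuv = [20.5, 21.0, 20.8, 22.0, 23.0]
    nuv = [20.0, 20.2, 20.1, 20.5, 20.6]
    print(median_colour_by_redshift(z, fuv, nuv))
```

The median is used rather than the mean because, as the abstract notes, the colour distributions are skewed, and a median is less sensitive to a red tail.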


Opening Books And The National Corpus Of Graduate Research, William A. Ingram, Edward A. Fox, Jian Wu Jan 2020

Computer Science Faculty Publications

Virginia Tech University Libraries, in collaboration with the Virginia Tech Department of Computer Science and the Old Dominion University Department of Computer Science, request $505,214 in grant funding for a 3-year project, the goal of which is to bring computational access to book-length documents, demonstrating this with Electronic Theses and Dissertations (ETDs). The project is motivated by the following library and community needs. (1) Despite huge volumes of book-length documents in digital libraries, there is a lack of models offering effective and efficient computational access to these long documents. (2) Nationwide open access services for ETDs generally function at the metadata level. …