Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 60

Full-Text Articles in Physical Sciences and Mathematics

The Maritime Domain Awareness Center– A Human-Centered Design Approach, Gary Gomez Nov 2021

Political Science & Geography Faculty Publications

This paper contends that Maritime Domain Awareness Center (MDAC) design should be a holistic approach integrating established knowledge about human factors, decision making, cognitive tasks, complexity science, and human information interaction. The design effort should not be primarily a technology effort that focuses on computer screens, information feeds, display technologies, or user interfaces. The existence of a room with access to vast amounts of information and wall-to-wall video screens of ships, aircraft, weather data, and other regional information does not necessarily correlate to possessing situation awareness. Fundamental principles of human-centered information design should guide MDAC design and technology selection, and …


A Look Into Increasing The Number Of Veterans And Former Government Employees Converting To Career And Technical Cybersecurity Teachers, Vukica M. Jovanovic, Michael Anthony Crespo, Drew E. Brown, Deborah Marshall, Otilia Popescu, Murat Kuzlu, Petros J. Katsioloudis, Linda Vahala Jul 2021

Engineering Technology Faculty Publications

The current state of technology, with recent explosions in the digital processing of paperwork, computer networking use, and online and virtual approaches to areas that until very recently operated in traditional, non-computerized ways, has led to a steady increase in the demand for jobs in computer science and cybersecurity. The education system, the pipeline for the incoming workforce, needs to keep up with this tremendous pace in technology and the job market. The current K-12 school system has been extensively challenged to fill necessary positions in order to address the increasing need for programs that …


WG2AN: Synthetic Wound Image Generation Using Generative Adversarial Network, Salih Sarp, Murat Kuzlu, Emmanuel Wilson, Ozgur Guler Mar 2021

Engineering Technology Faculty Publications

In part due to their ability to mimic any data distribution, Generative Adversarial Network (GAN) algorithms have been successfully applied to many applications, such as data augmentation, text-to-image translation, image-to-image translation, and image inpainting. Learning from data without crafting loss functions for each application gives the GAN algorithm broader applicability. Medical image synthesis is another field in which the GAN algorithm has great potential to assist clinician training. This paper proposes a synthetic wound image generation model based on GAN architecture to increase the quality of clinical training. The proposed model is trained on chronic wound datasets with various …


Statistical Analysis And Comparison Of Optical Classification Of Atmospheric Aerosol Lidar Data, Mohammed Alqawba, Norou Diawara, Kwasi G. Afrifa, Mohamed I. Elbakary, Mecit Cetin, Khan Iftekharuddin Feb 2021

Mathematics & Statistics Faculty Publications

In this article, we present a new study for the analysis and classification of atmospheric aerosols in remote sensing LIDAR data. Information on particle size and associated properties is extracted from these remote sensing atmospheric data, which are collected by a ground-based LIDAR system. This study first considers optical LIDAR parameter-based classification methods for clustering and classification of different types of harmful aerosol particles in the atmosphere. Since accurate methods for aerosol prediction behaviors are based upon observed data, computational approaches must overcome design limitations, and consider appropriate calibration and estimation accuracy. Consequently, two statistical methods based on generalized linear …


Role Of Artificial Intelligence In The Internet Of Things (IoT) Cybersecurity, Murat Kuzlu, Corinne Fair, Ozgur Guler Feb 2021

Engineering Technology Faculty Publications

In recent years, the use of the Internet of Things (IoT) has increased exponentially, and cybersecurity concerns have increased along with it. On the cutting edge of cybersecurity is Artificial Intelligence (AI), which is used for the development of complex algorithms to protect networks and systems, including IoT systems. However, cyber-attackers have figured out how to exploit AI and have even begun to use adversarial AI in order to carry out cybersecurity attacks. This review paper compiles information from several other surveys and research papers regarding IoT, AI, and attacks with and against AI and explores the relationship between these …


Internet-Of-Things Devices In Support Of The Development Of Echoic Skills Among Children With Autism Spectrum Disorder, Krzysztof J. Rechowicz, John B. Stull, Michelle M. Hascall, Saikou Y. Diallo, Kevin J. O'Brien Jan 2021

VMASC Publications

A significant therapeutic challenge for people with disabilities is the development of verbal and echoic skills. Digital voice assistants (DVAs), such as Amazon’s Alexa, provide networked intelligence to billions of Internet-of-Things devices and have the potential to offer opportunities to people, such as those diagnosed with autism spectrum disorder (ASD), to advance these necessary skills. Voice interfaces can enable children with ASD to practice such skills at home; however, it remains unclear whether DVAs can be as proficient as therapists in recognizing utterances by a developing speaker. We developed an Alexa-based skill called ASPECT to measure how well the DVA …


Methods For Weighting Decisions To Assist Modelers And Decision Analysts: A Review Of Ratio Assignment And Approximate Techniques, Barry Ezell, Christopher J. Lynch, Patrick T. Hester Jan 2021

VMASC Publications

Computational models and simulations often involve representations of decision-making processes. Numerous methods exist for representing decision-making at varied resolution levels based on the objectives of the simulation and the desired level of fidelity for validation. Decision making relies on the type of decision and the criteria that are appropriate for making the decision; therefore, decision makers can reach unique decisions that meet their own needs given the same information. Accounting for personalized weighting scales can help to reflect a more realistic state for a modeled system. To this end, this article reviews and summarizes eight multi-criteria decision analysis (MCDA) techniques …


Simulation For Cybersecurity: State Of The Art And Future Directions, Hamdi Kavak, Jose J. Padilla, Daniele Vernon-Bido, Saikou Y. Diallo, Ross Gore, Sachin Shetty Jan 2021

VMASC Publications

In this article, we provide an introduction to simulation for cybersecurity and focus on three themes: (1) an overview of the cybersecurity domain; (2) a summary of notable simulation research efforts for cybersecurity; and (3) a proposed way forward on how simulations could broaden cybersecurity efforts. The overview of cybersecurity provides readers with a foundational perspective of cybersecurity in the light of targets, threats, and preventive measures. The simulation research section details the current role that simulation plays in cybersecurity, which mainly falls on representative environment building; test, evaluate, and explore; training and exercises; risk analysis and assessment; and humans …


Human Factors, Ergonomics And Industry 4.0 In The Oil & Gas Industry: A Bibliometric Analysis, Francesco Longo, Antonio Padovano, Lucia Gazzaneo, Jessica Frangella, Rafael Diaz Jan 2021

VMASC Publications

Over the last few years, the Human Factors and Ergonomics (HF/E) discipline has significantly benefited from new human-centric engineered digital solutions of the 4.0 industrial age. Technologies are creating new socio-technical interactions between human and machine that minimize the risk of design-induced human errors and have largely contributed to remarkable improvements in terms of process safety, productivity, quality, and workers’ well-being. However, although the Oil & Gas (O&G) sector is one of the most hazardous environments where human error can have severe consequences, Industry 4.0 aspects are still scarcely integrated with HF/E. This paper calls for a holistic understanding of the changing …


Developing An Artificial Intelligence Framework To Assess Shipbuilding And Repair Sub-Tier Supply Chains Risk, Rafael Diaz, Katherine Smith, Beatriz Acero, Francesco Longo, Antonio Padovano Jan 2021

VMASC Publications

The defense shipbuilding and repair industry is a labor-intensive sector that can be characterized by low product volumes and high investments in which a large number of shared resources, technology, suppliers, and processes asynchronously converge into large construction projects. It is mainly organized by the execution of a complex combination of sequential and overlapping stages. While entities engaged in this large-scale endeavor are often knowledgeable about their first-tier suppliers, they usually do not have insight into their lower-tier suppliers. A sizable part of any supply chain disruption is attributable to instabilities in sub-tier suppliers. This research note conceptually delineates a …


Hybrid Models As Transdisciplinary Research Enablers, Andreas Tolk, Alison Harper, Navonil Mustafee Jan 2021

Computational Modeling & Simulation Engineering Faculty Publications

Modelling and simulation (M&S) techniques are frequently used in Operations Research (OR) to aid decision-making. With growing complexity of systems to be modelled, an increasing number of studies now apply multiple M&S techniques or hybrid simulation (HS) to represent the underlying system of interest. A parallel but related theme of research is extending the HS approach to include the development of hybrid models (HM). HM extends the M&S discipline by combining theories, methods and tools from across disciplines and applying multidisciplinary, interdisciplinary and transdisciplinary solutions to practice. In the broader OR literature, there are numerous examples of cross-disciplinary approaches in …


Advancing Cyanobacteria Biomass Estimation From Hyperspectral Observations: Demonstrations With HICO And PRISMA Imagery, Ryan E. O'Shea, Nima Pahlevan, Brandon Smith, Mariano Bresciani, Todd Egerton, Claudia Giardino, Lin Li, Tim Moore, Antonio Ruiz-Verdu, Steve Ruberg, Stefan G.H. Simis, Richard Stumpf, Diana Vaičiūtė Jan 2021

Biological Sciences Faculty Publications

Retrieval of the phycocyanin concentration (PC), a characteristic pigment of, and proxy for, cyanobacteria biomass, from hyperspectral satellite remote sensing measurements is challenging due to uncertainties in the remote sensing reflectance (∆Rrs) resulting from atmospheric correction and instrument radiometric noise. Although several individual algorithms have been proven to capture local variations in cyanobacteria biomass in specific regions, their performance has not been assessed on hyperspectral images from satellite sensors. Our work leverages a machine-learning model, Mixture Density Networks (MDNs), trained on a large (N = 939) dataset of collocated in situ chlorophyll-a concentrations (Chla), …


SEE-TREND: Secure Traffic-Related Event Detection In Smart Communities, Stephan Olariu, Dimitrie C. Popescu Jan 2021

Computer Science Faculty Publications

It has been widely recognized that one of the critical services provided by Smart Cities and Smart Communities is Smart Mobility. This paper lays the theoretical foundations of SEE-TREND, a system for Secure Early Traffic-Related EveNt Detection in Smart Cities and Smart Communities. SEE-TREND promotes Smart Mobility by implementing an anonymous, probabilistic collection of traffic-related data from passing vehicles. The collected data are then aggregated and used by its inference engine to build beliefs about the state of the traffic, to detect traffic trends, and to disseminate relevant traffic-related information along the roadway to help the driving public make informed …


Continuity Of Chen-Fliess Series For Applications In System Identification And Machine Learning, Rafael Dahmen, W. Steven Gray, Alexander Schmeding Jan 2021

Electrical & Computer Engineering Faculty Publications

Model continuity plays an important role in applications like system identification, adaptive control, and machine learning. This paper provides sufficient conditions under which input-output systems represented by locally convergent Chen-Fliess series are jointly continuous with respect to their generating series and as operators mapping a ball in an Lp-space to a ball in an Lq-space, where p and q are conjugate exponents. The starting point is to introduce a class of topological vector spaces known as Silva spaces to frame the problem and then to employ the concept of a direct limit to describe convergence. The proof of the main …
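
For readers unfamiliar with the term, the abstract's "conjugate exponents" refers to the standard pairing of Lebesgue-space indices. The block below states only that textbook definition and the Hölder inequality it enters; the symbols f and g are generic functions introduced here for illustration and are not taken from the paper.

```latex
% Conjugate (Hölder) exponents: p and q with 1 <= p, q <= infinity satisfy
\[
  \frac{1}{p} + \frac{1}{q} = 1 ,
  \qquad \text{with the convention } \tfrac{1}{\infty} = 0 .
\]
% For f in L^p and g in L^q this pairing gives Hölder's inequality:
\[
  \| f g \|_{L^1} \;\le\; \| f \|_{L^p} \, \| g \|_{L^q} .
\]
```

For example, p = q = 2 is the self-conjugate case, and p = 1 pairs with q = ∞.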


DeapSECURE Computational Training For Cybersecurity Students: Improvements, Mid-Stage Evaluation, And Lessons Learned, Wirawan Purwanto, Yuming He, Jewel Ossom, Qiao Zhang, Liuwan Zhu, Karina Arcaute, Masha Sosonkina, Hongyi Wu Jan 2021

University Administration Publications

DeapSECURE is a non-degree computational training program that provides a solid high-performance computing (HPC) and big-data foundation for cybersecurity students. DeapSECURE consists of six modules covering a broad spectrum of topics such as HPC platforms, big-data analytics, machine learning, privacy-preserving methods, and parallel programming. In the second year of this program, to improve the learning experience, we implemented a number of changes, such as grouping modules into two broad categories, "big-data" and "HPC"; creating a single cybersecurity storyline across the modules; and introducing post-workshop (optional) "hackshops." Two major goals of these changes are, firstly, to effectively engage students to maintain …


Biocybersecurity: A Converging Threat As An Auxiliary To War, Lucas Potter, Orlando Ayala, Xavier-Lewis Palmer Jan 2021

Engineering Technology Faculty Publications

Biodefense is the discipline of ensuring biosecurity with respect to select groups of organisms and limiting their spread. This field has increasingly been challenged by novel threats from nature that have been weaponized such as SARS, Anthrax, and similar pathogens, but has emerged victorious through collaboration of national and world health groups. However, it may come under additional stress in the 21st century as the field intersects with the cyberworld, a world where governments have already been struggling to keep up with cyber attacks from small to state-level actors as cyberthreats have been relied on to level the playing field …


Hidden Markov Model And Cyber Deception For The Prevention Of Adversarial Lateral Movement, Md Ali Reza Al Amin, Sachin Shetty, Laurent Njilla, Deepak K. Tosh, Charles Kamhoua Jan 2021

Computational Modeling & Simulation Engineering Faculty Publications

Advanced persistent threats (APTs) have emerged as multi-stage attacks that have targeted nation-states and their associated entities, including private and corporate sectors. Cyber deception has emerged as a defense approach to secure our cyber infrastructure from APTs. Practical deployment of cyber deception relies on defenders' ability to place decoy nodes along the APT path optimally. This paper presents a cyber deception approach focused on predicting the most likely sequence of attack paths and deploying decoy nodes along the predicted path. Our proposed approach combines reactive (graph analysis) and proactive (cyber deception technology) defense to thwart the adversaries' lateral movement. The …
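
As a rough illustration of what "predicting the most likely sequence" means for a hidden Markov model, the sketch below uses the standard Viterbi algorithm. This is an assumption: the abstract does not say which decoding method the authors use, and the three-host network, the alert types, and every probability here are hypothetical values chosen only to make the example run.

```python
import math


def viterbi(observations, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    n = len(start_p)

    def log(x):
        # Work in log space to avoid underflow; log(0) is treated as -inf.
        return math.log(x) if x > 0 else float("-inf")

    # Best log-score of any path ending in each state after the first observation.
    score = [log(start_p[s]) + log(emit_p[s][observations[0]]) for s in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1

    for obs in observations[1:]:
        prev = score
        pointers, score = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + log(trans_p[i][j]))
            pointers.append(best_i)
            score.append(prev[best_i] + log(trans_p[best_i][j]) + log(emit_p[j][obs]))
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical hosts (hidden states): 0 = web server, 1 = app server, 2 = database.
# Hypothetical alert types (observations) use the same indices 0, 1, 2.
START_P = [0.8, 0.1, 0.1]
TRANS_P = [[0.6, 0.3, 0.1],
           [0.1, 0.6, 0.3],
           [0.1, 0.1, 0.8]]
EMIT_P = [[0.8, 0.1, 0.1],
          [0.1, 0.8, 0.1],
          [0.1, 0.1, 0.8]]

predicted_path = viterbi([0, 1, 2], START_P, TRANS_P, EMIT_P)
print(predicted_path)
```

In a deception setting, a defender could place decoys on the hosts that appear along such a predicted path; how the paper actually selects decoy locations is not stated in the truncated abstract.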


De Novo Prediction Of Drug–Target Interactions Using Laplacian Regularized Schatten P-Norm Minimization, Gaoyan Wu, Mengyun Yang, Yaohang Li, Jianxin Wang Jan 2021

Computer Science Faculty Publications

In pharmaceutical sciences, a crucial step of drug discovery is the identification of drug–target interactions (DTIs). However, only a small portion of the DTIs have been experimentally validated. Moreover, it is an extremely laborious, expensive, and time-consuming procedure to capture new interactions between drugs and targets through traditional biochemical experiments. Therefore, designing computational methods for predicting potential interactions to guide the experimental verification is of practical significance, especially for de novo situations. In this article, we propose a new algorithm, namely Laplacian regularized Schatten p-norm minimization (LRSpNM), to predict potential target proteins for novel drugs and potential drugs for …


fMRI Feature Extraction Model For ADHD Classification Using Convolutional Neural Network, Senuri De Silva, Sanuwani Udara Dayarathna, Gangani Ariyarathne, Dulani Meedeniya, Sampath Jayarathna Jan 2021

Computer Science Faculty Publications

Biomedical intelligence provides a predictive mechanism for the automatic diagnosis of diseases and disorders. With the advancements of computational biology, neuroimaging techniques have been used extensively in clinical data analysis. Attention deficit hyperactivity disorder (ADHD) is a psychiatric disorder, with the symptomology of inattention, impulsivity, and hyperactivity, in which early diagnosis is crucial to prevent unwelcome outcomes. This study addresses ADHD identification using functional magnetic resonance imaging (fMRI) data for the resting state brain by evaluating multiple feature extraction methods. The features of seed-based correlation (SBC), fractional amplitude of low-frequency fluctuation (fALFF), and regional homogeneity (ReHo) are comparatively applied to …


Automated Filtering Of Eye Movements Using Dynamic AOI In Multiple Granularity Levels, Gavindya Jayawardena, Sampath Jayarathna Jan 2021

Computer Science Faculty Publications

Eye-tracking experiments involve areas of interest (AOIs) for the analysis of eye gaze data. While there are tools to delineate AOIs to extract eye movement data, they may require users to manually draw boundaries of AOIs on eye tracking stimuli or use markers to define AOIs. This paper introduces two novel techniques to dynamically filter eye movement data from AOIs for the analysis of eye metrics from multiple levels of granularity. The authors incorporate pre-trained object detectors and object instance segmentation models for offline detection of dynamic AOIs in video streams. This research presents the implementation and evaluation of object …


Smart Parking Systems: Reviewing The Literature, Architecture And Ways Forward, Can Biyik, Zaheer Allam, Gabriele Pieri, Davide Moroni, Muftah O' Fraifer, Eoin O' Connell, Stephan Olariu, Muhammad Khalid Jan 2021

Computer Science Faculty Publications

The Internet of Things (IoT) has come of age, and complex solutions can now be implemented seamlessly within urban governance and management frameworks and processes. For cities, growing rates of car ownership are rendering parking availability a challenge and lowering the quality of life through increased carbon emissions. The development of smart parking solutions is thus necessary to reduce the time spent looking for parking and to reduce greenhouse gas emissions. The principal role of this research paper is to analyze smart parking solutions from a technical perspective, underlining the systems and sensors that are available, as documented in the …


Vehicular Crowdsourcing For Congestion Support In Smart Cities, Stephan Olariu Jan 2021

Computer Science Faculty Publications

Under present-day practices, the vehicles on our roadways and city streets are mere spectators that witness traffic-related events without being able to participate in the mitigation of their effect. This paper lays the theoretical foundations of a framework for harnessing the on-board computational resources in vehicles stuck in urban congestion in order to assist transportation agencies with preventing or dissipating congestion through large-scale signal re-timing. Our framework is called VACCS: Vehicular Crowdsourcing for Congestion Support in Smart Cities. What makes this framework unique is that we suggest that in such situations the vehicles have the potential to cooperate with various …


Large Scale Subject Category Classification Of Scholarly Papers With Deep Attentive Neural Networks, Bharath Kandimalla, Shaurya Rohatgi, Jian Wu, C. Lee Giles Jan 2021

Computer Science Faculty Publications

Subject categories of scholarly papers generally refer to the knowledge domain(s) to which the papers belong, examples being computer science or physics. Subject category classification is a prerequisite for bibliometric studies, organizing scientific publications for domain knowledge extraction, and facilitating faceted searches for digital library search engines. Unfortunately, many academic papers do not have such information as part of their metadata. Most existing methods for solving this task focus on unsupervised learning that often relies on citation networks. However, a complete list of papers citing the current paper may not be readily available. In particular, new papers that have few …


Understanding The Impact Of Encrypted DNS On Internet Censorship, Lin Jin, Shuai Hao, Haining Wang, Chase Cotton Jan 2021

Computer Science Faculty Publications

DNS traffic is transmitted in plaintext, resulting in privacy leakage. To combat this problem, secure protocols have been used to encrypt DNS messages. Existing studies have investigated the performance overhead and privacy benefits of encrypted DNS communications, yet little has been done from the perspective of censorship. In this paper, we study the impact of the encrypted DNS on Internet censorship in two aspects. On one hand, we explore the severity of DNS manipulation, which could be leveraged for Internet censorship, given the use of encrypted DNS resolvers. In particular, we perform 7.4 million DNS lookup measurements on 3,813 DoT …


Extraction And Evaluation Of Statistical Information From Social And Behavioral Science Papers, Sree Sai Teja Lanka, Sarah Rajtmajer, Jian Wu, C. Lee Giles Jan 2021

Computer Science Faculty Publications

With substantial and continuing increases in the number of published papers across the scientific literature, development of reliable approaches for automated discovery and assessment of published findings is increasingly urgent. Tools which can extract critical information from scientific papers and metadata can support representation and reasoning over existing findings, and offer insights into replicability, robustness and generalizability of specific claims. In this work, we present a pipeline for the extraction of statistical information (p-values, sample size, number of hypotheses tested) from full-text scientific documents. We validate our approach on 300 papers selected from the social and behavioral science literatures, and …


Ranked List Fusion And Re-Ranking With Pre-Trained Transformers For ARQMath Lab, Shaurya Rohatgi, Jian Wu, C. Lee Giles Jan 2021

Computer Science Faculty Publications

This paper elaborates on our submission to the ARQMath track at CLEF 2021. For our submission this year, we use a collection of methods to retrieve and re-rank the answers in Math Stack Exchange, in addition to our two-stage model, which was comparable to the best model last year in terms of NDCG’. We also provide a detailed analysis of what the transformers are learning and why it is hard to train a math language model using transformers. This year’s submission to Task-1 includes summarizing long question-answer pairs to augment and index documents, using byte-pair encoding to tokenize formula and …


Recognizing Figure Labels In Patents, Ming Gong, Xin Wei, Diane Oyen, Jian Wu, Martin Gryder Jan 2021

Computer Science Faculty Publications

Scientific documents often contain significant information in figures. The United States Patent and Trademark Office (USPTO) awards thousands of patents each week, with each patent containing on the order of a dozen figures. The information conveyed by these figures typically includes a drawing or diagram, a label, a caption, and reference text within the document. Yet associating the short bits of text to the figure is challenging when labels are embedded within the figure, as they typically are in patents. Using patents as a testbench, this paper highlights an open challenge in analyzing all of the information presented in scientific/technical documents …


Extractive Research Slide Generation Using Windowed Labeling Ranking, Athar Sefid, Prasenjit Mitra, Jian Wu, C. Lee Giles Jan 2021

Computer Science Faculty Publications

Presentation slides generated from original research papers provide an efficient form to present research innovations. Manually generating presentation slides is labor-intensive. We propose a method to automatically generate slides for scientific articles based on a corpus of 5000 paper-slide pairs compiled from conference proceedings websites. The sentence labeling module of our method is based on SummaRuNNer, a neural sequence model for extractive summarization. Instead of ranking sentences based on semantic similarities in the whole document, our algorithm measures the importance and novelty of sentences by combining semantic and lexical features within a sentence window. Our method outperforms several baseline methods …


Automatic Metadata Extraction Incorporating Visual Features From Scanned Electronic Theses And Dissertations, Muntabir Hasan Choudhury, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, Edward A. Fox Jan 2021

Computer Science Faculty Publications

Electronic Theses and Dissertations (ETDs) contain domain knowledge that can be used for many digital library tasks, such as analyzing citation networks and predicting research trends. Automatic metadata extraction is important to build scalable digital library search engines. Most existing methods are designed for born-digital documents, so they often fail to extract metadata from scanned documents such as ETDs. Traditional sequence tagging methods mainly rely on text-based features. In this paper, we propose a conditional random field (CRF) model that combines text-based and visual features. To verify the robustness of our model, we extended an existing corpus and created a …


Systematizing Confidence In Open Research And Evidence (SCORE), Nazanin Alipourfard, Beatrix Arendt, Daniel M. Benjamin, Noam Benkler, Michael Bishop, Mark Burstein, Martin Bush, James Caverlee, Yiling Chen, Chae Clark, Anna Dreber Almenberg, Timothy M. Errington, Fiona Fidler, Nicholas Fox, Aaron Frank, Hannah Fraser, Scott Friedman, Ben Gelman, James Gentile, Jian Wu, et al., SCORE Collaboration Jan 2021

Computer Science Faculty Publications

Assessing the credibility of research claims is a central, continuous, and laborious part of the scientific process. Credibility assessment strategies range from expert judgment to aggregating existing evidence to systematic replication efforts. Such assessments can require substantial time and effort. Research progress could be accelerated if there were rapid, scalable, accurate credibility indicators to guide attention and resource allocation for further assessment. The SCORE program is creating and validating algorithms to provide confidence scores for research claims at scale. To investigate the viability of scalable tools, teams are creating: a database of claims from papers in the social and behavioral …