Artificial Intelligence and Robotics | Open Access Articles

Visual Descriptor Extraction From Patent Figure Captions: A Case Study Of Data Efficiency Between BiLSTM And Transformer, Xin Wei, Jian Wu, Kehinde Ajayi, Diane Oyen Jan 2022

Computer Science Faculty Publications

Technical drawings used to illustrate designs are ubiquitous in patent documents, especially design patents. Unlike natural images, these drawings are usually made of black strokes with little color information, which makes it challenging for models trained on natural images to recognize objects. To facilitate indexing and searching, we propose an effective and efficient visual descriptor model that extracts object names and aspects from patent captions to annotate benchmark patent figure datasets. We compared two state-of-the-art named entity recognition (NER) models and found that, with a limited number of annotated samples, the BiLSTM-CRF model outperforms the Transformer model by a significant …
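NER models such as BiLSTM-CRF typically emit one label per caption token, and a common convention (assumed here; the abstract does not state the tagging scheme) is BIO tagging. The minimal Python sketch below shows only the post-processing step that turns per-token tags into descriptor spans such as an object name and a view aspect. The `OBJ`/`ASPECT` labels, the helper name `bio_to_spans`, and the sample caption are hypothetical illustrations, not the authors' code.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans.

    Hypothetical helper: assumes tags look like "B-OBJ", "I-OBJ", "O".
    An "I-" tag that does not continue a span of the same label is
    treated as the start of a new span.
    """
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []

    def flush() -> None:
        # Close the currently open span, if there is one.
        if label is not None:
            spans.append((label, " ".join(words)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            words.append(token)  # continue the open span
        elif tag.startswith(("B-", "I-")):
            flush()
            label, words = tag[2:], [token]  # start a new span
        else:
            flush()  # "O" closes any open span
            label, words = None, []
    flush()
    return spans


# Hypothetical caption with hand-written tags (not model output).
caption = "FIG. 1 is a front perspective view of a coffee mug".split()
tags = ["O", "O", "O", "O", "B-ASPECT", "I-ASPECT", "I-ASPECT",
        "O", "O", "B-OBJ", "I-OBJ"]
print(bio_to_spans(caption, tags))
# [('ASPECT', 'front perspective view'), ('OBJ', 'coffee mug')]
```

Treating a stray `I-` tag as a span start is one lenient design choice; a stricter decoder could discard such tags instead.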

Opening Books And The National Corpus Of Graduate Research, William A. Ingram, Edward A. Fox, Jian Wu Jan 2020

Computer Science Faculty Publications

Virginia Tech University Libraries, in collaboration with the Virginia Tech Department of Computer Science and the Old Dominion University Department of Computer Science, request $505,214 in grant funding for a 3-year project whose goal is to bring computational access to book-length documents, demonstrating this with Electronic Theses and Dissertations (ETDs). The project is motivated by the following library and community needs: (1) Despite huge volumes of book-length documents in digital libraries, there is a lack of models offering effective and efficient computational access to these long documents. (2) Nationwide open access services for ETDs generally function at the metadata level. …

Clinical Big Data And Deep Learning: Applications, Challenges, And Future Outlooks, Ying Yu, Liangliang Liu, Yaohang Li, Jianxin Wang Jan 2019

Computer Science Faculty Publications

The explosion of digital healthcare data has led to a surge of data-driven medical research based on machine learning. In recent years, deep learning, a powerful technique for big data, has gained a central position in machine learning for its strengths in feature representation and pattern recognition. This article presents a comprehensive overview of studies that apply deep learning methods to clinical data. First, based on an analysis of the characteristics of clinical data, various types of clinical data (e.g., medical images, clinical notes, lab results, vital signs, and demographic information) are discussed and details …

A Comparative Study Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hulse Aug 2010

Computer Science Faculty Publications

Given high-dimensional software measurement data, researchers and practitioners often use feature (metric) selection techniques to improve the performance of software quality classification models. This paper presents our newly proposed threshold-based feature selection techniques and compares their performance by building classification models with five commonly used classifiers. To evaluate the effectiveness of the different feature selection techniques, the models are evaluated using eight performance metrics separately, since a given performance metric usually captures only one aspect of classification performance. All experiments are conducted on three Eclipse data sets with different levels of class imbalance. The …
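One plausible reading of "threshold-based" selection is to score each feature on its own by treating its normalized value as a classifier score and sweeping a decision threshold over it. The minimal Python sketch below does that with ROC AUC as the single metric; this is an assumption for illustration, since the excerpt does not give the paper's exact procedure or list its eight metrics. The function names, the `max(auc, 1 - auc)` rule, and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale a feature to [0, 1]; a constant feature maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the pairwise (Mann-Whitney) formulation; ties count as 0.5.

    This equals the area under the ROC curve traced by sweeping a
    decision threshold over every distinct score value.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_based_scores(columns: Dict[str, Sequence[float]],
                           labels: Sequence[int]) -> Dict[str, float]:
    """Score each feature independently by the AUC of its normalized values."""
    scores: Dict[str, float] = {}
    for name, values in columns.items():
        auc = roc_auc(min_max_normalize(values), labels)
        # Assumption: a feature that orders the classes in reverse is still informative.
        scores[name] = max(auc, 1.0 - auc)
    return scores


# Toy data (hypothetical): two metrics for four modules; 1 = fault-prone.
scores = threshold_based_scores(
    {"loc": [10, 200, 35, 400], "comments": [5, 4, 6, 5]}, [0, 1, 0, 1])
print(sorted(scores, key=scores.get, reverse=True))  # ['loc', 'comments']
```

Keeping the top-ranked features by this score yields a selected subset; swapping `roc_auc` for another threshold-swept metric would give a different ranking.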

A Comparative Study Of Filter-Based Feature Ranking Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao Aug 2010

Computer Science Faculty Publications

One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training data set. Filter-based feature ranking techniques (rankers) rank the features according to their relevance to the target attribute, and the most relevant features are then chosen to build classification models. To evaluate the effectiveness of different feature ranking techniques, a common method is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Since a given performance metric usually can …
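The rank-then-select procedure described above can be sketched in a few lines of Python. As an assumption for illustration, the relevance score is the absolute Pearson correlation between each feature and the target; the excerpt does not name the rankers the paper compares, so this is a swapped-in example, and the function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and equal in length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_features(columns: Dict[str, Sequence[float]],
                  target: Sequence[float]) -> List[str]:
    """Order feature names from most to least relevant (|r| with the target)."""
    relevance = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(relevance, key=lambda name: relevance[name], reverse=True)


def select_top_k(columns: Dict[str, Sequence[float]],
                 target: Sequence[float], k: int) -> List[str]:
    """Keep the k highest-ranked features for building a classifier."""
    return rank_features(columns, target)[:k]


# Toy data (hypothetical): three features and a binary target.
cols = {"a": [1, 2, 3, 4], "b": [1, 3, 2, 4], "c": [5, 5, 5, 5]}
target = [0, 0, 1, 1]
print(rank_features(cols, target))    # ['a', 'b', 'c']
print(select_top_k(cols, target, 2))  # ['a', 'b']
```

Because each feature is scored independently of the others, a univariate filter like this can keep two features that carry the same information; the choice of `k` and of the downstream classifier is left to the evaluation step the abstract describes.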

Mining Data From Multiple Software Development Projects, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao, Naeem Seliya Dec 2009

Computer Science Faculty Publications

A large system often goes through multiple software project development cycles, in part due to changes in the operation and development environments. For example, rapid turnover of the development team between releases can influence software quality, making it important to mine software project data over multiple system releases when building defect predictors. Data collection of software attributes is often conducted independently of the quality improvement goals, which makes a large number of attributes available for analysis. The problems associated with variations in development process, data collection, and quality goals from one release to another emphasize the importance of …

High-Dimensional Software Engineering Data And Feature Selection, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao Nov 2009

Computer Science Faculty Publications

Software metrics collected during project development play a critical role in software quality assurance. Software practitioners are keen to learn which metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of metrics. Little attention has been devoted to finding the minimum set of software metrics with the same predictive capability as a larger set; we strive to address that gap in this paper. We present a comprehensive comparison of seven commonly used filter-based feature ranking techniques (FRT) …

An Empirical Investigation Of Filter Attribute Selection Techniques For Software Quality Classification, Kehan Gao, Taghi M. Khoshgoftaar, Huanjing Wang Aug 2009

Computer Science Faculty Publications

Attribute selection is an important data preprocessing activity for software quality modeling and other data mining problems. Software quality models have been used to improve the fault detection process. Finding faulty components early in the software development process can lead to a more reliable final product and reduce development and maintenance costs. Some studies have shown that the prediction accuracy of such models improves when irrelevant and redundant features are removed from the original data set. In this study, we investigated four filter attribute selection techniques, Automatic Hybrid Search (AHS), …