Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Development of Guidelines for Collecting Transit Ridership Data, Hong Yang, Kun Xie, Sherif Ishak, Qingyu Ma, Yang Liu, Feb 2022

Computational Modeling & Simulation Engineering Faculty Publications

Transit ridership is a critical determinant for many transit applications such as operation optimizations and project prioritization under performance-based funding mechanisms. As a result, the quality of ridership data is of utmost importance to both transit administrative agencies and transit operators. Many transit operators in Virginia report their ridership data to the Department of Rail and Public Transportation (DRPT) and the National Transit Database (NTD). However, with no specific guidelines available to transit agencies in Virginia for collecting ridership data, the heterogeneous mixture of diverse data collection methods and technologies has often raised concerns about the consistency and quality of …


Scholarly Big Data Quality Assessment: A Case Study of Document Linking and Conflation with S2ORC, Jian Wu, Ryan Hiltabrand, Dominik Soós, C. Lee Giles, Jan 2022

Computer Science Faculty Publications

Recently, the Allen Institute for Artificial Intelligence released the Semantic Scholar Open Research Corpus (S2ORC), one of the largest open-access scholarly big datasets with more than 130 million scholarly paper records. S2ORC contains a significant portion of automatically generated metadata. The metadata quality could impact downstream tasks such as citation analysis, citation prediction, and link analysis. In this project, we assess the document linking quality and estimate the document conflation rate for the S2ORC dataset. Using semi-automatically curated ground truth corpora, we estimated that the overall document linking quality is high, with 92.6% of documents correctly linking to six major …