Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Open Access. Powered by Scholars. Published by Universities.®

Series

PDF

Computer Engineering

Singapore Management University

2023

Learning projects

Articles 1 - 1 of 1

Full-Text Articles in Engineering

Niche: A Curated Dataset Of Engineered Machine Learning Projects In Python, Ratnadira Widyasari, Zhou Yang, Ferdian Thung, Sheng Qin Sim, Fiona Wee, Camellia Lok, Jack Phan, Haodi Qi, Constance Tan, David Lo, David Lo May 2023

Niche: A Curated Dataset Of Engineered Machine Learning Projects In Python, Ratnadira Widyasari, Zhou Yang, Ferdian Thung, Sheng Qin Sim, Fiona Wee, Camellia Lok, Jack Phan, Haodi Qi, Constance Tan, David Lo, David Lo

Research Collection School Of Computing and Information Systems

Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts in filtering those projects to curate ML projects of high quality. The limited availability of such a high-quality dataset poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on the evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. This …