Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Machine learning

2013

PDF

Boise State University Theses and Dissertations

Articles 1 - 1 of 1

Full-Text Articles in Physical Sciences and Mathematics

Document Classification, Shane K. Panter May 2013

Document Classification, Shane K. Panter

Boise State University Theses and Dissertations

We present an overview of the document classification process and present research conducted against the newly constructed SBIR-STTR corpus. Specifically, the current methods in use for annotation, corpus construction, feature construction, feature weighting, and classifier algorithms are surveyed. We introduce a new dataset derived from public data downloaded from sbir.gov and the Text Annotation Toolkit (TAT) 1 for use in classification research.

TAT is a collection of independent components packaged together into one open source software application. TAT was engineered to support the document classification process and workflow. Tracking of changes in a working corpus, saving data used in the …