Open Access. Powered by Scholars. Published by Universities.®

Law Commons

Administrative Law

Cornell University Law School

Series

Text categorization

Articles 1 - 3 of 3

Full-Text Articles in Law

Active Learning For E-Rulemaking: Public Comment Categorization, Stephen Purpura, Claire Cardie, Jesse Simons May 2008

Cornell e-Rulemaking Initiative Publications

We address the e-rulemaking problem of reducing the manual labor required to analyze public comment sets. In current and previous work, for example, text categorization techniques have been used to speed up the comment analysis phase of e-rulemaking by automatically classifying sentences according to the rule-specific issues [2] or general topics that they address [7, 8]. Manually annotated data, however, is still required to train the supervised inductive learning algorithms that perform the categorization. This paper therefore investigates the application of active learning methods to public comment categorization: we develop two new, general-purpose active learning techniques to selectively sample …


Facilitating Issue Categorization & Analysis In Rulemaking, Thomas R. Bruce, Claire Cardie, Cynthia R. Farina, Stephen Purpura May 2008

Cornell e-Rulemaking Initiative Publications

One task common to all notice-and-comment rulemaking is identifying substantive claims and arguments made in the comments by stakeholders and other members of the public. Extracting and summarizing this material may be helpful to internal decisionmaking; to produce the legally required public explanation of the final rule, it is essential. When comments are lengthy or numerous, natural language processing and machine learning techniques can help the rulewriter work more quickly and comprehensively. Even when a smaller volume of comment material is received, the ability to annotate relevant portions and store information about them in a way that permits retrieval and …


An eRulemaking Corpus: Identifying Substantive Issues In Public Comments, Claire Cardie, Cynthia R. Farina, Matt Rawding, Adil Aijaz May 2008

Cornell e-Rulemaking Initiative Publications

We describe the creation of a corpus that supports a real-world hierarchical text categorization task in the domain of electronic rulemaking (eRulemaking). Features of the task and of the eRulemaking domain engender both a non-traditional text categorization corpus and a correspondingly difficult machine learning task. Interannotator agreement results are presented for a group of six annotators. We also briefly describe the results of experiments that apply standard and hierarchical text categorization techniques to the eRulemaking data sets. The corpus is the first in a series of related sentence-level text categorization corpora to be developed in the eRulemaking domain.