Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Full-Text Articles in Physical Sciences and Mathematics
Dataset 1: Mobile Text Dataset, Keith Vertanen, Per Ola Kristensson
Mobile Text Dataset and Language Models
This zip file contains the sentences mined from public web forums and blogs. Additional details about the dataset:
- The data is split into training, development, and test sets based on the original domain name the text was mined from.
- The sent_*.txt files are tab-delimited. Each line contains one sentence parsed from a particular post, together with the device name, forum software, device form factor (tablet or phone), and device input (touch or touch+key) associated with the post it was obtained from.
- The sets subdirectory contains the groupings used in Section 2 of the paper.
- 64K word list (used in the paper), 5K and …
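
As a rough illustration of the tab-delimited sent_*.txt layout described above, the following Python sketch reads such lines into dictionaries. The column order in ASSUMED_COLUMNS, the field names, and the function name read_sentences are assumptions made for this example; the description lists the fields but not their order, so inspect a real file before relying on it. The sample line is made up and is not taken from the dataset.

```python
import csv
import io

# ASSUMPTION: hypothetical column order and field names. The dataset
# description names these fields but does not state their order.
ASSUMED_COLUMNS = ("sentence", "device", "forum_software", "form_factor", "input")


def read_sentences(lines, columns=ASSUMED_COLUMNS):
    """Yield one dict per tab-delimited line, keyed by the given column names."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(columns):
            continue  # skip lines that do not match the assumed layout
        yield dict(zip(columns, row))


# Made-up sample line standing in for the contents of a sent_*.txt file.
sample = io.StringIO("see you at 5\tExamplePhone\tExampleForum\tphone\ttouch\n")
records = list(read_sentences(sample))
```

To read an actual file, an open file handle can be passed in place of the sample, for example `open(path, encoding="utf-8", newline="")`, where the path and the encoding are likewise assumptions.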