Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Research Collection School Of Computing and Information Systems

2008

Entry selection


Full-Text Articles in Physical Sciences and Mathematics

Representative Entry Selection For Profiling Blogs, Jinfeng Zhuang, Steven C. H. Hoi, Aixin Sun, Rong Jin Oct 2008


Research Collection School Of Computing and Information Systems

Blog search and mining applications often face the challenge of handling huge volumes of blog data, in which a single blog can contain hundreds or even thousands of entries. We investigate novel techniques for profiling blogs by selecting a subset of representative entries for each blog. We propose two principles to guide the entry selection task: representativeness and diversity. Further, we formulate entry selection as a combinatorial optimization problem and propose a greedy yet effective algorithm that finds a good approximate solution by exploiting the theory of submodular functions. We suggest blog classification for judging the …
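The greedy, submodular selection idea described in this abstract can be illustrated with a minimal sketch. The abstract does not state the paper's actual objective function, so the code below assumes a facility-location style coverage score over bag-of-words cosine similarity, which is a standard monotone submodular function and not necessarily the authors' formulation. All names (`tokenize`, `cosine`, `coverage`, `greedy_select`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coverage(selected, sim):
    # Assumed objective (facility location): every entry is credited with its
    # similarity to the closest selected entry. Monotone and submodular.
    if not selected:
        return 0.0
    return sum(max(sim[i][j] for j in selected) for i in range(len(sim)))


def greedy_select(entries, m):
    # Repeatedly add the entry with the largest marginal gain in coverage.
    vecs = [Counter(tokenize(e)) for e in entries]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    selected = []
    while len(selected) < min(m, n):
        current = coverage(selected, sim)
        best, best_gain = None, -1.0
        for c in range(n):
            if c in selected:
                continue
            gain = coverage(selected + [c], sim) - current
            if gain > best_gain:
                best, best_gain = c, gain
        selected.append(best)
    return selected  # indices into `entries`, in order of selection
```

Under this assumed objective, representativeness comes from rewarding entries that are similar to many others, and diversity emerges from diminishing returns: a second entry on an already covered topic adds little gain, so the greedy step tends to pick an entry from an uncovered topic instead. For monotone submodular objectives with a size constraint, this kind of greedy procedure is known to reach at least a (1 - 1/e) fraction of the optimal value.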


On Profiling Blogs With Representative Entries, Jinfeng Zhuang, Steven C. H. Hoi, Aixin Sun Jul 2008


Research Collection School Of Computing and Information Systems

With the explosive growth of blogs, information seeking in the blogosphere is becoming increasingly challenging. One example task is to find the topical blogs most relevant to a given query or an existing blog. Such a task requires a concise representation of blogs for effective and efficient searching and matching. In this paper, we investigate a new problem of profiling a blog by choosing the m most representative entries from the blog, where m is a predefined, application-dependent number. With the set of selected representative entries, applications on blogs avoid handling hundreds or even thousands of entries …