Web search engines are well known for aggregating news vertical content into their result rankings in response to queries classified
as news-related. However, no dataset currently exists upon which
approaches to news query classification can be evaluated and compared.
This paper studies the generation and validation of a news
query classification dataset comprising labels crowdsourced from
Amazon’s Mechanical Turk, and details the insights gained. Notably,
our study focuses on two challenges when crowdsourcing news
query classification labels: 1) how to overcome our workers’ lack
of information about the news stories from the time of each query,
and 2) how to ensure the resulting labels are of high enough quality
to make the dataset useful. We empirically show that a worker’s
lack of information about news stories can be addressed through
the integration of news-related content into the labelling interface
and that this improves the quality of the resulting labels. Overall,
we find that crowdsourcing is suitable for building a news query
classification dataset.
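While the paper itself details the validation methodology, the general idea of deriving a single trusted label per query from several crowdsourced judgements can be illustrated with a simple majority vote. The sketch below is a minimal illustration only, not the authors' actual method; the example queries, the label scheme, and the `min_agreement` threshold are all assumptions made for the sake of the example.

```python
from collections import Counter

# Hypothetical crowdsourced judgements: query -> labels from several workers.
# In the paper's setting, each label indicates whether a query is news-related.
judgements = {
    "earthquake relief": ["news", "news", "news"],
    "cheap flights": ["not-news", "not-news", "news"],
    "presidential speech": ["news", "not-news", "news"],
}

def aggregate(labels, min_agreement=0.6):
    """Majority vote with an (assumed) agreement threshold.

    Returns the winning label, or None when worker agreement is too
    low to trust the result -- such queries could be re-labelled.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) >= min_agreement else None

for query, labels in judgements.items():
    print(query, "->", aggregate(labels))
```

In practice, agreement-based filtering like this is one simple way to address the second challenge above, since low-agreement queries can be flagged for additional judgements rather than entering the dataset with an unreliable label.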
Richard McCreadie, Craig Macdonald, and Iadh Ounis. Crowdsourcing a News Query Classification Dataset. In Proceedings of CSE 2010, Geneva, Switzerland, 2010.