Tuesday, 15 January 2013

Bieber no more: First Story Detection using Twitter and Wikipedia

Twitter is a well known source of information regarding
breaking news stories. This aspect of Twitter makes it ideal
for identifying events as they happen. However, a key problem
with Twitter-driven event detection approaches is that
they produce many spurious events, i.e., events that are
wrongly detected or simply are of no interest to anyone.
In this paper, we examine whether Wikipedia (when viewed
as a stream of page views) can be used to improve the quality
of discovered events in Twitter. Our results suggest that
Wikipedia is a powerful filtering mechanism, allowing for
easy blocking of large numbers of spurious events. Our results
also indicate that events within Wikipedia tend to lag
behind Twitter.

Miles Osborne, Saša Petrović, Richard McCreadie, Craig Macdonald, and Iadh Ounis
Bieber no more: First Story Detection using Twitter and Wikipedia
In Proceedings of TAIA 2012 at SIGIR 2012

PDF
Bibtex

Tuesday, 20 November 2012

Evaluating Real-Time Search over Tweets

Twitter offers a phenomenal platform for the social sharing of information. We describe new resources that have been created in the context of the Text REtrieval Conference (TREC) to support the academic study of Twitter as a real-time information source. We formalize an information seeking task—real-time search—and offer a methodology for measuring system effectiveness. At the TREC 2011 Microblog Track, 58 research groups participated in the first ever evaluation of this task. We present data from the effort to illustrate and support our methodology.

Ian Soboroff, Dean McCullough, Jimmy Lin, Craig Macdonald, Iadh Ounis and Richard McCreadie
Evaluating Real-Time Search over Tweets
In Proceedings of ICWSM 2012

PDF
Bibtex


CrowdTerrier: Automatic Crowdsourced Relevance Assessments with Terrier

Information retrieval (IR) systems rely on document relevance assessments for queries to gauge their effectiveness for a variety of tasks, e.g. Web result ranking. Evaluation forums such as TREC and CLEF provide relevance assessments for common tasks. However, it is not possible for such venues to cover all of the collections and tasks currently investigated in IR. Hence, it falls to the individual researchers to generate the relevance assessments for new tasks and/or collections. Moreover, relevance assessment generation can be a time-consuming, difficult and potentially costly process. Recently, crowdsourcing has been shown to be a fast and cheap method to generate relevance assessments in a semi-automatic manner [1]. In this case, the relevance assessment task is outsourced to a large group of non-expert workers, where workers are rewarded via micro-payments....

Richard McCreadie, Craig Macdonald and Iadh Ounis.
CrowdTerrier: Automatic Crowdsourced Relevance Assessments with Terrier.
In Proceedings of SIGIR 2012

PDF
Bibtex


On Building a Reusable Twitter Corpus

The Twitter real-time information network is the subject of research for information retrieval tasks such as real-time search. However, so far, reproducible experimentation on Twitter data has been impeded by restrictions imposed by the Twitter terms of service. In this paper, we detail a new methodology for legally building and distributing Twitter corpora, developed through collaboration between the Text REtrieval Conference (TREC) and Twitter. In particular, we detail how the first publicly available Twitter corpus – referred to as Tweets2011 – was distributed via lists of tweet identifiers and specialist tweet crawling software. Furthermore, we analyse whether this distribution approach remains robust over time, as tweets in the corpus are removed either by users or Twitter itself. Tweets2011 was successfully used by 58 participating groups for the TREC 2011 Microblog track, while our results attest to the robustness of the crawling methodology over time.
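As a rough illustration of the identifier-based distribution idea (this is not the actual TREC crawling software, and the `lookup` function here stands in for a real Twitter API call), rebuilding a corpus from a list of tweet IDs might look like:

```python
def hydrate(tweet_ids, lookup):
    """Rebuild a corpus from a distributed list of tweet identifiers.

    `lookup` is a placeholder for a Twitter API call: it maps a tweet ID
    to the tweet's content, or to None if the tweet has since been removed.
    Removed tweets are skipped, which is why corpus coverage can decay
    over time as users or Twitter delete tweets.
    """
    corpus, missing = {}, []
    for tid in tweet_ids:
        tweet = lookup(tid)
        if tweet is None:
            missing.append(tid)      # removed by the user or by Twitter
        else:
            corpus[tid] = tweet
    return corpus, missing

# Toy example: tweet 2 has been deleted since the ID list was published.
store = {1: "hello", 2: None, 3: "world"}
corpus, missing = hydrate([1, 2, 3], store.get)
```

Tracking the `missing` list over repeated crawls is exactly the kind of robustness-over-time measurement the paper analyses.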

Richard McCreadie, Ian Soboroff, Jimmy Lin, Craig Macdonald, Iadh Ounis, and Dean McCullough.
On Building a Reusable Twitter Corpus.
In Proceedings of SIGIR 2012


PDF
Bibtex

University of Glasgow at TREC 2011: Experiments with Terrier in Crowdsourcing, Microblog, and Web Tracks

In TREC 2011, we focus on tackling the new challenges proposed by the pilot Crowdsourcing and Microblog tracks, using our Terrier Information Retrieval Platform. Meanwhile, we continue to build upon our novel xQuAD framework and data-driven ranking approaches within Terrier to achieve effective and efficient ranking for the TREC Web track. In particular, the aim of our Microblog track participation is the development of a learning to rank approach for filtering within a tweet ranking environment, where tweets are ranked in reverse chronological order. In the Crowdsourcing track, we work to achieve a closer integration between the crowdsourcing marketplaces that are used for relevance assessment, and Terrier, which produces the document rankings to be assessed. Moreover, we focus on generating relevance assessments quickly and at a minimal cost. For the Web track, we enhance the data-driven learning support within Terrier by proposing a novel framework for the fast computation of document features for learning to rank.

Richard McCreadie, Craig Macdonald, Rodrygo Santos and Iadh Ounis.
University of Glasgow at TREC 2011: Experiments with Terrier in Crowdsourcing, Microblog, and Web Tracks.
In Proceedings of TREC 2011

PDF
Bibtex



Identifying Top News using Crowdsourcing

The influential Text REtrieval Conference (TREC) has always relied upon specialist assessors or occasionally participating groups to create relevance judgements for the tracks that it runs. Recently, however, crowdsourcing has been championed as a cheap, fast and effective alternative to traditional TREC-like assessments. In 2010, TREC tracks experimented with crowdsourcing for the very first time. In this paper, we report our successful experience in creating relevance assessments for the TREC Blog track 2010 top news stories task using crowdsourcing. In particular, we crowdsourced both real-time newsworthiness assessments for news stories and traditional relevance assessments for blog posts. We conclude that crowdsourcing appears to be not only a feasible, but also a cheap and fast means to generate relevance assessments. Furthermore, we detail our experiences running the crowdsourced evaluation of the TREC Blog track, discuss the lessons learned, and provide best practices.

Richard McCreadie, Craig Macdonald and Iadh Ounis.
Identifying Top News using Crowdsourcing.
Information Retrieval Journal, 2012

Springer Link
Bibtex



A Learned Approach for Ranking News in Real-time using the Blogosphere

Newspaper websites and news aggregators rank news stories by their newsworthiness in real-time for display to the user. Recent work has shown that news stories can be ranked automatically in a retrospective manner based upon related discussion within the blogosphere. However, it is as yet undetermined whether blogs are sufficiently fresh to rank stories in real-time. In this paper, we propose a novel learning to rank framework which leverages current blog posts to rank news stories in a real-time manner. We evaluate our proposed learning framework within the context of the TREC Blog track top stories identification task. Our results show that, indeed, the blogosphere can be leveraged for the real-time ranking of news, including for unpredictable events. Our approach improves upon state-of-the-art story ranking approaches, outperforming both the best TREC 2009/2010 systems and our single best-performing feature.

Richard McCreadie, Craig Macdonald and Iadh Ounis.
A Learned Approach for Ranking News in Real-time using the Blogosphere.
In Proceedings of SPIRE 2011

PDF
Bibtex

