Tuesday, 20 November 2012

Evaluating Real-Time Search over Tweets

Twitter offers a phenomenal platform for the social sharing of information. We describe new resources that have been created in the context of the Text Retrieval Conference (TREC) to support the academic study of Twitter as a real-time information source. We formalize an information seeking task—real-time search—and offer a methodology for measuring system effectiveness. At the TREC 2011 Microblog Track, 58 research groups participated in the first ever evaluation of this task. We present data from the effort to illustrate and support...

CrowdTerrier: Automatic Crowdsourced Relevance Assessments with Terrier

Information retrieval (IR) systems rely on document relevance assessments for queries to gauge their effectiveness for a variety of tasks, e.g. Web result ranking. Evaluation forums such as TREC and CLEF provide relevance assessments for common tasks. However, it is not possible for such venues to cover all of the collections and tasks currently investigated in IR. Hence, it falls to the individual researchers to generate the relevance assessments for new tasks and/or collections. Moreover, relevance assessment generation can be a time-consuming,...

On Building a Reusable Twitter Corpus

The Twitter real-time information network is the subject of research for information retrieval tasks such as real-time search. However, so far, reproducible experimentation on Twitter data has been impeded by restrictions imposed by the Twitter terms of service. In this paper, we detail a new methodology for legally building and distributing Twitter corpora, developed through collaboration between the Text REtrieval Conference (TREC) and Twitter. In particular, we detail how the first publicly available Twitter corpus – referred to...

University of Glasgow at TREC 2011: Experiments with Terrier in Crowdsourcing, Microblog, and Web Tracks

In TREC 2011, we focus on tackling the new challenges proposed by the pilot Crowdsourcing and Microblog tracks, using our Terrier Information Retrieval Platform. Meanwhile, we continue to build upon our novel xQuAD framework and data-driven ranking approaches within Terrier to achieve effective and efficient ranking for the TREC Web track. In particular, the aim of our Microblog track participation is the development of a learning to rank approach for filtering within a tweet ranking environment, where tweets are ranked in reverse chronological...

Identifying Top News using Crowdsourcing

The influential Text REtrieval Conference (TREC) has always relied upon specialist assessors or occasionally participating groups to create relevance judgements for the tracks that it runs. Recently, however, crowdsourcing has been championed as a cheap, fast and effective alternative to traditional TREC-like assessments. In 2010, TREC tracks experimented with crowdsourcing for the very first time. In this paper, we report our successful experience in creating relevance assessments for the TREC Blog track...

A Learned Approach for Ranking News in Real-time using the Blogosphere

Newspaper websites and news aggregators rank news stories by their newsworthiness in real-time for display to the user. Recent work has shown that news stories can be ranked automatically in a retrospective manner based upon related discussion within the blogosphere. However, it is as yet undetermined whether blogs are sufficiently fresh to rank stories in real-time. In this paper, we propose a novel learning to rank framework which leverages current blog posts to rank news stories in a real-time manner. We evaluate our proposed learning...

University of Glasgow at TREC 2010: Experiments with Terrier in Blog and Web Tracks

In TREC 2010, we continue to build upon the Voting Model and experiment with our novel xQuAD framework within the auspices of the Terrier IR Platform. In particular, our focus is the development of novel applications for data-driven learning in the Blog and Web tracks, with experimentation spanning hundreds of features. In the Blog track, we propose novel feature sets for the ranking of blogs, news stories and blog posts. In the Web track, we propose novel selective approaches for adhoc and diversity search. Rodrygo. Santos, Richard...

MapReduce indexing strategies: Studying scalability and efficiency

In Information Retrieval (IR), the efficient indexing of terabyte-scale and larger corpora is still a difficult problem. MapReduce has been proposed as a framework for distributing data-intensive operations across multiple processing machines. In this work, we provide a detailed analysis of four MapReduce indexing strategies of varying complexity. Moreover, we evaluate these indexing strategies by implementing them in an existing IR framework, and performing experiments using the Hadoop MapReduce implementation, in combination with...

Crowdsourcing Blog Track Top News Judgments at TREC

Since its inception, the venerable TREC retrieval conference has relied upon specialist assessors or participating groups to create relevance judgments for the tracks that it runs. However, recently crowdsourcing has been proposed as a possible alternative to traditional TREC-like assessments, supporting fast accumulation of judgments at a low cost. 2010 was the first year that TREC experimented with crowdsourcing. In this paper, we report our successful experience in creating relevance assessments for the TREC Blog track 2010 top news...

Crowdsourcing a News Query Classification Dataset

Web search engines are well known for aggregating news vertical content into their result rankings in response to queries classified as news-related. However, no dataset currently exists upon which approaches to news query classification can be evaluated and compared. This paper studies the generation and validation of a news query classification dataset composed of labels crowdsourced from Amazon’s Mechanical Turk and details insights gained. Notably, our study focuses on two challenges when crowdsourcing news query classification...

Leveraging User-Generated Content for News Search

Over the last few years both availability and accessibility of current news stories on the Web have dramatically improved [3]. In particular, users can now access news from a variety of sources hosted on the Web, from newswire presences such as the New York Times, to integrated news search within Web search engines. However, of central interest is the emerging impact that user-generated content (UGC) is having on this online news landscape. Indeed, the emergence of Web 2.0 has turned a static news consumer base into a dynamic news machine,...

News Article Ranking: Leveraging the Wisdom of Bloggers

Every day, editors rank news articles for placement within their newspapers. In this paper, we investigate how news article ranking can be performed automatically. In particular, we investigate the blogosphere as a prime source of evidence, on the intuition that bloggers, and by extension their blog posts, can indicate interest in one news article or another. Moreover, we propose to model this automatic news article ranking task as a voting process, where each relevant blog post acts as a vote for one or more news articles. We evaluate this...
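
The voting idea in this abstract can be illustrated with a minimal sketch. This is not the paper's method: it assumes the simplest possible aggregation, where each relevant blog post contributes exactly one vote to every article it relates to. The function name `rank_articles_by_votes` and the input mapping are hypothetical, introduced only for illustration.

```python
from collections import Counter


def rank_articles_by_votes(post_to_articles):
    """Rank news articles by the number of blog posts voting for them.

    post_to_articles (hypothetical input): maps a blog post id to the
    article ids that the post was judged relevant to.
    Returns (article_id, votes) pairs, highest vote count first;
    ties are broken alphabetically by article id.
    """
    votes = Counter()
    for articles in post_to_articles.values():
        # A post votes at most once per article, even if listed twice.
        for article_id in set(articles):
            votes[article_id] += 1
    return sorted(votes.items(), key=lambda pair: (-pair[1], pair[0]))


# Illustrative data only: three blog posts and the articles they relate to.
posts = {
    "post1": ["articleA", "articleB"],
    "post2": ["articleA"],
    "post3": ["articleC", "articleA"],
}
ranking = rank_articles_by_votes(posts)
print(ranking)
```

Counting votes is only one possible aggregation; weighting each vote by how strongly the post relates to the article would be a natural variation, but whether the paper does so is not stated in the excerpt above.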

Insights of the Horizons of News Search

In recent years, news reporting and consumption have made a profound shift from paper-based media to free online publications, while the simultaneous emergence of Web 2.0 has fundamentally changed the way we react to news. In this paper, we argue that the rapid increase in volume of user-generated content now available presents new and exciting opportunities for the advancement of news search. In particular, we discuss new applications for user-generated content when determining the stories of the moment, as well as adding value to...