Friday, 29 August 2014

On Choosing an Effective Automatic Evaluation Metric for Microblog Summarisation

Popular microblogging services, such as Twitter, are engaging millions of users who constantly post and share information about news and current events each day, resulting in millions of messages discussing what is happening in the world. To help users obtain an overview of microblog content relating to topics and events that they are interested in, classical summarisation techniques from the newswire domain have been successfully applied and extended for use on microblogs. However, much of the current literature on microblog summarisation assumes that the summarisation evaluation measures that have been shown to be effective on newswire are still appropriate for evaluating microblog summarisation. Hence, in this paper, we aim to determine whether the traditional automatic newswire summarisation evaluation metrics generalise to the task of microblog summarisation. In particular, using three microblog summarisation datasets, we determine a ranking of summarisation systems under three automatic summarisation evaluation metrics from the literature. We then compare and contrast the ranking of systems produced under each metric with system rankings produced through a qualitative user evaluation, with the aim of determining which metric best simulates human summarisation preferences. Our results indicate that the automatic evaluation metrics we investigate do not always concur with each other. Further, we find that Fraction of Topic Words agrees better with what users tell us about the quality and effectiveness of microblog summaries than the ROUGE-1 measure that is most commonly reported in the literature.
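
For readers unfamiliar with ROUGE-1, the sketch below shows its recall form: the share of reference-summary unigrams that also appear in the system summary, with counts clipped so a repeated word is not over-credited. It assumes plain lowercase whitespace tokenisation, and the function name is illustrative rather than taken from the paper; the official ROUGE toolkit offers options such as stemming and stopword removal that are left out here.

```python
from collections import Counter


def rouge_1_recall(candidate: str, reference: str) -> float:
    """Clipped unigram recall of a candidate summary against one reference."""
    # Assumption: lowercase whitespace tokenisation, no stemming or stopword removal.
    candidate_counts = Counter(candidate.lower().split())
    reference_counts = Counter(reference.lower().split())
    if not reference_counts:
        return 0.0
    # Each reference unigram is matched at most as often as it occurs in the candidate.
    overlap = sum(
        min(count, candidate_counts[token])
        for token, count in reference_counts.items()
    )
    return overlap / sum(reference_counts.values())


score = rouge_1_recall("the cat sat", "the cat sat on the mat")
```

Here three of the six reference unigrams are matched, so `score` is 0.5.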

Stuart Mackie, Richard McCreadie, Craig Macdonald and Iadh Ounis.
On Choosing an Effective Automatic Evaluation Metric for Microblog Summarisation
In Proceedings of IIiX 2014.


Monday, 25 August 2014

Real-Time Detection, Tracking, and Monitoring of Automatically Discovered Events in Social Media

We introduce ReDites, a system for realtime event detection, tracking, monitoring and visualisation. It is designed to assist Information Analysts in understanding and exploring complex events as they unfold in the world. Events are automatically detected from the Twitter stream. Then those that are categorised as being security-relevant are tracked, geolocated, summarised and visualised for the end-user. Furthermore, the system tracks changes in emotions over events, signalling possible flashpoints or abatement. We demonstrate the capabilities of ReDites using an extended use case from the September 2013 Westgate shooting incident. Through an evaluation of system latencies, we also show that enriched events are made available for users to explore within seconds of that event occurring.

Miles Osborne, Sean Moran, Richard McCreadie, Alexander Von Lunen, Martin Sykora, Elizabeth Cano, Neil Ireson, Craig Macdonald, Iadh Ounis, Yulan He, Tom Jackson, Fabio Ciravegna, Ann O’Brien
Real-Time Detection, Tracking, and Monitoring of Automatically Discovered Events in Social Media
In Proceedings of ACL 2014.


Tuesday, 3 September 2013

Scalable Distributed Event Detection for Twitter

Social media streams, such as Twitter, have shown themselves to be useful sources of real-time information about what is happening in the world. Automatic detection and tracking of events identified in these streams have a variety of real-world applications, e.g. identifying and automatically reporting road accidents for emergency services. However, to be useful, events need to be identified within the stream with a very low latency. This is challenging due to the high volume of posts within these social streams. In this paper, we propose a novel event detection approach that can both effectively detect events within social streams like Twitter and can scale to thousands of posts every second. Through experimentation on a large Twitter dataset, we show that our approach can process the equivalent to the full Twitter Firehose stream, while maintaining event detection accuracy and outperforming an alternative distributed event detection system.

Richard McCreadie, Craig Macdonald, Iadh Ounis, Miles Osborne, Sasa Petrovic
Scalable Distributed Event Detection for Twitter
In Proceedings of the IEEE Big Data Conference 2013


Wednesday, 17 July 2013

News Vertical Search: When and What to Display to Users

News reporting has seen a shift toward fast-paced online reporting in new sources such as social media. Web search engines that support a news vertical have historically relied upon articles published by major newswire providers when serving news-related queries. In this paper, we investigate to what extent real-time content from newswire, blogs, Twitter and Wikipedia sources is useful to return to the user in the current fast-paced news search setting. In particular, we perform a detailed user study using the emerging medium of crowdsourcing to determine when and where integrating news-related content from these various sources can better serve the user’s news need. We sampled approximately 300 news-related search queries using Google Trends and Bitly data in real-time for two time periods. For each of these queries, we had crowdsourced workers compare Web search rankings with similar rankings integrating real-time news content from sources such as Twitter or the blogosphere. Our results show that users exhibited a preference for rankings integrating newswire articles for only half of our queries, indicating that relying solely on newswire providers for news-related content is now insufficient. Moreover, our results show that users preferred rankings that integrate tweets more often than those that integrate newswire articles, showing the potential of using social media to better serve news queries.

Richard McCreadie, Craig Macdonald, Iadh Ounis
News Vertical Search: When and What to Display to Users
In Proceedings of SIGIR 2013



Monday, 15 July 2013

Learning to Process Big Data with MapReduce and Hadoop - Hands-On Exercises

I recently ran a hands-on session on MapReduce and Hadoop at the SICSA Summer School on Big Data Information Visualisation. The aim was to teach students how to process Big Data in a fast, scalable manner using the MapReduce programming paradigm. The students started with a simple word counting program, learned its structure and how to deploy it both locally and to an Amazon Web Services Elastic MapReduce cluster. They then learned how to modify MapReduce programs to find top hashtags in Twitter data and to design output formats that target another application, i.e. Twitter search using the open source Terrier Information Retrieval platform.
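
To give a flavour of the hashtag-counting idea, here is a minimal single-process sketch of the map, shuffle and reduce steps. It is not the lab code: the exercises themselves are written in Java and run on Hadoop, whereas this is plain Python with hypothetical function names, and it assumes that a hashtag is simply `#` followed by word characters.

```python
import re
from collections import defaultdict


def map_hashtags(tweet):
    """Map step: emit a (hashtag, 1) pair for every hashtag in one tweet."""
    # Assumption: a hashtag is '#' followed by word characters, case-insensitive.
    for tag in re.findall(r"#\w+", tweet.lower()):
        yield tag, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one hashtag."""
    return key, sum(values)


def top_hashtags(tweets, k):
    """Run map, shuffle and reduce in memory and keep the k most frequent tags."""
    pairs = (pair for tweet in tweets for pair in map_hashtags(tweet))
    counts = [reduce_counts(key, values) for key, values in shuffle(pairs).items()]
    # Sort by descending count, breaking ties alphabetically for stable output.
    return sorted(counts, key=lambda item: (-item[1], item[0]))[:k]


tweets = ["#hadoop is fun #bigdata", "learning #Hadoop", "#bigdata #hadoop"]
top = top_hashtags(tweets, 2)
```

With these three example tweets, `top` is `[("#hadoop", 3), ("#bigdata", 2)]`. On a real cluster the shuffle is handled by the framework and the final top-k selection would typically need its own pass, since no single reducer sees every count.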

The session was highly successful, so I am making the exercise sheets and source code freely available. The exercise sheets assume that you have some familiarity with Java programming, that you are running either Linux or Mac and, if you want to use Elastic MapReduce, that you have already created an account (if not, I suggest you run through the video tutorials available here).

Download the Lab Code-Base Now
Download the Exercise Sheets Now
  • This is a PDF containing the three exercise sheets.

Tuesday, 2 April 2013

Relevance in Microblogs: Enhancing Tweet Retrieval using Hyperlinked Documents

Twitter serves over 1.6 billion searches each day, ranking tweets for display to the user in reverse-chronological order. However, finding relevant tweets can be a challenging task, since the relevance of a tweet is dependent both on its content and on whether it links to a useful document. In this paper, we investigate how the content of documents hyperlinked from a tweet can be used to better estimate that tweet’s relevance. In particular, we propose three approaches for incorporating the content of hyperlinked documents when ranking tweets. Within the context of the TREC 2011 and 2012 Microblog Tracks, we thoroughly evaluate to what extent hyperlinked documents can aid tweet retrieval effectiveness. Our results show that the application of hyperlinked documents can improve retrieval effectiveness over using the tweet content alone as well as using the presence of a URL within the tweet as a feature.

Richard McCreadie, Craig Macdonald
Relevance in Microblogs: Enhancing Tweet Retrieval using Hyperlinked Documents
In Proceedings of OAIR 2013



Wednesday, 27 March 2013

University of Glasgow at TREC 2012: Experiments with Terrier in Medical Records, Microblog, and Web Tracks

In TREC 2012, we focus on tackling the new challenges posed by the Medical, Microblog and Web tracks, using our Terrier Information Retrieval Platform. In particular, for the Medical track, we investigate how to exploit implicit knowledge within medical records, with the aim of better identifying those records from patients with specific medical conditions. For the Microblog track adhoc task, we investigate novel techniques to leverage documents hyperlinked from tweets to better estimate relevance of those tweets and increase recall. Meanwhile, for the Microblog track filtering task, we developed a new stream processing infrastructure for real-time adaptive filtering on top of the Storm framework. For the TREC Web track, we continue to build upon our learning-to-rank approaches and novel xQuAD framework within Terrier, increasing both effectiveness and efficiency when ranking.

Nut Limsopatham, Richard McCreadie, M-Dyaa Albakour, Craig Macdonald, Rodrygo L. T. Santos, and Iadh Ounis
University of Glasgow at TREC 2012: Experiments with Terrier in Medical Records, Microblog, and Web Tracks
Proceedings of TREC 2012
