Tuesday, 3 September 2013

Scalable Distributed Event Detection for Twitter

1 comment
Social media streams, such as Twitter, have shown themselves to be useful sources of real-time information about what is happening in the world. Automatic detection and tracking of events identified in these streams have a variety of real-world applications, e.g. identifying and automatically reporting road accidents for emergency services. However, to be useful, events need to be identified within the stream with a very low latency. This is challenging due to the high volume of posts within these social streams. In this paper, we propose...

Wednesday, 17 July 2013

News Vertical Search: When and What to Display to Users

1 comment
News reporting has seen a shift toward fast-paced online reporting in new sources such as social media. Web search engines that support a news vertical have historically relied upon articles published by major newswire providers when serving news-related queries. In this paper, we investigate to what extent real-time content from newswire, blogs, Twitter and Wikipedia sources is useful to return to the user in the current fast-paced news search setting. In particular, we perform a detailed user study using the emerging medium of...

Monday, 15 July 2013

Learning to Process Big Data with MapReduce and Hadoop - Hands-On Exercises

149 comments
I recently ran a hands-on session on MapReduce and Hadoop at the SICSA Summer School on Big Data Information Visualisation. The aim was to teach students how to process Big Data in a fast, scalable manner using the MapReduce programming paradigm. The students started with a simple word counting program, learned its structure and how to deploy it both locally and to an Amazon Web Services Elastic MapReduce cluster. They then learned how to modify MapReduce programs to find top hashtags in Twitter data and design output formats to...
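
To give a flavour of the hashtag exercise, here is a minimal sketch of the map, shuffle and reduce steps in plain Python. It is not the code from the session (that ran on Hadoop); the function names `map_tweet`, `shuffle`, `reduce_counts` and `top_hashtags` and the sample tweets are made up for illustration, and everything runs in a single process rather than on a cluster.

```python
import re
from collections import defaultdict
from typing import Iterable, Iterator

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#\w+")


def map_tweet(tweet: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (hashtag, 1) for every hashtag in one tweet."""
    for tag in HASHTAG.findall(tweet):
        yield tag.lower(), 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: sum the counts for one hashtag."""
    return key, sum(values)


def top_hashtags(tweets: Iterable[str], k: int = 3) -> list[tuple[str, int]]:
    """Run map, shuffle and reduce in-process, then keep the k most frequent tags."""
    pairs = (pair for tweet in tweets for pair in map_tweet(tweet))
    counts = [reduce_counts(key, values) for key, values in shuffle(pairs).items()]
    # Sort by descending count, breaking ties alphabetically.
    return sorted(counts, key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    sample = [
        "Loving #Hadoop and #BigData",
        "#hadoop on EMR",
        "no tags here",
        "#bigdata #hadoop #mapreduce",
    ]
    print(top_hashtags(sample, k=2))
```

The word counting program has the same shape: only the map step changes, emitting every word instead of every hashtag.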

Tuesday, 2 April 2013

Relevance in Microblogs: Enhancing Tweet Retrieval using Hyperlinked Documents

0 comments
Twitter serves over 1.6 billion searches each day, ranking tweets for display to the user in reverse-chronological order. However, finding relevant tweets can be a challenging task, since the relevance of a tweet is dependent both on its content and whether it links to a useful document. In this paper, we investigate how the content of documents hyperlinked from a tweet can be used to better estimate that tweet’s relevance. In particular, we propose three approaches for incorporating the content of hyperlinked documents when ranking...

Wednesday, 27 March 2013

University of Glasgow at TREC 2012: Experiments with Terrier in Medical Records, Microblog, and Web Tracks

0 comments
In TREC 2012, we focus on tackling the new challenges posed by the Medical, Microblog and Web tracks, using our Terrier Information Retrieval Platform. In particular, for the Medical track, we investigate how to exploit implicit knowledge within medical records, with the aim of better identifying those records from patients with specific medical conditions. For the Microblog track adhoc task, we investigate novel techniques to leverage documents hyperlinked from tweets to better estimate relevance of those tweets and increase recall....

Thursday, 7 February 2013

Thesis: News Vertical Search using User-Generated Content

0 comments
The thesis investigates how content produced by end-users on the World Wide Web --- referred to as user-generated content --- can enhance the news vertical aspect of a universal Web search engine, such that news-related queries can be satisfied more accurately, comprehensively and in a more timely manner. We propose a news search framework to describe the news vertical aspect of a universal web search engine. This framework is comprised of four components, each providing a different piece of functionality. The Top Events Identification...

Tuesday, 15 January 2013

Bieber no more: First Story Detection using Twitter and Wikipedia

0 comments
Twitter is a well-known source of information regarding breaking news stories. This aspect of Twitter makes it ideal for identifying events as they happen. However, a key problem with Twitter-driven event detection approaches is that they produce many spurious events, i.e., events that are wrongly detected or simply are of no interest to anyone. In this paper, we examine whether Wikipedia (when viewed as a stream of page views) can be used to improve the quality of discovered events in Twitter. Our results suggest that Wikipedia...