The automatic summarization of long-running events from news streams is a challenging problem. A long-running event can contain hundreds of unique ‘nuggets’ of information to summarize, spread out over its lifetime. Meanwhile, information reported about it can rapidly become outdated and is often highly redundant. Incremental update summarization (IUS) aims to select sentences from news streams to issue as updates to the user, summarising that event over time. The updates issued should cover all of the key nuggets concisely and before the information contained in those nuggets becomes outdated. Prior summarization approaches can fail when applied to IUS, since they define a fixed summary length that cannot effectively account for the different magnitudes and varying rates of development of such events. In this paper, we propose a novel IUS approach that adaptively alters the volume of content issued as updates over time with respect to the prevalence and novelty of discussions about the event. It incorporates existing state-of-the-art summarization techniques to rank candidate sentences, followed by a supervised regression model that balances novelty, nugget coverage and timeliness when selecting sentences from the top ranks. We empirically evaluate our approach using the TREC 2013 Temporal Summarization dataset extended with additional assessments. Our results show that by adaptively adjusting the number of sentences to select over time, our approach can nearly double the performance of effective summarization baselines.
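To make the idea of a variable number of updates concrete, here is a minimal sketch in Python. It is not the regression model from the paper: the Jaccard-based novelty measure, the `min_novelty` threshold and the function names `novelty` and `select_updates` are all assumptions made for illustration. It only shows how filtering a ranked list by novelty lets the volume of issued updates rise and fall with the amount of new content.

```python
import re


def tokens(sentence):
    """Lowercased set of alphanumeric tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def novelty(candidate, issued):
    """1 minus the highest Jaccard similarity to any issued update.

    Assumption: Jaccard overlap stands in for whatever novelty
    signal a real system would use.
    """
    cand = tokens(candidate)
    if not cand:
        return 0.0
    best = 0.0
    for prev in issued:
        prev_toks = tokens(prev)
        best = max(best, len(cand & prev_toks) / len(cand | prev_toks))
    return 1.0 - best


def select_updates(ranked_sentences, issued, min_novelty=0.5):
    """Walk a ranked list and keep only sufficiently novel sentences.

    The number of sentences returned is not fixed: it depends on how
    much of the ranked list is new relative to earlier updates.
    """
    new_updates = []
    for sentence in ranked_sentences:
        if novelty(sentence, issued + new_updates) >= min_novelty:
            new_updates.append(sentence)
    return new_updates
```

Calling `select_updates` once per time window, and passing in everything issued so far, yields many updates when an event develops quickly and none when the stream only repeats known information.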
Monday, 29 September 2014
Tuesday, 9 September 2014
A Study of Personalised Medical Literature Search
Medical search engines are used every day by both medical practitioners and the public to find the latest medical literature and guidance regarding conditions and treatments. Importantly, the information needs that drive medical search can vary between users for the same query, as clinicians search for content specific to their own area of expertise, while the public search about topics of interest to them. However, prior research into personalised search has so far focused on the Web search domain, and it is not clear whether personalised approaches will prove similarly effective in a medical environment. Hence, in this paper, we investigate to what extent personalisation can enhance medical search effectiveness. In particular, we first adapt three classical approaches for the task of personalisation in the medical domain, which leverage the user’s clicks, clicks by similar users and explicit/implicit user profiles, respectively. Second, we perform a comparative user study with users from the TRIPDatabase.com medical article search engine to determine whether these approaches outperform an effective baseline production system. Our results show that search result personalisation in the medical domain can be effective, with users stating a preference for personalised rankings for 68% of the queries assessed. Furthermore, we show that for the queries tested, users mainly preferred personalised rankings that promote recent content clicked by similar users, highlighting time as a key dimension of medical article search.
Comparing Algorithms for Microblog Summarisation
Event detection and tracking using social media and user-generated content has received a lot of attention from the research community in recent years, since such sources can purportedly provide up-to-date information about events, e.g. earthquakes, as they evolve. However, concisely reporting (summarising) events for users or emergency services using information obtained from social media sources like Twitter is not a solved problem. Current systems either directly apply, or build upon, classical summarisation approaches previously shown to be effective within the newswire domain. However, to date, research into how well these approaches generalise from the newswire to the microblog domain is limited. Hence, in this paper, we compare the performance of eleven summarisation approaches using four microblog summarisation datasets, with the aim of determining which are the most effective and therefore should be used as baselines in future research. Our results indicate that the SumBasic algorithm and Centroid-based summarisation with redundancy reduction are the most effective approaches across the four datasets and five automatic summarisation evaluation measures tested.
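For readers unfamiliar with SumBasic, the following Python sketch shows the general frequency-based selection loop it is known for. It is a simplified illustration, not the configuration evaluated in the paper: the regular-expression tokeniser, the absence of stopword removal, the alphabetical tie-breaking and the `max_sentences` cut-off are all assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens; keeps '#' and '@' for microblog text."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def sumbasic(sentences, max_sentences=2):
    """Select sentences by average word probability, SumBasic style."""
    tokenized = [tokenize(s) for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    if total == 0:
        return []
    # Unigram probability of each word over the whole input.
    prob = {w: c / total for w, c in counts.items()}

    def score(i):
        # Sentence weight: mean probability of its words.
        return sum(prob[w] for w in tokenized[i]) / len(tokenized[i])

    summary = []
    remaining = [i for i, toks in enumerate(tokenized) if toks]
    while remaining and len(summary) < max_sentences:
        # Highest-probability word still available (ties: alphabetical).
        vocab = sorted({w for i in remaining for w in tokenized[i]})
        top_word = max(vocab, key=prob.get)
        # Best-scoring remaining sentence that contains that word.
        candidates = [i for i in remaining if top_word in tokenized[i]]
        best = max(candidates, key=score)
        summary.append(sentences[best])
        remaining.remove(best)
        # Squaring the probabilities of the chosen words discourages
        # later picks from repeating the same content.
        for w in set(tokenized[best]):
            prob[w] = prob[w] ** 2
    return summary
```

The probability-squaring step is what gives the method its built-in redundancy control: once a sentence about the dominant topic has been selected, its words contribute far less to the scores of the sentences that remain.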