Andy Secker

Andy Secker leads a number of research and innovation projects on behalf of BBC News Labs, one of which is SUMMA (Scalable Understanding of Multilingual MediA).

Philippe Wacker interviewed Andy in advance of the Vienna Meetup on 'Fake News and other AI Challenges for the News Media in the 21st Century' (29-30 November 2018).

Philippe Wacker: Just remind us about SUMMA. What were the original objectives of the project and its partners?

Andy Secker: Four years ago, BBC News Labs hosted a hackathon for media organisations, research institutions and start-ups working in the field of language technology. SUMMA is one partnership that came out of that event. Funded by Horizon 2020, the EU's biggest research and innovation programme, the SUMMA consortium has spent the last three years building a media monitoring platform that can help journalists keep abreast of global news. The consortium consists of eight partners from academia and industry across Europe, each bringing their own unique expertise to the project. The University of Edinburgh coordinates the project, and the rest of the consortium comprises the BBC, University College London, Deutsche Welle, IDIAP, Priberam, LETA and the University of Sheffield, with QCRI joining from outside Europe.

By chaining together multiple language technologies in a pipeline, SUMMA gives journalists the ability to get a high-level overview of global news trends, as well as more nuanced views on the stories and topics they're interested in. The BBC's primary use case is "External Media Monitoring". BBC News has a dedicated department for monitoring and understanding the world's media. At present, however, this is primarily a manual operation, with multilingual journalists sitting in front of four or more news channels received from around the world. As sources of news, both broadcast and online, become more numerous, machine assistance is fast becoming an essential component of a successful media monitoring operation.

The SUMMA project aims to facilitate media monitoring by combining language technologies into a platform that can simultaneously monitor 200 or more live TV and radio channels across multiple input languages. Crucially, this amounts to a huge volume of incoming data, and we're learning a lot about how to combine these technologies to make that data understandable for a user.

PW: What are the main results of the project?

AS: The primary result is a platform that can monitor hundreds of news sources simultaneously. It provides speech-to-text transcription of broadcast audio in nine languages and translation into English, followed by segmentation and structuring of the ingested content. The platform then provides information extraction, in the form of topic identification, entity recognition and storyline detection, and knowledge processing, in the form of knowledge graph construction and fact checking.

In the research space, I think it's fair to say that research undertaken as part of the project has significantly advanced the state of the art across a wide range of language technologies. At last count, around 50 publications have been generated. Highlights include the entity linking system, which scored best for English in two categories of NIST's TAC Knowledge Base Population competition in 2017. Created by Priberam, this entity linking system also works in Spanish, Portuguese and German. For news story clustering, which groups together articles reporting the same story across different languages, the project has reported state-of-the-art results for English, Spanish and German on a dataset from Event Registry. Automatic summarisation has attracted a surprising amount of interest, and in this area the University of Edinburgh and Priberam are actively researching novel neural network models for extractive, compressive and abstractive summarisation. The University of Edinburgh has also been deploying state-of-the-art machine translation models, which won nine tasks in the constrained news category at the WMT conference in 2017.

From inside the BBC, we have gained a huge amount of value from learning how these technologies work in a real-world media monitoring situation. We have had an opportunity to learn how accurate these technologies really need to be to support, rather than hinder, journalists. Developments in this space often focus on a single language technology, but SUMMA has allowed us to assess how these technologies work together in a pipeline.

PW: Are these results available for exploitation by third parties? If yes, what is the procedure to follow?

AS: The majority of the individual technologies that combine to create the platform will be available as fully open source or under a permissive licence. Ultimately, SUMMA is committed to making an end-to-end platform available under a non-commercial licence. We are pulling this together right now, and it will be publicly available before the end of January 2019. Details will appear on our Twitter account (@SummaEU) and website in due course.

PW: The project is organising an event on 20th – 22nd November in Bonn, Germany. What do you expect from this event? Who should participate?

AS: The event on 20th November is the project's final "User Day", but in a departure from previous User Days, we're co-locating it with two "Hands On" days on 21st and 22nd November at Deutsche Welle in Bonn. The purpose of the event is to explore how the results of SUMMA can be transferred to commercial, industrial and public service users.

We have organised a couple of these User Days previously during the project. They are workshop-style days where anyone interested in SUMMA and the underlying technologies can attend talks, poster sessions and panel sessions to find out more about the research and development work being done by the project. As this is our last User Day, it gives us a great opportunity to summarise the research outcomes and demonstrate the final product.

Given that SUMMA is almost complete, we are now at a stage where we're inviting people to come along to hands-on sessions where they can try the platform out for themselves in an informal environment, with guidance on hand.

The User Days have previously attracted an extremely diverse audience, from researchers interested in the details of the individual language technologies to strategists on the lookout for a media monitoring solution for their respective businesses. We try to cater for everyone. Registration details will be available on the project website.

PW: Your project is coming to an end soon. What do you see as the next steps, and what challenges remain or are emerging?

AS: First and foremost, we are finalising the open source platform. In terms of outcomes, this is very important to the project, as it will allow the greatest number of people to access the technologies that have been created. Whilst SUMMA was primarily a research project, the European Commission is also keen that we examine the commercial viability of running the SUMMA platform, so we are doing that too, and a couple of partners are looking into the possibility of setting up spin-off companies.

For BBC News and Deutsche Welle, we are now looking back at the evaluations of the platform and asking, "Other than the original use case of media monitoring, what can we use these language technologies for in the wider newsroom?" Working on SUMMA has given us a real opportunity to understand how some of these technologies may complement our existing workflows and enable new areas of innovation. I am starting a couple of projects of this kind within BBC News Labs, around fact checking/question answering and translation, and expect them to continue well into 2019. Deutsche Welle is experimenting in a similar manner, most notably with its news.bridge project.

We recently received news that a follow-on project has been funded under Horizon 2020. Called GoURMET (Global Under-Resourced Media Translation), the three-year project will investigate how deep neural networks can be used most effectively to power machine-assisted translation for low-resource languages. The SUMMA project has allowed us to start understanding the potential uses of machine-assisted translation within BBC News and the World Service, and this project will allow us to continue to prototype and innovate in this area. It is great that we will retain our existing partnerships with the University of Edinburgh and Deutsche Welle, and we look forward to working with the Universities of Amsterdam and Alicante.