Blog

Blog posts are written by project team members. Topics include conferences we attend, musings on relevant current affairs, and internal project findings and news; more succinct content can be found in our Digital Humanities case studies or project-related publications. Posts will mainly be in English but will from time to time appear in the language of the project team member’s preference, since we are a multilingual bunch! Happy reading!

 

The NewsEye Project: An Overview

NewsEye: A Digital Investigator for Historical Newspapers ran from May 2018 to January 2022 and aimed to change the way European digital heritage data is (re)searched, accessed, used and analysed. In this last article of the NewsEye blog, we look back on 45 months of collaborative work.

By building on one of the largest and most significant digital collections of cultural heritage in Europe, the core NewsEye objective was to deliver innovative tools and services to significantly improve the way historical newspapers can be accessed, explored and analysed, ensuring widespread use and a large impact. The project aimed to create a valuable, inexpensive, and immediately useful NewsEye toolbox for assisting users of all types. The developed toolbox is composed of four main layers, each providing advanced techniques and tools for:

  • Text Recognition and Article Separation, aiming to extract the layout of digitised newspapers (e.g., articles and graphical regions) and to transform the content into text, providing full articles (including titles, dates and full article text) through automatic layout analysis, text recognition and article separation.
  • Semantic Text Enrichment, aiming to enhance the utility of the newspaper collections by enriching the texts with higher-level semantic annotation using named-entity recognition. Extracted named entities are linked to external references (such as Wikipedia) across languages, with the goal of supporting multilingual analysis. This layer also provides keyword and event detection to support pattern discovery in the textual content.
  • Dynamic Text Analysis, aiming to provide tools to exploit the enriched data for a more elaborate analysis of user-selected newspaper content, supporting interactive queries to discover different viewpoints, sub-topics or trends concerning the selected topic, named entity, newspaper, timeframe or other category, so as to provide insights into the newspaper collection in contextualised and comparative manners. 
  • Intelligent Analysis and Reporting (‘Personal Research Assistant’), aiming to provide an alternative, ‘intelligent’ interface to the other tools and the data, carrying out iterative cycles of analysis and reporting to the user in natural language. The user should be able to authorise the Personal Research Assistant to investigate a given topic (or time window, newspaper, etc.) on their behalf. The Assistant then reports back on findings it assesses as potentially interesting for the user, together with a rationale for how they were found and why they might be interesting, all in natural language and in a transparent manner so that the user can understand and verify the findings. Given the European context, we were able not only to analyse newspapers written in multiple languages but also to report on the findings in multiple languages; to this end, the Assistant uses multilingual natural language generation (NLG) to produce textual descriptions of the results obtained by the Investigator. In NewsEye, a special focus was placed on French, German, Finnish and Swedish (the languages of the newspaper collections), and on English as the common project language.

While the NewsEye project intended its tools to work on any newspaper collection, the collections used to demonstrate their utility were provided by the three national libraries involved in the project: the Austrian National Library, the National Library of France and the National Library of Finland. These included 15 million pages (11.5 million by the Austrian National Library, 2.17 million by the National Library of France and 2.5 million by the National Library of Finland).

The following titles were fully processed by the NewsEye toolbox:

  • From the Austrian National Library: Arbeiter Zeitung, Illustrierte Kronen Zeitung, Neue Freie Presse, Innsbrucker Nachrichten
  • From the National Library of France: L’Oeuvre, La Fronde, La Presse, Le Gaulois, Le Matin, Marie Claire, Candide, The New York Herald
  • From the National Library of Finland: Aura, Helsingin Sanomat, Päivälehti, Sanomia Turusta, Suometar, Uusi Aura, Uusi Suometar, Åbo Underrättelser, Hufvudstadsbladet, Västra Finland

Six universities were also involved in the project in various capacities: La Rochelle University (France), the University of Helsinki (Finland), the University of Innsbruck (Austria), the University of Montpellier (France), the University of Rostock (Germany) and the University of Vienna (Austria).

The many innovative results created include:

This work was also complemented by the organisation of an International Conference and a seminar in March 2021, two User Workshops in April 2021 and December 2021, and numerous other events. Throughout the project, each member of the consortium contributed a unique skill set, making the NewsEye project a truly international and interdisciplinary success.