As the Europeana Newspapers Project is slowly coming to a close, we want to take a closer look at its achievements and the Austrian National Library’s (ONB) role in the project.
So what is it all about?
The Europeana Newspapers Project (ENP) is funded by the European Commission's Competitiveness and Innovation Framework Programme (CIP) and brings together 18 project partners, 11 associated partners and 22 networking partners from all over Europe. The ambitious goal of the project has been to convert about 10 million digitised historical newspaper pages to full text and to make the content easily accessible via a content browser developed by The European Library. Since the project is also making metadata available for an additional 8 million pages, a total of 18 million historical newspaper pages are being aggregated by Europeana and The European Library. As a result, searching this material becomes faster and more precise, and the user experience is significantly improved. In short, full-text search allows the Europeana Newspapers Project to offer "a European view" of historical events.
Technology and content
The Europeana Newspapers Project combines technical aspects with content-related issues. For this reason, the project includes both technical partners and content-providing partners. On the technical side, the project covers three areas: Optical Character Recognition (OCR), Optical Layout Recognition (OLR) and Named Entity Recognition (NER).
OCR is the technology that converts digital images of historical newspapers into full text. Optical Layout Recognition is applied on a smaller scale and captures the layout and structure of a page. The structure of about two million newspaper pages has been tagged using OLR in order to distinguish between headlines, subtitles and the body of an article. As a result, the full text has been enriched with relevant structural metadata. In addition, Named Entity Recognition resources have been produced for Dutch, German and French, making it possible to search for the names of people and places in several European languages.
Our role within the project
The Austrian National Library is one of the largest content providers for the project. We are providing about 1.6 million newspaper pages that have been converted into full text, selected from our holdings dated before 1876. This cut-off date was deliberately chosen in order to provide material that is free of any copyright restrictions. We are also sharing metadata for an additional 5.5 million pages, to be ingested into The European Library and Europeana.
On 16 October 2014, the ONB hosted an Information Day that not only presented the goals and current state of the project but also showcased the wider context of newspaper digitisation. The event was centred on the theme "digitized, historical newspapers as a source", which made it possible to combine talks by project members with presentations on more practical questions, such as how to use this source in historical research. You can find a summary of the event and links to the presentations here.
What to do now?
The content browser developed by The European Library is online and ready to be used. In a series of blog entries, we present research carried out with the content browser to show a European perspective on different historical events. Why not try it out yourself? To get started, just follow this link: http://www.theeuropeanlibrary.org/tel4/newspapers/
Author: Martin Schaller