
Tuesday, March 10, 2026

Findmypast announces major expansion of digitisation studio to accelerate access to British, Irish and Commonwealth newspapers


 https://www.findmypast.com/search-newspapers

We are currently witnessing a seismic shift in how we access the micro-histories recorded in historical newspapers. Findmypast, in its enduring partnership with the British Library, has reached a staggering milestone: 100 million digitized newspaper pages.

This archive is a monument to long-term preservation ethics and collaborative vision. Spanning the years 1699 to 2012, the collection now encompasses over 2,700 historical titles and more than 7 million individual issues. This scale fundamentally democratizes history, moving the researcher away from the logistical hurdles of niche physical archives and into a centralized digital ecosystem where more than three centuries of human experience are searchable in seconds.

The milestone is the fruition of a partnership initiated in 2011 (though the groundwork for such collaboration spans over 15 years) between the British Library and DC Thomson History, the parent company of Findmypast. Lee Wilkinson, Managing Director of DC Thomson History, reflects on the weight of this achievement:

"Reaching 100 million digitised pages was a major milestone but it was also a reminder of just how many stories remain fragile, scattered, or inaccessible. This expansion is about meeting that responsibility. Increasing our capacity means widening the lens through which people can understand their past."

The work of a digital archivist is never truly finished; the milestone is simply a foundation for further expansion. Findmypast has announced a major investment to nearly double the capacity of its digitization studio located at the British Library’s Boston Spa site in Yorkshire. This expansion is not merely about "more" data; it is about the systematic recovery of records that have long been at risk.

Over the next three to four years, the studio is projected to increase its output by over 60% in paper titles and more than 80% in microfilm titles. This focus on microfilm is particularly significant. Once considered the "redundancy of the 20th century," microfilm is now a vital bridge to the 21st, preserving content from fragile originals that are often too degraded for frequent handling. By doubling down on these "under-utilised" resources, the project ensures that the long-tail research of the future is supported by a stable, digital surrogate.

The most profound shift in this new phase of digitization is the commitment to inclusivity. For too long, digital archives have been dominated by Western-centric narratives, leaving vast regions "digitisation-silent." The increased capacity at Boston Spa is specifically earmarked for under-represented Commonwealth and South Asian newspapers.

For the first time, titles from Bangladesh and Sri Lanka will be brought into public access, offering a critical counter-narrative to colonial reports. By digitizing local community and nationalist newspapers, the archive captures South Asian voices directly, providing a primary source for understanding social and political change from the perspective of those who lived it.

"Digitising them will help build a more equitable, accessible historical record for South Asian diaspora communities in the UK, North America and beyond, ensuring that more families can connect with the stories that shaped them."

To process millions of pages with varying typography and centuries of wear, the archivists at Boston Spa employ what I like to call "digital sorcery." The pipeline begins with high-resolution scanning that captures the minute details of the physical artifact, followed by sophisticated Optical Character Recognition (OCR) to extract text data.

However, the real breakthrough lies in Findmypast’s proprietary machine learning technology. It doesn't just "read" words; it performs entity recognition. By identifying specific data points like names, locations, and dates, the system enables researchers to move beyond basic keyword hunting toward complex phrase searching.

This technology bridges the gap between the messy, ink-stained reality of 18th-century printing and the precise demands of modern data science. It allows the system to understand the context of a page, ensuring that even when the original print is faded or the font is archaic, the "human story" remains discoverable.
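To make the difference between keyword hunting and entity-aware searching concrete, here is a minimal Python sketch. It is not Findmypast's proprietary technology, whose workings are not described in the press release. The sample text, the `PLACES` gazetteer, and the function names `tag_entities` and `phrase_search` are all invented for illustration, and simple regular expressions stand in for a trained machine learning model.

```python
import re

# Hypothetical OCR output for one newspaper snippet (invented for illustration).
OCR_TEXT = (
    "On 14 March 1887 Mr. John Fraser of Dundee addressed the meeting. "
    "The Dundee Courier reported the speech on 15 March 1887."
)

# Matches dates written as "14 March 1887"; real newspapers use many more formats.
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)

# Toy gazetteer standing in for a real place-name model (assumption).
PLACES = {"Dundee", "Leeds", "Colombo", "Dhaka"}


def tag_entities(text):
    """Return a dict mapping entity type to the matches found in the text."""
    dates = DATE_RE.findall(text)
    words = re.findall(r"[A-Za-z]+", text)
    places = sorted({word for word in words if word in PLACES})
    return {"dates": dates, "places": places}


def phrase_search(text, phrase):
    """Case-insensitive exact-phrase match that tolerates irregular whitespace."""
    pattern = r"\s+".join(re.escape(word) for word in phrase.split())
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

Calling `tag_entities(OCR_TEXT)` pulls out the two dates and the place name as structured fields, while `phrase_search(OCR_TEXT, "john fraser")` matches the words only when they appear together and in order. A production system would need to cope with OCR misreadings, archaic typefaces, and far more varied date and name formats than this sketch handles.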

The expansion carries significant weight for the academic sector, specifically through The Social History Archive. As universities globally accelerate research into the processes of colonisation and decolonisation, the need for diverse primary sources has never been greater.

The inclusion of English-language publications shaped by colonial transitions provides scholars with a window into regions where civil records, such as census data or vital statistics, may be incomplete, inaccessible, or non-existent. These newspapers serve as a surrogate for missing state records, capturing the ordinary rhythms and tensions of communities during periods of immense upheaval. This transition from "genealogy" to "historiography" allows the archive to meet a growing demand for a more globally relevant historical record.

The partnership between Findmypast and the British Library is more than a commercial venture; it is an act of collective memory. By widening the lens and actively seeking out the voices of the under-represented, they are ensuring that the digital record of the future is as diverse as the reality of the past.

The above post is based on a press release. 
