institution: British Library
category: Exhibition and Collection Extension
Six million objects. One new website. Just seven weeks.
The Endangered Archives Programme (EAP) contributes to the preservation of archival material that is in danger of destruction, neglect or physical deterioration worldwide.
The Programme has funded over 320 projects in 80 countries around the world.
Delivered by the British Library and funded by Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin, EAP supports the preservation of important, at-risk collections of photographs, documents, manuscripts and other items from around the world. It facilitates digital capture of these items and shares over six million images online through its new website, built by Cogapp.
Goals for the online archive project
- Display high-resolution, zoomable versions of all six million images using the open source Universal Viewer and leveraging IIIF (International Image Interoperability Framework)
- Bring the site into the British Library brand
- Make the site accessible across devices
- Improve the user experience, including powerful search
- Increase stability and scalability
- Deliver rapidly, with less than two months from project start to launch
The images on the old site were medium resolution and not zoomable, making some portions of text illegible. For the new site, we made the images high-res, flexible and dynamic using the IIIF Image API.
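As an illustration of how zoomable delivery works with the IIIF Image API, a client asks the image server for exactly the region and size it needs through a structured URL. The sketch below builds such URLs. The server address and the image identifier are hypothetical placeholders, not the real EAP endpoints, and the `max` size keyword assumes version 3 of the Image API (version 2 servers use `full`).

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # The identifier is percent-encoded, including any slashes it contains.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


BASE = "https://images.example.org/iiif"  # hypothetical server
IMAGE_ID = "EAP999/1/1/1"                 # hypothetical identifier

# The whole image, scaled down to 800 pixels wide.
overview = iiif_image_url(BASE, IMAGE_ID, size="800,")

# A 1024x1024 pixel detail starting at x=2048, y=1024, at full resolution,
# which is the kind of request a viewer makes when zooming in on text.
detail = iiif_image_url(BASE, IMAGE_ID, region="2048,1024,1024,1024")

print(overview)
print(detail)
```

A viewer such as the Universal Viewer issues many small requests of the second kind as the user pans and zooms, so the full high-resolution image never has to be downloaded in one piece.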
Content management, search and systems
Content is editable using Drupal, an open source content management system.
Search is powered by Solr on top of a Harvester/Mill system similar to those used for the Clyfford Still Museum Online Collection, the Qatar Digital Library, the Yiddish Book Center and other Cogapp projects.
Our system harvests records from the British Library’s internal archive management system, then the Mill processes them into a format ready to ingest into Apache Solr. Once in Solr, the metadata is indexed and searchable, with additional features like faceted search filters.
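The "Mill" step can be pictured as a function that flattens each harvested record into a document Solr can index. The following is only a minimal sketch of that idea: the input record shape, the field names and the sample values are all assumptions made for illustration, not the Library's actual schema or Cogapp's implementation.

```python
import json


def mill(record):
    """Flatten one harvested catalogue record into a Solr-ready document.

    All field names here are hypothetical.
    """
    reference = record["reference"]
    return {
        "id": reference,
        "title": record.get("title", "").strip(),
        # Assumes the project code is the first segment of the reference.
        "project": reference.split("/")[0],
        # De-duplicate and sort so that facet values are stable.
        "languages": sorted(set(record.get("languages", []))),
        "country": record.get("country"),
    }


# Made-up sample data standing in for the harvester's output.
harvested = [
    {
        "reference": "EAP999/1/1",
        "title": "  Example manuscript  ",
        "languages": ["Persian", "Arabic", "Persian"],
        "country": "Exampleland",
    },
]

docs = [mill(record) for record in harvested]

# Solr's JSON update handler accepts a JSON array of documents like this one.
payload = json.dumps(docs, ensure_ascii=False, indent=2)
print(payload)
```

Keeping the transformation in one pure function makes it easy to re-run the Mill over the whole harvest whenever the target schema changes.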
The Endangered Archives Programme has been running for over a decade, and has amassed over six million images. The previous EAP site could not process all of the images, meaning some archives were unavailable to users. The scale of some archives regularly caused the site to crash, requiring effort from Library staff to rectify.
The new site is built on solid, scalable infrastructure enabling all of the images to be presented together online for the first time, and with the ability to add many, many more as new digitisation projects are commissioned.
The Endangered Archives Programme has over 300,000 archive data records, and we imported all of these into Apache Solr to allow for rapid searching, filtering and retrieving of this data.
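To show what searching and filtering against Solr can look like, here is a small sketch that builds a query URL for Solr's standard `/select` handler with faceting switched on. The core URL and the field names (`country`, `language`) are hypothetical; only the query parameters (`q`, `fq`, `rows`, `facet`, `facet.field`) are standard Solr ones.

```python
from urllib.parse import urlencode


def solr_select_url(core_url, text, facet_fields, filters=None, rows=20):
    """Build a Solr /select URL with facet counts and optional filters."""
    params = [("q", text), ("rows", rows), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    # Each active filter becomes its own fq (filter query) parameter.
    # Values are assumed not to contain double quotes; a real client
    # would escape them.
    params += [("fq", f'{field}:"{value}"')
               for field, value in (filters or {}).items()]
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


url = solr_select_url(
    "http://localhost:8983/solr/eap",   # hypothetical core
    "manuscript",
    facet_fields=["country", "language"],
    filters={"country": "Exampleland"},
)
print(url)
```

The response to such a request carries both the matching records and a count per facet value, which is what drives the filter lists shown beside search results.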
We used the Drupal CMS to allow the Library to add and edit details for the hundreds of individual projects that contribute these archives, making sure to account for the different languages and scripts used around the world.
To deliver high-quality imagery, we used the high-resolution TIFF master images that the Library holds as preservation copies. With more than six million images, this amounted to over 200TB of data! We needed a way to transfer this data quickly and efficiently from British Library storage to the new website servers.
We recommended using the Amazon Snowball service, which bypasses the internet, making the data transfer cost-effective and significantly faster (even for an organisation like the British Library with incredible broadband).
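A rough calculation shows why shipping the data on a physical appliance can beat a network transfer at this scale. The link speeds below are hypothetical examples (the Library's actual bandwidth is not stated here), terabytes are treated as decimal (10^12 bytes), and the link is assumed to be fully and continuously available, which is optimistic.

```python
def transfer_days(terabytes, megabits_per_second, efficiency=1.0):
    """Days needed to move `terabytes` over a link of the given speed."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6 * efficiency)
    return seconds / 86_400  # seconds per day


TOTAL_TB = 200          # "over 200TB" from the text
IMAGE_COUNT = 6_000_000  # "more than six million images"

# Average master size implied by those two figures, in megabytes.
average_mb = TOTAL_TB * 1e12 / IMAGE_COUNT / 1e6
print(f"average image: {average_mb:.1f} MB")

for mbps in (100, 1_000, 10_000):  # hypothetical link speeds
    print(f"{mbps:>6} Mbit/s: {transfer_days(TOTAL_TB, mbps):6.1f} days")
```

Even a dedicated 1 Gbit/s link would be busy for more than two weeks under these assumptions, before allowing for contention or retries.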
We then created a system that automatically detects each newly uploaded TIFF image and converts it to JPEG2000, a format from which the IIPImage service can generate web-friendly JPEG images dynamically.
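The detect-and-convert step can be sketched as follows. This is an assumption-laden illustration rather than the production design: it polls a local directory instead of reacting to cloud storage events, the directory layout is invented, and the converter is passed in as a function so that no particular JPEG2000 encoder is implied. The demo uses a stand-in converter that only creates an empty file.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterator, List, Tuple

TIFF_SUFFIXES = {".tif", ".tiff"}


def pending_tiffs(master_dir: Path, derivative_dir: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (tiff, jp2) pairs for masters that have no derivative yet."""
    for tiff in sorted(master_dir.rglob("*")):
        if tiff.is_file() and tiff.suffix.lower() in TIFF_SUFFIXES:
            # Mirror the master's relative path under the derivative folder.
            jp2 = derivative_dir / tiff.relative_to(master_dir).with_suffix(".jp2")
            if not jp2.exists():
                yield tiff, jp2


def process_new_uploads(master_dir: Path, derivative_dir: Path,
                        convert: Callable[[Path, Path], None]) -> List[Path]:
    """Convert every unprocessed TIFF and return the derivatives created."""
    created = []
    for tiff, jp2 in pending_tiffs(master_dir, derivative_dir):
        jp2.parent.mkdir(parents=True, exist_ok=True)
        convert(tiff, jp2)
        created.append(jp2)
    return created


def fake_convert(tiff: Path, jp2: Path) -> None:
    # Stand-in only: a real converter would run a JPEG2000 encoder here.
    jp2.write_bytes(b"")


with tempfile.TemporaryDirectory() as tmp:
    masters, derivatives = Path(tmp, "masters"), Path(tmp, "jp2")
    (masters / "EAP999").mkdir(parents=True)
    (masters / "EAP999" / "0001.tif").write_bytes(b"")
    first_run = [p.relative_to(derivatives).as_posix()
                 for p in process_new_uploads(masters, derivatives, fake_convert)]
    # Nothing is left to do on the second pass, so re-running is safe.
    second_run = process_new_uploads(masters, derivatives, fake_convert)

print(first_run, second_run)
```

Because a master is skipped once its derivative exists, the same routine can be re-run after an interruption without redoing finished work.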
Our system categorises the images by the archival file and project they belong to, extracts their height and width, and stores all this information in Apache Solr.
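As a sketch of what this step involves, the code below derives the project and archival file from a storage key and reads the pixel dimensions from a TIFF's first image file directory (IFD). The key layout `<project>/<file>/<image>.tif` and the output field names are hypothetical. The parser handles classic TIFF only (not BigTIFF) and takes the whole file as bytes for simplicity, whereas a production system would more likely use an imaging library.

```python
import struct


def tiff_dimensions(data: bytes):
    """Return (width, height) from the first IFD of a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little or big endian
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    (ifd_offset,) = struct.unpack(order + "I", data[4:8])
    (entry_count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    found = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type, _count = struct.unpack(order + "HHI", data[start:start + 8])
        if tag in (256, 257):  # ImageWidth, ImageLength
            fmt = "H" if field_type == 3 else "I"  # SHORT or LONG
            found[tag] = struct.unpack_from(order + fmt, data, start + 8)[0]
    return found[256], found[257]


def index_record(key: str, data: bytes) -> dict:
    """Build a search-index document for one image (hypothetical fields)."""
    project, archival_file, _image_name = key.split("/")
    width, height = tiff_dimensions(data)
    return {"id": key, "project": project, "file": archival_file,
            "width": width, "height": height}


# A minimal hand-built TIFF header with only the two tags we need.
sample = (
    b"II" + struct.pack("<HI", 42, 8)             # byte order, magic, IFD offset
    + struct.pack("<H", 2)                        # two IFD entries
    + struct.pack("<HHII", 256, 4, 1, 6000)       # ImageWidth as LONG
    + struct.pack("<HHIHH", 257, 3, 1, 4000, 0)   # ImageLength as SHORT
    + struct.pack("<I", 0)                        # no further IFDs
)

record = index_record("EAP999/EAP999-1-2/0005.tif", sample)
print(record)
```

Storing the dimensions alongside the catalogue data means a page can describe an image to the viewer without opening the image file itself.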
The final system automatically scales up and processes images as fast as they can be imported to AWS: a rate of around eight images per second for direct import from the Snowball appliance.
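To put that rate in context, a quick calculation relates it to the size of the collection. It treats the image count as exactly six million, the total as exactly 200TB (decimal) and the rate as sustained around the clock, all of which are simplifying assumptions.

```python
def import_days(image_count, images_per_second):
    """Days needed to process `image_count` images at a steady rate."""
    return image_count / images_per_second / 86_400  # seconds per day


IMAGE_COUNT = 6_000_000
RATE = 8  # images per second, from the text

days = import_days(IMAGE_COUNT, RATE)

# Implied data throughput, using the average master size (200TB / 6M images).
average_mb = 200e12 / IMAGE_COUNT / 1e6
throughput_mb_per_s = RATE * average_mb

print(f"full import: about {days:.1f} days")
print(f"throughput: about {throughput_mb_per_s:.0f} MB/s")
```

Under these assumptions the whole collection passes through in under nine days, at a little under 270 MB of master image data per second.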
All these systems are running on Amazon Web Services (AWS) infrastructure so that we can quickly scale the system to meet current and future needs.