Backfiles Re-scanning Project

Elsevier was a pioneer in making historical publications available online. The Backfiles collections were first launched on ScienceDirect in 2001, starting with all the Chemistry content. A few years later, customers began drawing attention to low-quality images. To rectify the situation, a project team analyzed 19 million pages and replaced 600,000 of them throughout the collections.

[Figure: plasma cell in old rat lymph node, before and after re-scanning]

Background

The ScienceDirect Backfiles collections represent the largest digitization project of its kind in the world of scientific, technical and medical (STM) literature. When the project was initiated in 2000, pages were scanned at 300 dpi in black and white, which is perfectly good for text but less so for images. Including grey scale and colour was not feasible in those days: storage requirements would have been astronomical and rendering far too slow. As a consequence, many of the scanned images were of lower quality than we would have liked. From 2004, technological improvements allowed the inclusion of higher-quality images in full colour and grey scale without compromising performance or storage requirements.
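To give a feel for the storage constraint, the following minimal sketch estimates the uncompressed size of a single page scanned at 300 dpi in each mode. The A4 page size and the bit depths (1, 8 and 24 bits per pixel) are assumptions made for illustration; they are not figures from the project, and real files are compressed, so actual sizes would differ.

```python
# Rough, uncompressed storage estimate for one page scanned at 300 dpi.
# Assumptions (not from the project): an A4 page of 8.27 x 11.69 inches,
# and 1 / 8 / 24 bits per pixel for the three scanning modes.
DPI = 300
PAGE_WIDTH_IN = 8.27
PAGE_HEIGHT_IN = 11.69

BITS_PER_PIXEL = {
    "black and white": 1,
    "grey scale": 8,
    "colour": 24,
}


def page_pixels(dpi=DPI, width_in=PAGE_WIDTH_IN, height_in=PAGE_HEIGHT_IN):
    """Number of pixels on one page at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


def uncompressed_bytes(bits_per_pixel, dpi=DPI):
    """Bytes needed to store one page without any compression."""
    return page_pixels(dpi) * bits_per_pixel // 8


if __name__ == "__main__":
    for mode, bits in BITS_PER_PIXEL.items():
        megabytes = uncompressed_bytes(bits) / 1_000_000
        print(f"{mode:>15}: {megabytes:6.1f} MB per page")
```

Under these assumptions a grey scale page needs 8 times, and a colour page 24 times, the raw storage of a black and white page, before that factor is multiplied across millions of pages.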

In addition, many of the older journal articles were scanned from microfiche, which was of fairly low quality even in the 1970s, let alone before the 1950s.

Since then, we have sourced print copies and, based on feedback from users, rescanned and replaced many low-quality images. In some cases we have simply rescanned entire journals, e.g. Brain Research, Neuroscience and Icarus. As the problem was more widespread than initially thought, a larger-scale solution was warranted, and the image re-scanning project was born.

Image re-scanning project

Manually finding the minority of lower-quality images in the large volume of content available on ScienceDirect would have been too time-consuming and too costly. We therefore developed sophisticated algorithms to automate the recognition of low-quality images in PDFs, and worked with our suppliers on rescanning and replacing the affected pages.
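The detection algorithms themselves are not described here. Purely as an illustration of what automated screening can look like, the sketch below flags regions of a black and white page whose pixels flip rapidly between ink and paper, a pattern typical of a photograph that has been reduced to two tones. The function names, tile size and thresholds are all hypothetical and are not the project's actual method.

```python
# Illustrative heuristic only; NOT the algorithm used in the project.
# A page is a list of rows, each row a list of 0 (paper) / 1 (ink) pixels.
# All names and thresholds below are hypothetical.


def tile_stats(page, top, left, size):
    """Return (ink_density, transition_rate) for one square tile."""
    ink = pixels = transitions = pairs = 0
    for r in range(top, min(top + size, len(page))):
        row = page[r]
        right = min(left + size, len(row))
        for c in range(left, right):
            ink += row[c]
            pixels += 1
            if c + 1 < right:
                # Count horizontal ink/paper flips between neighbours.
                transitions += row[c] != row[c + 1]
                pairs += 1
    density = ink / pixels if pixels else 0.0
    rate = transitions / pairs if pairs else 0.0
    return density, rate


def suspect_tiles(page, size=8, min_density=0.2, max_density=0.8, min_rate=0.3):
    """List (top, left) corners of tiles that look like two-tone photographs."""
    if not page:
        return []
    flagged = []
    for top in range(0, len(page), size):
        for left in range(0, len(page[0]), size):
            density, rate = tile_stats(page, top, left, size)
            # Mid-range ink coverage plus frequent flips suggests dithering;
            # blank areas and solid strokes fail one of the two tests.
            if min_density <= density <= max_density and rate >= min_rate:
                flagged.append((top, left))
    return flagged


if __name__ == "__main__":
    # Left half: checkerboard (dither-like). Right half: blank paper.
    demo = [[(r + c) % 2 if c < 8 else 0 for c in range(16)] for r in range(16)]
    print(suspect_tiles(demo))
```

Pages with flagged tiles would still need a human or a second check before rescanning, since a simple screen like this can confuse fine print or line art with a degraded photograph.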

Now, almost two years later, 19 million pages have been analyzed and 600,000 pages containing low-quality images have been rescanned. Users of the Backfiles collections can now reap the benefits of these improvements at no extra cost to the libraries that purchased them.

Although more than 99% of the detected pages have been rescanned, there are still images we have not been able to replace because the source files are missing or are not of sufficiently high quality. A list of these missing high-quality sources can be found here. In the meantime, we are cooperating with selected universities and libraries to find these sources.
