Does Size Matter…?

…said the Analyst to the EDRMS

Well, of course it does.

When considering a document scanning project, there is a plethora of technical settings to examine.  I am constantly asked what impact scan resolution has on file size and quality. For the purposes of this article I will focus exclusively on these two aspects and address other considerations in subsequent articles.  (The next article reviews the impact compression has on image quality.)

When scanning files, there is often a trade-off between scanning a document at the highest possible resolution for the best visual quality and keeping the file size manageable.

File Size

Firstly, let’s look at how file size impacts usability.  To put it in unscientific terms, the faster an image appears on the screen the better, and in this context smaller files open faster than larger files.  Conversely, frustration levels soar if a user has to wait tens of seconds before an image appears (and even longer for larger files).  This problem may be exacerbated if a user is accessing a hosted solution rather than files on their own computer.

Image Quality

On the other side of this trade-off is image quality.  The rule of thumb is: the higher the resolution, the better the quality. (In reality there are a number of cases where this is not so, but this too will be addressed in a subsequent article.)

The State Archives recommends that archive scanning be performed at 600 PPI (pixels per inch).  Note: this is a recommendation and not a standard.  The guidelines go on to suggest the resolution can be adjusted to ensure the image is “fit for purpose”, that is, set to the appropriate level for the particular document type and circumstances.

The best way to illustrate the effect resolution has on file size when document scanning is via an example.  I have taken a typical single A4 page of content and scanned it at varying resolutions (both Black & White and Colour). The resulting file sizes are listed in the table below:

Resolution (PPI)    Uncompressed Black and White Size (KB)    Uncompressed Colour Size (KB)
200                 474                                       11 312
300                 1 066                                     25 380
400                 1 892                                     45 162
600                 4 257                                     101 597

This demonstrates that file size grows with the square of the resolution rather than in a straight line: stepping from 200 to 300 PPI more than doubles the file size (about 2.25 times), and doubling the resolution from 300 to 600 PPI roughly quadruples it.  A typical multi-page PDF document consisting of 30 – 40 double-sided pages therefore varies quite significantly in size (even when compression is factored in) between a low and a high resolution scan.  Not only will accessing a large file cause frustration, you may well have the IT department up in arms over significant storage space requirements and network traffic bottlenecks.
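The figures in the table can be approximated with a simple pixel-count calculation. The sketch below is illustrative only and rests on assumptions not stated in the article: an A4 page of 210 x 297 mm, 1 bit per pixel for Black & White, 24 bits per pixel for Colour, 1 KB = 1024 bytes, and no file header or row padding. The function name uncompressed_size_kb is made up for this example.

```python
MM_PER_INCH = 25.4


def uncompressed_size_kb(ppi, bits_per_pixel, width_mm=210.0, height_mm=297.0):
    """Estimate the uncompressed size (in KB) of one scanned page.

    Assumes an A4 page by default and ignores file headers and row padding,
    so real scanner output will differ slightly.
    """
    # Convert the physical page size to whole pixels at the given resolution.
    width_px = round(width_mm / MM_PER_INCH * ppi)
    height_px = round(height_mm / MM_PER_INCH * ppi)

    # Every pixel costs bits_per_pixel bits: 1 for B&W, 24 for colour (assumed).
    total_bits = width_px * height_px * bits_per_pixel

    # bits -> bytes -> KB (1 KB taken as 1024 bytes)
    return total_bits / 8 / 1024


if __name__ == "__main__":
    for ppi in (200, 300, 400, 600):
        bw = uncompressed_size_kb(ppi, 1)
        colour = uncompressed_size_kb(ppi, 24)
        print(f"{ppi} PPI: B&W {bw:,.0f} KB, colour {colour:,.0f} KB")
```

Under these assumptions the estimates land within about one percent of the measured values in the table. Because both the width and the height in pixels scale with the resolution, the size scales with the resolution squared. To put the multi-page case in perspective, 40 double-sided pages are 80 images; at the measured 25 380 KB per 300 PPI colour page that is roughly 1.9 GB before compression.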

Output Quality

The next aspect of the document scanning process to review is the quality of the output.  I have taken a screenshot of the same snippet of the document at 200, 300 and 600 PPI respectively.  (Don’t be concerned about the content; the snippets are merely there to demonstrate the relative quality of the result.)  I have specifically used a document containing a pattern, as the change in quality is best observed where the patterns intersect.

Resolution (PPI)    B&W Image                       Colour Image
200                 [200 PPI B&W sample image]      [200 PPI Colour sample image]
300                 [300 PPI B&W sample image]      [300 PPI Colour sample image]
600                 [600 PPI B&W sample image]      [600 PPI Colour sample image]

From the samples, we can see that a higher resolution scan creates a crisper, clearer image.  The difference is most noticeable between 200 PPI and the others, but is less obvious to the naked eye between 300 PPI and 600 PPI.  On face value, the slightly higher quality we get with the 600 PPI image may not be sufficient to justify creating files which are roughly four times the size of the 300 PPI image.  For colour images the impact is amplified, because the files are so much larger to begin with.

Fit For Purpose

Now that we know the impact resolution has on file size and image quality, the next step is to define how “Fit for Purpose” applies to your document scanning project. [Refer to page 6 of the Digitisation Disposal Policy –  Queensland State Archives]

The best place to start answering this question is to examine the reason for initiating a document scanning project and the types of records involved.  This has the greatest impact on resolution settings.  If, for example, you are bulk scanning financial documents (invoices etc.) which only have a retention period of 7 years, with no real requirement beyond being a legible representation of the original, then scanning at 200 – 300 PPI Black & White may well be sufficient.  This produces a usable image where the content can be read with confidence.  If the same images are intended to be OCR’d for data capture, then you would not go below 300 PPI, as a lower resolution would negatively impact the result.

If, however, you are imaging a legal document, say, where the expectation is for the image to be as close a representation of the original as possible with the smallest detail clearly visible, it follows that higher resolution colour may be needed. Even then, 600 PPI would likely be overkill.  An alternative approach would be to step up to 400 PPI if there are compliance concerns regarding 300 PPI.

There is a concept called the “Point of Diminishing Returns”. There comes a point where scanning at a higher resolution makes only a marginal difference to quality.  For the example used above, if the document had been scanned at 1200 PPI colour, the increase in quality would be minimal but the price paid in file size would be dramatic. Note:  A situation where higher resolutions do make a difference is when scaling up the image, such as a photographic negative that is to be enlarged.  For this article, I am focussing on 1:1 scale document scanning.

Given that each project requires specific considerations around file size and resolution, it is difficult to make hard and fast recommendations to cover all scenarios. Rather, this article highlights the factors which need to be considered when document scanning.  More often than not, we are asked to perform bulk scanning at 300 PPI (either B&W or Auto-Colour), as this provides a good balance between file size and quality of output.
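To put a rough number on the 1200 PPI case, a measured file size can be scaled by the square of the resolution ratio. This is a back-of-the-envelope sketch, not a measurement: it assumes the same page size and colour depth at both resolutions and no compression, and the helper name scaled_size_kb is made up for this example.

```python
def scaled_size_kb(known_size_kb, known_ppi, target_ppi):
    """Scale a known uncompressed file size to another resolution.

    Pixel count grows with the square of the resolution, so the size is
    multiplied by (target_ppi / known_ppi) squared. Assumes the page size
    and bit depth stay the same.
    """
    return known_size_kb * (target_ppi / known_ppi) ** 2


if __name__ == "__main__":
    # 25 380 KB is the measured 300 PPI colour figure from the table above.
    estimate_kb = scaled_size_kb(25380, known_ppi=300, target_ppi=1200)
    print(f"Estimated 1200 PPI colour page: {estimate_kb:,.0f} KB")
```

Starting from the 25 380 KB measured at 300 PPI colour, this gives 406 080 KB, just under 400 MB uncompressed for a single A4 page. That is sixteen times the 300 PPI file for a quality gain that is hard to see at 1:1 scale.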
