The Fine Art of Archiving Emails with PDF/A
The unique qualities of PDF/A, such as full-text searching, have made it the format of choice for long-term archiving in today’s market. Most source formats can be converted to PDF/A, and saving in PDF/A eliminates the need to work with any number of different formats. That means just one application – one viewer – is needed in order to display all documents. Many businesses have already recognised these advantages and use PDF/A for archive migration, for example. With the right approach, and powerful tools, emails and their attachments can also be archived in PDF/A. This can close the final gap, allowing you to live the “everything to PDF/A” dream.
The chief problem in email archiving is that users are confronted with a whole menagerie of different formats. There’s the email itself, of course, but the attachments are the bigger challenge: images, scanned documents, PDF, Word and Excel files, and that’s just the start. Simply saving emails and their attachments in their original formats won’t do if you want them to remain readable in, say, 30 years’ time, without the original software and hardware.
Strategies for email archiving
A prerequisite for reliable email archiving is creating the proper interfaces between your email and archiving systems, so that material can be converted to PDF/A on the way into the archive. You’ll also need suitable PDF/A conversion and validation tools.
Once these requirements are met, it’s time to put it all into action. There are a number of options available, which differ in their level of automation, the degree of user involvement and whether a quality assurance (QA) stage is included.
Server-side conversion provides the greatest level of automation. Here, all emails – including any attachments – are converted to PDF/A. This option requires that the tools you use can handle a large volume of emails. Experience has shown that, depending on the application, 95 percent or even over 99 percent of files can be converted automatically. You’ll need to define rules for how to handle the few exceptions, such as videos or signed attachments and emails. You’ll also need a sophisticated error handling system, so that individual failures don’t put the automated process at risk.
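To make the idea of exception rules and error handling more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any particular product: the rule table `EXCEPTION_RULES`, the action names and the placeholder function `convert_to_pdfa` are all hypothetical. It shows how a batch run might set aside known exceptions and record failures without stopping.

```python
from dataclasses import dataclass, field
from pathlib import PurePath

# Hypothetical policy table: file types treated as exceptions and the
# action to take for each. A real project would define its own rules.
EXCEPTION_RULES = {
    ".mp4": "keep_original",   # video cannot be rendered as PDF/A pages
    ".avi": "keep_original",
    ".p7m": "manual_review",   # signed material
}


@dataclass
class BatchReport:
    converted: list = field(default_factory=list)
    exceptions: dict = field(default_factory=dict)
    failed: dict = field(default_factory=dict)


def convert_to_pdfa(name: str) -> str:
    """Placeholder for a call into a real conversion engine (assumption)."""
    if name.endswith(".corrupt"):
        raise ValueError("unreadable input")
    return PurePath(name).stem + ".pdf"


def process_batch(names, convert=convert_to_pdfa) -> BatchReport:
    """Convert every item, routing exceptions and isolating failures."""
    report = BatchReport()
    for name in names:
        suffix = PurePath(name).suffix.lower()
        if suffix in EXCEPTION_RULES:
            # Known exception: apply the configured rule, do not convert.
            report.exceptions[name] = EXCEPTION_RULES[suffix]
            continue
        try:
            report.converted.append(convert(name))
        except Exception as exc:
            # One bad file is recorded and the batch carries on.
            report.failed[name] = str(exc)
    return report
```

The point of the design is that every item ends up in exactly one of three places (converted, exception, failed), so nothing is silently dropped and the failed items can be reviewed later.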
There are two approaches to client-side conversion: either having the client itself handle conversion, or having the client initiate the conversion and actually running it on the server. Either scenario can be done with or without QA.
For direct client-side conversion, the user manually starts a conversion engine that is integrated into their email program as an add-in. The back end of this add-in then passes the PDF/A file on to the archive interface. The user corrects any conversion errors themselves. Conversion is processor-intensive, so low-performance desktops can come under significant strain and users often have to wait until the process is complete. Any downstream QA is done with a GUI module, in which the user visually checks the PDF/A files and approves them for archiving. This avoids potential errors, but increases the user’s workload. A basic disadvantage of client-side conversion is that the PDF/A files are generated individually, making it hard to ensure that they all conform to the standard. One solution is to carry out downstream validation on the server, but this does create some overhead.
Alternatively, the user can initiate conversion through the client, again using an add-in, and then assign the job to the server. A “quick test” should be run beforehand to check that the material can in fact be converted. As with a purely server-side scenario, this quickly and centrally converts emails using powerful, scalable tools. The user can then carry out a quality check, too. As the conversion process is very complex and thus time-intensive, arrangements should be made for asynchronous processing so that the user can receive a message when conversion is completed.
Server-side conversion is the best option for most projects. It’s highly scalable and runs in a reliable environment, which means you don’t need to check documents individually for PDF/A conformance. When choosing a product, you should make sure that your conversion tool of choice can do all of these things:
- Logging each step of processing (e.g. according to GoBS)
- Quick test for incoming material
- Assigning incoming material to the correct stage of PDF/A processing
- Converting PDF to PDF/A
- Compressing and converting images and scanned documents, performing OCR where applicable
- Validating incoming PDF/A files
- Extracting from attachments, if necessary (ZIP, 7-Zip, RAR etc.)
- Converting born-digital documents (Word, Excel, email header/body etc.) to PDF/A
- Configurable settings for handling exceptions and errors, e.g. videos or signed material
- Interface with email and DMS/ECM/BPM systems
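As an illustration of two items on this checklist, assigning incoming material to the correct processing stage and logging each step, the following Python sketch routes files by extension. The stage names and the mapping `STAGE_BY_SUFFIX` are hypothetical and would differ from product to product. A production tool would also inspect file contents rather than trust the extension.

```python
import logging
from pathlib import PurePath

logger = logging.getLogger("pdfa_pipeline")

# Hypothetical mapping from file type to processing stage.
STAGE_BY_SUFFIX = {
    ".pdf": "validate_or_convert_pdf",
    ".tif": "compress_and_ocr",
    ".tiff": "compress_and_ocr",
    ".jpg": "compress_and_ocr",
    ".png": "compress_and_ocr",
    ".zip": "unpack",
    ".7z": "unpack",
    ".rar": "unpack",
    ".docx": "render_born_digital",
    ".xlsx": "render_born_digital",
    ".msg": "render_born_digital",
    ".eml": "render_born_digital",
}


def assign_stage(filename: str) -> str:
    """Pick the processing stage for one item and log the decision."""
    suffix = PurePath(filename).suffix.lower()
    # Anything unknown (e.g. video) falls through to exception handling.
    stage = STAGE_BY_SUFFIX.get(suffix, "exception_handling")
    logger.info("routed %s to stage %s", filename, stage)
    return stage
```

Unpacked archive contents would simply be fed back through `assign_stage`, so that each extracted file reaches its own stage.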
Users often employ creation and validation tools from a variety of suppliers, despite all the advice against it. It’s important to seek out expert advice when choosing and integrating PDF/A tools, and a particularly good source for this is the PDF Association. The PDF/A Competence Centre exists within this organisation as a place to meet a wide range of PDF/A experts.
A look into the future
The soon-to-be-published third PDF/A standard offers a feature that will make it more practical than ever to archive emails in PDF/A: integrating non-compliant documents, such as the original email in MSG format, into a PDF/A file. This means a single file holds both the long-term PDF/A document and the native-format file, making both of them centrally accessible.
About the author:
Carsten Heiermann is a shareholder and Executive Director of the LuraTech Group, which is based in Berlin, Germany, and also has an office in Redwood City, USA. Since it was founded in 1995, the firm has been a leading provider of open, ISO-standards-based document and image compression solutions. These include its successful PDF, PDF/A and JPEG2000 products.