LuraTech's Blog

PDF/A-2

PDF/A-2 for scanned documents

Question:
What extended features does PDF/A-2 provide for scanned documents?

Answer:
PDF/A-2, the second part of the PDF/A standard, is based on the PDF standard ISO 32000-1, which in turn is based on PDF 1.7 (which corresponds to Acrobat 8).

Five features make PDF/A-2 interesting for scanning applications:

1. JPEG2000
JPEG2000 was introduced with PDF 1.5 and therefore came too late to be included in PDF/A-1. It offers lossless compression, which is often used for “digital originals”, for example in libraries. Such a lossless file can now be embedded in PDF/A, with the advantage that OCR can be performed and metadata can be embedded in the file in a standards-compliant manner.

JPEG2000 with lossy compression is interesting for all documents that are processed with the LuraTech layer method (MRC, or mixed raster content). It achieves about 10 to 20% greater compression than PDF/A-1, and the quality of image regions in particular improves.

2. Layers (optional content)
In PDF/A-2 this feature is generally interesting for cases such as multilingual documents in which the reader can switch between languages, or construction plans that are first shown as an overview, with details revealed afterward.

With LuraTech PDF/A files, the three layers of the MRC process can be shown and hidden in this way. This is helpful, for example, if you want to show only the black-and-white layer of a color document, or print a file using only the black-and-white layer and text colors, without interference from the background.

3. PDF/A-2u (U for Unicode)
PDF/A-1 has conformance levels 1a and 1b. For scanned documents (and output applications as well), 1b is usually used, because the requirements of 1a for tagging can only be met with considerable manual effort, which is not economically viable for bulk scanning.

PDF/A-2 also has levels 2a and 2b, but an intermediate level (PDF/A-2u) was introduced to be able to take advantage of Unicode.

For scanned documents with OCR full text search capability, Unicode helps achieve reliable text extraction and improved searching. It is expected that the 2u level will be the predominant choice when using PDF/A-2.

4. Portfolios or collections
Here PDF/A-2 generally offers the option to embed PDF/A files in PDF/A. A classic example of an application for this is converting e-mail messages to PDF/A; with collections the e-mail and its attachments can be put together in a single logical file.

Collections can be helpful for scanned documents, for example when individual pages of incoming mail must be scanned and digitally signed. The pages can later be re-sorted during processing without invalidating the signatures.

5. Greater page size
PDF/A-1 has a page size limit of about 5 m by 5 m. PDF/A-2 increases this to 381 km by 381 km.

For ordinary business documents of standard letter size that doesn’t matter, but with large-format scans or very long documents the old limit was often reached. The new limit lets geographic applications save documents at 1:1 scale, so that measurements can be taken directly from the digitized map or diagram.
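
The two limits can be checked with a little arithmetic. The sketch below assumes the implementation limits commonly cited for PDF: a maximum page side of 14,400 user-space units, a default unit of 1/72 inch, and (from PDF 1.6 on) a UserUnit scaling factor of up to 75,000. These constants are not stated in this post and should be treated as assumptions.

```python
# Assumed limits (not stated in the post): 14,400 units per page side,
# 72 default units per inch, and a UserUnit factor of at most 75,000.
MAX_UNITS_PER_SIDE = 14_400
UNITS_PER_INCH = 72
MAX_USER_UNIT = 75_000
METERS_PER_INCH = 0.0254


def max_page_side_m(user_unit: float = 1.0) -> float:
    """Largest page side in meters for a given UserUnit scaling factor."""
    inches = MAX_UNITS_PER_SIDE * user_unit / UNITS_PER_INCH
    return inches * METERS_PER_INCH


if __name__ == "__main__":
    print(f"without UserUnit: {max_page_side_m():.2f} m")
    print(f"with UserUnit {MAX_USER_UNIT}: {max_page_side_m(MAX_USER_UNIT) / 1000:.0f} km")
```

Under these assumptions the results come out at roughly 5 m and 381 km per side, which matches the figures quoted above.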

As a leading provider of PDF/A technology, LuraTech introduced a “PDF/A-2 ready” version of its LuraDocument PDF Compressor at the PDF/A conference in Rome. Thus shortly after the publication of the PDF/A-2 standard by the ISO, the company will provide its customers with a release version able to handle the new formats.

DocYard

Implementation of DocYard

Question:
What happens to the investments my company has already made when we implement DocYard?

Answer:
The investments you have already made won’t be lost. Far from it! Because of its modular architecture, DocYard can integrate and optimize existing components with very little effort. Once integrated into this platform, the modules can be controlled and managed centrally and used as a complete system. Investments already made in systems and components, such as OCR solutions, are protected and gain added value through integration into DocYard.

DocYard

Advantage of LuraTech’s DocYard

Question:
What is the advantage of LuraTech’s DocYard over other products? What is so special about this platform?

Answer:
DocYard is LuraTech’s new and comprehensive platform for managing custom document conversion workflows, in which all process steps can be integrated. DocYard enables companies and organizations to create a production environment for document processing that can be centrally managed. We deliver not only high technical functionality, but also excellent support. We guide you through your complete project as long as you need us. We design a comprehensive workflow for this which users can implement or develop further as required, because DocYard is not only highly scalable but also completely flexible with its modular system architecture.

Customers who already work with DocYard commend, first of all, the precise monitoring and control of jobs made possible by the file manager, and the fast, accurate OCR engine. Jobs which in the past had to be performed manually are fully automated with DocYard.

DocYard gives users and providers the possibility to take all their various individual tools and forge them into a complete solution for all requirements. Flexible license models mean that it pays to use this solution regardless of whether the volumes to process are large or small.

DocYard the integration platform

Production-level document conversion workflows

Question:
LuraTech's new DocYard product is an integration platform - how does
my scan service bureau benefit from such a product?

Answer:
DocYard focuses on production-level document conversion
workflows. And this platform really focuses on integration - you
can keep existing tools and integrate them into DocYard in the shape
of DocYard Modules. In particular, this means that you do not have to
launch a huge migration project, plan an enterprise-wide rollout
top-down and then turn your whole organization inside out.

Instead you can start small, implementing one workflow or even a part
of one workflow and then expand over time in a stepwise
approach. Still, you immediately benefit from DocYard's unified
management and reporting and its parallel processing support. Plus
you can avoid unnecessary manual or script-based copying of files
- the DocYard infrastructure moves the data around for you.

Finally, DocYard offers means of integrating manual processes with
automated ones. This lets you seamlessly combine e.g. manual indexing
or QC tasks with fully automated processes, such as compression or OCR.

New PDF/A standard?

Question:
I have been told that there is a new version of the PDF/A standard coming soon. What does this mean for me?  

Answer:
The first important message is that the new part, PDF/A-2, will not replace or ‘fix’ the current one. PDF/A-1 will remain available as an independent, valid standard. All existing PDF/A-1 documents, and those created in the future, are perfectly suited for long-term archiving.

That said, why do we need a new part of the standard? The PDF format is constantly being enhanced and improved. The current version of PDF/A is based on PDF specification 1.4. In the meantime, the PDF specification has reached version 1.7 and has itself been published as an ISO standard (ISO 32000-1). A number of new features have been added to the PDF format since PDF 1.4, and some of them are also useful for long-term archiving, so PDF/A-2 will be based on the new PDF standard. The new features in PDF/A-2 cover document collections, metadata, image formats and transparency, among other things.

The most important aspect for LuraTech customers will be the new support for JPEG 2000 in PDF/A-2. Highly compressed PDF/A documents will now be possible with the same high visual quality and small file size that could so far only be achieved for standard PDF output. LuraTech is actively involved in the development of the PDF/A standard, so our customers will always be among the first to benefit from the new possibilities.

The Top Ten Myths about PDF/A

With the increasing spread of PDF/A as the ISO standard for long-term archiving, unfortunately a few misunderstandings have been popularized as well. After nearly four years, some DMS providers still seem intent on “riding out” the PDF wave. But as I see it, the saying applies that “you snooze, you lose!”

Myth #1: TIFF is secured against tampering, PDF and PDF/A are not

This assertion is clearly incorrect. There is no document format which is inherently secured against alteration and compliant with auditing requirements. A TIFF file can be modified with simple tools just like a PDF/A document or any other format. “Inalterability” of documents can only be achieved using a signature. If files must be archived in compliance with auditing requirements, then a system or process is necessary to ensure protection against changes.

Myth #2: PDF is a standard from one provider, TIFF is a disclosed standard

Yes and no.  TIFF is a de facto industry “standard”, but it has never been standardized by an international standards organization such as ISO or DIN. Both PDF itself (ISO 32000) and PDF/A (ISO 19005) are disclosed ISO standards and are thus not only de facto but also de jure standards.

Myth #3: PDF/A does not support signatures

On the contrary. PDF/A even permits embedded signatures, including qualified electronic signatures. The signature product simply has to apply the signature in a PDF/A-compliant manner, but there are still some signature providers who have not yet accomplished this with their products.

Myth #4: PDF/A does not support compression

Wrong. PDF/A permits all common compression methods, such as JBIG2 and JPEG. The exception is LZW, for which patents were still in force at the time of standardization. For reasons of timing, JPEG2000 was not incorporated in the PDF/A-1 standard, but it will be covered in the new version (PDF/A-2).

Myth #5: PDF/A does not allow OCR for scanned documents

Wrong. OCR is of course possible in both PDF/A-1b and PDF/A-1a. A minor point, and perhaps the cause of the confusion, is that the invisible font used for the OCR text is the one exception that does not have to be embedded.

Myth #6: PDF/A files are too large due to font embedding

Yes and no. It is true that fonts (except for OCR) must be embedded. Based on practical experience, this is only a problem in the specific application area of bulk outgoing mail. There, one can apply font reduction and subsetting, or pragmatically omit font embedding in a solution tailored to individual company needs. Files without embedded fonts are then no longer PDF/A-compliant in a strict sense. However, apart from this deliberate exception they retain all the advantages of PDF/A.

Myth #7: PDF/A does not support metadata

On the contrary. XMP particularly facilitates standardized metadata in PDF/A. Metadata can be managed in the surrounding systems as before. An advantage of PDF/A is that these data can also be embedded inseparably in the document.
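
As an illustration of embedded XMP metadata, the following minimal sketch builds the small XMP fragment in which a file declares its PDF/A part and conformance level. It assumes the PDF/A identification namespace `http://www.aiim.org/pdfa/ns/id/` with the properties `pdfaid:part` and `pdfaid:conformance`; the function name `pdfa_id_xmp` is purely illustrative. A real file wraps this in an XMP packet with further properties, and only a PDF/A validator can confirm actual compliance.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for the PDF/A identification schema and RDF.
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def pdfa_id_xmp(part: int, conformance: str) -> str:
    """Return an XMP fragment declaring a PDF/A part and conformance level."""
    return (
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        f'  <rdf:RDF xmlns:rdf="{RDF_NS}">\n'
        f'    <rdf:Description rdf:about="" xmlns:pdfaid="{PDFAID_NS}">\n'
        f'      <pdfaid:part>{part}</pdfaid:part>\n'
        f'      <pdfaid:conformance>{conformance}</pdfaid:conformance>\n'
        '    </rdf:Description>\n'
        '  </rdf:RDF>\n'
        '</x:xmpmeta>'
    )


if __name__ == "__main__":
    xmp = pdfa_id_xmp(1, "B")   # e.g. PDF/A-1b
    ET.fromstring(xmp)          # raises ParseError if the XML is not well-formed
    print(xmp)
```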

Myth #8: PDF/A is not supported by DMS systems

Yes and no. Simply put, an ECM system which can handle PDF can also support PDF/A well. Unfortunately, however, there are still a number of DMS providers wedded to their outmoded TIFF viewers, and that can sometimes be a stumbling block in practice.

Myth #9: PDF/A is only supported by a small group of local German providers

Not at all! It is certainly true that PDF/A was first accepted in German-speaking countries – and that the PDF/A Competence Center originated in Germany. However, in the meantime many countries and industries recommend PDF/A or even require it by statute. Moreover, the PDF/A Competence Center now has over 100 members from about 20 countries!

Myth #10: PDF/A is expensive!

Yes and no. Of course the deployment of PDF/A tools requires an initial investment. Sometimes the ROI from highly compressed PDF/A files can be calculated within a few months even without an Excel spreadsheet, for example with the Sparkasse savings banks. But that is perhaps more of an exception. The problem here is assessing the benefits: how much is it worth if unifying formats saves training time and expense as well as viewer license fees? Or if fewer migrations are necessary in the future? And last but not least, how do you place a value on a “good” archive thanks to standardized PDF/A files?

Thomas Zellmann is an executive board member of the PDF/A Competence Center

Convert to PDF/A and OCR?

Question: 
I have hundreds of boxes of documents that contain information I am required to store for at least ten years. I understand the best format to archive these documents is PDF/A? Will your PDF Compressor Enterprise output to PDF/A and make these documents full-text searchable?

Answer: 
Yes, PDF/A (defined by ISO 19005-1:2005) is the best format for long-term archiving. This standard offers assurance that archived documents will maintain their appearance and readability regardless of which applications and systems were used to create them. And yes, the PDF Compressor Enterprise has an integrated ABBYY FineReader OCR engine, and with this tool you can create full-text searchable PDF/A documents in one pass. Additionally, the PDF Compressor applies award-winning mixed raster content (MRC) compression technology, so you will save on storage costs with smaller file sizes!

Learn more about PDF/A at www.pdfa.org

Click here to download a trial of PDF Compressor

Estimate for Scan Project

Question:
We are a scan service provider and we’d like to offer your PDF Compressor Enterprise to one of our customers. We’d like to calculate the time needed to complete their project before we finalize the deal. What information is needed in order to calculate how long it will take to compress and convert all of their documents to PDF/A with the PDF Compressor Enterprise?

Answer:
Thank you for your inquiry. The time it will take for you to process this job (or any job) depends on a number of factors. So that we can give the best estimate, can you provide us with more information regarding the scope of this project? Here are the important variables that we need to know before estimating the time it will take to complete this conversion project:

  • Number of pages
  • Page size and resolution (e.g. 8.5 x 11" at 300 dpi)
  • Whether page sizes vary and, if so, the quantity of each size
  • Color mode of documents (e.g. grayscale, full color, black and white)
  • File format (e.g. PDF, TIFF, JPEG)
  • With or without OCR
  • Deadline or timeframe for completion, if any
  • Available hours per day (e.g. limited time per day, 24/7)

As soon as we have this information, we can best calculate the time it will take for you to complete this project. Additionally, this information will allow us to recommend a license model best suited for your project. For example, if you are working to meet a deadline we might suggest purchasing additional CPU-core licenses to complete your projects on time.
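
The variables above feed into a simple back-of-the-envelope estimate, sketched below. The throughput figure (pages per hour per CPU core) is a hypothetical placeholder: it depends on resolution, color mode, file format and OCR, and should be measured on your own documents. The sketch also assumes that throughput scales linearly with the number of cores, which is a simplification.

```python
import math


def estimate_days(pages: int,
                  pages_per_hour_per_core: float,
                  cores: int,
                  hours_per_day: float) -> int:
    """Estimate the number of working days needed to process a job.

    pages_per_hour_per_core is a measured value from a test run,
    not a vendor figure; linear scaling across cores is assumed.
    """
    pages_per_day = pages_per_hour_per_core * cores * hours_per_day
    return math.ceil(pages / pages_per_day)


if __name__ == "__main__":
    # Hypothetical example: 1,000,000 pages, 500 pages/hour/core,
    # 4 cores, 8 processing hours per day.
    print(estimate_days(1_000_000, 500, 4, 8), "days")
```

Doubling the cores or the available hours per day halves the estimate, which is why the deadline and the daily time window matter for the license recommendation.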

For more information about our license models please click here

Best License for Scan Project?

Question:
I am planning a scan project at the moment. I would like to scan, compress and convert 4 million pages in full color to PDF/A within the next four months. These pages are letter size and we are planning to scan at 150 dpi, 24-bit color and without OCR. Which PDF Compressor Enterprise license model would be the best for this project type?

Answer:
Thank you for your inquiry. First, we recommend that you scan at 200-300 dpi for optimum image quality. Extreme document compression with the PDF Compressor Enterprise allows for scanning at higher resolutions without concern for file size. So, let’s assume you’ll be scanning at 200 dpi.

There are two possible license models to meet your needs:

1) You could choose the Basic license model, which is ideal for project-based conversion. The standard Basic license is for 20,000 pages. So, in this case you would purchase one license and an additional cartridge for 4,000,000 pages. The greatest advantage of this option is that the Basic license will use all available CPU cores on the computer it is installed on. For example, if you install the software on a quad-core computer, you will benefit from all 4 CPU cores. If you process this project on a quad-core machine, you’ll be finished in one to two months. Additionally, support is included at no additional fee.

2) If you expect future projects to arise, you might want to consider the Server license model, which is an unlimited license per CPU core without page or time limitations. To process your 4 million pages in four months, we’d recommend purchasing an additional CPU core to ensure you meet your deadline. By investing in annual maintenance and support you can reduce the cost of ongoing processing.
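
To compare the two options against a deadline, it can help to work out the throughput the deadline requires. The sketch below is an illustration only: the 30-day months and round-the-clock operation are assumptions, not figures from the license terms, and the function name is hypothetical.

```python
def required_pages_per_hour_per_core(pages: int,
                                     days: int,
                                     hours_per_day: float,
                                     cores: int) -> float:
    """Throughput each core must sustain to finish within the deadline."""
    total_core_hours = days * hours_per_day * cores
    return pages / total_core_hours


if __name__ == "__main__":
    # 4 million pages in four months, assumed to be 120 days of 24/7 operation.
    for cores in (1, 2, 4):
        rate = required_pages_per_hour_per_core(4_000_000, 120, 24, cores)
        print(f"{cores} core(s): {rate:.0f} pages per hour per core")
```

If a test run on your own documents falls short of the required rate for a given number of cores, that is the signal to add cores or extend the processing window.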

Above all, we recommend that you download and install a trial version of the PDF Compressor Enterprise so you can begin testing your documents within your environment to best judge compression rates and processing time.

Complex Folder Structure

Question:
I have a large number of images in a complex folder structure - one folder containing multiple subfolders for each customer. With the PDF Compressor is it possible to compress all images, even files in subfolders, and maintain the folder structure?

Answer: 
Yes, in the PDF Compressor job input settings please enable the “Include subfolders” option. PDF Compressor will compress all images within your input folder, even files in subfolders, and duplicate the subfolder structure of the input folder in the output folder.
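
What “duplicating the subfolder structure” means can be sketched in a few lines. This is not PDF Compressor's implementation, only an illustration of the path mapping; the function name `plan_outputs` and the one-PDF-per-input-file naming are assumptions.

```python
from pathlib import Path


def plan_outputs(input_root, output_root, suffix=".pdf"):
    """Map every file under input_root to a mirrored path under output_root.

    Returns a list of (source, target) pairs; nothing is written to disk.
    """
    input_root = Path(input_root)
    output_root = Path(output_root)
    plan = []
    for src in sorted(input_root.rglob("*")):   # walks all subfolders
        if src.is_file():
            relative = src.relative_to(input_root)
            plan.append((src, (output_root / relative).with_suffix(suffix)))
    return plan


if __name__ == "__main__":
    # Hypothetical folder names; prints the planned mapping for the current tree.
    for source, target in plan_outputs("input", "output"):
        print(source, "->", target)
```

A file such as `input/customer_a/2010/scan1.tif` would be mapped to `output/customer_a/2010/scan1.pdf`, so each customer's subfolders reappear unchanged in the output folder.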

Latest Posts

PDF/A-2
06.12.2010 16:59
DocYard
11.10.2010 14:16
DocYard
11.10.2010 14:13
DocYard the integration platform
21.05.2010 13:49
