The cultural memory of the Netherlands
The Dutch National Library puts eight million pages of newsprint from several centuries on the Web using the LuraWave JP2 Image Content Server for access.
Since 2001, the Dutch National Library, also known as the “Koninklijke Bibliotheek” or “KB”, has been digitizing its document collections and making them available online.
Proper format and functional viewing technology have proven particularly important for presenting newspapers on the Internet. The library has selected JPEG 2000 as the standard format for this and uses the LuraWave JP2 Image Content Server (ICS).
The Koninklijke Bibliotheek in The Hague is rich in tradition and has been the national library since 1798. It is one of the largest and most modern libraries in Europe, with importance far beyond the country’s borders. It is also a competence center for digitization and archiving matters and serves as a role model. The Koninklijke Bibliotheek was one of the first to go online with a web site in the early 1990s, when the public first became aware of the significance and potential of the Internet. In 2001, the project dubbed the “memory of the Netherlands” began. The objective of this national initiative, coordinated by the library, was to preserve documents from various institutions that are part of the Dutch cultural heritage by digitizing them and making them publicly accessible on the Internet.
Newspaper articles from 1618 to 1995
One of the first steps, in 2003, was digitizing parliamentary documents from 1814 to 1995. With about 2.3 million scanned pages, this was the first large-scale project and at the same time a dress rehearsal for similar plans in the future. One such plan got under way in 2006: digitizing eight million pages of daily newspapers published from 1618 to 1995. The five-year plan, dubbed “Digital Daily Newspapers” or “DDD” (after its Dutch acronym), will be completed at the end of 2011; later phases of the project are planned to digitize additional newspapers and magazines.
For the library, Project DDD is by far the largest digitizing plan in the “memory of the Netherlands” initiative. “Converting an average of 200,000 pages per month from paper to electronic format and preparing them for presentation on the Web placed completely new demands on our entire organization, as far as personnel is concerned, as well as the creation of suitable IT infrastructures and workflow management,” explained Edwin Klijn, the project manager at the National Library.
The choice of viewer is critical
Given the huge volume of data from nearly eight million pages and the large newspaper format, the library first had to decide which file format and which viewer to use. For search and retrieval of the parliamentary documents, it had been enough to provide the scans as complete PDFs. “Displaying newspapers is trickier,” said Klijn. “First of all because the user must be able to zoom in to examine individual content. Second, it should also be possible to separate particular articles from the news page and store them locally on a user’s data medium.”
The introduction of the LuraWave JP2 Image Content Server enabled the National Library to provide these possibilities to the Dutch people. The software from LuraTech was designed particularly for archives and libraries, and enables high-quality images to be provided on the Web. Internet users require no additional software to search and browse the document collections of the national library. What is particularly noteworthy here is that the individual pages are first scanned at high resolution, then transformed without loss into ISO-compliant JPEG 2000 files.
The Image Content Server from LuraTech: a market-tested solution that offers the necessary investment security
“We took a look at the market and realized that there is not an overabundance of tools for JPEG 2000. LuraTech offers the one with the greatest range of functions,” noted Astrid Verheusen, a program manager in the National Library’s Department for Innovation and Development. She added: “The Image Content Server is a technology which has proven itself on the market and promises security for our investment. Many well-known organizations worldwide use the software, thus there is also a broad community of users with which we can share information as needed. And LuraTech’s support is very good.”
In changing its archiving strategy from TIFF files to JPEG 2000 (ISO 15444) for the newspapers, the National Library has assumed a leading role in Europe and taken an important step toward further establishing the standard, particularly in archives and libraries. Around the world, these institutions face the challenge of archiving their historical collections of information in a lossless manner as “digital originals”. Using JPEG 2000 significantly reduces storage requirements compared to uncompressed TIFF files. Long-term archiving of the newspapers as TIFF files would have required about 650 terabytes of storage space at the library and cost millions in maintenance. With regard to storage capacity, image quality, longevity and functionality, JPEG 2000 was the best among the possible alternatives.
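The storage figures can be put into perspective with a little arithmetic. The following Python sketch derives the average uncompressed size per page from the 650 terabytes and roughly eight million pages quoted in this article. The decimal terabyte and the 2:1 compression ratio are assumptions chosen purely for illustration; the article does not state the ratio the library actually achieved.

```python
# Back-of-the-envelope storage estimate based on the figures in the article.
# Assumption: "terabyte" means 10**12 bytes (decimal units).
TB = 10 ** 12
MB = 10 ** 6


def average_page_size_mb(total_tb: float, pages: int) -> float:
    """Average size of one page in megabytes for a given archive size."""
    return total_tb * TB / pages / MB


def projected_storage_tb(total_tb: float, compression_ratio: float) -> float:
    """Archive size in terabytes after compressing at the given ratio."""
    return total_tb / compression_ratio


if __name__ == "__main__":
    per_page = average_page_size_mb(650, 8_000_000)
    print(f"Uncompressed TIFF: about {per_page:.2f} MB per page")

    # Hypothetical 2:1 ratio, used only to show the calculation.
    print(f"At 2:1: about {projected_storage_tb(650, 2.0):.0f} TB in total")
```

At an average of about 81 MB per uncompressed page, even a modest lossless compression ratio removes hundreds of terabytes from the long-term archive.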
Zooming in on individual articles, extracting and saving them locally
The library’s choice of the Image Content Server made a powerful viewing tool available. Users can use it to zoom in to particular parts of a newspaper page. A special function enables individual articles to be selected and stored locally in JPEG format. “The Image Content Server is easily expandable with functions,” Verheusen explained. “We have programmed various add-ons, for example to highlight or hide texts.”
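Zooming works well with JPEG 2000 because the format stores an image at several resolution levels, each half the width and height of the previous one, so a viewer only needs to decode the level that matches the current zoom. The Python sketch below illustrates that general idea. It is not the Image Content Server's actual logic, and the function name and the default of five levels are hypothetical.

```python
import math


def pick_reduce_level(full_width: int, target_width: int, max_levels: int = 5) -> int:
    """Return how many times the image can be halved while its width
    still covers target_width (0 means: decode at full resolution).

    max_levels is a hypothetical number of resolution levels stored in the file.
    """
    level = 0
    # Width at reduction level r is ceil(full_width / 2**r).
    while level < max_levels and math.ceil(full_width / 2 ** (level + 1)) >= target_width:
        level += 1
    return level


if __name__ == "__main__":
    # A scan 8000 pixels wide shown in a 1000-pixel-wide window:
    print(pick_reduce_level(8000, 1000))
    # Zoomed all the way in, the full resolution is needed:
    print(pick_reduce_level(8000, 8000))
```

For an overview of a whole page the viewer can use a heavily reduced level, and it moves to finer levels as the user zooms in on a single article.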
A number of document scanning service providers in the Netherlands are busy digitizing the newspaper pages and saving them in JPEG 2000 format. For each page, both a high-resolution master file for backup purposes and a smaller file for access and display on the Web are created. Service providers in Germany, Romania, Cambodia and Laos handle further processing. Enabling access to individual articles within the scans requires that they be subdivided into various articles. The National Library has defined four categories to classify the individual parts: advertisements, general news, family announcements and pictures, including captions.
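The subdivision of a page can be pictured as a small data model. The following Python sketch uses the four categories named above; the class names, the fields and the pixel coordinates are hypothetical and do not reflect the library's actual metadata format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Category(Enum):
    """The four categories defined by the National Library."""
    ADVERTISEMENT = "advertisement"
    GENERAL_NEWS = "general news"
    FAMILY_ANNOUNCEMENT = "family announcement"
    PICTURE = "picture, including caption"


@dataclass(frozen=True)
class Region:
    """A rectangle on the scanned page, in pixels (hypothetical layout)."""
    left: int
    top: int
    right: int
    bottom: int


@dataclass(frozen=True)
class Segment:
    """One classified part of a page; it may span several rectangles,
    for example an article that continues in a second column."""
    category: Category
    regions: Tuple[Region, ...]

    def bounding_box(self) -> Region:
        """Smallest rectangle that contains every region of the segment."""
        return Region(
            left=min(r.left for r in self.regions),
            top=min(r.top for r in self.regions),
            right=max(r.right for r in self.regions),
            bottom=max(r.bottom for r in self.regions),
        )


if __name__ == "__main__":
    article = Segment(
        Category.GENERAL_NEWS,
        (Region(100, 200, 600, 1500), Region(620, 200, 1120, 900)),
    )
    print(article.bounding_box())
```

A viewer could use such a bounding box to decide which part of the page image to crop when a user selects an article and saves it locally.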
Collaboration with Google for further book projects
Like other well-respected institutions, the library has chosen to work with Google for plans to digitize its collection of books. This is particularly the case for books published prior to 1900, which are no longer under copyright. The model here differs from the collaboration with scanning partners in Project DDD. Those partners received paid orders as service providers, while Google is producing the scans free of charge and providing them to the library in exchange for using them for its own purposes.
The objective of the Koninklijke Bibliotheek is to make all its digital content available in the future through a centralized Web site. Users still have to go to separate web addresses to search the collections of parliamentary documents, newspapers, magazines, radio broadcasts and soon books. “But it won’t be long until we provide a central portal for this to bring the collections, which have been separate for so long, together with a single interface,” explained Klijn. “This makes sense, because users generally search for content independently of the source.”