Help:Converting DjVu to PDF

From Wikimedia Commons, the free media repository

This page explains how to convert scanned books from DjVu to PDF.

Converting DjVu to PDF can be useful for the general reasons given in Help:Converting, and also because PDF is more accessible:

  • some people prefer PDF and don't have a DjVu reader installed
  • PDF can be viewed directly in most browsers
  • some sites with online viewers can display PDF but not DjVu
  • PDF is easier to manipulate: there is more software for working with PDF than with DjVu

It is therefore desirable to have PDF versions of worthwhile DjVu books.

Improving the Windows command line

As with other tasks, command line tools are likely to be used for this type of conversion. The standard Windows command shell is adequate for the task, but a few additional features can make it easier to use. See Help:Improving the Windows command shell.

Using ready software solutions

Using GUI software

The STDU Converter utility can convert DjVu to PDF. It has a user-friendly interface and supports the efficient JBIG2 encoding for bitonal (black-and-white) documents.

Using command line tools

DjVuLibre's ddjvu command line tool has an option to convert DjVu to PDF (see documentation):

ddjvu -format=pdf -mode=black input.djvu output.pdf

However, if the document is in color, the resulting PDF is often huge, so some means is needed to get PDF files of a more manageable size. That is why black mode (a black-and-white rendering) was specified in the example above. For bitonal (black-and-white) DjVu files, the resulting PDFs are acceptable in size, but they use Fax (CCITT Group 4) compression. PDF also supports another compression method, JBIG2, which produces output about half the size. Fortunately, there are ways to convert using this more efficient bitonal compression; STDU Converter, mentioned above, supports it.

Using PDF virtual printers

DjVu can be converted to PDF by means of so-called virtual printers. A virtual printer is a piece of software that installs itself as a printer which appears on the list of printers in the Print dialog box. When 'printing' with that printer, the result is saved as a file on your computer.

With a virtual printer, any printable document of any format can be converted to PDF, but the results are not always satisfactory.

Converting manually

Colour documents

To get a coloured PDF of a reasonable size, the following approach may work:

Converting only the layer that bears text

Converting coloured DjVu to PDF directly often produces unnecessarily large output. The DjVu format uses a layer system that separates the text from the background and compresses each layer differently, while a directly converted PDF typically stores each page as a single JPEG-compressed image. During conversion, all the DjVu layers of a page are combined into one image, and JPEG compression almost always performs poorly on that combined image, so the output file often turns out several times larger than the original.

What we can do is extract only the layer that bears the text (and possibly some other meaningful foreground information with it) and make the PDF from that. This approach suits scanned DjVu books with a multi-coloured background left by the scanner, which takes up a lot of space in PDF. The result does not have to look exactly like the original, as long as it remains readable. If the remaining meaningful layer turns out to be bitonal (black-and-white), the more efficient JBIG2 compression can be used. Depending on how the DjVu file was encoded, the meaningful information may be in the foreground, background or mask layer. In some cases, images extracted from the mask layer are much smaller while still representing the text well.

To extract a single layer of each page as images with DjVuLibre's ddjvu utility, try the following commands and see which one produces the best result:

ddjvu -format=tif -eachpage -mode=foreground book.djvu foreground%d.tif

ddjvu -format=tif -eachpage -mode=mask book.djvu mask%d.tif

ddjvu -format=tif -eachpage -mode=background book.djvu background%d.tif
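To compare the candidate layers without extracting the whole book, the same rendering can be run on a single page. The Python sketch below does that and reports the size of each resulting TIFF. It assumes ddjvu is on the PATH and accepts its -page option; the function names are hypothetical. File size is only a hint: open the images and check which one keeps the text readable.

```python
import subprocess
from pathlib import Path

# Candidate layers, as in the commands above
MODES = ("foreground", "mask", "background")


def layer_command(djvu, mode, page, out_dir="."):
    """Build a ddjvu command rendering one layer of one page to a TIFF file."""
    out = Path(out_dir) / f"page{page}_{mode}.tif"
    cmd = ["ddjvu", "-format=tif", f"-page={page}", f"-mode={mode}",
           str(djvu), str(out)]
    return cmd, out


def compare_layers(djvu, page=1, out_dir="."):
    """Render every candidate layer of one page; return {mode: size in bytes}."""
    sizes = {}
    for mode in MODES:
        cmd, out = layer_command(djvu, mode, page, out_dir)
        subprocess.run(cmd, check=True)  # raises CalledProcessError if ddjvu fails
        sizes[mode] = out.stat().st_size
    return sizes
```

For example, compare_layers("book.djvu", page=5) renders page 5 three times and returns a dictionary mapping each mode to its file size.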

Once you have chosen the value for the mode parameter, the conversion can be done like this:

ddjvu -format=pdf -mode=<value> book.djvu book.pdf

Depending on the layout of the original, the result may contain only the text, only the images, or only part of the original content.

For manual conversion, the remaining steps are the same as described above, except that if the images are bitonal, you may want to use the more efficient JBIG2 encoding. To avoid setting up the special encoder it needs, it may be simpler to reassemble the extracted images into a DjVu file and then let STDU Converter turn it into PDF with the appropriate setting.
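If the extracted images are bitonal, DjVuLibre's cjb2 can encode them page by page and djvm -c can bundle the pages into one DjVu file. The Python sketch below assumes both tools are on the PATH and that your cjb2 build accepts bitonal TIFF input (if it does not, convert the images to PBM first); the helper names are hypothetical. The page files are sorted by the number in their names, so that page 10 does not come before page 2.

```python
import subprocess
from pathlib import Path


def page_key(path):
    """Sort key: the page number taken from the digits in the file name."""
    digits = "".join(ch for ch in Path(path).stem if ch.isdigit())
    return int(digits) if digits else 0


def build_djvu_commands(tiffs, output):
    """Return one cjb2 command per page and the final djvm bundling command."""
    pages = sorted((Path(t) for t in tiffs), key=page_key)
    encode, djvu_pages = [], []
    for tif in pages:
        page_djvu = tif.with_suffix(".djvu")
        encode.append(["cjb2", str(tif), str(page_djvu)])
        djvu_pages.append(str(page_djvu))
    bundle = ["djvm", "-c", str(output)] + djvu_pages
    return encode, bundle


def reassemble(pattern, output, folder="."):
    """Encode the images matching pattern and bundle them into output."""
    tiffs = list(Path(folder).glob(pattern))
    encode, bundle = build_djvu_commands(tiffs, output)
    for cmd in encode:
        subprocess.run(cmd, check=True)
    subprocess.run(bundle, check=True)
```

For example, reassemble("mask*.tif", "book-bw.djvu") would bundle the mask-layer images into a new DjVu file, which can then be opened in STDU Converter.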

Black-and-white (bitonal) documents

Unless you need to convert a large number of documents, edit the page images, or use special output settings, STDU Converter is quite sufficient: it can do the conversion in a few clicks, which makes the manual process largely unnecessary. The manual process is the same as for coloured documents, except that if you want to use the more efficient JBIG2 encoding, the images are converted to PDF with a special encoder.
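One such encoder is jbig2enc. The Python sketch below assumes jbig2enc's jbig2 program is on the PATH and that its PDF assembly script is available locally. That script is named jbig2topdf.py in recent versions and pdf.py in older ones, so adjust pdf_script to match. The page images are assumed to have been extracted as bitonal TIFFs beforehand, for example with ddjvu -format=tif -eachpage -mode=black book.djvu page%d.tif. The function names are hypothetical, and the flags follow jbig2enc's usage text, so check them against your version.

```python
import subprocess
import sys
from pathlib import Path


def page_key(path):
    """Sort key: the page number taken from the digits in the file name."""
    digits = "".join(ch for ch in Path(path).stem if ch.isdigit())
    return int(digits) if digits else 0


def jbig2_commands(tiffs, basename="book", pdf_script="jbig2topdf.py"):
    """Build the jbig2enc encoding command and the PDF assembly command."""
    pages = sorted((str(t) for t in tiffs), key=page_key)
    # -s: symbol (text region) coding, -p: PDF-ready output, -b: output base name
    encode = ["jbig2", "-s", "-p", "-b", basename] + pages
    assemble = [sys.executable, pdf_script, basename]
    return encode, assemble


def tiffs_to_jbig2_pdf(tiffs, pdf_path, basename="book", pdf_script="jbig2topdf.py"):
    """Encode the page images with JBIG2 and write the assembled PDF to pdf_path."""
    encode, assemble = jbig2_commands(tiffs, basename, pdf_script)
    subprocess.run(encode, check=True)
    # The assembly script writes the PDF to standard output
    with open(pdf_path, "wb") as out:
        subprocess.run(assemble, stdout=out, check=True)
```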

Using online services

  • Internet Archive automatically converts uploaded DjVu files to PDF, a process it calls derive. Create an item (or add DjVu files to an existing item) and wait; after a while, PDF versions should appear.
  • djvu2pdf.com can convert DjVu to PDF, although it has limitations.
  • Other online services claim to convert DjVu to PDF, but they may have limitations and do not always deliver the expected results. You can find them by searching for convert djvu to pdf online (or the same in your native language).

Checking the result

  • The PDF file should look similar to the original DjVu, be easily readable, and not be excessively large.
  • Inspecting PDF files: PDFXplorer can be used to inspect a PDF file's structure in detail, check which compression methods it uses, and even extract data streams.
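For a quick, rough check without a GUI tool, the Python sketch below counts the compression filters named in a PDF file and compares the PDF's size with the source DjVu. It only scans the raw bytes for filter names rather than parsing the PDF, so treat the counts as an approximation. FlateDecode is also used for non-image streams, so its count is only indicative. The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Filters typically found in converted scans: JBIG2 and CCITT (Fax) for
# bitonal images, DCT (JPEG) and JPX (JPEG 2000) for colour, Flate for general data
FILTERS = ("JBIG2Decode", "CCITTFaxDecode", "DCTDecode", "JPXDecode", "FlateDecode")


def count_filters(pdf_bytes):
    """Count occurrences of each filter name in the raw PDF bytes."""
    counts = Counter()
    for name in FILTERS:
        pattern = rb"/" + name.encode("ascii") + rb"\b"
        counts[name] = len(re.findall(pattern, pdf_bytes))
    return counts


def inspect(pdf_path, djvu_path):
    """Print the filter counts and the PDF size relative to the original DjVu."""
    data = Path(pdf_path).read_bytes()
    for name, n in count_filters(data).items():
        print(f"{name:>15}: {n}")
    ratio = len(data) / Path(djvu_path).stat().st_size
    print(f"PDF is {ratio:.1f} times the size of the DjVu")
```

For a bitonal book converted with JBIG2, inspect("book.pdf", "book.djvu") should show JBIG2Decode entries rather than CCITTFaxDecode or DCTDecode.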

Transferring outlines

Programming challenges

  • The big challenge that remains is to create quality colour PDF files that are not excessively large compared to DjVu. Cuminas claims to have a technology that creates PDFs approaching DjVu in quality and size. A Ruby application called PDFBeads tries to apply the DjVu approach to PDF, but it is hard to get working.
  • Automating the manual conversion described above with scripts.
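As a starting point for such scripts, the Python sketch below converts every DjVu file in a folder with the ddjvu command shown earlier and reports how each PDF's size compares with its original. It assumes ddjvu is on the PATH. The mode argument takes the same values as ddjvu's -mode option, and the function names are hypothetical. Existing PDFs are skipped so that an interrupted run can be resumed.

```python
import subprocess
from pathlib import Path


def pdf_command(djvu, mode="black"):
    """Build the ddjvu command converting one DjVu file to a PDF beside it."""
    djvu = Path(djvu)
    pdf = djvu.with_suffix(".pdf")
    return ["ddjvu", "-format=pdf", f"-mode={mode}", str(djvu), str(pdf)], pdf


def convert_folder(folder, mode="black", overwrite=False):
    """Convert every .djvu file in folder; return the list of PDFs written."""
    written = []
    for djvu in sorted(Path(folder).glob("*.djvu")):
        cmd, pdf = pdf_command(djvu, mode)
        if pdf.exists() and not overwrite:
            continue  # keep the result of an earlier run
        subprocess.run(cmd, check=True)
        ratio = pdf.stat().st_size / djvu.stat().st_size
        print(f"{djvu.name} -> {pdf.name} ({ratio:.1f}x the DjVu size)")
        written.append(pdf)
    return written
```

For example, convert_folder("scans", mode="mask") would apply the mode chosen during the layer comparison to a whole folder of books.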

Leaving requests

If you wish to convert some file at Commons but can't do it yourself, you may leave a request at the Commons requests page.

See also