
View Full Version : Anybody know how to edit djvu files?

April 25th, 2004, 04:52 PM
I want to create PDF files with text-under-image OCR’d results. The images to be OCR’d are digital photographs and are kept quite large to give good OCR accuracy; we are talking 300-400 KB each. Once the images have been made into PDF pages there is no way to reduce their size to the same extent I can with other image editing software. I can cut the images down to a tenth or less of their original size, but then I can’t OCR those.

As far as I know there is no way to remove the text layer from a PDF document and import it into a new PDF document (in this case one made from low resolution images).

DJVU files on the other hand can export their text layer as an XML document. I could still get good OCR accuracy from the larger images, export the text, and then create a new DJVU file made of much smaller sized images, and import the XML text.
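
One catch with the workflow above: the word boxes in the exported text layer are in pixel coordinates of the large images, so they would need rescaling before being attached to the smaller ones. Below is a minimal sketch of that step only. It assumes the text layer has been dumped in the parenthesised form `(word xmin ymin xmax ymax "text")` that DjVuLibre's `djvused` prints, not the XML export mentioned above; `scale_text_layer` is a made-up helper name, and the scale factor is whatever ratio the images were shrunk by.

```python
import re

# Zone keywords assumed to appear in a djvused-style hidden-text dump.
_ZONE = r"page|column|region|para|line|word|char"

# Match either a quoted string (left untouched, so text that happens to
# look like a zone header is not rewritten) or a zone header with its
# four integer coordinates.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'
    r"|\((" + _ZONE + r")\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)"
)


def scale_text_layer(sexpr: str, factor: float) -> str:
    """Multiply every zone's bounding box by `factor`, rounding to integers."""

    def repl(m):
        if m.group(1) is None:
            # Quoted string: return it unchanged.
            return m.group(0)
        coords = [round(int(m.group(i)) * factor) for i in range(2, 6)]
        return "(%s %d %d %d %d" % (m.group(1), *coords)

    return _TOKEN.sub(repl, sexpr)


if __name__ == "__main__":
    dump = '(page 0 0 2000 3000 (word 100 200 400 260 "Hello"))'
    # Images shrunk to half their width and height.
    print(scale_text_layer(dump, 0.5))
```

Whether the rescaled dump can then be loaded back onto the small-image file (for example with `djvused`'s `set-txt` command) is untested here; treat it as a starting point rather than a finished tool.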

The trouble is I can’t seem to find any software that enables you to create or edit DJVU files. There is something called Document Express (desktop, pro, and enterprise editions) that might do the trick, but I can’t find any, eh, let us say free copies. I think only the enterprise edition lets you export and import text as XML, although I could be wrong on this.

Are there any freeware appz out there that let you create and edit DJVU files in this way? A DJVU to PDF converter would also be handy.

Naturally, if anyone knows how to extract an OCR’d text layer from one PDF document and import it into another, I would like to know.

April 25th, 2004, 08:16 PM
I can upload Document Express to the FTP. Just give me a few days.

April 26th, 2004, 03:57 PM
I was wondering about DjVu as a more secure means of distributing my DVD, on the presumption that it would be more difficult, if not impossible, to make a stripped copy of it, as can be done with PDF.

Is it possible to edit a DjVu document? I've tried doing screenshot/OCR of DjVu files and the results are (desirably) crappy. :) If there is an editor, will it work on un-OCR'd files? I could always convert my file pages into pictures, then make a DjVu file of that, defeating OCR?

Basically, I'm looking for a more secure alternative to PDF, to prevent k3wls from possibly ripping my file. :mad:

April 26th, 2004, 08:08 PM
"I'm looking for a more secure alternative to PDF, to prevent k3wls from possibly ripping my file."
What about FOLIO?

The excellent LSS+ ( Locks, Safes, and Security) Electronic Infobase Edition is based on the Folio database software.
Some crackers are working on it, but so far there is no complete crack for Folio.

http://www.security.org and http://www.security.org/lss-tour/UNLOCK.HTM



Do you have a source for the government edition of LSS+?
I'd like to watch these movies: http://www.security.org/lss-tour/lssdsk2.htm :)

April 27th, 2004, 04:43 PM
What good does having the government edition do you if you don't have the video files?

April 28th, 2004, 07:39 PM
I was thinking of the complete "government" version on 10 CD-ROMs ;)