|
|
| Author |
Message |
java
Consumer
|
| Joined: 07 Feb 2005 |
| Posts: 736 |
| Location: The Mexican Republic |
21794.14 Points
|
|
|
|
 |
CherrieBaby
chouchou
|
| Joined: 01 Mar 2005 |
| Posts: 67 |
|
3070.02 Points
|
|
|
Tue Apr 05, 2005 9:30 pm |
|
|
Converting books to DjVu. This used to be important but is less so now because PDF version 6.0 is more economical with filesize and the Acrobat PDF reader loads into the browser much faster.
I feel ambivalent about encouraging people to convert to DjVu because: a) many people have broadband, so large file sizes aren't a problem to download; b) storage is cheap. Big hard disks are really cheap now and you can get a DVD burner for about $60, with a pack of 100 x 4.7 GB discs costing about $40. That's $100, about the cost of a night on the town. Newbies tend to skimp on file quality and think that if the software gives them the option to reduce file size then that must be a good thing. Wrong! I'm scanning my new articles at 600 dpi, which would have been considered very extravagant a few years ago. Uncompressed, 600 dpi needs 4 times the storage space of 300 dpi and 16 times the storage space of 150 dpi. I still think it's worth it for the improvement in quality it gives. Do I want to encourage people to downgrade these files to 72 dpi and 'save' space? Not really!
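To put rough numbers on the resolution trade-off, here is a minimal Python sketch. It assumes an uncompressed US-letter page (8.5 x 11 in) scanned as 1-bit black & white and a nominal 4.7 GB disc; the function name and page size are illustrative assumptions, and real files will be smaller once compressed.

```python
def uncompressed_bytes(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=1):
    """Raw (uncompressed) size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel / 8


DVD_BYTES = 4.7e9  # nominal capacity of one 4.7 GB disc (assumed decimal GB)

for dpi in (150, 300, 600):
    size = uncompressed_bytes(dpi)
    ratio = size / uncompressed_bytes(150)
    print(f"{dpi} dpi: {size / 1e6:.2f} MB per page, {ratio:.0f}x the 150 dpi size")

# How many raw 600 dpi black & white pages fit on one disc.
pages_per_disc = int(DVD_BYTES // uncompressed_bytes(600))
print(f"Raw 600 dpi pages per disc: {pages_per_disc}")
```

Doubling the dpi doubles the pixel count in both directions, which is where the factor of 4 (and 16 for two doublings) comes from.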
If you're careful you can save a lot of space by other means. First, OCR the image and convert it to text. This saves a massive amount of space, and the advantage is that the file can then be searched using keywords. However, when done properly, it needs a lot of work proofing. Second, scan the article in black & white only (this setting is called 'Text/Line Art' in ABBYY). I know it sounds dumb, but I've seen masses of books & articles scanned in colour or grey-scale when the person could've used black & white, i.e. they were scanning something that was originally black text on white paper!
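A rough sketch of why the scan mode matters so much, before any compression is applied. The bit depths are the usual ones (1 bit for black & white, 8 for grey-scale, 24 for colour); the 300 dpi letter-size page and the function name are assumptions for illustration only.

```python
# Bits stored per pixel for each scan mode (uncompressed).
BITS_PER_PIXEL = {"black & white": 1, "grey-scale": 8, "colour": 24}


def raw_page_bytes(mode, dpi=300, width_in=8.5, height_in=11.0):
    """Uncompressed size in bytes of one page scanned in the given mode."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


baseline = raw_page_bytes("black & white")
for mode in BITS_PER_PIXEL:
    size = raw_page_bytes(mode)
    print(f"{mode}: {size / 1e6:.1f} MB raw, {size // baseline}x black & white")
```

Compression changes the absolute numbers a lot, but a page of black text on white paper starts out 8 to 24 times larger than it needs to be when scanned in grey-scale or colour.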
Any2DjVu will do a decent job of converting PDF to DjVu but is limited in the file size it can handle. Anything over about 2 MB is dodgy and may or may not work. Document Express is available but only seems to be able to either scan (from a scanner) or join image files (TIFF, GIF, JPEG, etc.). There's a crack for Document Express but I'm not sure it works with the latest trial download.
You can export a PDF from the full version of Acrobat as a series of images - it takes ages. These images can be imported into Document Express but that's a fucking pain if your PDF file has text not images!
Yo. I just did this and I 'reduced' the file size from 190 KB (PDF) to 238 KB (DjVu)!! I repeated the process with an image-only file and reduced a PDF of 2390 KB down to a 481 KB DjVu, but the image was downgraded significantly.
If your PDF is small, use Any2DjVu:
http://any2djvu.djvuzone.org/
This worked much better. It managed to reduce the first file size down to 171 Kb in less than 2 minutes. I never bothered to check it with the second, big, file because, in my experience, it would've taken too long.
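For comparison, the three conversions reported here work out as follows. This is just the arithmetic on the sizes quoted in this post; the labels and the function name are mine.

```python
def percent_change(before_kb, after_kb):
    """Size change as a percentage of the original (negative means smaller)."""
    return (after_kb - before_kb) / before_kb * 100


# (label, original PDF size in KB, resulting DjVu size in KB)
results = [
    ("text PDF, image-export route", 190, 238),
    ("image-only PDF, image-export route", 2390, 481),
    ("text PDF, Any2DjVu", 190, 171),
]

for label, before, after in results:
    change = percent_change(before, after)
    print(f"{label}: {before} KB -> {after} KB ({change:+.1f}%)")
```

So the text PDF grew by about a quarter via the image-export route and shrank by a tenth via Any2DjVu, while the image-only PDF shrank by roughly four fifths at a visible cost in quality.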
Desired software is:
- Acrobat (full version). I've not used version 7. I have versions 5 & 6 both installed. They offer slightly different features.
- ABBYY FineReader (ver. 7). You can use it with or without a scanner. It converts images or PDFs to text.
- Document Express 4.XX (LizardTech).
- PDF Writer (creates PDFs from MS Word)
- Any2DjVu (free web-site).
If I was buying another scanner I'd get a 2nd hand one with a sheet feeder or one with fast scanning when using 600 dpi optical resolution. |
|
|
 |
java
Consumer
|
| Joined: 07 Feb 2005 |
| Posts: 736 |
| Location: The Mexican Republic |
21794.14 Points
|
|
|
Re: Favor Acrobat for its cut and paste features
Tue Apr 05, 2005 10:01 pm |
|
|
| CherrieBaby, a very good summary. Although I like DjVu, I really enjoy the cut and paste features that Adobe Acrobat provides, something that DjVu lacks. When documenting a research paper I often just want to quote some paragraphs without re-typing, and cut and paste makes that easier, especially with e-books, where diagrams can be snapshotted in Acrobat and converted to JPEGs for use in the forum or in research documents. java |
|
|
 |
Polverone
|
| Joined: 12 Feb 2005 |
| Posts: 28 |
|
846.64 Points
|
|
|
filesizes
Wed Apr 06, 2005 8:39 am |
|
|
For most things you'll find in older books, plain black and white is the best. Newer books, or ones that otherwise have continuous-tone images in them, will look very bad and compress poorly if they are scanned/stored as black and white images. The best solution is to segment the page into regions, so areas which need full color/shades of gray get them and the rest are stored as black and white. DjVu has always offered this capability, and I believe that Acrobat 7.0 provides tools for it also. I scan at 600 DPI, but always distribute at 300. The extra resolution is useful for OCR but I don't think the extra file size is worthwhile when it comes to viewing/printing.
An uncompressed 600 DPI image will be four times as large as an uncompressed 300 DPI image, but compression narrows this gap considerably. With JBIG2, the best bitonal image compressor for textual page images in PDF, you can actually generate larger files with images that are too low-res, because the compressor can't make sense of shapes. |
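The scan-at-600, distribute-at-300 workflow amounts to halving the resolution before distribution. Here is a minimal pure-Python sketch of that step for a bitonal image, assuming the page is a list of rows of 0/1 pixels (1 = black) and using a simple 2x2 majority rule. The function name and the threshold are illustrative assumptions; real tools use better resampling filters.

```python
def halve_resolution(bitmap, threshold=2):
    """Downsample a bitonal bitmap by 2 in each direction.

    Each 2x2 block becomes one pixel, black (1) when at least
    `threshold` of its four pixels are black. Odd trailing rows
    or columns are dropped.
    """
    height = len(bitmap) - len(bitmap) % 2
    width = len(bitmap[0]) - len(bitmap[0]) % 2 if bitmap else 0
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            black = (bitmap[y][x] + bitmap[y][x + 1]
                     + bitmap[y + 1][x] + bitmap[y + 1][x + 1])
            row.append(1 if black >= threshold else 0)
        out.append(row)
    return out


# A tiny stand-in for a 600 DPI page: 4x4 pixels.
page_600 = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
page_300 = halve_resolution(page_600)
print(page_300)  # a quarter as many pixels as the input
```

Note how the isolated black pixel in the lower right disappears: fine detail that helped OCR at the higher resolution is exactly what is lost in the distributed copy.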
|
|
 |
|
|
|
Wed Apr 06, 2005 3:51 pm |
|
|
A good, efficient way can be to scan, make a PDF, then output to plain text using OCR.
I have been finding that even PDFs can be heavy on size.
It's a shame there is not a program that can convert the output text to an HTML booklet, with an index and the like.
Now that would be space saving; then compress the book.
Polverone,
Are you aware of software that can turn a forum into an HTML index?
syn |
|
|
 |
brain
Linguist Extraordinaire
|
| Joined: 08 Mar 2005 |
| Posts: 143 |
|
2405.16 Points
|
|
|
Sun Apr 17, 2005 11:45 pm |
|
|
| About scanning: maybe not books but TLC. Do you know how to convert a normal scanner into a TLC scanner, working at a different light wavelength? I saw it somewhere, but lost it. |
|
|
 |
MiNdBaBY
|
| Joined: 16 Mar 2005 |
| Posts: 40 |
|
2093.46 Points
|
|
|
Going digital..
Tue Apr 26, 2005 10:49 pm |
|
|
I have a few suggestions from experience..
If you seriously want to digitize (PDF, DjVu, OCR, whatever) hardcopies (books, papers, etc.) you will have to spend a pocketful of change, which hinders most.
A) Software is obtainable online at no cost through resourcefulness and UTFSE, or file sharing.
B) You MUST purchase an ADF (automatic document feeder) scanner. There is not a large variety or selection of these. You'd best stick with the leader, ol' faithful, HP.
[I still use and for the most part like my 6350C. With my 6350Cs (I bought two, and new, which makes me look like a fool as you'll get to in a second. Read on.) it wasn't too far into the scanner's life that it failed. It was a few months and {I have no clue of the exact count today} not even a few thousand pages scanned before the rubber roller(s) that feed/pick up the sheet died ('glazed' would be more scientifically correct).]
[The solution MiNdBaBY dreamt up and turned into reality: xylene, applied to the rubber with a Q-tip to rejuvenate it. This is the most inexpensive, over-the-counter solution and would work for printers as well.]
C) You MUST purchase a decent copy machine. I highly recommend Canon copiers. Depending on your ability to bargain shop (i.e. online auctions), you'll spend at least $100 {used} and up to $500 {brand new}, depending on brand, model, and condition.
On ebay.com you could search for "HP 6350C" and would find three for sale with buy it now prices of $80 & 2@$50, wow, that's a deal..
When using an ADF you need to feed it nothing but stacks of standard letter-size paper if you want the computer & scanner to do all the work, rather than laboring over the scanner yourself.
Whenever it comes to a book, I first photocopy the book. Why? Because a copier can print the sheet quicker than I can flip the book's pages and keep pressing the green button.
Aside from books specifically, ADFs also like to jam on rips, tears, and crinkles, so you want to make clean copies of any such papers you plan to scan.
When copying books {paperbacks being the easiest, allowing 2 pages per letter sheet, though depending on the paperback's size you may have to reduce/enlarge to achieve this} you will want to use the copier's reduce/enlarge functions to center and/or size your book pages how you want them on the copies.
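Picking the reduce/enlarge setting is simple arithmetic. A minimal sketch, assuming the open book is treated as two pages side by side on a landscape letter sheet (11 x 8.5 in) and ignoring the copier's margins; the function name and the example page sizes are hypothetical.

```python
def copier_scale_percent(page_w_in, page_h_in, sheet_w_in=11.0, sheet_h_in=8.5):
    """Largest whole-percent zoom that fits a two-page spread on one sheet.

    The open book is treated as two pages side by side on a landscape
    sheet. Copier margins are ignored.
    """
    spread_w = 2 * page_w_in
    scale = min(sheet_w_in / spread_w, sheet_h_in / page_h_in)
    return int(scale * 100)  # round down so the spread still fits


# Hypothetical page sizes in inches, for illustration only.
print(copier_scale_percent(4.25, 6.875))  # small paperback: can be enlarged
print(copier_scale_percent(6.0, 9.0))     # larger book: must be reduced
```

Anything above 100 means the spread can be enlarged to fill the sheet; anything below 100 means it has to be reduced to fit two pages per sheet. In practice, knock a few percent off to leave room for margins.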
And the alternative of cutting up/destroying books to scan them in the ADF results in two issues: ADF jams, and more time/effort spent de-binding the book than copying it would have taken; not to mention the damaged book diminishing in value.
And last but not least I want to stress that going digital is awesome for a few reasons:
*With DVDs in your pocket you can take every piece of paper/book of yours with you, quickly, easily, and anywhere.
*You can encrypt, making it difficult for others to easily or accidentally view your personal documents.
I'd say two words sum up being digital: convenience & safety in one's life. |
|
|
 |
|
|
|
|