Author Topic: Convert PDF to HTML with images?  (Read 47 times)

Vesp

  • Administrator
  • Foundress Queen
  • *****
  • Posts: 3,130
Convert PDF to HTML with images?
« on: October 01, 2010, 09:25:34 PM »
All right - I want to turn some PDF documents into HTML pages without losing much of the structure, images, etc.

Any ideas on how I could go about this? What programs are available for it?
« Last Edit: October 02, 2010, 12:30:03 AM by Vesp »
Bitcoin address: 1FVrHdXJBr6Z9uhtiQKy4g7c7yHtGKjyLy

salat

  • Dominant Queen
  • ****
  • Posts: 276
Re: Convert PDF to HTML with images?
« Reply #1 on: October 02, 2010, 12:36:22 AM »
I tried it using Adobe Reader 9.0 and it seems to work OK, even with pictures.
You just open the PDF in Reader, choose Save As, and pick HTML 4.01 from the list of file types.

salat
Salat

Vesp

Re: Convert PDF to HTML with images?
« Reply #2 on: October 02, 2010, 03:14:14 AM »
Could you please explain that in more detail?

I downloaded the latest version on a Windows XP computer and went to the File menu, but it only has the options Save as Text and Save a Copy.

I don't believe saving as text will keep the images...

If you can turn it into HTML with its images and so on, and everything looks good, is there any chance you could convert Practical Organic Chemistry by Arthur I. Vogel for me? I can't figure out how to do it myself.

http://library.sciencemadness.org/library/books/vogel_practical_ochem_3.pdf 
« Last Edit: October 02, 2010, 03:47:51 AM by Vesp »

lugh

  • Global Moderator
  • Foundress Queen
  • *****
  • Posts: 876
Re: Convert PDF to HTML with images?
« Reply #3 on: October 02, 2010, 11:46:04 AM »
Open Office should be able to do what's needed  8)
Chemistry is our Covalent Bond

salat

« Reply #4 on: October 02, 2010, 12:43:59 PM »
I picked Reader from the list when I opened the file because I also have Acrobat. I can OCR PDFs with Acrobat.

I guess it used Acrobat anyway. I'm running it through now; it will take a while. I'll let you know how it turns out this afternoon. I wouldn't mind an HTML version myself.

salat

Vesp

Re: Convert PDF to HTML with images?
« Reply #5 on: October 02, 2010, 05:19:26 PM »
I tried using Adobe Acrobat Pro 9.0 and it didn't preserve the images or the structure properly.
I will try OpenOffice. That was going to be the first thing I tried, but it didn't seem to like the idea of opening the document.

lugh

Re: Convert PDF to HTML with images?
« Reply #6 on: October 02, 2010, 05:40:16 PM »
You need the Sun PDF Import Extension to open and work with portable document files using Open Office  8)

Vesp

Re: Convert PDF to HTML with images?
« Reply #7 on: October 02, 2010, 07:37:16 PM »
All right, I tried the Sun PDF Import Extension on Windows XP and Ubuntu. No luck, it just freezes up.
Well, this is stupid.
« Last Edit: October 02, 2010, 11:50:58 PM by Vesp »

salat

« Reply #8 on: October 02, 2010, 08:17:18 PM »
I'm not particularly happy with it either, but I suspect the problem is with the formatting of the original PDF: when it was OCR'd, the OCR may have run on some of the pictures too. I converted a different PDF (a tutorial on DMT) and it came out beautifully. There were images in the directory after conversion, but they were of things like numbers and letters.

I think you can adjust the output somewhat using the CSS file, but until the converter knows exactly which items are text and which are pictures, I don't think you will get good results.
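
For anyone who wants to repeat the conversion from a script and look at what ends up as text and what ends up as image files, here is a minimal sketch. It assumes Poppler's pdftohtml command-line tool is installed and on the PATH; that tool is not mentioned anywhere in this thread, so treat it as one possible option rather than what was actually used above. The function names and the "vogel" output prefix are made up for illustration.

```python
import shutil
import subprocess


def build_pdftohtml_command(pdf_path, out_prefix, keep_images=True):
    """Return the argument list for a pdftohtml run. Nothing is executed here."""
    # -c        "complex" output, which tries to keep the page layout
    # -noframes write plain HTML instead of a frameset
    cmd = ["pdftohtml", "-c", "-noframes"]
    if not keep_images:
        # -i tells pdftohtml to ignore images
        cmd.append("-i")
    cmd += [str(pdf_path), str(out_prefix)]
    return cmd


def convert(pdf_path, out_prefix, keep_images=True):
    """Run the conversion, failing early if pdftohtml is not installed."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml was not found on PATH")
    cmd = build_pdftohtml_command(pdf_path, out_prefix, keep_images)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical usage with the file linked earlier in the thread,
    # assuming it has already been downloaded to the current directory.
    print(" ".join(build_pdftohtml_command("vogel_practical_ochem_3.pdf", "vogel")))
```

If the source PDF is a scan with a poor OCR layer, a tool like this will probably run into the same text-versus-picture confusion described above, so it is worth trying on a few pages before committing to a whole book.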

salat
« Last Edit: October 02, 2010, 08:51:38 PM by salat »

Wizard X

  • Lord of the Realms
  • Foundress Queen
  • *****
  • Posts: 1,224
Albert Einstein - "Great ideas often receive violent opposition from mediocre minds."

Vesp

Re: Convert PDF to HTML with images?
« Reply #10 on: October 03, 2010, 04:50:43 AM »
Thank you Wizard X, greatly appreciated - I will try these out in a little bit. :)