PDF questions

BellonaTimes · Post by **BellonaTimes** » August 23rd, 2015, 6:26 am

Does anyone know of a way to unlock a public domain PDF file in order to convert it to a Txt file? Is there something like an optical-scan program that can do this? For some reason, Google Books no longer makes Txt versions of PD books available on their site.

RuthieG · Post by **RuthieG** » August 23rd, 2015, 6:40 am

Are they specially locked PDFs, then? My Acrobat Reader has the option to File | Save as Other | 'Text' or 'Word or Excel Online'. I don't know how the latter works - never tried it - but saving as Text and then opening in Open Office works fine.

Ruth

Boomcoach · Post by **Boomcoach** » August 23rd, 2015, 7:54 am

I have a feeling that what you will need is OCR software to convert the scanned images into text. Most of Google Books' offerings are straight image scans of the pages, so the text is not embedded within them.

I haven't used OCR software in over 10 years, so I don't have any idea of what is available, open source, or purchasable.
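One way to tell whether a given PDF actually needs OCR is to try pulling out its embedded text first and see how much comes back. The sketch below is only a rough check, not a definitive test: it assumes the Poppler `pdftotext` command-line tool is installed and on the PATH, and the helper names and the 200-character threshold are made up for illustration.

```python
import subprocess


def extract_text_layer(pdf_path: str) -> str:
    """Return the embedded text of a PDF, if any.

    Assumes Poppler's `pdftotext` is installed; "-" sends output to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def needs_ocr(extracted: str, min_chars: int = 200) -> bool:
    """Heuristic: almost no non-whitespace text suggests image-only pages.

    The 200-character cutoff is an arbitrary assumption, not a standard.
    """
    visible = "".join(extracted.split())  # drop spaces, newlines, form feeds
    return len(visible) < min_chars
```

If `needs_ocr(extract_text_layer("scan.pdf"))` comes back `True`, the file is probably page images only and an OCR step would be needed; if it comes back `False`, a plain "save as text" is likely enough.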

RuthieG · Post by **RuthieG** » August 23rd, 2015, 8:30 am

Well, I don't know. Here is a PDF image scan of a poetry magazine, downloaded, opened in Acrobat Reader and "saved as other" text. Looks OK to me. I use Acrobat Reader DC 2015 release.

https://librivox.org/uploads/ruthieg/scan.zip

Ruth

Post by **Availle** » August 23rd, 2015, 8:42 am

BT, if you are looking for a file for your latest group project:

archive.org has a microform download here:
https://archive.org/details/cihm_992063

One of the files appears to be a text file, so it should be possible to extract word counts from there?
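For the word counts, something along these lines might do once the text file is downloaded. It is a minimal sketch using only the Python standard library; the function names and the file name `book.txt` are placeholders, and treating a "word" as a run of letters and apostrophes is an assumption that may not suit every text (hyphenated words get split, for instance).

```python
import re
from collections import Counter


def count_words(text: str) -> Counter:
    """Count words in a string, ignoring case.

    A "word" here is a run of a-z letters and apostrophes (an assumption).
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def count_words_in_file(path: str, encoding: str = "utf-8") -> Counter:
    """Read a text file and count its words; undecodable bytes are replaced."""
    with open(path, encoding=encoding, errors="replace") as f:
        return count_words(f.read())
```

With that, `sum(count_words_in_file("book.txt").values())` gives the total word count, and `.most_common(20)` on the result lists the twenty most frequent words.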