PDF questions

Post your questions & get help from friendly LibriVoxers
BellonaTimes
Posts: 3647
Joined: February 15th, 2009, 6:25 pm
Location: Florida

Post by BellonaTimes »

Does anyone know of a way to unlock a public domain PDF file in order to convert it to a .txt file? Is there something like an optical-scan program that can do this? For some reason, Google Books no longer makes .txt versions of PD books available on its site. :(
They call me Threadkiller.
My Catalog Page
RuthieG
Posts: 21957
Joined: April 17th, 2008, 8:41 am
Location: Kent, England

Post by RuthieG »

Are they specially locked PDFs, then? My Acrobat Reader has the option File | Save as Other | 'Text' or 'Word or Excel Online'. I don't know how the latter works (I've never tried it), but saving as Text and then opening the result in OpenOffice works fine.

Ruth
My LV catalogue page | RuthieG's CataBlog of recordings | Tweet: @RuthGolding
Boomcoach
Posts: 1058
Joined: December 29th, 2008, 8:37 am
Location: Bluffton, IN

Post by Boomcoach »

I have a feeling that what you will need is OCR software to convert the scanned images into text. Most of Google Books' offerings are straight image scans of the pages, so the text is not embedded within them.

I haven't used OCR software in over 10 years, so I have no idea what is available now, whether open source or commercial.
Boomcoach
My Catalog Page
My current Solo project A Spoiler of Men by Richard Marsh
One role needed to complete the Dramatic Reading of The Leader by Murray Leinster, help us finish this project!
RuthieG
Posts: 21957
Joined: April 17th, 2008, 8:41 am
Location: Kent, England

Post by RuthieG »

Well, I don't know :? Here is a PDF image scan of a poetry magazine, downloaded, opened in Acrobat Reader and saved via "Save as Other" | Text. Looks OK to me. I use Acrobat Reader DC, 2015 release.

https://librivox.org/uploads/ruthieg/scan.zip

Ruth
My LV catalogue page | RuthieG's CataBlog of recordings | Tweet: @RuthGolding
Availle
LibriVox Admin Team
Posts: 22449
Joined: August 1st, 2009, 11:30 pm

Post by Availle »

BT, if you are looking for a file for your latest group project:

archive.org has a microform download here:
https://archive.org/details/cihm_992063

One of the files appears to be a text file, so it should be possible to extract word counts from there?
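
For anyone who wants to try the word-count idea, here is a minimal sketch in Python, assuming the text file has already been downloaded. The function names (count_words, count_words_in_file) and the file name book.txt are made up for illustration and do not come from archive.org or from this thread. The definition of a "word" is also an assumption: a run of letters or digits, with internal apostrophes or hyphens kept together.

```python
import re
from pathlib import Path

# Assumption: a "word" is a run of letters/digits, optionally joined by
# internal apostrophes or hyphens (so "don't" and "well-read" count once each).
WORD_PATTERN = re.compile(r"[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*")


def count_words(text: str) -> int:
    """Return the number of word-like tokens found in text."""
    return len(WORD_PATTERN.findall(text))


def count_words_in_file(path: str, encoding: str = "utf-8") -> int:
    """Read a plain-text file and count its words.

    errors="replace" lets the read continue past stray bytes, which
    OCR-derived text files sometimes contain.
    """
    text = Path(path).read_text(encoding=encoding, errors="replace")
    return count_words(text)
```

Calling count_words_in_file("book.txt") on a downloaded file (book.txt is a placeholder name) gives a rough total. OCR text usually includes page headers and misread characters, so the figure is an estimate, not an exact count.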
Cheers, Ava.
Resident witch of LibriVox, channelling
Granny Weatherwax: "I ain't Nice."

--
AvailleAudio.com