Offlining Librivox

Non-reading activities need your help too!
vu2tve
Posts: 4
Joined: March 28th, 2019, 8:01 am

Post by vu2tve » April 4th, 2019, 6:14 am

Hi,

I am a volunteer with the open source Internet-in-a-Box project (https://github.com/iiab/iiab, http://iiab.io). We work with remote communities around the world to provide them with high-quality CC-*/CC0 licensed content in offline/semi-offline settings. Our content collection includes Wikipedia, OpenStreetMap, TED, and many other projects.

I think having the librivox recordings as part of this project would be absolutely wonderful, and would love to volunteer to make that happen.

Any thoughts on where I should begin?

For starters, I'd need:
1. As much metadata as possible about the LibriVox recordings.
2. Download URLs for the recordings themselves.

I could try offlining tools like "wget -drc" and other spiders, but I'd much rather make this a proper workflow, so that it is easy to repeat in the future.

Any pointers would be much appreciated!

Warmly,
Anish

schrm
Posts: 2965
Joined: February 10th, 2018, 11:02 am
Location: Austria

Post by schrm » April 7th, 2019, 9:37 am

hi,

I'm not a dev, but here are two links regarding the LibriVox API:

https://librivox.org/api/info
viewtopic.php?f=24&t=44129&start=195
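
For anyone following along, here is a minimal sketch of how paged requests against that API might be built. The endpoint path and the parameter names (`format`, `limit`, `offset`, `extended`) are my reading of the API info page linked above and should be checked there; the helper names are made up for illustration. It only builds URLs and does not download anything.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against https://librivox.org/api/info
API_BASE = "https://librivox.org/api/feed/audiobooks"


def page_url(offset, limit=50, extended=True):
    """Build the URL for one page of audiobook records (hypothetical helper)."""
    params = {"format": "json", "limit": limit, "offset": offset}
    if extended:
        # Assumption: extended=1 asks for the fuller record (sections, etc.)
        params["extended"] = 1
    return f"{API_BASE}?{urlencode(params)}"


def page_urls(total, limit=50):
    """List the URLs needed to walk `total` records, `limit` at a time."""
    return [page_url(offset, limit) for offset in range(0, total, limit)]


if __name__ == "__main__":
    for url in page_urls(total=150, limit=50):
        print(url)
```

Fetching each URL in turn, with a polite delay between requests, would give a repeatable workflow rather than a blind spider.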

This question would fit better in the help sections of our forum, or in the thread linked above :-)
Or maybe on the project's mailing lists:
http://wiki.laptop.org/go/IIAB/FAQ#How_can_I_help.3F

cheers,

/reader/12275
cc welcome! my skills improve from pl notes that cite when my english pronunciation is way off, or when words are missing.
thx!


en: lay down your arms, essays on art by goethe

de: sammlung prosa, rousseau, hoffmann: sommerfrische

vu2tve
Posts: 4
Joined: March 28th, 2019, 8:01 am

Post by vu2tve » April 8th, 2019, 7:01 pm

Thanks for your reply!

I posted to the help forum. Where can I find the librivox mailing list?

Basquetteur
Posts: 461
Joined: January 23rd, 2016, 1:17 am
Location: Belgium - Bélgica - Belgique- België

Post by Basquetteur » April 10th, 2019, 7:28 am

Hi vu2tve,

Perhaps this thread contains something useful for your purpose, just in case you have not seen it already:

Project recoding?
viewtopic.php?f=22&t=70340

I am not an admin or a senior LibriVox member, so others can certainly give you better directions on this.

Regards

Basquetteur

lethargilistic
Posts: 173
Joined: July 24th, 2018, 3:38 am

Post by lethargilistic » December 12th, 2019, 11:41 am

I understand why you'd want to focus on the LibriVox site directly, but another strategy would be to crawl the pages for each recording on the Internet Archive. Each page would have the metadata for the recordings built into it, as a bonus, and each description of the item will contain a link back to the Librivox page for the recording. Not all of them are recordings, though--I don't know if you'd want the m4b files or the earliest "statement of purpose"-style recordings from the founder.

https://archive.org/details/librivoxaudio
Mike
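
A rough sketch of that Internet Archive route, in case it helps. It assumes the Archive's `advancedsearch.php`, `/metadata/<identifier>` and `/download/<identifier>/<file>` URL patterns; the function names and the sample item are hypothetical. It keeps only `.mp3` files, so the m4b files are skipped, but it cannot tell the early "statement of purpose" recordings apart from ordinary ones.

```python
from urllib.parse import quote, urlencode


def search_url(page=1, rows=100):
    """URL for one page of item identifiers in the librivoxaudio collection."""
    params = [
        ("q", "collection:librivoxaudio"),
        ("fl[]", "identifier"),  # only return the identifier field
        ("rows", rows),
        ("page", page),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


def metadata_url(identifier):
    """URL of the JSON metadata record for a single item."""
    return f"https://archive.org/metadata/{quote(identifier)}"


def mp3_download_urls(identifier, metadata):
    """Pick the .mp3 files out of an item's metadata and build download URLs."""
    urls = []
    for entry in metadata.get("files", []):
        name = entry.get("name", "")
        if name.lower().endswith(".mp3"):
            urls.append(
                f"https://archive.org/download/{quote(identifier)}/{quote(name)}"
            )
    return urls


if __name__ == "__main__":
    # Made-up metadata, shaped like the "files" list in a metadata record.
    sample = {"files": [{"name": "ch01.mp3"}, {"name": "book.m4b"}]}
    print(search_url())
    print(mp3_download_urls("example_item", sample))
```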

niobium
Posts: 269
Joined: August 15th, 2015, 9:49 pm

Post by niobium » June 12th, 2020, 4:50 pm

I'm estimating that LibriVox publishes close to 144 gigabytes of audio per year, based on the number of open projects, the typical number of hours of reading per book, and the rule of thumb that an audio file runs about a minute per megabyte. I suppose you could cram that onto a portable hard drive and mail it easily.
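
As a quick sanity check on that figure, this sketch backs out the listening time it implies. It assumes decimal units (1 GB = 1000 MB) and the same rule of thumb of one megabyte per minute; the function name is made up for illustration.

```python
def implied_hours(gigabytes, mb_per_minute=1.0, mb_per_gb=1000):
    """Hours of audio implied by a size estimate (assumes decimal gigabytes)."""
    minutes = gigabytes * mb_per_gb / mb_per_minute
    return minutes / 60


if __name__ == "__main__":
    # 144 GB at about 1 MB per minute
    print(implied_hours(144))
```

Under those assumptions, 144 GB works out to 144,000 minutes, or 2,400 hours of audio per year.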

speakhub
Posts: 7
Joined: June 19th, 2020, 3:14 am

Post by speakhub » June 30th, 2020, 9:08 am

If this is still relevant, I can help get all the content's metadata in a nice JSON file. The data itself comes from the LibriVox API, but I cleaned and organized it for my project to make LibriVox available over voice assistants (http://audiobookreader.app/).

The JSON file will have all the links you should need to download the recordings. I am not sure of the total file size at the moment, but I'd be keen to also make an offline dump some time in the near future. Let me know if you need help and I'd be happy to collaborate.
