Offlining Librivox

vu2tve

Post by vu2tve » April 8th, 2019, 7:00 pm

(This was originally posted here but was moved to the help forum, as was suggested.)

Hi,

I am a volunteer with the open source Internet-in-a-Box project (https://github.com/iiab/iiab, http://iiab.io). We work with remote communities around the world to provide them with high-quality CC-*/CC0 licensed content in offline/semi-offline settings. Our content collection includes Wikipedia, OpenStreetMap, TED, and many other projects.

I think having the LibriVox recordings as part of this project would be absolutely wonderful, and I would love to volunteer to make that happen.

Any thoughts on where I should begin?

For starters, I'd need:
1. As much metadata as possible about the LibriVox recordings.
2. Download URLs for the recordings themselves.

I could try offlining tools like "wget -drc" and other spiders, but I'd much rather build a proper workflow, so that this can easily be repeated in the future.

Any pointers would be much appreciated!

Warmly,
Anish

dlolso21
LibriVox Admin Team
Posts: 4274
Joined: January 11th, 2011, 12:13 pm

Post by dlolso21 » April 9th, 2019, 5:18 pm

Anish,

All of our published recordings are in the Public Domain so you are free to add them to an Internet-in-a-box project.

LibriVox does have a very basic API that will get you some of the data you are requesting. API information is here: https://librivox.org/api/info
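A minimal Python sketch of paging through that API, assuming the JSON feed lives at `https://librivox.org/api/feed/audiobooks/`, accepts `format`, `extended`, `limit` and `offset` parameters, and wraps its results in a `books` key (please double-check all of this against the api/info page). `librivox_page_url`, `parse_books` and `iter_all_books` are hypothetical helper names, and `fetch` is passed in so the paging logic can be tested without touching the network.

```python
import json
from typing import Callable, Dict, Iterator, List
from urllib.parse import urlencode

# Assumed feed endpoint; verify against https://librivox.org/api/info
LIBRIVOX_FEED = "https://librivox.org/api/feed/audiobooks/"


def librivox_page_url(offset: int, limit: int = 50) -> str:
    """Build the URL for one page of the (assumed) audiobook feed."""
    query = urlencode({"format": "json", "extended": 1, "limit": limit, "offset": offset})
    return f"{LIBRIVOX_FEED}?{query}"


def parse_books(body: str) -> List[Dict]:
    """Return the records under the assumed "books" key, or [] if there are none."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return []
    return payload.get("books") or []


def iter_all_books(fetch: Callable[[str], str], limit: int = 50) -> Iterator[Dict]:
    """Yield every book record, one page at a time.

    `fetch` takes a URL and returns the response body as text, so the
    paging logic can be exercised without network access.
    """
    offset = 0
    while True:
        books = parse_books(fetch(librivox_page_url(offset, limit)))
        yield from books
        if len(books) < limit:  # a short or empty page means we reached the end
            return
        offset += limit
```

For a real run, `fetch` could wrap `urllib.request.urlopen` and decode the body as UTF-8; a short pause between requests would be kind to the server.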

All of our files are hosted on Archive.org servers as "The LibriVox Free Audiobook Collection" ( https://archive.org/details/librivoxaudio ).
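Each item in that collection has a JSON metadata record at `https://archive.org/metadata/<identifier>` whose `files` list names every file, and files can be downloaded from `https://archive.org/download/<identifier>/<filename>`. A small sketch that turns such a record into download URLs follows. The function names and the `.mp3` filter are illustrative choices, not part of any official tool; since an item may carry more than one encoding of each chapter, a narrower suffix may be wanted.

```python
import json
from typing import List
from urllib.parse import quote

ARCHIVE = "https://archive.org"


def metadata_url(identifier: str) -> str:
    """URL of an item's JSON metadata record (its "files" list names every file)."""
    return f"{ARCHIVE}/metadata/{quote(identifier)}"


def download_urls(identifier: str, metadata_body: str, suffix: str = ".mp3") -> List[str]:
    """Build a download URL for each file whose name ends with `suffix` (case-insensitive)."""
    files = json.loads(metadata_body).get("files", [])
    return [
        # quote() keeps "/" by default, so files inside subfolders still resolve
        f"{ARCHIVE}/download/{quote(identifier)}/{quote(f['name'])}"
        for f in files
        if f.get("name", "").lower().endswith(suffix.lower())
    ]
```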

Information on how to use the advanced search features on Archive.org can be found here: https://archive.org/advancedsearch.php
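An advanced-search query scoped to `collection:librivoxaudio` can list the item identifiers page by page. This sketch assumes the `fl[]`, `rows`, `page` and `output=json` query parameters and the `response.numFound` / `response.docs` JSON layout that advancedsearch.php produces; `search_url`, `parse_search` and `page_count` are hypothetical helper names.

```python
import json
from typing import Dict, List, Sequence, Tuple
from urllib.parse import urlencode

ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def search_url(page: int, rows: int = 100,
               fields: Sequence[str] = ("identifier", "title", "creator")) -> str:
    """Build one page of an advanced-search query over the LibriVox collection."""
    params = [("q", "collection:librivoxaudio")]
    params += [("fl[]", f) for f in fields]  # one fl[] entry per returned field
    params += [("rows", rows), ("page", page), ("output", "json")]
    return f"{ADVANCED_SEARCH}?{urlencode(params)}"


def parse_search(body: str) -> Tuple[int, List[Dict]]:
    """Return (total number of matches, documents on this page)."""
    response = json.loads(body)["response"]
    return response["numFound"], response["docs"]


def page_count(num_found: int, rows: int) -> int:
    """Number of pages needed to cover num_found results (ceiling division)."""
    return -(-num_found // rows)
```

Pages are 1-based, so a full crawl would request pages `1` through `page_count(num_found, rows)` and feed each identifier to the metadata endpoint.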
