Offlining Librivox
Posted: April 4th, 2019, 6:14 am
Hi,
I am a volunteer with the open source Internet-in-a-Box project (https://github.com/iiab/iiab, http://iiab.io). We work with remote communities around the world to provide them with high-quality CC-*/CC0-licensed content in offline/semi-offline settings. Our content collection includes Wikipedia, OpenStreetMap, TED, and many other projects.
I think having the LibriVox recordings as part of this project would be absolutely wonderful, and I would love to volunteer to make that happen.
Any thoughts on where I should begin?
For starters, I'd need:
1. As much metadata as possible about the LibriVox recordings.
2. Download URLs for the recordings themselves.
I could try offlining tools like "wget -drc" and other spiders, but I'd much rather make this a proper workflow, so that it is easy to repeat in the future.
Any pointers would be much appreciated!
Warmly,
Anish