LibriVox API Discussion Thread

ekzemplaro
Posts: 2027
Joined: December 31st, 2011, 7:17 am
Location: Tochigi,Japan

Post by ekzemplaro »

Hello again luelusten san,
luelusten wrote:If I was going to do it your way I would use the option you provided to copy the database based on the returns and then create my own API,

Several developers are already wrestling with this method. If you take this route, you can get plenty of advice.
This approach still has many problems. We have been reporting database problems, and several have already been fixed.
I expect that in the near future we will succeed in cloning the LibriVox database. Then we can start fixing the API.
The API source code is already on GitHub.

Cheers,
Masa
luelusten
Posts: 11
Joined: January 18th, 2015, 8:02 pm
Location: UK

Post by luelusten »

ekzemplaro wrote:Hello again luelusten san,
luelusten wrote:If I was going to do it your way I would use the option you provided to copy the database based on the returns and then create my own API,

Several developers are already wrestling with this method. If you take this route, you can get plenty of advice.
This approach still has many problems. We have been reporting database problems, and several have already been fixed.
I expect that in the near future we will succeed in cloning the LibriVox database. Then we can start fixing the API.
The API source code is already on GitHub.

Cheers,
Masa
I am not able to do it this way; that is what I am getting at. The plugin I am creating is for the LibriVox service, so in all fairness I can't use a third-party service that might not be there. I also see nothing on your page that returns just XML. So the API may be limited, but I will have to deal with what LibriVox offers.

I understand the LibriVox API is limited, but I don't have the time or resources to reinvent the wheel, and relying on a third-party service is not the best idea.
Windows Desktop App Coming Soon
gnasher729
Posts: 11
Joined: February 12th, 2015, 8:50 am

Post by gnasher729 »

Hi everyone, I started writing some code to get information about books using the LibriVox API, and here is what I have found so far:

1. For each book, the following fields are returned by default: id, title, description, url_text_source, language, copyright_year, num_sections, url_rss, url_zip_file, url_project, url_librivox, url_other, totaltime, totaltimesecs, and authors.

2. If the parameter extended=1 is specified, the additional fields url_iarchive, sections, genres, translators are available. url_iarchive is especially important because it provides a link to www.archive.org where more important information is available.

3. You can get information about one book (id=52), about all books after a unix time stamp (since=123456789), about all books with an exact title, author last name, or genre (title=The Title, author=Lastname, genre=WhichGenre) or all books starting with that title, author last name, or genre (title=^X, author=^Y, genre=^Z for title starting with X, author starting with Y, genre starting with Z).

4. If many books match, the first 50 are returned by default. This can be changed with the parameters offset and limit, for example offset=100 and limit=20 to get books number 100 to 119 (numbering starts at 0).

5. The books are returned as an array. HOWEVER there is a bug in the .php code when extended=1 is used: The books are returned in a dictionary. The dictionary key is the number of sections in the book. And since it is a dictionary, it can't contain two books with the same number of sections. If two books have 21 sections, then only the second one will be returned.

This can be seen by entering the following URLs:

https://librivox.org/api/feed/audiobooks/?format=json&extended=1&limit=10&fields={id,title}
https://librivox.org/api/feed/audiobooks/?format=json&extended=1&limit=40&fields={id,title}

When you try to download 40 titles at a time, "This Side of Paradise" gets overwritten by "Merry Adventures of Robin Hood". No problem if you want the information for a single ID, but a problem if you want to download in bulk. For example, downloading 40 books only returns 33.
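For anyone who wants to script these requests, here is a minimal sketch of how the parameters from the numbered list could be assembled into a URL. The function name build_query_url and its keyword arguments are my own invention, not part of the API; only the parameter names (format, extended, offset, limit, fields, id, since, title, author, genre) come from the notes above.

```python
from urllib.parse import urlencode

BASE_URL = "https://librivox.org/api/feed/audiobooks/"


def build_query_url(extended=False, offset=0, limit=50, fields=None, **filters):
    """Assemble a feed URL from the parameters described in this post.

    `filters` carries the optional selectors, e.g. id=52, since=123456789,
    title="^X", author="Lastname" or genre="WhichGenre".
    """
    params = {"format": "json", "offset": offset, "limit": limit}
    if extended:
        params["extended"] = 1
    params.update(filters)
    query = urlencode(params)
    if fields:
        # The example URLs in this thread pass fields as {id,title}, so the
        # braces and commas are appended unescaped to match that form.
        query += "&fields={" + ",".join(fields) + "}"
    return BASE_URL + "?" + query


if __name__ == "__main__":
    print(build_query_url(extended=True, limit=10, fields=["id", "title"]))
    print(build_query_url(title="^X"))
```

Note that urlencode percent-encodes the ^ prefix; I am assuming the server decodes it like any other query string.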

I haven't written a line of php code in my life, but looking at

https://github.com/LibriVox/librivox-public/blob/master/application/libraries/Librivox_API.php

the problem seems to be that the code creating sections information re-uses the variable $key, and changing it to $sectionkey in two places might fix the problem. Would be nice if anyone with access could have a look and maybe fix it.
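To show what I mean, here is a small stand-alone simulation. It is not the LibriVox code: it assumes (my guess from reading the .php file) that an outer loop stores each book under a variable called key and that the inner sections loop reuses the same name. Python loop variables leak in the same way as PHP ones, so the effect is easy to reproduce. The names collect_buggy and collect_fixed are hypothetical.

```python
def collect_buggy(rows):
    """Mimic the suspected bug: the inner loop overwrites the outer `key`."""
    projects = {}
    for key, row in enumerate(rows):
        project = {"id": row["id"], "sections": list(row["sections"])}
        for key, section in enumerate(project["sections"]):  # clobbers `key`
            pass  # the real code attaches the readers to each section here
        # `key` now depends on how many sections the book has, so two books
        # with the same section count land in the same slot.
        projects[key] = project
    return projects


def collect_fixed(rows):
    """Same loop with the inner variable renamed, as suggested above."""
    projects = {}
    for key, row in enumerate(rows):
        project = {"id": row["id"], "sections": list(row["sections"])}
        for section_key, section in enumerate(project["sections"]):
            pass
        projects[key] = project
    return projects


if __name__ == "__main__":
    rows = [
        {"id": 1, "sections": ["s"] * 21},
        {"id": 2, "sections": ["s"] * 5},
        {"id": 3, "sections": ["s"] * 21},
    ]
    print(len(collect_buggy(rows)), "books survive with the shared variable")
    print(len(collect_fixed(rows)), "books survive after the rename")
```

In this simulation the first 21-section book is overwritten by the second one, which matches what I see in the API output.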

My workaround to get all the information about all books in bulk is not too difficult: Download the two fields "id" and "num_sections" for all books. Then look for consecutive groups where no two have the same number of sections, and download those groups. Oh well.
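The grouping step of that workaround might look roughly like this. It is only a sketch: collision_free_groups is a made-up name, and it assumes you already have the num_sections values in the same order the API returns the books.

```python
def collision_free_groups(section_counts):
    """Split books (in API order) into consecutive (offset, limit) groups
    in which no two books share the same number of sections."""
    groups = []
    start = 0
    seen = set()
    for index, count in enumerate(section_counts):
        if count in seen:
            # This book would overwrite an earlier one, so close the group
            # just before it and start a new group here.
            groups.append((start, index - start))
            start = index
            seen = set()
        seen.add(count)
    if section_counts:
        groups.append((start, len(section_counts) - start))
    return groups


if __name__ == "__main__":
    for offset, limit in collision_free_groups([21, 5, 21, 7, 7]):
        print("offset=%d limit=%d" % (offset, limit))
```

Each (offset, limit) pair can then be requested with extended=1 without losing a book, at least as long as the bug behaves the way I described.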
ScottLawton
Posts: 243
Joined: October 14th, 2011, 1:38 pm

Post by ScottLawton »

gnasher729 wrote:If two books have 21 sections, then only the second one will be returned.
Excellent observation! I had noticed the bug, but hadn't thought about the implication.

Thanks for sharing your full set of notes.

Scott
Cheers,

Scott
Aplt1.com - alternate LibriVox catalog that puts more info up front; optional iOS app
ekzemplaro
Posts: 2027
Joined: December 31st, 2011, 7:17 am
Location: Tochigi,Japan

Post by ekzemplaro »

Hello gnasher729 san,

Good job.
gnasher729 wrote:https://github.com/LibriVox/librivox-public/blob/master/application/libraries/Librivox_API.php

the problem seems to be that the code creating sections information re-uses the variable $key, and changing it to $sectionkey in two places might fix the problem. Would be nice if anyone with access could have a look and maybe fix it.
Everyone can access the code, as it is on GitHub. The problem is that we don't have the data. As the contents are generated dynamically, we need both the code and the data to reproduce the problem, fix it, and confirm the fix.
I feel that if we cooperate, we can reconstruct the MySQL database, reproduce the problem, and fix it.

Cheers,
Masa
ybora
Posts: 5
Joined: February 10th, 2015, 3:11 am

Post by ybora »

Hi All,

I posted my query about the LibriVox API earlier. Please see the link:
viewtopic.php?f=23&t=55408

There I got the right link for API-related topics, and I went through the different discussions.
I got a few of my questions answered, but I still have no idea how to get genre information correctly without passing extended=1. When I pass extended=1 I don't get all the books, as someone mentioned in the discussion:

"The books are returned in a dictionary. The dictionary key is the number of sections in the book. And since it is a dictionary, it can't contain two books with the same number of sections. If two books have 21 sections, then only the second one will be returned. "

Without extended=1 I don't get genre information. I can see this question has been posted many times, but there are no answers to it. Kindly guide me on how to get correct genre information without passing extended=1,
something like:
https://librivox.org/api/feed/audiobooks?limit=20&offset=0&format=json&fields=id,url_rss,language,genres


Regards
Yogesh
ScottLawton
Posts: 243
Joined: October 14th, 2011, 1:38 pm

Post by ScottLawton »

Another alternative: use XML (by dropping 'format=json&'). Not as convenient as JSON, but not a big deal.

Scott
gnasher729
Posts: 11
Joined: February 12th, 2015, 8:50 am

Post by gnasher729 »

I figured out how to distinguish between complete and incomplete books using the LibriVox API: just download the "extended" data and check for the field "url_iarchive" (which is needed anyway to download thumbnails and the actual .mp3 files). If that field isn't there or is empty, then the book is "In Progress".

However, it seems that I cannot distinguish between books that are actually "In Progress" and books that are "Abandoned". Any idea how to get this from the LibriVox API data?
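The check itself is tiny. A sketch, assuming each book has already been decoded into a dictionary; has_archive_link is a name I made up:

```python
def has_archive_link(book):
    """Return True when the extended record carries a non-empty url_iarchive."""
    url = book.get("url_iarchive")
    return isinstance(url, str) and url.strip() != ""


if __name__ == "__main__":
    finished = {"id": "1", "url_iarchive": "http://www.archive.org/details/example"}
    unfinished = {"id": "2", "url_iarchive": ""}
    print(has_archive_link(finished), has_archive_link(unfinished))
```

A False result only says the book is not finished; it cannot tell "In Progress" from "Abandoned".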
gnasher729
Posts: 11
Joined: February 12th, 2015, 8:50 am

Post by gnasher729 »

ScottLawton wrote:Another alternative: use XML (by dropping 'format=json&'). Not as convenient as JSON, but not a big deal.

Scott
Well, it doesn't solve the problem. The problem is in the code that collects the data that gets returned; the bug turns the data from an array into a dictionary, and you often have duplicate keys.

The way I solved it: I download all the IDs first, 100 at a time.
Then I download the complete data, 20 at a time. More than 20 is pointless, because the more you download, the more duplicate keys, and the more data is missing. I record all the data, and remove all the IDs of the books that were returned from my list of IDs.
Then I download the missing data by id, one at a time.

(My plan is that an app will contain some version of that data built in, so it doesn't have to download everything, just the changes).
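In outline, the three steps look like the sketch below. The three fetch functions are placeholders that you would implement against the API yourself (they are not real LibriVox or library calls), and the page sizes of 100 and 20 are simply the values I happen to use.

```python
def download_all(fetch_ids, fetch_extended, fetch_by_id, id_page=100, data_page=20):
    """Download every book despite the lossy extended=1 responses.

    fetch_ids(offset, limit)      -> list of book IDs (no extended data)
    fetch_extended(offset, limit) -> list of book dicts, possibly incomplete
    fetch_by_id(book_id)          -> one complete book dict
    """
    # Step 1: collect all IDs, id_page at a time, until a page comes back empty.
    all_ids = []
    offset = 0
    while True:
        page = fetch_ids(offset, id_page)
        if not page:
            break
        all_ids.extend(page)
        offset += id_page

    # Step 2: download the extended data in small pages and record what arrives.
    books = {}
    for offset in range(0, len(all_ids), data_page):
        for book in fetch_extended(offset, data_page):
            books[book["id"]] = book

    # Step 3: fetch whatever went missing, one ID at a time.
    for book_id in all_ids:
        if book_id not in books:
            books[book_id] = fetch_by_id(book_id)
    return books
```

Because step 3 only runs for IDs that are actually missing, the same code keeps working (with no single-ID requests at all) if the server bug is ever fixed.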
ScottLawton
Posts: 243
Joined: October 14th, 2011, 1:38 pm

Post by ScottLawton »

ScottLawton wrote:Another alternative: use XML (by dropping 'format=json&').
gnasher729 wrote:Well, it doesn't solve the problem. The problem is in the code that collects the data that gets returned; the bug turns the data from an array into a dictionary, and you often have duplicate keys.
Oops, sorry. I just inspected the format that got returned rather than the list of items.

Scott
ekzemplaro
Posts: 2027
Joined: December 31st, 2011, 7:17 am
Location: Tochigi,Japan

Post by ekzemplaro »

Hello gnasher san,
gnasher729 wrote:However, it seems that I cannot distinguish between books that are actually "In Progress" and books that are "Abandoned". Any idea how to get this from the Librivox API data?
I think we cannot access abandoned projects through the API.
But I could be wrong. Can you list the abandoned projects with their IDs?
Some projects have been silent for several months, but they are still treated as In Progress.

Cheers,
Masa
ekzemplaro
Posts: 2027
Joined: December 31st, 2011, 7:17 am
Location: Tochigi,Japan

Post by ekzemplaro »

Hello Yogesh san,
Welcome to LibriVox. I hope you enjoy it here.

Just look at the code.
https://github.com/LibriVox/librivox-public/blob/master/application/libraries/Librivox_API.php
if ($extended)
{
    $project['url_iarchive'] = $row['url_iarchive'];
    // get sections
    $project['sections'] = $this->_get_sections($row['id']);
    if (!empty($project['sections']))
    {
        foreach ($project['sections'] as $key=>$section)
        {
            $project['sections'][$key]['readers'] = $this->_get_readers($section['id']);
        }
    }
    // get genres
    $project['genres'] = $this->_get_genres($row['id']);
    // get translators
    $project['translators'] = $this->_get_authors($row['id'], 'translator');
}
You need to pass 'extended=1' to get genres.
My solution is to call the API 9300 times and reconstruct the database.
I agree it's crazy.

Cheers,
Masa
gnasher729
Posts: 11
Joined: February 12th, 2015, 8:50 am

Post by gnasher729 »

ekzemplaro wrote:Hello gnasher san,
I think we cannot access abandoned projects through the API.
But I could be wrong. Can you list the abandoned projects with their IDs?
Some projects have been silent for several months, but they are still treated as In Progress.

Cheers,
Masa
Hi Masa,

The first three books without a link to archive.org are: id 282, "The Time Machine", which is "abandoned"; id 700, "Bleak House (version 2)", which is "on hold"; and id 1299, "Extermination of the American Bison", which is "fully subscribed". One difference is that the first two have no "description" and the third one does, but that is a coincidence; id 6706, "Cuentos de la Selva", is marked as abandoned and has a description.

So I can see no difference between these books in the API data. It doesn't matter too much for my purposes; I think I'm not supposed to use the links for the mp3 files in the LibriVox upload area anywhere, so for my purposes it doesn't make much difference whether a book is in progress or abandoned.
gnasher729
Posts: 11
Joined: February 12th, 2015, 8:50 am

Post by gnasher729 »

ekzemplaro wrote:You need to pass 'extended=1' to get genres.
My solution is to call API 9300 times, and reconstruct the data base.
I agree it's crazy.
My solution was first to download just the ids (without extended), then download with extended=1 20 books at a time, keeping track of which are missing, and then download the rest, which is about 1,700. A simpler method would be downloading two at a time, which will either return both books or only the second one. So if you get only one book, you download the first of the two individually. That's about half the API calls and a lot faster.
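The two-at-a-time idea could be sketched as follows. As before, fetch_extended and fetch_by_id are placeholders you would write yourself, and the sketch assumes the ids list is in the same order the API uses for offset.

```python
def download_in_pairs(ids, fetch_extended, fetch_by_id):
    """Request books two at a time and refetch any book the pair request lost.

    fetch_extended(offset, limit) -> list of book dicts, possibly incomplete
    fetch_by_id(book_id)          -> one complete book dict
    """
    books = {}
    for offset in range(0, len(ids), 2):
        pair = ids[offset:offset + 2]
        for book in fetch_extended(offset, 2):
            books[book["id"]] = book
        # With the current bug only the first book of a pair can go missing,
        # but checking both keeps the code correct if the behaviour changes.
        for book_id in pair:
            if book_id not in books:
                books[book_id] = fetch_by_id(book_id)
    return books
```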

And I'd recommend to everyone to write their code so that it still works if the API gets fixed. I use the JSON API, and with extended=0 I receive an array, but with extended=1 I receive a dictionary with meaningless keys.
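One way to be robust against a future fix is to normalise whatever comes back before using it. A minimal sketch, where books_as_list is a hypothetical helper that receives the already-decoded collection of books:

```python
def books_as_list(books):
    """Accept either shape the JSON API currently produces and return a list.

    extended=0 yields an array (list); extended=1 currently yields a
    dictionary whose keys carry no meaning, so only its values are kept.
    """
    if isinstance(books, dict):
        return list(books.values())
    if isinstance(books, list):
        return books
    raise TypeError("unexpected books container: %r" % type(books))


if __name__ == "__main__":
    print(books_as_list([{"id": "1"}]))
    print(books_as_list({"21": {"id": "2"}, "5": {"id": "3"}}))
```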
ekzemplaro
Posts: 2027
Joined: December 31st, 2011, 7:17 am
Location: Tochigi,Japan

Post by ekzemplaro »

Hello gnasher san,
gnasher729 wrote:id 282, "The Time Machine" is "abandoned".
id 700, "Bleak House (version 2)" is "on hold", and
id 1299, "Extermination of the American Bison" is "fully subscribed". One difference is that the first two have no "description" and the third one does, but that is a coincidence;
id 6706 "Cuentos de la Selva" is marked as abandoned and has a description.
Thank you for this information.
I'll investigate these projects.

Cheers,
Masa