LibriVox API Discussion Thread

Post by **vpanayotov** » March 14th, 2014, 8:55 am

By the way, it seems the database used by dev.librivox.org is different from the one used by librivox.org.

Compare the 'genres' info returned by https://dev.librivox.org/api/feed/audiobooks/?offset=0&limit=1&extended=1 :

<genres>
  <genre>
    <id>4</id>
    <name>Classics (Antiquity)</name>
  </genre>
</genres>

vs https://librivox.org/api/feed/audiobooks/?offset=0&limit=1&extended=1 :

<genres>
  <genre>
    <id>20</id>
    <name>Literary Fiction</name>
  </genre>
  <genre>
    <id>53</id>
    <name>Published 1800 -1900</name>
  </genre>
</genres>

IMHO this may be confusing for new API users. It might be better either to redirect requests for the development host to the official one, or to synchronize the databases.
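To reproduce this comparison from the command line, a minimal sketch could pull just the genre names out of each host's XML response. It assumes, from the excerpts above only, that every genre name is a `<name>` element inside `<genres>`; `extract_genres` is a hypothetical helper, not part of the API.

```shell
#!/bin/bash
# Print the genre names found inside the <genres> element of an XML
# response read from stdin, one per line.
# Records are split on "<", so pretty-printing does not matter.
# Assumption: genre names are <name> elements, as in the excerpts above.
extract_genres() {
    awk 'BEGIN { RS = "<" }
         /^genres>/            { in_genres = 1 }
         /^\/genres>/          { in_genres = 0 }
         in_genres && /^name>/ { sub(/^name>/, ""); print }'
}
```

In bash, the two hosts could then be compared with `diff <(curl -s 'https://dev.librivox.org/api/feed/audiobooks/?offset=0&limit=1&extended=1' | extract_genres) <(curl -s 'https://librivox.org/api/feed/audiobooks/?offset=0&limit=1&extended=1' | extract_genres)`, assuming the dev server is still answering. Elements outside `<genres>`, such as an author's `<first_name>`, are ignored.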

Post by **Cori** » March 14th, 2014, 12:34 pm

Perhaps we need to consider turning off the dev server API? It was used during our own development, it's not intended for external developers to use, hasn't been documented or publicised for any use other than when we were in the feedback phase of active development -- and it's likely to be very out of date by now.

Post by **ekzemplaro** » March 15th, 2014, 1:53 am

Hello vpanayotov san,

Welcome to LibriVox. I hope you enjoy it here.
And thank you for your feedback.

You are talking about the genre of 'Count of Monte Cristo' (id=47).

On librivox.org the genre is shown as

Genre(s): Literary Fiction, Published 1800 -1900

I checked my database (http://ekzemplaro.org/librivox/statistics/). There it is

Classics (Antiquity)

I suppose the genre was changed during the last 6 months.

The problem with the current API is that there is no way to find out about changes.
According to error reports, information about books that are already catalogued does get changed.
We need an API that reports these changes.
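Until the API can report changes itself, one rough workaround is to re-download the records now and then and compare them with the copies already stored. The sketch below assumes a hypothetical layout of one `ex_<id>.json` file per project (the naming used by the download script later in this thread) in an old and a new directory; `changed_ids` is an illustrative name. Any field that differs on every request would make every record look changed, and projects that disappear are not reported.

```shell
#!/bin/bash
# Print the ids of records that are new, or whose content differs,
# between two snapshot directories of ex_<id>.json files.
# Hypothetical layout: $1 = previous download, $2 = fresh download.
changed_ids() {
    local old=$1 new=$2 f id
    for f in "$new"/ex_*.json; do
        [ -e "$f" ] || continue          # glob matched nothing
        id=${f##*/ex_}                   # strip directory and prefix
        id=${id%.json}                   # strip extension
        if [ ! -e "$old/ex_$id.json" ] || ! cmp -s "$old/ex_$id.json" "$f"; then
            echo "$id"
        fi
    done
}
```

For example, `changed_ids snapshot_old snapshot_new` would list the ids worth re-importing into a local database.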

When LibriVox becomes open source, let's develop this API.

I'll soon update the information for book 47 (Count of Monte Cristo) on my site.

Cheers,
Masa

Post by **vpanayotov** » March 15th, 2014, 12:19 pm

Cori wrote: Perhaps we need to consider turning off the dev server API?

In my opinion, at least the URLs at https://dev.librivox.org/public/temp_info/api should be changed to point to the official server, because 3 of the top 4 Google results for "librivox api" lead to that page (the first post of this thread is one of these results, BTW).

ekzemplaro wrote: Hello vpanayotov san,

Welcome to LibriVox. I hope you enjoy it here.

Thank you! I am sure I will enjoy it, because LibriVox has a really great community.

ekzemplaro wrote: When LibriVox becomes open source, let's develop this API.

Interesting - do you know when it will be open sourced?

By the way, how are we supposed to iterate over the project records using the API?
I guess one should use the 'offset' and 'limit' parameters and increase the offset in each subsequent request? I noticed that this doesn't work exactly as I would expect, though.
For example, https://librivox.org/api/feed/audiobooks/?offset=15&limit=5&extended=1 returns only 4 records (although we request 5), the first being for project '78' and the last for project '83'. Moreover, if we increase the 'offset' by 4 (https://librivox.org/api/feed/audiobooks/?offset=19&limit=5&extended=1), the next request again returns the info for project 83 in the first position.
If we remove the 'extended=1' option, the request https://librivox.org/api/feed/audiobooks/?offset=15&limit=5 returns 5 records, and the first one is for project '76', which for some reason is omitted from the 'extended' version. I also wonder when I should stop iterating: are we guaranteed that the API will never return a 'false' empty response (0 records) while there are still more projects?
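Given the behaviour described above, one defensive way to page through the feed is to advance `offset` by the full `limit` on every request, drop ids that were already seen, and stop only after several consecutive empty pages. This is a sketch under assumptions the thread does not confirm: that the JSON response looks like `{"books":[{"id":...},...]}` and that `jq` is installed. `fetch_ids` and `paginate` are hypothetical names, and no amount of client-side care can recover a record the API silently omits, such as project 76 in the `extended=1` example.

```shell
#!/bin/bash
# Sketch only: walk the audiobooks feed with offset/limit.
# Assumptions: JSON shaped like {"books":[{"id":...},...]}; jq installed.
FEED='https://librivox.org/api/feed/audiobooks/'

# Print the project ids on one page, one per line (nothing if the page is empty).
fetch_ids() {
    curl -s "${FEED}?offset=$1&limit=$2&format=json" | jq -r '.books[]?.id'
}

# Print each distinct id once. The offset always advances by the full limit,
# and the loop ends after max_empty consecutive empty pages.
paginate() {
    local limit=${1:-50} max_empty=${2:-3}
    local offset=0 empty=0 page
    while [ "$empty" -lt "$max_empty" ]; do
        page=$(fetch_ids "$offset" "$limit")
        if [ -n "$page" ]; then
            printf '%s\n' "$page"
            empty=0
        else
            empty=$((empty + 1))
        fi
        offset=$((offset + limit))
    done | awk '!seen[$0]++'             # drop ids already printed
}
```

For example, `paginate 50 3 > ids.txt` would collect the ids while tolerating up to two empty pages in a row. Advancing by the full `limit` assumes the offset counts underlying rows rather than returned records; the duplicate filter covers overlaps such as the repeated project 83 either way.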

It seems that the API will require some more work, and I was wondering if the raw data that the API is using is available for download somewhere (e.g. in the form of a database dump)?

Best,
Vassil

Post by **bart** » March 15th, 2014, 1:31 pm

vpanayotov wrote:
ekzemplaro wrote: When LibriVox becomes open source, let's develop this API.
Interesting - do you know when it will be open sourced?

I know what Open Source means with regard to software, but what does it mean for a website?

Bart

Post by **TriciaG** » March 15th, 2014, 2:28 pm

bart wrote: I know what Open Source means with regard to software, but what does it mean for a website?

The workflow, etc. is basically software. It'll be put on some site somewhere for people to download and look at. The website itself (and the workflow) won't be open to changes. At least, that's my understanding.

Post by **bart** » March 15th, 2014, 3:49 pm

I don't see why people would want to look at our software.
It's the database that's interesting, and you can browse it through our api.

Bart

Post by **annise** » March 15th, 2014, 4:28 pm

It means other people could use the software for their own purposes, just as we use PD wiki and forum software, not that they would be able to change the set we are actually using.
Free working database software would be handy for many volunteer projects.

Anne

Post by **vpanayotov** » March 16th, 2014, 1:15 am

annise wrote: Free working database software would be handy for many volunteer projects

Indeed, if another project needs a web application for managing some sort of categorized items, they could, at least in theory, take LibriVox's code and use it as the basis for their own website. And of course going open source may be beneficial to LibriVox as well. The current thread provides a good example. Apparently the 'new' API has been in development for more than a year and still has some rough edges here and there. If the source code for the script(s) serving these requests were available, some of the people complaining here might be willing to have a look and propose concrete solutions, instead of being (effectively) left at the mercy of whoever is developing it. For example, they could send patches, and if the developer(s) liked them, the patches could be applied to the code actually running on librivox.org - everyone wins.

BTW does anyone know if someone is still working on the API?

Vassil

Post by **annise** » March 16th, 2014, 1:50 am

We are finding it frustrating too.

Anne

Post by **bart** » March 16th, 2014, 4:05 am

vpanayotov wrote: BTW does anyone know if someone is still working on the API?

Vassil

We stopped development when the money ran out.
We would like to continue (there is more to do than just the API), but we can't.

I don't think looking at the LV software would be beneficial to others. The organisation at LV is rather unique. If open source software is needed, it's better to develop it from scratch, so that the structure is as versatile as possible.

Bart

Post by **ekzemplaro** » March 16th, 2014, 4:13 am

Hello Vassil san,

vpanayotov wrote:
ekzemplaro wrote: When LibriVox becomes open source, let's develop this API.
Interesting - do you know when it will be open sourced?

No, I don't know. It has only been announced that it 'will be open sourced'; no date has been given yet.

vpanayotov wrote: By the way how are we supposed to iterate over the project records using the API?

The following is my method.

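#!/bin/bash
# Download the extended JSON record for every project id in a fixed range,
# saving each one as ex_<id>.json.
URL_HEAD='https://librivox.org/api/feed/audiobooks/?id='
URL_TAIL='&extended=1&format=json'
for id in {8520..8560}
do
    url="${URL_HEAD}${id}${URL_TAIL}"
    # Quote the URL so the shell cannot glob-expand the '?' in it.
    # -k skips TLS certificate verification.
    curl -k "$url" > "ex_${id}.json"
done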

vpanayotov wrote: It seems that the API will require some more work, and I was wondering if the raw data that the API is using is available for download somewhere (e.g. in the form of a database dump)?

I asked for a mysqldump, and the answer was no. Please see the following thread:
viewtopic.php?p=805040#p805040

I understand the situation.
So I have dropped my request for a mysqldump.

bart wrote: I don't see why people would want to look at our software.
It's the database that's interesting, and you can browse it through our api.

We are not satisfied with the current API.
I can't synchronize my database with the one at LibriVox.
Book 47 (Count of Monte Cristo) is a good example.
After looking at the code I might find a solution that is not documented,
or I could suggest how to improve it.

vpanayotov wrote: BTW does anyone know if someone is still working on the API?

I bet nobody is working on it, so we have to wait for it to be open sourced.

Cheers,
Masa

Post by **bart** » March 16th, 2014, 5:23 am

I'm sorry, but 'going open source' was never announced and will never happen.
We do want to extend the api, but only if we find the funds.

Bart

Post by **Cori** » March 16th, 2014, 5:34 am

Open sourcing was a part of the Mellon funding agreement, Bart. But exactly what becomes open source and how/when is another matter. I do think that there might be some projects that would appreciate code to organise a catalogue. Or perhaps even to submit stuff to archive.org (if they're willing to work with the Archive around access and so on). They're probably few and far between, though, and I fully agree with you that our code is so tightly linked with what we do and how our workflow has evolved that it'll probably need considerable work for anyone else to fit it into their own organisation.

Post by **vpanayotov** » March 16th, 2014, 6:57 am

ekzemplaro wrote: No, I don't know. Only 'will be open sourced' is announced. The date is still not given.

Thank you Masa san!
Unfortunately, judging from the other answers, this may not happen anytime soon, so I guess we will have to work around the current, semi-broken implementation of the API.

ekzemplaro wrote: The following is my method.

I see, thank you - so you are basically iterating over a predefined range of ids, and I guess you are somehow filtering out the '{"error":"Audiobooks could not be found"}' entries later.
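Masa doesn't say how the failed lookups are handled, so this is only one possible approach: once the download loop has finished, delete every `ex_<id>.json` file that contains the error reply quoted above. `remove_not_found` is a hypothetical helper, and matching on that exact string assumes the API always words the error this way.

```shell
#!/bin/bash
# Delete downloaded ex_<id>.json files that hold the API's "not found" reply.
# Assumption: the reply contains exactly the string quoted in this thread.
remove_not_found() {
    local dir=${1:-.} f
    for f in "$dir"/ex_*.json; do
        [ -e "$f" ] || continue          # glob matched nothing
        if grep -qF '{"error":"Audiobooks could not be found"}' "$f"; then
            rm -- "$f"
        fi
    done
}
```

Running `remove_not_found .` in the download directory would leave only the files for ids that actually exist.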

Vassil