Re: LibriVox API Discussion Thread

Posted: March 14th, 2014, 8:55 am
by vpanayotov
By the way, it seems the database used by dev.librivox.org is different from the one used by librivox.org.

Compare the 'genres' info returned by https://dev.librivox.org/api/feed/audiobooks/?offset=0&limit=1&extended=1 :

Code:

<genres>
  <genre>
    <id>4</id>
    <name>Classics (Antiquity)</name>
  </genre>
</genres>
vs https://librivox.org/api/feed/audiobooks/?offset=0&limit=1&extended=1 :

Code:

<genres>
  <genre>
    <id>20</id>
    <name>Literary Fiction</name>
  </genre>
  <genre>
    <id>53</id>
    <name>Published 1800 -1900</name>
  </genre>
</genres>
IMHO this may be confusing for new API users. It might be better either to redirect requests sent to the development host to the official one, or to synchronize the two databases.

Posted: March 14th, 2014, 12:34 pm
by Cori
Perhaps we need to consider turning off the dev server API? It was used during our own development; it isn't intended for external developers, and it hasn't been documented or publicised for any use other than the feedback phase of active development -- and it's likely to be very out of date by now. ;)

Posted: March 15th, 2014, 1:53 am
by ekzemplaro
Hello vpanayotov san,

Welcome to LibriVox. I hope you enjoy it here.
And thank you for your feedback.

You are talking about the genre of 'Count of Monte Cristo' (id=47).

At librivox.org the genre is shown as
Genre(s): Literary Fiction, Published 1800 -1900
I checked my database (http://ekzemplaro.org/librivox/statistics/). There it is
Classics (Antiquity)
I suppose the genre was changed during the last 6 months.

The problem with the current API is that there is no way to learn about changes.
Information about already catalogued books gets changed in response to error reports.
We need an API that exposes these changes.
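Until the API reports changes itself, one client-side workaround is to keep two snapshots of the records and compare them. The sketch below is only an illustration: the function name `changed_ids` and the directory layout (one file per project, named `ex_<id>.json`) are assumptions, not part of the LibriVox API.

```shell
#!/bin/sh
# Sketch: list project ids whose saved record differs between two snapshots.
# Assumed (hypothetical) layout: each directory holds one file per project,
# named ex_<id>.json.
changed_ids() (
  old_dir=$1
  new_dir=$2
  for new in "$new_dir"/ex_*.json; do
    [ -e "$new" ] || continue      # glob matched nothing
    name=${new##*/}                # e.g. ex_47.json
    id=${name#ex_}                 # e.g. 47.json
    id=${id%.json}                 # e.g. 47
    # Report the id if the record is new or its bytes differ from the old copy.
    if [ ! -f "$old_dir/$name" ] || ! cmp -s "$old_dir/$name" "$new"; then
      printf '%s\n' "$id"
    fi
  done
)
```

Called as `changed_ids snapshot_old snapshot_new`, it prints one id per line. The comparison is byte for byte, so a record whose fields are merely reordered would also be reported as changed.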

When LibriVox becomes open source, let's develop this API.

I'll soon update the information for book 47 (Count of Monte Cristo) on my site.

Cheers,
Masa

Posted: March 15th, 2014, 12:19 pm
by vpanayotov
Cori wrote:Perhaps we need to consider turning off the dev server API?
In my opinion, at least the URLs at https://dev.librivox.org/public/temp_info/api should be changed to point to the official server, because 3 out of the top 4 results on Google for "librivox api" lead to that page (the first post of this thread is one of these results, BTW).
ekzemplaro wrote:Hello vpanayotov san,

Welcome to LibriVox. I hope you enjoy it here.
Thank you! I am sure I will enjoy it, because LibriVox has a really great community.
ekzemplaro wrote: When LibriVox becomes open source, let's develop this API.
Interesting - do you know when it will be open sourced?

By the way, how are we supposed to iterate over the project records using the API?
I guess one should use the 'offset' and 'limit' parameters and increase the offset in each subsequent request? I noticed that this doesn't work exactly as I would expect, though.
For example, https://librivox.org/api/feed/audiobooks/?offset=15&limit=5&extended=1 returns 4 records (although we are requesting 5), the first being for the project with id '78' and the last for project '83'. Moreover, if we increase the 'offset' by 4 for the next request (https://librivox.org/api/feed/audiobooks/?offset=19&limit=5&extended=1), we again get the info for project 83 in the first position.
If we remove the 'extended=1' option, the request https://librivox.org/api/feed/audiobooks/?offset=15&limit=5 returns 5 records, and the first record is for project '76', which is for some reason omitted in the 'extended' version. I wonder when I should stop iterating, i.e. are we guaranteed that the API will never return a 'false' empty response (0 records) while there are still more projects?
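Given the behaviour described above (short pages, repeated records, no documented end marker), a defensive client could advance the offset by the requested limit, skip ids it has already seen, and stop only after several empty pages in a row. This is a minimal sketch under stated assumptions, not a documented way to use the API: `fetch_ids` and `walk_feed` are hypothetical helpers, `jq` is assumed to be installed, and the JSON layout `.books[].id` is an assumption about the response format.

```shell
#!/bin/sh
# fetch_ids OFFSET LIMIT: print the project ids of one page, one per line.
# Assumptions: jq is available and the JSON response has the shape
# {"books":[{"id":...}, ...]}; adjust the filter if it does not.
fetch_ids() {
  curl -s "https://librivox.org/api/feed/audiobooks/?offset=$1&limit=$2&format=json" |
    jq -r '.books[].id'
}

# walk_feed LIMIT MAX_EMPTY: print every project id once.
# Stops after MAX_EMPTY consecutive empty pages, because a single empty
# page is not known to mean "no more projects".
walk_feed() (
  limit=$1
  max_empty=$2
  offset=0
  empty=0
  seen=' '
  while [ "$empty" -lt "$max_empty" ]; do
    ids=$(fetch_ids "$offset" "$limit")
    if [ -z "$ids" ]; then
      empty=$((empty + 1))
    else
      empty=0
      for id in $ids; do
        case $seen in
          *" $id "*) ;;                          # duplicate from an overlapping page
          *) seen="$seen$id "; echo "$id" ;;     # first time we see this id
        esac
      done
    fi
    offset=$((offset + limit))
  done
)
```

For example, `walk_feed 50 3 > ids.txt` would request pages of 50 and give up after three empty pages in a row. Whether advancing by the limit ever skips records is not known from this thread, so the result should be cross-checked against a scan by id.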

It seems that the API will require some more work, and I was wondering if the raw data that the API is using is available for download somewhere (e.g. in the form of a database dump)?

Best,
Vassil

Posted: March 15th, 2014, 1:31 pm
by bart
vpanayotov wrote:
ekzemplaro wrote: When LibriVox becomes open source, let's develop this API.
Interesting - do you know when it will be open sourced?
I know what Open Source means regarding software, but what does it mean regarding a website?

Bart

Posted: March 15th, 2014, 2:28 pm
by TriciaG
I know what Open Source means regarding software, but what does it mean regarding a website?
The workflow, etc. is basically software. It'll be put on some site somewhere for people to download and look at. The website itself (and the workflow) won't be open to changes. At least, that's my understanding.

Posted: March 15th, 2014, 3:49 pm
by bart
I don't see why people would want to look at our software.
It's the database that's interesting, and you can browse it through our API.

Bart

Posted: March 15th, 2014, 4:28 pm
by annise
It means other people could use the software for their own purposes, just as we use PD wiki and forum software, not that they would be able to change the copy we are actually using.
Free working database software would be handy for many volunteer projects.

Anne

Posted: March 16th, 2014, 1:15 am
by vpanayotov
annise wrote:Free working database software would be handy for many volunteer projects
Indeed, if another project needs a web application for managing some sort of categorized items, they may, at least in theory, take LibriVox's code and use it as the base for their own website. And of course going open source may be beneficial to LibriVox as well. The current thread provides a good example. Apparently the 'new' API has been in development for more than a year and still seems to have some rough edges here and there. If the source code for the script(s) serving these requests were available, maybe some of the people complaining here would be willing to have a look and possibly propose concrete solutions, instead of being (effectively) left at the "mercy" of whoever is developing this. For example, they could send 'patches', and if the developer(s) like them, they could be applied to the actual code running on librivox.org - everyone wins.

BTW does anyone know if someone is still working on the API?

Vassil

Posted: March 16th, 2014, 1:50 am
by annise
We are finding it frustrating too.

Anne

Posted: March 16th, 2014, 4:05 am
by bart
vpanayotov wrote:BTW does anyone know if someone is still working on the API?

Vassil
We stopped development when the money ran out.
We would like to continue (there is more to do than just the API) but we can't.

I don't think looking at the LV software would be beneficial to others. The organisation at LV is rather unique. If open source software is needed, it's better to develop it from scratch, so that the structure is as versatile as possible.

Bart

Posted: March 16th, 2014, 4:13 am
by ekzemplaro
Hello Vassil san,
vpanayotov wrote:ekzemplaro wrote:
When LibriVox becomes open source, let's develop this API.

Interesting - do you know when it will be open sourced?
No, I don't know. It has only been announced that it 'will be open sourced'. No date has been given yet.
vpanayotov wrote:By the way how are we supposed to iterate over the project records using the API?
The following is my method.
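#!/bin/bash
# Fetch the extended JSON record for each project id in a fixed range.
URL_HEAD='https://librivox.org/api/feed/audiobooks/?id='
URL_TAIL='&extended=1&format=json'
for id in {8520..8560}
do
    url="${URL_HEAD}${id}${URL_TAIL}"
    # -k skips TLS certificate verification; quoting keeps '?' and '&' literal
    curl -k "$url" > "ex_${id}.json"
done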
vpanayotov wrote:It seems that the API will require some more work, and I was wondering if the raw data that the API is using is available for download somewhere (e.g. in the form of a database dump)?
I requested a mysqldump. The answer was no. Please see the following thread:
viewtopic.php?p=805040#p805040
I understand the situation, so I have dropped my request for a mysqldump.
bart wrote:I don't see why people would want to look at our software.
It's the database that's interesting, and you can browse it through our API.
Because we are not satisfied with the current API.
I can't synchronize my database with the one at LibriVox.
Book 47 (Count of Monte Cristo) is a good example.
After looking at the code I might find a solution that is not documented.
Or I could suggest how to improve it.
vpanayotov wrote:BTW does anyone know if someone is still working on the API?
I bet nobody is working on it. So we have to wait for it to be open sourced.

Cheers,
Masa

Posted: March 16th, 2014, 5:23 am
by bart
I'm sorry, but 'going open source' was never announced and will never happen.
We do want to extend the API, but only if we find the funds.

Bart

Posted: March 16th, 2014, 5:34 am
by Cori
Open sourcing was a part of the Mellon funding agreement, Bart. But exactly what becomes open source and how/when is another matter. I do think that there might be some projects that would appreciate code to organise a catalogue. Or perhaps even to submit stuff to archive.org (if they're willing to work with the Archive around access and so on.) They're probably few and far between though and I fully agree with you that our code is so tightly linked with what we do and how our workflow has evolved, that it'll probably need considerable work for anyone else to fit it into their own organisation.

Posted: March 16th, 2014, 6:57 am
by vpanayotov
ekzemplaro wrote: No, I don't know. It has only been announced that it 'will be open sourced'. No date has been given yet.
Thank you Masa san!
Unfortunately, judging from the other answers, this may not happen anytime soon, so I guess we will have to work around the current semi-broken implementation of the API.
ekzemplaro wrote: The following is my method.
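#!/bin/bash
# Fetch the extended JSON record for each project id in a fixed range.
URL_HEAD='https://librivox.org/api/feed/audiobooks/?id='
URL_TAIL='&extended=1&format=json'
for id in {8520..8560}
do
    url="${URL_HEAD}${id}${URL_TAIL}"
    # -k skips TLS certificate verification; quoting keeps '?' and '&' literal
    curl -k "$url" > "ex_${id}.json"
done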
I see, thank you - so you are basically iterating over a predefined range of ids, and I guess you are somehow filtering out the '{"error":"Audiobooks could not be found"}' entries later.
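One way to do that filtering after the download is to test each saved file for the error body quoted above. This is only a sketch of the idea; the helper name `keep_found` is hypothetical, and it assumes the error response always contains exactly that message.

```shell
#!/bin/sh
# keep_found FILE...: print the names of files that hold a real record,
# skipping those whose body is the "could not be found" error.
keep_found() {
  for f in "$@"; do
    if ! grep -q '"error"[[:space:]]*:[[:space:]]*"Audiobooks could not be found"' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `keep_found ex_*.json` lists only the downloads worth importing, and the remaining files can then be deleted or ignored.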

Vassil