LibriVox
Forums

Posted: January 24th, 2015, 2:46 am

Joined: December 31st, 2011, 7:17 am
Posts: 2030
Location: Tochigi,Japan
Hello again luelusten san,

luelusten wrote:
If I was going to do it your way I would use the option you provided to copy the database based on the returns and then create my own API,

Several developers are already struggling with this method. If you adopt it, you can get lots of advice.
This approach still has lots of problems. We are reporting database problems, and several have already been fixed.
I expect that in the near future we will succeed in cloning the LibriVox database. Then we can start to fix the API.
The API source code is already on GitHub.

Cheers,
Masa

_________________
My recordings Community Audio
My GitHub Another LibriVox Catalog Statistics Sound_check


Posted: January 24th, 2015, 8:32 am

Joined: January 18th, 2015, 8:02 pm
Posts: 11
Location: UK
ekzemplaro wrote:
Hello again luelusten san,

luelusten wrote:
If I was going to do it your way I would use the option you provided to copy the database based on the returns and then create my own API,

Several developers are already struggling with this method. If you adopt it, you can get lots of advice.
This approach still has lots of problems. We are reporting database problems, and several have already been fixed.
I expect that in the near future we will succeed in cloning the LibriVox database. Then we can start to fix the API.
The API source code is already on GitHub.

Cheers,
Masa


I am not able to do it this way; that is what I am getting at. The plugin I am creating is for the LibriVox service, so in all fairness I can't use a third-party service that might not be there. Besides, I see nothing on your page that returns just XML. The API may be limited, but I will have to deal with what LibriVox offers.

I understand the LibriVox API is limited, but I don't have the time or resources to reinvent the wheel, and relying on a third-party service is not the best idea.

_________________
Windows Desktop App Coming Soon


Posted: February 12th, 2015, 10:23 am

Joined: February 12th, 2015, 8:50 am
Posts: 11
Hi everyone, I started writing some code to get information about books using the LibriVox API, and here is what I have found so far:

1. For each book, the following fields are returned by default: id, title, description, url_text_source, language, copyright_year, num_sections, url_rss, url_zip_file, url_project, url_librivox, url_other, totaltime, totaltimesecs, and authors.

2. If the parameter extended=1 is specified, the additional fields url_iarchive, sections, genres, translators are available. url_iarchive is especially important because it provides a link to www.archive.org where more important information is available.

3. You can get information about one book (id=52), about all books after a Unix timestamp (since=123456789), about all books with an exact title, author last name, or genre (title=The Title, author=Lastname, genre=WhichGenre), or about all books starting with that title, author last name, or genre (title=^X, author=^Y, genre=^Z for title starting with X, author starting with Y, genre starting with Z).

4. If many books match, the first 50 are returned by default. This can be changed with the parameters offset and limit, for example offset=100 and limit=20 to get books number 100 to 119 (numbering starts at 0).

5. The books are returned as an array. HOWEVER there is a bug in the .php code when extended=1 is used: The books are returned in a dictionary. The dictionary key is the number of sections in the book. And since it is a dictionary, it can't contain two books with the same number of sections. If two books have 21 sections, then only the second one will be returned.

This can be seen by entering the following URLs:

https://librivox.org/api/feed/audiobooks/?format=json&extended=1&limit=10&fields={id,title}
https://librivox.org/api/feed/audiobooks/?format=json&extended=1&limit=40&fields={id,title}

When you try to download 40 titles at a time, "This Side of Paradise" gets overwritten by "Merry Adventures of Robin Hood". No problem if you want the information for a single ID, but a problem if you want to download in bulk. For example, downloading 40 books only returns 33.
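The loss described above can be reproduced offline. The following is only a minimal sketch of the reported behaviour, not the server code: the book records and the helper name collapse_by_section_count are made up for illustration.

```python
def collapse_by_section_count(books):
    """Mimic the reported behaviour: each book is stored under its num_sections."""
    keyed = {}
    for book in books:
        # A later book with the same section count overwrites the earlier one.
        keyed[book["num_sections"]] = book
    return keyed


# Hypothetical records; only the duplicate section count matters here.
books = [
    {"id": 1, "title": "Book A", "num_sections": 21},
    {"id": 2, "title": "Book B", "num_sections": 10},
    {"id": 3, "title": "Book C", "num_sections": 21},
]
result = collapse_by_section_count(books)
print(len(result), [book["title"] for book in result.values()])
# prints: 2 ['Book C', 'Book B']
```

Three books go in and two come out, which is the same effect as requesting 40 books and receiving 33.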

I haven't written a line of PHP code in my life, but looking at

https://github.com/LibriVox/librivox-public/blob/master/application/libraries/Librivox_API.php

the problem seems to be that the code creating sections information re-uses the variable $key, and changing it to $sectionkey in two places might fix the problem. Would be nice if anyone with access could have a look and maybe fix it.
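The suspected mechanism can be sketched in Python, which shares the relevant behaviour with PHP: a loop variable keeps its last value after the loop ends. This is an illustration of the hypothesis above, not a port of Librivox_API.php; it assumes the outer loop stores each project under the same variable that the inner loop reuses, and the row data and function names are invented.

```python
def build_projects_buggy(rows):
    projects = {}
    for key, row in enumerate(rows):
        project = {"id": row["id"], "sections": [dict(s) for s in row["sections"]]}
        for key, section in enumerate(project["sections"]):  # clobbers the outer key
            project["sections"][key]["readers"] = []  # stand-in for the reader lookup
        projects[key] = project  # key now depends on the section count
    return projects


def build_projects_fixed(rows):
    projects = {}
    for key, row in enumerate(rows):
        project = {"id": row["id"], "sections": [dict(s) for s in row["sections"]]}
        for section_key, section in enumerate(project["sections"]):
            project["sections"][section_key]["readers"] = []
        projects[key] = project  # the outer key is untouched
    return projects


# Hypothetical rows: books 1 and 3 both have two sections.
rows = [
    {"id": 1, "sections": [{"id": 11}, {"id": 12}]},
    {"id": 2, "sections": [{"id": 21}, {"id": 22}, {"id": 23}]},
    {"id": 3, "sections": [{"id": 31}, {"id": 32}]},
]
buggy = build_projects_buggy(rows)
fixed = build_projects_fixed(rows)
print(len(buggy), len(fixed))
```

In this sketch the surviving key is the last section index; which exact value the real code ends up with depends on how the sections array is keyed.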

My workaround to get all the information about all books in bulk is not too difficult: Download the two fields "id" and "num_sections" for all books. Then look for consecutive groups where no two have the same number of sections, and download those groups. Oh well.
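The grouping step of this workaround might look like the sketch below. It assumes the section counts are listed in the same order the API uses for offset and limit, and that this order is the same with and without extended=1; the function name and the max_batch cap are illustrative choices, not part of the API.

```python
def collision_free_batches(section_counts, max_batch=50):
    """Split positions 0..n-1 into consecutive (offset, limit) batches
    in which no two books share a section count."""
    batches = []
    start = 0
    seen = set()
    for position, count in enumerate(section_counts):
        # Close the current batch on a repeated count or when it is full.
        if count in seen or position - start >= max_batch:
            batches.append((start, position - start))
            start = position
            seen = set()
        seen.add(count)
    if section_counts:
        batches.append((start, len(section_counts) - start))
    return batches


print(collision_free_batches([21, 10, 21, 5, 5, 7]))
# prints: [(0, 2), (2, 2), (4, 2)]
```

Each tuple can then be used as offset and limit for one extended=1 request.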


Posted: February 12th, 2015, 10:42 am

Joined: October 14th, 2011, 1:38 pm
Posts: 239
gnasher729 wrote:
If two books have 21 sections, then only the second one will be returned.


Excellent observation! I had noticed the bug, but hadn't thought about the implication.

Thanks for sharing your full set of notes.

Scott

_________________
Cheers,

Scott
Aplt1.com - alternate LibriVox catalog that puts more info up front; optional iOS app


Posted: February 13th, 2015, 5:13 am

Joined: December 31st, 2011, 7:17 am
Posts: 2030
Location: Tochigi,Japan
Hello gnasher729 san,

Good job.
gnasher729 wrote:
https://github.com/LibriVox/librivox-public/blob/master/application/libraries/Librivox_API.php

the problem seems to be that the code creating sections information re-uses the variable $key, and changing it to $sectionkey in two places might fix the problem. Would be nice if anyone with access could have a look and maybe fix it.

Everyone can access the code, as it is on GitHub. The problem is that we don't have the data. As the contents are generated dynamically, we need both code and data to reproduce the problem, fix it, and confirm the fix.
I feel that if we cooperate, we can reconstruct the MySQL database, reproduce the problem, and fix it.

Cheers,
Masa

Posted: February 16th, 2015, 9:08 pm

Joined: February 10th, 2015, 3:11 am
Posts: 5
Hi All,

I posted a query about the LibriVox API earlier. Please see this link:
viewtopic.php?f=23&t=55408

There I got the right link for API-related topics, and I went through the different discussions there.
I got a few of my questions answered, but I still have no idea how to get genre information correctly without passing extended=1. When I pass extended=1, I don't get all the books, as someone mentioned in the discussion:

"The books are returned in a dictionary. The dictionary key is the number of sections in the book. And since it is a dictionary, it can't contain two books with the same number of sections. If two books have 21 sections, then only the second one will be returned."

Without extended=1 I don't get genre information. I can see this question has been posted many times, but there are no answers to it. Kindly guide me on how to get correct genre information without passing extended=1,
something like:
https://librivox.org/api/feed/audiobooks?limit=20&offset=0&format=json&fields=id,url_rss,language,genres


Regards
Yogesh


Posted: February 17th, 2015, 11:12 am

Joined: October 14th, 2011, 1:38 pm
Posts: 239
gnasher729 wrote:
https://librivox.org/api/feed/audiobooks/?format=json&extended=1&limit=40&fields={id,title}

Another alternative: use XML (by dropping 'format=json&'). Not as convenient as JSON, but not a big deal.

Scott

Posted: February 18th, 2015, 8:07 am

Joined: February 12th, 2015, 8:50 am
Posts: 11
I figured out how to distinguish between complete and incomplete books using the LibriVox API: just download the "Extended" data and check for the field "url_iarchive" (which is needed anyway to download thumbnails and the actual .mp3 files). If that field is missing or empty, the book is "In Progress".

However, it seems that I cannot distinguish between books that are actually "In Progress" and books that are "Abandoned". Any idea how to get this from the LibriVox API data?
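As a minimal sketch, the check could be written like this. It assumes each book has already been parsed into a Python dict from the extended data; the function name is invented. As noted above, it cannot tell an abandoned project from one that is genuinely in progress.

```python
def is_in_progress(book):
    """Treat a missing or empty url_iarchive as 'not yet complete'."""
    return not (book.get("url_iarchive") or "").strip()


# Hypothetical records for illustration.
print(is_in_progress({"id": "1", "url_iarchive": "https://archive.org/details/example"}))
print(is_in_progress({"id": "2", "url_iarchive": ""}))
print(is_in_progress({"id": "3"}))
```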


Posted: February 18th, 2015, 8:13 am

Joined: February 12th, 2015, 8:50 am
Posts: 11
ScottLawton wrote:
Another alternative: use XML (by dropping 'format=json&'). Not as convenient as JSON, but not a big deal.

Scott


Well, it doesn't solve the problem. The problem is in the code that collects the data that gets returned; the bug turns the data from an array into a dictionary, and you often have duplicate keys.

The way I solved it: I download all the IDs first, 100 at a time.
Then I download the complete data, 20 at a time. More than 20 is pointless, because the more you download, the more duplicate keys, and the more data is missing. I record all the data, and remove all the IDs of the books that were returned from my list of IDs.
Then I download the missing data by id, one at a time.

(My plan is that an app will contain some version of that data built in, so it doesn't have to download everything, just the changes).
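The three steps above can be sketched as follows. This is only an outline under stated assumptions: fetch_batch and fetch_one are hypothetical callables that would wrap the real HTTP requests, the ids are listed in the same order the API uses for offsets, and the small catalogue below is an offline stand-in that imitates the collision.

```python
def download_all(all_ids, fetch_batch, fetch_one, batch_size=20):
    """Bulk-download extended records, then refetch whatever went missing."""
    collected = {}
    # Step 2: extended data in small batches.
    for offset in range(0, len(all_ids), batch_size):
        for book in fetch_batch(offset, batch_size):
            collected[book["id"]] = book
    # Step 3: anything that never came back is fetched by id.
    missing = [book_id for book_id in all_ids if book_id not in collected]
    for book_id in missing:
        collected[book_id] = fetch_one(book_id)
    return collected, missing


# Offline stand-in for the API; books 1 and 3 share a section count.
CATALOGUE = [
    {"id": 1, "num_sections": 21},
    {"id": 2, "num_sections": 10},
    {"id": 3, "num_sections": 21},
    {"id": 4, "num_sections": 7},
]


def fake_fetch_batch(offset, limit):
    keyed = {}
    for book in CATALOGUE[offset:offset + limit]:
        keyed[book["num_sections"]] = book  # imitates the reported overwrite
    return list(keyed.values())


def fake_fetch_one(book_id):
    return next(book for book in CATALOGUE if book["id"] == book_id)


all_ids = [book["id"] for book in CATALOGUE]  # step 1: ids only
books, refetched = download_all(all_ids, fake_fetch_batch, fake_fetch_one)
print(sorted(books), refetched)
# prints: [1, 2, 3, 4] [1]
```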


Posted: February 18th, 2015, 8:40 am

Joined: October 14th, 2011, 1:38 pm
Posts: 239
ScottLawton wrote:
Another alternative: use XML (by dropping 'format=json&').

gnasher729 wrote:
Well, it doesn't solve the problem. The problem is in the code that collects the data that gets returned; the bug turns the data from an array into a dictionary, and you often have duplicate keys.

Oops, sorry. I just inspected the format that got returned rather than the list of items.

Scott

Posted: February 19th, 2015, 4:09 am

Joined: December 31st, 2011, 7:17 am
Posts: 2030
Location: Tochigi,Japan
Hello gnasher san,
gnasher729 wrote:
However, it seems that I cannot distinguish between books that are actually "In Progress" and books that are "Abandoned". Any idea how to get this from the Librivox API data?

I think we cannot access abandoned projects through the API.
But I could be wrong. Can you list the abandoned projects with their IDs?
Some projects are silent for several months, but they are still treated as in progress.

Cheers,
Masa

Posted: February 19th, 2015, 4:15 am

Joined: December 31st, 2011, 7:17 am
Posts: 2030
Location: Tochigi,Japan
Hello Yogesh san,
Welcome to LibriVox. I hope you enjoy it here.

Just look at the code.
https://github.com/LibriVox/librivox-public/blob/master/application/libraries/Librivox_API.php
Quote:
if ($extended)
{
    $project['url_iarchive'] = $row['url_iarchive'];
    // get sections
    $project['sections'] = $this->_get_sections($row['id']);
    if (!empty($project['sections']))
    {
        foreach ($project['sections'] as $key=>$section)
        {
            $project['sections'][$key]['readers'] = $this->_get_readers($section['id']);
        }
    }
    // get genres
    $project['genres'] = $this->_get_genres($row['id']);
    // get translators
    $project['translators'] = $this->_get_authors($row['id'], 'translator');
}

You need to pass 'extended=1' to get genres.
My solution is to call the API 9300 times and reconstruct the database.
I agree it's crazy.
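A one-request-per-book approach needs little more than a URL per id. The sketch below builds such URLs from the parameters discussed in this thread (id, format, extended). The helper names are invented, and the assumption that each genre entry carries a "name" field should be checked against a real response.

```python
from urllib.parse import urlencode

BASE_URL = "https://librivox.org/api/feed/audiobooks/"


def book_url(book_id, extended=True):
    """Build the request URL for a single book, with genres if extended."""
    params = {"id": book_id, "format": "json"}
    if extended:
        params["extended"] = 1
    return BASE_URL + "?" + urlencode(params)


def genre_names(book):
    """Pull genre names out of one parsed book record.

    Assumes genres is a list of objects with a 'name' field.
    """
    return [genre["name"] for genre in book.get("genres") or []]


print(book_url(52))
# prints: https://librivox.org/api/feed/audiobooks/?id=52&format=json&extended=1
```

Because each request asks for a single id, the duplicate-key problem cannot drop any book, at the cost of one call per book.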

Cheers,
Masa

Posted: February 20th, 2015, 10:56 am

Joined: February 12th, 2015, 8:50 am
Posts: 11
ekzemplaro wrote:
Hello gnasher san,
I think we cannot access abandoned projects through the API.
But I could be wrong. Can you list the abandoned projects with their IDs?
Some projects are silent for several months, but they are still treated as in progress.

Cheers,
Masa

Hi Masa,

The first three books without a link to archive.org are: id 282, "The Time Machine", which is "abandoned"; id 700, "Bleak House (version 2)", which is "on hold"; and id 1299, "Extermination of the American Bison", which is "fully subscribed". One difference is that the first two have no "description" and the third one has, but that must be coincidence; id 6706, "Cuentos de la Selva", is marked as abandoned and has a description.

So I can see no difference between these books in the API data. It doesn't matter too much for my purposes; I think I'm not supposed to use the links to the mp3 files in the LibriVox update area anywhere, so it makes little difference to me whether a book is in progress or abandoned.


Posted: February 20th, 2015, 11:03 am

Joined: February 12th, 2015, 8:50 am
Posts: 11
ekzemplaro wrote:
You need to pass 'extended=1' to get genres.
My solution is to call the API 9300 times and reconstruct the database.
I agree it's crazy.

My solution was first to download just the ids (without extended), then download with extended=1 20 books at a time, keeping track of which are missing, and then download the rest, which is about 1,700. A simpler method would be downloading two at a time, which will either return both books or only the second one. So if you get only one book, you download the first of the two individually. That's about half the API calls and a lot faster.
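The two-at-a-time method could be sketched like this. As before, fetch_batch and fetch_one are hypothetical wrappers around the real requests, and the catalogue is an offline stand-in that imitates the overwrite; the call counter is only there to show the saving.

```python
def download_in_pairs(all_ids, fetch_batch, fetch_one):
    """Request books two at a time; refetch any book the pair request dropped."""
    collected = {}
    calls = 0
    for offset in range(0, len(all_ids), 2):
        calls += 1
        for book in fetch_batch(offset, 2):
            collected[book["id"]] = book
        for book_id in all_ids[offset:offset + 2]:
            if book_id not in collected:
                calls += 1
                collected[book_id] = fetch_one(book_id)
    return collected, calls


# Offline stand-in for the API; books 1 and 2 share a section count.
CATALOGUE = [
    {"id": 1, "num_sections": 21},
    {"id": 2, "num_sections": 21},
    {"id": 3, "num_sections": 10},
    {"id": 4, "num_sections": 7},
]


def fake_fetch_batch(offset, limit):
    keyed = {}
    for book in CATALOGUE[offset:offset + limit]:
        keyed[book["num_sections"]] = book  # imitates the reported overwrite
    return list(keyed.values())


def fake_fetch_one(book_id):
    return next(book for book in CATALOGUE if book["id"] == book_id)


pair_books, pair_calls = download_in_pairs(
    [book["id"] for book in CATALOGUE], fake_fetch_batch, fake_fetch_one
)
print(sorted(pair_books), pair_calls)
# prints: [1, 2, 3, 4] 3
```

Four books are retrieved with three calls here: two pair requests plus one refetch for the colliding pair.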

And I'd recommend that everyone write their code so that it still works if the API gets fixed. I use the JSON API, and with extended=0 I receive an array, but with extended=1 I receive a dictionary with meaningless keys.
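One way to follow this advice is to accept both shapes and ignore the keys. The sketch assumes the parsed JSON response holds the books under a top-level "books" entry; the function name and sample payloads are invented for illustration.

```python
import json


def books_from_payload(payload):
    """Return the books as a list, whether the API sent an array or a dictionary."""
    books = payload.get("books", [])
    if isinstance(books, dict):
        # extended=1 today: a dictionary whose keys carry no meaning.
        return list(books.values())
    return list(books)


# Hypothetical payloads in both shapes.
as_array = json.loads('{"books": [{"id": "52"}, {"id": "53"}]}')
as_dict = json.loads('{"books": {"21": {"id": "52"}, "10": {"id": "53"}}}')
print(books_from_payload(as_array) == books_from_payload(as_dict))
# prints: True
```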


Posted: February 22nd, 2015, 2:30 am

Joined: December 31st, 2011, 7:17 am
Posts: 2030
Location: Tochigi,Japan
Hello gnasher san,
gnasher729 wrote:
id 282, "The Time Machine" is "abandoned".
id 700, "Bleak House (version 2)" is "on hold", and
id 1299, "Extermination of the American Bison" is "fully subscribed". One difference is that the first two have no "description" and the third one has, but that must be coincidence;
id 6706 "Cuentos de la Selva" is marked as abandoned and has a description.

Thank you for this information.
I'll investigate these projects.

Cheers,
Masa
