LibriVox API Discussion Thread

Isana
Posts: 276
Joined: December 2nd, 2013, 12:46 pm
Location: USA

Post by Isana » March 17th, 2014, 3:05 am

For third-party developers, there is also the Internet Archive JSON API. LibriVox has its own collection at IA (librivoxaudio), which makes it easy to find LibriVox uploads. You can use IA's advanced search for queries and choose which fields you want returned and the sorting order. IA has more resources, and its servers are also probably better able to handle queries than LibriVox servers.
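
A minimal sketch of such a query, assuming the public advanced search endpoint at archive.org with its usual `q`, `fl[]`, `sort[]`, `rows` and `output` parameters. The helper name `ia_search_url` and the default sort order are illustrative choices, not something from IA's documentation, and the script only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumed endpoint for IA's advanced search.
IA_ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def ia_search_url(query, fields, sort="downloads desc", rows=50):
    """Build an advanced-search URL restricted to the librivoxaudio collection."""
    params = [("q", f"({query}) AND collection:librivoxaudio")]
    # One fl[] entry per field that should come back in the results.
    params += [("fl[]", field) for field in fields]
    params += [("sort[]", sort), ("rows", str(rows)), ("output", "json")]
    return f"{IA_ADVANCED_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(ia_search_url('title:"A Tale of Two Cities"',
                        ["identifier", "title", "creator"]))
```

Fetching the printed URL should return a JSON document listing the matching items, which a third-party app could then page through with a larger `rows` value or a `page` parameter.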

ekzemplaro
Posts: 2030
Joined: December 31st, 2011, 7:17 am
Location: Tochigi,Japan

Post by ekzemplaro » March 17th, 2014, 5:33 am

Hello Vassil san,
vpanayotov wrote:Unfortunately, judging from the other answers this may not happen anytime soon, so I guess we will have to work around the current semi-broken implementation of the API.
My advice is to just use the basic API. The search APIs don't work well.
vpanayotov wrote:so you are basically iterating over a predefined range of ids, and I guess you are somehow filtering out the '{"error":"Audiobooks could not be found"}' entries later.
Yes. It's a somewhat complicated script.
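
The approach in the quote above (walk a range of ids, drop the "could not be found" replies) could look roughly like this. It is a sketch under assumptions, not Masa's actual script: it assumes the feed accepts `id` and `format=json` query parameters and wraps successful results in a `books` list. The function names are made up for illustration.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen

# Assumed URL pattern for fetching a single record as JSON.
API_URL = "https://librivox.org/api/feed/audiobooks?id={book_id}&format=json"


def parse_book(raw_json):
    """Return the book record, or None for an error reply or unparsable body."""
    try:
        payload = json.loads(raw_json)
    except ValueError:
        return None
    if not isinstance(payload, dict) or "error" in payload:
        return None
    books = payload.get("books", [])  # assumed response shape
    return books[0] if books else None


def fetch_range(first_id, last_id):
    """Request each id in turn and keep only the ids that resolve to a book."""
    found = {}
    for book_id in range(first_id, last_id + 1):
        try:
            with urlopen(API_URL.format(book_id=book_id)) as response:
                raw = response.read().decode("utf-8")
        except HTTPError as err:
            # The error body may arrive with a non-200 status; read it anyway.
            raw = err.read().decode("utf-8")
        book = parse_book(raw)
        if book is not None:
            found[book_id] = book
    return found
```

Keeping the filtering in `parse_book` means it can be checked against saved replies without touching the server.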

Isana san,
Isana wrote:IA has more resources, and its servers are also probably better able to handle queries than Librivox servers.
Just for your information: the LibriVox servers are hosted on IA servers.
When LibriVox becomes open source, I'll write a JSON API similar to those at IA.

Cheers,
Masa

Isana
Posts: 276
Joined: December 2nd, 2013, 12:46 pm
Location: USA

Post by Isana » March 17th, 2014, 5:56 am

ekzemplaro wrote:
Isana san,
Isana wrote:IA has more resources, and its servers are also probably better able to handle queries than Librivox servers.
Just for your information: the LibriVox servers are hosted on IA servers.
That's good to know, thanks. I thought IA only hosted the audio files. Still, IA already has a functioning API that can access LibriVox records, and third-party developers could use it.

ekzemplaro
Posts: 2030
Joined: December 31st, 2011, 7:17 am
Location: Tochigi,Japan

Post by ekzemplaro » March 18th, 2014, 4:06 am

Hello Isana san,

Let me add one more piece of advice.
Just imagine somebody wants to listen to 'The Last Leaf'.
When searched at IA, the result is
Search Results
Results: 1 through 0 of 0 (0.112 secs)
You searched for: The Last Leaf AND collection:audio_bookspoetry
Your search did not match any items in the Archive. Suggestions:
When searched at LibriVox, the result is
The Last Leaf (in Short Story Collection Vol. 002 )
Sometimes IA is better, sometimes LibriVox is better.
So we need to improve LibriVox when it becomes open-sourced.

Cheers,
Masa

Isana
Posts: 276
Joined: December 2nd, 2013, 12:46 pm
Location: USA

Post by Isana » March 18th, 2014, 5:17 am

ekzemplaro wrote:Hello Isana san,

Let me add one more piece of advice.
Just imagine somebody wants to listen to 'The Last Leaf'.
When searched at IA, the result is
Search Results
Results: 1 through 0 of 0 (0.112 secs)
You searched for: The Last Leaf AND collection:audio_bookspoetry
Your search did not match any items in the Archive. Suggestions:
When searched at LibriVox, the result is
The Last Leaf (in Short Story Collection Vol. 002 )
Sometimes IA is better, sometimes LibriVox is better.
So we need to improve LibriVox when it becomes open-sourced.

Cheers,
Masa
Actually, the metadata for individual titles and authors in collections is available on the IA details page (in the form of XML files). Extracting the information to populate a database requires a little more coding, but it's nothing that a good programmer couldn't handle.
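
As a rough sketch of that extraction step: the snippet below assumes a per-item files listing in XML where each audio `<file>` element carries `<title>`, `<creator>` and `<album>` children. That layout, the sample file names, and the function name `section_records` are assumptions for illustration, so check a real item's XML before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed layout; the file names are hypothetical.
SAMPLE_FILES_XML = """\
<files>
  <file name="example_section_01.mp3" source="original">
    <title>The Last Leaf</title>
    <creator>O. Henry</creator>
    <album>Short Story Collection Vol. 002</album>
  </file>
  <file name="example_meta.xml" source="metadata">
    <format>Metadata</format>
  </file>
</files>
"""


def section_records(files_xml):
    """Collect title/author/collection for every file entry that has a title."""
    root = ET.fromstring(files_xml)
    records = []
    for file_el in root.iter("file"):
        title = file_el.findtext("title")
        if title is None:
            continue  # skip non-audio entries such as metadata files
        records.append({
            "file": file_el.get("name"),
            "title": title,
            "creator": file_el.findtext("creator"),
            "album": file_el.findtext("album"),
        })
    return records


if __name__ == "__main__":
    for record in section_records(SAMPLE_FILES_XML):
        print(record)
```

Records like these could then be loaded into a local database so that a title search also reaches stories inside collections.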

What cannot be linked to individual audio files are the readers (but I can think of a kludge fix even for this), although the readers' names seem to be given in the description in more recent uploads, which could at least be used for searches.

I guess what I'm saying is, if I were a resourceful developer ready to do work, I would see that, as with anything else in programming, there would be glitches to overcome, but I would also see that there are workarounds and a viable alternative to the Librivox API. And that I am not at the mercy of open-sourcing (which is probably a separate issue from making improvements to Librivox anyway.)

ekzemplaro
Posts: 2030
Joined: December 31st, 2011, 7:17 am
Location: Tochigi,Japan

Post by ekzemplaro » March 19th, 2014, 3:48 am

Hello Isana san,

Thank you for a good suggestion.
Isana wrote:I would see that, as with anything else in programming, there would be glitches to overcome, but I would also see that there are workarounds and a viable alternative to the Librivox API.
I will invest time in developing a new catalog system using the available information.
Needless to say, it should be better than both the LibriVox catalog and the IA catalog.

For now, the difficult part is distinguishing collections from books.
For collections the system needs to search all section titles.
For books the system doesn't need to search section titles.

One idea is to use key words like 'collection', 'stories' in the title. But this is not perfect.
Does somebody have a better idea?
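
The keyword idea above can be sketched in a few lines. The keyword list is just the two words mentioned in the post, and the function name is made up for illustration.

```python
# Keywords taken from the post above; extend the tuple as needed.
COLLECTION_KEYWORDS = ("collection", "stories")


def looks_like_collection(title):
    """Guess from the title alone whether a project is a collection."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in COLLECTION_KEYWORDS)


if __name__ == "__main__":
    for title in ("Short Story Collection Vol. 002", "The Enormous Room"):
        print(title, "->", looks_like_collection(title))
```

This also shows why the heuristic is not perfect: a single-author book such as "Just So Stories" would match as well, and an anthology whose title contains neither word would be missed.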

Cheers,
Masa

vpanayotov
Posts: 10
Joined: March 14th, 2014, 2:42 am

Post by vpanayotov » March 19th, 2014, 6:33 am

ekzemplaro wrote: One idea is to use key words like 'collection', 'stories' in the title. But this is not perfect.
Does somebody have a better idea?
Well, I'm still familiarizing myself with LibriVox, but maybe you can add a "collection" genre tag? I believe there are other not-quite-genres, like "dramatic readings". Admittedly still not perfect though...

Vassil

TriciaG
LibriVox Admin Team
Posts: 38225
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG » March 19th, 2014, 6:46 am

That discussion goes beyond the API. If you want to talk about how to make a whole new catalog system (or an improved one) when the code goes public, you should start a new thread rather than clutter up the API thread. :)

When the source code goes public, Masa, you'll see that there's a check box we select to indicate if a project is a collection or not.
Fiction, partly about jail atrocities: It Is Never too Late
E E Cummings' time in French prison: The Enormous Room

ekzemplaro
Posts: 2030
Joined: December 31st, 2011, 7:17 am
Location: Tochigi,Japan

Post by ekzemplaro » March 20th, 2014, 4:19 am

Hello Vassil san,

Thank you for your suggestion.
vpanayotov wrote:but maybe you can add a "collection" genre tag?
I'm afraid proposals of this kind are not accepted.
So we need to work with only what we already have.

Tricia san,
TriciaG wrote:you should start a new thread rather than clutter up the API thread.
Sure.
TriciaG wrote:you'll see that there's a check box we select to indicate if a project is a collection or not.
Thank you for the useful information. I wonder whether this one bit of information is exposed through the API.
If anybody finds a way to get this information, please let me know.

Cheers,
Masa

DrewJ
Posts: 832
Joined: July 16th, 2013, 5:30 pm
Location: Memphis

Post by DrewJ » March 20th, 2014, 5:02 am

TriciaG wrote:That discussion goes beyond the API. If you want to talk about how to make a whole new catalog system (or an improved one) when the code goes public, you should start a new thread rather than clutter up the API thread. :)

When the source code goes public, Masa, you'll see that there's a check box we select to indicate if a project is a collection or not.
May I ask what language the code is in?
When the hurlyburly's done,
When the battle's lost and won. -Second Witch
Read some poetry?

TriciaG
LibriVox Admin Team
Posts: 38225
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG » March 20th, 2014, 6:41 am

"The backend is PHP built on the CodeIgniter framework, with some Javascript." Source: a post in the admin section, about a year ago. I assume it's still correct.

ekzemplaro
Posts: 2030
Joined: December 31st, 2011, 7:17 am
Location: Tochigi,Japan

Post by ekzemplaro » March 22nd, 2014, 3:06 am

Hello everybody,
ekzemplaro wrote:
TriciaG wrote:you should start a new thread rather than clutter up the API thread.
Sure.
I started the thread Another LibriVox Catalog.

I welcome your feedback.

Cheers,
Masa

vikas1
Posts: 3
Joined: May 13th, 2014, 6:45 am

Post by vikas1 » May 15th, 2014, 1:09 am

Hi,
I want to integrate the LibriVox API into my WordPress site, but I can't figure out how to get started. Can you please tell me where I should start? The API documentation gives a book URL that opens XML content. How can I use this URL in WordPress?

Thanks.

tbook
Posts: 77
Joined: May 12th, 2012, 7:01 am

Post by tbook » June 16th, 2014, 7:25 pm

Sorry to be a bit slow in replying, but for anyone who is still following this thread, the best way to integrate LibriVox into a web site may be to use the RSS feeds. You can find them at: https://librivox.org/pages/librivox-feeds/

I did a quick google on wordpress / rss and found: http://wordpress.org/plugins/wp-o-matic/ Some other alternative may be better for you, though.
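
For anyone who would rather read a feed directly than use a plugin, here is a small sketch of what parsing one involves. It relies only on the standard RSS 2.0 layout (`channel`, `item`, `title`, `enclosure`). The sample feed is hand-written with placeholder URLs, and the function name is illustrative; the real LibriVox feeds may carry extra elements.

```python
import xml.etree.ElementTree as ET

# Hand-written RSS 2.0 sample; titles and URLs are placeholders.
SAMPLE_RSS = """\
<rss version="2.0">
  <channel>
    <title>Example Book</title>
    <item>
      <title>Chapter 1</title>
      <enclosure url="https://example.org/chapter1.mp3"
                 type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Chapter 2</title>
    </item>
  </channel>
</rss>
"""


def feed_items(rss_text):
    """Return the channel title and a list of (item title, audio URL) dicts."""
    channel = ET.fromstring(rss_text).find("channel")
    items = []
    for item in channel.findall("item"):
        enclosure = item.find("enclosure")
        items.append({
            "title": item.findtext("title"),
            # Items without an enclosure get None instead of a URL.
            "audio_url": enclosure.get("url") if enclosure is not None else None,
        })
    return channel.findtext("title"), items


if __name__ == "__main__":
    print(feed_items(SAMPLE_RSS))
```

The same walk over `item` elements is what a WordPress RSS plugin does internally before turning each entry into a post or a player link.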
Working on iOS and Android apps for LibriVox. You can see the comments from the apps on our web site: LibriVox Audio Books

joemorris
Posts: 12
Joined: June 25th, 2014, 9:41 pm
Location: Oakland, CA

Post by joemorris » June 30th, 2014, 12:11 am

I'm a software developer. I have started to build a PHP-based web utility to show which texts exist in Gutenberg but not in LibriVox. The inspiration came when I was trying to find something to record for the Science Fiction short story collections. I wanted a story that had not been recorded before, and finding one took me almost as much time (flipping back and forth between LibriVox and lists of sci-fi collections on Gutenberg, searching for the same works in both) as it did to actually record the story. What I've got so far works only by author, but eventually I'd like it to work by genre and file size (i.e., recording length), so you can think "I'd like to read a nonfiction story that's less than an hour" and see what in PG hasn't been recorded yet. However, like others here, I'm a bit stuck because the LibriVox API isn't giving me complete data.

Demo of what I've got so far: https://xenotropic.net/gutenovox/
Source code: https://github.com/xenotropic/gutenovox
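
The core comparison can be sketched as a set difference. This is not taken from the gutenovox source. It assumes each LibriVox record has a `url_text_source` field (the field name appears in the API query further down) pointing at a Gutenberg URL of the form `.../ebooks/<id>` or `.../etext/<id>`, and that the Gutenberg catalog is already available as a mapping from ebook id to title. All names and sample ids below are hypothetical.

```python
import re

# Assumed Gutenberg URL shapes; other link forms would need more patterns.
GUTENBERG_ID = re.compile(r"gutenberg\.org/(?:ebooks|etext)/(\d+)")


def gutenberg_id(url_text_source):
    """Extract the Gutenberg ebook number from a text-source URL, if any."""
    match = GUTENBERG_ID.search(url_text_source or "")
    return int(match.group(1)) if match else None


def unrecorded(gutenberg_catalog, librivox_books):
    """Return the catalog entries whose ebook id no LibriVox record links to."""
    recorded = {gutenberg_id(book.get("url_text_source"))
                for book in librivox_books}
    return {ebook_id: title
            for ebook_id, title in gutenberg_catalog.items()
            if ebook_id not in recorded}


if __name__ == "__main__":
    catalog = {101: "Story A", 102: "Story B"}  # made-up ids and titles
    books = [{"url_text_source": "http://www.gutenberg.org/etext/101"}]
    print(unrecorded(catalog, books))
```

A result from this kind of comparison is only as complete as the LibriVox data behind it, so stories recorded inside collections would show up as unrecorded unless section-level sources are included.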

As for the API: a query by author does not return all the works by that author. For example, suppose I want to see which short stories by Philip K. Dick are on LibriVox. If I ask the API for them, I get seven results:

https://librivox.org/api/feed/audiobooks?author=dick&format=php&fields={id,title,authors,url_text_source,url_librivox}

But if I go to PKD's Librivox page, there are many more hits -- twenty-eight of them.

https://librivox.org/author/558

It looks like maybe the API is not providing collaborative projects. If you choose "solo" on that last link, you get eight recordings, which is pretty similar to the seven the API gives.

The API docs also describe a "Simple Authors API", but as far as I can determine, it only gives the author's ID, name, and date of birth, not any of their works, e.g., https://librivox.org/api/feed/authors/id/558

If anyone is aware of how to get all the works for an author through the API, please let me know.
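
For anyone who wants to reproduce the comparison, here is a sketch that builds the same author query and lists the titles from a reply. It assumes the feed also accepts `format=json` and wraps results in a `books` list; the function names are illustrative. Note that `urlencode` percent-encodes the braces and commas in `fields`, which a server should decode back to the original value.

```python
from urllib.parse import urlencode

FEED = "https://librivox.org/api/feed/audiobooks"


def author_query_url(last_name, fields, response_format="json"):
    """Build the author query with a restricted field list."""
    params = {
        "author": last_name,
        "format": response_format,
        "fields": "{" + ",".join(fields) + "}",
    }
    return f"{FEED}?{urlencode(params)}"


def titles(payload):
    """Sorted titles from a parsed reply (assumed shape: {"books": [...]})."""
    return sorted(book["title"] for book in payload.get("books", []))


if __name__ == "__main__":
    print(author_query_url(
        "dick", ["id", "title", "authors", "url_text_source", "url_librivox"]))
```

Comparing `len(titles(...))` against the number of entries on the author page is a quick way to see how many works the API leaves out.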

I would also really like to see a dump of the database on a regular basis. I know that request has been made and refused, but it makes a lot of sense. It's simple engineering-wise (it's a single line of code, 'mysqldump [databasename] > filename', and then you have to schedule that command to run once a day/week at off-hours), isn't that large (Project Gutenberg publishes its catalog data, and the whole catalog is half the size of the LibriVox recording of A Tale of Two Cities), and while perhaps not as ideal as an API (the structure can change, and it's not 100% up to date), it would definitely make the entirety of the catalog information available to those who want to program with it. So please keep considering that in your engineering priorities.
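
To make the dump suggestion concrete, here is a sketch of a small wrapper that a scheduler such as cron could run at off-hours. It assumes `mysqldump` is on the PATH and reads its credentials from an option file. The database name, output directory, and function names are placeholders, and `--result-file` is used in place of shell redirection.

```python
import subprocess
from datetime import date


def dump_command(database, out_dir="."):
    """Build the mysqldump invocation with a dated output file name."""
    out_file = f"{out_dir}/{database}-{date.today().isoformat()}.sql"
    return ["mysqldump", f"--result-file={out_file}", database]


def run_dump(database, out_dir="."):
    """Run the dump; raises CalledProcessError if mysqldump fails."""
    subprocess.run(dump_command(database, out_dir), check=True)


if __name__ == "__main__":
    # Placeholder database name; prints the command instead of running it.
    print(" ".join(dump_command("databasename", "/tmp")))
```

A published dump would normally leave out any private tables (user accounts, for instance), which is a decision for the site's admins rather than something this sketch handles.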

Masa, how often is your db_catalog.json updated on github? How is it generated? I might try using that.

Thanks,

Joe
