LibriVox API Discussion Thread

TriciaG
LibriVox Admin Team
Posts: 62672
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG »

wb415 wrote: April 10th, 2023, 4:23 pm Hi, all. I came to this forum because I was interested in whether or not the API endpoint includes any parameter to specify a sort order for search results.

I appreciate the link to the github repo, which allowed me to quickly read through the code and confirm that no such parameter exists.

I don't know if there's currently anybody at LibriVox who is maintaining this code, for example to accept pull requests with new features and improvements from the developer community. On the off-chance there is, I made a pull request containing a very simple update to add this feature. I hope it can be tested and included on the live server.
There is someone who monitors pull requests and accepts code. The question is when he can test and implement it. :)
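For clients that cannot wait for a server-side sort parameter, ordering can be done after the results are fetched. The following is only a minimal sketch: the records are hypothetical, and it assumes each decoded result is a dict with a "title" key.

```python
def sort_books(books, field="title", reverse=False):
    """Return a new list of book records ordered by one field.

    Comparison is case-insensitive; records missing the field are
    treated as having an empty string for that field.
    """
    return sorted(
        books,
        key=lambda book: str(book.get(field, "")).casefold(),
        reverse=reverse,
    )


# Hypothetical records standing in for decoded search results.
sample = [
    {"id": "3", "title": "Walden"},
    {"id": "1", "title": "emma"},
    {"id": "2", "title": "Dracula"},
]
ordered = sort_books(sample)
```

Sorting on the client only orders the page of results already downloaded, so it is not a full substitute for a sort parameter on the endpoint.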
redrun
LibriVox Admin Team
Posts: 4132
Joined: August 11th, 2022, 8:32 pm

Post by redrun »

Nobody get too excited, but I intend to fix a bug in one part of the API. This part has been bugged a long time, and I hope nobody's been using it. It's only marginally useful as-is, and is liable to cause trouble if somebody starts using it too much. :help:

So: if anybody's app uses the 'listen_url' of individual sections in the API, please have a look here:
https://github.com/LibriVox/librivox-catalog/issues/211#issue-2216067383

If you use the zip URL, the RSS feed or the M4B, you can safely ignore this. 8-)
redrun

Post by redrun »

Cover art is now included in our API
For anyone who uses scraping, or RSS, or some other method of pulling cover art for our projects, we now have an official way within the LibriVox API. Here's the PR from @fakerybakery, and the full API info can, as always, be found at /api/info.

In short, adding '&coverart=1' to an API query for projects will add three new fields to the response:
  • coverart_jpg - URL of the original-quality cover art image
  • coverart_thumbnail - URL of a lower-quality thumbnail
  • coverart_pdf - URL of a "jacket sleeve" CD-cover page, presenting track run-times and such
More details can be gleaned from the instructions our volunteers follow in creating them (caveat emptor - may change!):
viewtopic.php?p=523032#p523032

Also, though anyone who currently gets images by other means will already know this: those images will NOT be available on most freshly cataloged projects. The delay between a project being added to the catalog and feeds, and the cover art being available, will vary. I'd venture that it's usually less than a week, but you'll want a sanity check in place. :wink:
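A minimal sketch of the cover art query and the sanity check mentioned above. The 'coverart' parameter and the three field names come from this post; the 'format=json' parameter and the shape of the decoded record are assumptions, so check /api/info before relying on them.

```python
from urllib.parse import urlencode

API_BASE = "https://librivox.org/api/feed/audiobooks"

# Field names as listed in the announcement above.
COVER_FIELDS = ("coverart_jpg", "coverart_thumbnail", "coverart_pdf")


def coverart_query_url(project_id):
    """Build a project query that asks for the cover art fields.

    Assumption: 'format=json' is accepted by the endpoint.
    """
    params = {"id": project_id, "coverart": 1, "format": "json"}
    return f"{API_BASE}?{urlencode(params)}"


def extract_cover_art(book):
    """Return only the cover art URLs that are present and non-empty.

    Freshly cataloged projects may have no cover art yet, so an empty
    dict is a normal result rather than an error.
    """
    return {name: book[name] for name in COVER_FIELDS if book.get(name)}
```

A caller would treat an empty result from extract_cover_art as "not available yet" and retry later, since the delay before cover art appears varies.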
Vitaliy
Posts: 16
Joined: January 29th, 2025, 9:09 am

Post by Vitaliy »

How do you get authors of sections?
For example, this book https://librivox.org/37-american-poems-by-various/ has an author specified for each section, but the calls https://librivox.org/api/feed/audiobooks?id=3350&extended=1 and https://librivox.org/api/feed/audiotracks?project_id=3350 don't return this info. How do you get it then?
redrun

Post by redrun »

Vitaliy wrote: January 31st, 2025, 1:51 am How do you get authors of sections?
For example, this book https://librivox.org/37-american-poems-by-various/ has an author specified for each section, but the calls https://librivox.org/api/feed/audiobooks?id=3350&extended=1 and https://librivox.org/api/feed/audiotracks?project_id=3350 don't return this info. How do you get it then?
That information seems to be in our database (at least, for projects like this one that are "Collections" from various source texts), and in our site code, but not in the API at this time.

I can put that on my personal to-do list (and in our Github Issues list), but it would be a fairly low priority, as I have other things I want to improve in the time I have.
If you wanted this available sooner (though: no guarantees!), then you could put together a new Pull Request. I'd suggest modeling your PR after this recent one, and perhaps including another field or two that goes along with the theme. For example, the "source" field is what fills in the "Etext" link, and both of these fields are only used/useful for these Collection-type projects.
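For reference, the two queries from the question can be rebuilt as below, together with a small check a client could run on decoded section records to see whether per-section authors have been added yet. This is only a sketch: the "author" key is a hypothetical field name, not something the API is known to return.

```python
from urllib.parse import urlencode

FEED_BASE = "https://librivox.org/api/feed"


def audiobook_url(project_id):
    """Project-level query with extended=1, as used in the question."""
    return f"{FEED_BASE}/audiobooks?{urlencode({'id': project_id, 'extended': 1})}"


def audiotracks_url(project_id):
    """Section-level query for one project, as used in the question."""
    return f"{FEED_BASE}/audiotracks?{urlencode({'project_id': project_id})}"


def has_section_authors(sections, field="author"):
    """True only if every section record has a non-empty `field`.

    `field` is a hypothetical name chosen for illustration.
    """
    return bool(sections) and all(section.get(field) for section in sections)
```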
Vitaliy

Post by Vitaliy »

redrun wrote: February 2nd, 2025, 4:20 pm I can put that on my personal to-do list (and in our Github Issues list), but it would be a fairly low priority, as I have other things I want to improve in the time I have.
Is that something like 6 months, with no upper limit? I don't know PHP, so I will have to wait.
HTML parsing and this feature are the only ways to obtain this info, am I right?
annise
LibriVox Admin Team
Posts: 39725
Joined: April 3rd, 2008, 3:55 am
Location: Melbourne,Australia

Post by annise »

Can we ask why you feel you need this and what you are planning to do with it? All our information is displayed in our catalogue, and we don't use our scarce volunteer expertise for every idea people come up with. Our Catalogue is PD and you can use it legally any way you want, I know, but why after all these years do you feel this is something we should add?

Anne
Vitaliy

Post by Vitaliy »

annise wrote: February 3rd, 2025, 2:28 am Can we ask why you feel you need this and what you are planning to do with it?
I've uploaded all the data into a graph database and I'm using it and its query language to get interesting insights and stats, and even to find books and readers to listen to. Obviously, it's not complete without the author->section relationship, and therefore I can't reliably search for recordings by an author (because recordings from compilations wouldn't be included).
I don't need the API itself, I just need the data. If LV published dumps, I wouldn't have touched the API at all.
annise wrote: February 3rd, 2025, 2:28 am we don't use our scarce volunteer expertise for every idea people come up with.
Since the current version of the API already gives you all authors, all books, all sections and all relationships between these entities except for author->section, this last bit will probably end up in the API sooner or later. I don't see how it's less useful than the author of a book or the reader of a section.
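To illustrate why the missing author->section edge matters, here is a tiny in-memory sketch. All names and records are hypothetical and stand in for graph nodes and edges; it is not the schema mentioned in this thread.

```python
# Hypothetical miniature catalog: one single-author book, one collection.
book_authors = {"solo-book": ["Author A"], "collection": ["Various"]}
book_sections = {"solo-book": ["s1", "s2"], "collection": ["s3", "s4"]}

# The author->section edges that the API currently does not expose.
section_authors = {"s3": ["Author A"], "s4": ["Author B"]}


def sections_by_author(author, use_section_edges):
    """List section ids attributed to `author`.

    Without section-level edges, a section inherits its book's authors,
    so sections inside a "Various" collection are never matched.
    """
    found = []
    for book, sections in book_sections.items():
        for section in sections:
            if use_section_edges and section in section_authors:
                authors = section_authors[section]
            else:
                authors = book_authors[book]
            if author in authors:
                found.append(section)
    return found
```

With book-level authors only, a search for "Author A" finds s1 and s2; once the section edges are available it also finds s3 from the collection.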
annise wrote: February 3rd, 2025, 2:28 am Our Catalogue is PD and you can use it legally any way you want, I know
If I'm not mistaken, the Catalogue is available in two forms: HTML pages and the API. The API doesn't have this info, and downloading and parsing thousands of HTML pages is not an option for me. So I will have to wait. Maybe I will come up with something in PHP, as redrun proposed.

Below is the schema of the DB I've got so far; I wonder if anyone will find it interesting.
Schema chart (Image, Github)
annise

Post by annise »

Thanks for taking the time to answer. Anne
redrun

Post by redrun »

Vitaliy wrote: February 3rd, 2025, 4:06 am If I'm not mistaken, the Catalogue is available in two forms: HTML pages and the API. The API doesn't have this info, and downloading and parsing thousands of HTML pages is not an option for me. So I will have to wait. Maybe I will come up with something in PHP, as redrun proposed.

Below is the schema of the DB I've got so far; I wonder if anyone will find it interesting.
Schema chart (Image, Github)
Correct on the two forms in which we publish our catalog - and that graph database schema looks nice! It seems like a much neater way to work with something like a public catalog. 8-)

edited by annise
TheBanjo
Posts: 1587
Joined: January 23rd, 2021, 8:19 pm
Location: Melbourne, Australia

Post by TheBanjo »

Vitaliy wrote: February 3rd, 2025, 4:06 am
I've uploaded all the data into a graph database and I'm using it and its query language to get interesting insights and stats, and even to find books and readers to listen to. Obviously, it's not complete without the author->section relationship, and therefore I can't reliably search for recordings by an author (because recordings from compilations wouldn't be included).
I don't need the API itself, I just need the data. If LV published dumps, I wouldn't have touched the API at all.

If I'm not mistaken, the Catalogue is available in two forms: HTML pages and the API. The API doesn't have this info, and downloading and parsing thousands of HTML pages is not an option for me.

...

Below is the schema of the DB I've got so far; I wonder if anyone will find it interesting.
Schema chart (Image, Github)
Hi Vitaliy,

I notice in your schema that you have a field called "views" for a book. It's true that archive.org (where our audiobooks are hosted) DOES keep track of views (which are accessible via an API of their own). However, it's worth pointing out that this data does not reside in the LibriVox public catalog itself, which defines no field for this purpose. The only way to get that data is via a separate query to the archive.org API.

I mention this only because it may have implications for the kinds of "insights, stats" you will be able to derive from the Librivox catalog alone.
annise

Post by annise »

I don't have any official information from IA but it seems that the "views" have not been updated since Jan 1st - all our projects since then have zero views which can't be correct unless they have changed how it works.
They never really gave a count of people who listened to a project - I think they described it as "interacted"

Anne
Vitaliy

Post by Vitaliy »

TheBanjo wrote: February 7th, 2025, 12:36 am Hi Vitaliy,

I notice in your schema that you have a field called "views" for a book. It's true that archive.org (where our audiobooks are hosted) DOES keep track of views (which are accessible via an API of their own). However, it's worth pointing out that this data does not reside in the LibriVox public catalog itself, which defines no field for this purpose. The only way to get that data is via a separate query to the archive.org API.

I mention this only because it may have implications for the kinds of "insights, stats" you will be able to derive from the Librivox catalog alone.
Hi.
Yes. The LV API also doesn't provide the date and time when a book or section was uploaded, so I get both the book.views and book.dateUploaded fields with a single request to the Internet Archive. I already have all the data depicted in the schema; the visualisation image is built automatically.
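A rough sketch of what such a single request and merge might look like. It assumes the Internet Archive advanced search endpoint with the 'downloads' and 'publicdate' fields, and that 'downloads' is the counter shown as views; the identifier and records here are hypothetical, and the exact request used for this schema is not stated in the thread.

```python
from urllib.parse import urlencode


def ia_stats_url(identifier):
    """Build one archive.org search query returning both counters.

    Assumption: advancedsearch.php accepts repeated fl[] parameters
    and output=json.
    """
    params = [
        ("q", f"identifier:{identifier}"),
        ("fl[]", "identifier"),
        ("fl[]", "downloads"),
        ("fl[]", "publicdate"),
        ("rows", 1),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


def merge_stats(book, ia_doc):
    """Copy the archive.org counters onto a copy of a LibriVox record.

    The target keys mirror book.views and book.dateUploaded above.
    """
    merged = dict(book)
    merged["views"] = ia_doc.get("downloads")
    merged["dateUploaded"] = ia_doc.get("publicdate")
    return merged
```

Keeping the merge separate from the request makes it easy to leave views empty when archive.org has no fresh value.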

There is also a somewhat obscure Group type in the domain model which is not exposed in the LV API, but it's not as important.
annise wrote: February 7th, 2025, 2:41 am I don't have any official information from IA but it seems that the "views" have not been updated since Jan 1st - all our projects since then have zero views which can't be correct unless they have changed how it works.
They never really gave a count of people who listened to a project - I think they described it as "interacted"
In the data I collected last month, the last book with non-zero views is this one: 2024-12-08. The API returned a downloads count of 64 then, while right now, according to the HTML page, it has 23361 views.
Right now the IA API returns 23361 views for this book, and the last book with non-zero views is this one: 2024-12-17. Looks like a simple delay.
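The "last book with non-zero views" check can be sketched as below, assuming the data is available as (upload date, views) pairs. The sample values are hypothetical.

```python
from datetime import date


def last_nonzero_upload(records):
    """Return the latest upload date whose view count is above zero.

    `records` is an iterable of (upload_date, views) pairs; None is
    returned when no record has any views.
    """
    dates = [uploaded for uploaded, views in records if views and views > 0]
    return max(dates) if dates else None


# Hypothetical snapshot: everything uploaded after a cutoff shows zero.
snapshot = [
    (date(2024, 12, 8), 64),
    (date(2024, 12, 17), 12),
    (date(2025, 1, 5), 0),
    (date(2025, 1, 20), 0),
]
cutoff = last_nonzero_upload(snapshot)
```

If the returned date moves forward between two snapshots, that is consistent with the counters being delayed rather than switched off.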
TriciaG

Post by TriciaG »

Isn't December 17th right around when IA came back from its big outage and rebuild? There are other functions still not working on the IA site (such as "history" of a project when you're logged in as the person who uploaded it); I think Views is a function that still hasn't been restored.