Is there an easy way to browse what has NOT been recorded?

joemorris
Posts: 12
Joined: June 25th, 2014, 9:41 pm
Location: Oakland, CA

Post by joemorris »

Is there a way to see what works exist on Gutenberg but not in Librivox? I'm sure a lot of stuff comes up in the forums from people checking, but it just seems like it would be neat to be able to browse or search through listings of what is public domain and proofread, but hasn't been audio recorded yet.

If that does not exist, is there a way to download the Librivox catalog in one go? Not the audio files, but the catalog info. I might give it a shot and see if I can program something up.

Joe
RuthieG
Posts: 21957
Joined: April 17th, 2008, 8:41 am
Location: Kent, England

Post by RuthieG »

There is no way currently to compare the PG catalogue with the LibriVox catalogue.

You may find helpful: http://librivox.org/api/info
Or Masa San's own version of the catalogue: http://ekzemplaro.org/librivox/catalog/

I'd point out that, although PG is the source used by many (most?) readers, we also use many other sources, including pre-1923 scans on the Internet Archive and Hathi Trust.

Ruth
My LV catalogue page | RuthieG's CataBlog of recordings | Tweet: @RuthGolding
philchenevert
LibriVox Admin Team
Posts: 24590
Joined: October 17th, 2010, 9:23 pm
Location: Basking by the Bayou

Post by philchenevert »

Hi Joe, and welcome to LibriVox. I don't know an answer to your question, but when I find something on Gutenberg I like, I just open a tab and search our catalog to see if it is there yet. Very low tech, but it works for me.
"I lost my trousers," said Tom expansively.
89 Decibels? Easy Peasy ! https://youtu.be/aSKR55RDVpk
joemorris
Posts: 12
Joined: June 25th, 2014, 9:41 pm
Location: Oakland, CA

Post by joemorris »

I made a webpage to do this: http://xenotropic.net/gutenovox/

It's essentially the Gutenberg catalog; you can search for categories (Library of Congress subject headings) or by author. When you click on the category or author, it shows you all the texts, with an added column with links to Librivox recordings of that work.

Because the librivox and gutenberg catalogs don't "join" perfectly -- I'm having to match works within collections (e.g., the science fiction short story collections) by title, and not all titles match 100% -- some recordings may not show up. But most of them do, probably 90%. If people like this and are using it I can probably manage to get the other 10%, but I wanted to see if folks think it is useful first before putting more time into it.
Bassaga
Posts: 80
Joined: June 15th, 2014, 8:03 pm
Location: USA

Post by Bassaga »

I haven't been here long, and dare not speak for the whole group, but for myself, I say that you, joemorris, are amazing. That's a darned handy website you cooked up!
"In this world... you must be oh so smart, or oh so pleasant. Well, for years I was smart; I recommend pleasant."
Elwood P. Dowd, "Harvey"
TriciaG
LibriVox Admin Team
Posts: 61053
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG »

Cool!

How did you calculate estimated reading time? The estimates are about 70% of the actual length on my Pansy solos. :) (Not that I am the standard for reading speed, but still...)

The Count of Monte Cristo is 59% of the actual length. :hmm:

I'm not criticizing; I'm just curious. :)
School fiction: David Blaize
America Exploration: The First Four Voyages of Amerigo Vespucci
Serial novel: The Wandering Jew
Medieval England meets Civil War Americans: Centuries Apart
RuthieG
Posts: 21957
Joined: April 17th, 2008, 8:41 am
Location: Kent, England

Post by RuthieG »

It is extremely well done, thank you, Joe!

May I ask a question? Does it only work for LibriVox recordings where the source text used is from Project Gutenberg? What I mean is, if PG has the text, but the LibriVox recording has been done from another text source. Example: Beautiful Stories from Shakespeare which uses the text from mainlesson.com and not from http://www.gutenberg.org/ebooks/1430, and hence the table suggests that there is no LV recording.

Ruth
joemorris
Posts: 12
Joined: June 25th, 2014, 9:41 pm
Location: Oakland, CA

Post by joemorris »

@RuthieG: Good question, and not something I thought of before (that a LibriVox reader would use another text when a Gutenberg text is available). The answer is different for LibriVox works based on a work that has one author (e.g., novels) and for tracks in LibriVox collections (e.g., short story collections). For one-author works, the recording will only show up if the etext url that is in the LibriVox database points to the gutenberg text. For collections, it matches by title and so (unless the title has been altered) it should come up regardless of what text was used.
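
A minimal sketch of the etext-URL matching described above, and of the blind spot Ruth points out. It assumes each LibriVox record is a dict with a `url_text_source` key (an assumption about the record layout) and that Gutenberg links take one of a few common URL shapes. The function names and sample records are hypothetical.

```python
import re

# Numeric ebook ID in common Project Gutenberg URL shapes,
# e.g. /ebooks/1430, /etext/1430, /files/22962/...
_PG_ID = re.compile(r"gutenberg\.org/(?:ebooks|etext|files)/(\d+)", re.IGNORECASE)


def gutenberg_id(source_url):
    """Return the PG ebook ID found in source_url, or None if it is not a PG link."""
    if not source_url:
        return None
    match = _PG_ID.search(source_url)
    return int(match.group(1)) if match else None


def recorded_ids(librivox_books):
    """Collect the PG IDs that LibriVox records point to (assumed 'url_text_source' key)."""
    ids = set()
    for book in librivox_books:
        pg_id = gutenberg_id(book.get("url_text_source"))
        if pg_id is not None:
            ids.add(pg_id)
    return ids


def unrecorded(pg_catalog_ids, librivox_books):
    """PG IDs that no LibriVox record links to directly."""
    return sorted(set(pg_catalog_ids) - recorded_ids(librivox_books))


# Illustrative sample records only.
librivox_books = [
    {"title": "Emma", "url_text_source": "http://www.gutenberg.org/ebooks/158"},
    {"title": "Beautiful Stories from Shakespeare",
     "url_text_source": "http://www.mainlesson.com/"},
]

print(unrecorded([158, 1430], librivox_books))  # [1430]
```

PG text 1430 is reported as unrecorded even though a recording exists, because the recording's source link points elsewhere. That is the case a title-based match would catch.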

Originally I thought using the Gutenberg etext identifier would be a more precise match and I just didn't use it for the tracks in collections because the LibriVox API doesn't provide it for them (although it is in the LibriVox database). But based on your point, I may just change the matching tactic for whole works as well as tracks to last name of author plus a "messy match" to the title that allows some variation. I've noticed some texts have, for example, "Vol. 1" appended in Gutenberg or the LibriVox title may drop "A" or "The" or a comma, so that scheme would address those problems also -- "the other 10%" I mentioned in my last post. Might take me a bit before I get to it, but that seems like the optimal matching plan.
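
The "last name plus messy title match" idea could be sketched roughly as follows. This is only one possible approach, using Python's standard `difflib`; the normalization rules, the 0.9 threshold, and the function names are all assumptions chosen for illustration, not the method actually used on the site.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase, then drop volume suffixes, punctuation, and a leading article."""
    t = title.lower()
    t = re.sub(r"\bvol(?:ume)?\.?\s*\d+\b", " ", t)  # "Vol. 1", "Volume 2"
    t = re.sub(r"[^a-z0-9\s]", " ", t)               # commas, colons, quotes
    t = re.sub(r"^\s*(?:a|an|the)\s+", "", t)        # leading "A", "An", "The"
    return " ".join(t.split())                       # collapse whitespace


def messy_match(pg_title, pg_last_name, lv_title, lv_last_name, threshold=0.9):
    """True if the author last names agree and the normalized titles are similar enough."""
    if pg_last_name.strip().lower() != lv_last_name.strip().lower():
        return False
    ratio = SequenceMatcher(
        None, normalize_title(pg_title), normalize_title(lv_title)
    ).ratio()
    return ratio >= threshold


print(messy_match("The Count of Monte Cristo, Vol. 1", "Dumas",
                  "Count of Monte Cristo", "Dumas"))  # True
```

This simple normalization ignores Roman-numeral volumes and strips accented letters, so non-English titles would need gentler handling.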

@TriciaG: I used the estimates on http://wiki.librivox.org/index.php/Science_Fiction_Short_Stories which imply a formula of 1421 bytes in the plain-text PG file to one minute of reading time. I dunno who created that page or what their basis was. I've only done one recording for the latest (and not yet published) Sci Fi Short story collection, and for that the estimate was 125% of my actual time (i.e., I was faster than the estimate). Seems like I should (a) read more slowly; and (b) find the actual average reading time of LibriVox readers to use for that estimate.

Also, criticism and comments are welcome, all helps me make something people will use.
Availle
LibriVox Admin Team
Posts: 22485
Joined: August 1st, 2009, 11:30 pm

Post by Availle »

joemorris wrote: @TriciaG: I used the estimates on http://wiki.librivox.org/index.php/Science_Fiction_Short_Stories which imply a formula of 1421 bytes in the plain-text PG file to one minute of reading time. I dunno who created that page or what their basis was. I've only done one recording for the latest (and not yet published) Sci Fi Short story collection, and for that the estimate was 125% of my actual time (i.e., I was faster than the estimate). Seems like I should (a) read more slowly; and (b) find the actual average reading time of LibriVox readers to use for that estimate.
What an interesting way to compute the recording time!

I think the plain-text PG files also contain the PG licence at the end, often some sort of intro, a table of contents... all things that we do not read, so there is automatically an overestimation of the length of the piece. It should be worse for shorter recordings, of course. Maybe there is a way to re-estimate? I think at least the PG licence has the same length everywhere, so this could be automatically deducted from the file size.

ETA: we usually say that 4000 WORDS equal 30 minutes of recording. How to translate that into byte-size though... :hmm:
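
One way around the words-to-bytes translation is to count words in the text itself and apply the 4000-words-per-30-minutes rule directly. The sketch below does that and also reports how many bytes of a given text correspond to one minute; the function names are hypothetical, and splitting on whitespace is only a rough word count.

```python
# Rule of thumb quoted above: 4000 words take about 30 minutes to record.
WORDS_PER_MINUTE = 4000 / 30


def minutes_from_text(text):
    """Estimated recording time in minutes, from a whitespace word count."""
    return len(text.split()) / WORDS_PER_MINUTE


def bytes_per_minute(text, encoding="utf-8"):
    """Bytes of this particular text that correspond to one minute of reading."""
    words = len(text.split())
    if words == 0:
        raise ValueError("text contains no words")
    average_bytes_per_word = len(text.encode(encoding)) / words
    return average_bytes_per_word * WORDS_PER_MINUTE
```

Because the bytes-per-word average is measured on the story text alone, the licence and header no longer inflate the estimate.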

In any case: great job! :thumbs:
Cheers, Ava.
Resident witch of LibriVox, channelling
Granny Weatherwax: "I ain't Nice."

--
AvailleAudio.com
Darvinia
LibriVox Admin Team
Posts: 3254
Joined: March 15th, 2009, 8:38 pm
Location: Alberta, Canada

Post by Darvinia »

joemorris wrote: I've only done one recording for the latest (and not yet published) Sci Fi Short story collection, and for that the estimate was 125% of my actual time (i.e., I was faster than the estimate). Seems like I should (a) read more slowly; and (b) find the actual average reading time of LibriVox readers to use for that estimate.

Also, criticism and comments are welcome, all helps me make something people will use.
Here's the explanation for your 125%: you only read about half the file. The entire document at Gutenberg is 6693 words; the story alone is 3626 words. Over 24.5 minutes, you read at a rate of 148 wpm (average, no need to slow down). BUT if you had read the whole document (which is what your website calculation uses) in 24.5 minutes, your speed would have been 273 wpm (25% faster than the estimate). Hence the discrepancy, which is exaggerated in smaller files, as the PG information text takes up a larger percentage of the whole.
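
The arithmetic above can be checked in a few lines. All figures come from the post; only the variable and function names are made up for the example.

```python
def words_per_minute(word_count, minutes):
    """Average reading speed in words per minute."""
    return word_count / minutes


story_words = 3626        # story alone
document_words = 6693     # whole PG file, including header and licence
recording_minutes = 24.5

actual_wpm = words_per_minute(story_words, recording_minutes)
apparent_wpm = words_per_minute(document_words, recording_minutes)
non_story_share = 1 - story_words / document_words

print(round(actual_wpm))          # 148
print(round(apparent_wpm))        # 273
print(f"{non_story_share:.0%}")   # 46%
```

Roughly 46% of this file is material that is never read aloud, which is why a size-based estimate overshoots so badly for a short story.
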
Bev

There's nothing you can't prove if your outlook is only sufficiently limited. - Lord Peter Wimsey
I yam what I yam, and that's all what I yam - Popeye, the sailor man
If you choose not to decide, you still have made a choice - Neil Peart
joemorris
Posts: 12
Joined: June 25th, 2014, 9:41 pm
Location: Oakland, CA

Post by joemorris »

Excellent points. I hadn't really paid attention to the PG license before (although I should have, I'm a lawyer!). I subtracted the license and header (18233 bytes) from the file size and changed the conversion ratio to 900 bytes = 1 min. TriciaG's reading of The Chautauqua Girls at Home is now at 105%, Monte Cristo (v3) is now at 96%, and I'm right there in the middle at 100% with Lanier's Join Our Gang.
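
A sketch of the revised estimate, using the two constants given in this post. The function names are hypothetical. Following the earlier posts, the percentage is taken to mean the estimate divided by the actual recording length.

```python
PG_OVERHEAD_BYTES = 18233   # licence plus header, as stated in the post
BYTES_PER_MINUTE = 900      # revised conversion ratio, as stated in the post


def estimated_minutes(file_size_bytes):
    """Estimated reading time for a plain-text PG file of the given size."""
    readable_bytes = max(file_size_bytes - PG_OVERHEAD_BYTES, 0)
    return readable_bytes / BYTES_PER_MINUTE


def estimate_vs_actual(file_size_bytes, actual_minutes):
    """Estimate as a percentage of the actual recording length (100 means spot on)."""
    return 100 * estimated_minutes(file_size_bytes) / actual_minutes


# Made-up file size: overhead plus exactly one hour of readable text.
print(estimated_minutes(18233 + 900 * 60))  # 60.0
```

Clamping at zero keeps files smaller than the assumed overhead from producing a negative estimate.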
TriciaG
LibriVox Admin Team
Posts: 61053
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG »

That appears much better! Yay! 8-)

I like being over 100%. I feel like an overachiever that way.
kayray
Posts: 11828
Joined: September 26th, 2005, 9:10 am
Location: Union City, California

Post by kayray »

That is super-cool, Joe.

Is there a way to make your search ignore gutenberg's audiobooks? (Which are usually OUR audiobooks, anyway.)

E.g. a search on Jane Austen gives this page:

http://xenotropic.net/gutenovox/index.php?author_gid=873&contrib_gid=

Some of the entries are oddly short, such as the first "Emma" which weighs in at 7 minutes. Clicking it reveals that it's an audiobook. I suppose 7 minutes is the amount of time estimated to read the list of audio files: http://www.gutenberg.org/files/22962/22962-index.html

It's not a big deal, but it would be nifty if those audiobook results were ignored!
Kara
http://kayray.org/
--------
"Mary wished to say something very sensible into her Zoom H2 Handy Recorder, but knew not how." -- Jane Austen (& Kara)
joemorris
Posts: 12
Joined: June 25th, 2014, 9:41 pm
Location: Oakland, CA

Post by joemorris »

Thanks Kara, that is fixed, audiobooks should now be ignored -- no Austen works under 2 hours anymore.
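
The post doesn't say how the filter was implemented. One possible approach, assuming each Gutenberg catalog record carries a `type` field with values such as "Text" and "Sound" (a hypothetical record layout for this sketch), is to keep only the text entries:

```python
def text_only(pg_records):
    """Keep records whose 'type' is text; a missing type is treated as text."""
    return [
        record for record in pg_records
        if (record.get("type") or "Text").strip().lower() == "text"
    ]


# Illustrative records only; 22962 is the audiobook entry linked above.
records = [
    {"id": 158, "title": "Emma", "type": "Text"},
    {"id": 22962, "title": "Emma", "type": "Sound"},
]

print([record["id"] for record in text_only(records)])  # [158]
```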
ekzemplaro
Posts: 2027
Joined: December 31st, 2011, 7:17 am
Location: Tochigi,Japan

Post by ekzemplaro »

Hello Joe san,

Good job.
I think you can make it better, if all book information at LibriVox is provided to you.
joemorris wrote: Originally I thought using the Gutenberg etext identifier would be a more precise match and I just didn't use it for the tracks in collections because the LibriVox API doesn't provide it for them (although it is in the LibriVox database).
I checked the database structure.
There's a column 'source' in the table sections.

Cheers,
Masa