How Often are Librivox Books Listened To?

TedL
Posts: 570
Joined: October 24th, 2022, 3:06 am
Location: Wisconsin

Post by TedL »

On March 1, TheBanjo included in one of his messages a link to a spreadsheet with data about 10,670 audiobooks that came originally from Project Gutenberg.
Almost all the audiobooks listed one or more LOC subject headings as topics. I used Excel to break up the grouped subject headings, and put them into one long column. The 10,670 books yielded 28,677 subject headings.

Then I manually began eliminating duplicates (where the same keyword was used in several books). I don't know an automated way to do that, so I only got through the first 1,500 rows (about 5% of the total). Eliminating duplicates left me with 508 unique headings. If the rest of the list behaves the same way, I assume I would end up with about 500 × 20, or 10,000, unique keywords. And this only covers about 55% of the books Librivox currently has.

I just thought I would put this out there. Hopefully there is someone who knows where we should go from here.
Last edited by TedL on March 3rd, 2024, 2:32 am, edited 1 time in total.
TheBanjo
Posts: 1309
Joined: January 23rd, 2021, 8:19 pm
Location: Melbourne, Australia

Post by TheBanjo »

In my time working as a technical writer (for around 15 years) I found that simple instructions were often the best. If you look at the help for any Apple product, you will see Apple very much follow this philosophy.

If we were going to be giving our users a 'procedure' to follow, I think it could be as simple as this:

Go to Internet Archive Librivox Free Audiobook Collection and enter your search term in quote marks (eg, "humorous stories", "Catholic church") (where "Internet Archive Librivox Free Audiobook Collection" would be hotlinked).

Anyone who can't follow that as a procedure probably isn't going to be capable of playing an audiofile anyway, let alone understanding its contents.

HOWEVER, I cannot see any good location on the present website where such a sentence could be placed without completely spoiling the (actually, very elegant and simple) design of our website's opening pages.

I therefore have a different suggestion.

How about, immediately below the Submit button on the Advanced Search block of text, we place the following:

Still can't find what you're looking for? Try searching by title, author, reader, key terms, or terms appearing in book descriptions at Internet Archive Librivox Free Audiobook Collection. (where again "Internet Archive Librivox Free Audiobook Collection" would be hotlinked).
TheBanjo
Posts: 1309
Joined: January 23rd, 2021, 8:19 pm
Location: Melbourne, Australia

Post by TheBanjo »

TedL wrote: March 2nd, 2024, 2:18 pm Then I manually began eliminating duplicates (where the same keyword was used in several books). I don't know an automated way to do that
Hi Ted, I'm no Excel guru at all. However, when I searched for "Excel remove duplicates", the very first hit I got advised "To remove duplicate values, click Data > Data Tools > Remove Duplicates."
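
For anyone who would rather script this step than do it in Excel, here is a minimal Python sketch of the same idea: split each grouped cell into individual headings, then keep one copy of each. The function name, the `;` separator, and the sample rows are assumptions for illustration only; the real spreadsheet may group its headings differently.

```python
def unique_headings(grouped_cells, sep=";"):
    """Split grouped subject-heading cells and return the sorted unique headings.

    `sep` is an assumed delimiter; change it to whatever the spreadsheet uses.
    """
    seen = set()
    for cell in grouped_cells:
        for heading in cell.split(sep):
            heading = heading.strip()
            if heading:  # skip empty fragments left by trailing separators
                seen.add(heading)
    return sorted(seen)


# Made-up sample rows, one cell of grouped headings per book.
sample_cells = [
    "Australia -- Fiction; Authors -- Biography",
    "Australia -- Fiction; Baking",
    "Baking",
]

print(unique_headings(sample_cells))
```

Five headings go in and three unique ones come out, which is the same reduction that Remove Duplicates performs on a single column.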
TedL
Posts: 570
Joined: October 24th, 2022, 3:06 am
Location: Wisconsin

Post by TedL »

TheBanjo wrote: March 2nd, 2024, 3:32 pm
...
How about, immediately below the Submit button on the Advanced Search block of text, we place the following:

Still can't find what you're looking for? Try searching by title, author, reader, key terms, or terms appearing in book descriptions at Internet Archive Librivox Free Audiobook Collection. (where again "Internet Archive Librivox Free Audiobook Collection" would be hotlinked).
Good suggestions. I agree.
TedL
Posts: 570
Joined: October 24th, 2022, 3:06 am
Location: Wisconsin

Post by TedL »

TheBanjo wrote: March 2nd, 2024, 3:37 pm
Hi Ted, I'm no Excel guru at all. However, when I searched for "Excel remove duplicates", the very first hit I got advised "To remove duplicate values, click Data > Data Tools > Remove Duplicates."
That worked - thanks for saving my eyes. I'm left with 8,950 unique keywords. My guess is that unique keywords for all the 19,000 books would be around 15,000. Let me just list a few selected examples, so everyone knows what we're talking about:

Austen, Jane, 1775-1817 -- Correspondence
Australia -- Fiction
Australia -- Gold discoveries -- Fiction
Australia -- History -- Anecdotes
Authors -- Biography
Authors, American -- 19th century -- Biography
Authors, American -- 19th century -- Correspondence
Authors, American -- 19th century -- Diaries
Automobile industry and trade -- United States -- History
Automobile racing -- Juvenile fiction
Automobile travel
Aversion -- Juvenile fiction
Ayesha (Fictitious character : Haggard) -- Fiction
B-17 bomber -- Juvenile fiction
Babbage, Charles, 1791-1871
Babylon (Extinct city)
Bad Nauheim (Germany) -- Fiction
Badgers -- Juvenile fiction
Badlands (S.D. and Neb.) -- Fiction
Baking
Battles -- Juvenile fiction
Bayern (Bavaria, Germany : Province) -- Fiction
Beagle Expedition (1831-1836)
Bean, Sawney (Legendary character) -- Fiction
TheBanjo
Posts: 1309
Joined: January 23rd, 2021, 8:19 pm
Location: Melbourne, Australia

Post by TheBanjo »

TedL wrote: March 2nd, 2024, 4:06 pm
That worked - thanks for saving my eyes. I'm left with 8,950 unique keywords. My guess is that unique keywords for all the 19,000 books would be around 15,000. Let me just list a few selected examples, so everyone knows what we're talking about:
...
Battles -- Juvenile fiction
Bayern (Bavaria, Germany : Province) -- Fiction
Beagle Expedition (1831-1836)
Bean, Sawney (Legendary character) -- Fiction
Pretty interesting, huh? If for 19,000 books we had (around) 15,000 key terms, it's pretty easy to see that most of these key terms are going to have been applied to only one book in our collection, which obviously makes many of these key terms all but useless for someone wanting to find "similar" books in our collection. I mean, how many books in our collection are likely to deal with "Bayern (Bavaria, Germany : Province) -- Fiction" or "Bean, Sawney (Legendary character) -- Fiction"?

An interesting question to ask, though, at this point, would be something like "what are the 1000 (or 500) most frequently used key terms, and to how many books does each apply". Then one day, when librivox.org wins a HUGE grant and gets to make a zillion interface improvements (yeah, right), someone could suggest something like "How about for each book's catalogue page we check to see if there are at least x other books to which the same key term has been applied, and if there are at least x, then we display that key term with a hyperlink, clicking which would allow the user immediately to see a listing of these similarly key termed books". No, I know it's not going to happen. Strictly a thought experiment.
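
The "most frequently used key terms" question can be answered from the one-long-column list with a simple tally. The sketch below is only an illustration: the function name and sample data are made up, and it assumes each heading appears at most once per book, so that a term's count equals the number of books it applies to.

```python
from collections import Counter


def top_terms(all_headings, n=500, min_books=1):
    """Return up to `n` (term, book_count) pairs, most frequent first.

    Assumes one entry per (book, term) pair in `all_headings`.
    """
    counts = Counter(all_headings)
    return [(term, c) for term, c in counts.most_common(n) if c >= min_books]


# Made-up sample: the long column of headings, duplicates included.
sample = (
    ["Science fiction"] * 3
    + ["Australia -- Fiction"] * 2
    + ["Baking"]
)

print(top_terms(sample, n=2))
print(top_terms(sample, n=10, min_books=2))
```

Both calls list "Science fiction" (3 books) and "Australia -- Fiction" (2 books); the second call drops "Baking" because it falls below the `min_books` threshold, which plays the role of the "at least x other books" test.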
TedL
Posts: 570
Joined: October 24th, 2022, 3:06 am
Location: Wisconsin

Post by TedL »

Incidentally,
I looked up the front LibriVox page on the Wayback Machine from 2 March 2014 - 10 years ago. It is virtually identical to the Librivox front page of today, except that one sentence has been added at the bottom of the footer.
TriciaG
LibriVox Admin Team
Posts: 60810
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG »

TedL wrote: March 2nd, 2024, 4:21 pm Incidentally,
I looked up the front LibriVox page on the Wayback Machine from 2 March 2014 - 10 years ago. It is virtually identical to the Librivox front page of today, except that one sentence has been added at the bottom of the footer.
Yes, that's no surprise. Look at pre-2013, if you can. It was different. I just looked at July 4, 2013, and it was the old design.

EDIT: Here was the "advanced search": http://web.archive.org/web/20130818174143/http://catalog.librivox.org/visitor_advanced.php
TedL
Posts: 570
Joined: October 24th, 2022, 3:06 am
Location: Wisconsin

Post by TedL »

TheBanjo wrote: March 2nd, 2024, 4:21 pm
An interesting question to ask, though, at this point, would be something like "what are the 1000 (or 500) most frequently used key terms, and to how many books does each apply". Then one day, when librivox.org wins a HUGE grant and gets to make a zillion interface improvements (yeah, right), someone could suggest something like "How about for each book's catalogue page we check to see if there are at least x other books to which the same key term has been applied, and if there are at least x, then we display that key term with a hyperlink, clicking which would allow the user immediately to see a listing of these similarly key termed books". No, I know it's not going to happen. Strictly a thought experiment.
It's not so hard to check the probable search volume of keywords. Of course, there is no need to check 'Ayesha (Fictitious character : Haggard) -- Fiction', or 'Bad Nauheim (Germany) -- Fiction'. But 'Australia - Fiction' has 195 hits in the 70,000-book Project Gutenberg collection, which has books like ours and LoC subjects throughout (I think). Assume growth of our collection to, say, 35,000 books, and that would mean 'Australia - Fiction' would have about 100 hits in the (future, larger) Librivox collection. On the other hand, 'Australia - History - Anecdotes' has one hit, so that would be covered by searches for the truncated 'Australia - History'.

So, if there is any purpose to knowing roughly how many results come from a particular search term, I think that a search in Gutenberg would be workable.
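
The estimate above is a straight proportion, spelled out here as a tiny Python sketch. The function name is made up, and the calculation rests on the assumption that subjects are distributed in the Librivox catalog the same way as in Project Gutenberg.

```python
def estimate_hits(reference_hits, reference_size, target_size):
    """Scale a hit count from a reference collection to a target collection.

    Assumes the subject makes up the same share of both collections.
    """
    return reference_hits * target_size / reference_size


# 195 hits in a 70,000-book collection, scaled to a 35,000-book collection.
print(estimate_hits(195, 70_000, 35_000))  # 97.5
```

That gives 97.5, in line with the "about 100 hits" figure.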
TheBanjo
Posts: 1309
Joined: January 23rd, 2021, 8:19 pm
Location: Melbourne, Australia

Post by TheBanjo »

TedL wrote: March 2nd, 2024, 4:40 pm
TheBanjo wrote: March 2nd, 2024, 4:21 pm An interesting question to ask, though, at this point, would be something like "what are the 1000 (or 500) most frequently used key terms, and to how many books does each apply".
It's not so hard to check the probable search volume of keywords. Of course, there is no need to check 'Ayesha (Fictitious character : Haggard) -- Fiction', or 'Bad Nauheim (Germany) -- Fiction'. But 'Australia - Fiction' has 195 hits in the 70,000-book Project Gutenberg collection, which has books like ours and LoC subjects throughout (I think). Assume growth of our collection to, say, 35,000 books, and that would mean 'Australia - Fiction' would have about 100 hits in the (future, larger) Librivox collection. On the other hand, 'Australia - History - Anecdotes' has one hit, so that would be covered by searches for the truncated 'Australia - History'.

So, if there is any purpose to knowing roughly how many results come from a particular search term, I think that a search in Gutenberg would be workable.
Sorry, didn't make myself clear. By "what are the 1000 (or 500) most frequently used key terms", I meant: which of the unique key terms in the list you have derived are associated with multiple books within our own collection (and so, if entered as a key term search phrase, might allow a user to find multiple 'hits' for their search in our collection, if these key terms were ever to be added to our collection)?
TedL
Posts: 570
Joined: October 24th, 2022, 3:06 am
Location: Wisconsin

Post by TedL »

If the 10,670 books in that spreadsheet were from our collection, we can see which terms have many duplicates in the version of the spreadsheet (which I retained) where I put all the keywords in one column. For example: "Abolitionists - United States - Biography" was in 11 books. "Adventure and adventurers - Fiction" was a subject in 15 books. They're easy to find because I sorted them alphabetically, so all the duplicates are bunched together.

Finding out which books they're in would require a search of the original spreadsheet, using the LOC subject term as the search term.

Is that what you mean?
TedL
Posts: 570
Joined: October 24th, 2022, 3:06 am
Location: Wisconsin

Post by TedL »

TriciaG wrote: March 2nd, 2024, 4:31 pm
Yes, that's no surprise. Look at pre-2013, if you can. It was different. I just looked at July 4, 2013, and it was the old design.

EDIT: Here was the "advanced search": http://web.archive.org/web/20130818174143/http://catalog.librivox.org/visitor_advanced.php
Yes, it looks like you did a major overhaul, in layout at least, in 2012 or 2013. The Advanced Search looks pretty good. I like the four categories at the top, but with 19,000 books in the catalog, the search results for those categories are too large to search through. As we go forward I think we should be planning as if the catalog were around 30,000 books, so that any modifications we make will last for several years, until AI makes it obsolete.
TedL
Posts: 570
Joined: October 24th, 2022, 3:06 am
Location: Wisconsin

Post by TedL »

Status of Improving Subject Search

(Long post - get comfortable)

Where are we now?

Someone at Internet Archive yesterday changed the search process in the IA Librivox collection to give results from only within the collection, rather than results from the 4-million-book collection. This will be a huge help. Thank you very much for that.

We have apparently decided to ask Librivox management to put a notice on the Librivox site that encourages visitors to go to the IA Librivox page to carry out subject searches. As Ubersuggest indicates that the Librivox front page now gets 1.6 million visitors per month, I believe the number of people that will make use of the IA subject search will be very large. This will result in substantial increases in traffic for thousands of audiobooks that are now rarely heard. Also, some of the private websites that offer Librivox books in their own catalogs will find more LV audiobooks that they wish to offer their users, resulting in substantial traffic increases for maybe a couple of hundred titles that are now unused.

Librivox badly needs to track website metrics in order to know what is working and what isn't. What percentage of visitors who come to Librivox.org for audiobooks go away frustrated because they can't find what they're seeking? As far as I can see, no one at Librivox has the slightest idea. We should have volunteers who track and report up to management the metrics that are available free to every site owner from Google Analytics. I think the privacy concerns preventing Librivox from using Analytics metrics are outdated.

If we don't have volunteers to manage the website, Librivox should recruit them. How can a website with this volume of traffic and a 20,000 book catalog get along without dedicated, skilled website management? Or maybe they're there, and we just aren't hearing from them. If Librivox needs money to get that done, why not accept donations? The 'Official' YouTube channel is asking for donations to be sent to Librivox, but when I go to Librivox's Donate page, it says LibriVox has not accepted donations for the past year.


Where does Librivox go from here on Subject Searches?

We have been discussing the possibility of somehow importing Library of Congress subject headings into the records of our audiobooks. I have a few questions about that.

Are LV audiobook records the same at the IA collection as at the Librivox site? Or would importing subject headings be separate problems/projects at the two sites?

Would it be possible to create new fields in existing records and import data? I'm thinking of two fields that are in the metadata of Project Gutenberg books: "LoC Class", and "Downloads" (see below).

These are database questions that are way over my head, technically. It would be great to get input from someone who has expertise in this field.

I spent some time looking at the Project Gutenberg search capabilities this morning, and I'm really impressed by the way they have it set up. Their download data box on each book record shows that a very high percentage of their books are regularly used, regardless of how obscure the title or how long ago they were released. I think their subject search capabilities are the key to their success in opening their full catalog up to the public.

Project Gutenberg dealt with the problem of having standard search terms by adopting the Library of Congress (LOC) system of search headings. So they have standard terms that are defined in LOC publications, but these terms aren't "commonsense" search terms, so the public doesn't know them. Most people don't know what the 'proper' search term is for the subject they're seeking. Gutenberg addresses that issue in several ways that I can see.

1st: Their search engine handles 'fuzzy searches'. For example, the proper LOC term "Airplanes, military" turns up 6 books in Gutenberg. But if you search for the commonsense term 'military airplanes', it gives you the same 6 books.
Searching for the LOC term "Alcott, Louisa May, 1832-1888" gives 64 books for Alcott as subject or author. Searching for the commonsense term "Louisa Alcott" gives you 70 books, including (I think) all 64 results from the other search.

2nd: You can put a fairly broad subject in Gutenberg's Quick Search box, and it will provide a list of narrower subjects within that broad subject, with books in each.

Example: I searched the LOC term "America Discovery and exploration". At the beginning of the results is "Subjects: 26 subject headings match your search". I could ignore that and view all 135 books in my broad subject. Or I can click on "Subjects" to open results showing each of the 26 narrower subjects. Click on one to show all books within.
By the way: searching "America exploration" which is not an LOC term, got me 157 books and 29 subjects. Again, the fuzzy search capability doing its thing.

3rd: Go to "Search and Browse" in the top menu and click on "Book Search". This gives you several options. Choose "Advanced Search", scroll down to "LoCC:", and open the drop-down. Here's a list of all the LOC general headings. If I choose, say, "BH Philosophy, Psychology, Religion: Aesthetics", and select "Search", the results are all the books in the collection that fit that subject. You can sort them by author or title, and filter results in several ways. This search makes use of the field in all their book metadata records called "LoC Class". Very useful I think, although it was hard to find.
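
The 'fuzzy search' behaviour described in the first point (where 'military airplanes' finds "Airplanes, military") can be imitated with very little code. How Gutenberg actually implements its search is not known here; the sketch below is just one simple technique, order-insensitive word matching, with made-up function names and a tiny sample list of headings.

```python
import re


def tokens(text):
    """Lower-case a string and return its set of words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fuzzy_match(query, heading):
    """True if every word of the query appears in the heading, in any order."""
    q = tokens(query)
    return bool(q) and q <= tokens(heading)


def search(query, headings):
    """Return the headings that contain all the query words."""
    return [h for h in headings if fuzzy_match(query, h)]


# A few LOC-style headings mentioned in this thread.
headings = [
    "Airplanes, military",
    "Alcott, Louisa May, 1832-1888",
    "Baking",
]

print(search("military airplanes", headings))
print(search("Louisa Alcott", headings))
```

The first search finds "Airplanes, military" and the second finds "Alcott, Louisa May, 1832-1888", even though neither query is worded the way the LOC heading is. A real search engine would also need to handle misspellings and plurals, which this sketch does not attempt.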

So Gutenberg has shown how using standard LOC subject headings in their book records overcomes the disadvantage of using commonsense search terms, which is that there are various search terms for the same subject, and each search term only reveals part of the books on that subject. But Gutenberg also has shown how to overcome the main disadvantages of LOC search terms: that they're clunky and people aren't familiar with them.


So the next questions for Librivox are, it seems to me:

Is it feasible to import LOC subject headings, and if possible LOC classes, into Librivox book records?

Where would we find LOC subject headings for the 45% of LV books that did not come from Gutenberg?

If importing LOC subject headings is feasible, what would it take to add search capabilities to Librivox that are similar to Gutenberg's?

If these fixes simply aren't feasible, might it be useful to have our visitors search within Gutenberg, the way we're asking them to search in Internet Archive? That approach might be worth discussing, even though only 1 in 6 of Gutenberg's titles is at Librivox.
InTheDesert
Posts: 7786
Joined: August 20th, 2019, 8:25 pm

Post by InTheDesert »

If you parse the referer data at the end of the IA API views data, you get some interesting information.
TheBanjo
Posts: 1309
Joined: January 23rd, 2021, 8:19 pm
Location: Melbourne, Australia

Post by TheBanjo »

By way of providing some more concrete background data for this discussion, I have created a further spreadsheet which gives some additional insight into the frequency with which key terms appear in the Project Gutenberg catalog for all of their texts which have formed the basis of Librivox audiobooks. Here is that spreadsheet, zipped: https://drive.google.com/file/d/1tm6S05lm55xAbpShHcpdiie07DTsgzAC/view?usp=sharing

As of a couple of days ago, there were 10,673 audiobooks in our collection based on a Project Gutenberg source.

If we add all the key terms that Project Gutenberg have associated with those books to a spreadsheet and sort alphabetically, we get the green column shown here, which contains 29,228 entries.

The second, yellow, column in this spreadsheet lists only unique key terms that appear in the green column. There are 8,158 of these.

The third, blue, column shows how many times each of these unique key terms is associated with one of our Project Gutenberg sources. We can see, for instance, that while "Zoos -- England -- London" has been applied to only one of our PG sources, "Science fiction" (the most popular term) has been applied to 547 of our Project Gutenberg sources, and even "American wit and humour" (is there such a thing??) has been applied to 58 books. (That's an Aussie joke, by the way.)

I can envisage a far off, distant world, maybe around 2050, where a user comes to our Librivox catalog and pulls up the catalog page for "The Adventures of Sherlock Holmes". On the left hand side of the page, beneath the section headed "Links", this user finds a new section has been recently added headed "Key terms". Below it the user sees:

Holmes, Sherlock (Fictitious character) -- Fiction (47)
Private investigators -- England -- Fiction (46)
Detective and mystery stories, English (44)


Each of these entries is hyperlinked. The numbers in brackets, generated dynamically at the time the page is accessed, show the number of audiobooks in our collection currently associated with that key term. Clicking any of these links takes the reader to a page that lists other Librivox audiobooks that have the clicked key term as one of their key terms.
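
As a thought experiment in code, the "Key terms" section could be driven by a lookup from each key term to the books that carry it. The sketch below is hypothetical throughout: the function names, the placeholder book titles, and the `min_other_books` threshold are assumptions, and nothing here reflects how the Librivox catalog is actually stored.

```python
from collections import defaultdict


def build_term_index(catalog):
    """Map each key term to the set of book titles that carry it."""
    index = defaultdict(set)
    for title, terms in catalog.items():
        for term in terms:
            index[term].add(title)
    return index


def key_term_links(title, catalog, index, min_other_books=1):
    """Return (term, total_books) pairs worth showing on a book's page.

    A term is shown only if at least `min_other_books` other books share it.
    """
    links = []
    for term in catalog[title]:
        others = index[term] - {title}
        if len(others) >= min_other_books:
            links.append((term, len(index[term])))
    return links


# Made-up catalog with placeholder titles.
catalog = {
    "Book A": ["Science fiction", "Baking"],
    "Book B": ["Science fiction"],
    "Book C": ["Science fiction", "Australia -- Fiction"],
}

index = build_term_index(catalog)
print(key_term_links("Book A", catalog, index))
```

For "Book A" this shows only "Science fiction" with a count of 3; "Baking" is left out because no other book shares it, so a link would lead nowhere useful.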

Science fiction, I know - but then, that IS our most popular category!