How to Make Subject Searches Work
Go to the Librivox Free Audiobook Collection at Internet Archive (https://archive.org/details/librivoxaudio) and hover your mouse over a book cover. You will see a line called "Topics". This paper proposes updating the terms that Librivox inserts in the Topics line.
What's wrong with the existing subject search system?
You can already carry out a subject search at this site and at Librivox.org. Entering a term in the search box searches all the text in a record, including the book summary. There are no standardized search terms. If you want to find all books in the collection about automobiles, for example, you have to run separate searches for auto, autos, car, cars, automobile, and automobiles, plus any other terms used for autos at the beginning of the 20th century.
Your search results will include every book where the search term is mentioned in the summary text, so they include many books where your subject is only marginal to the book. At the same time, many related books are omitted because the subject term does not appear in the book summary.
What terms are in "Topics" now, and how did they get there?
The terms "audiobooks" and "librivox" seem to be in Topics in every book entry. If we confirm those have a necessary function, we should continue to add them to each book.
Otherwise, the terms that were added to topics are usually one-word terms added by the Book Coordinators. They used words they believed described the major themes of the book. The number of topics entered varies a lot from book to book, with many books having no topics other than 'audiobooks' and 'librivox'. Internet Archive (IA), which governs the format of these records, says we should use a maximum of 10 topics in a book.
Why are you suggesting changes?
The Internet Archive has built in the capability to run searches from the terms in the Topics field. To see how books use this capability, look at these two examples:
https://archive.org/details/101essaysthatwil0000wies
https://archive.org/details/sonofneptune0000rick_f7t7
As you can see, there are multi-word 'topics' in each entry. Click any topic, and Internet Archive searches all the books on the site and presents you with the results. You then filter the results by selecting Librivox in the Collection box in the left column. To be included in those search results, a book must have the identical term in its Topics field.
Just hover your mouse over the book covers in the results, and you'll see they have that topic. You can click on any topic while hovering (to do another search), without even opening a book page.
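The topic click described above behaves roughly like a fielded search. Here is a minimal sketch of that idea, assuming that Internet Archive's advanced search page (advancedsearch.php) accepts a query in which the "subject" field holds the Topics terms and "collection:librivoxaudio" limits results to our collection. The function name and the sample heading "Automobiles" are hypothetical, and the code only builds the search address without sending anything.

```python
from urllib.parse import urlencode

# Assumed address of Internet Archive's advanced search page.
SEARCH_URL = "https://archive.org/advancedsearch.php"


def topic_search_url(topic, collection="librivoxaudio", rows=50):
    """Build a search URL for one exact topic within one collection."""
    # Quoting the topic asks for the whole multi-word phrase,
    # not for each word separately.
    query = f'subject:"{topic}" AND collection:{collection}'
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # fields to return for each book
        ("fl[]", "title"),
        ("rows", rows),          # maximum number of results
        ("output", "json"),
    ]
    return f"{SEARCH_URL}?{urlencode(params)}"


# Hypothetical heading, used only to show the shape of the URL.
print(topic_search_url("Automobiles"))
```

Because the collection filter is part of the query, a link built this way would skip the extra step of selecting Librivox in the Collection box.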
The search results in this example sometimes contain hundreds of books, because the search runs across a collection of 3.9 million books (the collection is "Texts to Borrow"). Our collection of Librivox audiobooks presently has 19,000 books, or about one-half of one percent of that size. So we can limit our subject terms to broader topics than IA does, and therefore use far fewer subject terms.
How will users know about these new standardized 'subject headings'?
First, they can find a similar book through browsing or a book title search, then click on the topic that fits their need.
Second, we should provide a complete list of our subject headings. The front page of the list would show basic subject categories. Clicking on a basic subject would lead to all the subject headings we use in that category.
How will we implement putting topics like this into all of our 19,000 books?
For a search from a topic to be effective, and for a book to appear in the search results, all books on a particular subject need to use the identical term. There are three systems (that I'm aware of) of standardized 'subject headings' for books. The Library of Congress and Dewey Decimal systems are used by libraries, and BISAC headings (maintained by the Book Industry Study Group) are used by booksellers.
I suggest that we should select the Library of Congress (LOC) system, for two reasons.
First, it is the main system used in other collections at IA, so it will be easy for users to switch back and forth from Librivox to other IA collections.
Second, the system I'm proposing for 'classifying' our books (assigning subject headings) involves looking up book titles and using their existing LOC subject headings, rather than using the LOC reference tools to figure them out. This will be a big time-saver, and will make it possible for non-librarian volunteers to carry out this task. Volunteers would find subject headings by doing book title searches on Internet Archive, Worldcat.org, or the LOC Online Catalog at catalog.loc.gov. Occasionally they will refer to the Library of Congress Subject Headings manual, available online at IA.
We should have a 'team leader' for the volunteers to oversee the assignment of subject headings.
Volunteers will forward the book title and new subject headings to an 'Admin' person at Librivox with authority to edit book records. The Admin person will open an existing book record, copy and paste the new subject headings in the Topics field, and close the book record.
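The copy-and-paste step could follow a simple rule for combining terms. The sketch below is one possible rule, not an agreed procedure: it assumes that 'audiobooks' and 'librivox' stay first, that the new subject headings come next, that older one-word topics are kept only if room remains, and that the list stops at IA's limit of 10. The function name and the sample headings are hypothetical.

```python
# Terms that currently appear in every Librivox entry.
REQUIRED_TOPICS = ["audiobooks", "librivox"]
# Internet Archive's stated maximum number of topics per book.
MAX_TOPICS = 10


def merge_topics(existing, new_headings, max_topics=MAX_TOPICS):
    """Combine required terms, new headings, and older topics without duplicates."""
    merged = []
    seen = set()
    # Order sets priority: required terms, then new headings, then old topics.
    for term in REQUIRED_TOPICS + list(new_headings) + list(existing):
        cleaned = term.strip()
        key = cleaned.casefold()  # compare without regard to capitalization
        if cleaned and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    # Drop the lowest-priority terms if the list is too long.
    return merged[:max_topics]


# Hypothetical record and headings, for illustration only.
old_topics = ["librivox", "audiobooks", "cars"]
new_headings = ["Automobiles", "Automobile travel"]
print(merge_topics(old_topics, new_headings))
# ['audiobooks', 'librivox', 'Automobiles', 'Automobile travel', 'cars']
```

Whether the older one-word topics should be kept at all is a decision for the team leader; removing `list(existing)` from the loop would discard them.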
For new books, a volunteer will find and add the subject headings while the book is still being recorded. When the book is finished the Admin will do their usual routine to add the book to the catalog, with no additional steps.
How much work will it be?
My guess is that volunteers will each be able to do 5 to 10 books per hour. I hope that an Admin person can tell me how long their task would take. I can't predict how long it will take to finish the job until we know how many volunteers we have.
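The guess above can be turned into a rough range of total volunteer hours. This sketch uses only the 19,000-book collection size and the 5 to 10 books per hour from this paper; the number of volunteers and the hours each gives per week are hypothetical placeholders, since we do not know them yet.

```python
TOTAL_BOOKS = 19_000   # collection size given in this paper
SLOW_RATE = 5          # books per volunteer-hour, low end of my guess
FAST_RATE = 10         # books per volunteer-hour, high end of my guess


def volunteer_hours(total_books, books_per_hour):
    """Total hours of volunteer time needed at a given pace."""
    return total_books / books_per_hour


def weeks_to_finish(total_hours, volunteers, hours_per_week):
    """Calendar weeks needed if every volunteer gives the same weekly hours."""
    return total_hours / (volunteers * hours_per_week)


most_hours = volunteer_hours(TOTAL_BOOKS, SLOW_RATE)    # 3800.0
fewest_hours = volunteer_hours(TOTAL_BOOKS, FAST_RATE)  # 1900.0
print(f"Between {fewest_hours:.0f} and {most_hours:.0f} volunteer hours")

# Hypothetical staffing: 20 volunteers giving 2 hours a week each.
print(weeks_to_finish(most_hours, volunteers=20, hours_per_week=2))    # 95.0
print(weeks_to_finish(fewest_hours, volunteers=20, hours_per_week=2))  # 47.5
```

The estimate leaves out the Admin person's time, which still needs to be measured.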
Where will we get volunteers?
Over the life of Librivox, I understand that we have had 13,000 volunteer readers. But many drop off because they find the process of recording difficult. I suspect there are people who would be willing to help Librivox in this other way instead of recording. Librivox would recruit volunteers as usual on the website and YouTube channel.
You said we would change searches at IA, so that instead of searching the full book catalog, our searches would take place within the Librivox audiobook collection. What is the status?
IA indicates that the searcher should use the "Collection" box in the left margin to limit search results to Librivox audiobooks.
What about the Librivox.org site?
This needs more study, and probably wouldn't be addressed immediately. Change here would require modifications to the Librivox.org website. There is much less traffic on this site than on the IA site, although at 1.7 million views per month, traffic is still outstanding.
I suggest that instead of using the Genre/Subject list, we would direct users to our list of subject headings, mentioned above. Users could put the subject heading into the Librivox.org search field, or simply access our audiobooks through the IA site.
Currently the Librivox.org individual book records contain a line for 'Genre(s)', not for topics. Is it feasible and desirable for Librivox.org to have the same book records as IA?
Do you think this is worth the effort?
Definitely yes! The two Librivox sites together have over 20 million views per month, a truly phenomenal number. According to Ahrefs, the 100th-largest U.S. website, FedEx, has the same volume of traffic. But despite all those visitors, more than 90% of our books get fewer than one view per month, and are heard even less often than that. This proposed subject search system, already in use for most Internet Archive books, will allow this huge crowd of visitors searching for audiobooks to easily find everything we offer on their favorite subjects.
Many Librivox volunteers believe that private websites offering Librivox books ensure that our books are heard. I think not. I found that a handful of Librivox books are heavily used on some of the biggest audiobook sites. But very few of our books even appear in their catalogs. Traffic on most of the audiobook sites that offer our books is insignificant.
The most important thing we can do for Librivox right now is to make our books more accessible to users with this upgrade.