Hi there William, and thanks for your patience.
Looking at your post above, I'd rather not comment on the first paragraph. I don't think a web search is a good example. I don't know how Google etc. organise their databases, but I would assume there are plenty of redundancies and multiple entries that come in simply because of how a web crawl is done. When they then report the findings, those redundancies surely come into play, and you will find a number of results pointing to the same website (even though each might be a different entry in the database). I'm assuming a lot here and really have no idea, which is why I don't want to get into this.
Our database, by contrast, has unique entries that are essentially added manually whenever a project is started, and each project/catalog page corresponds to a unique project number in the database. There are no redundancies built into the thing per se, and I don't think that adding redundancies would fix the bugs and peculiarities in the search function.
In your second paragraph, you're (almost) asking the question: "Why can I find SOME sections with a certain title, but not all of them?"
The toggle you're suggesting is already implemented: it's a checkbox that marks whether a project is classified as a collection or not. If it is checked, the following two things happen:
1) Each section gets additional metadata that can be filled in manually: section author, link to section text, and section language (that's for multilingual projects; the default is the project language). Together with the section title, this metadata is then fully searchable.
2) On the catalog page, this additional metadata is displayed for each section. To see what this looks like, compare a standard book project:
https://librivox.org/las-criaturas-acuaticas-by-charles-kingsley/
with a collection project:
https://librivox.org/short-nonfiction-collection-vol-063-by-various/
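To make the effect of the checkbox concrete, here is a rough sketch in Python. It is not our actual code or database schema: the class names, field names, project numbers and section titles are all invented for illustration. It only shows the idea that section titles and section metadata become search terms when a project is flagged as a collection. The link to the section text is stored but left out of the search here, purely to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    title: str
    # Extra metadata, only filled in for collections (hypothetical field names).
    author: Optional[str] = None
    text_url: Optional[str] = None
    language: Optional[str] = None  # None means "fall back to the project language"


@dataclass
class Project:
    number: int                 # unique project number (values below are made up)
    title: str
    author: str
    language: str
    is_collection: bool = False  # the checkbox
    sections: List[Section] = field(default_factory=list)


def searchable_terms(project: Project) -> List[str]:
    """Return the strings a search would look at for this project."""
    terms = [project.title, project.author]
    if project.is_collection:
        # Only collections expose their sections to the search.
        for section in project.sections:
            terms.append(section.title)
            if section.author:
                terms.append(section.author)
            terms.append(section.language or project.language)
    return terms


def search(projects: List[Project], query: str) -> List[int]:
    """Case-insensitive substring search; returns matching project numbers."""
    q = query.lower()
    return [
        p.number
        for p in projects
        if any(q in term.lower() for term in searchable_terms(p))
    ]


# Invented sample data: the same section title sits in both projects.
projects = [
    Project(1, "A Standard Book", "Some Author", "English",
            is_collection=False,
            sections=[Section("Example Poem")]),
    Project(2, "A Collection", "Various", "English",
            is_collection=True,
            sections=[Section("Example Poem", author="Another Author")]),
]

print(search(projects, "example poem"))
```

With this sample data, a search for the section title only finds project 2, the collection, even though project 1 contains a section with the same title. That is the "some sections but not all" behaviour from your question.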
As to "when is something a collection" and thus searchable by section:
Essentially, if it has been published as a single book, and the project covers exactly this one book, then we have a standard project and not a collection. (e.g.,
https://librivox.org/collected-poems-1901-1918-by-walter-de-la-mare/)
In contrast, if the project is made up of a number of individual sections that were published individually, possibly but not necessarily by different authors, and taken from different text sources, then we call it a collection and catalog it as such. (e.g.,
https://librivox.org/37-american-poems-by-various/)
In essence, we try to emulate what you would find in a standard library catalog. If you were to look there for, say, a particular story by Poe, it would not show up in a search of the library catalog (unless it comes up in the title of a book). Instead, you would have to look for Poe and check inside individual books.
In a sense, we are already doing better than that, because we do index collections "produced" by LibriVox by section and not just by overall title.
For now, I'm leaving it at this explanation of what we're doing right now. There are other issues with the search, but let's start small.