catalog system

Hath1
Posts: 11
Joined: January 28th, 2006, 8:30 pm
Location: OH, USA

Post by Hath1 »

You all make very good points about authority control, but I think I have figured out how to solve most of the technical problems involved. I will go with a system that uses one identity for each author and allows an unlimited number of aliases to be tied to that author. We will have to be more careful when entering information into the catalog; however, since this is only done by MCs, it is probably the best way to handle the situation.

All of an author's works will need to be entered into the system under the "MAIN" author record, not under the pen name the work was published under. I will also leave a per-work field for the pen name (filled in only when one was used), so that we retain the original published author name for display and historical purposes.

This will allow us to display all the works by an author when searching, including works written under a pen name. Searching by a pen name will also display works written under the author's own name, as well as works written under any other pen name tied to the same author. Depending on the exact implementation, it may also display works by other authors who have written under the same pen name as the author being searched; however, the information displayed about a work would never show an incorrect third-party author, even if the book was published under a pen name also used by that third party.
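A minimal sketch of the alias scheme described above, assuming SQLite and hypothetical table and column names (authors, aliases, projects, pen_name) that are not taken from any actual LibriVox schema. The sample author, pen name, and titles are made up for illustration.

```python
import sqlite3


def build_catalog():
    """Create an in-memory catalog with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL              -- the "MAIN" identity
        );
        CREATE TABLE aliases (
            author_id INTEGER NOT NULL REFERENCES authors(id),
            alias     TEXT NOT NULL         -- any number per author
        );
        CREATE TABLE projects (
            id        INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            pen_name  TEXT                  -- NULL unless published under one
        );
    """)
    return conn


def works_for_name(conn, name):
    """Return (title, main_author, displayed_name) for every work by any
    author whose main name or alias matches `name`."""
    return conn.execute("""
        SELECT p.title, a.name, COALESCE(p.pen_name, a.name)
        FROM projects p
        JOIN authors a ON a.id = p.author_id
        WHERE a.name = :n
           OR a.id IN (SELECT author_id FROM aliases WHERE alias = :n)
        ORDER BY p.title
    """, {"n": name}).fetchall()


# Made-up sample data: one author, one pen name, two works.
conn = build_catalog()
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Jane Doe')")
conn.execute("INSERT INTO aliases (author_id, alias) VALUES (1, 'J. D. Quill')")
conn.execute("INSERT INTO projects (title, author_id, pen_name) "
             "VALUES ('Work One', 1, NULL)")
conn.execute("INSERT INTO projects (title, author_id, pen_name) "
             "VALUES ('Work Two', 1, 'J. D. Quill')")

for row in works_for_name(conn, "J. D. Quill"):
    print(row)
```

Searching by the pen name or by the main name returns the same set of works, while the third column keeps the name each work was originally published under.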

I do not, however, know how we will address one other authorship concern I have: Works written by multiple authors.

If there are books written by more than one author, I have a few different choices for how to handle this, but I want input from the MCs first on whether this is an issue worth worrying about, as I'd prefer to keep the database design as simple and straightforward as possible. My recommendation is to enter each Project in our collection with exactly one author specified, but I'd like to hear what you think.

Also, I have a few questions about internal terminology for those of you likely to be inspecting the database from a viewpoint other than the user side of the web pages, before I build up too many dependencies to go back and change things.

Projects: LibriVox Projects/Books/Poems/Collections (these make up the catalog). Includes both finished works and ongoing projects (a status code distinguishes between them).

ProjectSections: The Chapters or other units of a project parted out by the BC for readers to record.

ProjectCategories: All the different Genres/Categories the MCs can use to catalogue a Project / Work in Progress in our Librivox collection.

Requests: Become Projects after the minimum required amount of info is verified and entered by an MC.

Recordings: Describes and locates the actual files containing the auditory data.

Authors: Keeps certain biographical information about each author for whom we have derivative Projects in our system (now also includes pseudonyms).

Favorites: Users' preferences (Projects (books), Readers, Recordings, Authors, LibriBuddies)


I plan to work on this project a bit this evening, sometime after 7:00 P.M. EDT, so get your comments in!
thistlechick
Posts: 6170
Joined: November 30th, 2005, 12:14 pm
Location: Michigan

Post by thistlechick »

Greg McMullan wrote:I don't have a lot of time, but it might be interesting for folks to look at http://www.qwikly.com/2006/02/wikipedia-for-cataloging-books.html which seems to be talking about a place where folks can catalog their own libraries of books in a shared environment, with an API that relies on Library of Congress data. I will for sure be looking more later, when I'm not running out of lunch break.
Greg, thanks for pointing this out.

I recently added a book to the proofreading system for Project Gutenberg and was surprised that they had it set up to pull in the LOC info... surprised because PG doesn't display any of it in their public interface :roll: (... don't get me started! :wink: )
~ Betsie
Multiple projects lead to multiple successes!
thistlechick
Posts: 6170
Joined: November 30th, 2005, 12:14 pm
Location: Michigan

Post by thistlechick »

Hath,

The definitions are right on... we're on the same page with those... and your workaround for pen names makes perfect sense to me...

As for additional authors, I'm wondering if there is a way to have a field called something like "Contributor" which could be used for editors or translators or second authors .... it wouldn't be considered the MAIN author, but perhaps it could also be searched when an author search is performed? ... in a browse situation, it would cause the title to appear twice if a name were included in this Contributor field... so that might be a problem for the system (that's how Project Gutenberg seems to do it).

I think the bit with multiple authors would make cataloging collections (such as stories or poetry) very tricky... but I'm also thinking that with a database, it won't be such a big deal for us to create a separate entry (catalog record) for each work... in fact we would no longer need to create collections the way we do now, as we would (theoretically) be able to create on-the-fly collections with the subject/genre/category tags.
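A rough illustration of the on-the-fly collection idea, assuming one record per work with a set of category tags. The Work class, the collection function, and the sample titles and categories are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Work:
    """One catalog record per work (hypothetical fields)."""
    title: str
    author: str
    categories: frozenset = field(default_factory=frozenset)


def collection(works, category):
    """Build a collection on the fly: every title tagged with `category`."""
    return sorted(w.title for w in works if category in w.categories)


# Made-up records for illustration only.
works = [
    Work("Poem A", "Author A", frozenset({"poetry"})),
    Work("Story B", "Author B", frozenset({"short stories", "humor"})),
    Work("Poem C", "Author C", frozenset({"poetry", "humor"})),
]

print(collection(works, "poetry"))
print(collection(works, "humor"))
```

Nothing is stored per collection here; the same record can show up in several collections simply because it carries several tags.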

When you get the field names solidified, I would like to start some of the data entry (because I know you won't want to touch that boring part ;) ) ... Would it be possible to import a delimited (tab, comma, or whatever you specify) file? If so, I would like to start getting some of the data ready for import... let me know what you think. =)
~ Betsie
Hath1
Posts: 11
Joined: January 28th, 2006, 8:30 pm
Location: OH, USA

Post by Hath1 »

thistlechick,

I would NOT recommend CSV for this project, because a large number of the fields can contain commas as part of the valid data.

A tab-delimited list should be OK, as I don't think tabs would normally be acceptable in the majority of the fields.

Even better would be something like an MS Access .mdb file, which handles table-type data a little better than spreadsheets do, but any of the three formats should be fairly easy to import.
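As a hedged illustration of the tab-delimited option, here is a small sketch using Python's standard csv module. The column names (title, author, pen_name) and the sample rows are made up. The same module can also read properly quoted comma-separated files, but with tabs as the delimiter the commas inside fields need no special handling.

```python
import csv
import io

# Made-up sample export: tab-separated, with commas inside some fields.
SAMPLE = (
    "title\tauthor\tpen_name\n"
    "Work One, Part 1\tDoe, Jane\t\n"
    "Work Two\tDoe, Jane\tJ. D. Quill\n"
)


def read_tab_delimited(handle):
    """Read a tab-delimited file with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [dict(row) for row in reader]


rows = read_tab_delimited(io.StringIO(SAMPLE))
for row in rows:
    print(row["title"], "|", row["author"], "|", row["pen_name"] or "(none)")
```

For a real file, the handle would come from something like open(path, newline="", encoding="utf-8") instead of io.StringIO.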

I have decided to normalize our data slightly further to account for the multiple-authors problem, so that any number of authors can be recorded for a project. This will make the initial load of back-data slightly more complicated, but it should pay off in the long run. I don't see it making the entry of single books any more difficult: all the contributing authors should be selectable by control-click from a listbox, or added one at a time by hyperlink click from the full or filtered/sorted list, depending on how the eventual web page is coded.
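One common way to normalize this is a link table between projects and authors. The sketch below assumes SQLite and hypothetical names (project_authors, add_project, authors_of); it is not the actual table specification, and the sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors  (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE projects (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- One row per (project, author) pair, so a project can have any
    -- number of authors and an author any number of projects.
    CREATE TABLE project_authors (
        project_id INTEGER NOT NULL REFERENCES projects(id),
        author_id  INTEGER NOT NULL REFERENCES authors(id),
        PRIMARY KEY (project_id, author_id)
    );
""")


def add_project(conn, title, author_ids):
    """Insert a project and link it to every selected author."""
    cur = conn.execute("INSERT INTO projects (title) VALUES (?)", (title,))
    project_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO project_authors (project_id, author_id) VALUES (?, ?)",
        [(project_id, author_id) for author_id in author_ids],
    )
    return project_id


def authors_of(conn, project_id):
    """List the names of all authors linked to a project."""
    rows = conn.execute("""
        SELECT a.name
        FROM authors a
        JOIN project_authors pa ON pa.author_id = a.id
        WHERE pa.project_id = ?
        ORDER BY a.name
    """, (project_id,))
    return [name for (name,) in rows]


# Made-up sample data.
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Author A')")
conn.execute("INSERT INTO authors (id, name) VALUES (2, 'Author B')")
pid = add_project(conn, "Co-written Work", [1, 2])
print(authors_of(conn, pid))
```

The author_ids list passed to add_project corresponds to whatever the data-entry page collects, whether from a multi-select listbox or from authors added one at a time.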

I would suggest we first fill the tables with all the info we have on our authors, and second add all their works to the Project tables. Third would be to break each Project down into the sections actually used to complete it. This step may not be absolutely necessary, though leaving it out would make the "old" projects' pages behave a little differently from the "new" projects created after the new system is up and running.

I am currently trying to decide on the best way to link the final versions of the recorded files back to the project / project section they belong to. I will attempt to get some preliminary table specifications posted by the end of the weekend, so everyone who is interested can comment on or question any potential problems before we start putting too much effort into data entry.

EDIT: Had to delay a week: I'd been mostly working on this during my lunch breaks, and my internet went on the fritz at home, so I couldn't get to most of what I needed to work on this. The cable-repairman is scheduled to come on Friday.

-Hath1

P.S.: Any coders who can figure out a way to automate collection of as much data as possible by scraping the existing data pages on the site, please contact me or thistlechick to discuss the possibility. It could really save a lot of volunteer hours spent on data entry!
Stephan
Posts: 1550
Joined: December 18th, 2005, 9:38 am
Location: Leverkusen, Germany

Post by Stephan »

Once the programmer(s) offer us the first rudimentary clickable version to give input on, with one or two dropdowns, I'd like to suggest that we move to a different place to discuss it all; better yet, we should have that place ready beforehand.

Our beloved forum won't do for sorting all the input that is to be expected: for each aspect, each small feature, request, and suggestion, and all the long ongoing discussions of every one of these aspects. You have all read Hath's feature list and can imagine what will be going on here.

I have been working with a mate on a much smaller online database project. It was tiny compared to this cataloging monster, yet with just the two of us we quickly had 20 open ends, each with a corresponding long discussion. The programmer would not be able to keep all this input laid out in his head; he needs it sorted cleverly.

Every one of us will have clever ideas. Some will talk about the underlying functions, some about the UI, some about bugs, some about rights management. How would the programmer cope with all the hundreds of ideas?

The programmer I was working with, clever as he was, set up a completely new forum just for discussing this single piece of software. It worked like a charm because he cleverly chose sub-forums to channel the info flowing to him. He constantly moved discussions to more or less important categories: "to be implemented", "working on....", "future versions"... BUMMER, I don't recall his exact categories.

Hath, you will have to outline something like this to us - HOW DO YOU WORK?

We will surely blow up the regular LibriVox forum with this, or at the very least "disturb" our volunteer readers with the appearance of a dozen crude subforums where hell is breaking loose.

I think this was the only way the programmer could stay sane. :lol: We want to keep the information juggling away from our dear programmer as much as we can. The chaos is already starting in this very thread, and magically Hath has somehow managed to keep the balls in the air.

The forum we had was more like a feature-discussion resource which the programmer could pull up and reread once he chose to go on working on a particular feature.

The forum was a good tool for that.
Want to promote LV? Print the poster and pin it at your library (http://librivox.org/wiki/moin.cgi/PromotionalMaterial) | My wiki page (http://librivox.org/wiki/moin.cgi/Stephan_Moebius)
thistlechick
Posts: 6170
Joined: November 30th, 2005, 12:14 pm
Location: Michigan

Post by thistlechick »

Hath1, how's this going? =)
~ Betsie
thistlechick
Posts: 6170
Joined: November 30th, 2005, 12:14 pm
Location: Michigan

Post by thistlechick »

I have split off the topic regarding having a Listening Preview feature to this location:
http://librivox.org/forum/viewtopic.php?t=2437
~ Betsie
thistlechick
Posts: 6170
Joined: November 30th, 2005, 12:14 pm
Location: Michigan

Post by thistlechick »

I'm starting to think that building our own database system is out of the question... perhaps we should start considering other options such as an audio content management system such as http://www.jinzora.org/ ... take a look at the demos and see what you think.
~ Betsie
kri
Posts: 5319
Joined: January 3rd, 2006, 8:34 pm
Location: Keene NH

Post by kri »

thistlechick wrote:I'm starting to think that building our own database system is out of the question... perhaps we should start considering other options such as an audio content management system such as http://www.jinzora.org/ ... take a look at the demos and see what you think.
Betsie, I'm beginning to agree. We don't have anyone really dedicated and involved with creating the database. I'll look at that site later.
metal.lunchbox
Posts: 114
Joined: December 16th, 2005, 11:40 am
Location: Nashville, TN USA

Post by metal.lunchbox »

Though it is perhaps in no way useful to the explicit purposes of LibriVox, I thought I would suggest integrating tags for geographic information as well. I've noticed that many of our readers include this information before the beginning of their chapters. It seems that despite its seeming irrelevance they consider it important information. I'm no expert on geotagging, but it would just be one more tag. The Wikipedia page explains the conventions. Exploiting this data could come later, but I would like to ask whether the panel believes it appropriate for inclusion. I am not very familiar with cataloguing systems and don't know how open the LibriVox one is conceived to be.

James
James | LV Wiki Page (http://librivox.org/wiki/moin.cgi/JamesSmith) | "Tout et n'importe quoi mais surtout n'importe quoi" - Basile
thistlechick
Posts: 6170
Joined: November 30th, 2005, 12:14 pm
Location: Michigan

Post by thistlechick »

Just an update... I have been hoping that this project would just appear in our laps completed, but it is becoming more and more critical that we move forward with a database for storing information about LibriVox recordings... I have explored pre-built options, and have not found anything that will sufficiently meet our needs. I have taken the initiative to begin development of the database with the help of several other active LibriVox volunteers.....

The database structure is built, and we are now in the process of developing interface pages and inputting retrospective project data. I do not have an expected release date, but will keep everyone posted on the progress as we move forward as quickly as possible.

James, yes, I too believe that geographic information is valuable, or at least interesting.... and am already incorporating it in our database in connection with the members' information.

I am wondering how much information we should include about the book authors... already I am including fields for Birth and Death dates, and for links to Wikipedia articles... but would we find it valuable to include a geographic field for authors as well? and how specific should it be? Is Country enough? or (if we are to include it) should it be in smaller regional segments?
~ Betsie
kri
Posts: 5319
Joined: January 3rd, 2006, 8:34 pm
Location: Keene NH

Post by kri »

I think that language is enough of a determinant to add to authors. I don't know that the geographical location of the author will be that important. Plus, it would be hard to decide which location to choose. Their birthplace, their main home, or what?
thistlechick
Posts: 6170
Joined: November 30th, 2005, 12:14 pm
Location: Michigan

Post by thistlechick »

Yes, those are the kinds of questions that were holding me back from including geographic information with the author... so far I have language connected with the actual book (so we would have multiple entries for each title done in a different language).... but we certainly could link language with the Author as well...
kri wrote: I think that language is enough of a determinant to add to authors. I don't know that the geographical location of the author will be that important. Plus, it would be hard to decide which location to choose. Their birthplace, their main home, or what?
~ Betsie
metal.lunchbox
Posts: 114
Joined: December 16th, 2005, 11:40 am
Location: Nashville, TN USA

Post by metal.lunchbox »

I was really thinking more of geographic metadata for the recordings themselves. Some people start out their recordings by telling us where they are. Obviously I think that geodata for a members' catalogue would also be very interesting and potentially useful, but my question was about whether or not we think it is worth the effort to have each recording potentially tagged with the location where it was recorded. I think this can be done with a varying degree of specificity without becoming too confusing.

James
kri
Posts: 5319
Joined: January 3rd, 2006, 8:34 pm
Location: Keene NH

Post by kri »

It would have to be an optional field, because some people either don't want to have their location put in, or don't care to bother.