Information in Archive metadata

Yann
Posts: 16
Joined: December 6th, 2021, 12:39 pm

Post by Yann »

Hi, I am copying LibriVox recordings to Wikimedia Commons and presenting them on the Main Page at https://commons.wikimedia.org/wiki/Commons:Media_of_the_day
I take care to credit LibriVox volunteers, but it is difficult because they are not clearly mentioned in the Internet Archive metadata. Also, the text source is not mentioned, and the translator is not always mentioned. Could you please add this information to the metadata? See https://commons.wikimedia.org/wiki/Category:LibriVox_recordings for a list of recordings already uploaded to Commons. Thanks, Yann
TriciaG
LibriVox Admin Team
Posts: 60810
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG »

This would require a lot of back-end programming. Since we don't have any paid developers, and the volunteers we have don't do very much (usually only changes to fix functionality on our own site), this would be a low-priority request.

All this information is available on our own site, though. Or you could try the API, although that's a bit of a mess itself.

P.S. I can't imagine adding all of our more than 18,000 works to Wikimedia Commons!
School fiction: David Blaize
America Exploration: The First Four Voyages of Amerigo Vespucci
Serial novel: The Wandering Jew
Medieval England meets Civil War Americans: Centuries Apart
BengtW
Posts: 196
Joined: February 14th, 2019, 11:11 am
Location: Sweden

Post by BengtW »

I would use the API as the main source and then supplement it by scraping the project page for the rest. This is how the process works for the YouTube channel. I maintain my own local database with the information needed that is not part of the API. You also need to keep track of what has been uploaded and keep some backwards traceability as well.

Do not do this manually. That will take forever. Spend a bit of time coding and let a script do it in the background, if Wikimedia Commons allows it.
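
For illustration only, here is a minimal sketch of the kind of local tracking database described above. It is not BengtW's actual setup. The table layout, the function names, and the idea of keying on a LibriVox project ID plus section number are all assumptions; it uses only Python's standard sqlite3 module.

```python
import sqlite3


def open_tracking_db(path=":memory:"):
    """Open (or create) a small database recording what has been uploaded.

    The schema is hypothetical: one row per uploaded section, keyed by the
    LibriVox project ID and section number, plus the Commons file title.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS uploads (
               librivox_id    TEXT    NOT NULL,
               section_number INTEGER NOT NULL,
               commons_title  TEXT    NOT NULL,
               uploaded_at    TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
               PRIMARY KEY (librivox_id, section_number)
           )"""
    )
    return conn


def is_uploaded(conn, librivox_id, section_number):
    """Return True if this section has already been recorded as uploaded."""
    row = conn.execute(
        "SELECT 1 FROM uploads WHERE librivox_id = ? AND section_number = ?",
        (librivox_id, section_number),
    ).fetchone()
    return row is not None


def record_upload(conn, librivox_id, section_number, commons_title):
    """Record an upload; recording the same section twice is a no-op."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO uploads "
            "(librivox_id, section_number, commons_title) VALUES (?, ?, ?)",
            (librivox_id, section_number, commons_title),
        )


def source_of(conn, commons_title):
    """Backwards traceability: map a Commons title to (project ID, section)."""
    return conn.execute(
        "SELECT librivox_id, section_number FROM uploads WHERE commons_title = ?",
        (commons_title,),
    ).fetchone()
```

A bot could call is_uploaded() before each upload and record_upload() after it, so that an interrupted run can be restarted without duplicating files.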
Yann
Posts: 16
Joined: December 6th, 2021, 12:39 pm

Post by Yann »

Hi, Yes, I use a bot to upload the files. See https://commons.wikimedia.org/wiki/Special:Contributions/YannBot
If all the information could be extracted automatically, it would be easy and fast.
Yann
Posts: 16
Joined: December 6th, 2021, 12:39 pm

Post by Yann »

There are also a lot of spelling mistakes for French works.
TriciaG
LibriVox Admin Team
Posts: 60810
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG »

Yann wrote: July 2nd, 2023, 3:20 pm There are also a lot of spelling mistakes for French works.
I don't know French, so I wouldn't know - but the French projects were mostly done by French speakers, so there shouldn't be more spelling mistakes than for any other language. :hmm:

Are they mistakes in the descriptions, section titles, or somewhere else?
Yann
Posts: 16
Joined: December 6th, 2021, 12:39 pm

Post by Yann »

TriciaG wrote: July 2nd, 2023, 3:46 pm
Yann wrote: July 2nd, 2023, 3:20 pm There are also a lot of spelling mistakes for French works.
I don't know French, so I wouldn't know - but the French projects were mostly done by French speakers, so there shouldn't be more spelling mistakes than for any other language. :hmm:

Are they mistakes in the descriptions, section titles, or somewhere else?
In the titles and section titles, and that's where it matters. :(
TriciaG
LibriVox Admin Team
Posts: 60810
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG »

Do you have an example or two?
Yann
Posts: 16
Joined: December 6th, 2021, 12:39 pm

Post by Yann »

BengtW wrote: July 1st, 2023, 10:51 pm I would use the API as the main source and then supplement it by scraping the project page for the rest. This is how the process works for the YouTube channel. I maintain my own local database with the information needed that is not part of the API. You also need to keep track of what has been uploaded and keep some backwards traceability as well.

Do not do this manually. That will take forever. Spend a bit of time coding and let a script do it in the background, if Wikimedia Commons allows it.
I suppose you mean the Internet Archive API? Yes, that's what I do, but some information is not available there, notably the names of the volunteers who read it.
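
As a rough sketch of the alternative mentioned earlier in the thread (LibriVox's own API rather than the Internet Archive one), the following shows how reader names might be collected for crediting. The endpoint URL, the extended=1 parameter, and the JSON field names ("books", "sections", "readers", "display_name") are assumptions about the LibriVox feed and should be checked against a real response before use.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the LibriVox audiobooks feed; verify before relying on it.
API_BASE = "https://librivox.org/api/feed/audiobooks"


def build_url(book_id):
    """Build a query URL asking for one project as extended JSON (assumed params)."""
    query = urllib.parse.urlencode({"id": book_id, "format": "json", "extended": 1})
    return f"{API_BASE}?{query}"


def fetch_book(book_id):
    """Download and decode the JSON payload for one project (needs network access)."""
    with urllib.request.urlopen(build_url(book_id), timeout=30) as resp:
        return json.load(resp)


def reader_names(payload):
    """Collect unique reader names in order of first appearance.

    Assumes a payload shaped like
    {"books": [{"sections": [{"readers": [{"display_name": ...}]}]}]};
    missing or empty fields are skipped rather than raising an error.
    """
    names = []
    for book in payload.get("books", []):
        for section in book.get("sections") or []:
            for reader in section.get("readers") or []:
                name = (reader.get("display_name") or "").strip()
                if name and name not in names:
                    names.append(name)
    return names
```

With a payload in hand, reader_names(fetch_book(some_id)) would give the list of volunteers to credit on the Commons file page, where some_id is a placeholder for a LibriVox project ID.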
Availle
LibriVox Admin Team
Posts: 22452
Joined: August 1st, 2009, 11:30 pm

Post by Availle »

TriciaG wrote: July 4th, 2023, 11:11 am Do you have an example or two?
I don't speak French either (anymore), but apparently, there was a change in orthography in 1990.
Obviously, all our books predate this one :wink:
We always use the spelling as it is used in the text we're reading from, regardless of what would be considered correct today.
Cheers, Ava.
Resident witch of LibriVox, channelling
Granny Weatherwax: "I ain't Nice."

--
AvailleAudio.com