Any interest in LibriVox's representation on Wikipedia?

Post by **ZamesCurran** » April 26th, 2015, 1:44 pm

Kangaroo692 wrote:James, let me know if there is a way for me to help with my little HTML and CSS know-how. Thanks,

Well, as you can probably tell from that page, my design skills aren't that hot. If you want to take a try at punching it up, feel free. Take a copy of the page's HTML (right-click, View Source), modify it, and email me the file (james-dot-curran-at-gmail-dot-com).

The IFrame on the right looks nice, but doesn't seem to be working: it won't open the edit page in the frame (I think Wikipedia has security measures to prevent people from doing exactly what we want to do). So I think we should get rid of it and open Wikipedia in a new tab instead.
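For what it's worth, MediaWiki by default sends an `X-Frame-Options` header on edit pages, which is most likely the security measure the IFrame runs into. Opening the edit page in a new tab avoids that entirely. Below is a minimal Python sketch of generating such a link, assuming the standard `index.php?title=...&action=edit` URL form; `edit_url` and `edit_link` are hypothetical helper names, not part of any existing LibriVox code.

```python
import html
from urllib.parse import urlencode

# Assumption: English Wikipedia's standard MediaWiki entry point.
WIKI_INDEX = "https://en.wikipedia.org/w/index.php"


def edit_url(title: str) -> str:
    """Build the edit URL for an article title (spaces become underscores)."""
    query = urlencode({"title": title.replace(" ", "_"), "action": "edit"})
    return f"{WIKI_INDEX}?{query}"


def edit_link(title: str, label: str = "Edit on Wikipedia") -> str:
    """Return an <a> tag that opens the edit page in a new tab instead of an IFrame."""
    # '&' in the URL must be written as '&amp;' inside an HTML attribute.
    href = html.escape(edit_url(title), quote=True)
    return (
        f'<a href="{href}" target="_blank" rel="noopener noreferrer">'
        f"{html.escape(label)}</a>"
    )


if __name__ == "__main__":
    print(edit_link("The Time Machine"))
```

`rel="noopener noreferrer"` keeps the new Wikipedia tab from getting a handle back to the checking page.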

Post by **Availle** » April 27th, 2015, 1:16 am

James,

we have removed those spurious "null" reader entries from the projects you have indicated above. Some of those seem to have occurred during the migration from our old system.

Thanks for letting us know.

For the record: We are perfectly able to change the data in our database; we just cannot do anything at the moment that requires programming and a testing environment. And we're not in a position to accept offers in that respect either.

Post by **ekzemplaro** » April 27th, 2015, 2:56 am

Hello James san,

ZamesCurran wrote:The "Id_" number appears to be the Librivox internal ID # (it's the number used on the iTunes & RSS links)

Right. These numbers originate at librivox.org; I just use the same ID.

ZamesCurran wrote:Also, Masa/ekzemplaro appears to be accessing the librivox database directly. Is there an API for doing that?

Not directly, only through the limited API.
The API doesn't give me any information about links to Wikipedia, so we have to check manually whether each book has a link to Wikipedia.

Here's how I create db_catalog.json.
https://github.com/ekzemplaro/librivox_database

I use the LibriVox API, the Archive.org API, and manually entered information as sources.
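To illustrate how API data and manual entries might be combined, here is a rough Python sketch. It is not Masa's actual code (that is in the GitHub repository above). It assumes LibriVox's public feed endpoint `https://librivox.org/api/feed/audiobooks/` with `format`, `limit` and `offset` parameters, returning records that carry an `id` field; `feed_url`, `merge_catalog` and the `manual_status` mapping are hypothetical names.

```python
from urllib.parse import urlencode

# Assumption: LibriVox's public audiobook feed endpoint and its parameters.
FEED = "https://librivox.org/api/feed/audiobooks/"


def feed_url(offset: int, limit: int = 50) -> str:
    """URL for one page of the audiobook feed, in JSON."""
    return f"{FEED}?{urlencode({'format': 'json', 'limit': limit, 'offset': offset})}"


def merge_catalog(books: list, manual_status: dict) -> dict:
    """Key API records by LibriVox ID and attach the manually entered Wikipedia status.

    books: list of dicts as parsed from the feed (hypothetically with "id" and "title").
    manual_status: hypothetical mapping of ID -> status entered by volunteers.
    Books nobody has checked yet get -1 ("not checked yet").
    """
    catalog = {}
    for book in books:
        book_id = str(book["id"])  # the same ID used in the iTunes/RSS links
        catalog[book_id] = {
            "title": book.get("title", ""),
            "status": manual_status.get(book_id, -1),
        }
    return catalog
```

Since the API doesn't expose the Wikipedia links, the `status` value can only come from the manual side, which matches the description above.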

Cheers,
Masa

Post by **ZamesCurran** » April 28th, 2015, 10:06 am

ekzemplaro wrote: As of now I'm thinking to show the status using the following rule.
-1 ---> Not checked yet.
0 ---> Checked. No link to Wikipedia
1 --> Checked. Exists link to Wikipedia.
2 --> Checked. Exists link to Wikipedia plus exists link from Wikipedia.

I've been thinking about this status, and I feel it has a number of problems, specifically:
- It is missing a number of possible states.
- Status meanings aren't obvious or memorable. Without consulting that chart, the numbers are meaningless.

I'd propose a two-character status code. The first character tells the status of the LV page, the second the status of the Wikipedia page, as follows:

     W-  = LV page has link to Wikipedia page.
     x-  = LV page does NOT have link to Wikipedia page.
     --  = Unknown.

     -L  = Wikipedia page has correct link to LV page (via {{librivox }} tag)
     -l  = Wikipedia page has improper link to LV page (without {{librivox }} tag)
     -x  = Wikipedia page has no link to LV page.
     -n  = Book does not have its own Wikipedia page.
     --  = unknown.

So, if everything was correct (links going each way), the status would be "WL" (a capital letter means it's good). If the book does not have a page (and therefore no links either way), the status would be "xn" (lowercase indicates a problem).
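To show that such codes stay readable without a lookup chart and are also easy to handle in code, here is a small Python sketch of a decoder. The dictionaries restate the proposal above; `describe` and `is_complete` are hypothetical helper names.

```python
# Meanings restated from the proposal: first char = LV page, second = Wikipedia page.
LV_PART = {
    "W": "LV page has link to Wikipedia page",
    "x": "LV page does NOT have link to Wikipedia page",
    "-": "unknown",
}
WP_PART = {
    "L": "Wikipedia page has correct link to LV page",
    "l": "Wikipedia page has improper link to LV page",
    "x": "Wikipedia page has no link to LV page",
    "n": "book does not have its own Wikipedia page",
    "-": "unknown",
}


def describe(code: str) -> tuple:
    """Split a two-character status into its LV and Wikipedia meanings."""
    if len(code) != 2 or code[0] not in LV_PART or code[1] not in WP_PART:
        raise ValueError(f"invalid status code: {code!r}")
    return LV_PART[code[0]], WP_PART[code[1]]


def is_complete(code: str) -> bool:
    """True only when both characters are capitals (links going each way)."""
    describe(code)  # validate first
    # Check each character: str.isupper() on the whole string would accept "-L".
    return all(ch.isupper() for ch in code)
```

Only "WL" passes `is_complete`, which is exactly the "capital letter means it's good" rule.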

This would also allow us to separate the task of verifying the correctness of the pages (which can be done fairly quickly and without triggering Wikipedia's update sensors) from actually updating the page (which is much slower and may have a daily limit).

The numeric statuses would map to the alphabetic ones as such:

    -1 = "--"
     0 = "x-"
     1 = "W-"
     2 = "WL"

A CHAR(2) should take up no more room in a database than an INT.
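A sketch of what that migration could look like, using Python's built-in `sqlite3` purely as a stand-in for whatever database actually holds the catalog. The table names `books` and `books_v2` are hypothetical. One caveat: SQLite does not enforce the length in `CHAR(2)`, so a `CHECK` constraint does that job here.

```python
import sqlite3

# Old numeric status -> proposed two-character code (mapping from the post above).
NUMERIC_TO_CODE = {-1: "--", 0: "x-", 1: "W-", 2: "WL"}


def migrate(conn: sqlite3.Connection) -> None:
    """Copy rows from books (numeric status) into books_v2 (two-character status)."""
    conn.execute(
        "CREATE TABLE books_v2 ("
        " id INTEGER PRIMARY KEY,"
        " status CHAR(2) NOT NULL CHECK (length(status) = 2))"  # SQLite ignores the (2)
    )
    rows = conn.execute("SELECT id, status FROM books").fetchall()
    conn.executemany(
        "INSERT INTO books_v2 (id, status) VALUES (?, ?)",
        [(book_id, NUMERIC_TO_CODE[status]) for book_id, status in rows],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, status INTEGER)")
    conn.executemany("INSERT INTO books VALUES (?, ?)", [(52, 2), (53, -1), (54, 0)])
    migrate(conn)
    print(conn.execute("SELECT * FROM books_v2 ORDER BY id").fetchall())
```

An unknown numeric value raises a `KeyError` rather than being silently stored, which is probably what you want during a one-off conversion.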

Post by **ZamesCurran** » April 28th, 2015, 10:22 am

Availle wrote:we just cannot do anything at the moment that requires programming and a testing environment. And we're not in a position to accept offers in that respect either.

That seems to be an odd situation. Can you give the background that led to it?

Post by **TimoleonWash** » April 28th, 2015, 11:38 am

ZamesCurran wrote:
I'd propose a two character status code. [...] a CHAR(2) should take up no more room in a database than an INT.

I've been banging my head against the wall about these statuses, and some stuff came loose. We would like to see every LibriVox offering link to Wikipedia, and the Wikipedia page link back to LibriVox. So maybe the two most important states are 1) does the Wikipedia book page exist? and 2) are all the links in place? How about a simple status of "OK" if all three links are verified, and a status of "NONE" if the Wikipedia page doesn't exist? Any other state means something is missing: one of the three links. The way this project is going, anything missing gets added within a few days, so the third status could be "IP", for in progress. IP would really only apply to missing LibriVox links, because the person who reported the book has likely already added the Wikipedia link before posting the results in this thread. Anyway, I'm thinking NONE, OK, or IP might meet our needs.
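A tiny Python sketch of that three-state rule. The post doesn't spell out which three links are meant, so this hypothetical `simple_status` function just takes a list of per-link check results rather than guessing.

```python
def simple_status(page_exists: bool, links_verified: list) -> str:
    """Collapse detailed checks into the three proposed states.

    NONE: the Wikipedia book page doesn't exist.
    OK:   the page exists and every link has been verified.
    IP:   the page exists but at least one link is still missing (in progress).
    """
    if not page_exists:
        return "NONE"
    return "OK" if all(links_verified) else "IP"
```

The trade-off is that "IP" no longer records *which* link is missing, which is the detail the two-character code keeps.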

Post by **ekzemplaro** » April 28th, 2015, 4:46 pm

Hello James san,

Please browse this thread:
Project: Link Wikipedia pages to LibriVox recordings
In its first post, the status is defined as follows:

-1 --> Not checked yet
0 ---> Checked. No link to or from Wikipedia
1 ---> Checked. The link from LibriVox to Wikipedia exists
2 ---> Checked. The link from Wikipedia to LibriVox exists
3 ---> Checked. Links exist both to and from Wikipedia and LibriVox
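Read this way, the values 0 to 3 behave like a two-bit flag (1 = link from LibriVox to Wikipedia, 2 = link from Wikipedia to LibriVox, 3 = both), with -1 reserved for "not checked yet". That is my reading of the table rather than something stated in the thread; the Python helpers below are hypothetical.

```python
from typing import Optional, Tuple

NOT_CHECKED = -1
LV_TO_WP = 1  # the link from LibriVox to Wikipedia exists
WP_TO_LV = 2  # the link from Wikipedia to LibriVox exists


def encode(lv_to_wp: bool, wp_to_lv: bool) -> int:
    """Combine the two checked links into a status from 0 to 3."""
    return (LV_TO_WP if lv_to_wp else 0) | (WP_TO_LV if wp_to_lv else 0)


def decode(status: int) -> Optional[Tuple[bool, bool]]:
    """Return (lv_to_wp, wp_to_lv), or None if the book hasn't been checked yet."""
    if status == NOT_CHECKED:
        return None
    if not 0 <= status <= 3:
        raise ValueError(f"unknown status {status}")
    return bool(status & LV_TO_WP), bool(status & WP_TO_LV)
```

Under this reading, "marked 2 or 3" is simply "bit 2 is set", i.e. `status & WP_TO_LV`.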

I changed the definition once. What you referred to is the old one.
The maximum value in the old definition was 2.

I don't want to add more confusion, so I'll keep this definition for the moment.
I agree your idea is excellent, but somebody will soon give us a better idea.

As I mentioned before, most things could be automated if a copy of the LibriVox database were available.

Cheers,
Masa

Post by **mahne** » April 29th, 2015, 1:11 am

I would say the mapping should then look like this:

-1 = "--"
0 = "xn"
1 = "Wx"
2 = "xL"
3 = "WL"

The "*l" combinations are not necessary for now, because every reporter would have corrected the link on Wikipedia.
The "*-" and "-*" combinations are not necessary, because every reporter has to check both sides.
"Wn" does not make sense, and "xx" was not needed, because reporting was done after changing something.
Just looking without changing anything would mean touching things twice, and therefore wasted time.
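A quick Python enumeration of which two-character codes this mapping leaves out. The character sets come from the two-character proposal and the mapping from the list above; `unmapped_codes` is a hypothetical name.

```python
from itertools import product

LV_CHARS = "Wx-"    # first character: LibriVox side
WP_CHARS = "Llxn-"  # second character: Wikipedia side
NUMERIC_TO_CODE = {-1: "--", 0: "xn", 1: "Wx", 2: "xL", 3: "WL"}


def unmapped_codes() -> list:
    """Two-character codes that have no numeric equivalent in the current scheme."""
    mapped = set(NUMERIC_TO_CODE.values())
    return [a + b for a, b in product(LV_CHARS, WP_CHARS) if a + b not in mapped]


if __name__ == "__main__":
    print(unmapped_codes())
```

Of the 15 possible codes, 10 have no numeric value: "Wn", "xx", every "*l" code, and the partly unknown "W-", "x-", "-L", "-x", "-n". These are exactly the cases argued above to be unnecessary for manual reporting.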

On the other hand, if the checking were automated, most of the combinations would make sense. One could also think of a status "n-", so that "nn" means nothing to do, whereas an x anywhere means something to do.
Also, if we are separating the two statuses, why not introduce a new field in the database instead of cramming them together into one char?

Speaking of automation:
Does anyone see a way to spider the LV pages and check automatically for an existing Wikipedia book link? And, if there is one, to follow the link and check whether the LibriVox book template is used on that Wikipedia page?
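The LibriVox half of that could look roughly like the Python sketch below, using only the standard-library `html.parser`. It makes no assumption about where on a LibriVox catalog page the link sits; it simply collects every anchor pointing at an `en.wikipedia.org/wiki/` URL. Fetching the pages (at a polite rate) and checking the template on the Wikipedia side are left out, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class WikipediaLinkFinder(HTMLParser):
    """Collect every <a href> on a page that points at an English Wikipedia article."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""  # attributes without a value come back as None
        url = urlparse(href)
        if url.netloc == "en.wikipedia.org" and url.path.startswith("/wiki/"):
            self.links.append(href)


def wikipedia_links(page_html: str) -> list:
    """Return all English Wikipedia article links found in an HTML page."""
    finder = WikipediaLinkFinder()
    finder.feed(page_html)
    finder.close()
    return finder.links
```

An empty result would correspond to the "x-" state and a non-empty one to "W-", so the LV character of the two-character code could be filled in without anyone editing Wikipedia.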

Additionally, does anyone see a way to automatically read in all entries on this page: https://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Template:Librivox_book&limit=500 and check in Masa's catalog whether all of them are marked 2 or 3? And to check whether there are any additional 2s or 3s in the database (which would most likely mean they were falsely marked 2 or 3)?
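A possible Python sketch of that cross-check. It assumes the MediaWiki Action API's `list=embeddedin` query, which returns the pages that transclude a template (the transclusion part of what Special:WhatLinksHere shows), with the API's standard `continue` mechanism for more than 500 results. The `wiki_title` field in the catalog is hypothetical, since the thread doesn't say how Masa's catalog stores Wikipedia titles, and the HTTP call is injected as `get_json` so the logic can be tested offline.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def embeddedin_url(cont: Optional[dict] = None) -> str:
    """One page of articles (namespace 0) that transclude Template:Librivox book."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": "Template:Librivox book",
        "einamespace": 0,
        "eilimit": 500,
        "format": "json",
    }
    if cont:
        params.update(cont)  # pass the API's "continue" values back unchanged
    return f"{API}?{urlencode(params)}"


def fetch_embedding_titles(get_json: Callable[[str], dict]) -> set:
    """Follow continuation until every transcluding article title is collected."""
    titles = set()
    cont = None
    while True:
        data = get_json(embeddedin_url(cont))
        titles.update(page["title"] for page in data["query"]["embeddedin"])
        if "continue" not in data:
            return titles
        cont = data["continue"]


def cross_check(wiki_titles: set, catalog: dict) -> tuple:
    """Compare Wikipedia's list with catalog entries marked 2 or 3.

    catalog maps LibriVox ID -> {"wiki_title": ..., "status": ...}; wiki_title is
    hypothetical and must use the same form as the API titles (spaces, not underscores).
    """
    marked = {
        rec["wiki_title"]
        for rec in catalog.values()
        if rec.get("status") in (2, 3) and rec.get("wiki_title")
    }
    unmarked = sorted(wiki_titles - marked)    # template present, catalog not marked 2/3
    suspicious = sorted(marked - wiki_titles)  # marked 2/3, but no template found
    return unmarked, suspicious
```

This only reads from Wikipedia, so it stays on the verification side of the split between checking and updating pages.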

I think any of this would be a big improvement over the purely manual update process we have.

Cheers
mahne

Post by **annise** » April 29th, 2015, 3:15 am

I'm not quite sure what you are talking about, sorry; the tech bits are outside my current knowledge base, but:

Wikipedia will never approve of us adding links from LV automatically; they do not regard themselves that way. In a way I agree with them, but whether I do or not, it is their decision.

We can not change any software or host any software at present and do not know when we will be able to.

It is not a giant conspiracy, it is just a fact we all, including the admins, have to accept, and then get on with what we can do or find workarounds using what is available.

Anne

Post by **ekzemplaro** » April 29th, 2015, 4:05 am

Hello mahne san,

mahne wrote: Speaking of automation:
Does anyone see a chance to spider the LV pages to check for an existing wiki book link automatically? And if there is one, to follow the link and check if the Librivox book template is used on that wiki page?

Scott san is writing code to parse LibriVox pages. I expect he will pop up here.

mahne wrote:
Additionally does anyone see a chance of automatically reading in all entries on this page : https://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Template:Librivox_book&limit=500 and search on Masas catalog if all findings are marked 2 or 3? And to check if there are any additional 2 or 3s in the database (which would most likely mean that this was falsely marked 2 or 3).

I think any of this would be a big improvement to the sole manual update process we have.

Good suggestion. If an API for my database is needed, I'll write it. I'm also studying the Wikipedia API.

Cheers,
Masa

Post by **ZamesCurran** » April 29th, 2015, 7:54 am

mahne wrote:Also, if we are separating the two statuses, why not introducing a new value in the database, why cramping them together into one char?

Largely because I felt that redefining the meaning of one field is a less dramatic change than adding another field.

But, on the whole, two separate statuses would be best.

My main concern was having a status that covered every possibility (even if it also covered some impossible combinations), and a code that would be meaningful when viewing the raw value, without having to go through a translation step.
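If the two statuses do become separate fields, one way to keep both goals (every possibility covered, raw values still meaningful) is to store the proposal's letters in each column. A minimal Python sketch; the class names and the column names `lv_status` and `wp_status` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class LvLink(str, Enum):
    """Status of the LibriVox page; values are the letters from the proposal."""
    PRESENT = "W"
    MISSING = "x"
    UNKNOWN = "-"


class WpLink(str, Enum):
    """Status of the Wikipedia page; values are the letters from the proposal."""
    CORRECT = "L"   # link via the template
    IMPROPER = "l"  # link without the template
    MISSING = "x"
    NO_PAGE = "n"
    UNKNOWN = "-"


@dataclass
class LinkStatus:
    lv: LvLink = LvLink.UNKNOWN
    wp: WpLink = WpLink.UNKNOWN

    def to_row(self) -> dict:
        """Two separate columns, each still readable as a raw value."""
        return {"lv_status": self.lv.value, "wp_status": self.wp.value}

    @property
    def code(self) -> str:
        """The combined two-character form, for display."""
        return self.lv.value + self.wp.value
```

Reading a row back is just `LinkStatus(LvLink(row["lv_status"]), WpLink(row["wp_status"]))`, and an invalid letter raises `ValueError`.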

Post by **TimoleonWash** » April 29th, 2015, 8:57 am

annise wrote:It is not a giant conspiracy, it is just a

Secret?

Post by **TriciaG** » April 29th, 2015, 9:04 am

TimoleonWash wrote:
annise wrote:It is not a giant conspiracy, it is just a
Secret?

No. No secret.

We don't have the resources to develop the site further at this time, and don't know when we will. How is that a secret?

Post by **TimoleonWash** » April 29th, 2015, 9:09 am

TriciaG wrote:
TimoleonWash wrote:
annise wrote:It is not a giant conspiracy, it is just a
Secret?
No. No secret.
We don't have the resources to develop the site further at this time, and don't know when we will. How is that a secret?

Well, I would have hoped for a more complete answer to the original question. What resources are needed? A programmer and a server? As a volunteer organization, I would have thought it would be straightforward to get those two. Perhaps some testers would also be needed?

Or do you (we?) always farm out our programming to people or companies that charge for it?

You know, just more detail to satisfy the curious mind.

I'm not trying to pry; I just like to know, and I don't usually see good reasons for restricting access to knowledge. That's all; I just hoped for a complete answer.

Post by **Kangaroo692** » April 29th, 2015, 9:19 am

I don't see how this is restricting access to information.
We record audiobooks; everything else is a side project.

I believe that if the administrators were able to bring in a programmer and update the website and/or workflow, they would let us know. But since they say they're not ready, they are not ready. They should know.

I am in no way against this project, but I am against automation in this project.