LibriVox
Forums

It is currently December 17th, 2017, 2:35 pm


Page 1 of 6  [ 78 posts ]

Post Posted:: March 13th, 2015, 9:21 pm 

Joined: December 2nd, 2013, 12:46 pm
Posts: 271
Location: USA
Hi everyone. I had an idle server so I decided to make use of it. I made a web app which, with input from the user, counts the number of words in the chapters of a Project Gutenberg ebook. The app is called WordCount and this is the link:

https://karikarito.com/wordcount

I of course had LibriVox users in mind when I made the app, so I hope it will be useful to some of our BCs. It is straightforward to use, but there is a link to help pages on the sidebar if you need it. I'll also be around on this thread to help and answer questions or if you want to report a bug.

If the app gets some use, I will try to maintain and improve it and not let it go the way of vanishing websites. Please send me feedback and let me know what improvements or additional features you want. There is also a link on the sidebar for other ways to send feedback.

Thanks! 8-)


Post Posted:: March 14th, 2015, 4:24 am 
LibriVox Admin Team

Joined: November 22nd, 2005, 10:22 am
Posts: 11735
Location: Great Britain
That's very neat, Isana! I really like it, much easier than highlighting each chapter and using an Opera widget-thing to count.

If I might make one suggestion ... would it be possible to add something to deal with the end of books? Some have a plain The End -- and then sundry olde advertisements and Transcriber Notes or Indexes. Or is it excluding those already?

_________________
There's honestly no such thing as a stupid question -- but I'm afraid I can't rule out giving a stupid answer : : To Posterity and Beyond!


Post Posted:: March 14th, 2015, 4:47 am 

Joined: December 2nd, 2013, 12:46 pm
Posts: 271
Location: USA
Hi Cori, cool, thank you for trying it!

About the last chapter: yes, I'm aware that some books have other material at the end, such as appendices, indexes, and ads. With the manual text entry method, the app already ignores this material if you add the appropriate heading (Appendix, Index, The End, etc.) as the last chapter title. This cannot be done yet with the numeral titles method, but I plan to add an enhancement that lets the user enter words to serve as an end flag.
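For anyone curious how an end flag like this can work, here is a minimal sketch in Python. It is not the app's actual code: the function name `count_between_titles` and the sample book are made up for illustration, and it assumes each title sits on its own line and that the titles appear in reading order.

```python
def count_between_titles(lines, titles):
    """Count the words between consecutive titles.

    The last entry in `titles` is used only as an end flag, so nothing
    after it (ads, index, transcriber notes) is counted.
    """
    stripped = [line.strip() for line in lines]
    # Position of the first line equal to each title; raises ValueError if absent.
    positions = [stripped.index(title) for title in titles]
    counts = {}
    # zip stops at the shortest input, so the final title never gets a count.
    for title, start, end in zip(titles, positions, positions[1:]):
        counts[title] = sum(len(line.split()) for line in lines[start + 1:end])
    return counts


# Hypothetical sample text, not taken from any real ebook.
book = [
    "CHAPTER I", "", "It was a dark and stormy night.", "",
    "CHAPTER II", "", "The sun rose at last.", "",
    "THE END", "", "Advertisements for other books by the publisher.",
]
print(count_between_titles(book, ["CHAPTER I", "CHAPTER II", "THE END"]))
```

Because "THE END" is given as the last title, the advertisement line after it is never counted.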

Thank you for pointing this out. It's one of the things that I think should be fixed, and I'll make it a priority. 8-)


Post Posted:: March 14th, 2015, 5:14 am 
LibriVox Admin Team

Joined: November 22nd, 2005, 10:22 am
Posts: 11735
Location: Great Britain
Ahh, I see. Not a big priority, and I love how simple it all is. :D



Post Posted:: March 14th, 2015, 9:52 am 
LibriVox Admin Team

Joined: September 26th, 2005, 9:10 am
Posts: 11744
Location: Union City, California
Wow, this seems like a great tool!

To test it, I found a random book, 48476, which happened to be German. I chose "descriptive titles" and then copied and pasted them out of the text, e.g.

1. Kapitel: Fahrendes Volk
2. Kapitel: In stiller Zelle
3. Kapitel: Jahrmarkt


Wordcount did not work at all. Result:

001
1. Kapitel: Fahrendes Volk : UNDEFINED
first words:
last words:

002
*** CHAPTER TITLE NOT FOUND *** : UNDEFINED
first words:
last words:

003
3. Kapitel: Jahrmarkt : UNDEFINED
first words:
last words:

I think #3 was "title not found" because in the text there is funny spacing, which I guess got stripped out? Dunno.

Does it only work in English? Did I do something wrong? :)

_________________
Kara
http://kayray.org/
--------
"Mary wished to say something very sensible into her Zoom H2 Handy Recorder, but knew not how." -- Jane Austen (& Kara)


Post Posted:: March 14th, 2015, 10:53 am 

Joined: December 2nd, 2013, 12:46 pm
Posts: 271
Location: USA
Hi kayray. Thanks for the feedback; it does help! It's so kind of you to say it seems like a great tool even if it didn't work. :D

Anyway, I checked PG book 48476. I could tell from the output you provided that it was the second title that was not found. The other two titles were found, but the app couldn't count the words because of the missing title. The cause of the problem is that the app uses the text version of the book, not the HTML version, so the titles have to be copied from the text version. In the text version I found these titles (note the carets in some of them), and the app worked properly when I entered them:

1. Kapitel: Fahrendes Volk
2. Kapitel: ^In stiller Zelle^
3. Kapitel: Jahrmarkt
4. Kapitel: ^Eine Hexe^
5. Kapitel: ^Ein blindes Kind -- ein blinder Richter^
6. Kapitel: ^Richterweisheit^
7. Kapitel: ^Edle Menschen^
8. Kapitel: ^Spürhunde^
9. Kapitel: ^Der Jesuit im Gefängnis^
10. Kapitel: Elsa und Edeltraut vor den Richtern
11. Kapitel: ^Der Richter im Gefängnisse^
12. Kapitel: ^Das Elend und der Wahn wachsen^
13. Kapitel: »Priester im Bunde des Satans«
14. Kapitel: ^Blutiges Morgenrot^
15. Kapitel: ^Der Wahrheit Sieg^
Inhaltsverzeichnis
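
To illustrate why one mismatched title spoils the whole result, here is a rough sketch assuming exact, whole-line matching against the plain-text edition. The helper names `locate_titles` and `all_found` are hypothetical, the sample lines only imitate the text file, and the app's real logic may differ.

```python
def locate_titles(text_lines, titles):
    """Return {title: line index or None} using exact, whole-line matches."""
    stripped = [line.strip() for line in text_lines]
    return {t: (stripped.index(t) if t in stripped else None) for t in titles}


def all_found(located):
    """Chapter boundaries are only known when every title was located."""
    return all(index is not None for index in located.values())


# A few lines in the style of the plain-text edition (note the carets).
text_version = [
    "1. Kapitel: Fahrendes Volk", "", "(chapter text)", "",
    "2. Kapitel: ^In stiller Zelle^", "", "(chapter text)", "",
    "3. Kapitel: Jahrmarkt", "", "(chapter text)",
]

# Titles as copied from the HTML edition (no carets) and from the text edition.
from_html = ["1. Kapitel: Fahrendes Volk", "2. Kapitel: In stiller Zelle", "3. Kapitel: Jahrmarkt"]
from_text = ["1. Kapitel: Fahrendes Volk", "2. Kapitel: ^In stiller Zelle^", "3. Kapitel: Jahrmarkt"]

print(all_found(locate_titles(text_version, from_html)))
print(all_found(locate_titles(text_version, from_text)))
```

With the HTML titles the second lookup comes back as `None`, so the end of chapter 1 and the start of chapter 2 are unknown. With the text-version titles every boundary is found.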

It should work for German and other languages which use a space to delimit words. It won't work for Japanese or Chinese because they don't use spaces to separate words.
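
A tiny Python sketch of counting by whitespace shows the difference. The function `count_words` is only an illustration of the idea, not the app's code.

```python
def count_words(text):
    """Count words by splitting on runs of whitespace."""
    return len(text.split())


german = "Der Jesuit im Gefängnis"
japanese = "吾輩は猫である。名前はまだ無い。"

print(count_words(german))    # 4
print(count_words(japanese))  # 1, because the sentence contains no spaces
```

The Japanese line is two sentences, but with no spaces in it the whole string is counted as a single "word".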

:thumbs:


Post Posted:: March 14th, 2015, 11:05 am 
LibriVox Admin Team

Joined: September 26th, 2005, 9:10 am
Posts: 11744
Location: Union City, California
Ah cool! That does work! So maybe you could add a note to tell us to take the chapter titles from the "Plain Text UTF-8" version, not the html version :)

It is EXTREMELY handy that it also gives the first and last words of the chapter. Wow. I'm super-impressed. Simple and functional. Does one job and does it well.



Post Posted:: March 14th, 2015, 11:33 am 

Joined: December 2nd, 2013, 12:46 pm
Posts: 271
Location: USA
Thank you for the kind words, kayray. Yes, I'll be sure to put a note on the text entry form to use the text version. It will be there on the next update. Your bringing it up here is also sure to help others who read this thread. :thumbs:


Post Posted:: March 14th, 2015, 11:36 am 

Joined: August 21st, 2014, 9:34 am
Posts: 1951
Location: Probably the holodeck :)
Thanks a lot, Isana.


Post Posted:: March 14th, 2015, 11:45 am 

Joined: December 2nd, 2013, 12:46 pm
Posts: 271
Location: USA
Kangaroo692 wrote:
Thanks a lot, Isana.


You're welcome! :)


Post Posted:: March 14th, 2015, 11:55 am 

Joined: August 21st, 2014, 9:34 am
Posts: 1951
Location: Probably the holodeck :)
OK. I just tried it, and here are my comments:

- I entered the chapter titles, but there is also (before the name) a chapter number. Should I have used Roman numerals or descriptive titles? Because if I don't use Roman numerals, the last words would be "Chapter __"

- Is there any way to split chapters (or add them together)?

- Is there any way to let the software find the chapter titles itself?

Other than that, I think this is a great program, except for one thing.

I assume the program searches the text for the words specified for the chapter.

For example, the last chapter in the book I'm soloing is called "Vindication". What if the word is used multiple times in the work?

Other than that, thank you a lot, I really enjoyed this program. Thanks a lot for your hard work! :D


Post Posted:: March 14th, 2015, 12:13 pm 
LibriVox Admin Team

Joined: June 15th, 2008, 10:30 pm
Posts: 36950
Location: Toronto, ON (but Minnesotan to age 32)
That's pretty cool! It probably works on 90% or more of the projects out there, which will be very handy. :)

I tried it on a non-standard project: ID 42315
Rise and Fall of the Confederate Government, Volume 2,
by Jefferson Davis

It has Roman numerals, which it found, but 2 things make it non-standard:

(1) It starts at chapter 15 and goes to 57. I originally put in the number of chapters, but it didn't find the end ones (go figure!) so I changed it to the number of the final chapter (57), and it did find them all. (It gave "not found" responses for chapters 1-14, which was expected.)

(2) It has a chapter listing with descriptions before the main text, so it found the word counts for the descriptions rather than for the text itself. This occurs on many old texts - histories, memoirs, etc. Perhaps a check box option "Ignore the first set of chapters in text (chapter listing with descriptions before the main text)"?
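
To make the suggestion in (2) concrete, here is a rough Python sketch of what such an option might do. Everything in it is an assumption: the "CHAPTER" plus Roman numeral heading style, the names `to_roman` and `find_headings`, and the `skip_listing` flag are invented for illustration and say nothing about how the app is actually written.

```python
def to_roman(n):
    """Convert a positive integer to a Roman numeral string."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
             (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
             (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def find_headings(lines, first, last, skip_listing=False):
    """Map chapter numbers first..last to the line index of their heading.

    With skip_listing=True the last matching line is used, which skips a
    chapter listing that repeats the headings before the main text.
    """
    found = {}
    for number in range(first, last + 1):
        heading = f"CHAPTER {to_roman(number)}"
        hits = [i for i, line in enumerate(lines)
                if line.strip().rstrip(".") == heading]
        if hits:
            found[number] = hits[-1] if skip_listing else hits[0]
    return found


# Hypothetical layout: a listing with descriptions, then the main text.
sample = [
    "CHAPTER XV.", "", "Description of chapter fifteen.", "",
    "CHAPTER XVI.", "", "Description of chapter sixteen.", "",
    "CHAPTER XV.", "", "Main text of chapter fifteen.", "",
    "CHAPTER XVI.", "", "Main text of chapter sixteen.",
]
print(find_headings(sample, 15, 16))
print(find_headings(sample, 15, 16, skip_listing=True))
```

Passing a start and end number also covers point (1), since a volume that begins at chapter 15 would not report chapters 1-14 as missing.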

Overall it looks great! There are just so many variables out there, that it'll probably sneeze at a few projects, no matter how many scenarios you program it to handle. :lol:

_________________
Original journals on the Exploration of the Mississippi: Here
Fiction, partly about jail atrocities: It Is Never too Late
Watergate Report, volume 2: Here


Post Posted:: March 14th, 2015, 12:21 pm 

Joined: December 2nd, 2013, 12:46 pm
Posts: 271
Location: USA
Thank you for your comments and kind words. I will try to answer your questions.

Kangaroo692 wrote:
OK. I just tried it, and here are my comments:

- I entered the chapter titles, but there is also (before the name) a chapter number. Should I have used Roman numerals or descriptive titles? Because if I don't use Roman numerals, the last words would be "Chapter __"

I think you can use either method; it's up to you. The chapter titles add only a handful of words to the word count. Since the word counts shown in magic windows are only used to estimate how much material there is to read in each section, I think an estimate that is within a few dozen words is good enough.

Kangaroo692 wrote:
- Is there any way to split chapters (or add them together)?

Do you mean in the counting process? Not at this time.

Kangaroo692 wrote:
- Is there any way to let the software find the chapter titles itself?

I could try, but the results would probably not be very consistent. But who knows, maybe I'll take the challenge in the future. :shock:

Kangaroo692 wrote:
Other than that, I think this is a great program, except for one thing.

I assume the program searches the text for the words specified for the chapter.

For example, the last chapter in the book I'm soloing is called "Vindication". What if the word is used multiple times in the work?

If you look at the text file (not HTML) of a PG ebook, each chapter title is usually on a line by itself, with a blank line before and after it. This is the pattern used in many of the books, and is the pattern the app looks for. If the exact pattern repeats in the text, then the first occurrence of the pattern will be considered the chapter title. Of course, that could be the wrong choice. :D
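
As a rough illustration of that pattern, here is a short Python sketch. The function name `find_title` and the sample lines are made up, and the real app may handle details such as indentation differently.

```python
def find_title(lines, title):
    """Return the index of the first line that equals `title` and has a
    blank line (or the file boundary) both before and after it."""
    for i, line in enumerate(lines):
        if line.strip() != title:
            continue
        blank_before = i == 0 or not lines[i - 1].strip()
        blank_after = i == len(lines) - 1 or not lines[i + 1].strip()
        if blank_before and blank_after:
            return i
    return None


# Hypothetical text: the word appears in a sentence, then twice as a heading.
sample = [
    "He had sought vindication for many years.", "",
    "Vindication", "",
    "The chapter begins here.", "",
    "Vindication", "",
    "A later line that repeats the same pattern.",
]
print(find_title(sample, "Vindication"))  # 2
```

The sentence on the first line is skipped because the word is not on a line by itself. Of the two lines that do fit the pattern, the earlier one (index 2) is the one picked.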

Kangaroo692 wrote:
Other than that, thank you a lot, I really enjoyed this program. Thanks a lot for your hard work! :D

You're welcome, and thank you again! :D


Post Posted:: March 14th, 2015, 12:24 pm 

Joined: August 21st, 2014, 9:34 am
Posts: 1951
Location: Probably the holodeck :)
Thank you for your response. Please let me know if I can do anything to help.

Would a wiki page be proper for this software?


Post Posted:: March 14th, 2015, 12:29 pm 

Joined: December 2nd, 2013, 12:46 pm
Posts: 271
Location: USA
Tricia, thank you very much for that feedback! I have to do something now but I will take a close look at ebook 42315 later tonight. I appreciate that you tried it with a non-standard project. Isn't text format great? We can easily read it, it's genius!

