WordCount app for counting words in a chapter

Isana
Posts: 273
Joined: December 2nd, 2013, 12:46 pm
Location: USA

Post by Isana »

Hi everyone. I had an idle server so I decided to make use of it. I made a web app which, with input from the user, counts the number of words in the chapters of a Project Gutenberg ebook. The app is called WordCount and this is the link:

https://karikarito.com/wordcount

I of course had LibriVox users in mind when I made the app, so I hope it will be useful to some of our BCs. It is straightforward to use, but there is a link to help pages on the sidebar if you need it. I'll also be around on this thread to help, answer questions, and take bug reports.

If the app gets some use, I will try to maintain and improve it and not let it go the way of vanishing websites. Please send me feedback and let me know what improvements or additional features you want. There is also a link on the sidebar for other ways to send feedback.

Thanks! 8-)
Cori
Posts: 12124
Joined: November 22nd, 2005, 10:22 am
Location: Britain

Post by Cori »

That's very neat, Isana! I really like it, much easier than highlighting each chapter and using an Opera widget-thing to count.

If I might make one suggestion ... would it be possible to add something to deal with the end of books? Some have a plain The End -- and then sundry olde advertisements and Transcriber Notes or Indexes. Or is it excluding those already?
There's honestly no such thing as a stupid question -- but I'm afraid I can't rule out giving a stupid answer : : To Posterity and Beyond!
Isana
Posts: 273
Joined: December 2nd, 2013, 12:46 pm
Location: USA

Post by Isana »

Hi Cori, cool, thank you for trying it!

About the last chapter, yes, I'm aware of cases where there are other things like appendices, indexes, ads, etc. at the end. Currently, the app ignores this material if you use the manual text entry method for the chapter titles and add the appropriate words (appendix, index, the end, etc.) as the last title. This cannot be done yet with the numeral titles method, but I do plan on adding an enhancement so that the user can input words which serve as an end flag.
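The end-title behaviour described above could look roughly like the sketch below. This is not WordCount's actual code: `count_chapters` and the sample text are invented for illustration, and the sketch assumes each title sits on a line by itself and matches exactly.

```python
def count_chapters(lines, titles):
    """Count words from each title up to the next one.

    The final entry in `titles` only marks where the last chapter stops,
    so it gets no count of its own and anything after it is ignored.
    """
    # list.index returns the first exact match and raises ValueError if absent
    starts = [lines.index(title) for title in titles]
    counts = {}
    for title, begin, end in zip(titles, starts, starts[1:]):
        counts[title] = sum(len(line.split()) for line in lines[begin:end])
    return counts


book = """CHAPTER I

It was a dark night.

CHAPTER II

The sun rose.

THE END

Transcriber's Note: spelling has been kept as printed."""

lines = [line.strip() for line in book.splitlines()]
print(count_chapters(lines, ["CHAPTER I", "CHAPTER II", "THE END"]))
# {'CHAPTER I': 7, 'CHAPTER II': 5}
```

The title line itself is included in each count here, and the transcriber's note after THE END is never counted.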

Thank you for pointing this out. It's one of the things that I think should be fixed, and I'll make it a priority. 8-)
Cori
Posts: 12124
Joined: November 22nd, 2005, 10:22 am
Location: Britain

Post by Cori »

Ahh, I see. Not a big priority, and I love how simple it all is. :D
kayray
Posts: 11828
Joined: September 26th, 2005, 9:10 am
Location: Union City, California

Post by kayray »

Wow, this seems like a great tool!

To test it, I found a random book, 48476, which happened to be German. I chose "descriptive titles" and then copied and pasted them out of the text, e.g.

1. Kapitel: Fahrendes Volk
2. Kapitel: In stiller Zelle
3. Kapitel: Jahrmarkt


Wordcount did not work at all. Result:

001
1. Kapitel: Fahrendes Volk : UNDEFINED
first words:
last words:

002
*** CHAPTER TITLE NOT FOUND *** : UNDEFINED
first words:
last words:

003
3. Kapitel: Jahrmarkt : UNDEFINED
first words:
last words:

I think #3 was "title not found" because in the text there is funny spacing, which I guess got stripped out? Dunno.

Does it only work in English? Did I do something wrong? :)
Kara
http://kayray.org/
--------
"Mary wished to say something very sensible into her Zoom H2 Handy Recorder, but knew not how." -- Jane Austen (& Kara)
Isana
Posts: 273
Joined: December 2nd, 2013, 12:46 pm
Location: USA

Post by Isana »

Hi kayray. Thanks for the feedback; it does help! It's so kind of you to say it seems like a great tool even if it didn't work. :D

Anyway, I checked PG book 48476. I could tell from the output you provided that it was the second title that was not found. The other two titles were found, but the app couldn't count their words because of the missing title. The cause of the problem is that the app uses the text version of the book, not the HTML version, so the titles need to be copied from the text version. In the text version, I found these titles (note the carets in some of them). The app worked properly when I entered them:

1. Kapitel: Fahrendes Volk
2. Kapitel: ^In stiller Zelle^
3. Kapitel: Jahrmarkt
4. Kapitel: ^Eine Hexe^
5. Kapitel: ^Ein blindes Kind -- ein blinder Richter^
6. Kapitel: ^Richterweisheit^
7. Kapitel: ^Edle Menschen^
8. Kapitel: ^Spürhunde^
9. Kapitel: ^Der Jesuit im Gefängnis^
10. Kapitel: Elsa und Edeltraut vor den Richtern
11. Kapitel: ^Der Richter im Gefängnisse^
12. Kapitel: ^Das Elend und der Wahn wachsen^
13. Kapitel: »Priester im Bunde des Satans«
14. Kapitel: ^Blutiges Morgenrot^
15. Kapitel: ^Der Wahrheit Sieg^
Inhaltsverzeichnis

It should work for German and other languages which use a space to delimit words. It won't work for Japanese or Chinese because they don't use spaces to separate words.
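To illustrate the point about spaces, here is a minimal sketch of whitespace-based counting. It is an assumption about the general approach, not the app's real code, and `count_words` is a made-up name.

```python
def count_words(text):
    # str.split() with no arguments splits on any run of whitespace
    return len(text.split())


german = "Der Richter im Gefängnisse"
japanese = "吾輩は猫である。名前はまだ無い。"

print(count_words(german))    # 4
print(count_words(japanese))  # 1, because the sentence contains no spaces
```

Counting words in Japanese or Chinese would need a word segmenter rather than a split on spaces.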

:thumbs:
kayray
Posts: 11828
Joined: September 26th, 2005, 9:10 am
Location: Union City, California

Post by kayray »

Ah cool! That does work! So maybe you could add a note to tell us to take the chapter titles from the "Plain Text UTF-8" version, not the html version :)

It is EXTREMELY handy that it also gives the first and last words of the chapter. Wow. I'm super-impressed. Simple and functional. Does one job and does it well.
Isana
Posts: 273
Joined: December 2nd, 2013, 12:46 pm
Location: USA

Post by Isana »

Thank you for the kind words, kayray. Yes, I'll be sure to put a note on the text entry form to use the text version. It will be there on the next update. Your bringing it up here is also sure to help others who read this thread. :thumbs:
Kangaroo692
Posts: 1939
Joined: August 21st, 2014, 9:34 am
Location: Probably the holodeck :)

Post by Kangaroo692 »

Thanks a lot, Isana.
Isana
Posts: 273
Joined: December 2nd, 2013, 12:46 pm
Location: USA

Post by Isana »

Kangaroo692 wrote: Thanks a lot, Isana.
You're welcome! :)
Kangaroo692
Posts: 1939
Joined: August 21st, 2014, 9:34 am
Location: Probably the holodeck :)

Post by Kangaroo692 »

OK. I just tried it, and here are my comments:

- I entered the chapter titles, but there is also (before the name) a chapter number. Should I have used roman numerals or descriptive titles? Because if I don't use roman numerals, the last words would be "Chapter __"

- Is there any way to split chapters (or add them together)?

- Is there any way to let the software find the chapter titles itself?

Other than that, I think this is a great program, except for one thing?

I assume the program searches the text for the words specified for the chapter.

For example, the last chapter in the book I'm soloing is called "Vindication". What if the word is used multiple times in the work?

Other than that, thank you a lot, I really enjoyed this program. Thanks a lot for your hard work! :D
TriciaG
LibriVox Admin Team
Posts: 60576
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG »

That's pretty cool! It probably works on 90% or more of the projects out there, which will be very handy. :)

I tried it on a non-standard project: ID 42315
Rise and Fall of the Confederate Government, Volume 2,
by Jefferson Davis

It has Roman numerals, which it found, but 2 things make it non-standard:

(1) It starts at chapter 15 and goes to 57. I originally put in the number of chapters, but it didn't find the end ones (go figure!) so I changed it to the number of the final chapter (57), and it did find them all. (It gave "not found" responses for chapters 1-14, which was expected.)

(2) It has a chapter listing with descriptions before the main text, so it found the word counts for the descriptions rather than for the text itself. This occurs on many old texts - histories, memoirs, etc. Perhaps a check box option "Ignore the first set of chapters in text (chapter listing with descriptions before the main text)"?
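The check box suggested in (2) could be sketched as below. This is purely hypothetical: `pick_title_line` and its `skip_listing` option do not exist in the app, and the sample text is invented rather than taken from ebook 42315.

```python
def pick_title_line(lines, title, skip_listing=False):
    """Return the line index of `title`, or None if it never appears.

    With skip_listing=True the first match is treated as the chapter
    listing and the second match, if there is one, is returned instead.
    """
    hits = [i for i, line in enumerate(lines) if line.strip() == title]
    if not hits:
        return None
    if skip_listing and len(hits) > 1:
        return hits[1]
    return hits[0]


text = """CONTENTS

CHAPTER XV.

Review of the campaign.

CHAPTER XV.

The campaign opened in the spring."""

lines = text.splitlines()
print(pick_title_line(lines, "CHAPTER XV."))                     # 2
print(pick_title_line(lines, "CHAPTER XV.", skip_listing=True))  # 6
```

If a title appears only once, the option falls back to that single match, so books without a descriptive listing are unaffected.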

Overall it looks great! There are just so many variables out there, that it'll probably sneeze at a few projects, no matter how many scenarios you program it to handle. :lol:
Serial novel: The Wandering Jew
Medieval England meets Civil War Americans: Centuries Apart
Humor: My Lady Nicotine
Isana
Posts: 273
Joined: December 2nd, 2013, 12:46 pm
Location: USA

Post by Isana »

Thank you for your comments and kind words. I will try to answer your questions.
Kangaroo692 wrote: OK. I just tried it, and here are my comments:

- I entered the chapter titles, but there is also (before the name) a chapter number. Should I have used roman numerals or descriptive titles? Because if I don't use roman numerals, the last words would be "Chapter __"
I think you can use either method; it's up to you. The chapter titles only add a handful of words to the word count. Since the word counts shown in magic windows are only used to estimate how much material there is to read in each section, I think an estimate good to within a few dozen words either way is good enough.
Kangaroo692 wrote: - Is there any way to split chapters (or add them together)?
Do you mean in the counting process? Not at this time.
Kangaroo692 wrote: - Is there any way to let the software find the chapter titles itself?
I could try, but the results would probably not be very consistent. But who knows, maybe I'll take the challenge in the future. :shock:
Kangaroo692 wrote: Other than that, I think this is a great program, except for one thing?

I assume the program searches the text for the words specified for the chapter.

For example, the last chapter in the book I'm soloing is called "Vindication". What if the word is used multiple times in the work?
If you look at the text file (not HTML) of a PG ebook, each chapter title is usually on a line by itself, with a blank line before and after it. This is the pattern used in many of the books, and is the pattern the app looks for. If the exact pattern repeats in the text, then the first occurrence of the pattern will be considered the chapter title. Of course, that could be the wrong choice. :D
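The pattern described above (a title on a line by itself, with a blank line before and after, first occurrence wins) can be sketched as follows. This is an illustration under those stated assumptions, not the app's source; `find_title_line` and the sample text are made up.

```python
def find_title_line(lines, title):
    """Return the index of the first line that equals `title` and has a
    blank line (or the start/end of the file) on both sides, else None."""
    wanted = title.strip()
    for i, line in enumerate(lines):
        if line.strip() != wanted:
            continue
        blank_before = i == 0 or not lines[i - 1].strip()
        blank_after = i == len(lines) - 1 or not lines[i + 1].strip()
        if blank_before and blank_after:
            return i
    return None


text = """Vindication was on her mind.

Vindication

The chapter begins here.

Vindication

A later repeat of the same pattern."""

lines = text.splitlines()
print(find_title_line(lines, "Vindication"))  # 2
```

The word inside the opening sentence is skipped because it is not alone on its line, and the repeat at index 6 is never reached because the first standalone match is returned.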
Kangaroo692 wrote: Other than that, thank you a lot, I really enjoyed this program. Thanks a lot for your hard work! :D
You're welcome, and thank you again! :D
Kangaroo692
Posts: 1939
Joined: August 21st, 2014, 9:34 am
Location: Probably the holodeck :)

Post by Kangaroo692 »

Thank you for your response. Please let me know if I can do anything to help.

Would a wiki page be proper for this software?
Isana
Posts: 273
Joined: December 2nd, 2013, 12:46 pm
Location: USA

Post by Isana »

Tricia, thank you very much for that feedback! I have to do something now but I will take a close look at ebook 42315 later tonight. I appreciate that you tried it with a non-standard project. Isn't text format great? We can easily read it, it's genius!