Word count a whole webpage (no copy/paste!)

Post your questions & get help from friendly LibriVoxers
quartertone
Posts: 257
Joined: December 27th, 2022, 2:27 pm
Location: Narnia

Post by quartertone »

(tl;dr - Give this web app the Gutenberg plain text link, and subtract ~3000 from the result to get the approximate word count of the book.)

https://wordcounter.net/website-word-count

A useful tool I've found is this webpage word counter. Before I found it, I had been copying text from the source, pasting it into a text editor and using the built-in word count feature. This was cumbersome: it took multiple clicks and mouse movements, and I frequently had to redo the text selection when it got accidentally deselected by a careless action.

This counter is a simpler, one-step process, and it is equally effortless on a mobile device.

For example, when using the Gutenberg "plain text" link, I've estimated that the top disclaimer is usually around 80-90 words, depending on the title, author, and other credits. The bottom disclaimer is around 2950 words, plus or minus, also depending on title/author.

So if you just subtract about 3000 from the result of the webpage word counter you'll get the approximate word count of just the book text.
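The arithmetic is simple enough to write down. A minimal sketch, assuming the header and footer sizes estimated above (about 85 and 2950 words; both vary by title), with a hypothetical function name and a made-up page total:

```javascript
// Rough estimate of a Gutenberg book's own word count from the
// total reported by a whole-page word counter.
// ASSUMPTION: header ~85 words, footer ~2950 words (the estimates in
// this post; the real sizes vary with title, author and credits).
function estimateBookWords(pageTotal, headerWords = 85, footerWords = 2950) {
  const estimate = pageTotal - headerWords - footerWords;
  // Never report a negative count for very short pages.
  return Math.max(0, estimate);
}

// A made-up page total of 95,000 words leaves roughly 91,965 words of book text.
console.log(estimateBookWords(95000)); // prints 91965
```

Since the footer dominates, rounding the whole correction to 3000 is off by only a few dozen words either way.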

Just thought I'd share this in case there are other readers out there who are just as lazy as I am. :D
Rapunzelina
LibriVox Admin Team
Posts: 17758
Joined: November 15th, 2011, 3:47 am

Post by Rapunzelina »

I'm using a Mozilla Firefox add-on for word counts, but you still have to highlight the text you want to word count and then it's an option in the right-click menu. I haven't tested how accurate it is though :mrgreen:

Thanks for the suggestion on the website word counter!
TriciaG
LibriVox Admin Team
Posts: 60737
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG »

I've got a bookmarklet - a little piece of javascript code on my bookmark toolbar. It looks like a regular bookmark but isn't. :)

I highlight the text I want to count and click the bookmark, and there's the word count!

Put this in the URL field of the bookmark:

Code:

javascript:d=window.getSelection()+'';%20d=(d.length==0)?document.title:d;%20alert(d.split('%20').length+'%20words,%20'+d.length+'%20characters');
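For anyone curious what the one-liner does: `%20` is just a URL-encoded space, so once decoded the bookmarklet takes the current selection (falling back to the page title when nothing is selected), splits it on single spaces, and reports the number of pieces and the number of characters. Here is the same logic as a plain function so it can be tried outside the browser; the function name is hypothetical, and the selection and title are passed in as parameters instead of being read from the page:

```javascript
// Same logic as the bookmarklet, minus the browser APIs:
// `selection` stands in for window.getSelection()+'' and
// `title` for document.title.
function bookmarkletCount(selection, title) {
  // Fall back to the page title when nothing is selected.
  const d = selection.length === 0 ? title : selection;
  // Words are counted by splitting on single spaces only.
  const words = d.split(' ').length;
  return words + ' words, ' + d.length + ' characters';
}

console.log(bookmarkletCount('Call me Ishmael.', 'Moby Dick'));
// prints "3 words, 16 characters"
```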
School fiction: David Blaize
America Exploration: The First Four Voyages of Amerigo Vespucci
Serial novel: The Wandering Jew
Medieval England meets Civil War Americans: Centuries Apart
ej400
Posts: 5261
Joined: September 24th, 2014, 10:26 am
Location: Minnesota

Post by ej400 »

TriciaG wrote: January 29th, 2023, 5:52 am
I've got a bookmarklet - a little piece of javascript code on my bookmark toolbar. It looks like a regular bookmark but isn't. :)

I highlight the text I want to count and click the bookmark, and there's the word count!

Put this in the URL field of the bookmark:

Code:

javascript:d=window.getSelection()+'';%20d=(d.length==0)?document.title:d;%20alert(d.split('%20').length+'%20words,%20'+d.length+'%20characters');
This has to be the easiest way! That is incredibly useful, Tricia. Thank you very much for sharing that! :D
knotyouraveragejo
LibriVox Admin Team
Posts: 22118
Joined: November 18th, 2006, 4:37 pm

Post by knotyouraveragejo »

I use an extension for Chrome called Word Count. I have it pinned to the toolbar so it's always just a click away. It works just like Tricia's javascript, but you download it from the Chrome Web Store and add it to Chrome as an extension. See

https://chrome.google.com/webstore/detail/word-count/pnngehidikgomgfjbpffonkeimgbpjlh

There are comparable plugins for other browsers.
Jo
quartertone
Posts: 257
Joined: December 27th, 2022, 2:27 pm
Location: Narnia

Post by quartertone »

ej400 wrote: January 30th, 2023, 1:53 am
This has to be the easiest way! That is incredibly useful, Tricia. Thank you very much for sharing that! :D


Hi LibriVoxers,

I work hard to be lazy. :lol:
I took the bookmarklet that TriciaG shared, and modified it:

* This will accurately word-count Gutenberg texts with one click, without needing to select the text.
* It only works on the Plain Text version of the text.
* This will parse only the text of the book: everything between the *** START OF _____ *** and *** END OF ____ *** markers. (This does include the "transcriber's note" that is sometimes present, but that is easily subtracted from the total count.)

Note: This bookmarklet counts words more accurately because it also treats "newline" characters as word separators, which the original word count bookmarklet does not. That one splits only on spaces, so when a selection spans multiple paragraphs (in HTML view) or lines (in plaintext), the last word of each line is glued to the first word of the next and counted as one word, making the total slightly or greatly too low. (Runs of several spaces have the opposite effect, adding empty "words".)
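To see the difference, compare the two splitting rules on a short two-line snippet. A minimal sketch; the sample text is made up:

```javascript
// Two lines of plain text, as they might come back from a selection.
const sample = 'It was a dark\nand stormy night.';

// Original rule: split on single spaces only, so "dark\nand" stays
// glued together and is counted as one word.
const spaceOnly = sample.split(' ').length;

// Revised rule: split on runs of spaces, newlines and "*".
const withNewlines = sample.split(/[ \n\*]+/).length;

console.log(spaceOnly, withNewlines); // prints "6 7"

// Runs of spaces push the original count the other way:
// "two  spaces".split(' ') gives ["two", "", "spaces"].
console.log('two  spaces'.split(' ').length); // prints 3
```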

To use:
* save this code as a bookmarklet
* Navigate to a Gutenberg title, and click to the Plain Text version
* Click the bookmarklet (no text selection needed)

Code:

javascript:if(/gutenberg\.org.*\.txt$/.test(location.href)){m=document.body.textContent.match(/(?<=\*{3} START.*? \*{3}).*(?=\*{3} END.*?\*{3})/s);if(m){d=m[0].trim();alert(d.split(/[ \n\*\/]+/).filter(Boolean).length+' words, '+d.length+' characters');}else{alert("Could not find the START/END markers");}}else{alert("Use the Gutenberg plaintext version");}
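Unpacked into a readable function, the Gutenberg bookmarklet comes down to three steps: check the URL, cut out the text between the START and END markers, then split and count. This is only a sketch for experimenting outside the browser: the page text is passed in as a string instead of being read from the page, the function name is hypothetical, the URL is illustrative, and the tiny "book" is invented rather than a real Gutenberg file. It also drops empty pieces after splitting, so a stray `*` at either end is not counted as a word.

```javascript
// Readable sketch of the Gutenberg bookmarklet's steps.
// `url` stands in for document.location and `text` for the page's text.
function countGutenbergText(url, text) {
  // 1. Only accept gutenberg.org addresses ending in ".txt".
  if (!/gutenberg\.org.*\.txt$/.test(url)) {
    return 'Use the Gutenberg plaintext version';
  }
  // 2. Keep only what lies between the "*** START ... ***" line and the
  //    "*** END ... ***" line. The "s" flag lets "." match newlines.
  const m = text.match(/(?<=\*{3} START.*? \*{3}).*(?=\*{3} END.*?\*{3})/s);
  if (m === null) {
    return 'No START/END markers found';
  }
  const body = m[0].trim();
  // 3. Split on runs of spaces, newlines, "*" and "/" (the bookmarklet's
  //    separators), dropping any empty pieces left at either end.
  const words = body.split(/[ \n\*\/]+/).filter((w) => w.length > 0).length;
  return words + ' words, ' + body.length + ' characters';
}

// Invented miniature "book"; real files are far longer.
const sample = [
  'The Project Gutenberg eBook of A Sample',
  '*** START OF THE PROJECT GUTENBERG EBOOK A SAMPLE ***',
  '',
  'Chapter 1',
  '',
  'It was a dark and stormy night.',
  '',
  '*** END OF THE PROJECT GUTENBERG EBOOK A SAMPLE ***',
  'License text that is not counted.',
].join('\n');

// Illustrative URL only.
const url = 'https://www.gutenberg.org/cache/epub/99999/pg99999.txt';
console.log(countGutenbergText(url, sample)); // prints "9 words, 42 characters"
```

The lookbehind containing `.*?` relies on variable-length lookbehind support, which current Chrome, Firefox and Node have; a much older browser may reject the pattern.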

For completeness, here is the original bookmarklet that was posted by TriciaG, modified to be as accurate as above (splits on space, newline and "*").

Code:

javascript:d=(window.getSelection()+'').trim();d=(d.length==0)?document.title:d;alert(d.split(/[ \n\*]+/).filter(Boolean).length+' words, '+d.length+' characters');
Happy counting!

Edit: Updated the code for both bookmarklets so that they ignore non-word characters at the beginning and end of the selection.
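Why the ends of the selection matter: when the text starts or ends with a separator, `split` produces an empty string at that end, and each empty string gets counted as a "word". `trim()` removes leading and trailing whitespace but not a leading or trailing `*`; dropping empty pieces after splitting covers both cases. A small sketch, with made-up sample text and hypothetical helper names:

```javascript
const splitter = /[ \n\*]+/;

// Count by splitting and taking the length of the result.
function countRaw(text) {
  return text.split(splitter).length;
}

// Same, but discard empty strings produced by separators at either end.
function countFiltered(text) {
  return text.split(splitter).filter((piece) => piece.length > 0).length;
}

const selected = '\n*Chapter One*\n';
console.log(countRaw(selected));        // 4: ["", "Chapter", "One", ""]
console.log(countRaw(selected.trim())); // 4: "*Chapter One*" still has stars at both ends
console.log(countFiltered(selected));   // 2
```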