Books to study English?

Posts: 2774
Joined: February 24th, 2013, 7:14 am
Location: New Hampshire, USA

Post by tovarisch » January 30th, 2018, 5:45 pm

What do I think? :hmm: If you have a preconceived notion of what the results should show, you can apply various techniques to push your statistics toward what you expect. :P

You are planning to use that method to rate various texts and to suggest the ones with lower scores for learning, yes? The proof that your scoring method matches reality would be that it ranks texts in the same order in which schools in English-speaking countries use them to teach English to native speakers, from elementary, middle, and high school through college. The texts don't have to be in the public domain (PD), you know... Once you have proved that your method works, you can run it over the entire LV collection (or Project Gutenberg's) :wink:
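A minimal sketch of that check, in Python: give each text a known grade level (for example, the grade at which a school reading list uses it), score the same texts with your metric, and compute the Spearman rank correlation between the two. A value near 1 means the metric orders texts the way schools do. The grade levels and scores below are made-up illustration data, not real results, and `ranks`/`spearman` are hypothetical helper names.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the two rank lists."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up data: school grade at which each text is used, and the metric's score.
grades = [1, 3, 5, 8, 11]
scores = [0.12, 0.30, 0.28, 0.55, 0.81]
print(round(spearman(grades, scores), 3))  # → 0.9
```

Spearman compares orderings rather than raw values, which suits this test: the metric's scale doesn't need to match grade numbers, only the ranking does. It is undefined if every grade or every score is identical, since the denominator becomes zero.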
  • reality prompts me to scale down my reading, sorry to say
    to PLers: do correct my pronunciation please

Posts: 7
Joined: January 17th, 2018, 3:05 pm

Post by stillwaiting » January 30th, 2018, 6:17 pm

tovarisch: I think that's a very good idea, thank you. It might even make sense to use machine learning to build a classifier, with school learning materials as the training set. Let's see if I can do this. (...Unless this is overkill :) )
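If you go the machine-learning route, perhaps the simplest thing to try first is a nearest-centroid classifier: compute a few features per text, average them per grade label over the school materials, and give a new text the label of the nearest average. The sketch below assumes just two features (mean sentence length and mean word length) and three tiny made-up training snippets with hypothetical labels "early" and "advanced"; a real training set would be the graded school texts themselves.

```python
import re


def features(text):
    """Two simple features: mean words per sentence and mean letters per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return (len(words) / len(sentences), sum(len(w) for w in words) / len(words))


def train(labelled):
    """Nearest-centroid 'training': average the feature vectors for each label."""
    sums, counts = {}, {}
    for text, label in labelled:
        f = features(text)
        s = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def classify(centroids, text):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    f = features(text)
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(f, centroids[lab])))


# Made-up training snippets standing in for graded school materials.
training = [
    ("The cat sat on the mat. The dog ran.", "early"),
    ("It is a sunny day. We go to the park.", "early"),
    ("Notwithstanding considerable opposition, the committee eventually ratified "
     "the amendment, which substantially altered the constitutional arrangements.",
     "advanced"),
]
centroids = train(training)
print(classify(centroids, "The pig sat in the mud. It was fun."))  # → early
```

The features aren't scaled, so sentence length dominates the distance here; with real data you would probably standardize each feature first, or move to a proper library classifier.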

Peter Why
Posts: 4486
Joined: November 24th, 2005, 3:54 am
Location: Chigwell (North-East London, U.K.)

Post by Peter Why » January 31st, 2018, 1:10 am

I've only read the first two on your list, so I can't say how accurate those grades might be. I would agree that Alice is more complex than the Wizard of Oz. You might try to find a really simple first reader text to act as your baseline ... "The cat sat on the mat" level.

And I agree with you that the Flesch-Kincaid test can't give much indication of the complexity of a text, though sentence length is a good pointer in that direction.
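For reference, the Flesch-Kincaid grade level is 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, so sentence length is one of its only two ingredients, the other being syllables per word. A rough Python sketch follows; the syllable counter is a crude vowel-group heuristic assumed here (it is not part of the formula), so its numbers will differ somewhat from dictionary-based tools.

```python
import re


def count_syllables(word):
    """Crude heuristic: count vowel groups, dropping a silent final 'e' (but not '-le')."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level using the standard coefficients."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


print(round(flesch_kincaid_grade("The cat sat on the mat."), 2))  # → -1.45
```

A "cat sat on the mat" sentence scores below zero, which hints at why the score says little about vocabulary or ideas: it only sees how long the sentences and words are.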

"I think, therefore I am, I think." Solomon Cohen, in Terry Pratchett's Dodger

LibriVox Admin Team
Posts: 31945
Joined: April 3rd, 2008, 3:55 am
Location: Melbourne, Australia

Post by annise » January 31st, 2018, 1:42 am

The Ontario Readers are graded by age level up to the 3rd and 4th levels, and so are the Ambleside book lists (see the Wiki).


Posts: 7
Joined: January 17th, 2018, 3:05 pm

Post by stillwaiting » March 24th, 2018, 3:10 pm

So, I was able to put together a metric for book complexity. Basically, I use the average of two metrics:
1. One taken from

2. The average complexity of the words (compared against the table of top 50K English books), normalized using the min and max values over the books that I already have.
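A rough sketch of how such a combined score could be computed, in Python. The post doesn't say how each word's complexity is derived from the 50K table, so this assumes (hypothetically) that a word scores its frequency rank divided by the table size, with words missing from the table scoring 1. The first metric's source isn't given here either, so its raw per-book values are simply passed in; normalizing it with the same min-max step is also an assumption. All function names are illustrative.

```python
import re


def word_complexity(word, rank_of, table_size):
    """Hypothetical scoring: frequency rank scaled to 0..1; unknown words score 1."""
    rank = rank_of.get(word.lower())
    return 1.0 if rank is None else rank / table_size


def avg_word_complexity(text, rank_of, table_size):
    """Metric 2 (raw): mean complexity over all words in the text."""
    words = re.findall(r"[A-Za-z']+", text)
    return sum(word_complexity(w, rank_of, table_size) for w in words) / len(words)


def min_max_normalize(values):
    """Map the lowest value in the collection to 0 and the highest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # every book identical on this metric
    return [(v - lo) / (hi - lo) for v in values]


def combined_scores(metric1_raw, metric2_raw):
    """Average of the two metrics, each min-max normalized over the same books."""
    m1 = min_max_normalize(metric1_raw)
    m2 = min_max_normalize(metric2_raw)
    return [(a + b) / 2 for a, b in zip(m1, m2)]


# Hypothetical usage: a toy frequency table (table_size=4 stands in for 50,000).
rank_of = {"the": 1, "cat": 2, "sat": 3}
books = ["The cat sat.", "The cat sat on the mat.", "The zebra sat."]
metric2 = [avg_word_complexity(b, rank_of, table_size=4) for b in books]
metric1 = [10.0, 20.0, 30.0]  # stand-in values for the first (unspecified) metric
print(combined_scores(metric1, metric2))
```

Min-max normalization depends on the current set of books: adding a new book that is harder or easier than all the others shifts every normalized score, so the numbers are only comparable within one run over one collection.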

The results for the existing books can be found here: ; I think they look pretty plausible. What do you think?
