What do I think? If you have a preconceived notion of what the results should show, you can apply various techniques to drive your statistics toward what you expect.
You are planning to use that method to give some kind of rating to various texts, and to suggest the lower-scoring ones for learning, yes? The proof that your scoring method is close to reality would be if it ranks texts the same way they are used in English-speaking countries (in elementary, middle and high schools and in colleges) as material for native speakers studying English. The texts don't have to be PD, you know... Once you have proved that your method works, you can run it over the entire LV collection (or Project Gutenberg's).
Books to study English?
tovarisch
- reality prompts me to scale down my reading, sorry to say
to PLers: do correct my pronunciation please
-
- Posts: 7
- Joined: January 17th, 2018, 3:05 pm
tovarisch: I think that's a very good idea. Thank you. It would even make sense to use some machine learning to build a classifier, using school learning materials as a training set. Let's see if I can do this. (... unless this is overkill.)
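Just to sketch what such a classifier could look like (this is not anyone's actual implementation — the feature choice, the nearest-centroid approach, and all the sample texts below are assumptions for illustration), one could start with two crude surface features and graded school texts as the labeled training set:

```python
import re

def features(text):
    # Two crude complexity features: mean sentence length (in words)
    # and mean word length (in characters).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return (len(words) / len(sentences),
            sum(len(w) for w in words) / len(words))

def train(labeled_texts):
    # Nearest-centroid "classifier": average the feature vectors per grade.
    centroids = {}
    for grade, texts in labeled_texts.items():
        feats = [features(t) for t in texts]
        centroids[grade] = tuple(sum(f[i] for f in feats) / len(feats)
                                 for i in (0, 1))
    return centroids

def predict(centroids, text):
    # Assign the grade whose centroid is closest in feature space.
    f = features(text)
    return min(centroids,
               key=lambda g: sum((f[i] - centroids[g][i]) ** 2 for i in (0, 1)))

# Hypothetical training data: texts labeled by the school level
# at which they might be taught.
training = {
    "elementary": ["The cat sat on the mat. The dog ran. It was fun."],
    "college": ["Notwithstanding considerable methodological disagreement, "
                "subsequent investigations corroborated the original hypothesis."],
}
model = train(training)
print(predict(model, "The sun is up. We play all day."))  # likely "elementary"
```

With real graded readers as training data, one would of course want richer features (vocabulary rank, syllable counts) and more than a toy distance rule.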
-
- Posts: 5849
- Joined: November 24th, 2005, 3:54 am
- Location: Chigwell (North-East London, U.K.)
I've only read the first two on your list, so I can't say how accurate those grades might be. I would agree that Alice is more complex than the Wizard of Oz. You might try to find a really simple first reader text to act as your baseline ... "The cat sat on the mat" level.
And I agree with you that the Flesch-Kincaid test can't give much indication of the complexity of a text, though sentence length is a good pointer in that direction.
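For reference, the Flesch-Kincaid grade-level formula itself is easy to compute; here's a rough sketch in Python (the syllable counter is a crude vowel-group heuristic of my own, so treat the output as approximate):

```python
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels (y included).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# "The cat sat on the mat" level text scores below grade zero.
print(flesch_kincaid_grade("The cat sat on the mat."))
```

Note how the formula only sees sentence length and syllable counts — which is exactly why it can miss so much of a text's real complexity.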
Peter
"I think, therefore I am, I think." Solomon Cohen, in Terry Pratchett's Dodger
-
- Posts: 7
- Joined: January 17th, 2018, 3:05 pm
So, I was able to make up a metric for book complexity. Basically, I use the average of two metrics:
1. one taken from http://readable.io
2. the average complexity of the words (checked against a table of the top 50K English words); normalization is based on the min and max values over the books I already have.
The result over the existing books can be found here: https://youcanreadit.com/books-to-read-and-learn-english/ , I think it looks pretty plausible. Wdyt?
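A hedged sketch of how I read that combination step — the frequency table, the readability numbers, and the book texts below are all invented stand-ins (readable.io is an external service, so its score is just a placeholder here):

```python
import re

# Hypothetical stand-in for a top-50K word frequency table (rank = commonness).
freq_rank = {"the": 1, "cat": 900, "sat": 1500, "ignominious": 48000}

def word_complexity(text):
    # Mean frequency rank of the words found in the table;
    # rarer words have higher ranks, so higher = harder.
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    known = [freq_rank[w] for w in words if w in freq_rank]
    return sum(known) / len(known)

def combined_scores(books):
    # books: {title: (readability_score, text)}, readability assumed in 0..1.
    raw = {t: word_complexity(text) for t, (_, text) in books.items()}
    lo, hi = min(raw.values()), max(raw.values())
    # Min-max normalize the word metric over the books we have,
    # then average it with the readability score.
    return {t: (books[t][0] + (raw[t] - lo) / (hi - lo)) / 2 for t in books}

books = {
    "easy reader": (0.1, "the cat sat"),
    "hard novel": (0.9, "ignominious cat"),
}
print(combined_scores(books))
```

One consequence of the min-max normalization: the scores are relative to the current collection, so adding a new easiest or hardest book shifts everyone else's score.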
If only I could read The Scarlet Letter without using an English dictionary!
!!!!!!.!!!!!!.!!!!.!!!!!!!!!..!!!.!!!!!!!!!!!...!!!!!!!!!.!!!!!!.!!!!.!!!!!!.!!!!
No way. He stole away a pretty thing, you know.
That's your heart.