Project Gutenberg Words

As you may recall, I first filtered the text using Leech, Rayson, and Wilson’s word frequencies, which had fewer than ten thousand words.  Later, I discovered the Project Gutenberg word frequency lists (created by the Wiktionary folks, not the Project Gutenberg folks) and filtered again, this time sure of a full ten-thousand-word filter.  A couple of hundred words were eliminated from our list (words that fell in the 5,000 to 10,000 range), and another seventy-five that would have been eliminated had already been added to the concordance, so we let those stand pat.  The Project Gutenberg lists also added the following words to our study:

cargo casket drought fever gigantic hoof itch lair loathsome necklace nibble ninepins whisk whisker wrath

These are words that the modern filter classified as common, but that Project Gutenberg – the corpus of out-of-copyright work – called uncommon.  These words have risen in use over the last seventy-five years.
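
For the tinkerers among you, the comparison between the two filters boils down to a pair of set differences.  Here is a minimal Python sketch of the idea, not the actual process I used: the function name and the tiny word lists are made up for illustration, and the real lists run to thousands of entries.

```python
def uncommon_words(text_words, common_words):
    """Return the words from the text that the frequency list does not call common."""
    common = {w.lower() for w in common_words}
    return {w.lower() for w in text_words} - common


# Hypothetical toy data standing in for the text and the two frequency lists.
text_words = ["cargo", "kettle", "wrath", "the", "quagmire", "thee"]
modern_common = ["the", "kettle", "cargo", "wrath"]   # stand-in for the modern list
gutenberg_common = ["the", "kettle", "thee"]          # stand-in for the Gutenberg list

modern_kept = uncommon_words(text_words, modern_common)
gutenberg_kept = uncommon_words(text_words, gutenberg_common)

# Words the Gutenberg filter lets into the study that the modern filter screened out.
added_by_gutenberg = sorted(gutenberg_kept - modern_kept)
# Words the modern filter let in that the Gutenberg filter screens out.
dropped_by_gutenberg = sorted(modern_kept - gutenberg_kept)

print(added_by_gutenberg)
print(dropped_by_gutenberg)
```

In this toy version, "cargo" and "wrath" play the part of the added words, and "thee" plays the part of a word the second filter eliminates.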

Word Fans, is “whisker” an inflection of “whisk” or a word in its own right?  I’ll put the kettle on – let me know what you think.

Leech, Geoffrey, Paul Rayson, and Andrew Wilson. Word Frequencies in Written and Spoken English: Based on the British National Corpus. Harlow: Longman, 2001. Print.
