My goodness

Believe it or not, sometimes I get a bit focused on doing a thing thoroughly.  I looked at the list after removing The Twenty Thousand – several hundred words.  The Thirty Thousand?  Several hundred words.  I realized I was going to take this right to the end.  The Project Gutenberg word frequency lists are not lemmatized, so I am sanguine about using every word, right up to number 100,000, in my stopwords file.

That word is “Apennine”, by the way.  The last improper English word is “withes”, the plural of “withe”:

a. A band, tie, or shackle consisting of a tough flexible twig or branch, or of several twisted together; such a twig or branch, as of willow or osier, used for binding or tying, and sometimes for plaiting.

We get “Withywindle” from this word and it is etymologically related to “willow”.  Yes, indeed.  Word 99,996 is used in Tolkien’s corpus.

When The Hundred Thousand are eliminated from the text of The Hobbit, ninety words remain. The Project Gutenberg words were not lemmatized, so I first went back and checked for the original forms of the words in the text.  In a handful of cases my headword (like “rune”) is not in The Hundred Thousand, but the form in the text (“runes”) is.  Those words are not among these ninety.

adjoin attercop baa bannock bash bebother beeswax befoul belch benight bewuther boatload bottommost burgle buttertub carrock coalmining cockscomb confusticate crunchable daylong draggle drat firework fizzle flummox fluster foreleg frizzle gammer glede goggle greybeard guardroom haymaking hmmm hobbit homecoming hotfoot jibber kindhearted laburnum lazybone lunchtime manflesh moneybags ninepins nosebag oddment ogres orc parch pinewoods pitter plop plump plunk poach poof porthole quoits rockhewn rockrose roundshield ruddy rune scone scrabble scrumptious shoreland skrike slither slowcoach smithereens snapdragon snivel snuffle spearman summertime thrum tomnoddy undercut underparts upkeep uptake waterlog whizz wobble yammer zig-zag
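For Word Fans who like to see the gears turn, here is a minimal sketch of the kind of check described above: a headword stays on the list only if neither the headword nor any form actually found in the text appears in the stopwords. The function name, the tiny stopword set, and the sample entries are all made up for illustration; this is not the actual filter used for this blog.

```python
def beyond_stopwords(entries, stopwords):
    """Keep headwords whose headword and in-text forms are all absent from the stopwords.

    `entries` is a list of (headword, forms_found_in_text) pairs (a hypothetical layout).
    """
    remaining = []
    for headword, forms_in_text in entries:
        # Check the headword and every surface form, since the list is not lemmatized.
        candidates = {headword.lower()} | {form.lower() for form in forms_in_text}
        if candidates.isdisjoint(stopwords):
            remaining.append(headword)
    return remaining


# A tiny, made-up stand-in for The Hundred Thousand.
stopwords = {"the", "dragon", "runes"}

# Made-up sample entries: headword plus the forms seen in the text.
entries = [
    ("rune", ["runes"]),
    ("wobble", ["wobbled"]),
    ("dragon", ["dragon", "dragons"]),
]

print(beyond_stopwords(entries, stopwords))  # ['wobble']
```

In this toy run "rune" drops out because the form "runes" is a stopword, even though the headword itself is not.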

Words beyond The Hundred Thousand have earned the “100K” tag.

“withe | with, n.” OED Online. Oxford University Press, March 2015. Web. 30 May 2015.

What do we do from here?

First, we have already made entries on seventy-five words which are not in the 975 Uncommon Words.  We’ll enjoy their entries, and hope that some time in the future I’ll be able to do all the nouns, verbs, and adjectives as Blackwelder did.  Most of these fall between the 5,000th and 10,000th places in the frequency lists, although a few like “eyebrows” are very common words which I could not resist making an entry for.

Second, we will consider the riches before us: the Project Gutenberg frequency lists take us all the way through 100,000 words.  I originally chose to eliminate ten thousand words in order to have a smaller group I could work with in my limited time frame, words which were uncommon enough to be of interest, words which Richard Blackwelder would have loved to notice.

Let’s see what happens when I increase my filter.  Will we reveal the gem-words, the scop’s treasure-hoard, if we remove even more words from the list?  Pack a sandwich, the road beckons!

New Numbers

Well, this is an adventure, and I’m tickled pink to be on it!

Now that I have more accurately eliminated The Ten Thousand most common words in Project Gutenberg, I’m ready to report to my fellow Word Fans.  I love the fact that our source material is the corpus of older written English – out-of-copyright works which include much of the literature which Tolkien himself knew, like William Morris and Lewis Carroll.  The previous list included current written work, and from looking at the word lists I suspected that newspapers and magazines had a proportion of influence which was not quite what we were looking for.

Of the 96,152 words of The Hobbit, 7,212 are uncommon!  That’s about 7.5% of the book, including names of things (like “Bilbo”) as well as plain words (like “wobble”).  In English, most writers use uncommon words less than 5% of the time (reports vary from 2.5 to just under 5%).
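For anyone who wants to check the arithmetic, the percentage can be worked out in a couple of lines. The variable names are mine; the two counts are the ones reported above.

```python
total_words = 96_152    # words in The Hobbit, as counted above
uncommon_words = 7_212  # occurrences of uncommon words, as counted above

share = uncommon_words / total_words * 100
print(f"{share:.1f}%")  # 7.5%
```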

These 7,172 are made of about a thousand individual words plus about 75 names.  I have returned to the beginning of this blog and updated the numbers so that new Word Fans are not led astray by contradictory reports.

Double-checking, Project Gutenberg, and The Ten Thousand

As I approach the end of the month, I am double-checking details, spelling, formatting.  I checked my stopwords file – the list we have called The Ten Thousand.  In their original format, they’re listed as words which are used at least 20 times per million.  I set up my filter to take the first ten thousand of those words in order of use, and thus was born my stopwords file.  This morning, I checked that not only were there no more than ten thousand words, but that there were indeed at least that many.

Oh, dear.  There were just over five thousand.
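A size check along these lines would have caught the shortfall on day one. This is only a sketch under assumptions: it supposes the frequency list is already sorted from most to least used, and the function names and five-word sample are invented for the example.

```python
def build_stopwords(ranked_words, limit):
    """Take the first `limit` distinct words from a list already sorted by frequency."""
    seen = set()
    stopwords = []
    for word in ranked_words:
        if len(stopwords) >= limit:
            break
        cleaned = word.strip().lower()
        # Skip blanks and repeats so the count reflects distinct words.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            stopwords.append(cleaned)
    return stopwords


def check_size(stopwords, expected):
    """Return True only if the list holds exactly the expected number of words."""
    return len(stopwords) == expected


# Made-up sample: the source runs out before the limit is reached.
ranked = ["the", "of", "and", "The", "to"]
too_short = build_stopwords(ranked, 10)

print(len(too_short))             # 4
print(check_size(too_short, 10))  # False
```

Slicing the first ten thousand lines never complains when the source holds fewer, so the explicit count is what reveals the gap.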

In searching for my next resource, I discovered that Project Gutenberg’s corpus is being used to create just such a list!  It’s a work in progress, updating as more work enters the project, and going out to 100,000 words.

Thank you, Project Gutenberg!

I’ve easily created my new stopwords file.  Let’s see how things turn out, shall we?

Purr

Genius.  A giant tom-cat.  A dragon.  The most dangerous creature in the world.  With one word, we see the absolute confidence of the dragon, the completely athletic competence and grace.  With the same word, the father takes a tiny bit of the sting of fear out of the tale.  Yet we hear the rumble of the furnace.

  • 12.011 mixed with a rumble as of a gigantic tom-cat purring.

Onomatopoeia

I hope you have enjoyed our survey of sound play in the uncommon words!  I am charmed to learn that eighty-four of the words – more than 8% of our uncommon words! – were sound-play words such as “Hum, whistle, sh”.  Many of those words are repeated, of course: together they account for 316 occurrences, about one-third of 1% of the total words of the book!

The formation of a word from a sound associated with the thing or action being named; the formation of words imitative of sounds.

The use of echoic or suggestive language

I began with the idea that sound-play words would be light and funny, and that I would be able to tag and track them to identify light-hearted passages.  Then a leaf rustled and the dragon hummed.  This poetic technique quite simply adds sensation to each scene, intensifying the mood.  Sometimes Tolkien even uses the onomatopoeic words to create tone – brightening the scariest parts of his children’s bedtime tale.

Alert Word Fans will see that I captured a few more sound-play words after this post – they are included in this post’s total.

“onomatopoeia, n.” OED Online. Oxford University Press, March 2015. Web. 29 May 2015.