More art

The unaccented “be-” prefix comes through our Old English heritage and stands for “about”, with many prepositions generalizing to “at or near” (before, behind, below) and with be- verbs carrying many different meanings of “about” (“begird” uses the “around” meaning and “bespatter” uses “all-over”). The OED’s entry on the be- prefix is absolutely inspiring and I recommend it to anyone who loves words so much that they are reading a concordance blog. Among other things,

the force of be- passes over to an object, … Hence it is used to form transitive vbs. on adjectives and substantives

Görlach teaches us that in the 1500s one could add or subtract “be-” as an intensifier or causative just about anywhere one wished. The OED says further that be- remains “a living element” and may be added even now when appropriate to meaning. If it is a living element, then do I count the archaism or obsolescence of the stem? Or of the be-be’d word? I was once tempted to lemmatize be- words to the form without the prefix. In doing so, we would lose such beauties as “begone” and “benighted”. Since we are keeping these, I’ll use the OED’s classification for the full word, if there is one. Again, I learn how much of the artist’s touch is required in this work.

Behold, we begin here today.

“be-, prefix.” OED Online. Oxford University Press, March 2015. Web. 7 May 2015.

Görlach, Manfred. Introduction to Early Modern English. Cambridge: Cambridge University Press, 1991. Print.

Mattock

I have learned that there’s a great deal more art than I had supposed in working with words in this way.  I’ve had to make a few judgment calls about which words match a headword in The Ten Thousand, a process which I mistakenly thought would be completely straightforward.  Now that I am working with the uncommon words, I’m working on which of them are humble enough for our purposes.  Some uncommon words, like “dragon”, might be taken as common within the genre.  They are words in a cauldron from which a fantasy author almost must dip if he is to establish his world in its proper place in our imaginations.  We save for another day the question of whether Tolkien used a word from the fantasy stew or whether that word excites fantasy associations because Tolkien used it.

Update: at first I foolishly thought I could identify fantasy-genre-specific words and hold them apart from our discussion.  This attempt led to some excellent dinner table discussions with Grace and the kids – and to the abandonment of a silly idea.  Any school vice-principal worth her salt is at least a little bit of a dragon.

I actually own a mattock. Mattock is to stump as pickaxe is to bedrock. While I recognize its use as a weapon in The Hobbit, it’s a regular maintenance tool in my shed.

  • 17.031  in battle they wielded heavy two-handed mattocks;
  • 17.051  wielding their mattocks,

Seven Thousand and Change

During my first pass through the words of The Hobbit which are not in The Ten Thousand, I lemmatized about fifteen hundred as being inflected forms of words in The Ten Thousand. We are left with seven thousand words to examine. Tolkien invented many of these words, like “Thorin” and “Mirkwood”. Every author names his characters and locations, although the names may already be familiar to the readers (“Spencer” and “Boston”), so these words don’t directly get at our question. We will store them up safely in a separate sheet of my Great Spreadsheet of Doom and move on to our study of the non-naming yet non-common words.

Immeasurable

Now, “measure” is in the Ten Thousand most common words. Is “immeasurable” part of that lemma? I think it is, word fans, although it has both a prefix and a suffix trying to disguise it. You see how craftily the words are hiding? Also, we have made it to the letter I on our first pass. I have included “immeasurable” as an uncommon word until we have thought over just how finely to separate out our words. Wouldn’t want it to get lost!

  • 12.014  with wings folded like an immeasurable bat.

Backward

I am rolling through my task of eliminating those words from the concordance which are inflected forms of The Ten Thousand.  I am trying to be ruthless, although my heart hurts as we lose some beauties like “clad”, an elder past participle form of “clothe”.  “Clothes” – the noun – is one of the Ten Thousand.  Does that eliminate the verb “clothe”?  Frankly I might put back “clad” when all is said and done by means of an argument about the archaicness and beauty of its form.

But really, can I afford to keep all the lovely words? Does that not bias my method? Does that not leave me with a boatload more words to work with than might be wise for a project of such limited time and resource? Alas. For now I will at least try to be ruthless. Fortunately, I can write a little swan-song here for them. For a regular present tense verb, if I see the third person singular, such as “knits”, I take notice, check The Ten Thousand, find “knit” there, and eliminate “knits” from our consideration. I’m alert now to the -s ending. But what about the lack of it?

Tonight’s observation, Hobbit fans, is that “backwards” is among The Ten Thousand, but “backward” is not.  I learn that “backward” as an adjective (I shot him a backward glance) is the usual (but not exclusive) spelling, and that “backward” as an adverb (… and then I fell backwards) is sometimes spelled with the s (but not exclusively).  The -s is more common in British than American writing.  Well, bless.

“Backward”. OED Online. Oxford University Press, March 2015. Web.

Lemmatizing woes: bid v bid

My goal right now is to lemmatize my list of eighty-five hundred uncommon words from The Hobbit. In other words, if “knit” is in The Ten Thousand most common words, then I should remove “knits, knitted, knitting” from my list of words under examination. These inflected forms are still “knit” in fancy clothes.
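For the curious, here is a minimal sketch of that kind of elimination. It is not the method used on this project, where the matching was done by hand and by eye; the tiny `COMMON` set, the suffix rules, and the function names are all assumptions made up for illustration.

```python
# Hypothetical stand-in for The Ten Thousand; the real list is far longer.
COMMON = {"knit", "measure", "clothes", "bid"}

def candidate_stems(word):
    """Yield possible base forms of `word` by undoing regular English endings."""
    yield word
    for suffix in ("s", "es", "ed", "ing"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stem = word[: -len(suffix)]
            yield stem                  # knits -> knit
            yield stem + "e"            # measured -> measur -> measure
            if len(stem) > 2 and stem[-1] == stem[-2]:
                yield stem[:-1]         # knitted -> knitt -> knit

def is_inflection_of_common(word, common=COMMON):
    """True if the word, or a regular base form of it, is on the common list."""
    return any(stem in common for stem in candidate_stems(word.lower()))

uncommon = ["knits", "knitted", "knitting", "mattock", "bebother"]
kept = [w for w in uncommon if not is_inflection_of_common(w)]
print(kept)
```

A rule like this sees only spelling, never meaning, and it knows nothing of irregular forms, so a human still has to look over its shoulder.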

In the course of doing this, we will lose some of the gems.  The Ten Thousand list doesn’t distinguish between “bid, bid, bidden” (offer, as bid at an auction) and “bid, bade, bidden” (entreat, as [06.092] “The Lord of the Eagles bids you”).  I must settle for eliminating those words whose stems match a stem in The Ten Thousand most frequent.

Tonight we must say farewell to arms (weapons), bid adieu to bid (entreat), and blow fair winds to blow (strong hit) in service of certainty in the specialness of the words we end up with.

See you at the other end of the alphabet!

How did we get to this list?

If I were to publish a standard concordance (software freely available) of all the words, each entry would have a few words before and a few after my Entry word.  If your computer is clever enough, it could put together the text of The Hobbit from all those overlapping words as you can see:

  • in a HOLE in the ground
  • a hole IN the ground there
  • hole in THE ground there lived.
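To make the worry concrete, here is a rough sketch of how such overlapping entries are produced and how easily the text falls back out of them. It assumes two words of context before the entry word and three after, which matches the three lines above; the function names are invented for this example and do not come from any particular concordance program.

```python
def kwic(words, before=2, after=3):
    """Return one keyword-in-context line per word, with the keyword in capitals."""
    lines = []
    for i, word in enumerate(words):
        left = words[max(0, i - before): i]
        right = words[i + 1: i + 1 + after]
        lines.append(" ".join(left + [word.upper()] + right))
    return lines

def rebuild(lines):
    """Recover the text by reading the capitalized keyword from each line in order."""
    return " ".join(
        next(w for w in line.split() if w.isupper()).lower() for line in lines
    )

words = "in a hole in the ground there lived".split()
lines = kwic(words)
print(lines[2])
print(rebuild(lines))
```

This sketch keeps the entries in text order, which makes the rebuilding trivial. An alphabetized concordance would take more work to stitch together from the overlaps, but the information would generally still be there.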

This is my understanding of scholarly fair use: I may chop up the words and write about them, but not in a way that your computer could put the text back together. My idea was to chop up the text approximately into phrases with no overlap between them. You may know that “in a hole” and “in the ground” are both in paragraph [01.001], but you don’t know in what order. I marked up my hand-typed copy with [paragraph number] xx at the start of each paragraph and xx where I wanted to chop apart phrases. Chopping apart phrases was a story in itself; I’m sure a post will come later.
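As an illustration only, the chopping step might look something like the sketch below. The exact markup is an assumption (a bracketed paragraph number, then “xx” before each phrase, one paragraph per string), and `chop` is a hypothetical helper, not part of the project’s script.

```python
import re

# Assumed markup: "[chapter.paragraph]" followed by "xx" before every phrase.
MARKED = "[01.001] xx in a hole xx in the ground xx there lived"

def chop(marked_paragraph):
    """Split one marked-up paragraph into (paragraph_id, list_of_phrases)."""
    match = re.match(r"\[(\d+\.\d+)\]\s*(.*)", marked_paragraph, re.S)
    if match is None:
        raise ValueError("paragraph does not start with a [nn.nnn] tag")
    para_id, body = match.groups()
    # Split on the standalone marker "xx" and drop the empty leading piece.
    phrases = [p.strip() for p in re.split(r"\bxx\b", body) if p.strip()]
    return para_id, phrases

para_id, phrases = chop(MARKED)
print(para_id, phrases)
```

Because no phrase shares a word with its neighbor, an index built from these pieces can tell you which paragraph a phrase lives in without recording where in the paragraph it sits.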

Given that text preparation, my son wrote a Python script to make the concordance and index.  For your own copy of the script, which he publishes under a Lesser General Public License, click here.  You’ll find a Read Me, instructions, the concordance script, and others which he created for this project.

So Many Editions! or all the pretty paragraphs

Many, many editions of The Hobbit abound – hooray that this story is dear to millions of readers! With so many editions, using a page number for a quote or idea reference can be problematic. In my Hobbit-word study, I’ve made an index of the paragraphs of the work and given each paragraph a unique number. When you see a quotation here or in the concordance, which is my goal, you can just zip to the index to help you find it in your own edition to get context.

1951 Hobbit Paragraph Index

In the future, I’ll be exploring the 1937 edition; here’s the paragraph index of the 1942 edition’s Chapter V. This one is identical as far as I know to the 1937, and John Rateliff kindly helped me to locate this from the Children’s Book Club.

You can also find these paragraph index links on the About page.

Bread and Cheese: overlooking the most common words

While it is possible to write a story without “the be of and a in to have it I”, these top ten most frequently used words and their close neighbors form the “bread and cheese” of the corpus of written work in Modern English.    To examine Tolkien’s special way with words, I wanted to skip past the ten thousand most common words, the words which just anyone might use.  I have in time come to call them The Ten Thousand in my idiolect.

The Hobbit has about 96,000 words.  After eliminating The Ten Thousand common words, which account for all but 2 to 5% of the British National Corpus (depending on whom you ask and how they measure), there remain 7,172.  Less-common words comprise 7.5 per cent of The Hobbit.  Ahem.  Now, over a hundred of those words are “hobbit”.  But only one is “bebother”.
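For anyone who wants to check the percentage, the arithmetic is a one-liner. The variable names are mine, and since the total is only “about 96,000”, the result is approximate.

```python
total_words = 96_000     # "about 96,000", so treat the result as approximate
uncommon_words = 7_172   # words left after removing The Ten Thousand

share = uncommon_words / total_words * 100
print(f"{share:.1f}%")
```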

Strap on your goggles, it’s going to be quite an adventure.

Please see the Works Cited page for full information on our sources.

Leech, Rayson, and Wilson. Word Frequencies in Written and Spoken English.