A Little Time-Travel Through Chapter Five

I plan to spend the rest of today exploring differences in word use between Chapter 5 of The Hobbit as we all know and love it and Chapter 5 as it was published in 1937.  At that time, Tolkien didn’t conceive of Gollum’s ring as anything other than a small magical ring with invisibility powers.  Once he began work on The Lord of the Rings, however, he knew the ring to be far more.  He rewrote Chapter 5 and managed to get the whole book re-published.

‘Very well,’ said Bilbo. ‘I will do as you bid. But I will now tell the true story, and if some here have heard me tell it otherwise’ – he looked sidelong at Glóin – ‘I ask them to forget it and forgive me. I only wished to claim the treasure as my very own in those days, and to be rid of the name of thief that was put on me. But perhaps I understand things a little better now. Anyway, this is what happened.’

What we begin with:

              1937    1951
Paragraphs     111     145
Words        5,258   7,021

I’ll report back soon, Word Fans, with the list of which paragraphs are identical as well as the number of uncommon words in each chapter.

Tolkien, J.R.R. (2012-02-15). The Lord of the Rings: One Volume (p. 249). Houghton Mifflin Harcourt. Kindle Edition.

Leveling up the presentation

Great news, Word Fans, Tech Support has just downloaded an image-manipulating software for me and given me a basic lesson.  I will spend the afternoon learning how to make labeled graphs for you!

Update: the series of Uncommon Word graphs now has chapter markers, which I hope will help us all with interpreting the graphs. I added the lines for the chapter breaks by hand on a layer in a GIMP file. I used the Lexos graph’s tell-me-this-word-number function to see where each line should go. In other words, this was an inexact process, but the results are close enough for a graph that shrinks 96,154 words into about 8 inches.

Further Update: the small labels in the Uncommon Word graphs have been successfully added at the suggestion of my Editor.

The Drawing Board

Very good.

We have a new text with everything MarkWords-ed, capitalized or no. We have double-checked our assumptions, gone back to the OED, and decided that “hobbit” is not uncommon. It’s a Middle-earth word, created by Tolkien, just as any author is free to name his characters and concepts. “Hobbit” is not counted as an uncommon word in our analyses. Now we’re ready to address “how did Tolkien do that?” We’re looking at how uncommon words contribute to the register of the work.

Let’s make a graph!

My Tech Support

Tech Support solved our situation in the time it took me to make a second cup of coffee.  He made the MarkWords script convert everything to lower case letters, which will certainly take care of business for now.  We’re going to explore preserving capitalization in the original concordance-making script, but first we must be certain of how spreadsheets will handle capitals.
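For Word Fans who like to see the idea spelled out, here is a minimal sketch of case-insensitive marking. It is not Tech Support's actual MarkWords script, which I haven't reproduced here; the function name `mark_uncommon` and the simple tokenizing pattern are my own assumptions for illustration. Unlike the fix described above, it lowercases only for the comparison and leaves the original spelling of each word alone.

```python
import re

# Hypothetical helper, not the real MarkWords script.
def mark_uncommon(text, uncommon_words):
    """Return (word, is_uncommon) pairs, matching without regard to letter case."""
    # Lowercase the word list once so "Blackberrying" and "blackberrying" match.
    lookup = {word.lower() for word in uncommon_words}
    # Assumed tokenizer: runs of letters, allowing inner apostrophes or hyphens.
    tokens = re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text)
    # Compare in lower case, but keep the token exactly as it appeared.
    return [(token, token.lower() in lookup) for token in tokens]


marked = mark_uncommon("Blackberry and blackberrying", ["Blackberrying"])
print(marked)
```

Because nothing is lemmatized, “Blackberry” stays unmarked here unless it is entered in the list in its own right, while “blackberrying” is caught whatever its capitalization.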

Not only am I grateful for Tech Support’s skills, but is it boastful to say that I am grateful for my child’s can-help attitude?

Constant Vigilance!

You see the problem, don’t you, Word Fans?  I must admit I had to have a good night’s sleep before my subconscious knocked me over the head with it.  How can a paragraph about hobbits have nearly no uncommon words, when “hobbit” is an uncommon word?  In fact about a quarter of the instances of the word “hobbit” are in Chapter 1.

Entirely my fault. In my eagerness to respond to SonofSaradoc’s excellent question, I cut corners. Let this be a lesson to us all. I entered every word from the Concordance page, minus the proper names and my own text, as the MarkWords file. Ahem. So every capitalized instance of “Hobbit” was tagged “uncommon”, but the lowercase instances, which are the majority, were not. If “Blackberry” had been in our text, which it wasn’t, it would have been marked… but “blackberrying” [04.003] was not, and it should have been. I will now put the kettle on, go back to the drawing board, and enter all the uncommon words, both uppercase and lowercase, not lemmatized.

In fact, this may take a smallish bribe to Tech Support, as his original code did not preserve letter case.  I may just run through the list and capitalize things so that we get both sorts into our MarkWords file, but we will obviously need case preserved in future.

Please enjoy this soothing cup of tea while you’re waiting.

What the Lexos Graphs Tell Us

To explore how to use the graphs created by Lexos, let’s supplement the great documentation on their web site by focusing just on what we need for this project. Lexos looks at a rolling window, which means this: What if we said, “Looking for the word ‘IN’, looking in blocks of five words, where do we find our target word?”

  • Block 1: In a hole in the – 2 instances of “IN”
  • Block 2: a hole in the ground – 1 instance of “IN”
  • Block 3: hole in the ground there – 1 instance of “IN”
  • Block 4: in the ground there lived – 1 instance of “IN”
  • Block 5: the ground there lived a – 0 instances of “IN”

If we were standing in the book on the word “hole”, each block that contains that word (blocks 1, 2, and 3) would be considered and the number of instances averaged (2 instances, 1 instance, 1 instance; average = 1.33). That’s a rolling average with a window of five, considered from the point of the word “hole”.
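The five-word example above can be checked with a few lines of Python. This is only a sketch of the averaging as I have described it in this post, not Lexos's own code, and the function names `window_counts` and `rolling_average_at` are made up for illustration.

```python
def window_counts(words, target, size):
    """Count the target word in every block of `size` consecutive words."""
    target = target.lower()
    return [
        sum(1 for word in words[start:start + size] if word.lower() == target)
        for start in range(len(words) - size + 1)
    ]


def rolling_average_at(words, target, size, position):
    """Average the counts of every block containing the word at `position` (0-based)."""
    counts = window_counts(words, target, size)
    first = max(0, position - size + 1)      # earliest block that reaches this word
    last = min(position, len(counts) - 1)    # latest block that starts at or before it
    covering = counts[first:last + 1]
    return sum(covering) / len(covering)


words = "In a hole in the ground there lived a".split()
counts = window_counts(words, "in", 5)
average_at_hole = rolling_average_at(words, "in", 5, words.index("hole"))
print(counts)                     # one count per block, blocks 1 to 5
print(round(average_at_hole, 2))  # average over the blocks containing "hole"
```

The five counts match the bulleted blocks (2, 1, 1, 1, 0), and the average taken at “hole” comes out to 1.33.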

Oy.  It is too much.  I shall sum up with a picture.  I asked Lexos to create a graph of the instances of “Gandalf” in the text with a window of 100 words.

[Graph: GandalfGraph5K]

Those are word numbers along the x-axis at the bottom, which is why we made a word-number-to-chapter-break table in our previous post. See that gap around word number 44,000? That’s when he leaves Thorin & Company at the edge of Mirkwood! Those small spikes between 55,000 and 65,000 are three mentions of Gandalf, not descriptions of the wizard’s own actions or words. For example:

[09.031]  ‘Upon my word!’ said Thorin, when Bilbo whispered to him to come out and join his friends, ‘Gandalf spoke true, as usual! A pretty fine burglar you make, it seems, when the time comes.’

Then at 85,000 Gandalf re-enters the action of the book! We can see the action of the book in the graphic! I’m so excited that there was a bit of jumping up and down here at Taigh Connlaich when I saw the first lovely picture! The gap in “Gandalfs” beginning around word 21,000 is, of course, Chapter 5. Now that we know how to glean information from the Lexos graphs, hold on to your hats. Tomorrow, Word Fans, we will graph our categories of tagged words!

Words per chapter

When we look at our graphs from Lexos, it’s going to be very useful to know where the chapter breaks are in the stream of words. Out of curiosity, of course, I’ve also listed the number of words per chapter.

Chapter    Ends on word   Words in chapter
One               8,753              8,753
Two              14,029              5,276
Three            16,943              2,914
Four             21,024              4,081
Five             28,045              7,021
Six              34,801              6,756
Seven            43,874              9,073
Eight            54,127             10,253
Nine             59,986              5,859
Ten              63,946              3,960
Eleven           66,962              3,016
Twelve           74,141              7,179
Thirteen         78,082              3,941
Fourteen         81,334              3,252
Fifteen          84,716              3,382
Sixteen          86,876              2,160
Seventeen        90,836              3,960
Eighteen         93,662              2,826
Nineteen         96,154              2,492
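For anyone who would rather not do the subtraction by hand, here is a small sketch that rebuilds the words-per-chapter column from the chapter-end numbers and looks up which chapter a given word number falls in. The numbers are the ones in the table; the names `CHAPTER_ENDS`, `words_per_chapter`, and `chapter_of` are my own inventions for illustration.

```python
from bisect import bisect_left

# Word number on which each chapter ends, One through Nineteen (from the table).
CHAPTER_ENDS = [
    8753, 14029, 16943, 21024, 28045, 34801, 43874, 54127, 59986, 63946,
    66962, 74141, 78082, 81334, 84716, 86876, 90836, 93662, 96154,
]


def words_per_chapter(ends):
    """Subtract each chapter's end from the previous one to get chapter lengths."""
    return [end - start for start, end in zip([0] + ends[:-1], ends)]


def chapter_of(word_number, ends=CHAPTER_ENDS):
    """Return the chapter number (1-based) containing the given word number."""
    if not 1 <= word_number <= ends[-1]:
        raise ValueError("word number is outside the book")
    # The first chapter whose end is at or past this word is the one we want.
    return bisect_left(ends, word_number) + 1


lengths = words_per_chapter(CHAPTER_ENDS)
print(lengths[4])         # length of Chapter Five
print(chapter_of(44000))  # chapter containing word 44,000
```

This is handy when reading a Lexos graph: take the word number under a spike or a gap and ask `chapter_of` where in the book you are standing.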

Sound Play

I am preparing to graph the frequencies of those words which show sound play, which we have tagged “onomatopoeia” in the word entries. I cannot help but include all of Gollum’s extra-sibilant utterances as well as his call-name. Given that I’m counting the name “Gollum” in this analysis, I must also include the best sound-names of the whole novel, Roäc and Carc. Their names suggest the entire Raven language and raise a suspicion that we readers have heard ravens talking among themselves, not making simple sounds, an idea we have encountered before.

Including those names and Gollumisms, there are 488 sound-play words in our spreadsheet of 7,172 uncommon words.

More corrections

How did I get it into my head that the book’s total word count is 27,000? In trying to replicate everything, I found I had made this error a few times. Grievous error. There are just over 96,000 words in The Hobbit. Good thing I caught it now!

Update 2015.06.20: I believe I have corrected all the countings and mathings up to this point.  Word Fans, please let me know if you see an error!