My Tech Support

Tech Support solved our situation in the time it took me to make a second cup of coffee.  He made the MarkWords script convert everything to lower case letters, which will certainly take care of business for now.  We’re going to explore preserving capitalization in the original concordance-making script, but first we must be certain of how spreadsheets will handle capitals.

I am grateful for Tech Support’s skills, of course, but is it boastful to say that I am just as grateful for my child’s can-help attitude?

Constant Vigilance!

You see the problem, don’t you, Word Fans?  I must admit I had to have a good night’s sleep before my subconscious knocked me over the head with it.  How can a paragraph about hobbits have almost no uncommon words, when “hobbit” is an uncommon word?  In fact, about a quarter of the instances of the word “hobbit” are in Chapter 1.

Entirely my fault.  In my eagerness to respond to SonofSaradoc’s excellent question, I cut corners.  Let this be a lesson to us all.  I entered every word from the Concordance page, minus the proper names and my own text, as the MarkWords file.  Ahem.  So every capitalized instance of “Hobbit” was tagged “uncommon”, but most instances, being lowercase, were not.  If “Blackberry” had been in our text, which it wasn’t, it would have been marked… but “blackberrying” [04.003] was not, and it should be.  I will now put the kettle on, go back to the drawing board, and enter all the uncommon words, both uppercase and lowercase, not lemmatized.

In fact, this may take a smallish bribe to Tech Support, as his original code did not preserve letter case.  I may just run through the list and capitalize things so that we get both sorts into our MarkWords file, but we will obviously need case preserved in future.
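For the code-minded among you, Word Fans, here is a minimal Python sketch of what went wrong.  It is not Tech Support’s MarkWords script, which isn’t reproduced here; the function name `mark_uncommon`, the two-word list, and the sample sentence are all invented for illustration.  It assumes a word list of capitalized entries and compares matching with and without regard to letter case.

```python
import re

def mark_uncommon(text, uncommon_words, ignore_case=True):
    """Return the tokens of `text` that appear in `uncommon_words`.

    Hypothetical helper for illustration only; not the real MarkWords script.
    """
    if ignore_case:
        wanted = {w.lower() for w in uncommon_words}
    else:
        wanted = set(uncommon_words)

    hits = []
    # Treat runs of letters, apostrophes, and hyphens as words.
    for token in re.findall(r"[A-Za-z'-]+", text):
        key = token.lower() if ignore_case else token
        if key in wanted:
            hits.append(token)
    return hits

# Assumed inputs: a capitalized word list, like my mistaken MarkWords file.
word_list = ["Hobbit", "Burglar"]
sentence = "The hobbit was a Hobbit, and the burglar knew it."

case_sensitive = mark_uncommon(sentence, word_list, ignore_case=False)
case_folded = mark_uncommon(sentence, word_list, ignore_case=True)

print(case_sensitive)  # only the capitalized instance is caught
print(case_folded)     # every instance is caught
```

Folding both the list and the text to lowercase catches every instance, at the cost of losing the capitalization.  Note that neither version would catch “blackberrying” from an entry of “Blackberry”, since nothing here is lemmatized.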

Please enjoy this soothing cup of tea while you’re waiting.

Uncommon Words: High Points in Chapter One

[Image: UncommonGraph]

The first little local peak comes in the middle of the beautiful song “Far Over the Misty Mountains” – uncommon words and beautiful turns of phrase in the poetry and surrounding it.

The second local peak falls in the middle of the discussion of the adventure.

[01.116] ‘That is why I settled on burglary – especially when I remembered the existence of a Side-door. And here is our little Bilbo Baggins, the burglar, the chosen and selected burglar.

There are dragons in this part of the discussion, and heroes and swords… and our dear burglar.  You may recall, Word Fans, that “burglar” is well outside The Ten Thousand most common words.

UPDATE 2015.06.13 There was a grievous error in my method, Word Fans, as reviewed in my post on Constant Vigilance!  I’ve removed any tags from this post so that only you good folks who are reading chronologically will see this little detour into error.

Uncommon Words: Low Point in Mirkwood

I’m sure your eye was caught as mine was by the extreme low point at 44,877.

[Image: UncommonGraph]

[08.012] ‘Twelve yards! I should have thought it was thirty at least, but my eyes don’t see as well as they used a hundred years ago.

The company in Mirkwood is facing hunger and the effects of prolonged dimness.  They’re trying to cross the river, which seems like an impossible task; it’s not their first low point, but they are certainly without time for any fancy words.  Can you see that our graph leaps upward sharply right afterward?  That sudden peak comes when Thorin shoots the deer and Bombur falls in the water – hope and despair!

[08.029]  Thorin was the only one who had kept his feet and his wits. As soon as they had landed he had bent his bow and fitted an arrow in case any hidden guardian of the boat appeared. Now he sent a swift and sure shot into the leaping beast. …

[08.030] Before they could shout in praise of the shot, however, a dreadful wail from Bilbo put all thoughts of venison out of their minds. ‘Bombur has fallen in! Bombur is drowning!’ he cried.

UPDATE 2015.06.13 There was a grievous error in my method, Word Fans, as reviewed in my post on Constant Vigilance!  I’ve removed any tags from this post so that only you good folks who are reading chronologically will see this little detour into error.

Mistakenly Presenting: The Uncommon Words

Good morning, Word Fans!  Alert Reader SonofSaradoc asked the very straightforward question, “Where are the uncommon words?”  Frankly, I just imagined that they would be scattered evenly throughout – thank you, SonofSaradoc, for saving me from that very poor bit of judgement!  I ran Lexos with all the uncommon words tagged, and LOOK!  Great news – when I use Lexos right on their web site, I can click on a particular point and get the exact number of the word in the middle of that point.

The clearest graph of this question is the rolling average graph with a window of 1000 words:

[Image: UncommonGraph]

Sweet quaffle.

The very first low point – a place abundantly wealthy in Common words – is in the middle of this paragraph:

[01.004] The mother of our particular hobbit – what is a hobbit? I suppose hobbits need some description nowadays, since they have become rare and shy of the Big People, as they call us.

Prosaic, common, everyday, ordinary words.  Workhorse words.  You can tell what these words would say on any question without the bother of asking them.  These beautiful, common words are our gateway into Middle-earth, ten furry toes solidly on familiar ground.

The first uncommon words in the novel are at 01.002, “porthole”, “knob”, “tunnel”, and “fond”.

UPDATE 2015.06.13 There was a grievous error in my method, Word Fans, as reviewed in my post on Constant Vigilance!  I’ve removed any tags from this post so that only you good folks who are reading chronologically will see this little detour into error.

The Very Middlemost Word

Were you enchanted by the word and number play of The Faerie Queene, too, Word Fans?  I’m pleased to report that by word count, and including Chapter titles, the very middlemost word of The Hobbit is “creeping”.

[08.058]  After a good deal of creeping and crawling they peered round the trunks and looked into a clearing where some trees had been felled and the ground levelled. There were many people there, elvish-looking folk, all dressed in green and brown and sitting on sawn rings of the felled trees in a great circle. There was a fire in their midst and there were torches fastened to some of the trees round about; but most splendid sight of all: they were eating and drinking and laughing merrily.

Pivotal moment?  Yes, indeed!
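For anyone who wants to find the middle of a different text, here is a small Python sketch of the arithmetic.  It is only an illustration, not the method used for the count above: the function names are invented, and the ten-word sample stands in for a real word list.  One wrinkle it makes visible is that a text with an even number of words has two middle words, so one of the pair has to be chosen.

```python
def middle_positions(total_words):
    """Return the 1-based position(s) of the middle word.

    An odd count has one middle word; an even count has two.
    """
    if total_words < 1:
        raise ValueError("need at least one word")
    if total_words % 2 == 1:
        return [(total_words + 1) // 2]
    half = total_words // 2
    return [half, half + 1]

def middle_words(words):
    """Return the middle word(s) of a list of words."""
    return [words[p - 1] for p in middle_positions(len(words))]

# A ten-word stand-in for a whole novel.
sample = "In a hole in the ground there lived a hobbit".split()

print(middle_positions(len(sample)))  # [5, 6]
print(middle_words(sample))           # ['the', 'ground']
```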

What the Lexos Graphs Tell Us

To explore how to use the graphs created by Lexos, let’s supplement the great documentation on their web site by focusing on just what we need for this project.  Lexos looks at a rolling window, which means this:  Suppose we said, “Looking for the word ‘IN’ in blocks of five words, where do we find our target word?”

  • Block 1: In a hole in the – 2 instances of “IN”
  • Block 2: a hole in the ground – 1 instance of “IN”
  • Block 3: hole in the ground there – 1 instance of “IN”
  • Block 4: in the ground there lived – 1 instance of “IN”
  • Block 5: the ground there lived a – 0 instances of “IN”

If we were standing in the book on the word “hole”, each block that contains that word (blocks 1, 2, and 3) would be considered and the number of instances averaged (2 instances, 1 instance, 1 instance – average = 1.33).  That’s a rolling average with a window of five, considered from the point of the word “hole”.
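The five blocks and the average at “hole” can be reproduced with a short Python sketch.  This follows my description above, not Lexos’s own code, which may calculate its rolling averages differently; the function names are invented for illustration.

```python
def blocks(words, size):
    """Every run of `size` consecutive words, sliding forward one word at a time."""
    return [words[i:i + size] for i in range(len(words) - size + 1)]

def count_in_block(block, target):
    """Count case-insensitive matches of `target` in one block."""
    return sum(1 for w in block if w.lower() == target.lower())

def average_at(words, position, size, target):
    """Average the target counts over every block containing words[position].

    `position` is a 0-based index into `words`.
    """
    counts = [
        count_in_block(block, target)
        for start, block in enumerate(blocks(words, size))
        if start <= position < start + size
    ]
    return sum(counts) / len(counts)

words = "In a hole in the ground there lived a".split()

# Counts of "in" for Blocks 1 to 5.
counts = [count_in_block(b, "in") for b in blocks(words, 5)]
print(counts)  # [2, 1, 1, 1, 0]

# Standing on "hole" (index 2): Blocks 1, 2, and 3 contain it.
avg = average_at(words, words.index("hole"), 5, "in")
print(round(avg, 2))  # 1.33
```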

Oy.  It is too much.  I shall sum up with a picture.  I asked Lexos to create a graph of the instances of “Gandalf” in the text with a window of 100 words.

[Image: GandalfGraph5K]

Those are word numbers along the x-axis at the bottom, which is why we made a word-number-to-chapter-break table in our previous post.  See that gap around word number 44,000?  That’s when he leaves Thorin & Company at the edge of Mirkwood!  Those small spikes between 55,000 and 65,000 are three references to Gandalf, such as the one below, not descriptions of the wizard’s actions or words:

[09.031]  ‘Upon my word!’ said Thorin, when Bilbo whispered to him to come out and join his friends, ‘Gandalf spoke true, as usual! A pretty fine burglar you make, it seems, when the time comes.’

Then at 85,000 Gandalf re-enters the action of the book!  We can see the action of the book in the graphic!  I was so excited that there was a bit of jumping up and down here at Taigh Connlaich when I saw the first lovely picture!  The gap in “Gandalfs” around word 21,000 is, of course, Chapter 5.  Now that we know how to glean information from the Lexos graphs, hold on to your hats.  Tomorrow, Word Fans, we will graph our categories of tagged words!

Words per chapter

When we look at our graphs from Lexos, it’s going to be very useful to know where the Chapter breaks are in the stream of words. For curiosity, of course, I’ve also listed the number of words per chapter.

Chapter    Ends on word  Words in chapter
One               8,753             8,753
Two              14,029             5,276
Three            16,943             2,914
Four             21,024             4,081
Five             28,045             7,021
Six              34,801             6,756
Seven            43,874             9,073
Eight            54,127            10,253
Nine             59,986             5,859
Ten              63,946             3,960
Eleven           66,962             3,016
Twelve           74,141             7,179
Thirteen         78,082             3,941
Fourteen         81,334             3,252
Fifteen          84,716             3,382
Sixteen          86,876             2,160
Seventeen        90,836             3,960
Eighteen         93,662             2,826
Nineteen         96,154             2,492
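To turn a word number from a graph back into a chapter, the table can be searched rather than eyeballed.  Here is a minimal Python sketch using the standard library’s `bisect` module; the list is copied from the table above, while the names `CHAPTER_ENDS`, `chapter_of`, and `words_per_chapter` are my own inventions.

```python
from bisect import bisect_left

# Last word number of each chapter, One through Nineteen, from the table.
CHAPTER_ENDS = [
    8753, 14029, 16943, 21024, 28045, 34801, 43874, 54127, 59986, 63946,
    66962, 74141, 78082, 81334, 84716, 86876, 90836, 93662, 96154,
]

def chapter_of(word_number):
    """Return the chapter (1 to 19) that contains a 1-based word number."""
    if not 1 <= word_number <= CHAPTER_ENDS[-1]:
        raise ValueError("word number is outside the book")
    # The first chapter whose final word is at or beyond this word number.
    return bisect_left(CHAPTER_ENDS, word_number) + 1

def words_per_chapter():
    """Recover the right-hand column by subtracting consecutive chapter ends."""
    starts = [0] + CHAPTER_ENDS[:-1]
    return [end - start for start, end in zip(starts, CHAPTER_ENDS)]

print(chapter_of(8753))        # 1, the last word of Chapter One
print(chapter_of(8754))        # 2, the first word of Chapter Two
print(chapter_of(44877))       # 8
print(words_per_chapter()[7])  # 10253, the words in Chapter Eight
```

The same subtraction confirms that the two columns of the table agree with each other.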

Sound Play

I am preparing to graph the frequencies of those words which show sound play – which we have tagged “onomatopoeia” in the word entries.  I cannot help but include all of Gollum’s extra-sibilant utterances as well as his call-name.  Given that I’m counting the name “Gollum” in this analysis, I must also include the best sound-names of the whole novel, Roäc and Carc.  Their names suggest the entire Raven language and a suspicion that we readers have heard ravens talking among themselves, not making simple sounds, an idea we have encountered before.

Including those names and Gollumisms, there are 488 sound-play words in our spreadsheet of 7172 uncommon words.