I finally feel like it’s OK to brag a little. Brenton Dickieson, Tolkien scholar and lecturing professor at Signum University, has called this project “one of the nerdiest Tolkien projects I know about.” I’m just over the moon with this compliment – Dickieson would know from nerdy. His delicious blog, A Pilgrim in Narnia, has a new post about C. S. Lewis, Inklings, fantasy, and other treats at least weekly.
As my Word Fans know, in the last two years I have found a handful more hyphenated words that escaped my 2015 analysis. While I would have been surprised if these few had changed the overall hyphen picture, it’s best to be certain.
This passage is the densest region of hyphens.
[01.117] Swords in these parts are mostly blunt, and axes are used for trees, and shields as cradles or dish-covers; and dragons are comfortably far-off (and therefore legendary). That is why I settled on burglary – especially when I remembered the existence of a Side-door. And here is our little Bilbo Baggins, the burglar, the chosen and selected burglar. So now let’s get on and make some plans.’
[01.118] ‘Very well then,’ said Thorin, ‘supposing the burglar-expert gives us some ideas or suggestions.’ He turned with mock-politeness to Bilbo.
Our graph was created using Lexos and marked up with GIMP. It acts as a map of the frequency of hyphenated words in The Hobbit across the chapters.
Sometimes it is spelled strung together with no hyphens. Sometimes it is spelled with double Ss and spaces. Twice, however, it is spelled with hyphens, and if I’m trying to be scrupulous about the hyphens, then here we are – Gollum’s signature sound:
- 05.066 ‘S-s-s-s-s,’ hissed Gollum.
- 05.070 ‘S-s-s-s-s,’ said Gollum
Both of these sounds are in the 1937 edition of The Hobbit, for those of you who have been tracking the differences between the elder and younger editions with me.
How did this one escape me? It’s included in “Drip”, of course, but in this behyphenated, reduplicative form, I believe it deserves its own entry – and so it shows up in my hyphen work!
- 05.010 drops drip-drip-dripping from an unseen roof into the water below;
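For fellow number fans, here is a minimal sketch of how a hunt for hyphenated words could be automated. This is not the method used for the counts in this post; the pattern and the `hyphenated_words` helper are my own assumptions, and it only handles plain ASCII letters joined by single hyphens.

```python
import re
from collections import Counter

# Assumed pattern: two or more runs of letters joined by single hyphens,
# e.g. "dish-covers" or "drip-drip-dripping".
HYPHENATED = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)+")

def hyphenated_words(text):
    """Return a Counter of lower-cased hyphenated words found in text."""
    return Counter(m.group(0).lower() for m in HYPHENATED.finditer(text))

sample = "drops drip-drip-dripping from an unseen roof; 'S-s-s-s-s,' hissed Gollum."
counts = hyphenated_words(sample)
for word, n in counts.items():
    print(word, n)
```

Lower-casing folds "S-s-s-s-s" and "s-s-s-s-s" into one entry; drop the `.lower()` call if capitalised forms such as "Side-door" should stay separate.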
Let’s look at our new word lists carefully. I have used Lexos for today’s numbers, so we will move to the official Lexos count of words:
- In the Shire text: 11,073 words
- Distinct words within the Shire: 3,013 words
- Words which occur only once in The Shire: 2,029
- Words which occur only in the Shire: 559 (that’s from me, not Lexos)
That rate of 27% distinct words (3,013/11,073) is crazy high! And 18% unique words (2,029/11,073)? In a non-technical work? Pretty much unheard of in contemporary work – I would be excited to compare this to Lewis Carroll, C.S. Lewis, and perhaps Patrick Rothfuss! (Oh, why is Lexos’s count different from mine? Lexos counted the word “Chapter” nineteen times, as well as the Roman numerals for chapters, and any numerals in the text.)
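The two rates above are just counts divided by the total number of words. A small sketch of that arithmetic, assuming the text has already been split into a list of words (the `vocabulary_rates` name and the toy sentence are mine, not Lexos output):

```python
from collections import Counter

def vocabulary_rates(words):
    """Count total, distinct, and once-only words, plus their rates."""
    counts = Counter(words)
    total = sum(counts.values())
    distinct = len(counts)
    once = sum(1 for c in counts.values() if c == 1)
    return {
        "total": total,
        "distinct": distinct,
        "once": once,
        # Guard against an empty word list.
        "distinct_rate": distinct / total if total else 0.0,
        "once_rate": once / total if total else 0.0,
    }

toy = "in a hole in the ground there lived a hobbit".split()
rates = vocabulary_rates(toy)
print(rates)

# The Shire figures quoted above, as percentages.
shire_distinct_pct = round(3013 / 11073 * 100, 1)
shire_once_pct = round(2029 / 11073 * 100, 1)
print(shire_distinct_pct, shire_once_pct)
```

How the text is split into words matters a great deal here: whether "Chapter" headings, numerals, and hyphenated forms count as words will shift every one of these numbers.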
But before I hand-type The Slow Regard of Silent Things (about half the length of The Hobbit), let’s compare The Hobbit to The Hobbit. I used our new random-text-grab script to create a same-size file using words from the whole work. (Dear Dave Kale and other number fans, the random grab works with replacement.)
- In the Random text: 11,073 words
- Distinct words: 2,050 words
- Words which occur only once in the grabbed text: 1,109
So that’s a rate of 18.5% distinct words and 10% unique words. Drat it all. You know me, now I have to see those rates for the whole work.
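For the curious, a random grab with replacement can be sketched in a few lines. This is not Tech Support's actual script, which I have not reproduced here; `random_grab` and its `seed` parameter are assumptions for illustration.

```python
import random

def random_grab(words, size, seed=None):
    """Draw `size` words from `words` at random, with replacement."""
    rng = random.Random(seed)  # a seed makes the grab repeatable
    return rng.choices(words, k=size)

whole = "far over the misty mountains cold to dungeons deep and caverns old".split()
grab = random_grab(whole, 5, seed=1)
print(grab)
```

Because the draw is with replacement, the same position in the source can be picked more than once, and the grab can even be longer than the source text.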
- In the whole text: 97,436 words
- Distinct words: 12,325
- Words which occur only once in the total text: 7,091
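We have in the total text a rate of 12.6% distinct words (12,325/97,436) and 7.3% unique words (7,091/97,436). The randomly-grabbed text does not show the same rates as the whole work: 18.5% and 10% against 12.6% and 7.3%.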
… and Tech Support just texted me that they would be able to moosh on the code so that the floating-point math definitely doesn’t get in the way. For total transparency, this is where Tech Support goes far and away beyond me. I’ve heard of floating points, and I cheered in the 90s when they were available because I saw the numbers behave better… but I wouldn’t know a floating point from a non-floating point to save my soul.
From earlier this week: The Shire text uses 11,119 words, of which 1,484 do not appear in Mirkwood. This is counting every word used, so “yes” counts as six words. That’s 13.3% Shire words.
What we learned today: The Shire text compared to a random word grab of the same sample size – 1,339 Shire words do not match my random text. That is basically indistinguishable from the Mirkwood difference. Hmm, fascinating! Yet most of our Lexos graphs which show both regions paint them as very different from one another at the word level. Hold on…
Oho! The Mirkwood text has more words – 16,400 – and only 1,265 are different from a random grab of 16,400 words in the whole novel. That’s 7.7%. Very different, my friends!
Let’s clean that up a bit:
- Shire text: 11,119 words
- Shire words not appearing in Mirkwood: 13.3%
- Shire words not appearing in Random text: 12%
- Mirkwood text: 16,400 words
- Mirkwood words not appearing in the Shire text: 14.6%
- Mirkwood words not appearing in Random text: 7.7%
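The percentages in this list count every occurrence of a word, not just each distinct word once. A rough sketch of that comparison, with a made-up `share_not_in` helper and toy word lists standing in for the real region files:

```python
def share_not_in(region_words, other_words):
    """Count occurrences in region_words whose word never appears in other_words."""
    other_vocab = set(other_words)
    missing = sum(1 for w in region_words if w not in other_vocab)
    share = missing / len(region_words) if region_words else 0.0
    return missing, share

# Toy stand-ins, not the real texts.
shire = "yes yes yes pipe pipe door".split()
mirkwood = "door spider spider path".split()
missing, share = share_not_in(shire, mirkwood)
print(missing, round(share * 100, 1))

# The figures quoted above, as percentages.
shire_vs_mirkwood = round(1484 / 11119 * 100, 1)
shire_vs_random = round(1339 / 11119 * 100, 1)
mirkwood_vs_random = round(1265 / 16400 * 100, 1)
print(shire_vs_mirkwood, shire_vs_random, mirkwood_vs_random)
```

In the toy example, "yes" contributes three to the count and "pipe" two, which is the same every-occurrence counting described for the Shire numbers.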
Well, well, well. Time to poke at Mirkwood a bit, friends. Also, it’s time to use the newly-discovered Lexos feature “how many of these words are unique”! See you soon!