Hyphenated words in the Shire?

Eighty-seven.

hobbit-hole
tube-shaped
left-hand
deep-set
well-to-do
good-natured
took-clan
hobbit-hole
make-up
hobbit-hole
smoke-ring
good-morninged
front-door
tea-time
front-door
dark-green
word
old-looking
seed-cake
beer-mug
after-supper
ding-dong-a-ling-dang
hobbit-boy
rat-tat
pop-gun
sky-blue
late-comers
apple-tart
mince-pies
pork-pie
smoke-ring
clay-pipe
smoke-ring
hobbit-hole
deep-throated
dragon-fire
long-forgotten
walking-stick
wood-fire
bag-end
under-hill
beer-barrels
hearth-rug
drawing-room
great-grand-uncle
rabbit-hole
drawing-room
were-worms
great-great-great-grand-uncle
treasure-hunter
tea-time
dish-covers
far-off
side-door
burglar-expert
mock-politeness
out-of-pocket
food-supplies
now-a-days
now-a-days
blacksmith-work
side-door
twenty-first
side-door
spare-rooms
long-forgotten
dressing-gown
dining-room
thank-you
dining-room
dining-room
note-paper
walking-stick
half-finished
pocket-handkerchief
pocket-handkerchief
dark-green
dark-green
hobbit-lands
twenty-second
bag-end
hobbit-hole
elf-friend
lake-town
lake-people
dragon-sickness
tobacco-jar
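
For the code-curious, here is a minimal sketch of how a list like this could be pulled out of a text file. It is not the tool used for this post. The function name `hyphenated_words` and the rule for what counts as a hyphenated word (runs of letters joined by single hyphens) are assumptions made for illustration.

```python
import re

# Assumption: a hyphenated word is two or more runs of letters joined by single hyphens.
HYPHENATED = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)+")


def hyphenated_words(text):
    """Return every hyphenated word in reading order, lowercased, repeats included."""
    return [word.lower() for word in HYPHENATED.findall(text)]


sample = "Not a rabbit-hole but a Hobbit-hole, said his great-grand-uncle."
print(hyphenated_words(sample))
# ['rabbit-hole', 'hobbit-hole', 'great-grand-uncle']
```

Repeats are kept on purpose, so a word like hobbit-hole shows up once for every time it is used, just as in the list above.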

Distinct and unique words in The Shire

Let’s look at our new word lists carefully.  I have used Lexos for today’s numbers, so we will move to the official Lexos count of words:

  • In the Shire text: 11,073 words
  • Distinct words within the Shire: 3,013 words
  • Words which occur only once in The Shire: 2,029
  • Words which occur only in the Shire: 559 (that’s from me, not Lexos)

That rate of 27% distinct words (3,013/11,073) is crazy high!  And 18% unique words (2,029/11,073)?  In a non-technical work?  Pretty much unheard of in contemporary work – I would be excited to compare this to Lewis Carroll, C.S. Lewis, and perhaps Patrick Rothfuss!  (Oh, why is Lexos’ count different from mine?  Lexos counted the word “Chapter” nineteen times, as well as the Roman numerals for chapters, and any numerals in the text.)
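
If you would like to check rates like these yourself, here is a minimal sketch. It is not how Lexos counts: the tokenizer (lowercase letters, with hyphens and apostrophes allowed inside a word) and the names `word_counts` and `rate` are assumptions for illustration, so totals from this sketch will not match Lexos exactly.

```python
import re
from collections import Counter

# Assumption: a word is a run of letters, optionally joined by hyphens or apostrophes.
WORD = re.compile(r"[a-z]+(?:[-'’][a-z]+)*")


def word_counts(text):
    """Return (total words, distinct words, words that occur exactly once)."""
    counts = Counter(WORD.findall(text.lower()))
    total = sum(counts.values())
    distinct = len(counts)
    once = sum(1 for n in counts.values() if n == 1)
    return total, distinct, once


def rate(part, whole):
    """Format part/whole as a percentage with one decimal place."""
    return f"{part / whole:.1%}"


print(word_counts("the cat and the hat"))  # (5, 4, 3)

# The Shire figures quoted above:
print(rate(3013, 11073))  # 27.2%
print(rate(2029, 11073))  # 18.3%
```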

But before I hand-type The Slow Regard of Silent Things (about half the length of The Hobbit), let’s compare The Hobbit to The Hobbit.  I used our new random-text-grab script to create a same-size file using words from the whole work (Dear Dave Kale and other number fans, the random grab works with replacement).

  • In the Random text: 11,073 words
  • Distinct words: 2,050 words
  • Words which occur only once in the grabbed text: 1,109

So that’s a rate of 18.5% distinct words and 10% unique words.  Drat it all.  You know me, now I have to see those rates for the whole work.
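
For the number fans, "with replacement" means a word can be drawn again after it has already been picked, so the grab can even be bigger than the source. Here is a minimal sketch of that idea. It is not the actual grab script: the function name `random_grab` and the optional `seed` parameter are assumptions for illustration.

```python
import random


def random_grab(words, n, seed=None):
    """Draw n words from the list with replacement.

    Each draw picks from the full list, so the same position can be chosen twice.
    Passing a seed makes the grab repeatable.
    """
    rng = random.Random(seed)
    return rng.choices(words, k=n)


whole_text = "in a hole in the ground there lived a hobbit".split()
grab = random_grab(whole_text, 15, seed=2024)
print(len(grab), "words drawn from a text of", len(whole_text))
print(len(set(grab)), "of them are distinct")
```

Because common words fill more positions in the text, they are drawn more often, which is the behavior you want when the grab is meant to stand in for the whole work.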

  • In the whole text: 97,436 words
  • Distinct words: 12,325
  • Words which occur only once in the total text: 7,091

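We have in the total text a rate of 12.6% distinct words (12,325/97,436) and 7.3% unique words (7,091/97,436).  The randomly-grabbed text does not match the whole text either: its rates fall between those of the whole work and those of the Shire.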

… and Tech Support just texted me that they would be able to moosh on the code so that the floating-point math definitely doesn’t get in the way.  For total transparency, this is where Tech Support goes far and away beyond me.  I’ve heard of floating points, and I cheered in the 90s when they were available because I saw the numbers behave better… but I wouldn’t know a floating point from a non-floating point to save my soul.

 

The Shire and Mirkwood compared to random text grabs.

From earlier this week: The Shire text uses 11,119 words, of which 1,484 do not appear in Mirkwood.  This is counting every word used – “yes” counts as six words.  That’s 13.3% Shire words.

What we learned today: The Shire text compared to a random word grab of the same sample size – 1,339 Shire words do not match my random text.  That is basically indistinguishable from the Mirkwood difference.  Hmm, fascinating!  Yet most of our Lexos graphs which show both regions paint them as very different from one another at the word level.  Hold on…

Oho!  The Mirkwood text has more words – 16,400 – and only 1,265 are different from a random grab of 16,400 words in the whole novel.  That’s 7.7%.  Very different, my friends!

Let’s clean that up a bit:

  • Shire text: 11,119 words
  • Shire words not appearing in Mirkwood: 13.3%
  • Shire words not appearing in Random text: 12%
  • Mirkwood text: 16,400 words
  • Mirkwood words not appearing in the Shire text: 14.6%
  • Mirkwood words not appearing in Random text: 7.7%
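
The percentages in this list count every occurrence of a word, not each distinct word once. Here is a minimal sketch of that kind of comparison. The function name `not_in_other` and the toy word lists are made up for illustration, and the real numbers above came from the full texts, not from this code.

```python
def not_in_other(tokens, other_tokens):
    """Count the tokens that never appear in the other text.

    Every occurrence counts, so a word used three times adds three to the count.
    Returns (count, share of all tokens).
    """
    other_vocabulary = set(other_tokens)
    count = sum(1 for token in tokens if token not in other_vocabulary)
    return count, count / len(tokens)


# Toy stand-ins for the two regions (hypothetical data).
shire = "yes yes yes pipe road".split()
mirkwood = "spider road dark".split()

print(not_in_other(shire, mirkwood))  # (4, 0.8)

# Checking the arithmetic from the list above:
print(f"{1484 / 11119:.1%}")  # 13.3%
print(f"{1265 / 16400:.1%}")  # 7.7%
```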

Well, well, well.  Time to poke at Mirkwood a bit, friends.  Also, it’s time to use the newly-discovered Lexos feature “how many of these words are unique”!  See you soon!

 

 

Thank you, Tech Support

My very dear Tech Support has added a new tool to the Digital Humanities Toolkit (which is also linked on our About page).  It is random-choice.py.  You tell it how many words you want, and it grabs that many from a given text file, as randomly as a computer can grab, and presents them in your Terminal window along with the ordinal number of each word in the text.  It will grab numbers as though they are words, but it will not grab things inside of square brackets (like our paragraph references) or double-x (like our phrase separator).  It’s a short little bit of code, so I simply copy/pasted it from GitHub to a text file and named it random-choice.py.  Seems to have worked.
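
To give a feel for what a script like this does, here is a minimal sketch written from the description above. It is not the real random-choice.py. The function name `pick_words`, the `seed` parameter, and the choice to number words after the bracketed references and separators have been dropped are all assumptions.

```python
import random
import re


def pick_words(text, how_many, seed=None):
    """Pick words at random and return (ordinal, word) pairs.

    Assumptions: bracketed references such as [02.028] and the "xx" phrase
    separator are removed first, and ordinals count the words that remain.
    """
    # Drop anything inside square brackets.
    cleaned = re.sub(r"\[[^\]]*\]", " ", text)
    # Split on whitespace and drop the phrase separator; numerals stay in as words.
    words = [word for word in cleaned.split() if word.lower() != "xx"]
    rng = random.Random(seed)
    picks = []
    for _ in range(how_many):
        index = rng.randrange(len(words))
        picks.append((index + 1, words[index]))  # ordinals start at 1
    return picks


sample = "[01.001] In a hole xx in the ground 42 [01.002] there lived a hobbit"
for ordinal, word in pick_words(sample, 3, seed=5):
    print(ordinal, word)
```

Splitting on whitespace leaves punctuation attached to a word, which is fine for a quick look but would need cleaning up before counting.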

Thank you to Daroc Alden, who always has the time to write a little script for their Mama, even during mid-terms.

Then they came to lands where people spoke strangely

I am following a little rabbit-trail, Word Fans, about dialogue and narration in the Shire.  What are the characteristics of these bits which distinguish them from all the other bits?  Won’t this be fun!

[02.028] At first they had passed through hobbit-lands, a wide respectable country inhabited by decent folk, with good roads, an inn or two, and now and then a dwarf or a farmer ambling by on business. Then they came to lands where people spoke strangely, and sang songs Bilbo had never heard before.

It would be luxurious to include all the prose about the Shire as well, but my current project has made me stare at a deadline and hmph at it.  For our purposes, then, I am counting “In the Shire” as from [01.001] to [02.028], up to but not including the words in the title of this post, plus [19.028] to the end, [19.048], inclusive.
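
A region defined this way can be cut out of the tagged text by reading the paragraph references as (chapter, paragraph) pairs. Here is a minimal sketch under the assumption that every paragraph starts with a tag shaped like [02.028]. The function name `paragraphs_in_range` is hypothetical, and the sketch keeps whole paragraphs only, so trimming [02.028] partway through would still be a job for hand editing.

```python
import re

# Assumption: tags look like [cc.ppp], two digits for chapter and three for paragraph.
TAG = re.compile(r"\[(\d{2})\.(\d{3})\]")


def paragraphs_in_range(text, start, end):
    """Return [((chapter, paragraph), body), ...] for tags from start to end, inclusive."""
    # Splitting on a pattern with two groups gives: [before, ch, para, body, ch, para, body, ...]
    pieces = TAG.split(text)
    selected = []
    for i in range(1, len(pieces), 3):
        key = (int(pieces[i]), int(pieces[i + 1]))
        if start <= key <= end:
            selected.append((key, pieces[i + 2].strip()))
    return selected


text = "[01.001] In a hole. [02.028] At first. [02.029] Later. [19.048] The end."
shire = paragraphs_in_range(text, (1, 1), (2, 28)) + paragraphs_in_range(text, (19, 28), (19, 48))
print(shire)
# [((1, 1), 'In a hole.'), ((2, 28), 'At first.'), ((19, 48), 'The end.')]
```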

To pass on a tantalizing bit of my thought, I’m calling “In Mirkwood” from [07.154] through [09.069], inclusive.

The plan is to use the Mirkwood text as the stopwords to look at the Shire text and vice-versa…  I wonder if I need to do this for all regions and chart their differences from the Shire?  I may have to.  If I don’t come up for air in a few days, please send chocolate.

The Shire text uses 11,119 words, of which 1,484 do not appear in Mirkwood.  This is counting every word used – “yes” counts as six words.  That’s 13.3% Shire words.  There are 562 words used in the Shire which are not used anywhere else in the book – 5%.  And yes, I see the logical error there and am going to – soon! – compare the Shire Text with a similarly sized sample.  If I’m lucky, Tech Support can create a “grab a random sample of text from here of size N” script.

The Mirkwood text uses 16,400 words, of which 2,400 do not appear in the Shire, and variations on “spider” account for about 60 of these.  That’s 14.6%, nearly identical to the Shire’s 13.3%.  I do find it odd that the Mirkwood text numbers come out on an even “400” – I will chase that for a while with your indulgence, Word Fans.

A Secret Vice

Today I am enjoying the Fimi & Higgins edition of Tolkien’s A Secret Vice. Thrilling to hear in the professor’s own words his thoughts on sound-play.

For us departed are the unsophisticated days, when even Homer could pervert a word to suit sound-music; or such merry freedom as one sees in the Kalevala, when a line can be adorned by words phonetic trills – as in enkä lähe Inkerelle, Penkerelle, pänkerelle (Kal. xi 55), or Ihveniä ahvenia, tuimenia, taimenia (Kal. xlviii 100), where pänkerelle, ihveniä, taimenia are ‘non-significant’, mere notes in a phonetic tune struck to harmonize with penkerelle, or tuimenia which do ‘mean’ something.

Tolkien, J. R. R. A Secret Vice: Tolkien on Invented Languages, edited by Dimitra Fimi and Andrew Higgins. (Kindle Locations 1347-1352). HarperCollins Publishers. Kindle Edition.