I am using the tag “archaic” for two purposes. First, if the OED calls a word “archaic”, then “archaic” it is tagged! Second, I am using it as an umbrella term that also embraces the words the OED labels obsolete or rare. One tag to mean “old”. If you see “rare” or “obsolete” in an entry, know that this is the dictionary’s primary classification of the word in question (not that there aren’t a few words which have earned several different OED labels!).
method
Checking the Tags
Last weekend’s task, extending to today, is double-checking the accuracy of our tags. I’m merging two tags into one where they are closely related, splitting one tag into two where I seem to have used it for different things on different days, and making sure that every post has at least one tag.
All of this is groundwork for tagging the original text so that I can use Michael Drout’s fascinating lexomics software! My goal is to see whether Tech Support’s app can replace several different words with the same tag, and if so, to make at least my first lexomics run!
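For readers curious what “replacing several different words with the same tag” means in practice, here is a minimal Python sketch of the idea. It is not Tech Support’s app; the function name `tag_text` and the sample tag map are hypothetical, and it assumes simple whole-word, case-insensitive matching.

```python
import re


def tag_text(text: str, tag_map: dict[str, str]) -> str:
    """Replace each listed word with its tag (hypothetical sketch).

    Several different words may map to the same tag, so they all
    collapse into one token for later analysis.
    """
    # Normalize keys so that matching is case-insensitive.
    lowered = {word.lower(): tag for word, tag in tag_map.items()}
    if not lowered:
        return text
    # \b keeps matches to whole words: "plop" will not match inside "plopping".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in lowered) + r")\b",
        re.IGNORECASE,
    )
    # Look up the tag using the lowercase form of whatever matched.
    return pattern.sub(lambda m: lowered[m.group(0).lower()], text)


# Hypothetical tags: three sound-play words share one tag.
tags = {"plop": "ONOMATOPOEIA", "plunk": "ONOMATOPOEIA", "whizz": "ONOMATOPOEIA"}
print(tag_text("Plop went the stone, whizz went the arrow.", tags))
# prints: ONOMATOPOEIA went the stone, ONOMATOPOEIA went the arrow.
```

Because every word in a group becomes the same token, software that counts words will treat the whole group as a single item.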
A Proofreading Day
Today, June 3, 2015, I made a proofreading run through all of the posts so far. I checked spelling, formatting, grammar, syntax, and logic. I updated old posts which had incorrect numbers. I fixed tags and added tags; for example, I added the tag “100K” to entries made before I discovered the words outside the Hundred Thousand. It has been a delicious day of review and synthesis.
Now I will sleep on all these lovely words which I have gathered up as Smaug did his gold and let them work their way into my dreams and thoughts until they and their patterns are as known to me as every gleaming cup in the treasure-hoard.
Two classes of words from the My Goodness post remain unrecorded: compound words, and words whose inflected forms as used in The Hobbit are actually within the Hundred Thousand. As none of them is obsolete or archaic, a food word, or onomatopoeia, I will get to them when next I enter individual words. Those three categories seem to be the best ones to carry forward into the next phase: Lexomics.
Also, bless them, the home servers for this blog seem to be at least four hours east of New Hampshire, as this post has been given a date stamp of June 4th. We run on New Hampshire time at Signum University, however, so I’ll let my date notation of June 3d stand.
Vocables
I learned many years ago from Professor Catriona Parsons that Gàidhlig waulking songs, the work songs which keep the rhythm for hand-fulling woolen cloth, are full of “vocables”. In the first song in the linked video, the group’s words between the solo lines are vocables.
“These are not like fa-la-la,” she said. “They are very ancient sounds and they have meaning, but we have lost the meaning.”
She then taught us very carefully to pronounce these syllables, which usually alternate in the songs with phrases in current lexical use, just as she had heard them growing up on the Isle of Lewis. I fancied that it did not matter if we knew the meaning, as long as those to whom we sang could understand.
Similarly, what’s up with tra-la-la-lally? Corey Olsen, The Tolkien Professor, makes this point: “tra-la-la-lally / here down in the valley!” [03.014] sounds very much as if “tra-la-la-lally” is the name of the thing which is happening down in the valley. These vocables are definitely sound play, spoken only by elves. Do these sounds make those singers a bit alien? Do they remind us that the elves speak other languages natively? I believe they do. In honor of the play of sound-on-sound in these vocables, I am giving them the “Onomatopoeia” tag.
- 03.014 O! tra-la-la-lally
- 03.015 O! tril-lil-lil-lolly
- 19.002 Come! Tra-la-la-lally!
- 19.003 O! Tra-la-la-lally
- 19.004 Fa-la!
- 19.004 Fa-la-la-lally
- 19.004 With Tra-la-la-lally
- 19.004 Tra-la-la-lally
I am separating out the Non-Lexical-Vocables after a bloody morning of trying to find a more suitable term. I haven’t found one yet; I might have to ask my fellow scholar Jamie Stinnett.
- 06.077 Ya hey!
- 06.078 Ya hey!
- 06.078 Ya harri-hey!
- 06.078 Ya hoy!
- 06.079 And with that Ya Hoy!
My goodness
Believe it or not, sometimes I get a bit focused on doing a thing thoroughly. I looked at the list after removing The Twenty Thousand: several hundred words. The Thirty Thousand? Several hundred words. I realized I was going to take this right to the end. The Project Gutenberg word frequency lists are not lemmatized, so I am sanguine about using the list right up to word 100,000 in my stopwords file.
That word is “Apennine”, by the way. The last word on the list that is not a proper noun is “withes”, the plural of “withe”:
a. A band, tie, or shackle consisting of a tough flexible twig or branch, or of several twisted together; such a twig or branch, as of willow or osier, used for binding or tying, and sometimes for plaiting.
We get “Withywindle” from this word and it is etymologically related to “willow”. Yes, indeed. Word 99,996 is used in Tolkien’s corpus.
When The Hundred Thousand are eliminated from the text of The Hobbit, ninety words remain. The Project Gutenberg words were not lemmatized, so I first went back and checked for the original forms of the words in the text. In a handful of cases my headword (like “rune”) is not in The Hundred Thousand, but the form in the text (“runes”) is. Those words are not among these ninety.
adjoin attercop baa bannock bash bebother beeswax befoul belch benight bewuther boatload bottommost burgle buttertub carrock coalmining cockscomb confusticate crunchable daylong draggle drat firework fizzle flummox fluster foreleg frizzle gammer glede goggle greybeard guardroom haymaking hmmm hobbit homecoming hotfoot jibber kindhearted laburnum lazybone lunchtime manflesh moneybags ninepins nosebag oddment ogres orc parch pinewoods pitter plop plump plunk poach poof porthole quoits rockhewn rockrose roundshield ruddy rune scone scrabble scrumptious shoreland skrike slither slowcoach smithereens snapdragon snivel snuffle spearman summertime thrum tomnoddy undercut underparts upkeep uptake waterlog whizz wobble yammer zig-zag
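Because the frequency list is not lemmatized, the filter has to test each word in the form it actually takes in the text, which is why a word like “runes” drops out even when the headword “rune” is not on the list. Here is a minimal Python sketch of that filtering step, assuming the text is split into lowercase tokens with a simple regular expression (straight apostrophes and hyphens only). The helper names are hypothetical, and the stopwords below are toy data, not the real Hundred Thousand.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    Letters joined by straight apostrophes or hyphens stay together,
    so "zig-zag" is one token.
    """
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def words_beyond(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Return the distinct tokens that are not in the stopwords list.

    The comparison uses the surface form found in the text, so "runes"
    is removed if "runes" is a stopword, even when "rune" is not.
    """
    return sorted({t for t in tokens if t not in stopwords})


# Toy stopwords for illustration only.
stopwords = {"the", "runes", "on", "door", "were", "plain", "did", "not"}
tokens = tokenize("The runes on the door were plain; the attercop did not wobble.")
print(words_beyond(tokens, stopwords))  # ['attercop', 'wobble']
```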
Words beyond The Hundred Thousand have earned the “100K” tag.
“withe | with, n.” OED Online. Oxford University Press, March 2015. Web. 30 May 2015.
What do we do from here?
First, we have already made entries on seventy-five words which are not among the 975 Uncommon Words. We’ll enjoy their entries, and hope that some time in the future I’ll be able to do all the nouns, verbs, and adjectives as Blackwelder did. Most of these rank between 5,000th and 10,000th in the frequency lists, although a few, like “eyebrows”, are very common words which I could not resist making an entry for.
Second, we will consider the riches before us: the Project Gutenberg frequency lists take us all the way through 100,000 words. I originally chose to eliminate ten thousand words in order to have a smaller group I could work with in my limited time frame, words which were uncommon enough to be of interest, words which Richard Blackwelder would have loved to notice.
Let’s see what happens when I increase my filter. Will we reveal the gem-words, the scop’s treasure-hoard, if we remove even more words from the list? Pack a sandwich, the road beckons!
New Numbers
Well, this is an adventure, and I’m tickled pink to be on it!
Now that I have more accurately eliminated The Ten Thousand most common words in Project Gutenberg, I’m ready to report to my fellow Word Fans. I love the fact that our source material is the corpus of older written English: out-of-copyright works which include much of the literature Tolkien himself knew, like William Morris and Lewis Carroll. The previous list included current written work, and from looking at its word lists I suspected that newspapers and magazines carried more weight than suited our purposes.
Of the 96,152 words of The Hobbit, 7212 are uncommon! That’s about 7.5% of the book, including names of things (like “Bilbo”) as well as plain words (like “wobble”). In English, most writers use uncommon words less than 5% of the time (reports vary from 2.5% to just under 5%).
These 7172 are made up of about a thousand individual words plus about 75 names. I have returned to the beginning of this blog and updated the numbers so that new Word Fans are not led astray by contradictory reports.
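The 7.5% figure is a single division. Here is the arithmetic as a quick Python check, using only the counts reported in this post:

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


total_words = 96_152    # running words in The Hobbit, as counted above
uncommon_words = 7_212  # occurrences of words beyond The Ten Thousand

share = percent(uncommon_words, total_words)
print(f"{share:.1f}% uncommon")  # prints "7.5% uncommon"
# Compare with the reported ceiling of just under 5% for most writers.
print(share > 5)  # True
```

Both totals given here, 7212 and 7172, round to 7.5% of 96,152 words.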
Double-checking, Project Gutenberg, and The Ten Thousand
As I approach the end of the month, I am double-checking details, spelling, and formatting. I checked my stopwords file, the list we have called The Ten Thousand. In their original format, these are listed as words used at least 20 times per million. I set up my filter to take the first ten thousand of those words in order of frequency, and thus was born my stopwords file. This morning, I checked not only that there were no more than ten thousand words, but that there were indeed at least that many.
Oh, dear. There were just over five thousand.
In searching for my next resource, I discovered that Project Gutenberg’s corpus is being used to create just such a list! It’s a work in progress, updated as more works enter the project, and it runs to 100,000 words.
Thank you, Project Gutenberg!
I’ve easily created my new stopwords file and let’s see how things turn out, shall we?
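A count check like the one that caught the short list can be built right into the step that creates the stopwords file. Here is a minimal Python sketch, assuming the frequency list has already been read into a Python list ordered from most to least frequent, and that the stopwords file is plain text with one word per line. `build_stopwords` and `write_stopwords` are hypothetical helpers, not part of any Project Gutenberg tool.

```python
def build_stopwords(ranked_words: list[str], n: int) -> list[str]:
    """Take the n most frequent words from a frequency-ranked list.

    Raises ValueError if the source list is shorter than n, the check
    that would flag a five-thousand-word "Ten Thousand".
    """
    if len(ranked_words) < n:
        raise ValueError(f"expected at least {n} words, got {len(ranked_words)}")
    top = ranked_words[:n]
    # Duplicate entries would quietly leave fewer than n distinct words.
    if len(set(top)) != n:
        raise ValueError("ranked list contains duplicate entries")
    return top


def write_stopwords(path: str, words: list[str]) -> None:
    """Write the stopwords one per line (the assumed file format)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")


# Toy stand-in for a real frequency list.
ranked = ["the", "of", "and", "to", "a", "in"]
stop = build_stopwords(ranked, 5)
print(len(stop))  # 5
```

Raising an error rather than returning a short list means a truncated source file stops the run instead of silently producing a smaller filter.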
Onomatopoeia
I hope you have enjoyed our survey of sound play in the uncommon words! I am charmed to learn that eighty-four of the words (more than 8% of our uncommon words!) are sound-play words such as “hum”, “whistle”, and “sh”. Many of those words are repeated, of course: together they occur 316 times, about one-third of 1% of the total words of the book!
The formation of a word from a sound associated with the thing or action being named; the formation of words imitative of sounds.
The use of echoic or suggestive language
I began with the idea that sound-play words would be light and funny, and that I would be able to tag and track them to identify light-hearted passages. Then a leaf rustled and the dragon hummed. This poetic technique quite simply adds sensation to each scene, intensifying the mood. Sometimes Tolkien even uses the onomatopoeic words to create tone – brightening the scariest parts of his children’s bedtime tale.
Alert Word Fans will see that I captured a few more sound-play words after this post – they are included in this post’s total.
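The gap between eighty-four words and 316 appearances is the difference between counting distinct words (types) and counting every occurrence (tokens). Here is a minimal Python sketch of both counts, using a toy token list and a hypothetical `sound_play_counts` helper. It matches exact forms only, so an inflected form like “hummed” is not counted unless it is tagged separately.

```python
from collections import Counter


def sound_play_counts(tokens: list[str], sound_words: set[str]) -> tuple[int, int]:
    """Return (distinct sound-play words, total occurrences) in a token list."""
    hits = Counter(t for t in tokens if t in sound_words)
    return len(hits), sum(hits.values())


# Toy example: a tiny tagged set and a short "passage".
sound_words = {"hum", "whistle", "sh"}
tokens = "hum hum the dragon hummed sh whistle sh".split()
types, occurrences = sound_play_counts(tokens, sound_words)
print(types, occurrences)  # 3 5

# The post's totals: 316 occurrences in a 96,152-word book.
print(f"{100 * 316 / 96_152:.2f}% of the book")  # 0.33% of the book
```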
“onomatopoeia, n.” OED Online. Oxford University Press, March 2015. Web. 29 May 2015.
Words held aside
As I began to make entries for individual words, I strove to find words that not just anyone would use. I eliminated the Ten Thousand most common words; unique author-created names (the specific words will change by author, but every author has the privilege of creating names for a world); and fantasy-genre names, guessing that within the genre those words would be like the Ten Thousand, words anyone could use.
But what makes a fantasy word?
Lively dinner-table conversation ensued. Did Tolkien use a word because it’s a fantasy word, or is it a fantasy word because Tolkien used it? I have a mattock in the shed, so that’s a humble word, but I classed arrows as fantasy, and yet my daughter learns archery at summer camp. The classification removed perhaps 80 words from a field of over 900. Just a drop in the bucket.
I am no longer holding out words – like “elf” or “arrow” – which are uncommon but seem common to fantasy novel fans.
Note on June 3, 2015: If you read the blog chronologically, I have already mentioned that I held fantasy words aside and then abandoned the practice. This post, dated May 29th, records the day on which I made the decision. Today, June 3d, I edited for retroactive continuity so that new Word Fans would not be confused by changing methods.