Lemmatizing woes: bid v bid

My goal right now is to lemmatize my list of eighty-five hundred uncommon words from The Hobbit. In other words, if “knit” is in The Ten Thousand most common words, then I should remove “knits,” “knitted,” and “knitting” from my list of words under examination. These inflected forms are still “knit” in fancy clothes.
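Here is a minimal sketch of that filtering step, under loud assumptions: the suffix rules are a naive stand-in for a real lemmatizer, the function names are made up for illustration, and the two tiny word lists are placeholders for The Ten Thousand and my Hobbit list. It is not the tool I actually used, just the shape of the idea.

```python
def candidate_stems(word):
    """Return plausible base forms for a regularly inflected English word.

    Naive, assumed rules: strip -ing, -ed, -es, -s, then also try
    restoring a dropped "e" and undoing a doubled final consonant.
    """
    stems = {word}
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            base = word[: -len(suffix)]
            stems.add(base)                # knits -> knit
            stems.add(base + "e")          # baking -> bake
            if len(base) >= 2 and base[-1] == base[-2]:
                stems.add(base[:-1])       # knitted -> knitt -> knit
    return stems


def remove_common_inflections(uncommon, common):
    """Keep only the words with no candidate stem in the common list."""
    common = set(common)
    return [w for w in uncommon if not (candidate_stems(w) & common)]


# Placeholder data, not the real lists.
common_words = {"knit"}
uncommon_words = ["knits", "knitted", "knitting", "bade", "smaug"]

kept = remove_common_inflections(uncommon_words, common_words)
print(kept)
```

Note that suffix stripping like this only catches regular inflections. An irregular form such as “bade” sails straight through unless the lemmatizer knows about it.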

In the course of doing this, we will lose some of the gems. The Ten Thousand list doesn’t distinguish between “bid, bid, bid” (offer, as bid at an auction) and “bid, bade, bidden” (entreat, as [06.092] “The Lord of the Eagles bids you”). Since the list records spellings rather than senses, I must settle for eliminating every word whose stem matches a stem in The Ten Thousand most frequent.
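To see why the gems get lost, here is a small illustration. The (stem, sense) pairs are labeled by hand for the example, and the one-word set stands in for The Ten Thousand; neither comes from a real data file.

```python
# A frequency list of bare spellings: it has no notion of sense.
common_stems = {"bid"}

# Hand-labeled (stem, sense) pairs, for illustration only.
senses = [
    ("bid", "offer, as at an auction"),
    ("bid", "entreat"),
]

# Matching on the stem alone eliminates both senses together.
eliminated = [(stem, sense) for stem, sense in senses if stem in common_stems]
survivors = [(stem, sense) for stem, sense in senses if stem not in common_stems]

print(eliminated)
print(survivors)
```

Both senses of “bid” land in `eliminated`, and nothing survives, because the only key the lookup has is the spelling.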

Tonight we must say farewell to arms (weapons), bid adieu to bid (entreat), and blow fair winds to blow (strong hit), all so that we can be certain the words we end up with are truly special.

See you at the other end of the alphabet!
