wn.morphy

A simple English lemmatizer that finds and removes known suffixes.

See also

The Princeton WordNet documentation describes the original implementation of Morphy.

The Lemmatization and Normalization guide describes how Wn handles lemmatization in general.

Initialized and Uninitialized Morphy

There are two ways of using Morphy in Wn: initialized and uninitialized.

Uninitialized Morphy is a simple callable that returns lemma candidates for a given wordform. The results might not be valid lemmas, but this is not a problem in practice because subsequent queries against the database filter out the invalid ones. This callable is obtained by creating a Morphy object with no arguments:

>>> from wn import morphy
>>> m = morphy.Morphy()

Because an uninitialized Morphy cannot tell which candidates are valid lemmas, it always returns the original form along with every transformation it finds for each part of speech:

>>> m('lemmata', pos='n')  # exceptional form
{'n': {'lemmata'}}
>>> m('lemmas', pos='n')   # regular morphology with part-of-speech
{'n': {'lemma', 'lemmas'}}
>>> m('lemmas')            # regular morphology for any part-of-speech
{None: {'lemmas'}, 'n': {'lemma'}, 'v': {'lemma'}}
>>> m('wolves')            # invalid forms may be returned
{None: {'wolves'}, 'n': {'wolf', 'wolve'}, 'v': {'wolve', 'wolv'}}
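The candidate generation described above can be sketched in plain Python. This is a minimal sketch, not Wn's implementation. The rule table is an assumption based on the suffix detachment rules documented for Princeton WordNet's morphy, plus a ('ves', 'f') noun rule inferred from the 'wolves' example. With these assumed rules, the hypothetical `candidates()` function reproduces the four outputs shown above.

```python
# Minimal sketch of uninitialized, Morphy-style candidate generation.
# ASSUMPTION: RULES is illustrative. It follows the detachment rules
# documented for Princeton WordNet's morphy, plus a ('ves', 'f') noun rule
# suggested by the 'wolves' example; it is not Wn's actual rule table.
from typing import Dict, List, Optional, Set, Tuple

RULES: Dict[str, List[Tuple[str, str]]] = {
    'n': [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'), ('zes', 'z'),
          ('ches', 'ch'), ('shes', 'sh'), ('men', 'man'), ('ies', 'y')],
    'v': [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
          ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')],
    'a': [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
}


def candidates(form: str, pos: Optional[str] = None) -> Dict[Optional[str], Set[str]]:
    """Return unvalidated lemma candidates for `form`, keyed by part of speech."""
    # The original form is always kept: under `pos` if given, else under None.
    result: Dict[Optional[str], Set[str]] = {pos: {form}}
    for p in ([pos] if pos is not None else list(RULES)):
        for suffix, replacement in RULES.get(p, []):
            if form.endswith(suffix):
                stem = form[:-len(suffix)] + replacement
                if stem:  # skip empty strings, e.g. from form == 's'
                    result.setdefault(p, set()).add(stem)
    return result
```

Nothing here checks a lexicon, so 'wolve' and 'wolv' survive just as they do in the uninitialized Morphy output.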

This lemmatizer can also be used with a wn.Wordnet object to expand queries:

>>> import wn
>>> ewn = wn.Wordnet('ewn:2020')
>>> ewn.words('lemmas')
[]
>>> ewn = wn.Wordnet('ewn:2020', lemmatizer=morphy.Morphy())
>>> ewn.words('lemmas')
[Word('ewn-lemma-n')]
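Query expansion of this kind amounts to looking up every candidate and keeping only the ones that exist. The sketch below is a toy model of that idea, not how wn.Wordnet works internally. `LEXICON`, `toy_lemmatizer`, and `find_words` are hypothetical names, and the two-entry dictionary stands in for the database.

```python
# Toy model of lemmatizer-based query expansion: generate candidates, then
# keep only those found in the lexicon. ASSUMPTION: LEXICON, toy_lemmatizer
# and find_words are hypothetical stand-ins for the Wn database and API.
from typing import Callable, Dict, List, Optional, Set

Candidates = Dict[Optional[str], Set[str]]

# Hypothetical lexicon: lemma -> {part of speech: word ID}.
LEXICON: Dict[str, Dict[str, str]] = {
    'lemma': {'n': 'ewn-lemma-n'},
    'wolf': {'n': 'ewn-wolf-n'},
}


def toy_lemmatizer(form: str, pos: Optional[str] = None) -> Candidates:
    """Keep the original form and add a copy with any final 's' stripped."""
    result: Candidates = {pos: {form}}
    if form.endswith('s') and len(form) > 1:
        for p in ([pos] if pos is not None else ['n', 'v']):
            result.setdefault(p, set()).add(form[:-1])
    return result


def find_words(form: str, pos: Optional[str] = None,
               lemmatizer: Optional[Callable[..., Candidates]] = None) -> List[str]:
    """Return IDs of lexicon entries matching `form`, expanded by `lemmatizer`."""
    cands = lemmatizer(form, pos) if lemmatizer else {pos: {form}}
    found: Set[str] = set()
    for cand_pos, lemmas in cands.items():
        for lemma in lemmas:  # invalid candidates simply find nothing
            for lex_pos, word_id in LEXICON.get(lemma, {}).items():
                # A None part of speech matches entries of any part of speech.
                if cand_pos is None or cand_pos == lex_pos:
                    found.add(word_id)
    return sorted(found)
```

Here `find_words('lemmas')` finds nothing, while passing `lemmatizer=toy_lemmatizer` yields `['ewn-lemma-n']`. The invalid verb candidate costs one extra lookup and is discarded.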

An initialized Morphy is created with a wn.Wordnet object as its argument. It then uses the wordnet to build lists of valid lemmas and exceptional forms (this takes a few seconds). Once this is done, it will only return lemmas it knows about:

>>> ewn = wn.Wordnet('ewn:2020')
>>> m = morphy.Morphy(ewn)
>>> m('lemmata', pos='n')  # exceptional form
{'n': {'lemma'}}
>>> m('lemmas', pos='n')   # regular morphology with part-of-speech
{'n': {'lemma'}}
>>> m('lemmas')            # regular morphology for any part-of-speech
{'n': {'lemma'}}
>>> m('wolves')            # invalid forms are pre-filtered
{'n': {'wolf'}}
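An initialized Morphy can be modeled as the same kind of suffix rules plus two precomputed tables: valid lemmas and exceptional forms for each part of speech. In the sketch below, `InitializedMorphy` is hypothetical. Its tables are passed in as toy data rather than built from a wn.Wordnet, and the rules are an assumed subset. With that data it reproduces the four outputs above, as well as the 'geese', 'gooses', and 'goosing' results from the class example further down.

```python
# Minimal sketch of an "initialized" Morphy-style lemmatizer.
# ASSUMPTIONS: InitializedMorphy is hypothetical; its lemma and exception
# tables are passed in as toy data (Wn builds them from a wn.Wordnet), and
# RULES is an illustrative subset of the Princeton WordNet detachment rules
# plus a ('ves', 'f') noun rule.
from typing import Dict, Iterable, List, Optional, Set, Tuple

RULES: Dict[str, List[Tuple[str, str]]] = {
    'n': [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
          ('ies', 'y'), ('men', 'man')],
    'v': [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
          ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')],
}


class InitializedMorphy:
    def __init__(self, lemmas: Dict[str, Iterable[str]],
                 exceptions: Dict[str, Dict[str, Set[str]]]) -> None:
        # Valid lemmas per part of speech, e.g. {'n': {'wolf'}}.
        self.lemmas = {pos: set(words) for pos, words in lemmas.items()}
        # Exceptional forms per part of speech, e.g. {'n': {'lemmata': {'lemma'}}}.
        self.exceptions = exceptions

    def __call__(self, form: str, pos: Optional[str] = None) -> Dict[str, Set[str]]:
        result: Dict[str, Set[str]] = {}
        for p in ([pos] if pos is not None else list(self.lemmas)):
            # Gather exceptional lemmas, the form itself, and rule-based candidates.
            found = set(self.exceptions.get(p, {}).get(form, set()))
            found.add(form)
            for suffix, replacement in RULES.get(p, []):
                if form.endswith(suffix):
                    found.add(form[:-len(suffix)] + replacement)
            # Only known lemmas survive; parts of speech with none are dropped.
            valid = found & self.lemmas.get(p, set())
            if valid:
                result[p] = valid
        return result


# Hypothetical toy data standing in for what would be read from a wordnet.
m = InitializedMorphy(
    lemmas={'n': {'lemma', 'wolf', 'goose'}, 'v': {'goose'}},
    exceptions={'n': {'lemmata': {'lemma'}, 'geese': {'goose'}}},
)
```

The design trade-off is visible in `__call__`: the up-front tables let invalid candidates such as 'wolve' be discarded before any query is made.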

In order to use an initialized Morphy lemmatizer with a wn.Wordnet object, it must be assigned to the object after creation:

>>> ewn = wn.Wordnet('ewn:2020')  # default: lemmatizer=None
>>> ewn.words('lemmas')
[]
>>> ewn.lemmatizer = morphy.Morphy(ewn)
>>> ewn.words('lemmas')
[Word('ewn-lemma-n')]
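Assignment after creation is needed because an initialized lemmatizer reads from the finished wordnet, so the wordnet has to exist first. The toy classes below model that ordering. `ToyWordnet` and `LemmaFilter` are hypothetical stand-ins, not wn.Wordnet or Morphy, and `LemmaFilter` only strips a final 's'.

```python
# Toy model of attaching an initialized lemmatizer after construction.
# ASSUMPTION: ToyWordnet and LemmaFilter are hypothetical stand-ins for
# wn.Wordnet and an initialized Morphy; they are not Wn's code.
from typing import Callable, Dict, List, Optional, Set

Candidates = Dict[Optional[str], Set[str]]


class ToyWordnet:
    """Hypothetical wordnet: maps lemmas to word IDs, with an optional lemmatizer."""

    def __init__(self, entries: Dict[str, str],
                 lemmatizer: Optional[Callable[..., Candidates]] = None) -> None:
        self.entries = entries        # lemma -> word ID
        self.lemmatizer = lemmatizer  # default: None, i.e. exact-match lookups

    def words(self, form: str, pos: Optional[str] = None) -> List[str]:
        if self.lemmatizer is None:
            forms: Set[str] = {form}
        else:
            # Merge the candidates from every part of speech.
            forms = set().union(*self.lemmatizer(form, pos).values())
        return sorted({self.entries[f] for f in forms if f in self.entries})


class LemmaFilter:
    """Hypothetical initialized lemmatizer: strips a final 's' and keeps
    only candidates that the wordnet it was built from contains."""

    def __init__(self, wordnet: ToyWordnet) -> None:
        # Needs an existing wordnet, so it cannot be passed to ToyWordnet().
        self.known = set(wordnet.entries)

    def __call__(self, form: str, pos: Optional[str] = None) -> Candidates:
        found = {form, form[:-1]} if form.endswith('s') else {form}
        return {pos: found & self.known}


ewn = ToyWordnet({'lemma': 'ewn-lemma-n'})  # default: lemmatizer=None
print(ewn.words('lemmas'))  # []
ewn.lemmatizer = LemmaFilter(ewn)  # assigned after creation
print(ewn.words('lemmas'))  # ['ewn-lemma-n']
```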

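A wn.Wordnet object returns essentially the same results whether it uses an initialized or an uninitialized Morphy object; the difference lies in performance. An initialized Morphy spends time up front building its lemma lists and then returns only known lemmas, while an uninitialized one is ready immediately but leaves more candidates for the database queries to filter out.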

Default Morphy Lemmatizer

As a convenience, an uninitialized Morphy lemmatizer is provided in this module via the morphy member.

wn.morphy.morphy

A Morphy object created without a wn.Wordnet object.

The Morphy Class

class wn.morphy.Morphy(wordnet=None)

The Morphy lemmatizer class.

Objects of this class are callables that take a wordform and an optional part of speech and return a dictionary mapping parts of speech to lemmas. If objects of this class are not created with a wn.Wordnet object, the returned lemmas may be invalid.

Parameters

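wordnet (Optional[wn.Wordnet]) – optional wn.Wordnet instance used to build the lists of valid lemmas and exceptional forms; if omitted, the returned candidates are not validated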

Example

>>> import wn
>>> from wn.morphy import Morphy
>>> ewn = wn.Wordnet('ewn:2020')
>>> m = Morphy(ewn)
>>> m('axes', pos='n')
{'n': {'axe', 'ax', 'axis'}}
>>> m('geese', pos='n')
{'n': {'goose'}}
>>> m('gooses')
{'n': {'goose'}, 'v': {'goose'}}
>>> m('goosing')
{'v': {'goose'}}
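Because the return value is a plain dictionary of sets, callers often need either every candidate or the candidates for one part of speech. The hypothetical helpers below (`all_lemmas` and `lemmas_for`, not part of wn.morphy) show one way to do that. Treating the None key as "any part of speech" is an assumption based on the uninitialized examples, where the original form is filed under None when no part of speech is given.

```python
# Hypothetical helpers for consuming a Morphy-style result: a dict mapping a
# part of speech (or None) to a set of lemmas. Not part of wn.morphy.
from typing import Dict, Optional, Set

Result = Dict[Optional[str], Set[str]]


def all_lemmas(result: Result) -> Set[str]:
    """Union of the lemmas across every part of speech."""
    return set().union(*result.values())


def lemmas_for(result: Result, pos: str) -> Set[str]:
    """Lemmas filed under `pos`, plus any filed under None.

    ASSUMPTION: the None key is read as "any part of speech".
    """
    return result.get(pos, set()) | result.get(None, set())


print(sorted(all_lemmas({'n': {'goose'}, 'v': {'goose'}})))  # ['goose']
```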