wn.morphy
A simple English lemmatizer that finds and removes known suffixes.
See also
The Princeton WordNet documentation describes the original implementation of Morphy.
The Lemmatization and Normalization guide describes how Wn handles lemmatization in general.
Initialized and Uninitialized Morphy
There are two ways of using Morphy in Wn: initialized and uninitialized.
Uninitialized Morphy is a simple callable that returns lemma candidates for a given wordform. That is, the results might not be valid lemmas, but this is not a problem in practice because subsequent queries against the database will filter out the invalid ones. This callable is obtained by creating a Morphy object with no arguments:
>>> from wn import morphy
>>> m = morphy.Morphy()
As an uninitialized Morphy cannot predict which lemmas in the result are valid, it always returns the original form and any transformations it can find for each part of speech:
>>> m('lemmata', pos='n') # exceptional form
{'n': {'lemmata'}}
>>> m('lemmas', pos='n') # regular morphology with part-of-speech
{'n': {'lemma', 'lemmas'}}
>>> m('lemmas') # regular morphology for any part-of-speech
{None: {'lemmas'}, 'n': {'lemma'}, 'v': {'lemma'}}
>>> m('wolves') # invalid forms may be returned
{None: {'wolves'}, 'n': {'wolf', 'wolve'}, 'v': {'wolve', 'wolv'}}
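To make the candidate generation above more concrete, here is a minimal, self-contained sketch of suffix detachment. It does not use or reproduce Wn's code: the `RULES` table and the `candidates` helper are hypothetical, loosely modeled on the detachment rules in the Princeton documentation, with a "ves" rule assumed so that the `wolves` output shown above comes out the same.

```python
# Illustrative (suffix, replacement) rules per part of speech.
# Assumption: a simplified table for demonstration, not the one Wn ships.
RULES = {
    "n": [("s", ""), ("ses", "s"), ("ves", "f"), ("xes", "x"), ("zes", "z"),
          ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "v": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
          ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "a": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
}


def candidates(form, pos=None):
    """Return {pos: {candidate, ...}} for a wordform (hypothetical helper)."""
    # The original form is always kept, filed under the requested pos
    # (or under None when no part of speech was given).
    result = {pos: {form}}
    for p in ([pos] if pos else RULES):
        for suffix, replacement in RULES.get(p, []):
            if form.endswith(suffix) and len(form) > len(suffix):
                stem = form[: -len(suffix)]
                result.setdefault(p, set()).add(stem + replacement)
    return result


print(candidates("lemmas", pos="n"))
print(candidates("wolves"))
```

Nothing in this sketch checks whether a candidate is a real word, which is why forms such as `wolve` and `wolv` appear alongside `wolf`.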
This lemmatizer can also be used with a wn.Wordnet object to expand queries:
>>> import wn
>>> ewn = wn.Wordnet('ewn:2020')
>>> ewn.words('lemmas')
[]
>>> ewn = wn.Wordnet('ewn:2020', lemmatizer=morphy.Morphy())
>>> ewn.words('lemmas')
[Word('ewn-lemma-n')]
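The idea of query expansion can be sketched without a database. The toy below is an assumption-laden illustration, not how wn.Wordnet is implemented: `INDEX`, `toy_lemmatizer`, and `words` are hypothetical names, and the lemmatizer only strips a final "s" for nouns.

```python
# Toy index mapping (lemma, pos) to a word id; a stand-in for the database.
INDEX = {("lemma", "n"): "ewn-lemma-n"}


def toy_lemmatizer(form, pos=None):
    """Hypothetical lemmatizer: keeps the form and strips a final 's' for nouns."""
    result = {pos: {form}}
    if form.endswith("s") and pos in (None, "n"):
        result.setdefault("n", set()).add(form[:-1])
    return result


def words(form, pos=None, lemmatizer=None):
    """Return matching word ids, expanding the query if a lemmatizer is given."""
    # Without a lemmatizer, only the literal form is looked up.
    candidates = lemmatizer(form, pos) if lemmatizer else {pos: {form}}
    hits = set()
    for cand_pos, lemmas in candidates.items():
        for (lemma, entry_pos), word_id in INDEX.items():
            # A pos of None means "any part of speech".
            if lemma in lemmas and cand_pos in (None, entry_pos):
                hits.add(word_id)
    return sorted(hits)


print(words("lemmas"))
print(words("lemmas", lemmatizer=toy_lemmatizer))
```

Invalid candidates cost only a failed lookup here, which mirrors why unfiltered results from an uninitialized lemmatizer are harmless in practice.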
An initialized Morphy is created with a wn.Wordnet object as its argument. It then uses the wordnet to build lists of valid lemmas and exceptional forms (this takes a few seconds). Once this is done, it will only return lemmas it knows about:
>>> ewn = wn.Wordnet('ewn:2020')
>>> m = morphy.Morphy(ewn)
>>> m('lemmata', pos='n') # exceptional form
{'n': {'lemma'}}
>>> m('lemmas', pos='n') # regular morphology with part-of-speech
{'n': {'lemma'}}
>>> m('lemmas') # regular morphology for any part-of-speech
{'n': {'lemma'}}
>>> m('wolves') # invalid forms are pre-filtered
{'n': {'wolf'}}
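The pre-filtering step can be pictured as intersecting raw candidates with a set of known lemmas and consulting a table of exceptional forms. The sketch below is only an illustration under stated assumptions: `KNOWN`, `EXCEPTIONS`, and `prefilter` are hypothetical, and the data is hand-written rather than collected from a wordnet.

```python
# Hypothetical stand-ins for the lists an initialized lemmatizer would build.
KNOWN = {"n": {"lemma", "wolf"}, "v": {"howl"}}
EXCEPTIONS = {"n": {"lemmata": {"lemma"}}}


def prefilter(form, raw):
    """Keep only known lemmas from raw candidates and add exceptional forms."""
    result = {}
    for pos, lemmas in KNOWN.items():
        # Exceptional forms map directly to their lemmas.
        found = set(EXCEPTIONS.get(pos, {}).get(form, ()))
        # Candidates filed under this pos, or under None (any pos), are
        # kept only if they are known lemmas for this pos.
        for key in (pos, None):
            found |= raw.get(key, set()) & lemmas
        if found:
            result[pos] = found
    return result


raw = {None: {"wolves"}, "n": {"wolf", "wolve"}, "v": {"wolve", "wolv"}}
print(prefilter("wolves", raw))
print(prefilter("lemmata", {"n": {"lemmata"}}))
```

With this toy data the invalid candidates `wolve` and `wolv` are dropped, and `lemmata` resolves to `lemma` through the exception table.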
In order to use an initialized Morphy lemmatizer with a wn.Wordnet object, it must be assigned to the object after creation:
>>> ewn = wn.Wordnet('ewn:2020') # default: lemmatizer=None
>>> ewn.words('lemmas')
[]
>>> ewn.lemmatizer = morphy.Morphy(ewn)
>>> ewn.words('lemmas')
[Word('ewn-lemma-n')]
There is little to no difference in the results obtained from a wn.Wordnet object using an initialized or uninitialized Morphy object, but the two may have slightly different performance profiles for future queries.
Default Morphy Lemmatizer
As a convenience, an uninitialized Morphy lemmatizer is provided in this module via the morphy member.
- wn.morphy.morphy
A Morphy object created without a wn.Wordnet object.
The Morphy Class
- class wn.morphy.Morphy(wordnet=None)
The Morphy lemmatizer class.
Objects of this class are callables that take a wordform and an optional part of speech and return a dictionary mapping parts of speech to lemmas. If objects of this class are not created with a wn.Wordnet object, the returned lemmas may be invalid.
- Parameters
wordnet (Optional[wn.Wordnet]) – optional wn.Wordnet instance
Example
>>> import wn
>>> from wn.morphy import Morphy
>>> ewn = wn.Wordnet('ewn:2020')
>>> m = Morphy(ewn)
>>> m('axes', pos='n')
{'n': {'axe', 'ax', 'axis'}}
>>> m('geese', pos='n')
{'n': {'goose'}}
>>> m('gooses')
{'n': {'goose'}, 'v': {'goose'}}
>>> m('goosing')
{'v': {'goose'}}
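Because the return value is a plain dictionary from parts of speech to sets of lemmas, it is easy to post-process. As a small sketch, the hypothetical `flatten` helper below (not part of Wn) turns such a result into a sorted list of (lemma, pos) pairs; the input dictionaries are copied from the examples in this document rather than computed.

```python
def flatten(result):
    """Turn {pos: {lemma, ...}} into a sorted list of (lemma, pos) pairs."""
    pairs = [(lemma, pos) for pos, lemmas in result.items() for lemma in lemmas]
    # pos may be None (unknown part of speech), so sort with "" in its place.
    return sorted(pairs, key=lambda pair: (pair[0], pair[1] or ""))


print(flatten({"n": {"goose"}, "v": {"goose"}}))
print(flatten({None: {"lemmas"}, "n": {"lemma"}, "v": {"lemma"}}))
```

Sorting is used only to make the output deterministic, since sets have no fixed order.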