wn.taxonomy

Functions for working with hypernym/hyponym taxonomies.

Overview

Among the valid synset relations for wordnets (see wn.constants.SYNSET_RELATIONS), those that describe is-a taxonomies receive special treatment, and they are generally the most well-developed relations in any wordnet. Typically these are the hypernym and hyponym relations, which encode is-a-type-of relationships (e.g., a hermit crab is a type of decapod, which is a type of crustacean, etc.). They also include instance_hypernym and instance_hyponym, which encode is-an-instance-of relationships (e.g., Oregon is an instance of an American state).
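
To illustrate why the instance relations are grouped with the plain hypernym relations, the sketch below stores relations in a plain dictionary and treats both hypernym and instance_hypernym as upward, is-a edges. The dictionary layout and the helper names taxonomic_parents() and ancestors() are hypothetical stand-ins, not the wn data model or API; the labels come from the examples above.

```python
from typing import Dict, List

# Hypothetical toy data: synset label -> {relation name -> list of target labels}.
# This is NOT how wn stores relations; it only mimics the idea.
RELATIONS: Dict[str, Dict[str, List[str]]] = {
    "hermit crab": {"hypernym": ["decapod"]},
    "decapod": {"hypernym": ["crustacean"]},
    "crustacean": {},
    "Oregon": {"instance_hypernym": ["American state"]},
    "American state": {},
}

# Relations assumed (for this sketch) to point "up" the is-a taxonomy.
TAXONOMIC_UP = ("hypernym", "instance_hypernym")


def taxonomic_parents(label: str) -> List[str]:
    """Direct is-a parents of label, reached through either upward relation."""
    rels = RELATIONS.get(label, {})
    parents: List[str] = []
    for rel in TAXONOMIC_UP:
        parents.extend(rels.get(rel, []))
    return parents


def ancestors(label: str) -> List[str]:
    """Walk upward through taxonomic parents, listing each ancestor once."""
    seen: List[str] = []
    stack = taxonomic_parents(label)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(taxonomic_parents(node))
    return seen


print(ancestors("hermit crab"))  # ['decapod', 'crustacean']
print(ancestors("Oregon"))       # ['American state']
```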

The taxonomy forms a multiply-inheriting hierarchy with the synsets as nodes. In the English wordnets, such as the Princeton WordNet and its derivatives, nearly all nominal synsets form such a hierarchy with a single root node, while verbal synsets form many smaller hierarchies without a common root. Other wordnets may have different properties, but as many are based on the Princeton WordNet, they tend to follow this structure.

Functions to find paths within the taxonomies form the basis of all wordnet similarity measures. For instance, the Leacock-Chodorow Similarity measure uses both shortest_path() and (indirectly) taxonomy_depth().
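
For a concrete sense of how the two functions combine, the sketch below applies the commonly cited Leacock-Chodorow formula, -log((p + 1) / (2D)), where D is the taxonomy depth and p is the number of edges on the shortest path. Taking p to be the length of the list returned by shortest_path() (which omits the starting synset) is an assumption, as is whether wn's own similarity function uses exactly this convention. The function name lch_similarity() is hypothetical, and the inputs 5 and 19 simply mirror the dog/squirrel path and the noun taxonomy depth shown in the examples below.

```python
import math


def lch_similarity(path_length: int, taxonomy_depth: int) -> float:
    """Leacock-Chodorow similarity from a path length and a taxonomy depth.

    path_length: number of edges on the shortest path between two synsets
        (assumed here to equal len(shortest_path(a, b))).
    taxonomy_depth: maximum depth of the taxonomy for the part of speech.
    """
    if taxonomy_depth <= 0:
        raise ValueError("taxonomy_depth must be positive")
    # +1 turns the edge count into a node count before normalizing by 2 * depth.
    return -math.log((path_length + 1) / (2 * taxonomy_depth))


# Illustrative inputs: a 5-edge path in a taxonomy of depth 19.
print(round(lch_similarity(5, 19), 4))  # 1.8458
```

Shorter paths give larger scores, and identical synsets (a path of 0 edges) get the maximum value, log(2D).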

Wordnet-level Functions

Root and leaf synsets in the taxonomy are those with no hypernyms (hypernym, instance_hypernym, etc.) or no hyponyms (hyponym, instance_hyponym, etc.), respectively.
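
The definition above can be sketched on a toy taxonomy held in a plain dictionary that maps each synset to its hypernyms (instance hypernyms included). The labels and the dictionary are hypothetical stand-ins for a real wordnet, and these roots() and leaves() helpers are not the wn implementation: roots have an empty hypernym list, and leaves never appear as anyone's hypernym.

```python
from typing import Dict, List, Set

# Hypothetical toy taxonomy: child -> list of hypernyms. Labels are illustrative.
HYPERNYMS: Dict[str, List[str]] = {
    "entity": [],
    "organism": ["entity"],
    "animal": ["organism"],
    "domestic animal": ["animal"],
    "canine": ["animal"],
    "dog": ["canine", "domestic animal"],
    "cat": ["domestic animal"],
}


def roots(taxonomy: Dict[str, List[str]]) -> List[str]:
    """Synsets with no hypernyms."""
    return sorted(ss for ss, hypers in taxonomy.items() if not hypers)


def leaves(taxonomy: Dict[str, List[str]]) -> List[str]:
    """Synsets that are nobody's hypernym, i.e. that have no hyponyms."""
    has_hyponym: Set[str] = {h for hypers in taxonomy.values() for h in hypers}
    return sorted(ss for ss in taxonomy if ss not in has_hyponym)


print(roots(HYPERNYMS))   # ['entity']
print(leaves(HYPERNYMS))  # ['cat', 'dog']
```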

Finding root and leaf synsets

wn.taxonomy.roots(wordnet, pos=None)

Return the list of root synsets in wordnet.

Parameters
  • wordnet (Wordnet) – The wordnet from which root synsets are found.

  • pos (Optional[str]) – If given, only return synsets with the specified part of speech.

Return type

List[Synset]

Example

>>> import wn, wn.taxonomy
>>> ewn = wn.Wordnet('ewn:2020')
>>> len(wn.taxonomy.roots(ewn, pos='v'))
573

wn.taxonomy.leaves(wordnet, pos=None)

Return the list of leaf synsets in wordnet.

Parameters
  • wordnet (Wordnet) – The wordnet from which leaf synsets are found.

  • pos (Optional[str]) – If given, only return synsets with the specified part of speech.

Return type

List[Synset]

Example

>>> import wn, wn.taxonomy
>>> ewn = wn.Wordnet('ewn:2020')
>>> len(wn.taxonomy.leaves(ewn, pos='v'))
10525

Computing the taxonomy depth

The taxonomy depth is the greatest depth, measured from a root node down to a leaf node, reached by any synset of a particular part of speech.
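
Assuming depth counts edges from a root, the taxonomy depth is the length of the longest hypernym chain in the graph. The sketch below computes it on a toy, acyclic dictionary with hypothetical labels (not the wn implementation), memoizing each synset's longest upward path so that shared ancestors are explored only once.

```python
from functools import lru_cache
from typing import Dict, List

# Hypothetical toy taxonomy (child -> hypernyms), assumed to be acyclic.
HYPERNYMS: Dict[str, List[str]] = {
    "entity": [],
    "organism": ["entity"],
    "artifact": ["entity"],
    "animal": ["organism"],
    "canine": ["animal"],
    "domestic animal": ["animal"],
    "dog": ["canine", "domestic animal"],
    "tool": ["artifact"],
}


@lru_cache(maxsize=None)
def depth(ss: str) -> int:
    """Edges on the longest hypernym path from ss up to a root."""
    hypers = HYPERNYMS[ss]
    return 0 if not hypers else 1 + max(depth(h) for h in hypers)


def taxonomy_depth() -> int:
    """Greatest depth reached by any synset in the toy taxonomy."""
    return max((depth(ss) for ss in HYPERNYMS), default=0)


print(taxonomy_depth())  # 4, e.g. dog -> canine -> animal -> organism -> entity
```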

wn.taxonomy.taxonomy_depth(wordnet, pos)

Return the taxonomy depth of wordnet for the part of speech pos.

Parameters
  • wordnet (Wordnet) – The wordnet for which the taxonomy depth will be calculated.

  • pos (str) – The part of speech for which the taxonomy depth will be calculated.

Return type

int

Example

>>> import wn, wn.taxonomy
>>> ewn = wn.Wordnet('ewn:2020')
>>> wn.taxonomy.taxonomy_depth(ewn, 'n')
19

Synset-level Functions

wn.taxonomy.hypernym_paths(synset, simulate_root=False)

Return the list of hypernym paths to a root synset.

Parameters
  • synset (Synset) – The starting synset for paths to a root.

  • simulate_root (bool) – If True, find the path to a simulated root node.

Return type

List[List[Synset]]

Example

>>> import wn, wn.taxonomy
>>> dog = wn.synsets('dog', pos='n')[0]
>>> for path in wn.taxonomy.hypernym_paths(dog):
...     for i, ss in enumerate(path):
...         print(' ' * i, ss, ss.lemmas()[0])
...
 Synset('pwn-02083346-n') canine
  Synset('pwn-02075296-n') carnivore
   Synset('pwn-01886756-n') eutherian mammal
    Synset('pwn-01861778-n') mammalian
     Synset('pwn-01471682-n') craniate
      Synset('pwn-01466257-n') chordate
       Synset('pwn-00015388-n') animal
        Synset('pwn-00004475-n') organism
         Synset('pwn-00004258-n') animate thing
          Synset('pwn-00003553-n') unit
           Synset('pwn-00002684-n') object
            Synset('pwn-00001930-n') physical entity
             Synset('pwn-00001740-n') entity
 Synset('pwn-01317541-n') domesticated animal
  Synset('pwn-00015388-n') animal
   Synset('pwn-00004475-n') organism
    Synset('pwn-00004258-n') animate thing
     Synset('pwn-00003553-n') unit
      Synset('pwn-00002684-n') object
       Synset('pwn-00001930-n') physical entity
        Synset('pwn-00001740-n') entity
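
The two paths above reflect multiple inheritance: the starting synset has two hypernyms, and each one reaches the root by its own route. A minimal recursive sketch of that enumeration follows, over a toy dictionary with hypothetical labels (not the wn implementation). As in the example, the starting synset is left out of each path; the simulated root is modeled, as an assumption, by appending a placeholder label above every real root.

```python
from typing import Dict, List

# Hypothetical toy taxonomy (child -> hypernyms); "dog" has two hypernyms.
HYPERNYMS: Dict[str, List[str]] = {
    "entity": [],
    "organism": ["entity"],
    "animal": ["organism"],
    "canine": ["animal"],
    "domestic animal": ["animal"],
    "dog": ["canine", "domestic animal"],
}

FAKE_ROOT = "*ROOT*"  # hypothetical label for a simulated root node


def _paths_from(ss: str, simulate_root: bool) -> List[List[str]]:
    """All upward paths starting at ss itself and ending at a (possibly fake) root."""
    hypers = HYPERNYMS[ss]
    if not hypers:
        return [[ss, FAKE_ROOT]] if simulate_root else [[ss]]
    # Branch once per hypernym, so multiple inheritance yields multiple paths.
    return [[ss] + path for h in hypers for path in _paths_from(h, simulate_root)]


def hypernym_paths(ss: str, simulate_root: bool = False) -> List[List[str]]:
    """Paths from ss's hypernyms up to a root, without ss itself."""
    paths = [path[1:] for path in _paths_from(ss, simulate_root)]
    return [path for path in paths if path]  # a real root has no hypernym paths


for path in hypernym_paths("dog"):
    print(" -> ".join(path))
# canine -> animal -> organism -> entity
# domestic animal -> animal -> organism -> entity
```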

wn.taxonomy.min_depth(synset, simulate_root=False)

Return the minimum taxonomy depth of the synset.

Parameters
  • synset (Synset) – The starting synset for paths to a root.

  • simulate_root (bool) – If True, find the depth to a simulated root node.

Return type

int

Example

>>> import wn, wn.taxonomy
>>> dog = wn.synsets('dog', pos='n')[0]
>>> wn.taxonomy.min_depth(dog)
8
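
min_depth() can be read as the length of the shortest hypernym path; in the dog example, 8 matches the length of the shorter path printed for hypernym_paths(). A breadth-first search upward finds that length without enumerating every path, because the first root dequeued is the nearest one. The sketch below assumes a toy acyclic dictionary with hypothetical labels and counts edges; it is not wn's implementation.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy taxonomy (child -> hypernyms); "dog" has two parents,
# so its paths to the root differ in length.
HYPERNYMS: Dict[str, List[str]] = {
    "entity": [],
    "organism": ["entity"],
    "animal": ["organism"],
    "mammal": ["animal"],
    "carnivore": ["mammal"],
    "canine": ["carnivore"],
    "domestic animal": ["animal"],
    "dog": ["canine", "domestic animal"],
}


def min_depth(ss: str) -> int:
    """Fewest hypernym edges from ss up to any root (0 if ss is a root)."""
    queue = deque([(ss, 0)])
    seen = {ss}
    while queue:
        node, dist = queue.popleft()
        if not HYPERNYMS[node]:
            # BFS visits nodes in order of distance, so the first root is the nearest.
            return dist
        for hyper in HYPERNYMS[node]:
            if hyper not in seen:
                seen.add(hyper)
                queue.append((hyper, dist + 1))
    return 0  # unreachable for a finite, acyclic taxonomy


print(min_depth("dog"))  # 4: dog -> domestic animal -> animal -> organism -> entity
```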

wn.taxonomy.max_depth(synset, simulate_root=False)

Return the maximum taxonomy depth of the synset.

Parameters
  • synset (Synset) – The starting synset for paths to a root.

  • simulate_root (bool) – If True, find the depth to a simulated root node.

Return type

int

Example

>>> import wn, wn.taxonomy
>>> dog = wn.synsets('dog', pos='n')[0]
>>> wn.taxonomy.max_depth(dog)
13
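
Conversely, max_depth() corresponds to the longest hypernym path; in the dog example, 13 is the length of the longer path printed for hypernym_paths(). Breadth-first search finds shortest rather than longest paths, so the sketch below uses memoized recursion over a toy acyclic dictionary with hypothetical labels, counting edges. It is an illustration, not wn's implementation.

```python
from functools import lru_cache
from typing import Dict, List

# Hypothetical toy taxonomy (child -> hypernyms), assumed to be acyclic.
HYPERNYMS: Dict[str, List[str]] = {
    "entity": [],
    "organism": ["entity"],
    "animal": ["organism"],
    "mammal": ["animal"],
    "carnivore": ["mammal"],
    "canine": ["carnivore"],
    "domestic animal": ["animal"],
    "dog": ["canine", "domestic animal"],
}


@lru_cache(maxsize=None)
def max_depth(ss: str) -> int:
    """Most hypernym edges on any path from ss up to a root (0 if ss is a root)."""
    hypers = HYPERNYMS[ss]
    if not hypers:
        return 0
    # Take the deepest route through any hypernym; memoization keeps shared
    # ancestors from being recomputed.
    return 1 + max(max_depth(h) for h in hypers)


print(max_depth("dog"))  # 6: dog -> canine -> carnivore -> mammal -> animal -> organism -> entity
```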

wn.taxonomy.shortest_path(synset, other, simulate_root=False)

Return the shortest path from synset to the other synset.

Parameters
  • synset (Synset) – The starting synset of the path.

  • other (Synset) – The endpoint synset of the path.

  • simulate_root (bool) – If True, ensure any two synsets are always connected by positing a fake root node.

Return type

List[Synset]

Example

>>> import wn, wn.taxonomy
>>> ewn = wn.Wordnet('ewn:2020')
>>> dog = ewn.synsets('dog', pos='n')[0]
>>> squirrel = ewn.synsets('squirrel', pos='n')[0]
>>> for ss in wn.taxonomy.shortest_path(dog, squirrel):
...     print(ss.lemmas())
...
['canine', 'canid']
['carnivore']
['eutherian mammal', 'placental', 'placental mammal', 'eutherian']
['rodent', 'gnawer']
['squirrel']
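
The example above suggests that the path climbs from the starting synset to a shared hypernym (eutherian mammal) and then descends to the other synset. The sketch below implements that reading on a toy graph with hypothetical labels: it runs a breadth-first search upward from each synset, picks the common ancestor with the smallest combined distance, and stitches the two halves together. This is an assumption about the approach, not wn's code, and simulate_root is not modeled.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy (child -> hypernyms).
HYPERNYMS: Dict[str, List[str]] = {
    "entity": [],
    "animal": ["entity"],
    "mammal": ["animal"],
    "placental": ["mammal"],
    "carnivore": ["placental"],
    "canine": ["carnivore"],
    "dog": ["canine"],
    "rodent": ["placental"],
    "squirrel": ["rodent"],
}


def _climb(ss: str) -> Tuple[Dict[str, int], Dict[str, Optional[str]]]:
    """BFS upward from ss: distance to each ancestor and a back-pointer toward ss."""
    dist: Dict[str, int] = {ss: 0}
    back: Dict[str, Optional[str]] = {ss: None}
    queue = deque([ss])
    while queue:
        node = queue.popleft()
        for hyper in HYPERNYMS[node]:
            if hyper not in dist:
                dist[hyper] = dist[node] + 1
                back[hyper] = node
                queue.append(hyper)
    return dist, back


def _chain(back: Dict[str, Optional[str]], top: str) -> List[str]:
    """Follow back-pointers from top down to the BFS start; returns start ... top."""
    chain = [top]
    node = back[top]
    while node is not None:
        chain.append(node)
        node = back[node]
    return chain[::-1]


def shortest_path(a: str, b: str) -> List[str]:
    """Path from a up to a shared hypernym and down to b, excluding a itself."""
    dist_a, back_a = _climb(a)
    dist_b, back_b = _climb(b)
    common = set(dist_a) & set(dist_b)
    if not common:
        raise ValueError(f"no path between {a!r} and {b!r}")
    # Pick the shared ancestor with the fewest total edges (ties broken by label).
    top = min(common, key=lambda c: (dist_a[c] + dist_b[c], c))
    up = _chain(back_a, top)          # a ... top
    down = _chain(back_b, top)[::-1]  # top ... b
    # Join the halves without repeating top, then drop a, as in the example above.
    return (up + down[1:])[1:]


print(shortest_path("dog", "squirrel"))
# ['canine', 'carnivore', 'placental', 'rodent', 'squirrel']
```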

wn.taxonomy.common_hypernyms(synset, other, simulate_root=False)

Return the hypernyms shared by synset and other.

Parameters
  • synset (Synset) – The first synset whose hypernyms are compared.

  • other (Synset) – The other synset, which is a hyponym of any shared hypernyms.

  • simulate_root (bool) – If True, ensure any two synsets always share a hypernym by positing a fake root node.

Return type

List[Synset]

Example

>>> import wn, wn.taxonomy
>>> ewn = wn.Wordnet('ewn:2020')
>>> dog = ewn.synsets('dog', pos='n')[0]
>>> squirrel = ewn.synsets('squirrel', pos='n')[0]
>>> for ss in wn.taxonomy.common_hypernyms(dog, squirrel):
...     print(ss.lemmas())
...
['entity']
['physical entity']
['object', 'physical object']
['unit', 'whole']
['animate thing', 'living thing']
['organism', 'being']
['fauna', 'beast', 'animate being', 'brute', 'creature', 'animal']
['chordate']
['craniate', 'vertebrate']
['mammalian', 'mammal']
['eutherian mammal', 'placental', 'placental mammal', 'eutherian']
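
Conceptually, the common hypernyms are the intersection of the two synsets' ancestor sets. The sketch below computes that intersection on a toy dictionary with hypothetical labels; it is not wn's implementation. Counting each synset as its own ancestor is a design choice made here (it lets a synset that is itself a hypernym of the other appear in the result), so treat it as an assumption.

```python
from typing import Dict, List, Set

# Hypothetical toy taxonomy (child -> hypernyms).
HYPERNYMS: Dict[str, List[str]] = {
    "entity": [],
    "animal": ["entity"],
    "mammal": ["animal"],
    "placental": ["mammal"],
    "carnivore": ["placental"],
    "canine": ["carnivore"],
    "dog": ["canine"],
    "rodent": ["placental"],
    "squirrel": ["rodent"],
}


def ancestors(ss: str) -> Set[str]:
    """ss plus every synset reachable from it by following hypernym edges."""
    found: Set[str] = {ss}  # counting ss as its own ancestor is a choice made here
    stack = list(HYPERNYMS[ss])
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(HYPERNYMS[node])
    return found


def common_hypernyms(a: str, b: str) -> List[str]:
    """Synsets that are ancestors of both a and b, sorted by label."""
    return sorted(ancestors(a) & ancestors(b))


print(common_hypernyms("dog", "squirrel"))  # ['animal', 'entity', 'mammal', 'placental']
```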

wn.taxonomy.lowest_common_hypernyms(synset, other, simulate_root=False)

Return the common hypernyms furthest from the root.

Parameters
  • synset (Synset) – The first synset whose hypernyms are compared.

  • other (Synset) – The other synset, which is a hyponym of any shared hypernyms.

  • simulate_root (bool) – If True, ensure any two synsets always share a hypernym by positing a fake root node.

Return type

List[Synset]

Example

>>> import wn, wn.taxonomy
>>> ewn = wn.Wordnet('ewn:2020')
>>> dog = ewn.synsets('dog', pos='n')[0]
>>> squirrel = ewn.synsets('squirrel', pos='n')[0]
>>> len(wn.taxonomy.lowest_common_hypernyms(dog, squirrel))
1
>>> wn.taxonomy.lowest_common_hypernyms(dog, squirrel)[0].lemmas()
['eutherian mammal', 'placental', 'placental mammal', 'eutherian']
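
The lowest common hypernyms can be sketched as the shared ancestors that lie deepest in the taxonomy. Here "furthest from the root" is assumed to mean the longest hypernym path to a root (the quantity max_depth() reports); the description above does not say which depth measure wn uses, so treat that choice as an assumption. The toy graph, labels, and helpers are hypothetical.

```python
from functools import lru_cache
from typing import Dict, List, Set

# Hypothetical toy taxonomy (child -> hypernyms), assumed to be acyclic.
HYPERNYMS: Dict[str, List[str]] = {
    "entity": [],
    "animal": ["entity"],
    "mammal": ["animal"],
    "placental": ["mammal"],
    "carnivore": ["placental"],
    "canine": ["carnivore"],
    "dog": ["canine"],
    "rodent": ["placental"],
    "squirrel": ["rodent"],
}


@lru_cache(maxsize=None)
def max_depth(ss: str) -> int:
    """Edges on the longest hypernym path from ss up to a root."""
    hypers = HYPERNYMS[ss]
    return 0 if not hypers else 1 + max(max_depth(h) for h in hypers)


def ancestors(ss: str) -> Set[str]:
    """ss plus every synset reachable from it by following hypernym edges."""
    found: Set[str] = {ss}
    stack = list(HYPERNYMS[ss])
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(HYPERNYMS[node])
    return found


def lowest_common_hypernyms(a: str, b: str) -> List[str]:
    """Shared ancestors whose depth is greatest (several may tie)."""
    common = ancestors(a) & ancestors(b)
    if not common:
        return []
    deepest = max(max_depth(c) for c in common)
    return sorted(c for c in common if max_depth(c) == deepest)


print(lowest_common_hypernyms("dog", "squirrel"))  # ['placental']
```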