wn.compat.sensekey

Functions Related to Sense Keys

Sense keys are identifiers of senses that (mostly) persist across wordnet versions. They are only used by the English wordnets. For the OMW lexicons derived from the Princeton WordNet and the EWN 2019/2020 lexicons, the sense key is encoded in the identifier metadata of a Sense:

>>> import wn
>>> en = wn.Wordnet("omw-en:1.4")
>>> sense = en.sense("omw-en-carrousel-02966372-n")
>>> sense.metadata()
{'identifier': 'carrousel%1:06:01::'}

For OEWN 2021+ lexicons, the sense key is encoded in the sense ID, but some characters are escaped or replaced to ensure it is a valid XML ID.

>>> oewn = wn.Wordnet("oewn:2024")
>>> sense = oewn.sense("oewn-carousel__1.06.01..")
>>> sense.id
'oewn-carousel__1.06.01..'

This module has four functions:

  1. escape() transforms a sense key into a form that is valid for XML IDs. The flavor keyword argument specifies the escaping mechanism and it defaults to "oewn", which is currently the only available flavor.

  2. unescape() transforms an escaped sense key back into the original form. The flavor keyword is the same as with escape().

  3. sense_key_getter() creates a function for retrieving the sense key for a given wn.Sense object. Depending on the lexicon, it will retrieve the sense key from metadata or it will unescape the sense ID.

  4. sense_getter() creates a function for retrieving a wn.Sense object given a sense key. Depending on the lexicon, it will build and use a mapping of sense key metadata to wn.Sense objects, or it will escape the sense key and use the escaped form as the id argument for wn.Wordnet.sense().

See also

The documentation from the Princeton WordNet: https://wordnet.princeton.edu/documentation/senseidx5wn

wn.compat.sensekey.escape(sense_key: str, /, flavor='oewn') -> str

Return an escaped sense key that is valid for XML IDs.

The flavor argument specifies how the escaping will be done. Its default (and currently only) value is "oewn", which escapes like the Open English Wordnet.

>>> from wn.compat import sensekey
>>> sensekey.escape("ceramic%3:01:00::")
'ceramic__3.01.00..'
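
For the key in the example above, the "oewn" escaping amounts to two substitutions: the "%" that separates the lemma from the rest becomes "__", and each ":" after it becomes ".". The sketch below reproduces only that much. The name escape_simple is hypothetical and not part of wn, and the real escape() may also treat other characters (for instance, punctuation inside the lemma) that this sketch passes through unchanged.

```python
def escape_simple(sense_key: str) -> str:
    """Illustrative only: covers the '%' and ':' substitutions seen above."""
    lemma, sep, rest = sense_key.partition("%")
    if not sep:
        # No '%' separator, so there is nothing this sketch knows how to escape.
        return sense_key
    # '%' becomes '__'; every ':' in the remainder becomes '.'.
    return f"{lemma}__{rest.replace(':', '.')}"


print(escape_simple("ceramic%3:01:00::"))  # ceramic__3.01.00..
```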

wn.compat.sensekey.unescape(s: str, /, flavor='oewn') -> str

Return the original form of an escaped sense key.

The flavor argument specifies how the unescaping will be done. Its default (and currently only) value is "oewn", which reverses the escaping used by the Open English Wordnet.

>>> from wn.compat import sensekey
>>> sensekey.unescape("ceramic__3.01.00..")
'ceramic%3:01:00::'

Note that this function does not remove any lexicon ID prefixes on sense IDs, so that may need to be done manually:

>>> sensekey.unescape("oewn-ceramic__3.01.00..")
'oewn-ceramic%3:01:00::'
>>> sensekey.unescape("oewn-ceramic__3.01.00..".removeprefix("oewn-"))
'ceramic%3:01:00::'

wn.compat.sensekey.sense_key_getter(lexicon: str) -> Callable[[Sense], str | None]

Return a function that gets sense keys from senses.

The lexicon argument determines how the function will retrieve the sense key; i.e., whether it is read from the identifier metadata or obtained by unescaping the sense ID. For any unsupported lexicon, an error is raised.

The function that is returned accepts one argument, a wn.Sense (ideally from the same lexicon specified in the lexicon argument), and returns a str if the sense key exists in the lexicon or None otherwise.

>>> import wn
>>> from wn.compat import sensekey
>>> oewn = wn.Wordnet("oewn:2024")
>>> get_sense_key = sensekey.sense_key_getter("oewn:2024")
>>> get_sense_key(oewn.senses("alabaster")[0])
'alabaster%3:01:00::'

When unescaping a sense ID, if the ID starts with its lexicon's ID and a hyphen (e.g., "oewn-"), it is assumed to be a conventional ID prefix and is removed prior to unescaping.
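
The two retrieval strategies described above can be sketched without the library. Everything in this sketch is a hypothetical stand-in rather than the module's implementation: FakeSense mimics only the id attribute and metadata() method shown earlier, lexicon_id is an invented attribute, make_key_getter takes a boolean where the real function takes a lexicon specifier, and unescape_simple handles only the "__" and "." substitutions seen in the examples.

```python
from typing import Callable, Optional


class FakeSense:
    """Hypothetical stand-in for wn.Sense (not the real class)."""

    def __init__(self, id: str, lexicon_id: str, metadata: Optional[dict] = None):
        self.id = id
        self.lexicon_id = lexicon_id  # invented attribute for this sketch
        self._metadata = metadata or {}

    def metadata(self) -> dict:
        return self._metadata


def unescape_simple(escaped: str) -> str:
    """Simplified unescaping: '__' -> '%', then '.' -> ':' in the remainder."""
    lemma, sep, rest = escaped.partition("__")
    return f"{lemma}%{rest.replace('.', ':')}" if sep else escaped


def make_key_getter(uses_metadata: bool) -> Callable[[FakeSense], Optional[str]]:
    """Pick a strategy once, then return a function that applies it."""
    if uses_metadata:
        def from_metadata(sense: FakeSense) -> Optional[str]:
            # Strategy 1: the key is stored under the 'identifier' metadata.
            return sense.metadata().get("identifier")
        return from_metadata

    def from_id(sense: FakeSense) -> Optional[str]:
        # Strategy 2: drop the conventional "<lexicon id>-" prefix, then unescape.
        return unescape_simple(sense.id.removeprefix(f"{sense.lexicon_id}-"))
    return from_id


omw_sense = FakeSense(
    "omw-en-carrousel-02966372-n", "omw-en", {"identifier": "carrousel%1:06:01::"}
)
oewn_sense = FakeSense("oewn-carousel__1.06.01..", "oewn")

print(make_key_getter(True)(omw_sense))    # carrousel%1:06:01::
print(make_key_getter(False)(oewn_sense))  # carousel%1:06:01::
```

Choosing the strategy when the getter is created, rather than on every call, keeps the returned function cheap to apply across many senses.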

wn.compat.sensekey.sense_getter(lexicon: str, wordnet: Wordnet | None = None) -> Callable[[str], Sense | None]

Return a function that gets the sense for a sense key.

The lexicon argument determines how the function will retrieve the sense; i.e., whether it builds and uses a mapping from each sense's identifier metadata to the sense, or escapes the sense key and uses the result as the sense ID. For any unsupported lexicon, an error is raised.

The optional wordnet object is used as the source of the returned wn.Sense objects. If none is provided, a new wn.Wordnet object is created using the lexicon argument.

The function that is returned accepts one argument, a str of the sense key, and returns a wn.Sense if the sense key exists in the lexicon or None otherwise.

>>> import wn
>>> from wn.compat import sensekey
>>> get_sense = sensekey.sense_getter("oewn:2024")
>>> get_sense("alabaster%3:01:00::")
Sense('oewn-alabaster__3.01.00..')

Warning

The mapping built for the omw-en* or ewn lexicons requires significant memory, around 100 MiB. The oewn lexicons do not require such a mapping, so their memory usage is negligible.
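
The memory difference follows from the two lookup strategies. A rough sketch of the mapping-based one is a dictionary with one entry per sense that carries identifier metadata. The names FakeSense and build_index are hypothetical stand-ins and are not part of wn; the real function's internals may differ.

```python
from typing import Dict, Iterable, Optional


class FakeSense:
    """Hypothetical stand-in for wn.Sense (not the real class)."""

    def __init__(self, id: str, metadata: Optional[dict] = None):
        self.id = id
        self._metadata = metadata or {}

    def metadata(self) -> dict:
        return self._metadata


def build_index(senses: Iterable[FakeSense]) -> Dict[str, FakeSense]:
    """Map each sense key found in 'identifier' metadata to its sense."""
    index: Dict[str, FakeSense] = {}
    for sense in senses:
        key = sense.metadata().get("identifier")
        if key is not None:
            index[key] = sense
    return index


senses = [
    FakeSense("omw-en-carrousel-02966372-n", {"identifier": "carrousel%1:06:01::"}),
    FakeSense("omw-en-no-key-00000000-n"),  # made-up ID with no sense key
]
index = build_index(senses)
get_sense = index.get  # returns None for unknown sense keys

print(get_sense("carrousel%1:06:01::").id)  # omw-en-carrousel-02966372-n
print(get_sense("missing%1:00:00::"))       # None
```

In this sketch the dictionary keeps a reference to every indexed sense for as long as the getter is alive, which is where the memory goes. The ID-based strategy only computes a string and performs a single lookup.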