wn¶
Wordnet Interface.
Project Management Functions¶
- wn.download(project_or_url, add=True, progress_handler=<class 'wn.util.ProgressBar'>)¶
Download the resource specified by project_or_url.
First the URL of the resource is determined and then, depending on the parameters, the resource is downloaded and added to the database. The function then returns the path of the cached file.
If project_or_url starts with 'http://' or 'https://', then it is taken to be the URL for the resource. Otherwise, project_or_url is taken as a project specifier and the URL is taken from a matching entry in Wn's project index. If no project matches the specifier, wn.Error is raised.
If the URL has been downloaded and cached before, the cached file is used. Otherwise the URL is retrieved and stored in the cache.
If the add parameter is True (default), the downloaded resource is added to the database.
>>> wn.download('ewn:2020')
Added ewn:2020 (English WordNet)
The progress_handler parameter takes a subclass of wn.util.ProgressHandler. An instance of the class will be created, used, and closed by this function.
- Parameters
project_or_url (str) –
add (bool) –
progress_handler (Optional[Type[wn.util.ProgressHandler]]) –
- Return type
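The URL-resolution step described for wn.download() can be sketched in a few lines. This is an illustration only, not Wn's implementation: the INDEX dictionary, the ProjectLookupError class, and the resolve_url() function are hypothetical names, and the sketch assumes a specifier of the form 'id:version'. The URLs are the ones shown in the load_index() example on this page.

```python
class ProjectLookupError(Exception):
    """Hypothetical stand-in for wn.Error in this sketch."""


# Hypothetical miniature project index: project id -> version -> URL.
INDEX = {
    "ewn": {
        "2019": "https://en-word.net/static/english-wordnet-2019.xml.gz",
        "2020": "https://en-word.net/static/english-wordnet-2020.xml.gz",
    },
}


def resolve_url(project_or_url: str) -> str:
    """Return the URL for a URL or for an 'id:version' specifier."""
    # Anything that looks like a web address is used as-is.
    if project_or_url.startswith(("http://", "https://")):
        return project_or_url
    # Otherwise treat the argument as a specifier and consult the index.
    project_id, _, version = project_or_url.partition(":")
    try:
        return INDEX[project_id][version]
    except KeyError:
        raise ProjectLookupError(f"no such project: {project_or_url}") from None
```

Only the dispatch between URL and specifier is shown here; caching and adding to the database are left out.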
- wn.add(source, progress_handler=<class 'wn.util.ProgressBar'>)¶
Add the LMF file at source to the database.
The file at source may be gzip-compressed or plain text XML.
>>> wn.add('english-wordnet-2020.xml')
Added ewn:2020 (English WordNet)
The progress_handler parameter takes a subclass of wn.util.ProgressHandler. An instance of the class will be created, used, and closed by this function.
- Parameters
source (Union[str, pathlib.Path]) –
progress_handler (Optional[Type[wn.util.ProgressHandler]]) –
- Return type
None
- wn.remove(lexicon, progress_handler=<class 'wn.util.ProgressBar'>)¶
Remove lexicon(s) from the database.
The lexicon argument is a lexicon specifier. Note that this removes a lexicon and not a project, so the lexicons of projects containing multiple lexicons will need to be removed individually or, if applicable, together with a star specifier.
The progress_handler parameter takes a subclass of wn.util.ProgressHandler. An instance of the class will be created, used, and closed by this function.
>>> wn.remove('ewn:2019')  # removes a single lexicon
>>> wn.remove('*:1.3+omw')  # removes all lexicons with version 1.3+omw
- Parameters
lexicon (str) –
progress_handler (Optional[Type[wn.util.ProgressHandler]]) –
- Return type
None
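How a star specifier could select several lexicons at once is sketched below. This is an assumption-laden illustration, not Wn's matching code: select_lexicons() is a hypothetical helper, the specifier is assumed to always have the form 'id:version', and the installed lexicon IDs are invented for the example.

```python
from typing import List, Tuple


def select_lexicons(
    specifier: str, installed: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return the (id, version) pairs matched by an 'id:version' specifier.

    A '*' in either position matches any value (assumption of this sketch).
    """
    spec_id, _, spec_version = specifier.partition(":")
    return [
        (lex_id, version)
        for lex_id, version in installed
        if spec_id in ("*", lex_id) and spec_version in ("*", version)
    ]


# Hypothetical installed lexicons as (id, version) pairs.
installed = [
    ("ewn", "2019"),
    ("ewn", "2020"),
    ("aaa", "1.3+omw"),
    ("bbb", "1.3+omw"),
]
```

With this data, 'ewn:2019' selects a single lexicon while '*:1.3+omw' selects both lexicons that share that version.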
- wn.export(lexicons, destination, version='1.0')¶
Export lexicons from the database to a WN-LMF file.
More than one lexicon may be exported in the same file, subject to these conditions:
identifiers on wordnet entities must be unique in all lexicons
lexicon extensions may not be exported with their dependents
>>> w = wn.Wordnet(lexicon='cmnwn zsmwn')
>>> wn.export(w.lexicons(), 'cmn-zsm.xml')
- Parameters
lexicons (Sequence[wn.Lexicon]) – sequence of wn.Lexicon objects
destination (Union[str, pathlib.Path]) – path to the destination file
version (str) – LMF version string
- Return type
None
- wn.projects()¶
Return the list of indexed projects.
This returns the same dictionaries of information as wn.config.get_project_info, but for all indexed projects.
Example
>>> infos = wn.projects()
>>> len(infos)
36
>>> infos[0]['label']
'Open English WordNet'
Wordnet Query Functions¶
- wn.word(id, *, lexicon=None, lang=None)¶
Return the word with id in lexicon.
This will create a Wordnet object using the lang and lexicon arguments. The id argument is then passed to the Wordnet.word() method.
>>> wn.word('ewn-cell-n')
Word('ewn-cell-n')
- wn.words(form=None, pos=None, *, lexicon=None, lang=None)¶
Return the list of matching words.
This will create a Wordnet object using the lang and lexicon arguments. The remaining arguments are passed to the Wordnet.words() method.
>>> len(wn.words())
282902
>>> len(wn.words(pos='v'))
34592
>>> wn.words(form="scurry")
[Word('ewn-scurry-n'), Word('ewn-scurry-v')]
- wn.sense(id, *, lexicon=None, lang=None)¶
Return the sense with id in lexicon.
This will create a Wordnet object using the lang and lexicon arguments. The id argument is then passed to the Wordnet.sense() method.
>>> wn.sense('ewn-flutter-v-01903884-02')
Sense('ewn-flutter-v-01903884-02')
- wn.senses(form=None, pos=None, *, lexicon=None, lang=None)¶
Return the list of matching senses.
This will create a Wordnet object using the lang and lexicon arguments. The remaining arguments are passed to the Wordnet.senses() method.
>>> len(wn.senses('twig'))
3
>>> wn.senses('twig', pos='n')
[Sense('ewn-twig-n-13184889-02')]
- wn.synset(id, *, lexicon=None, lang=None)¶
Return the synset with id in lexicon.
This will create a Wordnet object using the lang and lexicon arguments. The id argument is then passed to the Wordnet.synset() method.
>>> wn.synset('ewn-03311152-n')
Synset('ewn-03311152-n')
- wn.synsets(form=None, pos=None, ili=None, *, lexicon=None, lang=None)¶
Return the list of matching synsets.
This will create a Wordnet object using the lang and lexicon arguments. The remaining arguments are passed to the Wordnet.synsets() method.
>>> len(wn.synsets('couch'))
4
>>> wn.synsets('couch', pos='v')
[Synset('ewn-00983308-v')]
- wn.ili(id, *, lexicon=None, lang=None)¶
Return the interlingual index with id.
This will create a Wordnet object using the lang and lexicon arguments. The id argument is then passed to the Wordnet.ili() method.
>>> wn.ili(id='i1234')
ILI('i1234')
>>> wn.ili(id='i1234').status
'presupposed'
- wn.ilis(status=None, *, lexicon=None, lang=None)¶
Return the list of matching interlingual indices.
This will create a Wordnet object using the lang and lexicon arguments. The remaining arguments are passed to the Wordnet.ilis() method.
>>> len(wn.ilis())
120071
>>> len(wn.ilis(status='proposed'))
2573
>>> wn.ilis(status='proposed')[-1].definition()
'the neutrino associated with the tau lepton.'
>>> len(wn.ilis(lang='de'))
13818
The Wordnet Class¶
- class wn.Wordnet(lexicon=None, *, lang=None, expand=None, normalizer=<function normalize_form>, lemmatizer=None, search_all_forms=True)¶
Class for interacting with wordnet data.
A wordnet object acts essentially as a filter by first selecting matching lexicons and then searching only within those lexicons for later queries. On instantiation, a lang argument is a BCP 47 language code that restricts the selected lexicons to those whose language matches the given code. A lexicon argument is a space-separated list of lexicon specifiers that more directly selects lexicons by their ID and version; this is preferable when there are multiple lexicons in the same language or multiple versions with the same ID.
Some wordnets were created by translating the words from a larger wordnet, namely the Princeton WordNet, and then relying on the larger wordnet for structural relations. An expand argument is a second space-separated list of lexicon specifiers which are used for traversing relations, but not as the results of queries. Setting expand to an empty string (expand='') disables expand lexicons.
The normalizer argument takes a callable that normalizes word forms in order to expand the search. The default function downcases the word and removes diacritics via NFKD normalization so that, for example, searching for san josé in the English WordNet will find the entry for San Jose. Setting normalizer to None disables normalization and forces exact-match searching.
The lemmatizer argument may be None, which is the default and disables lemmatizer-based query expansion, or a callable that takes a word form and optional part of speech and returns base forms of the original word. To support lemmatizers that use the wordnet for instantiation, such as wn.morphy, the lemmatizer may be assigned to the lemmatizer attribute after creation.
If the search_all_forms argument is True (the default), searches of word forms consider all forms in the lexicon; if False, only lemmas are searched. Non-lemma forms may include, depending on the lexicon, morphological exceptions, alternate scripts or spellings, etc.
- Parameters
- lemmatizer¶
A lemmatization function or None.
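The default normalization described for the normalizer argument (downcasing plus diacritic removal via NFKD) can be approximated with the standard library. The function name normalize_form_sketch is hypothetical and this is a sketch of the described behavior, not necessarily the exact code Wn ships.

```python
import unicodedata


def normalize_form_sketch(form: str) -> str:
    """Downcase a form and drop combining marks after NFKD decomposition."""
    # NFKD splits characters such as 'é' into a base letter plus a
    # combining accent, which can then be filtered out.
    decomposed = unicodedata.normalize("NFKD", form.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Both spellings normalize to the same search key.
query_key = normalize_form_sketch("san josé")
entry_key = normalize_form_sketch("San Jose")
```

Because both keys are equal, a normalized search for san josé can match an entry stored as San Jose.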
- word(id)¶
Return the first word in this wordnet with identifier id.
- words(form=None, pos=None)¶
Return the list of matching words in this wordnet.
Without any arguments, this function returns all words in the wordnet's selected lexicons. A form argument restricts the words to those matching the given word form, and pos restricts words by their part of speech.
- sense(id)¶
Return the first sense in this wordnet with identifier id.
- senses(form=None, pos=None)¶
Return the list of matching senses in this wordnet.
Without any arguments, this function returns all senses in the wordnet's selected lexicons. A form argument restricts the senses to those whose word matches the given word form, and pos restricts senses by their word's part of speech.
- synset(id)¶
Return the first synset in this wordnet with identifier id.
- synsets(form=None, pos=None, ili=None)¶
Return the list of matching synsets in this wordnet.
Without any arguments, this function returns all synsets in the wordnet's selected lexicons. A form argument restricts synsets to those whose member words match the given word form. A pos argument restricts synsets to those with the given part of speech. An ili argument restricts synsets to those with the given interlingual index; generally this should select a unique synset within a single lexicon.
- ili(id)¶
Return the first ILI in this wordnet with identifier id.
- ilis(status=None)¶
Return the list of ILIs in this wordnet.
If status is given, only return ILIs with a matching status.
- lexicons()¶
Return the list of lexicons covered by this wordnet.
- Return type
- expanded_lexicons()¶
Return the list of expand lexicons for this wordnet.
- Return type
- describe()¶
Return a formatted string describing the lexicons in this wordnet.
Example
>>> oewn = wn.Wordnet('oewn:2021')
>>> print(oewn.describe())
Primary lexicons:
  oewn:2021
    Label  : Open English WordNet
    URL    : https://github.com/globalwordnet/english-wordnet
    License: https://creativecommons.org/licenses/by/4.0/
    Words  : 163161 (a: 8386, n: 123456, r: 4481, s: 15231, v: 11607)
    Senses : 211865
    Synsets: 120039 (a: 7494, n: 84349, r: 3623, s: 10727, v: 13846)
    ILIs   : 120039
- Return type
The Word Class¶
- class wn.Word(id, pos, forms, _lexid=0, _id=0, _wordnet=None)¶
A class for words (also called lexical entries) in a wordnet.
- Parameters
- id¶
The identifier used within a lexicon.
- pos¶
The part of speech of the Word.
- lemma()¶
Return the canonical form of the word.
Example
>>> wn.words('wolves')[0].lemma()
'wolf'
- Return type
- forms()¶
Return the list of all encoded forms of the word.
Example
>>> wn.words('wolf')[0].forms()
['wolf', 'wolves']
- senses()¶
Return the list of senses of the word.
Example
>>> wn.words('zygoma')[0].senses()
[Sense('ewn-zygoma-n-05292350-01')]
- synsets()¶
Return the list of synsets of the word.
Example
>>> wn.words('addendum')[0].synsets()
[Synset('ewn-06411274-n')]
- derived_words()¶
Return the list of words linked through derivations on the senses.
Example
>>> wn.words('magical')[0].derived_words()
[Word('ewn-magic-n'), Word('ewn-magic-n')]
- translate(lexicon=None, *, lang=None)¶
Return a mapping of word senses to lists of translated words.
- Parameters
- Return type
Example
>>> w = wn.words('water bottle', pos='n')[0]
>>> for sense, words in w.translate(lang='ja').items():
...     print(sense, [jw.lemma() for jw in words])
...
Sense('ewn-water_bottle-n-04564934-01') ['水筒']
The Form Class¶
- class wn.Form¶
The return value of Word.lemma() and the members of the list returned by Word.forms() are Form objects. These are a basic subclass of Python's str class with an additional attribute, script, and methods pronunciations() and tags(). Form objects without any specified script behave exactly as a regular string (they are equal and hash to the same value), but if two Form objects are compared and they have different script values, then they are unequal and hash differently, even if the string itself is identical. When comparing a Form object to a regular string, the script value is ignored.
>>> inu = wn.words('犬', lexicon='wnja')[0]
>>> inu.forms()[3]
'いぬ'
>>> inu.forms()[3].script
'hira'
The script is often unspecified (i.e., None) and this carries the implicit meaning that the form uses the canonical script for the word's language or wordnet, whatever it may be.
- pronunciations()¶
Return the list of Pronunciation objects.
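The equality and hashing rules described for Form objects can be illustrated with a small str subclass. FormSketch is a hypothetical class written for this illustration; it models only the script attribute and the comparison behavior described above, and it is not the wn.Form source.

```python
from typing import Optional


class FormSketch(str):
    """A str subclass carrying an optional script tag (illustration only)."""

    script: Optional[str]

    def __new__(cls, form: str, script: Optional[str] = None):
        obj = super().__new__(cls, form)
        obj.script = script
        return obj

    def __eq__(self, other):
        # Two tagged forms with different scripts are never equal.
        if isinstance(other, FormSketch) and self.script != other.script:
            return False
        # Against plain strings the script is ignored.
        return str.__eq__(self, other)

    def __ne__(self, other):
        # str defines its own __ne__, so it must mirror __eq__ explicitly.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Without a script, hash exactly like the underlying string.
        if self.script is None:
            return str.__hash__(self)
        return hash((str(self), self.script))


hira = FormSketch("いぬ", script="hira")
kana = FormSketch("いぬ", script="kana")  # "kana" is an invented tag
plain = FormSketch("wolf")
```

Here hira and kana compare unequal even though the text is identical, while both compare equal to the plain string 'いぬ'.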
The Pronunciation Class¶
- class wn.Pronunciation(value, variety=None, notation=None, phonemic=True, audio=None)¶
A class for word form pronunciations.
- Parameters
- value¶
The encoded pronunciation.
- variety¶
The language variety this pronunciation belongs to.
- notation¶
The notation used to encode the pronunciation. For example: the International Phonetic Alphabet (IPA).
- phonemic¶
True when the encoded pronunciation is a generalized phonemic description, or False for more precise phonetic transcriptions.
- audio¶
A URI to an associated audio file.
The Tag Class¶
The Sense Class¶
- class wn.Sense(id, entry_id, synset_id, _lexid=0, _id=0, _wordnet=None)¶
Class for modeling wordnet senses.
- Parameters
- id¶
The identifier used within a lexicon.
- word()¶
Return the word of the sense.
Example
>>> wn.senses('spigot')[0].word()
Word('pwn-spigot-n')
- Return type
- synset()¶
Return the synset of the sense.
Example
>>> wn.senses('spigot')[0].synset()
Synset('pwn-03325088-n')
- Return type
- adjposition()¶
Return the adjective position of the sense.
Values include "a" (attributive), "p" (predicative), and "ip" (immediate postnominal). Note that this is only relevant for adjectival senses. Senses for other parts of speech, or for adjectives that are not annotated with this feature, will return None.
- relations(*args)¶
Return a mapping of relation names to lists of senses.
One or more relation names may be given as positional arguments to restrict the relations returned. If no such arguments are given, all relations starting from the sense are returned.
See get_related() for getting a flat list of related senses.
- get_related(*args)¶
Return a list of related senses.
One or more relation types should be passed as arguments which determine the kind of relations returned.
Example
>>> physics = wn.senses('physics', lexicon='ewn')[0]
>>> for sense in physics.get_related('has_domain_topic'):
...     print(sense.word().lemma())
...
coherent
chaotic
incoherent
- relation_paths(*args, end=None)¶
- translate(lexicon=None, *, lang=None)¶
Return a list of translated senses.
- Parameters
- Return type
Example
>>> en = wn.senses('petiole', lang='en')[0]
>>> pt = en.translate(lang='pt')[0]
>>> pt.word().lemma()
'pecíolo'
The Count Class¶
- class wn.Count(value, _id=0)¶
A count of sense occurrences in some corpus.
Some wordnets store computed counts of senses across some corpus or corpora. This class models those counts. It is a subtype of int with one additional method, metadata(), which may be used to give information about the source of the count (if provided by the wordnet).
- Parameters
_id (int) –
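The idea of an int subtype with an extra metadata() method can be sketched as follows. CountSketch and its metadata argument are hypothetical; wn.Count takes different constructor arguments and obtains its metadata from the database, which this sketch does not attempt to model.

```python
from typing import Any, Dict, Optional


class CountSketch(int):
    """An int that also remembers where the number came from."""

    def __new__(cls, value: int, metadata: Optional[Dict[str, Any]] = None):
        obj = super().__new__(cls, value)
        obj._metadata = dict(metadata or {})
        return obj

    def metadata(self) -> Dict[str, Any]:
        """Return information about the source of the count."""
        return self._metadata


# The corpus name is invented for the example.
count = CountSketch(3, {"source": "hypothetical corpus"})
total = count + 2  # behaves like a plain int in arithmetic
```

Because the class derives from int, counts can be summed or compared directly while the source information stays available through metadata().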
The Synset Class¶
- class wn.Synset(id, pos, ili=None, _lexid=0, _id=0, _wordnet=None)¶
Class for modeling wordnet synsets.
- Parameters
- id¶
The identifier used within a lexicon.
- pos¶
The part of speech of the Synset.
- ili¶
The interlingual index of the Synset.
- definition()¶
Return the first definition found for the synset.
Example
>>> wn.synsets('cartwheel', pos='n')[0].definition()
'a wheel that has wooden spokes and a metal rim'
- examples()¶
Return the list of examples for the synset.
Example
>>> wn.synsets('orbital', pos='a')[0].examples()
['"orbital revolution"', '"orbital velocity"']
- senses()¶
Return the list of sense members of the synset.
Example
>>> wn.synsets('umbrella', pos='n')[0].senses()
[Sense('ewn-umbrella-n-04514450-01')]
- words()¶
Return the list of words linked by the synset's senses.
Example
>>> wn.synsets('exclusive', pos='n')[0].words()
[Word('ewn-scoop-n'), Word('ewn-exclusive-n')]
- lemmas()¶
Return the list of lemmas of words for the synset.
Example
>>> wn.synsets('exclusive', pos='n')[0].lemmas()
['scoop', 'exclusive']
- hypernyms()¶
Return the list of synsets related by any hypernym relation.
Both the hypernym and instance_hypernym relations are traversed.
- hyponyms()¶
Return the list of synsets related by any hyponym relation.
Both the hyponym and instance_hyponym relations are traversed.
- holonyms()¶
Return the list of synsets related by any holonym relation.
Any of the following relations are traversed: holonym, holo_location, holo_member, holo_part, holo_portion, holo_substance.
- meronyms()¶
Return the list of synsets related by any meronym relation.
Any of the following relations are traversed: meronym, mero_location, mero_member, mero_part, mero_portion, mero_substance.
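The four grouped methods above (hypernyms(), hyponyms(), holonyms(), meronyms()) each collect targets over several relation names. A minimal sketch of that grouping, using a plain dictionary in place of a real synset, might look like this. RELATION_GROUPS and related_by_group() are hypothetical names, and the toy data is invented.

```python
from typing import Dict, List

# Relation names per group, as listed in the method descriptions above.
RELATION_GROUPS = {
    "hypernyms": ("hypernym", "instance_hypernym"),
    "hyponyms": ("hyponym", "instance_hyponym"),
    "holonyms": (
        "holonym", "holo_location", "holo_member",
        "holo_part", "holo_portion", "holo_substance",
    ),
    "meronyms": (
        "meronym", "mero_location", "mero_member",
        "mero_part", "mero_portion", "mero_substance",
    ),
}


def related_by_group(relations: Dict[str, List[str]], group: str) -> List[str]:
    """Collect the targets of every relation name belonging to a group."""
    return [
        target
        for name in RELATION_GROUPS[group]
        for target in relations.get(name, [])
    ]


# Invented relation mapping: relation name -> list of target labels.
toy_relations = {
    "hypernym": ["city"],
    "instance_hypernym": ["capital"],
    "mero_part": ["district"],
}
```

In this toy mapping the 'hypernyms' group yields both the ordinary and the instance hypernym, which mirrors the statement that both relations are traversed.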
- relations(*args)¶
Return a mapping of relation names to lists of synsets.
One or more relation names may be given as positional arguments to restrict the relations returned. If no such arguments are given, all relations starting from the synset are returned.
See get_related() for getting a flat list of related synsets.
Example
>>> button_rels = wn.synsets('button')[0].relations()
>>> for relname, sslist in button_rels.items():
...     print(relname, [ss.lemmas() for ss in sslist])
...
hypernym [['fixing', 'holdfast', 'fastener', 'fastening']]
hyponym [['coat button'], ['shirt button']]
- get_related(*args)¶
Return the list of related synsets.
One or more relation names may be given as positional arguments to restrict the relations returned. If no such arguments are given, all relations starting from the synset are returned.
This method does not preserve the relation names that lead to the related synsets. For a mapping of relation names to related synsets, see relations().
Example
>>> fulcrum = wn.synsets('fulcrum')[0]
>>> [ss.lemmas() for ss in fulcrum.get_related()]
[['pin', 'pivot'], ['lever']]
- relation_paths(*args, end=None)¶
- translate(lexicon=None, *, lang=None)¶
Return a list of translated synsets.
- Parameters
- Return type
Example
>>> es = wn.synsets('araña', lang='es')[0]
>>> en = es.translate(lexicon='ewn')[0]
>>> en.lemmas()
['spider']
- hypernym_paths(simulate_root=False)¶
Shortcut for wn.taxonomy.hypernym_paths().
- min_depth(simulate_root=False)¶
Shortcut for wn.taxonomy.min_depth().
- max_depth(simulate_root=False)¶
Shortcut for wn.taxonomy.max_depth().
- shortest_path(other, simulate_root=False)¶
Shortcut for wn.taxonomy.shortest_path().
- common_hypernyms(other, simulate_root=False)¶
Shortcut for wn.taxonomy.common_hypernyms().
- lowest_common_hypernyms(other, simulate_root=False)¶
Shortcut for wn.taxonomy.lowest_common_hypernyms().
The ILI Class¶
- class wn.ILI(id, status, definition=None, _id=0)¶
A class for interlingual indices.
- id¶
The interlingual index identifier. Unlike id attributes for Word, Sense, and Synset, ILI identifiers may be None (see the proposed status).
- status¶
The known status of the interlingual index. Loading an interlingual index into the database provides the following explicit, authoritative status values:
active – the ILI is in use
provisional – the ILI is being staged for permanent inclusion
deprecated – the ILI is, or should be, no longer in use
Without an interlingual index loaded, ILIs present in loaded lexicons get an implicit, temporary status from the following:
presupposed – a synset uses the ILI, assuming it exists in an ILI file
proposed – a synset introduces a concept not yet in an ILI and is suggesting that one should be added for it in the future
The Lexicon Class¶
- class wn.Lexicon(id, label, language, email, license, version, url=None, citation=None, logo=None, _id=0)¶
A class representing a wordnet lexicon.
- Parameters
- id¶
The lexicon's identifier.
- label¶
The full name of the lexicon.
- language¶
The BCP 47 language code of the lexicon.
- email¶
The email address of the wordnet maintainer.
- license¶
The URL or name of the wordnet's license.
- version¶
The version string of the resource.
- url¶
The project URL of the wordnet.
- citation¶
The canonical citation for the project.
- logo¶
A URL or path to a project logo.
- requires()¶
Return the lexicon dependencies.
- Return type
- extends()¶
Return the lexicon this lexicon extends, if any.
If this lexicon is not an extension, return None.
- Return type
- extensions(depth=1)¶
Return the list of lexicons extending this one.
By default, only direct extensions are included. This is controlled by the depth parameter: if extensions are viewed as children in a tree whose root is the current lexicon, depth=1 selects the immediate extensions. Increasing this number also includes extensions of extensions, and setting it to a negative number includes all "descendant" extensions.
- Parameters
depth (int) –
- Return type
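The depth behavior described for extensions() can be sketched as a depth-limited tree walk. The extensions_sketch() function and the lexicon names are hypothetical; the sketch only demonstrates how depth=1, larger depths, and negative depths differ, and it says nothing about how Wn stores extension links.

```python
from typing import Dict, List


def extensions_sketch(
    tree: Dict[str, List[str]], root: str, depth: int = 1
) -> List[str]:
    """Return extensions of root, descending at most depth levels.

    A negative depth never reaches zero, so it collects all descendants.
    """
    found: List[str] = []
    if depth == 0:
        return found
    for child in tree.get(root, []):
        found.append(child)
        found.extend(extensions_sketch(tree, child, depth - 1))
    return found


# Invented extension tree: lexicon -> lexicons that extend it directly.
tree = {
    "base": ["ext-a", "ext-b"],
    "ext-a": ["ext-a1"],
}
```

With this tree, depth=1 returns only 'ext-a' and 'ext-b', while depth=2 or depth=-1 also returns 'ext-a1'.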
- describe(full=True)¶
Return a formatted string describing the lexicon.
The full argument (default: True) may be set to False to omit word and sense counts.
Also see: Wordnet.describe()
The wn.config Object¶
Wn's data storage and retrieval can be configured through the wn.config object.
See also
Installation and Configuration describes how to configure Wn using the wn.config instance.
- wn.config = <wn._config.WNConfig object>¶
It is an instance of the WNConfig class, which is defined in a non-public module and is not meant to be instantiated directly. Configuration should occur through the single wn.config instance.
- class wn._config.WNConfig¶
- data_directory¶
The file system directory where Wn's data is stored.
- database_path¶
The path to the database file.
- allow_multithreading¶
If set to True, the database connection may be shared across threads. In this case, it is the user's responsibility to ensure that multiple threads don't try to write to the database at the same time. The default is False.
- downloads_directory¶
The file system directory where downloads are cached.
- add_project(id, type='wordnet', label=None, language=None, license=None, error=None)¶
Add a new wordnet project to the index.
- Parameters
id (str) – short identifier of the project
type (str) – project type (default 'wordnet')
language (Optional[str]) – BCP 47 language code of the resource
license (Optional[str]) – link or name of the project's default license
error (Optional[str]) – if set, the error message to use when the project is accessed
- Return type
None
- add_project_version(id, version, url=None, error=None, license=None)¶
Add a new resource version for a project.
Exactly one of url or error must be specified.
- Parameters
id (str) – short identifier of the project
version (str) – version string of the resource
url (Optional[str]) – space-separated list of web addresses for the resource
license (Optional[str]) – link or name of the resource's license; if not given, the project's default license will be used.
error (Optional[str]) – if set, the error message to use when the project is accessed
- Return type
None
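The rule that exactly one of url or error must be specified can be checked with a single comparison. The helper below is a hypothetical illustration of such a check; the exception type raised by Wn itself is not stated here, so ValueError is an assumption of the sketch.

```python
from typing import Optional


def check_version_args(
    url: Optional[str] = None, error: Optional[str] = None
) -> None:
    """Raise ValueError unless exactly one of url or error is given."""
    # The two "is None" tests are equal when both or neither argument is set.
    if (url is None) == (error is None):
        raise ValueError("exactly one of url or error must be specified")


def is_valid(url: Optional[str] = None, error: Optional[str] = None) -> bool:
    """Convenience wrapper that reports validity as a boolean."""
    try:
        check_version_args(url=url, error=error)
    except ValueError:
        return False
    return True
```

For example, passing only a URL or only an error message is accepted, while passing both or neither is rejected.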
- get_project_info(arg)¶
Return information about an indexed project version.
If the project has been downloaded and cached, the "cache" key will point to the path of the cached file, otherwise its value is None.
Example
>>> info = wn.config.get_project_info('oewn:2021')
>>> info['label']
'Open English WordNet'
- get_cache_path(url)¶
Return the path for caching url.
Note that in general this is just a path operation and does not signify that the file exists in the file system.
- Parameters
url (str) –
- Return type
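That get_cache_path() is "just a path operation" can be illustrated by deriving a file name from the URL without touching the file system. The sketch below assumes a hash-based naming scheme (SHA-256 of the URL); that scheme, the function name, and the directory are assumptions for illustration and may differ from what Wn actually does.

```python
import hashlib
from pathlib import Path


def cache_path_sketch(downloads_directory: Path, url: str) -> Path:
    """Map a URL to a deterministic path inside the downloads directory."""
    # Hashing gives a fixed-length, file-system-safe name for any URL.
    filename = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return downloads_directory / filename


url = "https://en-word.net/static/english-wordnet-2020.xml.gz"
path = cache_path_sketch(Path("downloads"), url)
# Nothing has been downloaded; whether the file exists is a separate check.
```

Calling the function twice with the same URL yields the same path, and checking whether the file is actually cached would require a separate test such as path.exists().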
- update(data)¶
Update the configuration with items in data.
Items are only inserted or replaced, not deleted. If a project index is provided in the "index" key, then either the project must not already be indexed or any project fields (label, language, or license) that are specified must be equal to the indexed project.
- Parameters
data (dict) –
- Return type
None
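The compatibility condition for an "index" entry in update() can be expressed as a comparison of the specified project fields. The function below is a hypothetical sketch of that condition over plain dictionaries; it does not reproduce how Wn reports a conflict.

```python
from typing import Dict, List, Optional

PROJECT_FIELDS = ("label", "language", "license")


def conflicting_fields(
    indexed: Optional[Dict[str, str]], incoming: Dict[str, str]
) -> List[str]:
    """Return the project fields whose new value differs from the index.

    A project that is not yet indexed (indexed is None) never conflicts.
    """
    if indexed is None:
        return []
    return [
        field
        for field in PROJECT_FIELDS
        if field in incoming and incoming[field] != indexed.get(field)
    ]


indexed_ewn = {
    "label": "Open English WordNet",
    "language": "en",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}
```

An update that repeats the indexed label is compatible, whereas one that specifies a different language for the same project is not.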
- load_index(path)¶
Load and update with the project index at path.
The project index is a TOML file containing project and version information. For example:
[ewn]
  label = "Open English WordNet"
  language = "en"
  license = "https://creativecommons.org/licenses/by/4.0/"
  [ewn.versions.2019]
    url = "https://en-word.net/static/english-wordnet-2019.xml.gz"
  [ewn.versions.2020]
    url = "https://en-word.net/static/english-wordnet-2020.xml.gz"
- Parameters
path (Union[str, pathlib.Path]) –
- Return type
None
Exceptions¶
- exception wn.Error¶
Generic error class for invalid wordnet operations.
- exception wn.DatabaseError¶
Error class for issues with the database.
- exception wn.WnWarning¶
Generic warning class for dubious wordnet operations.