FAQ

Is Wn compatible with the NLTK's module?

The API is intentionally similar but not identical (the next question gives one example), and results are retrieved differently, particularly for non-English wordnets. See Migrating from the NLTK for more information, and also Where is the Princeton WordNet data?

Where are the `Lemma` objects? What are `Word` and `Sense` objects?

Unlike the WNDB data format of the original WordNet, the WN-LMF XML format grants words (called lexical entries in WN-LMF and represented by a `Word` object in Wn) and word senses (`Sense` in Wn) explicit, first-class status alongside synsets. While senses are essentially links between words and synsets, they may contain metadata and be the source or target of sense relations, so in some ways they are more like nodes than edges when the wordnet is viewed as a graph. The NLTK's module, using the WNDB format, combines the information of a word and a sense into a single object called a `Lemma`. Wn also has an unrelated concept called a `lemma()`, but it is merely the canonical form of a word.
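As a rough illustration of how the three kinds of objects relate, the sketch below walks from a word through its senses to their synsets. The helper name `sense_rows` is hypothetical. It assumes only that its argument behaves like a Wn `Word` (offering `lemma()` and `senses()`) and that each sense has an `id` attribute and a `synset()` method.

```python
def sense_rows(word):
    """List (lemma, sense ID, synset ID) for every sense of a Word-like object.

    Assumes ``word`` offers ``lemma()`` and ``senses()``, and that each sense
    has an ``id`` attribute and a ``synset()`` method returning an object
    with an ``id`` attribute.
    """
    # lemma() is only the canonical form of the word, not an NLTK-style Lemma.
    lemma = str(word.lemma())
    # Each sense is its own object linking this word to exactly one synset.
    return [(lemma, sense.id, sense.synset().id) for sense in word.senses()]
```

With Wn installed and a lexicon such as `omw-en:1.4` downloaded, calling `sense_rows(w)` for each `w` in `wn.words("dog", lexicon="omw-en:1.4")` should give one row per sense, which is roughly the word-plus-sense pairing that a single NLTK `Lemma` bundles together.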

Where is the Princeton WordNet data?

The original English wordnet, named simply WordNet but often referred to as the Princeton WordNet to better distinguish it from other projects, is specifically the data distributed by Princeton in the WNDB format. The Open Multilingual Wordnet (OMW) packages an export of the WordNet data as the OMW English Wordnet based on WordNet 3.0, which Wn uses under the lexicon ID omw-en. OMW provides similar exports of the WordNet 1.5, 1.6, 1.7, 1.7.1, 2.0, 2.1, and 3.1 data (omw-en15, omw-en16, omw-en17, omw-en171, omw-en20, omw-en21, and omw-en31, respectively). All of these are highly compatible with the original data and can be used as drop-in replacements.
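The correspondence between WordNet releases and OMW lexicon IDs listed above can be written down as a small lookup table. This is only a sketch: the names `OMW_EN_LEXICON_IDS` and `omw_en_lexicon_id` are hypothetical helpers, not part of Wn, and the IDs are copied from the paragraph above.

```python
# WordNet release -> lexicon ID of the corresponding OMW English export.
OMW_EN_LEXICON_IDS = {
    "1.5": "omw-en15",
    "1.6": "omw-en16",
    "1.7": "omw-en17",
    "1.7.1": "omw-en171",
    "2.0": "omw-en20",
    "2.1": "omw-en21",
    "3.0": "omw-en",
    "3.1": "omw-en31",
}


def omw_en_lexicon_id(wordnet_version: str) -> str:
    """Return the OMW lexicon ID for a WordNet release such as "3.0"."""
    try:
        return OMW_EN_LEXICON_IDS[wordnet_version]
    except KeyError:
        raise ValueError(
            f"no OMW English export listed for WordNet {wordnet_version!r}"
        ) from None
```

The returned ID, optionally followed by a version as in `omw-en:1.4`, is the kind of lexicon specifier that Wn accepts, provided the lexicon has already been downloaded.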

Prior to Wn version 0.9 (and, correspondingly, prior to version 1.4 of the OMW data), the pwn:3.0 and pwn:3.1 English wordnets distributed by OMW were incorrectly called the Princeton WordNet (for WordNet 3.0 and 3.1, respectively). From Wn version 0.9 (and from version 1.4 of the OMW data), these are called the OMW English Wordnet based on WordNet 3.0/3.1 (omw-en:1.4 and omw-en31:1.4, respectively). These lexicons are intentionally compatible with the original WordNet data, and the 1.4 versions are even more compatible than the previous pwn:3.0 and pwn:3.1 lexicons, so it is strongly recommended to use them over the previous versions. The 2.0 version of OMW is more compatible still. The data corresponding to WordNet versions 1.5 through 2.1 are available only from OMW 2.0.

Why does Wn's database get so big?

The OMW English Wordnet based on WordNet 3.0 takes about 114 MiB of disk space in Wn's database, which is only about 8 MiB more than it takes as a WN-LMF XML file. The NLTK, however, uses the obsolete WNDB format, which is more compact and requires only 35 MiB of disk space. The difference with the Open Multilingual Wordnet 1.4 is more striking: it takes about 659 MiB of disk space in the database, but only 49 MiB in the NLTK. Part of the difference here is that the OMW files in the NLTK are simple tab-separated-value files listing only the words added to each synset for each language. In addition, Wn creates new synsets for each wordnet added (see the previous question). Finally, Wn creates various indexes in the database for efficient lookup.
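To put the figures above in proportion, the following sketch computes how many times larger each resource is in Wn's database than in the NLTK. The sizes are the approximate values quoted in the paragraph above, and the names `SIZES_MIB` and `size_ratio` are illustrative only.

```python
# Approximate on-disk sizes in MiB, taken from the paragraph above.
SIZES_MIB = {
    "OMW English Wordnet (WordNet 3.0)": {"wn_db": 114, "nltk": 35},
    "Open Multilingual Wordnet 1.4": {"wn_db": 659, "nltk": 49},
}


def size_ratio(wn_db_mib: float, nltk_mib: float) -> float:
    """Return how many times larger the Wn database copy is than the NLTK copy."""
    return wn_db_mib / nltk_mib


for name, sizes in SIZES_MIB.items():
    ratio = size_ratio(sizes["wn_db"], sizes["nltk"])
    print(f"{name}: about {ratio:.1f}x the NLTK size")
```

By these numbers the English wordnet is roughly 3.3 times larger in Wn's database, and the Open Multilingual Wordnet 1.4 roughly 13.4 times larger.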

