The Structure of a Wordnet

A wordnet is an online lexicon that is organized by concepts.

The basic unit of a wordnet is the synonym set (synset), a group of words that all refer to the same concept. Words and synsets are linked by conceptual-semantic relations to form the structure of the wordnet.

Words, Senses, and Synsets

Words are the basic building blocks of a language. A word has two parts, its form and its meaning, but in natural languages word forms and word meanings do not match one-to-one: a single word form may be connected to many different meanings. We therefore need senses to serve as the units of word meaning. For example, the word bank has at least two senses:

  1. bank1: financial institution, like City Bank;

  2. bank2: sloping land, like river bank;

Since synsets are groups of words sharing the same concept, bank1 and bank2 are members of two different synsets, although they have the same word form.

On the other hand, different word forms may convey the same concept, such as cab and taxi. Word forms that share a concept are grouped together into one synset.

Figure: relations between words, senses and synsets
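
As a rough illustration of the relationship described above, the following sketch models each sense as a pairing of one word form with one synset. The synset identifiers are invented for this example and do not come from any real wordnet.

```python
from collections import defaultdict

# Each sense pairs one word form with one synset (concept).
# The synset ids are made up for this illustration.
senses = [
    ("bank", "synset-financial-institution"),  # bank1
    ("bank", "synset-sloping-land"),           # bank2
    ("cab", "synset-taxi"),
    ("taxi", "synset-taxi"),
]

synsets_of_word = defaultdict(set)   # word form -> synsets it has a sense in
words_in_synset = defaultdict(set)   # synset -> word forms that are members

for word_form, synset_id in senses:
    synsets_of_word[word_form].add(synset_id)
    words_in_synset[synset_id].add(word_form)

# One word form, two senses, two different synsets.
print(sorted(synsets_of_word["bank"]))

# Two word forms sharing one concept, hence one synset.
print(sorted(words_in_synset["synset-taxi"]))
```

Storing senses as the link between words and synsets keeps both directions of lookup cheap: from a word form to its meanings, and from a concept to its synonyms.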

Synset Relations

In a wordnet, synsets are linked to each other by various kinds of relations. For example, if the concept expressed by one synset is more general than that of a given synset, then it is in a hypernym relation with the given synset. As shown in the figure below, the synset with car, auto and automobile as its members is the hypernym of the synset with cab, taxi and hack. Relations like this, which hold at the synset level, are categorized as synset relations.

Figure: example of synset relations
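
The hypernym example can be sketched as a lookup keyed by synset, so that the relation applies to every member word at once. The synset ids and the helper name hypernym_words are assumptions made for this illustration.

```python
# Synsets keyed by hypothetical ids; members are the words from the example.
synsets = {
    "ss-car": {"car", "auto", "automobile"},
    "ss-cab": {"cab", "taxi", "hack"},
}

# Synset relations: (source synset, relation type) -> target synsets.
synset_relations = {
    ("ss-cab", "hypernym"): ["ss-car"],
}


def hypernym_words(synset_id):
    """Return the words of every hypernym synset of the given synset."""
    words = set()
    for target in synset_relations.get((synset_id, "hypernym"), []):
        words |= synsets[target]
    return words


# The relation is stored once, on the synset, not once per member word.
print(sorted(hypernym_words("ss-cab")))  # ['auto', 'automobile', 'car']
```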

Sense Relations

Some relations in a wordnet hold at the sense level. These can be further divided into two types: relations that link a sense with another sense, and relations that link a sense with a synset.

Note

In a wordnet, a particular relation type, such as domain topic, can be used both as a synset relation and as a sense relation.

Sense-Sense

Sense-sense relations emphasize the connections between individual senses, especially between morphologically related words. For example, behavioral is the adjective corresponding to the noun behavior, so it is in a pertainym relation with behavior. No such relation exists between behavioral and conduct, even though conduct is a synonym of behavior and belongs to the same synset. Here pertainym is a sense-sense relation.

Figure: example of sense-sense relations
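
A minimal sketch of why this relation must live on senses rather than synsets: behavior and conduct share a synset, yet only one of them is the target of the pertainym link. The synset ids and the helper name pertainyms are hypothetical.

```python
# Senses as (word form, synset id) pairs; the synset ids are hypothetical.
behavior = ("behavior", "ss-behavior")
conduct = ("conduct", "ss-behavior")        # same synset as behavior
behavioral = ("behavioral", "ss-behavioral")

# Sense-sense relations: (source sense, relation type) -> target senses.
sense_relations = {
    (behavioral, "pertainym"): [behavior],
}


def pertainyms(sense):
    """Return the senses that the given sense pertains to."""
    return sense_relations.get((sense, "pertainym"), [])


print(behavior in pertainyms(behavioral))  # True
print(conduct in pertainyms(behavioral))   # False, despite the shared synset
```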

Sense-Synset

Sense-synset relations connect a particular sense with a synset. For example, cursor is a term in the computer science discipline, so in a wordnet it is in the has domain topic relation with the computer science synset. Pointer, which is in the same synset as cursor, is not such a term and thus has no such relation with the computer science synset.

Figure: example of sense-synset relations
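
The cursor example might be modeled as follows, with a sense as the source of the relation and a synset id as its target. All identifiers and the helper name domain_topics are assumptions for this sketch.

```python
# Senses as (word form, synset id) pairs; all ids are hypothetical.
cursor = ("cursor", "ss-cursor")
pointer = ("pointer", "ss-cursor")  # same synset as cursor

# Sense-synset relations: the source is a sense, the target is a synset id.
sense_synset_relations = {
    (cursor, "has domain topic"): ["ss-computer-science"],
}


def domain_topics(sense):
    """Return the synset ids that the given sense has as domain topics."""
    return sense_synset_relations.get((sense, "has domain topic"), [])


print(domain_topics(cursor))   # ['ss-computer-science']
print(domain_topics(pointer))  # []
```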

Other Information

A wordnet should be built in an appropriate format. Two schemas are accepted:

  • XML schema based on the Lexical Markup Framework (LMF)

  • JSON-LD using the Lexicon Model for Ontologies
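
To give a feel for the XML form, the sketch below parses a heavily simplified fragment written in the style of the LMF-based schema. The fragment is an assumption for illustration only: the ids are invented, required attributes and the document type declaration are omitted, and it has not been validated against the actual schema.

```python
import xml.etree.ElementTree as ET

# Simplified, unvalidated fragment; ids and labels are hypothetical.
LMF_FRAGMENT = """
<LexicalResource>
  <Lexicon id="example" label="Example Wordnet" language="en">
    <LexicalEntry id="w-cab">
      <Lemma writtenForm="cab" partOfSpeech="n"/>
      <Sense id="s-cab" synset="ss-cab"/>
    </LexicalEntry>
    <LexicalEntry id="w-taxi">
      <Lemma writtenForm="taxi" partOfSpeech="n"/>
      <Sense id="s-taxi" synset="ss-cab"/>
    </LexicalEntry>
    <Synset id="ss-cab" partOfSpeech="n">
      <SynsetRelation relType="hypernym" target="ss-car"/>
    </Synset>
    <Synset id="ss-car" partOfSpeech="n"/>
  </Lexicon>
</LexicalResource>
"""

root = ET.fromstring(LMF_FRAGMENT)

# Collect the word forms belonging to each synset via their senses.
members = {}
for entry in root.iter("LexicalEntry"):
    form = entry.find("Lemma").get("writtenForm")
    for sense in entry.findall("Sense"):
        members.setdefault(sense.get("synset"), []).append(form)

# Collect hypernym targets for each synset.
hypernyms = {
    synset.get("id"): [
        rel.get("target")
        for rel in synset.findall("SynsetRelation")
        if rel.get("relType") == "hypernym"
    ]
    for synset in root.iter("Synset")
}

print(members)
print(hypernyms)
```

In this layout, senses are nested under their word's entry and point to a synset by id, while synset relations are nested under the source synset.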

A wordnet should contain the following information:

Definition

A definition defines a sense or synset in a wordnet. It is given in the language of the wordnet it came from.

Example

An example clarifies a sense or synset in a wordnet. A given example helps users understand the definition more clearly.

Metadata

A wordnet has its own metadata, based on the Dublin Core, that states basic information about it. The table below lists all the items in the metadata of a wordnet:

Item          Requirement   Type
contributor   Optional      str
coverage      Optional      str
creator       Optional      str
date          Optional      str
description   Optional      str
format        Optional      str
identifier    Optional      str
publisher     Optional      str
relation      Optional      str
rights        Optional      str
source        Optional      str
subject       Optional      str
title         Optional      str
type          Optional      str
status        Optional      str
note          Optional      str
confidence    Optional      float
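
One way to picture the metadata items listed above is as a record in which every field may be absent. The class name Metadata and the sample values below are assumptions for this sketch, not part of any schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Metadata:
    """Hypothetical container mirroring the metadata items listed above."""
    contributor: Optional[str] = None
    coverage: Optional[str] = None
    creator: Optional[str] = None
    date: Optional[str] = None
    description: Optional[str] = None
    format: Optional[str] = None
    identifier: Optional[str] = None
    publisher: Optional[str] = None
    relation: Optional[str] = None
    rights: Optional[str] = None
    source: Optional[str] = None
    subject: Optional[str] = None
    title: Optional[str] = None
    type: Optional[str] = None
    status: Optional[str] = None
    note: Optional[str] = None
    confidence: Optional[float] = None


# Sample values are invented; any item may simply be left out.
meta = Metadata(title="Example Wordnet", creator="Example Author", confidence=1.0)

# Keep only the items that were actually provided.
provided = {name: value for name, value in asdict(meta).items() if value is not None}
print(provided)
```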