The RELEX network
Unitex works with dictionaries built by the members of the RELEX network, an international network of laboratories specialized in Computational Linguistics that was created by Maurice Gross and his LADL team. Most of the universities listed on the links page are members of this network.
Members of the RELEX network have built and are building exhaustive dictionaries of simple and compound words for French, English, Greek, Portuguese, Russian, Thai, Korean, Italian, Spanish, Norwegian, Arabic, German, Polish and more. They also build lexicon-grammar tables.
For more information about RELEX resources, please consult our bibliography.
All the dictionaries conform to the DELAF formalism. A DELAF dictionary is a text file in which each line represents one entry. The line for a word contains its inflected form, its lemma, and some grammatical, semantic and inflectional information. Here is a sample of the English simple word dictionary:
DELAF dictionaries can contain both simple and compound words. Here is a sample of the English compound word dictionary:
Chamber of Commerce,.N+NPN+z1:s
To get more information about the DELAF formalism, please consult the manual.
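As an illustration of the entry layout described above, here is a minimal Python sketch that splits one DELAF line into its parts. It assumes the layout visible in the compound word sample (inflected form, comma, lemma, dot, codes joined by "+", then inflectional codes introduced by ":"), and it assumes that an empty lemma means the lemma is identical to the inflected form. The names DelafEntry and parse_delaf_line are hypothetical and are not part of Unitex. The sketch ignores escaped characters and comments, so the manual remains the reference for the full formalism.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DelafEntry:
    """One dictionary line, split into its fields (hypothetical helper type)."""
    inflected: str
    lemma: str
    codes: List[str]        # grammatical and semantic codes, e.g. N, NPN, z1
    inflections: List[str]  # inflectional codes, e.g. s


def parse_delaf_line(line: str) -> DelafEntry:
    """Parse a line of the assumed form: inflected,lemma.CODE+CODE:infl:infl"""
    line = line.strip()
    inflected, comma, rest = line.partition(",")
    if not comma:
        raise ValueError("missing ',' between inflected form and lemma")
    lemma, dot, info = rest.partition(".")
    if not dot:
        raise ValueError("missing '.' between lemma and codes")
    # Everything before the first ':' is the '+'-separated code list;
    # each remaining ':'-separated item is one inflectional code.
    codes_part, *inflections = info.split(":")
    codes = codes_part.split("+")
    # Assumption: an empty lemma stands for "same as the inflected form".
    return DelafEntry(inflected, lemma or inflected, codes, inflections)


entry = parse_delaf_line("Chamber of Commerce,.N+NPN+z1:s")
print(entry)
```

Run on the compound word sample, this yields the inflected form and lemma "Chamber of Commerce", the codes N, NPN and z1, and the single inflectional code s.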
The lexicon-grammar methodology was developed by Maurice Gross, according to the following principle: every verb has a specific set of arguments (i.e. subject and complements), to the point that this set is often unique. Hence, the syntactic properties of verbs, or rather of the elementary sentences defined for each verb, have to be systematically described. No system predicting sentence forms from semantic features could exist. The systematic description consists of matrices whose rows are verbs (i.e. elementary sentences) and whose columns are sentence forms that a verb may or may not enter. The sentence forms are the usual transformations of elementary sentences, often simple declarative forms. Matrices are binary: a "+" sign appears at the intersection of a given row and a given column when the verb in the row enters the structure represented by the column, and a "-" sign appears otherwise.
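The binary matrix described above can be sketched in a few lines of Python. This is only an illustration of the data structure: the verb names, the sentence form names and the whitespace-separated text layout are placeholders invented for the example, and they do not reproduce any actual lexicon-grammar table or file format.

```python
from typing import Dict, List

# Hypothetical column headers: each one stands for a sentence form.
FORMS = ["form_1", "form_2", "form_3"]

# Hypothetical table: one row per verb, one "+" or "-" per sentence form.
TABLE_TEXT = """\
verb_a + - +
verb_b + + -
"""

Matrix = Dict[str, Dict[str, bool]]


def load_matrix(text: str, forms: List[str]) -> Matrix:
    """Turn rows of '+'/'-' signs into a verb -> form -> bool mapping."""
    matrix: Matrix = {}
    for row in text.splitlines():
        if not row.strip():
            continue
        verb, *signs = row.split()
        if len(signs) != len(forms):
            raise ValueError(f"row for {verb!r} has {len(signs)} signs, expected {len(forms)}")
        if any(sign not in ("+", "-") for sign in signs):
            raise ValueError(f"row for {verb!r} contains a sign other than '+' or '-'")
        matrix[verb] = {form: sign == "+" for form, sign in zip(forms, signs)}
    return matrix


def accepts(matrix: Matrix, verb: str, form: str) -> bool:
    """True when the verb enters the given sentence form ('+' in the table)."""
    return matrix[verb][form]


def verbs_accepting(matrix: Matrix, form: str) -> List[str]:
    """All verbs marked '+' in the column of the given sentence form."""
    return [verb for verb, row in matrix.items() if row[form]]


matrix = load_matrix(TABLE_TEXT, FORMS)
print(accepts(matrix, "verb_a", "form_2"))
print(verbs_accepting(matrix, "form_1"))
```

Storing each row as a mapping from sentence form to a boolean keeps both lookups direct: checking one verb against one form, and listing the verbs that share a "+" in a given column.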
A lexicon of the 12,000 main verbs of French has been subdivided into about 50 classes (C. Leclère 1991). Each class has its own matrix. There are about 400 sentence forms, including pronominalization, passivization, sentential complement reductions, and nominalizations with support verbs.
A lexicon of 25,000 elementary sentences with at least one frozen argument is also available. Their representation by binary matrices follows the same principles. Partial lexicons of sentences with support verbs (être, avoir, faire, etc.) and predicative nouns have also been built (J. Giry-Schneider 1978, 1987, A. Meunier 1977).
Resources distributed with Unitex
The resources included in Unitex are distributed under the LGPLLR license. Under this license, you can obtain readable versions of these resources. You can download them for English and French here. You can also use the Uncompress program included in Unitex 2.1 and later to get the text version of the binary dictionaries distributed with Unitex.
The latest Unitex package contains resources for many languages. Here is a brief presentation of these resources. THESE RESOURCES ARE NOT THE WHOLE DICTIONARIES. Please follow the links for more information.