Eric Laporte's Publications
1986-1989
1990-1994
1995-1999
2000-2004
2005-
Abstracts
Danlos, Laurence, Françoise Emerard, Eric Laporte, 1986. "Synthesis of Spoken Messages from Semantic Representations (Semantic-Representation-to-Speech System)", Proceedings of Coling 1986, Bonn, pp. 599-604. doi:10.3115/991365.991540
Abstract. A semantic-representation-to-speech system communicates orally the information given in a semantic representation. Such a system must integrate a text generation module, a phonetic conversion module, a prosodic module and a speech synthesizer. We will see how the syntactic information elaborated by the text generation module is used for both phonetic conversion and prosody, so as to produce the data that must be supplied to the speech synthesizer, namely a phonetic chain including prosodic information.
Keywords: natural language, phonetics, text generation, speech synthesis.
1987. "Prise en compte des variations phonétiques en reconnaissance de la parole" ["Taking phonetic variation into account in speech recognition"], Actes des 16es Journées d'étude sur la parole, Société française d'acoustique, Hammamet, pp. 153-156.
Abstract. This paper deals with ways of taking phonetic variations into account in speech recognition systems. Several recognition methods are considered. Particular emphasis is placed on pattern-matching recognition systems in which the decision unit is the fraction of speech between two adjacent syllabic centres. The phonetic data involved in this method include a list of references, which should contain variants. Such a method underlines the applicative interest of describing variants precisely and systematically. As an example of such a description, some phonetic alternations related to hiatuses in French are studied in detail.
Keywords: natural language, phonetics, phonology, speech recognition.
1994. "Experiments in Lexical Disambiguation Using Local Grammars", Papers in Computational Lexicography, COMPLEX '94, Ferenc Kiefer, Gabor Kiss and Julia Pajzs eds., Budapest: Linguistics Institute of the Hungarian Academy of Sciences, pp. 163-172.
Abstract. Lexical disambiguation is one of the major challenges facing those who devise automatic word tagging systems for processing written text. Grammatical disambiguation algorithms reduce the number of possible tags. We will consider here a framework where a large grammatical lexicon is looked up to associate every token in the text, either a simple or a compound word, with the set of all grammatical tags a priori possible for it. (Such a framework for French is now integrated into the INTEX system.) This problem was investigated by M. Silberztein (1989) and E. Roche (1992). We provide formal descriptions of both algorithms. They share a striking common background and purpose. However, they show real formal and computational differences. From a formal point of view, we compare the formal power of the algorithms. From a practical point of view, we examine whether the algorithms are better adapted to particular types of grammatical disambiguation.
Keywords: natural language, lexical analysis, lexical ambiguity, finite-state
automata.
1996. "Context-free parsing with finite-state transducers", in Proceedings of the 3rd South American Workshop on String Processing, N. Ziviani et al. (eds.), International Informatics Series 4, Montréal: McGill-Queen's University Press; Ottawa: Carleton University Press, pp. 171-182.
Abstract. This article is a study of an algorithm designed and implemented by Roche for parsing natural language sentences according to a context-free grammar. This algorithm is based on the construction and use of a finite-state transducer. Roche successfully applied it to a context-free grammar with very numerous rules. In contrast, the complexity of parsing according to context-free grammars is usually considered in practice as a function of one parameter, the length of the input sequence; the size of the grammar is generally taken to be a constant of reasonable value. In this article, we first explain why a context-free grammar with correct lexical and grammatical coverage is bound to have a very large number of rules, and we review work related to this problem. Then we exemplify the principle of Roche's algorithm on a small grammar. We provide formal definitions of the construction of the parser and of the operation of the algorithm, and we prove that the parser can be built for a large class of context-free grammars and that it outputs the set of parsing trees of the input sequence.
Keywords: natural language, parsing, finite-state automata, context-free
grammars.
Eric Laporte, Anne Monceaux, 1997. "Grammatical disambiguation of French words using part of speech, inflectional features and lemma of words in the context", GRAMLEX report no. 3D2, 11 p.
Abstract. We describe ELAG (Elimination of lexical ambiguity with grammars), a new system of lexical disambiguation using grammatical information about words in the context. The disambiguation takes place after a lexical analysis of input text, but before syntactic parsing. The linguistic data of the disambiguator are organised in separate, compact, readable modules, which we call disambiguation grammars. The respective effects of several disambiguation grammars on an input text are independent of each other. This feature of the disambiguation is mathematically guaranteed by the formula used to apply grammars to sentences. The effects of disambiguation grammars are cumulative: if one writes new grammars and uses them with existing ones, the effect of the existing grammars is not modified. Different grammars can apply to the same sequence, to overlapping sequences, or to sequences included in other sequences. The order of application of grammars is indifferent. The effects of a grammar on various analyses of a sentence are independent. ELAG is INTEX-compatible.
Keywords: natural language, lexical ambiguity, finite-state automata.
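The abstract's claim that the order of application of grammars is indifferent can be illustrated with a toy model (hypothetical encoding and tag names, not ELAG's actual formalism): a sentence is the set of its candidate taggings, and each grammar discards the taggings containing a forbidden tag bigram. Since each grammar acts as a pure filter, applications commute.

```python
# Toy model of order-independent disambiguation (assumed encoding and tag
# names; not ELAG's actual formalism).
from itertools import product

def taggings(sentence):
    """All candidate taggings: one tag per token."""
    return [list(p) for p in product(*sentence)]

def apply_grammar(candidates, forbidden_bigrams):
    """Keep the taggings that contain none of the forbidden tag bigrams."""
    return [t for t in candidates
            if not any((a, b) in forbidden_bigrams for a, b in zip(t, t[1:]))]

# 'la porte': 'la' is a determiner or a pronoun, 'porte' a noun or a verb.
sentence = [["DET", "PRO"], ["NOUN", "VERB"]]
g1 = {("DET", "VERB")}   # rule out determiner followed by verb
g2 = {("PRO", "NOUN")}   # rule out pronoun followed by noun

t = apply_grammar(apply_grammar(taggings(sentence), g1), g2)
u = apply_grammar(apply_grammar(taggings(sentence), g2), g1)
assert t == u == [["DET", "NOUN"], ["PRO", "VERB"]]
```

Because each grammar only removes taggings, the surviving set is the same whichever grammar is applied first, which is the independence property the abstract guarantees.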
1997. "Rational Transductions for Phonetic Conversion and Phonology", in E. Roche and Y. Schabes eds., Finite-State Language Processing, chap. 14, Language, Speech and Communication series, Cambridge: MIT Press, pp. 407-429.
Abstract. Phonetic conversion, and other conversion problems related to phonetics, can be performed by finite-state tools. This chapter presents a finite-state conversion system, BiPho, based on transducers and bimachines. The linguistic data used by this system are described in a readable format and actual computation is efficient. The system is applied to spelling-to-phonetics conversion for French.
Keywords: natural language, phonetics, finite-state automata.
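The transducer-cascade idea can be sketched with toy rewrite rules (the rules and phonetic symbols below are illustrative assumptions, not BiPho's actual data): each ordered left-to-right rewrite rule corresponds to a rational transduction, and the whole cascade is their composition, which is again a rational transduction.

```python
# Toy French spelling-to-phonetics cascade (deliberately incomplete;
# rules and phonetic symbols are invented for illustration).
import re

RULES = [
    (r"eau", "o"),
    (r"ou", "u"),
    (r"ch", "S"),
    (r"qu", "k"),
    (r"in(?![aeiouy])", "E~"),  # nasal vowel unless a vowel follows
]

def to_phonetics(word):
    # Apply each rule everywhere in the string, in order: the composition
    # of the rules, not any single rule, defines the conversion.
    for pattern, replacement in RULES:
        word = re.sub(pattern, replacement, word)
    return word

assert to_phonetics("bateau") == "bato"
assert to_phonetics("chou") == "Su"
assert to_phonetics("matin") == "matE~"
```

Real systems need many more rules, exceptions and context conditions, which is why BiPho compiles its data into transducers and bimachines rather than applying regexes naively.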
1995. "Appropriate nouns with obligatory modifiers", Language Research 31(2), Seoul National University, ISSN 0254-4474, pp. 251-289. Presented at the 4th Korean-French Conference on Grammar and the Lexicon, Seoul National University, 1994. French version in Langages 126.
Abstract. The notion of appropriate sequence as introduced by Z. Harris provides a powerful syntactic way of analysing the detailed meaning of various sentences, including ambiguous ones. In an adjectival sentence like The leather was yellow, the introduction of an appropriate noun, here colour, specifies which quality the adjective describes. In some other adjectival sentences with an appropriate noun, that noun plays the same part as colour and seems to be relevant to the description of the adjective. These appropriate nouns can usually be used in elementary sentences like The leather had some colour, but in many cases they have a more or less obligatory modifier. For example, you can hardly mention that an object has some colour without qualifying that colour at all. About 300 French nouns are appropriate in at least one adjectival sentence and have an obligatory modifier. They enter in a number of sentence structures related by several syntactic transformations. The appropriateness of the noun and the fact that the modifier is obligatory are reflected in these transformations. The description of these syntactic phenomena provides a basis for a classification of these nouns. It also concerns the lexical properties of thousands of predicative adjectives, and in particular the relations between the sentence without the noun: The leather was yellow and the adjectival sentence with the noun: The colour of the leather was yellow.
Keywords: lexicon-grammar, syntax, lexicology.
1997. "Les Mots. Un demi-siècle de traitements" ["Words: half a century of processing"], Traitement automatique des langues (t.a.l.) 38(2), État de l'art, Paris: ATALA, pp. 47-68.
Abstract. We survey those domains of natural language processing where the notion of word can be considered as the fundamental unit. We examine the results aimed at, the results achieved, the data acquired and the methods used in these domains. Our ambition is that this critical evaluation will help orient research and development efforts towards practical results.
Keywords: natural language.
1998. "Lexical disambiguation with fine-grained tagsets", in J. Ginzburg et al. eds., The Tbilisi Symposium in Logic, Language and Computation: Selected Papers (19-22 October 1995, Gudauri, Georgia), Studies in Logic, Language and Information, Cambridge: Cambridge University Press; Stanford: CSLI and FoLLI, pp. 203-210.
Abstract. We describe the mathematical models underlying two constraint-based, finite-state methods for lexical disambiguation with fine-grained tagsets. They are more powerful variants of the methods described by Roche 1992 and Silberztein 1993. Both have the full theoretical expressive power of finite-state devices.
Keywords: natural language, lexical ambiguity, finite-state automata.
Strahil Ristov, Éric Laporte, 1999. "Ziv Lempel Compression of Huge Natural Language Data Tries Using Suffix Arrays", in Combinatorial Pattern Matching, 10th Annual Symposium (Warwick University, UK, July 1999), Proceedings, LNCS 1645, M. Crochemore and M. Paterson eds., Berlin: Springer, pp. 196-211.
Abstract. We present a data structure that is very efficient, in terms of space and access speed, for storing huge natural language data sets. The structure is a Ziv Lempel compressed linked-list trie and goes a step beyond the directed acyclic word graph in automata compression. We use the structure to store DELAF, a huge French lexicon with syntactic, grammatical and lexical information associated with each word. The compressed structure can be produced in O(N) time using suffix trees to find repetitions in the trie. For large data sets, space requirements are more prohibitive than time, so suffix arrays are used instead, with compression time complexity O(N log N) for all but the largest data sets.
Keywords: natural language, data compression.
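The automata-compression baseline mentioned in the abstract, merging identical subtrees of a trie into a directed acyclic word graph, can be sketched as follows (toy word list; the paper's Ziv-Lempel factoring of the linked-list representation compresses further by also sharing repeated internal fragments):

```python
# Sketch of the DAWG baseline: build a trie, then merge structurally
# identical subtrees bottom-up so shared suffixes are stored once.

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = {}  # end-of-word marker
    return root

def count_nodes(node):
    """Plain trie size: every node counted once per position."""
    return 1 + sum(count_nodes(child) for child in node.values())

def merge_subtrees(node, registry):
    """Bottom-up: replace structurally identical subtrees by one instance."""
    for ch, child in list(node.items()):
        node[ch] = merge_subtrees(child, registry)
    signature = tuple(sorted((ch, id(child)) for ch, child in node.items()))
    return registry.setdefault(signature, node)

def count_unique(node, seen):
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_unique(child, seen) for child in node.values())

def contains(node, word):
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node

words = ["tapping", "topping", "tapped", "topped"]
trie = build_trie(words)
before = count_nodes(trie)         # 22 nodes in the plain trie
dawg = merge_subtrees(trie, {})
after = count_unique(dawg, set())  # 10 nodes once suffixes are shared
assert contains(dawg, "tapped") and not contains(dawg, "tapp")
```

The merge preserves the accepted word set while sharing the common suffixes (-ing, -ed, and the repeated -pp- interior), which is the effect the paper pushes further with Ziv-Lempel compression.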
Éric Laporte, Anne Monceaux, 1999. "Elimination of lexical ambiguities by grammars. The ELAG system", Lingvisticae Investigationes XXII, Amsterdam/Philadelphia: Benjamins, pp. 341-367.
Abstract. We present a new, INTEX-compatible formalism for the description of distributional constraints, ELAG (Elimination of lexical ambiguity by grammars). The constraints may be checked against text, and the lexical ambiguity of the text may thus be partly resolved. We describe and exemplify the main properties of ELAG with the aid of simple rules, formalizing exploitable constraints. We specify in detail the effect of applying an ELAG rule or grammar to a text. We examine the practical properties of the formalism from the point of view of a rule writer. We describe our evaluation procedure for the lexical disambiguation results.
Keywords: natural language, lexical ambiguity, finite-state automata.
2001. "Reduction of lexical ambiguity", Lingvisticae Investigationes XXIV:1, Amsterdam/Philadelphia: Benjamins, pp. 67-103.
Abstract. We examine various issues faced during the elaboration of lexical disambiguators, e.g. issues related to the linguistic analyses underlying disambiguators, and we exemplify these issues with grammatical constraints. We also examine computational problems: the influence of the granularity of tagsets, the definition of realistic and useful objectives, and the construction of the data required for the reduction of ambiguity; and we study how they are connected with linguistic problems. We show why a formalism is required for automatic ambiguity reduction, we analyse its function and we present a typology of such formalisms.
Keywords: natural language, lexical ambiguity.
2005. "Une
classe d'adjectifs de localisation", in
Cahiers de lexicologie 86, Les adjectifs non prédicatifs,
Paris: Garnier, pp. 145-161.
Abstract. We propose a homogeneous class of French location adjectives, ADJLOC, and a lexicon-grammar approach to their description. The adjectives are those which never constitute a predicate with a support verb, and optionally or obligatorily occur in free sentences like This is the south front of the house. ADJLOC adjectives admit various other syntactic constructions. Thus, some of them occur in a sentence with have related to a sentence with a locative preposition: the car has a rear bumper, the car has a bumper in its rear part. Two nominalization relations lead to nominal constructions: this is the central area of the screen, this is the centre of the screen, this is the area of the centre of the screen. The constructions discussed in this article are represented in a table of syntactic properties.
Keywords: lexicology, adjective, location.
2005. "Lexicon management and standard formats", Archives of Control Sciences 15:3, pp. 329-340; also in Proceedings of the Language and Technology Conference, Poznań (Poland): Adam Mickiewicz University, pp. 318-322.
Abstract. International standards for lexicon formats are in preparation. To a certain extent, the proposed formats converge with prior results of standardization projects. However, their adequacy for (i) lexicon management and (ii) lexicon-driven applications has been little debated in the past, nor is it being debated as part of the present standardization effort. We examine these issues. IGM has developed XML formats compatible with the emerging international standards, and we report experimental results on large-coverage lexicons.
Keywords: language resource, lexicon management,
standardization, inflection, morphology.
Marcelo C.M. Muniz, Maria das Graças V. Nunes, Eric Laporte, 2005. "UNITEX-PB, a set of flexible language resources for Brazilian Portuguese", in Proceedings of the Workshop on Information Technology and Human Language (TIL), São Leopoldo (Brazil): Unisinos, pp. 2059-2068.
Abstract. This work documents the design and development of several computational linguistic resources for Brazilian Portuguese, following the formal methodology used by the corpus processing system UNITEX. The delivered resources include computational lexicons, libraries to access compressed lexicons, and additional tools to validate those resources.
Keywords: language resource, lexicon management, inflection, morphology.
Hyun-gue HUH, Eric Laporte, 2005. "A resource-based Korean morphological annotation system", in Companion to the Proceedings of the International Joint Conference on Natural Language Processing, Jeju (Korea), pp. 37-42.
Abstract. We describe a resource-based method of morphological annotation of written Korean text. Korean is an agglutinative language. The output of our system is a graph of morphemes annotated with accurate linguistic information. The language resources used by the system can be easily updated, which allows users to control the evolution of the performances of the system. We show that morphological annotation of Korean text can be performed directly with a lexicon of words and without morphological rules.
Keywords:
language resource, Korean, annotation, morphology, agglutinative language.
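The lexicon-driven annotation described above, segmenting an agglutinative token by dictionary lookup alone, without morphological rules, can be illustrated with a toy sketch; the romanized entries and tags below are invented for illustration, not taken from the paper's resources:

```python
# Toy lexicon-driven segmentation of an agglutinative token: every way
# to cover the token with dictionary entries is one analysis, i.e. one
# path of the morpheme graph. Entries and tags are invented.
LEXICON = {
    "hak":    ["NOUN-root"],
    "kyo":    ["NOUN-root"],
    "hakkyo": ["NOUN"],    # 'school'
    "e":      ["POSTP"],   # locative postposition
    "eso":    ["POSTP"],   # locative postposition
}

def analyses(token):
    """Return every segmentation of `token` into (morpheme, tag) pairs."""
    if not token:
        return [[]]
    results = []
    for i in range(1, len(token) + 1):
        prefix = token[:i]
        for tag in LEXICON.get(prefix, []):
            for rest in analyses(token[i:]):
                results.append([(prefix, tag)] + rest)
    return results

res = analyses("hakkyoeso")
# Two competing analyses survive: hak+kyo+eso and hakkyo+eso; choosing
# between such paths is what the annotation graph leaves to later stages.
```

In a full system the output would be kept as a shared graph of morphemes rather than an enumerated list, but the principle, exhaustive lookup against an updatable lexicon of words, is the same.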
Ivan Berlocher, Hyun-gue HUH, Eric Laporte, Jee-sun NAM, 2006. "Morphological annotation of Korean with directly maintainable resources", in Proceedings of LREC, Genoa.
Keywords: language resource, evaluation, Korean, annotation, morphology, agglutinative language.
Olivier Blanc, Matthieu Constant, Éric Laporte, 2006. "Outilex, plate-forme logicielle de traitement de textes écrits" ["Outilex, a software platform for written text processing"], Verbum ex machina. Proceedings of TALN, Cahiers du Cental series 2(1), Presses universitaires de Louvain, pp. 83-92.
Abstract. The Outilex software platform, which will be made available to research, development and industry, comprises software components implementing all the fundamental operations of written text processing: processing without lexicons, exploitation of lexicons and grammars, language resource management. All data are structured in XML formats, and also in more compact formats, either readable or binary, whenever necessary; the required format converters are included in the platform; the grammar formats allow for combining statistical approaches with resource-based approaches. Manually constructed lexicons for French and English, originating from the LADL, and of substantial coverage, will be distributed with the platform under LGPL-LR license.
Keywords: lexical tagging, linguistic resource, lexicon, grammar, finite
automaton, XML.
Éric Laporte, Sébastien Paumier, 2006. "Graphes paramétrés et outils de lexicalisation" ["Parameterized graphs and lexicalization tools"], poster, Verbum ex machina. Proceedings of TALN, Cahiers du Cental series 2(1), Presses universitaires de Louvain, pp. 532-540.
Abstract. Shifting to a lexicalized grammar reduces the number of parsing errors and improves application results. However, such an operation affects a syntactic parser in all its aspects. One of our research objectives is to design a realistic model for grammar lexicalization. We carried out experiments for which we used a grammar with a very simple content and formalism, and a very informative syntactic lexicon, the lexicon-grammar of French elaborated by the LADL. Lexicalization was performed by applying the parameterized-graph approach. Our results tend to show that most information in the lexicon-grammar can be transferred into a grammar and exploited successfully for the syntactic parsing of sentences.
Keywords: lexicalisation, parser,
syntactic parsing, French, lexicon-grammar.
Maria Carmelita P. Dias, Éric Laporte, Christian Leclère, 2006. "Verbs with very strictly selected complements",
Collocations and Idioms: The First Nordic Conference on Syntactic Freezes, University of Joensuu,
Finland.
Abstract. We discuss the characteristics and behaviour of two parallel classes of verbs in two Romance languages, French and Portuguese. Examples of these verbs are Port. abater [gado] and Fr. abattre [bétail], both meaning "slaughter [cattle]". In both languages, the definition of the class of verbs includes several features:
- They have only one essential complement, which is a direct object.
- The nominal distribution of the complement is very limited, i.e., few nouns can be selected as head nouns of the complement. However, this selection is not restricted to a single noun, as would be the case for verbal idioms such as Fr. monter la garde "mount guard".
- We excluded from the class constructions which are reductions of more complex constructions, e.g. Port. afinar [instrumento] com "tune [instrument] with".
Keywords: multi-word expressions, syntax, French, Portuguese, lexicon-grammar.
Éric Laporte, 2007. "Evaluation of a Grammar of French Determiners", Annals of the 27th Congress of the Brazilian Society of Computation, Workshop on Information Technology and Human Language (TIL), Rio de Janeiro.
Abstract. Existing syntactic grammars of natural languages, even with a far from complete coverage, are complex objects. Assessments of the quality of parts of such grammars are useful for the validation of their construction. We evaluated the quality of a grammar of French determiners that takes the form of a recursive transition network. Applying this local grammar yields deeper syntactic information than chunking or than the information available in treebanks. We performed the evaluation by comparison with a corpus independently annotated with information on determiners. We obtained 86% precision and 92% recall on text not tagged for parts of speech.
Keywords: determiner, definite, indefinite, quantity, syntax, French, grammar, local grammar, evaluation, annotated corpus.
2008 (to appear). "Exemples attestés et exemples construits dans la pratique du lexique-grammaire" ["Attested and constructed examples in the practice of lexicon-grammar"], Mémoires de la Société de linguistique de Paris, Louvain/Paris/Dudley: Peeters.
Abstract. Croft (1993) contrasts an ‘experimental method’ with an ‘observational method’, thus renewing the discussion between introspective linguistics and corpus linguistics, by suggesting a parallel with experimental sciences, which these terms come from. The example of lexicon-grammar, a method of syntactic-semantic description constructed with explicit reference to experimental sciences, confirms that formulating rules in accordance with the real usage of a language is not only a matter of observing examples, but also that it nevertheless requires intensive observation of examples, as well as rigorous methodological precautions in this observation. Thus, the apparently opposed traditions of introspective linguistics and of corpus linguistics are complementary and should be combined for the success of such an enterprise. These thoughts are an invitation for linguists to overcome their historical resistance to combining both types of methods. Similarly, in natural language processing, most of the community sticks to the stochastic approach, which amounts to giving up co-operation between computer technology and descriptive linguistics.
Keywords: corpus linguistics, introspection.