Similarité entre textes basées sur les noms propres
Mots clés : Similarité/ Classification hiérarchique/ Noms propres.
Similarity between texts based on proper names
Abstract: Proper names make up about 10% of English or French newspaper articles. Their quantity and informational quality are already exploited in various Information Extraction systems. Proper names have been widely studied in the MUC conferences, which were designed to promote research in Information Extraction. We have created our own named entity extraction tool based on a linguistic description with automata. The extracted names are used in information retrieval and in a topic description of the clusters. We verify the value of using proper names in a similarity measure to improve clustering. This measure merges a similarity based on all the words with a similarity based on the proper names.
Key words: Similarity / Hierarchical clustering / Proper names.
Revue d'Information Scientifique & Technique Vol.12(2) 2002: 61-76
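
The abstract describes a measure that merges a similarity computed on all the words with a similarity computed on the proper names alone, but it does not say how the two are combined. The sketch below is one plausible reading, not the authors' method: it assumes cosine similarity over word counts and a linear weighting controlled by a hypothetical parameter `alpha`. The proper names are passed in as ready-made lists; the automata-based extraction tool is not reproduced here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no similarity to anything
    return dot / (norm_a * norm_b)


def merged_similarity(words_a, names_a, words_b, names_b, alpha=0.5):
    """Blend all-word similarity with proper-name similarity.

    alpha is a hypothetical weight (assumption, not from the article):
    0 uses only the words, 1 uses only the proper names.
    """
    sim_words = cosine(Counter(words_a), Counter(words_b))
    sim_names = cosine(Counter(names_a), Counter(names_b))
    return (1 - alpha) * sim_words + alpha * sim_names


if __name__ == "__main__":
    doc1_words = ["paris", "hosts", "summit", "chirac", "speaks"]
    doc1_names = ["paris", "chirac"]
    doc2_words = ["chirac", "visits", "london", "summit"]
    doc2_names = ["chirac", "london"]
    print(merged_similarity(doc1_words, doc1_names, doc2_words, doc2_names))
```

A pairwise matrix of such scores could then feed a hierarchical clustering step, with `alpha` tuned to see whether weighting proper names more heavily improves the clusters.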