[Mead] Summarizing French

GARCIA FLORES Jorge 704360 IRSN jorge.garcia-flores at cea.fr
Thu Mar 27 11:22:56 EST 2008


FIRST SCENARIO (without touching Document.pm). Here is the error produced when
I run MEAD on a one-file cluster (a single UTF-8 file containing French characters):

11111111111111111111111111111111111111111111111111111111
/home/jg704360/evaluation/mead/bin$ ./mead.pl MORCAS 
Using system rc-file: /home/jg704360/evaluation/mead/bin/../.meadrc
Warning: Can't find user rc-file
Cluster: /home/jg704360/evaluation/mead/bin/../data/MORCAS/MORCAS.cluster
iconv: Séquence d'échappement illégale à la position 196   [in English: illegal escape sequence at position 196]

no element found at line 6, column 53, byte 196 at /usr/local/lib/perl/5.8.8/XML/Parser.pm line 187

no element found at line 1, column 0, byte 0 at /usr/local/lib/perl/5.8.8/XML/Parser.pm line 187

no element found at line 1, column 0, byte 0 at /usr/local/lib/perl/5.8.8/XML/Parser.pm line 187
11111111111111111111111111111111111111111111111111111111111

SECOND SCENARIO (with line 42 of Document.pm::read_document commented out):
open (INSTREAM, "iconv -f BIG5 -t UTF-8 $document_filename |");
and replaced with:
open (INSTREAM, "$document_filename");

I get a blank summary: every sentence is empty except those that contain no
accented French characters.

222222222222222222222222222222222222222222222222222222
/home/jg704360/evaluation/mead/bin$ ./mead.pl MORCAS
Using system rc-file: /home/jg704360/evaluation/mead/bin/../.meadrc
Warning: Can't find user rc-file
Cluster: /home/jg704360/evaluation/mead/bin/../data/MORCAS/MORCAS.cluster
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]  Cette loi peut, bien entendu, s'appliquer aux sportifs, spectateurs, 
organisateurs et journalistes venus assister aux Jeux.
[31]  D'autres groupes sociaux, ethniques, religieux ou politiques profiteront 
de l'afflux de journalistes pour attirer l'attention sur leur situation et 
leurs revendications.
22222222222222222222222222222222222222222222222222222222222222222
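
Rather than commenting out line 42, the BIG5 conversion could be applied only when the input is not already UTF-8. The following is a minimal sketch under stated assumptions: the helper name open_document_as_utf8 and the $source_encoding parameter are hypothetical (MEAD hard-codes BIG5 and has no such setting that I know of). It would only avoid the iconv failure from the first scenario; it does not explain the blank sentences in the second.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MEAD: open a document for XML parsing,
# converting it to UTF-8 with iconv only when $source_encoding (an assumed
# parameter, e.g. read from .meadrc) is something other than UTF-8.
sub open_document_as_utf8 {
    my ($document_filename, $source_encoding) = @_;
    my $enc = uc $source_encoding;
    my $fh;
    if ($enc eq 'UTF-8' || $enc eq 'UTF8') {
        # Already UTF-8: hand the raw bytes through unchanged;
        # XML::Parser (expat) understands UTF-8 input natively.
        open($fh, '<:raw', $document_filename)
            or die "Cannot open $document_filename: $!";
    } else {
        # Other encodings (e.g. BIG5, as on the original line 42): pipe the
        # file through iconv. The list form of the piped open avoids passing
        # the file name through a shell.
        open($fh, '-|', 'iconv', '-f', $source_encoding, '-t', 'UTF-8',
             $document_filename)
            or die "Cannot start iconv on $document_filename: $!";
    }
    return $fh;
}

# Possible use inside read_document (hypothetical):
# my $in = open_document_as_utf8($document_filename, 'UTF-8');
```

Because the helper returns a lexical filehandle, the rest of read_document would have to read from that handle instead of the bareword INSTREAM.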

Any ideas? 


Greetings 

Jorge




On Thursday 27 March 2008 16:33, radev at umich.edu wrote:
> Mead should be 8-bit (UTF-8) compliant. What sort of error are you
> getting?
>
> Drago
>
> > Hi. At the French Atomic Energy Commission we would like
> > to summarize French documents with MEAD. I wonder whether there is already a
> > French version of the IDF database, or at least a way to summarize
> > <docsent> documents containing non-ASCII characters (UTF-8 encoding). Right now
> > it is impossible to process <docsent> documents with accented characters
> > (MEAD produces an error).
> >
> > Thanks in advance for your answer
> >
> > Jorge Garcia-Flores
> > Postdoc at CEA/IRSN
> > Centre de Fontenay-aux-Roses
> > Laboratoire d'Ingénierie de la Connaissance Multimédia Multilingue
> > (LIC2M) (Multimedia and Multilingual Knowledge Engineering Laboratory)
> > Bat. 38-2 ; 18, rue du Panorama ; BP 6
> > 92265 Fontenay aux Roses Cedex ; France
> > _______________________________________________
> > Mead mailing list
> > Mead at lists.si.umich.edu
> > http://lists.si.umich.edu/mailman/listinfo/mead

