Problem with text processing plugin

User: "datasunny"
New Altair Community Member
Updated by Jocelyn
Hi all,

I encountered a problem with the RapidMiner (RM) text processing plugin.
The program was working fine before, but it now fails for some text files that contain non-ASCII characters.
The setup uses the "Process Documents from Files" operator, which contains the following chain:
Transform Cases -> Tokenize -> Filter Stopwords -> Stem -> Filter Tokens (by Length)

Is this a bug in the text processing plugin, or is something wrong with my setup/program? Thanks.
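
In case it helps to narrow this down, here is a minimal sketch in plain Java for listing which input files contain bytes outside 7-bit ASCII. It is only a diagnostic aid: `NonAsciiScanner` and its methods are hypothetical names, not part of RapidMiner or the plugin, and it assumes the documents sit directly in one directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper; not part of RapidMiner or the text processing plugin.
class NonAsciiScanner {

    // Returns the index of the first byte outside 7-bit ASCII, or -1 if there is none.
    static int firstNonAsciiOffset(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }

    // Lists regular files directly under dir that contain at least one non-ASCII byte.
    static List<Path> filesWithNonAscii(Path dir) throws IOException {
        List<Path> hits = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                if (Files.isRegularFile(p)
                        && firstNonAsciiOffset(Files.readAllBytes(p)) >= 0) {
                    hits.add(p);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the document directory is passed as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : filesWithNonAscii(dir)) {
            System.out.println(p);
        }
    }
}
```

Running it with the document directory as the only argument prints the matching paths, which can then be compared against the files on which the process fails.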

--------------------------------------------------------------------------------------------------
SEVERE: Process failed: operator cannot be executed (The name "lnêäûð6ûonxßvâisÿˆïqwòb-ûfåàãwcû-kžîžìeî" is not legal for JDOM/XML Namespace prefixs: Namespace prefixes cannot contain the character "ˆ".). Check the log messages...
org.jdom.IllegalNameException: The name "lnêäûð6ûonxßvâisÿˆïqwòb-ûfåàãwcû-kžîžìeî" is not legal for JDOM/XML Namespace prefixs: Namespace prefixes cannot contain the character "ˆ".
...
...
---------------------------------------------------------------------------------------------------
Exception in thread "main" org.jdom.IllegalNameException: The name "home" is not legal for JDOM/XML attributes: XML names cannot begin with the character "h".
at org.jdom.Attribute.setName(Attribute.java:361)
at org.jdom.Attribute.<init>(Attribute.java:228)
at org.jdom.Attribute.<init>(Attribute.java:276)
at org.jdom.DefaultJDOMFactory.attribute(DefaultJDOMFactory.java:93)
at org.jdom.input.SAXHandler.startElement(SAXHandler.java:544)
at org.ccil.cowan.tagsoup.Parser.push(Parser.java:794)
at org.ccil.cowan.tagsoup.Parser.rectify(Parser.java:1061)
at org.ccil.cowan.tagsoup.Parser.stagc(Parser.java:1016)
at org.ccil.cowan.tagsoup.HTMLScanner.scan(HTMLScanner.java:388)
at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:449)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:851)
at com.rapidminer.operator.text.io.filereader.HTMLFileReader.readStream(HTMLFileReader.java:72)
at com.rapidminer.operator.text.io.filereader.AbstractFileReader.readFile(AbstractFileReader.java:37)
at com.rapidminer.operator.text.io.FileDocumentInputIterator.next(FileDocumentInputIterator.java:94)
at com.rapidminer.operator.text.io.FileDocumentInputIterator.next(FileDocumentInputIterator.java:43)
at com.rapidminer.operator.text.io.AbstractDocumentInputOperator.doWork(AbstractDocumentInputOperator.java:228)
at com.rapidminer.operator.Operator.execute(Operator.java:833)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
at com.rapidminer.operator.Operator.execute(Operator.java:833)
at com.rapidminer.Process.run(Process.java:925)
at com.rapidminer.Process.run(Process.java:848)
at com.rapidminer.Process.run(Process.java:807)
at com.rapidminer.Process.run(Process.java:802)
at com.rapidminer.Process.run(Process.java:792)
at Filter.filter(PornFilter.java:84)
at Filter.main(PornFilter.java:128)
