Mining a PDF document
Gjor
I'm new to RapidMiner. I would like to mine a PDF to create a word and number vector. I am using the following operators:
1. Read Document (content type: PDF, encoding: SYSTEM)
2. Process Documents from Data (prune method: absolute, datamanagement: double_sparse_array)
Inside Process Documents from Data:
2.a Extract Information (query type: string matching)
2.b Tokenize (mode: non letters)
2.c Transform Cases (transform to: lower case)
Error Message: com.rapidminer.operator.text.Document cannot be cast to com.rapidminer.example.ExampleSet
Stack trace:
------------
Exception: java.lang.ClassCastException
Message: com.rapidminer.operator.text.Document cannot be cast to com.rapidminer.example.ExampleSet
Stack trace:
com.rapidminer.operator.text.io.ExampleSetDocumentInputOperator.getTextObjects(ExampleSetDocumentInputOperator.java:110)
com.rapidminer.operator.text.io.AbstractDocumentInputOperator.doWork(AbstractDocumentInputOperator.java:224)
com.rapidminer.operator.Operator.execute(Operator.java:833)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
com.rapidminer.operator.Operator.execute(Operator.java:833)
com.rapidminer.Process.run(Process.java:925)
com.rapidminer.Process.run(Process.java:848)
com.rapidminer.Process.run(Process.java:807)
com.rapidminer.Process.run(Process.java:802)
com.rapidminer.Process.run(Process.java:792)
com.rapidminer.gui.ProcessThread.run(ProcessThread.java:63)
Hi Neil. I'm getting "com.rapidminer.operator.text.Document cannot be cast to com.rapidminer.example.ExampleSet". The sequence is: 1. Read Document (PDF) ---> 2. Process Documents from Data, containing 2.a Tokenize and 2.b Transform Cases. I'm trying to create a word vector. Thank you for your assistance.
Andrew2
Hello
The Read Document operator outputs a Document, whereas Process Documents from Data expects an ExampleSet as its input. That mismatch is what causes the ClassCastException.
One option is to insert a Documents to Data operator between them.
Another, better option would be to use the Read Documents from Files operator.
regards
Andrew
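
For readers who want to see what the inner steps from the question compute, here is a rough sketch in plain Python of tokenizing on non-letters, lower-casing, and counting terms into a word vector. It is only an illustration under assumptions and not RapidMiner's implementation: the function names (tokenize_non_letters, word_vector, to_matrix) are made up for this example, and the regular expression is an approximation of the "non letters" tokenization mode. Note that in this sketch digits are discarded, because only runs of letters are kept as tokens.

```python
import re
from collections import Counter


def tokenize_non_letters(text):
    # Hypothetical helper: keep runs of letters, treating every
    # non-letter character (digits, punctuation, spaces) as a separator.
    return re.findall(r"[^\W\d_]+", text)


def word_vector(text):
    # Lower-case each token, then count occurrences per term.
    tokens = [t.lower() for t in tokenize_non_letters(text)]
    return Counter(tokens)


def to_matrix(texts):
    # One row of term counts per document, with columns ordered
    # by the sorted vocabulary across all documents.
    counts = [word_vector(t) for t in texts]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[w] for w in vocabulary] for c in counts]
    return vocabulary, rows


if __name__ == "__main__":
    vocabulary, rows = to_matrix(["The cat, the dog 42", "Dog-food"])
    print(vocabulary)
    for row in rows:
        print(row)
```

Each row here plays the role of one example in the resulting word vector table, with one column per term.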