"automate rapidminer process in java program"
datasunny
New Altair Community Member
Hi all,
I use rapidminer 5.1.011 to design a text classification process, it's working fine in GUI. Now i'm trying to automate this process in java program. I wrote a simple Java program and when i run it i got the following errors:
I use rapidminer 5.1.011 to design a text classification process, it's working fine in GUI. Now i'm trying to automate this process in java program. I wrote a simple Java program and when i run it i got the following errors:
Process config XML:
com.rapidminer.operator.UserError: Could not read file '/home/some_user/wordlist': java.io.IOException: Cannot read from XML stream, wrong format: WordList : WordList.
at com.rapidminer.operator.io.IOObjectReader.read(IOObjectReader.java:100)
at com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:123)
at com.rapidminer.operator.Operator.execute(Operator.java:833)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:369)
at com.rapidminer.operator.Operator.execute(Operator.java:833)
at com.rapidminer.Process.run(Process.java:920)
at com.rapidminer.Process.run(Process.java:843)
at com.rapidminer.Process.run(Process.java:802)
at com.rapidminer.Process.run(Process.java:797)
at com.rapidminer.Process.run(Process.java:787)
at ProcessCreator.main(ProcessCreator.java:18)
Caused by: java.io.IOException: Cannot read from XML stream, wrong format: WordList : WordList
at com.rapidminer.tools.XMLSerialization.fromXML(XMLSerialization.java:141)
...
Java code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.011">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.011" expanded="true" name="Process">
<process expanded="true" height="695" width="989">
<operator activated="true" class="read" compatibility="5.1.011" expanded="true" height="60" name="Read" width="90" x="45" y="210">
<parameter key="object_file" value="/home/some_user/wordlist"/>
<parameter key="io_object" value="WordList"/>
</operator>
<operator activated="true" class="text:process_document_from_file" compatibility="5.1.003" expanded="true" height="76" name="Process Documents from Files" width="90" x="179" y="210">
<list key="text_directories">
<parameter key="porn" value="/home/some_user/test_data/porn"/>
</list>
<process expanded="true" height="695" width="989">
<operator activated="true" class="web:extract_html_text_content" compatibility="5.1.004" expanded="true" height="60" name="Extract Content" width="90" x="45" y="30"/>
<operator activated="true" class="text:transform_cases" compatibility="5.1.003" expanded="true" height="60" name="Transform Cases" width="90" x="180" y="30"/>
<operator activated="true" class="text:tokenize" compatibility="5.1.003" expanded="true" height="60" name="Tokenize" width="90" x="315" y="30"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.003" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="450" y="30"/>
<operator activated="true" class="text:stem_snowball" compatibility="5.1.003" expanded="true" height="60" name="Stem (Snowball)" width="90" x="585" y="30"/>
<operator activated="true" class="text:filter_by_length" compatibility="5.1.003" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="787" y="30">
<parameter key="min_chars" value="2"/>
<parameter key="max_chars" value="99"/>
</operator>
<connect from_port="document" to_op="Extract Content" to_port="document"/>
<connect from_op="Extract Content" from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
<connect from_op="Stem (Snowball)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="210">
<parameter key="attribute_filter_type" value="no_missing_values"/>
<parameter key="attributes" value="|label|text"/>
</operator>
<operator activated="true" class="read_model" compatibility="5.1.011" expanded="true" height="60" name="Read Model" width="90" x="380" y="30">
<parameter key="model_file" value="/home/some_user/svm_model"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.1.011" expanded="true" height="76" name="Set Role" width="90" x="447" y="210">
<parameter key="name" value="label"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="apply_model" compatibility="5.1.011" expanded="true" height="76" name="Apply Model" width="90" x="581" y="120">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="5.1.011" expanded="true" height="76" name="Performance" width="90" x="715" y="120">
<list key="class_weights"/>
</operator>
<connect from_op="Read" from_port="output" to_op="Process Documents from Files" to_port="word list"/>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Read Model" from_port="output" to_op="Apply Model" to_port="model"/>
<connect from_op="Set Role" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Appreciated ur help!
import com.rapidminer.tools.OperatorService;
import com.rapidminer.RapidMiner;
import com.rapidminer.Process;
import java.io.*;
import java.io.IOException;
public class ProcessCreator {
public static void main(String[] argv) {
try {
RapidMiner.setExecutionMode(com.rapidminer.RapidMiner.ExecutionMode.EMBEDDED_WITHOUT_UI);
RapidMiner.init();
Process process = new Process(new File(argv[0]));
// perform process
process.run();
} catch (Exception e) { e.printStackTrace(); }
}
}
0
Answers
-
It looks like i have to specify mode as "ExecutionMode.COMMAND_LINE" instead of "ExecutionMode.EMBEDDED_WITHOUT_UI", though i still dont know the difference between these two ???
From API doc:
public static final RapidMiner.ExecutionMode COMMAND_LINE
RM is executed using RapidMinerCommandLine.main(String[]).
public static final RapidMiner.ExecutionMode EMBEDDED_WITHOUT_UI
RM is embedded into another program.
0 -
Hi,
have a look at the constructor of the ExecutionMode enum:
COMMAND_LINE sets loadManagedExtensions to true, however EMBEDDED_WITHOUT_UI sets it to false.
private ExecutionMode(boolean isHeadless, boolean canAccessFilesystem, boolean hasMainFrame, boolean loadManagedExtensions) {
//...
}
WordList is an IOObject from the Text Processing Extension, therefore managed plugins need to be loaded, otherwise a process using anything from these plugins will fail (as yours did).
Regards,
Marco0 -
If I have a model build up using rapidminer and need to check the exact java code for this model, like where is the file that does stemporter in code for example, can I do that?0
-
Hi.,
Yes, the source is here, in the Plugins directory of the SVN repository..
https://rapidminer.svn.sourceforge.net/svnroot/rapidminer/Plugins/TextProcessing/Vega/src/com/rapidminer/operator/text/io/stemmer/PorterStemming.java0 -
Thanks haddock, I did go to the com.rapidminer.operator.text package and it was empty, may be when I did the run configuration steps to make RM runs through eclipse did miss something, actually I have other packages showing as empty, can anybody tell me how to get the contents of these packages? Thanks0
-
How did you input the process through command line. Please let me know.0
-
Also, what do i input as command line argument to the program?0
-
Hi,
for RapidMiner Studio 6, call the rapidminer-batch.bat/rapidminer-batch.sh file and pass the repository location, i.e. "//Local Repository/My folder/my_process". Alternatively you can add "-f" as a parameter followed by a whitespace and then the path to the .rmp file on the harddrive, i.e. -f C:\Users\xyz\.RapidMiner\repositories\Local Repository\Process.rmp
Regards,
Marco0 -
Thank you0