"[SOLVED] Setting the parameters of TextMining Extension operator via java!"
Reem
New Altair Community Member
I am trying to set a parameter of an operator via Java, so that I can give my trained model a text file and have it classified.
The operator is "Process Documents from Files"; the parameter is "text directories", which contains "class name" and "directory".
I tried to follow the post http://rapid-i.com/rapidforum/index.php/topic,5807.0.html, but I didn't find the class name in the OperatorCore.xml.
So, where can I find the classes related to the parameters of the text mining extension?
My second question: the parameter has 2 sub-parameters, "class name" and "directory". How do I set them? Using a dot, an underscore, or something else?
Note: the process already runs correctly in RapidMiner; I have now deleted the parameter values and am trying to set them via Java.
Any help or tips are appreciated,
Answers
-
Hi,
the operator class comes from the text extension, so you will find its name in the extension's OperatorsTextProcessing.xml file. The class is com.rapidminer.operator.text.io.FileDocumentInputOperator.
You can use the setListParameter(String, List) method in your case. Just pass it a list of String arrays (each array has one entry per column of the parameter).
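As a rough sketch of the list shape described above: each row is a String[] with one entry per column of the parameter, here the class name and the directory. The class and helper names (TextDirectoryRows, toRows) are hypothetical, and the key "text_directories" is taken from the <list key="text_directories"> element of the process XML posted later in this thread. The setListParameter call itself appears only as a comment, because it needs the RapidMiner jars on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TextDirectoryRows {

    /** Turns class-name -> directory pairs into rows of size 2 (one entry per column). */
    static List<String[]> toRows(Map<String, String> directoriesByClass) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, String> entry : directoriesByClass.entrySet()) {
            rows.add(new String[] { entry.getKey(), entry.getValue() });
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> directories = new LinkedHashMap<>();
        directories.put("Unknown", "D:\\CORPUS\\Testing");
        List<String[]> rows = toRows(directories);
        // With RapidMiner on the classpath, the rows would then be handed over, e.g.:
        // operator.setListParameter("text_directories", rows);
        for (String[] row : rows) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

Each row carries both columns at once, so there is no separate "class name" or "directory" key to address with a dot or an underscore.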
Regards,
Marco
-
Thanks for replying!
I ran the following Java code, but got the error shown after it:
package RapidMiner;
import com.rapidminer.Process;
import com.rapidminer.RapidMiner;
import com.rapidminer.example.Attribute;
import com.rapidminer.example.Example;
import com.rapidminer.example.ExampleSet;
import com.rapidminer.operator.IOContainer;
import com.rapidminer.operator.Operator;
import com.rapidminer.operator.OperatorException;
import com.rapidminer.repository.ProcessEntry;
import com.rapidminer.repository.RepositoryException;
import com.rapidminer.repository.RepositoryLocation;
import com.rapidminer.tools.XMLException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
public class RapidMinerClassifier {
static Example exampleSet = null;
String category;
public RapidMinerClassifier() {
RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
RapidMiner.init();
}//end of constructor
public String classifyDocumentByTopic(String rapidMinerProcess, String DocumentsToBeClassfied) throws RepositoryException {
ExampleSet resultSet = null;
try {
// loads the process from the repository
RepositoryLocation pLoc = new RepositoryLocation(rapidMinerProcess);
ProcessEntry pEntry = (ProcessEntry) pLoc.locateEntry();
String processXML = pEntry.retrieveXML();
Process myProcess = new Process(processXML);
Operator ProcessDocumentsOperator = myProcess.getOperator("Process Documents from Files");
ProcessDocumentsOperator.setParameter("FileDocumentInputOperator.TEXT_DIRECTORIES.DIRECTORY", DocumentsToBeClassfied);
ProcessDocumentsOperator.setParameter("FileDocumentInputOperator.TEXT_DIRECTORIES.CLASS_NAME", "Unknown");
List<String[]> parametersList = new ArrayList<String[]> ();
String[] directoryParameterValues = {"DIRECTORY", DocumentsToBeClassfied};
String[] classNameParameterValues = {"CLASS_NAME", "Unknown"};
parametersList.add(directoryParameterValues);
parametersList.add(classNameParameterValues);
ProcessDocumentsOperator.setListParameter("FileDocumentInputOperator.TEXT_DIRECTORIES", parametersList) ;
IOContainer ioResult = myProcess.run();
if (ioResult.getElementAt(0) instanceof ExampleSet) {
resultSet = (ExampleSet) ioResult.getElementAt(0);
}
for (Example example : resultSet) {
Iterator<Attribute> allAtts = exampleSet.getAttributes().allAttributes();
while (allAtts.hasNext()) {
Attribute attribute = allAtts.next();
category = example.getValueAsString(attribute);
System.out.println(category);
}
}
} catch (IOException | XMLException | OperatorException ex) {
ex.printStackTrace();
}
return category;
}//end of classifyDocumentByTopic
public static void main(String[] args) throws RepositoryException {
RapidMinerClassifier rapidMinerClassifier = new RapidMinerClassifier();
rapidMinerClassifier.classifyDocumentByTopic("//Local Repository/SVMtesting.rmp", "D:\\Dropbox\\SeniorProject\\_Spring2014\\_3 Gurus\\Reem_Classification of documents based on Topic\\Corpus\\Processed\\singleFileForRMTesting");
}//end of main
}
INFO: JDBC driver ca.ingres.jdbc.IngresDriver not found. Probably the driver is not installed.
[Fatal Error] :1:1: Premature end of file.
Exception in thread "main" java.lang.NullPointerException
at RapidMiner.RapidMinerClassifier.classifyDocumentByTopic(RapidMinerClassifier.java:37)
at RapidMiner.RapidMinerClassifier.main(RapidMinerClassifier.java:75)
Java Result: 1
Here is the XML content (I created the model using the disabled operators in the following process):
svmModel and svmWordlist are located in the local repository
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
<parameter key="encoding" value="UTF-8"/>
<process expanded="true">
<operator activated="false" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files (2)" width="90" x="42" y="75">
<list key="text_directories">
<parameter key="Geography" value="D:\Dropbox\Senior Project\_Spring2014\_3 Gurus\Reem_Classification of documents based on Topic\Corpus\Processed\Geography"/>
<parameter key="Religion" value="D:\Dropbox\Senior Project\_Spring2014\_3 Gurus\Reem_Classification of documents based on Topic\Corpus\Processed\Religion"/>
<parameter key="Science" value="D:\Dropbox\Senior Project\_Spring2014\_3 Gurus\Reem_Classification of documents based on Topic\Corpus\Processed\Science"/>
</list>
<parameter key="encoding" value="UTF-8"/>
<parameter key="prune_below_rank" value="5.0"/>
<parameter key="prune_above_rank" value="5.0"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" name="Tokenize (3)"/>
<connect from_port="document" to_op="Tokenize (3)" to_port="document"/>
<connect from_op="Tokenize (3)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="false" class="store" compatibility="5.3.015" expanded="true" height="60" name="Store Wordlist" width="90" x="179" y="30">
<parameter key="repository_entry" value="svmWordlist"/>
</operator>
<operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Model" width="90" x="179" y="165">
<parameter key="repository_entry" value="svmModel"/>
</operator>
<operator activated="false" class="x_validation" compatibility="5.3.015" expanded="true" height="112" name="Validation" width="90" x="447" y="75">
<process expanded="true">
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.015" expanded="true" name="SVM">
<parameter key="gamma" value="0.9"/>
<parameter key="C" value="8.0"/>
<parameter key="epsilon" value="0.0010"/>
<list key="class_weights"/>
</operator>
<connect from_port="training" to_op="SVM" to_port="training set"/>
<connect from_op="SVM" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" name="Apply Model">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.3.015" expanded="true" name="Performance"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="false" class="store" compatibility="5.3.015" expanded="true" height="60" name="Store Model" width="90" x="582" y="75">
<parameter key="repository_entry" value="svmModel"/>
</operator>
<operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Wordlist" width="90" x="45" y="300">
<parameter key="repository_entry" value="svmWordlist"/>
</operator>
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="246" y="300">
<list key="text_directories"/>
<process expanded="true">
<connect from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model (2)" width="90" x="447" y="255">
<list key="application_parameters"/>
</operator>
<connect from_op="Process Documents from Files (2)" from_port="example set" to_op="Validation" to_port="training"/>
<connect from_op="Process Documents from Files (2)" from_port="word list" to_op="Store Wordlist" to_port="input"/>
<connect from_op="Retrieve Model" from_port="output" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Validation" from_port="model" to_op="Store Model" to_port="input"/>
<connect from_op="Retrieve Wordlist" from_port="output" to_op="Process Documents from Files" to_port="word list"/>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Apply Model (2)" to_port="unlabelled data"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
I always get the same error when I try to use the local repository as recommended in http://rapid-i.com/rapidforum/index.php/topic,5807.0.html
Another problem: when I run the process from Java without changing the parameter, giving the whole path like \\Local Repository\\SVMTesting.rmp,
the returned output is the training set, not the predicted class of the new unseen documents. So what should I change in the code to get the predicted labels?
And if I change the code to set the operators using the repository path \\Local Repository\\SVMTesting.rmp, the error states that it can't reach the svmModel and svmWordlist files!
Any hints?
Your help is appreciated,
-
Hi,
1) Change your rapidMinerProcess parameter to "//Local Repository/SVMtesting"; file endings do not exist for the repository. That's why you are getting the NPE.
2) Your posted process returns nothing: no operator is connected to the "res" ports on the right side of the process, so I can't tell you why you are receiving the training data instead of the classified results. Note that the classified data also includes the input data, just with additional attribute columns.
3) I don't understand your last sentence. Please include the actual error message.
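Point 1 can be guarded against with a small check before the location string is used. This is only a sketch: the class and helper names (RepositoryPathCheck, stripProcessFileEnding) are hypothetical and it uses plain string handling, no RapidMiner calls. In the real code it would also make sense to check the result of locateEntry() for null before calling retrieveXML() on it.

```java
public class RepositoryPathCheck {

    /** Drops a trailing ".rmp" so the string names a repository entry rather than a file. */
    static String stripProcessFileEnding(String location) {
        String ending = ".rmp";
        if (location.toLowerCase().endsWith(ending)) {
            return location.substring(0, location.length() - ending.length());
        }
        return location;
    }

    public static void main(String[] args) {
        // The location used in the posted code, with the file ending removed.
        System.out.println(stripProcessFileEnding("//Local Repository/SVMtesting.rmp"));
    }
}
```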
Regards,
Marco
-
I ran the following code using the repository information you gave (the process XML follows the code):
package RapidMiner;
import com.rapidminer.Process;
import com.rapidminer.RapidMiner;
import com.rapidminer.example.Attribute;
import com.rapidminer.example.Example;
import com.rapidminer.example.ExampleSet;
import com.rapidminer.operator.IOContainer;
import com.rapidminer.operator.OperatorException;
import com.rapidminer.repository.ProcessEntry;
import com.rapidminer.repository.RepositoryException;
import com.rapidminer.repository.RepositoryLocation;
import com.rapidminer.tools.XMLException;
import java.io.IOException;
public class RapidMinerClassifier1 {
static Example exampleSet = null;
String category;
public RapidMinerClassifier1() {
RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
RapidMiner.init();
}//end of constructor
public String classifyDocumentByTopic(String rapidMinerProcess, String DocumentsToBeClassfied) throws RepositoryException {
ExampleSet resultSet = null;
try {
// loads the process from the repository
RepositoryLocation pLoc = new RepositoryLocation(rapidMinerProcess);
ProcessEntry pEntry = (ProcessEntry) pLoc.locateEntry();
String processXML = pEntry.retrieveXML();
Process myProcess = new Process(processXML);
IOContainer ioResult = myProcess.run();
if (ioResult.getElementAt(0) instanceof ExampleSet) {
resultSet = (ExampleSet) ioResult.getElementAt(0);
}
Attribute att = resultSet.getAttributes().get("prediction");
for (Example example : resultSet) {
example.getValue(att);
}
} catch (IOException | XMLException | OperatorException ex) {
ex.printStackTrace();
}
return category;
}//end of classifyDocumentByTopic
public static void main(String[] args) throws RepositoryException {
RapidMinerClassifier1 rapidMinerClassifier = new RapidMinerClassifier1();
rapidMinerClassifier.classifyDocumentByTopic("//RapidMinerRepository/Applyingk-NNModel", "D:\\CORPUS\\Testing");
}//end of main
}
I don't know why it appeared to you as not connected to the "res" port; I am connecting the "label" port of the Apply Model operator to the "res" port. In RapidMiner, I get the predicted labels in the "Data View". My problem is how to get them (the predicted labels) via Java if I use Java to set the parameters of the "Process Documents from Files" operator.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<parameter key="encoding" value="UTF-8"/>
<process expanded="true" height="371" width="671">
<operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve k-NN Model" width="90" x="112" y="30">
<parameter key="repository_entry" value="k-NNModel"/>
</operator>
<operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve Wordlist" width="90" x="112" y="120">
<parameter key="repository_entry" value="Wordlist"/>
</operator>
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="313" y="120">
<list key="text_directories">
<parameter key="Unknown" value="D:\CORPUS\Testing"/>
</list>
<parameter key="encoding" value="UTF-8"/>
<process expanded="true" height="371" width="671">
<connect from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply k-NN Model" width="90" x="450" y="30">
<list key="application_parameters"/>
</operator>
<connect from_op="Retrieve k-NN Model" from_port="output" to_op="Apply k-NN Model" to_port="model"/>
<connect from_op="Retrieve Wordlist" from_port="output" to_op="Process Documents from Files" to_port="word list"/>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Apply k-NN Model" to_port="unlabelled data"/>
<connect from_op="Apply k-NN Model" from_port="labelled data" to_port="result 1"/>
<connect from_op="Apply k-NN Model" from_port="model" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="234"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
When I run the above process from my Java class, I get the error below. The jars in my Java project are listed after the stack trace:
May 01, 2014 10:46:20 PM com.rapidminer.tools.ParameterService init
INFO: Reading configuration resource com/rapidminer/resources/rapidminerrc.
May 01, 2014 10:46:20 PM com.rapidminer.tools.I18N <clinit>
INFO: Set locale to en.
May 01, 2014 10:46:20 PM com.rapid_i.Launcher ensureRapidMinerHomeSet
INFO: Property rapidminer.home is not set. Guessing.
May 01, 2014 10:46:20 PM com.rapid_i.Launcher ensureRapidMinerHomeSet
INFO: Trying parent directory of 'C:\Program Files\Rapid-I\RapidMiner5\lib\rapidminer.jar'...gotcha!
May 01, 2014 10:46:20 PM com.rapid_i.Launcher ensureRapidMinerHomeSet
INFO: Trying parent directory of 'C:\Program Files\Rapid-I\RapidMiner5\lib\launcher.jar'...gotcha!
May 01, 2014 10:46:22 PM com.rapidminer.parameter.ParameterTypePassword decryptPassword
WARNING: Password in XML file looks like unencrypted plain text.
May 01, 2014 10:46:25 PM com.rapidminer.tools.jdbc.JDBCProperties <init>
WARNING: Missing database driver class name for ODBC Bridge (e.g. Access)
May 01, 2014 10:46:25 PM com.rapidminer.tools.jdbc.JDBCProperties registerDrivers
INFO: JDBC driver ca.ingres.jdbc.IngresDriver not found. Probably the driver is not installed.
May 01, 2014 10:46:25 PM com.rapidminer.tools.jdbc.JDBCProperties registerDrivers
INFO: JDBC driver oracle.jdbc.driver.OracleDriver not found. Probably the driver is not installed.
May 01, 2014 10:46:25 PM com.rapidminer.tools.WrapperLoggingHandler log
INFO: No filename given for result file, using stdout for logging results!
May 01, 2014 10:46:25 PM com.rapidminer.Process run
INFO: Process starts
com.rapidminer.operator.UserError: Cannot resolve relative repository location 'k-NNModel'. Process is not associated with a repository.
at com.rapidminer.Process.resolveRepositoryLocation(Process.java:1210)
at com.rapidminer.operator.Operator.getParameterAsRepositoryLocation(Operator.java:1383)
at com.rapidminer.operator.io.RepositorySource.getRepositoryEntry(RepositorySource.java:91)
at com.rapidminer.operator.io.RepositorySource.read(RepositorySource.java:105)
at com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:123)
at com.rapidminer.operator.Operator.execute(Operator.java:834)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
at com.rapidminer.operator.Operator.execute(Operator.java:834)
at com.rapidminer.Process.run(Process.java:925)
at com.rapidminer.Process.run(Process.java:848)
at com.rapidminer.Process.run(Process.java:807)
at com.rapidminer.Process.run(Process.java:802)
at com.rapidminer.Process.run(Process.java:792)
at RapidMinerTopicClassifier.classifyDocumentByTopic(RapidMinerTopicClassifier.java:49)
at RapidMinerTopicClassifier.main(RapidMinerTopicClassifier.java:68)
- RapidMiner
- Launcher
- Vldocking
- All jars inside (JDBC folder) - 4 jar files
-
Hi,
sorry, the FAQ was missing a bit:
myProcess.setProcessLocation(pLoc);
This links the process to the repository so it can resolve relative paths. Otherwise RapidMiner Studio does not know where to look when you reference other data in the repository without a fully qualified location.
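To make the idea of resolving relative locations concrete, here is an illustrative sketch in plain Java. It is not RapidMiner's implementation; the class and method names (RelativeLocationSketch, resolve) are made up for this example. It only shows why an entry such as "k-NNModel" cannot be found unless the process knows its own location.

```java
public class RelativeLocationSketch {

    /**
     * Resolves an entry name against the folder of the process location.
     * Absolute locations (starting with "//") are returned unchanged.
     */
    static String resolve(String processLocation, String entry) {
        if (entry.startsWith("//")) {
            return entry;
        }
        if (processLocation == null) {
            throw new IllegalStateException(
                    "Cannot resolve relative location '" + entry + "': process has no location");
        }
        // Keep everything up to and including the last slash, i.e. the containing folder.
        int lastSlash = processLocation.lastIndexOf('/');
        String folder = processLocation.substring(0, lastSlash + 1);
        return folder + entry;
    }

    public static void main(String[] args) {
        System.out.println(resolve("//RapidMinerRepository/Applyingk-NNModel", "k-NNModel"));
        try {
            resolve(null, "k-NNModel");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A fully qualified entry such as "//RapidMinerRepository/k-NNModel" passes through unchanged, which is why absolute locations work even when no process location is set.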
Regards,
Marco
-
I added that line and it didn't work.
I changed the way the process is loaded into the Java application as follows:
Process process = new Process(Tools.readTextFile(new File(rapidMinerProcess)));
Also, I changed the value to the full repository location as follows, and it did work for retrieving the model:
<parameter key="repository_entry" value="//RapidMinerRepository/k-NNModel"/>
However, I don't want to rely on the repository, so I set the parameter via Java as follows:
process.getOperator("Retrieve k-NN Model").setParameter("RepositorySource.PARAMETER_REPOSITORY_ENTRY", "D:\\k-NNModel");
NOW, it works fine!
My new task is to run "my course project" on a machine that doesn't have RapidMiner installed.
I tried to follow what is mentioned in the "RapidMiner as a library" section here: http://rapid-i.com/wiki/index.php?title=Integrating_RapidMiner_into_your_application
However, I didn't understand where to put the rapidminerrc in my project, or how to specify which files from rapidminer.home/lib my project requires.
Any further explanation or resources to read to achieve this will be appreciated.
In addition,
I still have a problem setting the parameters via Java. I used the way mentioned in the FAQ post as follows:
List<String[]> list = new LinkedList<>();
String[] directoryParameterValues = {"Unknown", "D:\\Testing"};
list.add(directoryParameterValues);
Operator ProcessDocumentsOperator = process.getOperator("Process Documents from Files");
ProcessDocumentsOperator.setListParameter("FileDocumentInputOperator.PARAMETER_TEXT_DIRECTORIES", list);
Note: the process works in RapidMiner, but when passing the parameters via Java it doesn't work!!!
regards,
-
Hi,
your post is a bit confusing now, so I'll try to point some things out.
1) When you have a process which makes use of operators that load further data/models/etc from the repository, your process needs a repository location. Otherwise it does not know how to resolve the relative locations for the "Retrieve" operators, as the process does not know where itself is from. That's why the process location has to be set and is set by RapidMiner Studio when starting the process from the GUI.
2) process.getOperator("Retrieve k-NN Model").setParameter("RepositorySource.PARAMETER_REPOSITORY_ENTRY", "D:\\k-NNModel"); It surprises me that this works, as that parameter is intended for repository access, which is an abstraction of a file system.
3) The wiki section - good grief, I didn't even know that existed. Looks quite outdated to be honest. If you want to run rapidminer from your own application, make sure that all jars from RapidMiner/lib are in the classpath of your application. Just as any other library jars.
4) Where exactly do we stand now - what is the latest error message you get?
Regards,
Marco
-
Hi,
I just realized yesterday that I was wrong about setting the repository_entry with a path of my file system.
This is my mistake. Sorry for confusing you!
1+2) Thanks for your clear explanation about the repository.
So, what I understood is that having a repository is a "must" in using RapidMiner.
but I still didn't get how to rely on a repository if we want to run this code on a computer that doesn't have rapidminer.
3). Is there any resource/documentation to read the description for each jar in rapidminer/lib?
Sorry for asking this question again and again; can I run my java application on a computer that does not have RapidMiner installed. Actually, I tried to run the last working code on a laptop that does not have RapidMiner and it gave me errors (I'll report them later to not confuse you with all the problems at once).
4) sorry again for confusing you,
the process was working correctly yesterday, but when I ran it now it gave me a new error. The log comes first, then the Java code, then the .rmp file:
May 06, 2014 12:09:16 AM com.rapidminer.tools.ParameterService init
INFO: Reading configuration resource com/rapidminer/resources/rapidminerrc.
May 06, 2014 12:09:16 AM com.rapidminer.tools.I18N <clinit>
INFO: Set locale to en.
May 06, 2014 12:09:16 AM com.rapid_i.Launcher ensureRapidMinerHomeSet
INFO: Property rapidminer.home is not set. Guessing.
May 06, 2014 12:09:16 AM com.rapid_i.Launcher ensureRapidMinerHomeSet
INFO: Trying parent directory of 'C:\Program Files\Rapid-I\RapidMiner5\lib\rapidminer.jar'...gotcha!
May 06, 2014 12:09:16 AM com.rapid_i.Launcher ensureRapidMinerHomeSet
INFO: Trying parent directory of 'C:\Program Files\Rapid-I\RapidMiner5\lib\launcher.jar'...gotcha!
May 06, 2014 12:09:16 AM com.rapid_i.Launcher ensureRapidMinerHomeSet
INFO: Trying parent directory of 'D:\Senior\RapidMinerClassifier\lib\launcher.jar'...gotcha!
May 06, 2014 12:09:16 AM com.rapid_i.Launcher ensureRapidMinerHomeSet
INFO: Trying parent directory of 'D:\Senior\RapidMinerClassifier\lib\rapidminer.jar'...gotcha!
May 06, 2014 12:09:18 AM com.rapidminer.parameter.ParameterTypePassword decryptPassword
WARNING: Password in XML file looks like unencrypted plain text.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Program%20Files/Rapid-I/RapidMiner5/lib/slf4j-simple-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/Senior/RapidMinerClassifier/lib/slf4j-simple-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
May 06, 2014 12:09:21 AM com.rapidminer.tools.jdbc.JDBCProperties <init>
WARNING: Missing database driver class name for ODBC Bridge (e.g. Access)
May 06, 2014 12:09:21 AM com.rapidminer.tools.jdbc.JDBCProperties registerDrivers
INFO: JDBC driver ca.ingres.jdbc.IngresDriver not found. Probably the driver is not installed.
May 06, 2014 12:09:21 AM com.rapidminer.tools.jdbc.JDBCProperties registerDrivers
INFO: JDBC driver oracle.jdbc.driver.OracleDriver not found. Probably the driver is not installed.
java.io.FileNotFoundException: \\RapidMinerRepository\Applyingk-NNModel (The network path was not found)
null
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileReader.<init>(FileReader.java:72)
at com.rapidminer.tools.Tools.readTextFile(Tools.java:714)
at RapidMinerTopicClassifier.classifyDocumentByTopic(RapidMinerTopicClassifier.java:57)
at RapidMinerTopicClassifier.main(RapidMinerTopicClassifier.java:93)
The Java code:
import com.rapidminer.Process;
import com.rapidminer.RapidMiner;
import com.rapidminer.example.Attribute;
import com.rapidminer.example.Example;
import com.rapidminer.example.ExampleSet;
import com.rapidminer.operator.IOContainer;
import com.rapidminer.operator.Operator;
import com.rapidminer.operator.OperatorException;
import com.rapidminer.repository.RepositoryException;
import com.rapidminer.tools.Tools;
import com.rapidminer.tools.XMLException;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
public class RapidMinerTopicClassifier implements RapidMinerClassifier {
private static Example exampleSet = null;
private String topic;
private String classifierModel;
public RapidMinerTopicClassifier() {
RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
RapidMiner.init();
}//end of constructor
/**
*
* @param rapidMinerProcess
* @param DocumentsToBeClassfied
* @return topic
* @throws RepositoryException
*/
@Override
public String classifyDocumentByTopic(String rapidMinerProcess, String DocumentsToBeClassfied) throws RepositoryException {
ExampleSet resultSet = null;
try {
// loads the process from the repository
Process process = new Process(Tools.readTextFile(new File(rapidMinerProcess)));
Operator ProcessDocumentsOperator = process.getOperator("Process Documents from Files");
List<String[]> list = new LinkedList<>();
String[] directoryParameterValues = {"Unknown", DocumentsToBeClassfied};
list.add(directoryParameterValues);
ProcessDocumentsOperator.setListParameter("FileDocumentInputOperator.PARAMETER_TEXT_DIRECTORIES", list);
Attribute attribute;
IOContainer ioResult = process.run();
for (int i = 0; i < ioResult.size(); i++) {
if (ioResult.getElementAt(i) instanceof ExampleSet) {
resultSet = (ExampleSet) ioResult.getElementAt(0);
}
attribute = resultSet.getAttributes().get("prediction");
for (Example example : resultSet) {
topic = example.getNominalValue(attribute);
}
}
} catch (IOException | XMLException | OperatorException ex) {
ex.printStackTrace();
}
return topic;
}//end of classifyDocumentByTopic
public static void main(String[] args) throws RepositoryException {
RapidMinerClassifier rapidMinerClassifier = new RapidMinerTopicClassifier();
/*
* 1. SVM
* 2. k-NN
* 3. Naive Bayes
*/
String topic = rapidMinerClassifier.classifyDocumentByTopic("//RapidMinerRepository/Applyingk-NNModel" , "/Testing");
System.out.println(topic);
}//end of main
}
Your help and effort are appreciated. The .rmp file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<parameter key="encoding" value="UTF-8"/>
<process expanded="true" height="359" width="570">
<operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve k-NN Model" width="90" x="45" y="30">
<parameter key="repository_entry" value="//RapidMinerRepository/k-NNModel"/>
</operator>
<operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve Wordlist" width="90" x="45" y="120">
<parameter key="repository_entry" value="//RapidMinerRepository/Wordlist"/>
</operator>
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="313" y="120">
<list key="text_directories"/>
<parameter key="encoding" value="UTF-8"/>
<process expanded="true" height="371" width="671">
<connect from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply k-NN Model" width="90" x="450" y="30">
<list key="application_parameters"/>
</operator>
<connect from_op="Retrieve k-NN Model" from_port="output" to_op="Apply k-NN Model" to_port="model"/>
<connect from_op="Retrieve Wordlist" from_port="output" to_op="Process Documents from Files" to_port="word list"/>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Apply k-NN Model" to_port="unlabelled data"/>
<connect from_op="Apply k-NN Model" from_port="labelled data" to_port="result 1"/>
<connect from_op="Apply k-NN Model" from_port="model" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="234"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Regards,
-
Hi,
1) It's very much recommended, yes. You can create a new local repository (pointing to a folder of your choice) and add it to the RepositoryManager to use a repository on a shipped product.
2) Not really, no. Most of them are libraries that RapidMiner Studio itself uses for certain tasks (e.g. a library to read Excel files etc). If in doubt, they are all needed.
3) The error looks like the path on the filesystem where the repository is stored cannot be accessed. Sounds like the repository points to a network drive which is no longer available when executing the process.
Regards,
Marco
-
Hi,
Thank you for your reply.
Going through the FAQ many times, I found the following:
// loads the process from the repository (if you do not have one, see alternative below)
RepositoryLocation pLoc = new RepositoryLocation("//LocalRepository/folder/as/needed/yourProcessName");
...
myProcess.setProcessLocation(pLoc);
The second line showed me an error because the setProcessLocation() method receives a ProcessLocation object, not a RepositoryLocation object!
So, I tried to do it this way, which doesn't make sense:
myProcess.setProcessLocation(myProcess.getProcessLocation());
So, did I miss any point?
-
Hi,
sorry about that! You spotted an error in the FAQ.
It should actually be:
myProcess.setProcessLocation(new RepositoryProcessLocation(pLoc));
Regards,
Marco
-
Hi,
Many thanks for your continued help!
Returning to my main problem of setting the parameters, I tried to follow the way mentioned above and in the FAQ:
Operator retrieveOperator = process.getOperator("Retrieve");
retrieveOperator.setParameter("RepositorySource.PARAMETER_REPOSITORY_ENTRY", "//Repository/modelname");
Operator processOperator = process.getOperator("Process Documents from Files");
List<String[]> list = new LinkedList<>();
String[] values = {"Unknown", "D:/Files"};//size of array must be the same as number of columns in the parameter GUI
list.add(values);
processOperator.setListParameter("FileDocumentInputOperator.PARAMETER_TEXT", list);
I also tried the following for the "Process Documents from Files" operator:
processDocumentsOperator.setListParameter("FileDocumentInputOperator.PARAMETER_TEXT_DIRECTORIES.Unknown", DocumentsToBeClassfied);
This doesn't work; the output is the following line:
WARNING: Kernel Model: The given example set does not contain a regular attribute with name 'day'. This might cause problems for some models depending on this particular attribute.
This warning line appears for every attribute in the training set!
When I open the process.rmp file I don't see any changes even for the retrieve operator.
Also, if I set the parameters' values in RapidMiner GUI, every thing works fine!
Any tips?
Thanks!
-
Hi,
1)
retrieveOperator.setParameter("RepositorySource.PARAMETER_REPOSITORY_ENTRY", "//Repository/modelname");
processOperator.setListParameter("FileDocumentInputOperator.PARAMETER_TEXT", list);
Will not work: you are not using the String constant but instead creating your own string which consists of the constant's name.
Correct would be:
processOperator.setListParameter(FileDocumentInputOperator.PARAMETER_TEXT, list);
2) When you change a process in your Java application, you are NOT working on the stored .rmp file. You have created a local in-memory copy on which you are working. To persist your changes, you need to manually store your process again. To do so, have a look at the StoreProcessAction.
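The difference in point 1 between a quoted constant name and the constant itself can be reproduced without RapidMiner. In this sketch, DemoOperator is a hypothetical stand-in class, and the value given to its PARAMETER_TEXT constant is an assumption made purely for illustration; the real FileDocumentInputOperator is not used.

```java
import java.util.HashMap;
import java.util.Map;

public class ConstantVersusLiteral {

    /** Hypothetical stand-in for an operator class that declares a parameter key constant. */
    static class DemoOperator {
        // Assumed value, for illustration only.
        static final String PARAMETER_TEXT = "text_directories";

        private final Map<String, String> parameters = new HashMap<>();

        DemoOperator() {
            parameters.put(PARAMETER_TEXT, "");
        }

        /** Returns true only if the key names a known parameter. */
        boolean setParameter(String key, String value) {
            if (!parameters.containsKey(key)) {
                return false;
            }
            parameters.put(key, value);
            return true;
        }
    }

    public static void main(String[] args) {
        DemoOperator operator = new DemoOperator();
        // A quoted string is just text that happens to spell the constant's name.
        System.out.println(operator.setParameter("DemoOperator.PARAMETER_TEXT", "D:/Files"));
        // The unquoted constant evaluates to the actual key.
        System.out.println(operator.setParameter(DemoOperator.PARAMETER_TEXT, "D:/Files"));
    }
}
```

The first call is rejected because no parameter has the literal key "DemoOperator.PARAMETER_TEXT"; the second succeeds because the compiler substitutes the constant's value.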
Regards,
Marco
-
Oops, silly mistake!
Thanks for your patience and help!