"Classifying with SVM through the java API"
Legacy User
New Altair Community Member
Hi,
I am trying to do a simple classification by integrating RapidMiner into Java. It is approximately the same as a process I have defined in the GUI, which works great. I call train() once and then classify() for each text.
The problem is that all texts always get the same classification, as if no learning had occurred or some default were simply taken. These are texts that I classify correctly in the GUI (they belong to 5 different classes, so it is a polynominal problem), and with different classifiers (LingPipe and a homebrewed one).
The behaviour is the same whether or not I set parameters at the //Maybe set parameters here? comment. This is how I try to do it in code:
// Fields used below (types inferred from the calls):
private OperatorChain wvtoolOperator;
private Model model;
private Operator modelApplier;

public void train(List<Text> documents) {
    RapidMiner.init(false, false, false, true);

    // Build the TextInput chain used for training
    wvtoolOperator = (OperatorChain) OperatorService.createOperator(TextInputOperator.class);
    wvtoolOperator.addOperator(OperatorService.createOperator("StringTokenizer"));
    wvtoolOperator.addOperator(OperatorService.createOperator("EnglishStopwordFilter"));
    wvtoolOperator.addOperator(OperatorService.createOperator("TokenLengthFilter"));
    wvtoolOperator.addOperator(OperatorService.createOperator("PorterStemmer"));

    // One (file name, class name) pair per training document
    List list = new ArrayList();
    for (Text text : documents) {
        String filename = ...
        String classname = ...
        list.add(new Object[] { filename, classname });
    }
    wvtoolOperator.setListParameter("texts", list);

    // Run the text input chain and train the SVM on the resulting example set
    IOContainer container = wvtoolOperator.apply(new IOContainer());
    ExampleSet exampleSet = container.get(ExampleSet.class);

    Learner learner = (Learner) OperatorService.createOperator(LibSVMLearner.class);
    // Maybe set parameters here?
    model = learner.learn(exampleSet);

    // Create the model applier
    modelApplier = OperatorService.createOperator("ModelApplier");

    // Create a new SingleTextInput chain for processing test strings.
    // Note: the setup must be the same as the one used when creating the classification model.
    wvtoolOperator = (OperatorChain) OperatorService.createOperator(SingleTextInput.class);
    wvtoolOperator.addOperator(OperatorService.createOperator("StringTokenizer"));
    wvtoolOperator.addOperator(OperatorService.createOperator("EnglishStopwordFilter"));
    wvtoolOperator.addOperator(OperatorService.createOperator("TokenLengthFilter"));
    wvtoolOperator.addOperator(OperatorService.createOperator("PorterStemmer"));
}

public String classify(String text) {
    try {
        // Set the text and run the SingleTextInput chain
        wvtoolOperator.setParameter("text", text);
        IOContainer container = wvtoolOperator.apply(new IOContainer());

        // Append the trained model and apply it
        container = container.append(model);
        container = modelApplier.apply(container);

        // The resulting example set contains a single example holding our text
        ExampleSet eset = container.get(ExampleSet.class);
        Example e = eset.iterator().next();

        // This does the same thing as the two lines below:
        // return e.getValueAsString(eset.getAttributes().getPredictedLabel());
        int predLabelIndex = (int) e.getPredictedLabel();
        return e.getAttributes().getPredictedLabel().getMapping().mapIndex(predLabelIndex);
    } catch (Exception ex) {
        // ...
        return null;
    }
}
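For completeness, the two methods are called roughly like this (just a sketch; TextClassifier is only a placeholder name for the class that holds the fields and methods above, and the training/test collections are built elsewhere):

TextClassifier classifier = new TextClassifier();
classifier.train(trainingDocuments);               // List<Text> built from my labelled files
for (String testText : testTexts) {
    System.out.println(classifier.classify(testText));  // always prints the same class...
}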
I am probably overlooking something simple, but I'm completely out of ideas; I have looked around a lot and tried many approaches. When I do set the learner parameters, it is done this way:
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_SVM_TYPE, new Integer(LibSVMLearner.SVM_TYPE_C_SVC).toString());
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_KERNEL_TYPE, "0");//linear
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_EPSILON, "0.001");
//((Operator)learner).setParameter(LibSVMLearner.PARAMETER_C, "0.0");
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_P, "0.1");
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_CONFIDENCE_FOR_MULTICLASS, "true");
Thanks a lot,
Nimrod.
Answers
Hi,
Okay, I have understood that I have to save and load the wordlist via the parameters. However, I feel like there should be some kind of object I could pass around between the filters instead of having to write it to a file and load it. Is this supported?
Also, does that mean the word list will be loaded from the file every time I call apply() on the SingleTextInput chain?
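For reference, what I am doing now is roughly the following; I am not sure about the exact parameter keys of the text plugin operators, so treat the names below as placeholders:

// In train(), after building the TextInput chain: write the word list to a file
// (parameter key is a guess — check the operator's parameter list for the real name)
wvtoolOperator.setParameter("output_word_list", "/tmp/wordlist.txt");

// In the SingleTextInput chain used by classify(): load the same word list,
// so the test vectors are built over the same features as the training vectors
wvtoolOperator.setParameter("input_word_list", "/tmp/wordlist.txt");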
Thanks,
Nimrod
Hi,
And another question while I'm at it: using the code shown above, it takes about 1.5 seconds to classify each text (around 200 words) after learning a model from a few hundred documents. In the GUI it is much closer to your published figure of 25 ms per post: a 10-fold cross-validation of the same 350 or so documents takes 66 seconds (and that ends up classifying 700+ examples, so it is actually even faster than that). I'm running the example from the text plugin samples, 04_Learning/01_TextClassificationXVal.xml .
The slow step is ModelApplier.apply(). What could it be? Is my Java development environment inherently that much slower, or is something done differently in the GUI for that sample?
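To see where the time goes I simply time the individual calls inside classify(), something like this:

long start = System.currentTimeMillis();
IOContainer container = wvtoolOperator.apply(new IOContainer());
System.out.println("single text input: " + (System.currentTimeMillis() - start) + " ms");

start = System.currentTimeMillis();
container = modelApplier.apply(container.append(model));
System.out.println("model applier: " + (System.currentTimeMillis() - start) + " ms");  // this is the ~1.5 s step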
Thank you,
Nimrod
Hi,
Okay, I see now that this is because of pruning, which seriously affects the performance of the SVM model.
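In code this is just another parameter on the text input chain. Since I keep guessing parameter names, I first list the keys the operator actually understands and then set the pruning option with whatever the listing shows (the key in the last line is only an example):

// Print every parameter the operator chain understands, with its description
for (Object o : wvtoolOperator.getParameterTypes()) {
    ParameterType type = (ParameterType) o;
    System.out.println(type.getKey() + " : " + type.getDescription());
}
// Then switch pruning on with the key the listing shows, e.g. something like:
// wvtoolOperator.setParameter("prune_below", "3");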
Thanks,
I hope this will be useful to someone for posterity.
But if you have the time, please answer my last question (in my previous post).
Nimrod.