"Classifying with SVM through the java API"

User: "Legacy User"
New Altair Community Member
Updated by Jocelyn
Hi,

I am trying to do a simple classification by integrating RapidMiner into Java. The setup is approximately the same as a Process I have defined in the GUI, which works well. This is how I try to do it in code
(I call train() once and then classify() for each text).
The problem is that all texts always get the same classification, as if no learning had occurred or some default were being used. The GUI process classifies these same texts correctly (they belong to 5 different classes, so it is a polynominal problem), and so do other classifiers (LingPipe and a homebrewed one).

public void train(List<Text> documents) {
    RapidMiner.init(false, false, false, true);

    wvtoolOperator = (OperatorChain) OperatorService
            .createOperator(TextInputOperator.class);

    wvtoolOperator.addOperator(OperatorService
            .createOperator("StringTokenizer"));
    wvtoolOperator.addOperator(OperatorService
            .createOperator("EnglishStopwordFilter"));
    wvtoolOperator.addOperator(OperatorService
            .createOperator("TokenLengthFilter"));
    wvtoolOperator.addOperator(OperatorService
            .createOperator("PorterStemmer"));

    List list = new ArrayList();
    for (Text text : documents) {
        String filename = ...
        String classname = ...
        list.add(new Object[] { filename, classname });
    }

    wvtoolOperator.setListParameter("texts", list);

    IOContainer container = wvtoolOperator.apply(new IOContainer());
    ExampleSet exampleSet = container.get(ExampleSet.class);
    Learner learner = (Learner) OperatorService.createOperator(LibSVMLearner.class);
    //Maybe set parameters here?
    model = learner.learn(exampleSet);
    // Create the model applier
    modelApplier = OperatorService.createOperator("ModelApplier");

    // Create a new SingleTextInput, for processing test Strings
    wvtoolOperator = (OperatorChain) OperatorService
            .createOperator(SingleTextInput.class);

    // Add additional processing steps.
    // Note the setup must be the same as the one used when creating the classification model
    wvtoolOperator.addOperator(OperatorService
            .createOperator("StringTokenizer"));
    wvtoolOperator.addOperator(OperatorService
            .createOperator("EnglishStopwordFilter"));
    wvtoolOperator.addOperator(OperatorService
            .createOperator("TokenLengthFilter"));
    wvtoolOperator.addOperator(OperatorService
            .createOperator("PorterStemmer"));
}

public String classify(String text) {
    try {
        // Set the text
        wvtoolOperator.setParameter("text", text);

        // Call the text input operator
        IOContainer container = wvtoolOperator.apply(new IOContainer());

        // Append the trained model to the container
        container = container.append(model);
        // Call the model applier (it takes the model and the example set from the container)
        container = modelApplier.apply(container);

        // Obtain the example set from the IO container. It contains only a single example with our text in it.
        ExampleSet eset = container.get(ExampleSet.class);
        Example e = eset.iterator().next();

        // This does the same thing as the two lines below
        //return e.getValueAsString(eset.getAttributes().getPredictedLabel());

        int predLabelIndex = (int) e.getPredictedLabel();
        return e.getAttributes().getPredictedLabel().getMapping().mapIndex(predLabelIndex);
    } catch (Exception ex) {
        //...
    }
}

The result is the same whether or not I set parameters at the "//Maybe set parameters here?" comment.
When I do set them, it is done this way:

((Operator)learner).setParameter(LibSVMLearner.PARAMETER_SVM_TYPE, new Integer(LibSVMLearner.SVM_TYPE_C_SVC).toString());
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_KERNEL_TYPE, "0");//linear
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_EPSILON, "0.001");
//((Operator)learner).setParameter(LibSVMLearner.PARAMETER_C, "0.0");
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_P, "0.1");
((Operator)learner).setParameter(LibSVMLearner.PARAMETER_CONFIDENCE_FOR_MULTICLASS, "true");
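
To make the symptom concrete, the predictions can be tallied per label over a batch of texts. The following is only a minimal sketch and does not use RapidMiner at all: PredictionTally and tally are hypothetical names, and the classifier is passed in as a plain function so that classify() from above could be plugged in. The constant classifier in main is a stand-in (an assumption) that mimics the behavior I am seeing.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PredictionTally {

    // Counts how often each label is returned by the classifier for the given texts.
    public static Map<String, Integer> tally(List<String> texts,
                                             Function<String, String> classifier) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String text : texts) {
            String label = classifier.apply(text);
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("first text", "second text", "third text");

        // Stand-in classifier (assumption): always returns the same label,
        // which is the degenerate behavior described in this post.
        Function<String, String> constant = text -> "classA";

        Map<String, Integer> counts = tally(texts, constant);
        System.out.println(counts); // prints {classA=3}

        // A single key means every text received the same label.
        if (counts.size() == 1) {
            System.out.println("All texts were assigned the same class.");
        }
    }
}
```

With the real classifier, tally(texts, this::classify) should show several labels if the model has learned anything; in my case it shows only one.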

I am probably overlooking something simple, but I'm completely out of ideas. I have looked around a lot and tried many approaches.

Thanks a lot,
Nimrod.
