A strange result from repeatedly invoking the apply() method
Hi,
I am building a text classifier.
xValidation.apply(container) is invoked 3 times, and it gives 3 completely different results. Why?
// build the text input
OperatorChain textInput = (OperatorChain) OperatorService.createOperator("TextInput");
List<String[]> para = new ArrayList<String[]>();
String[] para1 = {"graphics", "c:/data/Reuters/acq"};
String[] para2 = {"hardware", "c:/data/Reuters/corn"};
String[] para3 = {"hardware", "c:/data/Reuters/crude"};
String[] para4 = {"hardware", "c:/data/Reuters/earn"};
String[] para5 = {"hardware", "c:/data/Reuters/grain"};
para.add(para1);
para.add(para2);
para.add(para3);
para.add(para4);
para.add(para5);
textInput.setListParameter("texts", para);
textInput.setParameter("prune_below", "3");
textInput.setParameter("output_word_list", "d:/test/word.list");
Operator stringTokenizer = OperatorService.createOperator("StringTokenizer");
Operator stopWord = OperatorService.createOperator("EnglishStopwordFilter");
Operator tokenLen = OperatorService.createOperator("TokenLengthFilter");
tokenLen.setParameter("min_chars", "3");
Operator stemmer = OperatorService.createOperator("PorterStemmer");
Operator gramGenerator = OperatorService.createOperator("TermNGramGenerator");
textInput.addOperator(stringTokenizer);
textInput.addOperator(stopWord);
textInput.addOperator(tokenLen);
textInput.addOperator(stemmer);
textInput.addOperator(gramGenerator);
// build the validation
OperatorChain xValidation = (OperatorChain) OperatorService.createOperator("XValidation");
OperatorChain applierChain = (OperatorChain) OperatorService.createOperator("OperatorChain");
xValidation.setParameter("keep_example_set", "true");
Operator naiveBayes = OperatorService.createOperator("KernelNaiveBayes");
Operator modelApplier = OperatorService.createOperator("ModelApplier");
Operator performance = OperatorService.createOperator("ClassificationPerformance");
performance.setParameter("accuracy", "true");
applierChain.addOperator(modelApplier);
applierChain.addOperator(performance);
xValidation.addOperator(naiveBayes);
xValidation.addOperator(applierChain);
// start applying
IOContainer container = textInput.apply(new IOContainer());
container = xValidation.apply(container);
PerformanceVector pv = container.get(PerformanceVector.class);
double precision = pv.getCriterion("accuracy").getAverage();
// the result is 0.89
container = xValidation.apply(container);
pv = container.get(PerformanceVector.class);
precision = pv.getCriterion("accuracy").getAverage();
// the result is 0.86
container = xValidation.apply(container);
pv = container.get(PerformanceVector.class);
precision = pv.getCriterion("accuracy").getAverage();
// the result is 0.90
Since nothing about the data or the learner has changed, the results should be the same, in my opinion.
Sincerely yours,
gfyang
Hi,
the results will only be the same if you set all operators to use a local random seed. Otherwise they all draw from the same continuous stream of random numbers and hence produce different results on each call. For example, XValidation then splits the data set into different subsets every time it is applied.
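To illustrate the effect outside RapidMiner, here is a minimal plain-Java sketch. It is not RapidMiner code: the class SeedDemo, the GLOBAL generator and both method names are made up for this example, and the shuffled index list merely stands in for the way a cross-validation might assign examples to folds. It only assumes that a shared generator keeps advancing between calls, while a freshly seeded local generator restarts from the same state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SeedDemo {
    // Hypothetical stand-in for one global random stream shared by all operators.
    private static final Random GLOBAL = new Random(2001L);

    // Builds the list 0..n-1, used here as stand-in example indices.
    private static List<Integer> indices(int n) {
        List<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Shuffles with the shared stream: each call continues where the previous one stopped.
    static List<Integer> shuffleWithGlobalStream(int n) {
        List<Integer> list = indices(n);
        Collections.shuffle(list, GLOBAL);
        return list;
    }

    // Shuffles with a fresh generator built from a fixed seed: every call starts from the same state.
    static List<Integer> shuffleWithLocalSeed(int n, long seed) {
        List<Integer> list = indices(n);
        Collections.shuffle(list, new Random(seed));
        return list;
    }

    public static void main(String[] args) {
        System.out.println("global stream, call 1: " + shuffleWithGlobalStream(10));
        System.out.println("global stream, call 2: " + shuffleWithGlobalStream(10));
        System.out.println("local seed,    call 1: " + shuffleWithLocalSeed(10, 42L));
        System.out.println("local seed,    call 2: " + shuffleWithLocalSeed(10, 42L));
    }
}
```

The two "global stream" lines print different orders, while the two "local seed" lines print the same order. In the posted code, the analogous step would presumably be to give XValidation a fixed seed through setParameter, for example via a local_random_seed parameter if your version exposes one; that parameter name is an assumption and should be checked against the operator's parameter list.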
Greetings,
Sebastian