imbalanced data
m_r_nour
New Altair Community Member
Hi,
How can I solve an imbalanced-data problem?
I used AdaBoostM1 from Weka, but it doesn't work at all.
1. I used a sampling method to balance the data and performance improved, but as far as I know there should be far better methods for the imbalance problem.
2. Moreover, LibSVM can be used in weighted mode, but I do not know how to use it or how to tune LibSVM parameters such as the cost and the class weights.
And how can it be used in MetaCost?
I'd appreciate your help.
Regards
REZA
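For the weighted-mode question above, the thread never shows how to pick the per-class weights. One common heuristic (an assumption here, not something the posters recommend) is to weight each class inversely to its frequency, so that rare classes count more. A minimal plain-Python sketch, with made-up class counts and a hypothetical helper name `inverse_frequency_weights`:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = n / (k * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}

# Class names come from the thread; the counts are invented for illustration.
labels = ["prometa"] * 80 + ["anaA"] * 15 + ["anaB"] * 5
weights = inverse_frequency_weights(labels)

for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.2f}")
```

The resulting numbers are only a starting point; they would still need to be tuned together with the SVM cost parameter, for example by cross-validation.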
Answers
-
Hi Reza,
Just put the LibSVM learner inside the MetaCost learner. The MetaCost learner is a so-called meta learner that uses another, inner learning scheme.
This time you could have simply read the manual. This is why haddock repeats it so many times. Although this IS a help forum, other people have to spend their time giving you hints, so I think it's fair to expect that you make your best effort to cope with the problem yourself first. That should always include the (admittedly sparse) documentation.
Greetings,
Sebastian
-
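To make the "meta learner" idea concrete: the core of cost-sensitive relabeling is to predict the class with the lowest expected cost rather than the most probable one. The sketch below shows only that decision step in plain Python, not the bagging and retraining a full MetaCost implementation performs. The cost matrix is the one from the process posted later in this thread; the orientation `cost[true][predicted]` and the probability values are assumptions, and `min_expected_cost_class` is a hypothetical helper name.

```python
def min_expected_cost_class(probs, cost):
    """Pick the prediction with the lowest expected cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
                 (this orientation is an assumption; check your tool)
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)

# Cost matrix taken from the process in this thread.
COST = [[0.0, 3.0, 1.0],
        [1.0, 0.0, 1.0],
        [1.0, 1.0, 0.0]]

probs = [0.35, 0.45, 0.20]  # hypothetical probability estimates
most_likely = max(range(3), key=probs.__getitem__)
cheapest = min_expected_cost_class(probs, COST)
print(most_likely, cheapest)
```

With these numbers the most probable class is index 1, but the cheapest prediction is index 0, because mistaking a true class 0 for class 1 carries the cost of 3.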
Hi,
Thanks, Sebastian.
My code is:
<operator name="Root" class="Process" expanded="yes">
  <operator name="CSVExampleSource" class="CSVExampleSource">
    <parameter key="filename" value="data_ver5_all.csv"/>
    <parameter key="label_name" value="CellCycle"/>
  </operator>
  <operator name="ExampleFilter" class="ExampleFilter" breakpoints="after">
    <parameter key="condition_class" value="attribute_value_filter"/>
    <parameter key="parameter_string" value="label=anaA||label=anaB||label=prometa"/>
  </operator>
  <operator name="Normalization" class="Normalization">
  </operator>
  <operator name="Random Optimizer" class="RandomOptimizer" expanded="yes">
    <parameter key="iterations" value="100"/>
    <operator name="Xvalidation" class="XValidation" expanded="yes">
      <parameter key="keep_example_set" value="true"/>
      <parameter key="create_complete_model" value="true"/>
      <operator name="MetaCost" class="MetaCost" expanded="yes">
        <parameter key="keep_example_set" value="true"/>
        <parameter key="cost_matrix" value="[0.0 3.0 1.0;1.0 0.0 1.0;1.0 1.0 0.0]"/>
        <operator name="LibSVMLearner" class="LibSVMLearner">
          <list key="class_weights">
          </list>
        </operator>
      </operator>
      <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
        <operator name="ModelApplier (2)" class="ModelApplier">
          <parameter key="keep_model" value="true"/>
          <list key="application_parameters">
          </list>
          <parameter key="create_view" value="true"/>
        </operator>
        <operator name="Performance (2)" class="Performance">
          <parameter key="keep_example_set" value="true"/>
        </operator>
      </operator>
    </operator>
    <operator name="AverageBuilder" class="AverageBuilder">
    </operator>
    <operator name="Performance (3)" class="Performance">
      <parameter key="keep_example_set" value="true"/>
    </operator>
    <operator name="ProcessLog" class="ProcessLog">
      <parameter key="filename" value="output_%{a}.log"/>
      <list key="log">
        <parameter key="File" value="operator.CSVExampleSource.parameter.filename"/>
        <parameter key="Learner" value="operator.Classifier.parameter.select_which"/>
        <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
        <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
        <parameter key="Feature Selection On|Off" value="operator.FS SAM.parameter.enable"/>
        <parameter key="Pro_meta merging" value="operator.Class merging.parameter.enable"/>
      </list>
      <parameter key="sorting_dimension" value="3"/>
    </operator>
  </operator>
</operator>
But it halts with a "process failed" message and I do not know why.
I'd appreciate your help with this matter.
Regards
REZA
-
<operator name="Root" class="Process" expanded="yes">
  <operator name="CSVExampleSource" class="CSVExampleSource" activated="no">
    <parameter key="filename" value="data_ver5_all.csv"/>
    <parameter key="label_name" value="CellCycle"/>
  </operator>
  <operator name="ExampleFilter" class="ExampleFilter" breakpoints="after" activated="no">
    <description text="This will not work - we've already discussed why - wake up"/>
    <parameter key="condition_class" value="attribute_value_filter"/>
    <parameter key="parameter_string" value="label=anaA||label=anaB||label=prometa"/>
  </operator>
  <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
    <parameter key="target_function" value="simple non linear classification"/>
  </operator>
  <operator name="Normalization" class="Normalization">
  </operator>
  <operator name="Random Optimizer" class="RandomOptimizer" expanded="yes">
    <parameter key="iterations" value="10"/>
    <operator name="Xvalidation" class="XValidation" expanded="yes">
      <parameter key="keep_example_set" value="true"/>
      <parameter key="create_complete_model" value="true"/>
      <operator name="MetaCost" class="MetaCost" expanded="yes">
        <parameter key="keep_example_set" value="true"/>
        <parameter key="cost_matrix" value="[0.0 3.0 1.0;1.0 0.0 1.0;1.0 1.0 0.0]"/>
        <operator name="LibSVMLearner" class="LibSVMLearner">
          <list key="class_weights">
          </list>
        </operator>
      </operator>
      <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
        <operator name="ModelApplier (2)" class="ModelApplier">
          <parameter key="keep_model" value="true"/>
          <list key="application_parameters">
          </list>
          <parameter key="create_view" value="true"/>
        </operator>
        <operator name="Performance (2)" class="Performance">
          <parameter key="keep_example_set" value="true"/>
        </operator>
      </operator>
    </operator>
    <operator name="AverageBuilder" class="AverageBuilder" activated="no">
    </operator>
    <operator name="Performance (3)" class="Performance" activated="no">
      <description text="Why on earth is this here? Disable it."/>
      <parameter key="keep_example_set" value="true"/>
    </operator>
    <operator name="ProcessLog" class="ProcessLog">
      <parameter key="filename" value="output_%{a}.log"/>
      <list key="log">
        <parameter key="File" value="operator.CSVExampleSource.parameter.filename"/>
        <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
        <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
      </list>
      <parameter key="sorting_dimension" value="3"/>
    </operator>
  </operator>
</operator>
-
Hi,
But it doesn't work. It seems you just disabled the AverageBuilder and the Performance operator after it. I did the same with my data, but I again get the "process failed" message. Thanks for your time and consideration, though.
<operator name="Root" class="Process" expanded="yes">
  <operator name="CSVExampleSource" class="CSVExampleSource">
    <parameter key="filename" value="I:\WORK\ver5\RM\DATA\data_ver5_all.csv"/>
    <parameter key="label_name" value="CellCycle"/>
  </operator>
  <operator name="ExampleFilter" class="ExampleFilter">
    <description text="This will not work - we've already discussed why - wake up"/>
    <parameter key="condition_class" value="attribute_value_filter"/>
    <parameter key="parameter_string" value="label=anaA||label=anaB||label=prometa"/>
  </operator>
  <operator name="Normalization" class="Normalization">
  </operator>
  <operator name="Random Optimizer" class="RandomOptimizer" expanded="yes">
    <parameter key="iterations" value="10"/>
    <operator name="Xvalidation" class="XValidation" expanded="yes">
      <parameter key="keep_example_set" value="true"/>
      <parameter key="create_complete_model" value="true"/>
      <operator name="MetaCost" class="MetaCost" expanded="yes">
        <parameter key="keep_example_set" value="true"/>
        <parameter key="cost_matrix" value="[0.0 3.0 1.0;1.0 0.0 1.0;1.0 1.0 0.0]"/>
        <operator name="LibSVMLearner" class="LibSVMLearner">
          <list key="class_weights">
          </list>
        </operator>
      </operator>
      <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
        <operator name="ModelApplier (2)" class="ModelApplier">
          <parameter key="keep_model" value="true"/>
          <list key="application_parameters">
          </list>
          <parameter key="create_view" value="true"/>
        </operator>
        <operator name="Performance (2)" class="Performance">
          <parameter key="keep_example_set" value="true"/>
        </operator>
      </operator>
    </operator>
    <operator name="ProcessLog" class="ProcessLog">
      <parameter key="filename" value="output_%{a}.log"/>
      <list key="log">
        <parameter key="File" value="operator.CSVExampleSource.parameter.filename"/>
        <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
        <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
      </list>
      <parameter key="sorting_dimension" value="3"/>
    </operator>
  </operator>
</operator>
-
It really helps if you include the error message. This code works on generated examples on my machine right now. I disabled the average builder and final performance operator because that was causing the failure on the original code, as detailed in your original post.
-
G Nov 28, 2009 5:50:44 PM: [Fatal] ArrayIndexOutOfBoundsException occured in 1st application of ModelApplier (2) (ModelApplier)
G Nov 28, 2009 5:50:44 PM: [Fatal] Process failed: operator cannot be executed (3). Check the log messages...
         Root[1] (Process)
         +- CSVExampleSource[1] (CSVExampleSource)
         +- ExampleFilter[1] (ExampleFilter)
         +- Normalization[1] (Normalization)
         +- Random Optimizer[1] (RandomOptimizer)
            +- Xvalidation[1] (XValidation)
            |  +- MetaCost[1] (MetaCost)
            |  |  +- LibSVMLearner[10] (LibSVMLearner)
            |  +- OperatorChain (2)[1] (OperatorChain)
here ==>    |     +- ModelApplier (2)[1] (ModelApplier)
            |     +- Performance (2)[0] (Performance)
            +- ProcessLog[0] (ProcessLog)
-
Must be the data then, as the generator version works. My guess is that your filter doesn't work as you think, because you use '||' where you should use '|'.
-
Hi Haddock,
No. I changed the code as you suggested, but I get the same problem again.
However:
the data has 7 classes and the filter reduces them to 3, but the classifier is applied to the data as if it still had 7 groups, because when I changed the size of the cost matrix to 7
it works.
so.... how can ....?
thanks for your time
regards
REZA
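One plausible reading of this observation, not confirmed anywhere in the thread, is that filtering out examples leaves the label's internal value-to-index mapping at seven entries, so a 3x3 cost matrix gets indexed out of range. That would match the ArrayIndexOutOfBoundsException in the log above. The plain-Python sketch below only illustrates that mechanism; it is not RapidMiner code. Three class names come from the thread, while the other four names and the ordering are hypothetical placeholders.

```python
# "anaA", "anaB", "prometa" are from the thread; "other1".."other4" are placeholders.
ALL_CLASSES = ["other1", "other2", "other3", "other4", "anaA", "anaB", "prometa"]
index_of = {name: i for i, name in enumerate(ALL_CLASSES)}

COST_3X3 = [[0.0, 3.0, 1.0],
            [1.0, 0.0, 1.0],
            [1.0, 1.0, 0.0]]

def cost(true_cls, pred_cls, mapping, matrix):
    """Look up the misclassification cost through a class-to-index mapping."""
    return matrix[mapping[true_cls]][mapping[pred_cls]]

kept = ["anaA", "anaB", "prometa"]  # classes that survive the filter

# Filtering rows does not shrink the old mapping, so index 4 overflows a 3x3 matrix.
try:
    cost("anaA", "anaB", index_of, COST_3X3)
    overflowed = False
except IndexError:
    overflowed = True

# A compact mapping over the surviving classes makes the 3x3 matrix fit again.
compact = {name: i for i, name in enumerate(kept)}
fixed_cost = cost("anaA", "anaB", compact, COST_3X3)
print(overflowed, fixed_cost)
```

If this reading is right, the two ways out are the one Reza found (a 7x7 cost matrix) or rebuilding the label so that only the three remaining values are known to the learner.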