imbalanced data

m_r_nour
m_r_nour New Altair Community Member
edited November 2024 in Community Q&A

Hi,

How can I solve a class imbalance problem?

I used the Weka AdaBoostM1 learner, but it doesn't work at all.
1. I used a sampling method to balance the data and performance improved, but as far as I know there should be far better methods for handling imbalanced data.
2. LibSVM can also be used in weighted mode, but I do not know how to use it or how to tune LibSVM parameters such as the cost and the class weights. And how can it be used inside MetaCost?

I'd appreciate your help.

Regards,
REZA

Answers

  • land
    land New Altair Community Member
    Hi Reza,
    just put the LibSVM learner inside the MetaCost learner. MetaCost is a so-called meta learner: it wraps another, inner learning scheme.
    This time you could simply have read the manual, which is why haddock keeps repeating it. Although this IS a help forum, other people have to spend their time giving you hints, so I think it's fair to expect you to make your best effort to cope with the problem yourself first. That should always include the (admittedly sparse) documentation.

    Greetings,
      Sebastian
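
    To make the "meta learner" idea concrete, here is a minimal Python sketch of the decision rule that MetaCost builds on: take class probabilities from the inner learner and pick the class with the lowest expected cost. It is an illustration only, not RapidMiner's implementation. The function name, the example probabilities and the convention that cost[i][j] is the cost of predicting class i when the true class is j are all assumptions; only the 3x3 matrix values are taken from this thread.

```python
# Cost matrix values as used later in this thread.
# Assumed convention: COST[i][j] = cost of predicting class i when the truth is j.
COST = [
    [0.0, 3.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]


def min_expected_cost_class(probs, cost):
    """Return the index of the class whose prediction has the lowest expected cost.

    probs[j] is the inner learner's estimated probability of class j.
    """
    n = len(cost)
    if any(len(row) != n for row in cost):
        raise ValueError("cost matrix must be square")
    if len(probs) != n:
        raise ValueError("need one probability per row/column of the cost matrix")
    # Expected cost of predicting i: sum over true classes j of P(j) * cost[i][j].
    expected = [sum(cost[i][j] * probs[j] for j in range(n)) for i in range(n)]
    return min(range(n), key=expected.__getitem__)


if __name__ == "__main__":
    # Class 0 is the most probable, but predicting it is expensive when the
    # truth is class 1, so the cost-sensitive choice is class 1.
    print(min_expected_cost_class([0.5, 0.4, 0.1], COST))  # prints 1
```

    Note that the rule only makes sense when the matrix has exactly one row and one column per class the learner knows about.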
  • m_r_nour
    m_r_nour New Altair Community Member
    Hi,

    Thanks, Sebastian.

    My process is:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="data_ver5_all.csv"/>
            <parameter key="label_name" value="CellCycle"/>
        </operator>
        <operator name="ExampleFilter" class="ExampleFilter" breakpoints="after">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="label=anaA||label=anaB||label=prometa"/>
        </operator>
        <operator name="Normalization" class="Normalization">
        </operator>
        <operator name="Random Optimizer" class="RandomOptimizer" expanded="yes">
            <parameter key="iterations" value="100"/>
            <operator name="Xvalidation" class="XValidation" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="create_complete_model" value="true"/>
                <operator name="MetaCost" class="MetaCost" expanded="yes">
                    <parameter key="keep_example_set" value="true"/>
                    <parameter key="cost_matrix" value="[0.0 3.0 1.0;1.0 0.0 1.0;1.0 1.0 0.0]"/>
                    <operator name="LibSVMLearner" class="LibSVMLearner">
                        <list key="class_weights">
                        </list>
                    </operator>
                </operator>
                <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier (2)" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                        <parameter key="create_view" value="true"/>
                    </operator>
                    <operator name="Performance (2)" class="Performance">
                        <parameter key="keep_example_set" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="AverageBuilder" class="AverageBuilder">
            </operator>
            <operator name="Performance (3)" class="Performance">
                <parameter key="keep_example_set" value="true"/>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <parameter key="filename" value="output_%{a}.log"/>
                <list key="log">
                  <parameter key="File" value="operator.CSVExampleSource.parameter.filename"/>
                  <parameter key="Learner" value="operator.Classifier.parameter.select_which"/>
                  <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
                  <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
                  <parameter key="Feature Selection On|Off" value="operator.FS SAM.parameter.enable"/>
                  <parameter key="Pro_meta merging" value="operator.Class merging.parameter.enable"/>
                </list>
                <parameter key="sorting_dimension" value="3"/>
            </operator>
        </operator>
    </operator>

    but it halts with a "Process failed" message and I do not know why.

    I'd appreciate your help with this.

    Regards,
    REZA
  • haddock
    haddock New Altair Community Member
    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource" activated="no">
            <parameter key="filename" value="data_ver5_all.csv"/>
            <parameter key="label_name" value="CellCycle"/>
        </operator>
        <operator name="ExampleFilter" class="ExampleFilter" breakpoints="after" activated="no">
            <description text="This will not work - we've already discussed why - wake up"/>
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="label=anaA||label=anaB||label=prometa"/>
        </operator>
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple non linear classification"/>
        </operator>
        <operator name="Normalization" class="Normalization">
        </operator>
        <operator name="Random Optimizer" class="RandomOptimizer" expanded="yes">
            <parameter key="iterations" value="10"/>
            <operator name="Xvalidation" class="XValidation" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="create_complete_model" value="true"/>
                <operator name="MetaCost" class="MetaCost" expanded="yes">
                    <parameter key="keep_example_set" value="true"/>
                    <parameter key="cost_matrix" value="[0.0 3.0 1.0;1.0 0.0 1.0;1.0 1.0 0.0]"/>
                    <operator name="LibSVMLearner" class="LibSVMLearner">
                        <list key="class_weights">
                        </list>
                    </operator>
                </operator>
                <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier (2)" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                        <parameter key="create_view" value="true"/>
                    </operator>
                    <operator name="Performance (2)" class="Performance">
                        <parameter key="keep_example_set" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="AverageBuilder" class="AverageBuilder" activated="no">
            </operator>
            <operator name="Performance (3)" class="Performance" activated="no">
                <description text="Why on earth is this here? Disable it."/>
                <parameter key="keep_example_set" value="true"/>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <parameter key="filename" value="output_%{a}.log"/>
                <list key="log">
                  <parameter key="File" value="operator.CSVExampleSource.parameter.filename"/>
                  <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
                  <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
                </list>
                <parameter key="sorting_dimension" value="3"/>
            </operator>
        </operator>
    </operator>
  • m_r_nour
    m_r_nour New Altair Community Member
    Hi,

    It still doesn't work. It seems you just disabled the AverageBuilder and the Performance operator after it. I did the same with my data, but I get the "Process failed" message again. Thanks for your time and consideration anyway.
    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="I:\WORK\ver5\RM\DATA\data_ver5_all.csv"/>
            <parameter key="label_name" value="CellCycle"/>
        </operator>
        <operator name="ExampleFilter" class="ExampleFilter">
            <description text="This will not work - we've already discussed why - wake up"/>
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="label=anaA||label=anaB||label=prometa"/>
        </operator>
        <operator name="Normalization" class="Normalization">
        </operator>
        <operator name="Random Optimizer" class="RandomOptimizer" expanded="yes">
            <parameter key="iterations" value="10"/>
            <operator name="Xvalidation" class="XValidation" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="create_complete_model" value="true"/>
                <operator name="MetaCost" class="MetaCost" expanded="yes">
                    <parameter key="keep_example_set" value="true"/>
                    <parameter key="cost_matrix" value="[0.0 3.0 1.0;1.0 0.0 1.0;1.0 1.0 0.0]"/>
                    <operator name="LibSVMLearner" class="LibSVMLearner">
                        <list key="class_weights">
                        </list>
                    </operator>
                </operator>
                <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier (2)" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                        <parameter key="create_view" value="true"/>
                    </operator>
                    <operator name="Performance (2)" class="Performance">
                        <parameter key="keep_example_set" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <parameter key="filename" value="output_%{a}.log"/>
                <list key="log">
                  <parameter key="File" value="operator.CSVExampleSource.parameter.filename"/>
                  <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
                  <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
                </list>
                <parameter key="sorting_dimension" value="3"/>
            </operator>
        </operator>
    </operator>
  • haddock
    haddock New Altair Community Member
    It really helps if you include the error message. This code works on generated examples on my machine right now. I disabled the average builder and final performance operator because that was causing the failure on the original code, as detailed in your original post.

  • m_r_nour
    m_r_nour New Altair Community Member
    G Nov 28, 2009 5:50:44 PM: [Fatal] ArrayIndexOutOfBoundsException occured in 1st application of ModelApplier (2) (ModelApplier)
    G Nov 28, 2009 5:50:44 PM: [Fatal] Process failed: operator cannot be executed (3). Check the log messages...
              Root[1] (Process)
              +- CSVExampleSource[1] (CSVExampleSource)
              +- ExampleFilter[1] (ExampleFilter)
              +- Normalization[1] (Normalization)
              +- Random Optimizer[1] (RandomOptimizer)
                +- Xvalidation[1] (XValidation)
                |  +- MetaCost[1] (MetaCost)
                |  |  +- LibSVMLearner[10] (LibSVMLearner)
                |  +- OperatorChain (2)[1] (OperatorChain)
    here ==>   |    +- ModelApplier (2)[1] (ModelApplier)
                |    +- Performance (2)[0] (Performance)
                +- ProcessLog[0] (ProcessLog)
  • haddock
    haddock New Altair Community Member
    Must be the data then, as the generator version works. My guess is that your filter doesn't work as you think, because you use '||'  where you should use '|'.
  • m_r_nour
    m_r_nour New Altair Community Member
    Hi Haddock,

    No, I changed the filter as you suggested, but the problem is the same.

    However: the data has 7 classes and the filter reduces them to 3, yet the classifier seems to treat the data as if it still had 7 classes, because when I changed the cost matrix to size 7 it worked.

    So... how can I solve this?

    Thanks for your time.

    Regards,
    REZA
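
    One possible reading of this observation, offered as an unconfirmed guess: filtering removes rows, but the label may still declare all 7 values, so class indices can run up to 6 while the 3x3 cost matrix only has indices 0 to 2. The Python sketch below shows that kind of mismatch and one way around it (remapping the remaining classes to 0..2). It does not use any RapidMiner API. Only anaA, anaB and prometa come from the thread; the other four class names, their order and all function names are hypothetical.

```python
# Hypothetical stand-in for the label's 7 declared values. Only anaA, anaB and
# prometa appear in the thread; the other names and the ordering are made up.
ALL_CLASSES = ["anaA", "c2", "anaB", "c4", "c5", "c6", "prometa"]
KEPT = ["anaA", "anaB", "prometa"]

# The 3x3 cost matrix from the thread.
COST = [
    [0.0, 3.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]


def original_index(name):
    """Index of a class in the full, unfiltered list of label values."""
    return ALL_CLASSES.index(name)


def compact_mapping(all_classes, kept):
    """Map the classes that survive filtering onto 0..k-1, keeping their order."""
    kept_set = set(kept)
    remaining = [c for c in all_classes if c in kept_set]
    return {name: i for i, name in enumerate(remaining)}


def misclassification_cost(cost, predicted_idx, true_idx):
    """Look up one entry of the cost matrix by class index."""
    return cost[predicted_idx][true_idx]


if __name__ == "__main__":
    try:
        # prometa still has index 6 in the original mapping: out of range for 3x3.
        misclassification_cost(COST, original_index("prometa"), original_index("anaA"))
    except IndexError as err:
        print("index out of bounds:", err)

    compact = compact_mapping(ALL_CLASSES, KEPT)
    # With compact indices every lookup stays inside the 3x3 matrix.
    print(misclassification_cost(COST, compact["anaA"], compact["anaB"]))
```

    If this is what happens, the two ways out are the ones visible here: either make the matrix as large as the declared label values (the 7x7 workaround) or make sure the label only declares the 3 remaining classes before MetaCost sees it.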
